Link Prediction Save

Representation learning for link prediction within social networks

Project README

Link Prediction Experiments

This repository contains a series of machine learning experiments for link prediction within social networks.

We first implement and apply a variety of link prediction methods to each of the ego networks contained within the SNAP Facebook dataset and SNAP Twitter dataset, as well as to various random networks generated using networkx, and then calculate and compare the ROC AUC, Average Precision, and runtime of each method.

If you use this repository in your work, please cite the corresponding DOI:

Link Prediction Methods Tested:

(Variational) Graph Auto-Encoders: An end-to-end trainable convolutional neural network model for unsupervised learning on graphs
Node2Vec/DeepWalk: A skip-gram based approach to learning node embeddings from random walks within a given graph
Spectral Clustering: Using spectral embeddings to create node representations from an adjacency matrix
Baseline Indexes: Adamic-Adar, Jaccard Coefficient, Preferential Attachment

Requirements

Pre-Use Installation

python setup.py install

Included Files

Network Data

facebook/: Original Facebook ego networks dataset, with added .allfeats files (with both ego and alter features)
fb-processed/: Pickle dumps of (adjacency_matrix, feature_matrix) tuples for each ego network, and for combined network
twitter/: Twitter ego networks dataset (combined), with adjacency matrix pickle dump
visualizations/: Visualizations of each network generated by networkx and matplotlib
network-statistics/: .txt and .pkl files of pre-calculated network characteristics (with info on connectivity, network size, etc.) for each network
train-test-splits/: Pickle dumps of pre-processed train-test splits for Facebook ego networks, with varying degrees of visibility (i.e. how many edges are hidden). Includes: adj_train, train_edges, train_edges_false, val_edges, val_edges_false, test_edges, test_edges_false
process-ego-networks.py: Script used to process raw Facebook data and generate pickle dumps
process-combined-network.py: Script used to combine Facebook ego networks and generate complete network pickle dump
process-twitter-network.py: Script used to process raw Twitter data and generate pickle dumps
fb-train-test-splits.py: Script used to generate and store train-test splits for each Facebook ego network
twitter-train-test-splits.py: Script used to generate and store train-test-splits for Twitter combined network

Annotated Link Prediction IPython Notebooks

link-prediction-baselines.ipynb: Adamic-Adar, Jaccard Coefficient, Preferential Attachment
spectral-clustering.ipynb: Using spectral embeddings for link prediction
node2vec.ipynb: Skip-gram based representation learning for node/edge embeddings
graph-vae.ipynb: (Variational) Graph Autoencoder, learns node embeddings to recreate adjacency matrix

Link Prediction Helper Scripts

link_prediction_scores.py: Utility functions for running various link prediction tests

Exploratory Analysis

network-visualizations.ipynb: Generate .pdf visualizations for each network, in addition to calculating and storing a variety of network metrics (e.g. transtivity, avg. clustering coefficient, etc.)

Full Link Prediction Experiments

nx-graph-experiments.ipynb: Run all link prediction tests on various types of random networks (Erdos-Renyi, Barabasi-Albert, etc.)
fb-graph-experiments.ipynb: Run all link prediction tests on each Facebook ego network
run-all-experiments.py: Run all link prediction experiments (on both Facebook networks and random networkx networks), save results as pickle dumps in results
run-twitter-experiments.py: Run all link prediction experiments on combined (directed) Twitter network. Warning: may take a while to run.

Results

results/: Pickle dumps of experiment results, with results (ROC AUC, ROC Curve, Avg. Precision, Runtime) stored for each link prediction method in Python dictionary form
investigate-results.ipynb: Generate and save bar plots/ROC curve plots for each method and graph type, save network characteristics to .txt files
result-plots-by-graph/: Bar plots for the results of each experiment (ROC AUC, AP, Minimum Runtime) organized by network, in .pdf form
result-plots-by-algorithm/: Bar plots for the AUROC results of each experiment organized by link prediction algorithm, in .pdf form

Open Source Agenda is not affiliated with "Link Prediction" Project. README Source: lucashu1/link-prediction

Stars

336

Open Issues

Last Commit

5 years ago

Repository

lucashu1/link-prediction

License

MIT

Open Source Agenda Badge

<a href="https://www.opensourceagenda.com/projects/link-prediction"><img src="https://www.opensourceagenda.com/projects/link-prediction/reviews/badge.svg" alt="Open Source Agenda"></a>

Submit Review Review Your Favorite Project

Submit Resource Articles, Courses, Videos

Submit Article Submit a post to our blog

From the blog

Dec 11, 2022

How to Choose Which Programming Language to Learn First?

From the blog

Dec 11, 2022