Representation learning for link prediction within social networks
This repository contains a series of machine learning experiments for link prediction within social networks.
We first implement and apply a variety of link prediction methods to each of the ego networks contained within the SNAP Facebook dataset and SNAP Twitter dataset, as well as to various random networks generated using networkx, and then calculate and compare the ROC AUC, Average Precision, and runtime of each method.
If you use this repository in your work, please cite the corresponding DOI:
python setup.py install
facebook/
: Original Facebook ego networks dataset, with added .allfeats files (with both ego and alter features)fb-processed/
: Pickle dumps of (adjacency_matrix, feature_matrix) tuples for each ego network, and for combined networktwitter/
: Twitter ego networks dataset (combined), with adjacency matrix pickle dumpvisualizations/
: Visualizations of each network generated by networkx and matplotlibnetwork-statistics/
: .txt and .pkl files of pre-calculated network characteristics (with info on connectivity, network size, etc.) for each networktrain-test-splits/
: Pickle dumps of pre-processed train-test splits for Facebook ego networks, with varying degrees of visibility (i.e. how many edges are hidden). Includes: adj_train
, train_edges
, train_edges_false
, val_edges
, val_edges_false
, test_edges
, test_edges_false
process-ego-networks.py
: Script used to process raw Facebook data and generate pickle dumpsprocess-combined-network.py
: Script used to combine Facebook ego networks and generate complete network pickle dumpprocess-twitter-network.py
: Script used to process raw Twitter data and generate pickle dumpsfb-train-test-splits.py
: Script used to generate and store train-test splits for each Facebook ego networktwitter-train-test-splits.py
: Script used to generate and store train-test-splits for Twitter combined networklink-prediction-baselines.ipynb
: Adamic-Adar, Jaccard Coefficient, Preferential Attachmentspectral-clustering.ipynb
: Using spectral embeddings for link predictionnode2vec.ipynb
: Skip-gram based representation learning for node/edge embeddingsgraph-vae.ipynb
: (Variational) Graph Autoencoder, learns node embeddings to recreate adjacency matrixlink_prediction_scores.py
: Utility functions for running various link prediction testsnetwork-visualizations.ipynb
: Generate .pdf visualizations for each network, in addition to calculating and storing a variety of network metrics (e.g. transtivity, avg. clustering coefficient, etc.)nx-graph-experiments.ipynb
: Run all link prediction tests on various types of random networks (Erdos-Renyi, Barabasi-Albert, etc.)fb-graph-experiments.ipynb
: Run all link prediction tests on each Facebook ego networkrun-all-experiments.py
: Run all link prediction experiments (on both Facebook networks and random networkx networks), save results as pickle dumps in results
run-twitter-experiments.py
: Run all link prediction experiments on combined (directed) Twitter network. Warning: may take a while to run.results/
: Pickle dumps of experiment results, with results (ROC AUC, ROC Curve, Avg. Precision, Runtime) stored for each link prediction method in Python dictionary forminvestigate-results.ipynb
: Generate and save bar plots/ROC curve plots for each method and graph type, save network characteristics to .txt
filesresult-plots-by-graph/
: Bar plots for the results of each experiment (ROC AUC, AP, Minimum Runtime) organized by network, in .pdf formresult-plots-by-algorithm/
: Bar plots for the AUROC results of each experiment organized by link prediction algorithm, in .pdf form