Graph Sampling is a python package containing various approaches which samples the original graph according to different sample sizes.
Social Network Analysis (SNA) has recently been gaining more and more popularity in various domains. Unfortunately, performing SNA is not always an easy task, due to the volume of data which translates to huge network/graph, it is very time consuming and computationally expensive to perform analysis on these graphs. Depending on the type of task, handling graphs with even just dozens of thousands of nodes can be unfeasible, as some studies show. An intuitive solution to deal with this situation, just as in any scenario where we have a massive amount of data, is to sample the graph and then perform relevant simulation/analysis on obtained sub-graph.
Graph sampling is a technique to pick a subset of vertices or edges from original graph. The biggest advantage of sampling methods are their execution efficiency so that the graph transformation procedure won’t take longer time than straightforward computation on original graph. This is a simple sampling repo that helps you find a representative sample of the original graph via different Sampling Techniques.
Exploration or traversal (also called topology-based) approaches are based on the idea of randomly selecting one node and then exploring its neighborhood. Sampling algorithms based on this techniques are :
In the unconnected graph, it is possible that there is no node in the component that could be added to the sample. To handle this we defined a time period and an expected growth size in that period and after some iterations check whether the sample growth is large enough and if not, select again the node randomly to continue random walk. This way we ensure that the sample will reach the required size .
sampled_graph = random_walk_sampling_simple(complete_graph, nodes_to_sample)
RWF is a variation of random walk to improve the performance. The Fly-back probability (p) is used to sample more than one neighboring node at any stage of already sampled node. RWF picks a node uniformly at random as start point and begins a sequence. At each step, with 1-p probability it selects one node among neighbors of the current node with equal probability and moves to that node. If the neighboring node or the corresponding edge does not exist in the sample graph, they will be added to the graph; with p probability, we will fly back to the starting point. This ensures that the neighborhood of a selected node could be sufficiently explored. The higher the fly back probability, the more similar random walk is to Breadth First Search.
To avoid being stuck we defined a time period and an expected growth size in that period and after iterations check whether the sample growth is large enough and if not, select again the node randomly to continue random walk. This way we ensure that the sample will reach the required size.
sampled_graph = random_walk_sampling_with_fly_back(complete_graph,nodes_to_sample,p)
So, we presented our new sampling strategy, Induced Subgraph Random Walk Sampling (ISRW), which tries to overcome the problem of undersampling of edges in SRW. We applied graph induction step to SRW to select additional edges between sampled nodes with the aim to restore connectivity and bring the structure closer to that of the original graph.
sampled_graph = random_walk_induced_graph_sampling(complete_graph, nodes_to_sample)
The Snowball sampling is a type of a sampling by exploration in which each individual in the sample is asked to name k different individuals in the population, where k is a specified integer; for example, each individual may be asked to name his "k colleagues". The individuals who were not in the random sample already but were named by individuals in it form the first stage. Each of the individuals in the first stage is then asked to name k different individuals. The individuals who were not in the random sample nor in the first stage but were named by individuals who were in the first stage form the second stage. Each of the individuals in the second stage is then asked to name k different individuals. The individuals who were not in the random sample nor in the first or second stages but were named by individuals who were in the second stage form the third stage. This procedure is continued until each of the individuals in the d-th stage has been asked to name k different individuals.
Snowball Sampling starts with a set of nodes, say k. For each node, exactly k of it's neighbours are extracted. If the neighbours of a node is less than k , then all the neighbours of the node are extracted. This process continues until the required sample size is reached. In this way, a sampled graph is extracted from the original graph.
sampled_graph = snowball(complete_graph, nodes_to_sample, k)
sampled_graph = forestfire(complete_graph, nodes_to_sample)
sampled_graph = mhrw(complete_graph, nodes_to_sample, initial_seed_node)
sampled_graph = induced_mhrw(complete_graph, nodes_to_sample, initial_seed_node)
Edge sampling focuses on the selection of edges rather than nodes to populate the sample. Thus, the node selection step in edge sampling algorithm proceeds by just sampling edges, and including both nodes when a particular edge is sampled.
sampled_graph = ties(complete_graph, nodes_to_sample, φ)
The Graph Sampling package requires Python 3.X . If you don't have the pre-installed python in your system, please follow up the python link to download it. This package also requires Networkx 2.1 or newer which helps to create the graphs and also perform manipulations on them.
If you have Git installed on your system, then it is also possible to install the development version of Graph Sampling package by running these commands on your terminal:
$ git clone https://github.com/Ashish7129/Graph_Sampling.git
$ cd Graph_Sampling
$ pip install -e .
Or you can install the current release of Graph Sampling package with pip. Please download the zip file and locate it into the current folder and then run the following command for installing the graph sampling package into your system:
$ python setup.py sdist bdist_wheel
$ pip install dist/Graph_Sampling-0.0.1-py3-none-any.whl
After installing the package, you can use the package by writing the following command:
>>> import Graph_Sampling
Check out the file test.py, which helps you to understand the procedure of executing different functions along with their type and number of arguments. For example, snowball sampling fuction is excecuted as follows:
>>> object = Graph_Sampling.Snowball()
>>> sampled_subgraph = object.snowball(G,size,k)
The object is the instance of the class Snowball. The class having the snowball function has 3 parameters as :