Graph Embedding via Frequent Subgraphs
This is the implementation of the GE-FSG method in the paper "Learning Graph Representation via Frequent Subgraphs", SDM 2018: https://epubs.siam.org/doi/10.1137/1.9781611975321.35
A graph consists of nodes and edges. Each graph has a label, called the graph label. Similarly, each node/edge can also have a label, called a node label/edge label. For example, a chemical compound is a graph whose nodes correspond to the atoms of the compound and whose edges correspond to the chemical bonds between them.
To apply machine learning tasks such as classification and clustering to graphs, we need to represent each graph as a feature vector since machine learning methods typically require vectors as their input. This task is challenging since graphs have no feature vectors by default.
We propose GE-FSG, which learns feature vectors (also known as embeddings or representations) for graphs. GE-FSG combines a recently introduced neural document embedding model with a traditional pattern mining technique. It has two main steps: (1) decompose each graph into a set of frequent subgraphs (FSGs) and (2) learn an embedding for each graph by predicting the FSGs it contains. As a result, graphs that contain similar FSGs are mapped to nearby points in the vector space.
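To illustrate the intuition behind step (2) — graphs are treated like "documents" whose "words" are the FSGs they contain — here is a minimal sketch that replaces the paper's neural document embedding model with a simple bag-of-FSGs vector, so that graphs sharing FSGs get similar vectors. The graph and FSG identifiers below are hypothetical, not part of the tool.

```python
# Minimal sketch of the idea behind GE-FSG's second step: represent
# each graph by the FSGs it contains. A plain bag-of-FSGs vector is
# used here in place of the paper's neural embedding model; the graph
# ids and FSG ids are hypothetical.

def bag_of_fsgs(graph_to_fsgs):
    """Map each graph id to a binary vector over the FSG vocabulary."""
    vocab = sorted({fsg for fsgs in graph_to_fsgs.values() for fsg in fsgs})
    index = {fsg: i for i, fsg in enumerate(vocab)}
    vectors = {}
    for g, fsgs in graph_to_fsgs.items():
        vec = [0] * len(vocab)
        for fsg in fsgs:
            vec[index[fsg]] = 1
        vectors[g] = vec
    return vocab, vectors

# Hypothetical decomposition: graph id -> set of FSG ids it contains.
graphs = {
    "g1": {"fsg_a", "fsg_b"},
    "g2": {"fsg_a", "fsg_b", "fsg_c"},
    "g3": {"fsg_d"},
}
vocab, vectors = bag_of_fsgs(graphs)
# g1 and g2 share FSGs, so their vectors agree in most coordinates,
# while g3 shares none with either.
```

In GE-FSG itself, the second step learns dense embeddings rather than sparse indicator vectors, but the neighborhood structure it aims for is the same: graphs with overlapping FSG sets end up close together.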
Graph embeddings learnt by GE-FSG and other methods are visualized using t-SNE.
-graphset <file>
use graphs from <file> to mine FSGs
-graphlabel <file>
obtain graph labels from <file>
-minsup <float>
set minimum support threshold in [0,1]; default is 0.5
-fsg <file>
save discovered FSGs to <file> (optional)
-output <file>
convert each graph to a set of FSGs and save the results to <file> (optional)
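The -minsup threshold is a fraction of the dataset: a subgraph pattern counts as frequent when it occurs in at least minsup of the input graphs. The sketch below illustrates only this counting rule on hypothetical pattern ids; it is not the tool's actual mining code, which discovers the subgraph patterns themselves.

```python
# Illustration of the -minsup threshold (hypothetical data): a pattern
# is "frequent" when it occurs in at least minsup * (number of graphs)
# of the input graphs.

def frequent_patterns(graph_patterns, minsup):
    """graph_patterns: list of sets of pattern ids, one set per graph."""
    n = len(graph_patterns)
    counts = {}
    for patterns in graph_patterns:
        for p in patterns:
            counts[p] = counts.get(p, 0) + 1
    # Keep patterns whose support (fraction of graphs) meets the threshold.
    return {p for p, c in counts.items() if c / n >= minsup}

dataset = [{"p1", "p2"}, {"p1"}, {"p1", "p3"}, {"p2", "p4"}]
# With the default minsup of 0.5: p1 occurs in 3/4 graphs and p2 in
# 2/4 (both kept), while p3 and p4 each occur in 1/4 (dropped).
```

Lowering -minsup yields more (and larger) FSGs per graph at a higher mining cost; raising it keeps only the most common substructures.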
Dang Nguyen, Wei Luo, Tu Dinh Nguyen, Svetha Venkatesh, Dinh Phung (2018). Learning Graph Representation via Frequent Subgraphs. SDM 2018, San Diego, USA. SIAM, pp. 306-314.