Code for CIKM 2020 paper Enhancing Graph Neural Network-based Fraud Detectors against Camouflaged Fraudsters
A PyTorch implementation for the CIKM 2020 paper below:
Enhancing Graph Neural Network-based Fraud Detectors against Camouflaged Fraudsters.
Yingtong Dou, Zhiwei Liu, Li Sun, Yutong Deng, Hao Peng, Philip S. Yu.
[Paper][Toolbox][DGL Example][Benchmark]
The feature and label similarity scores presented in Table 2 of the paper are incorrect. The updated equations for calculating the two similarity scores are shown below:
The code for calculating the similarity scores is in simi_comp.py.
The updated similarity scores for the two datasets are shown below. Note that we only compute the similarity scores for positive nodes to demonstrate the camouflage of fraudsters (positive nodes).
| YelpChi | rur | rtr | rsr | homo |
|---|---|---|---|---|
| Avg. Feature Similarity | 0.991 | 0.988 | 0.988 | 0.988 |
| Avg. Label Similarity | 0.909 | 0.176 | 0.186 | 0.184 |

| Amazon | upu | usu | uvu | homo |
|---|---|---|---|---|
| Avg. Feature Similarity | 0.711 | 0.687 | 0.697 | 0.687 |
| Avg. Label Similarity | 0.167 | 0.056 | 0.053 | 0.072 |
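
To make the label similarity rows easier to interpret, here is a minimal sketch of one way such a score could be computed for positive nodes only. It assumes the score is the fraction of (positive node, neighbor) pairs whose labels match; this is an illustrative assumption and may differ from the exact formula in simi_comp.py. The function name `avg_label_similarity` and the toy graph are hypothetical.

```python
def avg_label_similarity(adj_lists, labels, positive_label=1):
    """Fraction of (positive node, neighbor) pairs that share a label.

    Assumption: adj_lists maps a node id to a set of neighbor ids,
    and labels is indexable by node id. This is a sketch, not the
    code from simi_comp.py.
    """
    same = 0
    total = 0
    for node, neighbors in adj_lists.items():
        if labels[node] != positive_label:
            continue  # only positive (fraud) nodes are scored
        for nb in neighbors:
            if nb == node:
                continue  # ignore self-loops
            total += 1
            same += int(labels[nb] == labels[node])
    return same / total if total else 0.0


# Toy graph: nodes 0 and 1 are fraudsters, nodes 2 and 3 are benign.
toy_adj = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
toy_labels = [1, 1, 0, 0]
score = avg_label_similarity(toy_adj, toy_labels)
print(f"{score:.3f}")
```

Under this reading, a low score means that most neighbors of a fraudster are benign, which is how the low label similarity values in the tables indicate camouflage.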
According to this issue, the weighted aggregation of CARE-Weight (a variant of CARE-GNN) has an error. Once it is fixed, the relation weights no longer converge to the same value, so the relation weight subfigure in Figure 3 and its associated conclusion are wrong.
Please check out RioGNN, a GNN model that extends CARE-GNN with additional reinforcement learning modules. We are actively developing an efficient multi-layer version of CARE-GNN. Stay tuned.
CAmouflage-REsistant Graph Neural Network (CARE-GNN) is a GNN-based fraud detector that operates on a multi-relation graph and is equipped with three modules that enhance its performance against camouflaged fraudsters.
The three enhancement modules are:
CARE-GNN has the following advantages:
We have integrated more than eight GNN-based fraud detectors as a TensorFlow toolbox.
You can download the project and install the required packages using the following commands:
git clone https://github.com/YingtongDou/CARE-GNN.git
cd CARE-GNN
pip3 install -r requirements.txt
Running the code requires Python 3.6 or later.

1. Run `unzip /data/Amazon.zip` and `unzip /data/YelpChi.zip` to unzip the datasets;
2. Run `python data_process.py` to generate the adjacency lists used by CARE-GNN;
3. Run `python train.py` to run CARE-GNN with default settings.

For other dataset and parameter settings, please refer to the arg parser in `train.py`. Our model supports both CPU and GPU mode.
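
To illustrate what the adjacency lists generated by data_process.py might look like, here is a minimal sketch that builds an undirected node-to-neighbors mapping from edge index arrays. It assumes the adjacency lists are a dictionary from node id to a set of neighbor ids; the helper name `edges_to_adjlist` is hypothetical and this is not the repository's own implementation. For a `scipy.sparse` matrix `m`, the index arrays can be obtained with `rows, cols = m.nonzero()`.

```python
from collections import defaultdict


def edges_to_adjlist(rows, cols):
    """Build an undirected adjacency list from paired edge index arrays.

    Assumption: rows[i] and cols[i] are the endpoints of the i-th edge,
    e.g. as returned by the nonzero() method of a scipy.sparse matrix.
    """
    adj_lists = defaultdict(set)
    for u, v in zip(rows, cols):
        u, v = int(u), int(v)
        # Store both directions so each node sees all of its neighbors.
        adj_lists[u].add(v)
        adj_lists[v].add(u)
    return adj_lists


# Toy edge list: 0-1, 1-2, and a self-loop on node 2.
rows = [0, 1, 2]
cols = [1, 2, 2]
adj = edges_to_adjlist(rows, cols)
for node in sorted(adj):
    print(node, sorted(adj[node]))
```

Sets are used so that duplicate edges in the input collapse to a single neighbor entry.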
To run CARE-GNN on your datasets, you need to prepare the following data:

- Graphs stored in `scipy.sparse` matrix format; you can use `sparse_to_adjlist()` in `utils.py` to convert a sparse matrix into the adjacency lists used by CARE-GNN;
- A node feature matrix stored in `scipy.sparse` matrix format.

The repository is organized as follows:

- `data/`: dataset files;
- `data_process.py`: transfer sparse matrix to adjacency lists;
- `graphsage.py`: model code for vanilla GraphSAGE model;
- `layers.py`: CARE-GNN layers implementations;
- `model.py`: CARE-GNN model implementations;
- `train.py`: training and testing all models;
- `utils.py`: utility functions for data i/o and model evaluation.

If you use our code, please cite the paper below:
@inproceedings{dou2020enhancing,
title={Enhancing Graph Neural Network-based Fraud Detectors against Camouflaged Fraudsters},
author={Dou, Yingtong and Liu, Zhiwei and Sun, Li and Deng, Yutong and Peng, Hao and Yu, Philip S},
booktitle={Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM'20)},
year={2020}
}