Re-implementation and extension of the work described in "Learning to Represent Programs with Graphs"
This project re-implements the VarNaming task model described in the paper "Learning to Represent Programs with Graphs", which predicts the name of a variable based on its usage.
Furthermore, this project includes functionality for applying the VarNaming model to the MethodNaming task (predicting the name of a method from its usage or definition).
If you use the provided implementation in your research, please cite the Learning to Represent Programs with Graphs paper, and include a link to this repository as a footnote.
Ensure you have the following packages installed (these can all be installed with pip3):
The corpus pre-processing functions are designed to work with .proto graph files, which can be extracted from program source code using the feature extractor available here.
Once you have obtained a corpus of .proto graph files, split it into train/val/test sets with the corpus_extractor.py script located in the data_processing folder. Set the following paths in the config.yml file, then run the script:
corpus_path: "path-to-corpus"
train_path: "path-to-train-data-output"
val_path: "path-to-val-data-output"
test_path: "path-to-test-data-output"
python3 ./data_processing/corpus_extractor.py
This will extract all samples from the corpus, randomly shuffle them, split them into train/val/test partitions, and copy these partitions into the specified train, val and test folders.
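The shuffle, split and copy steps described above can be sketched roughly as follows. This is not the code in corpus_extractor.py: it is a minimal illustration that assumes each .proto file counts as one sample, and the function name split_corpus, the 10% validation and test fractions, and the fixed seed are all hypothetical choices made for the example.

```python
import random
import shutil
from pathlib import Path


def split_corpus(corpus_path, train_path, val_path, test_path,
                 val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle all .proto files under corpus_path and copy them into three folders.

    Hypothetical helper: the fractions and seed are illustrative defaults,
    not values taken from the project.
    """
    # Collect every .proto graph file; sorting first makes the shuffle reproducible.
    samples = sorted(Path(corpus_path).rglob("*.proto"))
    random.Random(seed).shuffle(samples)

    n_val = int(len(samples) * val_fraction)
    n_test = int(len(samples) * test_fraction)
    partitions = {
        val_path: samples[:n_val],
        test_path: samples[n_val:n_val + n_test],
        train_path: samples[n_val + n_test:],
    }

    for folder, files in partitions.items():
        out_dir = Path(folder)
        out_dir.mkdir(parents=True, exist_ok=True)
        for i, src in enumerate(files):
            # Prefix with an index so that files sharing a name in different
            # corpus subfolders do not overwrite each other.
            shutil.copy(src, out_dir / f"{i}_{src.name}")

    # Report how many samples ended up in each folder.
    return {folder: len(files) for folder, files in partitions.items()}
```

Fixing the seed keeps the partitions stable between runs, which makes results on the test set comparable across experiments.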
In order to train the model:
train_path: "path-to-train-data"
val_path: "path-to-val-data"
checkpoint_path: "path-to-checkpoint-folder/train.ckpt"
token_path: "path-to-vocabulary-txt-file"
python3 ./train.py
In order to use the model for inference:
test_path: "path-to-test-data"
checkpoint_path: "path-to-checkpoint-folder/train.ckpt"
token_path: "path-to-vocabulary-txt-file"
python3 ./infer.py
In order to use the model for inference, as well as for computing extra sample information (including variable usage information and type information):
test_path: "path-to-test-data"
checkpoint_path: "path-to-checkpoint-folder/train.ckpt"
token_path: "path-to-vocabulary-txt-file"
python3 ./detailed_infer.py
The type of task you want the model to run can be specified by passing appropriate input arguments as follows:
For example, in order to train the model for the MethodNaming task using definition information, the script call will be the following:
python3 ./train.py mth_def
Similarly, for running inference using the MethodNaming usage task, the script call will be the following:
python3 ./infer.py mth_usage
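For readers wondering how a script might pick up the task argument shown in these commands, here is a minimal sketch using argparse. It is an assumption about the mechanism, not the project's actual argument handling: only mth_def and mth_usage appear in the commands above, while the function name parse_task and the var_naming label for the default VarNaming task are hypothetical.

```python
import argparse

# "mth_def" and "mth_usage" come from the commands above; "var_naming" is a
# hypothetical label for the default VarNaming task.
TASKS = ("var_naming", "mth_def", "mth_usage")


def parse_task(argv=None):
    """Return the task name given on the command line, or the default."""
    parser = argparse.ArgumentParser(description="Select the task to run.")
    # nargs="?" makes the positional argument optional, so running the script
    # with no arguments falls back to the default task.
    parser.add_argument("task", nargs="?", default="var_naming", choices=TASKS,
                        help="task type (defaults to VarNaming)")
    return parser.parse_args(argv).task
```

Under these assumptions, parse_task(["mth_def"]) selects the MethodNaming definition task, and an unrecognised task name makes argparse exit with a usage message.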
The saved_models directory includes pre-trained models, which can be used to run inference directly, without any training. The paths to the saved checkpoint and vocabulary files need to be specified in the config.yml file in the usual way, as described in the "Inference" section above.