Code for our paper *Applying CodeBERT for Automated Program Repair of Java Simple Bugs*, accepted at MSR 2021.
You can find the paper here: https://arxiv.org/abs/2103.11626
Note: If you hit Git LFS bandwidth limits, you can download the dataset from Zenodo instead: https://zenodo.org/record/6802730.
## Data

The `data` folder contains multiple folders and files:

- `repetition`: folders contain the MSR datasets WITH duplicate `<buggy code, fixed code>` pairs
- `unique`: folders contain the MSR datasets WITHOUT duplicate `<buggy code, fixed code>` pairs
- `sstubs(Large|Small).json`: files contain the dataset in JSON format
- `sstubs(Large|Small)-(train|test|val).json`: files contain the dataset split in JSON format
- `split/(large|small)`: folders contain the dataset in text format (what CodeBERT works with)

## Running the CodeBERT experiment

Clone the repository (the dataset files are tracked with Git LFS):

```bash
git lfs install
git clone https://github.com/EhsanMashhadi/MSR2021-ProgramRepair.git
cd MSR2021-ProgramRepair
```
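To get a feel for the JSON dataset files under `data`, you can load one with Python. A minimal sketch, assuming each file is a JSON array of objects with `sourceBeforeFix`/`sourceAfterFix` fields (the sample file below is fabricated; check the actual schema in the `sstubs*.json` files):

```shell
# Hypothetical sample mimicking the assumed dataset shape.
cat > /tmp/sample_sstubs.json <<'EOF'
[
  {"sourceBeforeFix": "int i = 0 ;", "sourceAfterFix": "int i = 1 ;"},
  {"sourceBeforeFix": "x . foo ( )", "sourceAfterFix": "x . bar ( )"}
]
EOF
python3 - <<'PY'
import json

# Load the <buggy, fixed> pairs and show one example.
with open("/tmp/sample_sstubs.json") as f:
    pairs = json.load(f)

print(len(pairs), "pairs")  # prints: 2 pairs
print(pairs[0]["sourceBeforeFix"], "->", pairs[0]["sourceAfterFix"])
PY
```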
Download the pretrained CodeBERT model and set the `pretrained_model` variable in the script files to its path:

```bash
git clone https://huggingface.co/microsoft/codebert-base
```

Install the dependencies:

```bash
pip install torch==1.4.0
pip install transformers==2.5.0
```
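One way to set the `pretrained_model` variable in the scripts non-interactively is with `sed`. A minimal sketch against a stand-in file (the exact assignment line in the real scripts may differ, and `sed -i` below assumes GNU sed):

```shell
# Stand-in for a training script that contains a pretrained_model assignment.
cat > /tmp/train_example.sh <<'EOF'
pretrained_model=microsoft/codebert-base
EOF

# Point the variable at the locally cloned codebert-base folder.
sed -i 's|^pretrained_model=.*|pretrained_model=./codebert-base|' /tmp/train_example.sh
cat /tmp/train_example.sh  # prints: pretrained_model=./codebert-base
```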
Train and evaluate the CodeBERT model:

```bash
bash ./scripts/codebert/train.sh
bash ./scripts/codebert/test.sh
```
## Running the Simple LSTM experiment

Install OpenNMT-py, then build the vocabulary, train, and evaluate the model:

```bash
pip install OpenNMT-py==2.2.0
bash ./scripts/simple-lstm/build_vocab.sh
bash ./scripts/simple-lstm/train.sh
bash ./scripts/simple-lstm/test.sh
```
## Running the Simple LSTM experiment (legacy version)

This is the original version used to run the Simple LSTM experiments in the paper.

```bash
pip install OpenNMT-py==1.2.0
bash ./scripts/simple-lstm/legacy/preprocess.sh
bash ./scripts/simple-lstm/legacy/train.sh
bash ./scripts/simple-lstm/legacy/test.sh
```
## Tips

- Change the values of the `size` and `type` variables in the script files to run different experiments (large | small, unique | repetition).
- Check that your CUDA and PyTorch versions are compatible.
- Set `CUDA_VISIBLE_DEVICES`, `gpu_rank`, and `world_size` in all scripts according to the number of GPUs on your machine.
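For example, to expose only the first two GPUs to the training scripts (a sketch; the right values depend on how many GPUs your machine has):

```shell
# Make only GPUs 0 and 1 visible to CUDA programs; inside the process
# they are renumbered as devices 0 and 1.
export CUDA_VISIBLE_DEVICES=0,1
echo "$CUDA_VISIBLE_DEVICES"  # prints: 0,1
```

With two visible GPUs you would typically also set `world_size=2` and give each process a distinct `gpu_rank` in the scripts.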