Code for our paper *Applying CodeBERT for Automated Program Repair of Java Simple Bugs*, accepted at MSR 2021.
You can find the paper here: https://arxiv.org/abs/2103.11626
Note: If you hit Git LFS bandwidth limits, you can download the dataset from Zenodo instead: https://zenodo.org/record/6802730.
## Data

The `data` folder contains multiple folders and files:

- `repetition`: folders contain the MSR datasets WITH duplicate `<buggy code, fixed code>` pairs
- `unique`: folders contain the MSR datasets WITHOUT duplicate `<buggy code, fixed code>` pairs
- `sstubs(Large|Small).json`: files contain the dataset in JSON format
- `sstubs(Large|Small)-(train|test|val).json`: files contain the dataset split in JSON format
- `split/(large|small)`: folders contain the dataset in text format (what CodeBERT works with)

## Running the CodeBERT experiment

Clone the repository (the dataset files are tracked with Git LFS):

```bash
git lfs install
git clone https://github.com/EhsanMashhadi/MSR2021-ProgramRepair.git
cd MSR2021-ProgramRepair
```
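To get a feel for the JSON dataset files under `data`, you can load one with Python. A minimal sketch, assuming each file is a JSON array of objects with `sourceBeforeFix`/`sourceAfterFix` fields (the sample file below is fabricated; check the actual schema in the `sstubs*.json` files):

```shell
# Hypothetical sample mimicking the assumed dataset shape.
cat > /tmp/sample_sstubs.json <<'EOF'
[
  {"sourceBeforeFix": "int i = 0 ;", "sourceAfterFix": "int i = 1 ;"},
  {"sourceBeforeFix": "x . foo ( )", "sourceAfterFix": "x . bar ( )"}
]
EOF
python3 - <<'PY'
import json

# Load the <buggy, fixed> pairs and show one example.
with open("/tmp/sample_sstubs.json") as f:
    pairs = json.load(f)

print(len(pairs), "pairs")  # prints: 2 pairs
print(pairs[0]["sourceBeforeFix"], "->", pairs[0]["sourceAfterFix"])
PY
```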
Download the pretrained CodeBERT model and set the `pretrained_model` variable in the script files to its path:

```bash
git clone https://huggingface.co/microsoft/codebert-base
```

Install the dependencies:

```bash
pip install torch==1.4.0
pip install transformers==2.5.0
```
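One way to set the `pretrained_model` variable in the scripts non-interactively is with `sed`. A minimal sketch against a stand-in file (the exact assignment line in the real scripts may differ, and `sed -i` below assumes GNU sed):

```shell
# Stand-in for a training script that contains a pretrained_model assignment.
cat > /tmp/train_example.sh <<'EOF'
pretrained_model=microsoft/codebert-base
EOF

# Point the variable at the locally cloned codebert-base folder.
sed -i 's|^pretrained_model=.*|pretrained_model=./codebert-base|' /tmp/train_example.sh
cat /tmp/train_example.sh  # prints: pretrained_model=./codebert-base
```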
Train and evaluate the CodeBERT model:

```bash
bash ./scripts/codebert/train.sh
bash ./scripts/codebert/test.sh
```
## Running the Simple LSTM experiment

Install OpenNMT-py, then build the vocabulary, train, and evaluate the model:

```bash
pip install OpenNMT-py==2.2.0
bash ./scripts/simple-lstm/build_vocab.sh
bash ./scripts/simple-lstm/train.sh
bash ./scripts/simple-lstm/test.sh
```
## Running the Simple LSTM experiment (legacy version)

This is the original version used to run the Simple LSTM experiments in the paper.

```bash
pip install OpenNMT-py==1.2.0
bash ./scripts/simple-lstm/legacy/preprocess.sh
bash ./scripts/simple-lstm/legacy/train.sh
bash ./scripts/simple-lstm/legacy/test.sh
```
## Tips

- Change the values of the `size` and `type` variables in the script files to run different experiments (large | small, unique | repetition).
- Check that your CUDA and PyTorch versions are compatible.
- Set `CUDA_VISIBLE_DEVICES`, `gpu_rank`, and `world_size` in all scripts according to the number of GPUs on your machine.
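For example, to expose only the first two GPUs to the training scripts (a sketch; the right values depend on how many GPUs your machine has):

```shell
# Make only GPUs 0 and 1 visible to CUDA programs; inside the process
# they are renumbered as devices 0 and 1.
export CUDA_VISIBLE_DEVICES=0,1
echo "$CUDA_VISIBLE_DEVICES"  # prints: 0,1
```

With two visible GPUs you would typically also set `world_size=2` and give each process a distinct `gpu_rank` in the scripts.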