# Re-implement DeepCoder ([paper](https://openreview.net/pdf?id=ByldLrqlx))
This repository is a re-implementation of DeepCoder, which synthesizes domain-specific programs from input/output examples.
I rewrote the implementation from scratch; the previous implementation is available at the `v0.0.0` tag.
Requirements:

* `make`
* `g++`
**Warning**: the notebooks in the `examples` directory use Google Drive as data storage. Please be careful not to overwrite your data!
`inference.ipynb` synthesizes programs in the domain-specific language using the pre-trained model (`examples/medium/trained-model`).
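The core idea behind DeepCoder-guided synthesis is that the DNN predicts, from the input/output examples, a probability for each DSL function, and the enumerative search then tries high-probability functions first. A minimal sketch of that prioritization step (the function names and probabilities here are hypothetical, not this repository's actual API):

```python
def prioritize(functions, predicted_probs):
    """Order DSL functions by predicted probability, highest first,
    so the enumerative search explores likely functions early."""
    return sorted(functions, key=lambda f: predicted_probs[f], reverse=True)

# Hypothetical attribute predictions for three DSL functions.
probs = {"TAKE": 0.9, "DROP": 0.1, "SORT": 0.4}
print(prioritize(["TAKE", "DROP", "SORT"], probs))  # ['TAKE', 'SORT', 'DROP']
```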
```bash
# Download this repository and DeepCoder-Utils
$ git clone https://github.com/HiroakiMikami/deep-coder
$ cd deep-coder
$ git submodule init
$ git submodule update
# Build the search tool
$ make -C DeepCoder_Utils/enumerative-search -j $(nproc)
# Install python modules
$ pip install -r requirements.txt
# Setup Jupyter notebooks to use local runtimes of Colab
$ ./bin/init.bash
```
The notebooks in the `examples/medium` directory show how to train DeepCoder.
Training consists of the following steps:

1. Generate the dataset (`examples/medium/generate_dataset.ipynb`). The generated dataset is stored in Google Drive (`DeepCoder/dataset/length_3`).
2. Generate the baseline results (`examples/medium/generate_baseline_results.ipynb`).
3. Train the model (`examples/medium/train.ipynb`).
4. Compare the trained model with the baseline (`examples/medium/comparison_with_baseline.ipynb`).

The unit tests can be run with:

```bash
$ python -m unittest discover test
```
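The dataset-generation step can be sketched roughly as follows: enumerate short programs, run each on random inputs, and record the resulting input/output examples for training. This is an illustration using a hypothetical two-function toy "DSL", not the notebook's actual code:

```python
import random

def run_program(function, xs):
    """Hypothetical interpreter for a toy two-function 'DSL'."""
    if function == "SORT":
        return sorted(xs)
    if function == "REVERSE":
        return list(reversed(xs))
    raise ValueError(function)

def make_examples(function, n=3, seed=0):
    """Produce n random input/output examples for one program."""
    rng = random.Random(seed)
    examples = []
    for _ in range(n):
        xs = [rng.randrange(-255, 256) for _ in range(5)]
        examples.append((xs, run_program(function, xs)))
    return examples

# Each example pairs a random input list with the program's output.
examples = make_examples("SORT")
```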
`examples/small/integer_embeddings.ipynb` shows the learned embedding of integers. The embedding was trained using the dataset of length-1 programs and an `E=2` model.
It does not show the clear trend shown in Figure 8 of the paper. There are many possible causes (e.g., the dataset-generation procedure, the training hyperparameters), and I do not know the root cause of this difference.
| Timeout needed to solve | 20% | 40% | 60% |
| --- | --- | --- | --- |
| Baseline | 53ms | 122ms | 375ms |
| DeepCoder | 5ms | 24ms | 87ms |
| Speedup (this implementation) | 10.8x | 5.0x | 3.6x |
| Speedup (Table 1 in the paper) | 62.2x | 54.6x | 31.5x |
The trained model speeds up program synthesis. However, the speedup achieved by this implementation is smaller than that reported in the paper. I suspect the reason for this difference is the same as for the integer-embedding difference, but I have no evidence.
The details of the results are in `examples/medium/comparison_with_baseline.ipynb`.
The binary attributes predicted by the DNN are heavily imbalanced because each program in the dataset contains only 1-3 functions. For example, the attribute vector of `a <- int | b <- [int] | c <- TAKE a b` contains 33 `False` values and only 1 `True` value.
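The attribute encoding can be illustrated as follows. This is an assumption about the encoding (one binary attribute per DSL function, `True` iff the program uses it), shown with a hypothetical 4-function DSL standing in for the full 34-attribute one:

```python
# Hypothetical tiny DSL; the real attribute vector has 34 entries.
DSL_FUNCTIONS = ["TAKE", "DROP", "SORT", "REVERSE"]

def attributes(program_functions):
    """One binary attribute per DSL function: True iff the program uses it."""
    used = set(program_functions)
    return [f in used for f in DSL_FUNCTIONS]

# A single-function program yields one True among many False values.
print(attributes(["TAKE"]))  # [True, False, False, False]
```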
I suspected that this imbalance decreases the performance of the DNN model, so I introduced a cost-sensitive loss function (`weighted_sigmoid_cross_entropy` in `src/model.py`).
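A cost-sensitive sigmoid cross-entropy can be sketched as below. This is a minimal illustration assuming the common formulation where the loss on negative (`False`) labels is scaled by `w0` and on positive (`True`) labels by `w1`; it is not the repository's actual `weighted_sigmoid_cross_entropy`, and the default weights are hypothetical.

```python
import math

def weighted_sigmoid_cross_entropy(logits, labels, w0=0.25, w1=0.75):
    """Mean sigmoid cross-entropy with per-class weights.

    w0 scales the loss on negative (False) labels, w1 on positive (True)
    labels, so rare True attributes can be emphasized.
    """
    total = 0.0
    for z, y in zip(logits, labels):
        p = 1.0 / (1.0 + math.exp(-z))
        total += -(w1 * y * math.log(p) + w0 * (1 - y) * math.log(1.0 - p))
    return total / len(logits)

# With a logit of 0 (p = 0.5), a positive label costs w1 * log(2).
loss = weighted_sigmoid_cross_entropy([0.0], [1])
```

The `train_w0_{0.25|0.5|0.75}.ipynb` notebooks suggest that `w0` is the parameter swept in the experiments.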
However, I did not see a performance improvement in the medium-scale experiments. `examples/medium/loss_function_comparison.ipynb` shows the results of training with the cost-sensitive loss function, and `examples/medium/train_w0_{0.25|0.5|0.75}.ipynb` shows the training logs.