# Experiments on Multilingual NMT

This codebase was used for the multilingual translation experiments in the paper "Parameter Sharing Methods for Multilingual Self-Attentional Translation Models" (WMT 2018).

The multilingual model is based on the Transformer architecture.
## Installation

Install the required packages from the requirements file:

```bash
pip install -r requirements.txt
```
## Dataset

Download the TED talks data:

```bash
bash download_teddata.sh
```

This command downloads, decompresses, and saves the train, dev, and test splits of the TED talks under the `data` directory.
Use `ted_reader.py` to specify language pairs for both bilingual and multilingual translation tasks:

```bash
python ted_reader.py -s ja en zh fr ro -t en zh fr ro ja
```

The same command can also be run with the `-ncp` flag:

```bash
python ted_reader.py -s ja en zh fr ro -t en zh fr ro ja -ncp
```
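The `-s`/`-t` interface above takes parallel lists of source and target language codes. The sketch below mimics that command line with `argparse`; the parser, the positional pairing of `-s` with `-t`, and the `store_true` handling of `-ncp` are illustrative assumptions, not the repo's actual implementation:

```python
import argparse

# Hypothetical sketch of the ted_reader.py command-line interface;
# the real script's options and semantics may differ.
parser = argparse.ArgumentParser(description="Select TED talk language pairs")
parser.add_argument("-s", "--source", nargs="+", required=True,
                    help="source language codes, e.g. ja en zh fr ro")
parser.add_argument("-t", "--target", nargs="+", required=True,
                    help="target language codes, aligned with --source")
parser.add_argument("-ncp", action="store_true",
                    help="optional flag shown in the README (behavior defined by the repo)")

args = parser.parse_args("-s ja en zh fr ro -t en zh fr ro ja".split())

# One plausible reading pairs the lists positionally:
# ja->en, en->zh, zh->fr, fr->ro, ro->ja.
pairs = list(zip(args.source, args.target))
print(pairs)
```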
## Training

For convenience, there are example shell scripts under the `tools` directory:

```bash
# bilingual model
bash tools/bpe_pipeline_bilingual.sh src_lang tgt_lang

# fully shared multilingual model
bash tools/bpe_pipeline_fully_shared_multilingual.sh src_lang tgt_lang1 tgt_lang2

# multilingual model with configurable parameter sharing
bash tools/bpe_pipeline_MT.sh src_lang tgt_lang1 tgt_lang2 share_sublayer share_attn
```
An example of sharing the key (k) and query (q) projections in both attention layers (self and source):

```bash
bash tools/bpe_pipeline_MT.sh src_lang tgt_lang1 tgt_lang2 k,q self,source
```
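The `share_sublayer` and `share_attn` arguments select which attention projections are tied across target languages. As a rough illustration of the idea, a shared projection can be a single parameter object that both language-specific decoders reference, while unshared projections get their own objects. All names and structure here are hypothetical, not the repo's code:

```python
# Hypothetical sketch of partial parameter sharing between two decoders.
# A "parameter" is a plain object standing in for a weight matrix.

class Param:
    """Stand-in for a weight matrix."""
    def __init__(self, name):
        self.name = name

def build_attention(lang, share_sublayer, share_attn, attn_type, shared):
    """Create k/q/v/o projections for one attention sublayer.

    Projections listed in share_sublayer (e.g. {"k", "q"}) are shared
    across languages when attn_type is in share_attn (e.g. {"self",
    "source"}); all other projections stay language-specific.
    """
    layer = {}
    for proj in ("k", "q", "v", "o"):
        if proj in share_sublayer and attn_type in share_attn:
            key = (attn_type, proj)
            if key not in shared:
                shared[key] = Param(f"shared_{attn_type}_{proj}")
            layer[proj] = shared[key]   # same object for every language
        else:
            layer[proj] = Param(f"{lang}_{attn_type}_{proj}")
    return layer

shared = {}
share_sublayer = {"k", "q"}      # mirrors the "k,q" argument
share_attn = {"self", "source"}  # mirrors the "self,source" argument

de = {t: build_attention("de", share_sublayer, share_attn, t, shared)
      for t in ("self", "source")}
tr = {t: build_attention("tr", share_sublayer, share_attn, t, shared)
      for t in ("self", "source")}

# k and q are the same objects across the two decoders; v and o are not.
assert de["self"]["k"] is tr["self"]["k"]
assert de["source"]["q"] is tr["source"]["q"]
assert de["self"]["v"] is not tr["self"]["v"]
```

In a real model the shared objects would be weight tensors registered once and reused by both decoders, so gradients from both target languages update the same parameters.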
### Dataset statistics

Number of sentence pairs in each split:

Dataset | Train | Dev | Test |
---|---|---|---|
English-Vietnamese (IWSLT 2015) | 133,317 | 1,553 | 1,268 |
English-German (TED talks) | 167,888 | 4,148 | 4,491 |
English-Romanian (TED talks) | 180,484 | 3,904 | 4,631 |
English-Dutch (TED talks) | 183,767 | 4,459 | 5,006 |
## Results

### Bilingual translation (BLEU)

Language pair | this repo | tensor2tensor | GNMT |
---|---|---|---|
En -> Vi (IWSLT 2015) | 28.84 | 28.12 | 26.50 |
En -> De | 29.31 | 28.68 | 27.01 |
En -> Ro | 26.81 | 26.38 | 23.92 |
En -> Nl | 32.42 | 31.74 | 30.64 |
De -> En | 37.33 | 36.96 | 35.46 |
Ro -> En | 37.00 | 35.45 | 34.77 |
Nl -> En | 38.59 | 37.71 | 35.81 |
### Multilingual translation (BLEU)

Each cell reports the scores for the two target languages of one one-to-many model (NS = no sharing, FS = full sharing, PS = partial sharing):

Method | En->De+Tr (->De / ->Tr) | En->De+Ja (->De / ->Ja) | En->Ro+Fr (->Ro / ->Fr) | En->De+Nl (->De / ->Nl) |
---|---|---|---|---|
GNMT NS | 27.01 / 16.07 | 27.01 / 16.62 | 24.38 / 40.50 | 27.01 / 30.64 |
GNMT FS | 29.07 / 18.09 | 28.24 / 17.33 | 26.41 / 42.46 | 28.52 / 31.72 |
Transformer NS | 29.31 / 18.62 | 29.31 / 17.92 | 26.81 / 42.95 | 29.31 / 32.43 |
Transformer FS | 28.74 / 18.69 | 29.68 / 18.50 | 28.52 / 44.28 | 30.45 / 33.69 |
Transformer PS | 30.71 / 19.67 | 30.48 / 19.00 | 27.58 / 43.84 | 30.70 / 34.05 |
## Citation

If you find this code useful, please consider citing our paper:

```bibtex
@InProceedings{devendra2018multilingual,
  author    = "Sachan, Devendra and Neubig, Graham",
  title     = "Parameter Sharing Methods for Multilingual Self-Attentional Translation Models",
  booktitle = "Proceedings of the Third Conference on Machine Translation",
  year      = "2018",
  publisher = "Association for Computational Linguistics",
  location  = "Brussels, Belgium"
}
```