JoeyNMT Versions

Minimalist NMT for educational purposes

v2.3

3 months ago

v2.2

1 year ago

v2.1

1 year ago

2.0

1 year ago

Breaking changes:

  • upgrade to python 3.9, torch 1.11
  • torchtext.legacy dependencies are completely replaced by torch.utils.data
  • joeynmt/tokenizers.py: handles tokenization internally (also supports BPE-dropout!)
  • joeynmt/datasets.py: loads data from plaintext, TSV, and Hugging Face's datasets
  • scripts/build_vocab.py: trains subwords, creates joint vocab
  • enhancements in decoding:
    • scoring with hypotheses or references
    • repetition penalty, ngram blocker
    • attention plots for transformers
  • yapf, isort, flake8 introduced
  • bugfixes, minor refactoring
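To make the new decoding options concrete, here is a minimal sketch of how a repetition penalty and an ngram blocker typically act on a step's logits. This is an illustrative, generic implementation for a single sequence, not JoeyNMT's actual code; the function names and signatures are assumptions.

```python
import torch

def apply_repetition_penalty(logits: torch.Tensor,
                             generated: torch.Tensor,
                             penalty: float = 1.2) -> torch.Tensor:
    """Down-weight tokens that already appear in the generated sequence.

    Positive logits are divided by the penalty, negative ones multiplied,
    so repeated tokens always become less likely (penalty > 1).
    """
    logits = logits.clone()
    for tok in set(generated.tolist()):
        if logits[tok] > 0:
            logits[tok] /= penalty
        else:
            logits[tok] *= penalty
    return logits

def block_repeated_ngrams(logits: torch.Tensor,
                          generated: torch.Tensor,
                          n: int = 3) -> torch.Tensor:
    """Set to -inf any token that would complete an n-gram
    already present in the generated sequence."""
    seq = generated.tolist()
    if len(seq) < n - 1:
        return logits
    prefix = tuple(seq[-(n - 1):])  # last n-1 tokens decide what is banned
    banned = set()
    for i in range(len(seq) - n + 1):
        if tuple(seq[i:i + n - 1]) == prefix:
            banned.add(seq[i + n - 1])
    logits = logits.clone()
    for tok in banned:
        logits[tok] = float("-inf")
    return logits
```

Both transforms would be applied to the logits at each decoding step, before sampling or taking the argmax.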

1.5

2 years ago

Bump the six dependency to >= 1.12.

1.4

2 years ago
  • upgrade to sacrebleu 2.0, python 3.7, torch 1.8
  • bug fixes:
    • heaps in checkpoint maintenance #153
    • beam search stopping criterion #149
    • removing final BPE merge markers in hypotheses (dsfsi/masakhane-web#33)
    • keeping best and last ckpts #136
    • using utf encoding when opening files #150
  • f-string formatting
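The checkpoint-maintenance fix above concerns keeping only the k best checkpoints on disk, which is naturally done with a min-heap. The following is a hedged sketch of the general technique, not JoeyNMT's implementation; the class name and return convention are assumptions.

```python
import heapq

class CheckpointQueue:
    """Keep the k best checkpoints by validation score (higher is better)."""

    def __init__(self, k: int):
        self.k = k
        self.heap = []  # min-heap of (score, path); root = worst kept ckpt

    def push(self, score: float, path: str):
        """Register a new checkpoint; return the path that should be
        deleted from disk, or None if nothing needs deleting."""
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, (score, path))
            return None
        worst_score, worst_path = self.heap[0]
        if score > worst_score:
            # new checkpoint displaces the current worst one
            heapq.heapreplace(self.heap, (score, path))
            return worst_path
        return path  # not among the k best; the new file can be removed
```

Keeping the worst retained checkpoint at the heap root makes each update O(log k), regardless of how many checkpoints have been written over training.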

1.3

3 years ago

You can now retrieve the n-best outputs during inference (rather than just the single best translation) and track the latest checkpoint (for continuing training). We also added a Colab notebook for training a small translation model on the Tatoeba task. JoeyNMT now runs on torch v1.8.0 and uses the deprecated dataset implementations from torchtext v0.9.
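Retrieving n-best outputs usually amounts to ranking the finished beam-search hypotheses by a length-normalized score and keeping the top n. A minimal sketch, assuming hypotheses arrive as (log_prob, tokens) pairs and using a simple length-penalty exponent (the helper name and data layout are illustrative, not JoeyNMT's API):

```python
def n_best(hypotheses, n=5, alpha=1.0):
    """Return the n best finished hypotheses by length-normalized log-prob.

    hypotheses: list of (log_prob, tokens) tuples
    alpha: length-penalty exponent; alpha=1.0 is plain per-token averaging
    """
    def norm_score(hyp):
        log_prob, tokens = hyp
        return log_prob / (len(tokens) ** alpha)

    return sorted(hypotheses, key=norm_score, reverse=True)[:n]
```

Without the normalization, beam search systematically favors shorter hypotheses, since every added token can only make the total log-probability more negative.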

1.0

3 years ago

Additions:

  • Multi-GPU processing
  • Data loading improvements
  • Tokenizer integration
  • Japanese benchmarks

v0.9

4 years ago

Stable recurrent and Transformer models. Minor changes and refactoring might happen before v1.0.