JoeyNMT Versions

Minimalist NMT for educational purposes

v2.3

3 months ago

v2.2

1 year ago

v2.1

1 year ago

2.0

1 year ago

Breaking changes:

  • upgrade to python 3.9, torch 1.11
  • torchtext.legacy dependencies are completely replaced by torch.utils.data
  • joeynmt/tokenizers.py: handles tokenization internally (also supports BPE-dropout!)
  • joeynmt/datasets.py: loads data from plaintext, TSV, and Hugging Face's datasets
  • scripts/build_vocab.py: trains subwords, creates joint vocab
  • enhancements in decoding:
    • scoring with hypotheses or references
    • repetition penalty, ngram blocker
    • attention plots for transformers
  • yapf, isort, flake8 introduced
  • bugfixes, minor refactoring
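To make the new decoding options concrete, here is a minimal sketch of how a repetition penalty and an ngram blocker typically act on a step's logits. This is an illustrative, generic implementation for a single sequence, not JoeyNMT's actual code; the function names and signatures are assumptions.

```python
import torch

def apply_repetition_penalty(logits: torch.Tensor,
                             generated: torch.Tensor,
                             penalty: float = 1.2) -> torch.Tensor:
    """Down-weight tokens that already appear in the generated sequence.

    Positive logits are divided by the penalty, negative ones multiplied,
    so repeated tokens always become less likely (penalty > 1).
    """
    logits = logits.clone()
    for tok in set(generated.tolist()):
        if logits[tok] > 0:
            logits[tok] /= penalty
        else:
            logits[tok] *= penalty
    return logits

def block_repeated_ngrams(logits: torch.Tensor,
                          generated: torch.Tensor,
                          n: int = 3) -> torch.Tensor:
    """Set to -inf any token that would complete an n-gram
    already present in the generated sequence."""
    seq = generated.tolist()
    if len(seq) < n - 1:
        return logits
    prefix = tuple(seq[-(n - 1):])  # last n-1 tokens decide what is banned
    banned = set()
    for i in range(len(seq) - n + 1):
        if tuple(seq[i:i + n - 1]) == prefix:
            banned.add(seq[i + n - 1])
    logits = logits.clone()
    for tok in banned:
        logits[tok] = float("-inf")
    return logits
```

Both transforms would be applied to the logits at each decoding step, before sampling or taking the argmax.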

1.5

2 years ago

Bump the six dependency to >= 1.12.

1.4

2 years ago
  • upgrade to sacrebleu 2.0, python 3.7, torch 1.8
  • bug fixes:
    • heaps in checkpoint maintenance #153
    • beam search stopping criterion #149
    • removing final BPE merge markers in hypotheses (dsfsi/masakhane-web#33)
    • keeping best and last ckpts #136
    • using utf encoding when opening files #150
  • f-string formatting
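The checkpoint-maintenance fix above concerns keeping only the k best checkpoints on disk, which is naturally done with a min-heap. The following is a hedged sketch of the general technique, not JoeyNMT's implementation; the class name and return convention are assumptions.

```python
import heapq

class CheckpointQueue:
    """Keep the k best checkpoints by validation score (higher is better)."""

    def __init__(self, k: int):
        self.k = k
        self.heap = []  # min-heap of (score, path); root = worst kept ckpt

    def push(self, score: float, path: str):
        """Register a new checkpoint; return the path that should be
        deleted from disk, or None if nothing needs deleting."""
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, (score, path))
            return None
        worst_score, worst_path = self.heap[0]
        if score > worst_score:
            # new checkpoint displaces the current worst one
            heapq.heapreplace(self.heap, (score, path))
            return worst_path
        return path  # not among the k best; the new file can be removed
```

Keeping the worst retained checkpoint at the heap root makes each update O(log k), regardless of how many checkpoints have been written over training.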

1.3

3 years ago

You can now retrieve the n-best outputs during inference (rather than just the single best translation) and track the latest checkpoint (for continuing training). We also added a Colab notebook for training a small translation model on the Tatoeba task. JoeyNMT now runs on torch v1.8.0 and uses the deprecated dataset implementations from torchtext v0.9.
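Retrieving n-best outputs usually amounts to ranking the finished beam-search hypotheses by a length-normalized score and keeping the top n. A minimal sketch, assuming hypotheses arrive as (log_prob, tokens) pairs and using a simple length-penalty exponent (the helper name and data layout are illustrative, not JoeyNMT's API):

```python
def n_best(hypotheses, n=5, alpha=1.0):
    """Return the n best finished hypotheses by length-normalized log-prob.

    hypotheses: list of (log_prob, tokens) tuples
    alpha: length-penalty exponent; alpha=1.0 is plain per-token averaging
    """
    def norm_score(hyp):
        log_prob, tokens = hyp
        return log_prob / (len(tokens) ** alpha)

    return sorted(hypotheses, key=norm_score, reverse=True)[:n]
```

Without the normalization, beam search systematically favors shorter hypotheses, since every added token can only make the total log-probability more negative.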

1.0

3 years ago

Additions:

  • Multi-GPU processing
  • Data loading improvements
  • Tokenizer integration
  • Japanese benchmarks

v0.9

4 years ago

Stable recurrent and Transformer models. Minor changes and refactoring might happen before v1.0.