OpenNMT-tf Versions

Neural machine translation and sequence learning using TensorFlow

v2.32.0

9 months ago

New features

Support TensorFlow 2.12 and 2.13
Make the timeout value configurable when searching for an optimal batch size

v2.31.0

1 year ago

New features

Add option --jit_compile to compile the model with XLA (only applied in training at the moment)

Fixes and improvements

Improve correctness of gradient accumulation and multi-GPU training by normalizing the gradients with the true global batch size instead of using an approximation
Report the total number of tokens per second in the training logs, in addition to the separate source and target token rates
Relax the sacreBLEU version requirement to include any 2.x versions
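The gradient normalization fix can be illustrated with a small sketch. It assumes each micro-batch (from gradient accumulation or from one GPU) reports its gradient as a sum over its tokens; the toy scalar "gradients" and helper names are hypothetical, and this is not OpenNMT-tf's code. Averaging the per-batch means treats every micro-batch equally, whereas dividing the summed gradients by the true global token count weights every token equally.

```python
from typing import List, Tuple

# Each micro-batch is represented by (summed_gradient, num_tokens).
MicroBatch = Tuple[float, int]


def approximate_average(batches: List[MicroBatch]) -> float:
    """Average the per-batch mean gradients, giving every batch equal weight."""
    per_batch_means = [grad_sum / tokens for grad_sum, tokens in batches]
    return sum(per_batch_means) / len(per_batch_means)


def global_average(batches: List[MicroBatch]) -> float:
    """Normalize the summed gradients by the true global number of tokens."""
    total_grad = sum(grad_sum for grad_sum, _ in batches)
    total_tokens = sum(tokens for _, tokens in batches)
    return total_grad / total_tokens


if __name__ == "__main__":
    # Two micro-batches of very different sizes.
    batches = [(100.0, 100), (30.0, 10)]
    print(approximate_average(batches))  # (1.0 + 3.0) / 2 = 2.0
    print(global_average(batches))       # 130.0 / 110, about 1.18
```

The two results only coincide when all micro-batches contain the same number of tokens, which is rarely the case with length-bucketed batches.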

v2.30.0

1 year ago

Changes

The model attribute ctranslate2_spec has been removed as it is no longer relevant with the new CTranslate2 converter
The global gradient norm is no longer reported in TensorBoard because it was misleading: it did not take into account gradient accumulation and multi-GPU training

New features

Support TensorFlow 2.11 (note that the new Keras optimizers are not yet supported; if you create optimizers manually, use an optimizer from tf.keras.optimizers.legacy for now)
Support CTranslate2 3.0
Add training parameter pad_to_bucket_boundary to pad the batch length to a multiple of length_bucket_width (this is useful to reduce the number of recompilations with XLA)
Integrate the scorers chrf and chrf++ from SacreBLEU
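The effect of pad_to_bucket_boundary can be sketched as rounding the batch length up to the next multiple of length_bucket_width. The helpers below are hypothetical and only show the arithmetic; they assume a padding id of 0.

```python
from typing import List


def padded_length(max_length: int, length_bucket_width: int) -> int:
    """Round max_length up to the next multiple of length_bucket_width."""
    if length_bucket_width <= 0:
        raise ValueError("length_bucket_width must be positive")
    # Ceiling division written with floor division on negated values.
    return -(-max_length // length_bucket_width) * length_bucket_width


def pad_batch(batch: List[List[int]], length_bucket_width: int,
              pad_id: int = 0) -> List[List[int]]:
    """Pad every sequence to the bucket boundary of the longest sequence."""
    target = padded_length(max(len(seq) for seq in batch), length_bucket_width)
    return [seq + [pad_id] * (target - len(seq)) for seq in batch]
```

For example, with length_bucket_width = 8 a batch whose longest sequence has 13 tokens is padded to 16. Because batch lengths can then only take a few distinct values, a compiled XLA graph sees fewer distinct input shapes.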

Fixes and improvements

Fix error when training with Horovod and using an early stopping condition
Fix error when using guided alignment with mixed precision

v2.29.1

1 year ago

Fixes and improvements

Fix error when using gzipped training data files
Remove unnecessary casting in MultiHeadAttention for a small performance improvement

v2.29.0

1 year ago

New features

Support TensorFlow 2.10
Add model configurations ScalingNmtEnDe and ScalingNmtEnFr from Ott et al. 2018
Add embedding parameter EmbeddingsSharingLevel.AUTO to automatically share embeddings when the vocabulary is shared
Extend method Runner.average_checkpoints to accept a list of checkpoints to average
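Checkpoint averaging takes the element-wise mean of each variable across several checkpoints. The sketch below uses plain dictionaries of lists as a stand-in for checkpoint variables; it is not the Runner.average_checkpoints implementation, and the variable name in the usage note is hypothetical.

```python
from typing import Dict, List

# A checkpoint is modeled as a mapping from variable name to a flat list of values.
Checkpoint = Dict[str, List[float]]


def average_checkpoints(checkpoints: List[Checkpoint]) -> Checkpoint:
    """Return the element-wise mean of every variable across the given checkpoints."""
    if not checkpoints:
        raise ValueError("at least one checkpoint is required")
    averaged: Checkpoint = {}
    for name in checkpoints[0]:
        values = [ckpt[name] for ckpt in checkpoints]
        # zip(*values) groups the i-th element of the variable from every checkpoint.
        averaged[name] = [sum(column) / len(values) for column in zip(*values)]
    return averaged
```

Averaging [{"dense/kernel": [1.0, 2.0]}, {"dense/kernel": [3.0, 4.0]}] gives {"dense/kernel": [2.0, 3.0]}. Passing an explicit list, as the extended method allows, lets the caller choose which checkpoints contribute.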

Fixes and improvements

Make batch size autotuning faster when using gradient accumulation

v2.28.0

1 year ago

New features

Add initial_learning_rate parameter to the InvSqrtDecay schedule
Add new arguments to the Transformer constructor:
- mha_bias: to disable bias terms in the multi-head attention (as presented in the original paper)
- output_layer_bias: to disable bias in the output linear layer
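A minimal sketch of an inverse square root schedule with a linear warmup that starts at initial_learning_rate. It assumes the formulation used by fairseq's inverse_sqrt scheduler; the exact InvSqrtDecay formula in OpenNMT-tf (for example, whether the step is offset by one) may differ.

```python
import math


def inv_sqrt_decay(step: int, learning_rate: float, warmup_steps: int,
                   initial_learning_rate: float = 0.0) -> float:
    """Warm up linearly to learning_rate, then decay proportionally to 1/sqrt(step)."""
    if warmup_steps <= 0:
        raise ValueError("warmup_steps must be positive")
    if step < warmup_steps:
        # Linear interpolation from initial_learning_rate to learning_rate.
        progress = step / warmup_steps
        return initial_learning_rate + (learning_rate - initial_learning_rate) * progress
    # Continuous at step == warmup_steps, then halves every time the step quadruples.
    return learning_rate * math.sqrt(warmup_steps) / math.sqrt(step)
```

With learning_rate = 0.002 and warmup_steps = 4000, the rate peaks at 0.002 at step 4000 and has halved to about 0.001 by step 16000.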

Fixes and improvements

Fix incorrect dtype for SequenceRecordInputter length vector
Fix rounding error when batching datasets which could make the number of tokens in a batch greater than the configured batch size
Fix deprecation warning caused by distutils.version.LooseVersion by using packaging.version.Version instead
Make the length dimension unknown in the dataset used for batch size autotuning so that it matches the behavior in training
Update SacreBLEU requirement to include new version 2.2
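The batching fix concerns token-based batch sizes. As a hedged illustration of this kind of rounding error: if the number of examples per batch is derived by rounding batch_size / length to the nearest integer, the batch can exceed the token budget, while flooring keeps it within the budget. The helper names are hypothetical and this is not OpenNMT-tf's batching code.

```python
def examples_per_batch_rounded(batch_size: int, length: int) -> int:
    """Faulty variant: round to the nearest integer, which may overshoot the budget."""
    return max(1, round(batch_size / length))


def examples_per_batch_floored(batch_size: int, length: int) -> int:
    """Safe variant: floor division never exceeds batch_size tokens when length <= batch_size."""
    return max(1, batch_size // length)


if __name__ == "__main__":
    batch_size, length = 1000, 350
    for fn in (examples_per_batch_rounded, examples_per_batch_floored):
        n = fn(batch_size, length)
        # Prints the function name, the number of examples, and the resulting token count.
        print(fn.__name__, n, n * length)
```

Here rounding gives 3 examples (1050 tokens, above the 1000-token budget) while flooring gives 2 examples (700 tokens).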

v2.27.1

1 year ago

Fixes and improvements

Fix evaluation and scoring with language models

v2.27.0

1 year ago

Changes

Remove support for older TensorFlow versions 2.4 and 2.5
Remove support for deprecated Python version 3.6

New features

Support TensorFlow 2.9
Integrate the new CTranslate2 converter to export more Transformer variants, including multi-features models

Fixes and improvements

Fix error when loading the SavedModel of Transformer models with relative position representations
Fix dataset error in inference with language models
Fix batch size autotuning error with language models
Fix division by zero error on some systems when the time elapsed since the last training log is too small

v2.26.1

2 years ago

Fixes and improvements

Fix documentation build error

v2.26.0

2 years ago

New features

Add learning rate schedule InvSqrtDecay
Enable CTranslate2 conversion for models using GELU or Swish activations
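For reference, the two activations named above can be written in a few lines of Python. These are the standard exact GELU (based on the Gaussian error function) and Swish with beta = 1 (also called SiLU); whether a given model uses the tanh approximation of GELU instead is an assumption to check against the model definition.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def swish(x: float) -> float:
    """Swish with beta = 1 (SiLU): x * sigmoid(x), computed without overflow."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    # For negative x, rewrite sigmoid(x) as exp(x) / (1 + exp(x)) to avoid exp overflow.
    e = math.exp(x)
    return x * e / (1.0 + e)
```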

Fixes and improvements

Fix inference error when using the decoding_noise parameter
Clarify the inference log about buffered predictions