OpenSeq2Seq Release Notes

Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP

18.12

  • Jasper speech recognition model and documentation
  • Audio classification model for Speech Commands dataset and documentation
  • Improved documentation (LARC, distributed training, WaveNet)
  • Minor enhancements (speech recognition data layer, custom decoder, TRT)
  • Various bug fixes

18.11

  • Significantly improved speech recognition (WER 4.32%)
  • BPE support for speech recognition
  • WaveNet
  • Enhanced cuDNN RNN support for language modeling
  • Improvements in distributed text2text (thanks to @vsuthichai)
  • Ability to freeze some layers (thanks to @ka-bu)
  • Various bug fixes
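The BPE support above refers to byte-pair encoding, which builds a subword vocabulary by repeatedly merging the most frequent adjacent symbol pair. A minimal sketch of the idea (illustrative only, not OpenSeq2Seq's tokenizer):

```python
from collections import Counter

def learn_bpe_merges(words, num_merges):
    """Learn BPE merge rules from a list of words (characters to start)."""
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the winning merge to every word in the vocabulary.
        merged = best[0] + best[1]
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges

merges = learn_bpe_merges(["low", "lower", "lowest"], num_merges=2)
# First merges the frequent pair ('l', 'o'), then ('lo', 'w').
```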

v18.10

What's new

  • Improved and updated models with checkpoints (see the documentation)
  • New models:
    • Transformer Big (for NMT)
    • Sentiment Analysis (based on universal LM)
    • Joint CTC-Attention based ASR
  • Improved scalability
  • Speech Synthesis audio samples (in the documentation)
  • Support for CUDA 10 and Horovod 0.14

v18.09

What's new

  • Improved and updated models with checkpoints (see the documentation)
  • Dropped Python2 support
  • Switched to TensorFlow 1.10
  • Added TensorRT support for fast inference
  • Refactored and updated documentation: https://nvidia.github.io/OpenSeq2Seq
  • Switched versioning to month-based labels

v0.5

New modality: text2speech (spectrogram synthesis from text)

New Models

  • Tacotron 2-like model for text2speech (English)

Various improvements

  • in ConvS2S model for translation
  • in Wav2Letter-like model for speech2text
  • in DeepSpeech2-like model for speech2text
  • Bug fixes

Other

  • TensorFlow version bumped to 1.9

v0.4

New models:

  • ConvS2S model for translation.
  • Wav2Letter model for speech recognition.
  • CIFAR-10 dataset support.
  • CNNEncoder that can be used to construct almost arbitrary CNN models. Based on it, AlexNet and cifar10-nv were integrated.

New features:

  • Support for "iter_size" (accumulating gradients for "iter_size" steps without an update).
  • Added "objects" benchmarking to evaluation and inference modes.
  • cuDNN compatible cells support for GNMT.
  • 8-padding for transformer.
  • Improved config overwriting by train/eval/infer params (whole dicts are no longer replaced; they are updated incrementally).
  • Audio normalization before preprocessing for speech2text models.
  • More summaries/parameters for different models.
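The "iter_size" feature above is gradient accumulation: gradients are summed over iter_size micro-batches and the weights are updated once with their average. A generic sketch under that assumption (illustrative names, not OpenSeq2Seq's API):

```python
def sgd_with_iter_size(data, w0, lr, iter_size):
    """Plain SGD on a toy per-example loss (w - x)**2, where gradients
    are accumulated for iter_size steps before a single weight update."""
    w = w0
    acc, count = 0.0, 0
    for x in data:
        acc += 2.0 * (w - x)       # analytic d/dw of (w - x)**2
        count += 1
        if count == iter_size:     # one update per iter_size micro-batches
            w -= lr * acc / iter_size
            acc, count = 0.0, 0
    return w

# Accumulating over 4 micro-batches of size 1 matches one batch of size 4:
w_acc = sgd_with_iter_size([1.0, 2.0, 3.0, 4.0], w0=0.0, lr=0.1, iter_size=4)
# w_acc == 0.5, the same step a full batch of all four examples would take.
```

This is why iter_size lets a memory-limited GPU emulate a larger effective batch size.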

Bug fixes:

  • Regularization in mixed precision mode (loss scaling was not applied to the regularization term, effectively disabling the regularizer).
  • Overwriting bool values from command line.
  • Multi-GPU evaluation in towers mode.
  • Multi-GPU inference for speech2text.
  • "reflect" padding changed to use zeros for audio preprocessing.
  • Unicode support for Python 2.
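To illustrate the regularization fix above: in mixed precision training the loss is multiplied by a scale factor before backpropagation and the gradients are divided by it afterwards; if the regularization term is not scaled along with the loss, unscaling shrinks its gradient by the scale factor, effectively disabling it. A numeric sketch with analytic gradients (illustrative, not the actual code path):

```python
SCALE = 1024.0   # loss-scale factor
w = 2.0

# Analytic gradients of loss = w**2 and reg = 0.1 * w**2.
loss_grad = 2.0 * w   # 4.0
reg_grad = 0.2 * w    # 0.4

# Correct: scale (loss + reg), backprop, then unscale the gradients.
correct = (SCALE * (loss_grad + reg_grad)) / SCALE   # 4.4

# Buggy: only the loss is scaled; the regularizer's gradient is added
# unscaled, so dividing by SCALE shrinks it by a factor of 1024.
buggy = (SCALE * loss_grad + reg_grad) / SCALE       # ~4.0004
```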

Important config/API changes:

  • Unified static/dynamic loss scaling into a single parameter.
  • Made RNN cells accept arbitrary parameters.
  • Exposed the training step to the maybe_print_logs and evaluate functions.

Other changes:

  • Improved unit tests and documentation.

v0.3

New models:

  • Added ResNet model and ImageNet data layer.
  • Improved DeepSpeech-2 models and reached 4.59% WER.
  • Added Transformer model.

New features:

  • Implemented evaluation in Horovod mode.
  • Added mixed precision support for Horovod mode.
  • Fixed evaluation in multi-GPU mode.
  • All string/numerical config parameters can now be rewritten from the command line (nested dicts are separated with "/").
  • Moved start_experiment.sh functionality to run.py (--enable_logs parameter). Additionally, it now logs the exact command-line arguments used to invoke the script.
  • Added new benchmarking functionality: models can now also report the number of objects (e.g. tokens or images) processed per second.
  • Added more summaries/parameters for different models.
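The "/"-separated override above can be sketched as a small helper that walks the nested config dict (hypothetical helper, not OpenSeq2Seq's actual parser):

```python
def apply_override(config, key, value):
    """Set config['a']['b']... = value for a key written as 'a/b/...'."""
    parts = key.split('/')
    node = config
    for p in parts[:-1]:
        # Descend into (or create) each nested dict along the path.
        node = node.setdefault(p, {})
    node[parts[-1]] = value
    return config

cfg = {"optimizer": {"lr": 0.1}, "batch_size": 32}
apply_override(cfg, "optimizer/lr", 0.01)   # nested key, "/"-separated
apply_override(cfg, "batch_size", 64)       # top-level key
# cfg is now {"optimizer": {"lr": 0.01}, "batch_size": 64}
```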

API changes:

  • Replaced Seq2Seq class with EncoderDecoderModel to support arbitrary models that can be expressed in encoder-decoder-loss paradigm.
  • Changed data layer API to only work with tf.data (dropped placeholders support).
  • Hid Horovod/non-Horovod differences from users (no need to handle them when creating new models or data layers).

Other changes:

  • Improved unit tests and documentation.

v0.2

  • Massive API changes
  • Added mixed precision training support
  • Added support for speech-to-text models
  • Improved documentation