OpenSeq2Seq Release Notes

Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP

18.12

  • Jasper speech recognition model and documentation
  • Audio classification model for Speech Commands dataset and documentation
  • Improved documentation (LARC, distributed training, WaveNet)
  • Minor enhancements (speech recognition data layer, custom decoder, TRT)
  • Various bug fixes

18.11

  • Significantly improved speech recognition (WER 4.32%)
  • BPE support for speech recognition
  • WaveNet
  • Enhanced cuDNN RNN support for language modeling
  • Improvements in distributed text2text (thanks to @vsuthichai)
  • Ability to freeze some layers (thanks to @ka-bu)
  • Various bug fixes
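The BPE support above refers to byte-pair encoding, which builds a subword vocabulary by repeatedly merging the most frequent adjacent symbol pair. A minimal sketch of the idea (illustrative only, not OpenSeq2Seq's tokenizer):

```python
from collections import Counter

def learn_bpe_merges(words, num_merges):
    """Learn BPE merge rules from a list of words (characters to start)."""
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the winning merge to every word in the vocabulary.
        merged = best[0] + best[1]
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges

merges = learn_bpe_merges(["low", "lower", "lowest"], num_merges=2)
# First merges the frequent pair ('l', 'o'), then ('lo', 'w').
```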

v18.10

What's new

  • Improved and updated models with checkpoints (see the documentation)
  • New models:
    • Transformer Big (for NMT)
    • Sentiment Analysis (based on universal LM)
    • Joint CTC-Attention based ASR
  • Improved scalability
  • Speech Synthesis audio samples (in the documentation)
  • Support for CUDA 10 and Horovod 0.14

v18.09

What's new

  • Improved and updated models with checkpoints (see the documentation)
  • Dropped Python2 support
  • Switched to TensorFlow 1.10
  • Added TensorRT support for fast inference
  • Refactored and updated documentation: https://nvidia.github.io/OpenSeq2Seq
  • Switched versioning to month-based labels

v0.5

New modality: text2speech (spectrogram synthesis from text)

New Models

  • Tacotron 2-like model for text2speech (English)

Various improvements

  • in ConvS2S model for translation
  • in Wav2Letter-like model for speech2text
  • in DeepSpeech2-like model for speech2text
  • Bug fixes

Other

  • TensorFlow version bumped to 1.9

v0.4

New models:

  • ConvS2S model for translation.
  • Wav2Letter model for speech recognition.
  • CIFAR-10 dataset support.
  • CNNEncoder that can be used to construct almost arbitrary CNN models. Based on it, AlexNet and cifar10-nv were integrated.

New features:

  • Support for "iter_size" (accumulating gradients for "iter_size" steps without an update).
  • Added "objects" benchmarking to evaluation and inference modes.
  • cuDNN compatible cells support for GNMT.
  • 8-padding for transformer.
  • Improved config overwriting by train/eval/infer params (whole dicts are no longer replaced; they are updated incrementally).
  • Audio normalization before preprocessing for speech2text models.
  • More summaries/parameters for different models.
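The "iter_size" feature above is gradient accumulation: gradients are summed over iter_size micro-batches and the weights are updated once with their average. A generic sketch under that assumption (illustrative names, not OpenSeq2Seq's API):

```python
def sgd_with_iter_size(data, w0, lr, iter_size):
    """Plain SGD on a toy per-example loss (w - x)**2, where gradients
    are accumulated for iter_size steps before a single weight update."""
    w = w0
    acc, count = 0.0, 0
    for x in data:
        acc += 2.0 * (w - x)       # analytic d/dw of (w - x)**2
        count += 1
        if count == iter_size:     # one update per iter_size micro-batches
            w -= lr * acc / iter_size
            acc, count = 0.0, 0
    return w

# Accumulating over 4 micro-batches of size 1 matches one batch of size 4:
w_acc = sgd_with_iter_size([1.0, 2.0, 3.0, 4.0], w0=0.0, lr=0.1, iter_size=4)
# w_acc == 0.5, the same step a full batch of all four examples would take.
```

This is why iter_size lets a memory-limited GPU emulate a larger effective batch size.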

Bug fixes:

  • Regularization in mixed precision mode (loss scaling was not applied to the regularization term, effectively disabling the regularizer).
  • Overwriting bool values from command line.
  • Multi-GPU evaluation in towers mode.
  • Multi-GPU inference for speech2text.
  • "reflect" padding changed to use zeros for audio preprocessing.
  • Unicode support for Python 2.
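To illustrate the regularization fix above: in mixed precision training the loss is multiplied by a scale factor before backpropagation and the gradients are divided by it afterwards; if the regularization term is not scaled along with the loss, unscaling shrinks its gradient by the scale factor, effectively disabling it. A numeric sketch with analytic gradients (illustrative, not the actual code path):

```python
SCALE = 1024.0   # loss-scale factor
w = 2.0

# Analytic gradients of loss = w**2 and reg = 0.1 * w**2.
loss_grad = 2.0 * w   # 4.0
reg_grad = 0.2 * w    # 0.4

# Correct: scale (loss + reg), backprop, then unscale the gradients.
correct = (SCALE * (loss_grad + reg_grad)) / SCALE   # 4.4

# Buggy: only the loss is scaled; the regularizer's gradient is added
# unscaled, so dividing by SCALE shrinks it by a factor of 1024.
buggy = (SCALE * loss_grad + reg_grad) / SCALE       # ~4.0004
```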

Important config/API changes:

  • Unified static/dynamic loss scaling into a single parameter.
  • Made RNN cells accept arbitrary parameters.
  • Exposed the training step to the maybe_print_logs and evaluate functions.

Other changes:

  • Improved unit tests and documentation.

v0.3

New models:

  • Added ResNet model and ImageNet data layer.
  • Improved DeepSpeech-2 models and reached 4.59% WER.
  • Added Transformer model.

New features:

  • Implemented evaluation in Horovod mode.
  • Added mixed precision support for Horovod mode.
  • Fixed evaluation in multi-GPU mode.
  • All string/numerical config parameters can now be rewritten from the command line (nested dicts are separated with "/").
  • Moved start_experiment.sh functionality to run.py (--enable_logs parameter). Additionally, it now logs the exact command-line arguments used to invoke the script.
  • Added new benchmarking functionality: models can now also report the number of objects (e.g. tokens or images) processed per second.
  • Added more summaries/parameters for different models.
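The "/"-separated override above can be sketched as a small helper that walks the nested config dict (hypothetical helper, not OpenSeq2Seq's actual parser):

```python
def apply_override(config, key, value):
    """Set config['a']['b']... = value for a key written as 'a/b/...'."""
    parts = key.split('/')
    node = config
    for p in parts[:-1]:
        # Descend into (or create) each nested dict along the path.
        node = node.setdefault(p, {})
    node[parts[-1]] = value
    return config

cfg = {"optimizer": {"lr": 0.1}, "batch_size": 32}
apply_override(cfg, "optimizer/lr", 0.01)   # nested key, "/"-separated
apply_override(cfg, "batch_size", 64)       # top-level key
# cfg is now {"optimizer": {"lr": 0.01}, "batch_size": 64}
```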

API changes:

  • Replaced Seq2Seq class with EncoderDecoderModel to support arbitrary models that can be expressed in encoder-decoder-loss paradigm.
  • Changed data layer API to only work with tf.data (dropped placeholders support).
  • Hid Horovod/non-Horovod differences from users (no need to handle them when creating new models or data layers).

Other changes:

  • Improved unit tests and documentation.

v0.2

  • Massive API changes
  • Added mixed precision training support
  • Added support for speech-to-text models
  • Improved documentation