NeMo Versions

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech).

v1.15.0

1 year ago

Highlights

NeMo ASR

  • HybridTransducer-CTC ASR
  • Greedy timestamp decoding with inference script
  • MHA adapters
  • Conformer local attention (longformer)
  • High level beam search API
  • Multiblank transducer
  • Multi-channel audio processing model
  • AIStore for ASR datasets

NeMo Megatron

  • ALiBi position embeddings support for T5
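
For readers unfamiliar with ALiBi, the idea can be sketched in a few lines of plain Python. This is a minimal illustration of the general technique, not NeMo's implementation: the function names alibi_slopes and alibi_bias are hypothetical, the head count is assumed to be a power of two, and the symmetric |i - j| distance is an assumption made here for a bidirectional encoder such as T5's (the original ALiBi formulation is causal).

```python
def alibi_slopes(num_heads):
    """Per-head slopes forming a geometric sequence that starts at 2^(-8/num_heads).

    Assumption: num_heads is a power of two (the simple case of the ALiBi recipe).
    """
    if num_heads < 1 or num_heads & (num_heads - 1):
        raise ValueError("this sketch only handles power-of-two head counts")
    start = 2.0 ** (-8.0 / num_heads)
    return [start ** (h + 1) for h in range(num_heads)]


def alibi_bias(num_heads, seq_len):
    """Additive attention bias: bias[h][i][j] = -slope_h * |i - j|.

    The bias is added to the attention scores before the softmax, so distant
    positions are penalized linearly and no learned position table is needed.
    """
    slopes = alibi_slopes(num_heads)
    return [
        [[-m * abs(i - j) for j in range(seq_len)] for i in range(seq_len)]
        for m in slopes
    ]


if __name__ == "__main__":
    print(alibi_slopes(8))
    print(alibi_bias(2, 3)[0])
```

Each head gets a different slope, so some heads focus on nearby tokens while others keep a wider view.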

NeMo TTS

  • Chinese TTS pipeline with polyphone disambiguation

NeMo Core

  • Optimizer based EMA
  • MLFlow logger support

Models

  • stt_eo_conformer_ctc_large (HF, NGC) Esperanto ASR model.
  • stt_eo_conformer_transducer_large (HF, NGC) Esperanto ASR model.

Detailed Changelogs

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:22.12

ASR

Changelog
  • optimized loop and bugfix by @Jorjeous :: PR: #5573
  • Update torchmetrics by @nithinraok :: PR: #5566
  • Add an option to defer data setup from init to setup by @anteju :: PR: #5569
  • AIStore for ASR datasets by @anteju :: PR: #5462
  • Add support for MHA adapters to ASR by @titu1994 :: PR: #5396
  • Update documentation and tutorials for Adapters by @titu1994 :: PR: #5610
  • Conformer local attention by @sam1373 :: PR: #5525
  • Add core classes and functions for online clustering diarizer part 1 by @tango4j :: PR: #5526
  • [Add] ASR+VAD Inference Pipeline by @stevehuang52 :: PR: #5575
  • [ASR] Audio processing base, multi-channel enhancement models by @anteju :: PR: #5356
  • Expose ClusteringDiarizer device by @SeanNaren :: PR: #5681
  • Add Beam Search support to ASR transcribe() by @titu1994 :: PR: #5443
  • Multiblank Transducer by @hainan-xv :: PR: #5527
  • pin torchmetrics version by @nithinraok :: PR: #5720
  • Update torchaudio dependency version for tutorials by @titu1994 :: PR: #5781
  • update torchmetrics to latest version by @nithinraok :: PR: #5801
  • Fix transducer and question answering tutorial bugs by @Zhilin123 :: PR: #5809
  • [BugFix] Updated CTC decoders installation in tutorial by @vsl9 :: PR: #5833
  • update torchmetrics args confusionmatrix by @nithinraok :: PR: #5853
  • indentation fix by @nithinraok :: PR: #5861
  • Fix wrong label mapping in batch_inference for label_model by @fayejf :: PR: #5767

TTS

Changelog
  • Add support for MHA adapters to ASR by @titu1994 :: PR: #5396
  • [TTS] fix ranges of char set for accented letters. by @XuesongYang :: PR: #5607
  • [TTS] add type hints and change variable names for tokenizers and g2p by @XuesongYang :: PR: #5602
  • Fixed RadTTS unit test by @borisfom :: PR: #5572
  • [TTS][ZH] Disambiguate polyphones with augmented dict and Jieba segmenter for Chinese FastPitch by @yuekaizhang :: PR: #5541
  • Add duration padding support for RADTTS inference by @kevjshih :: PR: #5650
  • [TTS] add tts dict cust notebook by @ekmb :: PR: #5662
  • [TN/TTS docs] TN customization, g2p docs moved to tts by @ekmb :: PR: #5683
  • typo and link fixed by @ekmb :: PR: #5741
  • link fixed by @ekmb :: PR: #5745
  • Update Tacotron2 NGC checkpoint load to latest version by @redoctopus :: PR: #5760
  • Docs g2p update by @ekmb :: PR: #5769
  • [TTS][ZH] bugfix import jieba errors. by @XuesongYang :: PR: #5776

NLP / NMT

Changelog
  • Text generation improvement (UI client, data parallel support) by @yidong72 :: PR: #5437
  • O2 style amp for gpt3 ptuning by @JimmyZhang12 :: PR: #5246
  • Add support for MHA adapters to ASR by @titu1994 :: PR: #5396
  • Bert interleaved by @shanmugamr1992 :: PR: #5556
  • Port stateless timer to exp manager by @MaximumEntropy :: PR: #5584
  • Add interface for making amax reduction optional for FP8 by @ksivaman :: PR: #5447
  • Propagate attention_dropout flag for GPT-3 by @mikolajblaz :: PR: #5669
  • Enc-Dec model size reporting fixes by @MaximumEntropy :: PR: #5623
  • Add prompt learning tests by @arendu :: PR: #5649
  • Fix missing torchelastic fixes for PTL 1.8 by @MaximumEntropy :: PR: #5691
  • ALiBi Positional Embeddings by @michalivne :: PR: #5467
  • Megatron export triton update by @Davood-M :: PR: #5766
  • Fix transducer and question answering tutorial bugs by @Zhilin123 :: PR: #5809
  • Update description for question answering tutorial by @Zhilin123 :: PR: #5814
  • TPMLP for T5-based models by @Davood-M :: PR: #5840
  • Megatron positional encoding alibi fix by @michalivne :: PR: #5808

Export

Changelog
  • Add keep_initializers_as_inputs to _export method by @pks :: PR: #5731
  • Megatron export triton update by @Davood-M :: PR: #5766

General Improvements

Changelog
  • Update to pytorch 22.12 container by @ericharper :: PR: #5694
  • optimized loop and bugfix by @Jorjeous :: PR: #5573
  • Expose ClusteringDiarizer device by @SeanNaren :: PR: #5681
  • remove useless files. by @XuesongYang :: PR: #5580
  • [Fix] setup_multiple validation/test data by @anteju :: PR: #5585
  • Move to optimizer based EMA implementation by @SeanNaren :: PR: #5169
  • [Temp workaround] Disable test with cache_audio to unblock CI by @anteju :: PR: #5615
  • [EMA] Change success message to reduce confusion by @SeanNaren :: PR: #5621
  • Temporarily disable prompt learning CI tests by @ericharper :: PR: #5633
  • [Dockerfile] Remove AIS archive from docker image by @anteju :: PR: #5629
  • [workflow] add exclude labels option to ignore cherry-picks in releas… by @XuesongYang :: PR: #5645
  • Add DLLogger support to exp_manager by @milesial :: PR: #5658
  • Fix EMA restart by allowing device to be set by the class init by @SeanNaren :: PR: #5668
  • Remove SDP (moved to separate repo) - merge to main by @erastorgueva-nv :: PR: #5630
  • temp disable speaker recognition CI test by @fayejf :: PR: #5696
  • Don't print exp_manager warning when max_steps == -1 by @milesial :: PR: #5725
  • Add tabular data generation documents to the index file by @yidong72 :: PR: #5733
  • fix token id bug by @yidong72 :: PR: #5777
  • Update numpy requirements from 1.21 to 1.22 by @Zhilin123 :: PR: #5785
  • Fix setuptools to usable version by @titu1994 :: PR: #5798
  • add apt-get upgrade -y in dockerfile by @fayejf :: PR: #5817
  • Update NeMo Multi-Run docs by @titu1994 :: PR: #5844
  • add ambernet to readme by @fayejf :: PR: #5872
  • update apex install instructions for 1.15 by @ericharper :: PR: #5901

v1.14.0

1 year ago

Highlights

NeMo ASR

  • Hybrid CTC + Transducer loss ASR #5364
  • Sampled Softmax RNNT (Enables large vocab RNNT, for speech translation and multilingual ASR) #5216
  • ASR Adapters hyper parameter search scripts #5159
  • RNNT {ONNX, TorchScript} x GPU export infer #5248
  • Exportable MelSpectrogram (TorchScript) #5512
  • Audio To Audio Dataset Processor #5196
  • Multi Channel Audio Transcription #5479
  • Silence Augmentation #5476

NeMo Megatron

  • Support for the Mixture of Experts for T5
  • Fix PTL model size output for GPT-3 and BERT
  • BERT with Tensor Parallelism & Pipeline Parallel Support

NeMo Core

  • Hydra Multirun core support + NeMo HP optim in YAML #5159

NeMo Models

Detailed Changelogs

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:22.11

ASR

Changelog
  • [Tools][ASR] Tool for generating data using simulated RIRs by @anteju :: PR: #5158
  • Modernize RNNT ONNX export and add TS export by @titu1994 :: PR: #5248
  • Add Gradio App to ASR Docs by @titu1994 :: PR: #5270
  • Add support for Sampled Softmax for RNNT Joint by @titu1994 :: PR: #5216
  • Speed up HF data processing script for ASR by @titu1994 :: PR: #5330
  • bugfix in volume loss for CTC models by @bmwshop :: PR: #5348
  • Add cpWER for evaluation of ASR with diarization by @tango4j :: PR: #5279
  • Fix for getting tokenizer in character-based ASR models when using tarred dataset by @jonghwanhyeon :: PR: #5442
  • Refactor/unify ASR offline and buffered inference by @fayejf :: PR: #5440
  • Standalone diarization+ASR evaluation script by @tango4j :: PR: #5439
  • [ASR] Transcribe for multi-channel signals by @anteju :: PR: #5479
  • Add Silence Augmentation by @fayejf :: PR: #5476
  • add exportable mel spec by @1-800-BAD-CODE :: PR: #5512
  • add RNN-T loss implemented by PyTorch and test code by @hainan-xv :: PR: #5312
  • [ASR] AudioToAudio datasets and related test by @anteju :: PR: #5196
  • Add StreamingFeatureBufferer class for real-life streaming decoding by @tango4j :: PR: #5534
  • Pool stats with padding by @1-800-BAD-CODE :: PR: #5403
  • Adding Hybrid RNNT-CTC model by @VahidooX :: PR: #5364
  • Fix ASR Buffered inference scripts by @titu1994 :: PR: #5552
  • Add wer details - insertion, deletion, substitution rate by @fayejf :: PR: #5557
  • Add support for Time Stamp calculation using transcribe_speech.py by @titu1994 :: PR: #5568
  • [STT] Add Esperanto (Eo) ASR Conformer-CTC and Conformer-Transducer models by @andrusenkoau :: PR: #5639
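
The "wer details" entry above (#5557) refers to splitting word error rate into insertion, deletion, and substitution rates. A rough sketch of how such a breakdown can be computed with a word-level edit-distance table follows. It is an illustration only, not the code from that PR: the function name wer_details and the returned keys are hypothetical, and ties between equally cheap alignments are broken arbitrarily (substitution first).

```python
def wer_details(reference, hypothesis):
    """Return WER plus substitution/deletion/insertion rates for two transcripts.

    Rates are normalized by the number of reference words.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dp[i][j] = (edits, subs, dels, ins) for aligning ref[:i] with hyp[:j]
    dp = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, i, 0)  # every reference word deleted
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, 0, 0, j)  # every hypothesis word inserted

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]  # match costs nothing
                continue
            e, s, d, n = dp[i - 1][j - 1]
            sub = (e + 1, s + 1, d, n)
            e, s, d, n = dp[i - 1][j]
            dele = (e + 1, s, d + 1, n)
            e, s, d, n = dp[i][j - 1]
            ins = (e + 1, s, d, n + 1)
            # Pick the cheapest path by total edits only.
            dp[i][j] = min((sub, dele, ins), key=lambda t: t[0])

    edits, subs, dels, inss = dp[len(ref)][len(hyp)]
    total = len(ref)
    return {
        "wer": edits / total,
        "substitution_rate": subs / total,
        "deletion_rate": dels / total,
        "insertion_rate": inss / total,
    }


if __name__ == "__main__":
    print(wer_details("a b c d", "a x c"))
```

The three rates always sum to the overall WER, which makes the breakdown a quick way to see whether a model mostly drops words, adds words, or confuses them.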

TTS

Changelog
  • [TTS] Fastpitch energy condition and refactoring by @subhankar-ghosh :: PR: #5218
  • [TTS] HiFi-TTS Download Script by @oleksiivolk :: PR: #5241
  • [TTS] Add Mandarin/English Bilingual Recipe for Training Fastpitch Models by @yuekaizhang :: PR: #5208
  • [TTS] fixed type of filepath and rename openslr. by @XuesongYang :: PR: #5276
  • [TTS] replace obsolete torch_tts unit test marker with run_only_on('CPU') by @XuesongYang :: PR: #5307
  • [TTS] bugfix IPAG2P and refactor to remove duplicate process. by @XuesongYang :: PR: #5304
  • Update path to get_data.py in TTS tutorial by @redoctopus :: PR: #5311
  • [TTS] Replace IPA lambda arguments with locale string by @rlangman :: PR: #5298
  • [TTS] expand to support flexible dictionary entry formats in IPAG2P. by @XuesongYang :: PR: #5318
  • [TTS] update organization of model checkpoints and their pointers. by @XuesongYang :: PR: #5327
  • [TTS] bugfix for the script of generating mels from fastpitch. by @XuesongYang :: PR: #5344
  • [TTS] Add Spanish model documentation by @rlangman :: PR: #5390
  • [TTS] Add Spanish FastPitch training configs by @rlangman :: PR: #5383
  • [TTS] replace pitch normalization params with ??? by @XuesongYang :: PR: #5392
  • [TTS] Create script for processing TTS training audio by @rlangman :: PR: #5262
  • [TTS] remove useless logic for set_tokenizer. by @XuesongYang :: PR: #5430
  • [TTS] Fixing RADTTS training - removing view buffer and fixing accuracy issue by @borisfom :: PR: #5358
  • JOC Optimization in FastPitch by @subhankar-ghosh :: PR: #5450
  • [TTS] Support speaker level pitch normalization by @rlangman :: PR: #5455
  • TTS tutorial update: use speaker 9017 instead of 6097 by @redoctopus :: PR: #5532
  • [TTS] Remove unused TTS eval function by @redoctopus :: PR: #5605
  • [TTS][ZH] add fastpitch and hifigan model NGC urls and update NeMo docs. by @XuesongYang :: PR: #5596
  • [TTS][DOC] add notes about automatic conversion to target sampling ra… by @XuesongYang :: PR: #5624
  • [TTS][ZH] bugfix for the tutorial and add NGC CLI installation guide. by @XuesongYang :: PR: #5643
  • [TTS][ZH] bugfix for ngc cli installation. by @XuesongYang :: PR: #5652
  • [TTS][ZH] fix broken link for the script. by @XuesongYang :: PR: #5666

NLP / NMT

Changelog
  • Option to pad the last validation input sequence if it's smaller than the encoder sequence length for MegatronGPT by @anmolgupt :: PR: #5243
  • Fixes bugs with loss averaging for Megatron GPT by @shanmugamr1992 :: PR: #5329
  • Fixing bug in Megatron BERT when loss mask is all zeros by @shanmugamr1992 :: PR: #5424
  • support to disable sequence length + 1 input tokens for each sample in MegatronGPT by @anmolgupt :: PR: #5363
  • [TN] raise NotImplementedError for unsupported languages and other minor fixes by @XuesongYang :: PR: #5414
  • Bug fix/gpt by @shanmugamr1992 :: PR: #5493
  • prompt tuning fix for unscale grad errors by @arendu :: PR: #5523
  • Bert sequence parallel support by @shanmugamr1992 :: PR: #5494
  • NLP docs fixes by @vsl9 :: PR: #5528
  • Switch order of args in optimizer_step override by @ericharper :: PR: #5549
  • Upgrade to 22.11 by @ericharper :: PR: #5550
  • Merge r1.13.0 main by @ericharper :: PR: #5570
  • some tokenizers do not have additional_special_tokens_ids attribute by @arendu :: PR: #5642
  • Remove cell output from tutorial by @ericharper :: PR: #5689

Text Normalization / Inverse Text Normalization

Changelog
  • [ITN] fix year date graph, cardinals extension for hundreds by @ekmb :: PR: #5435
  • [TN] raise NotImplementedError for unsupported languages and other minor fixes by @XuesongYang :: PR: #5414

Export

Changelog
  • Fixed the onnx bug in conformer for non-streaming models. by @VahidooX :: PR: #5242
  • Modernize RNNT ONNX export and add TS export by @titu1994 :: PR: #5248
  • Fixes for Conformer-xl export by @borisfom :: PR: #5309
  • Remove onnx graphsurgery from Dockerfile by @titu1994 :: PR: #5320
  • add exportable mel spec by @1-800-BAD-CODE :: PR: #5512

General Improvements

Changelog
  • bugfix in volume loss for CTC models by @bmwshop :: PR: #5348
  • Fix setting up of learning rate scheduler by @PeganovAnton :: PR: #5444
  • Better patch hydra by @titu1994 :: PR: #5591
  • [TTS][ZH] bugfix for the tutorial and add NGC CLI installation guide. by @XuesongYang :: PR: #5643
  • Add fully torch.jit.script-able speaker clustering module by @tango4j :: PR: #5191
  • Update perturb.py by @stevehuang52 :: PR: #5231
  • remove CV requirements. by @XuesongYang :: PR: #5233
  • checks for accepted adapter type at module level by @arendu :: PR: #5194
  • fix hypotheses return by @nithinraok :: PR: #5253
  • Support for inserting additional subsampling in conformer encoder by @shan18 :: PR: #5224
  • update tutorials to use meeting config as default and VAD by @nithinraok :: PR: #5237
  • Specifying audio signal dropout separately for the Conformer Encoder by @shan18 :: PR: #5263
  • created by @bmwshop :: PR: #5268
  • Fix failing speaker counting for short audio samples by @tango4j :: PR: #5267
  • O2bert + apex pipeline functions by @shanmugamr1992 :: PR: #5221
  • Upperbound PTL by @titu1994 :: PR: #5302
  • Update Interface(s) phonetic entry by @blisc :: PR: #5212
  • add label inference support to EncDecSpeakerLabel class by @nithinraok :: PR: #5278
  • Add italian model checkpoints by @Kipok :: PR: #5315
  • Text Memmap Parsing Improvements by @michalivne :: PR: #5265
  • Update librosa signature in HF processing script by @titu1994 :: PR: #5321
  • Force wav file format for audio_filepath by @titu1994 :: PR: #5323
  • Updates to T0 Dataset and Model by @MaximumEntropy :: PR: #5201
  • [DOC] add sphinx-copybutton requirement to copy button on code snippets. by @XuesongYang :: PR: #5326
  • Add support for Hydra multirun to NeMo by @titu1994 :: PR: #5159
  • typo fix by @arendu :: PR: #5328
  • add pre-commit hook to automatically sort entries in requirements. by @XuesongYang :: PR: #5333
  • Add speaker clustering arguments to forward function by @tango4j :: PR: #5306
  • Fixing de-autocast by @borisfom :: PR: #5319
  • [Bugfix] Added rm -f / wget -nc command to avoid bash error in multispeaker sim notebook by @tango4j :: PR: #5292
  • [DOC] added ipython dependency to support IPython.sphinxext extension by @XuesongYang :: PR: #5345
  • Bug fix (removing old compute consumed samples) by @shanmugamr1992 :: PR: #5355
  • removed uninstall nemo_cv and nemo_simple_gan and relax numba version… by @XuesongYang :: PR: #5332
  • Enable mlflow logger by @whrichd :: PR: #4893
  • Fix Python type hints according to Python Docs by @artbataev :: PR: #5370
  • Distributed optimizer support for BERT by @timmoon10 :: PR: #5305
  • SpeakerClustering: fix tensor dimensions in forward() by @virajkarandikar :: PR: #5387
  • add squad by @arendu :: PR: #5407
  • added python and c++ alignment code by @yzhang123 :: PR: #5346
  • Add MoE support for T5 model (w/o expert parallel) by @aklife97 :: PR: #5409
  • Fix for concat map dataset by @1-800-BAD-CODE :: PR: #5133
  • Support for finetuning and finetuning inference with .ckpt files & batch size refactoring by @MaximumEntropy :: PR: #5339
  • update doc in terms of get_label for lang id model by @fayejf :: PR: #5366
  • Debug support for interleaved pipeline parallelism with the distributed Adam optimizer by @timmoon10 :: PR: #5236
  • Create codeql.yml by @titu1994 :: PR: #5445
  • Update codeql.yml by @titu1994 :: PR: #5449
  • Fix support for legacy sentencepiece models by @Numeri :: PR: #5406
  • Update docs with Comparison tool info, and slightly change .sh for ea… by @Jorjeous :: PR: #5182
  • Add float32 type casting for get_samples function by @tango4j :: PR: #5399
  • Add missing import in transcribe_utils.py by @jonghwanhyeon :: PR: #5487
  • Add auto-labeler by @SeanNaren :: PR: #5498
  • Add more glob patterns for labeler by @SeanNaren :: PR: #5504
  • Fix issues with PL 1.8 by @SeanNaren :: PR: #5353
  • [BugFix] Removing tokens from decoding timestamp by @tango4j :: PR: #5481
  • Upperbound the torchmetrics version by @SeanNaren :: PR: #5537
  • Data parallel collect results by @michalivne :: PR: #5547
  • Fix log-rank-0-only logic by @mikolajblaz :: PR: #5555
  • Fixed Docker build by @borisfom :: PR: #5562
  • Patch hydra launch by @titu1994 :: PR: #5589
  • Fix race condition bug with hydra multirun by @titu1994 :: PR: #5594
  • Update Dockerfile to use numba==0.53.1 by @stevehuang52 :: PR: #5614
  • Fixed a missing import for gather_objects by @michalivne :: PR: #5622

v1.13.0

1 year ago

Highlights

NeMo ASR

  • Spoken Language Understanding (SLU) models based on Conformer encoder and transformer decoder
  • Support for codeswitched manifests during training
  • Support for Language ID during inference for ML models
  • Support of cache-aware streaming for offline models
  • Word confidence estimation for CTC & RNNT greedy decoding

NeMo Megatron

  • Interleaved Pipeline schedule
  • Transformer Engine for GPT
  • HF T5v1.1 -> NeMo-Megatron conversion and finetuning/p-tuning
  • IA3 and Adapter Tuning (Tensor + Pipeline Parallel)
  • Pipeline Parallel Support for T5 Prompt Learning
  • MegatronNMT export

NeMo TTS

  • TTS introductory tutorial
  • Phonemizer/espeak removal (Spanish/German)
  • Char-only support for Spanish/German models
  • Documentation Refactor

NeMo Core

  • Upgrade to NGC PyTorch 22.09 container
  • Add pre-commit hooks
  • Exponential moving average (EMA) of weights during training
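
As background for the EMA item above, the standard update rule keeps a shadow copy of the weights and blends it toward the live weights after every optimizer step: ema = decay * ema + (1 - decay) * weight. The snippet below is a minimal sketch of that rule on plain Python lists. It is not NeMo's callback; the name ema_update and the decay values used are assumptions chosen for illustration.

```python
def ema_update(ema_weights, weights, decay=0.999):
    """One EMA step: blend the running average toward the current weights.

    A decay close to 1.0 makes the average change slowly (smoother, but slower
    to follow training); a smaller decay tracks the live weights more closely.
    """
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must be in [0, 1]")
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, weights)]


if __name__ == "__main__":
    ema = [0.0, 0.0]
    # Pretend these are the model weights after three successive training steps.
    for step_weights in ([1.0, 2.0], [1.0, 2.0], [1.0, 2.0]):
        ema = ema_update(ema, step_weights, decay=0.5)
    print(ema)
```

At evaluation time the averaged copy is typically used in place of the live weights, since it tends to be less noisy than any single training step.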

NeMo Models

Detailed Changelogs

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:22.09

Known Issues

Issues
  • The pytest tests for RadTTSModel_export_to_torchscript fail intermittently due to random input values. Fixed in main.

ASR

Changelog
  • Add docs tutorial on kinyarwanda asr by @bene-ges :: PR: #4953
  • Asr codeswitch by @bmwshop :: PR: #4821
  • Add test for nested ASR model by @titu1994 :: PR: #5002
  • Greedy decoding confidence for CTC and RNNT by @GNroy :: PR: #4931
  • [ASR][Tools] RIR corpus generator by @anteju :: PR: #4927
  • Add Squeezeformer CTC model checkpoints on Librispeech by @titu1994 :: PR: #5121
  • adding loss normalization options to rnnt joint by @bmwshop :: PR: #4829
  • Asr concat dataloader by @bmwshop :: PR: #5108
  • Added ASR model comparison to SDE by @Jorjeous :: PR: #5043
  • Add scripts for converting Spoken Wikipedia to asr dataset by @bene-ges :: PR: #5138
  • ASR confidence bug fix for older Python versions by @GNroy :: PR: #5180
  • Update ASR Scores and Results by @titu1994 :: PR: #5254
  • [STT] Add Ru ASR Conformer-CTC and Conformer-Transducer by @ssh-meister :: PR: #5340

TTS

Changelog
  • [TTS] Adding speaker embedding conditioning in fastpitch by @subhankar-ghosh :: PR: #4986
  • [TTS] Remove PhonemizerTokenizer by @rlangman :: PR: #4990
  • [TTS] FastPitch speaker interpolation by @subhankar-ghosh :: PR: #4997
  • RADTTS model changes to accommodate export with batch size > 1 by @borisfom :: PR: #4947
  • [TTS] remove phonemizer.py by @XuesongYang :: PR: #5090
  • [TTS] Add NeMo TTS Primer Tutorial by @rlangman :: PR: #4933
  • [TTS] Add SpanishCharsTokenizer by @rlangman :: PR: #5135
  • Fixes for docs/typos + remove max_utts parameter from tarred datasets as it causes hang in training by @Kipok :: PR: #5118
  • refactor TTS documentation organization and add new contents. by @XuesongYang :: PR: #5137
  • [TTS][DOC] update models trained on HifiTTS dataset. by @XuesongYang :: PR: #5173
  • [TTS] Fix TTS Primer image markup by @rlangman :: PR: #5192
  • [TTS] deprecate TextToWaveform base class. by @XuesongYang :: PR: #5205
  • [TTS] remove the avoidance of circular imports by @XuesongYang :: PR: #5214
  • [TTS] remove LinVocoder and apply Vocoder as parent class. by @XuesongYang :: PR: #5206
  • [TTS] unify requirements_tts.txt and requirements_torch_tts.txt by @XuesongYang :: PR: #5232
  • Minor typo fixes in TTS tutorial by @redoctopus :: PR: #5266
  • Radtts 1.13 by @borisfom :: PR: #5451
  • Radtts 1.13 plus by @borisfom :: PR: #5457

NLP / NMT

Changelog
  • IA3 support for GPT and T5 by @arendu :: PR: #4909
  • Fix and refactor consumed samples save/restore for Megatron models. by @MaximumEntropy :: PR: #5077
  • Remove unsupported arguments from MegatronNMT by @MaximumEntropy :: PR: #5065
  • Update megatron interface to dialogue by @Zhilin123 :: PR: #4936
  • gpt ia3 CI tests by @arendu :: PR: #5140
  • Fix NMT Eval Sampler by @aklife97 :: PR: #5154
  • Add interleaved pipeline schedule to GPT by @ericharper :: PR: #5025
  • fix for bug in bignlp by @arendu :: PR: #5172
  • Fixes some args that were not removed properly for multilingual Megatron NMT by @MaximumEntropy :: PR: #5142
  • Fix absolute path in GPT Adapter CI tests by @arendu :: PR: #5184
  • Add ability to configure drop last batch for validation datasets with MegatronGPT by @shanmugamr1992 :: PR: #5067
  • Megatron Export Update by @Davood-M :: PR: #5343
  • Fix GPT generation when using sentencepiece tokenizer by @MaximumEntropy :: PR: #5413
  • Disable sync_batch_comm in validation_step for GPT by @ericharper :: PR: #5397
  • Set sync_batch_comm=False in prompt learning and inference by @MaximumEntropy :: PR: #5448
  • Fix a bug with positional vs key-word based argument passing in the transformer layer by @MaximumEntropy :: PR: #5475

Text Normalization / Inverse Text Normalization

Changelog
  • [Chinese text normalization] speed up graph building by @pengzhendong :: PR: #5128

NeMo Tools

Changelog
  • Added ASR model comparison to SDE by @Jorjeous :: PR: #5043

Export

Changelog
  • Fix export bug by @VahidooX :: PR: #5009
  • RADTTS model changes to accommodate export with batch size > 1 by @borisfom :: PR: #4947
  • Support TorchScript export for Squeezeformer by @titu1994 :: PR: #5164
  • Expose keep_initializers_as_inputs to Exportable class by @pks :: PR: #5052
  • Fix the self-attention export bug for cache-aware streaming Conformer by @VahidooX :: PR: #5114
  • replace ColumnParallelLinear with nn.Linear in export_utils by @arendu :: PR: #5217
  • Megatron Export Update by @Davood-M :: PR: #5343
  • Fix Conformer Export in 1.13.0 (cherry-pick from main) by @artbataev :: PR: #5446
  • export_utils bugfix by @Davood-M :: PR: #5480
  • Export fixes for Riva by @borisfom :: PR: #5496

General Improvements and Bugfixes

Changelog
  • don't use bfloat16 when in jit by @bmwshop :: PR: #5051
  • Set sync_batch_comm=False in prompt learning and inference by @MaximumEntropy :: PR: #5448
  • Fix a bug with positional vs key-word based argument passing in the transformer layer by @MaximumEntropy :: PR: #5475
  • Pin Transformers version to fix CI by @SeanNaren :: PR: #4955
  • Fix changelog builder (#4962) by @titu1994 :: PR: #4963
  • Checkpoint averaging class fix by @michalivne :: PR: #4946
  • Add ability to give separate datasets for test, train and validation by @shanmugamr1992 :: PR: #4798
  • Add simple pre-commit file by @SeanNaren :: PR: #4983
  • Import pycuda.autoprimaryctx or pycuda.autoinit to init pycuda execut… by @liji-nv :: PR: #4951
  • Improvements to AMI script by @SeanNaren :: PR: #4974
  • clean warnings from tests and CI runs, and prepare for upgrade to PTL 1.8 by @nithinraok :: PR: #4830
  • Update libraries by @titu1994 :: PR: #5010
  • add close inactive issues and PRs github action. by @XuesongYang :: PR: #5015
  • Fix filename extraction in vad_utils.py by @GKPr0 :: PR: #4999
  • Add black to pre-commit by @SeanNaren :: PR: #5027
  • [CI] Enable previous build abort when new commit pushed by @SeanNaren :: PR: #5041
  • Tutorials and Docs for Multi-scale Diarization Decoder by @tango4j :: PR: #4930
  • Refactor output directory for MSDD Inference Notebook by @SeanNaren :: PR: #5044
  • text_memmap dataset index range testing fix by @michalivne :: PR: #5034
  • fix undefined constant in code example by @bene-ges :: PR: #5046
  • Text generation refactor and RETRO text generation implementation by @yidong72 :: PR: #4985
  • Lids by @bmwshop :: PR: #4820
  • Add datasets folder, add diarization datasets voxconverse/aishell by @SeanNaren :: PR: #5042
  • Fix the bugs in cache-aware streaming Conformer by @VahidooX :: PR: #5032
  • Bug fix - Limit val batches set to 1.0 by @shanmugamr1992 :: PR: #5023
  • [bug_fix] kv_channels is used when available by @arendu :: PR: #5066
  • Add spe_split_by_unicode_script arg by @piraka9011 :: PR: #5072
  • Transformer Engine Integration by @ericharper :: PR: #5104
  • Text memmap dataset index memory efficiency by @michalivne :: PR: #5056
  • Add NGC links for Aligner and FastPitch by @redoctopus :: PR: #5235
  • Fix link to inference notebook by @redoctopus :: PR: #5247
  • Fix links to speaker identification notebook by @SeanNaren :: PR: #5260
  • Fix bug in Dialogue tutorial by @Zhilin123 :: PR: #5277
  • PCLA tutorial typo fix by @jubick1337 :: PR: #5288
  • Fix dialogue tutorial bug by @Zhilin123 :: PR: #5297
  • small bugfix for r1.13.0 by @fayejf :: PR: #5310
  • Add italian model checkpoints by @Kipok :: PR: #5316
  • Pcla tutorial fixes by @jubick1337 :: PR: #5313
  • Fix issue with HF Model upload tutorial by @titu1994 :: PR: #5359
  • P&C LA tutorial fixes by @jubick1337 :: PR: #5354
  • Add SDP documentation by @erastorgueva-nv :: PR: #5274
  • [Bugfix] Added rm -f / wget -nc command in multispeaker sim notebook to r1.13.0 by @tango4j :: PR: #5375
  • Rename Speech Dataset Processor to Speech Data Processor by @erastorgueva-nv :: PR: #5378
  • fix for num worker 0 causing issues in losses after 1 epoch by @arendu :: PR: #5379
  • Fixed bug in notebook by @vadam5 :: PR: #5382
  • Force MHA QKV onto fp32 by @titu1994 :: PR: #5391
  • Fix for prompt table restore error by @vadam5 :: PR: #5393
  • Fix activation checkpoint args for T5 by @MaximumEntropy :: PR: #5410
  • Temporary hard code fix in PTL for CUDA Error by @yaoyu-33 :: PR: #5421
  • disable pc test by @ekmb :: PR: #5426
  • Revert Temporary hard code fix in PTL for CUDA Error by @yaoyu-33 :: PR: #5431
  • Revert workaround for T5 that sets number of workers to 0 & sync_batch_comm=False by @MaximumEntropy :: PR: #5420
  • Add num layers check for full activation checkpointing by @MaximumEntropy :: PR: #5470
  • Cherry Pick T5 finetuning changes into 1.13 by @MaximumEntropy :: PR: #5478
  • T5 Eval bugfix by @Davood-M :: PR: #5521
  • added set_start_method + function param bugfix by @Davood-M :: PR: #5539
  • Remove notebook by @ericharper :: PR: #5548
  • Remove broadcast from T5 prompt learning inference by @MaximumEntropy :: PR: #5558
  • Fix all gather while writing to a file during T5 finetuning by @MaximumEntropy :: PR: #5561

v1.12.0

1 year ago

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:22.08

ASR

Changelog
  • Add support for RNNT Char/Word Timestamp Calculation by @titu1994 :: PR: #4665
  • add conditional logic to rnnt_wer to handle when arrays have no elements by @mgoldey :: PR: #4776
  • fix handling of the final word for rnnt word timestamps by @mgoldey :: PR: #4779
  • amend rnnt word timestamps by @mgoldey :: PR: #4782
  • fix type error in rnnt_wer.py, rnnt_wer_bpe.py, wer_bpe.py by @hainan-xv :: PR: #4822
  • add kab language asr models by @nithinraok :: PR: #4819
  • [Tutorial][ASR][Fix] Data paths in ASR with NeMo tutorial by @anteju :: PR: #4845
  • [ASR] Fix for multi-channel signals in AudioSegment by @anteju :: PR: #4824
  • [ASR] Generate multichannel noise by @anteju :: PR: #4870
  • Fix asr model order by @nithinraok :: PR: #4959
  • Fix ASR issues by @titu1994 :: PR: #4984
  • Fix diarization ASR inference link in notebook by @SeanNaren :: PR: #5016
  • Code switching by @KunalDhawan :: PR: #4784
  • Release SOTA Lang ID model by @fayejf :: PR: #5080
  • Stateless decoder for RNN-T by @hainan-xv :: PR: #4710

TTS

Changelog
  • [TTS] use consistent spline interpolation for fastpitch and hifigan. by @XuesongYang :: PR: #4679
  • TTS tokenizers moved to collections.common.tokenizers by @AlexGrinch :: PR: #4690
  • [TTS] Fix text normalizer bugs in TTS data loader by @rlangman :: PR: #4781
  • ARP to IPA mapping, g2p_encode for IPATokenizer by @ekmb :: PR: #4850
  • IPA G2P bugfixes by @redoctopus :: PR: #4869
  • [TTS] add missing WikiHomograph data entries to CMUdict, updates to match new ipa set by @ekmb :: PR: #4886
  • [TTS] fix wrong g2p path. by @XuesongYang :: PR: #4902
  • [TTS] FastPitch training: speed up align_prior_matrix calculation by @racoiaws :: PR: #4718
  • [TTS] fix broken tutorial for MixerTTS. by @XuesongYang :: PR: #4949
  • [TTS] bugfix 'EnglishPhonemesTokenizer' object has no attribute 'encode_from_g2p' by @XuesongYang :: PR: #4992
  • [TTS] added missing German phoneme tokenizer by @XuesongYang :: PR: #5070
  • [TTS] fixed wrong val loss for epoch 0 and inconsistent metrics names by @XuesongYang :: PR: #5087

NLP / NMT

Changelog
  • Fix bug intent slot classification tokenizer to dialogue by @Zhilin123 :: PR: #4694
  • Intent slot model onnx export test by @Zhilin123 :: PR: #4731
  • Fix megatron p tuning notebook by @nithinraok :: PR: #4741
  • Add support for Apex distributed Adam optimizer with GPT-3 by @timmoon10 :: PR: #4487
  • Fixes NLPModel's load from checkpoint due to PTL private function changes by @MaximumEntropy :: PR: #4755
  • Adapter tuning for Megatron GPT models by @arendu :: PR: #4717
  • Megatron Encoder Decoder models with RPE and PP > 2 by @MaximumEntropy :: PR: #4663
  • add kab language asr models by @nithinraok :: PR: #4819
  • add chinese to language doc and fix bug by @yzhang123 :: PR: #4834
  • Spoken Language Identification by @fayejf :: PR: #4846
  • Fix decoding bug for megatron enc-dec models with O2 by @MaximumEntropy :: PR: #4989
  • Updating Megatron LM conversion according to PTL 1.7 by @Davood-M :: PR: #5038
  • Adding RETRO model Faiss sharding index and KNN sharding index by @yidong72 :: PR: #4713
  • MLP Prompt Learning Encoder by @vadam5 :: PR: #4849
  • Update the prompt learning to handle large language model by @yidong72 :: PR: #4906

Text Normalization / Inverse Text Normalization

Changelog
  • [TTS] Fix text normalizer bugs in TTS data loader by @rlangman :: PR: #4781
  • [Chinese text normalization]Chinese TN part in text_normalization by @mzxcpp :: PR: #4826
  • Fix zh tn by @yzhang123 :: PR: #5035
  • Bug fixes for parallel mp3 to wav conversion, PC notebook, update Readme for TN requirements by @ekmb :: PR: #5047
  • Added P&C lexical audio model by @jubick1337 :: PR: #4802

Export

Changelog
  • Intent slot model onnx export test by @Zhilin123 :: PR: #4731

General Improvements

Changelog
  • Fix logger reference by @SeanNaren :: PR: #4786

  • Fix error with class method reference in msdd by @SeanNaren :: PR: #4865

  • Add sync for logging calls to ensure aggregation across devices by @SeanNaren :: PR: #4876

  • Fix saving the last checkpoint when using val check interval by @SeanNaren :: PR: #4905

  • Add support for skipping validation on resume + extend saving last ckpt test by @SeanNaren :: PR: #4922

  • Move trainer calls for ssl models to training and validation steps only by @sam1373 :: PR: #4685

  • Change Num Partitions size expansion fix by @aklife97 :: PR: #4719

  • upgrade to PTL 1.7 by @nithinraok :: PR: #4672

  • Fixing outputs of infer() and use of NeMo length regulator helper by @borisfom :: PR: #4724

  • bug fix: enable async grad reduction when DP > 1 by @erhoo82 :: PR: #4740

  • Add LayerNorm1P, weight decay for LN and unscaled initialization by @mikolajblaz :: PR: #4743

  • Data Simulator by @chooper1 :: PR: #4686

  • jenkins data simulator fix by @nithinraok :: PR: #4751

  • Multiscale Diarization Decoder (MSDD) model and module files by @tango4j :: PR: #4650

  • Fix logging in gradient clipping with PTL 1.7.2 by @MaximumEntropy :: PR: #4769

  • Fix checkpoint restoring by @nithinraok :: PR: #4777

  • avoid data clipping after convolution with rir samples by @nithinraok :: PR: #4806

  • Fixed in_features dim if bidirectional is True by @farisalasmary :: PR: #4588

  • Fix float/integer type error in WER.update() by @fujimotos :: PR: #4816

  • [Speech Data Explorer] An option to explicitly specify the base dir by @anteju :: PR: #4678

  • adding instancenorm as an option for conv normalization by @bmwshop :: PR: #4827

  • Fix small spelling mistakes by @SeanNaren :: PR: #4839

  • [Tutorials] Fix matplotlib version and directory name in Multispeaker_Simulator by @anteju :: PR: #4804

  • Update diarization folder structure by @tango4j :: PR: #4823

  • Missing types in clustering by @SeanNaren :: PR: #4858

  • add new models by @Jorjeous :: PR: #4852

  • Fix decoding for T5 models with RPE by @MaximumEntropy :: PR: #4847

  • Update Speaker Diarization notebooks with unknown oracle_num_speakers by @fayejf :: PR: #4861

  • Fix mha bug by @yzhang123 :: PR: #4859

  • Updates to adapter training by @arendu :: PR: #4842

  • Changes to MSDD code after review, fix test log call by @SeanNaren :: PR: #4881

  • Fixed output of BERT to be [batch x seq x hidden] by @michalivne :: PR: #4887

  • Add AMI dataset script by @SeanNaren :: PR: #4864

  • Update label_models.py by @stevehuang52 :: PR: #4891

  • Update tutorials.rst for question answering by @Zhilin123 :: PR: #4895

  • removed unused imports for all domains. by @XuesongYang :: PR: #4901

  • Fix ptl_load_state not providing cls by @MaximumEntropy :: PR: #4914

  • Remove unused cv collection by @okuchaiev :: PR: #4907

  • Add mixed-representation config to PhonemizerTokenizer by @rlangman :: PR: #4904

  • Fix implicit bug in _AudioLabelDataset by @stevehuang52 :: PR: #4923

  • Fix and refactor label models by @fayejf :: PR: #4913

  • Sparrowhawk deployment fix by @ekmb :: PR: #4928

  • Upgrade to NGC PyTorch 22.08 Container by @ericharper :: PR: #4929

  • Fixes for Cherry Picked PRs by @titu1994 :: PR: #4962

  • Fix cherry pick workflow by @ericharper :: PR: #4964

  • check for active conda environment by @nithinraok :: PR: #4970

  • fix label models restoring issue from weighted cross entropy by @nithinraok :: PR: #4968

  • Add simple pre-commit file (#4983) by @SeanNaren :: PR: #4995

  • Fix bug in Squeezeformer Conv block by @titu1994 :: PR: #5011

  • Fix bugs by @Zhilin123 :: PR: #5036

  • Add black to pre-commit (#5027) by @SeanNaren :: PR: #5045

  • Fix bug in question answering tutorial by @Zhilin123 :: PR: #5049

  • Missing fixes from r1.11.0 to T5 finetuning eval by @MaximumEntropy :: PR: #5054

  • P&C docs by @jubick1337 :: PR: #5068

  • probabilites -> probabilities by @nithinraok :: PR: #5078

  • Notebook bug fixes by @vadam5 :: PR: #5084

  • update strategy in notebook from ddp_fork to dp by @Zhilin123 :: PR: #5088

  • Fix Unhashable type list for Numba Cuda spec augment kernel by @titu1994 :: PR: #5093

  • Remove numba import by @titu1994 :: PR: #5095

  • T5 prompt learning fixes missing from r.11.0 merge by @MaximumEntropy :: PR: #5075

  • T5 Decoding with PP > 2 fix by @MaximumEntropy :: PR: #5091

  • Multiprocessing fix by @jubick1337 :: PR: #5106

  • [Bug fix] PC lexical + audio by @ekmb :: PR: #5109

  • bugfix: pybtex.database.InvalidNameString: Too many commas in author … by @XuesongYang :: PR: #5112

v1.11.0

1 year ago

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:22.07

ASR

Changelog
  • Add ASR CTC Decoding module by @titu1994 :: PR: #4342
  • Fixing bugs in calling method ctc_decoder_predictions_tensor. by @VahidooX :: PR: #4414
  • Fixed WER initialization in ASR_with_Nemo notebook by @anteju :: PR: #4523
  • Update signature of Hypothesis alignments by @titu1994 :: PR: #4511
  • Add support for ASR Adapter Auxiliary Losses by @titu1994 :: PR: #4480
  • Catalan ASR NGC Resource by @stevehuang52 :: PR: #4576
  • Add kw asr models, add itn ru checkpoint (tagger-based) by @bene-ges :: PR: #4595
  • Add DALI char dataset support to SSL model by @piraka9011 :: PR: #4592
  • Customize arguments for trimming the leading/trailing silence by @XuesongYang :: PR: #4582
  • Update Offline ASR with CTC Decoding by @titu1994 :: PR: #4608
  • Add Squeezeformer to ASR by @titu1994 :: PR: #4416
  • Fix ASR notebooks by @titu1994 :: PR: #4738
  • Add pretrained ASR models for Croatian by @anteju :: PR: #4682
  • Dataloader, collector, loss and metric for multiscale diarization decoder by @tango4j :: PR: #4187
  • Multilingual VAD model by @fayejf :: PR: #4734
  • Adding support for models trained with full context for cache-aware streaming. by @VahidooX :: PR: #4687
  • Fp16 support for Conformer by @bmwshop :: PR: #4571
  • Tiny VAD refactoring for postprocessing by @fayejf :: PR: #4625
  • Add silence handling for speaker diarization pipeline by @nithinraok :: PR: #4512
  • Add Bucketing support to TarredAudioToClassificationLabelDataset by @entn-at :: PR: #4465

TTS

Changelog
  • Wrong order of returned tuple for general_collate_fn. by @XuesongYang :: PR: #4388
  • Pitch, voiced_mask, prob_voiced have the same values which is not expected. by @XuesongYang :: PR: #4392
  • Add static method decorator. by @XuesongYang :: PR: #4443
  • Fix typo in HiFi-GAN config's max steps by @XuesongYang :: PR: #4450
  • Relaxed support for both CPUs and GPUs by @XuesongYang :: PR: #4461
  • Multi-speaker fastpitch model training recipe on HUI-Audio-Corpus-German by @XuesongYang :: PR: #4413
  • Created the finetuning Hifigan 44100Hz recipe on HUI-Audio-Corpus-German by @XuesongYang :: PR: #4478
  • Fix dataset parameter typo on tacotron2 example yaml by @saarus72 :: PR: #4471
  • Update cmudict by @jasro23 :: PR: #4510
  • Customize arguments for trimming the leading/trailing silence by @XuesongYang :: PR: #4582
  • Fix off-by-1 bug in Beta Binomial Prior by @rlangman :: PR: #4616
  • G2P Aligner by @redoctopus :: PR: #4604
  • RADTTS ADLR-NEMO porting by @MikyasDesta :: PR: #4538
  • Fixed wrong pronunciations for r1.11. by @XuesongYang :: PR: #4677
  • Incremented the version number to 22.08 in tutorials. by @XuesongYang :: PR: #4684
  • Bugfix for missing configs. by @XuesongYang :: PR: #4725
  • Fix pynini install in TTS tutorials by @redoctopus :: PR: #4729
  • Updated config with a German IPA phoneme tokenizer by @XuesongYang :: PR: #4756
  • Add multi-speaker German FastPitch and HiFiGAN NGC checkpoints by @XuesongYang :: PR: #4763
  • Add single male speaker German FastPitch and HiFiGAN NGC checkpoints by @XuesongYang :: PR: #4770
  • Deprecated old scripts for ljspeech. by @XuesongYang :: PR: #4780
  • Fix MixerTTS data loading index error by @redoctopus :: PR: #4811
  • G2P docs by @ekmb :: PR: #4841
  • NMESC speaker counting algorithm update by @tango4j :: PR: #4500

NLP / NMT

Changelog
  • Add O2 support for RETRO model by @yidong72 :: PR: #4411
  • Add MTEncDec Finetune support by @aklife97 :: PR: #4540
  • Fix metric setup for finetuning without a test set by @MaximumEntropy :: PR: #4585
  • T0 model and dataset by @MaximumEntropy :: PR: #4598
  • Add prompt learning for T5 by @HeyyyyyyG :: PR: #4391
  • Add MuTransfer Capability to RETRO model pretraining by @yidong72 :: PR: #4643
  • Label Smoothing in VocabParallelCrossEntropy by @MaximumEntropy :: PR: #4602
  • Megatron BART BOS / EOS bug fix by @michalivne :: PR: #4495
  • GPT Prompt Learning Improvements by @vadam5 :: PR: #4496
  • Megatron perceiver with tensor parallelism only by @MaximumEntropy :: PR: #4318
  • Refactor for punctuation model by @jubick1337 :: PR: #4367
  • Update megatron prompt learning interface to dialogue by @Zhilin123 :: PR: #4545
  • Removed NLPDDPPlugin Import check by @vadam5 :: PR: #4555
  • Option to disregard document boundaries for t5, bart, ul2 by @MaximumEntropy :: PR: #4481
  • Add Tokenization and Normalization pre-processing script for NMT by @aklife97 :: PR: #4557
  • Integrating support for GPT/T5/BART for Question Answering by @ameyasm1154 :: PR: #4532
  • NeMo Megatron: Add sequence parallelism and selective activation checkpointing (rebased) by @ericharper :: PR: #4380
  • Update megatron t5 interface to dialogue by @Zhilin123 :: PR: #4626
  • Additional sentencepiece args - Byte fallback, split digits, split_on_whitespace by @MaximumEntropy :: PR: #4525
  • Maximum sample-based training for Megatron NMT and Text Memmap based Seq2seq Pre-training by @MaximumEntropy :: PR: #4396
  • NeMo Megatron Doc updates1 by @okuchaiev :: PR: #4633
  • Asymmetric Encoder and Decoder Configuration for Megatron Models by @MaximumEntropy :: PR: #4568
  • Add sentencepiece legacy arg to megatron tokenizer configs by @MaximumEntropy :: PR: #4659
  • Megatron encode function with RPE fix by @MaximumEntropy :: PR: #4692
  • Updates to NeMo Megatron OSS docs by @okuchaiev :: PR: #4709
  • Changes to make Megatron NMT exportable by @Davood-M :: PR: #4499
  • fix bug relating to ddp strategy in joint intent slot classification … by @Zhilin123 :: PR: #4762
  • Fix qa notebook typos and branch by @ericharper :: PR: #4788
  • Colab py37 compatibility megatron by @Zhilin123 :: PR: #4791
  • added/fixed export for Megatron models by @Davood-M :: PR: #4712
  • Fix providing glue in seq2seq eval by @MaximumEntropy :: PR: #4843
  • Fix Megatron NMT consumed samples and ckpt_to_nemo split rank by @MaximumEntropy :: PR: #4884
  • Fixing Megatron BERT output dimensions to [batch x seq x hidden] by @michalivne :: PR: #4894
  • Prompt Learning Inference Improvements by @vadam5 :: PR: #4566
  • MegaMolBART Compatibility by @michalivne :: PR: #4603

Text Normalization / Inverse Text Normalization

Changelog
  • Add ITN pt by @guidefloripa :: PR: #4516
  • add kw asr models, add itn ru checkpoint (tagger-based) by @bene-ges :: PR: #4595
  • Fix ITN pt by @guidefloripa :: PR: #4623
  • Bug fix hundred in Audio-based, added method to split text into sentences by @ekmb :: PR: #4610
  • Fix itn pt time by @guidefloripa :: PR: #4630
  • Pin lightning version to be < 1.7.0 by @MaximumEntropy :: PR: #4660
  • G2P for OOV and heteronyms by @ekmb :: PR: #4624
  • Publish pretrained itn t5 model for English by @bene-ges :: PR: #4748
  • Added MLM Scoring by @yzhang123 :: PR: #4476

Export

Changelog
  • update fastpitch to add export controls by @blisc :: PR: #4509
  • Fix Fastpitch Export by @blisc :: PR: #4676
  • Changes to make Megatron NMT exportable by @Davood-M :: PR: #4499
  • Added/fixed export for Megatron models by @Davood-M :: PR: #4712

Bugfixes

Changelog
  • Wrong order of returned tuple for general_collate_fn. by @XuesongYang :: PR: #4388
  • Pitch, voiced_mask, prob_voiced have the same values which is not expected. by @XuesongYang :: PR: #4392
  • Fix tarred dataset len when num shards is not divisible by workers by @itzsimpl :: PR: #4553
  • Fix multiple dev/test datasets after restoring from checkpoint by @PeganovAnton :: PR: #4636
  • Fix/need different cache dirs for different datasets by @PeganovAnton :: PR: #4640
  • Improve mAES algorithm with patches by @titu1994 :: PR: #4662

General Improvements

Changelog
  • Option to disable mp in VAD via num_workers=1 by @gkucsko :: PR: #4317
  • Remove redundant bias expand by @xrennvidia :: PR: #4382
  • Add option for specifying wandb save_dir from config by @shan18 :: PR: #4379
  • Quick wav2vec fix. In-place operation adding convolutional positions … by @bonham79 :: PR: #4383
  • Fixing import error in some cases by @borisfom :: PR: #4401
  • Update with new conformer checkpoints. by @VahidooX :: PR: #4417
  • Wav2vec fix by @bonham79 :: PR: #4467
  • Relative Audio Paths by @stevehuang52 :: PR: #4470
  • Allow Noam lr scheduler to run for more than max_steps by @alancucki :: PR: #4472
  • Support for Different LRs with Param Groups by @stevehuang52 :: PR: #4508
  • Fix runtime check by @borisfom :: PR: #4501
  • Update finetune label models by @nithinraok :: PR: #4504
  • Weighted bucketing by @tbartley94 :: PR: #4530
  • Relative Audio Path by @stevehuang52 :: PR: #4520
  • Fix duplex inference with grammars by @ekmb :: PR: #4517
  • Add nsys profiling by @ericharper :: PR: #4539
  • Remove the variable that is not used in the context. by @XuesongYang :: PR: #4547
  • Adding multispeaker fastpitch and hifigan en model links to available… by @subhankar-ghosh :: PR: #4550
  • Add length ratio filtering script by @MaximumEntropy :: PR: #4551
  • Relative audio path in speech data explorer by @anteju :: PR: #4570
  • Dividing generative question-answering CI tests by @ameyasm1154 :: PR: #4600
  • Updating the default parameters in the example adapters config file by @shan18 :: PR: #4607
  • Improve normalize_batch ValueError message by @piraka9011 :: PR: #4614
  • Support listing Hugging Face model info by @titu1994 :: PR: #4619
  • Update diarization data loader to train meeting data by @tango4j :: PR: #4567
  • Fix HF check for model card info by @titu1994 :: PR: #4628
  • Add Github Action for auto webpage build by @titu1994 :: PR: #4645
  • Empty commit by @titu1994 :: PR: #4646
  • Force git config for doc build by @titu1994 :: PR: #4647
  • Correct branch name for github page source by @titu1994 :: PR: #4648
  • Adding lang id to shard by @bmwshop :: PR: #4649
  • Fix special tokens in vocab to arguments of constructor by @gwarmstrong :: PR: #4631
  • Fix apex for r1.11 by @michalivne :: PR: #4666
  • Update readme by @nithinraok :: PR: #4667
  • Removed trailing spaces in CI test by @vadam5 :: PR: #4671
  • Pynini dependency fix by @ekmb :: PR: #4674
  • Fix for incorrect batch size issue while decoding by @rilango :: PR: #4675
  • Fix to fetch config file by @nithinraok :: PR: #4699
  • Fix notebook for buffered inference by @titu1994 :: PR: #4703
  • Prompt Learning Notebook Bug Fix by @vadam5 :: PR: #4689
  • Add psutils to mock imports by @ericharper :: PR: #4728
  • Update Aligner model and tutorial to add NGC checkpoint loading by @redoctopus :: PR: #4714
  • Updated docs and doc paths by @vadam5 :: PR: #4754
  • Update r1.11 to new heteronyms list by @redoctopus :: PR: #4745
  • Update CMUdict with more recent 0.7b entries by @redoctopus :: PR: #4768
  • Add pynini to Docker container by @artbataev :: PR: #4733
  • Fix tutorial formatting by @redoctopus :: PR: #4778
  • Fix initializing weights from ptl ckpt with exclude by @sam1373 :: PR: #4807
  • T5 prompt learning fixes by @MaximumEntropy :: PR: #4771
  • Updated inference code and squad scripts by @vadam5 :: PR: #4835
  • Fix uppercasing mismatch for IPA heteronyms by @redoctopus :: PR: #4860
  • Set the number of workers to 0 for validation and test sets in all enc-dec models by @MaximumEntropy :: PR: #4790
  • Fix mha by @yzhang123 :: PR: #4866
  • ipa bug fix by @ekmb :: PR: #4871
  • Added utf8 encoding by @vadam5 :: PR: #4892
  • Fix question answering docs r1p11 by @Zhilin123 :: PR: #4897

v1.10.0

1 year ago

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:22.05

Known Issues

Issues
  • Tutorial: Fastpitch_Training_GermanTTS.ipynb is experimental and still being tested.

ASR

Changelog
  • Multilang asr tutorial by @bmwshop :: PR: #3931
  • Add ASR with Adapters Tutorial by @titu1994 :: PR: #4149
  • Add support for Decoder + Joint Adapters for ASR by @titu1994 :: PR: #4189
  • updating PretrainedModelInfo and benchmark sheet for ASR models by @krishnacpuvvada :: PR: #4259
  • Remove verbose flag from Dali Index Creator by @titu1994 :: PR: #4309
  • updating PretrainedModelInfo for ASR SSL models by @krishnacpuvvada :: PR: #4292
  • Adding docs for ASR SSL by @krishnacpuvvada :: PR: #4303
  • Add ASR Scores to Docs by @titu1994 :: PR: #4412
  • [ASR] Replace all paths with /content/ by @titu1994 :: PR: #4427
  • added conformer mandarin model. by @VahidooX :: PR: #4201
  • Runtime audio segment sampling for SSL by @krishnacpuvvada :: PR: #4126

TTS

Changelog
  • [TTS] Add volume passthrough to fp for riva by @blisc :: PR: #4167
  • Update TTS Configs from LAMB to AdamW by @redoctopus :: PR: #4233
  • Add benchmark=false to all TTS configs by @redoctopus :: PR: #4263
  • [TTS] add staticmethod decoration for BetaBinomialInterpolator by @XuesongYang :: PR: #4319
  • [TTS] capture exception of non-supported windows. by @XuesongYang :: PR: #4320
  • [TTS] enforced pin_memory = True by @XuesongYang :: PR: #4341
  • [TTS] Training Fastpitch on German text and phonemes and finetuning HiFi-GAN on predicted mels by @aroraakshit :: PR: #4266
  • IPA support for TTS by @redoctopus :: PR: #4310
  • Bits of RADTTS support by @borisfom :: PR: #4343

NLP / NMT

Changelog
  • Megatron NMT Restore from T5/BART and finetune by @MaximumEntropy :: PR: #3977
  • Binarized memmap dataloader for Megatron NMT, Inference and checkpoint -> nemo by @MaximumEntropy :: PR: #4137
  • Use unique names for temporary directories in punctuation and capitalization tests by @PeganovAnton :: PR: #4298
  • Removes debug logging statements in Megatron NMT by @MaximumEntropy :: PR: #4312
  • Raise error if trainer object is None for MegatronBaseModel by @MaximumEntropy :: PR: #4356
  • Punctuation and capitalization tests race condition by @PeganovAnton :: PR: #4399
  • unify intent slot dataset util functions in tutorials by @Zhilin123 :: PR: #4445
  • Fix for TP=2,PP=2 decoding with megatron encoder-decoder models by @MaximumEntropy :: PR: #4484
  • Add RETRO model for pretraining by @yidong72 :: PR: #4121
  • Add async grad allreduce and chunk optimization by @xrennvidia :: PR: #4084
  • Implements the UL2 Dataset and config by @MaximumEntropy :: PR: #4184
  • Add RETRO indexed dataset and inference by @yidong72 :: PR: #4220
  • Finetune T5 on the prefix-lm objective by @MaximumEntropy :: PR: #4328
  • Fuse bias with geglu in ParallelMLP by @xrennvidia :: PR: #4213
  • Support larger datasets for question answering by @Zhilin123 :: PR: #4205
  • Refactor bias act fusion by @MaximumEntropy :: PR: #4376
  • Prompt Learning Pipeline Parallel by @vadam5 :: PR: #4291
  • Text memmap dataset by @michalivne :: PR: #4068
  • Fuse grad division into async grad allreduce by @xrennvidia :: PR: #4327

Text Normalization / Inverse Text Normalization

Changelog
  • [TN] WFST to normalize punctuation by @ekmb :: PR: #4108
  • [TN/TTS] Add graph to tag IPA words/sentences in square brackets and leave them unchanged by @ekmb :: PR: #4323
  • Tn tutorial by @yzhang123 :: PR: #4090
  • Tn add rules by @yzhang123 :: PR: #4302
  • Tn install by @yzhang123 :: PR: #4055
  • Fix electronic bug, new time ITN rule by @ekmb :: PR: #4355
  • [TN] Bug fix: expand serial coverage of unknown symbol, remove constraints from word graph by @ekmb :: PR: #4463
  • Configure T5 finetuning metrics by @MaximumEntropy :: PR: #4122

Export

Changelog
  • Added support for subnet export by @borisfom :: PR: #4299

Core

Changelog
  • Add Module-level Adapters, Save-Restore and tests by @titu1994 :: PR: #4114
  • Add NeMo Adapters tutorial to Core by @titu1994 :: PR: #4311
  • NeMo Model to HF Hub Upload Tutorial by @titu1994 :: PR: #4322

General Improvements and Fixes

Changelog
  • Update container to 22.05 by @ericharper :: PR: #4329
  • Fix PTL step calculation by @titu1994 :: PR: #4307
  • [NLP] P&C Fix multi node cache issue, add pynini guard by @ekmb :: PR: #4410
  • NeMo Megatron GPT Unit Tests by @ericharper :: PR: #4099
  • Add the PP2 GPT eval CI test by @yidong72 :: PR: #4168
  • BigNLP perf regression fix by @MaximumEntropy :: PR: #4267
  • Fixes for Megatron Base Model Artifacts by @MaximumEntropy :: PR: #4248
  • Fix a wrong description in offline_diarization_with_asr.yaml by @tango4j :: PR: #4141
  • bugfix for import error in Offline_ASR_with_VAD_for_CTC_models by @fayejf :: PR: #4424
  • [Fix] ASR RNNT Tutorial by @stevehuang52 :: PR: #4352
  • [TTS] Fix Hifigan finetune tutorial by @subhankar-ghosh :: PR: #4182
  • [Bugfix][TTS] wrong order of returned tuple for general_collate_fn. by @XuesongYang :: PR: #4432
  • [bugfix][TTS] pitch, voiced_mask, prob_voiced have the same values. by @XuesongYang :: PR: #4435
  • [TTS] [bugfix] German FastPitch HiFi-GAN tutorial and lr by @aroraakshit :: PR: #4459
  • [TTS] [bugfix] update indentation by @aroraakshit :: PR: #4468
  • Fix some 's' cases for IPA G2P by @redoctopus :: PR: #4460
  • Fix ASR Typos in tutorials by @titu1994 :: PR: #4384
  • Use unique names for temporary directories in punctuation and capitalization tests by @PeganovAnton :: PR: #4298
  • Punctuation and capitalization tests race condition by @PeganovAnton :: PR: #4399
  • Dialogue tasks unit test by @Zhilin123 :: PR: #4112
  • fix error by @yzhang123 :: PR: #4120
  • fix typo by @stevehuang52 :: PR: #4134
  • Fix cmudict typo: phoneme YI1 -> IY1 in NVME by @redoctopus :: PR: #4139
  • transcribe: scan directories recursively by @virajkarandikar :: PR: #4159
  • Add 44KHz yaml file for Fastpitch training by @subhankar-ghosh :: PR: #4161
  • [bugfix] consistent highfreq to both fastpitch and hifigan in their 44100 configs. by @XuesongYang :: PR: #4177
  • Upperbound OmegaConf by @titu1994 :: PR: #4191
  • Prompt tokenization bugfix by @vadam5 :: PR: #4197
  • Updated to Prompt Learning Model to Use Distributed Sampler by @vadam5 :: PR: #4208
  • Freesound fixes by @virajkarandikar :: PR: #4155
  • Patch Hydra by @titu1994 :: PR: #4202
  • Prompt Learning Model Saving Changes by @vadam5 :: PR: #4212
  • Speakertasks manifest by @yzhang123 :: PR: #4185
  • SSL Multi-loss Update by @sam1373 :: PR: #4186
  • Support load_adapters with just adapter_name by @titu1994 :: PR: #4255
  • Add special tokens to existing (trained) SentencePiece models by @aklife97 :: PR: #4203
  • Fixing the speed slow-down for speech models. by @VahidooX :: PR: #4260
  • Fix and add functions in speaker utils by @tango4j :: PR: #4138
  • pt container 1.10->1.11.0 by @ekmb :: PR: #4273
  • ssl fixes by @sam1373 :: PR: #4268
  • Save Virtual Prompt Weights Only by @vadam5 :: PR: #4237
  • add 'relative positional embedding (RPE)' feature - re-creating after… by @khcs :: PR: #4256
  • Docs CSS: Update h4 tag style for the right side bar by @nickolyamba :: PR: #4284
  • Fix Docs CSS: align docs left and increase width for large screens by @nickolyamba :: PR: #4154
  • remove redundant condition for fastpitch. by @XuesongYang :: PR: #4281
  • [Add] automatically resolving relative audio path by @stevehuang52 :: PR: #4277
  • forcing conv subsampling to 32 bit by @bmwshop :: PR: #4293
  • Add library name and version when downloading from the Hugging Face Hub by @osanseviero :: PR: #4304
  • clear access registry when adding if not empty by @sam1373 :: PR: #4306
  • [collections] bugfix for capturing NotImplementedError of non-supported sup data types. by @XuesongYang :: PR: #4297
  • Adjust lr for AdamW from LAMB default by @redoctopus :: PR: #4308
  • Fix bugs in indexed dataset exam script by @yidong72 :: PR: #4325
  • Torchaudio installation fix by @GNroy :: PR: #4330
  • Speedup the speech commands dataset processing script by @shan18 :: PR: #4347
  • fix wrong requirement by @yzhang123 :: PR: #4349
  • Refactored path to manifest by @treacker :: PR: #4251
  • Fix the post LN bug by @yidong72 :: PR: #4350
  • [Fix] Hanging for Fully Randomized Bucketing by @stevehuang52 :: PR: #4348
  • Auto-switch the input dimensions in the conformer encoder adapter to correct value by @shan18 :: PR: #4354
  • Set headscale false by @MaximumEntropy :: PR: #4364
  • Add wandb as dependency by @titu1994 :: PR: #4365
  • Fix trainer.global_steps in WandB logging by @titu1994 :: PR: #4366
  • Finetuning changes for BART by @MaximumEntropy :: PR: #4003
  • Make position embedding expansion specific to a batch to avoid checkpoint size mismatches by @MaximumEntropy :: PR: #4357
  • Correct support for dataclasses in default module dim by @titu1994 :: PR: #4372
  • Fix no attribute 'pad_id' bug when pre-processing by @yidong72 :: PR: #4377
  • Question answering bug fix by @Zhilin123 :: PR: #4381
  • Docs for NeMo Adapters by @titu1994 :: PR: #4369
  • Update NeMo docs by @titu1994 :: PR: #4397
  • Fixing import error in some cases by @borisfom :: PR: #4402
  • Fix tutorial typos and docs by @titu1994 :: PR: #4415
  • Add reconfigure on validation epoch start by @MaximumEntropy :: PR: #4393
  • Re-apply fixes from r1.9.0 by @redoctopus :: PR: #4425
  • Fix hanging issue by multiprocessing in SD tutorial and add ETA for VAD processing by @fayejf :: PR: #4405
  • Fix notebook text by @yidong72 :: PR: #4438
  • Update dialogue tutorial version by @Zhilin123 :: PR: #4437
  • Docs: Add table overflow handling by @nickolyamba :: PR: #4441
  • Docs: Decrease Font Size on Tables by @nickolyamba :: PR: #4444
  • Notebook bug fix: add subfolder by @ekmb :: PR: #4442
  • Fix typo in HiFi-GAN config's max steps by @redoctopus :: PR: #4446
  • Updated notebook to fix batch configuration and precision bugs by @vadam5 :: PR: #4447
  • fix branch in link by @ekmb :: PR: #4454
  • t5-rpe-fix targeting r1.10.0; raise exception for PP>2. by @khcs :: PR: #4469
  • Add kwargs to exact string match by @MaximumEntropy :: PR: #4479

v1.9.0

1 year ago

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:22.04

ASR

Changelog
  • Fix changed function name in offline vad asr notebook by @fayejf :: PR: #4007
  • NeMo Adapters Support + ASR Adapters by @titu1994 :: PR: #3942
  • Update ASR configs with num_workers and pin_memory by @titu1994 :: PR: #4270
  • Verbose k2 install, skip if failed by @GNroy :: PR: #4289
  • Torch conversion for VAD-Diarization pipeline by @tango4j :: PR: #3930
  • Multiprocess improvements by @nithinraok :: PR: #4127

TTS

Changelog
  • Tn tts e by @ekmb :: PR: #3988
  • Remove AudioToCharWithPriorAndPitchDataset dependency from fastpitch by @subhankar-ghosh :: PR: #4008
  • Deprecation by @blisc :: PR: #4082
  • FastPitch FT notebook - Improving Speech Quality clarifications by @redoctopus :: PR: #3954

NLP / NMT

Changelog
  • Option to remove bias terms from Megatron transformers by @MaximumEntropy :: PR: #3973
  • Add NMT method to translate with TN/ITN pre/post-processing by @MaximumEntropy :: PR: #4009
  • Fix Punctuation and Capitalization model batching. An issue with shuffling. by @PeganovAnton :: PR: #4050
  • Fix GPT model parallel eval by @yidong72 :: PR: #4054
  • Updating with main by @jpilaul :: PR: #4073
  • Cherry-pick fix for megatron ckpt conversion script when using BCP by @ericharper :: PR: #4089
  • Check implicit grad acc in GLUE dataset building by @MaximumEntropy :: PR: #4123
  • Fix/punctuation avoid overwriting tmp files by @PeganovAnton :: PR: #4144
  • Fix/punctuation/trainer required for setting test data by @PeganovAnton :: PR: #4199
  • Raise error if bicleaner is not installed in NMT Data preprocessing notebook by @MaximumEntropy :: PR: #4264
  • Fix epoch end for NeMo NMT by @MaximumEntropy :: PR: #4265
  • Update YAML with trainer.benchmark=False for NLP by @MaximumEntropy :: PR: #4261
  • Continuous prompt refactor by @vadam5 :: PR: #3877
  • T5 finetuning for generic small text-to-text datasets by @MaximumEntropy :: PR: #4032

Text Normalization / Inverse Text Normalization

Changelog
  • Tn special text support by @yzhang123 :: PR: #3969
  • Tn update numbers by @yzhang123 :: PR: #3992
  • Tn tts e by @ekmb :: PR: #3988
  • Itn vi by @yzhang123 :: PR: #4029
  • Refactor tn data folder, and update of measure by @yzhang123 :: PR: #4028
  • Remove conda dependency for tn by @yzhang123 :: PR: #4057
  • Tn electronic by @yzhang123 :: PR: #4053
  • ThutmoseTaggerModel, a new model for inverse text normalization by @bene-ges :: PR: #4011
  • Tutorial on ITN with Thutmose tagger and small fixes by @bene-ges :: PR: #4117
  • Cleaned up TN/ ITN doc by @yzhang123 :: PR: #4119
  • Update default for SH by @ekmb :: PR: #4135
  • Update ContextNet version by @titu1994 :: PR: #4207

NeMo Tools

Changelog
  • Added exception handling for audio player in SDE by @vsl9 :: PR: #4077

NeMo Core

Changelog
  • Support pre-extracted nemo checkpoint for restoration by @titu1994 :: PR: #4061
  • Fix type checking to be compatible with named tuples by @artbataev :: PR: #3986
  • Update num worker calculation due to PTL flag changes by @redoctopus :: PR: #4056
  • Refresh NeMo documentation to Sphinx Book Theme by @titu1994 :: PR: #3996
  • Generalize adapter merge strategy for future adapters by @titu1994 :: PR: #4091

General Improvements

Changelog
  • Fix Punctuation and Capitalization model batching. An issue with shuffling. by @PeganovAnton :: PR: #4050
  • Fix restoring from checkpoint for case when is provided by @PeganovAnton :: PR: #4136
  • Fix/punctuation avoid overwriting tmp files by @PeganovAnton :: PR: #4144
  • Fix/punctuation/trainer required for setting test data by @PeganovAnton :: PR: #4199
  • Ability to set log_prediction to false by @bmwshop :: PR: #3929
  • Glu activation variants by @MaximumEntropy :: PR: #3951
  • Ranking merge by @yzhang123 :: PR: #3906
  • Fix path in doc by @nithinraok :: PR: #3979
  • Adding fisher audio conversion script from old NeMo branch by @jbalam-nv :: PR: #3991
  • improvements to get_commonvoice_data script by @bmwshop :: PR: #3999
  • Bugfix and variable name change for clustering code by @tango4j :: PR: #4023
  • Exp manager log rank 0 only arguments by @MaximumEntropy :: PR: #4026
  • Force import test on PR by @titu1994 :: PR: #4037
  • Drop support for kaldi-io by @titu1994 :: PR: #4042
  • Cherry pick HF integration and bug fixes from 1.8.1 by @ericharper :: PR: #4052
  • Make saving prompt encoder embeddings non-configurable by @vadam5 :: PR: #4071
  • Replace sampled tokens with EOD after EOD has been sampled once by @vadam5 :: PR: #4070
  • Added answer only loss for prompt learning by @vadam5 :: PR: #4069
  • added stacking support to conformer. by @VahidooX :: PR: #4045
  • Update LJSpeech whitelist file path by @redoctopus :: PR: #4078
  • Added check for microbatch calculator by @vadam5 :: PR: #4043
  • Prompt Learning Docs by @vadam5 :: PR: #4046
  • Fix link to prompt tuning page by @SeanNaren :: PR: #4081
  • Add docs for by @titu1994 :: PR: #4079
  • Dialogue task by @Zhilin123 :: PR: #3884
  • RMSNorm, Normformer and fixes from merging 1.8.0 into main by @MaximumEntropy :: PR: #4048
  • Correct link to PTL by @titu1994 :: PR: #4088
  • Added encoder and decoder modules for RETRO model by @yidong72 :: PR: #4038
  • Upgrade container to NGC PyTorch 22.04 by @ericharper :: PR: #4085
  • Tarred fix label models by @nithinraok :: PR: #4092
  • Fix link to tutorial in dialogue docs by @Zhilin123 :: PR: #4093
  • Prompt learning Notebook by @vadam5 :: PR: #4031
  • Add more papers by @yzhang123 :: PR: #4097
  • Ignore speakers with few utterances by @nithinraok :: PR: #3722
  • Access mixin by @sam1373 :: PR: #4098
  • Add CharParser for Cyrillic letters by @karpov-nick :: PR: #4101
  • Restored tests previously disabled for 22.03 base by @borisfom :: PR: #4109
  • Add augmentation to label models by @nithinraok :: PR: #4113
  • Fix register artifacts by @ramanathan831 :: PR: #4116
  • Fix typo by @yzhang123 :: PR: #4140
  • bug_fix_diarization_manifest_creation by @yzhang123 :: PR: #4125
  • Tacotron2 retrain by @treacker :: PR: #4103
  • WaveGlow input type fixes by @redoctopus :: PR: #4151
  • Notebooks' link, typo and import fix by @fayejf :: PR: #4158
  • Thutmose tagger bug fixes by @bene-ges :: PR: #4162
  • Update speaker docs by @nithinraok :: PR: #4164
  • Set plugin to None when no apex by @ekmb :: PR: #4171
  • Fix doc by @yzhang123 :: PR: #4152
  • Small import name fix by @fayejf :: PR: #4180
  • Rename folder VAD -> vad by @fayejf :: PR: #4163
  • Fix the server key value problem in the notebook by @yidong72 :: PR: #4196
  • Pin omegaconf for r1.9.0 by @ericharper :: PR: #4195
  • Fix cherrypicks by @titu1994 :: PR: #4204
  • Fix bugs for dialogue tutorial by @Zhilin123 :: PR: #4211
  • Tacotron2 1.9.0 bugfixes by @redoctopus :: PR: #4209
  • Add docs for Thutmose Tagger by @bene-ges :: PR: #4173
  • Dialogue tutorial fix by @Zhilin123 :: PR: #4221
  • Fix syntax error in ipynb-file by @bene-ges :: PR: #4228
  • Fix JSON serialization problem by @yidong72 :: PR: #4235
  • Prompt Learning Typo Fixes by @vadam5 :: PR: #4238
  • Fixing bug 3642622 by @pasandi20 :: PR: #4250
  • Fix broken link in the tutorial by @bene-ges :: PR: #4257
  • Prompt learning notebook bugfix by @vadam5 :: PR: #4262
  • Fix missing validation dataset, whitelist certain keywords for datasets by @titu1994 :: PR: #4269
  • Set Save on train end to false by @vadam5 :: PR: #4274
  • Updated config to fix CI test OOM error by @vadam5 :: PR: #4279
  • Changed total virtual prompt tokens by @vadam5 :: PR: #4295

v1.8.2

2 years ago

Known Issues

  • Megatron BERT export does not currently work in the NVIDIA NGC PyTorch 22.03 container. The issue will be fixed in the NGC PyTorch 22.04 container.

TTS

  • Fastpitch Tutorial fix by @subhankar-ghosh :: PR: #4044

v1.8.1

2 years ago

Known Issues

  • Megatron BERT export does not currently work in the NVIDIA NGC PyTorch 22.03 container. The issue will be fixed in the NGC PyTorch 22.04 container.

TTS

  • Restore_buffer bug fix and update NeMo checkpoint URL by @subhankar-ghosh :: PR: #4041

Hugging Face Hub Integration

  • Add support for Huggingface Hub to NeMo by @titu1994 :: PR: #4030

Bug Fixes

  • Added apex import guard back
  • Patch commons.py by @ericharper :: PR: #4039
  • Fixing pretrained name by @borisfom :: PR: #4022
  • Add back Citrinet zh by @titu1994 :: PR: #4040

v1.8.0

2 years ago

Known Issues

Issues
  • Megatron BERT export does not currently work in the NVIDIA NGC PyTorch 22.03 container. The issue will be fixed in the NGC PyTorch 22.04 container.
  • Pytest tests for Vietnamese inverse text normalization are failing. This is fixed in main.

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:22.03

ASR

Changelog
  • ASR SSL Update by @sam1373 :: PR: #3714
  • Polylang asr by @bmwshop :: PR: #3721
  • Test grad accumulation for RNNT loss by @titu1994 :: PR: #3731
  • Add readme files describing model execution flow for ASR tasks by @titu1994 :: PR: #3812
  • add fr asr ckpt to doc by @yzhang123 :: PR: #3809
  • Fix asr tests in 22.02 by @titu1994 :: PR: #3823
  • Add new pretrained Spanish ASR models by @erastorgueva-nv :: PR: #3830
  • Documentation updates for ASR by @titu1994 :: PR: #3846
  • Offline VAD+ASR tutorial by @fayejf :: PR: #3828
  • Added Hindi and Marathi Models in Nemo pretrained ASR_CTC_BPE models … by @meghmak13 :: PR: #3856
  • Add a missing line to ASR_with_NeMo.ipynb by @lifefeel :: PR: #3908
  • Multilang asr models by @bmwshop :: PR: #3907
  • added stt_en_conformer_transducer_large_ls to NGC by @VahidooX :: PR: #3920
  • Fix DALI test on 22.03 by @titu1994 :: PR: #3911
  • Adding RNN encoder for LSTM-Transducer and LSTM-CTC models by @VahidooX :: PR: #3886
  • Fix issue with Segfault in ASR models by @titu1994 :: PR: #3956
  • Added Mandarin pretrained Conformer-Transducer-Large model trained on AISHELL2. by @VahidooX :: PR: #3970

TTS

Changelog
  • Bump TTS deprecation version to 1.9 by @blisc :: PR: #3955
  • Add pinned pynini and scipy installs to TTS training tutorial by @redoctopus :: PR: #3967
  • Compatibility override to load_state_dict for old TTS checkpoints by @redoctopus :: PR: #3978

NLP / NMT

Changelog
  • Use worker processes for data preprocessing by @crcrpar :: PR: #3665
  • Set find_unused_parameters to False in GPT example script by @ericharper :: PR: #3837
  • GPT multinode eval by @ericharper :: PR: #3821
  • Fix MegatronPretrainingRandomSampler by taking into account by @crcrpar :: PR: #3826
  • Add slot filling into DST Generative model by @Zhilin123 :: PR: #3695
  • Disable nvfuser for gpt by @ericharper :: PR: #3845
  • Multi-Label Joint Intent Slot Classification by @chenrichard10 :: PR: #3742
  • fix bug in intent/slot model reloading by @carolmanderson :: PR: #3874
  • Make test_gpt_eval unit test less strict by @yidong72 :: PR: #3898
  • Comment gpt resume ci test by @MaximumEntropy :: PR: #3901
  • Neural Machine Translation with Megatron Transformer Models (Tensor Parallel and Tarred Datasets Only) by @MaximumEntropy :: PR: #3861
  • Megatron support by @ramanathan831 :: PR: #3893
  • Populate the GPT/BERT with uploaded models by @yidong72 :: PR: #3885
  • Megatron BART by @michalivne :: PR: #3666
  • Additional Japanese processor for NMT that uses MeCab segmentation. Fix for BLEU in one-many NMT by @MaximumEntropy :: PR: #3889
  • NMT GRPC server URL fix by @MaximumEntropy :: PR: #3918
  • Megatron legacy conversion support by @ramanathan831 :: PR: #3919
  • Update max_epochs on megatron configs by @ericharper :: PR: #3958
  • Fix NMT variable passing bug by @aklife97 :: PR: #3985
  • Fix nemo megatron restore with artifacts by @ericharper :: PR: #3997
  • Fix megatron notebook by @ramanathan831 :: PR: #4004
  • Megatron work-arounds by @borisfom :: PR: #3998
  • Add T5 model P-tuning support by @yidong72 :: PR: #3768
  • Make index mappings dir configurable by @ericharper :: PR: #3868
  • T5 pipeline parallel by @MaximumEntropy :: PR: #3750

Text Normalization / Inverse Text Normalization

Changelog
  • Tn es by @bonham79 :: PR: #3632
  • Fix single GPU training issue + change deprecated Lightning args by @aklife97 :: PR: #4010

Export

Changelog
  • Conformer WARs for TRT8.2 by @borisfom :: PR: #3787
  • bert_module: fix inputs of export model by @virajkarandikar :: PR: #3815
  • Exports 22.03 war by @borisfom :: PR: #3957

Bugfixes

Changelog
  • patch librosa deprecation and fix by @fayejf :: PR: #3818

General Improvements

Changelog
  • Pynini pip by @yzhang123 :: PR: #3726
  • upgrade PTL trainer flags by @nithinraok :: PR: #3589
  • Updated Speech Data Explorer by @vsl9 :: PR: #3710
  • Fix spelling error in num_workers parameter to actually set number of dataset workers specified in yaml configs by @themikem :: PR: #3800
  • Support for Camembert Huggingface bert-like models by @itzsimpl :: PR: #3799
  • Update to 22.02 by @ericharper :: PR: #3771
  • Fixing the defaults of conformer models in the config files by @VahidooX :: PR: #3836
  • Fix T5 Encoder Mask while decoding by @MaximumEntropy :: PR: #3838
  • fix: multilingual transcribe does not require lang id param by @bmwshop :: PR: #3833
  • Misc improvements by @titu1994 :: PR: #3843
  • Change container by @MaximumEntropy :: PR: #3844
  • Making gender assignment random for cardinals, fractions, and decimal… by @bonham79 :: PR: #3759
  • Jenkinsfile test changes by @chenrichard10 :: PR: #3879
  • Adding a RegEx tokenizer by @michalivne :: PR: #3839
  • enable bias+dropout+add fusion with nvfuser at inference by @erhoo82 :: PR: #3869
  • Add text_generation_util to support TopK, TopP sampling + Tabular Data Generation. by @yidong72 :: PR: #3834
  • Ptl requirements bound by @MaximumEntropy :: PR: #3903
  • doc links update by @ekmb :: PR: #3891
  • add citations by @yzhang123 :: PR: #3902
  • Update NeMo CI to 22.03 by @MaximumEntropy :: PR: #3900
  • Add domain groups to changelog builder by @titu1994 :: PR: #3904
  • add input threshold by @yzhang123 :: PR: #3913
  • improvements to commonvoice data script by @bmwshop :: PR: #3892
  • fixes to the cleanup flag by @bmwshop :: PR: #3921
  • Upgrade to PTL 1.6.0 by @ericharper :: PR: #3890
  • JSON output from diarization now includes sentences. Optimized senten… by @demsarjure :: PR: #3897
  • Stateless timer fix for PTL 1.6 by @MaximumEntropy :: PR: #3925
  • fix save_best missing chpt bug, update for setup_tokenizer() changes by @ekmb :: PR: #3932
  • Fix tarred sentence dataset length by @MaximumEntropy :: PR: #3941
  • remove old doc by @ekmb :: PR: #3946
  • Fix issues with librosa deprecations by @titu1994 :: PR: #3950
  • Fix notebook bugs for branch r1.8.0 by @yidong72 :: PR: #3948
  • Fix global batch fit loop by @ericharper :: PR: #3936
  • Refactor restorefrom by @ramanathan831 :: PR: #3927
  • Fix variable name and move models to CPU in Change partition by @aklife97 :: PR: #3972
  • Fix notebook error by @yidong72 :: PR: #3975
  • Notebook Bug Fixes for r1.8.0 by @vadam5 :: PR: #3989
  • Fix compat override for TalkNet Aligner by @redoctopus :: PR: #3993
  • docs fixes by @ekmb :: PR: #3987
  • Fixes val_check_interval, skip loading train data during eval by @MaximumEntropy :: PR: #3968
  • LogProb calculation performance fix by @yidong72 :: PR: #3984
  • Fix P-Tune T5 model by @yidong72 :: PR: #4001
  • Fix the broadcast shape mismatch by @yidong72 :: PR: #4017
  • Add known issues to notebook by @ericharper :: PR: #4024