NeMo Versions

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech).

v1.15.0

1 year ago

Highlights

NeMo ASR

HybridTransducer-CTC ASR
Greedy timestamp decoding with inference script
MHA adapters
Conformer local attention (longformer)
High level beam search API
Multiblank transducer
Multi-channel audio processing model
AIStore for ASR datasets

NeMo Megatron

ALiBi position embeddings support for T5
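
For readers unfamiliar with ALiBi, the idea is to add a per-head linear distance penalty to the attention scores instead of learning position embeddings. The following is a minimal sketch of that idea only, not NeMo's implementation: the function names are hypothetical, it assumes a power-of-two head count, and it uses the absolute distance |i - j| so that it also covers bidirectional (encoder) attention. These notes do not say which exact variant the T5 support uses.

```python
def alibi_slopes(num_heads):
    """Per-head slopes: a geometric sequence starting at 2 ** (-8 / num_heads)."""
    # Assumption: this sketch only covers power-of-two head counts.
    if num_heads <= 0 or num_heads & (num_heads - 1):
        raise ValueError("this sketch only handles power-of-two head counts")
    start = 2.0 ** (-8.0 / num_heads)
    return [start ** (head + 1) for head in range(num_heads)]


def alibi_bias(num_heads, seq_len):
    """Return bias[h][i][j] = -slope_h * |i - j|, to be added to attention scores."""
    return [
        [[-slope * abs(i - j) for j in range(seq_len)] for i in range(seq_len)]
        for slope in alibi_slopes(num_heads)
    ]
```

With 8 heads the slopes are 1/2, 1/4, ..., 1/256, so the first head penalizes distant tokens most strongly and the last head barely at all.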

NeMo TTS

Chinese TTS pipeline with polyphone disambiguation

NeMo Core

Optimizer based EMA
MLFlow logger support

Models

stt_eo_conformer_ctc_large (HF, NGC) Esperanto ASR model.
stt_eo_conformer_transducer_large (HF, NGC) Esperanto ASR model.

Detailed Changelogs

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:22.12

ASR

Changelog

optimized loop and bugfix by @Jorjeous :: PR: #5573
Update torchmetrics by @nithinraok :: PR: #5566
Add an option to defer data setup from init to setup by @anteju :: PR: #5569
AIStore for ASR datasets by @anteju :: PR: #5462
Add support for MHA adapters to ASR by @titu1994 :: PR: #5396
Update documentation and tutorials for Adapters by @titu1994 :: PR: #5610
Conformer local attention by @sam1373 :: PR: #5525
Add core classes and functions for online clustering diarizer part 1 by @tango4j :: PR: #5526
[Add] ASR+VAD Inference Pipeline by @stevehuang52 :: PR: #5575
[ASR] Audio processing base, multi-channel enhancement models by @anteju :: PR: #5356
Expose ClusteringDiarizer device by @SeanNaren :: PR: #5681
Add Beam Search support to ASR transcribe() by @titu1994 :: PR: #5443
Multiblank Transducer by @hainan-xv :: PR: #5527
pin torchmetrics version by @nithinraok :: PR: #5720
Update torchaudio dependency version for tutorials by @titu1994 :: PR: #5781
update torchmetrics to latest version by @nithinraok :: PR: #5801
Fix transducer and question answering tutorial bugs by @Zhilin123 :: PR: #5809
[BugFix] Updated CTC decoders installation in tutorial by @vsl9 :: PR: #5833
update torchmetrics args confusionmatrix by @nithinraok :: PR: #5853
indentation fix by @nithinraok :: PR: #5861
Fix wrong label mapping in batch_inference for label_model by @fayejf :: PR: #5767

TTS

Changelog

Add support for MHA adapters to ASR by @titu1994 :: PR: #5396
[TTS] fix ranges of char set for accented letters. by @XuesongYang :: PR: #5607
[TTS] add type hints and change variable names for tokenizers and g2p by @XuesongYang :: PR: #5602
Fixed RadTTS unit test by @borisfom :: PR: #5572
[TTS][ZH] Disambiguate polyphones with augmented dict and Jieba segmenter for Chinese FastPitch by @yuekaizhang :: PR: #5541
Add duration padding support for RADTTS inference by @kevjshih :: PR: #5650
[TTS] add tts dict cust notebook by @ekmb :: PR: #5662
[TN/TTS docs] TN customization, g2p docs moved to tts by @ekmb :: PR: #5683
typo and link fixed by @ekmb :: PR: #5741
link fixed by @ekmb :: PR: #5745
Update Tacotron2 NGC checkpoint load to latest version by @redoctopus :: PR: #5760
Docs g2p update by @ekmb :: PR: #5769
[TTS][ZH] bugfix import jieba errors. by @XuesongYang :: PR: #5776

NLP / NMT

Changelog

Text generation improvement (UI client, data parallel support) by @yidong72 :: PR: #5437
O2 style amp for gpt3 ptuning by @JimmyZhang12 :: PR: #5246
Add support for MHA adapters to ASR by @titu1994 :: PR: #5396
Bert interleaved by @shanmugamr1992 :: PR: #5556
Port stateless timer to exp manager by @MaximumEntropy :: PR: #5584
Add interface for making amax reduction optional for FP8 by @ksivaman :: PR: #5447
Propagate attention_dropout flag for GPT-3 by @mikolajblaz :: PR: #5669
Enc-Dec model size reporting fixes by @MaximumEntropy :: PR: #5623
Add prompt learning tests by @arendu :: PR: #5649
Fix missing torchelastic fixes for PTL 1.8 by @MaximumEntropy :: PR: #5691
ALiBi Positional Embeddings by @michalivne :: PR: #5467
Megatron export triton update by @Davood-M :: PR: #5766
Fix transducer and question answering tutorial bugs by @Zhilin123 :: PR: #5809
Update description for question answering tutorial by @Zhilin123 :: PR: #5814
TPMLP for T5-based models by @Davood-M :: PR: #5840
Megatron positional encoding alibi fix by @michalivne :: PR: #5808

Export

Changelog

Add keep_initializers_as_inputs to _export method by @pks :: PR: #5731
Megatron export triton update by @Davood-M :: PR: #5766

General Improvements

Changelog

Update to pytorch 22.12 container by @ericharper :: PR: #5694
optimized loop and bugfix by @Jorjeous :: PR: #5573
Expose ClusteringDiarizer device by @SeanNaren :: PR: #5681
remove useless files. by @XuesongYang :: PR: #5580
[Fix] setup_multiple validation/test data by @anteju :: PR: #5585
Move to optimizer based EMA implementation by @SeanNaren :: PR: #5169
[Temp workaround] Disable test with cache_audio to unblock CI by @anteju :: PR: #5615
[EMA] Change success message to reduce confusion by @SeanNaren :: PR: #5621
Temporarily disable prompt learning CI tests by @ericharper :: PR: #5633
[Dockerfile] Remove AIS archive from docker image by @anteju :: PR: #5629
[workflow] add exclude labels option to ignore cherry-picks in releas… by @XuesongYang :: PR: #5645
Add DLLogger support to exp_manager by @milesial :: PR: #5658
Fix EMA restart by allowing device to be set by the class init by @SeanNaren :: PR: #5668
Remove SDP (moved to separate repo) - merge to main by @erastorgueva-nv :: PR: #5630
temp disable speaker recognition CI test by @fayejf :: PR: #5696
Don't print exp_manager warning when max_steps == -1 by @milesial :: PR: #5725
Add tabular data generation documents to the index file by @yidong72 :: PR: #5733
fix token id bug by @yidong72 :: PR: #5777
Update numpy requirements from 1.21 to 1.22 by @Zhilin123 :: PR: #5785
Fix setuptools to usable version by @titu1994 :: PR: #5798
add apt-get upgrade -y in dockerfile by @fayejf :: PR: #5817
Update NeMo Multi-Run docs by @titu1994 :: PR: #5844
add ambernet to readme by @fayejf :: PR: #5872
update apex install instructions for 1.15 by @ericharper :: PR: #5901

v1.14.0

1 year ago

Highlights

NeMo ASR

Hybrid CTC + Transducer loss ASR #5364
Sampled Softmax RNNT (Enables large vocab RNNT, for speech translation and multilingual ASR) #5216
ASR Adapters hyper parameter search scripts #5159
RNNT {ONNX, TorchScript} x GPU export infer #5248
Exportable MelSpectrogram (TorchScript) #5512
Audio To Audio Dataset Processor #5196
Multi Channel Audio Transcription #5479
Silence Augmentation #5476

NeMo Megatron

Support for the Mixture of Experts for T5
Fix PTL model size output for GPT-3 and BERT
BERT with Tensor Parallelism & Pipeline Parallel Support

NeMo Core

Hydra Multirun core support + NeMo HP optim in YAML #5159

NeMo Models

TTS Zh Fastpitch HifiGan SFSpeech

Detailed Changelogs

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:22.11

ASR

Changelog

[Tools][ASR] Tool for generating data using simulated RIRs by @anteju :: PR: #5158
Modernize RNNT ONNX export and add TS export by @titu1994 :: PR: #5248
Add Gradio App to ASR Docs by @titu1994 :: PR: #5270
Add support for Sampled Softmax for RNNT Joint by @titu1994 :: PR: #5216
Speed up HF data processing script for ASR by @titu1994 :: PR: #5330
bugfix in volume loss for CTC models by @bmwshop :: PR: #5348
Add cpWER for evaluation of ASR with diarization by @tango4j :: PR: #5279
Fix for getting tokenizer in character-based ASR models when using tarred dataset by @jonghwanhyeon :: PR: #5442
Refactor/unify ASR offline and buffered inference by @fayejf :: PR: #5440
Standalone diarization+ASR evaluation script by @tango4j :: PR: #5439
[ASR] Transcribe for multi-channel signals by @anteju :: PR: #5479
Add Silence Augmentation by @fayejf :: PR: #5476
add exportable mel spec by @1-800-BAD-CODE :: PR: #5512
add RNN-T loss implemented by PyTorch and test code by @hainan-xv :: PR: #5312
[ASR] AudioToAudio datasets and related test by @anteju :: PR: #5196
Add StreamingFeatureBufferer class for real-life streaming decoding by @tango4j :: PR: #5534
Pool stats with padding by @1-800-BAD-CODE :: PR: #5403
Adding Hybrid RNNT-CTC model by @VahidooX :: PR: #5364
Fix ASR Buffered inference scripts by @titu1994 :: PR: #5552
Add wer details - insertion, deletion, substitution rate by @fayejf :: PR: #5557
Add support for Time Stamp calculation using transcribe_speech.py by @titu1994 :: PR: #5568
[STT] Add Esperanto (Eo) ASR Conformer-CTC and Conformer-Transducer models by @andrusenkoau :: PR: #5639
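
The WER breakdown mentioned in the changelog above (insertion, deletion, and substitution rates) can be illustrated with a word-level edit-distance alignment. This is a minimal sketch under stated assumptions, not the code from the PR: `wer_details` and the returned key names are hypothetical, words are split on whitespace, and ties between equally cheap alignments are broken in favor of substitution, then deletion, then insertion.

```python
def wer_details(reference, hypothesis):
    """Return WER plus insertion/deletion/substitution rates for one utterance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Each cell holds (total_errors, insertions, deletions, substitutions).
    prev = [(j, j, 0, 0) for j in range(len(hyp) + 1)]
    for i in range(1, len(ref) + 1):
        cur = [(i, 0, i, 0)]
        for j in range(1, len(hyp) + 1):
            total, ins, dele, sub = prev[j - 1]
            if ref[i - 1] == hyp[j - 1]:
                best = (total, ins, dele, sub)          # match, no cost
            else:
                best = (total + 1, ins, dele, sub + 1)  # substitution
            total, ins, dele, sub = prev[j]
            candidate = (total + 1, ins, dele + 1, sub)  # reference word dropped
            if candidate[0] < best[0]:
                best = candidate
            total, ins, dele, sub = cur[j - 1]
            candidate = (total + 1, ins + 1, dele, sub)  # extra hypothesis word
            if candidate[0] < best[0]:
                best = candidate
            cur.append(best)
        prev = cur

    total, ins, dele, sub = prev[-1]
    n = len(ref)
    return {
        "wer": total / n,
        "ins_rate": ins / n,
        "del_rate": dele / n,
        "sub_rate": sub / n,
    }
```

The three rates always sum to the overall WER, which makes the breakdown a quick way to see whether a model mostly drops words, hallucinates them, or confuses them.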

TTS

Changelog

[TTS] Fastpitch energy condition and refactoring by @subhankar-ghosh :: PR: #5218
[TTS] HiFi-TTS Download Script by @oleksiivolk :: PR: #5241
[TTS] Add Mandarin/English Bilingual Recipe for Training Fastpitch Models by @yuekaizhang :: PR: #5208
[TTS] fixed type of filepath and rename openslr. by @XuesongYang :: PR: #5276
[TTS] replace obsolete torch_tts unit test marker with run_only_on('CPU') by @XuesongYang :: PR: #5307
[TTS] bugfix IPAG2P and refactor to remove duplicate process. by @XuesongYang :: PR: #5304
Update path to get_data.py in TTS tutorial by @redoctopus :: PR: #5311
[TTS] Replace IPA lambda arguments with locale string by @rlangman :: PR: #5298
[TTS] expand to support flexible dictionary entry formats in IPAG2P. by @XuesongYang :: PR: #5318
[TTS] update organization of model checkpoints and their pointers. by @XuesongYang :: PR: #5327
[TTS] bugfix for the script of generating mels from fastpitch. by @XuesongYang :: PR: #5344
[TTS] Add Spanish model documentation by @rlangman :: PR: #5390
[TTS] Add Spanish FastPitch training configs by @rlangman :: PR: #5383
[TTS] replace pitch normalization params with ??? by @XuesongYang :: PR: #5392
[TTS] Create script for processing TTS training audio by @rlangman :: PR: #5262
[TTS] remove useless logic for set_tokenizer. by @XuesongYang :: PR: #5430
[TTS] Fixing RADTTS training - removing view buffer and fixing accuracy issue by @borisfom :: PR: #5358
JOC Optimization in FastPitch by @subhankar-ghosh :: PR: #5450
[TTS] Support speaker level pitch normalization by @rlangman :: PR: #5455
TTS tutorial update: use speaker 9017 instead of 6097 by @redoctopus :: PR: #5532
[TTS] Remove unused TTS eval function by @redoctopus :: PR: #5605
[TTS][ZH] add fastpitch and hifigan model NGC urls and update NeMo docs. by @XuesongYang :: PR: #5596
[TTS][DOC] add notes about automatic conversion to target sampling ra… by @XuesongYang :: PR: #5624
[TTS][ZH] bugfix for the tutorial and add NGC CLI installation guide. by @XuesongYang :: PR: #5643
[TTS][ZH] bugfix for ngc cli installation. by @XuesongYang :: PR: #5652
[TTS][ZH] fix broken link for the script. by @XuesongYang :: PR: #5666

NLP / NMT

Changelog

Option to pad the last validation input sequence if it's smaller than the encoder sequence length for MegatronGPT by @anmolgupt :: PR: #5243
Fixes bugs with loss averaging for Megatron GPT by @shanmugamr1992 :: PR: #5329
Fixing bug in Megatron BERT when loss mask is all zeros by @shanmugamr1992 :: PR: #5424
support to disable sequence length + 1 input tokens for each sample in MegatronGPT by @anmolgupt :: PR: #5363
[TN] raise NotImplementedError for unsupported languages and other minor fixes by @XuesongYang :: PR: #5414
Bug fix/gpt by @shanmugamr1992 :: PR: #5493
prompt tuning fix for unscale grad errors by @arendu :: PR: #5523
Bert sequence parallel support by @shanmugamr1992 :: PR: #5494
NLP docs fixes by @vsl9 :: PR: #5528
Switch order of args in optimizer_step override by @ericharper :: PR: #5549
Upgrade to 22.11 by @ericharper :: PR: #5550
Merge r1.13.0 main by @ericharper :: PR: #5570
some tokenizers do not have additional_special_tokens_ids attribute by @arendu :: PR: #5642
Remove cell output from tutorial by @ericharper :: PR: #5689

Text Normalization / Inverse Text Normalization

Changelog

[ITN] fix year date graph, cardinals extension for hundreds by @ekmb :: PR: #5435
[TN] raise NotImplementedError for unsupported languages and other minor fixes by @XuesongYang :: PR: #5414

Export

Changelog

Fixed the onnx bug in conformer for non-streaming models. by @VahidooX :: PR: #5242
Modernize RNNT ONNX export and add TS export by @titu1994 :: PR: #5248
Fixes for Conformer-xl export by @borisfom :: PR: #5309
Remove onnx graphsurgery from Dockerfile by @titu1994 :: PR: #5320
add exportable mel spec by @1-800-BAD-CODE :: PR: #5512

General Improvements

Changelog

bugfix in volume loss for CTC models by @bmwshop :: PR: #5348
Fix setting up of learning rate scheduler by @PeganovAnton :: PR: #5444
Better patch hydra by @titu1994 :: PR: #5591
[TTS][ZH] bugfix for the tutorial and add NGC CLI installation guide. by @XuesongYang :: PR: #5643
Add fully torch.jit.script-able speaker clustering module by @tango4j :: PR: #5191
Update perturb.py by @stevehuang52 :: PR: #5231
remove CV requirements. by @XuesongYang :: PR: #5233
checks for accepted adapter type at module level by @arendu :: PR: #5194
fix hypotheses return by @nithinraok :: PR: #5253
Support for inserting additional subsampling in conformer encoder by @shan18 :: PR: #5224
update tutorials to use meeting config as default and VAD by @nithinraok :: PR: #5237
Specifying audio signal dropout separately for the Conformer Encoder by @shan18 :: PR: #5263
created by @bmwshop :: PR: #5268
Fix failing speaker counting for short audio samples by @tango4j :: PR: #5267
O2bert + apex pipeline functions by @shanmugamr1992 :: PR: #5221
Upperbound PTL by @titu1994 :: PR: #5302
Update Interface(s) phonetic entry by @blisc :: PR: #5212
add label inference support to EncDecSpeakerLabel class by @nithinraok :: PR: #5278
Add italian model checkpoints by @Kipok :: PR: #5315
Text Memmap Parsing Improvements by @michalivne :: PR: #5265
Update librosa signature in HF processing script by @titu1994 :: PR: #5321
Force wav file format for audio_filepath by @titu1994 :: PR: #5323
Updates to T0 Dataset and Model by @MaximumEntropy :: PR: #5201
[DOC] add sphinx-copybutton requirement to copy button on code snippets. by @XuesongYang :: PR: #5326
Add support for Hydra multirun to NeMo by @titu1994 :: PR: #5159
typo fix by @arendu :: PR: #5328
add pre-commit hook to automatically sort entries in requirements. by @XuesongYang :: PR: #5333
Add speaker clustering arguments to forward function by @tango4j :: PR: #5306
Fixing de-autocast by @borisfom :: PR: #5319
[Bugfix] Added rm -f / wget -nc command to avoid bash error in multispeaker sim notebook by @tango4j :: PR: #5292
[DOC] added ipython dependency to support IPython.sphinxext extension by @XuesongYang :: PR: #5345
Bug fix (removing old compute consumed samples) by @shanmugamr1992 :: PR: #5355
removed uninstall nemo_cv and nemo_simple_gan and relax numba version… by @XuesongYang :: PR: #5332
Enable mlflow logger by @whrichd :: PR: #4893
Fix Python type hints according to Python Docs by @artbataev :: PR: #5370
Distributed optimizer support for BERT by @timmoon10 :: PR: #5305
SpeakerClustering: fix tensor dimensions in forward() by @virajkarandikar :: PR: #5387
add squad by @arendu :: PR: #5407
added python and c++ alignment code by @yzhang123 :: PR: #5346
Add MoE support for T5 model (w/o expert parallel) by @aklife97 :: PR: #5409
Fix for concat map dataset by @1-800-BAD-CODE :: PR: #5133
Support for finetuning and finetuning inference with .ckpt files & batch size refactoring by @MaximumEntropy :: PR: #5339
update doc in terms of get_label for lang id model by @fayejf :: PR: #5366
Debug support for interleaved pipeline parallelism with the distributed Adam optimizer by @timmoon10 :: PR: #5236
Create codeql.yml by @titu1994 :: PR: #5445
Update codeql.yml by @titu1994 :: PR: #5449
Fix support for legacy sentencepiece models by @Numeri :: PR: #5406
Update docs with Comparison tool info, and slightly change .sh for ea… by @Jorjeous :: PR: #5182
Add float32 type casting for get_samples function by @tango4j :: PR: #5399
Add missing import in transcribe_utils.py by @jonghwanhyeon :: PR: #5487
Add auto-labeler by @SeanNaren :: PR: #5498
Add more glob patterns for labeler by @SeanNaren :: PR: #5504
Fix issues with PL 1.8 by @SeanNaren :: PR: #5353
[BugFix] Removing tokens from decoding timestamp by @tango4j :: PR: #5481
Upperbound the torchmetrics version by @SeanNaren :: PR: #5537
Data parallel collect results by @michalivne :: PR: #5547
Fix log-rank-0-only logic by @mikolajblaz :: PR: #5555
Fixed Docker build by @borisfom :: PR: #5562
Patch hydra launch by @titu1994 :: PR: #5589
Fix race condition bug with hydra multirun by @titu1994 :: PR: #5594
Update Dockerfile to use numba==0.53.1 by @stevehuang52 :: PR: #5614
Fixed a missing import for gather_objects by @michalivne :: PR: #5622

v1.13.0

1 year ago

Highlights

NeMo ASR

Spoken Language Understanding (SLU) models based on Conformer encoder and transformer decoder
Support for codeswitched manifests during training
Support for Language ID during inference for multilingual (ML) models
Support of cache-aware streaming for offline models
Word confidence estimation for CTC & RNNT greedy decoding

NeMo Megatron

Interleaved Pipeline schedule
Transformer Engine for GPT
HF T5v1.1 -> NeMo-Megatron conversion and finetuning/p-tuning
IA3 and Adapter Tuning (Tensor + Pipeline Parallel)
Pipeline Parallel Support for T5 Prompt Learning
MegatronNMT export

NeMo TTS

TTS introductory tutorial
Phonemizer/espeak removal (Spanish/German)
Char-only support for Spanish/German models
Documentation Refactor

NeMo Core

Upgrade to NGC PyTorch 22.09 container
Add pre-commit hooks
Exponential moving average (EMA) of weights during training
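
As a reminder of what EMA of weights means, the tracker keeps a smoothed "shadow" copy of the parameters and blends in the live values after each optimizer step. The sketch below shows only that update rule on plain floats; `EmaTracker` and its `decay` default are hypothetical names chosen for illustration and do not reflect NeMo's callback or optimizer API.

```python
class EmaTracker:
    """Keep an exponential moving average of a list of scalar parameters."""

    def __init__(self, params, decay=0.999):
        if not 0.0 <= decay < 1.0:
            raise ValueError("decay must be in [0, 1)")
        self.decay = decay
        # The shadow weights start as a copy of the current parameters.
        self.shadow = list(params)

    def update(self, params):
        """Blend the latest parameters into the shadow copy and return it."""
        d = self.decay
        self.shadow = [d * s + (1.0 - d) * p for s, p in zip(self.shadow, params)]
        return self.shadow
```

In practice `update` would be called once per training step, and the shadow weights would be swapped in for evaluation or checkpointing; a decay close to 1 averages over a longer history.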

NeMo Models

ASR Conformer Croatian: stt_hr_conformer_ctc_large and stt_hr_conformer_transducer_large
ASR Conformer Belarusian: stt_be_conformer_ctc_large and stt_be_conformer_transducer_large
ASR Squeezeformer Librispeech: 6 checkpoints (XS, S, SM, M, ML, L)
SLURP Intent Classification / Slot Filling: slu_conformer_transformer_large_slurp
LanguageID AmberNet: langid_ambernet

Detailed Changelogs

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:22.09

Known Issues

Issues

The pytest tests for RadTTSModel_export_to_torchscript fail intermittently due to random input values. Fixed in main.

ASR

Changelog

Add docs tutorial on kinyarwanda asr by @bene-ges :: PR: #4953
Asr codeswitch by @bmwshop :: PR: #4821
Add test for nested ASR model by @titu1994 :: PR: #5002
Greedy decoding confidence for CTC and RNNT by @GNroy :: PR: #4931
[ASR][Tools] RIR corpus generator by @anteju :: PR: #4927
Add Squeezeformer CTC model checkpoints on Librispeech by @titu1994 :: PR: #5121
adding loss normalization options to rnnt joint by @bmwshop :: PR: #4829
Asr concat dataloader by @bmwshop :: PR: #5108
Added ASR model comparison to SDE by @Jorjeous :: PR: #5043
Add scripts for converting Spoken Wikipedia to asr dataset by @bene-ges :: PR: #5138
ASR confidence bug fix for older Python versions by @GNroy :: PR: #5180
Update ASR Scores and Results by @titu1994 :: PR: #5254
[STT] Add Ru ASR Conformer-CTC and Conformer-Transducer by @ssh-meister :: PR: #5340

TTS

Changelog

[TTS] Adding speaker embedding conditioning in fastpitch by @subhankar-ghosh :: PR: #4986
[TTS] Remove PhonemizerTokenizer by @rlangman :: PR: #4990
[TTS] FastPitch speaker interpolation by @subhankar-ghosh :: PR: #4997
RADTTS model changes to accommodate export with batch size > 1 by @borisfom :: PR: #4947
[TTS] remove phonemizer.py by @XuesongYang :: PR: #5090
[TTS] Add NeMo TTS Primer Tutorial by @rlangman :: PR: #4933
[TTS] Add SpanishCharsTokenizer by @rlangman :: PR: #5135
Fixes for docs/typos + remove max_utts parameter from tarred datasets as it causes hang in training by @Kipok :: PR: #5118
refactor TTS documentation organization and add new contents. by @XuesongYang :: PR: #5137
[TTS][DOC] update models trained on HifiTTS dataset. by @XuesongYang :: PR: #5173
[TTS] Fix TTS Primer image markup by @rlangman :: PR: #5192
[TTS] deprecate TextToWaveform base class. by @XuesongYang :: PR: #5205
[TTS] remove the avoidance of circular imports by @XuesongYang :: PR: #5214
[TTS] remove LinVocoder and apply Vocoder as parent class. by @XuesongYang :: PR: #5206
[TTS] unify requirements_tts.txt and requirements_torch_tts.txt by @XuesongYang :: PR: #5232
Minor typo fixes in TTS tutorial by @redoctopus :: PR: #5266
Radtts 1.13 by @borisfom :: PR: #5451
Radtts 1.13 plus by @borisfom :: PR: #5457

NLP / NMT

Changelog

IA3 support for GPT and T5 by @arendu :: PR: #4909
Fix and refactor consumed samples save/restore for Megatron models. by @MaximumEntropy :: PR: #5077
Remove unsupported arguments from MegatronNMT by @MaximumEntropy :: PR: #5065
Update megatron interface to dialogue by @Zhilin123 :: PR: #4936
gpt ia3 CI tests by @arendu :: PR: #5140
Fix NMT Eval Sampler by @aklife97 :: PR: #5154
Add interleaved pipeline schedule to GPT by @ericharper :: PR: #5025
fix for bug in bignlp by @arendu :: PR: #5172
Fixes some args that were not removed properly for multilingual Megatron NMT by @MaximumEntropy :: PR: #5142
Fix absolute path in GPT Adapter CI tests by @arendu :: PR: #5184
Add ability to configure drop last batch for validation datasets with MegatronGPT by @shanmugamr1992 :: PR: #5067
Megatron Export Update by @Davood-M :: PR: #5343
Fix GPT generation when using sentencepiece tokenizer by @MaximumEntropy :: PR: #5413
Disable sync_batch_comm in validation_step for GPT by @ericharper :: PR: #5397
Set sync_batch_comm=False in prompt learning and inference by @MaximumEntropy :: PR: #5448
Fix a bug with positional vs key-word based argument passing in the transformer layer by @MaximumEntropy :: PR: #5475

Text Normalization / Inverse Text Normalization

Changelog

[Chinese text normalization] speed up graph building by @pengzhendong :: PR: #5128

NeMo Tools

Changelog

Added ASR model comparison to SDE by @Jorjeous :: PR: #5043

Export

Changelog

Fix export bug by @VahidooX :: PR: #5009
RADTTS model changes to accommodate export with batch size > 1 by @borisfom :: PR: #4947
Support TorchScript export for Squeezeformer by @titu1994 :: PR: #5164
Expose keep_initializers_as_inputs to Exportable class by @pks :: PR: #5052
Fix the self-attention export bug for cache-aware streaming Conformer by @VahidooX :: PR: #5114
replace ColumnParallelLinear with nn.Linear in export_utils by @arendu :: PR: #5217
Megatron Export Update by @Davood-M :: PR: #5343
Fix Conformer Export in 1.13.0 (cherry-pick from main) by @artbataev :: PR: #5446
export_utils bugfix by @Davood-M :: PR: #5480
Export fixes for Riva by @borisfom :: PR: #5496

General Improvements and Bugfixes

Changelog

don't use bfloat16 when in jit by @bmwshop :: PR: #5051
Set sync_batch_comm=False in prompt learning and inference by @MaximumEntropy :: PR: #5448
Fix a bug with positional vs key-word based argument passing in the transformer layer by @MaximumEntropy :: PR: #5475
Pin Transformers version to fix CI by @SeanNaren :: PR: #4955
Fix changelog builder (#4962) by @titu1994 :: PR: #4963
Checkpoint averaging class fix by @michalivne :: PR: #4946
Add ability to give separate datasets for test, train and validation by @shanmugamr1992 :: PR: #4798
Add simple pre-commit file by @SeanNaren :: PR: #4983
Import pycuda.autoprimaryctx or pycuda.autoinit to init pycuda execut… by @liji-nv :: PR: #4951
Improvements to AMI script by @SeanNaren :: PR: #4974
clean warnings from tests and CI runs, and prepare for upgrade to PTL 1.8 by @nithinraok :: PR: #4830
Update libraries by @titu1994 :: PR: #5010
add close inactive issues and PRs github action. by @XuesongYang :: PR: #5015
Fix filename extraction in vad_utils.py by @GKPr0 :: PR: #4999
Add black to pre-commit by @SeanNaren :: PR: #5027
[CI] Enable previous build abort when new commit pushed by @SeanNaren :: PR: #5041
Tutorials and Docs for Multi-scale Diarization Decoder by @tango4j :: PR: #4930
Refactor output directory for MSDD Inference Notebook by @SeanNaren :: PR: #5044
text_memmap dataset index range testing fix by @michalivne :: PR: #5034
fix undefined constant in code example by @bene-ges :: PR: #5046
Text generation refactor and RETRO text generation implementation by @yidong72 :: PR: #4985
Lids by @bmwshop :: PR: #4820
Add datasets folder, add diarization datasets voxconverse/aishell by @SeanNaren :: PR: #5042
Fix the bugs in cache-aware streaming Conformer by @VahidooX :: PR: #5032
Bug fix - Limit val batches set to 1.0 by @shanmugamr1992 :: PR: #5023
[bug_fix] kv_channels is used when available by @arendu :: PR: #5066
Add spe_split_by_unicode_script arg by @piraka9011 :: PR: #5072
Transformer Engine Integration by @ericharper :: PR: #5104
Text memmap dataset index memory efficiency by @michalivne :: PR: #5056
Add NGC links for Aligner and FastPitch by @redoctopus :: PR: #5235
Fix link to inference notebook by @redoctopus :: PR: #5247
Fix links to speaker identification notebook by @SeanNaren :: PR: #5260
Fix bug in Dialogue tutorial by @Zhilin123 :: PR: #5277
PCLA tutorial typo fix by @jubick1337 :: PR: #5288
Fix dialogue tutorial bug by @Zhilin123 :: PR: #5297
small bugfix for r1.13.0 by @fayejf :: PR: #5310
Add italian model checkpoints by @Kipok :: PR: #5316
Pcla tutorial fixes by @jubick1337 :: PR: #5313
Fix issue with HF Model upload tutorial by @titu1994 :: PR: #5359
P&C LA tutorial fixes by @jubick1337 :: PR: #5354
Add SDP documentation by @erastorgueva-nv :: PR: #5274
[Bugfix] Added rm -f / wget -nc command in multispeaker sim notebook to r1.13.0 by @tango4j :: PR: #5375
Rename Speech Dataset Processor to Speech Data Processor by @erastorgueva-nv :: PR: #5378
fix for num worker 0 causing issues in losses after 1 epoch by @arendu :: PR: #5379
Fixed bug in notebook by @vadam5 :: PR: #5382
Force MHA QKV onto fp32 by @titu1994 :: PR: #5391
Fix for prompt table restore error by @vadam5 :: PR: #5393
Fix activation checkpoint args for T5 by @MaximumEntropy :: PR: #5410
Temporary hard code fix in PTL for CUDA Error by @yaoyu-33 :: PR: #5421
disable pc test by @ekmb :: PR: #5426
Revert Temporary hard code fix in PTL for CUDA Error by @yaoyu-33 :: PR: #5431
Revert workaround for T5 that sets number of workers to 0 & sync_batch_comm=False by @MaximumEntropy :: PR: #5420
Add num layers check for full activation checkpointing by @MaximumEntropy :: PR: #5470
Cherry Pick T5 finetuning changes into 1.13 by @MaximumEntropy :: PR: #5478
T5 Eval bugfix by @Davood-M :: PR: #5521
added set_start_method + function param bugfix by @Davood-M :: PR: #5539
Remove notebook by @ericharper :: PR: #5548
Remove broadcast from T5 prompt learning inference by @MaximumEntropy :: PR: #5558
Fix all gather while writing to a file during T5 finetuning by @MaximumEntropy :: PR: #5561

v1.12.0

1 year ago

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:22.08

ASR

Changelog

Add support for RNNT Char/Word Timestamp Calculation by @titu1994 :: PR: #4665
add conditional logic to rnnt_wer to handle when arrays have no elements by @mgoldey :: PR: #4776
fix handling of the final word for rnnt word timestamps by @mgoldey :: PR: #4779
amend rnnt word timestamps by @mgoldey :: PR: #4782
fix type error in rnnt_wer.py, rnnt_wer_bpe.py, wer_bpe.py by @hainan-xv :: PR: #4822
add kab language asr models by @nithinraok :: PR: #4819
[Tutorial][ASR][Fix] Data paths in ASR with NeMo tutorial by @anteju :: PR: #4845
[ASR] Fix for multi-channel signals in AudioSegment by @anteju :: PR: #4824
[ASR] Generate multichannel noise by @anteju :: PR: #4870
Fix asr model order by @nithinraok :: PR: #4959
Fix ASR issues by @titu1994 :: PR: #4984
Fix diarization ASR inference link in notebook by @SeanNaren :: PR: #5016
Code switching by @KunalDhawan :: PR: #4784
Release SOTA Lang ID model by @fayejf :: PR: #5080
Stateless decoder for RNN-T by @hainan-xv :: PR: #4710

TTS

Changelog

[TTS] use consistent spline interpolation for fastpitch and hifigan. by @XuesongYang :: PR: #4679
TTS tokenizers moved to collections.common.tokenizers by @AlexGrinch :: PR: #4690
[TTS] Fix text normalizer bugs in TTS data loader by @rlangman :: PR: #4781
ARP to IPA mapping, g2p_encode for IPATokenizer by @ekmb :: PR: #4850
IPA G2P bugfixes by @redoctopus :: PR: #4869
[TTS] add missing WikiHomograph data entries to CMUdict, updates to match new ipa set by @ekmb :: PR: #4886
[TTS] fix wrong g2p path. by @XuesongYang :: PR: #4902
[TTS] FastPitch training: speed up align_prior_matrix calculation by @racoiaws :: PR: #4718
[TTS] fix broken tutorial for MixerTTS. by @XuesongYang :: PR: #4949
[TTS] bugfix 'EnglishPhonemesTokenizer' object has no attribute 'encode_from_g2p' by @XuesongYang :: PR: #4992
[TTS] added missing German phoneme tokenizer by @XuesongYang :: PR: #5070
[TTS] fixed wrong val loss for epoch 0 and inconsistent metrics names by @XuesongYang :: PR: #5087

NLP / NMT

Changelog

Fix bug intent slot classification tokenizer to dialogue by @Zhilin123 :: PR: #4694
Intent slot model onnx export test by @Zhilin123 :: PR: #4731
Fix megatron p tuning notebook by @nithinraok :: PR: #4741
Add support for Apex distributed Adam optimizer with GPT-3 by @timmoon10 :: PR: #4487
Fixes NLPModel's load from checkpoint due to PTL private function changes by @MaximumEntropy :: PR: #4755
Adapter tuning for Megatron GPT models by @arendu :: PR: #4717
Megatron Encoder Decoder models with RPE and PP > 2 by @MaximumEntropy :: PR: #4663
add kab language asr models by @nithinraok :: PR: #4819
add chinese to language doc and fix bug by @yzhang123 :: PR: #4834
Spoken Language Identification by @fayejf :: PR: #4846
Fix decoding bug for megatron enc-dec models with O2 by @MaximumEntropy :: PR: #4989
Updating Megatron LM conversion according to PTL 1.7 by @Davood-M :: PR: #5038
Adding RETRO model Faiss sharding index and KNN sharding index by @yidong72 :: PR: #4713
MLP Prompt Learning Encoder by @vadam5 :: PR: #4849
Update the prompt learning to handle large language model by @yidong72 :: PR: #4906

Text Normalization / Inverse Text Normalization

Changelog

[TTS] Fix text normalizer bugs in TTS data loader by @rlangman :: PR: #4781
[Chinese text normalization] Chinese TN part in text_normalization by @mzxcpp :: PR: #4826
Fix zh tn by @yzhang123 :: PR: #5035
Bug fixes for parallel mp3 to wav conversion, PC notebook, update Readme for TN requirements by @ekmb :: PR: #5047
Added P&C lexical audio model by @jubick1337 :: PR: #4802

Export

Changelog

Intent slot model onnx export test by @Zhilin123 :: PR: #4731

General Improvements

Changelog

Fix logger reference by @SeanNaren :: PR: #4786
Fix error with class method reference in msdd by @SeanNaren :: PR: #4865
Add sync for logging calls to ensure aggregation across devices by @SeanNaren :: PR: #4876
Fix saving the last checkpoint when using val check interval by @SeanNaren :: PR: #4905
Add support for skipping validation on resume + extend saving last ckpt test by @SeanNaren :: PR: #4922
Move trainer calls for ssl models to training and validation steps only by @sam1373 :: PR: #4685
Change Num Partitions size expansion fix by @aklife97 :: PR: #4719
upgrade to PTL 1.7 by @nithinraok :: PR: #4672
Fixing outputs of infer() and use of NeMo length regulator helper by @borisfom :: PR: #4724
bug fix: enable async grad reduction when DP > 1 by @erhoo82 :: PR: #4740
Add LayerNorm1P, weight decay for LN and unscaled initialization by @mikolajblaz :: PR: #4743
Data Simulator by @chooper1 :: PR: #4686
jenkins data simulator fix by @nithinraok :: PR: #4751
Multiscale Diarization Decoder (MSDD) model and module files by @tango4j :: PR: #4650
Fix logging in gradient clipping with PTL 1.7.2 by @MaximumEntropy :: PR: #4769
Fix checkpoint restoring by @nithinraok :: PR: #4777
avoid data clipping after convolution with rir samples by @nithinraok :: PR: #4806
Fixed in_features dim if bidirectional is True by @farisalasmary :: PR: #4588
Fix float/integer type error in WER.update() by @fujimotos :: PR: #4816
[Speech Data Explorer] An option to explicitly specify the base dir by @anteju :: PR: #4678
adding instancenorm as an option for conv normalization by @bmwshop :: PR: #4827
Fix small spelling mistakes by @SeanNaren :: PR: #4839
[Tutorials] Fix matplotlib version and directory name in Multispeaker_Simulator by @anteju :: PR: #4804
Update diarization folder structure by @tango4j :: PR: #4823
Missing types in clustering by @SeanNaren :: PR: #4858
add new models by @Jorjeous :: PR: #4852
Fix decoding for T5 models with RPE by @MaximumEntropy :: PR: #4847
Update Speaker Diarization notebooks with unknown oracle_num_speakers by @fayejf :: PR: #4861
Fix mha bug by @yzhang123 :: PR: #4859
Updates to adapter training by @arendu :: PR: #4842
Changes to MSDD code after review, fix test log call by @SeanNaren :: PR: #4881
Fixed output of BERT to be [batch x seq x hidden] by @michalivne :: PR: #4887
Add AMI dataset script by @SeanNaren :: PR: #4864
Update label_models.py by @stevehuang52 :: PR: #4891
Update tutorials.rst for question answering by @Zhilin123 :: PR: #4895
removed unused imports for all domains. by @XuesongYang :: PR: #4901
Fix ptl_load_state not providing cls by @MaximumEntropy :: PR: #4914
Remove unused cv collection by @okuchaiev :: PR: #4907
Add mixed-representation config to PhonemizerTokenizer by @rlangman :: PR: #4904
Fix implicit bug in _AudioLabelDataset by @stevehuang52 :: PR: #4923
Fix and refactor label models by @fayejf :: PR: #4913
Sparrowhawk deployment fix by @ekmb :: PR: #4928
Upgrade to NGC PyTorch 22.08 Container by @ericharper :: PR: #4929
Fixes for Cherry Picked PRs by @titu1994 :: PR: #4962
Fix cherry pick workflow by @ericharper :: PR: #4964
check for active conda environment by @nithinraok :: PR: #4970
fix label models restoring issue from weighted cross entropy by @nithinraok :: PR: #4968
Add simple pre-commit file (#4983) by @SeanNaren :: PR: #4995
Fix bug in Squeezeformer Conv block by @titu1994 :: PR: #5011
Fix bugs by @Zhilin123 :: PR: #5036
Add black to pre-commit (#5027) by @SeanNaren :: PR: #5045
Fix bug in question answering tutorial by @Zhilin123 :: PR: #5049
Missing fixes from r1.11.0 to T5 finetuning eval by @MaximumEntropy :: PR: #5054
P&C docs by @jubick1337 :: PR: #5068
probabilites -> probabilities by @nithinraok :: PR: #5078
Notebook bug fixes by @vadam5 :: PR: #5084
update strategy in notebook from ddp_fork to dp by @Zhilin123 :: PR: #5088
Fix Unhashable type list for Numba Cuda spec augment kernel by @titu1994 :: PR: #5093
Remove numba import by @titu1994 :: PR: #5095
T5 prompt learning fixes missing from r.11.0 merge by @MaximumEntropy :: PR: #5075
T5 Decoding with PP > 2 fix by @MaximumEntropy :: PR: #5091
Multiprocessing fix by @jubick1337 :: PR: #5106
[Bug fix] PC lexical + audio by @ekmb :: PR: #5109
bugfix: pybtex.database.InvalidNameString: Too many commas in author … by @XuesongYang :: PR: #5112

v1.11.0

1 year ago

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:22.07

ASR

Changelog

Add ASR CTC Decoding module by @titu1994 :: PR: #4342
Fixing bugs in calling method ctc_decoder_predictions_tensor. by @VahidooX :: PR: #4414
Fixed WER initialization in ASR_with_Nemo notebook by @anteju :: PR: #4523
Update signature of Hypothesis alignments by @titu1994 :: PR: #4511
Add support for ASR Adapter Auxiliary Losses by @titu1994 :: PR: #4480
Catalan ASR NGC Resource by @stevehuang52 :: PR: #4576
Add kw asr models, add itn ru checkpoint (tagger-based) by @bene-ges :: PR: #4595
Add DALI char dataset support to SSL model by @piraka9011 :: PR: #4592
Customize arguments for trimming the leading/trailing silence by @XuesongYang :: PR: #4582
Update Offline ASR with CTC Decoding by @titu1994 :: PR: #4608
Add Squeezeformer to ASR by @titu1994 :: PR: #4416
Fix ASR notebooks by @titu1994 :: PR: #4738
Add pretrained ASR models for Croatian by @anteju :: PR: #4682
Dataloader, collector, loss and metric for multiscale diarization decoder by @tango4j :: PR: #4187
Multilingual VAD model by @fayejf :: PR: #4734
Adding support for models trained with full context for cache-aware streaming. by @VahidooX :: PR: #4687
Fp16 support for Conformer by @bmwshop :: PR: #4571
Tiny VAD refactoring for postprocessing by @fayejf :: PR: #4625
Add silence handling for speaker diarization pipeline by @nithinraok :: PR: #4512
Add Bucketing support to TarredAudioToClassificationLabelDataset by @entn-at :: PR: #4465

TTS

Changelog

Wrong order of returned tuple for general_collate_fn. by @XuesongYang :: PR: #4388
Pitch, voiced_mask, prob_voiced have the same values which is not expected. by @XuesongYang :: PR: #4392
Add static method decorator. by @XuesongYang :: PR: #4443
Fix typo in HiFi-GAN config's max steps by @XuesongYang :: PR: #4450
Relaxed support for both CPUs and GPUs by @XuesongYang :: PR: #4461
Multi-speaker fastpitch model training recipe on HUI-Audio-Corpus-German by @XuesongYang :: PR: #4413
Created the finetuning Hifigan 44100Hz recipe on HUI-Audio-Corpus-German by @XuesongYang :: PR: #4478
Fix dataset parameter typo on tacotron2 example yaml by @saarus72 :: PR: #4471
Update cmudict by @jasro23 :: PR: #4510
Customize arguments for trimming the leading/trailing silence by @XuesongYang :: PR: #4582
Fix off-by-1 bug in Beta Binomial Prior by @rlangman :: PR: #4616
G2P Aligner by @redoctopus :: PR: #4604
RADTTS ADLR-NEMO porting by @MikyasDesta :: PR: #4538
Fixed wrong pronunciations for r1.11. by @XuesongYang :: PR: #4677
Incremented the version number to 22.08 in tutorials. by @XuesongYang :: PR: #4684
Bugfix for missing configs. by @XuesongYang :: PR: #4725
Fix pynini install in TTS tutorials by @redoctopus :: PR: #4729
Updated config with a German IPA phoneme tokenizer by @XuesongYang :: PR: #4756
Add multi-speaker German FastPitch and HiFiGAN NGC checkpoints by @XuesongYang :: PR: #4763
Add single male speaker German FastPitch and HiFiGAN NGC checkpoints by @XuesongYang :: PR: #4770
Deprecated old scripts for ljspeech. by @XuesongYang :: PR: #4780
Fix MixerTTS data loading index error by @redoctopus :: PR: #4811
G2P docs by @ekmb :: PR: #4841
NMESC speaker counting algorithm update by @tango4j :: PR: #4500

NLP / NMT

Changelog

Add O2 support for RETRO model by @yidong72 :: PR: #4411
Add MTEncDec Finetune support by @aklife97 :: PR: #4540
Fix metric setup for finetuning without a test set by @MaximumEntropy :: PR: #4585
T0 model and dataset by @MaximumEntropy :: PR: #4598
Add prompt learning for T5 by @HeyyyyyyG :: PR: #4391
Add MuTransfer Capability to RETRO model pretraining by @yidong72 :: PR: #4643
Label Smoothing in VocabParallelCrossEntropy by @MaximumEntropy :: PR: #4602
Megatron BART BOS / EOS bug fix by @michalivne :: PR: #4495
GPT Prompt Learning Improvements by @vadam5 :: PR: #4496
Megatron perceiver with tensor parallelism only by @MaximumEntropy :: PR: #4318
Refactor for punctuation model by @jubick1337 :: PR: #4367
Update megatron prompt learning interface to dialogue by @Zhilin123 :: PR: #4545
Removed NLPDDPPlugin Import check by @vadam5 :: PR: #4555
Option to disregard document boundaries for t5, bart, ul2 by @MaximumEntropy :: PR: #4481
Add Tokenization and Normalization pre-processing script for NMT by @aklife97 :: PR: #4557
Integrating support for GPT/T5/BART for Question Answering by @ameyasm1154 :: PR: #4532
NeMo Megatron: Add sequence parallelism and selective activation checkpointing (rebased) by @ericharper :: PR: #4380
Update megatron t5 interface to dialogue by @Zhilin123 :: PR: #4626
Additional sentencepiece args - Byte fallback, split digits, split_on_whitespace by @MaximumEntropy :: PR: #4525
Maximum sample-based training for Megatron NMT and Text Memmap based Seq2seq Pre-training by @MaximumEntropy :: PR: #4396
NeMo Megatron Doc updates1 by @okuchaiev :: PR: #4633
Asymmetric Encoder and Decoder Configuration for Megatron Models by @MaximumEntropy :: PR: #4568
Add sentencepiece legacy arg to megatron tokenizer configs by @MaximumEntropy :: PR: #4659
Megatron encode function with RPE fix by @MaximumEntropy :: PR: #4692
Updates to NeMo Megatron OSS docs by @okuchaiev :: PR: #4709
Changes to make Megatron NMT exportable by @Davood-M :: PR: #4499
fix bug relating to ddp strategy in joint intent slot classification … by @Zhilin123 :: PR: #4762
Fix qa notebook typos and branch by @ericharper :: PR: #4788
Colab py37 compatibility megatron by @Zhilin123 :: PR: #4791
added/fixed export for Megatron models by @Davood-M :: PR: #4712
Fix providing glue in seq2seq eval by @MaximumEntropy :: PR: #4843
Fix Megatron NMT consumed samples and ckpt_to_nemo split rank by @MaximumEntropy :: PR: #4884
Fixing Megatron BERT output dimensions to [batch x seq x hidden] by @michalivne :: PR: #4894
Prompt Learning Inference Improvements by @vadam5 :: PR: #4566
MegaMolBART Compatibility by @michalivne :: PR: #4603

Text Normalization / Inverse Text Normalization

Changelog

Add ITN pt by @guidefloripa :: PR: #4516
add kw asr models, add itn ru checkpoint (tagger-based) by @bene-ges :: PR: #4595
Fix ITN pt by @guidefloripa :: PR: #4623
Bug fix hundred in Audio-based, added method to split text into sentences by @ekmb :: PR: #4610
Fix itn pt time by @guidefloripa :: PR: #4630
Pin lightning version to be < 1.7.0 by @MaximumEntropy :: PR: #4660
G2P for OOV and heteronyms by @ekmb :: PR: #4624
Publish pretrained itn t5 model for English by @bene-ges :: PR: #4748
Added MLM Scoring by @yzhang123 :: PR: #4476

Export

Changelog

update fastpitch to add export controls by @blisc :: PR: #4509
Fix Fastpitch Export by @blisc :: PR: #4676
Changes to make Megatron NMT exportable by @Davood-M :: PR: #4499
Added/fixed export for Megatron models by @Davood-M :: PR: #4712

Bugfixes

Changelog

Wrong order of returned tuple for general_collate_fn. by @XuesongYang :: PR: #4388
Pitch, voiced_mask, prob_voiced have the same values which is not expected. by @XuesongYang :: PR: #4392
Fix tarred dataset len when num shards is not divisible by workers by @itzsimpl :: PR: #4553
Fix multiple dev/test datasets after restoring from checkpoint by @PeganovAnton :: PR: #4636
Fix/need different cache dirs for different datasets by @PeganovAnton :: PR: #4640
Improve mAES algorithm with patches by @titu1994 :: PR: #4662

General Improvements

Changelog

Option to disable mp in VAD via num_workers=1 by @gkucsko :: PR: #4317
Remove redundant bias expand by @xrennvidia :: PR: #4382
Add option for specifying wandb save_dir from config by @shan18 :: PR: #4379
Quick wav2vec fix. In-place operation adding convolutional positions … by @bonham79 :: PR: #4383
Fixing import error in some cases by @borisfom :: PR: #4401
Update with new conformer checkpoints. by @VahidooX :: PR: #4417
Wav2vec fix by @bonham79 :: PR: #4467
Relative Audio Paths by @stevehuang52 :: PR: #4470
Allow Noam lr scheduler to run for more than max_steps by @alancucki :: PR: #4472
Support for Different LRs with Param Groups by @stevehuang52 :: PR: #4508
Fix runtime check by @borisfom :: PR: #4501
Update finetune label models by @nithinraok :: PR: #4504
Weighted bucketing by @tbartley94 :: PR: #4530
Relative Audio Path by @stevehuang52 :: PR: #4520
Fix duplex inference with grammars by @ekmb :: PR: #4517
Add nsys profiling by @ericharper :: PR: #4539
Remove the variable that is not used in the context. by @XuesongYang :: PR: #4547
Adding multispeaker fastpitch and hifigan en model links to available… by @subhankar-ghosh :: PR: #4550
Add length ratio filtering script by @MaximumEntropy :: PR: #4551
Relative audio path in speech data explorer by @anteju :: PR: #4570
Dividing generative question-answering CI tests by @ameyasm1154 :: PR: #4600
Updating the default parameters in the example adapters config file by @shan18 :: PR: #4607
Improve normalize_batch ValueError message by @piraka9011 :: PR: #4614
Support listing Hugging Face model info by @titu1994 :: PR: #4619
Update diarization data loader to train meeting data by @tango4j :: PR: #4567
Fix HF check for model card info by @titu1994 :: PR: #4628
Add Github Action for auto webpage build by @titu1994 :: PR: #4645
Empty commit by @titu1994 :: PR: #4646
Force git config for doc build by @titu1994 :: PR: #4647
Correct branch name for github page source by @titu1994 :: PR: #4648
Adding lang id to shard by @bmwshop :: PR: #4649
Fix special tokens in vocab to arguments of constructor by @gwarmstrong :: PR: #4631
Fix apex for r1.11 by @michalivne :: PR: #4666
Update readme by @nithinraok :: PR: #4667
Removed trailing spaces in CI test by @vadam5 :: PR: #4671
Pynini dependency fix by @ekmb :: PR: #4674
Fix for incorrect batch size issue while decoding by @rilango :: PR: #4675
Fix to fetch config file by @nithinraok :: PR: #4699
Fix notebook for buffered inference by @titu1994 :: PR: #4703
Prompt Learning Notebook Bug Fix by @vadam5 :: PR: #4689
Add psutils to mock imports by @ericharper :: PR: #4728
Update Aligner model and tutorial to add NGC checkpoint loading by @redoctopus :: PR: #4714
Updated docs and doc paths by @vadam5 :: PR: #4754
Update r1.11 to new heteronyms list by @redoctopus :: PR: #4745
Update CMUdict with more recent 0.7b entries by @redoctopus :: PR: #4768
Add pynini to Docker container by @artbataev :: PR: #4733
Fix tutorial formatting by @redoctopus :: PR: #4778
Fix initializing weights from ptl ckpt with exclude by @sam1373 :: PR: #4807
T5 prompt learning fixes by @MaximumEntropy :: PR: #4771
Updated inference code and squad scripts by @vadam5 :: PR: #4835
Fix uppercasing mismatch for IPA heteronyms by @redoctopus :: PR: #4860
Set the number of workers to 0 for validation and test sets in all enc-dec models by @MaximumEntropy :: PR: #4790
Fix mha by @yzhang123 :: PR: #4866
ipa bug fix by @ekmb :: PR: #4871
Added utf8 encoding by @vadam5 :: PR: #4892
Fix question answering docs r1p11 by @Zhilin123 :: PR: #4897

v1.10.0

1 year ago

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:22.05

Known Issues

Issues

Tutorial: Fastpitch_Training_GermanTTS.ipynb is experimental and still being tested.

ASR

Changelog

Multilang asr tutorial by @bmwshop :: PR: #3931
Add ASR with Adapters Tutorial by @titu1994 :: PR: #4149
Add support for Decoder + Joint Adapters for ASR by @titu1994 :: PR: #4189
updating PretrainedModelInfo and benchmark sheet for ASR models by @krishnacpuvvada :: PR: #4259
Remove verbose flag from Dali Index Creator by @titu1994 :: PR: #4309
updating PretrainedModelInfo for ASR SSL models by @krishnacpuvvada :: PR: #4292
Adding docs for ASR SSL by @krishnacpuvvada :: PR: #4303
Add ASR Scores to Docs by @titu1994 :: PR: #4412
[ASR] Replace all paths with /content/ by @titu1994 :: PR: #4427
added conformer mandarin model. by @VahidooX :: PR: #4201
Runtime audio segment sampling for SSL by @krishnacpuvvada :: PR: #4126

TTS

Changelog

[TTS] Add volume passthrough to fp for riva by @blisc :: PR: #4167
Update TTS Configs from LAMB to AdamW by @redoctopus :: PR: #4233
Add benchmark=false to all TTS configs by @redoctopus :: PR: #4263
[TTS] add staticmethod decoration for BetaBinomialInterpolator by @XuesongYang :: PR: #4319
[TTS] capture exception of non-supported windows. by @XuesongYang :: PR: #4320
[TTS] enforced pin_memory = True by @XuesongYang :: PR: #4341
[TTS] Training Fastpitch on German text and phonemes and finetuning HiFi-GAN on predicted mels by @aroraakshit :: PR: #4266
IPA support for TTS by @redoctopus :: PR: #4310
Bits of RADTTS support by @borisfom :: PR: #4343

NLP / NMT

Changelog

Megatron NMT Restore from T5/BART and finetune by @MaximumEntropy :: PR: #3977
Binarized memmap dataloader for Megatron NMT, Inference and checkpoint -> nemo by @MaximumEntropy :: PR: #4137
Use unique names for temporary directories in punctuation and capitalization tests by @PeganovAnton :: PR: #4298
Removes debug logging statements in Megatron NMT by @MaximumEntropy :: PR: #4312
Raise error if trainer object is None for MegatronBaseModel by @MaximumEntropy :: PR: #4356
Punctuation and capitalization tests race condition by @PeganovAnton :: PR: #4399
unify intent slot dataset util functions in tutorials by @Zhilin123 :: PR: #4445
Fix for TP=2,PP=2 decoding with megatron encoder-decoder models by @MaximumEntropy :: PR: #4484
Add RETRO model for pretraining by @yidong72 :: PR: #4121
Add async grad allreduce and chunk optimization by @xrennvidia :: PR: #4084
Implements the UL2 Dataset and config by @MaximumEntropy :: PR: #4184
Add RETRO indexed dataset and inference by @yidong72 :: PR: #4220
Finetune T5 on the prefix-lm objective by @MaximumEntropy :: PR: #4328
Fuse bias with geglu in ParallelMLP by @xrennvidia :: PR: #4213
Support larger datasets for question answering by @Zhilin123 :: PR: #4205
Refactor bias act fusion by @MaximumEntropy :: PR: #4376
Prompt Learning Pipeline Parallel by @vadam5 :: PR: #4291
Text memmap dataset by @michalivne :: PR: #4068
Fuse grad division into async grad allreduce by @xrennvidia :: PR: #4327

Text Normalization / Inverse Text Normalization

Changelog

[TN] WFST to normalize punctuation by @ekmb :: PR: #4108
[TN/TTS] Add graph to tag IPA words/sentences in square brackets and leave them unchanged by @ekmb :: PR: #4323
Tn tutorial by @yzhang123 :: PR: #4090
Tn add rules by @yzhang123 :: PR: #4302
Tn install by @yzhang123 :: PR: #4055
Fix electronic bug, new time ITN rule by @ekmb :: PR: #4355
[TN] Bug fix: expand serial coverage of unknown symbol, remove constraints from word graph by @ekmb :: PR: #4463
Configure T5 finetuning metrics by @MaximumEntropy :: PR: #4122

Export

Changelog

Added support for subnet export by @borisfom :: PR: #4299

Core

Changelog

Add Module-level Adapters, Save-Restore and tests by @titu1994 :: PR: #4114
Add NeMo Adapters tutorial to Core by @titu1994 :: PR: #4311
NeMo Model to HF Hub Upload Tutorial by @titu1994 :: PR: #4322

General Improvements and Fixes

Changelog

Update container to 22.05 by @ericharper :: PR: #4329
Fix PTL step calculation by @titu1994 :: PR: #4307
[NLP] P&C Fix multi node cache issue, add pynini guard by @ekmb :: PR: #4410
NeMo Megatron GPT Unit Tests by @ericharper :: PR: #4099
Add the PP2 GPT eval CI test by @yidong72 :: PR: #4168
BigNLP perf regression fix by @MaximumEntropy :: PR: #4267
Fixes for Megatron Base Model Artifacts by @MaximumEntropy :: PR: #4248
Fix a wrong description in offline_diarization_with_asr.yaml by @tango4j :: PR: #4141
bugfix for import error in Offline_ASR_with_VAD_for_CTC_models by @fayejf :: PR: #4424
[Fix] ASR RNNT Tutorial by @stevehuang52 :: PR: #4352
[TTS] Fix Hifigan finetune tutorial by @subhankar-ghosh :: PR: #4182
[Bugfix][TTS] wrong order of returned tuple for general_collate_fn. by @XuesongYang :: PR: #4432
[bugfix][TTS] pitch, voiced_mask, prob_voiced have the same values. by @XuesongYang :: PR: #4435
[TTS] [bugfix] German FastPitch HiFi-GAN tutorial and lr by @aroraakshit :: PR: #4459
[TTS] [bugfix] update indentation by @aroraakshit :: PR: #4468
Fix some 's' cases for IPA G2P by @redoctopus :: PR: #4460
Fix ASR Typos in tutorials by @titu1994 :: PR: #4384
Use unique names for temporary directories in punctuation and capitalization tests by @PeganovAnton :: PR: #4298
Punctuation and capitalization tests race condition by @PeganovAnton :: PR: #4399
Dialogue tasks unit test by @Zhilin123 :: PR: #4112
fix error by @yzhang123 :: PR: #4120
fix typo by @stevehuang52 :: PR: #4134
Fix cmudict typo: phoneme YI1 -> IY1 in NVME by @redoctopus :: PR: #4139
transcribe: scan directories recursively by @virajkarandikar :: PR: #4159
Add 44KHz yaml file for Fastpitch training by @subhankar-ghosh :: PR: #4161
[bugfix] consistent highfreq to both fastpitch and hifigan in their 44100 configs. by @XuesongYang :: PR: #4177
Upperbound OmegaConf by @titu1994 :: PR: #4191
Prompt tokenization bugfix by @vadam5 :: PR: #4197
Updated to Prompt Learning Model to Use Distributed Sampler by @vadam5 :: PR: #4208
Freesound fixes by @virajkarandikar :: PR: #4155
Patch Hydra by @titu1994 :: PR: #4202
Prompt Learning Model Saving Changes by @vadam5 :: PR: #4212
Speakertasks manifest by @yzhang123 :: PR: #4185
SSL Multi-loss Update by @sam1373 :: PR: #4186
Support load_adapters with just adapter_name by @titu1994 :: PR: #4255
Add special tokens to existing (trained) SentencePiece models by @aklife97 :: PR: #4203
Fixing the speed slow-down for speech models. by @VahidooX :: PR: #4260
Fix and add functions in speaker utils by @tango4j :: PR: #4138
pt container 1.10->1.11.0 by @ekmb :: PR: #4273
ssl fixes by @sam1373 :: PR: #4268
Save Virtual Prompt Weights Only by @vadam5 :: PR: #4237
add 'relative positional embedding (RPE)' feature - re-creating after… by @khcs :: PR: #4256
Docs CSS: Update h4 tag style for the right side bar by @nickolyamba :: PR: #4284
Fix Docs CSS: align docs left and increase width for large screens by @nickolyamba :: PR: #4154
remove redundant condition for fastpitch. by @XuesongYang :: PR: #4281
[Add] automatically resolving relative audio path by @stevehuang52 :: PR: #4277
forcing conv subsampling to 32 bit by @bmwshop :: PR: #4293
Add library name and version when downloading from the Hugging Face Hub by @osanseviero :: PR: #4304
clear access registry when adding if not empty by @sam1373 :: PR: #4306
[collections] bugfix for capturing NotImplementedError of non-supported sup data types. by @XuesongYang :: PR: #4297
Adjust lr for AdamW from LAMB default by @redoctopus :: PR: #4308
Fix bugs in indexed dataset exam script by @yidong72 :: PR: #4325
Torchaudio installation fix by @GNroy :: PR: #4330
Speedup the speech commands dataset processing script by @shan18 :: PR: #4347
fix wrong requirement by @yzhang123 :: PR: #4349
Refactored path to manifest by @treacker :: PR: #4251
Fix the post LN bug by @yidong72 :: PR: #4350
[Fix] Hanging for Fully Randomized Bucketing by @stevehuang52 :: PR: #4348
Auto-switch the input dimensions in the conformer encoder adapter to correct value by @shan18 :: PR: #4354
Set headscale false by @MaximumEntropy :: PR: #4364
Add wandb as dependency by @titu1994 :: PR: #4365
Fix trainer.global_steps in WandB logging by @titu1994 :: PR: #4366
Finetuning changes for BART by @MaximumEntropy :: PR: #4003
Make position embedding expansion specific to a batch to avoid checkpoint size mismatches by @MaximumEntropy :: PR: #4357
Correct support for dataclasses in default module dim by @titu1994 :: PR: #4372
Fix no attribute 'pad_id' bug when pre-processing by @yidong72 :: PR: #4377
Question answering bug fix by @Zhilin123 :: PR: #4381
Docs for NeMo Adapters by @titu1994 :: PR: #4369
Update NeMo docs by @titu1994 :: PR: #4397
Fixing import error in some cases by @borisfom :: PR: #4402
Fix tutorial typos and docs by @titu1994 :: PR: #4415
Add reconfigure on validation epoch start by @MaximumEntropy :: PR: #4393
Re-apply fixes from r1.9.0 by @redoctopus :: PR: #4425
Fix hanging issue by multiprocessing in SD tutorial and add ETA for VAD processing by @fayejf :: PR: #4405
Fix notebook text by @yidong72 :: PR: #4438
Update dialogue tutorial version by @Zhilin123 :: PR: #4437
Docs: Add table overflow handling by @nickolyamba :: PR: #4441
Docs: Decrease Font Size on Tables by @nickolyamba :: PR: #4444
Notebook bug fix: add subfolder by @ekmb :: PR: #4442
Fix typo in HiFi-GAN config's max steps by @redoctopus :: PR: #4446
Updated notebook to fix batch configuration and precision bugs by @vadam5 :: PR: #4447
fix branch in link by @ekmb :: PR: #4454
t5-rpe-fix targeting r1.10.0; raise exception for PP>2. by @khcs :: PR: #4469
Add kwargs to exact string match by @MaximumEntropy :: PR: #4479

v1.9.0

1 year ago

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:22.04

ASR

Changelog

Fix changed function name in offline vad asr notebook by @fayejf :: PR: #4007
NeMo Adapters Support + ASR Adapters by @titu1994 :: PR: #3942
Update ASR configs with num_workers and pin_memory by @titu1994 :: PR: #4270
Verbose k2 install, skip if failed by @GNroy :: PR: #4289
Torch conversion for VAD-Diarization pipeline by @tango4j :: PR: #3930
Multiprocess improvements by @nithinraok :: PR: #4127

TTS

Changelog

Tn tts e by @ekmb :: PR: #3988
Remove AudioToCharWithPriorAndPitchDataset dependency from fastpitch by @subhankar-ghosh :: PR: #4008
Deprecation by @blisc :: PR: #4082
FastPitch FT notebook - Improving Speech Quality clarifications by @redoctopus :: PR: #3954

NLP / NMT

Changelog

Option to remove bias terms from Megatron transformers by @MaximumEntropy :: PR: #3973
Add NMT method to translate with TN/ITN pre/post-processing by @MaximumEntropy :: PR: #4009
Fix Punctuation and Capitalization model batching. An issue with shuffling. by @PeganovAnton :: PR: #4050
Fix GPT model parallel eval by @yidong72 :: PR: #4054
Updating with main by @jpilaul :: PR: #4073
Cherry-pick fix for megatron ckpt conversion script when using BCP by @ericharper :: PR: #4089
Check implicit grad acc in GLUE dataset building by @MaximumEntropy :: PR: #4123
Fix/punctuation avoid overwriting tmp files by @PeganovAnton :: PR: #4144
Fix/punctuation/trainer required for setting test data by @PeganovAnton :: PR: #4199
Raise error if bicleaner is not installed in NMT Data preprocessing notebook by @MaximumEntropy :: PR: #4264
Fix epoch end for NeMo NMT by @MaximumEntropy :: PR: #4265
Update YAML with trainer.benchmark=False for NLP by @MaximumEntropy :: PR: #4261
Continuous prompt refactor by @vadam5 :: PR: #3877
T5 finetuning for generic small text-to-text datasets by @MaximumEntropy :: PR: #4032

Text Normalization / Inverse Text Normalization

Changelog

Tn special text support by @yzhang123 :: PR: #3969
Tn update numbers by @yzhang123 :: PR: #3992
Tn tts e by @ekmb :: PR: #3988
Itn vi by @yzhang123 :: PR: #4029
Refactor tn data folder, and update of measure by @yzhang123 :: PR: #4028
Remove conda dependency for tn by @yzhang123 :: PR: #4057
Tn electronic by @yzhang123 :: PR: #4053
ThutmoseTaggerModel, a new model for inverse text normalization by @bene-ges :: PR: #4011
Tutorial on ITN with Thutmose tagger and small fixes by @bene-ges :: PR: #4117
Cleaned up TN/ ITN doc by @yzhang123 :: PR: #4119
Update default for SH by @ekmb :: PR: #4135
Update ContextNet version by @titu1994 :: PR: #4207

NeMo Tools

Changelog

Added exception handling for audio player in SDE by @vsl9 :: PR: #4077

NeMo Core

Changelog

Support pre-extracted nemo checkpoint for restoration by @titu1994 :: PR: #4061
Fix type checking to be compatible with named tuples by @artbataev :: PR: #3986
Update num worker calculation due to PTL flag changes by @redoctopus :: PR: #4056
Refresh NeMo documentation to Sphinx Book Theme by @titu1994 :: PR: #3996
Generalize adapter merge strategy for future adapters by @titu1994 :: PR: #4091

General Improvements

Changelog

Fix Punctuation and Capitalization model batching. An issue with shuffling. by @PeganovAnton :: PR: #4050
Fix restoring from checkpoint for case when is provided by @PeganovAnton :: PR: #4136
Fix/punctuation avoid overwriting tmp files by @PeganovAnton :: PR: #4144
Fix/punctuation/trainer required for setting test data by @PeganovAnton :: PR: #4199
Ability to set log_prediction to false by @bmwshop :: PR: #3929
Glu activation variants by @MaximumEntropy :: PR: #3951
Ranking merge by @yzhang123 :: PR: #3906
Fix path in doc by @nithinraok :: PR: #3979
Adding fisher audio conversion script from old NeMo branch by @jbalam-nv :: PR: #3991
improvements to get_commonvoice_data script by @bmwshop :: PR: #3999
Bugfix and variable name change for clustering code by @tango4j :: PR: #4023
Exp manager log rank 0 only arguments by @MaximumEntropy :: PR: #4026
Force import test on PR by @titu1994 :: PR: #4037
Drop support for kaldi-io by @titu1994 :: PR: #4042
Cherry pick HF integration and bug fixes from 1.8.1 by @ericharper :: PR: #4052
Make saving prompt encoder embeddings non-configurable by @vadam5 :: PR: #4071
Replace sampled tokens with EOD after EOD has been sampled once by @vadam5 :: PR: #4070
Added answer only loss for prompt learning by @vadam5 :: PR: #4069
added stacking support to conformer. by @VahidooX :: PR: #4045
Update LJSpeech whitelist file path by @redoctopus :: PR: #4078
Added check for microbatch calculator by @vadam5 :: PR: #4043
Prompt Learning Docs by @vadam5 :: PR: #4046
Fix link to prompt tuning page by @SeanNaren :: PR: #4081
Add docs for by @titu1994 :: PR: #4079
Dialogue task by @Zhilin123 :: PR: #3884
RMSNorm, Normformer and fixes from merging 1.8.0 into main by @MaximumEntropy :: PR: #4048
Correct link to PTL by @titu1994 :: PR: #4088
Added encoder and decoder modules for RETRO model by @yidong72 :: PR: #4038
Upgrade container to NGC PyTorch 22.04 by @ericharper :: PR: #4085
Tarred fix label models by @nithinraok :: PR: #4092
Fix link to tutorial in dialogue docs by @Zhilin123 :: PR: #4093
Prompt learning Notebook by @vadam5 :: PR: #4031
Add more papers by @yzhang123 :: PR: #4097
Ignore speakers with few utterances by @nithinraok :: PR: #3722
Access mixin by @sam1373 :: PR: #4098
Add CharParser for Cyrillic letters by @karpov-nick :: PR: #4101
Restored tests previously disabled for 22.03 base by @borisfom :: PR: #4109
Add augmentation to label models by @nithinraok :: PR: #4113
Fix register artifacts by @ramanathan831 :: PR: #4116
Fix typo by @yzhang123 :: PR: #4140
bug_fix_diarization_manifest_creation by @yzhang123 :: PR: #4125
Tacotron2 retrain by @treacker :: PR: #4103
WaveGlow input type fixes by @redoctopus :: PR: #4151
Notebooks' link, typo and import fix by @fayejf :: PR: #4158
Thutmose tagger bug fixes by @bene-ges :: PR: #4162
Update speaker docs by @nithinraok :: PR: #4164
Set plugin to None when no apex by @ekmb :: PR: #4171
Fix doc by @yzhang123 :: PR: #4152
Small import name fix by @fayejf :: PR: #4180
Rename folder VAD -> vad by @fayejf :: PR: #4163
Fix the server key value problem in the notebook by @yidong72 :: PR: #4196
Pin omegaconf for r1.9.0 by @ericharper :: PR: #4195
Fix cherrypicks by @titu1994 :: PR: #4204
Fix bugs for dialogue tutorial by @Zhilin123 :: PR: #4211
Tacotron2 1.9.0 bugfixes by @redoctopus :: PR: #4209
Add docs for Thutmose Tagger by @bene-ges :: PR: #4173
Dialogue tutorial fix by @Zhilin123 :: PR: #4221
Fix syntax error in ipynb-file by @bene-ges :: PR: #4228
Fix JSON serialization problem by @yidong72 :: PR: #4235
Prompt Learning Typo Fixes by @vadam5 :: PR: #4238
Fixing bug 3642622 by @pasandi20 :: PR: #4250
Fix broken link in the tutorial by @bene-ges :: PR: #4257
Prompt learning notebook bugfix by @vadam5 :: PR: #4262
Fix missing validation dataset, whitelist certain keywords for datasets by @titu1994 :: PR: #4269
Set Save on train end to false by @vadam5 :: PR: #4274
Updated config to fix CI test OOM error by @vadam5 :: PR: #4279
Changed total virtual prompt tokens by @vadam5 :: PR: #4295

v1.8.2

2 years ago

Known Issues

Megatron BERT export does not currently work in the NVIDIA NGC PyTorch 22.03 container. The issue will be fixed in the NGC PyTorch 22.04 container.

TTS

Fastpitch Tutorial fix by @subhankar-ghosh :: PR: #4044

v1.8.1

2 years ago

Known Issues

Megatron BERT export does not currently work in the NVIDIA NGC PyTorch 22.03 container. The issue will be fixed in the NGC PyTorch 22.04 container.

TTS

Restore_buffer bug fix and update NeMo checkpoint URL by @subhankar-ghosh :: PR: #4041

Hugging Face Hub Integration

Add support for Hugging Face Hub to NeMo by @titu1994 :: PR: #4030

Bug Fixes

Added apex import guard back
Patch commons.py by @ericharper :: PR: #4039
Fixing pretrained name by @borisfom :: PR: #4022
Add back Citrinet zh by @titu1994 :: PR: #4040

v1.8.0

2 years ago

Known Issues

Megatron BERT export does not currently work in the NVIDIA NGC PyTorch 22.03 container. The issue will be fixed in the NGC PyTorch 22.04 container.
The pytest tests for Vietnamese inverse text normalization are failing. This is fixed in main.

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:22.03

ASR

Changelog

ASR SSL Update by @sam1373 :: PR: #3714
Polylang asr by @bmwshop :: PR: #3721
Test grad accumulation for RNNT loss by @titu1994 :: PR: #3731
Add readme files describing model execution flow for ASR tasks by @titu1994 :: PR: #3812
add fr asr ckpt to doc by @yzhang123 :: PR: #3809
Fix asr tests in 22.02 by @titu1994 :: PR: #3823
Add new pretrained Spanish ASR models by @erastorgueva-nv :: PR: #3830
Documentation updates for ASR by @titu1994 :: PR: #3846
Offline VAD+ASR tutorial by @fayejf :: PR: #3828
Added Hindi and Marathi Models in Nemo pretrained ASR_CTC_BPE models … by @meghmak13 :: PR: #3856
Add a missing line to ASR_with_NeMo.ipynb by @lifefeel :: PR: #3908
Multilang asr models by @bmwshop :: PR: #3907
added stt_en_conformer_transducer_large_ls to NGC by @VahidooX :: PR: #3920
Fix DALI test on 22.03 by @titu1994 :: PR: #3911
Adding RNN encoder for LSTM-Transducer and LSTM-CTC models by @VahidooX :: PR: #3886
Fix issue with Segfault in ASR models by @titu1994 :: PR: #3956
Added Mandarin pretrained Conformer-Transducer-Large model trained on AISHELL2. by @VahidooX :: PR: #3970

TTS

Changelog

Bump TTS deprecation version to 1.9 by @blisc :: PR: #3955
Add pinned pynini and scipy installs to TTS training tutorial by @redoctopus :: PR: #3967
Compatibility override to load_state_dict for old TTS checkpoints by @redoctopus :: PR: #3978

NLP / NMT

Changelog

Use worker processes for data preprocessing by @crcrpar :: PR: #3665
Set find_unused_parameters to False in GPT example script by @ericharper :: PR: #3837
GPT multinode eval by @ericharper :: PR: #3821
Fix MegatronPretrainingRandomSampler by taking into account by @crcrpar :: PR: #3826
Add slot filling into DST Generative model by @Zhilin123 :: PR: #3695
Disable nvfuser for gpt by @ericharper :: PR: #3845
Multi-Label Joint Intent Slot Classification by @chenrichard10 :: PR: #3742
fix bug in intent/slot model reloading by @carolmanderson :: PR: #3874
Make test_gpt_eval unit test less strict by @yidong72 :: PR: #3898
Comment gpt resume ci test by @MaximumEntropy :: PR: #3901
Neural Machine Translation with Megatron Transformer Models (Tensor Parallel and Tarred Datasets Only) by @MaximumEntropy :: PR: #3861
Megatron support by @ramanathan831 :: PR: #3893
Populate the GPT/BERT with uploaded models by @yidong72 :: PR: #3885
Megatron BART by @michalivne :: PR: #3666
Additional Japanese processor for NMT that uses MeCab segmentation. Fix for BLEU in one-many NMT by @MaximumEntropy :: PR: #3889
NMT gRPC server URL fix by @MaximumEntropy :: PR: #3918
Megatron legacy conversion support by @ramanathan831 :: PR: #3919
Update max_epochs on megatron configs by @ericharper :: PR: #3958
Fix NMT variable passing bug by @aklife97 :: PR: #3985
Fix nemo megatron restore with artifacts by @ericharper :: PR: #3997
Fix megatron notebook by @ramanathan831 :: PR: #4004
Megatron work-arounds by @borisfom :: PR: #3998
Add T5 model P-tuning support by @yidong72 :: PR: #3768
Make index mappings dir configurable by @ericharper :: PR: #3868
T5 pipeline parallel by @MaximumEntropy :: PR: #3750

Text Normalization / Inverse Text Normalization

Changelog

Tn es by @bonham79 :: PR: #3632
Fix single GPU training issue + change deprecated Lightning args by @aklife97 :: PR: #4010

Export

Changelog

Conformer WARs for TRT8.2 by @borisfom :: PR: #3787
bert_module: fix inputs of export model by @virajkarandikar :: PR: #3815
Exports 22.03 war by @borisfom :: PR: #3957

Bugfixes

Changelog

patch librosa deprecation and fix by @fayejf :: PR: #3818

General Improvements

Changelog

Pynini pip by @yzhang123 :: PR: #3726
upgrade PTL trainer flags by @nithinraok :: PR: #3589
Updated Speech Data Explorer by @vsl9 :: PR: #3710
Fix spelling error in num_workers parameter to actually set number of dataset workers specified in yaml configs by @themikem :: PR: #3800
Support for CamemBERT Hugging Face BERT-like models by @itzsimpl :: PR: #3799
Update to 22.02 by @ericharper :: PR: #3771
Fixing the defaults of conformer models in the config files by @VahidooX :: PR: #3836
Fix T5 Encoder Mask while decoding by @MaximumEntropy :: PR: #3838
fix: multilingual transcribe does not require lang id param by @bmwshop :: PR: #3833
Misc improvements by @titu1994 :: PR: #3843
Change container by @MaximumEntropy :: PR: #3844
Making gender assignment random for cardinals, fractions, and decimal… by @bonham79 :: PR: #3759
Jenkinsfile test changes by @chenrichard10 :: PR: #3879
Adding a RegEx tokenizer by @michalivne :: PR: #3839
enable bias+dropout+add fusion with nvfuser at inference by @erhoo82 :: PR: #3869
Add text_generation_util to support TopK, TopP sampling + Tabular Data Generation. by @yidong72 :: PR: #3834
Ptl requirements bound by @MaximumEntropy :: PR: #3903
doc links update by @ekmb :: PR: #3891
add citations by @yzhang123 :: PR: #3902
Update NeMo CI to 22.03 by @MaximumEntropy :: PR: #3900
Add domain groups to changelog builder by @titu1994 :: PR: #3904
add input threshold by @yzhang123 :: PR: #3913
improvements to commonvoice data script by @bmwshop :: PR: #3892
fixes to the cleanup flag by @bmwshop :: PR: #3921
Upgrade to PTL 1.6.0 by @ericharper :: PR: #3890
JSON output from diarization now includes sentences. Optimized senten… by @demsarjure :: PR: #3897
Stateless timer fix for PTL 1.6 by @MaximumEntropy :: PR: #3925
fix save_best missing checkpoint bug, update for setup_tokenizer() changes by @ekmb :: PR: #3932
Fix tarred sentence dataset length by @MaximumEntropy :: PR: #3941
remove old doc by @ekmb :: PR: #3946
Fix issues with librosa deprecations by @titu1994 :: PR: #3950
Fix notebook bugs for branch r1.8.0 by @yidong72 :: PR: #3948
Fix global batch fit loop by @ericharper :: PR: #3936
Refactor restorefrom by @ramanathan831 :: PR: #3927
Fix variable name and move models to CPU in Change partition by @aklife97 :: PR: #3972
Fix notebook error by @yidong72 :: PR: #3975
Notebook Bug Fixes for r1.8.0 by @vadam5 :: PR: #3989
Fix compat override for TalkNet Aligner by @redoctopus :: PR: #3993
docs fixes by @ekmb :: PR: #3987
Fixes val_check_interval, skip loading train data during eval by @MaximumEntropy :: PR: #3968
LogProb calculation performance fix by @yidong72 :: PR: #3984
Fix P-Tune T5 model by @yidong72 :: PR: #4001
Fix the broadcast shape mismatch by @yidong72 :: PR: #4017
Add known issues to notebook by @ericharper :: PR: #4024