NeMo Versions

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

v1.23.0

2 months ago

Highlights

Models

NVIDIA StarCoder2 - 15B

Announcement - https://developer.nvidia.com/blog/unlock-your-llm-coding-potential-with-starcoder2/
AI Foundation Model Inference - https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/starcoder2-15b
https://huggingface.co/bigcode/starcoder2-15b

NeMo Canary

Announcement - https://nvidia.github.io/NeMo/blogs/2024/2024-02-canary/

https://huggingface.co/nvidia/canary-1b

NeMo LLM

Falcon
Code Llama
StarCoder
GPT perf improvements
Context parallelism
Mistral
Mixtral (without expert parallelism)
Mcore GPT Dataset integration

NeMo MM

CLIP
Stable Diffusion (supporting LoRA)
Imagen
ControlNet (for SD)
Instruct pix2pix (for SD)
LLaVA
NeVA
DreamFusion++
NSFW filtering

NeMo ASR

Lhotse Dataloading support #7880
Canary: Multi-task multilingual ASR #8242
LongForm Audio for Diarization #7737
Faster algorithm for RNN-T Greedy #7926
Cache-Aware streaming notebook #8296

NeMo TTS

NeMo Vision

Known Issues

ASR

RNNT WER calculation when fused batch size > 1 during validation / test step()

Previously, the RNNT metric was stateful while the CTC metric was not (r1.22.0, r1.23.0).

As a result, the WER calculation inside the RNNT joint worked correctly for fused operation. When the metrics were unified in r1.23.0, a bug was introduced: only the last sub-batch contributes to the score, because the per-sub-batch results are not accumulated. A patch is available in https://github.com/NVIDIA/NeMo/pull/8587 and will be included in the next release.

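Workaround: Explicitly disable the fused batch size during inference with the following code:

from omegaconf import open_dict

model = ...  # an already-loaded RNNT model
decoding_cfg = model.cfg.decoding
# open_dict temporarily lifts struct mode so the key can be set even if it is absent
with open_dict(decoding_cfg):
    decoding_cfg.fused_batch_size = -1  # -1 disables the fused batch size
model.change_decoding_strategy(decoding_cfg)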

Note: This bug does not affect scores computed from model.transcribe() output (it returns only text and does not calculate metrics during inference), nor scores produced by the transcribe_speech.py or speech_to_text_eval.py scripts in examples/asr.
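
The accumulation problem described in this known issue can be illustrated with a small standalone sketch. This is not NeMo code: ToyWER, edit_distance, and wer_over_sub_batches are hypothetical names invented for the illustration, and the sample sentences are made up. It only shows why a metric that resets on every sub-batch reports the score of the last sub-batch instead of the score of the whole batch.

```python
from dataclasses import dataclass


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


@dataclass
class ToyWER:
    """Hypothetical stand-in for a WER metric (not NeMo's class)."""
    accumulate: bool = True
    errors: int = 0
    words: int = 0

    def update(self, refs, hyps):
        if not self.accumulate:
            # Buggy behavior: every sub-batch discards the previous state.
            self.errors, self.words = 0, 0
        for ref, hyp in zip(refs, hyps):
            ref_tokens, hyp_tokens = ref.split(), hyp.split()
            self.errors += edit_distance(ref_tokens, hyp_tokens)
            self.words += len(ref_tokens)

    def compute(self):
        return self.errors / max(self.words, 1)


def wer_over_sub_batches(sub_batches, accumulate):
    """Feed each (references, hypotheses) sub-batch to one metric instance."""
    metric = ToyWER(accumulate=accumulate)
    for refs, hyps in sub_batches:
        metric.update(refs, hyps)
    return metric.compute()


# One batch split into two sub-batches, as a fused batch size > 1 would do.
sub_batches = [
    (["the cat sat", "hello world"], ["the cat sat", "hello word"]),  # 1 error in 5 words
    (["good morning"], ["good morning"]),                             # 0 errors in 2 words
]

correct = wer_over_sub_batches(sub_batches, accumulate=True)
buggy = wer_over_sub_batches(sub_batches, accumulate=False)
print(f"accumulated WER:    {correct:.3f}")
print(f"last-sub-batch WER: {buggy:.3f}")
```

In this toy case the accumulated score is 1 error over 7 words, while the non-accumulating variant reports 0 because the only error sits in the first sub-batch. Disabling the fused batch size avoids the split into sub-batches altogether, which is why the workaround above sidesteps the problem.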

Two unit tests fail because a lhotse version update changed the expected results.

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:24.01.speech

Detailed Changelogs

ASR

Changelog

Update link to yaml file in ASR_with_Transducers.ipynb by @Faith-Nchifor :: PR: #8014
Use convert_hf_dataset_to_nemo by @karpnv :: PR: #8017
Update asr_language_modeling.rst: Add a missing word by @martin0258 :: PR: #8007
spelling mistake by @orena1 :: PR: #7903
update asr eval by @stevehuang52 :: PR: #8045
fix noise aug by @stevehuang52 :: PR: #8057
Various fixes for typos and urls by @titu1994 :: PR: #8066
[Fix] Increase length check tolerance to prevent test failing by @anteju :: PR: #8067
Add text metrics to asr eval by @stevehuang52 :: PR: #8087
fix device setting to allow using accelerator cpu by @orena1 :: PR: #8084
.ctm in data simulator annotator compliant with RT-09 specification by @popcornell :: PR: #8004
Fix AST eval by @stevehuang52 :: PR: #8112
fix: numba.*_num_threads resets torch num_threads #8141 by @itzsimpl :: PR: #8145
Update dependencies by @titu1994 :: PR: #8156
NeMo + Lhotse integration by @pzelasko :: PR: #7880
Speedup RNN-T greedy decoding by @artbataev :: PR: #7926
[docker] Install k2 before NeMo for faster image rebuilding by @pzelasko :: PR: #8204
[docs] Add --force_codec to tarred dataset creation examples by @pzelasko :: PR: #8227
Temporarily use the previous RNN-T decoding algorithm as default by @artbataev :: PR: #8226
Make TDT inference not require duration params by @hainan-xv :: PR: #8207
Cache Aware Streaming tutorial notebook by @erastorgueva-nv :: PR: #8296
fix path location and branch by @nithinraok :: PR: #8304
Attention encoder-decoder models for multiple speech-to-text tasks … by @titu1994 :: PR: #8324
Remove asr webapp by @titu1994 :: PR: #8347
remove target at model level in aed model config [ASR] by @krishnacpuvvada :: PR: #8351
Add change_vocabulary and save_tokenizers() support to Multitask ASR models by @titu1994 :: PR: #8357
Change default beam size by @titu1994 :: PR: #8371
adding jenkins test for speech_to_text_aed model by @krishnacpuvvada :: PR: #8368
Add Finetuning tutorial with HF Datasets by @nithinraok :: PR: #8356
wer fix by @tbartley94 :: PR: #8404
add ensemble decoding fix by @nithinraok :: PR: #8427
Update k2 by @artbataev :: PR: #8492

TTS

Changelog

[TTS] Scale sampler steps by number of devices by @rlangman :: PR: #7947
Add All Multimodal Source Code Part 2: Text to image, x to nerf by @yaoyu-33 :: PR: #7970
[TTS] Add period discriminator and feature matching loss to codec recipe by @rlangman :: PR: #7884
Added VectorQuantizer base class by @anteju :: PR: #8011

LLM

Changelog

Add interface to set NCCL options of each process group by @erhoo82 :: PR: #7923
Support O2 training of PEFT and SFT by @cuichenx :: PR: #7971
[NLP] Access scaler only in FP16 case by @janekl :: PR: #7916
[NLP] Minor improvements in Llama conversion script by @janekl :: PR: #7978
[NLP] Use helpers from utils_funcs.py in Llama conversion by @janekl :: PR: #7979
[NLP] Remove replace_sampler_ddp (deprecated in Trainer) by @janekl :: PR: #7981
Reworked MegatronPretrainingRandomBatchSampler to correctly handle epochs > 1 by @trias702 :: PR: #7920
Remove deprecated arguments from TE's TransformerLayer by @jbaczek :: PR: #7917
Add All Multimodal Source Code by @yaoyu-33 :: PR: #7791
First draft of mcore bert model in NeMo by @shanmugamr1992 :: PR: #7814
Support Falcon Variants (7B/40B/180B) in Mcore NeMo by @xuanzic :: PR: #7666
FSDP + Tensor Parallelism by @erhoo82 :: PR: #7897
Packed Sequence by @cuichenx :: PR: #7945
Adding method back that was removed accidentally by @ericharper :: PR: #8038
[NLP] ArtifactItem with init=True to make it debuggable by @janekl :: PR: #7980
SFT patch: (1) enable sequence parallelism and (2) enable profile by @erhoo82 :: PR: #7963
migration to PTL 2.0 for spellmapper model by @bene-ges :: PR: #7924
Change the megatron config lr scheduler default and fix to change partitions script by @shan18 :: PR: #8094
(1) Add SHARP interface to M-CORE, (2) use send/recv to send train loss to the first rank instead of b-cast by @erhoo82 :: PR: #7793
Reconfigure limit_val_batches only for int by @athitten :: PR: #8099
Fixing wrapper and moving it to base class by @shanmugamr1992 :: PR: #8055
fix gated_linear_unit bug by @Agoniii :: PR: #8042
Fix Adapter for MCore models by @cuichenx :: PR: #8124
add war fix for sync issues by @gshennvm :: PR: #8130
Improve PEFT UX by @cuichenx :: PR: #8131
Enhance flexibility by passing callbacks as method argument by @michal2409 :: PR: #8015
context parallelism by @xrennvidia :: PR: #7739
Make pipelined TP comm overlap available with mcore by @erhoo82 :: PR: #8005
remove deprecated scripts by @arendu :: PR: #8138
adding OnlineSampleMapping by @arendu :: PR: #8137
Add distopt support for FP8 params and BF16 optimizer state by @timmoon10 :: PR: #7909
Revert adding OnlineSampleMapping by @pablo-garay :: PR: #8164
Token count and sequence length logging for MegatronGPTSFTModel by @vysarge :: PR: #8136
Use latest apex internal API by @jbaczek :: PR: #8129
tune specific params in the base model by @arendu :: PR: #7745
Virtual pipeline parallel support for MegatronGPTSFTModel by @vysarge :: PR: #7964
removed deprecated peft model by @arendu :: PR: #8183
remove more deprecated files by @arendu :: PR: #8169
Pre-generate cu_seqlens argmin and max_seqlen to remove host-to-device sync by @erhoo82 :: PR: #8108
Add the interface to use SHARP to FSDP strategy by @erhoo82 :: PR: #8202
Multimodal required NLP base model changes by @yaoyu-33 :: PR: #8188
[NLP] Improve and unify loading state_dict for community models by @janekl :: PR: #7977
Rename Finetuning Scripts by @cuichenx :: PR: #8201
Final multimodal PR with our recent developments on MM side by @yaoyu-33 :: PR: #8127
Add include_text parameter to SFT dataloaders by @Kipok :: PR: #8198
Add random_seed argument to generate by @Kipok :: PR: #8162
Added support for neptune logger by @harishankar-gopalan :: PR: #8210
Pre-compute max_seqlen and cu_seqlens_argmin in all model-parallel cases by @erhoo82 :: PR: #8222
Use PackedSeqParams in accordance with changes in Megatron-LM by @cuichenx :: PR: #8205
Fix to peft & virtual pipeline parallel unsupported check by @vysarge :: PR: #8216
Fixed the tp overlap switch by @sanandaraj5597 :: PR: #8195
add knobs for rope/swiglu fusion by @lhb8125 :: PR: #8184
Added sample cpu_offloading switch to YAML by @sanandaraj5597 :: PR: #8148
Syncing random seed between ranks in generate by @Kipok :: PR: #8230
add first_val_step to mcore scheduler by @JimmyZhang12 :: PR: #8150
Correct padding for SFT input data to account for sequence parallel + TE's fp8 op dimension requirements by @vysarge :: PR: #8240
Mistral 7b conversion script by @akoumpa :: PR: #8052
switch to mcore dataset [with FIM support] by @dimapihtar :: PR: #8149
Mixtral to NeMo conversion script. by @akoumpa :: PR: #8155
fixes to accomendate mcore changes by @HuiyingLi :: PR: #8261
Allow MegatronPretrainingRandomSampler to do multi-epoch training by @trias702 :: PR: #8239
Add dist ckpt support for regular optimizers by @mikolajblaz :: PR: #7749
add deallocate pipeline output optimization by @JimmyZhang12 :: PR: #8279
Fix memory leak caused by context parallelism hanging references by omegaconf by @JimmyZhang12 :: PR: #8299
distributed fused adam + rampup bs support by @dimapihtar :: PR: #8302
Update PEFT Doc by @cuichenx :: PR: #8262
Converter script fixes for mixtral/mistral by @akoumpa :: PR: #8272
Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 by @erhoo82 :: PR: #8334
Enable megatron core loggers for GPT pretraining by @ashbhandare :: PR: #8354
mcore ds fix by @dimapihtar :: PR: #8283
release updates by @dimapihtar :: PR: #8378
Mcore customization doc by @HuiyingLi :: PR: #8298
updated link to pubmed by @nithinraok :: PR: #8402
mcore customization doc minor fix by @HuiyingLi :: PR: #8421
Fixing mcore bert for TP, PP and SP by @shanmugamr1992 :: PR: #8336
Add settings to suppress bf16 compile errors in CI on V100 by @athitten :: PR: #8481
MoE parameter passing by @akoumpa :: PR: #8255
Add fp8 support for SD/Update notebook paths by @Victor49152 :: PR: #8489

NeMo Tools

Changelog

SDE bugfix log by @Jorjeous :: PR: #8430

General Improvements

Changelog

Add news section to README by @ericharper :: PR: #7984
Fixing conversion script to work for code llama by @shanmugamr1992 :: PR: #7997
Fix crash when converting to mcore a model using rotary embeddings by @odelalleau :: PR: #7998
Added a procedure for Windows users, README by @Jorjeous :: PR: #7942
Update manifest.py to speedup loading tarred datasets by @stevehuang52 :: PR: #7900
[Fix] Fixed name of a test by @anteju :: PR: #7986
Fix lora merge script by @cuichenx :: PR: #8113
Support transcoding audio formats when saving tarred datasets (FLAC, OPUS) by @pzelasko :: PR: #8102
README edit to change Apple Silicon install instructions (to fix a break introduced by pytorch 2) by @stephenmcconnachie :: PR: #8122
Fixes NVIDIA/apex installation to not erroneously install the pkg by @terrykong :: PR: #8126
Graphviz fix by @GNroy :: PR: #7843
Update README.rst by @fayejf :: PR: #8154
Fix TP>1 issue for conversion script by @cuichenx :: PR: #8144
Support torch jit script by @artbataev :: PR: #8027
NeMo Multimodal Docs and Tests Initial PR by @yaoyu-33 :: PR: #8028
Remove left-over prints in NeMo+Lhotse code by @pzelasko :: PR: #8180
Upgrade to DLFW PyTorch 23.12 by @ericharper :: PR: #8163
Add Lhotse support for key in NeMo manifests by @pzelasko :: PR: #8197
Fix CPU Initialization and TP>1 for LoRA Merge Script by @cuichenx :: PR: #8199
Add support in Neural Typecheck to disable semantic checks by @titu1994 :: PR: #8212
Pin lhotse=1.19.2 in r1.23.0 by @pzelasko :: PR: #8303
Multimodal r1.23.0 bug fix by @yaoyu-33 :: PR: #8315
MCore dataset compatibility for tokenizers by @vysarge :: PR: #8390
Update NFA video download link by @erastorgueva-nv :: PR: #8406
Update MM Dataprep Tutorial by @cuichenx :: PR: #8410
Fix dreambooth data sampler issue by @yaoyu-33 :: PR: #8400
Fix a bug in CTM line processing function for multi-speaker data simulations by @tango4j :: PR: #8416
Akoumparouli/mistral bugfix by @akoumpa :: PR: #8353
pin to 0.5.0 by @ericharper :: PR: #8465
Update NeMo Multimodal Requirements by @yaoyu-33 :: PR: #8515
Fix link in multimodal dataprep tutorial by @cuichenx :: PR: #8517

v1.22.0

3 months ago

Highlights

Models

NeMo ASR

Multi-lookahead cache-aware streaming Conformer #6711
Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) by @burchim #7330
Speech enhancement tutorial #6492
Support punctuation error rate #7538

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:23.10

Detailed Changelogs

ASR

Changelog

Fix missing pip package 'einops' by @RobinDong :: PR: #7397
Fix failure of installing pyaudio in Online_Offline_Speech_Commands_Demo.ipynb by @RobinDong :: PR: #7396
[ASR] Confidence measure -> method renames by @GNroy :: PR: #7434
RNN-T confidence and alignment bugfix by @GNroy :: PR: #7381
Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) by @burchim :: PR: #7330
[TTS] Read audio as int32 to avoid flac read errors by @rlangman :: PR: #7477
Fix typos in confidence tutorial notebooks by @Kipok :: PR: #7581
Safeguard nemo_text_processing installation on ARM by @blisc :: PR: #7485
add fc large ls models by @nithinraok :: PR: #7641
[ASR] RNN-T greedy decoding max_frames fix for alignment and confidence by @GNroy :: PR: #7635
Create per.py by @ssh-meister :: PR: #7538
Update docs: readme, getting started, ASR intro by @erastorgueva-nv :: PR: #7679
[ASR] Multichannel mask estimator with flex number of channels by @anteju :: PR: #7317
Fix code block typo in docs by @erastorgueva-nv :: PR: #7717
Replace gpus with devices by @athitten :: PR: #7743
docs: fix typos by @shuoer86 :: PR: #7758
Snake act by @nithinraok :: PR: #7736
fix(clustering_diarizer.py): fix typo by @jqueguiner :: PR: #7772
Add some docs and update scripts for ASR by @titu1994 :: PR: #7790
remove TN from ctc_segm tut by @ekmb :: PR: #7807
Add support for finetuning with huggingface datasets by @stevehuang52 :: PR: #7834
Adding long-form audio speaker diarization (clustering) class and functions by @tango4j :: PR: #7737
Fix k2 installation: update for latest PyTorch, move script to dir by @artbataev :: PR: #7887
[ASR] GSS-based mask estimator by @anteju :: PR: #7849
add Dutch P&C FC model info by @zhehuaichen :: PR: #7892
Add checks for unit tests that are looking for data from CI machine by @ericharper :: PR: #7943
update branch name by @nithinraok :: PR: #7990
fix librosa display issue by @nithinraok :: PR: #7991
Fixes Notebooks for ASR by @titu1994 :: PR: #7994
cherry pick bug 4405781 by @karpnv :: PR: #8044
fix noise augmentation by @stevehuang52 :: PR: #8056
Fix various issues with broken links and bugs by @titu1994 :: PR: #8064
run with non-dev option by @nithinraok :: PR: #8077
update broken links by @nithinraok :: PR: #8079
langid bug fix by @karpnv :: PR: #8134

TTS

Changelog

Add steps for document of getting dataset 'SF Bilingual Speech' by @RobinDong :: PR: #7378
Fix checking of cuda/cpu device for inputs of Decoder by @RobinDong :: PR: #7444
Fix failure of ljspeech's get_data.py by @RobinDong :: PR: #7430
[TTS] Fix audio codec type checks by @rlangman :: PR: #7373
[TTS] Add dataset to path of logged artifacts by @rlangman :: PR: #7462
Fix adding positional embeddings in-place in FFTransformerDecoder by @The0nix :: PR: #7440
Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS by @RobinDong :: PR: #7409
[TTS] Fix FastPitch data prep tutorial by @rlangman :: PR: #7524
add italian tokenization by @GiacomoLeoneMaria :: PR: #7486
Remap speakers to continuous range of speaker_id for dataset AISHELL3 by @RobinDong :: PR: #7536
add ItalianPhonemesTokenizer by @GiacomoLeoneMaria :: PR: #7587
[TTS] Add STFT and SI-SDR loss to audio codec recipe by @rlangman :: PR: #7468
Fix typo in audio codec config, encoder target by @anteju :: PR: #7697
Group-residual vector quantizer by @anteju :: PR: #7643
French g2p with pronunciation dictionary by @mgrafu :: PR: #7601
add pleasefixme marker for potential failed nightly tests. by @XuesongYang :: PR: #7678
Add new text segmentation library for better TTS quality by @RobinDong :: PR: #7645
ConditionalInput: cat along the feature dim, not the batch dim by @anferico :: PR: #7785
Add selection criteria for reference audios in the submodule by @anferico :: PR: #7788
[Codec] Update codec checkpoint config by @anteju :: PR: #7835
[Codec] Finite scalar quantizer by @anteju :: PR: #7886
Tar codec by @nithinraok :: PR: #7867

LLM

Changelog

Allow disabling sanity checking when num_sanity_val_steps=0 by @athitten :: PR: #7413
Add comprehensive error messages by @PeganovAnton :: PR: #7261
layer selection for ia3 by @arendu :: PR: #7417
Add rope dynamic linear scaling by @hsiehjackson :: PR: #7437
Fix sft dataset truncation by @hsiehjackson :: PR: #7464
fix bug when loading dist ckpt in peft by @lhb8125 :: PR: #7452
Fix sft chat dataset truncation by @hsiehjackson :: PR: #7478
SFT model parallel fix for dist ckpt by @aklife97 :: PR: #7511
remove auto generated examples by @arendu :: PR: #7510
Add the argument to by @odelalleau :: PR: #7264
PEFT GPT & T5 Refactor by @meatybobby :: PR: #7308
fix a typo by @BestJuly :: PR: #7496
StarCoder SFT test + bump PyT NGC image to 23.09 by @janekl :: PR: #7540
fix llama2 70b lora tuning bug by @cuichenx :: PR: #7622
generalized chat sft prompt by @yidong72 :: PR: #7655
Set base frequency from config by @shan18 :: PR: #7734
Megatron LLM documentation updates by @ssh-meister :: PR: #7400
Remove incorrect extra argument of load_from_checkpoint_dir() by @RobinDong :: PR: #7500
Add nemo to mcore GPT conversion script by @cuichenx :: PR: #7730
set context for text memmap to fork by @arendu :: PR: #7784
Support flash decoding by @hsiehjackson :: PR: #7744
update text server to support compute logprobs by @Zhilin123 :: PR: #7733
Revert PEFT eval fix by @ericharper :: PR: #7693
Fix tn duplex by @ekmb :: PR: #7808
Multimodal merge by @yaoyu-33 :: PR: #7728
Fix flash decoding precision by @hsiehjackson :: PR: #7852
Removing duplicate Megatron-LM installation by @Davood-M :: PR: #7864
adding special_tokens from tokenizer config for transformer-lm model by @clumsy :: PR: #7613
Add Adapter and IA3 support for MCore models by @cuichenx :: PR: #7750
Add back import guard by @cuichenx :: PR: #7882
Change FP8 Defaults by @cuichenx :: PR: #7894
Added knob for ub_tp_comm_overlap for the MCORE pass by @sanandaraj5597 :: PR: #7902
Upgrade NeMo to latest mcore and TE by @dimapihtar :: PR: #7862
Pad sequences to multiples of 16 for GPTSFTDataset by @vysarge :: PR: #7904
upgrade to latest mcore and TE by @dimapihtar :: PR: #7908
added missing torch import by @Davood-M :: PR: #7913
Fix CPU initialization of GPT models by @cuichenx :: PR: #7889
Fix pinned triton version by @hsiehjackson :: PR: #7925
fix tp_overlap config var name by @xrennvidia :: PR: #7928
only enable query key scaling during fp16 by @gshennvm :: PR: #7946
Fix for gpt3 eval hang with PP (a dtype issue) by @yaoyu-33 :: PR: #7927
Pass in rotary_base to mcore and from HF by @Kipok :: PR: #7933
Use NLPDDPStrategyNotebook in Multitask_Prompt_and_PTuning.ipynb by @athitten :: PR: #8061

General Improvements

Changelog

Add fix for max time to quit trainer gracefully, without running validation by @SeanNaren :: PR: #7731
SDE Tutorial minor fix by @Jorjeous :: PR: #7598
Temporary pin Lightning-Utilities version due to broken NamedTuple by @artbataev :: PR: #8022
Karpnv/issue 7320 by @karpnv :: PR: #7418
Speech Simulator, update README.md: output_path --> output_manifest_filepath by @popcornell :: PR: #7442
Fix None dataloader issue in PTL2.0 by @KunalDhawan :: PR: #7455
HF StarCoder to NeMo conversion script by @janekl :: PR: #7421
[doc] fix broken link by @stas00 :: PR: #7481
dllogger - log on rank 0 only by @stas00 :: PR: #7513
Add two youtube introductory videos to README and Docs. by @XuesongYang :: PR: #7570
defaults changed by @arendu :: PR: #7600
Bound transformers version in requirements by @athitten :: PR: #7620
Fix import error no module name model_utils by @menon92 :: PR: #7629
Fix in the confidence ensemble test by @Kipok :: PR: #7682
move core install to /workspace by @aklife97 :: PR: #7706
distributed checkpoint average script by @yidong72 :: PR: #7721
fix hybrid eval by @karpnv :: PR: #7757
fix(diarization-README): typo by @jqueguiner :: PR: #7771
Configure MCore logger by @mikolajblaz :: PR: #7781
Nemo to HF converter for LLaMA model by @uppalutkarsh :: PR: #7770
[Fix] Save best NeMo model only when necessary by @anteju :: PR: #7836
add guard if its a distributed checkpoint by @gshennvm :: PR: #7845
Update transformers cache on Jenkins by @ericharper :: PR: #7854
Update README.rst for container update by @fayejf :: PR: #7844
Fix mcore conversion bug by @cuichenx :: PR: #7846
add comment on script and fix target check by @gshennvm :: PR: #7881
fix issues with convert_nemo_llama_to_hf.py by @Zhilin123 :: PR: #7922
Instructions for running ci on pr template by @ericharper :: PR: #7944
Distributed checkpoint averaging supports bf16 type by @yidong72 :: PR: #7888
Fix tokenizer argparse in scripts by @titu1994 :: PR: #8012
Check dependencies in installation script by @artbataev :: PR: #8019
[SE Tutorial] USe GPU for inference, when available by @anteju :: PR: #8048
update reqs by @ericharper :: PR: #8072
Remove typo by @ericharper :: PR: #8146

v1.21.0

6 months ago

Highlights

Models

NeMo ASR

Multi-lookahead cache-aware streaming
Speech enhancement tutorial #6492
Online code switching dataset #6579

NeMo TTS

AudioCodec: Training recipe for EnCodec #6852

NeMo Framework

GPT from Mcore #7093
GPT distributed checkpointing #7116
Hidden transformations #6332
Llama 2 #7299

NeMo Core

Update to PTL 2.0 #6433

NeMo Tools

Forced aligner tutorial #7210

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:23.08

ASR

Changelog

Fix require_grad typos by @kit1980 :: PR: #6930
rnnt_greedy_decoding.py: typos? auto-repressively -> auto-regressively by @vadimkantorov :: PR: #6989
Adding tutorial for confidence ensembles by @Kipok :: PR: #6932
Add support for Numba FP16 RNNT Loss by @titu1994 :: PR: #6991
fix install_beamsearch_decoders by @karpnv :: PR: #7011
rnnt and char utils by @karpnv :: PR: #6971
ASR Confidence update and tutorial by @GNroy :: PR: #6810
st standalone model by @AlexGrinch :: PR: #6969
Fix typo in ASR-TTS tutorial by @artbataev :: PR: #7049
Update Frame-VAD doc and fix onnx export by @stevehuang52 :: PR: #7076
Fast Conformer global token fix by @sam1373 :: PR: #7085
Added script to extract ASR CTC and RNNT models from ASR hybrid models by @trias702 :: PR: #7092
Fix absolute path in path join call by @kingjan1999 :: PR: #7099
NeMo ASR Demo by @lleaver :: PR: #7110
Fix plot function in vad_utils.py by @stevehuang52 :: PR: #7113
Fixed small bug with NoisePerturbationWithNormalization by @trias702 :: PR: #7118
Merge release r1.20.0 to main by @ericharper :: PR: #7167
minor fix for conformer subsampling docstring. by @XuesongYang :: PR: #7195
[ASR] Fix GPU memory leak in transcribe_speech.py by @rlangman :: PR: #7249
Adding Multilingual, Code-Switched, and Hybrid ASR models by @KunalDhawan :: PR: #7250
fix partial transcribe by @stevehuang52 :: PR: #7284
Conv1d subsampling by @burchim :: PR: #7294
add bf16 inference support and fix seq_len stft issue by @nithinraok :: PR: #7338
Add finetuning scripts by @nithinraok :: PR: #7263
Move parameter: trainer -> exp_manager (for PTL 2.0) by @artbataev :: PR: #7339
Fix typos by @omahs :: PR: #7361
Fix wrong calling of librosa.get_duration() in notebook by @RobinDong :: PR: #7376
RNN-T confidence and alignment bugfix (#7381) by @GNroy :: PR: #7459
update branch by @nithinraok :: PR: #7488
Replace strategy = None with strategy = auto for notebooks by @athitten :: PR: #7521
Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue by @KunalDhawan :: PR: #7531
gpus -> devices by @nithinraok :: PR: #7542
[BugFix] Add missing quotes for auto strategy in tutorial notebooks by @athitten :: PR: #7541
Append output of val_step to self.validation_step_outputs in EncMaskDecAudioToAudioModel by @athitten :: PR: #7543
fix validation_step_outputs initialization for multi-dataloader by @KunalDhawan :: PR: #7546
Append val/test output to instance variable in EncDecSpeakerLabelModel by @athitten :: PR: #7562
update strategy by @nithinraok :: PR: #7577
Typo fixes by @Kipok :: PR: #7591
Fix metrics for SE tutorial by @anteju :: PR: #7604
fix ssl models ptl monitor val through logging by @nithinraok :: PR: #7608
Fix py3.11 dataclasses issue by @titu1994 :: PR: #7582
bugfix: trainer.gpus, trainer.strategy, trainer.accelerator by @XuesongYang :: PR: #7621
Safeguard nemo_text_processing installation on ARM (#7485) by @blisc :: PR: #7619
[ASR] Fix type error in jasper by @rlangman :: PR: #7636
Fix vad & speech command tutorial - onnx by @fayejf :: PR: #7671
Replace strategy='dp'/None with 'auto' by @athitten :: PR: #7681
Fix multi rank finetune for ASR by @titu1994 :: PR: #7684
fix ptl_bugs in slu_models.py by @jzi040941 :: PR: #7689
Add NLPDDPStrategyNotebook and change trainer gpus to devices by @athitten :: PR: #7741
Updated installation of ctc-decoders by @vsl9 :: PR: #7746
Fix bug wrt change decoding strategy for bpe models by @titu1994 :: PR: #7762

TTS

Changelog

[TTS] Add cosine distance option to TTS aligner by @rlangman :: PR: #6806
[TTS] Add tutorial for TTS data prep scripts by @rlangman :: PR: #6922
update TTS readme by @XuesongYang :: PR: #7088
[TTS] Create EnCodec training recipe by @rlangman :: PR: #6852
[TTS][ZH] add Chinese TTS recipes based on IPA symbol sets. by @XuesongYang :: PR: #6893
[TTS] Add output audio format to preprocessing by @rlangman :: PR: #6889
[TTS] Remove nested TTS configs by @rlangman :: PR: #7154
[TTS] Fix TTS recipes with PTL 2.0 by @rlangman :: PR: #7188
[TTS] Add license to ported EnCodec code by @rlangman :: PR: #7197
[Fix] Discriminator update in AudioCodecModel by @anteju :: PR: #7209
Adapter ipa Tutorial and config update by @styagi130 :: PR: #7260
[TTS] Audio codec fixes by @rlangman :: PR: #7266
[TTS] minor fix typos and input_types by @XuesongYang :: PR: #7272
specify explicitly to set pretrained model paths by @styagi130 :: PR: #7305
[TTS] Update AudioCodec API by @anteju :: PR: #7310
[TTS] Add additional config to preprocess_text and compute_feature_stats by @rlangman :: PR: #7321
[TTS] Change audio codec token type to TokenIndex by @rlangman :: PR: #7356
fixed trainer.strategy=auto from None. by @XuesongYang :: PR: #7369
[TTS] Added a callback for logging initial data by @anteju :: PR: #7384
[TTS] bugfix: trainer.accelerator=auto from None. by @XuesongYang :: PR: #7492
bugfix: specify trainer.strategy=auto when devices=1 by @XuesongYang :: PR: #7509
Fix dimensionality in get_dist function by @redoctopus :: PR: #7506
Fix TTS FastPitch tutorial by @hsiehjackson :: PR: #7494
[TTS] remove curly braces from in jupyer notebook cell. by @XuesongYang :: PR: #7554
[TTS] fixed trainer's accelerator and strategy. by @XuesongYang :: PR: #7569
Change hifigan finetune strategy to ddp_find_unused_parameters_true by @hsiehjackson :: PR: #7579
Fix validation in G2PModel and ThutmoseTaggerModel by @athitten :: PR: #7597
[TTS] Fix FastPitch data prep tutorial by @rlangman :: PR: #7602
[TTS] Add dataset to path of logged artifacts by @rlangman :: PR: #7651

NLP / NMT

Changelog

Minor MPT-7B fixes and creation script update by @trias702 :: PR: #6982
remove hard coded input and output fields by @arendu :: PR: #7008
RoPE length extrapolation with interpolation by @MaximumEntropy :: PR: #7005
add async + distopt to sft by @MaximumEntropy :: PR: #7018
ptuning inference table bug fix by @arendu :: PR: #7015
Fix missing import for GPT SFT by @MaximumEntropy :: PR: #7026
Add end_strings to SamplingParams by @markelsanz14 :: PR: #6986
Fix race condition for downloading cache when executing with multi-node by @findkim :: PR: #7016
added back the retro documents. by @yidong72 :: PR: #7033
remove pos emb from state dict for old models by @ekmb :: PR: #7068
memmap worker arg by @arendu :: PR: #7062
Disable distopt contiguous param buffer by default by @timmoon10 :: PR: #7095
[Fix] load_state_dict in nlp_model.py by @stevehuang52 :: PR: #7086
Fix tokenizer file caching where torch.distributed may not be initialized yet by @findkim :: PR: #7061
freeze base mode on init during peft by @arendu :: PR: #7152
Include the scripts for preprocessing OAST and unit tests for chat sft datasets by @yidong72 :: PR: #7112
T5 metrics fix by @jubick1337 :: PR: #7037
megatron gpt training fix by @anmolgupt :: PR: #7199
Fix T5 using FA by @hsiehjackson :: PR: #7196
fix-causal-fa-infer by @hsiehjackson :: PR: #7200
Fix gpt trainer test by @hsiehjackson :: PR: #6915
Load ub_cfg from hydra config by @jbaczek :: PR: #7003
Fixes for lightning 2.0 upgrade by @athitten :: PR: #7176
Fix which was off by one batch by @odelalleau :: PR: #7212
Start using ModelParallelConfig from Megatron Core by @ericharper :: PR: #6885
deprecation warning by @arendu :: PR: #7193
Fix attention mask inference by @hsiehjackson :: PR: #7213
Use GPTModel from mcore by @ericharper :: PR: #7093
Add bf16-mixed and 16-mixed in module.py by @athitten :: PR: #7227
Refactor LLM pretraining examples by @maanug-nv :: PR: #7159
Add only trainable parameters to optimizer group in PEFT by @guyueh1 :: PR: #7230
Dummy class for ModelParallelConfig by @ericharper :: PR: #7254
[TN][Docs] update language coverage matrix and refs by @mgrafu :: PR: #7247
tied weights for adapters by @arendu :: PR: #6928
Fix skip generation by @hsiehjackson :: PR: #7270
Hidden transforms model parallel config + CI with Perceiver by @michalivne :: PR: #7241
Fix restore sequence parallel by @hsiehjackson :: PR: #7273
fix ptuning and lora model_parallel_config by @blahBlahhhJ :: PR: #7287
Fix adapters and ptuning for amp O2 by @guyueh1 :: PR: #7285
remove additional line in peft state dict by @blahBlahhhJ :: PR: #7293
loss mask aware final layer applicaiton by @arendu :: PR: #7275
Adding server option to peft eval by @Davood-M :: PR: #7292
migrated class CSVFieldsMemmapDataset from BioNeMo by @dorotat-nv :: PR: #7314
remove old prompt table for storing cached ptunig representations by @arendu :: PR: #7295
Bugfix and optimization in by @odelalleau :: PR: #7267
Set a default value when getting by @yaox12 :: PR: #7115
Distributed checkpointing with mcore GPT by @ericharper :: PR: #7116
Fix activation checkpoint by @hsiehjackson :: PR: #7334
Replace prefetch with val iterator check in megatron models by @athitten :: PR: #7318
Fixing indentation bug in indexed_dataset memory deallocation by @michalivne :: PR: #7352
NeMo MCore llama2 support + MCore PEFT adapters by @blahBlahhhJ :: PR: #7299
Hiddens modules documentation by @michalivne :: PR: #7303
Support for flash attention 2.0 by @MaximumEntropy :: PR: #7063
multiple fields can form a context by @arendu :: PR: #7147
adding bias_dropout_add_fusion option for BERT by @clumsy :: PR: #7332
enable selective unfreeze by @arendu :: PR: #7326
Upgrade pytorch container to 23.08 by @ericharper :: PR: #7353
enable fp32 optimizer for output_layer in mcore by @lhb8125 :: PR: #7355
Revert comment by @ericharper :: PR: #7368
fix pipeline parallel inference by @blahBlahhhJ :: PR: #7367
fix for peft tied weights by @arendu :: PR: #7372
add O2 option in gpt eval by @blahBlahhhJ :: PR: #7358
Move model precision copy by @maanug-nv :: PR: #7336
Fix PEFT checkpoint loading by @blahBlahhhJ :: PR: #7388
Use distributed optimizer support for multiple dtypes by @timmoon10 :: PR: #7359
[PATCH] PEFT import mcore by @blahBlahhhJ :: PR: #7393
Use cfg attribute in bert by @maanug-nv :: PR: #7394
Add support for bias conversion in Swiglu models by @titu1994 :: PR: #7386
Update save_to and restore_from for dist checkpointing by @ericharper :: PR: #7343
fix forward for with mcore=false by @JimmyZhang12 :: PR: #7403
Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing by @athitten :: PR: #7374
Set Activation Checkpointing Defaults by @aklife97 :: PR: #7404
Make loss mask default to false by @ericharper :: PR: #7407
Add dummy userbuffer config files by @erhoo82 :: PR: #7408
Add missing ubconf files by @aklife97 :: PR: #7412
Update ptl training ckpt conversion script to work with dist ckpt by @ericharper :: PR: #7416
Add strategy as ddp_find_unused_parameters_true for glue_benchmark.py by @athitten :: PR: #7454
fix bug when loading dist ckpt in peft by @lhb8125 :: PR: #7479
Fix CustomProgressBar for resume by @athitten :: PR: #7427
Append val output to self.validation_step_outputs in GLUEModel by @athitten :: PR: #7530
Cherry pick Fix sft dataset truncation (#7464) to r1.21.0 by @ericharper :: PR: #7550
Avoid duplicated dist checkpoint save by @mikolajblaz :: PR: #7555
layernorm1p fix by @dimapihtar :: PR: #7523
r1.21: SFT model parallel fix for dist ckpt by @aklife97 :: PR: #7520
PEFT needs mp config propagated for dist ckpt by @ericharper :: PR: #7589
Fix ptuning crash for llama 2 ckpt by @yuanzhedong :: PR: #7594
PEFT eval fix by @cuichenx :: PR: #7626
Propagate mp config for continue training by @ericharper :: PR: #7637
Add ddp_find_unused_parameters=True and change accelerator to auto by @athitten :: PR: #7623
Add find_unused_parameters_true for text_classiftn and punctuation_capitalization by @athitten :: PR: #7649
conversion issue fix by @dimapihtar :: PR: #7648
Fix a nlp nb onnx by @fayejf :: PR: #7703
Add activations_checkpoint related args for model cfg in lora.ipynb by @athitten :: PR: #7752
Change accelerator to 'auto' in nlp_checkpoint_port.py by @athitten :: PR: #7747
Add reconfigure microbatch calculator before inference and update GBS, MBS for inference by @athitten :: PR: #7763
Create PrecisionPlugin for megatron_ckpt_to_nemo.py trainer by @athitten :: PR: #7767

NeMo Tools

Changelog

Update doc, new tutorial on SDE by @Jorjeous :: PR: #7405
Fix branch version for SDE by @titu1994 :: PR: #7528

Export

Changelog

Added bool types to neural_types export by @tbartley94 :: PR: #7032

General Improvements

Changelog

Add migration guide for lightning 2.0 upgrade by @athitten :: PR: #7360
add support for max_total_length=4096 for 43b by @Zhilin123 :: PR: #6763
Change Jenkins timeout by @ericharper :: PR: #6997
Update SDP docs page with a new documentation link by @Kipok :: PR: #7029
Fixed tutorial's name by @vsl9 :: PR: #7047
Revert Fix import guard checks by @titu1994 :: PR: #7125
Fix import guard checks by @titu1994 :: PR: #7126
fix evaluator.py for various exceptions by ast by @stevehuang52 :: PR: #7150
NFA bugfix: remove any empty segments by @erastorgueva-nv :: PR: #7155
NFA subtitle file config - specify colors and vertical alignment by @erastorgueva-nv :: PR: #7160
add paths to labeler. by @XuesongYang :: PR: #7087
[Bugfix] Fix a bug in filtering checkpoints by @yaox12 :: PR: #6851
Update README.rst by @fayejf :: PR: #7175
Make NFA subtitles stay until end of video by @erastorgueva-nv :: PR: #7189
Uncomment removal of exp_dir in JenkinsFile by @athitten :: PR: #7198
NFA: replace ellipses in text with 3 periods by @erastorgueva-nv :: PR: #7208
NFA tutorial notebook by @erastorgueva-nv :: PR: #7210
NFA docs: update READMEs and links, add docs page by @erastorgueva-nv :: PR: #7219
Make image centering in NFA README actually work by @erastorgueva-nv :: PR: #7220
Add mcore installation to Dockerfile by @ericharper :: PR: #7237
Checkpoint averaging for model parallel by @Kipok :: PR: #7252
Upgrade hydra and omegaconf by @athitten :: PR: #7243
Update numba support in docker by @titu1994 :: PR: #7271
remove deprecated scripts from ci by @arendu :: PR: #7239
Logging model checkpoints as artifacts in MlFlow by @AlirezaMorsali :: PR: #7258
Adithyare/peft metric calculation by @arendu :: PR: #7304
Resume checkpoint priority by @maanug-nv :: PR: #7335
lora merge fix for O2 names by @arendu :: PR: #7325
Llama load buffers in checkpoint by @blahBlahhhJ :: PR: #7357
pin numba=0.57.1 to fix reinstall.sh error by @XuesongYang :: PR: #7366
Update to core 23.08 branch ToT by @aklife97 :: PR: #7371
Upper bounding ptl by @ericharper :: PR: #7370
minor fix for llama ckpt conversion script by @blahBlahhhJ :: PR: #7387
Update Core Commit by @aklife97 :: PR: #7402
Fix resume from checkpoint in exp_manager by @athitten :: PR: #7424
add sleep by @gshennvm :: PR: #7498
Fix exp manager check for sleep by @titu1994 :: PR: #7503
unpin setuptools by @fayejf :: PR: #7534
Update FFMPEG version to fix issue with torchaudio by @titu1994 :: PR: #7551
fix typos in nfa and speech enhancement tutorials by @erastorgueva-nv :: PR: #7580
best ckpt fix by @dimapihtar :: PR: #7564
add build os key by @nithinraok :: PR: #7596
Fix issues with Dockerfile by @titu1994 :: PR: #7650
Change confidence parameters in the test by @Kipok :: PR: #7680
bugfix: pin nemo-text-process to fix Chinese normalizer error. by @XuesongYang :: PR: #7627
Remove PUBLICATIONS.md, point to github.io NeMo page instead by @erastorgueva-nv :: PR: #7694
Pin mcore to 0.3 by @ericharper :: PR: #7751
fix hybrid eval by @karpnv :: PR: #7759
Update Apex install command in Dockerfile by @ericharper :: PR: #7794

v1.20.0

9 months ago

Highlights

Models

STT En Fast Conformer CTC XXLarge - 1.2 B param Fast Conformer CTC
STT En Fast Conformer Transducer XXLarge - 1.2 B param Fast Conformer Transducer
STT En Fast Conformer Transducer XLarge - XLarge Fast Conformer English
STT En Fast Conformer CTC XLarge - XLarge Fast Conformer CTC
STT En Fast Conformer Transducer XLarge - XLarge Fast Conformer Transducer
STT En Fast Conformer CTC Large - Large Fast Conformer CTC
STT En Fast Conformer Transducer Large - Large Fast Conformer Transducer
STT It Fast Conformer Hybrid Large P&C - Large P&C Italian Fast Conformer
STT Ua Fast Conformer Hybrid Large P&C - Large Ukrainian Fast Conformer

NeMo ASR

Graph-RNN-T #6168
WildCard-RNN-T #6168
Confidence Ensembles for ASR
Token-and-Duration Transducer (TDT) #6536
Spellchecking ASR #6179
Numba FP16 RNNT Loss #6991

NeMo TTS

TTS Adapter Customization
TTS Dataloader Framework

NeMo Framework

LoRA for T5 and mT5 #6612
Flash Attention integration #6666
Mosaic 7B compatibility
Models with LongContext (32K) #6666, #6687, #6773

NeMo Tools

Speech Data Explorer: Utterance level ASR model comparison #6669
Speech Data Processor: Spanish P&C
NeMo Forced Aligner: Large sequence alignment + memory reduction #6695

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:23.06

Detailed Changelogs

ASR

Changelog

[ASR] Adding ssl config for fast-conformer by @krishnacpuvvada :: PR: #6672
Fix for interctc test random failure by @Kipok :: PR: #6644
sharded manifests docs by @bmwshop :: PR: #6751
[TTS] Implement new vocoder dataset by @rlangman :: PR: #6670
TDT model pull request by @hainan-xv :: PR: #6536
Spec aug fix by @tbartley94 :: PR: #6775
Support large inputs to Conformer and Fast Conformer by @bmwshop :: PR: #6556
sharded manifests updated docs by @bmwshop :: PR: #6833
added fc-xl, xxl and titanet-s models by @nithinraok :: PR: #6832
Multi-lookahead cache-aware streaming models by @VahidooX :: PR: #6711
Update transcribe_utils.py by @stevehuang52 :: PR: #6865
Fix k2 build topo helper by @artbataev :: PR: #6887
Fix transcribe_utils.py for hybrid models in partial transcribe mode by @stevehuang52 :: PR: #6899
Add hybrid model support to transcribe_speech_parallel.py by @stevehuang52 :: PR: #6906
Update Frame-VAD doc by @stevehuang52 :: PR: #6902
Make sure asr_model.change_attention_model is run if either cfg.model_path or cfg.pretrained_name is specified by @erastorgueva-nv :: PR: #6908
Update fvad doc by @stevehuang52 :: PR: #6920
Online Code Switching Dataset for ASR by @trias702 :: PR: #6579
Fix AN4 dataset links by @artbataev :: PR: #6926
Fix confidence ensembles RNNT logprobs selection logic for exclude_blank scenario by @KunalDhawan :: PR: #6937
Adding cache-aware streaming ASR checkpoints. by @VahidooX :: PR: #6940
Remove from metrics by @titu1994 :: PR: #6979
Hybrid conformer export by @borisfom :: PR: #6983
Cache handling without input tensors mutation by @borisfom :: PR: #6980
Fixing an issue with confidence ensembles by @Kipok :: PR: #6987
Add ASR with TTS Tutorial. Fix enhancer usage. by @artbataev :: PR: #6955
fix install_beamsearch_decoders.sh by @karpnv :: PR: #7019
Add support for Numba FP16 RNNT Loss (#6991) by @titu1994 :: PR: #7038
Fix typo and branch in tutorial by @artbataev :: PR: #7048
Refined export_config by @borisfom :: PR: #7053
Fix documentation for Numba by @titu1994 :: PR: #7065
Adding docs and models for multiple lookahead cache-aware ASR by @VahidooX :: PR: #7067
Add updated fc ctc and rnnt xxl models by @nithinraok :: PR: #7128
Update notebook branch by @ericharper :: PR: #7135
Fixed main and merging this to r1.20 by @tango4j :: PR: #7127
Fix default context size by @nithinraok :: PR: #7141
Fix incorrect embedding grads with distopt BF16 grad reductions by @timmoon10 :: PR: #6958

TTS

Changelog

[TTS] Add callback for saving audio during FastPitch training by @rlangman :: PR: #6665
[TTS] Add script for text preprocessing by @rlangman :: PR: #6541
[TTS] Fix adapter duration issue by @hsiehjackson :: PR: #6697
[TTS] Filter out silent audio files during preprocessing by @rlangman :: PR: #6716
[TTS] fix inconsistent type hints for IpaG2p by @XuesongYang :: PR: #6733
[TTS] relax hardcoded prefix for phonemes and tones and infer phoneme set through dict by @XuesongYang :: PR: #6735
[TTS] corrected misleading deprecation warnings. by @XuesongYang :: PR: #6702
Fix TTS adapter tutorial by @hsiehjackson :: PR: #6741
[TTS][zh] refine hardcoded lowercase for ASCII letters. by @XuesongYang :: PR: #6781
[TTS] Append pretrained FastPitch & SpectrogramEnhancer pair to available models by @racoiaws :: PR: #7012

NLP / NMT

Changelog

minor fix for missing chat attr by @arendu :: PR: #6671
eval fix by @arendu :: PR: #6685
VP Fixes for converter + Config management by @titu1994 :: PR: #6698
lora notebook by @arendu :: PR: #6765
peft eval directly from ckpt by @arendu :: PR: #6785
GPT inference long context by @ekmb :: PR: #6687
Fix validation with drop_last=False by @mikolajblaz :: PR: #6704
fix spellmapper tutorial, change branch to main by @bene-ges :: PR: #6803
text_generation_utils memory reduction if no logprob needed by @yzhang123 :: PR: #6773
Add optional index mapping dir in mmap text datasets by @gheinrich :: PR: #6683
Add inference kv cache support for transformer TE path by @yen-shi :: PR: #6627
add reference to our paper by @bene-ges :: PR: #6821
added changes to ramp up bs by @dimapihtar :: PR: #6799
t5 lora tuning by @arendu :: PR: #6612
Added rouge monitoring support for T5 by @jubick1337 :: PR: #6737
GPT extrapolatable position embedding (xpos/sandwich/alibi/kerple) and Flash Attention by @hsiehjackson :: PR: #6666
Import Enum for chatbot component by @ericharper :: PR: #6877
typo fix from #6666 by @arendu :: PR: #6882
removed unnecessary print by @dimapihtar :: PR: #6884
Fix destructor for delayed mmap dataset case by @mikolajblaz :: PR: #6703
Make Gradio library optional by @yidong72 :: PR: #6904
Fix fast-glu activation in change partitions by @hsiehjackson :: PR: #6909
Documentation for ONNX export of Megatron Models by @asfiyab-nvidia :: PR: #6914
Fix TextMemMapDataset index file creation in multi-node setup by @gheinrich :: PR: #6768
Fix flash-attention by @hsiehjackson :: PR: #6901
ptuning oom fix by @arendu :: PR: #6916
add rampup bs assertion by @dimapihtar :: PR: #6927
Enable methods in bert-like models by @sararb :: PR: #6898
support value attribution condition by @yidong72 :: PR: #6934
Add missing save restore connector to eval scripts by @titu1994 :: PR: #6935
Merge release r1.19.0 into main by @ericharper :: PR: #6948
Stop at the stop token by @yidong72 :: PR: #6957
fixes for spellmapper by @bene-ges :: PR: #6994
Fix tabular data text generation by @yidong72 :: PR: #7022
fix pos id - hf update by @ekmb :: PR: #7075
fix syntax error introduced in PR-7079 by @bene-ges :: PR: #7102

NeMo Tools

Changelog

SDE unt lvl comparison by @Jorjeous :: PR: #6669
hot fix SDE by @Jorjeous :: PR: #6897

Bugfixes

Changelog

small Bugfix by @fayejf :: PR: #7079
Fix caching bug in causal convolutions for cache-aware ASR models by @VahidooX :: PR: #7034
Fix masking bug for TTS Aligner by @redoctopus :: PR: #6677
[bugfix] avoid the random shuffle of phoneme and tone tokens. by @XuesongYang :: PR: #6855
fix ptuning residuals bug by @arendu :: PR: #6866
TE bug fix by @dimapihtar :: PR: #7027
Update distopt API for coalesced NCCL calls by @timmoon10 :: PR: #6886

General Improvements

Changelog

update batch size recommendation to min 32 for 43b by @Zhilin123 :: PR: #6675
Make Note usage consistent in adapter_mixins.py by @BrianMcBrayer :: PR: #6678
Update all invalid tree references to blobs for NeMo samples by @BrianMcBrayer :: PR: #6679
Update README.rst about container by @fayejf :: PR: #6686
karpnv/issues6690 by @karpnv :: PR: #6705
Limit codeql scope by @titu1994 :: PR: #6710
Not pinning Gradio version by @yidong72 :: PR: #6680
preprocess squad in sft format by @arendu :: PR: #6727
Fix Codeql config by @titu1994 :: PR: #6731
Fix fastpitch test nightly by @hsiehjackson :: PR: #6730
Lora/PEFT training script CI test by @arendu :: PR: #6664
fixed decor to show messages only when the wrapped object is called. by @XuesongYang :: PR: #6793
lora pp2 by @arendu :: PR: #6818
Upperbound Numpy to < 1.24 by @titu1994 :: PR: #6829
Fix typo in documentation by @Dounx :: PR: #6838
NFA updates by @erastorgueva-nv :: PR: #6695
Update container for import action by @ericharper :: PR: #6883
removed some tests by @arendu :: PR: #6900
Update container info in README.rst by @fayejf :: PR: #6913
Removed optional optimize_for_inference by @borisfom :: PR: #6933
Update core commit for CI by @aklife97 :: PR: #6939
lora inference ci by @arendu :: PR: #6931
Upgrade base pytorch container to 23.06 by @ericharper :: PR: #6938
Fix requirements for pydantic + inflect by @titu1994 :: PR: #6956
Remove pyyaml by @titu1994 :: PR: #7052
Fix links in Segmentation tutorial by @ekmb :: PR: #7117
Update evaluator.py by @stevehuang52 :: PR: #7151

v1.19.1

9 months ago

This release is a small patch to fix torchmetrics.

Remove deprecated arg compute_on_step. See #6979.

v1.19.0

10 months ago

Highlights

NeMo ASR

Sharded Manifests for Tarred Datasets #6395
Frame-VAD model + datasets support #6441
Noise Norm Perturbation #6445
Code Switched Dataset with IID Sampling #6448

NeMo TTS

Speaker adaptation for FastPitch #6416, #6417

NeMo Megatron

Batch size rampup #6424
Unify dataset and model classes for all PEFT #6391
LoRA for GPT #6391
Convert interleaved pipeline model to non-interleaved #6498
Dialog Dataset for SFT #6654
Dynamic length batches for GPT SFT #6510
Merge LoRA weights into base model #6597

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:23.04

Detailed Changelogs

ASR

Changelog

Sharded manifests for tarred datasets by @bmwshop :: PR: #6395
Update script for ngram rnnt and hat beam search decoding by @andrusenkoau :: PR: #6370
Add disclaimer about dataset for ASR by @titu1994 :: PR: #6496
New noise_norm perturbation based on Riva work by @trias702 :: PR: #6445
Add Frame-VAD model and datasets by @stevehuang52 :: PR: #6441
removing unnecessary avoid_bfloat16_autocast_context by @bmwshop :: PR: #6481
FC models in menu by @bmwshop :: PR: #6473
Separate punctuation by whitespace by @karpnv :: PR: #6574
Cherry pick commits in #6601 to main by @fayejf :: PR: #6611
Offline and streaming inference support for hybrid model by @fayejf :: PR: #6570
Disable interctc tests by @Kipok :: PR: #6638
ASR-TTS Models: Support hybrid RNNT-CTC, improve docs. by @artbataev :: PR: #6620
Confidence ensembles implementation by @Kipok :: PR: #6614
Confidence ensembles: fix issues and add tuning functionality by @Kipok :: PR: #6657
Add support for RNNT/hybrid models to partial transcribe by @stevehuang52 :: PR: #6609
eval_beamsearch_ngram.py with hybrid ctc by @karpnv :: PR: #6656

TTS

Changelog

[TTS] FastPitch adapter fine-tune and conditional layer normalization by @hsiehjackson :: PR: #6416
[TTS] whitelist broken path fix. by @XuesongYang :: PR: #6412
[TTS] FastPitch speaker encoder by @hsiehjackson :: PR: #6417
Update NeMo_TTS_Primer.ipynb by @pythinker :: PR: #6436
[TTS] Create functions for TTS preprocessing without dataloader by @rlangman :: PR: #6317
[TTS] Fix FastPitch energy code by @rlangman :: PR: #6511
[TTS] Add script for computing feature stats by @rlangman :: PR: #6508
[TTS] Add tutorials for FastPitch TTS speaker adaptation with adapters by @hsiehjackson :: PR: #6431
[TTS] Create initial TTS dataset feature processors by @rlangman :: PR: #6507
[TTS] Add script for mapping speaker names to indices by @rlangman :: PR: #6509
[TTS] Implement new TextToSpeech dataset by @rlangman :: PR: #6575

NLP / NMT

Changelog

Add patches for Virtual Parallel conversion by @titu1994 :: PR: #6589
Update wfst_text_normalization.rst by @jimregan :: PR: #6374
add rampup batch size support for Megatron GPT by @dimapihtar :: PR: #6424
Add interleaved pp support by @titu1994 :: PR: #6498
Support dynamic length batches with GPT SFT by @aklife97 :: PR: #6510
Framework for PEFT via mixins by @arendu :: PR: #6391
Add GPT eval mode fix for interleaved to main (#6449) by @aklife97 :: PR: #6610
sft model can use this script for eval by @arendu :: PR: #6637
Patch memory used for NeMo Megatron models by @titu1994 :: PR: #6615
merge lora weights into base model by @arendu :: PR: #6597
Dialogue dataset by @yidong72 :: PR: #6654
check for first or last stage by @ericharper :: PR: #6708
A few small typo fixes by @Kipok :: PR: #6599
Lddl bert by @wdykas :: PR: #6761
Debug Transformer Engine FP8 support with Megatron-core infrastructure by @timmoon10 :: PR: #6740
Tensor-parallel communication overlap with userbuffer backend by @erhoo82 :: PR: #6780
Add ub communicator initialization to validation step by @erhoo82 :: PR: #6807
Add trainer.validate example for GPT by @ericharper :: PR: #6794
Add API docs for NeMo Megatron by @ericharper :: PR: #6850
Apply garbage collection interval to validation steps by @erhoo82 :: PR: #6870

Bugfixes

Changelog

[BugFix] Force _get_batch_preds() to keep logits in decoder timestamps generator by @tango4j :: PR: #6499
small bugfix for asr_evaluator by @fayejf :: PR: #6636
fix bucketing bug issue for picking new bucket by @nithinraok :: PR: #6663
[TTS] Fix TTS audio preprocessing bugs by @rlangman :: PR: #6628
Fix a bug, use _ceil_to_nearest instead as _round_to_nearest is not d… by @BestJuly :: PR: #6681
Bug fix to restore act ckpt by @markelsanz14 :: PR: #6753
Bug fix to reset sequence parallelism by @markelsanz14 :: PR: #6756
Bug fix for reset_sequence_parallel_args by @markelsanz14 :: PR: #6802
Fix adapter tutorial r1.19.0 by @hsiehjackson :: PR: #6776
Fix error appearing when using tar datasets by @Jorjeous :: PR: #6502
Fix normalization of impulse response in ImpulsePerturbation by @anteju :: PR: #6505
Fix typos by @titu1994 :: PR: #6523
Fix notebook bad json by @titu1994 :: PR: #6561
[ASR] Fix for old models in change_attention_model by @sam1373 :: PR: #6608
Fix k2 installation in Docker with CUDA 12 by @artbataev :: PR: #6707
Tutorial fixes by @titu1994 :: PR: #6717
Vp fixes by @titu1994 :: PR: #6738
[TTS] Fix aligner nan loss in fp32 by @hsiehjackson :: PR: #6435
fix conversion and eval by @arendu :: PR: #6648
Fix checkpointed forward and add test for full activation checkpointing by @aklife97 :: PR: #6744
add call to p2p overlap by @aklife97 :: PR: #6779
Fix get_parameters when using main params optimizer by @ericharper :: PR: #6764
Fix GPTDataset Assert by @MaximumEntropy :: PR: #6798
fix notebook error by @yidong72 :: PR: #6840
final fix of notebook by @yidong72 :: PR: #6842

General Improvements

Changelog

Code-Switching dataset creation - upgrading to aggregate tokenizer manifest format by @KunalDhawan :: PR: #6448
Fix an invalid link in get_data.py of ljspeech by @pythinker :: PR: #6456
Update manifest.py to use os.path for get_full_path by @stevehuang52 :: PR: #6598
Cherry pick commits in #6528 to main by @timmoon10 :: PR: #6613
Move black parameters to pyproject.toml by @artbataev :: PR: #6647
handle artifacts when path is an extracted dir by @arendu :: PR: #6658
remove upgrading setuptools in reinstall.sh by @XuesongYang :: PR: #6659
Upgrade to PyTorch 23.04 Container by @ericharper :: PR: #6660
Fix fastpitch test nightly by @hsiehjackson :: PR: #6742
Fix Links for tutorials by @titu1994 :: PR: #6777
Update core version in Jenkinsfile by @aklife97 :: PR: #6817
Update mcore requirement to 0.2.0 by @ericharper :: PR: #6875

v1.18.1

11 months ago

Highlights

For the complete release note, please see NeMo 1.18.0 Release Notes

Bugfix

This patch release fixes a major bug in ASR bucketing datasets that was introduced in r1.17.0 by PR https://github.com/NVIDIA/NeMo/pull/6191. Because of this bug, although each bucket was randomly shuffled before selection on each rank, only a single bucket was looped over indefinitely, and subsequent buckets were never reached.

Effect: WER was significantly worse because not all buckets were used.

This has been patched and should work correctly from 1.18.1 onwards.
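
The failure mode described above can be illustrated with a small, self-contained sketch. It is not the NeMo bucketing implementation: the function names `iterate_buckets` and `buggy_iterate_buckets`, the sample IDs, and the seed are hypothetical and chosen only for illustration. The first function visits every bucket in a shuffled order, while the second mimics the reported symptom by repeating the first selected bucket indefinitely.

```python
import random
from itertools import islice
from typing import Iterator, Sequence


def iterate_buckets(buckets: Sequence[Sequence[str]], seed: int = 0) -> Iterator[str]:
    """Intended behavior (illustrative): visit all buckets, in shuffled order."""
    rng = random.Random(seed)
    order = list(range(len(buckets)))
    rng.shuffle(order)            # shuffle the order in which buckets are selected
    for idx in order:             # advance to the next bucket once one is exhausted
        samples = list(buckets[idx])
        rng.shuffle(samples)      # shuffle the samples inside the bucket
        yield from samples


def buggy_iterate_buckets(buckets: Sequence[Sequence[str]], seed: int = 0) -> Iterator[str]:
    """Symptom of the bug (illustrative): one bucket repeats, the rest are skipped."""
    rng = random.Random(seed)
    order = list(range(len(buckets)))
    rng.shuffle(order)
    first = list(buckets[order[0]])
    if not first:                 # guard so an empty bucket cannot spin forever
        return
    while True:                   # never advances to order[1:]
        yield from first


# Hypothetical sample IDs grouped into three buckets.
buckets = [["a1", "a2"], ["b1", "b2", "b3"], ["c1"]]

seen_ok = set(iterate_buckets(buckets, seed=0))
# The buggy iterator is infinite, so only a finite prefix is inspected here.
seen_bug = set(islice(buggy_iterate_buckets(buckets, seed=0), 20))

print(f"samples seen with correct iteration: {len(seen_ok)}")
print(f"samples seen with buggy iteration:   {len(seen_bug)}")
```

With the buggy iterator, only the samples of a single bucket are ever seen, which is consistent with the degraded WER reported for the affected releases.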

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:23.03

v1.18.0

11 months ago

Highlights

Models

NeMo ASR

Hybrid Autoregressive Transducer (HAT) #6260
Apple MPS Support for ASR Inference #6289
InterCTC Support for Hybrid ASR Models #6215
RNNT N-Gram Fusion with mAES algo #6118
ASR + Apple M2 CPU/GPU MPS #6289

NeMo TTS

TTS directory structure refactor
User-set symbol vocabulary #6172

NeMo Megatron

Model parallelism from Megatron Core #6393
Continued training for P-tuning #6273
SFT for GPT-3 #6210
Tensor and pipeline model parallel conversion #6218
Megatron NMT Export to Riva

NeMo Core

Detailed Changelogs

ASR

Changelog

minor cleanup by @messiaen :: PR: #6311
docs on the use of heterogeneous test / val manifests by @bmwshop :: PR: #6352
[WIP] add buffered chunked streaming for nemo force aligner by @Slyne :: PR: #6185
Word boosting for Flashlight decoder by @trias702 :: PR: #6367
Add installation and ASR inference instructions for Mac by @artbataev :: PR: #6377
specaug speedup by @1-800-BAD-CODE :: PR: #6347
updated lr for FC configs by @bmwshop :: PR: #6379
Make possible to control tqdm progress bar in ASR models by @SN4KEBYTE :: PR: #6375
[ASR] Conformer global tokens in local attention by @sam1373 :: PR: #6253
fixed torch warning on using a list of numpy arrays by @MKNachesa :: PR: #6382
Fix FastConformer config: correct bucketing strategy by @artbataev :: PR: #6413
fixing the ability to use temp sampling with concat datasets by @bmwshop :: PR: #6423
add conformer configs for hat model by @andrusenkoau :: PR: #6372
[ASR] Add optimization util for linear sum assignment algorithm by @tango4j :: PR: #6349
Added/updated new Conformer configs by @VahidooX :: PR: #6426
Fix typos by @titu1994 :: PR: #6494
Fix typos (#6523) by @titu1994 :: PR: #6539
added back the fast emit section to the configs. by @VahidooX :: PR: #6540
Add FastConformer Hybrid ASR models for EN, ES, IT, DE, PL, HR, UA, BY by @KunalDhawan :: PR: #6549
Add scores for FastConformer models by @titu1994 :: PR: #6557
Patch transcribe and support offline transcribe for hybrid model by @fayejf :: PR: #6550
More streaming conformer export fixes by @messiaen :: PR: #6567
Documentation for ASR-TTS models by @artbataev :: PR: #6594
Patch transcribe_util for streaming mode and add wer calculation back to inference scripts by @fayejf :: PR: #6601
Add HAT image to docs by @andrusenkoau :: PR: #6619
Patch decoding for PC models by @titu1994 :: PR: #6630
Fix wer.py where 'errors' variable was not set by @stevehuang52 :: PR: #6633
Fix for old models in change_attention_model by @VahidooX :: PR: #6635

TTS

Changelog

VITS HiFiTTS doc by @treacker :: PR: #6288
fix broken links r1.18.0 by @ekmb :: PR: #6501
[TTS] fixed broken path. by @XuesongYang :: PR: #6514

NLP / NMT

Changelog

[Core] return_config=True now extracts just config, not full tarfile by @titu1994 :: PR: #6346
restore path for p-tuning by @arendu :: PR: #6273
taskname and early stopping for adapters by @arendu :: PR: #6366
Adapter tuning accepts expanded language model dir by @arendu :: PR: #6376
Update gpt_training.rst by @blisc :: PR: #6378
Megatron GPT model finetuning by @MaximumEntropy :: PR: #6210
[NeMo Megatron] Cleanup configs to infer the models TP PP config automatically by @titu1994 :: PR: #6368
Fix prompt template unescaping by @MaximumEntropy :: PR: #6399
Add support for Megatron GPT Untied Embd TP PP Change by @titu1994 :: PR: #6388
Move Parallelism usage from Apex -> Megatron Core by @aklife97 :: PR: #6393
Add ability to enable/disable act ckpt and seq parallelism in GPT by @markelsanz14 :: PR: #6327
Refactor PP conversion + add support for TP only conversion by @titu1994 :: PR: #6419
fix CPU overheads of GPT synthetic dataset by @xrennvidia :: PR: #6427
check if grad is none before calling all_reduce by @arendu :: PR: #6428
Fix replace_bos_with_pad not found by @aklife97 :: PR: #6443
Support Swiglu in TP PP Conversion by @titu1994 :: PR: #6437
BERT pre-training mp fork to spawn by @aklife97 :: PR: #6442
Megatron encoder decoder fix for empty validation outputs by @michalivne :: PR: #6459
Reduce workers on NMT CI by @aklife97 :: PR: #6472
Switch to NVIDIA Megatron repo by @aklife97 :: PR: #6465
Megatron KERPLE positional embeddings by @michalivne :: PR: #6478
Support in external sample mapping for Megatron datasets by @michalivne :: PR: #6462
Fix custom by @aklife97 :: PR: #6512
GPT fp16 inference fix by @MaximumEntropy :: PR: #6543
Fix for T5 FT model by @aklife97 :: PR: #6529
Pass instead of scaler object to core by @aklife97 :: PR: #6545
Change Megatron Enc Dec model to use persistent_workers by @aklife97 :: PR: #6548
Turn autocast off when precision is fp32 by @aklife97 :: PR: #6554
Fix batch size reconf for T5 FT for multi-validation by @aklife97 :: PR: #6582
Make tensor split contiguous for qkv and kv in attention by @aklife97 :: PR: #6580
Patches from main to r1.18.0 for Virtual Parallel by @titu1994 :: PR: #6592
Create dummy iters to satisfy iter type len checks in core + update core commit by @aklife97 :: PR: #6600
Restore GPT support for interleaved pipeline parallelism by @timmoon10 :: PR: #6528
Add megatron_core to requirements by @ericharper :: PR: #6639

Export

Changelog

Bugfixes

Changelog

Fix the GPT SFT datasets loss mask bug by @yidong72 :: PR: #6409
[BugFix] Fix multi-processing bug in data simulator by @tango4j :: PR: #6310
Fix cache aware hybrid bugs by @VahidooX :: PR: #6466
[BugFix] Force _get_batch_preds() to keep logits in decoder timestamp… by @tango4j :: PR: #6500
Fixing bug in unsort_tensor by @borisfom :: PR: #6320
Bugfix for BF16 grad reductions with distopt by @timmoon10 :: PR: #6340
Limit urllib3 version to patch issue with RTD by @aklife97 :: PR: #6568

General improvements

Changelog

Pin the version to hopefully fix rtd build by @SeanNaren :: PR: #6334
enabling diverse datasets in val / test by @bmwshop :: PR: #6306
extract inference weights by @arendu :: PR: #6353
Add opengraph support for NeMo docs by @titu1994 :: PR: #6380
Adding basic preemption code by @athitten :: PR: #6161
Add documentation for preemption support by @athitten :: PR: #6403
Update hyperparameter recommendation based on experiments by @Zhilin123 :: PR: #6405
exceptions with empty test / val ds config sections by @bmwshop :: PR: #6421
Upgrade pt 23.03 by @ericharper :: PR: #6430
Update README to add core installation by @aklife97 :: PR: #6488
Not doing CastToFloat by default by @borisfom :: PR: #6524
Update manifest.py for speedup by @stevehuang52 :: PR: #6565
Update SDP docs by @erastorgueva-nv :: PR: #6485
Update core commit hash in readme by @aklife97 :: PR: #6622
Remove from jenkins by @ericharper :: PR: #6641
Remove dup by @ericharper :: PR: #6643

v1.17.0

1 year ago

Highlights

NeMo ASR

Online Clustering Diarizer
High Level Diarization API
PyCTC Decode Beam Search Support
RNNT Beam Search Alignment Extraction
InterCTC Loss
AIStore Documentation
ASR & AWS Multi-node Integration
Convolution Invariant SDR losses

NeMo TTS

NeMo Megatron

SquaredReLU, SwiGLU, No-Dropout
Rotary Position Embedding
Untie word embeddings and output projection

NeMo Core

Dynamic freezing of modules during training
NeMo Multi-Run Documentation
ClearML Logging
Early Stopping
Experiment Manager Docs Update

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:23.02

Detailed Changelogs

ASR

Changelog

Support Alignment Extraction for all RNNT Beam decoding methods by @titu1994 :: PR: #5925
Use module-based k2 import guard by @artbataev :: PR: #6006
Default RNNT loss to int64 targets by @titu1994 :: PR: #6011
Added documentation section for ASR datasets from AIStore by @anteju :: PR: #6008
Change perturb rng for reproducing results easily by @fayejf :: PR: #6042
InterCTC loss and stochastic depth implementation by @Kipok :: PR: #6013
Add pyctcdecode to high level beam search API by @titu1994 :: PR: #6026
Convert esperanto into a notebook by @SeanNaren :: PR: #6070
[ASR] Added a script for evaluating metrics for audio-to-audio by @anteju :: PR: #5971
[ASR] Convolution-invariant SDR loss + unit tests by @anteju :: PR: #5992
Adjust stochastic depth dropout probability calculation by @anteju :: PR: #6120
Add file class based inference API for diarization by @SeanNaren :: PR: #5945
Ngram by @karpnv :: PR: #6063
remove duplicate definition of manifest read and write func. by @XuesongYang :: PR: #6088
Streaming conformer CTC export by @messiaen :: PR: #5837
[TTS] Make mel spectrogram norm configurable by @rlangman :: PR: #6155
Ngram lm fusion for RNNT maes decoding by @andrusenkoau :: PR: #6118
ASR Beam search documentation by @titu1994 :: PR: #6244

TTS

Changelog

[TTS][ZH] added new NGC model cards with polyphone disambiguation. by @XuesongYang :: PR: #5940
[TTS] deprecate AudioToCharWithPriorAndPitchDataset. by @XuesongYang :: PR: #5959
[TTS][G2P] deprecate add_symbols by @XuesongYang :: PR: #5961
Added list_available_models by @treacker :: PR: #5967
Update Fastpitch energy bug by @blisc :: PR: #5969
removed WHATEVER(1) ˌhwʌˈtɛvɚ from scripts/tts_dataset_files/ipa_cmudict-0.7b_nv22.10.txt by @MikyasDesta :: PR: #5869
ONNX export for RadTTS by @borisfom :: PR: #5880
Add some info about FastPitch SSL model by @redoctopus :: PR: #5994
Vits doc by @treacker :: PR: #5989
Ragged batching changes for RadTTS, some refactoring by @borisfom :: PR: #6020
Working enabled ragged batching with ONNX by @borisfom :: PR: #6030
[TTS/TN/G2P] Remove Text Processing from NeMo, move G2P to TTS by @ekmb :: PR: #5982
[TTS] Add Spanish IPA dictionaries and heteronyms by @rlangman :: PR: #6037
[TTS] Separate TTS tokenization and g2p util to fix circular import by @rlangman :: PR: #6080
[TTS][refactor] Part 7 - move module from model file. by @XuesongYang :: PR: #6098
[TTS][refactor] Part 1 - nemo.collections.tts.data by @XuesongYang :: PR: #6099
[TTS][refactor] Part 2 - nemo.collections.tts.parts by @XuesongYang :: PR: #6105
[TTS][refactor] Part 6 - remove nemo.collections.tts.torch.README.md and tts_dataset.yaml by @XuesongYang :: PR: #6103
[TTS][refactor] Part 3 - nemo.collections.tts.g2p.models by @XuesongYang :: PR: #6113
[TTS] update German NGC models trained on Thorsten Datasets by @XuesongYang :: PR: #6125
[TTS] remove old waveglow model that relies on torch_stft. by @XuesongYang :: PR: #6128
[TTS] Move Spanish polyphones from heteronym to dictionary by @rlangman :: PR: #6123
[TTS][refactor] Part 8 - added model inference tests to safeguard changes. by @XuesongYang :: PR: #6129
remove duplicate definition of manifest read and write func. by @XuesongYang :: PR: #6088
[TTS][refactor] update tutorial import paths. by @XuesongYang :: PR: #6176
[TTS] Add univnet scheduler by @ArtyomZemlyak :: PR: #6157
[TTS] Make mel spectrogram norm configurable by @rlangman :: PR: #6155

NLP / NMT

Changelog

add new languages to doc by @yzhang123 :: PR: #5939
Distributed Adam optimizer overlaps param all-gather with forward compute by @timmoon10 :: PR: #5684
Refactor the retrieval services for microservice architecture by @yidong72 :: PR: #5910
make validation accuracy reporting optional for adapters/ptuning by @arendu :: PR: #5843
Add BERT support for overlapping forward compute with distopt communication by @timmoon10 :: PR: #6024
[TTS/TN/G2P] Remove Text Processing from NeMo, move G2P to TTS by @ekmb :: PR: #5982
adding early stop callback to ptuning by @arendu :: PR: #6028
Pr doc tn by @yzhang123 :: PR: #6041
Adds several configurable flags for Megatron GPT models by @MaximumEntropy :: PR: #5991
P-tuning refactor Part 1/N by @arendu :: PR: #6054
Fast glu activations by @MaximumEntropy :: PR: #6058
P-tuning refactor Part 2/N by @arendu :: PR: #6056
P-tuning refactor Part 3/N by @arendu :: PR: #6106
Explicitly check for united embeddings when logging params by @MaximumEntropy :: PR: #6085
Add flag to get attention from fusion by @ericharper :: PR: #6049
Improving text memmap generated index files error messages by @michalivne :: PR: #6093
Megatron Encoder-Decoder Sampler Function by @michalivne :: PR: #6095
Sentence piece legacy false compatibility by @arendu :: PR: #6154
convert Megatron LM ckpt to NeMo PP support. by @yidong72 :: PR: #6159
Avoid multiple warnings for loss mask by @mikolajblaz :: PR: #6062
Propagate LayerNorm1P to TE by @mikolajblaz :: PR: #6061
Filter p-tuning by example length by @arendu :: PR: #6182
Add sequence parallel support to Rope positional embedding by @yidong72 :: PR: #6178
Use a separate communicator for DP AMAX reduction by @erhoo82 :: PR: #6022
Add persistent workers to GPT by @ericharper :: PR: #6205
Micro batch loader for bert model by @shanmugamr1992 :: PR: #6046
GPT P tuning Eval changes (#5952) by @aklife97 :: PR: #6272
add template for taskname=taskname by @Zhilin123 :: PR: #6283
added RPE + fixed RMSNorm by @Davood-M :: PR: #6304
simplified notebook for p-tuning by @arendu :: PR: #6326
Added num decoder blocks in megatron export by @Davood-M :: PR: #6331

Text Normalization / Inverse Text Normalization

Changelog

[TTS/TN/G2P] Remove Text Processing from NeMo, move G2P to TTS by @ekmb :: PR: #5982

Export

Changelog

ONNX export for RadTTS by @borisfom :: PR: #5880
Working enabled ragged batching with ONNX by @borisfom :: PR: #6030
Update docs for ExpManager and Exportable frameworks by @titu1994 :: PR: #6165
Streaming conformer CTC export by @messiaen :: PR: #5837
MixedFusedRMSNorm Export Fix by @Davood-M :: PR: #6296
Added num decoder blocks in megatron export by @Davood-M :: PR: #6331

Bugfixes

Changelog

Fix bug where GPT always enabled distopt overlapped param sync by @timmoon10 :: PR: #5995
CS bugfix by @bmwshop :: PR: #6122
RNNT patch by @titu1994 :: PR: #6231
Notebook fixes by @titu1994 :: PR: #6212
Small fixes for flashlight decoder by @trias702 :: PR: #6071
Various fixes in docs and RNNT by @titu1994 :: PR: #6156
Fix k2 and torchaudio installation (Docker, macOS) by @artbataev :: PR: #6094
update and deprecate warning for Mic notebook by @fayejf :: PR: #6307
small bugfix and add asr evaluator to doc by @fayejf :: PR: #6229
Bug fixing for bucketing dataset by @VahidooX :: PR: #6191
Fix character beam decoding algorithm with vocab index map by @titu1994 :: PR: #6140
fix typo in asr evaluator readme by @fayejf :: PR: #6053
Fix typos by @titu1994 :: PR: #6241
[ASR]:fixed augmentor arguments for transcribe functionality of Hybrid CTC-RNNT model by @KunalDhawan :: PR: #6290
Fix hybrid transcribe by @ArtyomZemlyak :: PR: #6003
Fix bucketing seeding by @VahidooX :: PR: #6254
Fix for CTC decoder setup by @vsl9 :: PR: #6303
Fix RNNT Joint narrow() by @titu1994 :: PR: #6336
Fix bugs with interctc mixin by @Kipok :: PR: #6228
Update IPA dict path in tutorial by @redoctopus :: PR: #6208
[TTS] fix broken tutorial for Tacotron2 by @XuesongYang :: PR: #6199
[TTS] fix bugs for Chinese and German tutorials. by @XuesongYang :: PR: #6216
Fix radtts sort r17 by @borisfom :: PR: #6344
Quick Fix for RadTTS test by @blisc :: PR: #6034
Disabling radtts tests until we have real model by @borisfom :: PR: #6036
fix val loss computation in megatron by @anmolgupt :: PR: #5871
Fix incomplete batches by @mikolajblaz :: PR: #6083
Avoid unnecessarily accessing data loader with pipeline parallelism by @timmoon10 :: PR: #6164
bugfix: file handlers are not closed. by @XuesongYang :: PR: #5956
Fix Silence Sampling Algorithm for ASR Multi-speaker Data Simulator by @stevehuang52 :: PR: #5897
Fix Windows bug with save_restore_connector by @trias702 :: PR: #5919
fix broken link by @ericharper :: PR: #5968
Fix torchaudio installation by @artbataev :: PR: #5850
Fix reinstall.sh dependencies by @titu1994 :: PR: #6027
Adding changes to fix the mv error by @tango4j :: PR: #6087
Fix README by @flx42 :: PR: #6137
Fix typos in voiceapp notebook by @titu1994 :: PR: #6262
[BugFix] Fix diarization result path errors in tutorial notebook for r1.17.0 by @tango4j :: PR: #6234
[BugFix] Fix the wrong branch name in speaker diarization inference notebook by @tango4j :: PR: #6301

General Improvements

Changelog

Dynamic freezing in Nemo by @trias702 :: PR: #5879
Move settings to . Remove deprecated by @artbataev :: PR: #5947
update container info in readme by @fayejf :: PR: #5981
Update PUBLICATIONS.md by @titu1994 :: PR: #5963
[G2P] backward compatibility for english tokenizer and bugfix by @github-actions[bot] :: PR: #5984
replace symbols by @github-actions[bot] :: PR: #5990
correct bash style according to SC2236. by @XuesongYang :: PR: #6025
Update align.py by @github-actions[bot] :: PR: #6045
Add Customization Dataset Preparation Tool by @Zhilin123 :: PR: #6029
Updated data simulator config part in Speaker_Diarization_Training.ipynb by @tango4j :: PR: #6072
Add citation by @ericharper :: PR: #6077
[TTS] Spectrogram Enhancer: correct dim for length when loading data by @github-actions[bot] :: PR: #6074
Add ClearML Logging by @ArtyomZemlyak :: PR: #6014
update readme with new badges by @XuesongYang :: PR: #6110
[CI] Set readthedocs python version to 3.8 by @SeanNaren :: PR: #6079
Update dataset preparation tool to fix bug relating to non jsonl input file by @Zhilin123 :: PR: #6147
update finetune configs by @nithinraok :: PR: #6152
Added ckpt to nemo for T5/T0 models by @Davood-M :: PR: #6141
Save model parallel .nemo in ExpManager by @arendu :: PR: #6115
Upgrade setuptools by @fayejf :: PR: #6163
Update container version in main readme by @fayejf :: PR: #6171
metric update by @arendu :: PR: #6169
Upgrade base container to PyTorch 23.02 by @ericharper :: PR: #6162
Link to nm launcher by @ericharper :: PR: #6226
Make AIS CLI installation optional by @anteju :: PR: #6314
remove pinned numba version in Dockerfile by @fayejf :: PR: #6341
Cherry-pick recent distopt commits by @timmoon10 :: PR: #6343
Update readme by @ericharper :: PR: #6363

v1.16.0

1 year ago

Highlights

NeMo ASR

ASR Evaluator
Multi-channel dereverberation algorithm
Hybrid ASR-TTS Models
Flashlight Decoder Beam Search
FastConformer Encoder with 8x subsampling

NeMo TTS

SSL Voice Conversion
Spectrogram Enhancer
VITS

NeMo Megatron

Per microbatch dataloader for GPT and BERT
Adapters compatible with Faster Transformer

NeMo Core

Nested model support

NeMo Tools

NeMo Forced Aligner

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:23.01

ASR

Changelog

Fix for incorrect computation of batched alignment in transducers by @Kipok :: PR: #5692
Set the stream position to 0 for pydub by @jonghwanhyeon :: PR: #5752
[Fix] ConformerEncoder forward when length is None by @anteju :: PR: #5761
ASR evaluator by @fayejf :: PR: #5728
[ASR][Test] Enable test for cache audio with a single worker by @anteju :: PR: #5763
Flashlight Decoder for Nemo by @trias702 :: PR: #5790
Fix data simulator by @stevehuang52 :: PR: #5813
[ASR] Mask-based dereverb algorithm by @anteju :: PR: #5693
Concat dataset and aistore support for label models by @Kipok :: PR: #5826
Adding new features and speed up for multi-speaker data simulator by @tango4j :: PR: #5846
Add Esperanto ASR example by @andrusenkoau :: PR: #5772
Fix memory allocation of NeMo Multi-speaker Data Simulator by @stevehuang52 :: PR: #5864
[ASR] Separate Audio-to-Text (BPE, Char) dataset construction by @artbataev :: PR: #5774
Reduce memory usage in getMultiScaleCosAffinityMatrix function by @gabitza-tech :: PR: #5876
Hybrid ASR-TTS models by @artbataev :: PR: #5659
Set providers for onnxruntime inference session by @athitten :: PR: #5903
[ASR] Configurable metrics for audio-to-audio + removed experimental decorators by @anteju :: PR: #5827
Correct doc for RNNT transcribe() function by @titu1994 :: PR: #5904
Update isort to the latest version by @artbataev :: PR: #5895
FilterbankFeaturesTA to match FilterbankFeatures by @msis :: PR: #5913
Fix hybridasr bug by @VahidooX :: PR: #5950
replace symbols by @nithinraok :: PR: #5974
fast conformer configs and doc by @bmwshop :: PR: #5970
Update TitaNet-L and MSDD models by @nithinraok :: PR: #6023
Fix enhancer usage by @artbataev :: PR: #6059
update librosa args by @nithinraok :: PR: #6086
Fix enhancer usage in ASR-TTS examples by @artbataev :: PR: #6116
Fix k2 and torchaudio installation (Docker, macOS). Cherry-pick (#6094) by @artbataev :: PR: #6124

TTS

Changelog

[TTS] Update Spanish TTS model to 1.15 by @rlangman :: PR: #5742
[TTS][DE] refine grapheme-based tokenizer and fastpitch training recipe on thorsten's neutral datasets. by @XuesongYang :: PR: #5753
No-script TS export, prepared for ONNX export by @borisfom :: PR: #5653
Fixing masking in RadTTS bottleneck layer by @borisfom :: PR: #5771
Port Riva's mel cepstral distortion w/ dynamic time warping notebook by @redoctopus :: PR: #5778
Update radtts' infer path by @blisc :: PR: #5788
[TTS][DE] Augment tokenization/G2P to preserve capitalization of words and mix phonemes with word-level graphemes for an input text. by @XuesongYang :: PR: #5805
[TTS] porting VITS implementation by @treacker :: PR: #5600
[TTS][DE] updated IPA dictionary and heteronyms by @XuesongYang :: PR: #5860
[TTS] GAN-based spectrogram enhancer by @racoiaws :: PR: #5565
TTS inference with Heteronym classification model, hc model inference refactoring by @ekmb :: PR: #5768
Remove MCD_DTW tarball by @redoctopus :: PR: #5889
Hybrid ASR-TTS models by @artbataev :: PR: #5659
Moved eval notebook data to aws by @redoctopus :: PR: #5911
[G2P] fixed typos and broken import library. by @XuesongYang :: PR: #5978
[G2P] backward compatibility for english tokenizer and bugfix by @XuesongYang :: PR: #5980
fix links, add missing file by @ekmb :: PR: #6044
[TTS] Spectrogram Enhancer: correct dim for length when loading data by @racoiaws :: PR: #6048
[TTS] bugfix for fastpitch German tutorial by @XuesongYang :: PR: #6051
[TTS] bugfix Chinese Fastpitch tutorial by @XuesongYang :: PR: #6055
Fix enhancer usage by @artbataev :: PR: #6059
[TTS] Spectrogram Enhancer: support arbitrary input length by @racoiaws :: PR: #6060
Fix enhancer usage in ASR-TTS examples by @artbataev :: PR: #6116
[TTS] Spectrogram Enhancer: add option to zero out the initial tensor by @racoiaws :: PR: #6136

NLP / NMT

Changelog

Fix P-Tuning Truncation by @vadam5 :: PR: #5663
Adithyare/prompt learning seed by @arendu :: PR: #5749
Add extra data args to support proper finetuning of HF converted T5 checkpoints by @MaximumEntropy :: PR: #5719
Don't add output directory twice when creating shared sentencepiece tokenizer by @pks :: PR: #5737
add constraint info on batch size for tar dataset by @yzhang123 :: PR: #5812
remove transformer version upper bound by @Zhilin123 :: PR: #5831
Adithyare/adapter new placement by @arendu :: PR: #5791
Add SSL import functionality for Audio Lexical PNC Models by @trias702 :: PR: #5834
validation batch sizing and drop_last controls by @arendu :: PR: #5830
Remove ending newlines when encoding strings w/ sentencepiece tokenizer by @pks :: PR: #5739
Fix segmenting for pcla inference by @jubick1337 :: PR: #5849
RETRO model finetuning by @yidong72 :: PR: #5800
Optimizing distributed Adam when running with one work queue by @timmoon10 :: PR: #5560
Add option to disable distributed parameters in distributed Adam optimizer by @timmoon10 :: PR: #5685
set max_steps for lr decay through config by @anmolgupt :: PR: #5780
Fix Prompt text space issue by @aklife97 :: PR: #5983
Add batch_size to prompt_learning generate by @aklife97 :: PR: #6091

NeMo Tools

Changelog

[Tools] NeMo Forced Aligner by @erastorgueva-nv :: PR: #5571
[Tools] Fix ctc segmentation: exclude audacity files by @ekmb :: PR: #6009

Export

Changelog

No-script TS export, prepared for ONNX export by @borisfom :: PR: #5653
Set providers for onnxruntime inference session by @athitten :: PR: #5903
Add segmentation export to Audacity label file by @Ca-ressemble-a-du-fake :: PR: #5857

General Improvements

Changelog

Pin lightning version less than 1.9.0 by @SeanNaren :: PR: #5822
Davidm/cherrypick r1.16.0 by @Davood-M :: PR: #6082
Update files for lightning 1.9.0 by @SeanNaren :: PR: #5823
Tn doc 16 by @yzhang123 :: PR: #5954
Ensure EMA checkpoints are also deleted when normal checkpoints are by @SeanNaren :: PR: #5724
[Fix] ConformerEncoder forward when length is None by @anteju :: PR: #5761
Fix EMA topk checkpoint deletion by @SeanNaren :: PR: #5758
[BugFix] decoder timestamp count has a mismatch when is decoded by @tango4j :: PR: #5825
Update 00_NeMo_Primer.ipynb by @schaltung :: PR: #5740
Sanitize params before DLLogger log_hyperparams by @milesial :: PR: #5736
NeMo Forced Aligner by @erastorgueva-nv :: PR: #5571
Add EMA Docs, fix common collection documentation by @SeanNaren :: PR: #5757
Add container info to main page by @fayejf :: PR: #5816
CommonVoice support for script by @SeanNaren :: PR: #5797
Support nested NeMo models by @artbataev :: PR: #5671
fix max len generation t5 by @ekmb :: PR: #5852
NFA samples fix by @erastorgueva-nv :: PR: #5856
fix(readme): fix typo by @jqueguiner :: PR: #5883
Block large files from being merged into NeMo main by @SeanNaren :: PR: #5898
Pin isort version by @artbataev :: PR: #5914
fixed missing long_description_content_type by @XuesongYang :: PR: #5909
Update container to 23.01 by @ericharper :: PR: #5917
remove conda pynini install by @ekmb :: PR: #5921
Update align.py by @Slyne :: PR: #6043
Fixing data simulator argument and bash scripting error by @tango4j :: PR: #6112
Update apex commit by @ericharper :: PR: #6148
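
The changelog entries above share one shape: a title, then `by @<author>`, then `:: PR: #<number>`. As a minimal sketch (not part of NeMo; the names `ChangelogEntry`, `ENTRY_RE`, and `parse_entry` are hypothetical and chosen only for illustration), entries can be parsed like this, for example to spot PRs that are listed under more than one heading:

```python
import re
from typing import NamedTuple, Optional


class ChangelogEntry(NamedTuple):
    """One parsed changelog line (hypothetical helper type)."""
    title: str
    author: str
    pr_number: int


# Assumed line shape: "<title> by @<author> :: PR: #<number>".
# The title group is greedy, so a " by " inside the title itself
# (e.g. "Filter p-tuning by example length") stays in the title.
ENTRY_RE = re.compile(
    r"^(?P<title>.+) by @(?P<author>\S+) :: PR: #(?P<pr>\d+)\s*$"
)


def parse_entry(line: str) -> Optional[ChangelogEntry]:
    """Return the parsed entry, or None for headings and other lines."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    return ChangelogEntry(match["title"], match["author"], int(match["pr"]))


if __name__ == "__main__":
    sample = "Filter p-tuning by example length by @arendu :: PR: #6182"
    print(parse_entry(sample))
```

Lines that do not match the pattern, such as section headings or the `docker pull` command, return `None`, so a whole release page can be fed through `parse_entry` line by line.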

v1.23.0

Highlights

Models

Nvidia Starcoder 2 - 15B

NeMo Canary

NeMo LLM

NeMo MM

NeMo ASR

NeMo TTS

NeMo Vision

Known Issues

ASR

RNNT WER calculation when fused batch size > 1 during validation / test step()

Two failing unit tests due to a change in expected results, caused by lhotse version update.

Container

Detailed Changelogs

ASR

TTS

LLMS

NeMo Tools

General Improvements

v1.22.0

Highlights

Models

NeMo Parakeet

NeMo Parakeet-TDT

ASR

NeMo ASR

Container

Detailed Changelogs

ASR

TTS

LLM

General Improvements

v1.21.0

Highlights

Models

NeMo ASR

NeMo TTS

NeMo Framework

NeMo Core

NeMo Tools

Container

ASR

TTS

NLP / NMT

NeMo Tools

Export

General Improvements

v1.20.0

Highlights

Models

NeMo ASR

NeMo TTS

NeMo Framework

NeMo Tools

Container

Detailed Changelogs

ASR

TTS

NLP / NMT

NeMo Tools

Bugfixes

General Improvements

v1.19.1

v1.19.0

Highlights

NeMo ASR

NeMo TTS

NeMo Megatron

Container

Detailed Changelogs

ASR

TTS

NLP / NMT

Bugfixes

General Improvements

v1.18.1

Highlights