NeMo Versions

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

v1.23.0

2 months ago

Highlights

Models

NVIDIA StarCoder 2 - 15B

NeMo Canary

Announcement - https://nvidia.github.io/NeMo/blogs/2024/2024-02-canary/

NeMo LLM

  • Falcon
  • Code Llama
  • StarCoder
  • GPT performance improvements
  • Context parallelism
  • Mistral
  • Mixtral (without expert parallelism)
  • MCore GPT dataset integration

NeMo MM

  • CLIP
  • Stable Diffusion (supporting LoRA)
  • Imagen
  • ControlNet (for SD)
  • Instruct pix2pix (for SD)
  • LLaVA
  • NeVA
  • DreamFusion++
  • NSFW filtering

NeMo ASR

  • Lhotse dataloading support #7880
  • Canary: multi-task, multilingual ASR #8242
  • Long-form audio for diarization #7737
  • Faster algorithm for RNN-T greedy decoding #7926
  • Cache-Aware streaming notebook #8296

NeMo TTS

NeMo Vision

Known Issues

ASR

RNN-T WER calculation when fused batch size > 1 during validation/test step()

Previously, the RNN-T metric was stateful while the CTC metric was not (r1.22.0, r1.23.0), so the metric calculation in the RNN-T joint worked correctly for fused operation. However, the unification of metrics in r1.23.0 introduced a bug: only the last sub-batch contributes to the computed scores, which are not accumulated across sub-batches. This is patched via https://github.com/NVIDIA/NeMo/pull/8587 and will be fixed in the next release.

Workaround: Explicitly disable the fused batch size during inference using the following code:

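from omegaconf import open_dict

model = ...  # your loaded RNN-T ASR model
decoding_cfg = model.cfg.decoding
# open_dict temporarily lifts struct mode so the key can be set even if it is absent
with open_dict(decoding_cfg):
    decoding_cfg.fused_batch_size = -1  # -1 disables the fused batch
model.change_decoding_strategy(decoding_cfg)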

Note: This bug does not affect scores calculated via model.transcribe() (which only returns text and does not calculate metrics during inference), nor scores from the transcribe_speech.py or speech_to_text_eval.py scripts in examples/asr.
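The failure mode described above, and why the workaround avoids it, can be illustrated with a toy, pure-Python sketch. The helpers `word_errors`, `sub_batches`, and `batch_wer` are hypothetical and are not NeMo's metric code; the sketch assumes only that the fused RNN-T joint splits a batch into sub-batches of `fused_batch_size` and that a non-positive value (as in the `-1` workaround) means the batch is not split.

```python
# Toy illustration only: these helpers are hypothetical and are NOT NeMo's
# WER implementation; they only mimic the accumulation behavior described above.
from typing import List, Sequence, Tuple


def word_errors(reference: str, hypothesis: str) -> Tuple[int, int]:
    """Return (word-level edit distance, number of reference words)."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the ref prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,  # deletion
                curr[j - 1] + 1,  # insertion
                prev[j - 1] + (r != h),  # substitution (or match if equal)
            )
        prev = curr
    return prev[-1], len(ref)


def sub_batches(items: Sequence, fused_batch_size: int) -> List[Sequence]:
    """Split a batch into sub-batches; a non-positive size means no splitting."""
    if fused_batch_size <= 0:
        return [items]
    return [items[k:k + fused_batch_size] for k in range(0, len(items), fused_batch_size)]


def batch_wer(pairs: Sequence[Tuple[str, str]], fused_batch_size: int, accumulate: bool) -> float:
    """Corpus WER over (reference, hypothesis) pairs, processed sub-batch by sub-batch."""
    errors = words = 0
    for chunk in sub_batches(pairs, fused_batch_size):
        if not accumulate:
            # Buggy behavior: each sub-batch discards the running totals,
            # so only the last sub-batch contributes to the final score.
            errors = words = 0
        for ref, hyp in chunk:
            e, n = word_errors(ref, hyp)
            errors += e
            words += n
    return errors / max(words, 1)


if __name__ == "__main__":
    pairs = [("the cat sat", "the cat sat"), ("a dog ran", "a dog ran"), ("hello world", "hello word")]
    print(batch_wer(pairs, fused_batch_size=2, accumulate=True))    # 0.125
    print(batch_wer(pairs, fused_batch_size=2, accumulate=False))   # 0.5
    print(batch_wer(pairs, fused_batch_size=-1, accumulate=False))  # 0.125
```

With `fused_batch_size=-1` there is only one sub-batch, so discarding earlier sub-batches cannot lose any scores, which is why disabling the fused batch restores correct totals in this toy model.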

Two unit tests fail because a Lhotse version update changed their expected results.

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:24.01.speech

Detailed Changelogs

ASR

Changelog
  • Update link to yaml file in ASR_with_Transducers.ipynb by @Faith-Nchifor :: PR: #8014
  • Use convert_hf_dataset_to_nemo by @karpnv :: PR: #8017
  • Update asr_language_modeling.rst: Add a missing word by @martin0258 :: PR: #8007
  • spelling mistake by @orena1 :: PR: #7903
  • update asr eval by @stevehuang52 :: PR: #8045
  • fix noise aug by @stevehuang52 :: PR: #8057
  • Various fixes for typos and urls by @titu1994 :: PR: #8066
  • [Fix] Increase length check tolerance to prevent test failing by @anteju :: PR: #8067
  • Add text metrics to asr eval by @stevehuang52 :: PR: #8087
  • fix device setting to allow using accelerator cpu by @orena1 :: PR: #8084
  • .ctm in data simulator annotator compliant with RT-09 specification by @popcornell :: PR: #8004
  • Fix AST eval by @stevehuang52 :: PR: #8112
  • fix: numba.*_num_threads resets torch num_threads #8141 by @itzsimpl :: PR: #8145
  • Update dependencies by @titu1994 :: PR: #8156
  • NeMo + Lhotse integration by @pzelasko :: PR: #7880
  • Speedup RNN-T greedy decoding by @artbataev :: PR: #7926
  • [docker] Install k2 before NeMo for faster image rebuilding by @pzelasko :: PR: #8204
  • [docs] Add --force_codec to tarred dataset creation examples by @pzelasko :: PR: #8227
  • Temporarily use the previous RNN-T decoding algorithm as default by @artbataev :: PR: #8226
  • Make TDT inference not require duration params by @hainan-xv :: PR: #8207
  • Cache Aware Streaming tutorial notebook by @erastorgueva-nv :: PR: #8296
  • fix path location and branch by @nithinraok :: PR: #8304
  • Attention encoder-decoder models for multiple speech-to-text tasks … by @titu1994 :: PR: #8324
  • Remove asr webapp by @titu1994 :: PR: #8347
  • remove target at model level in aed model config [ASR] by @krishnacpuvvada :: PR: #8351
  • Add change_vocabulary and save_tokenizers() support to Multitask ASR models by @titu1994 :: PR: #8357
  • Change default beam size by @titu1994 :: PR: #8371
  • adding jenkins test for speech_to_text_aed model by @krishnacpuvvada :: PR: #8368
  • Add Finetuning tutorial with HF Datasets by @nithinraok :: PR: #8356
  • wer fix by @tbartley94 :: PR: #8404
  • add ensemble decoding fix by @nithinraok :: PR: #8427
  • Update k2 by @artbataev :: PR: #8492

TTS

Changelog
  • [TTS] Scale sampler steps by number of devices by @rlangman :: PR: #7947
  • Add All Multimodal Source Code Part 2: Text to image, x to nerf by @yaoyu-33 :: PR: #7970
  • [TTS] Add period discriminator and feature matching loss to codec recipe by @rlangman :: PR: #7884
  • Added VectorQuantizer base class by @anteju :: PR: #8011

LLMS

Changelog
  • Add interface to set NCCL options of each process group by @erhoo82 :: PR: #7923
  • Support O2 training of PEFT and SFT by @cuichenx :: PR: #7971
  • [NLP] Access scaler only in FP16 case by @janekl :: PR: #7916
  • [NLP] Minor improvements in Llama conversion script by @janekl :: PR: #7978
  • [NLP] Use helpers from utils_funcs.py in Llama conversion by @janekl :: PR: #7979
  • [NLP] Remove replace_sampler_ddp (deprecated in Trainer) by @janekl :: PR: #7981
  • Reworked MegatronPretrainingRandomBatchSampler to correctly handle epochs > 1 by @trias702 :: PR: #7920
  • Remove deprecated arguments from TE's TransformerLayer by @jbaczek :: PR: #7917
  • Add All Multimodal Source Code by @yaoyu-33 :: PR: #7791
  • First draft of mcore bert model in NeMo by @shanmugamr1992 :: PR: #7814
  • Support Falcon Variants (7B/40B/180B) in Mcore NeMo by @xuanzic :: PR: #7666
  • FSDP + Tensor Parallelism by @erhoo82 :: PR: #7897
  • Packed Sequence by @cuichenx :: PR: #7945
  • Adding method back that was removed accidentally by @ericharper :: PR: #8038
  • [NLP] ArtifactItem with init=True to make it debuggable by @janekl :: PR: #7980
  • SFT patch: (1) enable sequence parallelism and (2) enable profile by @erhoo82 :: PR: #7963
  • migration to PTL 2.0 for spellmapper model by @bene-ges :: PR: #7924
  • Change the megatron config lr scheduler default and fix to change partitions script by @shan18 :: PR: #8094
  • (1) Add SHARP interface to M-CORE, (2) use send/recv to send train loss to the first rank instead of b-cast by @erhoo82 :: PR: #7793
  • Reconfigure limit_val_batches only for int by @athitten :: PR: #8099
  • Fixing wrapper and moving it to base class by @shanmugamr1992 :: PR: #8055
  • fix gated_linear_unit bug by @Agoniii :: PR: #8042
  • Fix Adapter for MCore models by @cuichenx :: PR: #8124
  • add war fix for sync issues by @gshennvm :: PR: #8130
  • Improve PEFT UX by @cuichenx :: PR: #8131
  • Enhance flexibility by passing callbacks as method argument by @michal2409 :: PR: #8015
  • context parallelism by @xrennvidia :: PR: #7739
  • Make pipelined TP comm overlap available with mcore by @erhoo82 :: PR: #8005
  • remove deprecated scripts by @arendu :: PR: #8138
  • adding OnlineSampleMapping by @arendu :: PR: #8137
  • Add distopt support for FP8 params and BF16 optimizer state by @timmoon10 :: PR: #7909
  • Revert adding OnlineSampleMapping by @pablo-garay :: PR: #8164
  • Token count and sequence length logging for MegatronGPTSFTModel by @vysarge :: PR: #8136
  • Use latest apex internal API by @jbaczek :: PR: #8129
  • tune specific params in the base model by @arendu :: PR: #7745
  • Virtual pipeline parallel support for MegatronGPTSFTModel by @vysarge :: PR: #7964
  • removed deprecated peft model by @arendu :: PR: #8183
  • remove more deprecated files by @arendu :: PR: #8169
  • Pre-generate cu_seqlens argmin and max_seqlen to remove host-to-device sync by @erhoo82 :: PR: #8108
  • Add the interface to use SHARP to FSDP strategy by @erhoo82 :: PR: #8202
  • Multimodal required NLP base model changes by @yaoyu-33 :: PR: #8188
  • [NLP] Improve and unify loading state_dict for community models by @janekl :: PR: #7977
  • Rename Finetuning Scripts by @cuichenx :: PR: #8201
  • Final multimodal PR with our recent developments on MM side by @yaoyu-33 :: PR: #8127
  • Add include_text parameter to SFT dataloaders by @Kipok :: PR: #8198
  • Add random_seed argument to generate by @Kipok :: PR: #8162
  • Added support for neptune logger by @harishankar-gopalan :: PR: #8210
  • Pre-compute max_seqlen and cu_seqlens_argmin in all model-parallel cases by @erhoo82 :: PR: #8222
  • Use PackedSeqParams in accordance with changes in Megatron-LM by @cuichenx :: PR: #8205
  • Fix to peft & virtual pipeline parallel unsupported check by @vysarge :: PR: #8216
  • Fixed the tp overlap switch by @sanandaraj5597 :: PR: #8195
  • add knobs for rope/swiglu fusion by @lhb8125 :: PR: #8184
  • Added sample cpu_offloading switch to YAML by @sanandaraj5597 :: PR: #8148
  • Syncing random seed between ranks in generate by @Kipok :: PR: #8230
  • add first_val_step to mcore scheduler by @JimmyZhang12 :: PR: #8150
  • Correct padding for SFT input data to account for sequence parallel + TE's fp8 op dimension requirements by @vysarge :: PR: #8240
  • Mistral 7b conversion script by @akoumpa :: PR: #8052
  • switch to mcore dataset [with FIM support] by @dimapihtar :: PR: #8149
  • Mixtral to NeMo conversion script. by @akoumpa :: PR: #8155
  • fixes to accommodate mcore changes by @HuiyingLi :: PR: #8261
  • Allow MegatronPretrainingRandomSampler to do multi-epoch training by @trias702 :: PR: #8239
  • Add dist ckpt support for regular optimizers by @mikolajblaz :: PR: #7749
  • add deallocate pipeline output optimization by @JimmyZhang12 :: PR: #8279
  • Fix memory leak caused by context parallelism hanging references by omegaconf by @JimmyZhang12 :: PR: #8299
  • distributed fused adam + rampup bs support by @dimapihtar :: PR: #8302
  • Update PEFT Doc by @cuichenx :: PR: #8262
  • Converter script fixes for mixtral/mistral by @akoumpa :: PR: #8272
  • Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 by @erhoo82 :: PR: #8334
  • Enable megatron core loggers for GPT pretraining by @ashbhandare :: PR: #8354
  • mcore ds fix by @dimapihtar :: PR: #8283
  • release updates by @dimapihtar :: PR: #8378
  • Mcore customization doc by @HuiyingLi :: PR: #8298
  • updated link to pubmed by @nithinraok :: PR: #8402
  • mcore customization doc minor fix by @HuiyingLi :: PR: #8421
  • Fixing mcore bert for TP, PP and SP by @shanmugamr1992 :: PR: #8336
  • Add settings to suppress bf16 compile errors in CI on V100 by @athitten :: PR: #8481
  • MoE parameter passing by @akoumpa :: PR: #8255
  • Add fp8 support for SD/Update notebook paths by @Victor49152 :: PR: #8489

NeMo Tools

Changelog
  • SDE bugfix log by @Jorjeous :: PR: #8430

General Improvements

Changelog
  • Add news section to README by @ericharper :: PR: #7984
  • Fixing conversion script to work for code llama by @shanmugamr1992 :: PR: #7997
  • Fix crash when converting to mcore a model using rotary embeddings by @odelalleau :: PR: #7998
  • Added a procedure for Windows users, README by @Jorjeous :: PR: #7942
  • Update manifest.py to speedup loading tarred datasets by @stevehuang52 :: PR: #7900
  • [Fix] Fixed name of a test by @anteju :: PR: #7986
  • Fix lora merge script by @cuichenx :: PR: #8113
  • Support transcoding audio formats when saving tarred datasets (FLAC, OPUS) by @pzelasko :: PR: #8102
  • README edit to change Apple Silicon install instructions (to fix a break introduced by pytorch 2) by @stephenmcconnachie :: PR: #8122
  • Fixes NVIDIA/apex installation to not erroneously install the pkg by @terrykong :: PR: #8126
  • Graphviz fix by @GNroy :: PR: #7843
  • Update README.rst by @fayejf :: PR: #8154
  • Fix TP>1 issue for conversion script by @cuichenx :: PR: #8144
  • Support torch jit script by @artbataev :: PR: #8027
  • NeMo Multimodal Docs and Tests Initial PR by @yaoyu-33 :: PR: #8028
  • Remove left-over prints in NeMo+Lhotse code by @pzelasko :: PR: #8180
  • Upgrade to DLFW PyTorch 23.12 by @ericharper :: PR: #8163
  • Add Lhotse support for key in NeMo manifests by @pzelasko :: PR: #8197
  • Fix CPU Initialization and TP>1 for LoRA Merge Script by @cuichenx :: PR: #8199
  • Add support in Neural Typecheck to disable semantic checks by @titu1994 :: PR: #8212
  • Pin lhotse=1.19.2 in r1.23.0 by @pzelasko :: PR: #8303
  • Multimodal r1.23.0 bug fix by @yaoyu-33 :: PR: #8315
  • MCore dataset compatibility for tokenizers by @vysarge :: PR: #8390
  • Update NFA video download link by @erastorgueva-nv :: PR: #8406
  • Update MM Dataprep Tutorial by @cuichenx :: PR: #8410
  • Fix dreambooth data sampler issue by @yaoyu-33 :: PR: #8400
  • Fix a bug in CTM line processing function for multi-speaker data simulations by @tango4j :: PR: #8416
  • Akoumparouli/mistral bugfix by @akoumpa :: PR: #8353
  • pin to 0.5.0 by @ericharper :: PR: #8465
  • Update NeMo Multimodal Requirements by @yaoyu-33 :: PR: #8515
  • Fix link in multimodal dataprep tutorial by @cuichenx :: PR: #8517

v1.22.0

3 months ago

Highlights

Models

NeMo Parakeet

Announcement - https://nvidia.github.io/NeMo/blogs/2024/2024-01-parakeet/

NeMo Parakeet-TDT

Announcement - https://nvidia.github.io/NeMo/blogs/2024/2024-01-parakeet-tdt/

ASR

NeMo ASR

  • Multi-lookahead cache-aware streaming Conformer #6711
  • Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) by @burchim #7330
  • Speech enhancement tutorial #6492
  • Support punctuation error rate #7538

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:23.10

Detailed Changelogs

ASR

Changelog
  • Fix missing pip package 'einops' by @RobinDong :: PR: #7397
  • Fix failure of installing pyaudio in Online_Offline_Speech_Commands_Demo.ipynb by @RobinDong :: PR: #7396
  • [ASR] Confidence measure -> method renames by @GNroy :: PR: #7434
  • RNN-T confidence and alignment bugfix by @GNroy :: PR: #7381
  • Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) by @burchim :: PR: #7330
  • [TTS] Read audio as int32 to avoid flac read errors by @rlangman :: PR: #7477
  • Fix typos in confidence tutorial notebooks by @Kipok :: PR: #7581
  • Safeguard nemo_text_processing installation on ARM by @blisc :: PR: #7485
  • add fc large ls models by @nithinraok :: PR: #7641
  • [ASR] RNN-T greedy decoding max_frames fix for alignment and confidence by @GNroy :: PR: #7635
  • Create per.py by @ssh-meister :: PR: #7538
  • Update docs: readme, getting started, ASR intro by @erastorgueva-nv :: PR: #7679
  • [ASR] Multichannel mask estimator with flex number of channels by @anteju :: PR: #7317
  • Fix code block typo in docs by @erastorgueva-nv :: PR: #7717
  • Replace gpus with devices by @athitten :: PR: #7743
  • docs: fix typos by @shuoer86 :: PR: #7758
  • Snake act by @nithinraok :: PR: #7736
  • fix(clustering_diarizer.py): fix typo by @jqueguiner :: PR: #7772
  • Add some docs and update scripts for ASR by @titu1994 :: PR: #7790
  • remove TN from ctc_segm tut by @ekmb :: PR: #7807
  • Add support for finetuning with huggingface datasets by @stevehuang52 :: PR: #7834
  • Adding long-form audio speaker diarization (clustering) class and functions by @tango4j :: PR: #7737
  • Fix k2 installation: update for latest PyTorch, move script to dir by @artbataev :: PR: #7887
  • [ASR] GSS-based mask estimator by @anteju :: PR: #7849
  • add Dutch P&C FC model info by @zhehuaichen :: PR: #7892
  • Add checks for unit tests that are looking for data from CI machine by @ericharper :: PR: #7943
  • update branch name by @nithinraok :: PR: #7990
  • fix librosa display issue by @nithinraok :: PR: #7991
  • Fixes Notebooks for ASR by @titu1994 :: PR: #7994
  • cherry pick bug 4405781 by @karpnv :: PR: #8044
  • fix noise augmentation by @stevehuang52 :: PR: #8056
  • Fix various issues with broken links and bugs by @titu1994 :: PR: #8064
  • run with non-dev option by @nithinraok :: PR: #8077
  • update broken links by @nithinraok :: PR: #8079
  • langid bug fix by @karpnv :: PR: #8134

TTS

Changelog
  • Add steps for document of getting dataset 'SF Bilingual Speech' by @RobinDong :: PR: #7378
  • Fix checking of cuda/cpu device for inputs of Decoder by @RobinDong :: PR: #7444
  • Fix failure of ljspeech's get_data.py by @RobinDong :: PR: #7430
  • [TTS] Fix audio codec type checks by @rlangman :: PR: #7373
  • [TTS] Add dataset to path of logged artifacts by @rlangman :: PR: #7462
  • Fix adding positional embeddings in-place in FFTransformerDecoder by @The0nix :: PR: #7440
  • Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS by @RobinDong :: PR: #7409
  • [TTS] Fix FastPitch data prep tutorial by @rlangman :: PR: #7524
  • add italian tokenization by @GiacomoLeoneMaria :: PR: #7486
  • Remap speakers to continuous range of speaker_id for dataset AISHELL3 by @RobinDong :: PR: #7536
  • add ItalianPhonemesTokenizer by @GiacomoLeoneMaria :: PR: #7587
  • [TTS] Add STFT and SI-SDR loss to audio codec recipe by @rlangman :: PR: #7468
  • Fix typo in audio codec config, encoder target by @anteju :: PR: #7697
  • Group-residual vector quantizer by @anteju :: PR: #7643
  • French g2p with pronunciation dictionary by @mgrafu :: PR: #7601
  • add pleasefixme marker for potential failed nightly tests. by @XuesongYang :: PR: #7678
  • Add new text segmentation library for better TTS quality by @RobinDong :: PR: #7645
  • ConditionalInput: cat along the feature dim, not the batch dim by @anferico :: PR: #7785
  • Add selection criteria for reference audios in the submodule by @anferico :: PR: #7788
  • [Codec] Update codec checkpoint config by @anteju :: PR: #7835
  • [Codec] Finite scalar quantizer by @anteju :: PR: #7886
  • Tar codec by @nithinraok :: PR: #7867

LLM

Changelog
  • Allow disabling sanity checking when num_sanity_val_steps=0 by @athitten :: PR: #7413
  • Add comprehensive error messages by @PeganovAnton :: PR: #7261
  • layer selection for ia3 by @arendu :: PR: #7417
  • Add rope dynamic linear scaling by @hsiehjackson :: PR: #7437
  • Fix sft dataset truncation by @hsiehjackson :: PR: #7464
  • fix bug when loading dist ckpt in peft by @lhb8125 :: PR: #7452
  • Fix sft chat dataset truncation by @hsiehjackson :: PR: #7478
  • SFT model parallel fix for dist ckpt by @aklife97 :: PR: #7511
  • remove auto generated examples by @arendu :: PR: #7510
  • Add the argument to by @odelalleau :: PR: #7264
  • PEFT GPT & T5 Refactor by @meatybobby :: PR: #7308
  • fix a typo by @BestJuly :: PR: #7496
  • StarCoder SFT test + bump PyT NGC image to 23.09 by @janekl :: PR: #7540
  • fix llama2 70b lora tuning bug by @cuichenx :: PR: #7622
  • generalized chat sft prompt by @yidong72 :: PR: #7655
  • Set base frequency from config by @shan18 :: PR: #7734
  • Megatron LLM documentation updates by @ssh-meister :: PR: #7400
  • Remove incorrect extra argument of load_from_checkpoint_dir() by @RobinDong :: PR: #7500
  • Add nemo to mcore GPT conversion script by @cuichenx :: PR: #7730
  • set context for text memmap to fork by @arendu :: PR: #7784
  • Support flash decoding by @hsiehjackson :: PR: #7744
  • update text server to support compute logprobs by @Zhilin123 :: PR: #7733
  • Revert PEFT eval fix by @ericharper :: PR: #7693
  • Fix tn duplex by @ekmb :: PR: #7808
  • Multimodal merge by @yaoyu-33 :: PR: #7728
  • Fix flash decoding precision by @hsiehjackson :: PR: #7852
  • Removing duplicate Megatron-LM installation by @Davood-M :: PR: #7864
  • adding special_tokens from tokenizer config for transformer-lm model by @clumsy :: PR: #7613
  • Add Adapter and IA3 support for MCore models by @cuichenx :: PR: #7750
  • Add back import guard by @cuichenx :: PR: #7882
  • Change FP8 Defaults by @cuichenx :: PR: #7894
  • Added knob for ub_tp_comm_overlap for the MCORE pass by @sanandaraj5597 :: PR: #7902
  • Upgrade NeMo to latest mcore and TE by @dimapihtar :: PR: #7862
  • Pad sequences to multiples of 16 for GPTSFTDataset by @vysarge :: PR: #7904
  • upgrade to latest mcore and TE by @dimapihtar :: PR: #7908
  • added missing torch import by @Davood-M :: PR: #7913
  • Fix CPU initialization of GPT models by @cuichenx :: PR: #7889
  • Fix pinned triton version by @hsiehjackson :: PR: #7925
  • fix tp_overlap config var name by @xrennvidia :: PR: #7928
  • only enable query key scaling during fp16 by @gshennvm :: PR: #7946
  • Fix for gpt3 eval hang with PP (a dtype issue) by @yaoyu-33 :: PR: #7927
  • Pass in rotary_base to mcore and from HF by @Kipok :: PR: #7933
  • Use NLPDDPStrategyNotebook in Multitask_Prompt_and_PTuning.ipynb by @athitten :: PR: #8061

General Improvements

Changelog
  • Add fix for max time to quit trainer gracefully, without running validation by @SeanNaren :: PR: #7731
  • SDE Tutorial minor fix by @Jorjeous :: PR: #7598
  • Temporary pin Lightning-Utilities version due to broken NamedTuple by @artbataev :: PR: #8022
  • Karpnv/issue 7320 by @karpnv :: PR: #7418
  • Speech Simulator, update README.md: output_path --> output_manifest_filepath by @popcornell :: PR: #7442
  • Fix None dataloader issue in PTL2.0 by @KunalDhawan :: PR: #7455
  • HF StarCoder to NeMo conversion script by @janekl :: PR: #7421
  • [doc] fix broken link by @stas00 :: PR: #7481
  • dllogger - log on rank 0 only by @stas00 :: PR: #7513
  • Add two youtube introductory videos to README and Docs. by @XuesongYang :: PR: #7570
  • defaults changed by @arendu :: PR: #7600
  • Bound transformers version in requirements by @athitten :: PR: #7620
  • Fix import error no module name model_utils by @menon92 :: PR: #7629
  • Fix in the confidence ensemble test by @Kipok :: PR: #7682
  • move core install to /workspace by @aklife97 :: PR: #7706
  • distributed checkpoint average script by @yidong72 :: PR: #7721
  • fix hybrid eval by @karpnv :: PR: #7757
  • fix(diarization-README): typo by @jqueguiner :: PR: #7771
  • Configure MCore logger by @mikolajblaz :: PR: #7781
  • Nemo to HF converter for LLaMA model by @uppalutkarsh :: PR: #7770
  • [Fix] Save best NeMo model only when necessary by @anteju :: PR: #7836
  • add guard if its a distributed checkpoint by @gshennvm :: PR: #7845
  • Update transformers cache on Jenkins by @ericharper :: PR: #7854
  • Update README.rst for container update by @fayejf :: PR: #7844
  • Fix mcore conversion bug by @cuichenx :: PR: #7846
  • add comment on script and fix target check by @gshennvm :: PR: #7881
  • fix issues with convert_nemo_llama_to_hf.py by @Zhilin123 :: PR: #7922
  • Instructions for running ci on pr template by @ericharper :: PR: #7944
  • Distributed checkpoint averaging supports bf16 type by @yidong72 :: PR: #7888
  • Fix tokenizer argparse in scripts by @titu1994 :: PR: #8012
  • Check dependencies in installation script by @artbataev :: PR: #8019
  • [SE Tutorial] Use GPU for inference, when available by @anteju :: PR: #8048
  • update reqs by @ericharper :: PR: #8072
  • Remove typo by @ericharper :: PR: #8146

v1.21.0

6 months ago

Highlights

Models

NeMo ASR

  • Multi-lookahead cache-aware streaming
  • Speech enhancement tutorial #6492
  • Online code switching dataset #6579

NeMo TTS

  • AudioCodec: Training recipe for EnCodec #6852

NeMo Framework

  • GPT from Mcore #7093
  • GPT distributed checkpointing #7116
  • Hidden transformations #6332
  • Llama 2 #7299

NeMo Core

  • Update to PTL 2.0 #6433

NeMo Tools

  • Forced aligner tutorial #7210

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:23.08

ASR

Changelog
  • Fix require_grad typos by @kit1980 :: PR: #6930
  • rnnt_greedy_decoding.py: typos? auto-repressively -> auto-regressively by @vadimkantorov :: PR: #6989
  • Adding tutorial for confidence ensembles by @Kipok :: PR: #6932
  • Add support for Numba FP16 RNNT Loss by @titu1994 :: PR: #6991
  • fix install_beamsearch_decoders by @karpnv :: PR: #7011
  • rnnt and char utils by @karpnv :: PR: #6971
  • ASR Confidence update and tutorial by @GNroy :: PR: #6810
  • st standalone model by @AlexGrinch :: PR: #6969
  • Fix typo in ASR-TTS tutorial by @artbataev :: PR: #7049
  • Update Frame-VAD doc and fix onnx export by @stevehuang52 :: PR: #7076
  • Fast Conformer global token fix by @sam1373 :: PR: #7085
  • Added script to extract ASR CTC and RNNT models from ASR hybrid models by @trias702 :: PR: #7092
  • Fix absolute path in path join call by @kingjan1999 :: PR: #7099
  • NeMo ASR Demo by @lleaver :: PR: #7110
  • Fix plot function in vad_utils.py by @stevehuang52 :: PR: #7113
  • Fixed small bug with NoisePerturbationWithNormalization by @trias702 :: PR: #7118
  • Merge release r1.20.0 to main by @ericharper :: PR: #7167
  • minor fix for conformer subsampling docstring. by @XuesongYang :: PR: #7195
  • [ASR] Fix GPU memory leak in transcribe_speech.py by @rlangman :: PR: #7249
  • Adding Multilingual, Code-Switched, and Hybrid ASR models by @KunalDhawan :: PR: #7250
  • fix partial transcribe by @stevehuang52 :: PR: #7284
  • Conv1d subsampling by @burchim :: PR: #7294
  • add bf16 inference support and fix seq_len stft issue by @nithinraok :: PR: #7338
  • Add finetuning scripts by @nithinraok :: PR: #7263
  • Move parameter: trainer -> exp_manager (for PTL 2.0) by @artbataev :: PR: #7339
  • Fix typos by @omahs :: PR: #7361
  • Fix wrong calling of librosa.get_duration() in notebook by @RobinDong :: PR: #7376
  • RNN-T confidence and alignment bugfix (#7381) by @GNroy :: PR: #7459
  • update branch by @nithinraok :: PR: #7488
  • Replace strategy = None with strategy = auto for notebooks by @athitten :: PR: #7521
  • Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue by @KunalDhawan :: PR: #7531
  • gpus -> devices by @nithinraok :: PR: #7542
  • [BugFix] Add missing quotes for auto strategy in tutorial notebooks by @athitten :: PR: #7541
  • Append output of val_step to self.validation_step_outputs in EncMaskDecAudioToAudioModel by @athitten :: PR: #7543
  • fix validation_step_outputs initialization for multi-dataloader by @KunalDhawan :: PR: #7546
  • Append val/test output to instance variable in EncDecSpeakerLabelModel by @athitten :: PR: #7562
  • update strategy by @nithinraok :: PR: #7577
  • Typo fixes by @Kipok :: PR: #7591
  • Fix metrics for SE tutorial by @anteju :: PR: #7604
  • fix ssl models ptl monitor val through logging by @nithinraok :: PR: #7608
  • Fix py3.11 dataclasses issue by @titu1994 :: PR: #7582
  • bugfix: trainer.gpus, trainer.strategy, trainer.accelerator by @XuesongYang :: PR: #7621
  • Safeguard nemo_text_processing installation on ARM (#7485) by @blisc :: PR: #7619
  • [ASR] Fix type error in jasper by @rlangman :: PR: #7636
  • Fix vad & speech command tutorial - onnx by @fayejf :: PR: #7671
  • Replace strategy='dp'/None with 'auto' by @athitten :: PR: #7681
  • Fix multi rank finetune for ASR by @titu1994 :: PR: #7684
  • fix ptl_bugs in slu_models.py by @jzi040941 :: PR: #7689
  • Add NLPDDPStrategyNotebook and change trainer gpus to devices by @athitten :: PR: #7741
  • Updated installation of ctc-decoders by @vsl9 :: PR: #7746
  • Fix bug wrt change decoding strategy for bpe models by @titu1994 :: PR: #7762

TTS

Changelog
  • [TTS] Add cosine distance option to TTS aligner by @rlangman :: PR: #6806
  • [TTS] Add tutorial for TTS data prep scripts by @rlangman :: PR: #6922
  • update TTS readme by @XuesongYang :: PR: #7088
  • [TTS] Create EnCodec training recipe by @rlangman :: PR: #6852
  • [TTS][ZH] add Chinese TTS recipes based on IPA symbol sets. by @XuesongYang :: PR: #6893
  • [TTS] Add output audio format to preprocessing by @rlangman :: PR: #6889
  • [TTS] Remove nested TTS configs by @rlangman :: PR: #7154
  • [TTS] Fix TTS recipes with PTL 2.0 by @rlangman :: PR: #7188
  • [TTS] Add license to ported EnCodec code by @rlangman :: PR: #7197
  • [Fix] Discriminator update in AudioCodecModel by @anteju :: PR: #7209
  • Adapter ipa Tutorial and config update by @styagi130 :: PR: #7260
  • [TTS] Audio codec fixes by @rlangman :: PR: #7266
  • [TTS] minor fix typos and input_types by @XuesongYang :: PR: #7272
  • specify explicitly to set pretrained model paths by @styagi130 :: PR: #7305
  • [TTS] Update AudioCodec API by @anteju :: PR: #7310
  • [TTS] Add additional config to preprocess_text and compute_feature_stats by @rlangman :: PR: #7321
  • [TTS] Change audio codec token type to TokenIndex by @rlangman :: PR: #7356
  • fixed trainer.strategy=auto from None. by @XuesongYang :: PR: #7369
  • [TTS] Added a callback for logging initial data by @anteju :: PR: #7384
  • [TTS] bugfix: trainer.accelerator=auto from None. by @XuesongYang :: PR: #7492
  • bugfix: specify trainer.strategy=auto when devices=1 by @XuesongYang :: PR: #7509
  • Fix dimensionality in get_dist function by @redoctopus :: PR: #7506
  • Fix TTS FastPitch tutorial by @hsiehjackson :: PR: #7494
  • [TTS] remove curly braces from in Jupyter notebook cell. by @XuesongYang :: PR: #7554
  • [TTS] fixed trainer's accelerator and strategy. by @XuesongYang :: PR: #7569
  • Change hifigan finetune strategy to ddp_find_unused_parameters_true by @hsiehjackson :: PR: #7579
  • Fix validation in G2PModel and ThutmoseTaggerModel by @athitten :: PR: #7597
  • [TTS] Fix FastPitch data prep tutorial by @rlangman :: PR: #7602
  • [TTS] Add dataset to path of logged artifacts by @rlangman :: PR: #7651

NLP / NMT

Changelog
  • Minor MPT-7B fixes and creation script update by @trias702 :: PR: #6982
  • remove hard coded input and output fields by @arendu :: PR: #7008
  • RoPE length extrapolation with interpolation by @MaximumEntropy :: PR: #7005
  • add async + distopt to sft by @MaximumEntropy :: PR: #7018
  • ptuning inference table bug fix by @arendu :: PR: #7015
  • Fix missing import for GPT SFT by @MaximumEntropy :: PR: #7026
  • Add end_strings to SamplingParams by @markelsanz14 :: PR: #6986
  • Fix race condition for downloading cache when executing with multi-node by @findkim :: PR: #7016
  • added back the retro documents. by @yidong72 :: PR: #7033
  • remove pos emb from state dict for old models by @ekmb :: PR: #7068
  • memmap worker arg by @arendu :: PR: #7062
  • Disable distopt contiguous param buffer by default by @timmoon10 :: PR: #7095
  • [Fix] load_state_dict in nlp_model.py by @stevehuang52 :: PR: #7086
  • Fix tokenizer file caching where torch.distributed may not be initialized yet by @findkim :: PR: #7061
  • freeze base model on init during peft by @arendu :: PR: #7152
  • Include the scripts for preprocessing OAST and unit tests for chat sft datasets by @yidong72 :: PR: #7112
  • T5 metrics fix by @jubick1337 :: PR: #7037
  • megatron gpt training fix by @anmolgupt :: PR: #7199
  • Fix T5 using FA by @hsiehjackson :: PR: #7196
  • fix-causal-fa-infer by @hsiehjackson :: PR: #7200
  • Fix gpt trainer test by @hsiehjackson :: PR: #6915
  • Load ub_cfg from hydra config by @jbaczek :: PR: #7003
  • Fixes for lightning 2.0 upgrade by @athitten :: PR: #7176
  • Fix which was off by one batch by @odelalleau :: PR: #7212
  • Start using ModelParallelConfig from Megatron Core by @ericharper :: PR: #6885
  • deprecation warning by @arendu :: PR: #7193
  • Fix attention mask inference by @hsiehjackson :: PR: #7213
  • Use GPTModel from mcore by @ericharper :: PR: #7093
  • Add bf16-mixed and 16-mixed in module.py by @athitten :: PR: #7227
  • Refactor LLM pretraining examples by @maanug-nv :: PR: #7159
  • Add only trainable parameters to optimizer group in PEFT by @guyueh1 :: PR: #7230
  • Dummy class for ModelParallelConfig by @ericharper :: PR: #7254
  • [TN][Docs] update language coverage matrix and refs by @mgrafu :: PR: #7247
  • tied weights for adapters by @arendu :: PR: #6928
  • Fix skip generation by @hsiehjackson :: PR: #7270
  • Hidden transforms model parallel config + CI with Perceiver by @michalivne :: PR: #7241
  • Fix restore sequence parallel by @hsiehjackson :: PR: #7273
  • fix ptuning and lora model_parallel_config by @blahBlahhhJ :: PR: #7287
  • Fix adapters and ptuning for amp O2 by @guyueh1 :: PR: #7285
  • remove additional line in peft state dict by @blahBlahhhJ :: PR: #7293
  • loss mask aware final layer application by @arendu :: PR: #7275
  • Adding server option to peft eval by @Davood-M :: PR: #7292
  • migrated class CSVFieldsMemmapDataset from BioNeMo by @dorotat-nv :: PR: #7314
  • remove old prompt table for storing cached ptuning representations by @arendu :: PR: #7295
  • Bugfix and optimization in by @odelalleau :: PR: #7267
  • Set a default value when getting by @yaox12 :: PR: #7115
  • Distributed checkpointing with mcore GPT by @ericharper :: PR: #7116
  • Fix activation checkpoint by @hsiehjackson :: PR: #7334
  • Replace prefetch with val iterator check in megatron models by @athitten :: PR: #7318
  • Fixing indentation bug in indexed_dataset memory deallocation by @michalivne :: PR: #7352
  • NeMo MCore llama2 support + MCore PEFT adapters by @blahBlahhhJ :: PR: #7299
  • Hiddens modules documentation by @michalivne :: PR: #7303
  • Support for flash attention 2.0 by @MaximumEntropy :: PR: #7063
  • multiple fields can form a context by @arendu :: PR: #7147
  • adding bias_dropout_add_fusion option for BERT by @clumsy :: PR: #7332
  • enable selective unfreeze by @arendu :: PR: #7326
  • Upgrade pytorch container to 23.08 by @ericharper :: PR: #7353
  • enable fp32 optimizer for output_layer in mcore by @lhb8125 :: PR: #7355
  • Revert comment by @ericharper :: PR: #7368
  • fix pipeline parallel inference by @blahBlahhhJ :: PR: #7367
  • fix for peft tied weights by @arendu :: PR: #7372
  • add O2 option in gpt eval by @blahBlahhhJ :: PR: #7358
  • Move model precision copy by @maanug-nv :: PR: #7336
  • Fix PEFT checkpoint loading by @blahBlahhhJ :: PR: #7388
  • Use distributed optimizer support for multiple dtypes by @timmoon10 :: PR: #7359
  • [PATCH] PEFT import mcore by @blahBlahhhJ :: PR: #7393
  • Use cfg attribute in bert by @maanug-nv :: PR: #7394
  • Add support for bias conversion in Swiglu models by @titu1994 :: PR: #7386
  • Update save_to and restore_from for dist checkpointing by @ericharper :: PR: #7343
  • fix forward for with mcore=false by @JimmyZhang12 :: PR: #7403
  • Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing by @athitten :: PR: #7374
  • Set Activation Checkpointing Defaults by @aklife97 :: PR: #7404
  • Make loss mask default to false by @ericharper :: PR: #7407
  • Add dummy userbuffer config files by @erhoo82 :: PR: #7408
  • Add missing ubconf files by @aklife97 :: PR: #7412
  • Update ptl training ckpt conversion script to work with dist ckpt by @ericharper :: PR: #7416
  • Add strategy as ddp_find_unused_parameters_true for glue_benchmark.py by @athitten :: PR: #7454
  • fix bug when loading dist ckpt in peft by @lhb8125 :: PR: #7479
  • Fix CustomProgressBar for resume by @athitten :: PR: #7427
  • Append val output to self.validation_step_outputs in GLUEModel by @athitten :: PR: #7530
  • Cherry pick Fix sft dataset truncation (#7464) to r1.21.0 by @ericharper :: PR: #7550
  • Avoid duplicated dist checkpoint save by @mikolajblaz :: PR: #7555
  • layernorm1p fix by @dimapihtar :: PR: #7523
  • r1.21: SFT model parallel fix for dist ckpt by @aklife97 :: PR: #7520
  • PEFT needs mp config propagated for dist ckpt by @ericharper :: PR: #7589
  • Fix ptuning crash for llama 2 ckpt by @yuanzhedong :: PR: #7594
  • PEFT eval fix by @cuichenx :: PR: #7626
  • Propagate mp config for continue training by @ericharper :: PR: #7637
  • Add ddp_find_unused_parameters=True and change accelerator to auto by @athitten :: PR: #7623
  • Add find_unused_parameters_true for text_classiftn and punctuation_capitalization by @athitten :: PR: #7649
  • conversion issue fix by @dimapihtar :: PR: #7648
  • Fix a nlp nb onnx by @fayejf :: PR: #7703
  • Add activations_checkpoint related args for model cfg in lora.ipynb by @athitten :: PR: #7752
  • Change accelerator to 'auto' in nlp_checkpoint_port.py by @athitten :: PR: #7747
  • Add reconfigure microbatch calculator before inference and update GBS, MBS for inference by @athitten :: PR: #7763
  • Create PrecisionPlugin for megatron_ckpt_to_nemo.py trainer by @athitten :: PR: #7767

NeMo Tools

Changelog
  • Update doc, new tutorial on SDE by @Jorjeous :: PR: #7405
  • Fix branch version for SDE by @titu1994 :: PR: #7528

Export

Changelog
  • Added bool types to neural_types export by @tbartley94 :: PR: #7032

General Improvements

Changelog
  • Add migration guide for lightning 2.0 upgrade by @athitten :: PR: #7360
  • add support for max_total_length=4096 for 43b by @Zhilin123 :: PR: #6763
  • Change Jenkins timeout by @ericharper :: PR: #6997
  • Update SDP docs page with a new documentation link by @Kipok :: PR: #7029
  • Fixed tutorial's name by @vsl9 :: PR: #7047
  • Revert Fix import guard checks by @titu1994 :: PR: #7125
  • Fix import guard checks by @titu1994 :: PR: #7126
  • fix evaluator.py for various exceptions by ast by @stevehuang52 :: PR: #7150
  • NFA bugfix: remove any empty segments by @erastorgueva-nv :: PR: #7155
  • NFA subtitle file config - specify colors and vertical alignment by @erastorgueva-nv :: PR: #7160
  • add paths to labeler. by @XuesongYang :: PR: #7087
  • [Bugfix] Fix a bug in filtering checkpoints by @yaox12 :: PR: #6851
  • Update README.rst by @fayejf :: PR: #7175
  • Make NFA subtitles stay until end of video by @erastorgueva-nv :: PR: #7189
  • Uncomment removal of exp_dir in JenkinsFile by @athitten :: PR: #7198
  • NFA: replace ellipses in text with 3 periods by @erastorgueva-nv :: PR: #7208
  • NFA tutorial notebook by @erastorgueva-nv :: PR: #7210
  • NFA docs: update READMEs and links, add docs page by @erastorgueva-nv :: PR: #7219
  • Make image centering in NFA README actually work by @erastorgueva-nv :: PR: #7220
  • Add mcore installation to Dockerfile by @ericharper :: PR: #7237
  • Checkpoint averaging for model parallel by @Kipok :: PR: #7252
  • Upgrade hydra and omegaconf by @athitten :: PR: #7243
  • Update numba support in docker by @titu1994 :: PR: #7271
  • remove deprecated scripts from ci by @arendu :: PR: #7239
  • Logging model checkpoints as artifacts in MlFlow by @AlirezaMorsali :: PR: #7258
  • Adithyare/peft metric calculation by @arendu :: PR: #7304
  • Resume checkpoint priority by @maanug-nv :: PR: #7335
  • lora merge fix for O2 names by @arendu :: PR: #7325
  • Llama load buffers in checkpoint by @blahBlahhhJ :: PR: #7357
  • pin numba=0.57.1 to fix reinstall.sh error by @XuesongYang :: PR: #7366
  • Update to core 23.08 branch ToT by @aklife97 :: PR: #7371
  • Upper bounding ptl by @ericharper :: PR: #7370
  • minor fix for llama ckpt conversion script by @blahBlahhhJ :: PR: #7387
  • Update Core Commit by @aklife97 :: PR: #7402
  • Fix resume from checkpoint in exp_manager by @athitten :: PR: #7424
  • add sleep by @gshennvm :: PR: #7498
  • Fix exp manager check for sleep by @titu1994 :: PR: #7503
  • unpin setuptools by @fayejf :: PR: #7534
  • Update FFMPEG version to fix issue with torchaudio by @titu1994 :: PR: #7551
  • fix typos in nfa and speech enhancement tutorials by @erastorgueva-nv :: PR: #7580
  • best ckpt fix by @dimapihtar :: PR: #7564
  • add build os key by @nithinraok :: PR: #7596
  • Fix issues with Dockerfile by @titu1994 :: PR: #7650
  • Change confidence parameters in the test by @Kipok :: PR: #7680
  • bugfix: pin nemo-text-process to fix Chinese normalizer error. by @XuesongYang :: PR: #7627
  • Remove PUBLICATIONS.md, point to github.io NeMo page instead by @erastorgueva-nv :: PR: #7694
  • Pin mcore to 0.3 by @ericharper :: PR: #7751
  • fix hybrid eval by @karpnv :: PR: #7759
  • Update Apex install command in Dockerfile by @ericharper :: PR: #7794

v1.20.0

9 months ago

Highlights

Models

NeMo ASR

  • Graph-RNN-T #6168
  • WildCard-RNN-T #6168
  • Confidence Ensembles for ASR
  • Token-and-Duration Transducer (TDT) #6536
  • Spellchecking ASR #6179
  • Numba FP16 RNNT Loss #6991

NeMo TTS

  • TTS Adapter Customization
  • TTS Dataloader Framework

NeMo Framework

  • LoRA for T5 and mT5 #6612
  • Flash Attention integration #6666
  • Mosaic 7B compatibility
  • Models with LongContext (32K) #6666, #6687, #6773

NeMo Tools

  • Speech Data Explorer: Utterance level ASR model comparison #6669
  • Speech Data Processor: Spanish P&C
  • NeMo Forced Aligner: Large sequence alignment + memory reduction #6695

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:23.06

Detailed Changelogs

ASR

Changelog
  • [ASR] Adding ssl config for fast-conformer by @krishnacpuvvada :: PR: #6672
  • Fix for interctc test random failure by @Kipok :: PR: #6644
  • sharded manifests docs by @bmwshop :: PR: #6751
  • [TTS] Implement new vocoder dataset by @rlangman :: PR: #6670
  • TDT model pull request by @hainan-xv :: PR: #6536
  • Spec aug fix by @tbartley94 :: PR: #6775
  • Support large inputs to Conformer and Fast Conformer by @bmwshop :: PR: #6556
  • sharded manifests updated docs by @bmwshop :: PR: #6833
  • added fc-xl, xxl and titanet-s models by @nithinraok :: PR: #6832
  • Multi-lookahead cache-aware streaming models by @VahidooX :: PR: #6711
  • Update transcribe_utils.py by @stevehuang52 :: PR: #6865
  • Fix k2 build topo helper by @artbataev :: PR: #6887
  • Fix transcribe_utils.py for hybrid models in partial transcribe mode by @stevehuang52 :: PR: #6899
  • Add hybrid model support to transcribe_speech_parallel.py by @stevehuang52 :: PR: #6906
  • Update Frame-VAD doc by @stevehuang52 :: PR: #6902
  • Make sure asr_model.change_attention_model is run if either cfg.model_path or cfg.pretrained_name is specified by @erastorgueva-nv :: PR: #6908
  • Update fvad doc by @stevehuang52 :: PR: #6920
  • Online Code Switching Dataset for ASR by @trias702 :: PR: #6579
  • Fix AN4 dataset links by @artbataev :: PR: #6926
  • Fix confidence ensembles RNNT logprobs selection logic for exclude_blank scenario by @KunalDhawan :: PR: #6937
  • Adding cache-aware streaming ASR checkpoints. by @VahidooX :: PR: #6940
  • Remove from metrics by @titu1994 :: PR: #6979
  • Hybrid conformer export by @borisfom :: PR: #6983
  • Cache handling without input tensors mutation by @borisfom :: PR: #6980
  • Fixing an issue with confidence ensembles by @Kipok :: PR: #6987
  • Add ASR with TTS Tutorial. Fix enhancer usage. by @artbataev :: PR: #6955
  • fix install_beamsearch_decoders.sh by @karpnv :: PR: #7019
  • Add support for Numba FP16 RNNT Loss (#6991) by @titu1994 :: PR: #7038
  • Fix typo and branch in tutorial by @artbataev :: PR: #7048
  • Refined export_config by @borisfom :: PR: #7053
  • Fix documentation for Numba by @titu1994 :: PR: #7065
  • Adding docs and models for multiple lookahead cache-aware ASR by @VahidooX :: PR: #7067
  • Add updated fc ctc and rnnt xxl models by @nithinraok :: PR: #7128
  • Update notebook branch by @ericharper :: PR: #7135
  • Fixed main and merging this to r1.20 by @tango4j :: PR: #7127
  • Fix default context size by @nithinraok :: PR: #7141
  • Fix incorrect embedding grads with distopt BF16 grad reductions by @timmoon10 :: PR: #6958

TTS

Changelog
  • [TTS] Add callback for saving audio during FastPitch training by @rlangman :: PR: #6665
  • [TTS] Add script for text preprocessing by @rlangman :: PR: #6541
  • [TTS] Fix adapter duration issue by @hsiehjackson :: PR: #6697
  • [TTS] Filter out silent audio files during preprocessing by @rlangman :: PR: #6716
  • [TTS] fix inconsistent type hints for IpaG2p by @XuesongYang :: PR: #6733
  • [TTS] relax hardcoded prefix for phonemes and tones and infer phoneme set through dict by @XuesongYang :: PR: #6735
  • [TTS] corrected misleading deprecation warnings. by @XuesongYang :: PR: #6702
  • Fix TTS adapter tutorial by @hsiehjackson :: PR: #6741
  • [TTS][zh] refine hardcoded lowercase for ASCII letters. by @XuesongYang :: PR: #6781
  • [TTS] Append pretrained FastPitch & SpectrogramEnhancer pair to available models by @racoiaws :: PR: #7012

NLP / NMT

Changelog
  • minor fix for missing chat attr by @arendu :: PR: #6671
  • eval fix by @arendu :: PR: #6685
  • VP Fixes for converter + Config management by @titu1994 :: PR: #6698
  • lora notebook by @arendu :: PR: #6765
  • peft eval directly from ckpt by @arendu :: PR: #6785
  • GPT inference long context by @ekmb :: PR: #6687
  • Fix validation with drop_last=False by @mikolajblaz :: PR: #6704
  • fix spellmapper tutorial, change branch to main by @bene-ges :: PR: #6803
  • text_generation_utils memory reduction if no logprob needed by @yzhang123 :: PR: #6773
  • Add optional index mapping dir in mmap text datasets by @gheinrich :: PR: #6683
  • Add inference kv cache support for transformer TE path by @yen-shi :: PR: #6627
  • add reference to our paper by @bene-ges :: PR: #6821
  • added changes to ramp up bs by @dimapihtar :: PR: #6799
  • t5 lora tuning by @arendu :: PR: #6612
  • Added rouge monitoring support for T5 by @jubick1337 :: PR: #6737
  • GPT extrapolatable position embedding (xpos/sandwich/alibi/kerple) and Flash Attention by @hsiehjackson :: PR: #6666
  • Import Enum for chatbot component by @ericharper :: PR: #6877
  • typo fix from #6666 by @arendu :: PR: #6882
  • removed unnecessary print by @dimapihtar :: PR: #6884
  • Fix destructor for delayed mmap dataset case by @mikolajblaz :: PR: #6703
  • Make Gradio library optional by @yidong72 :: PR: #6904
  • Fix fast-glu activation in change partitions by @hsiehjackson :: PR: #6909
  • Documentation for ONNX export of Megatron Models by @asfiyab-nvidia :: PR: #6914
  • Fix TextMemMapDataset index file creation in multi-node setup by @gheinrich :: PR: #6768
  • Fix flash-attention by @hsiehjackson :: PR: #6901
  • ptuning oom fix by @arendu :: PR: #6916
  • add rampup bs assertion by @dimapihtar :: PR: #6927
  • Enable methods in bert-like models by @sararb :: PR: #6898
  • support value attribution condition by @yidong72 :: PR: #6934
  • Add missing save restore connector to eval scripts by @titu1994 :: PR: #6935
  • Merge release r1.19.0 into main by @ericharper :: PR: #6948
  • Stop at the stop token by @yidong72 :: PR: #6957
  • fixes for spellmapper by @bene-ges :: PR: #6994
  • Fix tabular data text generation by @yidong72 :: PR: #7022
  • fix pos id - hf update by @ekmb :: PR: #7075
  • fix syntax error introduced in PR-7079 by @bene-ges :: PR: #7102

NeMo Tools

Changelog
  • SDE unt lvl comparison by @Jorjeous :: PR: #6669
  • hot fix SDE by @Jorjeous :: PR: #6897

Bugfixes

Changelog
  • small Bugfix by @fayejf :: PR: #7079
  • Fix caching bug in causal convolutions for cache-aware ASR models by @VahidooX :: PR: #7034
  • Fix masking bug for TTS Aligner by @redoctopus :: PR: #6677
  • [bugfix] avoid the random shuffle of phoneme and tone tokens. by @XuesongYang :: PR: #6855
  • fix ptuning residuals bug by @arendu :: PR: #6866
  • TE bug fix by @dimapihtar :: PR: #7027
  • Update distopt API for coalesced NCCL calls by @timmoon10 :: PR: #6886

General Improvements

Changelog
  • update batch size recommendation to min 32 for 43b by @Zhilin123 :: PR: #6675
  • Make Note usage consistent in adapter_mixins.py by @BrianMcBrayer :: PR: #6678
  • Update all invalid tree references to blobs for NeMo samples by @BrianMcBrayer :: PR: #6679
  • Update README.rst about container by @fayejf :: PR: #6686
  • karpnv/issues6690 by @karpnv :: PR: #6705
  • Limit codeql scope by @titu1994 :: PR: #6710
  • Not pinning Gradio version by @yidong72 :: PR: #6680
  • preprocess squad in sft format by @arendu :: PR: #6727
  • Fix Codeql config by @titu1994 :: PR: #6731
  • Fix fastpitch test nightly by @hsiehjackson :: PR: #6730
  • Lora/PEFT training script CI test by @arendu :: PR: #6664
  • fixed decor to show messages only when the wrapped object is called. by @XuesongYang :: PR: #6793
  • lora pp2 by @arendu :: PR: #6818
  • Upperbound Numpy to < 1.24 by @titu1994 :: PR: #6829
  • Fix typo in documentation by @Dounx :: PR: #6838
  • NFA updates by @erastorgueva-nv :: PR: #6695
  • Update container for import action by @ericharper :: PR: #6883
  • removed some tests by @arendu :: PR: #6900
  • Update container info in README.rst by @fayejf :: PR: #6913
  • Removed optional optimize_for_inference by @borisfom :: PR: #6933
  • Update core commit for CI by @aklife97 :: PR: #6939
  • lora inference ci by @arendu :: PR: #6931
  • Upgrade base pytorch container to 23.06 by @ericharper :: PR: #6938
  • Fix requirements for pydantic + inflect by @titu1994 :: PR: #6956
  • Remove pyyaml by @titu1994 :: PR: #7052
  • Fix links in Segmentation tutorial by @ekmb :: PR: #7117
  • Update evaluator.py by @stevehuang52 :: PR: #7151

v1.19.1

9 months ago

This release is a small patch to fix compatibility with torchmetrics.

  • Remove deprecated arg compute_on_step. See #6979.

v1.19.0

10 months ago

Highlights

NeMo ASR

  • Sharded Manifests for Tarred Datasets #6395
  • Frame-VAD model + datasets support #6441
  • Noise Norm Perturbation #6445
  • Code Switched Dataset with IID Sampling #6448

NeMo TTS

  • Speaker adaptation for FastPitch #6416, #6417

NeMo Megatron

  • Batch size rampup #6424
  • Unify dataset and model classes for all PEFT #6391
  • LoRA for GPT #6391
  • Convert interleaved pipeline model to non-interleaved #6498
  • Dialog Dataset for SFT #6654
  • Dynamic length batches for GPT SFT #6510
  • Merge LoRA weights into base model #6597
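The batch size rampup (#6424) in the list above grows the global batch size during the early part of training. Below is a minimal sketch of a linear rampup schedule, assuming the Megatron-LM convention of a `[start, increment, ramp_samples]` triple (starting batch size, increment per step, and number of training samples over which to ramp). The function name `rampup_global_batch_size` and its integer stepping are illustrative assumptions, not NeMo's implementation.

```python
def rampup_global_batch_size(consumed_samples: int, start: int, increment: int,
                             ramp_samples: int, target: int) -> int:
    """Return the global batch size to use after `consumed_samples` training samples.

    Hypothetical sketch: the batch size grows from `start` to `target` in equal
    steps of `increment`, spread evenly over the first `ramp_samples` samples.
    """
    if increment <= 0 or target < start or (target - start) % increment != 0:
        raise ValueError("target - start must be a non-negative multiple of a positive increment")
    num_increments = (target - start) // increment
    if num_increments == 0 or ramp_samples <= 0:
        return target
    # Integer arithmetic avoids floating-point drift at the step boundaries.
    steps = consumed_samples * num_increments // ramp_samples
    return min(start + steps * increment, target)


if __name__ == "__main__":
    for consumed in (0, 400, 700, 1000, 5000):
        size = rampup_global_batch_size(consumed, start=32, increment=32, ramp_samples=1000, target=128)
        print(consumed, size)
```

Once the ramp is complete, the schedule stays at `target` for the rest of training.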

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:23.04

Detailed Changelogs

ASR

Changelog
  • Sharded manifests for tarred datasets by @bmwshop :: PR: #6395
  • Update script for ngram rnnt and hat beam search decoding by @andrusenkoau :: PR: #6370
  • Add disclaimer about dataset for ASR by @titu1994 :: PR: #6496
  • New noise_norm perturbation based on Riva work by @trias702 :: PR: #6445
  • Add Frame-VAD model and datasets by @stevehuang52 :: PR: #6441
  • removing unnecessary avoid_bfloat16_autocast_context by @bmwshop :: PR: #6481
  • FC models in menu by @bmwshop :: PR: #6473
  • Separate punctuation by whitespace by @karpnv :: PR: #6574
  • Cherry pick commits in #6601 to main by @fayejf :: PR: #6611
  • Offline and streaming inference support for hybrid model by @fayejf :: PR: #6570
  • Disable interctc tests by @Kipok :: PR: #6638
  • ASR-TTS Models: Support hybrid RNNT-CTC, improve docs. by @artbataev :: PR: #6620
  • Confidence ensembles implementation by @Kipok :: PR: #6614
  • Confidence ensembles: fix issues and add tuning functionality by @Kipok :: PR: #6657
  • Add support for RNNT/hybrid models to partial transcribe by @stevehuang52 :: PR: #6609
  • eval_beamsearch_ngram.py with hybrid ctc by @karpnv :: PR: #6656

TTS

Changelog
  • [TTS] FastPitch adapter fine-tune and conditional layer normalization by @hsiehjackson :: PR: #6416
  • [TTS] whitelist broken path fix. by @XuesongYang :: PR: #6412
  • [TTS] FastPitch speaker encoder by @hsiehjackson :: PR: #6417
  • Update NeMo_TTS_Primer.ipynb by @pythinker :: PR: #6436
  • [TTS] Create functions for TTS preprocessing without dataloader by @rlangman :: PR: #6317
  • [TTS] Fix FastPitch energy code by @rlangman :: PR: #6511
  • [TTS] Add script for computing feature stats by @rlangman :: PR: #6508
  • [TTS] Add tutorials for FastPitch TTS speaker adaptation with adapters by @hsiehjackson :: PR: #6431
  • [TTS] Create initial TTS dataset feature processors by @rlangman :: PR: #6507
  • [TTS] Add script for mapping speaker names to indices by @rlangman :: PR: #6509
  • [TTS] Implement new TextToSpeech dataset by @rlangman :: PR: #6575

NLP / NMT

Changelog
  • Add patches for Virtual Parallel conversion by @titu1994 :: PR: #6589
  • Update wfst_text_normalization.rst by @jimregan :: PR: #6374
  • add rampup batch size support for Megatron GPT by @dimapihtar :: PR: #6424
  • Add interleaved pp support by @titu1994 :: PR: #6498
  • Support dynamic length batches with GPT SFT by @aklife97 :: PR: #6510
  • Framework for PEFT via mixins by @arendu :: PR: #6391
  • Add GPT eval mode fix for interleaved to main (#6449) by @aklife97 :: PR: #6610
  • sft model can use this script for eval by @arendu :: PR: #6637
  • Patch memory used for NeMo Megatron models by @titu1994 :: PR: #6615
  • merge lora weights into base model by @arendu :: PR: #6597
  • Dialogue dataset by @yidong72 :: PR: #6654
  • check for first or last stage by @ericharper :: PR: #6708
  • A few small typo fixes by @Kipok :: PR: #6599
  • Lddl bert by @wdykas :: PR: #6761
  • Debug Transformer Engine FP8 support with Megatron-core infrastructure by @timmoon10 :: PR: #6740
  • Tensor-parallel communication overlap with userbuffer backend by @erhoo82 :: PR: #6780
  • Add ub communicator initialization to validation step by @erhoo82 :: PR: #6807
  • Add trainer.validate example for GPT by @ericharper :: PR: #6794
  • Add API docs for NeMo Megatron by @ericharper :: PR: #6850
  • Apply garbage collection interval to validation steps by @erhoo82 :: PR: #6870
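Merging LoRA weights into the base model (#6597) folds the trained low-rank update back into the frozen base weight, so inference runs without a separate adapter path. Below is a minimal sketch of the standard LoRA merge, W' = W + (alpha / r) * B A, assuming the alpha/rank scaling from the original LoRA formulation. NeMo's merge script may store or scale the factors differently, and `merge_lora` and `matmul` are hypothetical helpers written with plain lists to stay dependency-free.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product: (n, k) @ (k, m) -> (n, m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def merge_lora(weight: Matrix, lora_a: Matrix, lora_b: Matrix, alpha: float, rank: int) -> Matrix:
    """Return W + (alpha / rank) * B @ A (hypothetical sketch of a LoRA merge).

    Shapes: weight (out, in), lora_a (rank, in), lora_b (out, rank).
    """
    if len(lora_a) != rank or len(lora_b[0]) != rank:
        raise ValueError("LoRA factors do not match the given rank")
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # (out, rank) @ (rank, in) -> (out, in)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]
```

After merging, `matmul(merged, x)` equals `matmul(weight, x)` plus the scaled adapter output `(alpha / rank) * B (A x)`, which is why the adapter can be dropped at inference time.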

Bugfixes

Changelog
  • [BugFix] Force _get_batch_preds() to keep logits in decoder timestamps generator by @tango4j :: PR: #6499
  • small bugfix for asr_evaluator by @fayejf :: PR: #6636
  • fix bucketing bug issue for picking new bucket by @nithinraok :: PR: #6663
  • [TTS] Fix TTS audio preprocessing bugs by @rlangman :: PR: #6628
  • Fix a bug, use _ceil_to_nearest instead as _round_to_nearest is not d… by @BestJuly :: PR: #6681
  • Bug fix to restore act ckpt by @markelsanz14 :: PR: #6753
  • Bug fix to reset sequence parallelism by @markelsanz14 :: PR: #6756
  • Bug fix for reset_sequence_parallel_args by @markelsanz14 :: PR: #6802
  • Fix adapter tutorial r1.19.0 by @hsiehjackson :: PR: #6776
  • Fix error appearing when using tar datasets by @Jorjeous :: PR: #6502
  • Fix normalization of impulse response in ImpulsePerturbation by @anteju :: PR: #6505
  • Fix typos by @titu1994 :: PR: #6523
  • Fix notebook bad json by @titu1994 :: PR: #6561
  • [ASR] Fix for old models in change_attention_model by @sam1373 :: PR: #6608
  • Fix k2 installation in Docker with CUDA 12 by @artbataev :: PR: #6707
  • Tutorial fixes by @titu1994 :: PR: #6717
  • Vp fixes by @titu1994 :: PR: #6738
  • [TTS] Fix aligner nan loss in fp32 by @hsiehjackson :: PR: #6435
  • fix conversion and eval by @arendu :: PR: #6648
  • Fix checkpointed forward and add test for full activation checkpointing by @aklife97 :: PR: #6744
  • add call to p2p overlap by @aklife97 :: PR: #6779
  • Fix get_parameters when using main params optimizer by @ericharper :: PR: #6764
  • Fix GPTDataset Assert by @MaximumEntropy :: PR: #6798
  • fix notebook error by @yidong72 :: PR: #6840
  • final fix of notebook by @yidong72 :: PR: #6842

General Improvements

Changelog
  • Code-Switching dataset creation - upgrading to aggregate tokenizer manifest format by @KunalDhawan :: PR: #6448
  • Fix an invalid link in get_data.py of ljspeech by @pythinker :: PR: #6456
  • Update manifest.py to use os.path for get_full_path by @stevehuang52 :: PR: #6598
  • Cherry pick commits in #6528 to main by @timmoon10 :: PR: #6613
  • Move black parameters to pyproject.toml by @artbataev :: PR: #6647
  • handle artifacts when path is an extracted dir by @arendu :: PR: #6658
  • remove upgrading setuptools in reinstall.sh by @XuesongYang :: PR: #6659
  • Upgrade to PyTorch 23.04 Container by @ericharper :: PR: #6660
  • Fix fastpitch test nightly by @hsiehjackson :: PR: #6742
  • Fix Links for tutorials by @titu1994 :: PR: #6777
  • Update core version in Jenkinsfile by @aklife97 :: PR: #6817
  • Update mcore requirement to 0.2.0 by @ericharper :: PR: #6875

v1.18.1

11 months ago

Highlights

For the complete release note, please see NeMo 1.18.0 Release Notes

Bugfix

This patch release fixes a major bug in ASR bucketing datasets that was introduced in r1.17.0 by PR https://github.com/NVIDIA/NeMo/pull/6191. Because of this bug, although the buckets were randomly shuffled before selection on each rank, only a single bucket was looped over indefinitely, without continuing on to the subsequent buckets.

Effect: WER was significantly worse, since not all buckets were used.

This has been patched and should work correctly from 1.18.1 onwards.
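To make the failure mode concrete, here is a minimal sketch that contrasts the intended behaviour (shuffle the bucket order, then visit every bucket) with the buggy behaviour (one randomly chosen bucket repeats for the whole run). The functions `iterate_buckets` and `iterate_buckets_buggy`, the bucket contents, and the per-epoch reshuffle are hypothetical illustrations, not NeMo's bucketing dataset code.

```python
import random
from typing import Iterator, List, Sequence


def iterate_buckets(buckets: Sequence[List[int]], num_epochs: int, seed: int = 0) -> Iterator[int]:
    """Intended behaviour (hypothetical sketch): shuffle the bucket order, then visit every bucket."""
    rng = random.Random(seed)
    order = list(range(len(buckets)))
    for _ in range(num_epochs):
        rng.shuffle(order)  # randomize the bucket order for this epoch on this rank
        for bucket_idx in order:  # continue on to every subsequent bucket
            yield from buckets[bucket_idx]


def iterate_buckets_buggy(buckets: Sequence[List[int]], num_samples: int, seed: int = 0) -> Iterator[int]:
    """Failure mode (hypothetical sketch): one bucket loops until the sample budget is spent."""
    rng = random.Random(seed)
    bucket = buckets[rng.randrange(len(buckets))]  # a single randomly chosen bucket
    if not bucket:
        return
    emitted = 0
    while emitted < num_samples:
        for sample in bucket:  # never moves on to another bucket
            if emitted == num_samples:
                return
            yield sample
            emitted += 1


if __name__ == "__main__":
    buckets = [[0, 1], [2, 3, 4], [5]]
    print(sorted(iterate_buckets(buckets, num_epochs=1)))  # every sample appears once
    print(set(iterate_buckets_buggy(buckets, num_samples=12)))  # samples from one bucket only
```

However many samples the buggy iterator produces, they all come from one bucket, so most of the training data is never seen.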

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:23.03

v1.18.0

11 months ago

Highlights

Models

NeMo ASR

  • Hybrid Autoregressive Transducer (HAT) #6260
  • Apple MPS Support for ASR Inference #6289
  • InterCTC Support for Hybrid ASR Models #6215
  • RNNT N-Gram Fusion with mAES algo #6118
  • ASR + Apple M2 CPU/GPU MPS #6289

NeMo TTS

  • TTS directory structure refactor
  • User-set symbol vocabulary #6172

NeMo Megatron

  • Model parallelism from Megatron Core #6393
  • Continued training for P-tuning #6273
  • SFT for GPT-3 #6210
  • Tensor and pipeline model parallel conversion #6218
  • Megatron NMT Export to Riva

NeMo Core

Detailed Changelogs

ASR

Changelog
  • minor cleanup by @messiaen :: PR: #6311
  • docs on the use of heterogeneous test / val manifests by @bmwshop :: PR: #6352
  • [WIP] add buffered chunked streaming for nemo force aligner by @Slyne :: PR: #6185
  • Word boosting for Flashlight decoder by @trias702 :: PR: #6367
  • Add installation and ASR inference instructions for Mac by @artbataev :: PR: #6377
  • specaug speedup by @1-800-BAD-CODE :: PR: #6347
  • updated lr for FC configs by @bmwshop :: PR: #6379
  • Make possible to control tqdm progress bar in ASR models by @SN4KEBYTE :: PR: #6375
  • [ASR] Conformer global tokens in local attention by @sam1373 :: PR: #6253
  • fixed torch warning on using a list of numpy arrays by @MKNachesa :: PR: #6382
  • Fix FastConformer config: correct bucketing strategy by @artbataev :: PR: #6413
  • fixing the ability to use temp sampling with concat datasets by @bmwshop :: PR: #6423
  • add conformer configs for hat model by @andrusenkoau :: PR: #6372
  • [ASR] Add optimization util for linear sum assignment algorithm by @tango4j :: PR: #6349
  • Added/updated new Conformer configs by @VahidooX :: PR: #6426
  • Fix typos by @titu1994 :: PR: #6494
  • Fix typos (#6523) by @titu1994 :: PR: #6539
  • added back the fast emit section to the configs. by @VahidooX :: PR: #6540
  • Add FastConformer Hybrid ASR models for EN, ES, IT, DE, PL, HR, UA, BY by @KunalDhawan :: PR: #6549
  • Add scores for FastConformer models by @titu1994 :: PR: #6557
  • Patch transcribe and support offline transcribe for hybrid model by @fayejf :: PR: #6550
  • More streaming conformer export fixes by @messiaen :: PR: #6567
  • Documentation for ASR-TTS models by @artbataev :: PR: #6594
  • Patch transcribe_util for streaming mode and add wer calculation back to inference scripts by @fayejf :: PR: #6601
  • Add HAT image to docs by @andrusenkoau :: PR: #6619
  • Patch decoding for PC models by @titu1994 :: PR: #6630
  • Fix wer.py where 'errors' variable was not set by @stevehuang52 :: PR: #6633
  • Fix for old models in change_attention_model by @VahidooX :: PR: #6635

TTS

Changelog
  • VITS HiFiTTS doc by @treacker :: PR: #6288
  • fix broken links r1.18.0 by @ekmb :: PR: #6501
  • [TTS] fixed broken path. by @XuesongYang :: PR: #6514

NLP / NMT

Changelog
  • [Core] return_config=True now extracts just config, not full tarfile by @titu1994 :: PR: #6346
  • restore path for p-tuning by @arendu :: PR: #6273
  • taskname and early stopping for adapters by @arendu :: PR: #6366
  • Adapter tuning accepts expanded language model dir by @arendu :: PR: #6376
  • Update gpt_training.rst by @blisc :: PR: #6378
  • Megatron GPT model finetuning by @MaximumEntropy :: PR: #6210
  • [NeMo Megatron] Cleanup configs to infer the models TP PP config automatically by @titu1994 :: PR: #6368
  • Fix prompt template unescaping by @MaximumEntropy :: PR: #6399
  • Add support for Megatron GPT Untied Embd TP PP Change by @titu1994 :: PR: #6388
  • Move Parallelism usage from Apex -> Megatron Core by @aklife97 :: PR: #6393
  • Add ability to enable/disable act ckpt and seq parallelism in GPT by @markelsanz14 :: PR: #6327
  • Refactor PP conversion + add support for TP only conversion by @titu1994 :: PR: #6419
  • fix CPU overheads of GPT synthetic dataset by @xrennvidia :: PR: #6427
  • check if grad is none before calling all_reduce by @arendu :: PR: #6428
  • Fix replace_bos_with_pad not found by @aklife97 :: PR: #6443
  • Support Swiglu in TP PP Conversion by @titu1994 :: PR: #6437
  • BERT pre-training mp fork to spawn by @aklife97 :: PR: #6442
  • Megatron encoder decoder fix for empty validation outputs by @michalivne :: PR: #6459
  • Reduce workers on NMT CI by @aklife97 :: PR: #6472
  • Switch to NVIDIA Megatron repo by @aklife97 :: PR: #6465
  • Megatron KERPLE positional embeddings by @michalivne :: PR: #6478
  • Support in external sample mapping for Megatron datasets by @michalivne :: PR: #6462
  • Fix custom by @aklife97 :: PR: #6512
  • GPT fp16 inference fix by @MaximumEntropy :: PR: #6543
  • Fix for T5 FT model by @aklife97 :: PR: #6529
  • Pass instead of scaler object to core by @aklife97 :: PR: #6545
  • Change Megatron Enc Dec model to use persistent_workers by @aklife97 :: PR: #6548
  • Turn autocast off when precision is fp32 by @aklife97 :: PR: #6554
  • Fix batch size reconf for T5 FT for multi-validation by @aklife97 :: PR: #6582
  • Make tensor split contiguous for qkv and kv in attention by @aklife97 :: PR: #6580
  • Patches from main to r1.18.0 for Virtual Parallel by @titu1994 :: PR: #6592
  • Create dummy iters to satisfy iter type len checks in core + update core commit by @aklife97 :: PR: #6600
  • Restore GPT support for interleaved pipeline parallelism by @timmoon10 :: PR: #6528
  • Add megatron_core to requirements by @ericharper :: PR: #6639

Export

Changelog

Bugfixes

Changelog
  • Fix the GPT SFT datasets loss mask bug by @yidong72 :: PR: #6409
  • [BugFix] Fix multi-processing bug in data simulator by @tango4j :: PR: #6310
  • Fix cache aware hybrid bugs by @VahidooX :: PR: #6466
  • [BugFix] Force _get_batch_preds() to keep logits in decoder timestamp… by @tango4j :: PR: #6500
  • Fixing bug in unsort_tensor by @borisfom :: PR: #6320
  • Bugfix for BF16 grad reductions with distopt by @timmoon10 :: PR: #6340
  • Limit urllib3 version to patch issue with RTD by @aklife97 :: PR: #6568

General improvements

Changelog
  • Pin the version to hopefully fix rtd build by @SeanNaren :: PR: #6334
  • enabling diverse datasets in val / test by @bmwshop :: PR: #6306
  • extract inference weights by @arendu :: PR: #6353
  • Add opengraph support for NeMo docs by @titu1994 :: PR: #6380
  • Adding basic preemption code by @athitten :: PR: #6161
  • Add documentation for preemption support by @athitten :: PR: #6403
  • Update hyperparameter recommendation based on experiments by @Zhilin123 :: PR: #6405
  • exceptions with empty test / val ds config sections by @bmwshop :: PR: #6421
  • Upgrade pt 23.03 by @ericharper :: PR: #6430
  • Update README to add core installation by @aklife97 :: PR: #6488
  • Not doing CastToFloat by default by @borisfom :: PR: #6524
  • Update manifest.py for speedup by @stevehuang52 :: PR: #6565
  • Update SDP docs by @erastorgueva-nv :: PR: #6485
  • Update core commit hash in readme by @aklife97 :: PR: #6622
  • Remove from jenkins by @ericharper :: PR: #6641
  • Remove dup by @ericharper :: PR: #6643

v1.17.0

1 year ago

Highlights

NeMo ASR

  • Online Clustering Diarizer
  • High Level Diarization API
  • PyCTC Decode Beam Search Support
  • RNNT Beam Search Alignment Extraction
  • InterCTC Loss
  • AIStore Documentation
  • ASR & AWS Multi-node Integration
  • Convolution Invariant SDR losses

NeMo TTS

NeMo Megatron

  • SquaredReLU, SwiGLU, No-Dropout
  • Rotary Position Embedding
  • Untie word embeddings and output projection

NeMo Core

  • Dynamic freezing of modules during training
  • NeMo Multi-Run Documentation
  • ClearML Logging
  • Early Stopping
  • Experiment Manager Docs Update

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:23.02

Detailed Changelogs

ASR

Changelog
  • Support Alignment Extraction for all RNNT Beam decoding methods by @titu1994 :: PR: #5925
  • Use module-based k2 import guard by @artbataev :: PR: #6006
  • Default RNNT loss to int64 targets by @titu1994 :: PR: #6011
  • Added documentation section for ASR datasets from AIStore by @anteju :: PR: #6008
  • Change perturb rng for reproducing results easily by @fayejf :: PR: #6042
  • InterCTC loss and stochastic depth implementation by @Kipok :: PR: #6013
  • Add pyctcdecode to high level beam search API by @titu1994 :: PR: #6026
  • Convert esperanto into a notebook by @SeanNaren :: PR: #6070
  • [ASR] Added a script for evaluating metrics for audio-to-audio by @anteju :: PR: #5971
  • [ASR] Convolution-invariant SDR loss + unit tests by @anteju :: PR: #5992
  • Adjust stochastic depth dropout probability calculation by @anteju :: PR: #6120
  • Add file class based inference API for diarization by @SeanNaren :: PR: #5945
  • Ngram by @karpnv :: PR: #6063
  • remove duplicate definition of manifest read and write func. by @XuesongYang :: PR: #6088
  • Streaming conformer CTC export by @messiaen :: PR: #5837
  • [TTS] Make mel spectrogram norm configurable by @rlangman :: PR: #6155
  • Ngram lm fusion for RNNT maes decoding by @andrusenkoau :: PR: #6118
  • ASR Beam search documentation by @titu1994 :: PR: #6244

TTS

Changelog
  • [TTS][ZH] added new NGC model cards with polyphone disambiguation. by @XuesongYang :: PR: #5940
  • [TTS] deprecate AudioToCharWithPriorAndPitchDataset. by @XuesongYang :: PR: #5959
  • [TTS][G2P] deprecate add_symbols by @XuesongYang :: PR: #5961
  • Added list_available_models by @treacker :: PR: #5967
  • Update Fastpitch energy bug by @blisc :: PR: #5969
  • removed WHATEVER(1) ˌhwʌˈtɛvɚ from scripts/tts_dataset_files/ipa_cmudict-0.7b_nv22.10.txt by @MikyasDesta :: PR: #5869
  • ONNX export for RadTTS by @borisfom :: PR: #5880
  • Add some info about FastPitch SSL model by @redoctopus :: PR: #5994
  • Vits doc by @treacker :: PR: #5989
  • Ragged batching changes for RadTTS, some refactoring by @borisfom :: PR: #6020
  • Working enabled ragged batching with ONNX by @borisfom :: PR: #6030
  • [TTS/TN/G2P] Remove Text Processing from NeMo, move G2P to TTS by @ekmb :: PR: #5982
  • [TTS] Add Spanish IPA dictionaries and heteronyms by @rlangman :: PR: #6037
  • [TTS] Separate TTS tokenization and g2p util to fix circular import by @rlangman :: PR: #6080
  • [TTS][refactor] Part 7 - move module from model file. by @XuesongYang :: PR: #6098
  • [TTS][refactor] Part 1 - nemo.collections.tts.data by @XuesongYang :: PR: #6099
  • [TTS][refactor] Part 2 - nemo.collections.tts.parts by @XuesongYang :: PR: #6105
  • [TTS][refactor] Part 6 - remove nemo.collections.tts.torch.README.md and tts_dataset.yaml by @XuesongYang :: PR: #6103
  • [TTS][refactor] Part 3 - nemo.collections.tts.g2p.models by @XuesongYang :: PR: #6113
  • [TTS] update German NGC models trained on Thorsten Datasets by @XuesongYang :: PR: #6125
  • [TTS] remove old waveglow model that relies on torch_stft. by @XuesongYang :: PR: #6128
  • [TTS] Move Spanish polyphones from heteronym to dictionary by @rlangman :: PR: #6123
  • [TTS][refactor] Part 8 - added model inference tests to safeguard changes. by @XuesongYang :: PR: #6129
  • remove duplicate definition of manifest read and write func. by @XuesongYang :: PR: #6088
  • [TTS][refactor] update tutorial import paths. by @XuesongYang :: PR: #6176
  • [TTS] Add univnet scheduler by @ArtyomZemlyak :: PR: #6157
  • [TTS] Make mel spectrogram norm configurable by @rlangman :: PR: #6155

NLP / NMT

Changelog
  • add new languages to doc by @yzhang123 :: PR: #5939
  • Distributed Adam optimizer overlaps param all-gather with forward compute by @timmoon10 :: PR: #5684
  • Refactor the retrieval services for microservice architecture by @yidong72 :: PR: #5910
  • make validation accuracy reporting optional for adapters/ptuning by @arendu :: PR: #5843
  • Add BERT support for overlapping forward compute with distopt communication by @timmoon10 :: PR: #6024
  • [TTS/TN/G2P] Remove Text Processing from NeMo, move G2P to TTS by @ekmb :: PR: #5982
  • adding early stop callback to ptuning by @arendu :: PR: #6028
  • Pr doc tn by @yzhang123 :: PR: #6041
  • Adds several configurable flags for Megatron GPT models by @MaximumEntropy :: PR: #5991
  • P-tuning refactor Part 1/N by @arendu :: PR: #6054
  • Fast glu activations by @MaximumEntropy :: PR: #6058
  • P-tuning refactor Part 2/N by @arendu :: PR: #6056
  • P-tuning refactor Part 3/N by @arendu :: PR: #6106
  • Explicitly check for united embeddings when logging params by @MaximumEntropy :: PR: #6085
  • Add flag to get attention from fusion by @ericharper :: PR: #6049
  • Improving text memmap generated index files error messages by @michalivne :: PR: #6093
  • Megatron Encoder-Decoder Sampler Function by @michalivne :: PR: #6095
  • Sentence piece legacy false compatibility by @arendu :: PR: #6154
  • convert Megatron LM ckpt to NeMo PP support. by @yidong72 :: PR: #6159
  • Avoid multiple warnings for loss mask by @mikolajblaz :: PR: #6062
  • Propagate LayerNorm1P to TE by @mikolajblaz :: PR: #6061
  • Filter p-tuning by example length by @arendu :: PR: #6182
  • Add sequence parallel support to Rope positional embedding by @yidong72 :: PR: #6178
  • Use a separate communicator for DP AMAX reduction by @erhoo82 :: PR: #6022
  • Add persistent workers to GPT by @ericharper :: PR: #6205
  • Micro batch loader for bert model by @shanmugamr1992 :: PR: #6046
  • GPT P tuning Eval changes (#5952) by @aklife97 :: PR: #6272
  • add template for taskname=taskname by @Zhilin123 :: PR: #6283
  • added RPE + fixed RMSNorm by @Davood-M :: PR: #6304
  • simplified notebook for p-tuning by @arendu :: PR: #6326
  • Added num decoder blocks in megatron export by @Davood-M :: PR: #6331

Text Normalization / Inverse Text Normalization

Changelog
  • [TTS/TN/G2P] Remove Text Processing from NeMo, move G2P to TTS by @ekmb :: PR: #5982

Export

Changelog
  • ONNX export for RadTTS by @borisfom :: PR: #5880
  • Working enabled ragged batching with ONNX by @borisfom :: PR: #6030
  • Update docs for ExpManager and Exportable frameworks by @titu1994 :: PR: #6165
  • Streaming conformer CTC export by @messiaen :: PR: #5837
  • MixedFusedRMSNorm Export Fix by @Davood-M :: PR: #6296
  • Added num decoder blocks in megatron export by @Davood-M :: PR: #6331

Bugfixes

Changelog
  • Fix bug where GPT always enabled distopt overlapped param sync by @timmoon10 :: PR: #5995
  • CS bugfix by @bmwshop :: PR: #6122
  • RNNT patch by @titu1994 :: PR: #6231
  • Notebook fixes by @titu1994 :: PR: #6212
  • Small fixes for flashlight decoder by @trias702 :: PR: #6071
  • Various fixes in docs and RNNT by @titu1994 :: PR: #6156
  • Fix k2 and torchaudio installation (Docker, macOS) by @artbataev :: PR: #6094
  • update and deprecate warning for Mic notebook by @fayejf :: PR: #6307
  • small bugfix and add asr evaluator to doc by @fayejf :: PR: #6229
  • Bug fixing for bucketing dataset by @VahidooX :: PR: #6191
  • Fix character beam decoding algorithm with vocab index map by @titu1994 :: PR: #6140
  • fix typo in asr evaluator readme by @fayejf :: PR: #6053
  • Fix typos by @titu1994 :: PR: #6241
  • [ASR]:fixed augmentor arguments for transcribe functionality of Hybrid CTC-RNNT model by @KunalDhawan :: PR: #6290
  • Fix hybrid transcribe by @ArtyomZemlyak :: PR: #6003
  • Fix bucketing seeding by @VahidooX :: PR: #6254
  • Fix for CTC decoder setup by @vsl9 :: PR: #6303
  • Fix RNNT Joint narrow() by @titu1994 :: PR: #6336
  • Fix bugs with interctc mixin by @Kipok :: PR: #6228
  • Update IPA dict path in tutorial by @redoctopus :: PR: #6208
  • [TTS] fix broken tutorial for Tacotron2 by @XuesongYang :: PR: #6199
  • [TTS] fix bugs for chinese and german tutorials. by @XuesongYang :: PR: #6216
  • Fix radtts sort r17 by @borisfom :: PR: #6344
  • Quick Fix for RadTTS test by @blisc :: PR: #6034
  • Disabling radtts tests until we have real model by @borisfom :: PR: #6036
  • fix val loss computation in megatron by @anmolgupt :: PR: #5871
  • Fix incomplete batches by @mikolajblaz :: PR: #6083
  • Avoid unnecessarily accessing data loader with pipeline parallelism by @timmoon10 :: PR: #6164
  • bugfix: file handlers are not closed. by @XuesongYang :: PR: #5956
  • Fix Silence Sampling Algorithm for ASR Multi-speaker Data Simulator by @stevehuang52 :: PR: #5897
  • Fix Windows bug with save_restore_connector by @trias702 :: PR: #5919
  • fix broken link by @ericharper :: PR: #5968
  • Fix torchaudio installation by @artbataev :: PR: #5850
  • Fix reinstall.sh dependencies by @titu1994 :: PR: #6027
  • Adding changes to fix the mv error by @tango4j :: PR: #6087
  • Fix README by @flx42 :: PR: #6137
  • Fix typos in voiceapp notebook by @titu1994 :: PR: #6262
  • [BugFix] Fix diarization result path errors in tutorial notebook for r1.17.0 by @tango4j :: PR: #6234
  • [BugFix] Fix the wrong branch name in speaker diarization inference notebook by @tango4j :: PR: #6301

General Improvements

Changelog
  • Dynamic freezing in Nemo by @trias702 :: PR: #5879
  • Move settings to . Remove deprecated by @artbataev :: PR: #5947
  • update container info in readme by @fayejf :: PR: #5981
  • Update PUBLICATIONS.md by @titu1994 :: PR: #5963
  • [G2P] backward compatibility for english tokenizer and bugfix by @github-actions[bot] :: PR: #5984
  • replace symbols by @github-actions[bot] :: PR: #5990
  • correct bash style according to SC2236. by @XuesongYang :: PR: #6025
  • Update align.py by @github-actions[bot] :: PR: #6045
  • Add Customization Dataset Preparation Tool by @Zhilin123 :: PR: #6029
  • Updated data simulator config part in Speaker_Diarization_Training.ipynb by @tango4j :: PR: #6072
  • Add citation by @ericharper :: PR: #6077
  • [TTS] Spectrogram Enhancer: correct dim for length when loading data by @github-actions[bot] :: PR: #6074
  • Add ClearML Logging by @ArtyomZemlyak :: PR: #6014
  • update readme with new badges by @XuesongYang :: PR: #6110
  • [CI] Set readthedocs python version to 3.8 by @SeanNaren :: PR: #6079
  • Update dataset preparation tool to fix bug relating to non jsonl input file by @Zhilin123 :: PR: #6147
  • update finetune configs by @nithinraok :: PR: #6152
  • Added ckpt to nemo for T5/T0 models by @Davood-M :: PR: #6141
  • Save model parallel .nemo in ExpManager by @arendu :: PR: #6115
  • Upgrade setuptools by @fayejf :: PR: #6163
  • Update container version in main readme by @fayejf :: PR: #6171
  • metric update by @arendu :: PR: #6169
  • Upgrade base container to PyTorch 23.02 by @ericharper :: PR: #6162
  • Link to nm launcher by @ericharper :: PR: #6226
  • Make AIS CLI installation optional by @anteju :: PR: #6314
  • remove pinned numba version in Dockerfile by @fayejf :: PR: #6341
  • Cherry-pick recent distopt commits by @timmoon10 :: PR: #6343
  • Update readme by @ericharper :: PR: #6363

v1.16.0

1 year ago

Highlights

NeMo ASR

  • ASR Evaluator
  • Multi-channel dereverberation algorithm
  • Hybrid ASR-TTS Models
  • Flashlight Decoder Beam Search
  • FastConformer Encoder with 8x subsampling

NeMo TTS

  • SSL Voice Conversion
  • Spectrogram Enhancer
  • VITS

NeMo Megatron

  • Per microbatch dataloader for GPT and BERT
  • Adapters compatible with FasterTransformer

NeMo Core

  • Nested model support

NeMo Tools

  • NeMo Forced Aligner

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:23.01

ASR

Changelog
  • Fix for incorrect computation of batched alignment in transducers by @Kipok :: PR: #5692
  • Set the stream position to 0 for pydub by @jonghwanhyeon :: PR: #5752
  • [Fix] ConformerEncoder forward when length is None by @anteju :: PR: #5761
  • ASR evaluator by @fayejf :: PR: #5728
  • [ASR][Test] Enable test for cache audio with a single worker by @anteju :: PR: #5763
  • Flashlight Decoder for Nemo by @trias702 :: PR: #5790
  • Fix data simulator by @stevehuang52 :: PR: #5813
  • [ASR] Mask-based dereverb algorithm by @anteju :: PR: #5693
  • Concat dataset and aistore support for label models by @Kipok :: PR: #5826
  • Adding new features and speed up for multi-speaker data simulator by @tango4j :: PR: #5846
  • Add Esperanto ASR example by @andrusenkoau :: PR: #5772
  • Fix memory allocation of NeMo Multi-speaker Data Simulator by @stevehuang52 :: PR: #5864
  • [ASR] Separate Audio-to-Text (BPE, Char) dataset construction by @artbataev :: PR: #5774
  • Reduce memory usage in getMultiScaleCosAffinityMatrix function by @gabitza-tech :: PR: #5876
  • Hybrid ASR-TTS models by @artbataev :: PR: #5659
  • Set providers for onnxruntime inference session by @athitten :: PR: #5903
  • [ASR] Configurable metrics for audio-to-audio + removed experimental decorators by @anteju :: PR: #5827
  • Correct doc for RNNT transcribe() function by @titu1994 :: PR: #5904
  • Update isort to the latest version by @artbataev :: PR: #5895
  • FilterbankFeaturesTA to match FilterbankFeatures by @msis :: PR: #5913
  • Fix hybridasr bug by @VahidooX :: PR: #5950
  • replace symbols by @nithinraok :: PR: #5974
  • fast conformer configs and doc by @bmwshop :: PR: #5970
  • Update TitaNet-L and MSDD models by @nithinraok :: PR: #6023
  • Fix enhancer usage by @artbataev :: PR: #6059
  • update librosa args by @nithinraok :: PR: #6086
  • Fix enhancer usage in ASR-TTS examples by @artbataev :: PR: #6116
  • Fix k2 and torchaudio installation (Docker, macOS). Cherry-pick (#6094) by @artbataev :: PR: #6124

TTS

Changelog
  • [TTS] Update Spanish TTS model to 1.15 by @rlangman :: PR: #5742
  • [TTS][DE] refine grapheme-based tokenizer and fastpitch training recipe on thorsten's neutral datasets. by @XuesongYang :: PR: #5753
  • No-script TS export, prepared for ONNX export by @borisfom :: PR: #5653
  • Fixing masking in RadTTS bottleneck layer by @borisfom :: PR: #5771
  • Port Riva's mel cepstral distortion w/ dynamic time warping notebook by @redoctopus :: PR: #5778
  • Update radtts' infer path by @blisc :: PR: #5788
  • [TTS][DE] Augment tokenization/G2P to preserve capitalization of words and mix phonemes with word-level graphemes for an input text. by @XuesongYang :: PR: #5805
  • [TTS] porting VITS implementation by @treacker :: PR: #5600
  • [TTS][DE] updated IPA dictionary and heteronyms by @XuesongYang :: PR: #5860
  • [TTS] GAN-based spectrogram enhancer by @racoiaws :: PR: #5565
  • TTS inference with Heteronym classification model, hc model inference refactoring by @ekmb :: PR: #5768
  • Remove MCD_DTW tarball by @redoctopus :: PR: #5889
  • Hybrid ASR-TTS models by @artbataev :: PR: #5659
  • Moved eval notebook data to aws by @redoctopus :: PR: #5911
  • [G2P] fixed typos and broken import library. by @XuesongYang :: PR: #5978
  • [G2P] backward compatibility for english tokenizer and bugfix by @XuesongYang :: PR: #5980
  • fix links, add missing file by @ekmb :: PR: #6044
  • [TTS] Spectrogram Enhancer: correct dim for length when loading data by @racoiaws :: PR: #6048
  • [TTS] bugfix for fastpitch German tutorial by @XuesongYang :: PR: #6051
  • [TTS] bugfix Chinese Fastpitch tutorial by @XuesongYang :: PR: #6055
  • Fix enhancer usage by @artbataev :: PR: #6059
  • [TTS] Spectrogram Enhancer: support arbitrary input length by @racoiaws :: PR: #6060
  • Fix enhancer usage in ASR-TTS examples by @artbataev :: PR: #6116
  • [TTS] Spectrogram Enhancer: add option to zero out the initial tensor by @racoiaws :: PR: #6136

NLP / NMT

Changelog
  • Fix P-Tuning Truncation by @vadam5 :: PR: #5663
  • Adithyare/prompt learning seed by @arendu :: PR: #5749
  • Add extra data args to support proper finetuning of HF converted T5 checkpoints by @MaximumEntropy :: PR: #5719
  • Don't add output directory twice when creating shared sentencepiece tokenizer by @pks :: PR: #5737
  • add constraint info on batch size for tar dataset by @yzhang123 :: PR: #5812
  • remove transformer version upper bound by @Zhilin123 :: PR: #5831
  • Adithyare/adapter new placement by @arendu :: PR: #5791
  • Add SSL import functionality for Audio Lexical PNC Models by @trias702 :: PR: #5834
  • validation batch sizing and drop_last controls by @arendu :: PR: #5830
  • Remove ending newlines when encoding strings w/ sentencepiece tokenizer by @pks :: PR: #5739
  • Fix segmenting for pcla inference by @jubick1337 :: PR: #5849
  • RETRO model finetuning by @yidong72 :: PR: #5800
  • Optimizing distributed Adam when running with one work queue by @timmoon10 :: PR: #5560
  • Add option to disable distributed parameters in distributed Adam optimizer by @timmoon10 :: PR: #5685
  • set max_steps for lr decay through config by @anmolgupt :: PR: #5780
  • Fix Prompt text space issue by @aklife97 :: PR: #5983
  • Add batch_size to prompt_learning generate by @aklife97 :: PR: #6091

NeMo Tools

Changelog
  • [Tools] NeMo Forced Aligner by @erastorgueva-nv :: PR: #5571
  • [Tools] Fix ctc segmentation: exclude audacity files by @ekmb :: PR: #6009

Export

Changelog
  • No-script TS export, prepared for ONNX export by @borisfom :: PR: #5653
  • Set providers for onnxruntime inference session by @athitten :: PR: #5903
  • Add segmentation export to Audacity label file by @Ca-ressemble-a-du-fake :: PR: #5857

General Improvements

Changelog
  • Pin lightning version less than 1.9.0 by @SeanNaren :: PR: #5822
  • Davidm/cherrypick r1.16.0 by @Davood-M :: PR: #6082
  • Update files for lightning 1.9.0 by @SeanNaren :: PR: #5823
  • Tn doc 16 by @yzhang123 :: PR: #5954
  • Ensure EMA checkpoints are also deleted when normal checkpoints are by @SeanNaren :: PR: #5724
  • [Fix] ConformerEncoder forward when length is None by @anteju :: PR: #5761
  • Fix EMA topk checkpoint deletion by @SeanNaren :: PR: #5758
  • [BugFix] decoder timestamp count has a mismatch when is decoded by @tango4j :: PR: #5825
  • Update 00_NeMo_Primer.ipynb by @schaltung :: PR: #5740
  • Sanitize params before DLLogger log_hyperparams by @milesial :: PR: #5736
  • NeMo Forced Aligner by @erastorgueva-nv :: PR: #5571
  • Add EMA Docs, fix common collection documentation by @SeanNaren :: PR: #5757
  • Add container info to main page by @fayejf :: PR: #5816
  • CommonVoice support for script by @SeanNaren :: PR: #5797
  • Support nested NeMo models by @artbataev :: PR: #5671
  • fix max len generation t5 by @ekmb :: PR: #5852
  • NFA samples fix by @erastorgueva-nv :: PR: #5856
  • fix(readme): fix typo by @jqueguiner :: PR: #5883
  • Block large files from being merged into NeMo main by @SeanNaren :: PR: #5898
  • Pin isort version by @artbataev :: PR: #5914
  • fixed missing long_description_content_type by @XuesongYang :: PR: #5909
  • Update container to 23.01 by @ericharper :: PR: #5917
  • remove conda pynini install by @ekmb :: PR: #5921
  • Update align.py by @Slyne :: PR: #6043
  • Fixing data simulator argument and bash scripting error by @tango4j :: PR: #6112
  • Update apex commit by @ericharper :: PR: #6148