NeMo Versions

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

v1.7.2

2 years ago

GPT Bugfixes

  • GPT dataloader improvements and fixes by @crcrpar :: PRs: #3826, #3665
  • Disable nvfuser by @ericharper :: PR: #3845
  • Set find_unused_parameters to False by @ericharper :: PR: #3837

T5 XNLI Example

  • T5 xnli eval by @yaoyu-33 :: PR: #3848

v1.7.1

2 years ago

Known Issues

  • find_unused_parameters should be False when training GPT: #3837

Bugfixes

  • revert changes by @yzhang123 :: PR: #3785
  • Fixed soft prompt eval loading bug by @vadam5 :: PR: #3805
  • mT5 whole word masking and T5 finetuning config fixes by @MaximumEntropy :: PR: #3776
  • Raise error if FP16 training is tried with O2 recipe. by @ericharper :: PR: #3806

v1.7.0

2 years ago

Known Issues

  • Megatron GPT training with O2 and FP16 is bugged. FP16 with O1 still works.
  • find_unused_parameters should be False when training GPT: #3837
  • FastPitch training may result in stalled GPUs. Users will have to manually kill their runs and continue training from the latest checkpoint.
  • mT5 issue with whole word masking, see #3776
  • T5 finetuning config issue, see #3776
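
The first two known issues above amount to a pre-flight check on the training configuration. A minimal sketch of such a check follows. The function name and the flat dictionary keys (`precision`, `amp_level`, `find_unused_parameters`) are hypothetical stand-ins chosen for illustration, not NeMo's actual config schema.

```python
from typing import Dict, List


def check_gpt_known_issues(cfg: Dict[str, object]) -> List[str]:
    """Return warnings for settings the 1.7.0 known issues advise against.

    The keys read from cfg are hypothetical, used only for illustration.
    """
    warnings = []
    # Known issue: Megatron GPT training with O2 and FP16 is bugged.
    if cfg.get("amp_level") == "O2" and cfg.get("precision") == 16:
        warnings.append("O2 with FP16 is bugged; FP16 with O1 still works")
    # Known issue: find_unused_parameters should be False when training GPT.
    if cfg.get("find_unused_parameters", False):
        warnings.append("set find_unused_parameters to False for GPT training")
    return warnings


for message in check_gpt_known_issues(
    {"precision": 16, "amp_level": "O2", "find_unused_parameters": True}
):
    print(message)
```

A check like this would run before the trainer is constructed, so a bad combination is reported up front rather than surfacing mid-run.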

Container

NOTE: From NeMo 1.7.0 onwards, NeMo containers will follow the YY.MM convention for naming, where the YY.MM value is based on the base container. For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:22.01
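
As a rough illustration of the YY.MM naming in the note above, the sketch below splits a container reference into its image name, year, and month. The helper `parse_nemo_tag` is hypothetical and not part of NeMo or NGC tooling. Reading the two-digit year as 20YY is an assumption.

```python
import re
from typing import Tuple

# Hypothetical helper (not part of NeMo): matches references such as
# "nvcr.io/nvidia/nemo:22.01", where the tag is a two-digit year and month.
_TAG_PATTERN = re.compile(r"^(?P<image>[^:]+):(?P<yy>\d{2})\.(?P<mm>\d{2})$")


def parse_nemo_tag(reference: str) -> Tuple[str, int, int]:
    """Return (image, year, month) for a YY.MM-style container reference."""
    match = _TAG_PATTERN.match(reference)
    if match is None:
        raise ValueError(f"not a YY.MM-style reference: {reference!r}")
    month = int(match["mm"])
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range in {reference!r}")
    # Assumption: two-digit years are interpreted as 20YY.
    return match["image"], 2000 + int(match["yy"]), month


print(parse_nemo_tag("nvcr.io/nvidia/nemo:22.01"))
```

A version-style tag such as `1.7.0` does not match the pattern and raises `ValueError`, which makes the difference between the two naming schemes explicit.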

ASR

  • Wav2vec by @tbartley94 :: PR: #3297
  • Fix bug in multi-checkpoint loading by @sam1373 :: PR: #3536
  • Add HuggingFace Datasets to NeMo ASR Dataset script by @titu1994 :: PR: #3513
  • Add support for Gradient Clipping (clamp) in RNNT Numba loss by @titu1994 :: PR: #3550
  • Enable Tarred Dataset Support for NVIDIA DALI by @titu1994 :: PR: #3485
  • Add initial support for Buffered RNNT Scripts by @titu1994 :: PR: #3602
  • Significantly speed up RNNT loss on CUDA by @titu1994 :: PR: #3653
  • Fixing the bug in the stateful rnnt decoder. by @VahidooX :: PR: #3673
  • Add Buffered RNNT with LCS Merge algorithm by @titu1994 :: PR: #3669
  • Asr noise data scripts by @jbalam-nv :: PR: #3660
  • ASR SSL update by @sam1373 :: PR: #3746
  • Add randomized bucketing by @VahidooX :: PR: #3445
  • Self-supervised tutorial & update by @sam1373 :: PR: #3344
  • Updated conformer models. by @VahidooX :: PR: #3741
  • Added speaker identification script with cosine and neural classifier… by @nithinraok :: PR: #3672
  • Fix in clustering diarizer by @nithinraok :: PR: #3701
  • Add a function that writes cluster label in diarization pipeline by @tango4j :: PR: #3643

TTS

  • port UnivNet to NeMo TTS collection by @L0SG :: PR: #3186
  • E2E TTS fixes by @redoctopus :: PR: #3508
  • New structure for TTS datasets in scripts/dataset_processing, VocoderDataset, update TTSDataset by @Oktai15 :: PR: #3484
  • Deprecate some TTS models and TTS datasets by @Oktai15 :: PR: #3576
  • Fix bugs in HiFi-GAN (scheduler, optimizers) and add input_example() in Mixer-TTS/Mixer-TTS-X by @Oktai15 :: PR: #3564
  • Update UnivNet, HiFi-GAN and WaveGlow, small fixes in Mixer-TTS, FastPitch and Exportable by @Oktai15 :: PR: #3585
  • Fix typo in FastPitch config (pitch_avg -> pitch_mean) by @eyentei :: PR: #3593
  • Fix incorrect usage of TTSDataset in some files and fix one-line bug in NVIDIA's CMUDict by @Oktai15 :: PR: #3594
  • Convert entry from UTF-16 to UTF-8 by @redoctopus :: PR: #3597
  • remove CheckInstall by @blisc :: PR: #3577
  • Fix UnivNet LibriTTS pretrained location by @m-toman :: PR: #3615
  • FastPitch training tutorial by @subhankar-ghosh :: PR: #3631
  • Update Aligner, add new methods to AlignmentEncoder by @Oktai15 :: PR: #3641
  • Add Mixed Representation Training by @blisc :: PR: #3473
  • Add speakerID to libritts/get_data.py by @subhankar-ghosh :: PR: #3662
  • Update TTS tutorials, Simplification of testing Mixer-TTS and FastPitch by @Oktai15 :: PR: #3680
  • Clean FastPitch_Finetuning.ipynb notebook by @Oktai15 :: PR: #3698
  • Add cache_size to BetaBinomialInterpolator, fix bugs in TTS tutorials and FastPitch by @Oktai15 :: PR: #3706
  • Fix bugs in VocoderDataset and TTSDataset by @Oktai15 :: PR: #3713
  • Fix bugs in E2E TTS, Mixer-TTS and FastPitch by @Oktai15 :: PR: #3740

NLP / NMT

  • NLPDDPPlugin find_unused_parameters is configurable by @mlgill :: PR: #3478
  • Megatron encoder-decoder refactor by @michalivne :: PR: #3542
  • Finetuning NeMo Megatron T5 Models on GLUE by @MaximumEntropy :: PR: #3408
  • Pipeline parallelism for GPT by @ericharper :: PR: #3388
  • Generalized the P-tuning method to support various NLP tasks by @yidong72 :: PR: #3623
  • Megatron_LM checkpoint to NeMo checkpoint support by @yidong72 :: PR: #3692
  • Bugfix for GPT eval by @ericharper :: PR: #3744
  • Yuya/megatron t5 glue eval by @yaoyu-33 :: PR: #3751
  • Enforce legacy tokenizer for sentencepiece to add special tokens for T5 by @MaximumEntropy :: PR: #3457
  • Added P-Tuning method by @yidong72 :: PR: #3488
  • O2 style mixed precision training for T5 by @MaximumEntropy :: PR: #3664
  • LM adapted T5 dataset by @MaximumEntropy :: PR: #3654
  • Fix consumed samples calculation + PTune Model bugs by @yidong72 :: PR: #3738
  • Add pipeline support to eval methods by @ericharper :: PR: #3684
  • XNli benchmark by @yidong72 :: PR: #3693
  • Refactor dialogue state tracking for modelling/dataset interoperability by @Zhilin123 :: PR: #3526
  • Changes to support mean n-gram size masking for T5 by @MaximumEntropy :: PR: #3646
  • Dialogue state tracking refactor by @Zhilin123 :: PR: #3667
  • Parallel prompt tuning by @vadam5 :: PR: #3670
  • GEGLU activation for T5 by @MaximumEntropy :: PR: #3694

Text Normalization / Inverse Text Normalization

  • Text normalization takes too much time for a string which contains a lot of dates by @PeganovAnton :: PR: #3451
  • ITN bug fixes, ip address, card num support, whitelist clean up by @ekmb :: PR: #3574
  • Fix tn bugs by @yzhang123 :: PR: #3580
  • add serial number to itn by @yzhang123 :: PR: #3584
  • ITN: SH bug fixes for telephone by @ekmb :: PR: #3592
  • Tn bug 1.7.0 by @yzhang123 :: PR: #3730
  • TN docs update by @ekmb :: PR: #3735

Export

  • Update UnivNet, HiFi-GAN and WaveGlow, small fixes in Mixer-TTS, FastPitch and Exportable by @Oktai15 :: PR: #3585
  • Conformer onnx fix by @borisfom :: PR: #3524
  • Add onnx support for speaker models by @nithinraok :: PR: #3650
  • Jasper mask/export fix by @borisfom :: PR: #3691

Bugfixes

  • Text normalization takes too much time for a string which contains a lot of dates by @PeganovAnton :: PR: #3451
  • Dialogue state tracking refactor/ SGDGEN patch 2 by @Zhilin123 :: PR: #3674
  • lower bound PTL to 1.5.10 and remove last ckpt patch fix by @nithinraok :: PR: #3690

Improvements

  • Wfst tutorial by @tbartley94 :: PR: #3479
  • Update CMUdict with ADLR version pronunciations by @redoctopus :: PR: #3446
  • Fix docs by @yzhang123 :: PR: #3523
  • Add docstring to UnivNetModel by @L0SG :: PR: #3529
  • Increase lower bound due to security vulnerability by @ericharper :: PR: #3537
  • Add Change Log builder to NeMo by @titu1994 :: PR: #3527
  • Bugfix, need to freeze the model by @yidong72 :: PR: #3540
  • Bucketing quick fix by @tbartley94 :: PR: #3543
  • More fixes to SentencePiece for T5 by @MaximumEntropy :: PR: #3515
  • Update CONTRIBUTING.md by @Oktai15 :: PR: #3569
  • Update pr template and re-add Changelog builder by @titu1994 :: PR: #3575
  • Apex quick fix by @ekmb :: PR: #3591
  • Upgrade to 22.01 container by @ericharper :: PR: #3571
  • Fix typo and update minimal version of scipy by @Oktai15 :: PR: #3604
  • Add env variable to force transformers to run offline during CI by @ericharper :: PR: #3607
  • Correctly install NeMo wheel by @titu1994 :: PR: #3599
  • Fix wheel build by @titu1994 :: PR: #3610
  • Fixed EH and error reporting in restore_from by @borisfom :: PR: #3583
  • Clarifying documentation by @itzsimpl :: PR: #3616
  • Improve docs for finetuning by @titu1994 :: PR: #3622
  • Add NeMo version to all new .nemo files by @titu1994 :: PR: #3605
  • Update numba if NVIDIA_PYTORCH_VERSION not correct by @itzsimpl :: PR: #3614
  • Remove @experimental decorator in diarization related files. by @tango4j :: PR: #3625
  • Remove compression from .nemo files by @okuchaiev :: PR: #3626
  • Update adobe analytics by @ericharper :: PR: #3645
  • Add ssl tutorial to tutorial docs page by @sam1373 :: PR: #3649
  • Fix number of channels>1 issue by @ekmb :: PR: #3652
  • Fixed the bug in bucketing. by @VahidooX :: PR: #3663
  • Adding guard by @yzhang123 :: PR: #3655
  • Add tutorial paths by @titu1994 :: PR: #3651
  • Folder name update by @ekmb :: PR: #3671
  • Test HF online for SGD-GEN only by @MaximumEntropy :: PR: #3681
  • Update Librosa support to 0.9 by @titu1994 :: PR: #3682
  • Comment out numba in 22.01 release by @titu1994 :: PR: #3685
  • Fix failing tests inside of the 22.01 container in PR 3571 by @fayejf :: PR: #3609
  • Fixed Apex guard when imported classes are used for default values by @michalivne :: PR: #3700
  • Update citrinet_512.yaml by @Jorjeous :: PR: #3642
  • update torchaudio in Dockerfile to match torch version by @GNroy :: PR: #3637
  • Enforce import tests on the three domains by @titu1994 :: PR: #3702
  • Audio based norm speed up by @ekmb :: PR: #3703
  • Fix device on notebook by @titu1994 :: PR: #3732
  • pynini pip by @yzhang123 :: PR: #3729
  • Removed fp16 converting in complete method by @dimapihtar :: PR: #3709
  • Mirror AN4 while CMU servers are down by @titu1994 :: PR: #3743
  • Fix SSL configs for 1.7 by @sam1373 :: PR: #3748
  • Punct process bug fix by @ekmb :: PR: #3747
  • Specify gpus in SSL notebook by @sam1373 :: PR: #3753
  • Duplex model inference fix, money encoder fix by @ekmb :: PR: #3754
  • Update decoding strategy docs and override general value for tutorials by @titu1994 :: PR: #3755
  • Fix directories in ssl notebook by @sam1373 :: PR: #3758
  • Update Tacotron2_Training.ipynb by @blisc :: PR: #3769
  • Fix dockerfile by @yzhang123 :: PR: #3778
  • Prompt-Tuning-Documentation by @vadam5 :: PR: #3777
  • Prompt tuning bug fix by @vadam5 :: PR: #3780

v1.6.2

2 years ago

Bug fix

  • Changed Apex not found error to warning to enable NLP models which aren't apex dependent when Apex isn't installed.

v1.6.1

2 years ago

Bug Fixes

  • Fix embedding name for verifying speakers #3578
  • Add rank check and barrier helpers compilation for megatron dataset #3581
  • Add apex import guards #3579

v1.6.0

2 years ago

ASR

  • Add new features to ASR with diarization with modified tutorial and README. by @tango4j :: PR: #3007
  • Enable stateful decoding of RNNT over multiple transcribe calls by @titu1994 :: PR: #3037
  • Move vocabs from asr to common by @Oktai15 :: PR: #3084
  • Adding parallel transcribe for ASR models - supports multi-gpu/multi-node by @VahidooX :: PR: #3017
  • CTC Conformer fixes for ONNX/TS export by @borisfom :: PR: #3072
  • Adding pretrained French ASR models to ctc_bpe and rnnt_bpe listings by @tbartley94 :: PR: #3225
  • adding german conformer ctc and rnnt by @yzhang123 :: PR: #3242
  • Add aishell and fisher dataset processing scripts for ASR by @jbalam-nv :: PR: #3203
  • Better default for RNNT greedy decoding by @titu1994 :: PR: #3332
  • Add uniform ASR evaluation script for all models by @titu1994 :: PR: #3334
  • CTC Segmentation-Citrinet support by @ekmb :: PR: #3279
  • Updates on ASR with diarization util files by @tango4j :: PR: #3359
  • Asr fr by @tbartley94 :: PR: #3404
  • Refactor ASR Examples Directory by @titu1994 :: PR: #3392
  • Asr patches by @titu1994 :: PR: #3443
  • Properly support -1 for labels in ctc char models by @titu1994 :: PR: #3487

TTS

  • MixerTTS, MixerTTSDataset and small updates in tts tokenizers by @Oktai15 :: PR: #2859
  • ONNX and TorchScript support for Mixer-TTS by @Oktai15 :: PR: #3082
  • Update name of files to one style in TTS folder by @Oktai15 :: PR: #3189
  • Update TTS Dataset, FastPitch with TTS dataset and small improvements in HiFiGAN by @Oktai15 :: PR: #3205
  • Add Beta-binomial Interpolator to TTSDataset by @Oktai15 :: PR: #3230
  • Normalizer to TTS models, TTS tokenizer updates, AxisKind updates by @Oktai15 :: PR: #3271
  • Update Mixer-TTS, FastPitch and TTSDataset by @Oktai15 :: PR: #3366
  • Minor Updates to TTS Finetuning by @blisc :: PR: #3455

NLP / NMT

  • NMT timing and tokenizer stats utils by @michalivne :: PR: #3004
  • Add offsets calculation to MegatronGPTModel.complete method by @dimapihtar :: PR: #3117
  • NMT checkpoint averaging by @michalivne :: PR: #3096
  • NMT validation examples with inputs by @michalivne :: PR: #3194
  • Improve data pipeline for punctuation capitalization model and make other useful changes by @PeganovAnton :: PR: #3159
  • Reduce test time of punctuation and capitalization model by @PeganovAnton :: PR: #3286
  • NLP text augmentation by @michalivne :: PR: #3291
  • Adding Megatron NeMo Bert support by @yidong72 :: PR: #3303
  • Added Script to convert Megatron LM to .nemo file by @yidong72 :: PR: #3371
  • Support Changing Number of Tensor Parallel Partitions for Megatron by @aklife97 :: PR: #3365
  • Megatron AMP fix for scheduler step counter by @titu1994 :: PR: #3293
  • T5 Pre-training in NeMo using Megatron by @MaximumEntropy :: PR: #3036
  • NMT MIM mean variance fix by @michalivne :: PR: #3385
  • NMT Shared Embeddings Weights by @michalivne :: PR: #3340
  • Make saving .nemo during on_train_end configurable by @ericharper :: PR: #3427
  • Byte-level Multilingual NMT by @aklife97 :: PR: #3368
  • BioMegatron token classification tutorial fix to be compatible with current Megatron BERT by @yidong72 :: PR: #3435
  • NMT documentation for bottleneck architecture by @michalivne :: PR: #3464
  • (1) O2-style mixed precision recipe, (2) Persistent layer-norm, (3) Grad scale hysteresis, (4) gradient_as_bucket_view by @erhoo82 :: PR: #3259

Text Normalization / Inverse Text Normalization

  • Tn clean upsample by @yzhang123 :: PR: #3024
  • Tn add nn wfst and doc by @yzhang123 :: PR: #3135
  • Update english tn ckpt by @yzhang123 :: PR: #3143
  • WFST_tutorial for ITN development by @tbartley94 :: PR: #3128
  • German TN wfst by @yzhang123 :: PR: #3174
  • Add ITN Vietnamese by @binh234 :: PR: #3217
  • WFST TN updates by @ekmb :: PR: #3235
  • Itn german refactor by @yzhang123 :: PR: #3262
  • Tn german deterministic by @yzhang123 :: PR: #3308
  • TN updates by @ekmb :: PR: #3285
  • Added double digits to EN ITN by @yzhang123 :: PR: #3321
  • TN_non_deterministic optimized by @ekmb :: PR: #3343
  • Missing init for TN German by @ekmb :: PR: #3355
  • Ru TN by @ekmb :: PR: #3390
  • Update ContextNet models trained on more datasets by @titu1994 :: PR: #3440

NeMo Tools

  • CTC Segmentation-Citrinet support by @ekmb :: PR: #3279
  • Updated NumPy SDE requirement by @vsl9 :: PR: #3442

Export

  • ONNX and TorchScript support for Mixer-TTS by @Oktai15 :: PR: #3082
  • CTC Conformer fixes for ONNX/TS export by @borisfom :: PR: #3072

Documentation

  • Merge r1.5.0 bugfixes and doc updates to main by @ericharper :: PR: #3133
  • Tn add nn wfst and doc by @yzhang123 :: PR: #3135
  • Add apex into by @PeganovAnton :: PR: #3214
  • Final merge r1.5.0 bugfixes and doc updates to main by @ericharper :: PR: #3232
  • Nemo container docker building instruction - merge to main by @fayejf :: PR: #3236
  • Doc link fixes by @nithinraok :: PR: #3264
  • French ASR Doc updates by @tbartley94 :: PR: #3322
  • german asr doc page update by @yzhang123 :: PR: #3325
  • update docs and replace speakernet with titanet in tutorials by @nithinraok :: PR: #3405
  • Asr fr by @tbartley94 :: PR: #3404
  • Update copyright to 2022 by @ericharper :: PR: #3426
  • Update Speech Classification - VAD doc by @fayejf :: PR: #3430
  • Update speaker diarization docs by @tango4j :: PR: #3419
  • NMT documentation for bottleneck architecture by @michalivne :: PR: #3464
  • Add verification helper function and update docs by @nithinraok :: PR: #3514
  • Prompt tuning documentation by @vadam5 :: PR: #3541

Bugfixes

  • Fixed wrong tgt_length for timing by @michalivne :: PR: #3050
  • Update nltk version with a CVE fix by @thomasdhc :: PR: #3054
  • Fix README by @ericharper :: PR: #3070
  • Transformer Decoder: Fix swapped input name issue by @aklife97 :: PR: #3066
  • Fixes bugs in collect_tokenizer_dataset_stats.py by @michalivne :: PR: #3060
  • Attribute is not working in . by @PeganovAnton :: PR: #3099
  • Merge r1.5.0 bugfixes and doc updates to main by @ericharper :: PR: #3133
  • A quick fix for issue #3094 index out-of-bound when truncating long text to max_seq_length by @bugface :: PR: #3131
  • Fixed two typos by @bene-ges :: PR: #3157
  • Merge r1.5.0 bugfixes to main by @ericharper :: PR: #3173
  • LJSpeech alignment scripts fixed for latest MFA by @m-toman :: PR: #3177
  • Add apex into by @PeganovAnton :: PR: #3214
  • Patch omegaconf for cfg by @fayejf :: PR: #3224
  • Final merge r1.5.0 bugfixes and doc updates to main by @ericharper :: PR: #3232
  • CTC Conformer fixes for ONNX/TS export by @borisfom :: PR: #3072
  • Fix Masked SE for Citrinets + export Limited Context Citrinet by @titu1994 :: PR: #3216
  • Fix text length type in TTSDataset for beta_binomial_interpolator by @Oktai15 :: PR: #3233
  • Fix cast type in _se_pool_step_script related functions by @Oktai15 :: PR: #3239
  • Doc link fixes by @nithinraok :: PR: #3264
  • Escape chars fix by @ekmb :: PR: #3253
  • Fix asr output - eval mode by @nithinraok :: PR: #3274
  • Remove ArrayLike because it is not supported in numpy 1.18 by @PeganovAnton :: PR: #3282
  • Fix megatron_gpt_ckpt_to_nemo.py with torch distributed by @yaoyu-33 :: PR: #3278
  • Reduce test time of punctuation and capitalization model by @PeganovAnton :: PR: #3286
  • Tn en money fix by @yzhang123 :: PR: #3290
  • Fixing the bucketing_batch_size bug. by @VahidooX :: PR: #3294
  • Adaptiv fixed positional embeddings by @michalivne :: PR: #3263
  • Fix specaugment time start for numba kernel by @titu1994 :: PR: #3299
  • Fix for Stalled ASR training/eval on Pytorch 1.10+ (multigpu/multinode) by @titu1994 :: PR: #3304
  • Fix bucketing list bug. by @VahidooX :: PR: #3315
  • Fix MixerTTS types and dimensions by @Oktai15 :: PR: #3330
  • Fix German and Vietnamese grammar by @yzhang123 :: PR: #3331
  • Fix readme to show cmd by @yzhang123 :: PR: #3345
  • Fix speaker label models training convergence by @nithinraok :: PR: #3354
  • Tqdm get datasets by @bmwshop :: PR: #3358
  • Fixed future masking in cross attention of Perceiver by @michalivne :: PR: #3314
  • Fixed the bug of fixed-size bucketing. by @VahidooX :: PR: #3364
  • Fix minor problems in punctuation and capitalization model by @PeganovAnton :: PR: #3376
  • Megatron AMP fix for scheduler step counter by @titu1994 :: PR: #3293
  • fixed the bug of bucketing when fixed-size batch is used. by @VahidooX :: PR: #3399
  • TalkNet Fix by @stasbel :: PR: #3092
  • Fix linear annealing not annealing lr to min_lr by @MaximumEntropy :: PR: #3400
  • Resume training on SLURM multi-node multi-gpu by @itzsimpl :: PR: #3374
  • Fix running token classification in multinode setting by @PeganovAnton :: PR: #3413
  • Fix order of lang checking to ignore input langs by @MaximumEntropy :: PR: #3417
  • NMT MIM mean variance fix by @michalivne :: PR: #3385
  • Fix bug for missing variable by @MaximumEntropy :: PR: #3437
  • Asr patches by @titu1994 :: PR: #3443
  • Prompt tuning loss mask fix by @vadam5 :: PR: #3438
  • BioMegatron token classification tutorial fix to be compatible with current Megatron BERT by @yidong72 :: PR: #3435
  • Fix hysteresis loading by @MaximumEntropy :: PR: #3460
  • Fix the tutorial notebooks bug by @yidong72 :: PR: #3465
  • Fix the errors/bugs in ASR with diarization tutorial by @tango4j :: PR: #3461
  • WFST Punct post fix + punct tutorial fixes by @ekmb :: PR: #3469
  • Process correctly label ids dataset parameter + standardize type of label ids model attribute + minor changes (error messages, typing) by @PeganovAnton :: PR: #3471
  • file name fix - Segmentation tutorial by @ekmb :: PR: #3474
  • Patch fix for the multiple last checkpoints issue by @nithinraok :: PR: #3468
  • Fix bug with arguments for TalkNet's preprocessor by @Oktai15 :: PR: #3481
  • Fix description by @PeganovAnton :: PR: #3482
  • typo fix in diarization notebooks by @nithinraok :: PR: #3480
  • Fix checkpoint converter in O2 style by @yaoyu-33 :: PR: #3486
  • Remove pickled features from tarred dataset by @PeganovAnton :: PR: #3491
  • Fix link to NGC page for ASR by @titu1994 :: PR: #3512
  • vad typo fix by @fayejf :: PR: #3490
  • fixed the num_classes bug of conv decoder. by @VahidooX :: PR: #3525
  • Fixed section typo by @vadam5 :: PR: #3522
  • Fixed duplicate cell bug by @vadam5 :: PR: #3518
  • Fix bug in inference tts notebook by @Oktai15 :: PR: #3532
  • Fix nmt resume by @ericharper :: PR: #3539
  • TN bug fix by @ekmb :: PR: #3538
  • Fix bug with pretrained method in Inference_ModelSelect.ipynb by @Oktai15 :: PR: #3546
  • Fix an issue with wandb not displaying updated config changes by @titu1994 :: PR: #3552

Improvements

  • Remove STFT checks due to min PT version of 1.10 by @titu1994 :: PR: #3034
  • Add a stateless timer to specify max_time per run instead of global m… by @MaximumEntropy :: PR: #3056
  • (1) reduce the validation loss within an epoch, (2) convert global-bat… by @erhoo82 :: PR: #3055
  • Timer class monitors total time (train + validation + testing) to monitor when to end training by @MaximumEntropy :: PR: #3061
  • Add new by @PeganovAnton :: PR: #2963
  • Add PUBLICATIONS.md by @titu1994 :: PR: #3051
  • Hg cache by @yzhang123 :: PR: #3080
  • Add sequence axis to AxisKind.from_str() and improve time axis by @Oktai15 :: PR: #3090
  • Add logging to LS script by @titu1994 :: PR: #3141
  • Modify speaker input by @nithinraok :: PR: #3100
  • Typo correction in README.rst by @satpalsr :: PR: #3103
  • Self-supervised pre-training for speech models by @sam1373 :: PR: #3139
  • Add AISHELL 2 processing script by @titu1994 :: PR: #3195
  • Add support for multi-speaker FastPitch export by @ryanleary :: PR: #3192
  • Reduce number of log files for large runs by @blisc :: PR: #3191
  • Add support to modify nemo cache directory by @titu1994 :: PR: #3208
  • Add Pitch, Duration Tensors for Riva by @blisc :: PR: #3207
  • Upgrade to NVIDIA PyTorch 21.11 Container by @ericharper :: PR: #3234
  • Add WMT21 paper to Publications by @MaximumEntropy :: PR: #3256
  • Support for gecko tool by @nithinraok :: PR: #3266
  • Adding adaptive bucketing for tarred datasets. by @VahidooX :: PR: #3222
  • Initial refactor by @borisfom :: PR: #3272
  • Refactored prepare_for_export calls to ensure input size of example i… by @borisfom :: PR: #3305
  • Replacing outdated exports scripts by @borisfom :: PR: #3311
  • Batch implementation by @dimapihtar :: PR: #3276
  • Multiscale processing feature for speaker diarization by @tango4j :: PR: #3296
  • Add titanet by @nithinraok :: PR: #3333
  • update sparrowhawk export grammars to be able to skip pynini by @yzhang123 :: PR: #3346
  • Prompt tuning by @vadam5 :: PR: #3309
  • Remove wordninja by @ekmb :: PR: #3363
  • Repair arbitrary file or folder deletion vulnerability by @haby0 :: PR: #3362
  • Moved shebangs to the first line by @davidalami :: PR: #3361
  • Added new method for logprobs computation by @dimapihtar :: PR: #3329
  • Update speaker collate functions by @nithinraok :: PR: #3381
  • Cache_hf by @ekmb :: PR: #3406
  • Update to NVIDIA PyTorch 21.12 Container by @ericharper :: PR: #3424
  • Working around Pytorch exporter issue with expand() by @borisfom :: PR: #3422
  • Remove apex by @ekmb :: PR: #3428
  • Vad infer refactor by @fayejf :: PR: #3394
  • Update LJSpeech preprocessing by @Oktai15 :: PR: #3423
  • Preprocess an entire folder of .json or .json.gz files into a single .bin and .idx file. by @MaximumEntropy :: PR: #3425
  • TimingCallback default buffer_size=1 by @michalivne :: PR: #3439
  • Extending input_example() to take max batch and dimension arguments by @borisfom :: PR: #3429
  • Refactor data preprocessing script by @yzhang123 :: PR: #3444
  • Test only if the model was trained on single GPU for accurate results. by @titu1994 :: PR: #3470
  • Upper bound ptl for r1.6.0, lower bound numpy in general by @ericharper :: PR: #3466
  • Add Apex import guard by @ericharper :: PR: #3467
  • Adding missing init files by @yzhang123 :: PR: #3505
  • Typos by @ekmb :: PR: #3504
  • Update titanet conf by @nithinraok :: PR: #3507
  • Raise PTL upper bound on r1.6.0 by @ericharper :: PR: #3510
  • Enforce utf-8 on all file r/w by @titu1994 :: PR: #3520
  • Pushing updated WFST Tutorial to r1.6.0 by @tbartley94 :: PR: #3521
  • WFST tutorial update by @tbartley94 :: PR: #3531
  • Update nvidia container check by @ericharper :: PR: #3535
  • Remove extra instance during restore by @ericharper :: PR: #3551
  • Remove wordtokenizer example from NLP tokenizer notebook by @aklife97 :: PR: #3477

v1.5.1

2 years ago

Features

  • Minor updates to expose speaker id, pitch, and duration on export of FastPitch #3192, #3207

Known Issues

v1.5.0

2 years ago

Features

  • Megatron GPT pre-training with tensor model parallelism #2975
  • NMT encoder and decoder with different hidden size #2856
  • Logging timing of train/val/test steps #2936
  • Logging NMT encoder and decoder timing #2956
  • Logging timing per sentence length and tokenized text statistics #3004
  • Upgrade to PyTorch Lightning 1.5.0, bfloat support #2975
  • French Inverse Text Normalization #2921
  • Bucketing of tarred datasets for ASR models #2999
  • ASR with diarization #3007
  • Adding parallel transcribe for ASR models - supports multi-gpu/multi-node #3017

Documentation Updates

  • RNNT

Contributors

@ericharper @michalivne @MaximumEntropy @VahidooX @titu1994 @blisc @okuchaiev @tango4j @erastorgueva-nv @fayejf @vadam5 @ekmb @yaoyu-33 @nithinraok @erhoo82 @tbartley94 @PeganovAnton @madhukarkm @yzhang123 (Please let us know if you have contributed to this release and we have missed you here.)

v1.4.0

2 years ago

Features

  • Improved speaker clustering #2729
  • Upgrade to NVIDIA PyTorch 21.08 container #2799
  • RNNT mAES beam search support #2802
  • Transfer learning for new speakers #2684
  • Simplify speaker scripts #2777
  • Perceiver-encoder architecture #2737
  • Relative paths in tarred datasets #2776
  • Torch only TTS package #2643
  • Inverse text normalization for Spanish #2489

Tutorial Notebooks

  • Duration and pitch control for TTS #2700

Bug fixes

  • Fixed max delta generation #2727
  • Waveglow export #2671, #2699

Contributors

@tango4j @titu1994 @paarthneekhara @nithinraok @michalivne @erastorgueva-nv @borisfom @blisc (some contributors may not be listed explicitly)

v1.3.0

2 years ago

Added

  • RNNT Exportable to ONNX #2510
  • Multi-batch inference support for speaker diarization #2522
  • DALI Integration for char/subword ASR #2567
  • VAD Postprocessing #2636
  • Perceiver encoder for NMT #2621
  • gRPC NMT server #2656
  • German ITN #2486
  • Russian TN and ITN #2519
  • Save/restore connector #2592
  • PTL 1.4+ #2600

Tutorial Notebooks

  • Non-English downstream NLP task #2532
  • RNNT Basics #2651

Bug Fixes

  • NMESE clustering for very small audio files #2566

Contributors

@pasandi20 @ekmb @nithinraok @titu1994 @ryanleary @yzhang123 @ericharper @michalivne @MaximumEntropy @fayejf (some contributors may not be listed explicitly)