NeMo Versions

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech).

v1.7.2

2 years ago

GPT Bugfixes

GPT dataloader improvements and fixes by @crcrpar :: PRs: #3826, #3665
Disable nvfuser by @ericharper :: PR: #3845
Set find_unused_parameters to False by @ericharper :: PR: #3837

T5 XNLI Example

T5 xnli eval by @yaoyu-33 :: PR: #3848

v1.7.1

2 years ago

Known Issues

find_unused_parameters should be False when training GPT: #3837

Bugfixes

revert changes by @yzhang123 :: PR: #3785
Fixed soft prompt eval loading bug by @vadam5 :: PR: #3805
mT5 whole word masking and T5 finetuning config fixes by @MaximumEntropy :: PR: #3776
Raise error if FP16 training is tried with O2 recipe. by @ericharper :: PR: #3806

v1.7.0

2 years ago

Known Issues

Megatron GPT training with O2 and FP16 is bugged. FP16 with O1 still works.
find_unused_parameters should be False when training GPT: #3837
FastPitch training may result in stalled GPUs. Users will have to manually kill their runs and continue training from the latest checkpoint.
mT5 issue with whole word masking, see #3776
T5 finetuning config issue, see #3776
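
The first two known issues above can be turned into a small pre-flight check. The following is only a minimal sketch: the function name and its parameters (precision, amp_level, find_unused_parameters) are hypothetical and mirror the wording of these notes, not an actual NeMo API.

```python
from typing import List


def check_gpt_training_options(precision: int, amp_level: str,
                               find_unused_parameters: bool) -> List[str]:
    """Return a list of problems for a Megatron GPT run on NeMo 1.7.0.

    All parameter names are hypothetical; they only mirror the release notes.
    """
    problems = []
    # Known issue: O2 + FP16 is bugged, while FP16 + O1 still works.
    if precision == 16 and amp_level.upper() == "O2":
        problems.append("FP16 with the O2 recipe is bugged; use O1 instead")
    # Known issue (#3837): find_unused_parameters should be False for GPT.
    if find_unused_parameters:
        problems.append("find_unused_parameters should be False when training GPT")
    return problems


print(check_gpt_training_options(16, "O2", True))   # both issues reported
print(check_gpt_training_options(16, "O1", False))  # no problems reported
```

An empty list means that neither of the two known issues applies to the chosen settings.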

Container

NOTE: From NeMo 1.7.0 onwards, NeMo containers follow the YY.MM convention for naming, where the YY.MM value is based on the base container. For additional information regarding NeMo containers, visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:22.01
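
As an illustration of the YY.MM naming, the sketch below splits an image reference such as the one above into a year and a month. The helper name parse_container_tag and the ContainerTag type are hypothetical, and the sketch assumes that YY maps to the year 2000 + YY.

```python
import re
from typing import NamedTuple


class ContainerTag(NamedTuple):
    year: int   # full four-digit year, e.g. 2022
    month: int  # 1 to 12


def parse_container_tag(image: str) -> ContainerTag:
    """Parse the YY.MM tag of an image reference like 'nvcr.io/nvidia/nemo:22.01'."""
    # Everything after the last ':' is treated as the tag.
    _, sep, tag = image.rpartition(":")
    match = re.fullmatch(r"(\d{2})\.(\d{2})", tag) if sep else None
    if match is None:
        raise ValueError(f"not a YY.MM container tag: {image!r}")
    # Assumption: two-digit years belong to the 2000s.
    year, month = 2000 + int(match.group(1)), int(match.group(2))
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range in tag: {tag!r}")
    return ContainerTag(year, month)


print(parse_container_tag("nvcr.io/nvidia/nemo:22.01"))
```

A version-style tag such as 1.7.0 does not match the YY.MM pattern and is rejected with a ValueError.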

ASR

Wav2vec by @tbartley94 :: PR: #3297
Fix bug in multi-checkpoint loading by @sam1373 :: PR: #3536
Add HuggingFace Datasets to NeMo ASR Dataset script by @titu1994 :: PR: #3513
Add support for Gradient Clipping (clamp) in RNNT Numba loss by @titu1994 :: PR: #3550
Enable Tarred Dataset Support for NVIDIA DALI by @titu1994 :: PR: #3485
Add initial support for Buffered RNNT Scripts by @titu1994 :: PR: #3602
Significantly speed up RNNT loss on CUDA by @titu1994 :: PR: #3653
Fixing the bug in the stateful rnnt decoder. by @VahidooX :: PR: #3673
Add Buffered RNNT with LCS Merge algorithm by @titu1994 :: PR: #3669
Asr noise data scripts by @jbalam-nv :: PR: #3660
ASR SSL update by @sam1373 :: PR: #3746
Add randomized bucketing by @VahidooX :: PR: #3445
Self-supervised tutorial & update by @sam1373 :: PR: #3344
Updated conformer models. by @VahidooX :: PR: #3741
Added speaker identification script with cosine and neural classifier… by @nithinraok :: PR: #3672
Fix in clustering diarizer by @nithinraok :: PR: #3701
Add a function that writes cluster label in diarization pipeline by @tango4j :: PR: #3643

TTS

port UnivNet to NeMo TTS collection by @L0SG :: PR: #3186
E2E TTS fixes by @redoctopus :: PR: #3508
New structure for TTS datasets in scripts/dataset_processing, VocoderDataset, update TTSDataset by @Oktai15 :: PR: #3484
Deprecate some TTS models and TTS datasets by @Oktai15 :: PR: #3576
Fix bugs in HiFi-GAN (scheduler, optimizers) and add input_example() in Mixer-TTS/Mixer-TTS-X by @Oktai15 :: PR: #3564
Update UnivNet, HiFi-GAN and WaveGlow, small fixes in Mixer-TTS, FastPitch and Exportable by @Oktai15 :: PR: #3585
Fix typo in FastPitch config (pitch_avg -> pitch_mean) by @eyentei :: PR: #3593
Fix incorrect usage of TTSDataset in some files and fix one-line bug in NVIDIA's CMUDict by @Oktai15 :: PR: #3594
Convert entry from UTF-16 to UTF-8 by @redoctopus :: PR: #3597
remove CheckInstall by @blisc :: PR: #3577
Fix UnivNet LibriTTS pretrained location by @m-toman :: PR: #3615
FastPitch training tutorial by @subhankar-ghosh :: PR: #3631
Update Aligner, add new methods to AlignmentEncoder by @Oktai15 :: PR: #3641
Add Mixed Representation Training by @blisc :: PR: #3473
Add speakerID to libritts/get_data.py by @subhankar-ghosh :: PR: #3662
Update TTS tutorials, Simplification of testing Mixer-TTS and FastPitch by @Oktai15 :: PR: #3680
Clean FastPitch_Finetuning.ipynb notebook by @Oktai15 :: PR: #3698
Add cache_size to BetaBinomialInterpolator, fix bugs in TTS tutorials and FastPitch by @Oktai15 :: PR: #3706
Fix bugs in VocoderDataset and TTSDataset by @Oktai15 :: PR: #3713
Fix bugs in E2E TTS, Mixer-TTS and FastPitch by @Oktai15 :: PR: #3740

NLP / NMT

NLPDDPPlugin find_unused_parameters is configurable by @mlgill :: PR: #3478
Megatron encoder-decoder refactor by @michalivne :: PR: #3542
Finetuning NeMo Megatron T5 Models on GLUE by @MaximumEntropy :: PR: #3408
Pipeline parallelism for GPT by @ericharper :: PR: #3388
Generalized the P-tuning method to support various NLP tasks by @yidong72 :: PR: #3623
Megatron_LM checkpoint to NeMo checkpoint support by @yidong72 :: PR: #3692
Bugfix for GPT eval by @ericharper :: PR: #3744
Yuya/megatron t5 glue eval by @yaoyu-33 :: PR: #3751
Enforce legacy tokenizer for sentencepiece to add special tokens for T5 by @MaximumEntropy :: PR: #3457
Added P-Tuning method by @yidong72 :: PR: #3488
O2 style mixed precision training for T5 by @MaximumEntropy :: PR: #3664
LM adapted T5 dataset by @MaximumEntropy :: PR: #3654
Fix consumed samples calculation + PTune Model bugs by @yidong72 :: PR: #3738
Add pipeline support to eval methods by @ericharper :: PR: #3684
XNli benchmark by @yidong72 :: PR: #3693
Refactor dialogue state tracking for modelling/dataset interoperability by @Zhilin123 :: PR: #3526
Changes to support mean n-gram size masking for T5 by @MaximumEntropy :: PR: #3646
Dialogue state tracking refactor by @Zhilin123 :: PR: #3667
Parallel prompt tuning by @vadam5 :: PR: #3670
GEGLU activation for T5 by @MaximumEntropy :: PR: #3694

Text Normalization / Inverse Text Normalization

Text normalization takes too much time for a string which contains a lot of dates by @PeganovAnton :: PR: #3451
ITN bug fixes, ip address, card num support, whitelist clean up by @ekmb :: PR: #3574
Fix tn bugs by @yzhang123 :: PR: #3580
add serial number to itn by @yzhang123 :: PR: #3584
ITN: SH bug fixes for telephone by @ekmb :: PR: #3592
Tn bug 1.7.0 by @yzhang123 :: PR: #3730
TN docs update by @ekmb :: PR: #3735

Export

Update UnivNet, HiFi-GAN and WaveGlow, small fixes in Mixer-TTS, FastPitch and Exportable by @Oktai15 :: PR: #3585
Conformer onnx fix by @borisfom :: PR: #3524
Add onnx support for speaker models by @nithinraok :: PR: #3650
Jasper mask/export fix by @borisfom :: PR: #3691

Bugfixes

Text normalization takes too much time for a string which contains a lot of dates by @PeganovAnton :: PR: #3451
Dialogue state tracking refactor/ SGDGEN patch 2 by @Zhilin123 :: PR: #3674
lower bound PTL to 1.5.10 and remove last ckpt patch fix by @nithinraok :: PR: #3690

Improvements

Wfst tutorial by @tbartley94 :: PR: #3479
Update CMUdict with ADLR version pronunciations by @redoctopus :: PR: #3446
Fix docs by @yzhang123 :: PR: #3523
Add docstring to UnivNetModel by @L0SG :: PR: #3529
Increase lower bound due to security vulnerability by @ericharper :: PR: #3537
Add Change Log builder to NeMo by @titu1994 :: PR: #3527
Bugfix, need to freeze the model by @yidong72 :: PR: #3540
Bucketing quick fix by @tbartley94 :: PR: #3543
More fixes to SentencePiece for T5 by @MaximumEntropy :: PR: #3515
Update CONTRIBUTING.md by @Oktai15 :: PR: #3569
Update pr template and re-add Changelog builder by @titu1994 :: PR: #3575
Apex quick fix by @ekmb :: PR: #3591
Upgrade to 22.01 container by @ericharper :: PR: #3571
Fix typo and update minimal version of scipy by @Oktai15 :: PR: #3604
Add env variable to force transformers to run offline during CI by @ericharper :: PR: #3607
Correctly install NeMo wheel by @titu1994 :: PR: #3599
Fix wheel build by @titu1994 :: PR: #3610
Fixed EH and error reporting in restore_from by @borisfom :: PR: #3583
Clarifying documentation by @itzsimpl :: PR: #3616
Improve docs for finetuning by @titu1994 :: PR: #3622
Add NeMo version to all new .nemo files by @titu1994 :: PR: #3605
Update numba if NVIDIA_PYTORCH_VERSION not correct by @itzsimpl :: PR: #3614
Remove @experimental decorator in diarization related files. by @tango4j :: PR: #3625
Remove compression from .nemo files by @okuchaiev :: PR: #3626
Update adobe analytics by @ericharper :: PR: #3645
Add ssl tutorial to tutorial docs page by @sam1373 :: PR: #3649
Fix number of channels>1 issue by @ekmb :: PR: #3652
Fixed the bug in bucketing. by @VahidooX :: PR: #3663
Adding guard by @yzhang123 :: PR: #3655
Add tutorial paths by @titu1994 :: PR: #3651
Folder name update by @ekmb :: PR: #3671
Test HF online for SGD-GEN only by @MaximumEntropy :: PR: #3681
Update Librosa support to 0.9 by @titu1994 :: PR: #3682
Comment out numba in 22.01 release by @titu1994 :: PR: #3685
Fix failing tests inside of the 22.01 container in PR 3571 by @fayejf :: PR: #3609
Fixed Apex guard when imported classes are used for default values by @michalivne :: PR: #3700
Update citrinet_512.yaml by @Jorjeous :: PR: #3642
update torchaudio in Dockerfile to match torch version by @GNroy :: PR: #3637
Enforce import tests on the three domains by @titu1994 :: PR: #3702
Audio based norm speed up by @ekmb :: PR: #3703
Fix device on notebook by @titu1994 :: PR: #3732
pynini pip by @yzhang123 :: PR: #3729
Removed fp16 converting in complete method by @dimapihtar :: PR: #3709
Mirror AN4 while CMU servers are down by @titu1994 :: PR: #3743
Fix SSL configs for 1.7 by @sam1373 :: PR: #3748
Punct process bug fix by @ekmb :: PR: #3747
Specify gpus in SSL notebook by @sam1373 :: PR: #3753
Duplex model inference fix, money encoder fix by @ekmb :: PR: #3754
Update decoding strategy docs and override general value for tutorials by @titu1994 :: PR: #3755
Fix directories in ssl notebook by @sam1373 :: PR: #3758
Update Tacotron2_Training.ipynb by @blisc :: PR: #3769
Fix dockerfile by @yzhang123 :: PR: #3778
Prompt-Tuning-Documentation by @vadam5 :: PR: #3777
Prompt tuning bug fix by @vadam5 :: PR: #3780

v1.6.2

2 years ago

Bug fix

Changed the "Apex not found" error to a warning so that NLP models that do not depend on Apex can be used when Apex is not installed.

v1.6.1

2 years ago

Bug Fixes

Fix embedding name for verifying speakers #3578
Add rank check and barrier helpers compilation for megatron dataset #3581
Add apex import guards #3579

v1.6.0

2 years ago

ASR

Add new features to ASR with diarization with modified tutorial and README. by @tango4j :: PR: #3007
Enable stateful decoding of RNNT over multiple transcribe calls by @titu1994 :: PR: #3037
Move vocabs from asr to common by @Oktai15 :: PR: #3084
Adding parallel transcribe for ASR models - supports multi-gpu/multi-node by @VahidooX :: PR: #3017
CTC Conformer fixes for ONNX/TS export by @borisfom :: PR: #3072
Adding pretrained French ASR models to ctc_bpe and rnnt_bpe listings by @tbartley94 :: PR: #3225
adding german conformer ctc and rnnt by @yzhang123 :: PR: #3242
Add aishell and fisher dataset processing scripts for ASR by @jbalam-nv :: PR: #3203
Better default for RNNT greedy decoding by @titu1994 :: PR: #3332
Add uniform ASR evaluation script for all models by @titu1994 :: PR: #3334
CTC Segmentation-Citrinet support by @ekmb :: PR: #3279
Updates on ASR with diarization util files by @tango4j :: PR: #3359
Asr fr by @tbartley94 :: PR: #3404
Refactor ASR Examples Directory by @titu1994 :: PR: #3392
Asr patches by @titu1994 :: PR: #3443
Properly support -1 for labels in ctc char models by @titu1994 :: PR: #3487

TTS

MixerTTS, MixerTTSDataset and small updates in tts tokenizers by @Oktai15 :: PR: #2859
ONNX and TorchScript support for Mixer-TTS by @Oktai15 :: PR: #3082
Update name of files to one style in TTS folder by @Oktai15 :: PR: #3189
Update TTS Dataset, FastPitch with TTS dataset and small improvements in HiFiGAN by @Oktai15 :: PR: #3205
Add Beta-binomial Interpolator to TTSDataset by @Oktai15 :: PR: #3230
Normalizer to TTS models, TTS tokenizer updates, AxisKind updates by @Oktai15 :: PR: #3271
Update Mixer-TTS, FastPitch and TTSDataset by @Oktai15 :: PR: #3366
Minor Updates to TTS Finetuning by @blisc :: PR: #3455

NLP / NMT

NMT timing and tokenizer stats utils by @michalivne :: PR: #3004
Add offsets calculation to MegatronGPTModel.complete method by @dimapihtar :: PR: #3117
NMT checkpoint averaging by @michalivne :: PR: #3096
NMT validation examples with inputs by @michalivne :: PR: #3194
Improve data pipeline for punctuation capitalization model and make other useful changes by @PeganovAnton :: PR: #3159
Reduce test time of punctuation and capitalization model by @PeganovAnton :: PR: #3286
NLP text augmentation by @michalivne :: PR: #3291
Adding Megatron NeMo Bert support by @yidong72 :: PR: #3303
Added Script to convert Megatron LM to .nemo file by @yidong72 :: PR: #3371
Support Changing Number of Tensor Parallel Partitions for Megatron by @aklife97 :: PR: #3365
Megatron AMP fix for scheduler step counter by @titu1994 :: PR: #3293
T5 Pre-training in NeMo using Megatron by @MaximumEntropy :: PR: #3036
NMT MIM mean variance fix by @michalivne :: PR: #3385
NMT Shared Embeddings Weights by @michalivne :: PR: #3340
Make saving .nemo during on_train_end configurable by @ericharper :: PR: #3427
Byte-level Multilingual NMT by @aklife97 :: PR: #3368
BioMegatron token classification tutorial fix to be compatible with current Megatron BERT by @yidong72 :: PR: #3435
NMT documentation for bottleneck architecture by @michalivne :: PR: #3464
(1) O2-style mixed precision recipe, (2) Persistent layer-norm, (3) Grad scale hysteresis, (4) gradient_as_bucket_view by @erhoo82 :: PR: #3259

Text Normalization / Inverse Text Normalization

Tn clean upsample by @yzhang123 :: PR: #3024
Tn add nn wfst and doc by @yzhang123 :: PR: #3135
Update english tn ckpt by @yzhang123 :: PR: #3143
WFST_tutorial for ITN development by @tbartley94 :: PR: #3128
German TN wfst by @yzhang123 :: PR: #3174
Add ITN Vietnamese by @binh234 :: PR: #3217
WFST TN updates by @ekmb :: PR: #3235
Itn german refactor by @yzhang123 :: PR: #3262
Tn german deterministic by @yzhang123 :: PR: #3308
TN updates by @ekmb :: PR: #3285
Added double digits to EN ITN by @yzhang123 :: PR: #3321
TN_non_deterministic optimized by @ekmb :: PR: #3343
Missing init for TN German by @ekmb :: PR: #3355
Ru TN by @ekmb :: PR: #3390
Update ContextNet models trained on more datasets by @titu1994 :: PR: #3440

NeMo Tools

CTC Segmentation-Citrinet support by @ekmb :: PR: #3279
Updated NumPy SDE requirement by @vsl9 :: PR: #3442

Export

ONNX and TorchScript support for Mixer-TTS by @Oktai15 :: PR: #3082
CTC Conformer fixes for ONNX/TS export by @borisfom :: PR: #3072

Documentation

Merge r1.5.0 bugfixes and doc updates to main by @ericharper :: PR: #3133
Tn add nn wfst and doc by @yzhang123 :: PR: #3135
Add apex into by @PeganovAnton :: PR: #3214
Final merge r1.5.0 bugfixes and doc updates to main by @ericharper :: PR: #3232
Nemo container docker building instruction - merge to main by @fayejf :: PR: #3236
Doc link fixes by @nithinraok :: PR: #3264
French ASR Doc updates by @tbartley94 :: PR: #3322
german asr doc page update by @yzhang123 :: PR: #3325
update docs and replace speakernet with titanet in tutorials by @nithinraok :: PR: #3405
Asr fr by @tbartley94 :: PR: #3404
Update copyright to 2022 by @ericharper :: PR: #3426
Update Speech Classification - VAD doc by @fayejf :: PR: #3430
Update speaker diarization docs by @tango4j :: PR: #3419
NMT documentation for bottleneck architecture by @michalivne :: PR: #3464
Add verification helper function and update docs by @nithinraok :: PR: #3514
Prompt tuning documentation by @vadam5 :: PR: #3541

Bugfixes

Fixed wrong tgt_length for timing by @michalivne :: PR: #3050
Update nltk version with a CVE fix by @thomasdhc :: PR: #3054
Fix README by @ericharper :: PR: #3070
Transformer Decoder: Fix swapped input name issue by @aklife97 :: PR: #3066
Fixes bugs in collect_tokenizer_dataset_stats.py by @michalivne :: PR: #3060
Attribute is not working in . by @PeganovAnton :: PR: #3099
Merge r1.5.0 bugfixes and doc updates to main by @ericharper :: PR: #3133
A quick fix for issue #3094 index out-of-bound when truncating long text to max_seq_length by @bugface :: PR: #3131
Fixed two typos by @bene-ges :: PR: #3157
Merge r1.5.0 bugfixes to main by @ericharper :: PR: #3173
LJSpeech alignment scripts fixed for latest MFA by @m-toman :: PR: #3177
Add apex into by @PeganovAnton :: PR: #3214
Patch omegaconf for cfg by @fayejf :: PR: #3224
Final merge r1.5.0 bugfixes and doc updates to main by @ericharper :: PR: #3232
CTC Conformer fixes for ONNX/TS export by @borisfom :: PR: #3072
Fix Masked SE for Citrinets + export Limited Context Citrinet by @titu1994 :: PR: #3216
Fix text length type in TTSDataset for beta_binomial_interpolator by @Oktai15 :: PR: #3233
Fix cast type in _se_pool_step_script related functions by @Oktai15 :: PR: #3239
Doc link fixes by @nithinraok :: PR: #3264
Escape chars fix by @ekmb :: PR: #3253
Fix asr output - eval mode by @nithinraok :: PR: #3274
Remove ArrayLike because it is not supported in numpy 1.18 by @PeganovAnton :: PR: #3282
Fix megatron_gpt_ckpt_to_nemo.py with torch distributed by @yaoyu-33 :: PR: #3278
Reduce test time of punctuation and capitalization model by @PeganovAnton :: PR: #3286
Tn en money fix by @yzhang123 :: PR: #3290
Fixing the bucketing_batch_size bug. by @VahidooX :: PR: #3294
Adaptive fixed positional embeddings by @michalivne :: PR: #3263
Fix specaugment time start for numba kernel by @titu1994 :: PR: #3299
Fix for Stalled ASR training/eval on Pytorch 1.10+ (multigpu/multinode) by @titu1994 :: PR: #3304
Fix bucketing list bug. by @VahidooX :: PR: #3315
Fix MixerTTS types and dimensions by @Oktai15 :: PR: #3330
Fix German and Vietnamese grammar by @yzhang123 :: PR: #3331
Fix readme to show cmd by @yzhang123 :: PR: #3345
Fix speaker label models training convergence by @nithinraok :: PR: #3354
Tqdm get datasets by @bmwshop :: PR: #3358
Fixed future masking in cross attention of Perceiver by @michalivne :: PR: #3314
Fixed the bug of fixed-size bucketing. by @VahidooX :: PR: #3364
Fix minor problems in punctuation and capitalization model by @PeganovAnton :: PR: #3376
Megatron AMP fix for scheduler step counter by @titu1994 :: PR: #3293
fixed the bug of bucketing when fixed-size batch is used. by @VahidooX :: PR: #3399
TalkNet Fix by @stasbel :: PR: #3092
Fix linear annealing not annealing lr to min_lr by @MaximumEntropy :: PR: #3400
Resume training on SLURM multi-node multi-gpu by @itzsimpl :: PR: #3374
Fix running token classification in multinode setting by @PeganovAnton :: PR: #3413
Fix order of lang checking to ignore input langs by @MaximumEntropy :: PR: #3417
NMT MIM mean variance fix by @michalivne :: PR: #3385
Fix bug for missing variable by @MaximumEntropy :: PR: #3437
Asr patches by @titu1994 :: PR: #3443
Prompt tuning loss mask fix by @vadam5 :: PR: #3438
BioMegatron token classification tutorial fix to be compatible with current Megatron BERT by @yidong72 :: PR: #3435
Fix hysteresis loading by @MaximumEntropy :: PR: #3460
Fix the tutorial notebooks bug by @yidong72 :: PR: #3465
Fix the errors/bugs in ASR with diarization tutorial by @tango4j :: PR: #3461
WFST Punct post fix + punct tutorial fixes by @ekmb :: PR: #3469
Process correctly label ids dataset parameter + standardize type of label ids model attribute + minor changes (error messages, typing) by @PeganovAnton :: PR: #3471
file name fix - Segmentation tutorial by @ekmb :: PR: #3474
Patch fix for the multiple last checkpoints issue by @nithinraok :: PR: #3468
Fix bug with arguments for TalkNet's preprocessor by @Oktai15 :: PR: #3481
Fix description by @PeganovAnton :: PR: #3482
typo fix in diarization notebooks by @nithinraok :: PR: #3480
Fix checkpoint converter in O2 style by @yaoyu-33 :: PR: #3486
Remove pickled features from tarred dataset by @PeganovAnton :: PR: #3491
Fix link to NGC page for ASR by @titu1994 :: PR: #3512
vad typo fix by @fayejf :: PR: #3490
fixed the num_classes bug of conv decoder. by @VahidooX :: PR: #3525
Fixed section typo by @vadam5 :: PR: #3522
Fixed duplicate cell bug by @vadam5 :: PR: #3518
Fix bug in inference tts notebook by @Oktai15 :: PR: #3532
Fix nmt resume by @ericharper :: PR: #3539
TN bug fix by @ekmb :: PR: #3538
Fix bug with pretrained method in Inference_ModelSelect.ipynb by @Oktai15 :: PR: #3546
Fix an issue with wandb not displaying updated config changes by @titu1994 :: PR: #3552

Improvements

Remove STFT checks due to min PT version of 1.10 by @titu1994 :: PR: #3034
Add a stateless timer to specify max_time per run instead of global m… by @MaximumEntropy :: PR: #3056
(1) reduce the validation loss within a epoch, (2) convert global-bat… by @erhoo82 :: PR: #3055
Timer class monitors total time (train + validation + testing) to monitor when to end training by @MaximumEntropy :: PR: #3061
Add new by @PeganovAnton :: PR: #2963
Add PUBLICATIONS.md by @titu1994 :: PR: #3051
Hg cache by @yzhang123 :: PR: #3080
Add sequence axis to AxisKind.from_str() and improve time axis by @Oktai15 :: PR: #3090
Add logging to LS script by @titu1994 :: PR: #3141
Modify speaker input by @nithinraok :: PR: #3100
Typo correction in README.rst by @satpalsr :: PR: #3103
Self-supervised pre-training for speech models by @sam1373 :: PR: #3139
Add AISHELL 2 processing script by @titu1994 :: PR: #3195
Add support for multi-speaker FastPitch export by @ryanleary :: PR: #3192
Reduce number of log files for large runs by @blisc :: PR: #3191
Add support to modify nemo cache directory by @titu1994 :: PR: #3208
Add Pitch, Duration Tensors for Riva by @blisc :: PR: #3207
Upgrade to NVIDIA PyTorch 21.11 Container by @ericharper :: PR: #3234
Add WMT21 paper to Publications by @MaximumEntropy :: PR: #3256
Support for gecko tool by @nithinraok :: PR: #3266
Adding adaptive bucketing for tarred datasets. by @VahidooX :: PR: #3222
Initial refactor by @borisfom :: PR: #3272
Refactored prepare_for_export calls to ensure input size of example i… by @borisfom :: PR: #3305
Replacing outdated exports scripts by @borisfom :: PR: #3311
Batch implementation by @dimapihtar :: PR: #3276
Multiscale processing feature for speaker diarization by @tango4j :: PR: #3296
Add titanet by @nithinraok :: PR: #3333
update sparrowhawk export grammars to able to skip pynini by @yzhang123 :: PR: #3346
Prompt tuning by @vadam5 :: PR: #3309
Remove wordninja by @ekmb :: PR: #3363
Repair arbitrary file or folder deletion vulnerability by @haby0 :: PR: #3362
Moved shebangs to the first line by @davidalami :: PR: #3361
Added new method for logprobs computation by @dimapihtar :: PR: #3329
Update speaker collate functions by @nithinraok :: PR: #3381
Cache_hf by @ekmb :: PR: #3406
Update to NVIDIA PyTorch 21.12 Container by @ericharper :: PR: #3424
Working around Pytorch exporter issue with expand() by @borisfom :: PR: #3422
Remove apex by @ekmb :: PR: #3428
Vad infer refactor by @fayejf :: PR: #3394
Update LJSpeech preprocessing by @Oktai15 :: PR: #3423
Preprocess an entire folder of .json or .json.gz files into a single .bin and .idx file. by @MaximumEntropy :: PR: #3425
TimingCallback default buffer_size=1 by @michalivne :: PR: #3439
Extending input_example() to take max batch and dimension arguments by @borisfom :: PR: #3429
Refactor data preprocessing script by @yzhang123 :: PR: #3444
Test only if the model was trained on single GPU for accurate results. by @titu1994 :: PR: #3470
Upper bound ptl for r1.6.0, lower bound numpy in general by @ericharper :: PR: #3466
Add Apex import guard by @ericharper :: PR: #3467
Adding missing init files by @yzhang123 :: PR: #3505
Typos by @ekmb :: PR: #3504
Update titanet conf by @nithinraok :: PR: #3507
Raise PTL upper bound on r1.6.0 by @ericharper :: PR: #3510
Enforce utf-8 on all file r/w by @titu1994 :: PR: #3520
Pushing updated WFST Tutorial to r1.6.0 by @tbartley94 :: PR: #3521
WFST tutorial update by @tbartley94 :: PR: #3531
Update nvidia container check by @ericharper :: PR: #3535
Remove extra instance during restore by @ericharper :: PR: #3551
Remove wordtokenizer example from NLP tokenizer notebook by @aklife97 :: PR: #3477

v1.5.1

2 years ago

Features

Minor updates to expose speaker id, pitch, and duration on export of FastPitch #3192, #3207

Known Issues

Training of speaker models converges very slowly due to a bug (fixed in main: #3354)
ASR training does not reach adequate WER due to a bug in Numba Spec Augment (fixed in main: #3299). For details, refer to https://github.com/NVIDIA/NeMo/issues/3288#issuecomment-1000766337. As a temporary workaround, disable Numba Spec Augment by setting the option defined at https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/asr/modules/audio_preprocessing.py#L471 to False in the SpecAugment section of the YAML config. The fix will be part of 1.6.0.
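
The SpecAugment workaround could look roughly like the sketch below, which patches an already loaded config held as a plain dictionary. Two things are assumptions and should be verified against the linked source line: that the option is named use_numba_spec_augment, and that the SpecAugment section sits under model.spec_augment. The helper name is hypothetical.

```python
import copy


def disable_numba_spec_augment(config: dict) -> dict:
    """Return a copy of `config` with Numba SpecAugment switched off.

    Assumption: the flag lives at config["model"]["spec_augment"] and is
    named "use_numba_spec_augment"; verify against the linked source line.
    """
    patched = copy.deepcopy(config)  # leave the caller's config untouched
    spec_augment = patched.setdefault("model", {}).setdefault("spec_augment", {})
    spec_augment["use_numba_spec_augment"] = False
    return patched


original = {"model": {"spec_augment": {"freq_masks": 2, "time_masks": 5}}}
patched = disable_numba_spec_augment(original)
print(patched["model"]["spec_augment"])
```

The other SpecAugment settings are kept as they are; only the one flag is added or overwritten.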

v1.5.0

2 years ago

Features

Megatron GPT pre-training with tensor model parallelism #2975
NMT encoder and decoder with different hidden size #2856
Logging timing of train/val/test steps #2936
Logging NMT encoder and decoder timing #2956
Logging timing per sentence length and tokenized text statistics #3004
Upgrade to PyTorch Lightning 1.5.0, bfloat support #2975
French Inverse Text Normalization #2921
Bucketing of tarred datasets for ASR models #2999
ASR with diarization #3007
Adding parallel transcribe for ASR models - supports multi-gpu/multi-node #3017

Documentation Updates

RNNT

Contributors

@ericharper @michalivne @MaximumEntropy @VahidooX @titu1994 @blisc @okuchaiev @tango4j @erastorgueva-nv @fayejf @vadam5 @ekmb @yaoyu-33 @nithinraok @erhoo82 @tbartley94 @PeganovAnton @madhukarkm @yzhang123 (Please let us know if you have contributed to this release and we have missed you here.)

v1.4.0

2 years ago

Features

Improved speaker clustering #2729
Upgrade to NVIDIA PyTorch 21.08 container #2799
RNNT mAES beam search support #2802
Transfer learning for new speakers #2684
Simplify speaker scripts #2777
Perceiver-encoder architecture #2737
Relative paths in tarred datasets #2776
Torch only TTS package #2643
Inverse text normalization for Spanish #2489

Tutorial Notebooks

Duration and pitch control for TTS #2700

Bug fixes

Fixed max delta generation #2727
Waveglow export #2671, #2699

Contributors

@tango4j @titu1994 @paarthneekhara @nithinraok @michalivne @erastorgueva-nv @borisfom @blisc (some contributors may not be listed explicitly)

v1.3.0

2 years ago

Added

RNNT Exportable to ONNX #2510
Multi-batch inference support for speaker diarization #2522
DALI Integration for char/subword ASR #2567
VAD Postprocessing #2636
Perceiver encoder for NMT #2621
gRPC NMT server #2656
German ITN #2486
Russian TN and ITN #2519
Save/restore connector #2592
PTL 1.4+ #2600

Tutorial Notebooks

Non-English downstream NLP task #2532
RNNT Basics #2651

Bug Fixes

NMESE clustering for very small audio files #2566

Contributors

@pasandi20 @ekmb @nithinraok @titu1994 @ryanleary @yzhang123 @ericharper @michalivne @MaximumEntropy @fayejf (some contributors may not be listed explicitly)