News
Starting with this version, we use date-based versioning, e.g., v.202204.
New Features
- [New Features][ESPnet1] added learnable fourier features #4029 by @popcornell
- [New Features][ESPnet1][ESPnet2][ASR] Restricted Self Attention for E2E Speech Summarization #4071 by @roshansh-cmu
- [New Features][ESPnet1][Installation][README] add lrs avsr recipe #4104 by @wentaoxandry
- [New Features][ESPnet1][README] add lip reading sentences dataset code #4074 by @wentaoxandry
- [New Features][ESPnet2][ASR] [ESPnet2] Intermediate/Self-conditioned CTC #4084 by @YosukeHiguchi
- [New Features][ESPnet2][ASR] [WIP] [ESPnet2] Mask-CTC #4158 by @YosukeHiguchi
- [New Features][ESPnet2][ASR][README] Add stochastic depth to conformer and share results on LibriSpeech 960h #4142 by @pyf98
- [New Features][ESPnet2][MT] MT task for espnet2 with IWSLT14 recipe #4111 by @siddalmia
- [New Features][ESPnet2][README][SE] Add DC-CRN complex masking and spectral mapping approach for speech enhancement #4127 by @Emrys365
- [New Features][ESPnet2][README][SE] Add DCCRN separator #4097 by @Johnson-Lsx
- [New Features][ESPnet2][README][SE] Add a new separator for speech enhancement/separation tasks #4062 by @LiChenda
- [New Features][ESPnet2][README][SE] Add iFaSNet for enhancement/separation tasks. #4130 by @LiChenda
- [New Features][ESPnet2][SE] Refactor DNN_Beamformer in espnet2 and add new beamformers #4082 by @Emrys365
Enhancement
- [Enhancement][ESPnet2] Add an optional suffix to the averaged model file name #4067 by @pyf98
- [Enhancement][ESPnet2] Update perturb_data_dir_speed.sh #4091 by @AmirHussein96
- [Enhancement][ESPnet2][ASR] Add tests for Intermediate/Self-conditioned CTC #4117 by @YosukeHiguchi
- [Enhancement][ESPnet2][TTS] Add option to use norm. feats over denorm. #4250 by @G-Thor
Recipe
- [Recipe][ESPnet1][RNNT] [ESPNET1] Add the results of conformer-transducer for Librispeech #4080 by @eesungkim
- [Recipe][ESPnet2][ASR] Add ASR recipe for VCTK dataset based on TTS's dataprep. #4088 by @kashikashi
- [Recipe][ESPnet2][ASR] Add new conformer config with hop length 160 for LibriSpeech 960h #4162 by @pyf98
- [Recipe][ESPnet2][ASR] Add new zh_openslr38 ASR recipe #4181 by @cuichenx
- [Recipe][ESPnet2][ASR] Add transformer results for LibriSpeech 100h #4089 by @pyf98
- [Recipe][ESPnet2][ASR] Added Marathi OpenSLR 64 recipe #4179 by @SujaySKumar
- [Recipe][ESPnet2][ASR] Added recipe for Microsoft Speech Corpus (Indian languages) #4194 by @chintu619
- [Recipe][ESPnet2][ASR] Automatic lyric recognition Recipe #4129 by @ftshijt
- [Recipe][ESPnet2][ASR] ESPNET - LRS3 Recipe #4101 by @gdebayan
- [Recipe][ESPnet2][ASR] bengali asr model with no finetuning #4047 by @dzeinali
- [Recipe][ESPnet2][MT] IWSLT'14 Results using ESPnet2-MT #4132 by @pyf98
- [Recipe][ESPnet2][README] Mandarin ISO id should be CMN instead of ZHO #4125 by @xinjli
- [Recipe][ESPnet2][README] Update README.md #4037 by @dzeinali
- [Recipe][ESPnet2][README] Update README.md #4121 by @dzeinali
- [Recipe][ESPnet2][README] Update README.md for How2 2000h ASR,SUM #4155 by @roshansh-cmu
- [Recipe][ESPnet2][RNNT] Create decode_rnnt_conformer.yaml #4058 by @sw005320
- [Recipe][ESPnet2][RNNT] Create train_rnnt_conformer.yaml #4057 by @sw005320
- [Recipe][ESPnet2][SLU] Add IEMOCAP results and configs #4100 by @YushiUeda
- [Recipe][ESPnet2][SLU] Add new config and support for computing WER in SLUE-VoxCeleb #4152 by @siddhu001
- [Recipe][ESPnet2][SLU] Add sentiment data preparation for IEMOCAP #4065 by @YushiUeda
- [Recipe][ESPnet2][SLU] ESPnet2 swbd_sentiment recipe #4134 by @YushiUeda
- [Recipe][ESPnet2][ST] egs2/iwslt22_dialect #4013 by @brianyan918
Bugfix
- [Bugfix][CI][ESPnet2] Fix CI test failures related to torch_complex 0.4.0 #4112 by @Emrys365
- [Bugfix][CI][Installation] fix doc ci by pinning jinja version #4239 by @xinjli
- [Bugfix][ESPnet2] Fix n-gram decoding #4168 by @sw005320
- [Bugfix][ESPnet2] bug fixes and efficient train/dev split in data prep of Microsoft Indian Languages recipe #4196 by @chintu619
- [Bugfix][ESPnet2] fix errors in configs of librispeech ssl frontends #4098 by @simpleoier
- [Bugfix][ESPnet2][ASR][ST] [bug patch] egs2/iwslt22_dialect #4049 by @brianyan918
- [Bugfix][ESPnet2][MT][ST] Fix joint tokenization in st.sh #4143 by @pyf98
- [Bugfix][ESPnet2][MT][ST] scoring fixes MT and ST #4146 by @siddalmia
- [Bugfix][ESPnet2][TTS] Fix speaker normalization #4229 by @LanceaKing
- [Bugfix][Installation] set gtn version #4122 by @brianyan918
- [Bugfix][ESPnet1][ESPnet2] minor fixes in ST in espnet2 #4056 by @siddalmia
Others
- [CI] Simplify vocoder compatibility test #4061 by @kan-bayashi
- [CI][Documentation] Fix notebook in the official doc. #4171 by @ShigekiKarita
- [Docker] Docker Updates #4064 by @Fhrozen
- [Documentation] Add a checklist for PRs on recipe #4053 by @ftshijt
- [Documentation] README Update for E2E Speech Summarization #4071 #4150 by @roshansh-cmu
- [Documentation] Update the example PyTorch version in Installation doc #4116 by @pyf98
- [Documentation] [documentation] fix minor typo in installation.md #4164 by @JDongian
- [Documentation][ESPnet1] fix typo #4044 by @ooyamatakehisa
- [Documentation][ESPnet1][ESPnet2][ASR] Add Huggingface-cli usage #4027 by @karthik19967829
Acknowledgements
Special thanks to @AmirHussein96, @Emrys365, @Fhrozen, @G-Thor, @JDongian, @Johnson-Lsx, @LanceaKing, @LiChenda, @ShigekiKarita, @SujaySKumar, @YosukeHiguchi, @YushiUeda, @brianyan918, @chintu619, @cuichenx, @dzeinali, @eesungkim, @ftshijt, @gdebayan, @kan-bayashi, @karthik19967829, @kashikashi, @ooyamatakehisa, @popcornell, @pyf98, @roshansh-cmu, @siddalmia, @siddhu001, @simpleoier, @sw005320, @wentaoxandry, @xinjli.