SpeechBrain Versions

A PyTorch-based Speech Toolkit

v1.0.0

2 months ago

Please help our community project. Star us on GitHub!

🚀 What's New in SpeechBrain 1.0?

📅 In February 2024, we released SpeechBrain 1.0, the result of a year-long collaborative effort by a large international network of developers led by our exceptional core development team.

📊 Some Numbers:

  • SpeechBrain has evolved into a significant project and stands among the most widely used open-source toolkits for speech processing.
  • Over 140 developers have contributed to our repository, which has earned more than 7.3k stars on GitHub.
  • Monthly downloads from PyPI have reached an impressive 200k.
  • Expanded to over 200 recipes for Conversational AI, featuring more than 100 pretrained models on HuggingFace.

🌟 Key Updates:

  • SpeechBrain 1.0 introduces significant advancements, expanding support for diverse datasets and tasks, including NLP and EEG processing.

  • The toolkit now excels in Conversational AI and various sequence processing applications.

  • Improvements encompass key techniques in speech recognition, streamable conformer transducers, integration with K2 for Finite State Transducers, CTC decoding and n-gram rescoring, new CTC/joint attention Beam Search interface, enhanced compatibility with HuggingFace Models (including GPT2 and Llama2), and refined data augmentation, training, and inference processes.

  • We have created a new repository dedicated to benchmarks, accessible here. At present, it features benchmarks for various domains, including speech self-supervised models (MP3S), continual learning (CL-MASR), and EEG processing (SpeechBrain-MOABB).

For detailed technical information, please refer to the section below.

🔄 Breaking Changes

People familiar with SpeechBrain know that we do our best to avoid backward-incompatible changes. While we have consistently prioritized backward compatibility, this new major version presented an opportunity for significant enhancements and refactorings.

  1. 🤗 HuggingFace Interface Refactor:

    • Previously, our interfaces were limited to specific models like Whisper, HuBERT, WavLM, and wav2vec 2.0.
    • We've refactored the interface to be more general, now supporting any transformer model from HuggingFace including LLMs.
    • Simply inherit from our new interface and enjoy the flexibility.
    • The updated interfaces can be accessed here.
  2. 🔍 BeamSearch Refactor:

    • The previous beam search interface, while functional, was challenging to comprehend and modify because the search and rescoring parts were intertwined.
    • We've introduced a new interface where scoring and search are separated, managed by distinct functions, resulting in simpler and more readable code.
    • This update allows users to easily incorporate various scorers, including n-gram LM and custom heuristics, in the search part.
    • Additionally, support for pure CTC training and decoding, batch and GPU decoding, partial or full candidate scoring, and N-best hypothesis output with neural LM rescorers has been added.
    • An interface to K2 for search based on Finite State Transducers (FST) is now available.
    • The updated decoders are available here.
  3. 🎨 Data Augmentation Refactor:

    • The data augmentation capabilities have been enhanced, offering users access to various functions in speechbrain/augment.
    • New techniques, such as CodecAugment, RandomShift (Time), RandomShift (Frequency), DoClip, RandAmp, ChannelDrop, ChannelSwap, CutCat, and DropBitResolution, have been introduced.
    • Augmentation can now be customized and combined using the Augmenter interface in speechbrain/augment/augmenter.py, providing more control during training.
    • Take a look here for a tutorial on speech augmentation.
    • The updated augmenters are available here.
  4. 🧠 Brain Class Refactor:

    • The fit_batch method in the Brain Class has been refactored to minimize the need for overrides in training scripts.
    • Native support for different precisions (fp32, fp16, bf16), mixed precision, compilation, multiple optimizers, and improved multi-GPU training with torchrun is now available.
    • Take a look at the refactored brain class here.
  5. 🔍 Inference Interfaces Refactor:

    • Inference interfaces, once stored in a single file (speechbrain/pretrained/interfaces.py), are now organized into smaller libraries in speechbrain/inference, enhancing clarity and intuitiveness.
    • You can access the new inference interfaces here.
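To make the scorer/search separation described in point 2 concrete, here is a toy beam search in plain Python (an illustrative sketch, not the actual SpeechBrain interface) where scoring functions are plugged into the search loop independently:

```python
import math

def beam_search(step_logprobs, beam_size, scorers=()):
    """Toy beam search over a fixed table of per-step log-probabilities.

    `scorers` is a list of callables mapping a hypothesis (a tuple of
    token ids) to an additive log-score. They are applied independently
    of the search loop, mirroring the search/scoring split (this is a
    conceptual sketch, not the real API).
    """
    beams = [((), 0.0)]  # (hypothesis, accumulated log-score)
    for logprobs in step_logprobs:          # one row per decoding step
        candidates = []
        for hyp, score in beams:
            for tok, lp in enumerate(logprobs):
                new_hyp = hyp + (tok,)
                extra = sum(s(new_hyp) for s in scorers)
                candidates.append((new_hyp, score + lp + extra))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]      # prune to the beam width
    return beams

# A trivial custom "scorer" that penalizes immediate token repetitions.
def no_repeat_scorer(hyp):
    return -1.0 if len(hyp) >= 2 and hyp[-1] == hyp[-2] else 0.0

steps = [[math.log(0.6), math.log(0.4)],
         [math.log(0.5), math.log(0.5)]]
best = beam_search(steps, beam_size=2, scorers=[no_repeat_scorer])
```

In the same spirit, an n-gram LM or any custom heuristic would simply be another callable in `scorers`, leaving the search loop untouched.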

🔊 Automatic Speech Recognition

  • Developed a new recipe for training a Streamable Conformer Transducer on the LibriSpeech dataset (accessible here). The streamable model achieves a Word Error Rate (WER) of 2.72% on the test-clean subset.
  • Implemented a dedicated inference interface to support streamable ASR (accessible here).
  • New models, including HyperConformer and Branchformer, have been introduced. Examples of recipes utilizing them can be found here.
  • Additional support for datasets such as RescueSpeech, CommonVoice 14.0, AMI, and Tedlium 2.
  • The ASR search pipeline has undergone a complete refactoring and enhancement (see comment above).
  • A new recipe for Bayesian ASR has been added here.
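For reference, the WER figures quoted above follow the standard word-level edit-distance definition. A minimal illustrative implementation (not SpeechBrain's own evaluation utilities):

```python
def wer(reference, hypothesis):
    """Word Error Rate: (substitutions + insertions + deletions) divided
    by the number of reference words, via Levenshtein dynamic programming
    over word sequences."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution/match
    return dp[-1][-1] / len(ref)
```

For example, `wer("the cat sat", "the bat sat")` yields one substitution over three reference words, i.e. about 33%.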

🔄 Interface with Kaldi2 (K2-FSA)

  • Integration of an interface that seamlessly connects SpeechBrain with K2-FSA, allowing for constrained search and more.
  • Support for K2 CTC training and lexicon decoding, along with integration of K2 HLG and n-gram rescoring.
  • Competitive results achieved with Wav2vec2 on LibriSpeech test sets.
  • Explore an example recipe utilizing K2 here.
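Independently of the FST machinery, CTC decoding rests on the same collapse rule: merge repeated labels, then drop blanks. A minimal greedy decoder sketch (illustrative only, not the K2 interface):

```python
def ctc_greedy_decode(frame_ids, blank=0):
    """Collapse a per-frame best path into an output label sequence:
    merge consecutive repeats, then remove blank tokens."""
    out, prev = [], None
    for t in frame_ids:
        if t != prev and t != blank:
            out.append(t)
        prev = t
    return out
```

Lexicon or HLG decoding replaces this greedy best path with a search constrained by a finite state transducer, but the blank/repeat collapse rule is the same.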

🎙 Speech Synthesis (TTS)

🌐 Speech-to-Speech Translation:

  • Introduction of new recipes for CVSS datasets and IWSLT 2022 Low-resource Task, based on mBART/NLLB and SAMU wav2vec.

🌟 Speech Generation

  • Implementation of diffusion and latent diffusion techniques with an example recipe showcased on AudioMNIST.

🎧 Interpretability of Audio Signals

  • Implementation of Learning to Interpret and PIQ techniques with example recipes demonstrated on ECS50.

😊 Speech Emotion Diarization

  • Support for Speech Emotion Diarization, featuring an example recipe on the Zaion Emotion Dataset. See the training recipe here.

🎙️ Speaker Recognition

🔊 Speech Enhancement

  • Release of a new Speech Enhancement baseline based on the DNS dataset.

🎵 Discrete Audio Representations

  • Support for pretrained models with discrete audio representations, including EnCodec and DAC.
  • Support for discretization of continuous representations provided by popular self-supervised models such as Hubert and Wav2vec2.
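Conceptually, discretizing continuous self-supervised features means assigning each frame vector to its nearest codebook entry (k-means-style quantization). The sketch below is a toy illustration under that assumption, not the actual SpeechBrain code:

```python
def quantize(frames, codebook):
    """Map each frame (a list of floats) to the index of the nearest
    codebook vector under squared Euclidean distance."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda k: sqdist(f, codebook[k]))
            for f in frames]

# Hypothetical 2-entry codebook and three frame vectors.
codebook = [[0.0, 0.0], [1.0, 1.0]]
tokens = quantize([[0.1, -0.2], [0.9, 1.2], [0.4, 0.7]], codebook)
```

The resulting token ids can then be treated like text tokens by downstream sequence models, which is what makes discrete audio representations attractive.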

🤖 Interfaces with Large Language Models

  • Creation of interfaces with popular open-source Large Language Models, such as GPT2 and Llama2.
  • These models can be easily fine-tuned in SpeechBrain for tasks like Response Generation, exemplified with a recipe for the MultiWOZ dataset.
  • The Large Language Model can also be employed to rescore n-best ASR hypotheses.
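N-best rescoring boils down to interpolating the ASR score with an external LM score and re-ranking. A toy sketch, where the `lm_score` callable and the interpolation weight are illustrative assumptions rather than the real interface:

```python
def rescore_nbest(nbest, lm_score, lm_weight=0.5):
    """Re-rank (hypothesis, asr_logprob) pairs by adding a weighted
    external LM log-score, as in n-best rescoring with a language model."""
    rescored = [(hyp, asr + lm_weight * lm_score(hyp)) for hyp, asr in nbest]
    return sorted(rescored, key=lambda p: p[1], reverse=True)

# Hypothetical toy "LM" that prefers hypotheses containing "the".
toy_lm = lambda hyp: 0.0 if "the" in hyp.split() else -2.0

nbest = [("a cat sat", -1.0), ("the cat sat", -1.2)]
best_hyp = rescore_nbest(nbest, toy_lm)[0][0]
```

With a real LLM, `lm_score` would return the model's log-likelihood for the hypothesis text; the re-ranking logic stays the same.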

🔄 Continuous Integration

  • All recipes undergo automatic testing with one or multiple GPUs, ensuring robust performance.
  • HuggingFace interfaces are automatically verified, contributing to a seamless integration process.
  • Continuous improvement of integration and unit tests to comprehensively cover most functionalities within SpeechBrain.

🔍 Profiling

  • We have simplified the Profiler to enable easier identification of computing bottlenecks and quicker evaluation of model efficiency.
  • Now, you can profile your model during training effortlessly with:
python train.py hparams/config.yaml --profile_training --profile_warmup 10 --profile_steps 5
  • Check out our tutorial for more detailed information.
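Conceptually, this kind of step profiling amounts to skipping a warm-up phase and averaging the wall-clock time of a fixed number of measured steps. A minimal stand-in (plain Python, not the actual SpeechBrain profiler):

```python
import time

def profile_steps(step_fn, warmup=10, steps=5):
    """Run `step_fn` `warmup` times unmeasured (to exclude one-off costs
    such as compilation or cache warm-up), then time `steps` calls and
    return the mean seconds per step."""
    for _ in range(warmup):
        step_fn()
    start = time.perf_counter()
    for _ in range(steps):
        step_fn()
    return (time.perf_counter() - start) / steps

# Hypothetical workload standing in for a training step.
mean_s = profile_steps(lambda: sum(i * i for i in range(1000)),
                       warmup=2, steps=3)
```

The `--profile_warmup` and `--profile_steps` flags in the command above play the same roles as `warmup` and `steps` here.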

📈 Benchmarks

  • Release of a new benchmark repository, aimed at aiding the community in standardization across various areas.

    1. CL-MASR (Continual Learning Benchmark for Multilingual ASR):
    • A benchmark designed to assess continual learning techniques on multilingual speech recognition tasks.

    • Provides scripts to train multilingual ASR systems, specifically Whisper and WavLM-based, on a subset of 20 languages selected from Common Voice 13 in a continual learning fashion.

    • Implementation of various methods, including rehearsal-based, architecture-based, and regularization-based approaches.

    2. Multi-probe Speech Self Supervision Benchmark (MP3S):
    • A benchmark for accurate assessment of speech self-supervised models.
    • Noteworthy for allowing users to select multiple probing heads for downstream training.
    3. SpeechBrain-MOABB:
    • A benchmark offering recipes for processing electroencephalographic (EEG) signals, seamlessly integrated with the popular Mother of all BCI Benchmarks (MOABB).
    • Facilitates the integration and evaluation of new models on all supported tasks, presenting an interface for easy model integration and testing, along with a fair and robust method for comparing different architectures.

🔄 Transitioning to SpeechBrain 1.0

  • Please refer to this tutorial for in-depth technical information regarding the transition to SpeechBrain 1.0.

New Contributors

Full Changelog: https://github.com/speechbrain/speechbrain/compare/v0.5.16...v1.0.0

v0.5.16

5 months ago

SpeechBrain 0.5.16 will be the last minor version of SpeechBrain before the major release of SpeechBrain 1.0.

In this minor version, we have focused on refining the existing features without introducing any interface changes, ensuring a seamless transition to SpeechBrain 1.0 where backward incompatible modifications will take place.

Key Highlights of SpeechBrain 0.5.16:

Bug Fixes: Numerous small fixes have been implemented to enhance the overall stability and performance of SpeechBrain.

Testing and Documentation: We have dedicated efforts to improve our testing infrastructure and documentation, ensuring a more robust and user-friendly experience.

Expanded Model and Dataset Support: SpeechBrain 0.5.16 introduces support for several new models and datasets, enhancing the versatility of the platform. For a detailed list, please refer to the commits below.

Stay informed and get ready for the groundbreaking SpeechBrain 1.0, where we will unveil substantial changes and exciting new features.

Thank you for being a part of the SpeechBrain community!

Commits

  • [cea36b4]: Update README.md (Mirco Ravanelli) #1599
  • [cead130]: Updated README.md (prometheus) #975
  • [779c620]: Update README.md (Mirco Ravanelli) #2124
  • [32af2ac]: update requirement (to avoid deprecation error) (Mirco Ravanelli) #975
  • [b039df1]: small fixes (Mirco Ravanelli) #975
  • [07e7c73]: small fixes (Mirco Ravanelli) #975
  • [dac6842]: Update README.md (Mirco Ravanelli) #975
  • [75f4c66]: Update README.md (Mirco Ravanelli) #975
  • [327a3f5]: Fixed SSVEP yaml file (prometheus) #975
  • [067d94e]: Fixed conflicts (prometheus) #975
  • [331741d]: Fixed read/write conflicts mne config file when training many models in parallel (prometheus) #975
  • [0f25d5b]: Added hparam files for other architectures (prometheus) #975
  • [9ba76e3]: Updated LMDA, forcing odd kernel size in depth attention (prometheus) #975
  • [6336200]: Fixed activation in LMDA (prometheus) #975
  • [1593cc4]: Fixed issue in deepconvnet (prometheus) #975
  • [2f0f5f0]: Fixed issue with shallowconvnet (prometheus) #975
  • [8f70136]: Fixed issue with lmda (prometheus) #975
  • [ac4f9e4]: Merge remote-tracking branch 'origin/develop' into fixeval (Adel Moumen) #2123
  • [cdce80c]: fix ddp issue with loading a key (Adel Moumen) #2128
  • [66633a0]: Added template yaml files (prometheus) #975
  • [6f631a7]: minor additions for tests (pradnya-git-dev) #2120
  • [331acdb]: add notes on tests with non-default gpu (Mirco Ravanelli) #2130
  • [091b3ce]: fixed hard-coded device (Mirco Ravanelli) #2130
  • [cc72c9e]: fixed hard-coded device (Mirco Ravanelli) #2130
  • [c60e606]: fixed hard-coded device (Mirco Ravanelli) #2130
  • [253859e]: Resolve paths so relative works too (Aku Rouhe) #2128
  • [8a98401]: small fix on orion flag (Mirco Ravanelli) #975
  • [7da9a95]: extend fix to all files (Mirco Ravanelli) #975
  • [4b09ff2]: fix style (Mirco Ravanelli) #975
  • [ced2922]: Merge remote-tracking branch 'upstream/develop' into eeg_decoding (Mirco Ravanelli) #975
  • [5e070a2]: fix useless file (Mirco Ravanelli) #975
  • [46565cf]: Merge branch 'develop' of https://github.com/speechbrain/speechbrain into develop (xuechenliu) #2142
  • [19235f2]: Merge remote-tracking branch 'upstream/Adel-Moumen-revert_commit_ddp' into revert_commit_ddp (Adel Moumen) #2128
  • [2fb247f]: Save the checkpoint folder and meta only on the main process and communicate to all procs (Peter Plantinga) #2132
  • [f37d433]: Only broadcast checkpoint folder if distributed (Peter Plantinga) #2132
  • [e23da7d]: Initialize external loggers only on main process (Peter Plantinga) #2134
  • [67b1255]: fixes (BenoitWang) #2119
  • [70d8901]: Merge branch 'develop' into fs2_internal_alignment (Yingzhi WANG) #2119
  • [5565073]: Add file check on all recipe tests (#2126) (Mirco Ravanelli) #2126
  • [76923a4]: removeused varibles, add exception types (BenoitWang) #2119
  • [0a18729]: Merge branch 'fs2_internal_alignment' of https://github.com/BenoitWang/speechbrain into fs2_internal_alignment (BenoitWang) #2119
  • [d10f9c9]: add docstrings and examples (BenoitWang) #2119
  • [300aba7]: fix (BenoitWang) #2119
  • [32eea80]: Improve documentation of multi-process checkpointing (Peter Plantinga) #2132
  • [1f1a657]: Add unittest for parallel checkpointing (Peter Plantinga) #2132
  • [c742768]: Update tests/unittests/test_checkpoints.py (Peter Plantinga) #2132
  • [cc02ab9]: Update speechbrain/utils/checkpoints.py (Peter Plantinga) #2132
  • [1c91654]: Update speechbrain/utils/checkpoints.py (Peter Plantinga) #2132
  • [9325b56]: add unknown as pad token id (poonehmousavi) #2086
  • [e03397a]: add unk_token for pad (poonehmousavi) #2086
  • [ba4511c]: fix precommit issue (poonehmousavi) #2086
  • [296d14d]: Update python versions tested in CI (Peter Plantinga) #2138
  • [9781034]: Fix version 3.10, interpreted as 3.1 (Peter Plantinga) #2138
  • [6132693]: Merge branch 'speechbrain:develop' into GPT2-finetuning (Pooneh Mousavi) #2086
  • [5cc966c]: Update pytest version (Peter Plantinga) #2138
  • [c848ec9]: readme update (pradnya-git-dev) #2120
  • [5eb55e3]: Merge remote-tracking branch 'upstream/develop' into bugfix/checkpoint-folder-on-main (Mirco Ravanelli) #2132
  • [7b9327b]: parallel checkpoint test sync via file (Peter Plantinga) #2132
  • [23b5dbc]: Update tests/unittests/test_checkpoints.py (Peter Plantinga) #2132
  • [bcbe5da]: Remove destroy_process_group() which causes hang (Peter Plantinga) #2132
  • [3298a29]: Merge branch 'develop' into fixDDP (Mirco Ravanelli) #2130
  • [25fa18a]: fix EOS issue (poonehmousavi) #2086
  • [b9e3fa4]: minor fix (poonehmousavi) #2086
  • [be4a6f1]: Merge branch 'develop' of https://github.com/speechbrain/speechbrain into resnet_spkreg (xuechenliu) #2142
  • [164f8fe]: Added bash script to save yaml files, fixed issue with orion config file, added baselines, added EEGConformer, removed DeepConvNet and LMDA (prometheus) #975
  • [3d39ccd]: Fixed issue in yaml (prometheus) #975
  • [321c9f7]: Fixed issue in baseline yaml (prometheus) #975
  • [a35b964]: Commit on the speaker embedding extraction script (xuechenliu) #2142
  • [b78eacf]: minor cleaning on the hparams (xuechenliu) #2142
  • [6fd881e]: Removed baselines, fixes in code format of ShallowConvNet, changes in hparam space of ShallowConvNet and EEGConformer (prometheus) #975
  • [d3e9ae0]: EOS issue (poonehmousavi) #2086
  • [d16ea05]: fix (poonehmousavi) #2086
  • [296398d]: fix pad_id (poonehmousavi) #2086
  • [43b4e29]: final fix for generation (poonehmousavi) #2086
  • [f860f4e]: disable open end generation (poonehmousavi) #2086
  • [646ec65]: add interface and increase dropout (BenoitWang) #2119
  • [966b3d5]: fix interface (BenoitWang) #2119
  • [7371caa]: fix import (BenoitWang) #2119
  • [7a21a66]: Bump gitpython from 3.1.32 to 3.1.34 in /recipes/BinauralWSJ0Mix (dependabot[bot]) #2156
  • [907f79a]: Use torchrun instead of torch.distributed.launch (Peter Plantinga) #2158
  • [73b8365]: Fix ddp test by using os environ local_rank (Peter Plantinga) #2158
  • [5f63f6d]: Remove local_rank from run_opts (Peter Plantinga) #2158
  • [98bcd07]: Update resample_folder.py to run with torchaudio 2.0 (Martin Nordstrom) #2162
  • [5b4ca63]: Fix path to output_filename in create_mixtures_metadata.py (Martin Nordstrom) #2162
  • [f64f569]: major bug fix; enhanced signal now fed into whisper instead of clean signal; revised results (sangeet2020) #2163
  • [f223310]: Bump gitpython from 3.1.34 to 3.1.35 in /recipes/BinauralWSJ0Mix (dependabot[bot]) #2164
  • [987aa35]: Merge branch 'speechbrain:develop' into RescueSpeech (Sangeet Sagar) #2163
  • [89de3dd]: minor changes (sangeet2020) #2163
  • [f8654a9]: fix test yaml (Mirco Ravanelli) #2165
  • [5f87b03]: minor changes (sangeet2020) #2163
  • [9630882]: fix yaml inconsistencies (Mirco Ravanelli) #2165
  • [dd4abba]: fix trailing whitespace (Mirco Ravanelli) #2165
  • [2545b43]: readme update dropbox links (sangeet2020) #2163
  • [284e347]: update dropbox link in tests/recipes (sangeet2020) #2163
  • [fa25f82]: Merge branch 'speechbrain:develop' into RescueSpeech (Sangeet Sagar) #2163
  • [775eeb0]: update dropbox links (sangeet2020) #2163
  • [f7d273d]: Merge branch 'develop' of github.com:speechbrain/speechbrain into fix-reproduce-libriparty (Martin Nordstrom) #2162
  • [5c57237]: YouTube channel / online summit (Adel Moumen) #2166
  • [fc3d72d]: Merge branch 'develop' of https://github.com/speechbrain/speechbrain into resnet_spkreg (Xuechen Liu) #2142
  • [b82e798]: fix fetching and checkpointing due to failing recipe tests (Mirco Ravanelli) #2167
  • [972cc65]: let checkpoiting with the same name (Mirco Ravanelli) #2167
  • [bc8906c]: fix black (Mirco Ravanelli) #2167
  • [26da725]: commented parallel checkpointing test. It is currently failing (even on other PRs) only on the CI servers (Mirco Ravanelli) #2167
  • [40d091b]: sort execution of recipes tests (Mirco Ravanelli)
  • [c6ef85d]: sort recipe tests + minor fixes (Mirco Ravanelli)
  • [ef92a05]: Merge remote-tracking branch 'upstream/develop' into fix-reproduce-libriparty (Mirco Ravanelli) #2162
  • [1a9f06a]: update dropbox & hf links (BenoitWang) #2119
  • [8d89a40]: resolve conflict (BenoitWang) #2119
  • [10d85e3]: minor edits for clarify improvements (Mirco Ravanelli) #2162
  • [525b74a]: Merge remote-tracking branch 'upstream/develop' into use-torchrun (Mirco Ravanelli) #2158
  • [bc81789]: Merge remote-tracking branch 'upstream/develop' into resnet_spkreg (Mirco Ravanelli) #2142
  • [9856912]: Merge branch 'develop' into fixDDP (Mirco Ravanelli) #2130
  • [825e114]: fix numpy 1.24 issue (BenoitWang) #2119
  • [56abcb1]: update readme (BenoitWang) #2119
  • [33c4d5b]: Merge remote-tracking branch 'upstream/develop' into fs2_internal_alignment (Mirco Ravanelli) #2119
  • [23e3ceb]: update to latest dev + minor modifications (Mirco Ravanelli) #2119
  • [b5be99f]: fix comments and add docstring (Xuechen Liu) #2142
  • [901b5e3]: update to latest dev + small fixes (Mirco Ravanelli) #2120
  • [8c6db1d]: Merge branch 'develop' into MSTTS (Mirco Ravanelli) #2120
  • [0b09dd6]: fix yaml + fix recipe test on voxceleb (Mirco Ravanelli) #2120
  • [ae6da04]: Merge branch 'MSTTS' of https://github.com/pradnya-git-dev/speechbrain into MSTTS (Mirco Ravanelli) #2120
  • [3ea3a1f]: add missing link (Mirco Ravanelli) #2120
  • [e88b65b]: Merge branch 'develop' of https://github.com/speechbrain/speechbrain into resnet_spkreg (Xuechen Liu) #2142
  • [3280d03]: fix recipe test, add docstring examples (BenoitWang) #2119
  • [0c38b08]: fix examples (BenoitWang) #2119
  • [eb7b839]: Merge branch 'speechbrain:develop' into MSTTS (pradnya-git-dev) #2120
  • [de139b2]: code optimization (pradnya-git-dev) #2120
  • [9364199]: code optimization - loss restore (pradnya-git-dev) #2120
  • [4fd2380]: minor documentation change (pradnya-git-dev) #2120
  • [d3be8d3]: minor documentation fix for tests (pradnya-git-dev) #2120
  • [a508c40]: updating loss example (pradnya-git-dev) #2120
  • [ff0c768]: updating hparams (pradnya-git-dev) #2120
  • [b17e13c]: removing script redundancy (pradnya-git-dev) #2120
  • [e813476]: minor changes for tests (pradnya-git-dev) #2120
  • [f6957ae]: updating recipe entry (pradnya-git-dev) #2120
  • [0c42325]: minor changes for tests (pradnya-git-dev) #2120
  • [ce07c3a]: changes for inference (pradnya-git-dev) #2120
  • [22a7743]: internal sorting for input texts (pradnya-git-dev) #2120
  • [fbb074c]: improve bug_report.yaml (Adel Moumen) #2172
  • [45a65a5]: fix title (Adel Moumen) #2172
  • [8790c07]: Update pull_request_template.md (Adel Moumen) #2172
  • [2dae0cb]: linters (Adel Moumen) #2172
  • [554ca2e]: Update README.md (#2171) (Adel Moumen) #2171
  • [fccb581]: Remove distributed_launch flag and update docs (Peter Plantinga) #2158
  • [9e1b588]: Fix check for rank and local rank (Peter Plantinga) #2158
  • [2d8e6f8]: small improvement in the doc + manage PLACEHOLDER and output folder (Mirco Ravanelli) #2142
  • [2cdc63f]: fix hard-coded devices (#2178) (Mirco Ravanelli) #2178
  • [3457755]: Fix multi-head attention when return_attn_weights=False (Luca Della Libera) #2183
  • [3a16166]: Update multi-head attention docstring (Luca Della Libera) #2183
  • [221f2da]: Updated yaml files after hparam tuning (prometheus) #975
  • [208bccb]: Updated EEGConformer (prometheus) #975
  • [dcc29c7]: Updated README.md (prometheus) #975
  • [5b791fe]: Updated README.md (prometheus) #975
  • [9861876]: Merge remote-tracking branch 'upstream/develop' into eeg_decoding (Mirco Ravanelli) #975
  • [d0296f5]: fix linters (Mirco Ravanelli) #975
  • [e412656]: improve README (Mirco Ravanelli) #975
  • [e8be915]: remove unnecesary folder (Mirco Ravanelli) #975
  • [e431763]: remove files that will be added into speechbrain benchmark (Mirco Ravanelli) #975
  • [63b2f99]: Merge branch 'speechbrain:develop' into GPT2-finetuning (Pooneh Mousavi) #2086
  • [dce8021]: Merge branch 'speechbrain:develop' into GPT2-finetuning (Pooneh Mousavi) #2086
  • [b90034c]: add response-generator interface (poonehmousavi) #2086
  • [34aafe0]: fix pytest (poonehmousavi) #2086
  • [eeda2c0]: fix pytest (poonehmousavi) #2086
  • [9284f24]: fix docstring (poonehmousavi) #2086
  • [b5adc8f]: updating hparams with the current best (pradnya-git-dev) #2120
  • [4f02ca8]: fix hyaml bug (poonehmousavi) #2086
  • [54ab2f8]: minor fix (poonehmousavi) #2086
  • [ed6f08d]: fix interface logging issue (poonehmousavi) #2086
  • [801162e]: fix precommit issue (poonehmousavi) #2086
  • [7cfd162]: HyperConformer (#1905) (Florian Mai) #1905
  • [2983f8a]: clean commnets (poonehmousavi) #2086
  • [697c708]: fix readme (poonehmousavi) #2086
  • [cf48a46]: change interface to be compatibale with pytest (poonehmousavi) #2086
  • [629b99e]: Update README.md (Adel Moumen) #2189
  • [ceb7838]: fix typo that preveted recipe tests to run (Mirco Ravanelli) #2086
  • [fd3b8a8]: automatic download + fix replacement path (Mirco Ravanelli) #2086
  • [9634e9d]: remove transformers from extra-req as already in the main requirements (Mirco Ravanelli) #2086
  • [3d37983]: fix linter (Mirco Ravanelli) #2086
  • [cd41db3]: DNS recipe (#1742) (Sangeet Sagar) #1742
  • [e229e1a]: Attempting to fix failing test (with pytorch 2.1) (#2193) (Mirco Ravanelli) #2193
  • [4ab5219]: Broadcast the decision to checkpoint to all processes (#2192) (Peter Plantinga) #2192
  • [f4e8dd5]: update huggingface_hub requirement to avoid TypeDict error (tuanct1997) #2195
  • [264a0bc]: Avoid sync if mid-epoch checkpoints are disabled (Peter Plantinga) #2200
  • [918d8ef]: new pitch (Mirco Ravanelli) #2201
  • [92f541e]: Bump gitpython from 3.1.35 to 3.1.37 in /recipes/BinauralWSJ0Mix (dependabot[bot]) #2203
  • [5eec78b]: fix open rir (Mirco Ravanelli) #2205
  • [86670b4]: small follow up fix on openrir (Mirco Ravanelli)
  • [fab9657]: update dropbox (Mirco Ravanelli) #2201
  • [67d0de9]: Update README.md (Mirco Ravanelli) #2201
  • [19cbb87]: Update LJSpeech.csv (Mirco Ravanelli) #2201
  • [6271dd0]: remove related doc with distributed_launch (Adel Moumen) #2207
  • [219476c]: pre-commit (Adel Moumen) #2207
  • [21d619c]: adding random speaker voice generation (pradnya-git-dev) #2120
  • [62ac16e]: Merge branch 'develop' into MSTTS (pradnya-git-dev) #2120
  • [b84fa8d]: minor changes for flake8 (pradnya-git-dev) #2120
  • [f69f280]: updates for doctests (pradnya-git-dev) #2120
  • [5897742]: fix one issue wit recipe tests (Mirco Ravanelli) #2120
  • [e0d5a1b]: last fix pitch fastspeec2 (Mirco Ravanelli)
  • [55b442d]: readme update (pradnya-git-dev) #2120
  • [79cff28]: minor update for tests (pradnya-git-dev) #2120
  • [a78a571]: update documentation to clarify when to use --jit (Mirco Ravanelli) #2215
  • [ba492f9]: small fix in recipe tests (Mirco Ravanelli) #2120
  • [ec359cb]: add dropbox link (Mirco Ravanelli) #2120
  • [40bbe0f]: add performance notice (Mirco Ravanelli) #2120
  • [fc892ac]: last change (Mirco Ravanelli) #2120
  • [3c840ed]: reverting an error added by HyperConformer (code from Samsung AI Cambridge) (#2217) (Parcollet Titouan) #2217
  • [121f55b]: fix recipe tests tool (#2218) (Adel Moumen) #2218
  • [81138e8]: ASR recipe for Tedlium2 (code from Samsung AI Cambridge) (#2191) (Parcollet Titouan) #2191
  • [7f62dd8]: Add speech-to-speech translation (#2044) (Jarod) #2044
  • [e09cdac]: Refactor aishell data prep (#2219) (Adel Moumen) #2219
  • [bd27e99]: Create .gitignore (#2222) (Adel Moumen) #2222
  • [ab3c962]: fix incorrect parameter in LibriTTS hifigan vocoder (Chaanks) #2244
  • [2f27f7e]: fix failing recipe test (tiny fix) (Mirco Ravanelli)
  • [94862c8]: Update version.txt (#2256) (Mirco Ravanelli) #2256
  • [0ac4dc3]: Merge branch 'develop' (Mirco Ravanelli) #2257
  • [a581cae]: New version (#2257) (Mirco Ravanelli) #2257
  • [65c0113]: Merge branch 'develop' (Mirco Ravanelli)

v0.5.15

9 months ago

v0.5.14

1 year ago

This release is a minor yet important one. It significantly increases the number of available features while fixing many small bugs and issues. A summary of the achievements of this release is given below, while a complete, detailed list of all the changes can be found at the bottom of this release note.

Notable achievements

  • 22 new contributors, thank you so much, everyone!
  • 31 new recipes (ASR, SLU, AST, AER, Interpretability, SSL).
  • FULL automatic recipe testing.
  • Increased coverage for the continuous integration over the code, URLs, YAML, recipes, and HuggingFace models.
  • New Conformer Large model for ASR.
  • Integration of Whisper for fine-tuning or inference.
  • Full pre-training of wav2vec2 entirely re-implemented AND documented.
  • Low resource Speech Translation with IWSLT.
  • Many other novelties... see below.

What's Changed

New Contributors

Full Changelog: https://github.com/speechbrain/speechbrain/compare/v0.5.13...v0.5.14

v0.5.13

1 year ago

This is a minor release with better dependency version specification. We note that SpeechBrain is compatible with PyTorch 1.12, and the updated package reflects this. See the issue linked next to each commit for more details about the corresponding changes.

Commit summary

  • [edb7714]: Adding no_sync and on_fit_batch_end method to core (Rudolf Arseni Braun) #1449
  • [07155e9]: G2P fixes (flexthink) #1473
  • [6602dab]: fix for #1469, minimal testing for profiling (anautsch) #1476
  • [abbfab9]: test clean-ups: passes linters; doctests; unit & integration tests; load-yaml on cpu (anautsch) #1487
  • [1a16b41]: fix ddp incorrect command (=) #1498
  • [0b0ec9d]: using no_sync() in fit_batch() of core.py (Rudolf Arseni Braun) #1449
  • [5c9b833]: Remove torch maximum compatible version (Peter Plantinga) #1504
  • [d0f4352]: remove limit for HF hub as it does not work with colab (Titouan) #1508
  • [b78f6f8]: Add revision to hub (Titouan) #1510
  • [2c491a4]: fix transducer loss inputs devices (Adel Moumen) #1511
  • [4972f76]: missing space in install command (pehonnet) #1512
  • [6bc72af]: Fixing shuffle argument for distributed sampler in core.py (Rudolf Arseni Braun) #1518
  • [df7acd9]: Added the link for example results (cem) #1523
  • [5bae6df]: add LinearWarmupScheduler (Ge Li) #1537
  • [2edd7ee]: updating scipy version in requirements.txt. (Nauman Dawalatabad) #1546

v0.5.12

1 year ago

Release Notes - SpeechBrain v0.5.12

We worked very hard and we are very happy to announce the new version of SpeechBrain!

SpeechBrain 0.5.12 significantly expands the toolkit without introducing any major interface changes. I would like to warmly thank the many contributors who made this possible.

The main changes are the following:

A) Text-to-Speech: We developed the first TTS system of SpeechBrain. You can find it here. The system relies on Tacotron2 + HiFiGAN (as vocoder). The models coupled with an easy-inference interface are available on HuggingFace.

B) Grapheme-to-Phoneme (G2P): We developed an advanced grapheme-to-phoneme system. You can find the code here. The current version significantly outperforms our previous model.

C) Speech Separation:

  1. We developed a novel version of the SepFormer called Resource-Efficient SepFormer (RE-Sepformer). The code is available here and the pre-trained model (with an easy inference interface) here.
  2. We released a recipe for Binaural speech separation with WSJMix. See the code here.
  3. We released a new recipe with the AIShell mix dataset. You can see the code here.

D) Speech Enhancement:

  1. We released the SepFormer model for speech enhancement. The code is here, while the pre-trained model (with an easy-inference interface) is here.
  2. We implemented the WideResNet for speech enhancement and used it for mimic-loss-based speech enhancement. The code is here and the pretrained model (with an easy-inference interface) is here.

E) Feature Front-ends:

  1. We now support LEAF filter banks. The code is here. You can find an example of a recipe using it here.
  2. We now support SincConv multichannel (see code here).

F) Recipe Refactors:

  1. We refactored the VoxCeleb recipe and fixed the normalization issues. See the new code here. We also made the EER computation method less memory-demanding (see here).
  2. We refactored the IEMOCAP recipe for emotion recognition. See the new code here.

G) Models for African Languages: We now have recipes for the DVoice dataset. We currently support Darija, Swahili, Wolof, Fongbe, and Amharic. The code is available here. The pretrained model (coupled with an easy-inference interface) can be found on SpeechBrain-HuggingFace.

H) Profiler: We implemented a model profiler that helps users while developing new models with SpeechBrain. The profiler outputs potentially useful information, such as the real-time factor and many other details. A tutorial is available here.

I) Tests: We significantly improved the tests. In particular, we introduced the following: HF-repo tests, docstring checks, yaml-script consistency checks, recipe tests, and URL checks. This will help us scale up the project.

L) Other improvements:

  1. We now support the torchaudio RNNT loss.
  2. We improved the relative attention mechanism of the Conformer.
  3. We updated the transformer for LibriSpeech. This improves the performance from WER= 2.46% to 2.26% on the test-clean. See the code here.
  4. The Environmental corruption module can now support different sampling rates.
  5. Minor fixes.

v0.5.11

2 years ago

Dear users, we worked very hard, and we are very happy to announce the new version of SpeechBrain. SpeechBrain 0.5.11 further expands the toolkit without introducing any major interface changes.

The main changes are the following:

  1. We implemented several new recipes.
  2. Support for dynamic batching, with a tutorial to help users familiarize themselves with it.

  3. Support for wav2vec training within SpeechBrain.

  4. An interface with Orion for hyperparameter tuning, with a tutorial to help users familiarize themselves with it.

  5. The torchaudio transducer loss is now supported. We also kept our Numba implementation to help users customize the transducer loss part if needed.

  6. Improved CTC segmentation.

  7. Fixed minor bugs and issues (e.g., fixed the MVDR beamformer).

Let me thank all the amazing contributors for this achievement. Please keep adding stars to our project if you appreciate our effort for the community. Together, we are growing very fast, and we have big plans for the future.

Stay Tuned!

0.5.10

2 years ago

This version mainly expands the functionalities of SpeechBrain without adding any backward incompatibilities.

New Recipes:

  • Language Identification with CommonLanguage
  • EEG signal processing with ERPCore
  • Speech translation with Fisher-Call Home
  • Emotion Recognition with IEMOCAP
  • Voice Activity Detection with LibriParty
  • ASR with LibriSpeech wav2vec (WER=1.9 on test-clean)
  • SpeechEnhancement with CoopNet
  • SpeechEnhancement with SEGAN
  • Speech Separation with LibriMix, WHAM, and WHAMR
  • Support for guided attention
  • Spoken Language Understanding with SLURP

Beyond that, we fixed some minor bugs and issues.

v0.5.9

2 years ago

The main differences from the previous version are the following:

  • Added Wham/whamr/librimix for speech separation
  • Compatibility with PyTorch 1.9
  • Fixed minor bugs
  • Added SpeechBrain paper

v0.5.8

2 years ago

SpeechBrain 0.5.8 improves the previous version in the following way:

  • Added wav2vec support in TIMIT, CommonVoice, AISHELL-1
  • Improved Fluent Speech Command Recipe
  • Improved SLU recipes
  • Recipe for UrbanSound8k
  • Fixed small bugs
  • Fixed typos