SpeechBrain Versions

A PyTorch-based Speech Toolkit

v1.0.0

2 months ago

Please help our community project. Star us on GitHub!

🚀 What's New in SpeechBrain 1.0?

📅 In February 2024, we released SpeechBrain 1.0, the result of a year-long collaborative effort by a large international network of developers led by our exceptional core development team.

📊 Some Numbers:

  • SpeechBrain has evolved into a significant project and stands among the most widely used open-source toolkits for speech processing.
  • Over 140 developers have contributed to our repository, which has earned more than 7.3k stars on GitHub.
  • Monthly downloads from PyPI have reached an impressive 200k.
  • Expanded to over 200 recipes for Conversational AI, featuring more than 100 pretrained models on HuggingFace.

🌟 Key Updates:

  • SpeechBrain 1.0 introduces significant advancements, expanding support for diverse datasets and tasks, including NLP and EEG processing.

  • The toolkit now excels in Conversational AI and various sequence processing applications.

  • Improvements encompass key techniques in speech recognition, streamable conformer transducers, integration with K2 for Finite State Transducers, CTC decoding and n-gram rescoring, new CTC/joint attention Beam Search interface, enhanced compatibility with HuggingFace Models (including GPT2 and Llama2), and refined data augmentation, training, and inference processes.

  • We have created a new repository dedicated to benchmarks, accessible here. At present, it features benchmarks for various domains, including speech self-supervised models (MP3S), continual learning (CL-MASR), and EEG processing (SpeechBrain-MOABB).

For detailed technical information, please refer to the section below.

🔄 Breaking Changes

People familiar with SpeechBrain know that we do our best to avoid backward-incompatible changes. While we have consistently prioritized backward compatibility, this new major version presented an opportunity for significant enhancements and refactorings.

  1. 🤗 HuggingFace Interface Refactor:

    • Previously, our interfaces were limited to specific models like Whisper, HuBERT, WavLM, and wav2vec 2.0.
    • We've refactored the interface to be more general, now supporting any transformer model from HuggingFace including LLMs.
    • Simply inherit from our new interface and enjoy the flexibility.
    • The updated interfaces can be accessed here.
  2. 🔍 BeamSearch Refactor:

    • The previous beam search interface, while functional, was challenging to comprehend and modify because the search and rescoring parts were intertwined.
    • We've introduced a new interface where scoring and search are separated, managed by distinct functions, resulting in simpler and more readable code.
    • This update allows users to easily incorporate various scorers, including n-gram LM and custom heuristics, in the search part.
    • Additionally, support for pure CTC training and decoding, batch and GPU decoding, partial or full candidate scoring, and N-best hypothesis output with neural LM rescorers has been added.
    • An interface to K2 for search based on Finite State Transducers (FST) is now available.
    • The updated decoders are available here.
  3. 🎨 Data Augmentation Refactor:

    • The data augmentation capabilities have been enhanced, offering users access to various functions in speechbrain/augment.
    • New techniques, such as CodecAugment, RandomShift (Time), RandomShift (Frequency), DoClip, RandAmp, ChannelDrop, ChannelSwap, CutCat, and DropBitResolution, have been introduced.
    • Augmentation can now be customized and combined using the Augmenter interface in speechbrain/augment/augmenter.py, providing more control during training.
    • Take a look here for a tutorial on speech augmentation.
    • The updated augmenters are available here.
  4. 🧠 Brain Class Refactor:

    • The fit_batch method in the Brain Class has been refactored to minimize the need for overrides in training scripts.
    • Native support for different precisions (fp32, fp16, bf16), mixed precision, compilation, multiple optimizers, and improved multi-GPU training with torchrun is now available.
    • Take a look at the refactored brain class here.
  5. 🔍 Inference Interfaces Refactor:

    • Inference interfaces, once stored in a single file (speechbrain/pretrained/interfaces.py), are now organized into smaller libraries in speechbrain/inference, enhancing clarity and intuitiveness.
    • You can access the new inference interfaces here.
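To make the scorer/search separation described in point 2 concrete, here is a toy beam search in plain Python (an illustrative sketch, not the actual SpeechBrain interface) where scoring functions are plugged into the search loop independently:

```python
import math

def beam_search(step_logprobs, beam_size, scorers=()):
    """Toy beam search over a fixed table of per-step log-probabilities.

    `scorers` is a list of callables mapping a hypothesis (a tuple of
    token ids) to an additive log-score. They are applied independently
    of the search loop, mirroring the search/scoring split (this is a
    conceptual sketch, not the real API).
    """
    beams = [((), 0.0)]  # (hypothesis, accumulated log-score)
    for logprobs in step_logprobs:          # one row per decoding step
        candidates = []
        for hyp, score in beams:
            for tok, lp in enumerate(logprobs):
                new_hyp = hyp + (tok,)
                extra = sum(s(new_hyp) for s in scorers)
                candidates.append((new_hyp, score + lp + extra))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]      # prune to the beam width
    return beams

# A trivial custom "scorer" that penalizes immediate token repetitions.
def no_repeat_scorer(hyp):
    return -1.0 if len(hyp) >= 2 and hyp[-1] == hyp[-2] else 0.0

steps = [[math.log(0.6), math.log(0.4)],
         [math.log(0.5), math.log(0.5)]]
best = beam_search(steps, beam_size=2, scorers=[no_repeat_scorer])
```

In the same spirit, an n-gram LM or any custom heuristic would simply be another callable in `scorers`, leaving the search loop untouched.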

🔊 Automatic Speech Recognition

  • Developed a new recipe for training a Streamable Conformer Transducer on the LibriSpeech dataset (accessible here). The streamable model achieves a Word Error Rate (WER) of 2.72% on the test-clean subset.
  • Implemented a dedicated inference interface to support streamable ASR (accessible here).
  • New models, including HyperConformer and Branchformer, have been introduced. Examples of recipes utilizing them can be found here.
  • Additional support for datasets such as RescueSpeech, CommonVoice 14.0, AMI, and Tedlium 2.
  • The ASR search pipeline has undergone a complete refactoring and enhancement (see comment above).
  • A new recipe for Bayesian ASR has been added here.
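For reference, the WER figures quoted above follow the standard word-level edit-distance definition. A minimal illustrative implementation (not SpeechBrain's own evaluation utilities):

```python
def wer(reference, hypothesis):
    """Word Error Rate: (substitutions + insertions + deletions) divided
    by the number of reference words, via Levenshtein dynamic programming
    over word sequences."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution/match
    return dp[-1][-1] / len(ref)
```

For example, `wer("the cat sat", "the bat sat")` yields one substitution over three reference words, i.e. about 33%.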

🔄 Interface with Kaldi2 (K2-FSA)

  • Integration of an interface that seamlessly connects SpeechBrain with K2-FSA, allowing for constrained search and more.
  • Support for K2 CTC training and lexicon decoding, along with integration of K2 HLG and n-gram rescoring.
  • Competitive results achieved with Wav2vec2 on LibriSpeech test sets.
  • Explore an example recipe utilizing K2 here.
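Independently of the FST machinery, CTC decoding rests on the same collapse rule: merge repeated labels, then drop blanks. A minimal greedy decoder sketch (illustrative only, not the K2 interface):

```python
def ctc_greedy_decode(frame_ids, blank=0):
    """Collapse a per-frame best path into an output label sequence:
    merge consecutive repeats, then remove blank tokens."""
    out, prev = [], None
    for t in frame_ids:
        if t != prev and t != blank:
            out.append(t)
        prev = t
    return out
```

Lexicon or HLG decoding replaces this greedy best path with a search constrained by a finite state transducer, but the blank/repeat collapse rule is the same.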

🎙 Speech Synthesis (TTS)

🌐 Speech-to-Speech Translation:

  • Introduction of new recipes for CVSS datasets and IWSLT 2022 Low-resource Task, based on mBART/NLLB and SAMU wav2vec.

🌟 Speech Generation

  • Implementation of diffusion and latent diffusion techniques with an example recipe showcased on AudioMNIST.

🎧 Interpretability of Audio Signals

  • Implementation of Learning to Interpret and PIQ techniques with example recipes demonstrated on ECS50.

😊 Speech Emotion Diarization

  • Support for Speech Emotion Diarization, featuring an example recipe on the Zaion Emotion Dataset. See the training recipe here.

🎙️ Speaker Recognition

🔊 Speech Enhancement

  • Release of a new Speech Enhancement baseline based on the DNS dataset.

🎵 Discrete Audio Representations

  • Support for pretrained models with discrete audio representations, including EnCodec and DAC.
  • Support for discretization of continuous representations provided by popular self-supervised models such as Hubert and Wav2vec2.
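Conceptually, discretizing continuous self-supervised features means assigning each frame vector to its nearest codebook entry (k-means-style quantization). The sketch below is a toy illustration under that assumption, not the actual SpeechBrain code:

```python
def quantize(frames, codebook):
    """Map each frame (a list of floats) to the index of the nearest
    codebook vector under squared Euclidean distance."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda k: sqdist(f, codebook[k]))
            for f in frames]

# Hypothetical 2-entry codebook and three frame vectors.
codebook = [[0.0, 0.0], [1.0, 1.0]]
tokens = quantize([[0.1, -0.2], [0.9, 1.2], [0.4, 0.7]], codebook)
```

The resulting token ids can then be treated like text tokens by downstream sequence models, which is what makes discrete audio representations attractive.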

🤖 Interfaces with Large Language Models

  • Creation of interfaces with popular open-source Large Language Models, such as GPT2 and Llama2.
  • These models can be easily fine-tuned in SpeechBrain for tasks like Response Generation, exemplified with a recipe for the MultiWOZ dataset.
  • The Large Language Model can also be employed to rescore n-best ASR hypotheses.
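N-best rescoring boils down to interpolating the ASR score with an external LM score and re-ranking. A toy sketch, where the `lm_score` callable and the interpolation weight are illustrative assumptions rather than the real interface:

```python
def rescore_nbest(nbest, lm_score, lm_weight=0.5):
    """Re-rank (hypothesis, asr_logprob) pairs by adding a weighted
    external LM log-score, as in n-best rescoring with a language model."""
    rescored = [(hyp, asr + lm_weight * lm_score(hyp)) for hyp, asr in nbest]
    return sorted(rescored, key=lambda p: p[1], reverse=True)

# Hypothetical toy "LM" that prefers hypotheses containing "the".
toy_lm = lambda hyp: 0.0 if "the" in hyp.split() else -2.0

nbest = [("a cat sat", -1.0), ("the cat sat", -1.2)]
best_hyp = rescore_nbest(nbest, toy_lm)[0][0]
```

With a real LLM, `lm_score` would return the model's log-likelihood for the hypothesis text; the re-ranking logic stays the same.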

🔄 Continuous Integration

  • All recipes undergo automatic testing with one or multiple GPUs, ensuring robust performance.
  • HuggingFace interfaces are automatically verified, contributing to a seamless integration process.
  • Continuous improvement of integration and unit tests to comprehensively cover most functionalities within SpeechBrain.

🔍 Profiling

  • We have simplified the Profiler to enable easier identification of computing bottlenecks and quicker evaluation of model efficiency.
  • Now, you can profile your model during training effortlessly with:
python train.py hparams/config.yaml --profile_training --profile_warmup 10 --profile_steps 5
  • Check out our tutorial for more detailed information.
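Conceptually, this kind of step profiling amounts to skipping a warm-up phase and averaging the wall-clock time of a fixed number of measured steps. A minimal stand-in (plain Python, not the actual SpeechBrain profiler):

```python
import time

def profile_steps(step_fn, warmup=10, steps=5):
    """Run `step_fn` `warmup` times unmeasured (to exclude one-off costs
    such as compilation or cache warm-up), then time `steps` calls and
    return the mean seconds per step."""
    for _ in range(warmup):
        step_fn()
    start = time.perf_counter()
    for _ in range(steps):
        step_fn()
    return (time.perf_counter() - start) / steps

# Hypothetical workload standing in for a training step.
mean_s = profile_steps(lambda: sum(i * i for i in range(1000)),
                       warmup=2, steps=3)
```

The `--profile_warmup` and `--profile_steps` flags in the command above play the same roles as `warmup` and `steps` here.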

📈 Benchmarks

  • Release of a new benchmark repository, aimed at aiding the community in standardization across various areas.

    1. CL-MASR (Continual Learning Benchmark for Multilingual ASR):
    • A benchmark designed to assess continual learning techniques on multilingual speech recognition tasks.

    • Provides scripts to train multilingual ASR systems, specifically Whisper and WavLM-based, on a subset of 20 languages selected from Common Voice 13 in a continual learning fashion.

    • Implementation of various methods, including rehearsal-based, architecture-based, and regularization-based approaches.

    2. Multi-probe Speech Self Supervision Benchmark (MP3S):
    • A benchmark for accurate assessment of speech self-supervised models.
    • Noteworthy for allowing users to select multiple probing heads for downstream training.
    3. SpeechBrain-MOABB:
    • A benchmark offering recipes for processing electroencephalographic (EEG) signals, seamlessly integrated with the popular Mother of all BCI Benchmarks (MOABB).
    • Facilitates the integration and evaluation of new models on all supported tasks, presenting an interface for easy model integration and testing, along with a fair and robust method for comparing different architectures.

🔄 Transitioning to SpeechBrain 1.0

  • Please refer to this tutorial for in-depth technical information regarding the transition to SpeechBrain 1.0.

New Contributors

Full Changelog: https://github.com/speechbrain/speechbrain/compare/v0.5.16...v1.0.0

v0.5.16

5 months ago

SpeechBrain 0.5.16 will be the last minor version of SpeechBrain before the major release of SpeechBrain 1.0.

In this minor version, we have focused on refining the existing features without introducing any interface changes, ensuring a seamless transition to SpeechBrain 1.0 where backward incompatible modifications will take place.

Key Highlights of SpeechBrain 0.5.16:

Bug Fixes: Numerous small fixes have been implemented to enhance the overall stability and performance of SpeechBrain.

Testing and Documentation: We have dedicated efforts to improve our testing infrastructure and documentation, ensuring a more robust and user-friendly experience.

Expanded Model and Dataset Support: SpeechBrain 0.5.16 introduces support for several new models and datasets, enhancing the versatility of the platform. For a detailed list, please refer to the commits below.

Stay informed and get ready for the groundbreaking SpeechBrain 1.0, where we will unveil substantial changes and exciting new features.

Thank you for being a part of the SpeechBrain community!

Commits

  • [cea36b4]: Update README.md (Mirco Ravanelli) #1599
  • [cead130]: Updated README.md (prometheus) #975
  • [779c620]: Update README.md (Mirco Ravanelli) #2124
  • [32af2ac]: update requirement (to avoid deprecation error) (Mirco Ravanelli) #975
  • [b039df1]: small fixes (Mirco Ravanelli) #975
  • [07e7c73]: small fixes (Mirco Ravanelli) #975
  • [dac6842]: Update README.md (Mirco Ravanelli) #975
  • [75f4c66]: Update README.md (Mirco Ravanelli) #975
  • [327a3f5]: Fixed SSVEP yaml file (prometheus) #975
  • [067d94e]: Fixed conflicts (prometheus) #975
  • [331741d]: Fixed read/write conflicts mne config file when training many models in parallel (prometheus) #975
  • [0f25d5b]: Added hparam files for other architectures (prometheus) #975
  • [9ba76e3]: Updated LMDA, forcing odd kernel size in depth attention (prometheus) #975
  • [6336200]: Fixed activation in LMDA (prometheus) #975
  • [1593cc4]: Fixed issue in deepconvnet (prometheus) #975
  • [2f0f5f0]: Fixed issue with shallowconvnet (prometheus) #975
  • [8f70136]: Fixed issue with lmda (prometheus) #975
  • [ac4f9e4]: Merge remote-tracking branch 'origin/develop' into fixeval (Adel Moumen) #2123
  • [cdce80c]: fix ddp issue with loading a key (Adel Moumen) #2128
  • [66633a0]: Added template yaml files (prometheus) #975
  • [6f631a7]: minor additions for tests (pradnya-git-dev) #2120
  • [331acdb]: add notes on tests with non-default gpu (Mirco Ravanelli) #2130
  • [091b3ce]: fixed hard-coded device (Mirco Ravanelli) #2130
  • [cc72c9e]: fixed hard-coded device (Mirco Ravanelli) #2130
  • [c60e606]: fixed hard-coded device (Mirco Ravanelli) #2130
  • [253859e]: Resolve paths so relative works too (Aku Rouhe) #2128
  • [8a98401]: small fix on orion flag (Mirco Ravanelli) #975
  • [7da9a95]: extend fix to all files (Mirco Ravanelli) #975
  • [4b09ff2]: fix style (Mirco Ravanelli) #975
  • [ced2922]: Merge remote-tracking branch 'upstream/develop' into eeg_decoding (Mirco Ravanelli) #975
  • [5e070a2]: fix useless file (Mirco Ravanelli) #975
  • [46565cf]: Merge branch 'develop' of https://github.com/speechbrain/speechbrain into develop (xuechenliu) #2142
  • [19235f2]: Merge remote-tracking branch 'upstream/Adel-Moumen-revert_commit_ddp' into revert_commit_ddp (Adel Moumen) #2128
  • [2fb247f]: Save the checkpoint folder and meta only on the main process and communicate to all procs (Peter Plantinga) #2132
  • [f37d433]: Only broadcast checkpoint folder if distributed (Peter Plantinga) #2132
  • [e23da7d]: Initialize external loggers only on main process (Peter Plantinga) #2134
  • [67b1255]: fixes (BenoitWang) #2119
  • [70d8901]: Merge branch 'develop' into fs2_internal_alignment (Yingzhi WANG) #2119
  • [5565073]: Add file check on all recipe tests (#2126) (Mirco Ravanelli) #2126
  • [76923a4]: removeused varibles, add exception types (BenoitWang) #2119
  • [0a18729]: Merge branch 'fs2_internal_alignment' of https://github.com/BenoitWang/speechbrain into fs2_internal_alignment (BenoitWang) #2119
  • [d10f9c9]: add docstrings and examples (BenoitWang) #2119
  • [300aba7]: fix (BenoitWang) #2119
  • [32eea80]: Improve documentation of multi-process checkpointing (Peter Plantinga) #2132
  • [1f1a657]: Add unittest for parallel checkpointing (Peter Plantinga) #2132
  • [c742768]: Update tests/unittests/test_checkpoints.py (Peter Plantinga) #2132
  • [cc02ab9]: Update speechbrain/utils/checkpoints.py (Peter Plantinga) #2132
  • [1c91654]: Update speechbrain/utils/checkpoints.py (Peter Plantinga) #2132
  • [9325b56]: add unknown as pad token id (poonehmousavi) #2086
  • [e03397a]: add unk_token for pad (poonehmousavi) #2086
  • [ba4511c]: fix precommit issue (poonehmousavi) #2086
  • [296d14d]: Update python versions tested in CI (Peter Plantinga) #2138
  • [9781034]: Fix version 3.10, interpreted as 3.1 (Peter Plantinga) #2138
  • [6132693]: Merge branch 'speechbrain:develop' into GPT2-finetuning (Pooneh Mousavi) #2086
  • [5cc966c]: Update pytest version (Peter Plantinga) #2138
  • [c848ec9]: readme update (pradnya-git-dev) #2120
  • [5eb55e3]: Merge remote-tracking branch 'upstream/develop' into bugfix/checkpoint-folder-on-main (Mirco Ravanelli) #2132
  • [7b9327b]: parallel checkpoint test sync via file (Peter Plantinga) #2132
  • [23b5dbc]: Update tests/unittests/test_checkpoints.py (Peter Plantinga) #2132
  • [bcbe5da]: Remove destroy_process_group() which causes hang (Peter Plantinga) #2132
  • [3298a29]: Merge branch 'develop' into fixDDP (Mirco Ravanelli) #2130
  • [25fa18a]: fix EOS issue (poonehmousavi) #2086
  • [b9e3fa4]: minor fix (poonehmousavi) #2086
  • [be4a6f1]: Merge branch 'develop' of https://github.com/speechbrain/speechbrain into resnet_spkreg (xuechenliu) #2142
  • [164f8fe]: Added bash script to save yaml files, fixed issue with orion config file, added baselines, added EEGConformer, removed DeepConvNet and LMDA (prometheus) #975
  • [3d39ccd]: Fixed issue in yaml (prometheus) #975
  • [321c9f7]: Fixed issue in baseline yaml (prometheus) #975
  • [a35b964]: Commit on the speaker embedding extraction script (xuechenliu) #2142
  • [b78eacf]: minor cleaning on the hparams (xuechenliu) #2142
  • [6fd881e]: Removed baselines, fixes in code format of ShallowConvNet, changes in hparam space of ShallowConvNet and EEGConformer (prometheus) #975
  • [d3e9ae0]: EOS issue (poonehmousavi) #2086
  • [d16ea05]: fix (poonehmousavi) #2086
  • [296398d]: fix pad_id (poonehmousavi) #2086
  • [43b4e29]: final fix for generation (poonehmousavi) #2086
  • [f860f4e]: disable open end generation (poonehmousavi) #2086
  • [646ec65]: add interface and increase dropout (BenoitWang) #2119
  • [966b3d5]: fix interface (BenoitWang) #2119
  • [7371caa]: fix import (BenoitWang) #2119
  • [7a21a66]: Bump gitpython from 3.1.32 to 3.1.34 in /recipes/BinauralWSJ0Mix (dependabot[bot]) #2156
  • [907f79a]: Use torchrun instead of torch.distributed.launch (Peter Plantinga) #2158
  • [73b8365]: Fix ddp test by using os environ local_rank (Peter Plantinga) #2158
  • [5f63f6d]: Remove local_rank from run_opts (Peter Plantinga) #2158
  • [98bcd07]: Update resample_folder.py to run with torchaudio 2.0 (Martin Nordstrom) #2162
  • [5b4ca63]: Fix path to output_filename in create_mixtures_metadata.py (Martin Nordstrom) #2162
  • [f64f569]: major bug fix; enhanced signal now fed into whisper instead of clean signal; revised results (sangeet2020) #2163
  • [f223310]: Bump gitpython from 3.1.34 to 3.1.35 in /recipes/BinauralWSJ0Mix (dependabot[bot]) #2164
  • [987aa35]: Merge branch 'speechbrain:develop' into RescueSpeech (Sangeet Sagar) #2163
  • [89de3dd]: minor changes (sangeet2020) #2163
  • [f8654a9]: fix test yaml (Mirco Ravanelli) #2165
  • [5f87b03]: minor changes (sangeet2020) #2163
  • [9630882]: fix yaml inconsistencies (Mirco Ravanelli) #2165
  • [dd4abba]: fix trailing whitespace (Mirco Ravanelli) #2165
  • [2545b43]: readme update dropbox links (sangeet2020) #2163
  • [284e347]: update dropbox link in tests/recipes (sangeet2020) #2163
  • [fa25f82]: Merge branch 'speechbrain:develop' into RescueSpeech (Sangeet Sagar) #2163
  • [775eeb0]: update dropbox links (sangeet2020) #2163
  • [f7d273d]: Merge branch 'develop' of github.com:speechbrain/speechbrain into fix-reproduce-libriparty (Martin Nordstrom) #2162
  • [5c57237]: YouTube channel / online summit (Adel Moumen) #2166
  • [fc3d72d]: Merge branch 'develop' of https://github.com/speechbrain/speechbrain into resnet_spkreg (Xuechen Liu) #2142
  • [b82e798]: fix fetching and checkpointing due to failing recipe tests (Mirco Ravanelli) #2167
  • [972cc65]: let checkpoiting with the same name (Mirco Ravanelli) #2167
  • [bc8906c]: fix black (Mirco Ravanelli) #2167
  • [26da725]: commented parallel checkpointing test. It is currently failing (even on other PRs) only on the CI servers (Mirco Ravanelli) #2167
  • [40d091b]: sort execution of recipes tests (Mirco Ravanelli)
  • [c6ef85d]: sort recipe tests + minor fixes (Mirco Ravanelli)
  • [ef92a05]: Merge remote-tracking branch 'upstream/develop' into fix-reproduce-libriparty (Mirco Ravanelli) #2162
  • [1a9f06a]: update dropbox & hf links (BenoitWang) #2119
  • [8d89a40]: resolve conflict (BenoitWang) #2119
  • [10d85e3]: minor edits for clarify improvements (Mirco Ravanelli) #2162
  • [525b74a]: Merge remote-tracking branch 'upstream/develop' into use-torchrun (Mirco Ravanelli) #2158
  • [bc81789]: Merge remote-tracking branch 'upstream/develop' into resnet_spkreg (Mirco Ravanelli) #2142
  • [9856912]: Merge branch 'develop' into fixDDP (Mirco Ravanelli) #2130
  • [825e114]: fix numpy 1.24 issue (BenoitWang) #2119
  • [56abcb1]: update readme (BenoitWang) #2119
  • [33c4d5b]: Merge remote-tracking branch 'upstream/develop' into fs2_internal_alignment (Mirco Ravanelli) #2119
  • [23e3ceb]: update to latest dev + minor modifications (Mirco Ravanelli) #2119
  • [b5be99f]: fix comments and add docstring (Xuechen Liu) #2142
  • [901b5e3]: update to latest dev + small fixes (Mirco Ravanelli) #2120
  • [8c6db1d]: Merge branch 'develop' into MSTTS (Mirco Ravanelli) #2120
  • [0b09dd6]: fix yaml + fix recipe test on voxceleb (Mirco Ravanelli) #2120
  • [ae6da04]: Merge branch 'MSTTS' of https://github.com/pradnya-git-dev/speechbrain into MSTTS (Mirco Ravanelli) #2120
  • [3ea3a1f]: add missing link (Mirco Ravanelli) #2120
  • [e88b65b]: Merge branch 'develop' of https://github.com/speechbrain/speechbrain into resnet_spkreg (Xuechen Liu) #2142
  • [3280d03]: fix recipe test, add docstring examples (BenoitWang) #2119
  • [0c38b08]: fix examples (BenoitWang) #2119
  • [eb7b839]: Merge branch 'speechbrain:develop' into MSTTS (pradnya-git-dev) #2120
  • [de139b2]: code optimization (pradnya-git-dev) #2120
  • [9364199]: code optimization - loss restore (pradnya-git-dev) #2120
  • [4fd2380]: minor documentation change (pradnya-git-dev) #2120
  • [d3be8d3]: minor documentation fix for tests (pradnya-git-dev) #2120
  • [a508c40]: updating loss example (pradnya-git-dev) #2120
  • [ff0c768]: updating hparams (pradnya-git-dev) #2120
  • [b17e13c]: removing script redundancy (pradnya-git-dev) #2120
  • [e813476]: minor changes for tests (pradnya-git-dev) #2120
  • [f6957ae]: updating recipe entry (pradnya-git-dev) #2120
  • [0c42325]: minor changes for tests (pradnya-git-dev) #2120
  • [ce07c3a]: changes for inference (pradnya-git-dev) #2120
  • [22a7743]: internal sorting for input texts (pradnya-git-dev) #2120
  • [fbb074c]: improve bug_report.yaml (Adel Moumen) #2172
  • [45a65a5]: fix title (Adel Moumen) #2172
  • [8790c07]: Update pull_request_template.md (Adel Moumen) #2172
  • [2dae0cb]: linters (Adel Moumen) #2172
  • [554ca2e]: Update README.md (#2171) (Adel Moumen) #2171
  • [fccb581]: Remove distributed_launch flag and update docs (Peter Plantinga) #2158
  • [9e1b588]: Fix check for rank and local rank (Peter Plantinga) #2158
  • [2d8e6f8]: small improvement in the doc + manage PLACEHOLDER and output folder (Mirco Ravanelli) #2142
  • [2cdc63f]: fix hard-coded devices (#2178) (Mirco Ravanelli) #2178
  • [3457755]: Fix multi-head attention when return_attn_weights=False (Luca Della Libera) #2183
  • [3a16166]: Update multi-head attention docstring (Luca Della Libera) #2183
  • [221f2da]: Updated yaml files after hparam tuning (prometheus) #975
  • [208bccb]: Updated EEGConformer (prometheus) #975
  • [dcc29c7]: Updated README.md (prometheus) #975
  • [5b791fe]: Updated README.md (prometheus) #975
  • [9861876]: Merge remote-tracking branch 'upstream/develop' into eeg_decoding (Mirco Ravanelli) #975
  • [d0296f5]: fix linters (Mirco Ravanelli) #975
  • [e412656]: improve README (Mirco Ravanelli) #975
  • [e8be915]: remove unnecesary folder (Mirco Ravanelli) #975
  • [e431763]: remove files that will be added into speechbrain benchmark (Mirco Ravanelli) #975
  • [63b2f99]: Merge branch 'speechbrain:develop' into GPT2-finetuning (Pooneh Mousavi) #2086
  • [dce8021]: Merge branch 'speechbrain:develop' into GPT2-finetuning (Pooneh Mousavi) #2086
  • [b90034c]: add response-generator interface (poonehmousavi) #2086
  • [34aafe0]: fix pytest (poonehmousavi) #2086
  • [eeda2c0]: fix pytest (poonehmousavi) #2086
  • [9284f24]: fix docstring (poonehmousavi) #2086
  • [b5adc8f]: updating hparams with the current best (pradnya-git-dev) #2120
  • [4f02ca8]: fix hyaml bug (poonehmousavi) #2086
  • [54ab2f8]: minor fix (poonehmousavi) #2086
  • [ed6f08d]: fix interface logging issue (poonehmousavi) #2086
  • [801162e]: fix precommit issue (poonehmousavi) #2086
  • [7cfd162]: HyperConformer (#1905) (Florian Mai) #1905
  • [2983f8a]: clean commnets (poonehmousavi) #2086
  • [697c708]: fix readme (poonehmousavi) #2086
  • [cf48a46]: change interface to be compatibale with pytest (poonehmousavi) #2086
  • [629b99e]: Update README.md (Adel Moumen) #2189
  • [ceb7838]: fix typo that preveted recipe tests to run (Mirco Ravanelli) #2086
  • [fd3b8a8]: automatic download + fix replacement path (Mirco Ravanelli) #2086
  • [9634e9d]: remove transformers from extra-req as already in the main requirements (Mirco Ravanelli) #2086
  • [3d37983]: fix linter (Mirco Ravanelli) #2086
  • [cd41db3]: DNS recipe (#1742) (Sangeet Sagar) #1742
  • [e229e1a]: Attempting to fix failing test (with pytorch 2.1) (#2193) (Mirco Ravanelli) #2193
  • [4ab5219]: Broadcast the decision to checkpoint to all processes (#2192) (Peter Plantinga) #2192
  • [f4e8dd5]: update huggingface_hub requirement to avoid TypeDict error (tuanct1997) #2195
  • [264a0bc]: Avoid sync if mid-epoch checkpoints are disabled (Peter Plantinga) #2200
  • [918d8ef]: new pitch (Mirco Ravanelli) #2201
  • [92f541e]: Bump gitpython from 3.1.35 to 3.1.37 in /recipes/BinauralWSJ0Mix (dependabot[bot]) #2203
  • [5eec78b]: fix open rir (Mirco Ravanelli) #2205
  • [86670b4]: small follow up fix on openrir (Mirco Ravanelli)
  • [fab9657]: update dropbox (Mirco Ravanelli) #2201
  • [67d0de9]: Update README.md (Mirco Ravanelli) #2201
  • [19cbb87]: Update LJSpeech.csv (Mirco Ravanelli) #2201
  • [6271dd0]: remove related doc with distributed_launch (Adel Moumen) #2207
  • [219476c]: pre-commit (Adel Moumen) #2207
  • [21d619c]: adding random speaker voice generation (pradnya-git-dev) #2120
  • [62ac16e]: Merge branch 'develop' into MSTTS (pradnya-git-dev) #2120
  • [b84fa8d]: minor changes for flake8 (pradnya-git-dev) #2120
  • [f69f280]: updates for doctests (pradnya-git-dev) #2120
  • [5897742]: fix one issue wit recipe tests (Mirco Ravanelli) #2120
  • [e0d5a1b]: last fix pitch fastspeec2 (Mirco Ravanelli)
  • [55b442d]: readme update (pradnya-git-dev) #2120
  • [79cff28]: minor update for tests (pradnya-git-dev) #2120
  • [a78a571]: update documentation to clarify when to use --jit (Mirco Ravanelli) #2215
  • [ba492f9]: small fix in recipe tests (Mirco Ravanelli) #2120
  • [ec359cb]: add dropbox link (Mirco Ravanelli) #2120
  • [40bbe0f]: add performance notice (Mirco Ravanelli) #2120
  • [fc892ac]: last change (Mirco Ravanelli) #2120
  • [3c840ed]: reverting an error added by HyperConformer (code from Samsung AI Cambridge) (#2217) (Parcollet Titouan) #2217
  • [121f55b]: fix recipe tests tool (#2218) (Adel Moumen) #2218
  • [81138e8]: ASR recipe for Tedlium2 (code from Samsung AI Cambridge) (#2191) (Parcollet Titouan) #2191
  • [7f62dd8]: Add speech-to-speech translation (#2044) (Jarod) #2044
  • [e09cdac]: Refactor aishell data prep (#2219) (Adel Moumen) #2219
  • [bd27e99]: Create .gitignore (#2222) (Adel Moumen) #2222
  • [ab3c962]: fix incorrect parameter in LibriTTS hifigan vocoder (Chaanks) #2244
  • [2f27f7e]: fix failing recipe test (tiny fix) (Mirco Ravanelli)
  • [94862c8]: Update version.txt (#2256) (Mirco Ravanelli) #2256
  • [0ac4dc3]: Merge branch 'develop' (Mirco Ravanelli) #2257
  • [a581cae]: New version (#2257) (Mirco Ravanelli) #2257
  • [65c0113]: Merge branch 'develop' (Mirco Ravanelli)

v0.5.15

9 months ago

v0.5.14

1 year ago

This release is a minor yet important one. It significantly increases the number of available features while fixing many small bugs and issues. A summary of the achievements of this release is given below, while a complete, detailed list of all the changes can be found at the bottom of this release note.

Notable achievements

  • 22 new contributors, thank you so much, everyone!
  • 31 new recipes (ASR, SLU, AST, AER, Interpretability, SSL).
  • FULL automatic recipe testing.
  • Increased coverage for the continuous integration over the code, URLs, YAML, recipes, and HuggingFace models.
  • New Conformer Large model for ASR.
  • Integration of Whisper for fine-tuning or inference.
  • Full pre-training of wav2vec2 entirely re-implemented AND documented.
  • Low resource Speech Translation with IWSLT.
  • Many other novelties... see below.

What's Changed

New Contributors

Full Changelog: https://github.com/speechbrain/speechbrain/compare/v0.5.13...v0.5.14

v0.5.13

1 year ago

This is a minor release with better dependency version specification. We note that SpeechBrain is compatible with PyTorch 1.12, and the updated package reflects this. See the issue linked next to each commit for more details about the corresponding changes.

Commit summary

  • [edb7714]: Adding no_sync and on_fit_batch_end method to core (Rudolf Arseni Braun) #1449
  • [07155e9]: G2P fixes (flexthink) #1473
  • [6602dab]: fix for #1469, minimal testing for profiling (anautsch) #1476
  • [abbfab9]: test clean-ups: passes linters; doctests; unit & integration tests; load-yaml on cpu (anautsch) #1487
  • [1a16b41]: fix ddp incorrect command (=) #1498
  • [0b0ec9d]: using no_sync() in fit_batch() of core.py (Rudolf Arseni Braun) #1449
  • [5c9b833]: Remove torch maximum compatible version (Peter Plantinga) #1504
  • [d0f4352]: remove limit for HF hub as it does not work with colab (Titouan) #1508
  • [b78f6f8]: Add revision to hub (Titouan) #1510
  • [2c491a4]: fix transducer loss inputs devices (Adel Moumen) #1511
  • [4972f76]: missing space in install command (pehonnet) #1512
  • [6bc72af]: Fixing shuffle argument for distributed sampler in core.py (Rudolf Arseni Braun) #1518
  • [df7acd9]: Added the link for example results (cem) #1523
  • [5bae6df]: add LinearWarmupScheduler (Ge Li) #1537
  • [2edd7ee]: updating scipy version in requirements.txt. (Nauman Dawalatabad) #1546

v0.5.12

1 year ago

Release Notes - SpeechBrain v0.5.12

We worked very hard and we are very happy to announce the new version of SpeechBrain!

SpeechBrain 0.5.12 significantly expands the toolkit without introducing any major interface changes. I would like to warmly thank the many contributors who made this possible.

The main changes are the following:

A) Text-to-Speech: We developed the first TTS system of SpeechBrain. You can find it here. The system relies on Tacotron2 + HiFiGAN (as vocoder). The models coupled with an easy-inference interface are available on HuggingFace.

B) Grapheme-to-Phoneme (G2P): We developed an advanced grapheme-to-phoneme system. You can find the code here. The current version significantly outperforms our previous model.

C) Speech Separation:

  1. We developed a novel version of the SepFormer called Resource-Efficient SepFormer (RE-Sepformer). The code is available here and the pre-trained model (with an easy inference interface) here.
  2. We released a recipe for Binaural speech separation with WSJMix. See the code here.
  3. We released a new recipe with the AIShell mix dataset. You can see the code here.

D) Speech Enhancement:

  1. We released the SepFormer model for speech enhancement. The code is here, while the pre-trained model (with an easy-inference interface) is here.
  2. We implemented the WideResNet for speech enhancement and used it for mimic-loss-based speech enhancement. The code is here and the pretrained model (with an easy-inference interface) is here.

E) Feature Front-ends:

  1. We now support LEAF filter banks. The code is here. You can find an example of a recipe using it here.
  2. We now support SincConv multichannel (see code here).

F) Recipe Refactors:

  1. We refactored the VoxCeleb recipe and fixed the normalization issues. See the new code here. We also made the EER computation method less memory-demanding (see here).
  2. We refactored the IEMOCAP recipe for emotion recognition. See the new code here.

G) Models for African Languages: We now have recipes for the DVoice dataset. We currently support Darija, Swahili, Wolof, Fongbe, and Amharic. The code is available here. The pretrained model (coupled with an easy-inference interface) can be found on SpeechBrain-HuggingFace.

H) Profiler: We implemented a model profiler that helps users while developing new models with SpeechBrain. The profiler outputs potentially useful information, such as the real-time factor and many other details. A tutorial is available here.

I) Tests: We significantly improved the tests. In particular, we introduced the following: HF-repo tests, docstring checks, yaml-script consistency checks, recipe tests, and URL checks. This will help us scale up the project.

L) Other improvements:

  1. We now support the torchaudio RNNT loss.
  2. We improved the relative attention mechanism of the Conformer.
  3. We updated the transformer for LibriSpeech. This improves the performance from WER= 2.46% to 2.26% on the test-clean. See the code here.
  4. The Environmental corruption module can now support different sampling rates.
  5. Minor fixes.

v0.5.11

2 years ago

Dear users, we worked very hard, and we are very happy to announce the new version of SpeechBrain. SpeechBrain 0.5.11 further expands the toolkit without introducing any major interface changes.

The main changes are the following:

  1. We implemented several new recipes.
  2. Support for dynamic batching, with a tutorial to help users familiarize themselves with it.

  3. Support for wav2vec training within SpeechBrain.

  4. An interface with Orion for hyperparameter tuning, with a tutorial to help users familiarize themselves with it.

  5. The torchaudio transducer loss is now supported. We also kept our Numba implementation to help users customize the transducer loss part if needed.

  6. Improved CTC segmentation.

  7. Fixed minor bugs and issues (e.g., fixed the MVDR beamformer).

Let me thank all the amazing contributors for this achievement. Please keep adding stars to our project if you appreciate our effort for the community. Together, we are growing very fast, and we have big plans for the future.

Stay Tuned!

0.5.10

2 years ago

This version mainly expands the functionalities of SpeechBrain without adding any backward incompatibilities.

New Recipes:

  • Language Identification with CommonLanguage
  • EEG signal processing with ERPCore
  • Speech translation with Fisher-Call Home
  • Emotion Recognition with IEMOCAP
  • Voice Activity Detection with LibriParty
  • ASR with LibriSpeech wav2vec (WER=1.9 on test-clean)
  • SpeechEnhancement with CoopNet
  • SpeechEnhancement with SEGAN
  • Speech Separation with LibriMix, WHAM, and WHAMR
  • Support for guided attention
  • Spoken Language Understanding with SLURP

Beyond that, we fixed some minor bugs and issues.

v0.5.9

2 years ago

The main differences from the previous version are the following:

  • Added Wham/whamr/librimix for speech separation
  • Compatibility with PyTorch 1.9
  • Fixed minor bugs
  • Added SpeechBrain paper

v0.5.8

2 years ago

SpeechBrain 0.5.8 improves the previous version in the following way:

  • Added wav2vec support in TIMIT, CommonVoice, AISHELL-1
  • Improved Fluent Speech Command Recipe
  • Improved SLU recipes
  • Recipe for UrbanSound8k
  • Fixed small bugs
  • Fixed typos