A PyTorch-based Speech Toolkit
Please help our community project by starring it on GitHub!
📅 In February 2024, we released SpeechBrain 1.0, the result of a year-long collaborative effort by a large international network of developers led by our exceptional core development team.
SpeechBrain 1.0 introduces significant advancements, expanding support for diverse datasets and tasks, including NLP and EEG processing.
The toolkit now excels in Conversational AI and various sequence processing applications.
Improvements encompass key techniques in speech recognition, streamable conformer transducers, integration with K2 for Finite State Transducers, CTC decoding and n-gram rescoring, new CTC/joint attention Beam Search interface, enhanced compatibility with HuggingFace Models (including GPT2 and Llama2), and refined data augmentation, training, and inference processes.
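To illustrate one of the decoding techniques mentioned above, here is a minimal sketch of CTC greedy decoding: collapse consecutive repeated symbols, then drop blanks. This shows only the basic decoding rule, not SpeechBrain's actual BeamSearch interface; the blank index and the example token IDs are assumptions for this sketch.

```python
BLANK = 0  # index of the CTC blank token (assumption for this sketch)

def ctc_greedy_decode(frame_predictions):
    """Map a per-frame argmax sequence to a label sequence by
    collapsing repeats and removing blank tokens."""
    decoded = []
    prev = None
    for token in frame_predictions:
        # Emit a token only when it differs from the previous frame
        # and is not the blank symbol.
        if token != prev and token != BLANK:
            decoded.append(token)
        prev = token
    return decoded

# Frames predicting "1 1 - 2 - 3 3 - 3 4" (with 0 as blank):
print(ctc_greedy_decode([1, 1, 0, 2, 0, 3, 3, 0, 3, 4]))  # → [1, 2, 3, 3, 4]
```

Note that the blank between the two runs of `3` is what allows the same label to appear twice in a row in the output.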
We have created a new repository dedicated to benchmarks, accessible here. At present, this repository features benchmarks for various domains, including speech self-supervised models (MP3S), continual learning (CL-MASR), and EEG processing (SpeechBrain-MOABB).
For detailed technical information, please refer to the section below.
While SpeechBrain has consistently prioritized backward compatibility, the introduction of this new major version presented an opportunity for significant enhancements and refactorings.
🤗 HuggingFace Interface Refactor:
🔍 BeamSearch Refactor:
🎨 Data Augmentation Refactor:
🧠 Brain Class Refactor:
🔍 Inference Interfaces Refactor:
```shell
python train.py hparams/config.yaml --profile_training --profile_warmup 10 --profile_steps 5
```
Release of a new benchmark repository, aimed at aiding the community in standardization across various areas.
A benchmark designed to assess continual learning techniques on multilingual speech recognition tasks.
Provides scripts to train multilingual ASR systems, specifically Whisper and WavLM-based, on a subset of 20 languages selected from Common Voice 13 in a continual learning fashion.
Implementation of various methods, including rehearsal-based, architecture-based, and regularization-based approaches.
Full Changelog: https://github.com/speechbrain/speechbrain/compare/v0.5.16...v1.0.0
SpeechBrain 0.5.16 will be the last minor version of SpeechBrain before the major release of SpeechBrain 1.0.
In this minor version, we have focused on refining the existing features without introducing any interface changes, ensuring a seamless transition to SpeechBrain 1.0 where backward incompatible modifications will take place.
Key Highlights of SpeechBrain 0.5.16:
Bug Fixes: Numerous small fixes have been implemented to enhance the overall stability and performance of SpeechBrain.
Testing and Documentation: We have dedicated efforts to improve our testing infrastructure and documentation, ensuring a more robust and user-friendly experience.
Expanded Model and Dataset Support: SpeechBrain 0.5.16 introduces support for several new models and datasets, enhancing the versatility of the platform. For a detailed list, please refer to the commits below.
Stay informed and get ready for the groundbreaking SpeechBrain 1.0, where we will unveil substantial changes and exciting new features.
Thank you for being a part of the SpeechBrain community!
This release is a minor yet important one. It significantly increases the number of available features while fixing quite a few small bugs and issues. A summary of the achievements of this release is given below; a complete, detailed list of all the changes can be found at the bottom of this release note.
speechbrain/pretrained/interfaces.py by @jonasvdd in https://github.com/speechbrain/speechbrain/pull/1725
avoid_if_longer_than never used by @Adel-Moumen in https://github.com/speechbrain/speechbrain/pull/1882
Full Changelog: https://github.com/speechbrain/speechbrain/compare/v0.5.13...v0.5.14
This is a minor release with better dependency version specification. We note that SpeechBrain is compatible with PyTorch 1.12, and the updated package reflects this. See the issue linked next to each commit for more details about the corresponding changes.
We worked very hard and we are very happy to announce the new version of SpeechBrain!
SpeechBrain 0.5.12 significantly expands the toolkit without introducing any major interface changes. I would like to warmly thank the many contributors who made this possible.
The main changes are the following:
A) Text-to-Speech: We developed the first TTS system of SpeechBrain. You can find it here. The system relies on Tacotron2 + HiFiGAN (as vocoder). The models coupled with an easy-inference interface are available on HuggingFace.
B) Grapheme-to-Phoneme (G2P): We developed an advanced grapheme-to-phoneme system. You can find the code here. The current version significantly outperforms our previous model.
C) Speech Separation:
D) Speech Enhancement:
E) Feature Front-ends:
F) Recipe Refactors:
G) Models for African Languages: We now have recipes for the DVoice dataset. We currently support Darija, Swahili, Wolof, Fongbe, and Amharic. The code is available here. The pretrained model (coupled with an easy-inference interface) can be found on SpeechBrain-HuggingFace.
H) Profiler: We implemented a model profiler that helps users while developing new models with SpeechBrain. The profiler outputs potentially useful information, such as the real-time factor, among many other details. A tutorial is available here.
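As a reminder of what the real-time factor mentioned above measures, here is a minimal sketch (not the SpeechBrain profiler itself): RTF is wall-clock processing time divided by the duration of the processed audio, so values below 1 mean faster-than-real-time processing. The function name and interface are assumptions for this illustration.

```python
import time

def real_time_factor(process_fn, audio_seconds):
    """Return RTF = wall-clock processing time / audio duration.

    RTF < 1 means the model runs faster than real time."""
    start = time.perf_counter()
    process_fn()  # run inference on the audio
    elapsed = time.perf_counter() - start
    return elapsed / audio_seconds

# Example: a dummy "model" taking ~10 ms to process 1 s of audio
rtf = real_time_factor(lambda: time.sleep(0.01), audio_seconds=1.0)
print(f"RTF ≈ {rtf:.3f}")  # well below 1: faster than real time
```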
I) Tests: We significantly improved the tests. In particular, we introduced the following tests: HF_repo tests, docstring checks, yaml-script consistency, recipe tests, and URL checks. This will help us scale up the project.
L) Other improvements:
Dear users, we worked very hard, and we are very happy to announce the new version of SpeechBrain. SpeechBrain 0.5.11 further expands the toolkit without introducing any major interface changes.
The main changes are the following:
Support for dynamic batching, with a tutorial to help users familiarize themselves with it.
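The idea behind dynamic batching is to group utterances of similar length so that padding waste stays low, capping the padded size of each batch rather than the number of utterances. Below is a minimal pure-Python sketch of that bucketing idea; it is an illustration of the concept, not SpeechBrain's actual sampler, and the function name and parameters are assumptions.

```python
def dynamic_batches(lengths, max_batch_len):
    """Group utterance indices into batches whose padded size
    (num_items * longest_item) stays under max_batch_len.

    Sorting by length first keeps utterances of similar duration
    together, which minimizes padding."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches, current = [], []
    for idx in order:
        candidate = current + [idx]
        longest = lengths[candidate[-1]]  # sorted order: last is longest
        if len(candidate) * longest > max_batch_len and current:
            batches.append(current)  # close the batch, start a new one
            current = [idx]
        else:
            current = candidate
    if current:
        batches.append(current)
    return batches

# Example: utterance lengths in frames, padded batch capped at 300 frames
lengths = [120, 30, 35, 300, 40, 100]
print(dynamic_batches(lengths, max_batch_len=300))
# → [[1, 2, 4], [5, 0], [3]]  (short clips batched together, long one alone)
```

Compared to fixed-size batching, this keeps the total compute per batch roughly constant even when utterance durations vary widely.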
Support for wav2vec training within SpeechBrain.
Developed an interface with Orion for hyperparameter tuning, with a tutorial to help users familiarize themselves with it.
The torchaudio transducer loss is now supported. We also kept our Numba implementation to help users customize the transducer loss part if needed.
Improved CTC-Segmentation
Fixed minor bugs and issues (e.g., fixed the MVDR beamformer).
Let me thank all the amazing contributors for this achievement. Please keep adding stars to our project if you appreciate our effort for the community. Together, we are growing very fast, and we have big plans for the future.
Stay Tuned!
This version mainly expands the functionalities of SpeechBrain without adding any backward incompatibilities.
New Recipes:
Beyond that, we fixed some minor bugs and issues.
The main differences from the previous version are the following:
SpeechBrain 0.5.8 improves the previous version in the following way: