Self Supervised Speech Pretraining And Representation Learning Save

Self-Supervised Speech Pre-training and Representation Learning Toolkit

Project README



Apache License 2.0 CC_BY_NC License CI Codecov Bitbucket open issues

Emergent announcement for the SUPERB website

Unfortunately, the SUPERB website is currently down because of a disk corruption issue. We are actively working to resolve this and are doing our best to preserve any submission results. Please stay tuned for updates if you're planning to submit your work in the near future. Thank you for your patience!

Contact

We prefer to have discussions directly on Github issue page, so that all the information is transparent to all the contributors and is auto-archived on the Github. If you wish to use email, please contact:

Please refer to the legacy citation of S3PRL and the timeline below, which justify our initiative on this project. This information is used to protect us from half-truths. We encourage to cite the individual papers most related to the function you are using to give fair credit to the developer of the function. You can find the names in the Change Log. Finally, we would like to thank our advisor, Prof. Hung-yi Lee, for his advice. The project would be impossible without his support.

If you have any question (e.g., about who came up with / developed which ideas / functions or how the project started), feel free to engage in an open and responsible conversation on the GitHub issue page, and we'll be happy to help!

Contribution (pull request)

Guideline

  • Starting in 2024, we will only accept new contributions in the form of new upstream models, so we can save bandwidth for developing new techniques (which will not be in S3PRL.)
  • S3PRL has transitioned into pure maintenance mode, ensuring the long-term maintenance of all existing functions.
  • Reporting bugs or the PR fixing the bugs is always welcome! Thanks!

Tutorials

Environment compatibilities CI

We support the following environments. The test cases are ran with tox locally and on github action:

Env versions
os ubuntu-18.04, ubuntu-20.04
python 3.7, 3.8, 3.9, 3.10
pytorch 1.8.1, 1.9.1, 1.10.2, 1.11.0, 1.12.1 , 1.13.1 , 2.0.1 , 2.1.0

Star History

Star History Chart

Change Log

We only list the major contributors here for conciseness. However, we are deeply grateful for all the contributions. Please see the Contributors page for the full list.


Introduction and Usages

This is an open source toolkit called s3prl, which stands for Self-Supervised Speech Pre-training and Representation Learning. Self-supervised speech pre-trained models are called upstream in this toolkit, and are utilized in various downstream tasks.

The toolkit has three major usages:

Pretrain

  • Pretrain upstream models, including Mockingjay, Audio ALBERT and TERA.
  • Document: pretrain/README.md

Upstream

  • Easily load most of the existing upstream models with pretrained weights in a unified I/O interface.
  • Pretrained models are registered through torch.hub, which means you can use these models in your own project by one-line plug-and-play without depending on this toolkit's coding style.
  • Document: upstream/README.md

Downstream

Below is an intuitive illustration on how this toolkit may help you:



Feel free to use or modify our toolkit in your research. Here is a list of papers using our toolkit. Any question, bug report or improvement suggestion is welcome through opening up a new issue.

If you find this toolkit helpful to your research, please do consider citing our papers, thanks!

Installation

  1. Python >= 3.6
  2. Install sox on your OS
  3. Install s3prl: Read doc or pip install -e ".[all]"
  4. (Optional) Some upstream models require special dependencies. If you encounter error with a specific upstream model, you can look into the README.md under each upstream folder. E.g., upstream/pase/README.md=

Reference Repositories

License

The majority of S3PRL Toolkit is licensed under the Apache License version 2.0, however all the files authored by Facebook, Inc. (which have explicit copyright statement on the top) are licensed under CC-BY-NC.

Used by

List of papers that used our toolkit (Feel free to add your own paper by making a pull request)

Self-Supervised Pretraining

Explanability

Adversarial Attack

Voice Conversion

Benchmark and Evaluation

  • SUPERB: Speech processing Universal PERformance Benchmark (Yang et al., 2021)

    @misc{superb,
          title={SUPERB: Speech processing Universal PERformance Benchmark},
          author={Shu-wen Yang and Po-Han Chi and Yung-Sung Chuang and Cheng-I Jeff Lai and Kushal Lakhotia and Yist Y. Lin and Andy T. Liu and Jiatong Shi and Xuankai Chang and Guan-Ting Lin and Tzu-Hsien Huang and Wei-Cheng Tseng and Ko-tik Lee and Da-Rong Liu and Zili Huang and Shuyan Dong and Shang-Wen Li and Shinji Watanabe and Abdelrahman Mohamed and Hung-yi Lee},
          year={2021},
          eprint={2105.01051},
          archivePrefix={arXiv},
          primaryClass={cs.CL}
    }
    
  • Utilizing Self-supervised Representations for MOS Prediction (Tseng et al., 2021)

    @misc{ssr_mos,
        title={Utilizing Self-supervised Representations for MOS Prediction},
        author={Wei-Cheng Tseng and Chien-yu Huang and Wei-Tsung Kao and Yist Y. Lin and Hung-yi Lee},
        year={2021},
        eprint={2104.03017},
        archivePrefix={arXiv},
        primaryClass={eess.AS}
    }
    

}

Citation

If you find this toolkit useful, please consider citing following papers.

  • If you use our pre-training scripts, or the downstream tasks considered in TERA and Mockingjay, please consider citing the following:
@misc{tera,
  title={TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech},
  author={Andy T. Liu and Shang-Wen Li and Hung-yi Lee},
  year={2020},
  eprint={2007.06028},
  archivePrefix={arXiv},
  primaryClass={eess.AS}
}
@article{mockingjay,
   title={Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders},
   ISBN={9781509066315},
   url={http://dx.doi.org/10.1109/ICASSP40776.2020.9054458},
   DOI={10.1109/icassp40776.2020.9054458},
   journal={ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
   publisher={IEEE},
   author={Liu, Andy T. and Yang, Shu-wen and Chi, Po-Han and Hsu, Po-chun and Lee, Hung-yi},
   year={2020},
   month={May}
}
  • If you use our organized upstream interface and features, or the SUPERB downstream benchmark, please consider citing the following:
@inproceedings{yang21c_interspeech,
  author={Shu-wen Yang and Po-Han Chi and Yung-Sung Chuang and Cheng-I Jeff Lai and Kushal Lakhotia and Yist Y. Lin and Andy T. Liu and Jiatong Shi and Xuankai Chang and Guan-Ting Lin and Tzu-Hsien Huang and Wei-Cheng Tseng and Ko-tik Lee and Da-Rong Liu and Zili Huang and Shuyan Dong and Shang-Wen Li and Shinji Watanabe and Abdelrahman Mohamed and Hung-yi Lee},
  title={{SUPERB: Speech Processing Universal PERformance Benchmark}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={1194--1198},
  doi={10.21437/Interspeech.2021-1775}
}
Open Source Agenda is not affiliated with "Self Supervised Speech Pretraining And Representation Learning" Project. README Source: s3prl/s3prl

Open Source Agenda Badge

Open Source Agenda Rating