NATSpeech

A Non-Autoregressive Text-to-Speech (NAR-TTS) framework, including the official PyTorch implementations of PortaSpeech (NeurIPS 2021) and DiffSpeech (AAAI 2022).

NATSpeech: A Non-Autoregressive Text-to-Speech Framework

Chinese documentation (中文文档)

This repo contains the official PyTorch implementations of:

PortaSpeech: Portable and High-Quality Generative Text-to-Speech (NeurIPS 2021)
Demo page | HuggingFace🤗 Demo
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (DiffSpeech) (AAAI 2022)
Demo page | Project page | HuggingFace🤗 Demo

Key Features

We implement the following features in this framework:

Data processing for non-autoregressive Text-to-Speech using Montreal Forced Aligner.
Convenient and scalable framework for training and inference.
Simple but efficient random-access dataset implementation.
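To illustrate what a random-access dataset means in practice, here is a minimal sketch, not the framework's actual implementation. It assumes items are pickled into one data file and that a separate index file stores byte offsets, so that item `i` can be read with a single seek. The class names `IndexedDatasetBuilder` and `IndexedDataset`, the `.data`/`.idx` suffixes, and the `demo` helper are hypothetical and chosen only for this example.

```python
import os
import pickle
import tempfile


class IndexedDatasetBuilder:
    """Hypothetical writer: appends pickled items and records byte offsets."""

    def __init__(self, path):
        self.path = path
        self.data_file = open(path + ".data", "wb")
        self.offsets = [0]  # offsets[i] is where item i starts

    def add_item(self, item):
        payload = pickle.dumps(item)
        self.data_file.write(payload)
        self.offsets.append(self.offsets[-1] + len(payload))

    def finalize(self):
        self.data_file.close()
        with open(self.path + ".idx", "wb") as f:
            pickle.dump(self.offsets, f)


class IndexedDataset:
    """Hypothetical reader: seeks straight to item i instead of scanning."""

    def __init__(self, path):
        with open(path + ".idx", "rb") as f:
            self.offsets = pickle.load(f)
        self.data_file = open(path + ".data", "rb")

    def __len__(self):
        return len(self.offsets) - 1

    def __getitem__(self, i):
        if i < 0 or i >= len(self):
            raise IndexError(i)
        start, end = self.offsets[i], self.offsets[i + 1]
        self.data_file.seek(start)
        return pickle.loads(self.data_file.read(end - start))

    def close(self):
        self.data_file.close()


def demo():
    """Write five toy items to a temp directory, then read one back by index."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "train")
        builder = IndexedDatasetBuilder(path)
        for i in range(5):
            builder.add_item({"id": i, "text": f"utterance {i}"})
        builder.finalize()
        dataset = IndexedDataset(path)
        count, item = len(dataset), dataset[3]
        dataset.close()
        return count, item
```

Because each lookup costs one seek plus one read, a shuffled training loader can fetch items in any order without loading the whole corpus into memory.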

Install Dependencies

## We tested on Linux/Ubuntu 18.04. 
## Install Python 3.6+ first (Anaconda recommended).

export PYTHONPATH=.
# build a virtual env (recommended).
python -m venv venv
source venv/bin/activate
# install requirements.
pip install -U pip
pip install Cython numpy==1.19.1
pip install torch==1.9.0 # torch >= 1.9.0 recommended
pip install -r requirements.txt
sudo apt install -y sox libsox-fmt-mp3
bash mfa_usr/install_mfa.sh # install forced alignment tool
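After installation, a quick sanity check can confirm that the environment meets the versions mentioned above (Python 3.6+ and torch >= 1.9.0). The script below is a sketch and not part of the repository. The helper names `parse_version`, `meets_minimum`, and `check_environment` are hypothetical, and a missing torch install is reported as a failed check rather than raising.

```python
import re
import sys


def parse_version(version):
    """Turn a string such as '1.9.0+cu111' into (1, 9, 0)."""
    parts = []
    for piece in version.split("+")[0].split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(version, minimum):
    """Return True if `version` is at least `minimum`."""
    return parse_version(version) >= parse_version(minimum)


def check_environment():
    """Report whether Python and torch satisfy the README's stated versions."""
    report = {"python": sys.version_info >= (3, 6)}
    try:
        import torch
        report["torch"] = meets_minimum(torch.__version__, "1.9.0")
    except ImportError:
        report["torch"] = False
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'needs attention'}")
```

Run it inside the activated virtual environment so that it inspects the packages installed there.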

Documents

Citation

If you find this useful for your research, please cite the following papers:

PortaSpeech

@article{ren2021portaspeech,
  title={PortaSpeech: Portable and High-Quality Generative Text-to-Speech},
  author={Ren, Yi and Liu, Jinglin and Zhao, Zhou},
  journal={Advances in Neural Information Processing Systems},
  volume={34},
  year={2021}
}

DiffSpeech

@article{liu2021diffsinger,
  title={Diffsinger: Singing voice synthesis via shallow diffusion mechanism},
  author={Liu, Jinglin and Li, Chengxi and Ren, Yi and Chen, Feiyang and Liu, Peng and Zhao, Zhou},
  journal={arXiv preprint arXiv:2105.02446},
  volume={2},
  year={2021}
}

Acknowledgments

Our code is influenced by the following repos:

License and Agreement

Any organization or individual is prohibited from using any technology mentioned in this paper to generate someone's speech without his/her consent, including but not limited to government leaders, political figures, and celebrities. If you do not comply with this item, you could be in violation of copyright laws.

Repository

NATSpeech/NATSpeech

License

MIT
