A Non-Autoregressive Text-to-Speech (NAR-TTS) framework, including official PyTorch implementation of PortaSpeech (NeurIPS 2021) and DiffSpeech (AAAI 2022)
NATSpeech: A Non-Autoregressive Text-to-Speech Framework
| | 中文文档
This repo contains official PyTorch implementation of:
We implement the following features in this framework:
## We tested on Linux/Ubuntu 18.04.
## Install Python 3.6+ first (Anaconda recommended).
export PYTHONPATH=.
# build a virtual env (recommended).
python -m venv venv
source venv/bin/activate
# install requirements.
pip install -U pip
pip install Cython numpy==1.19.1
pip install torch==1.9.0 # torch >= 1.9.0 recommended
pip install -r requirements.txt
sudo apt install -y sox libsox-fmt-mp3
bash mfa_usr/install_mfa.sh # install forced alignment tool
If you find this useful for your research, please cite the following papers:
@article{ren2021portaspeech,
title={PortaSpeech: Portable and High-Quality Generative Text-to-Speech},
author={Ren, Yi and Liu, Jinglin and Zhao, Zhou},
journal={Advances in Neural Information Processing Systems},
volume={34},
year={2021}
}
@article{liu2021diffsinger,
title={Diffsinger: Singing voice synthesis via shallow diffusion mechanism},
author={Liu, Jinglin and Li, Chengxi and Ren, Yi and Chen, Feiyang and Liu, Peng and Zhao, Zhou},
journal={arXiv preprint arXiv:2105.02446},
volume={2},
year={2021}
}
Our codes are influenced by the following repos:
Any organization or individual is prohibited from using any technology mentioned in this paper to generate someone's speech without his/her consent, including but not limited to government leaders, political figures, and celebrities. If you do not comply with this item, you could be in violation of copyright laws.