An extensible speech synthesis system built with PyTorch. The original code comes from r9y9's https://github.com/r9y9/nnmnkwii_gallery . It makes it easy to train an acoustic model using popular architectures such as Tacotron's encoder, Deep Voice's encoder, the Transformer encoder, or any other encoder you create.
Note: the repo requires wav files with aligned HTS-style full-context label files.
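Since every wav file needs a matching aligned label file, a quick pairing check before preprocessing can save a failed run. The following is only a minimal sketch and not part of this repo: the helper name `unpaired_ids` is hypothetical, and it assumes label files share the wav file's base name and use the `.lab` extension, which is common for HTS-style labels but should be verified against your dataset.

```python
from pathlib import Path


def unpaired_ids(wav_dir, label_dir, label_ext=".lab"):
    """Return (wavs_without_label, labels_without_wav) as sorted lists of ids.

    Assumption: an utterance id is the file name without its extension,
    and label files end with `label_ext` (".lab" is assumed here).
    """
    wav_ids = {p.stem for p in Path(wav_dir).glob("*.wav")}
    lab_ids = {p.stem for p in Path(label_dir).glob(f"*{label_ext}")}
    return sorted(wav_ids - lab_ids), sorted(lab_ids - wav_ids)
```

Both returned lists should be empty for a dataset that is ready to use; run it once per label directory.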
Download a dataset
Unpack the dataset into ~/ExtensibleTTS-PyTorch/datasets
After unpacking, your tree should look like this for cmu_slt_arctic:
ExtensibleTTS-PyTorch
|- datasets
   |- slt_arctic_full_data
      |- label_phone_align
      |- label_state_align
      |- wav
      |- file_id_list_full.scp
      |- questions-radio_dnn_416.hed
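To confirm the unpacked dataset matches the tree above, a small check along these lines may help. It is a sketch rather than a script shipped with this repo: `missing_entries` is a hypothetical helper, and it assumes all five entries sit directly under `slt_arctic_full_data`.

```python
from pathlib import Path

# Entries assumed to sit directly under slt_arctic_full_data.
EXPECTED = [
    "label_phone_align",
    "label_state_align",
    "wav",
    "file_id_list_full.scp",
    "questions-radio_dnn_416.hed",
]


def missing_entries(data_root):
    """Return the expected files or directories that are absent from data_root."""
    root = Path(data_root).expanduser()
    return [name for name in EXPECTED if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_entries("~/ExtensibleTTS-PyTorch/datasets/slt_arctic_full_data")
    print("missing:", missing if missing else "nothing")
```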
python preprocess.py --label state_align    (or --label phone_align)
python norm_params.py
python train_dnn.py --train_model duration    (for training a duration model)
python train_dnn.py --train_model acoustic    (for training an acoustic model)
python synthesis.py --label state_align --duration_checkpint * --acoustic_checkpint *
python train.py --restore_step *