Multi-Speaker Pytorch FastSpeech2: Fast and High-Quality End-to-End Text to Speech :fist:

Multi-speaker FastSpeech 2 - PyTorch Implementation :zap:

This is a PyTorch implementation of Microsoft's FastSpeech 2: Fast and High-Quality End-to-End Text to Speech.
Now supporting about 900 speakers in :fire: LibriTTS for multi-speaker text-to-speech.

Datasets :elephant:

This project supports three datasets, one single-speaker and two multi-speaker:

:fire: Single-Speaker

LJSpeech

:fire: Multi-Speaker

LibriTTS
VCTK

Config

Configurations are in:

config/dataset.yaml
config/hparams.py

Please modify dataset and mfa_path in config/hparams.py.

In this repo, we're using MFA v1. Migrating to MFA v2 is a TODO item.

Steps

preprocess.py
train.py
synthesize.py
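
The three steps can be chained from a small driver script. The sketch below only builds the argument lists, using the positional arguments and flags described in this README. The helper name `build_pipeline_commands` and the placeholder paths are hypothetical and are not part of the repository.

```python
import shlex


def build_pipeline_commands(wav_dir, txt_dir, save_dir, ckpt_path, output_dir,
                            comment="", python="python"):
    """Return argv lists for preprocess.py, train.py and synthesize.py, in order."""
    preprocess = [python, "preprocess.py", wav_dir, txt_dir, save_dir,
                  "--prepare_mfa", "--mfa", "--create_dataset"]
    train = [python, "train.py", save_dir]
    if comment:
        train += ["--comment", comment]
    synthesize = [python, "synthesize.py",
                  "--ckpt_path", ckpt_path,
                  "--output_dir", output_dir]
    return [preprocess, train, synthesize]


# Placeholder paths for illustration only.
for argv in build_pipeline_commands(
        "/path/to/DATASET/wavs", "/path/to/DATASET/txts", "./processed/DATASET",
        "/path/to/checkpoint.pth.tar", "./output", comment="first run"):
    print(shlex.join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True)`. The checkpoint path is only known once training has written a checkpoint, so in practice the third command is built after the second has finished.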

1. Preprocess

File Structures:

[DATASET] / wavs / speaker / wav_files
[DATASET] / txts / speaker / txt_files

wav_dir : the folder containing speaker dirs ( [DATASET] / wavs )
txt_dir : the folder containing speaker dirs ( [DATASET] / txts )
save_dir : the output directory (e.g. "./processed" )
--prepare_mfa : create mfa_data
--mfa : create textgrid files
--create_dataset : generate mel, phone, f0 ....., metadata.json
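
Before running the preprocessing step, it can help to confirm that every wav file has a matching transcript. The sketch below assumes the flat two-level layout shown above (speaker directories that directly contain the files) and `.wav` / `.txt` extensions. The function name is hypothetical, and nested corpora may need a recursive variant.

```python
from pathlib import Path


def find_unpaired_utterances(wav_dir, txt_dir, wav_ext=".wav", txt_ext=".txt"):
    """Compare <wav_dir>/<speaker>/*.wav against <txt_dir>/<speaker>/*.txt.

    Returns (wavs_without_txt, txts_without_wav) as sorted lists of
    "speaker/utterance" identifiers.
    """
    def collect(root, ext):
        # Gather "speaker/stem" for every matching file one level below root.
        return {
            f"{spk.name}/{f.stem}"
            for spk in Path(root).iterdir() if spk.is_dir()
            for f in spk.iterdir() if f.is_file() and f.suffix == ext
        }

    wavs = collect(wav_dir, wav_ext)
    txts = collect(txt_dir, txt_ext)
    return sorted(wavs - txts), sorted(txts - wavs)
```

Calling `find_unpaired_utterances(wav_dir, txt_dir)` with the same two directories that will be passed to preprocess.py returns two empty lists when the dataset is fully paired.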

Example commands:

LJSpeech:

#run the script for organizing LJSpeech first
python ./script/organizeLJ.py

python preprocess.py /storage/tts2021/LJSpeech-organized/wavs /storage/tts2021/LJSpeech-organized/txts ./processed/LJSpeech --prepare_mfa --mfa --create_dataset
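
The repository's organizing script is not reproduced here. As a rough illustration of what bringing a single-speaker corpus into the wavs / speaker and txts / speaker layout could involve, the sketch below assumes the standard LJSpeech distribution (a `metadata.csv` with `id|text|normalized text` rows next to a `wavs` folder). The function name and the speaker directory name `LJSpeech` are hypothetical.

```python
import csv
import shutil
from pathlib import Path


def organize_single_speaker(src_dir, dst_dir, speaker="LJSpeech"):
    """Copy wavs and write one transcript per utterance under a speaker dir.

    Returns the number of utterances that were organized.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    wav_out = dst / "wavs" / speaker
    txt_out = dst / "txts" / speaker
    wav_out.mkdir(parents=True, exist_ok=True)
    txt_out.mkdir(parents=True, exist_ok=True)

    count = 0
    with open(src / "metadata.csv", encoding="utf-8", newline="") as f:
        # Rows are pipe-delimited; quote characters are ordinary text.
        for row in csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE):
            if not row:
                continue
            utt_id, text = row[0], row[-1]  # last field: normalized text
            wav_path = src / "wavs" / f"{utt_id}.wav"
            if not wav_path.exists():
                continue  # skip transcript rows that have no audio file
            shutil.copy2(wav_path, wav_out / wav_path.name)
            (txt_out / f"{utt_id}.txt").write_text(text + "\n", encoding="utf-8")
            count += 1
    return count
```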

LibriTTS:

python preprocess.py /storage/tts2021/LibriTTS/train-clean-360 /storage/tts2021/LibriTTS/train-clean-360 ./processed/LibriTTS --prepare_mfa --mfa --create_dataset

VCTK:

python preprocess.py /storage/tts2021/VCTK-Corpus/wav48/ /storage/tts2021/VCTK-Corpus/txt ./processed/VCTK --prepare_mfa --mfa --create_dataset

metadata.json includes:

speaker table
training data
validation data
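
The exact key names inside metadata.json are not listed here, so a quick way to inspect a generated file is to summarize its top-level entries without assuming any names. This is a minimal sketch that assumes the file holds a JSON object at the top level. `summarize_metadata` is a hypothetical helper, not part of the repository.

```python
import json


def summarize_metadata(path):
    """Map each top-level key of a JSON object file to a short description."""
    with open(path, encoding="utf-8") as f:
        metadata = json.load(f)

    summary = {}
    for key, value in metadata.items():
        if isinstance(value, (list, dict)):
            # Containers: report the type and how many entries they hold.
            summary[key] = f"{type(value).__name__} with {len(value)} entries"
        else:
            summary[key] = type(value).__name__
    return summary
```

For example, `summarize_metadata("./processed/LJSpeech/metadata.json")` would show how many entries the speaker table and the training and validation splits contain, assuming the file was written to the save_dir used in the preprocessing command.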

2. Train

data_dir : the preprocessed data directory
--comment: some comments

Example commands:

LJSpeech:

python train.py ./processed/LJSpeech --comment "Hello LJSpeech"

LibriTTS:

python train.py ./processed/LibriTTS --comment "Hello LibriTTS"

VCTK:

python train.py ./processed/VCTK --comment "Hello VCTK"

3. Synthesize

--ckpt_path: the checkpoint path
--output_dir: the directory to put the synthesized audios

Example commands:

python synthesize.py --ckpt_path ./records/LJSpeech_2021-11-22-22:42/ckpt/checkpoint_125000.pth.tar --output_dir ./output
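
When a run has written several checkpoints, the most recent one can be picked automatically. The sketch below assumes checkpoint files follow the `checkpoint_<step>.pth.tar` naming seen in the example command. `latest_checkpoint` is a hypothetical helper, not part of the repository.

```python
import re
from pathlib import Path

# Matches names such as "checkpoint_125000.pth.tar" and captures the step.
CKPT_PATTERN = re.compile(r"checkpoint_(\d+)\.pth\.tar$")


def latest_checkpoint(ckpt_dir):
    """Return the checkpoint file with the highest step number, or None."""
    best_step, best_path = -1, None
    for path in Path(ckpt_dir).iterdir():
        match = CKPT_PATTERN.match(path.name)
        if match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path
```

Steps are compared as integers because a plain string sort would rank `checkpoint_90000` above `checkpoint_125000`. The returned path can be passed to `--ckpt_path`.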

Open Source Agenda is not affiliated with "Ga642381 FastSpeech2" Project. README Source: ga642381/FastSpeech2
