TTS for pitch-accented language. Korean dialect DB.
Model | Temporal resolution | Linear control | Vocal range adjustment | Non-parallel referencing | Unseen style support | Dimension analysis requirement |
---|---|---|---|---|---|---|
GST | X | X | X | O | X | O |
Soft pitchtron | O | * | O | O | O | X |
Hard pitchtron | O | O | O | ** | O | X |
Role | Sentence |
---|---|
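Reference | "아니요 지는 그짝허고 이야기허고 싶지 않아요" (roughly: "No, I don't want to talk with you") |
Target | "그래요 갸는 친구허고 나들이가고 싶은것 같아요" (roughly: "Yes, I think that kid wants to go on an outing with a friend") |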
python preprocess.py --dataset={following keywords}
Run them in the following order. You can skip some steps depending on your needs.
1. Missing speaker: fy15, mw12
2. Wrong data format: mw13_t01_s11.wav, mw13_t01_s12.wav, mw02_t10_s08.wav
3. Overlapping files and naming mistakes: mv11_t07_s4' (== mv11_t07_s40), fy17_t15_s18 (== fy17_t16_s01), fv18_t07_s63 (== fv18_t07_s62)
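
The known issues listed above can be turned into a small exclusion filter applied to a file list before preprocessing. The following is a minimal sketch, not part of this repository: the names `is_usable` and `filter_utterances` are hypothetical, and it assumes that the speaker ID is the filename prefix before the first underscore and that the first file of each duplicated pair is the one to drop.

```python
from pathlib import Path

# Values copied from the issue list above. All names here are hypothetical.
MISSING_SPEAKERS = {"fy15", "mw12"}
WRONG_FORMAT = {"mw13_t01_s11", "mw13_t01_s12", "mw02_t10_s08"}
# Maps each redundant or misnamed file to the file it duplicates.
# Dropping the key and keeping the value is an assumption.
DUPLICATES = {
    "mv11_t07_s4'": "mv11_t07_s40",
    "fy17_t15_s18": "fy17_t16_s01",
    "fv18_t07_s63": "fv18_t07_s62",
}


def is_usable(wav_path):
    """Return False for files affected by one of the known issues."""
    stem = Path(wav_path).stem
    speaker = stem.split("_")[0]  # assumed layout: <speaker>_<task>_<sentence>
    if speaker in MISSING_SPEAKERS:
        return False
    if stem in WRONG_FORMAT:
        return False
    if stem in DUPLICATES:
        return False
    return True


def filter_utterances(paths):
    """Keep only the paths that pass is_usable, preserving order."""
    return [p for p in paths if is_usable(p)]
```

For example, `filter_utterances(["fv18_t07_s62.wav", "fv18_t07_s63.wav"])` keeps only `fv18_t07_s62.wav`.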
python preprocess.py --dataset=integrate_dataset
python train.py {program arguments}
Option | Mandatory | Purpose |
---|---|---|
-o | O | Directory path to save checkpoints. |
-c | X | Path of pretrained checkpoint to load. |
-l | O | Directory where TensorBoard logs are written. |
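
The option table above can be read as the following argument parser. This is an illustrative sketch only and not the actual contents of `train.py`: the `dest` names and the example paths `checkpoints` and `logs` are assumptions, and only the short flags and their mandatory/optional status come from the table.

```python
import argparse


def build_parser():
    """Build a parser mirroring the -o, -c and -l options in the table."""
    parser = argparse.ArgumentParser(
        description="Sketch of the training options (hypothetical dest names)"
    )
    # Mandatory: directory path to save checkpoints.
    parser.add_argument("-o", dest="output_directory", required=True)
    # Optional: path of a pretrained checkpoint to load.
    parser.add_argument("-c", dest="checkpoint_path", default=None)
    # Mandatory: directory where TensorBoard logs are written.
    parser.add_argument("-l", dest="log_directory", required=True)
    return parser


# Example invocation with placeholder paths and no pretrained checkpoint.
args = build_parser().parse_args(["-o", "checkpoints", "-l", "logs"])
print(args.output_directory, args.checkpoint_path, args.log_directory)
```

With this sketch, omitting `-o` or `-l` makes `argparse` exit with a usage error, while omitting `-c` simply leaves the checkpoint path as `None`.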
*The pretrained models were trained on phonemes, so they expect phoneme input when you give them text to synthesize.
Model | Pretrained checkpoint | Matching hyperparameters |
---|---|---|
Soft pitchtron | Soft pitchtron | configs |
Hard pitchtron | Hard pitchtron | configs |
Global style token | GST | configs |
WaveGlow vocoder | WaveGlow | - |
python inferent_soft_pitchtron.py
python inference_hard_pitchtron.py
python inference_gst_tts.py
Contribution | URL |
---|---|
Tacotron2 | https://github.com/NVIDIA/tacotron2 |
Mellotron | https://github.com/NVIDIA/mellotron |
WaveGlow | https://github.com/NVIDIA/waveglow |
Korean text processing | https://github.com/keithito/tacotron |