Comprehensive Transformer TTS Versions

A non-autoregressive, Transformer-based text-to-speech system supporting a family of SOTA transformers with supervised and unsupervised duration modeling. This project grows with the research community, aiming to achieve the ultimate TTS.

v0.2.1

2 years ago

Fix and update codebase & pre-trained models with demo samples

Fix the variance adaptor so that it works with all combinations of building block and variance type/level
Update pre-trained models with demo samples of LJSpeech and VCTK under the "transformer_fs2" building block and "cwt" pitch conditioning
Share the results of ablation studies comparing "transformer" vs. "transformer_fs2", each paired with three types of pitch conditioning ("frame", "ph", and "cwt")

v0.2.0

2 years ago

A lot of improvements with new features!

Prepare two types of data pipelines in the preprocessor to make the most of unsupervised/supervised duration modeling
Adopt wavelet for pitch modeling & loss
Add fine-trained duration loss
Apply var_start_steps for better model convergence, especially under unsupervised duration modeling
Remove dependency of energy modeling on pitch variance
Add "transformer_fs2" building block, which is closer to the original FastSpeech2 paper
Add two types of prosody modeling methods
Loss comparison on validation set:
- LJSpeech - blue: v0.1.1 / green: v0.2.0
- VCTK - skyblue: v0.1.1 / orange: v0.2.0

v0.1.1

2 years ago

v0.1.0

2 years ago