A non-autoregressive Transformer-based text-to-speech (TTS) system, supporting a family of SOTA Transformers with supervised and unsupervised duration modeling. This project grows with the research community, aiming to achieve the ultimate TTS.
Fix and update codebase & pre-trained models with demo samples
A lot of improvements with new features!
var_start_steps: for better model convergence, especially under unsupervised duration modeling
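
As a rough illustration of how a step threshold such as var_start_steps could help convergence, the sketch below assumes it marks the training step at which variance-related losses (for example pitch and energy) start contributing to the total loss. That interpretation, and the helper names `variance_loss_weight` and `total_loss`, are hypothetical and not taken from this codebase; check the training configuration for the actual behavior.

```python
def variance_loss_weight(step: int, var_start_steps: int) -> float:
    """Return 0.0 before var_start_steps and 1.0 from that step onward.

    Assumption: the threshold acts as a hard on/off gate on the variance losses.
    """
    return 1.0 if step >= var_start_steps else 0.0


def total_loss(mel_loss: float, duration_loss: float, variance_loss: float,
               step: int, var_start_steps: int) -> float:
    """Combine the loss terms, gating only the variance term by training step.

    Assumption: mel and duration losses are always active, so alignment and
    durations can settle before variance prediction is trained.
    """
    weight = variance_loss_weight(step, var_start_steps)
    return mel_loss + duration_loss + weight * variance_loss


# Early in training the variance term is ignored; later it is included.
early = total_loss(1.0, 0.5, 0.25, step=10, var_start_steps=1000)
late = total_loss(1.0, 0.5, 0.25, step=2000, var_start_steps=1000)
```

Under this assumption, delaying the variance losses would keep noisy early duration estimates from destabilizing pitch and energy prediction, which matters most when durations are learned without supervision.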