Comprehensive Transformer TTS Versions

A non-autoregressive Transformer-based text-to-speech framework supporting a family of SOTA transformer building blocks with both supervised and unsupervised duration modeling. This project grows with the research community, aiming at the ultimate TTS.

v0.2.1

2 years ago

Fix and update the codebase & pre-trained models, with demo samples

  1. Fix the variance adaptor so it works with every combination of building block and variance type/level (see the sketch after this list)
  2. Update pre-trained models, with demo samples for LJSpeech and VCTK, under the "transformer_fs2" building block and "cwt" pitch conditioning
  3. Share the results of ablation studies comparing "transformer" and "transformer_fs2", each paired with the three types of pitch conditioning ("frame", "ph", and "cwt")
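
As a rough illustration of item 1, the sketch below shows a variance adaptor that handles any variance type ("pitch", "energy") at either level: phoneme-level hidden states are passed in before length regulation ("ph") and frame-level states after it ("frame"). All class and parameter names here are hypothetical, not the repo's actual API.

```python
import torch
import torch.nn as nn

class VariancePredictor(nn.Module):
    """Small Conv1d stack that predicts one scalar per input position."""
    def __init__(self, d_model: int, d_hidden: int = 256, kernel: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(d_model, d_hidden, kernel, padding=kernel // 2),
            nn.ReLU(),
            nn.Conv1d(d_hidden, 1, kernel, padding=kernel // 2),
        )

    def forward(self, x):  # x: (B, T, d_model)
        return self.net(x.transpose(1, 2)).squeeze(1)  # (B, T)

class VarianceAdaptor(nn.Module):
    """Agnostic to the upstream building block and to the variance level:
    call it with phoneme-level states before the length regulator ("ph"),
    or with frame-level states after it ("frame")."""
    def __init__(self, d_model: int, n_bins: int = 256):
        super().__init__()
        self.predictors = nn.ModuleDict({
            "pitch": VariancePredictor(d_model),
            "energy": VariancePredictor(d_model),
        })
        self.embeddings = nn.ModuleDict({
            "pitch": nn.Embedding(n_bins, d_model),
            "energy": nn.Embedding(n_bins, d_model),
        })
        self.n_bins = n_bins

    def forward(self, x, variance: str, target=None):
        pred = self.predictors[variance](x)             # (B, T)
        value = target if target is not None else pred  # teacher-force in training
        bins = torch.linspace(-4.0, 4.0, self.n_bins - 1, device=x.device)
        x = x + self.embeddings[variance](torch.bucketize(value, bins))
        return x, pred                                  # pred feeds the variance loss
```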

v0.2.0

2 years ago

Many improvements and new features!

  1. Prepare two different data pipelines in the preprocessor to support both unsupervised and supervised duration modeling
  2. Adopt the continuous wavelet transform (CWT) for pitch modeling and its loss (see the sketch after this list)
  3. Add a fine-grained duration loss
  4. Apply var_start_steps for better model convergence, especially under unsupervised duration modeling (see the gating sketch after this list)
  5. Remove the dependency of energy modeling on pitch variance
  6. Add the "transformer_fs2" building block, which is closer to the original FastSpeech2 paper
  7. Add two types of prosody modeling methods
  8. Loss comparison on the validation set:
    • LJSpeech - blue: v0.1.1 / green: v0.2.0
    • VCTK - skyblue: v0.1.1 / orange: v0.2.0
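
Item 2 replaces direct pitch-contour regression with a wavelet-domain representation. The sketch below is a minimal, hypothetical illustration (the function names and the choice of 10 dyadic scales are assumptions, not the repo's code): the normalized log-F0 contour is decomposed with a Mexican-hat CWT, the model would regress those coefficients, and the contour is recomposed with the scale weighting of Suni et al. (2013).

```python
import numpy as np

def mexican_hat(t, s):
    """Mexican-hat (Ricker) mother wavelet at scale s."""
    x = t / s
    return (2 / (np.sqrt(3 * s) * np.pi ** 0.25)) * (1 - x ** 2) * np.exp(-x ** 2 / 2)

def cwt_decompose(f0, n_scales=10):
    """Decompose a 1-D normalized log-F0 contour into (n_scales, T) coefficients."""
    T = len(f0)
    t = np.arange(-(T // 2), T - T // 2)
    scales = 2.0 ** (np.arange(n_scales) + 1)  # dyadic scales
    return np.stack([np.convolve(f0, mexican_hat(t, s), mode="same") / np.sqrt(s)
                     for s in scales])

def cwt_recompose(coefs):
    """Approximate inverse using the (i + 2.5)^(-5/2) scale weights of Suni et al."""
    i = np.arange(len(coefs))
    return (coefs * ((i + 2.5) ** -2.5)[:, None]).sum(axis=0)

f0 = np.random.randn(200)      # stand-in for a zero-mean, unit-variance log-F0 contour
coefs = cwt_decompose(f0)      # training target for the pitch predictor
f0_hat = cwt_recompose(coefs)  # recomposed contour at synthesis time
```

The CWT pitch loss would then compare predicted and ground-truth coefficients, with the contour's mean and standard deviation typically predicted separately so the contour can be de-normalized at synthesis time.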

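Item 4's var_start_steps can be read as a simple loss-gating schedule: keep the variance terms out of the total objective until the backbone has stabilized, which matters most when durations are learned without supervision. A hypothetical sketch (names and the threshold value are assumed, not the repo's config keys):

```python
def total_loss(step, mel_loss, var_losses, var_start_steps=4000):
    """Gate pitch/energy/duration losses until `var_start_steps` training steps.

    `var_start_steps=4000` is an illustrative value, not the repo's default.
    """
    loss = mel_loss
    if step >= var_start_steps:  # variance terms join the objective late
        loss = loss + sum(var_losses)
    return loss
```
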
v0.1.1

2 years ago

v0.1.0

2 years ago