Singing voice conversion based on Whisper, with LoRA for singing voice cloning.
Final model for 32 kHz audio.
maxgan_pretrain_48K_5L.pth is derived from maxgan_pretrain_16K_5L.pth by transfer learning, so the structure may not be the most suitable one for 48K.
upsample_rates: [5, 4, 2, 2, 2]; upsample_kernel_sizes: [15, 12, 4, 4, 4]; upsample_initial_channel: 512
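As a quick sanity check on these values, the sketch below computes the total upsampling factor and the kernel-to-stride ratio of each stage. It assumes the generator follows the usual HiFi-GAN-style convention, where the product of upsample_rates equals the number of audio samples produced per input frame (the hop size); that convention is an assumption here, not something stated in these notes. The function names are hypothetical helpers for illustration.

```python
from math import prod

# Values copied from the config line above.
upsample_rates = [5, 4, 2, 2, 2]
upsample_kernel_sizes = [15, 12, 4, 4, 4]


def total_upsample_factor(rates):
    """Product of all per-stage strides (samples generated per input frame,
    assuming a HiFi-GAN-style stack of transposed convolutions)."""
    return prod(rates)


def kernel_to_stride_ratios(kernels, rates):
    """Kernel size divided by stride for each upsampling stage."""
    if len(kernels) != len(rates):
        raise ValueError("kernel and rate lists must have the same length")
    return [k / r for k, r in zip(kernels, rates)]


factor = total_upsample_factor(upsample_rates)
ratios = kernel_to_stride_ratios(upsample_kernel_sizes, upsample_rates)

print("total upsample factor:", factor)
print("kernel/stride ratios:", ratios)
```

Under that assumption, the total factor should match the hop length used for feature extraction; if the two differ, the generated audio length will not line up with the input frames.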
maxgan_pretrain.pth trained with noise-augmented data; it contains both the generator and the discriminator.
Official release: maxgan_pretrain.pth contains both the generator and the discriminator.
The preview model was trained for 415 epochs; the 1000-epoch model will be released in a few days.
Trained on pure speech: 340 epochs, 10k steps.
Trained on a single T4 for about 5 days, 1960 epochs.
Preview model based on the opencpop timbre.