A fast CNN-based vocoder
NOTE: I'm no longer working on this project. See #9.
This work is inspired by the MCNN model described in Fast Spectrogram Inversion using Multi-head Convolutional Neural Networks. The authors show that even a simple upsampling network is enough to synthesize a waveform from a spectrogram or mel-spectrogram.
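The upsampling idea can be illustrated with a toy numpy sketch. This is not the actual model: the bin count, hop length, and the fixed averaging kernel below are stand-ins for what the real network learns with transposed convolutions.

```python
import numpy as np

def upsample_frames(spec_frames, hop_length):
    """Nearest-neighbor upsampling: repeat each spectrogram frame
    hop_length times along the time axis, so the number of output
    samples matches the waveform length the frames cover."""
    return np.repeat(spec_frames, hop_length, axis=-1)

def smooth(signal, kernel):
    """1-d convolution; a trained vocoder would learn these filter
    weights, here a fixed averaging kernel stands in for them."""
    return np.convolve(signal, kernel, mode="same")

# 10 frames of an 80-bin spectrogram with a hop of 256 samples
frames = np.random.rand(80, 10)
up = upsample_frames(frames, 256)                 # shape (80, 2560)

# Collapse the frequency axis and smooth -- a crude stand-in for the
# learned projection from upsampled features down to a 1-d waveform.
waveform = smooth(up.mean(axis=0), np.ones(5) / 5)
```

The point is only the shape arithmetic: each spectrogram frame must expand into `hop_length` waveform samples, and the network's job is to learn filters that make that expansion sound right.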
In this repo, I use the spectrogram feature for training because it contains more information than the mel-spectrogram feature. However, because the transformation from spectrogram to mel-spectrogram is just a linear projection, you could in principle train a simple network to predict the spectrogram from the mel-spectrogram. You can also change the parameters to train a vocoder directly from mel-spectrogram features.
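The "just a linear projection" claim can be checked with a small numpy sketch. The filterbank here is a random stand-in for a real mel filter matrix (not the one used in this repo), and the pseudoinverse is only an illustration of the kind of inverse mapping a learned linear layer would approximate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 513 linear-frequency bins projected to 80 mel bins.
n_fft_bins, n_mels = 513, 80
mel_basis = rng.random((n_mels, n_fft_bins))  # stand-in for a mel filterbank

spec = rng.random((n_fft_bins, 100))          # linear spectrogram, 100 frames
mel = mel_basis @ spec                        # mel-spectrogram is a matrix product

# Linearity: projecting a sum equals the sum of projections.
a, b = rng.random((n_fft_bins, 100)), rng.random((n_fft_bins, 100))
assert np.allclose(mel_basis @ (a + b), mel_basis @ a + mel_basis @ b)

# An approximate inverse mapping via the pseudoinverse -- roughly what a
# trained linear layer recovering spectrogram from mel would learn.
spec_approx = np.linalg.pinv(mel_basis) @ mel
```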
Compared with MCNN, my proposed network has some differences:
$ pip install -r requirements.txt
I use the LJSpeech dataset for my experiments. If you don't have it yet, please download it and put it somewhere.
After that, run the following command to generate the dataset for the experiment:
$ python preprocessing.py --samples_per_audio 20 \
--out_dir ljspeech \
--data_dir path/to/ljspeech/dataset \
--n_workers 4
$ python train.py --out_dir ${output_directory}
For more training options, please run:
$ python train.py --help
$ python gen_spec.py -i sample.wav -o out.npz
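For context, the spectrogram that gen_spec.py saves could be sketched roughly as below. This is an assumption about the pipeline, not the repo's actual code: the FFT size, hop length, and the `spec` key in the `.npz` file are hypothetical parameters chosen for illustration.

```python
import numpy as np

def stft_magnitude(wav, n_fft=1024, hop=256):
    """Magnitude spectrogram via a simple framed FFT with a Hann window.
    Returns an array of shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wav) - n_fft) // hop
    frames = np.stack([wav[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1)).T

# A 1-second 440 Hz tone at 22050 Hz stands in for sample.wav.
sr = 22050
t = np.arange(sr) / sr
wav = np.sin(2 * np.pi * 440.0 * t)

spec = stft_magnitude(wav)
# Save in an .npz container; the key name here is a guess, not
# necessarily what synthesis.py expects.
np.savez("out.npz", spec=spec)
```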
$ python synthesis.py --model_path path/to/checkpoint \
--spec_path out.npz \
--out_path out.wav
You can get my pre-trained model here.
This implementation uses code from NVIDIA, Ryuichi Yamamoto, and Keith Ito, as noted in the source code.