A PyTorch implementation of FFTNet, described here. Work in progress.
Install the requirements:

```
pip install -r requirements.txt
```
Download the CMU_ARCTIC dataset.
Train the model and save it. The default parameters are mostly the same as in the original paper. Pass the --preprocess flag the first time you run it:
```
python train.py \
--preprocess \
--wav_dir your_downloaded_wav_dir \
--data_dir preprocessed_feature_dir \
--model_file saved_model_name
```
Decode a wav file with the trained model:

```
python decode.py \
--infile wav_file \
--outfile reconstruct_file_name \
--data_dir preprocessed_feature_dir \
--model_file saved_model_name
```
FFTNet_generator and FFTNet_vocoder are two files I used to test that the model works, using the torchaudio yesno dataset.
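For reference, the yesno dataset can be fetched directly through torchaudio (a minimal sketch assuming a recent torchaudio; the root path is just a placeholder):

```python
import torchaudio

# Download the small yesno dataset used by the two test scripts.
dataset = torchaudio.datasets.YESNO(root='./data', download=True)

# Each item is (waveform, sample_rate, labels); the labels are the
# eight yes/no words spoken in the clip.
waveform, sample_rate, labels = dataset[0]
print(waveform.shape, sample_rate, labels)
```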
Some decoded files are in the samples folder.
Use the --radixs flag to specify the radix of each layer:
```
# a radix-4 FFTNet with a 1024-sample receptive field
python train.py --radixs 4 4 4 4 4
```
The original FFTNet uses a radix-2 structure. In my experiments, a radix-4 (or even radix-8) network still achieved similar results, and because it needs fewer layers, it runs faster.
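Since each layer multiplies the receptive field by its radix, the total receptive field is just the product of the radixes. A minimal sketch of that arithmetic (my own illustration, not code from this repo):

```python
# Receptive field of an FFTNet-style stack: the product of the radixes.
from functools import reduce
from operator import mul

def receptive_field(radixs):
    return reduce(mul, radixs, 1)

print(receptive_field([2] * 11))  # 2048: eleven radix-2 layers, as in the paper
print(receptive_field([4] * 5))   # 1024: the radix-4 example above, with half the layers
print(receptive_field([8] * 4))   # 4096: radix-8 reaches even further with 4 layers
```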
Fig. 2 in the paper can be redrawn as a dilated structure with kernel size 2 (which also means radix 2).
If we draw all the lines, and then transpose the graph so the arrows point backward, you will find a WaveNet dilated structure.
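As a rough sketch of that equivalence (my own illustration, not this repo's model code), a radix-2 FFTNet can be viewed as a stack of kernel-size-2 dilated convolutions whose dilation halves at each layer; reversing the dilation order gives the WaveNet-style version:

```python
import torch
import torch.nn as nn

def dilated_stack(receptive_field=1024, channels=128, transpose=False):
    """Kernel-size-2 (radix-2) dilated conv stack.
    transpose=False: FFTNet-style order (dilation N/2, N/4, ..., 1).
    transpose=True:  WaveNet-style order (dilation 1, 2, ..., N/2).
    """
    dilations = []
    d = receptive_field // 2
    while d >= 1:
        dilations.append(d)
        d //= 2
    if transpose:
        dilations.reverse()
    layers, in_ch = [], 1
    for d in dilations:
        layers += [nn.Conv1d(in_ch, channels, kernel_size=2, dilation=d), nn.ReLU()]
        in_ch = channels
    return nn.Sequential(*layers)

x = torch.randn(1, 1, 1024)                     # one 1024-sample window
print(dilated_stack(transpose=False)(x).shape)  # torch.Size([1, 128, 1])
print(dilated_stack(transpose=True)(x).shape)   # same shape either way
```

Either ordering covers the same 1024-sample receptive field; only the direction in which the dilations grow differs.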
Add the --transpose flag to get a simplified version of WaveNet:
```
# a WaveNet-like model without gated/residual/skip units
python train.py --transpose
```
In my experiments, the transposed models are easier to train and reach a slightly lower training loss than FFTNet.