Code: Completely Unsupervised Speech Recognition By A Generative Adversarial Network Harmonized With Iteratively Refined Hidden Markov Models
This is the implementation of our paper. In the paper, we propose an unsupervised speech (phoneme) recognition system that achieves a 33.1% phoneme error rate on TIMIT. The method uses a GAN-based model for unsupervised phoneme recognition, together with a set of HMMs that work in harmony with the GAN.
tensorflow 1.13
kaldi
srilm (can be built with kaldi/tools/install_srilm.sh)
librosa
Modify path.sh with your path of Kaldi and srilm.
Modify config.sh with your code path and timit path.
$ bash preprocess.sh
This script extracts features and splits the dataset into train and test sets.
It also generates the data needed by the WFST decoder.
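As a rough illustration of the kind of settings path.sh carries, here is a minimal sketch. The variable names (KALDI_ROOT, SRILM_BIN) and the default location are assumptions, not the contents of the repository's file; check the actual path.sh for the names it expects.

```shell
# Hypothetical sketch of path.sh; variable names are assumptions, not the repository's.
# Point KALDI_ROOT at your own Kaldi checkout.
export KALDI_ROOT="${KALDI_ROOT:-$HOME/kaldi}"

# SRILM built by kaldi/tools/install_srilm.sh usually lands under tools/srilm/bin;
# the i686-m64 subdirectory is typical on 64-bit Linux but may differ on your machine.
SRILM_BIN="$KALDI_ROOT/tools/srilm/bin"
export PATH="$SRILM_BIN:$SRILM_BIN/i686-m64:$PATH"

# Quick sanity check: warn if ngram-count (an SRILM tool) is not reachable.
if ! command -v ngram-count >/dev/null 2>&1; then
  echo "warning: ngram-count not found; check KALDI_ROOT and the SRILM build" >&2
fi
```

If the warning appears, the SRILM build most likely sits in a different machine-type subdirectory than the one assumed here.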
Modify config.sh and src/GAN-based-model/config.yaml.
$ bash run.sh
This script contains the training flow for the GAN-based model and the HMM model.
The GAN-based model generates the transcriptions used to train the HMM model.
The HMM model refines the phoneme boundaries used to train the GAN-based model.
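The alternation between the two models can be pictured with the sketch below. It is only an illustration of the loop structure: train_gan, train_hmm, and run_flow are hypothetical stubs that print what each stage would do, and they are not the commands run.sh actually calls.

```shell
#!/usr/bin/env bash
# Sketch of the alternating training flow; every function here is a hypothetical stub.

train_gan() { echo "iter $1: train GAN with boundaries '$2' -> transcriptions"; }
train_hmm() { echo "iter $1: train HMMs on GAN transcriptions -> refined boundaries"; }

run_flow() {
  local n_iter=$1 bnd=$2
  local i
  for ((i = 1; i <= n_iter; i++)); do
    train_gan "$i" "$bnd"
    train_hmm "$i"
    bnd="hmm_iter$i"   # boundaries refined by the HMMs feed the next GAN round
  done
}

run_flow 2 uns
```

The first GAN round starts from the initial boundaries (orc or uns), and every later round uses the boundaries produced by the previous HMM stage.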
config.sh
bnd_type: type of initial phoneme boundaries (orc/uns).
setting: matched or nonmatched case in our paper (match/nonmatch).
jobs: number of jobs to run in parallel (depends on your device).
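For illustration, the options above might be set as follows. The values are examples only, and check_config is a hypothetical helper that is not part of the repository; it simply rejects values outside the choices listed.

```shell
# Example values for the three options described above; adjust to your setup.
bnd_type=uns      # orc or uns
setting=match     # match or nonmatch
jobs=8            # number of parallel jobs

# Hypothetical helper (not part of the repository) that rejects invalid values.
check_config() {
  case "$1" in orc|uns) ;; *) echo "bnd_type must be orc or uns" >&2; return 1 ;; esac
  case "$2" in match|nonmatch) ;; *) echo "setting must be match or nonmatch" >&2; return 1 ;; esac
  case "$3" in ''|*[!0-9]*|0*) echo "jobs must be a positive integer" >&2; return 1 ;; esac
}

check_config "$bnd_type" "$setting" "$jobs" && echo "config ok"
```

A check like this fails early on a mistyped option instead of partway through a long training run.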
Completely Unsupervised Speech Recognition By A Generative Adversarial Network Harmonized With Iteratively Refined Hidden Markov Models, Kuan-Yu Chen, Che-Ping Tsai et al.
Special thanks to Che-Ping Tsai (jackyyy0228) for the Kaldi parts! Special thanks to Sung-Feng Huang (b02901071) for the PyTorch version!