Mongolian Speech Recognition

Mongolian speech recognition with PyTorch

An online demo trained with a Mongolian proprietary dataset (WER 8%): https://chimege.mn/.

In this repo, the following papers are implemented:

This repo is partially based on:

Training

  1. Install PyTorch>=1.3 with conda
  2. Install remaining dependencies: pip install -r requirements.txt
  3. Download the Mongolian Bible dataset: cd datasets && python dl_mbspeech.py
  4. Precompute the mel spectrograms: python preprop_dataset.py --dataset mbspeech
  5. Train: python train.py --model crnn --max-epochs 50 --dataset mbspeech --lr-warmup-steps 100
    • TensorBoard logs are saved in the logdir folder
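
The --lr-warmup-steps option in step 5 suggests that the learning rate is ramped up over the first optimizer steps. The exact schedule used by train.py is not described here, so the following is only a minimal sketch of a linear warmup; the function name warmup_lr and the base learning rate 0.01 are hypothetical.

```python
def warmup_lr(step: int, base_lr: float, warmup_steps: int) -> float:
    """Linearly ramp the learning rate up to base_lr over warmup_steps.

    Illustrative only: the schedule in train.py may differ.
    """
    if warmup_steps <= 0:
        return base_lr
    # step is 0-based; using step + 1 gives the first update a non-zero rate.
    scale = min(1.0, (step + 1) / warmup_steps)
    return base_lr * scale


if __name__ == "__main__":
    # base_lr=0.01 is an assumed value, not taken from the repo.
    for step in (0, 49, 99, 500):
        print(step, warmup_lr(step, base_lr=0.01, warmup_steps=100))
```

With warmup_steps=100, the rate reaches base_lr at the 100th update and stays there afterwards.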

Results

During training, the ground-truth and recognized texts are logged to TensorBoard. Because the dataset contains only a single speaker, the predicted texts for the validation set should already be recognizable after a few epochs:

EXPECTED:

аливаа цус хувцсан дээр үсрэхэд цус үсэрсэн хэсгийг та нар ариун газарт угаагтун

PREDICTED:

аливаа цус хувцсан дээр үсэрхэд цус усарсан хэсхийг та нар ариун газарт угаагтун
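
To put a number on how close such a prediction is, one common choice is the word error rate (WER): the word-level edit distance divided by the number of reference words. The sketch below is a plain-Python illustration and is not taken from this repo; the helper names edit_distance, wer and cer are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (two-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

For the pair above, three of the thirteen words differ (the fifth, seventh and eighth), so wer(expected, predicted) would be 3/13, roughly 23%.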

For fun, you can also generate audio with a Mongolian TTS system and try to recognize it. The following commands generate audio with the TTS of the Mongolian National University and run speech recognition on the generated audio:

# generate audio for 'Миний төрсөн нутаг Монголын сайхан орон'
wget -O test.wav "http://172.104.34.197/nlp-web-demo/tts?voice=1&text=Миний төрсөн нутаг Монголын сайхан орон."
# speech recognition on that TTS generated audio
python transcribe.py --checkpoint=logdir/mbspeech_crnn_sgd_wd1e-05/epoch-0050.pth --model=crnn test.wav
# will output: 'миний төрсөн нут мөнголын сайхан оөрулн'
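
Without a language model, the output above is probably produced frame by frame from the acoustic model alone. Assuming the model is trained with CTC (which the use of parlance/ctcdecode further down suggests), a greedy decode collapses repeated labels and drops the blank symbol. The sketch below illustrates that idea only; the label set, the blank index 0 and the function name ctc_greedy_decode are assumptions, not the repo's actual code.

```python
def ctc_greedy_decode(frame_ids, labels, blank=0):
    """Collapse repeated frame labels, then drop blanks (greedy CTC decoding).

    frame_ids: per-frame argmax label indices from the acoustic model.
    labels:    index -> character table (hypothetical alphabet).
    blank:     index of the CTC blank symbol (assumed to be 0 here).
    """
    chars = []
    prev = None
    for idx in frame_ids:
        # Emit a character only when the label changes and is not the blank.
        if idx != prev and idx != blank:
            chars.append(labels[idx])
        prev = idx
    return "".join(chars)


if __name__ == "__main__":
    labels = ["_", "н", "у", "т"]  # toy alphabet; "_" stands for the blank
    print(ctc_greedy_decode([0, 1, 1, 0, 2, 2, 2, 3, 0, 0], labels))
```

Because each frame is decided independently, spelling errors like those in the output above are typical, which is what the language model in the next step is meant to reduce.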

It is also possible to use a KenLM binary model. First download it from tugstugi/mongolian-nlp. After that, install parlance/ctcdecode. Now you can transcribe with the language model:

python transcribe.py --checkpoint=path/to/checkpoint --lm=mn_5gram.binary --alpha=0.3 test.wav
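
The --alpha value controls how much the language model influences the result. As a rough sketch of the idea, a beam-search decoder with an n-gram LM typically ranks hypotheses by the acoustic log-probability plus alpha times the LM log-probability, often with a word-count bonus weighted by beta. The scores below are made up for illustration, and fused_score and pick_best are hypothetical helpers, not part of transcribe.py or ctcdecode.

```python
def fused_score(acoustic_logp, lm_logp, num_words, alpha=0.3, beta=0.0):
    """Combine acoustic and language model scores (shallow fusion)."""
    return acoustic_logp + alpha * lm_logp + beta * num_words


def pick_best(candidates, alpha=0.3, beta=0.0):
    """Return the text of the candidate with the highest fused score."""
    best = max(
        candidates,
        key=lambda c: fused_score(
            c["acoustic_logp"], c["lm_logp"], len(c["text"].split()), alpha, beta
        ),
    )
    return best["text"]


if __name__ == "__main__":
    # Invented scores: the misspelled candidate is slightly preferred by the
    # acoustic model, but the LM finds it much less likely.
    candidates = [
        {"text": "нут мөнголын", "acoustic_logp": -4.0, "lm_logp": -12.0},
        {"text": "нутаг монголын", "acoustic_logp": -4.5, "lm_logp": -6.0},
    ]
    print(pick_best(candidates, alpha=0.0))  # acoustic score only
    print(pick_best(candidates, alpha=0.3))  # with LM weighting
```

With alpha=0 the ranking depends only on the acoustic score; raising alpha shifts the choice toward word sequences that the language model considers likely.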

Contribute

If you are Mongolian and want to help us, please record your voice on Common Voice.

Open Source Agenda is not affiliated with "Mongolian Speech Recognition" Project. README Source: tugstugi/mongolian-speech-recognition
