Mongolian Speech Recognition

Mongolian speech recognition with PyTorch

An online demo trained on a proprietary Mongolian dataset (WER 8%) is available at https://chimege.mn/.

This repo implements the following papers:

QuartzNet: Deep Automatic Speech Recognition with 1D Time-Channel Separable Convolutions
An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition
- speech recognition as optical character recognition
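
Models from both papers are typically trained with a CTC objective, so the network emits one symbol distribution per audio frame and a decoder turns that into text. The following is a minimal sketch of greedy CTC decoding, not the decoder used in this repo; the toy alphabet, the blank index of 0, and the function name `ctc_greedy_decode` are assumptions made for illustration.

```python
from typing import List, Sequence

BLANK = 0  # assumption: index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      alphabet: str, blank: int = BLANK) -> str:
    """Pick the best symbol per frame, merge repeats, then drop blanks."""
    best_path: List[int] = [
        max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores
    ]
    chars = []
    previous = None
    for index in best_path:
        # A symbol is emitted only when it differs from the previous frame
        # and is not the blank.
        if index != previous and index != blank:
            chars.append(alphabet[index])
        previous = index
    return "".join(chars)


# Toy example with made-up scores: "_" stands for the blank, then "а" and "б".
alphabet = "_аб"
scores = [
    [0.1, 0.8, 0.1],    # а
    [0.1, 0.7, 0.2],    # а again (repeat, merged with the previous frame)
    [0.9, 0.05, 0.05],  # blank
    [0.2, 0.7, 0.1],    # а (kept, because a blank separates it)
    [0.1, 0.1, 0.8],    # б
]
decoded = ctc_greedy_decode(scores, alphabet)
print(decoded)  # ааб
```

A beam search decoder keeps several candidate paths per frame instead of only the best one, which is what makes language model scoring possible.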

This repo is partially based on:

decoder from SeanNaren/deepspeech.pytorch
Jasper/QuartzNet blocks from NVIDIA/NeMo

Training

Install PyTorch>=1.3 with conda
Install remaining dependencies: pip install -r requirements.txt
Download the Mongolian Bible dataset: cd datasets && python dl_mbspeech.py
Precompute the mel spectrograms: python preprop_dataset.py --dataset mbspeech
Train: python train.py --model crnn --max-epochs 50 --dataset mbspeech --lr-warmup-steps 100
- TensorBoard logs are saved in the logdir folder
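
The `--lr-warmup-steps 100` flag in the train command suggests that the learning rate is ramped up over the first 100 optimizer steps. The exact schedule used by train.py is not documented here, so the following is only a sketch of a common linear warmup; the function name `warmup_lr` and the base learning rate of 0.01 are hypothetical.

```python
def warmup_lr(step: int, base_lr: float, warmup_steps: int) -> float:
    """Linear warmup: scale the learning rate from base_lr / warmup_steps
    up to base_lr over the first warmup_steps steps, then hold it."""
    if warmup_steps <= 0 or step >= warmup_steps:
        return base_lr
    return base_lr * (step + 1) / warmup_steps


# Hypothetical base learning rate, chosen only for illustration.
base_lr = 0.01
for step in (0, 49, 99, 500):
    print(step, warmup_lr(step, base_lr, warmup_steps=100))
```

Starting with a small learning rate tends to stabilize the first updates, when the weights are still random and gradients can be large.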

Results

During training, the ground truth and the recognized texts are logged to TensorBoard. Because the dataset contains only a single speaker, the predicted texts for the validation set should already be recognizable after a few epochs:

EXPECTED:

аливаа цус хувцсан дээр үсрэхэд цус үсэрсэн хэсгийг та нар ариун газарт угаагтун

PREDICTED:

аливаа цус хувцсан дээр үсэрхэд цус усарсан хэсхийг та нар ариун газарт угаагтун
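
To put a number on how close such a prediction is, the word error rate (WER) and character error rate (CER) can be computed from the edit distance between the two texts. This is a minimal sketch and not the repo's own evaluation code; the helper names `edit_distance`, `wer`, and `cer` are chosen for illustration, and a non-empty reference is assumed.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    previous = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        current = [i]
        for j, h in enumerate(hyp, start=1):
            current.append(min(
                previous[j] + 1,             # deletion
                current[j - 1] + 1,          # insertion
                previous[j - 1] + (r != h),  # substitution or match
            ))
        previous = current
    return previous[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


expected = "аливаа цус хувцсан дээр үсрэхэд цус үсэрсэн хэсгийг та нар ариун газарт угаагтун"
predicted = "аливаа цус хувцсан дээр үсэрхэд цус усарсан хэсхийг та нар ариун газарт угаагтун"

# 3 of the 13 reference words are wrong, so the WER is 3/13.
print(f"WER: {wer(expected, predicted):.3f}")
print(f"CER: {cer(expected, predicted):.3f}")
```

The CER is much lower than the WER here because each wrong word differs from the reference by only a few characters.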

For fun, you can also generate audio with a Mongolian TTS and try to recognize it. The following commands generate audio with the TTS of the Mongolian National University and run speech recognition on the generated audio:

# generate audio for 'Миний төрсөн нутаг Монголын сайхан орон'
wget -O test.wav "http://172.104.34.197/nlp-web-demo/tts?voice=1&text=Миний төрсөн нутаг Монголын сайхан орон."
# speech recognition on that TTS generated audio
python transcribe.py --checkpoint=logdir/mbspeech_crnn_sgd_wd1e-05/epoch-0050.pth --model=crnn test.wav
# will output: 'миний төрсөн нут мөнголын сайхан оөрулн'

It is also possible to use a KenLM binary model. First download it from tugstugi/mongolian-nlp. After that, install parlance/ctcdecode. Now you can transcribe with the language model:

python transcribe.py --checkpoint=path/to/checkpoint --lm=mn_5gram.binary --alpha=0.3 test.wav
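
The `--alpha` value is the weight given to the language model. As a rough sketch of the idea, and not of how ctcdecode is implemented internally, a decoder can rank candidate transcripts by the acoustic log-probability plus alpha times the language model log-probability. The candidates and all scores below are made up for illustration, and the function names are hypothetical.

```python
from typing import List, Tuple

# (transcript, acoustic log-probability, language model log-probability)
Candidate = Tuple[str, float, float]


def fused_score(acoustic_logp: float, lm_logp: float, alpha: float) -> float:
    """Combine the acoustic score with the LM score weighted by alpha."""
    return acoustic_logp + alpha * lm_logp


def rescore(candidates: List[Candidate], alpha: float) -> str:
    """Return the transcript with the highest fused score."""
    best = max(candidates, key=lambda c: fused_score(c[1], c[2], alpha))
    return best[0]


# Made-up scores: the acoustic model slightly prefers the truncated word,
# while the language model strongly prefers the real word.
candidates = [
    ("миний төрсөн нут", -4.0, -12.0),
    ("миний төрсөн нутаг", -4.5, -6.0),
]

print(rescore(candidates, alpha=0.0))  # acoustic score only
print(rescore(candidates, alpha=0.3))  # language model weighted in
```

With alpha set to 0 the language model is ignored; larger values let it override the acoustic model more often. Real beam search decoders usually apply this scoring during the search and often add a word count bonus as well.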

Contribute

If you are Mongolian and want to help us, please record your voice on Common Voice.

README Source: tugstugi/mongolian-speech-recognition
