LAS Mandarin PyTorch

Listen, Attend and Spell (LAS) model and a pretrained Chinese Mandarin ASR model


This code is a PyTorch implementation of the paper Listen, Attend and Spell, an end-to-end automatic speech recognition (ASR) model.

It also provides a pretrained Chinese Mandarin ASR model.

  • Dataset
  • Usage
    • generate vocab file
    • training
    • test
    • infer
  • Demo


Google Blog Page

Improving End-to-End Models For Speech Recognition

The LAS architecture consists of 3 components. The listener encoder component, which is similar to a standard AM (acoustic model), takes a time-frequency representation of the input speech signal, x, and uses a set of neural network layers to map the input to a higher-level feature representation, h^enc. The output of the encoder is passed to an attender, which uses h^enc to learn an alignment between input features x and predicted subword units {y_n, …, y_0}, where each subword is typically a grapheme or wordpiece. Finally, the output of the attention module is passed to the speller (i.e., decoder), similar to an LM (language model), that produces a probability distribution over a set of hypothesized words.

Components of the LAS End-to-End Model.
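The attender step described above can be illustrated with a small, dependency-free sketch. It is not this repository's model code: it assumes plain dot-product scoring between a decoder state and each encoder frame, and the names `attend`, `h_enc` and `decoder_state` are hypothetical. Real LAS implementations usually add learned projections and run on tensors.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(h_enc: List[Vector], decoder_state: Vector) -> Tuple[List[float], Vector]:
    """Return (attention weights, context vector) for one decoding step.

    Assumption: dot-product scoring between the decoder state and each
    encoder frame; this is an illustrative choice, not the repo's method.
    """
    scores = [sum(h * s for h, s in zip(frame, decoder_state)) for frame in h_enc]
    weights = softmax(scores)
    dim = len(h_enc[0])
    # The context vector is the attention-weighted sum of encoder frames.
    context = [sum(w * frame[d] for w, frame in zip(weights, h_enc)) for d in range(dim)]
    return weights, context


# Three encoder frames of dimension 2 (toy values, not real speech features).
h_enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attend(h_enc, decoder_state=[2.0, 0.0])
```

The weights sum to 1 and are largest for the frames most similar to the decoder state; the speller would consume `context` together with its own state to predict the next subword.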

This repository contains:

  1. Model code that implements the paper.
  2. A vocab file generator, which you can use to generate a vocab file for your own dataset.
  3. Training scripts to train the model.
  4. Testing scripts to test the model.

Install the dependencies:

pip install -r requirements.txt



First, generate a vocab file from the dataset's transcripts file. Please refer to the vocab generation code in this repository. If you want to train on AISHELL data, the AISHELL version can be used directly.

python --input_file $DATA_DIR/data_aishell/transcript_v0.8.txt --output_file ./aishell_vocab.txt --mode character --vocab_size 5000

This creates a vocab file named aishell_vocab.txt in the current directory.
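As a rough illustration of what character-mode vocab generation involves, here is a minimal sketch. It is not the repository's script: the function name `build_char_vocab` is hypothetical, each transcript line is assumed to have the form "<utterance_id> <text>", and special tokens (such as start, end or unknown symbols) are left out.

```python
from collections import Counter
from typing import Iterable, List


def build_char_vocab(lines: Iterable[str], vocab_size: int) -> List[str]:
    """Return up to `vocab_size` characters, most frequent first.

    Assumes each line is "<utterance_id> <text>" and that spaces inside
    the text are only word separators, so they are dropped.
    """
    counts: Counter = Counter()
    for line in lines:
        parts = line.strip().split(maxsplit=1)
        if len(parts) < 2:
            continue  # skip blank lines and lines with no transcript text
        text = parts[1].replace(" ", "")
        counts.update(text)  # counts each character of the transcript
    # Sort by descending count, then by character, so output is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [char for char, _ in ranked[:vocab_size]]


# Toy transcript lines (made up for illustration).
sample = [
    "utt001 今天 天气 好",
    "utt002 天气 不 好",
]
vocab = build_char_vocab(sample, vocab_size=3)
```

Writing one entry of `vocab` per line would produce a file in the spirit of aishell_vocab.txt, though the actual file layout expected by the training code may differ.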


Before training, you need to write your dataset code in the dataset package.

If you want to use my AISHELL dataset code, you should also check the transcripts file path in data/ line 26:

src_file = "/data/Speech/SLR33/data_aishell/" + "transcript/aishell_transcript_v0.8.txt"
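One way to avoid editing the hard-coded path on each machine is to build it from a base directory. The sketch below is only a suggestion, not code from this repository: the helper `resolve_transcript_path` is hypothetical, and reading the `DATA_DIR` environment variable assumes it points at the directory that contains data_aishell, as in the vocab command above.

```python
import os
from typing import Optional


def resolve_transcript_path(base_dir: Optional[str] = None) -> str:
    """Build the transcript path from a configurable base directory.

    Falls back to the DATA_DIR environment variable, then to the
    hard-coded directory used in the original line.
    """
    if base_dir is None:
        base_dir = os.environ.get("DATA_DIR", "/data/Speech/SLR33")
    return os.path.join(
        base_dir, "data_aishell", "transcript", "aishell_transcript_v0.8.txt"
    )


src_file = resolve_transcript_path("/data/Speech/SLR33")
# Checking early gives a clearer error than a failure deep inside training.
transcript_found = os.path.isfile(src_file)
```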

When ready, let's train:

python --config ./config/aishell_asr_example_lstm4atthead1.yaml

You can write your own config file; use config/aishell_asr_example_lstm4atthead1.yaml as a reference.

The variables specific to your setup are the corpus path and vocab_file.
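Before launching a long training run, it can help to confirm that these two settings are filled in. The sketch below works on an already loaded config dictionary; the flat layout and the key name `corpus_path` are assumptions made for illustration (only `vocab_file` is named in this README), and the real YAML file may nest these values differently.

```python
from typing import Dict, List


def missing_settings(config: Dict[str, object], required: List[str]) -> List[str]:
    """Return the required keys that are absent or empty in `config`."""
    return [key for key in required if not config.get(key)]


# Hypothetical flat config; the real YAML file may nest these keys differently.
config = {
    "corpus_path": "/data/Speech/SLR33/data_aishell",
    "vocab_file": "",
}
problems = missing_settings(config, ["corpus_path", "vocab_file"])
```

Here `problems` lists `vocab_file` because its value is empty, which is the kind of mistake that is cheaper to catch before training starts.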


python --config ./config/aishell_asr_example_lstm4atthead1.yaml --test
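For character-level Mandarin output, a common way to score test results is the character error rate (CER). Whether this repository's test mode reports CER is an assumption; the sketch below only shows how the metric is usually computed, using a standard Levenshtein distance, with hypothetical function names.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# One inserted character against a five-character reference.
score = cer("今天天气好", "今天天气不好")
```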



Chinese Mandarin

A pretrained model trained on the AISHELL dataset.

Download it from Google Drive.







  1. Listen, Attend and Spell, W Chan et al.
  2. Neural Machine Translation of Rare Words with Subword Units, R Sennrich et al.
  3. Attention-Based Models for Speech Recognition, J Chorowski et al.
  4. Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks, A Graves et al.
  5. Joint CTC-Attention based End-to-End Speech Recognition using Multi-task Learning, S Kim et al.
  6. Advances in Joint CTC-Attention based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM, T Hori et al.


If this project helps you reduce development time, you can buy me a cup of coffee :)






MIT © Kun

Open Source Agenda is not affiliated with "LAS Mandarin PyTorch" Project. README Source: jackaduma/LAS_Mandarin_PyTorch
