# RNN Transducer

MXNet implementation of the RNN Transducer (Graves 2012): *Sequence Transduction with Recurrent Neural Networks*.
## Compile the RNNT loss

Follow the instructions here to compile MXNet with the RNNT loss.
## Extract features

1. Link the Kaldi TIMIT example dirs (`local`, `steps`, `utils`).
2. Execute `run.sh` to extract the 40-dim fbank features.
3. Run `feature_transform.sh` to get the 123-dim features described in Graves (2013).
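The 123 dimensions come from the setup in Graves (2013): 40 mel filterbank coefficients plus energy (41 static features), together with their first and second temporal derivatives (41 × 3 = 123). The actual `feature_transform.sh` is not reproduced here; as a rough sketch of the same computation in NumPy (the `deltas` and `transform` helpers below are illustrative, not from this repo):

```python
import numpy as np

def deltas(feat, N=2):
    """Delta features via linear regression over a +/-N frame window
    (the standard Kaldi-style formula), with edge padding."""
    T, _ = feat.shape
    pad = np.pad(feat, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = np.zeros_like(feat)
    for n in range(1, N + 1):
        out += n * (pad[N + n:N + n + T] - pad[N - n:N - n + T])
    return out / denom

def transform(fbank, energy):
    """40 fbank + energy = 41 static dims; stacking the statics with
    their deltas and delta-deltas gives 41 * 3 = 123 dims per frame."""
    static = np.hstack([fbank, energy[:, None]])
    d1 = deltas(static)          # first derivatives
    d2 = deltas(d1)              # second derivatives
    return np.hstack([static, d1, d2])
```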
## Train the RNNT model

```
python train.py --lr 1e-3 --bi --dropout .5 --out exp/rnnt_bi_lr1e-3 --schedule
```

By default this trains the RNNT model only.
## Evaluation

Greedy decoding:

```
python eval.py <path to best model parameters> --bi
```

Beam search decoding:

```
python eval.py <path to best model parameters> --bi --beam <beam size>
```
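For reference, greedy RNN-T decoding advances one encoder frame at a time and keeps emitting labels from the joint network until it predicts blank. The sketch below is illustrative, not the decoder in `eval.py`; `joint(frame, y_prev)` is a hypothetical stand-in for the prediction + joint networks:

```python
import numpy as np

BLANK = 0

def rnnt_greedy_decode(joint, enc, max_symbols=10):
    """Greedy RNN-T decoding over encoder frames enc of shape (T, H).
    joint(frame, y_prev) returns a score vector over the vocabulary;
    max_symbols caps the labels emitted per frame to avoid looping."""
    hyp = []
    y_prev = BLANK
    for frame in enc:                   # advance over time frames
        for _ in range(max_symbols):    # emit labels until blank
            k = int(np.argmax(joint(frame, y_prev)))
            if k == BLANK:
                break                   # blank: move to the next frame
            hyp.append(k)
            y_prev = k                  # condition on the last label
    return hyp

# Toy joint for demonstration: emits the frame's dominant label once,
# then blank (purely synthetic, just to exercise the decode loop).
def toy_joint(frame, y_prev, vocab=4):
    scores = np.full(vocab, -1.0)
    lab = int(frame.argmax()) + 1
    scores[lab if y_prev != lab else BLANK] = 1.0
    return scores
```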
## Results

### CTC

| Decode | PER |
| --- | --- |
| greedy | 20.36 |
| beam 100 | 20.03 |
### Transducer

| Decode | PER |
| --- | --- |
| greedy | 20.74 |
| beam 40 | 19.84 |