Kaldi Io For Python

Python functions for reading Kaldi data formats. Useful for rapid prototyping in Python.

Project README


'Glue' code connecting Kaldi data and Python.

Supported data types

  • Vector (integer)
  • Vector (float, double)
  • Matrix (float, double)
  • Posterior (posteriors, nnet1 training targets, confusion networks, ...)
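
To give a feel for what the float Matrix type looks like on disk, the sketch below writes and re-reads one binary ark entry using only the standard library. The layout (key, space, `\0B` marker, `FM ` type token, two size-prefixed int32 dimensions, then row-major float32 data) is an assumption based on the commonly documented Kaldi binary format, not something this README states, and little-endian byte order is assumed. The helpers `write_float_matrix` and `read_float_matrix` are hypothetical names for illustration; in practice `kaldi_io.write_mat` and `kaldi_io.read_mat` should be used instead.

```python
import io
import struct


def write_float_matrix(fd, key, rows):
    """Write one float32 matrix (a list of equal-length rows) as a binary ark entry."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    fd.write((key + " ").encode("latin1"))          # utterance key, then a space
    fd.write(b"\0B")                                # binary-mode marker
    fd.write(b"FM ")                                # float matrix token (assumed layout)
    fd.write(b"\x04" + struct.pack("<i", n_rows))   # size byte, then int32 row count
    fd.write(b"\x04" + struct.pack("<i", n_cols))   # size byte, then int32 column count
    for row in rows:
        fd.write(struct.pack("<%df" % n_cols, *row))  # row-major float32 values


def read_float_matrix(fd):
    """Read back one entry written by write_float_matrix; returns (key, rows)."""
    key = b""
    while True:
        ch = fd.read(1)
        if ch in (b" ", b""):                       # key ends at the first space
            break
        key += ch
    if fd.read(2) != b"\0B" or fd.read(3) != b"FM ":
        raise ValueError("not a binary float matrix")
    _, n_rows, _, n_cols = struct.unpack("<bibi", fd.read(10))
    rows = [
        list(struct.unpack("<%df" % n_cols, fd.read(4 * n_cols)))
        for _ in range(n_rows)
    ]
    return key.decode("latin1"), rows


# Round trip through an in-memory buffer; the key and values are made up.
buf = io.BytesIO()
write_float_matrix(buf, "utt1", [[0.5, 1.5], [2.0, -4.0]])
buf.seek(0)
key, mat = read_float_matrix(buf)
print(key, mat)
```

The sketch covers only uncompressed float matrices; the other supported types use different headers and are best handled through the library's own functions.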


Reading feature scp example:

import kaldi_io
# 'file' is the path to a feature scp file
for key, mat in kaldi_io.read_mat_scp(file):
    print(key, mat.shape)  # mat is a numpy matrix of features

Writing feature ark to file/stream:

import kaldi_io
# 'feats' is a dict mapping utterance keys to numpy matrices
with open(ark_file, 'wb') as f:
    for key, mat in feats.items():
        kaldi_io.write_mat(f, mat, key=key)

Writing features as 'ark,scp' by pipeline with 'copy-feats':

import kaldi_io
# The pipe sends the written matrices to copy-feats, which produces the ark and scp files
ark_scp_output = 'ark:| copy-feats --compress=true ark:- ark,scp:data/feats2.ark,data/feats2.scp'
with kaldi_io.open_or_fd(ark_scp_output, 'wb') as f:
    for key, mat in feats.items():
        kaldi_io.write_mat(f, mat, key=key)
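
For orientation, each line of a feature scp file pairs an utterance key with the location of its data, usually an ark path followed by a byte offset. A minimal sketch of splitting such a line is shown below. The helper `parse_scp_line` is hypothetical and not part of kaldi_io, and the example path is made up; the library does its own parsing inside `read_mat_scp`.

```python
def parse_scp_line(line):
    """Split one scp line into (key, path, offset); offset is None if absent."""
    key, rxfile = line.strip().split(None, 1)
    path, offset = rxfile, None
    head, sep, tail = rxfile.rpartition(":")
    if sep and tail.isdigit():          # trailing ':<digits>' is a byte offset
        path, offset = head, int(tail)
    return key, path, offset


# Illustrative line; real scp files are produced by Kaldi tools such as copy-feats.
print(parse_scp_line("utt1 data/feats.ark:5"))
```

Keeping the offset separate from the path makes it possible to seek directly to one utterance instead of scanning the whole ark.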


Install

  • from PyPI:
pip install kaldi_io
  • from sources:
git clone https://github.com/vesis84/kaldi-io-for-python.git <kaldi-io-dir>
cd <kaldi-io-dir>
pip install -r requirements.txt
pip install --editable .

Note: it is recommended to set the KALDI_ROOT environment variable (export KALDI_ROOT=<some_kaldi_dir>). Pipe-based I/O can then call Kaldi binaries.
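
The same setup can be sketched from within Python, for cases where the shell environment cannot be changed. This is only an illustration under stated assumptions: the helper `kaldi_env` is hypothetical, `/opt/kaldi` is a made-up location, and `src/bin` and `src/featbin` are assumed to be where the needed binaries live in a Kaldi checkout. How kaldi_io itself uses KALDI_ROOT is not described in this README.

```python
import os


def kaldi_env(env, kaldi_root, subdirs=("src/bin", "src/featbin")):
    """Return a copy of env with KALDI_ROOT set and Kaldi binary dirs prepended to PATH."""
    new_env = dict(env)
    bin_dirs = [os.path.join(kaldi_root, d) for d in subdirs]
    old_path = [env["PATH"]] if env.get("PATH") else []
    new_env["KALDI_ROOT"] = kaldi_root
    new_env["PATH"] = os.pathsep.join(bin_dirs + old_path)
    return new_env


# Build the environment from a small example mapping (paths are assumptions).
env = kaldi_env({"PATH": "/usr/bin"}, "/opt/kaldi")
print(env["KALDI_ROOT"])
print(env["PATH"])

# To apply it to the current process before importing kaldi_io:
# os.environ.update(kaldi_env(os.environ, "/opt/kaldi"))
```

Returning a new mapping rather than mutating `os.environ` directly keeps the helper easy to test and leaves the decision to apply it with the caller.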

Unit tests

(Note: the tests are not included in the PyPI package.)

Unit tests can be started with unittest discovery from the repository root:

python3 -m unittest discover -s tests -t .
python2 -m unittest discover -s tests -t .


Apache License, Version 2.0 ('LICENSE-2.0.txt')


  • accepting pull requests with extensions on GitHub
  • accepting feedback via GitHub 'Issues' in the repo
README Source: KarelVesely84/kaldi-io-for-python