A suite of speech signal processing tools

SPTK

The Speech Signal Processing Toolkit (SPTK) is a suite of speech signal processing tools.

Older version: SPTK3
PyTorch version: diffsptk

What is SPTK?

SPTK consists of over 100 commands for speech signal processing.
SPTK uses a raw, headerless data format: a file is simply a sequence of values with no additional structure. Because of this, file contents can be inspected directly from the command line.
```
dmp +s data.raw
```
The data used in the commands is passed through standard input/output. We can chain multiple processes using pipes.
```
x2x +sd < data.raw | clip | x2x +da | less
```
By default, the data type is little-endian double (8 bytes per value).
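To make the format concrete, here is a minimal Python sketch of reading and writing such a stream outside SPTK. It uses only the standard `struct` module; the function names are hypothetical and are not part of SPTK. `shorts_to_doubles` mirrors the idea behind `x2x +sd` (16-bit integers in, doubles out) and assumes little-endian input.

```python
import struct

DOUBLE_SIZE = 8  # bytes per value in the default SPTK data type


def pack_doubles(values):
    """Encode numbers as a headerless stream of little-endian 8-byte doubles."""
    floats = [float(v) for v in values]
    return struct.pack(f"<{len(floats)}d", *floats)


def unpack_doubles(payload):
    """Decode a headerless little-endian double stream into a list of floats."""
    count = len(payload) // DOUBLE_SIZE  # ignore any trailing partial value
    return list(struct.unpack(f"<{count}d", payload[: count * DOUBLE_SIZE]))


def shorts_to_doubles(payload):
    """Re-encode little-endian 16-bit integers as doubles (the idea behind `x2x +sd`)."""
    count = len(payload) // 2
    samples = struct.unpack(f"<{count}h", payload[: count * 2])
    return pack_doubles(samples)


if __name__ == "__main__":
    raw = pack_doubles([1.0, -0.5, 3.25])
    print(len(raw))             # 24 bytes: three values, no header
    print(unpack_doubles(raw))  # [1.0, -0.5, 3.25]
```

Because there is no header, the number of values is simply the file size divided by 8, and any vector length or frame structure has to be known by the caller.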
The commands do not require interactive user inputs. Parameters are set via command line options beforehand.
```
impulse -l 4 | sopr -m 10 | x2x +da
```
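For readers unfamiliar with the commands, the pipeline above can be mirrored in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical stand-ins for `impulse -l`, `sopr -m`, and `x2x +da`, and the exact text formatting of `x2x +da` is assumed rather than reproduced.

```python
def impulse(length):
    """Unit impulse of the given length: a one followed by zeros."""
    if length <= 0:
        return []
    return [1.0] + [0.0] * (length - 1)


def multiply(values, factor):
    """Multiply every value by a constant, like `sopr -m factor`."""
    return [v * factor for v in values]


def to_ascii(values):
    """Render one value per line (assumed to resemble `x2x +da` output)."""
    return "\n".join(f"{v:g}" for v in values)


if __name__ == "__main__":
    # Same data flow as: impulse -l 4 | sopr -m 10 | x2x +da
    print(to_ascii(multiply(impulse(4), 10)))
```

Each function consumes the output of the previous one, just as each command consumes the previous command's standard output.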

Documentation

See this page for a reference manual.
Our paper is available on the ISCA Archive.

Requirements

GCC 4.8.5+ / Clang 3.5.0+ / Visual Studio 2015+
CMake 3.1+

Installation

Linux / macOS

The latest release can be downloaded through Git. The install procedure is as follows.

```
git clone https://github.com/sp-nitech/SPTK.git
cd SPTK
make
```

The SPTK commands can then be used by adding the bin/ directory to the PATH environment variable. To use SPTK functions from your own program, link against the static library lib/libsptk.a.

Windows

You may need to add cmake and MSBuild to the PATH environment variable in advance. Run make.bat, or open Command Prompt and follow the procedure below:

```
rem Change this to the path of your SPTK checkout.
cd /path/to/SPTK
mkdir build
cd build
rem Change the install directory as needed.
cmake .. -DCMAKE_INSTALL_PREFIX=..
MSBuild /p:Configuration=Release INSTALL.vcxproj
```

Instead of running MSBuild, you can compile SPTK from the GUI by opening the generated project file. The SPTK functions can then be used by linking against the static library lib/sptk.lib.

Demonstration

Twitter
Analysis-synthesis via mel-cepstrum
Parametric coding via line spectral pairs

Examples

SPTK provides some examples. Go to an example directory and execute run.sh, e.g.,

```
cd egs/analysis_synthesis/mgc
./run.sh
```

The following simple example halves the amplitude of the audio in input.wav. You may need to install the sox command on your system.

```
sox -t wav input.wav -c 1 -t s16 -r 16000 - |
    x2x +sd | sopr -m 0.5 | x2x +ds -r |
    sox -c 1 -t s16 -r 16000 - -t wav output.wav
```
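For comparison, the same idea (decode 16-bit samples, multiply by 0.5, round, re-encode) can be sketched in Python with only the standard library. This is not how SPTK implements it; `scale_wav` is a hypothetical helper. Unlike the sox pipeline, it keeps the input's channel count and sampling rate instead of converting to mono 16 kHz, and Python's `round` may treat exact halves differently from `x2x -r`.

```python
import struct
import wave


def scale_wav(src_path, dst_path, gain=0.5):
    """Scale the 16-bit PCM samples of a WAV file by `gain` and write the result."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        if params.sampwidth != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = src.readframes(params.nframes)

    # Decode little-endian 16-bit integers (all channels interleaved).
    count = len(frames) // 2
    samples = struct.unpack(f"<{count}h", frames[: count * 2])

    # Scale in floating point, round, and clamp to the 16-bit range.
    scaled = [max(-32768, min(32767, round(s * gain))) for s in samples]

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(struct.pack(f"<{count}h", *scaled))


if __name__ == "__main__":
    scale_wav("input.wav", "output.wav", gain=0.5)
```

The clamp is a no-op for gains of at most 1, but it prevents integer overflow if the helper is reused with a gain above 1.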

If you would like to draw figures, please prepare a Python environment.

```
cd tools; make venv PYTHON_VERSION=3.8; cd ..
. ./tools/venv/bin/activate
impulse -l 32 | gseries impulse.png
deactivate
```

Changes from SPTK3

Input and output types have been changed from float to double
Signal processing classes are written in C++ instead of C
Drawing commands are implemented in Python
Some option names have been changed
No memory leaks
Thread-safe
New features:
- Aperiodicity extraction (ap)
- Conversion from/to log area ratio (lar2par and par2lar)
- Dynamic range compression (drc)
- Entropy calculation (entropy)
- Huffman coding (huffman, huffman_encode, and huffman_decode)
- Magic number interpolation (magic_intpl)
- Median filter (medfilt)
- Mel-cepstrum postfilter (mcpf)
- Mel-filter-bank extraction (fbank)
- Nonrecursive MLPG (mlpg -R 1)
- Pitch adaptive spectrum estimation (pitch_spec)
- Pitch extraction by DIO used in WORLD (pitch -a 3)
- PLP extraction (plp)
- Pole-zero plot (gpolezero)
- Scalar quantization (quantize and dequantize)
- Sinusoidal generation from pitch (pitch2sin)
- Spectrogram plot (gspecgram)
- Stability check of LPC coefficients (lpccheck)
- Subband decomposition (pqmf and ipqmf)
- WORLD synthesis (world_synth)
- Windows build support
Obsoleted commands:
- acep, agcep, and amcep -> amgcep
- bell
- c2sp -> mgc2sp
- cat2 and echo2
- da
- ds, us, us16, and uscd -> sox
- fig
- gc2gc -> mgc2mgc
- gcep, mcep, and uels -> mgcep
- glsadf, lmadf, and mlsadf -> mglsadf
- ivq and vq -> imsvq and msvq
- lsp2sp -> mglsp2sp
- mgc2mgclsp and mgclsp2mgc
- psgr and xgr
- raw2wav, wav2raw, wavjoin, and wavsplit -> sox
Separated commands:
- c2ir -> c2mpir and mpir2c
- dtw -> dtw and dtw_merge
- mglsadf -> mglsadf and imglsadf
- train -> train and mseq
- ulaw -> ulaw and iulaw
- vstat -> vstat and median
Renamed commands:
- mgclsp2sp -> mglsp2sp

Who we are

Keiichi Tokuda - Produce and Design - Nagoya Institute of Technology
Keiichiro Oura - Nagoya Institute of Technology
Takenori Yoshimura - Main Maintainer - Nagoya Institute of Technology
Takato Fujimoto - Nagoya Institute of Technology

Contributors to former versions of SPTK

Akira Tamamori
Cassia Valentini
Chiyomi Miyajima
Fernando Gil Resende Junior
Gou Hirabayashi
Heiga Zen
Junichi Yamagishi
Kazuhito Koishida
Keiichi Tokuda
Keiichiro Oura
Kenji Chiba
Masatsune Tamura
Naohiro Isshiki
Noboru Miyazaki
Satoshi Imai
Shinji Sako
Tadashi Kitamura
Takao Kobayashi
Takashi Masuko
Takashi Nose
Takato Fujimoto
Takayoshi Yoshimura
Takenori Yoshimura
Toru Takahashi
Toshiaki Fukada
Toshihiko Kato
Toshio Kanno
Yoshihiko Nankaku

License

This software is released under the Apache License 2.0.

Citation

```
@InProceedings{sp-nitech2023sptk,
  author = {Takenori Yoshimura and Takato Fujimoto and Keiichiro Oura and Keiichi Tokuda},
  title = {{SPTK4}: An open-source software toolkit for speech signal processing},
  booktitle = {12th ISCA Speech Synthesis Workshop (SSW 2023)},
  pages = {211--217},
  year = {2023},
}
```
