A suite of speech signal processing tools
The Speech Signal Processing Toolkit (SPTK) is a software toolkit for speech signal processing.
dmp +s data.raw
x2x +sd < data.raw | clip | x2x +da | less
impulse -l 4 | sopr -m 10 | x2x +da
The latest release can be downloaded through Git. The install procedure is as follows.
git clone https://github.com/sp-nitech/SPTK.git
cd SPTK
make
The SPTK commands can then be used by adding the bin/ directory to the PATH environment variable.
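As a minimal sketch of that step for a POSIX shell: `SPTK_ROOT` is a hypothetical variable introduced here for illustration, and its default of `$HOME/SPTK` is an assumption about where the repository was cloned. The snippet prepends the bin/ directory only if it is not already on the PATH.

```shell
# SPTK_ROOT is a hypothetical variable (not defined by SPTK itself);
# point it at the directory created by "git clone".
SPTK_ROOT="${SPTK_ROOT:-$HOME/SPTK}"

# Prepend bin/ to PATH unless it is already listed there.
case ":$PATH:" in
  *":$SPTK_ROOT/bin:"*) ;;
  *) PATH="$SPTK_ROOT/bin:$PATH"; export PATH ;;
esac
```

Placing these lines in a shell startup file such as ~/.profile would make the setting persistent; whether that is appropriate depends on your environment.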
If you would like to use only part of the SPTK functions, please link against the static library lib/libsptk.a.
On Windows, you may need to add cmake and MSBuild to the PATH environment variable in advance.
Please run make.bat, or open Command Prompt and follow the procedure below:
cd /path/to/SPTK # Please change here to your appropriate path.
mkdir build
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=.. # Please change install directory.
MSBuild /p:Configuration=Release INSTALL.vcxproj
You can compile SPTK via GUI instead of running MSBuild by opening the generated project file.
The SPTK functions can then be used by linking against the static library lib/sptk.lib.
SPTK provides some examples.
Go to an example directory and execute run.sh, e.g.,
cd egs/analysis_synthesis/mgc
./run.sh
The following simple example halves the amplitude of the input audio in input.wav.
You may need to install the sox command on your system.
sox -t wav input.wav -c 1 -t s16 -r 16000 - |
x2x +sd | sopr -m 0.5 | x2x +ds -r |
sox -c 1 -t s16 -r 16000 - -t wav output.wav
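The pipeline above can be wrapped in a small function that first checks that the required commands are installed. This is only a sketch: the helper names `require_cmds` and `scale_wav` are hypothetical and not part of SPTK, and the 16 kHz mono format is carried over unchanged from the example above.

```shell
# Hypothetical helper: report any command that is not on PATH
# and return non-zero if at least one is missing.
require_cmds() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" > /dev/null 2>&1; then
      echo "missing command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Hypothetical wrapper: multiply the samples of a wav file by a factor.
# Arguments: input wav, output wav, multiplier passed to "sopr -m".
scale_wav() {
  in=$1
  out=$2
  gain=$3
  require_cmds sox x2x sopr || return 1
  sox -t wav "$in" -c 1 -t s16 -r 16000 - |
    x2x +sd | sopr -m "$gain" | x2x +ds -r |
    sox -c 1 -t s16 -r 16000 - -t wav "$out"
}
```

With these definitions loaded, `scale_wav input.wav output.wav 0.5` runs the same pipeline as the example above.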
If you would like to draw figures, please prepare a Python environment.
cd tools; make venv PYTHON_VERSION=3.8; cd ..
. ./tools/venv/bin/activate
impulse -l 32 | gseries impulse.png
deactivate
ap
lar2par and par2lar
drc
entropy
huffman, huffman_encode, and huffman_decode
magic_intpl
medfilt
mcpf
fbank
mlpg -R 1
pitch_spec
pitch -a 3
plp
gpolezero
quantize and dequantize
pitch2sin
gspecgram
lpccheck
pqmf and ipqmf
world_synth

acep, agcep, and amcep -> amgcep
bell
c2sp -> mgc2sp
cat2 and echo2
da
ds, us, us16, and uscd -> sox
fig
gc2gc -> mgc2mgc
gcep, mcep, and uels -> mgcep
glsadf, lmadf, and mlsadf -> mglsadf
ivq and vq -> imsvq and msvq
lsp2sp -> mglsp2sp
mgc2mgclsp and mgclsp2mgc
psgr and xgr
raw2wav, wav2raw, wavjoin, and wavsplit -> sox

c2ir -> c2mpir and mpir2c
dtw -> dtw and dtw_merge
mglsadf -> mglsadf and imglsadf
train -> train and mseq
ulaw -> ulaw and iulaw
vstat -> vstat and median

mgclsp2sp -> mglsp2sp
This software is released under the Apache License 2.0.
@InProceedings{sp-nitech2023sptk,
author = {Takenori Yoshimura and Takato Fujimoto and Keiichiro Oura and Keiichi Tokuda},
title = {{SPTK4}: An open-source software toolkit for speech signal processing},
booktitle = {12th ISCA Speech Synthesis Workshop (SSW 2023)},
pages = {211--217},
year = {2023},
}