SoundPy (alpha stage) is a research-based Python package for speech and sound. Applications include deep learning, filtering, speech enhancement, audio augmentation, feature extraction and visualization, dataset and audio file conversion, and beyond.
pip install soundpy==0.1.0a2
use_beg_ms parameter: improved VAD recognition of silences after speech.
extend_window_ms parameter: can extend the VAD window if desired. Useful in higher SNR environments.
mirror_sound
This release coincides with the PyPI release of pysoundtool-0.1.0a1.
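To illustrate what the two VAD parameters above might control, here is a minimal energy-based voice activity detection sketch in plain NumPy. It is not SoundPy's implementation: the function name simple_energy_vad, the threshold_factor argument, and the reading of use_beg_ms (length of the leading stretch used as a noise reference) and extend_window_ms (how far each detected region is widened) are all assumptions made for this example.

```python
import numpy as np


def simple_energy_vad(samples, sr, win_ms=20, use_beg_ms=120,
                      extend_window_ms=0, threshold_factor=4.0):
    """Return a boolean mask with one entry per frame (True = treated as speech).

    Hypothetical sketch: the noise floor is estimated from the first
    `use_beg_ms` milliseconds, which are assumed to contain no speech.
    """
    samples = np.asarray(samples, dtype=float)
    frame_len = int(sr * win_ms / 1000)
    n_frames = len(samples) // frame_len
    frames = samples[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.mean(frames ** 2, axis=1)

    # Noise reference: the frames covering the first `use_beg_ms` milliseconds.
    n_noise = max(1, int(use_beg_ms / win_ms))
    noise_floor = np.mean(energy[:n_noise])
    speech = energy > threshold_factor * noise_floor

    # Optionally widen every detected frame by `extend_window_ms` on both sides.
    n_ext = int(round(extend_window_ms / win_ms))
    extended = speech.copy()
    if n_ext > 0:
        for i in np.flatnonzero(speech):
            extended[max(0, i - n_ext): i + n_ext + 1] = True
    return extended


# Synthetic test signal: 1 s of low-level noise with a 200 ms tone
# standing in for speech between 0.4 s and 0.6 s.
sr = 16000
rng = np.random.default_rng(0)
signal = 0.01 * rng.standard_normal(sr)
start, stop = 2 * sr // 5, 3 * sr // 5
t = np.arange(stop - start) / sr
signal[start:stop] += 0.5 * np.sin(2 * np.pi * 220 * t)

mask = simple_energy_vad(signal, sr)
mask_ext = simple_energy_vad(signal, sr, extend_window_ms=60)
```

In this sketch, mask flags only the frames that contain the tone, while mask_ext also flags the three 20 ms frames on either side of it. A wider window of this kind keeps quiet speech onsets and offsets that a strict energy threshold would cut off.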
Main adjustments include:
An experimental Python framework for sound visualization, analysis, augmentation, and filtering, as well as machine learning.
Basic functionality for preparing audio datasets (e.g. formatting them), filtering audio, visualizing audio and its features (signal, stft, powspec, fbank, mfcc), augmenting audio for machine learning, and building/implementing basic neural networks for simple speech recognition, speech classification (e.g. language, gender or sex, emotion, etc.), and denoising.
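As a rough illustration of two of the features named above (stft and powspec), the following sketch computes a short-time Fourier transform and its power spectrum using NumPy only. It does not use or reproduce this package's API: the function names stft and power_spectrum, the 25 ms window, and the 10 ms hop are assumptions chosen for the example.

```python
import numpy as np


def stft(samples, sr, win_ms=25, hop_ms=10):
    """Short-time Fourier transform with a Hann window.

    Returns a complex array of shape (frames, frequency bins).
    Assumes the signal is at least one window long.
    """
    samples = np.asarray(samples, dtype=float)
    win_len = int(sr * win_ms / 1000)
    hop_len = int(sr * hop_ms / 1000)
    n_frames = 1 + (len(samples) - win_len) // hop_len
    window = np.hanning(win_len)
    frames = np.stack([
        samples[i * hop_len: i * hop_len + win_len] * window
        for i in range(n_frames)
    ])
    return np.fft.rfft(frames, axis=1)


def power_spectrum(stft_matrix):
    """Squared magnitude of each STFT bin."""
    return np.abs(stft_matrix) ** 2


# One second of a 1 kHz test tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)

spec = stft(tone, sr)
powspec = power_spectrum(spec)

# Frequency (Hz) of each bin for the 400-sample window, and the bin
# with the highest average power across frames.
freqs = np.fft.rfftfreq(int(sr * 25 / 1000), d=1 / sr)
peak_hz = freqs[powspec.mean(axis=0).argmax()]
```

For the test tone, the bin with the most average power sits at 1 kHz. Mel-filterbank energies (fbank) and MFCCs are usually derived from a power spectrum like this one by applying a mel filterbank, taking the logarithm, and, for MFCCs, a discrete cosine transform.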
This release might still be a bit buggy.
keywords: audio file format conversion, dataset preparation, wiener filter, convolutional neural networks, cnn, conv, lstm, long short-term memory network, cnn+lstm, cnnlstm, convlstm, autoencoder, denoiser, speech recognition, environment classification, scene classification, language classification, denoising, augmentation, feature extraction, mel-filterbank energies, fbank, mel-frequency cepstral coefficients, mfcc, short-time Fourier transform, stft, raw signal.