Python Sound Tool Versions

SoundPy (alpha stage) is a research-based Python package for speech and sound. Applications include deep learning, filtering, speech enhancement, audio augmentation, feature extraction and visualization, dataset and audio file conversion, and beyond.


3 years ago

Available via PyPI:

pip install soundpy==0.1.0a2

Updates of v0.1.0a2 release:

Updated Dependencies

  • Updated dependencies to the newest versions still compatible with TensorFlow 2.1.0.
  • Note: a bug in training with generators occurs with TensorFlow 2.2.0+: models trained via generators fail to learn. TensorFlow is therefore pinned to version 2.1.0 until that bug is fixed.

GPU option added

  • provides instructions for running the Docker image on a GPU


  • added the use_beg_ms parameter, which improves VAD recognition of silences after speech.
  • raises a warning for sample rates lower than 44100 Hz, since VAD seems to fail at lower sample rates.

soundpy.feats.get_vad_samples and soundpy.feats.get_vad_stft

  • moved from the dsp module to the feats module
  • added the extend_window_ms parameter, which can extend the VAD window if desired. Useful in higher SNR environments.
  • raises a warning for sample rates lower than 44100 Hz, since VAD seems to fail at lower sample rates.
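
The sample-rate warning described in these notes can be pictured with Python's standard warnings module. This is only a minimal sketch, not SoundPy's code: the helper name warn_if_low_sample_rate, the constant name, and the message text are hypothetical; only the 44100 Hz threshold comes from the notes.

```python
import warnings

MIN_VAD_SAMPLE_RATE = 44100  # threshold named in the release notes


def warn_if_low_sample_rate(sr, min_sr=MIN_VAD_SAMPLE_RATE):
    """Hypothetical helper: warn when sr is below min_sr.

    Returns True if a warning was issued, False otherwise.
    """
    if sr < min_sr:
        warnings.warn(
            f"Sample rate {sr} Hz is below {min_sr} Hz; VAD may be unreliable.",
            UserWarning,
        )
        return True
    return False
```

A warning rather than an exception lets processing continue at lower sample rates while still flagging that the results may not be trustworthy.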

added soundpy.feats.get_samples_clipped and soundpy.feats.get_stft_clipped

  • another option for VAD
  • clips beginning and ending of audio data where high energy sound starts and ends.
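
The idea of clipping audio to the region where high-energy sound starts and ends can be sketched with NumPy alone. This is an illustration under stated assumptions, not the implementation behind get_samples_clipped: the function name clip_by_energy, the frame length, and the relative threshold are all hypothetical choices.

```python
import numpy as np


def clip_by_energy(samples, frame_len=256, threshold_ratio=0.1):
    """Hypothetical sketch: keep the span between the first and last
    frames whose mean energy reaches threshold_ratio * max frame energy.
    Expects a 1-D (mono) signal.
    """
    samples = np.asarray(samples, dtype=float)
    n_frames = len(samples) // frame_len
    if n_frames == 0:
        return samples  # too short to frame; leave untouched

    # Split into non-overlapping frames and compute mean energy per frame.
    frames = samples[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy = (frames ** 2).mean(axis=1)
    if energy.max() == 0:
        return samples  # pure silence; nothing to clip against

    # Frames that count as "high energy" relative to the loudest frame.
    active = np.where(energy >= threshold_ratio * energy.max())[0]
    start = active[0] * frame_len
    end = (active[-1] + 1) * frame_len
    return samples[start:end]
```

Unlike a full VAD, this only trims the leading and trailing low-energy regions; quiet stretches in the middle of the signal are kept.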


  • can extract and augment features from audio files as each audio file is fed to the model.
  • for an example, see soundpy.models.builtin.envclassifier_extract_train
  • note: still very experimental


  • improvements in the smoothness of the added signal.
  • soundpy.dsp.clip_at_zero
  • improved soundpy.dsp.vad and soundpy.feats.get_vad_stft


  • can be called directly as soundpy.normalize (no need to remember whether it lives in dsp or feats)


  • implemented in soundpy.files.loadsound() and soundpy.files.savesound()
  • vastly improves the ability to work with and combine signals.


  • clips the beginning and ending of the audio at negative-to-positive zero crossings
  • useful when concatenating signals
  • useful for removing clicks at beginning or ending of audio signals
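
Clipping at negative-to-positive zero crossings can be sketched in a few lines of NumPy. This is an illustrative sketch for a mono signal, not SoundPy's own code; the function name clip_at_zero_crossings is hypothetical.

```python
import numpy as np


def clip_at_zero_crossings(samples):
    """Hypothetical sketch: trim a mono signal so that it starts just
    after its first negative-to-positive crossing and ends just before
    its last one.
    """
    samples = np.asarray(samples, dtype=float)
    # Index i marks a rising crossing when samples[i] < 0 <= samples[i + 1].
    rising = np.where((samples[:-1] < 0) & (samples[1:] >= 0))[0]
    if len(rising) < 2:
        return samples  # not enough crossings to trim both ends

    start = rising[0] + 1   # first non-negative sample after a negative one
    end = rising[-1] + 1    # slice ends on the last negative sample before a rise
    return samples[start:end]
```

Because every clipped signal starts on a non-negative sample and ends on a negative one that was about to rise, two such signals can be concatenated without an abrupt jump at the seam, which is what avoids the audible click.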


  • can now mirror the sound as a form of sound extension with the parameter mirror_sound.
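
One way to picture mirroring as a form of sound extension is to alternate reversed and forward copies of the signal until a target length is reached. The sketch below is an assumption about what mirroring could look like, not a description of how the mirror_sound parameter behaves internally; the function name mirror_extend and its target_len argument are hypothetical.

```python
import numpy as np


def mirror_extend(samples, target_len):
    """Hypothetical sketch: extend a 1-D signal to target_len samples by
    appending alternately reversed and forward copies of itself.
    """
    samples = np.asarray(samples)
    if len(samples) == 0:
        raise ValueError("cannot extend an empty signal")

    pieces = [samples]
    total = len(samples)
    reverse_next = True
    while total < target_len:
        # A reversed copy continues from the last sample back toward the first,
        # so each seam joins two equal sample values.
        pieces.append(samples[::-1] if reverse_next else samples)
        total += len(samples)
        reverse_next = not reverse_next
    return np.concatenate(pieces)[:target_len]
```

Compared with plain repetition, mirroring avoids the discontinuity that occurs when the end of a signal is followed directly by its beginning.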

Removed soundpy_online (and therefore mybinder as well)

  • for the time being, this is too much work to maintain. The plan is to bring it back eventually in a more maintainable form.

Added stereo sound functionality to the following functions:

  • soundpy.dsp.add_backgroundsound
  • soundpy.dsp.clip_at_zero
  • soundpy.dsp.calc_fft
  • soundpy.feats.get_stft
  • soundpy.feats.get_vad_stft
  • soundpy.dsp.ismono for checking if a signal is mono or stereo
  • soundpy.dsp.average_channels for averaging amplitude in all channels (e.g. identifying when energetic sounds start / end: want to consider all channels)
  • soundpy.dsp.add_channels for adding additional channels if needed (e.g. for applying a 'hann' or 'hamming' window to stereo sound)
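
The mono/stereo helpers listed above can be illustrated with plain NumPy. The sketch assumes audio arrays shaped (num_samples,) for mono and (num_samples, num_channels) for multichannel; that layout is an assumption here, and the function names below are hypothetical stand-ins rather than the soundpy.dsp functions themselves.

```python
import numpy as np


def is_mono(samples):
    """Hypothetical check: True for 1-D arrays or a single-column 2-D array."""
    samples = np.asarray(samples)
    return samples.ndim == 1 or samples.shape[1] == 1


def average_channels(samples):
    """Hypothetical helper: average across channels, returning a 1-D signal."""
    samples = np.asarray(samples, dtype=float)
    if samples.ndim == 1:
        return samples
    return samples.mean(axis=1)


def duplicate_to_channels(samples, num_channels):
    """Hypothetical helper: copy a 1-D array into num_channels columns,
    e.g. to build a window that matches a stereo signal's shape.
    """
    samples = np.asarray(samples)
    if samples.ndim != 1:
        raise ValueError("expected a 1-D array")
    return np.tile(samples[:, np.newaxis], (1, num_channels))
```

For example, a Hann window from np.hanning(n) could be expanded with duplicate_to_channels(window, 2) and then multiplied element-wise with a stereo frame of shape (n, 2).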


3 years ago

This release coincides with the PyPI release of pysoundtool-0.1.0a1.

Main adjustments include:

  • setting the use_scipy defaults to False (use Librosa wherever possible)
  • pinning dependency versions to avoid errors (numba and librosa; keras)


3 years ago

An experimental Python framework for sound visualization, analysis, augmentation, filtering as well as machine learning.

Basic functionality for preparing audio datasets (e.g. formatting them), filtering audio, visualizing audio and its features (signal, stft, powspec, fbank, mfcc), augmenting audio for machine learning, and building/implementing basic neural networks for simple speech recognition, speech classification (e.g. language, gender or sex, emotion, etc.), and denoising.

Might be a bit buggy still.

keywords: audio file format conversion, dataset preparation, wiener filter, convolutional neural networks, cnn, conv, lstm, long short-term memory network, cnn+lstm, cnnlstm, convlstm, autoencoder, denoiser, speech recognition, environment classification, scene classification, language classification, denoising, augmentation, feature extraction, mel-filterbank energies, fbank, mel-frequency cepstral coefficients, mfcc, short-time fourier transform, stft, raw signal.