Essentia

C++ library for audio and music analysis, description and synthesis, including Python bindings

v2.1_beta5

4 years ago

Essentia 2.1 beta5 is our current preliminary version of the forthcoming 2.1 release. This pre-release includes the following changes:

  • Algorithms updates and bug-fixes

    • Fix the slaneyMel scale implementation in MelBands and MFCC (#849). The slaneyMel option, introduced in 2.1-beta4, was erroneously computing the HTK Mel scale. htkMel is now the default scale to ensure backward compatibility with all previous versions of MelBands/MFCC.

    • New option unit_tri for triangle area normalization in MelBands, MFCC, and TriangularBands.

    • New parameter silenceThreshold in MFCC and GFCC. Set default threshold to 1e-10 (#543).

    • TriangularBands: faster unit-sum normalization and an improved check for insufficient spectrum resolution (#142).

    • ConstantQ and the related Chromagram and SpectrumCQ are reimplemented from scratch and now function correctly. The maxFrequency parameter is replaced by numberBins.

    • New negativeFrequencies parameter in FFTC to include negative frequencies in the output.

    • New normalize parameter for IFFT size normalization.

    • FFTC now supports KissFFT and Accelerate.

    • PoolAggregator: new aggregation method last to get the last value. Fix possible nan/inf values in kurtosis and skewness (#689). Apply aggregation for pool values that contain only one vector too.

    • New checkRange parameter in Trimmer and StereoTrimmer.

    • PitchFilter: improve consistency between input and output stream types (#674).

    • PitchMelodia: fix missing output pitchConfidence in streaming mode.

    • MultiPitchMelodia: peakFrameThreshold and peakDistributionThreshold parameters now work correctly (they were overridden by hardcoded values).

    • New tolerance parameter in PitchYinFFT. When the pitch confidence is lower than the tolerance value, the output pitch is set to 0. A tolerance of 1 disables this feature.

    • Fix occasional negative values output by Danceability (#483).

    • LoudnessEBUR128:

      • Fix memory leaks and warnings on empty input. Set a larger internal buffer size to avoid buffer resizes.
      • New parameter startFromZero to zero-center the first window for loudness estimation.
    • Fix a memory leak in AudioLoader.

    • BeatTrackerDegara output is now deterministic (#860).

    • ChordsDetectionBeats: add new parameter chromaPick and fix a beat segment indexing bug in the case of very close consecutive beats.

    • New minPeakDistance parameter in PeakDetection.

    • Fix invalid memory access in PCA (#727).

    • Update Key and KeyExtractor algorithms with new pitch class profiles and new parameters for detuning correction and low-energy HPCP bin thresholding. Use the new bgate profile by default. Add spectral whitening step to KeyExtractor. Change output key naming. Add a new function equivalentKey to match between equivalent names.

    • Proper mutex implementation for all FFT* algorithms.

  • New algorithms

    • Invertible Constant-Q based on Non-Stationary Gabor frames: NSGConstantQ, NSGIConstantQ, NSGConstantQStreaming.
    • Chromaprinter (fingerprinting) wrapper for the Chromaprint library.
    • NNLSChroma and LogSpectrum (derived from the original NNLS Chroma code).
    • TriangularBarkBands (more configurable than BarkBands) and BFCC (bark-frequency cepstrum coefficients).
    • New algorithms for audio problems detection: ClickDetector, DiscontinuityDetector, FalseStereoDetector, GapsDetector, HumDetector, NoiseBurstDetector, SNR, SaturationDetector, StartStopCut, TruePeakDetector.
    • New algorithms for probabilistic Yin (pYIN) pitch estimation: PitchYinProbabilistic, PitchYinProbabilities, PitchYinProbabilitiesHMM.
    • StereoTrimmer and StereoMuxer.
    • Welch (power spectral density estimation).
    • New algorithm IFFTC for inverse complex STFT.
    • Histogram.
  • Updated music and sound feature extractors streaming_extractor_music and streaming_extractor_freesound. Both extractors are now also available as algorithms: MusicExtractor and FreesoundExtractor. New MusicExtractorSVM algorithm allows applying SVM models to the output of MusicExtractor.

    • Fix possible memory leaks in MusicExtractor

    • Proper logging for "out of memory" errors

    • Skip aggregation for some descriptors

    • Add audio length to metadata and remove end_time

    • Add number of audio channels to metadata (number_channels)

    • Better grouping of metadata related to audio analysis

    • Updated key/chords estimation parameters

    • Estimate key using three different key profiles (temperley, krumhansl, edma)

    • Updated descriptors in MusicExtractor:

      • New LoudnessEBUR128 loudness descriptors
      • Add melbands128 high-resolution melbands
      • Compute hpcp_crest
      • Compute bpm_histogram
      • New stdev aggregate statistics in addition to var
    • Updated descriptors in FreesoundExtractor

      • Add melbands96 high-resolution melbands
      • Add stdev statistic
      • Remove frequency_bands
      • Do not output bpm_confidence when configured to use 'degara' for beat tracking
      • spectral_contrast and scvalleys are now called spectral_contrast_coeffs and spectral_contrast_valleys for consistency with MusicExtractor
      • startFrame and stopFrame are now called sound_start_frame and sound_stop_frame
  • New extractors

    • Add a new extractor for spectrograms and log-energy Mel-spectrograms (streaming_spectrogram).
  • Python bindings updates

    • Add support for Python 3.
    • Update all tutorials and code examples to Python 3.
    • New essentia.pyutils submodule provides useful functions for a number of use-cases (spectrograms, CQ-grams, batch processing with extractors, etc.)
    • Fix a memory bug in Pool on an isSingleValue check in Python.
    • Faster VECTOR_VECTOR_REAL conversion from Python types.
  • Build scripts updates

    • Add script for Python packaging (python.py) and wheels.
    • Travis CI and build scripts for manylinux wheels.
    • Update Waf to 2.0.10.
    • The code is now partly C++11.
    • Build flags for MSVC.
    • Fixes for cross-compilation with Mingw-w64.
    • Default --prefix=$VIRTUAL_ENV when inside a virtualenv.
    • Read PKG_CONFIG_PATH and add new flag --pkg-config-path for custom lib paths.
    • New flag --only-python to build Python extension separately from libessentia.
    • Link only to libessentia when building examples.
    • Generate a proper essentia.pc pkg-config file.
    • Static builds updates.
      • Replace LibAv with FFmpeg, build with muxers.
      • Update Taglib version to 1.11.1, build with zlib.
      • Update Gaia to 2.4.5.
  • Miscellaneous

    • Fix segfault in the Vamp plugin (#635, #371).
    • Add support for SingleVectorString to Pool.
    • Add support for Bessel functions via the third-party Cephes library.
  • Updated documentation, tutorials, and examples including a significant web redesign.

    • Improve build scripts for documentation.
    • Every algorithm page now has links to related algorithms.
    • An updated list of research works using Essentia.
    • New Python examples.
    • New QA scripts for audio problems detection and HPCPs.
  • The usual assortment of code cleanup, updated and expanded unit tests, and better logging (more informative log and exception messages).
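
For context on the slaneyMel fix listed above, the two Mel scale conventions can be compared side by side. The following is a minimal sketch using the commonly cited textbook formulas: the HTK scale is a single logarithmic curve, while the Slaney (Auditory Toolbox) scale is linear below 1 kHz and logarithmic above. The function names are hypothetical and this is not Essentia's code; the constants used inside MelBands/MFCC may differ in detail.

```python
import math

def hz_to_mel_htk(hz):
    """HTK-style Mel scale: one logarithmic curve over the whole range."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def hz_to_mel_slaney(hz):
    """Slaney-style Mel scale: linear below 1 kHz, logarithmic above."""
    hz_per_mel = 200.0 / 3.0              # slope of the linear region
    break_hz = 1000.0                     # where the scale turns logarithmic
    break_mel = break_hz / hz_per_mel     # Mel value at the break point (15)
    log_step = math.log(6.4) / 27.0       # log-region step size
    if hz < break_hz:
        return hz / hz_per_mel
    return break_mel + math.log(hz / break_hz) / log_step

# The two scales use different units, so band edges placed uniformly
# on one scale land on different frequencies than on the other.
for hz in (100.0, 1000.0, 4000.0):
    print(hz, round(hz_to_mel_htk(hz), 2), round(hz_to_mel_slaney(hz), 2))
```

Because the two curves have different shapes, filterbanks built on them are not interchangeable, which is why computing one scale under the other's name changes MelBands and MFCC output.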

v2.1_beta4

5 years ago

This pre-release includes the following changes:

  • Improved algorithms

    • AudioLoader now supports audio sources with multiple audio streams (new parameter 'audioStream')
    • PoolAggregator now outputs stdev in addition to var (#342)
    • SpectralContrast: Improve precision for computation of subband bin intervals
    • Danceability now also outputs a DFA exponent vector
    • HPCP can now optionally apply unit sum normalization (#348)
    • HPCP: 'splitFrequency' parameter is now called 'bandSplitFrequency'
    • LoudnessEBUR128: Warn on empty input in the streaming mode
  • Updates to Mel and ERB energy band algorithms

    • Add support for extracting MelBands and MFCCs 'the HTK way'
    • Add support for DCT type III in DCT algorithm
      • New parameter 'dctType' in DCT, MFCC and GFCC
      • New 'liftering' parameter in DCT and MFCC
    • New parameters 'normalize', 'type', 'scale' and 'weighting' in MelBands and MFCC
    • New 'type' parameter in GFCC
    • New 'logType' parameter in MFCC, GFCC
    • New 'log' parameter in TriangularBands and MelBands
    • ERBBands: 'type' parameter value "energy" is now called "power"
    • TriangularBands is now faster
  • New algorithms

    • SpectrumToCent for computing cent scale from frequency bins
    • New algorithm IDCT for inverse DCT
    • New algorithm SpectrumCQ
  • Bug-fixes in algorithms:

    • MelBands and TriangularBands: Add checks for insufficient spectrum resolution (#142)
    • Fix PitchYin out of range error (#376)
    • Fix Inf values in OddToEvenHarmonicEnergyRatio
    • Fix reset() in LowLevelSpectralExtractor and LowLevelSpectralEqloudExtractor
    • Fix occasional exception in BeatsLoudness (#199)
    • Danceability: Fix NaN danceability value occurring on very short input signals
    • Fix memory leak in MelBands
    • Fix memory bug in Vibrato
    • SpectralContrast: Force non-zero 'lowFrequencyBound' parameter to avoid division by zero (#568)
    • AudioLoader: Fix memory bug on exceptions while opening an audio file
  • Updates to Python wrapper:

    • FrameGenerator now inherits the default parameters from FrameCutter
    • FrameGenerator now has a new method frame_times() to compute frame positions in time
    • Fix array memory corruption when passing NumPy array views to Essentia algorithms (#240)
    • Fix memory deallocation for streaming algorithms to avoid a memory leak
  • Extractors:

    • Freesound extractor now stores all results in JSON
  • Logging:

    • Remove colors in log messages when piped to file; do not print colors on Windows
  • Build scripts updates:

    • Update waf to 1.9.5
    • Update script for computing algorithm dependencies
  • Code cleanup and unit tests updates

  • Re-designed and expanded documentation:

    • Updated installation instructions
    • Reorganized and improved Python tutorials. Notebook tutorials are now also rendered as HTML
    • Updated algorithm descriptions
    • Added examples of industrial applications and academic studies using Essentia
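
The DCT type III support and the new IDCT algorithm listed above both relate to the fact that, in the textbook definitions, an orthonormally scaled DCT-III is the inverse of an orthonormally scaled DCT-II. The sketch below illustrates that relationship only. The function names are hypothetical, and how Essentia's dctType parameter scales its output or maps onto these definitions is an assumption not covered by these notes.

```python
import math

def _scale(k, n):
    """Orthonormal scaling factor for coefficient k of an n-point DCT."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

def dct2_ortho(x):
    """Textbook DCT-II with orthonormal scaling."""
    n = len(x)
    return [
        _scale(k, n) * sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k)
                           for i in range(n))
        for k in range(n)
    ]

def dct3_ortho(c):
    """Textbook DCT-III with orthonormal scaling (inverse of dct2_ortho)."""
    n = len(c)
    return [
        sum(_scale(k, n) * c[k] * math.cos(math.pi / n * (i + 0.5) * k)
            for k in range(n))
        for i in range(n)
    ]

signal = [1.0, 2.0, 3.0, 4.0]
coeffs = dct2_ortho(signal)      # forward transform
restored = dct3_ortho(coeffs)    # applying DCT-III recovers the input
print(coeffs)
print(restored)
```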

v2.1_beta3

7 years ago

This pre-release includes the following changes:

  • Build script updates:
    • Cross-compilation for iOS and Android
    • Support for JavaScript using Emscripten
    • Updated dependencies in static extractors (LibAv 11.2, Taglib 1.10)
    • Fixed cross-compilation for Windows
    • Homebrew formula for easy installation on OSX
    • Updated Debian packaging
    • All dependencies are now optional. Algorithms and examples relying on missing dependencies will be ignored.
    • New flags for building lightweight versions of Essentia
      • --lightweight=LIBS to specify dependencies to be included
      • --include-algos=ALGOS and --ignore-algos=ALGOS to specify algorithms to be included
  • New algorithms:
    • SuperFlux algorithm for real-time onset detection (SuperFluxExtractor, SuperFluxNovelty)
    • Algorithms for sound modeling
      • Overlap-add (OverlapAdd)
      • Sine model analysis/synthesis (SineModelAnal, SineModelSynth)
      • Sine subtraction (SineSubtraction)
      • Sinusoidal plus Residual model analysis/synthesis (SprModelAnal, SprModelSynth)
      • Melody Analysis (monophonic/predominant)
      • HarmonicMask
      • Signal resampling (ResampleFFT)
    • New pitch-related algorithms
      • Multi-pitch estimation in polyphonic music (MultiPitchKlapuri, MultiPitchMelodia)
      • Adaptation of Melodia algorithm for monophonic signals (PitchMelodia)
      • Yin pitch detection algorithm (PitchYin)
      • Pitch contour segmentation into notes (PitchContourSegmentation)
      • Vibrato detection (Vibrato)
    • BPM estimation on loops (PercivalEnhanceHarmonics, PercivalEvaluatePulseTrains, LoopBpmConfidence, LoopBpmEstimator, PercivalBpmEstimator)
    • STFT on complex inputs (FFTC)
    • ConstantQ and Chromagram (still in experimental stage)
    • TriangularBands
    • Lightweight spectral centroid implementation (SpectralCentroidTime)
    • Chords detection on beat segments (ChordsDetectionBeats)
    • VectorRealAccumulator
  • Improved algorithms:
    • LoudnessEBUR128 algorithms are now finalized (includes bug-fixes)
    • FFT now supports KissFFT and Accelerate FFT libraries as an alternative to FFTW
    • New profiles for Key estimation (including profiles for electronic music)
    • New 'generalized' parameter in Autocorrelation algorithm
    • New 'scale' and 'shift' parameters in UnaryOperator algorithm
    • New 'normalized' parameter in Windowing algorithm
    • New 'inputSize' parameter in GFCC algorithm
    • Added support for 8kHz for EqualLoudness algorithm
    • LogAttackTime now outputs attack times
    • BpmHistogramDescriptors now outputs a complete histogram
    • ChordsDescriptors now throws an exception on incorrect chords
    • Refactored AudioLoader and AudioWriter algorithms. Use libavresample, remove support for libswresample
    • Rename PitchFilterMakam to PitchFilter. Allow filtering negative energy values. Remove optional 'octaveFilter' parameter
    • Rename PredominantMelody algorithm to PredominantPitchMelodia
  • Bug-fixes:
    • Fix wrong behavior of HarmonicPeaks that was indirectly affecting results in HPCP, Key, Tristimulus and OddToEvenHarmonicEnergyRatio
    • Fixed filter coefficients in BandReject and BandPass
    • Fixed weightings in NoveltyCurve
    • Different key profiles in Key streaming algorithm now work correctly
    • Bug fixes in Envelope, TonicIndianArtMusic, RhythmExtractor2013, PitchYinFFT, BpmHistogramDescriptors, ReplayGain streaming
  • Updated extractors (including Freesound extractor)
  • Improved documentation
    • Fresh new design
    • Algorithms are now organized by categories.
    • Improved and rewritten algorithm descriptions
    • New Python examples and tutorials
  • More minor fixes, improvements and code cleanup
  • Updated unit tests. Audio files for tests are now hosted in a separate repository

Known issues:

  • Some unit tests fail (#316)

v2.0

9 years ago
  • First release to be publicly available as free software released under AGPLv3
  • Refactoring of the core API
    • Fix small API annoyances in the standard mode
    • Refactor the streaming mode. It is now much better defined, using sound computer science techniques: the visible network is a directed acyclic graph, the composites have better-defined semantics, and the algorithms execute in the order given by a topological sort of the transitive reduction of the visible network after the composites have been expanded. In particular, the scheduler that runs the algorithms in the streaming mode is now much more correct, which made it possible to remove the small hacks that had accumulated in the algorithms themselves during the 1.x releases to compensate for the deficiencies of the initial scheduler.
  • New algorithms for onset detection, beat tracking and melody extraction
  • New and updated features extractors
  • Updated Vamp plugin
  • Much better documentation, more Python examples
  • Bugfixes, more unittests, etc.
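
The execution order described in the streaming mode refactoring above relies on a topological sort of a directed acyclic graph. The following is a minimal sketch of that idea using Kahn's algorithm, not Essentia's scheduler: the function name, the node labels, and the example network are hypothetical, and the sketch leaves out composite expansion and the transitive reduction step.

```python
from collections import defaultdict
import heapq

def topological_order(edges):
    """Return the nodes so that every producer precedes its consumers."""
    successors = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1
        nodes.update((src, dst))

    # Start from nodes with no incoming connections; the heap makes
    # the result deterministic when several nodes are ready at once.
    ready = [node for node in nodes if indegree[node] == 0]
    heapq.heapify(ready)

    order = []
    while ready:
        node = heapq.heappop(ready)
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                heapq.heappush(ready, nxt)

    if len(order) != len(nodes):
        raise ValueError("network contains a cycle, so it is not a DAG")
    return order

# Hypothetical network: one source feeding two descriptors and a sink.
network = [
    ("loader", "frame_cutter"),
    ("frame_cutter", "windowing"),
    ("windowing", "spectrum"),
    ("spectrum", "mfcc"),
    ("spectrum", "centroid"),
    ("mfcc", "pool"),
    ("centroid", "pool"),
]
print(topological_order(network))
```

Removing redundant edges through a transitive reduction does not change which orders are valid; it only simplifies the graph being sorted.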

For post-release bugfixes use the 2.0 branch.

Ubuntu/Debian Libav compatibility:

  • Debian Wheezy - libav 6:0.8.17
  • Ubuntu Precise (12.04 LTS) - libav 4:0.8.17
  • Ubuntu Trusty (14.04 LTS) - libav 6:9.18

v2.1_beta2

9 years ago

Changes:

  • Build scripts updates:
    • New scripts for static builds on Linux, OSX and (cross-compilation) Windows
    • New flag --with-example to build only specific examples
    • New git commit SHA hash value accessible via Essentia library API for better versioning
  • Algorithm updates:
    • AudioLoader now outputs codec and bitrate, and computes md5 hash values over undecoded audio
    • MetadataReader now uses new TagLib 1.9 API and is able to read any tags
    • YamlInput now supports JSON
    • New Entropy algorithm
    • EffectiveDuration now accepts a threshold parameter
    • Fixed incorrect computation of onset rate in OnsetRate
    • New algorithm LoudnessEBUR128 for measuring loudness according to the EBU R128 standard (still in experimental stage)
    • New BinaryOperator algorithm
    • PitchYinFFT algorithm now includes peak interpolation
  • Revised and updated extractors:
    • Revised, refactored and expanded music extractor (streaming_extractor_music) including new functionality and descriptors
    • Updated Freesound extractor, including new descriptors
  • Some updates in core Essentia code
  • Updated documentation and examples
  • Bugfixes and unit tests updates

Dependencies: Libav 9, Taglib 1.9

Ubuntu/Debian Libav/Taglib compatibility:

  • Debian Jessie - the required package versions are already in the repository
  • Debian Wheezy - install libav/libtag1-dev packages from wheezy-backports repository
    • libav 6:10.1
    • libtag1-dev 1.9.1
  • Ubuntu Trusty (14.04 LTS), Utopic (14.10) and Vivid (15.04) - the required package versions are already in the repository

v2.0.1

10 years ago

Essentia 2.0.1:

  • Added pre-trained high-level classifier models for genres, moods, rhythm and instrumentation (to be used with streaming_extractor_archivemusic extractor, see accuracies here)
  • Fixed scheduler in streaming mode
  • Fixed compilation with clang/libc++/c++11
  • PitchYinFFT now supports parabolic interpolation
  • Updated Vamp plugin
  • Updated documentation and tutorials
  • Minor bugfixes, more unittests, etc.
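
Parabolic interpolation, mentioned above for PitchYinFFT, refines the position of an extremum found on a discrete grid by fitting a parabola through the extremal sample and its two neighbours. A minimal sketch of the standard three-point formula follows. The function name is hypothetical, and which internal quantity PitchYinFFT interpolates is not stated in these notes.

```python
def parabolic_extremum(y_prev, y_mid, y_next):
    """Fit a parabola through three equally spaced samples.

    Returns (offset, value): the extremum position relative to the
    middle sample, in units of the sample spacing, and the
    interpolated value at that position.
    """
    denom = y_prev - 2.0 * y_mid + y_next
    if denom == 0.0:
        # The three points are collinear, so there is no extremum to refine.
        return 0.0, y_mid
    offset = 0.5 * (y_prev - y_next) / denom
    value = y_mid - 0.25 * (y_prev - y_next) * offset
    return offset, value

# Samples of y = 1 - (x - 0.25)^2 taken at x = -1, 0, 1.
samples = [1.0 - (x - 0.25) ** 2 for x in (-1.0, 0.0, 1.0)]
offset, value = parabolic_extremum(*samples)
print(offset, value)
```

The same formula applies to minima and maxima, so it can refine either a peak or a dip between grid points.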

For post-release bugfixes (including Ubuntu 14.04 compatibility) use the 2.0.1 branch.

Ubuntu/Debian Libav compatibility:

  • Debian Wheezy - libav 6:0.8.17
  • Ubuntu Precise (12.04 LTS) - libav 4:0.8.17
  • Ubuntu Trusty (14.04 LTS) - libav 6:9.18