🎶 dead simple audio classification
This is for people who just want to classify some audio quickly, without having to dive into the world of audio analysis. If you need something a little more involved, check out pyAudioAnalysis or panotti.
pip install pyaudioclassification
from pyaudioclassification import feature_extraction, train, predict
features, labels = feature_extraction(<data_path>)
model = train(features, labels)
pred = predict(model, <data_path>)
Or, if you're feeling reckless, you could just string them together like so:
pred = predict(train(*feature_extraction(<training_data_path>)), <prediction_data_path>)
A full example with saving, loading & some dummy data can be found here.
Read below for a more detailed look at each of these calls.
First, add all your audio files to a directory with the following structure:
data/
├── <class_name>/
│ ├── <file_name>
│ └── ...
└── ...
For example, if you were trying to classify dog and cat sounds, it might look like this:
data/
├── cat/
│ ├── cat1.ogg
│ ├── cat2.ogg
│ ├── cat3.wav
│ └── cat4.wav
└── dog/
  ├── dog1.ogg
  ├── dog2.ogg
  ├── dog3.wav
  └── dog4.wav
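To make the layout concrete, here is a minimal sketch of how class names and file paths could be read from such a directory. It is not the library's own loader: `list_labelled_files` is a hypothetical helper that only illustrates the idea that each sub-directory name acts as a class label. The sorting is there for reproducible output and is an assumption; the library may order classes differently.

```python
from pathlib import Path
import tempfile


def list_labelled_files(data_path):
    """Return (file_path, class_name) pairs, one per audio file.

    Hypothetical helper: every sub-directory of data_path is treated as a class.
    """
    pairs = []
    for class_dir in sorted(Path(data_path).iterdir()):
        if not class_dir.is_dir():
            continue  # ignore stray files at the top level
        for audio_file in sorted(class_dir.iterdir()):
            if audio_file.is_file():
                pairs.append((audio_file, class_dir.name))
    return pairs


# Build a tiny throwaway copy of the cat/dog layout to try it out.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("cat/cat1.ogg", "cat/cat3.wav", "dog/dog1.ogg"):
        path = Path(tmp, name)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    demo_pairs = [(p.name, label) for p, label in list_labelled_files(tmp)]

print(demo_pairs)
```

Running something like this against your own data directory is a quick way to check that every file sits under the class you expect before you spend time on feature extraction.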
Great, now we need to preprocess this data. Just call feature_extraction(<data_path>) and it'll return our input and target data.
Something like this:
features, labels = feature_extraction('/Users/mac2015/data/')
(If you don't want to print to stdout, just pass verbose=False as an argument.)
Depending on how much data you have, this process could take a while, so it might be a good idea to save the result. You can save and load with NumPy:
import numpy as np
np.save('%s.npy' % <file_name>, features)
features = np.load('%s.npy' % <file_name>)
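Since training needs the labels as well as the features, one option is to store both in a single archive. The sketch below uses NumPy's `np.savez` for that. The stand-in arrays, their shapes, and the file name `dataset.npz` are assumptions for illustration; in practice the arrays would come from feature_extraction.

```python
import os
import tempfile

import numpy as np

# Stand-in arrays; in practice these come from feature_extraction(...).
features = np.random.rand(4, 10)
labels = np.array([0, 0, 1, 1])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dataset.npz")
    # np.savez stores several named arrays in a single .npz archive.
    np.savez(path, features=features, labels=labels)

    # np.load returns a dict-like archive; arrays are read when accessed.
    with np.load(path) as data:
        loaded_features = data["features"]
        loaded_labels = data["labels"]

print(loaded_features.shape, loaded_labels.shape)
```

Keeping both arrays in one file avoids the risk of pairing features with labels from a different extraction run.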
Next step is to train your model on the data. You can just call...
model = train(features, labels)
...but depending on your dataset, you might need to play around with some of the hyper-parameters to get the best results.
epochs: The number of training iterations. Default is 50.
lr: Learning rate. Increase it to speed up training; decrease it for more stable, accurate results (for example, if your loss is 'jumping'). Default is 0.01.
optimiser: Choose any of the Keras optimisers. Default is 'SGD'.
print_summary: Prints a summary of the model you'll be training. Default is False.
loss_type: Classification type. Default is categorical for >2 classes, and binary otherwise.
You can pass any of these as optional arguments, for example train(features, labels, lr=0.05).
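To get a feel for what the learning rate does, here is a toy sketch using plain gradient descent on f(w) = w**2. This is not the library's training loop, and the specific rates do not carry over to a real audio model; it only shows the general pattern of a small rate converging slowly, a larger one converging faster, and a too-large one making the loss 'jump' and grow.

```python
def gradient_descent(lr, steps=20, start=1.0):
    """Minimise f(w) = w**2 with plain gradient descent; return the loss history."""
    w = start
    losses = []
    for _ in range(steps):
        grad = 2 * w        # derivative of w**2
        w = w - lr * grad   # one update step
        losses.append(w ** 2)
    return losses


slow = gradient_descent(lr=0.01)
fast = gradient_descent(lr=0.1)
unstable = gradient_descent(lr=1.1)

print("lr=0.01 final loss:", slow[-1])
print("lr=0.1  final loss:", fast[-1])
print("lr=1.1  final loss:", unstable[-1])
```

If your real training loss behaves like the third case, lowering lr is usually the first thing to try.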
Again, you probably want to save your model once it's done training. You can do this with Keras:
from keras.models import load_model
model.save('my_model.h5')
model = load_model('my_model.h5')
Now the fun part: try your trained model on new data!
pred = predict(model, <data_path>)
Your <data_path> should point to a new, untested audio file.
If you have 2 classes (or if you forced 'binary' as the loss_type), pred will just be a single number for each file.
The closer it is to 0, the more confident the prediction is for the first class; the closer it is to 1, the more confident it is for the second class.
So for our cat/dog example, if it returns 0.2 it's 80% sure the sound is a cat, and if it returns 0.8 it's 80% sure it's a dog.
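The reading of a binary score can be sketched as a small function. `interpret_binary` is a hypothetical helper, not part of the library, and it assumes you have already pulled a plain float out of pred and that class_names is in the same order the model used.

```python
def interpret_binary(score, class_names=("cat", "dog")):
    """Map a score in [0, 1] to (predicted class, confidence).

    Hypothetical helper; class_names must match the order used in training.
    """
    if score >= 0.5:
        return class_names[1], score
    return class_names[0], 1.0 - score


for score in (0.2, 0.8):
    name, confidence = interpret_binary(score)
    print(f"{score} -> {name} ({confidence:.0%})")
```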
If you have more than 2 classes (or if you forced 'categorical' as the loss_type), pred will be an array for each sound file.
It'll look something like this:
[[1.6454633e-06 3.7017996e-11 9.9999821e-01 1.5900606e-07]]
The index of each item in the array will correspond to the prediction for that class.
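As a rough sketch, picking the most likely class is a matter of finding the index of the largest value in each row. The class_names list here is an assumption for illustration; it has to match whatever class order was used during training.

```python
# Hypothetical class order; it has to match the order used during training.
class_names = ["Rooster", "Pig", "Cow", "Frog"]

# One row per sound file, one column per class (values from the example above).
pred = [[1.6454633e-06, 3.7017996e-11, 9.9999821e-01, 1.5900606e-07]]

for row in pred:
    # Index of the largest probability in this row.
    best_index = max(range(len(row)), key=row.__getitem__)
    print(class_names[best_index], row[best_index])
```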
You can pretty print the predictions by showing them in a leaderboard, like so:
print_leaderboard(pred, <training_data_path>)
It looks like this:
1. Cow 100.0% (index 2)
2. Rooster 0.0% (index 0)
3. Frog 0.0% (index 3)
4. Pig 0.0% (index 1)
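If you want the same kind of ranking in your own code, a minimal sketch could look like the following. This is not print_leaderboard's actual implementation: `leaderboard_lines` is a hypothetical helper, and the class names are hard-coded here (in the order the example output implies) rather than read from the training data path.

```python
def leaderboard_lines(row, class_names):
    """Return one formatted line per class, highest probability first."""
    ranked = sorted(enumerate(row), key=lambda item: item[1], reverse=True)
    return [
        f"{rank}. {class_names[index]} {prob * 100:.1f}% (index {index})"
        for rank, (index, prob) in enumerate(ranked, start=1)
    ]


# Hypothetical class order, matching the indices in the example output.
class_names = ["Rooster", "Pig", "Cow", "Frog"]
row = [1.6454633e-06, 3.7017996e-11, 9.9999821e-01, 1.5900606e-07]

lines = leaderboard_lines(row, class_names)
print("\n".join(lines))
```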