
1D-Triplet-CNN

PyTorch implementation of the 1D-Triplet-CNN neural network model described in Fusing MFCC and LPC Features using 1D Triplet CNN for Speaker Recognition in Severely Degraded Audio Signals by A. Chowdhury and A. Ross.

Research Article

Anurag Chowdhury and Arun Ross, Fusing MFCC and LPC Features using 1D Triplet CNN for Speaker Recognition in Severely Degraded Audio Signals, IEEE Transactions on Information Forensics and Security (2019).

Figure: 1D-Triplet-CNN model architecture and details.

Implementation details and requirements

The model was implemented in PyTorch 1.2 using Python 3.6. It may work with other versions of PyTorch and Python, but this has not been tested.

Additional requirements are listed in the ./requirements.txt file.

Usage

Source code and model parameters

The source code of the 1D-Triplet-CNN model is in the models subdirectory, and a pre-trained model is available in the trained_models subdirectory.
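
As a minimal sketch, the pre-trained network can be restored with standard PyTorch checkpoint loading. The module path models.cnn_model, the class name OneDTripletCNN, and the checkpoint filename below are assumptions; substitute the actual names found in the models and trained_models subdirectories.

```python
import torch

# Hypothetical names: substitute the actual model class from the models
# subdirectory and the actual checkpoint file from trained_models.
from models.cnn_model import OneDTripletCNN

model = OneDTripletCNN()
state_dict = torch.load("trained_models/1d_triplet_cnn.pth",  # assumed filename
                        map_location="cpu")
model.load_state_dict(state_dict)
model.eval()  # inference mode for generating speaker embeddings
```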

Dataset

The pre-trained model available in the trained_models subdirectory was trained on a subset of the Fisher speech corpus obtained from https://catalog.ldc.upenn.edu/LDC2004S13. The training data was additionally degraded with varying levels of babble noise from the NOISEX-92 dataset.
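
For reference, additive degradation at a chosen signal-to-noise ratio is commonly done by scaling the noise before mixing, as in the generic sketch below; this is not necessarily the authors' exact degradation pipeline.

```python
import numpy as np

def add_noise(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix a noise signal (e.g. NOISEX-92 babble) into speech at a target SNR in dB."""
    noise = noise[: len(speech)]            # match the two signal lengths
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale the noise so that 10 * log10(speech_power / scaled_noise_power) == snr_db.
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise
```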

Training the 1D-Triplet-CNN model

To train a 1D-Triplet-CNN model as described in the research paper, use the implementation provided in the models subdirectory. The network attains optimal performance when trained within a triplet learning framework; see the research paper for the full training details.
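
As a rough sketch of such a triplet learning setup, the loop below uses PyTorch's built-in nn.TripletMarginLoss. The model class and the triplet_loader (which must yield anchor/positive/negative MFCC-LPC feature batches, with anchor and positive from the same speaker) are placeholders, not the authors' actual training code.

```python
import torch
import torch.nn as nn

from models.cnn_model import OneDTripletCNN  # assumed module/class name

model = OneDTripletCNN()
criterion = nn.TripletMarginLoss(margin=1.0)  # standard PyTorch triplet loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# triplet_loader is a placeholder for a DataLoader yielding feature triplets.
for anchor, positive, negative in triplet_loader:
    emb_a = model(anchor)    # embedding of the anchor utterance
    emb_p = model(positive)  # same speaker as the anchor
    emb_n = model(negative)  # different speaker
    loss = criterion(emb_a, emb_p, emb_n)  # pull a-p together, push a-n apart
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```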

Testing with the pre-trained model

Recommended audio specifications

Typically, 2 seconds of speech audio sampled at 8 kHz is enough to produce reliable speaker recognition results. Longer audio samples make recognition significantly slower with no significant gain in performance, while samples shorter than 1 second incur a considerable performance loss.
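
At 8 kHz, the recommended 2 seconds of audio corresponds to 16,000 samples. A waveform can be trimmed or zero-padded to that length with a small helper like this generic one (not part of the repository):

```python
import numpy as np

SAMPLE_RATE = 8000            # 8 kHz, as recommended above
TARGET_LEN = 2 * SAMPLE_RATE  # 2 seconds -> 16,000 samples

def fix_length(waveform: np.ndarray) -> np.ndarray:
    """Trim or zero-pad a mono waveform to exactly 2 seconds."""
    if len(waveform) >= TARGET_LEN:
        return waveform[:TARGET_LEN]
    return np.pad(waveform, (0, TARGET_LEN - len(waveform)))
```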

Usage

  1. Satisfy the requirements listed in the ./requirements.txt file.
  2. Run src/extractFeatures.m in MATLAB R2019a (or newer) to extract MFCC-LPC features from audio files placed in the sample_audio subdirectory and save the corresponding features as individual .mat files in the sample_feature subdirectory.
  3. Run src/test.py in Python 3.6 to evaluate some sample audio pairs and generate speaker verification scores (see the sketch after this list).
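
The sketch below illustrates the kind of pairwise scoring done in step 3, using cosine similarity between embeddings of two extracted feature files. The module/class/checkpoint names and the "feature" variable key inside the .mat files are assumptions; the actual scoring in src/test.py may differ.

```python
import torch
import torch.nn.functional as F
from scipy.io import loadmat

from models.cnn_model import OneDTripletCNN  # assumed module/class name

# Load the pre-trained network (checkpoint filename assumed, as above).
model = OneDTripletCNN()
model.load_state_dict(torch.load("trained_models/1d_triplet_cnn.pth",
                                 map_location="cpu"))
model.eval()

def load_feature(path: str) -> torch.Tensor:
    """Load an MFCC-LPC feature matrix saved by src/extractFeatures.m."""
    feat = loadmat(path)["feature"]  # assumed variable name inside the .mat file
    return torch.from_numpy(feat).float().unsqueeze(0)  # add a batch dimension

with torch.no_grad():
    emb1 = model(load_feature("sample_feature/utt1.mat"))
    emb2 = model(load_feature("sample_feature/utt2.mat"))

score = F.cosine_similarity(emb1, emb2).item()  # higher score -> same speaker
print(f"Verification score: {score:.4f}")
```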

Examples

Usage examples may be added in the future.
