InaSpeechSegmenter

CNN-based audio segmentation toolkit. It detects speech, music, noise and speaker gender, and was designed for large-scale gender equality studies based on speech time per gender.
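The speech-time-per-gender measure behind such studies can be sketched as below. This is a minimal illustration, not the toolkit's own code: it assumes a segmentation given as (label, start, stop) tuples in seconds, with labels such as "female", "male", "music" and "noEnergy". The segment values and the helper names speech_time_per_gender and female_speech_ratio are hypothetical.

```python
from collections import defaultdict

# Hypothetical segmentation: (label, start_seconds, stop_seconds) tuples.
# The tuple layout and label names are assumptions made for this sketch.
segmentation = [
    ("music", 0.0, 12.5),
    ("female", 12.5, 40.0),
    ("male", 40.0, 55.5),
    ("noEnergy", 55.5, 56.0),
    ("female", 56.0, 70.0),
]


def speech_time_per_gender(segments, genders=("female", "male")):
    """Sum segment durations per gender label; all other labels are ignored."""
    totals = defaultdict(float)
    for label, start, stop in segments:
        if label in genders:
            totals[label] += stop - start
    return {g: totals[g] for g in genders}


def female_speech_ratio(segments):
    """Share of gendered speech time attributed to female speakers (0.0 if none)."""
    totals = speech_time_per_gender(segments)
    speech = totals["female"] + totals["male"]
    return totals["female"] / speech if speech > 0 else 0.0


print(speech_time_per_gender(segmentation))       # {'female': 41.5, 'male': 15.5}
print(round(female_speech_ratio(segmentation), 3))  # 0.728
```

Only the gendered labels count toward the ratio, so music, noise and silence do not affect it.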


interspeech23

1 year ago

final.onnx and raw81.pth are pretrained ResNet101 x-vector extractors obtained from the VBx project by Brno University of Technology:
https://github.com/BUTSpeechFIT/VBx/tree/master/VBx/models/ResNet101_16kHz/nnet
For more details, see F. Landini, J. Profant, M. Diez, L. Burget: Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: theory, implementation and analysis on standard tasks (arXiv version).
interspeech2023_all.hdf5 and interspeech2023_cvfr.hdf5 are MLP gender classification models operating on x-vectors, trained by @simonD3V. This work is described in a study submitted to Interspeech 2023, which will be referenced here upon acceptance.
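The general idea of an MLP gender classifier applied to x-vectors can be sketched as follows. This is a hypothetical illustration only: the embedding size (256), the single ReLU hidden layer, the sigmoid output and the random weights are all assumptions and do not describe the layers stored in the released .hdf5 files. The stand-in x-vectors are random and not produced by the ResNet101 extractor.

```python
import numpy as np

rng = np.random.default_rng(0)

EMBED_DIM = 256   # assumed x-vector size; not taken from the released models
HIDDEN_DIM = 64   # hypothetical hidden-layer width

# Hypothetical random weights standing in for a trained classifier.
W1 = rng.standard_normal((EMBED_DIM, HIDDEN_DIM)) * 0.05
b1 = np.zeros(HIDDEN_DIM)
W2 = rng.standard_normal((HIDDEN_DIM, 1)) * 0.05
b2 = np.zeros(1)


def mlp_female_prob(xvectors):
    """Forward pass: (n, EMBED_DIM) x-vectors -> (n,) probabilities of 'female'."""
    h = np.maximum(xvectors @ W1 + b1, 0.0)   # ReLU hidden layer
    logits = (h @ W2 + b2)[:, 0]              # one logit per x-vector
    return 1.0 / (1.0 + np.exp(-logits))      # sigmoid output


def classify(xvectors, threshold=0.5):
    """Threshold the probabilities into 'female' / 'male' labels."""
    probs = mlp_female_prob(xvectors)
    return np.where(probs >= threshold, "female", "male"), probs


# Stand-in x-vectors; in practice these would come from an x-vector extractor.
x = rng.standard_normal((3, EMBED_DIM))
labels, probs = classify(x)
print(labels, probs.round(3))
```

A binary sigmoid output is one plausible design for a two-class gender decision. The actual output layer of the released models is not documented here.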

models

2 years ago

Classification models used in inaSpeechSegmenter