Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
## Version 3.1.1

TL;DR: providing `num_speakers` to `pyannote/speaker-diarization-3.1` now works as expected.

- fix(pipeline): fixed setting `num_speakers` in the `pyannote/speaker-diarization-3.1` pipeline
## Version 3.1.0

TL;DR: `pyannote/speaker-diarization-3.1` no longer requires the unpopular ONNX runtime.
New features:

- added `TimingHook` for profiling processing time
- added `ArtifactHook` for saving internal steps
- added `Hooks` for combining multiple hooks
- added `"soft"` option to `Powerset.to_multilabel`
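The powerset-to-multilabel conversion can be sketched as follows. This is an illustrative plain-NumPy reimplementation, not the actual `Powerset.to_multilabel` code from `pyannote.audio`; the 2-speaker mapping and function signature are assumptions made for the example.

```python
import numpy as np

# Powerset classes for 2 speakers, max set size 2: {}, {0}, {1}, {0, 1}.
# Each row maps one subset to per-speaker activity.
mapping = np.array([
    [0, 0],  # empty set: nobody speaks
    [1, 0],  # speaker 0 only
    [0, 1],  # speaker 1 only
    [1, 1],  # both speakers (overlap)
])

def to_multilabel(powerset_probs, soft=False):
    """Convert (frames, powerset_classes) probabilities to (frames, speakers)."""
    if soft:
        # soft: weight each subset's speaker membership by its probability
        return powerset_probs @ mapping
    # hard: one-hot the most likely subset, then map it to speakers
    hard = np.eye(len(mapping))[powerset_probs.argmax(axis=-1)]
    return hard @ mapping

probs = np.array([[0.1, 0.2, 0.6, 0.1]])  # one frame, "{1}" is most likely
print(to_multilabel(probs))               # [[0. 1.]]
print(to_multilabel(probs, soft=True))    # [[0.3 0.7]]
```

The `soft` variant keeps graded per-speaker scores instead of committing to the single most likely subset.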
Fixes and improvements:

- fixed `SpeakerDiarization`'s `AgglomerativeClustering` to honor `num_clusters` when provided
- fixed frame-wise speaker count exceeding `max_speakers` or detected `num_speakers` in the `SpeakerDiarization` pipeline
- compute `fbank` on GPU when requested

Breaking changes:

- renamed `WeSpeakerPretrainedSpeakerEmbedding` to `ONNXWeSpeakerPretrainedSpeakerEmbedding`
- removed the `onnxruntime` dependency. You can still use ONNX `hbredin/wespeaker-voxceleb-resnet34-LM`, but you will have to install `onnxruntime` yourself.
- removed `logging_hook` (use `ArtifactHook` instead)
- removed the `onset` and `offset` parameters of `SpeakerDiarizationMixin.speaker_count`. You should now binarize segmentations before passing them to `speaker_count`.
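The "binarize first, then count" convention can be sketched like this; the threshold, shapes, and values below are illustrative, not the pipeline's actual defaults.

```python
import numpy as np

# (frames, speakers) activation scores for one chunk; values are made up.
segmentations = np.array([
    [0.9, 0.1],
    [0.8, 0.7],
    [0.2, 0.3],
])

# Binarization used to happen inside speaker_count via onset/offset;
# the caller is now responsible for it.
binarized = (segmentations > 0.5).astype(int)

# Active speakers per frame.
speaker_count = binarized.sum(axis=-1)
print(speaker_count)  # [1 2 0]
```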
## Version 3.0.1

TL;DR: `pyannote/speaker-diarization-3.0` is now much faster when sent to GPU:

```python
import torch
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.0")
pipeline.to(torch.device("cuda"))
```

- switched from `onnxruntime` to `onnxruntime-gpu`
## Version 3.0.0

| Benchmark (DER %) | v2.1 | v3.0 |
| --- | --- | --- |
| AISHELL-4 | 14.1 | 12.3 |
| AliMeeting (channel 1) | 27.4 | 24.3 |
| AMI (IHM) | 18.9 | 19.0 |
| AMI (SDM) | 27.1 | 22.2 |
| AVA-AVD | - | 49.1 |
| DIHARD 3 (full) | 26.9 | 21.7 |
| MSDWild | - | 24.6 |
| REPERE (phase 2) | 8.2 | 7.8 |
| VoxConverse (v0.3) | 11.2 | 11.3 |
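For readers unfamiliar with the metric: DER (diarization error rate) sums false alarm, missed detection, and speaker confusion durations, divided by total speech duration. A toy computation with made-up numbers (not taken from the benchmark):

```python
# Illustrative DER arithmetic; all durations (seconds) are invented.
false_alarm = 2.0        # non-speech labeled as speech
missed_detection = 3.0   # speech labeled as non-speech
speaker_confusion = 5.0  # speech attributed to the wrong speaker
total_speech = 100.0     # total reference speech duration

der = 100 * (false_alarm + missed_detection + speaker_confusion) / total_speech
print(f"{der:.1f}%")  # 10.0%
```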
New features:

- send a pipeline to a device with `pipeline.to(device)`
- added `return_embeddings` option to the `SpeakerDiarization` pipeline
- made `segmentation_batch_size` and `embedding_batch_size` mutable in the `SpeakerDiarization` pipeline (they now default to `1`)
- added powerset support to the `SpeakerDiarization` task

Breaking changes:

- renamed the `Segmentation` task to `SpeakerDiarization`
- pipelines now run on CPU by default; use `pipeline.to(torch.device('cuda'))` to use GPU
- removed the `SpeakerSegmentation` pipeline (use the `SpeakerDiarization` pipeline instead)
- removed `prodi.gy` recipes
- removed the `segmentation_duration` parameter from the `SpeakerDiarization` pipeline (it now defaults to the `duration` of the segmentation model)
- removed support for `FINCHClustering` and `HiddenMarkovModelClustering`
- multi-channel audio is no longer downmixed to mono by default. You should update how `pyannote.audio.core.io.Audio` is instantiated:
  - replace `Audio()` by `Audio(mono="downmix")`;
  - replace `Audio(mono=True)` by `Audio(mono="downmix")`;
  - replace `Audio(mono=False)` by `Audio()`.
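Conceptually, `mono="downmix"` averages all channels into a single one. A minimal NumPy sketch of that behavior (not the actual `Audio` implementation; the `(channel, sample)` layout is an assumption for the example):

```python
import numpy as np

# Two-channel signal, shape (channel, sample); values are made up.
stereo = np.array([
    [0.2, 0.4, 0.6],  # channel 0
    [0.0, 0.4, 0.2],  # channel 1
])

# Downmix: average across the channel axis, keeping a mono channel dim.
downmixed = stereo.mean(axis=0, keepdims=True)
print(downmixed)  # [[0.1 0.4 0.4]]
```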
- removed `Model.introspection`. If, for some weird reason, you wrote some custom code based on that, you should instead rely on `Model.example_output`.

## Version 2.1.x

Version 2.1.x introduces a major overhaul of the `pyannote.audio` default speaker diarization pipeline, made of three main stages: speaker segmentation applied to a short sliding window, neural speaker embedding of each (local) speaker, and (global) agglomerative clustering. More details in the attached technical report.