Faster Whisper transcription with CTranslate2
- Add support for distil-large-v3 (https://github.com/SYSTRAN/faster-whisper/pull/755). The latest Distil-Whisper model, distil-large-v3, is intrinsically designed to work with the OpenAI sequential algorithm.
- Benchmarks (https://github.com/SYSTRAN/faster-whisper/pull/773). Adds functionality to benchmark faster-whisper's memory usage, Word Error Rate (WER), and speed.
- Support initializing more Whisper model args (https://github.com/SYSTRAN/faster-whisper/pull/807)
- Small bug fixes
- New features from the original OpenAI Whisper project
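The WER figure reported by such a benchmark is the word-level edit distance between a reference transcript and the model's hypothesis, divided by the number of reference words. A minimal pure-Python sketch of the metric (illustrative only — not the benchmark code from the PR above):

```python
# Illustrative sketch of Word Error Rate (WER); not the actual
# faster-whisper benchmark implementation.

def wer(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[len(hyp)] / max(len(ref), 1)

print(wer("the quick brown fox", "the quick brown dog"))  # 0.25
```

Real benchmarks normalize the text (casing, punctuation) before scoring, which this sketch omits.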
- Fix the broken tag v0.10.0
- Load `feature_size`/`num_mels` and other settings from `preprocessor_config.json`
- Add a new language token (`yue`)
- Update the `CTranslate2` requirement to include the latest version 3.22.0
- Update the `tokenizers` requirement to include the latest version 0.15
- Support distil-whisper models (https://github.com/SYSTRAN/faster-whisper/pull/557): robust knowledge distillation of the Whisper model via large-scale pseudo-labelling. For more detail: https://github.com/huggingface/distil-whisper
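Newer Whisper checkpoints (e.g. large-v3 and distil-large-v3) use 128 mel bins instead of the earlier 80, which is why values like `feature_size`/`num_mels` are read from `preprocessor_config.json` rather than hardcoded. A hedged sketch of such a loader — the key names follow the Hugging Face `preprocessor_config.json` convention, and faster-whisper's actual loading code may differ:

```python
import json
from pathlib import Path

# Hedged sketch: read feature-extractor settings from a model directory's
# preprocessor_config.json, falling back to Whisper's historical defaults.
DEFAULTS = {"feature_size": 80, "sampling_rate": 16000, "hop_length": 160}

def load_feature_config(model_dir: str) -> dict:
    config = dict(DEFAULTS)
    path = Path(model_dir) / "preprocessor_config.json"
    if path.is_file():
        with open(path) as f:
            data = json.load(f)
        # Only copy over the keys we know how to use.
        config.update({k: data[k] for k in DEFAULTS if k in data})
    return config
```

With this scheme, a large-v3 directory whose config contains `"feature_size": 128` gets 128 mel bins, while older model directories without the file keep the 80-bin default.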
- Upgrade the ctranslate2 version to 4.0 to support CUDA 12 (https://github.com/SYSTRAN/faster-whisper/pull/694)
- Upgrade the PyAV version to 11.* to support Python 3.12.x (https://github.com/SYSTRAN/faster-whisper/pull/679)
- Small bug fixes
- New improvements from the original OpenAI Whisper project
- Add the function `faster_whisper.available_models()` to list the available model sizes
- Add the model property `supported_languages` to list the languages accepted by the model
- Improve the error message for invalid `task` and `language` parameters
- Update the `tokenizers` requirement to include the latest version 0.14

Some generation parameters that were available in the CTranslate2 API but not exposed in faster-whisper:
- `repetition_penalty` to penalize the score of previously generated tokens (set > 1 to penalize)
- `no_repeat_ngram_size` to prevent repetitions of ngrams with this size

Some values that were previously hardcoded in the transcription method:

- `prompt_reset_on_temperature` to configure after which temperature fallback step the prompt with the previous text should be reset (default value is 0.5)

Other changes:

- Add `duration_after_vad` in the returned `TranscriptionInfo` object
- … when the `language` parameter is set to something else
- `no_speech_threshold`: when the threshold was met for a segment, the next 30-second window reused the same encoder output and was also considered as non-speech

Some recent improvements from openai-whisper are ported to faster-whisper:
The `WhisperModel` constructor now accepts any repository ID as argument, for example:

```python
model = WhisperModel("username/whisper-large-v2-ct2")
```

The utility function `download_model` has been updated similarly.
- The `initial_prompt` argument can now be a list of token IDs (useful to include timestamp tokens in the prompt)
- Avoid computing higher temperatures when `no_speech_threshold` is met (same as https://github.com/openai/whisper/commit/e334ff141d5444fbf6904edaaf408e5b0b416fe8)
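For intuition, the `repetition_penalty` and `no_repeat_ngram_size` generation options described earlier can be thought of as logit-processing steps applied at each decoding step. The sketch below is purely illustrative — in faster-whisper both are implemented inside CTranslate2's C++ decoding loop, not in Python:

```python
# Illustrative sketch of two generation options exposed from CTranslate2.
# Not the real implementation, which lives in CTranslate2's C++ decoder.

def apply_repetition_penalty(logits, generated, penalty):
    """Divide positive scores (multiply negative ones) of already-generated
    tokens by `penalty`; values > 1 discourage repeating those tokens."""
    out = list(logits)
    for tok in set(generated):
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out

def banned_ngram_tokens(generated, n):
    """Tokens that would complete an n-gram already present in `generated`;
    the decoder masks these out to prevent n-gram repetition."""
    if n <= 0 or len(generated) < n - 1:
        return set()
    prefix = tuple(generated[-(n - 1):]) if n > 1 else ()
    banned = set()
    for i in range(len(generated) - n + 1):
        if tuple(generated[i:i + n - 1]) == prefix:
            banned.add(generated[i + n - 1])
    return banned
```

For example, with generated tokens `[1, 2, 3, 1, 2]` and `no_repeat_ngram_size=3`, token `3` is banned next because the trigram `(1, 2, 3)` already occurred.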