Faster Whisper transcription with CTranslate2
- Fix the broken tag v0.10.0
- Get `feature_size`/`num_mels` and other parameters from `preprocessor_config.json`
- Add new language Cantonese (`yue`)
- Update the `CTranslate2` requirement to include the latest version 3.22.0
- Update the `tokenizers` requirement to include the latest version 0.15
- Support distil-whisper model (https://github.com/SYSTRAN/faster-whisper/pull/557)
  - Robust knowledge distillation of the Whisper model via large-scale pseudo-labelling. For more detail: https://github.com/huggingface/distil-whisper
- Upgrade the ctranslate2 version to 4.0 to support CUDA 12 (https://github.com/SYSTRAN/faster-whisper/pull/694)
- Upgrade the PyAV version to 11.* to support Python 3.12.x (https://github.com/SYSTRAN/faster-whisper/pull/679)
- Small bug fixes
- New improvements from the original OpenAI Whisper project
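For reference, a dependency pin consistent with these two upgrades might look like the following. This is a hypothetical requirements fragment for illustration only; the authoritative version constraints live in the project's packaging files.

```
ctranslate2>=4.0,<5
av==11.*
```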
- Add the function `faster_whisper.available_models()` to list the available model sizes
- Add the model property `supported_languages` to list the languages accepted by the model
- Improve the error message for invalid `task` and `language` parameters
- Update the `tokenizers` requirement to include the latest version 0.14
- Expose some generation parameters that were available in the CTranslate2 API but not exposed in faster-whisper:
  - `repetition_penalty` to penalize the score of previously generated tokens (set > 1 to penalize)
  - `no_repeat_ngram_size` to prevent repetitions of ngrams with this size
- Expose some values that were previously hardcoded in the transcription method:
  - `prompt_reset_on_temperature` to configure after which temperature fallback step the prompt with the previous text should be reset (default value is 0.5)
- Add `duration_after_vad` in the returned `TranscriptionInfo` object
- … when the `language` parameter is set to something else
- Fix a bug related to `no_speech_threshold`: when the threshold was met for a segment, the next 30-second window reused the same encoder output and was also considered as non speech
- Some recent improvements from openai-whisper are ported to faster-whisper:
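The two decoding options above can be illustrated with a small self-contained sketch. This is toy logic on plain Python lists, not faster-whisper's or CTranslate2's actual implementation, and the function names are illustrative: a repetition penalty rescales the scores of tokens that were already generated, and `no_repeat_ngram_size` bans any token that would complete an already-seen ngram.

```python
# Toy illustration of repetition_penalty and no_repeat_ngram_size;
# names and data layout are illustrative, not faster-whisper internals.

def apply_repetition_penalty(logits, generated, penalty):
    """Penalize tokens that already appeared: positive scores are
    divided by `penalty`, negative ones multiplied (penalty > 1)."""
    penalized = list(logits)
    for token in set(generated):
        score = penalized[token]
        penalized[token] = score / penalty if score > 0 else score * penalty
    return penalized


def banned_ngram_tokens(generated, ngram_size):
    """Tokens that would complete an ngram already present in
    `generated` (what no_repeat_ngram_size blocks at each step)."""
    if ngram_size <= 0 or len(generated) < ngram_size - 1:
        return set()
    prefix = tuple(generated[len(generated) - ngram_size + 1:])
    banned = set()
    for i in range(len(generated) - ngram_size + 1):
        if tuple(generated[i:i + ngram_size - 1]) == prefix:
            banned.add(generated[i + ngram_size - 1])
    return banned


history = [0, 2, 0, 2]          # previously generated token ids
scores = [2.0, -1.0, 0.5, 1.0]  # scores of the 4 candidate next tokens
scores = apply_repetition_penalty(scores, history, penalty=2.0)
for token in banned_ngram_tokens(history, ngram_size=2):
    scores[token] = float("-inf")  # never complete a repeated bigram
```

With history `[0, 2, 0, 2]` and `ngram_size=2`, token 0 is banned because the bigram `(2, 0)` already occurred, and the scores of tokens 0 and 2 are halved by the penalty before the ban is applied.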
- The `WhisperModel` constructor now accepts any repository ID as argument, for example:

  ```python
  model = WhisperModel("username/whisper-large-v2-ct2")
  ```

  The utility function `download_model` has been updated similarly.
- … `initial_prompt` (useful to include timestamp tokens in the prompt)
- … `no_speech_threshold` is met (same as https://github.com/openai/whisper/commit/e334ff141d5444fbf6904edaaf408e5b0b416fe8)
- Extend `TranscriptionInfo` with additional properties:
  - `all_language_probs`: the probability of each language (only set when `language=None`)
  - `vad_options`: the VAD options that were used for this transcription
- When the model is loaded from its name like `WhisperModel("large-v2")`, a request is made to the Hugging Face Hub to check if some files should be downloaded. It can happen that this request raises an exception: the Hugging Face Hub is down, the internet is temporarily disconnected, etc. These types of exceptions are now caught and the library will try to load the model directly from the local cache if it exists.
- Reenable the `onnxruntime` dependency for Python 3.11 as the latest version now provides binary wheels for Python 3.11
- Fix `IndexError` on empty segments when using `word_timestamps=True`
- Define `__version__` at the module level
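The local-cache fallback described above can be sketched as a generic pattern. The helper and the two stand-in callables below are hypothetical, for illustration only; the real library wires this behavior into its Hugging Face Hub download logic.

```python
# Generic sketch of the fallback pattern: try the remote Hub check,
# and on any error fall back to a local cache lookup.
# `fetch_remote` / `find_in_cache` are illustrative stand-ins.

def resolve_model(name, fetch_remote, find_in_cache):
    try:
        return fetch_remote(name)
    except Exception:
        cached = find_in_cache(name)
        if cached is None:
            raise  # no local copy either: surface the original error
        return cached


def fetch_remote(name):  # toy stand-in: the Hub is unreachable
    raise ConnectionError("Hugging Face Hub unreachable")


def find_in_cache(name):  # toy stand-in: a cached copy exists
    return f"/cache/models/{name}"


path = resolve_model("large-v2", fetch_remote, find_in_cache)
```

If the remote check succeeds, its result is used directly; the cache is only consulted when the request raises, and the original error is re-raised when no cached copy exists either.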