Faster Whisper transcription with CTranslate2
- Fix `download_root` to correctly set the cache directory where the models are downloaded.
- Some information is now logged under the INFO and DEBUG levels. The logging level can be configured like this:
  ```python
  import logging

  logging.basicConfig()
  logging.getLogger("faster_whisper").setLevel(logging.DEBUG)
  ```
- New arguments were added to the `WhisperModel` constructor to better control how the models are downloaded:
  - `download_root` to specify where the model should be downloaded.
  - `local_files_only` to avoid downloading the model and directly return the path to the cached model, if it exists.
- Fix a bug with `condition_on_previous_text=False` (note that the bug still exists in openai/whisper v20230314).
- Extend the `Segment` structure with additional properties to match openai/whisper.
- Rename `AudioInfo` to `TranscriptionInfo` and add a new property `options` to summarize the transcription options that were used.
- Fix some `IndexError` exceptions.
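To make the interplay of `download_root` and `local_files_only` concrete, here is a minimal cache-lookup sketch. This is not the faster_whisper implementation; the function name and the directory layout are assumptions for illustration only:

```python
import os


def resolve_model_path(name: str, download_root: str, local_files_only: bool = False) -> str:
    """Illustrative sketch (not faster_whisper's code): return the cached
    model directory under download_root if it exists; otherwise download it,
    unless local_files_only forbids any download."""
    cached = os.path.join(download_root, name)
    if os.path.isdir(cached):
        return cached  # directly return the path to the cached model
    if local_files_only:
        raise FileNotFoundError(f"Model {name!r} is not cached under {download_root!r}")
    os.makedirs(cached)  # stand-in for the actual download step
    return cached
```

The point of `local_files_only` is the early exit: no network access is ever attempted, and a missing model is reported instead of fetched.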
- The Silero VAD model is integrated to ignore parts of the audio without speech:

  ```python
  model.transcribe(..., vad_filter=True)
  ```

  The default behavior is conservative and only removes silence longer than 2 seconds. See the README to learn how to customize the VAD parameters.
  Note: the Silero model is executed with onnxruntime, which is currently not released for Python 3.11. The dependency is excluded for this Python version, so the VAD features cannot be used.
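To make the "only remove long silences" behavior concrete, here is a toy stand-in for a VAD filter. This is not the Silero model; the function name, the per-frame boolean representation, and the frame size are invented for illustration. It drops silent runs longer than a threshold while keeping shorter pauses so words are not cut:

```python
def drop_long_silence(speech_flags, frame_ms=1000, max_silence_s=2.0):
    """Toy VAD-style filter (not Silero): speech_flags holds one boolean per
    audio frame (True = speech). Silence runs longer than max_silence_s are
    removed; shorter gaps are preserved."""
    max_frames = int(max_silence_s * 1000 / frame_ms)
    kept, silence_run = [], []
    for is_speech in speech_flags:
        if is_speech:
            if len(silence_run) <= max_frames:
                kept.extend(silence_run)  # short pause: keep it
            silence_run = []
            kept.append(True)
        else:
            silence_run.append(False)
    if len(silence_run) <= max_frames:
        kept.extend(silence_run)  # trailing short pause
    return kept
```

A run of three silent one-second frames exceeds the 2-second default and is dropped, while a single silent frame between words survives.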
- The function `decode_audio` has a new argument `split_stereo` to split stereo audio into separate left and right channels:

  ```python
  left, right = decode_audio(audio_file, split_stereo=True)

  # model.transcribe(left)
  # model.transcribe(right)
  ```
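Conceptually, splitting a stereo stream amounts to de-interleaving its samples. The sketch below shows the idea; the function name and the interleaved `[L0, R0, L1, R1, ...]` layout are assumptions for illustration, not faster_whisper's internals:

```python
def deinterleave(samples):
    """Split interleaved stereo samples [L0, R0, L1, R1, ...] into
    separate left and right channel lists."""
    return samples[0::2], samples[1::2]
```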
- Add `Segment` attributes `avg_log_prob` and `no_speech_prob` (same definition as openai/whisper).
- Fix an `av.error.InvalidDataError` exception raised during decoding.
- Fix the `prefix` option to be passed only to the first 30-second window.
- Extend `suppress_tokens` with some special tokens that should always be suppressed (unless `suppress_tokens` is `None`).
- Models are now automatically downloaded from the Hugging Face Hub when creating a `WhisperModel` instance. The conversion step is no longer required for the original Whisper models:

  ```python
  # Automatically download https://huggingface.co/guillaumekln/faster-whisper-large-v2
  model = WhisperModel("large-v2")
  ```
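Judging from the URL in the comment above, a model size appears to map to a Hugging Face repository holding its CTranslate2 conversion. A hypothetical helper (not part of the faster_whisper API) makes that naming convention explicit:

```python
def converted_model_repo(size: str) -> str:
    """Hypothetical helper: map a Whisper model size to the Hugging Face
    repository name inferred from the URL above."""
    return f"guillaumekln/faster-whisper-{size}"
```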
- Initial publication of the library on PyPI: https://pypi.org/project/faster-whisper/