Faster Whisper transcription with CTranslate2
- Add support for distil-large-v3 (https://github.com/SYSTRAN/faster-whisper/pull/755). The latest Distil-Whisper model, distil-large-v3, is intrinsically designed to work with the OpenAI sequential algorithm.
- Benchmarks (https://github.com/SYSTRAN/faster-whisper/pull/773). Adds functionality to benchmark faster-whisper's memory usage, Word Error Rate (WER), and speed.
- Support initializing more Whisper model args (https://github.com/SYSTRAN/faster-whisper/pull/807)
- Small bug fixes
- New features from the original OpenAI Whisper project
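The WER figure reported by such a benchmark is the word-level edit distance between a reference transcript and the model's hypothesis, divided by the number of reference words. A minimal pure-Python sketch of the metric (illustrative only — not the benchmark code from the PR above):

```python
# Illustrative sketch of Word Error Rate (WER); not the actual
# faster-whisper benchmark implementation.

def wer(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[len(hyp)] / max(len(ref), 1)

print(wer("the quick brown fox", "the quick brown dog"))  # 0.25
```

Real benchmarks normalize the text (casing, punctuation) before scoring, which this sketch omits.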
- Fix the broken tag v0.10.0
- Load `feature_size`/`num_mels` and other settings from `preprocessor_config.json`
- Add a new language token (`yue`)
- Update the `CTranslate2` requirement to include the latest version 3.22.0
- Update the `tokenizers` requirement to include the latest version 0.15
- Support distil-whisper models (https://github.com/SYSTRAN/faster-whisper/pull/557): robust knowledge distillation of the Whisper model via large-scale pseudo-labelling. For more detail: https://github.com/huggingface/distil-whisper
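Newer Whisper checkpoints (e.g. large-v3 and distil-large-v3) use 128 mel bins instead of the earlier 80, which is why values like `feature_size`/`num_mels` are read from `preprocessor_config.json` rather than hardcoded. A hedged sketch of such a loader — the key names follow the Hugging Face `preprocessor_config.json` convention, and faster-whisper's actual loading code may differ:

```python
import json
from pathlib import Path

# Hedged sketch: read feature-extractor settings from a model directory's
# preprocessor_config.json, falling back to Whisper's historical defaults.
DEFAULTS = {"feature_size": 80, "sampling_rate": 16000, "hop_length": 160}

def load_feature_config(model_dir: str) -> dict:
    config = dict(DEFAULTS)
    path = Path(model_dir) / "preprocessor_config.json"
    if path.is_file():
        with open(path) as f:
            data = json.load(f)
        # Only copy over the keys we know how to use.
        config.update({k: data[k] for k in DEFAULTS if k in data})
    return config
```

With this scheme, a large-v3 directory whose config contains `"feature_size": 128` gets 128 mel bins, while older model directories without the file keep the 80-bin default.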
- Upgrade the ctranslate2 version to 4.0 to support CUDA 12 (https://github.com/SYSTRAN/faster-whisper/pull/694)
- Upgrade the PyAV version to 11.* to support Python 3.12.x (https://github.com/SYSTRAN/faster-whisper/pull/679)
- Small bug fixes
- New improvements from the original OpenAI Whisper project
- Add the function `faster_whisper.available_models()` to list the available model sizes
- Add the model property `supported_languages` to list the languages accepted by the model
- Improve the error message for invalid `task` and `language` parameters
- Update the `tokenizers` requirement to include the latest version 0.14

Some generation parameters that were available in the CTranslate2 API but not exposed in faster-whisper:
- `repetition_penalty` to penalize the score of previously generated tokens (set > 1 to penalize)
- `no_repeat_ngram_size` to prevent repetitions of ngrams with this size

Some values that were previously hardcoded in the transcription method:

- `prompt_reset_on_temperature` to configure after which temperature fallback step the prompt with the previous text should be reset (default value is 0.5)

Other changes:

- Add `duration_after_vad` in the returned `TranscriptionInfo` object
- … when the `language` parameter is set to something else
- `no_speech_threshold`: when the threshold was met for a segment, the next 30-second window reused the same encoder output and was also considered as non-speech

Some recent improvements from openai-whisper are ported to faster-whisper:
The `WhisperModel` constructor now accepts any repository ID as argument, for example:

```python
model = WhisperModel("username/whisper-large-v2-ct2")
```

The utility function `download_model` has been updated similarly.
- The `initial_prompt` argument can now be a list of token IDs (useful to include timestamp tokens in the prompt)
- Avoid computing higher temperatures when `no_speech_threshold` is met (same as https://github.com/openai/whisper/commit/e334ff141d5444fbf6904edaaf408e5b0b416fe8)
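For intuition, the `repetition_penalty` and `no_repeat_ngram_size` generation options described earlier can be thought of as logit-processing steps applied at each decoding step. The sketch below is purely illustrative — in faster-whisper both are implemented inside CTranslate2's C++ decoding loop, not in Python:

```python
# Illustrative sketch of two generation options exposed from CTranslate2.
# Not the real implementation, which lives in CTranslate2's C++ decoder.

def apply_repetition_penalty(logits, generated, penalty):
    """Divide positive scores (multiply negative ones) of already-generated
    tokens by `penalty`; values > 1 discourage repeating those tokens."""
    out = list(logits)
    for tok in set(generated):
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out

def banned_ngram_tokens(generated, n):
    """Tokens that would complete an n-gram already present in `generated`;
    the decoder masks these out to prevent n-gram repetition."""
    if n <= 0 or len(generated) < n - 1:
        return set()
    prefix = tuple(generated[-(n - 1):]) if n > 1 else ()
    banned = set()
    for i in range(len(generated) - n + 1):
        if tuple(generated[i:i + n - 1]) == prefix:
            banned.add(generated[i + n - 1])
    return banned
```

For example, with generated tokens `[1, 2, 3, 1, 2]` and `no_repeat_ngram_size=3`, token `3` is banned next because the trigram `(1, 2, 3)` already occurred.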