Port of OpenAI's Whisper model in C/C++
In this release we significantly reduce the memory usage during inference by introducing "scratch" buffers to `ggml`.
The new memory requirements per model are as follows:
Model | Disk | Mem (Old) | Mem (New) |
---|---|---|---|
tiny | 75 MB | ~390 MB | ~125 MB |
base | 142 MB | ~500 MB | ~210 MB |
small | 466 MB | ~1.0 GB | ~600 MB |
medium | 1.5 GB | ~2.6 GB | ~1.7 GB |
large | 2.9 GB | ~4.7 GB | ~3.3 GB |
The idea is simple: instead of creating a new memory buffer for each new tensor in the computation, we reuse the memory of old tensors that are no longer needed. The implementation is in PR #431. It's not very clean, and I think there is a better way to do this, but for now it works.
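
To illustrate the reuse idea, here is a minimal sketch of a scratch arena in C. It is not the code from PR #431: the type `scratch_t` and the functions `scratch_init`, `scratch_alloc` and `scratch_reset` are hypothetical names chosen for this example, and the 16-byte rounding is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scratch arena: a fixed buffer from which short-lived
 * intermediate tensors are carved. Illustrative only, not the ggml API. */
typedef struct {
    uint8_t *data;  /* caller-provided backing memory */
    size_t   size;  /* capacity in bytes */
    size_t   offs;  /* bytes currently in use */
    size_t   peak;  /* high-water mark, useful for sizing the buffer */
} scratch_t;

static void scratch_init(scratch_t *s, void *mem, size_t size) {
    assert(s != NULL && mem != NULL);
    s->data = mem;
    s->size = size;
    s->offs = 0;
    s->peak = 0;
}

/* Bump-allocate n bytes (size rounded up to a multiple of 16).
 * Returns NULL when the arena cannot fit the request. */
static void *scratch_alloc(scratch_t *s, size_t n) {
    size_t aligned = (n + 15) & ~(size_t)15;
    if (aligned < n || aligned > s->size - s->offs) {
        return NULL;
    }
    void *p = s->data + s->offs;
    s->offs += aligned;
    if (s->offs > s->peak) {
        s->peak = s->offs;
    }
    return p;
}

/* Drop all intermediates at once, so the next step of the computation
 * reuses the same bytes instead of requesting new memory. */
static void scratch_reset(scratch_t *s) {
    s->offs = 0;
}
```

In this sketch a caller would allocate the intermediates of one layer from the arena, copy out the layer's result, and call `scratch_reset` before the next layer. Total memory is then bounded by the largest single layer rather than by the sum over all layers.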
Additionally, there might be some inference speed improvements on Apple Silicon in the Decoder part of the transformer. I haven't done proper benchmarks, but there seems to be a performance boost of about 30%. The results are identical to `v1.1.1`.
`ggml` / `whisper`

- `whisper`: PPC64 big-endian support by @fitzsim in https://github.com/ggerganov/whisper.cpp/pull/398
- `whisper`: condition sampled timestamp tokens to be monotonically increasing by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/425
- `wasm`: fix typo in helper.js by @bhbs in https://github.com/ggerganov/whisper.cpp/pull/459
- `ggml` / `whisper`: reduce memory usage during inference by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/431
- `ci`: run workflows on pull requests + bindings depend on .h by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/446
- `go`: added wrappers to reset and print timings by @glaslos in https://github.com/ggerganov/whisper.cpp/pull/436
- `go`: add WhisperLangAutoDetect method to go binding by @RobinXL in https://github.com/ggerganov/whisper.cpp/pull/451
- `go`: add wrapper for system info by @glaslos in https://github.com/ggerganov/whisper.cpp/pull/456
- `go`: support "auto" as an option when set language by @polarmoon in https://github.com/ggerganov/whisper.cpp/pull/462
- `whisper.wasm`: add labels for easier radio selection by @kokes in https://github.com/ggerganov/whisper.cpp/pull/435
- `livestream.sh`: run main with model arg instead of default by @EricTendian in https://github.com/ggerganov/whisper.cpp/pull/453
- `main`: CSV format export trimmed spaces fix by @alex-bacart in https://github.com/ggerganov/whisper.cpp/pull/444
- `addon.node`: using whisper as a Node.js addon by @chenqianhe in https://github.com/ggerganov/whisper.cpp/pull/443

Full Changelog: https://github.com/ggerganov/whisper.cpp/compare/v1.1.1...v1.2.0
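
The changelog entry on conditioning sampled timestamp tokens can be illustrated with a small sketch. This is not the code from the pull request: `mask_past_timestamps` and `argmax` are hypothetical helpers, and the sketch assumes that timestamp tokens occupy the ids `[ts_begin, n_vocab)` in increasing time order. It also only rules out timestamps earlier than the last sampled one, so a repeated timestamp is still allowed here.

```c
#include <assert.h>
#include <math.h>

/* Suppress every timestamp token that precedes the last sampled timestamp
 * by setting its logit to -INFINITY, so it can never be selected.
 * Assumption: timestamp tokens occupy ids [ts_begin, n_vocab). */
static void mask_past_timestamps(float *logits, int n_vocab,
                                 int ts_begin, int last_ts) {
    for (int id = ts_begin; id < n_vocab && id < last_ts; ++id) {
        logits[id] = -INFINITY;
    }
}

/* Greedy pick: index of the largest logit (first one wins on ties). */
static int argmax(const float *logits, int n) {
    assert(n > 0);
    int best = 0;
    for (int i = 1; i < n; ++i) {
        if (logits[i] > logits[best]) {
            best = i;
        }
    }
    return best;
}
```

Masking the logits before sampling, instead of rejecting a token after it has been sampled, keeps the constraint valid for any sampling strategy applied afterwards.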
I'll use these release notes to write some random thoughts about the project - sort of a short blog post.
I'm really happy with how `whisper.cpp` has turned out so far. The reception in the ML community has been very positive: most people seem to be excited by the simplicity of the implementation and the fact that it is quite self-contained. I receive a lot of questions about the project and about various ideas that it can be applied to. I really enjoy it and I try to respond to everyone!
I also find it very satisfying that there are so many contributions already happening by so many people. To me this illustrates the power of open-source collaboration. The contributions not only improve the functionality and the quality of the code, but also help to generate various new ideas and approaches to explore.
Another interesting thing is that the project keeps on giving. Every time I start to think that now is a good time to put it in the background for a while and focus on other stuff, some new cool idea pops up and I can't help but start working on it. Having this custom implementation allows me to interact with the model on a lower level which opens some interesting ways to explore it.
So far the development has been focused on improving the performance, expanding the platform coverage and having robust decoding strategies with a variety of examples. During this time, several ideas have accumulated that I find interesting to explore (diarization, token-level timestamps, improved timestamp accuracy, etc.). I think I'll try to focus more on these in the future and see if I can achieve something interesting.
Windows port of `whisper.cpp` utilising vendor-agnostic GPGPU based on DirectCompute by @Const-me
Since the v1.1.0 pre-release there have been several reports of improved transcription quality.
Together with my observations, I think we can declare version `v1.1.1` as "stable".
There were actually a couple of bug-fixes implemented since `v1.1.0`, so make sure to update to `v1.1.1` for optimal results.
Another update is that the prototype for v1.2.0 is almost ready (https://github.com/ggerganov/whisper.cpp/pull/431). Initial results indicate that the memory usage can be reduced by a factor of 2-3 for the smaller models.
You can provide feedback in the existing v1.1.0 discussion.
`ggml` / `whisper`

- `whisper`: perform entropy check only when we have at least 32 tokens 1a91c19af929d6dc614a9f3b03026fb23be002a6
- `whisper`: fix condition for providing past prompt (critical) 78f166174f126345ed87cc8f6941af1905c4a0f2
- `go`: remove `sample_best` and `sample_timestamp` bindings by @Trojan295 in https://github.com/ggerganov/whisper.cpp/pull/409
- `main`: re-enable temperature fallback f583e2d2f5a60e6ebf5bb2819ba4c4d348d41ea2
- `main`: add an option to accept optional output filenames by @garychia in https://github.com/ggerganov/whisper.cpp/pull/424
- `whisper.android`: use AssetManager for Android by @Digipom in https://github.com/ggerganov/whisper.cpp/pull/415
- `whisper.wasm`: add small and small.en models 206fc93396936725bd362c93796cfdc8a87f8509
- `bench`: add memcpy and ggml_mul_mat benchmarks (experimental) 1290fc64572f434f2f36721d2e2b0913cec0178a

Full Changelog: https://github.com/ggerganov/whisper.cpp/compare/v1.1.0...v1.1.1
The major change in this pre-release is the improved decoding implementation in `whisper.cpp`:

- `T > 0`
- `best_of` parameter for `T > 0`
- `beam_size`

More information about the decoding changes can be found in #291. Additionally, there are a few performance improvements for Apple Silicon, WASM and non-F16C platforms. Support for POWER9 architectures has been added.
The reason that this is a pre-release and not an official release is that the new implementation has not been sufficiently tested yet and the existing bindings for other languages have not been updated to support the API changes. The official release `1.1.x` will be created when there is enough feedback about the new decoding implementation and when the bindings have been updated. So make sure to send your feedback in the discussion created for this pre-release. For now, the `1.0.4` release should be considered more stable.
`ggml` / `whisper`

- `ggml`: POWER9 support by @fitzsim in #320, #349, #369
- `ggml`: simplify the SIMD code by @ggerganov in #324
- `ggml`: add SSE3 and fp16 conversion lookup table by @abitofevrything in #368
- `ggml`: utilise Accelerate's vDSP for some computations d51fc3ee0a0038cdf1522ca3d58b58299de41eb8
- `ggml`: speed-up softmax compute via Accelerate and loop unrolling d61d55cd4b9fe77511c8eea28d0220ce552f7008
- `ggml`: do not start extra threads when using BLAS d347a59a5f224f6a5ab0084ec95715451972d3b0
- `whisper`: do sample_to_timestamp calculation with 64 bit precision to avoid overflow by @boolemancer in #388
- `whisper`: various code clean-up and improvements by @asmaloney in #317 #318 #319 #322 etc
- `whisper`: improve decoding by @ggerganov in #291
- `whisper`: account for speed_up flag for short audio #405
- `whisper_token_data::plog`
- `whisper_init_from_file()`
- `whisper_init_from_buffer()`
- `whisper_init()`
- `whisper_sample_best()`
- `whisper_sample_timestamp()`
- `whisper_n_audio_ctx()`
- `whisper_get_logits()`
- `whisper_get_probs()`
- `struct whisper_full_params`
- `whisper.android`: remove android ABI constraint by @Digipom in #301
- `whisper.swiftui`: SwiftUI example by @Digipom in #308
- `main`: add `-ocsv`, aka `--output-csv` for writing CSV file containing millisecond timestamps by @NielsMayer in #340
- `command`: refactor to split command list & general transcription modes by @asmaloney in #331
- `command`: always-prompt mode by @dnhkng in #383
- `stream`: fix data race on bool + avoid division-by-zero a466c3404dc62dc221061bb37fb8f78741d749b8
- `stream`: fix a bug that inserted a lot of empty audio at the start a6dbd9188b13378dc36e2c669b9a22e17b4201d1
- `bench.wasm`: print system info fafd78945d5a7ea11ffa31fa6c05dd6593b7d031

Full Changelog: https://github.com/ggerganov/whisper.cpp/compare/v1.0.4...v1.1.0
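
The changelog entry about doing the sample_to_timestamp calculation with 64 bit precision is easiest to see with a small example. The sketch below is an illustration rather than the exact code from #388. It assumes 16 kHz audio and timestamps expressed in units of 10 ms, and `SAMPLE_RATE` is a name chosen for this example.

```c
#include <assert.h>
#include <stdint.h>

#define SAMPLE_RATE 16000  /* assumption: 16 kHz input audio */

/* Convert a sample index to a timestamp in units of 10 ms.
 * The operand is widened to 64 bits before multiplying: in 32-bit
 * arithmetic, 100 * i_sample exceeds INT32_MAX once i_sample passes
 * 21,474,836, which is roughly 22 minutes of 16 kHz audio. */
static int64_t sample_to_timestamp(int32_t i_sample) {
    return (100 * (int64_t)i_sample) / SAMPLE_RATE;
}
```

The cast has to be applied to an operand before the multiplication. Casting only the result, as in `(int64_t)(100 * i_sample)`, would still perform the multiplication in 32 bits and overflow for long recordings.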
`ggml` / `whisper`

- `ggml` compatible with c99 9955fa4ed7cc694d5d47fe0bb5f0d02066f9cbac | 0f117594066a213cc3cc9261c8906f316e6fb153
- `whisper_tokenize()` - basic text tokenization bf69b669a00e457b6bfa69b97f1fdf2578d3e403
- `ggml_vec_scale_f32` by @katsu560 in https://github.com/ggerganov/whisper.cpp/pull/285
- `ggml_compute_forward_dup_f16()` a7047b2a28a8eccb94318eca8a3207894d3822c7
- `whisper_tokenize()`
- `whisper_lang_max_id()`
- `whisper_lang_str()`
- `whisper_lang_auto_detect()`
- `whisper_token_lang()`
- `--prompt` option b8065d90f5fdcdb445a8fb3f4717cba54c332cac
- `--print-progress` option 32fbc8cd04912904cf84af7c5bd0e0e711a6f021
- `--lang auto` option fba10a4c68f0533a339174ef81c6a18ea228d331

Full Changelog: https://github.com/ggerganov/whisper.cpp/compare/1.0.3...1.0.4
General-purpose, short voice command detection on Raspberry Pi 4 using example/command:
https://user-images.githubusercontent.com/1991296/208255185-6e9d60ea-4bc8-4b64-b731-8ca9f3b7333b.mp4