whisper.cpp Releases

Port of OpenAI's Whisper model in C/C++

v1.2.0

Overview

In this release we significantly reduce the memory usage during inference by introducing "scratch" buffers to ggml.

The new memory requirements per model are as follows:

| Model  | Disk   | Mem (Old) | Mem (New) |
| ------ | ------ | --------- | --------- |
| tiny   | 75 MB  | ~390 MB   | ~125 MB   |
| base   | 142 MB | ~500 MB   | ~210 MB   |
| small  | 466 MB | ~1.0 GB   | ~600 MB   |
| medium | 1.5 GB | ~2.6 GB   | ~1.7 GB   |
| large  | 2.9 GB | ~4.7 GB   | ~3.3 GB   |

The idea is simple: instead of allocating a new memory buffer for each new tensor in the computation, we reuse the memory of old tensors that are no longer needed. The implementation is in PR #431. It's not very clean - I think there is a better way to do this, but for now it works.
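
To make the idea concrete, here is a rough sketch of how such scratch buffers can be used from ggml. It is illustrative only - the buffer sizes, the placeholder ops and the exact ggml_set_scratch() usage are assumptions about the API added in PR #431, not a copy of the actual whisper.cpp code:

```c
#include "ggml.h"
#include <stddef.h>

// Two fixed scratch buffers that intermediate tensors alternate between,
// so their memory is reused instead of growing with every new tensor.
static char scratch0[16 * 1024 * 1024];
static char scratch1[16 * 1024 * 1024];

// one transformer-like layer, with intermediates placed in scratch memory
struct ggml_tensor * layer_forward(struct ggml_context * ctx, struct ggml_tensor * x) {
    // first intermediate result lives in scratch buffer 0
    ggml_set_scratch(ctx, (struct ggml_scratch) { .offs = 0, .size = sizeof(scratch0), .data = scratch0 });
    struct ggml_tensor * cur = ggml_mul_mat(ctx, x, x); // placeholder op

    // next intermediate goes to scratch buffer 1, so it can safely
    // consume the tensor that still lives in buffer 0
    ggml_set_scratch(ctx, (struct ggml_scratch) { .offs = 0, .size = sizeof(scratch1), .data = scratch1 });
    cur = ggml_soft_max(ctx, cur);

    // switch back to the regular context memory and copy the final
    // result out of the scratch area so it survives the next layer
    ggml_set_scratch(ctx, (struct ggml_scratch) { .offs = 0, .size = 0, .data = NULL });
    return ggml_cpy(ctx, cur, ggml_new_tensor_2d(ctx, GGML_TYPE_F32, cur->ne[0], cur->ne[1]));
}
```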

Additionally, there might be some inference speed improvements on Apple Silicon in the Decoder part of the transformer. I haven't done proper benchmarks, but it seems there is roughly a 30% performance boost. The transcription results are identical to v1.1.1.

What's Changed

Core ggml / whisper

Bindings

Examples

New Contributors

Full Changelog: https://github.com/ggerganov/whisper.cpp/compare/v1.1.1...v1.2.0

Highlights

I'll use these release notes to write some random thoughts about the project - sort of a short blog post.

I'm really happy with how whisper.cpp has turned out so far. There is a very positive reception in the ML community - most people seem to be excited by the simplicity of the implementation and the fact that it is quite self-contained. I receive a lot of questions about the project and about various ideas it could be applied to. I really enjoy it and try to respond to everyone!

I also find it very satisfying that so many contributions are already coming in from so many people. To me this illustrates the power of open-source collaboration. The contributions not only improve the functionality and the quality of the code, but also help to generate new ideas and approaches to explore.

Another interesting thing is that the project keeps on giving. Every time I start to think that now is a good time to put it in the background for a while and focus on other stuff, some new cool idea pops up and I can't help but start working on it. Having this custom implementation allows me to interact with the model on a lower level, which opens up some interesting ways to explore it.

So far the development has focused on improving performance, expanding platform coverage and building robust decoding strategies with a variety of examples. Along the way, several ideas have accumulated that I find interesting to explore (diarization, token-level timestamps, improved timestamp accuracy, etc.). I think I'll try to focus more on these in the future and see if I can achieve something interesting.



  • "The New Yorker" article featuring whisper.cpp

v1.1.1

Overview

Since the v1.1.0 pre-release there have been several reports of improved transcription quality. Together with my own observations, I think we can declare version v1.1.1 as "stable".

A couple of bug fixes have also been implemented since v1.1.0, so make sure to update to v1.1.1 for optimal results.

Another update is that the prototype for v1.2.0 is almost ready: https://github.com/ggerganov/whisper.cpp/pull/431. Initial results indicate that the memory usage can be reduced by a factor of 2-3 for the smaller models.

You can provide feedback in the existing v1.1.0 discussion.

What's Changed

Core ggml / whisper

  • whisper : perform entropy check only when we have at least 32 tokens 1a91c19af929d6dc614a9f3b03026fb23be002a6
  • whisper : fix condition for providing past prompt (critical) 78f166174f126345ed87cc8f6941af1905c4a0f2

Bindings

Examples

  • main : re-enable temperature fallback f583e2d2f5a60e6ebf5bb2819ba4c4d348d41ea2
  • main : add an option to accept optional output filenames by @garychia in https://github.com/ggerganov/whisper.cpp/pull/424
  • whisper.android : use AssetManager for Android by @Digipom in https://github.com/ggerganov/whisper.cpp/pull/415
  • whisper.wasm : add small and small.en models 206fc93396936725bd362c93796cfdc8a87f8509
  • bench : add memcpy and ggml_mul_mat benchmarks (experimental) 1290fc64572f434f2f36721d2e2b0913cec0178a

New Contributors

Full Changelog: https://github.com/ggerganov/whisper.cpp/compare/v1.1.0...v1.1.1

v1.1.0

Overview

The major change in this pre-release is the improved decoding implementation in whisper.cpp:

  • Support for average logprob and entropy based criteria for fallback
  • Support for temperature T > 0
  • Improved Greedy decoder via best_of parameter for T > 0
  • Add beam search decoding (a.k.a. beam_size)

More information about the decoding changes can be found in #291. Additionally, there are a few performance improvements for Apple Silicon, WASM and non-F16C platforms. Support for POWER9 architectures has also been added.
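
As a rough illustration, the new options could be configured through the C API along these lines. This is a minimal sketch - the model path, the threshold values and the one-second silence stand-in are placeholders, and the field names are assumed to match the whisper.h of this release:

```c
#include "whisper.h"

int main(void) {
    // whisper_init_from_file() is one of the new init functions in this release
    // (see the C-style API changes below); the model path is an assumption
    struct whisper_context * ctx = whisper_init_from_file("models/ggml-base.en.bin");
    if (!ctx) return 1;

    // start from the beam-search preset instead of the greedy one
    struct whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_BEAM_SEARCH);

    params.beam_search.beam_size = 5;     // beam search width (beam_size)
    params.greedy.best_of        = 5;     // candidates kept when sampling with T > 0 (best_of)
    params.temperature_inc       = 0.2f;  // temperature step applied on each fallback
    params.entropy_thold         = 2.4f;  // entropy-based fallback criterion
    params.logprob_thold         = -1.0f; // average-logprob fallback criterion

    // one second of silence as stand-in input; real code would pass decoded audio
    static float pcm[WHISPER_SAMPLE_RATE];
    int ret = whisper_full(ctx, params, pcm, WHISPER_SAMPLE_RATE);

    whisper_free(ctx);
    return ret == 0 ? 0 : 1;
}
```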

The reason that this is a pre-release and not an official release is that the new implementation has not been sufficiently tested yet, and the existing bindings for other languages have not been updated to support the API changes. The official 1.1.x release will be created when there is enough feedback about the new decoding implementation and when the bindings have been updated, so make sure to send your feedback in the discussion created for this pre-release. For now, the 1.0.4 release should be considered more stable.

What's Changed

Core ggml / whisper

  • ggml : POWER9 support by @fitzsim in #320, #349, #369
  • ggml : simplify the SIMD code by @ggerganov in #324
  • ggml : add SSE3 and fp16 conversion lookup table by @abitofevrything in #368
  • ggml : utilise Accelerate's vDSP for some computations d51fc3ee0a0038cdf1522ca3d58b58299de41eb8
  • ggml : speed-up softmax compute via Accelerate and loop unrolling d61d55cd4b9fe77511c8eea28d0220ce552f7008
  • ggml : do not start extra threads when using BLAS d347a59a5f224f6a5ab0084ec95715451972d3b0
  • whisper : do sample_to_timestamp calculation with 64 bit precision to avoid overflow by @boolemancer in #388
  • whisper : various code clean-up and improvements by @asmaloney in #317 #318 #319 #322 etc
  • whisper : improve decoding by @ggerganov in #291
  • whisper : account for speed_up flag for short audio #405

C-style API

  • Add loader class to allow loading from buffer and others by @prsyahmi in https://github.com/ggerganov/whisper.cpp/pull/353
  • Add whisper_token_data::plog
  • Add whisper_init_from_file()
  • Add whisper_init_from_buffer() (see the sketch after this list)
  • Change whisper_init()
  • Remove whisper_sample_best()
  • Remove whisper_sample_timestamp()
  • Add whisper_n_audio_ctx()
  • Add whisper_get_logits()
  • Remove whisper_get_probs()
  • Change struct whisper_full_params
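
For instance, the new loader path makes it possible to initialize a context from a model that is already in memory. The following is a sketch only - the file-reading helper is an assumption (in practice the buffer could come from an asset manager, a download, etc.), as is freeing the buffer right after init:

```c
#include "whisper.h"

#include <stdio.h>
#include <stdlib.h>

// read a whole file into a malloc'd buffer (stand-in for any in-memory source)
static void * read_file(const char * path, size_t * size) {
    FILE * f = fopen(path, "rb");
    if (!f) return NULL;
    fseek(f, 0, SEEK_END);
    *size = (size_t) ftell(f);
    fseek(f, 0, SEEK_SET);
    void * buf = malloc(*size);
    if (buf && fread(buf, 1, *size, f) != *size) { free(buf); buf = NULL; }
    fclose(f);
    return buf;
}

int main(void) {
    size_t size = 0;
    void * buf = read_file("models/ggml-base.en.bin", &size);
    if (!buf) return 1;

    // initialize directly from the in-memory buffer instead of a file path
    struct whisper_context * ctx = whisper_init_from_buffer(buf, size);
    free(buf); // assuming the weights are copied into the context during init

    if (!ctx) return 1;
    whisper_free(ctx);
    return 0;
}
```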

Bindings

  • Golang bindings by @djthorpe in #287, #379, #384

Examples

  • whisper.android : remove android ABI constraint by @Digipom in #301
  • whisper.swiftui : SwiftUI example by @Digipom in #308
  • main : add -ocsv, aka --output-csv for writing CSV file containing millisecond timestamps by @NielsMayer in #340
  • command : refactor to split command list & general transcription modes by @asmaloney in #331
  • command : always-prompt mode by @dnhkng in #383
  • stream : fix data race on bool + avoid division-by-zero a466c3404dc62dc221061bb37fb8f78741d749b8
  • stream : fix a bug that inserted a lot of empty audio at the start a6dbd9188b13378dc36e2c669b9a22e17b4201d1
  • bench.wasm : print system info fafd78945d5a7ea11ffa31fa6c05dd6593b7d031

New Contributors

Full Changelog: https://github.com/ggerganov/whisper.cpp/compare/v1.0.4...v1.1.0

1.0.4

What's Changed

Core ggml / whisper

  • Make ggml compatible with c99 9955fa4ed7cc694d5d47fe0bb5f0d02066f9cbac | 0f117594066a213cc3cc9261c8906f316e6fb153
  • Fix UB causing asserts in Debug when reading the model vocabulary 124c718c73f915f3e4235ae2af8841356e76177d
  • Minor improvements in the Greedy decoding strategy 6a7c82501e3794724ba80bfb9a983810af036803
  • Add Windows build without OpenBLAS by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/282
  • Add whisper_tokenize() - basic text tokenization bf69b669a00e457b6bfa69b97f1fdf2578d3e403
  • Language auto-detect option by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/286
  • Add AVX,AVX2 support for ggml_vec_scale_f32 by @katsu560 in https://github.com/ggerganov/whisper.cpp/pull/285
  • Implement extra cases for ggml_compute_forward_dup_f16() a7047b2a28a8eccb94318eca8a3207894d3822c7
  • Added Roadmap and updated F.A.Q. discussion #126

C-style API

  • Add whisper_tokenize()
  • Add whisper_lang_max_id() (see the sketch after this list)
  • Add whisper_lang_str()
  • Add whisper_lang_auto_detect()
  • Add whisper_token_lang()
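
As a small illustration of the new language helpers, the snippet below simply enumerates the supported languages. This is a sketch - actual auto-detection via whisper_lang_auto_detect() additionally requires a loaded context and processed audio:

```c
#include "whisper.h"
#include <stdio.h>

int main(void) {
    // list every language id/code pair known to whisper.cpp;
    // whisper_lang_auto_detect() picks one of these from audio at runtime
    for (int id = 0; id <= whisper_lang_max_id(); ++id) {
        printf("%3d : %s\n", id, whisper_lang_str(id));
    }
    return 0;
}
```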

Examples

  • Improve prompting in "talk" example a613f16aec81b7715cdbd4386ba62ab2ff1216b3
  • Add "sliding window" mode to "stream" example b0f8013eb9f371b500abf1e3c506399ce7f59b11
  • Add Android sample by @Digipom in https://github.com/ggerganov/whisper.cpp/pull/277
  • Guided mode for the "command" example by @ggerganov in https://github.com/ggerganov/whisper.cpp/pull/271
  • Example "main" supports --prompt option b8065d90f5fdcdb445a8fb3f4717cba54c332cac
  • Example "main" supports --print-progress option 32fbc8cd04912904cf84af7c5bd0e0e711a6f021
  • Example "main" supports --lang auto option fba10a4c68f0533a339174ef81c6a18ea228d331

New Contributors

Full Changelog: https://github.com/ggerganov/whisper.cpp/compare/1.0.3...1.0.4
