Swift native on-device speech recognition with Whisper for Apple Silicon
Early stopping now tracks the chunked window internally when running async transcription via the VAD chunking method. This gives you further control for stopping specific windows based on your custom criteria in the `TranscriptionCallback`.
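As a rough sketch of what that control could look like (assuming a loaded `WhisperKit` instance named `whisperKit`, and that `chunkingStrategy` is a settable property on `DecodingOptions`; the stop condition itself is purely illustrative):

```swift
// Illustrative only: returning false from the callback requests early stopping
// for the current window. The "goodbye" condition is a made-up example.
var options = DecodingOptions()
options.chunkingStrategy = .vad

let results = try await whisperKit.transcribe(
    audioPath: "path/to/audio.wav",
    decodeOptions: options
) { progress in
    if progress.text.contains("goodbye") {
        return false // stop decoding this window early
    }
    return true
}
```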
Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.7.1...v0.7.2
Hotfix for `shouldEarlyStop` logic
Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.7.0...v0.7.1
This is a very exciting release because we're seeing yet another massive speedup in offline throughput thanks to VAD-based chunking 🚀
New `chunkingStrategy` option, which can significantly speed up your single-file transcriptions with minimal WER downsides. Compare the `.none` chunking strategy with `.vad`:
https://github.com/argmaxinc/WhisperKit/assets/1981179/0f865caa-3a08-412e-a0bf-080ec16a439a
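A minimal sketch comparing the two strategies on the same file (assuming `chunkingStrategy` is a settable property on `DecodingOptions`):

```swift
let whisperKit = try await WhisperKit()

// Baseline: sequential decoding with no chunking.
var noChunking = DecodingOptions()
noChunking.chunkingStrategy = ChunkingStrategy.none
let baseline = try await whisperKit.transcribe(audioPath: "long_audio.wav", decodeOptions: noChunking)

// VAD-based chunking: splits the audio on detected silence so
// chunks can be decoded concurrently for higher throughput.
var vadChunking = DecodingOptions()
vadChunking.chunkingStrategy = .vad
let chunked = try await whisperKit.transcribe(audioPath: "long_audio.wav", decodeOptions: vadChunking)
```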
New `detectLanguage` method with just an audio path as input on the main `WhisperKit` object. This returns a simple language code and probability back as a tuple, and has minimal logging/timing:

```swift
let whisperKit = try await WhisperKit()
let (language, probs) = try await whisperKit.detectLanguage(audioPath: "your/audio/path/spanish.wav")
print(language) // "es"
```
`@_disfavoredOverload` for deprecated methods by @jkrukowski in https://github.com/argmaxinc/WhisperKit/pull/143
Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.6.1...v0.7.0
Smaller patch release with some nice improvements and two new contributors 🙌
Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.6.0...v0.6.1
New `audioPaths` input:

```swift
let audioPaths = [
    "/path/to/file1.wav",
    "/path/to/file2.wav"
]
let whisperKit = try await WhisperKit()
let transcriptionResults: [[TranscriptionResult]?] = await whisperKit.transcribe(audioPaths: audioPaths)
```
`--audio-folder "path/to/folder/"`
We aim to minimize breaking changes, so this update adds deprecation flags for changed interfaces; these will be removed later but are still usable for now and will not throw build errors. There are some breaking changes for lower-level and newer methods, so if you do notice build errors, see the full migration guide below.
WhisperKit
Deprecated:

```swift
public func transcribe(
    audioPath: String,
    decodeOptions: DecodingOptions? = nil,
    callback: TranscriptionCallback = nil
) async throws -> TranscriptionResult?
```

Use instead:

```swift
public func transcribe(
    audioPath: String,
    decodeOptions: DecodingOptions? = nil,
    callback: TranscriptionCallback = nil
) async throws -> [TranscriptionResult]
```
Deprecated:

```swift
public func transcribe(
    audioArray: [Float],
    decodeOptions: DecodingOptions? = nil,
    callback: TranscriptionCallback = nil
) async throws -> TranscriptionResult?
```

Use instead:

```swift
public func transcribe(
    audioArray: [Float],
    decodeOptions: DecodingOptions? = nil,
    callback: TranscriptionCallback = nil
) async throws -> [TranscriptionResult]
```
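For illustration, one way a call site might migrate from the old optional single result to the new array return (joining segment text is just one possible way to combine the results):

```swift
// Before: optional single result
// let result: TranscriptionResult? = try await whisperKit.transcribe(audioPath: path)
// let text = result?.text ?? ""

// After: an array of results, e.g. one per transcribed chunk/window
let results: [TranscriptionResult] = try await whisperKit.transcribe(audioPath: path)
let text = results.map(\.text).joined(separator: " ")
```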
TextDecoding
Deprecated:

```swift
func decodeText(
    from encoderOutput: MLMultiArray,
    using decoderInputs: DecodingInputs,
    sampler tokenSampler: TokenSampling,
    options decoderOptions: DecodingOptions,
    callback: ((TranscriptionProgress) -> Bool?)?
) async throws -> [DecodingResult]
```

Use instead:

```swift
func decodeText(
    from encoderOutput: MLMultiArray,
    using decoderInputs: DecodingInputs,
    sampler tokenSampler: TokenSampling,
    options decoderOptions: DecodingOptions,
    callback: ((TranscriptionProgress) -> Bool?)?
) async throws -> DecodingResult
```
Deprecated:

```swift
func detectLanguage(
    from encoderOutput: MLMultiArray,
    using decoderInputs: DecodingInputs,
    sampler tokenSampler: TokenSampling,
    options: DecodingOptions,
    temperature: FloatType
) async throws -> [DecodingResult]
```

Use instead:

```swift
func detectLanguage(
    from encoderOutput: MLMultiArray,
    using decoderInputs: DecodingInputs,
    sampler tokenSampler: TokenSampling,
    options: DecodingOptions,
    temperature: FloatType
) async throws -> DecodingResult
```
Transcriber protocol
AudioProcessing

```swift
static func loadAudio(fromPath audioFilePath: String) -> AVAudioPCMBuffer?
```

becomes

```swift
static func loadAudio(fromPath audioFilePath: String) throws -> AVAudioPCMBuffer
```
AudioStreamTranscriber
```swift
public init(
    audioProcessor: any AudioProcessing,
    transcriber: any Transcriber,
    decodingOptions: DecodingOptions,
    requiredSegmentsForConfirmation: Int = 2,
    silenceThreshold: Float = 0.3,
    compressionCheckWindow: Int = 20,
    useVAD: Bool = true,
    stateChangeCallback: AudioStreamTranscriberCallback?
)
```

becomes

```swift
public init(
    audioEncoder: any AudioEncoding,
    featureExtractor: any FeatureExtracting,
    segmentSeeker: any SegmentSeeking,
    textDecoder: any TextDecoding,
    tokenizer: any WhisperTokenizer,
    audioProcessor: any AudioProcessing,
    decodingOptions: DecodingOptions,
    requiredSegmentsForConfirmation: Int = 2,
    silenceThreshold: Float = 0.3,
    compressionCheckWindow: Int = 20,
    useVAD: Bool = true,
    stateChangeCallback: AudioStreamTranscriberCallback?
)
```
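One way the new initializer might be wired up, assuming the required components are exposed as properties on a loaded `WhisperKit` instance (the property names below are assumptions for illustration, not confirmed API):

```swift
let whisperKit = try await WhisperKit()
guard let tokenizer = whisperKit.tokenizer else {
    fatalError("Tokenizer not loaded")
}
// Hypothetical wiring: pass each pipeline component through explicitly.
let streamTranscriber = AudioStreamTranscriber(
    audioEncoder: whisperKit.audioEncoder,
    featureExtractor: whisperKit.featureExtractor,
    segmentSeeker: whisperKit.segmentSeeker,
    textDecoder: whisperKit.textDecoder,
    tokenizer: tokenizer,
    audioProcessor: whisperKit.audioProcessor,
    decodingOptions: DecodingOptions(),
    stateChangeCallback: nil
)
```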
TextDecoding
```swift
func prepareDecoderInputs(withPrompt initialPrompt: [Int]) -> DecodingInputs?
```

becomes

```swift
func prepareDecoderInputs(withPrompt initialPrompt: [Int]) throws -> DecodingInputs
```
`microphoneUnavailable` error by @hewigovens in https://github.com/argmaxinc/WhisperKit/pull/113
`--language` values by @jkrukowski in https://github.com/argmaxinc/WhisperKit/pull/116
Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.5.0...v0.6.0
This is a HUGE release with some great new features and fixes 🙌
`withoutTimestamps: true`
New function on the `TextDecoding` protocol which runs a single forward pass and reads the language logits to find the most likely language for the input audio. It is used automatically when `usePrefillPrompt: false`, `language: nil`, and the model is not English-only.
`wordTimestamps: true`
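Putting those options together as a sketch (the settings shown come from the notes above; `DecodingOptions` is assumed to expose them as mutable properties):

```swift
var options = DecodingOptions()
options.usePrefillPrompt = false // skip the prefill prompt so the decoder can detect language
options.language = nil           // nil + a multilingual model triggers language detection
options.wordTimestamps = true    // enable word-level timestamps
```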
```shell
swift run whisperkit-cli transcribe --model-prefix "distil" --model "large-v3_turbo_600MB" --verbose --audio-path ~/your_audio.wav
```
We added an experimental new mode for streaming in WhisperAX called "Eager streaming mode". We're still refining this feature, but we think it can soon be a great way to do real-time transcription with Whisper. Give it a try in TestFlight, or take a look at the code and let us know how it can be improved.
Recommended settings for the best performance for this iteration are:
Looking for feedback on:
https://github.com/argmaxinc/WhisperKit/assets/1981179/0a88ca34-3a0e-4ff5-9829-9f980a4661ea
Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.4.1...v0.5.0
v0.4.0 was our first release on Homebrew, and this will be our first automated update to the formula. Huge props to @jkrukowski for his contributions on this.
Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.4.0...v0.4.1
Lots of nice fixes in this release!
We had to rename the CLI entry point in preparation for Homebrew distribution; here is how to use it now:
Old:

```shell
swift run transcribe --audio-path path/to/your/audio.mp3
```

New:

```shell
swift run whisperkit-cli transcribe --audio-path path/to/your/audio.mp3
```
Added `Progress` to `WhisperKit` by @finnvoor in https://github.com/argmaxinc/WhisperKit/pull/71
Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.3.3...v0.4.0
Some great contributions in this patch:
Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.3.2...v0.3.3
With these our build warnings are now down to 0 🎉
Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.3.1...v0.3.2