Fast inference engine for Transformer models
This major version introduces a breaking change: the update to CUDA 12.
New features
generate_tokens
Generator.async_generate_tokens, which returns an asynchronous generator compatible with asyncio
Whisper::align
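As a rough illustration of what "an asynchronous generator compatible with asyncio" means for calling code, the sketch below consumes one with `async for`. It does not call the library: `fake_generate_tokens` is a hypothetical stand-in written for this example, and its argument and yielded values are assumptions, not the real signature or result type of `Generator.async_generate_tokens`.

```python
import asyncio
from typing import AsyncIterator, List


async def fake_generate_tokens(prompt: List[str]) -> AsyncIterator[str]:
    """Hypothetical stand-in for an asynchronous token generator.

    It only echoes the prompt tokens in upper case, one at a time.
    """
    for token in prompt:
        # Hand control back to the event loop between tokens, as a real
        # generator would while waiting for the next decoding step.
        await asyncio.sleep(0)
        yield token.upper()


async def collect_tokens(prompt: List[str]) -> List[str]:
    """Consume the asynchronous generator with `async for`."""
    tokens = []
    async for token in fake_generate_tokens(prompt):
        tokens.append(token)
    return tokens


result = asyncio.run(collect_tokens(["hello", "world"]))
print(result)
```

Because each token is delivered through the event loop, other coroutines (for example, request handlers in an asyncio-based server) can run while generation is still in progress.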