A speech-to-text application that uses OpenAI's Whisper via faster-whisper to transcribe audio and send the transcription to VRChat's textbox system and/or KillFrenzyAvatarText over OSC. It also supports other outputs, such as OBS via a browser source and a SteamVR overlay!
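To illustrate what "sending text to VRChat's textbox over OSC" means on the wire, here is a minimal standard-library sketch. It is not this program's actual code. It assumes VRChat's documented defaults (the `/chatbox/input` address and UDP port 9000 on localhost), and the helper names `osc_string`, `build_chatbox_message`, and `send_to_vrchat` are hypothetical.

```python
import socket


def osc_string(value: str) -> bytes:
    """Encode an OSC string: null-terminated and zero-padded to a multiple of 4 bytes.

    Assumption: UTF-8 is used so non-ASCII text survives; plain OSC 1.0 specifies ASCII.
    """
    data = value.encode("utf-8") + b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def build_chatbox_message(text: str, send_now: bool = True) -> bytes:
    """Build an OSC packet for /chatbox/input carrying a string and a boolean."""
    # Type tags: "s" is a string argument; "T"/"F" are booleans with no payload bytes.
    type_tags = ",s" + ("T" if send_now else "F")
    return osc_string("/chatbox/input") + osc_string(type_tags) + osc_string(text)


def send_to_vrchat(text: str, host: str = "127.0.0.1", port: int = 9000) -> None:
    """Send the packet over UDP. Host and port are assumed VRChat defaults."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(build_chatbox_message(text), (host, port))


if __name__ == "__main__":
    send_to_vrchat("Hello from a transcription!")
```

In practice a library such as python-osc would handle the encoding; the hand-rolled version is only meant to show how small the message is.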
[!NOTE] This program is designed to be completely free of charge, open source, and independent of cloud-based transcription services such as Microsoft Azure. It accomplishes this by using transcription algorithms that run on your own hardware, which preserves privacy, reduces latency, and improves reliability. As a result, I will not be incorporating any cloud-based transcription or translation services into this program.
With default settings, this program has the following requirements:
[!NOTE] Depending on which settings you change in the program, these requirements can change drastically.
VRAM usage per model (int8 precision, English models only):
~200MB with tiny.en
~220MB with base.en
~320MB with distil-small.en
~380MB with small.en
~580MB with distil-medium.en
~900MB with medium.en
~900MB with distil-large-v2
~1.6GB with large-v2
Frosty704 using VRCTextboxSTT and KillFrenzyAvatarText with their Billboard project. More on that in their repository.
Similar projects already exist that you might want to consider using.
You can always leave a GitHub Star 🟊 (It's free) or buy me a coffee: