Auto-generated video subtitles for the web using machine learning
This is a proof-of-concept (PoC) that demonstrates two different end-to-end implementations of auto-generated subtitles sourced from an HLS Live Stream.
This PoC was referenced in the blog post "Live Adaptive Video Speech Recognition" on REDspace's blog Well Red.
The server-focused strategy is the preferred approach and assumes you have direct access to the encoder (or at least its output) for augmentation. Audio data is retrieved directly from the encoder output and sent for transcription. Delivery of the transcripts can be performed in a number of ways; the most spec-compliant is live WebVTT segments.
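To make the "live WebVTT segments" delivery concrete, here is a minimal, hypothetical Go sketch (not the PoC's actual code) that serializes timed transcription results into a single WebVTT segment, including the X-TIMESTAMP-MAP header HLS players use to align cue times with the media timeline. The type and function names are illustrative assumptions.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"time"
)

// TranscriptCue is a hypothetical stand-in for one timed phrase returned by
// the speech-to-text service, with offsets relative to the current segment.
type TranscriptCue struct {
	Start, End time.Duration
	Text       string
}

// vttTimestamp formats a duration as an HH:MM:SS.mmm WebVTT timestamp.
func vttTimestamp(d time.Duration) string {
	h := d / time.Hour
	m := (d % time.Hour) / time.Minute
	s := (d % time.Minute) / time.Second
	ms := (d % time.Second) / time.Millisecond
	return fmt.Sprintf("%02d:%02d:%02d.%03d", h, m, s, ms)
}

// writeVTTSegment renders one live WebVTT segment. mpegtsBase is the MPEG-TS
// timestamp (90 kHz clock) of the segment start, carried in X-TIMESTAMP-MAP
// so players can line the cues up with the HLS media segments.
func writeVTTSegment(path string, mpegtsBase int64, cues []TranscriptCue) error {
	var b strings.Builder
	b.WriteString("WEBVTT\n")
	fmt.Fprintf(&b, "X-TIMESTAMP-MAP=MPEGTS:%d,LOCAL:00:00:00.000\n\n", mpegtsBase)
	for _, c := range cues {
		fmt.Fprintf(&b, "%s --> %s\n%s\n\n", vttTimestamp(c.Start), vttTimestamp(c.End), c.Text)
	}
	return os.WriteFile(path, []byte(b.String()), 0o644)
}

func main() {
	cues := []TranscriptCue{
		{Start: 0, End: 2 * time.Second, Text: "Hello and welcome to the stream."},
	}
	if err := writeVTTSegment("subs_00001.vtt", 900000, cues); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```

Each such segment would then be listed in a subtitles media playlist alongside the audio/video renditions so players can fetch and render the cues as the stream progresses.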
The secondary strategy is client-focused. It can be implemented on top of any playback source but still relies on a small backend component. Audio data is sent from the client's browser to that backend, which forwards it for transcription. Once the timed transcription is received, it is translated into WebVTT cues so the browser's native rendering capabilities can be used. A very early PoC of this strategy can be found under ./archive and is not recommended for use.
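As an illustration of the backend component's role in this strategy, the following hypothetical Go sketch accepts an audio chunk posted from the browser, hands it to a stubbed transcription call, and returns timed phrases as JSON. The endpoint path, payload shape, and names are assumptions rather than the PoC's actual API.

```go
package main

import (
	"encoding/json"
	"io"
	"log"
	"net/http"
)

// TimedPhrase is a hypothetical response shape: one transcribed phrase with
// start/end offsets (seconds) relative to the submitted audio chunk.
type TimedPhrase struct {
	Start float64 `json:"start"`
	End   float64 `json:"end"`
	Text  string  `json:"text"`
}

// transcribe is a stub for the real speech-to-text call (e.g. GCP). It would
// send the raw audio bytes for recognition and map the returned word/phrase
// timings into TimedPhrase values; no real recognition happens in this sketch.
func transcribe(audio []byte) ([]TimedPhrase, error) {
	_ = audio
	return []TimedPhrase{{Start: 0.0, End: 1.8, Text: "example transcription"}}, nil
}

func main() {
	http.HandleFunc("/transcribe", func(w http.ResponseWriter, r *http.Request) {
		audio, err := io.ReadAll(r.Body) // audio chunk captured in the browser
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		phrases, err := transcribe(audio)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(phrases)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

On the client, each returned phrase can be mapped onto a VTTCue added to a TextTrack (e.g. via video.addTextTrack('subtitles')), which is what enables the native rendering mentioned above.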
This PoC is designed to demonstrate live content, but it can be applied to VoD as well (either on the fly or as a one-time pass).
See Known Issues below for details on current limitations/issues
See Roadmap for future tasks/wishlist items
A GCP service account is required; see https://cloud.google.com/video-intelligence/docs/common/auth for setup instructions.
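As a quick, purely illustrative sanity check (not part of the PoC), you can confirm that the service account credentials resolve through GOOGLE_APPLICATION_CREDENTIALS with a few lines of Go:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"golang.org/x/oauth2/google"
)

func main() {
	// FindDefaultCredentials honours GOOGLE_APPLICATION_CREDENTIALS, the same
	// variable the setup steps below ask you to set in the shell scripts.
	creds, err := google.FindDefaultCredentials(context.Background(),
		"https://www.googleapis.com/auth/cloud-platform")
	if err != nil {
		log.Fatalf("GCP credentials not found: %v", err)
	}
	fmt.Println("credentials resolved for project:", creds.ProjectID)
}
```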
Tested on: ffmpeg 4.3.1

Steps:
1. Place the ffmpeg binary in a new directory, bin
2. Ensure GOOGLE_APPLICATION_CREDENTIALS and FFMPEG_PATH are set in the shell scripts
3. Review/adjust the gcp and encoder strategy configuration if needed (src/server/encoder/controller.go, src/server/encoder/strategies/x264.go); see the sketch after this list for a rough illustration of the audio-extraction side
4. Run npm run start:client and npm run start:server
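As referenced in step 3 above, here is a rough, hypothetical sketch of what the audio-extraction side of the server strategy can look like: ffmpeg (resolved via FFMPEG_PATH, as in step 2) transcodes the encoder output into 16 kHz mono PCM suitable for speech recognition. The input URL and flags are illustrative assumptions, not the arguments the PoC actually uses.

```go
package main

import (
	"log"
	"os"
	"os/exec"
)

// extractAudio shells out to ffmpeg (located via FFMPEG_PATH) and transcodes
// the encoder output into 16 kHz mono signed 16-bit PCM, a format commonly
// accepted by speech recognizers. Flags and paths are illustrative only.
func extractAudio(input, output string) error {
	ffmpeg := os.Getenv("FFMPEG_PATH")
	if ffmpeg == "" {
		ffmpeg = "ffmpeg" // fall back to a PATH lookup
	}
	cmd := exec.Command(ffmpeg,
		"-i", input, // e.g. the encoder's HLS/RTMP output
		"-vn",          // drop video, keep audio only
		"-ac", "1",     // mono
		"-ar", "16000", // 16 kHz sample rate
		"-f", "s16le",  // raw signed 16-bit little-endian PCM
		output,
	)
	cmd.Stderr = os.Stderr
	return cmd.Run()
}

func main() {
	// Hypothetical input; in practice this would point at the encoder output.
	if err := extractAudio("http://localhost/stream/master.m3u8", "audio.raw"); err != nil {
		log.Fatal(err)
	}
}
```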
Roadmap:
- Allow ffmpeg arguments to be provided via an external source