Easily take an entire YouTube playlist and turn it into high-quality transcripts using Whisper.
This Python-based tool is designed for transcribing YouTube videos and playlists into text. It integrates various technologies like faster-whisper for transcription, SpaCy for natural language processing, and CUDA for GPU acceleration, aimed at processing video content efficiently. The script is capable of handling both individual videos and entire playlists, outputting accurate transcripts along with metadata.
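pytube is used to download the audio from YouTube videos or playlists.
faster_whisper.WhisperModel is used for converting audio to text. faster-whisper is a reimplementation of OpenAI's Whisper, built on CTranslate2, designed for faster inference at comparable accuracy.
Initialization: the script decides whether to process a single video or a playlist based on the convert_single_video flag.
Environment Configuration: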
Video Processing:
Transcription:
Metadata Generation:
Output:
Display/Read: transcript_reader.html does further cleanup and offers a "Reader Mode" where you can choose the font, text size, and text width, and toggle dark mode. Simply open this HTML file in your browser and paste in the transcript text from one of the generated files in the generated_transcript_combined_texts folder.
Screenshots: the tool in action; pasting transcript text into the Transcript Reader HTML file; the reader using dark mode and the Cambria font.
Environment Setup:
python3 -m venv venv
source venv/bin/activate
python3 -m pip install --upgrade pip
python3 -m pip install wheel
Install the dependencies from requirements.txt:
pip install -r requirements.txt
Running the Script:
python3 bulk_transcribe_youtube_videos_from_playlist.py
The script first determines whether to process a single video or a playlist, based on the convert_single_video flag. This choice dictates which URL (either single_video_url or playlist_url) will be used for downloading content.
The add_to_system_path function adds new paths to the system environment, ensuring that dependencies like the CUDA Toolkit are accessible. On Windows, it also handles paths that contain spaces by enclosing them in quotes.
get_cuda_toolkit_path locates the CUDA Toolkit directory, which is crucial for GPU acceleration. It checks the Anaconda packages directory for the toolkit's installation path.
download_audio asynchronously downloads audio from YouTube videos. It ensures unique naming for each audio file by appending a counter if a file with the same name already exists. This function returns the path to the downloaded audio file and the filename.
compute_transcript_with_whisper_from_audio_func configures the WhisperModel for transcription. It checks CUDA availability and sets the device and compute type accordingly.
Depending on the use_spacy_for_sentence_splitting flag, the script uses either SpaCy or a custom regex-based method for sentence splitting. This is important for structuring the transcript into readable sentences.
clean_filename sanitizes video titles for use as filenames, removing special characters and replacing spaces with underscores.
remove_pagination_breaks cleans up the transcript text by removing hyphens at line breaks and correcting line break issues, improving readability.
normalize_logprobs normalizes the log probabilities of transcription segments, which is useful for assessing the model's confidence in its transcription.
The script's entry point is the __main__ block, where it selects the URL to process (single video or playlist) and initiates the process_video_or_playlist coroutine.
process_video_or_playlist handles the asynchronous downloading and transcription of videos. It creates a semaphore to limit the number of simultaneous downloads based on max_simultaneous_youtube_downloads.
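As a rough illustration of the semaphore-limited download pattern described above, here is a minimal sketch using only the standard library. It is not the script's actual code: fake_download is a hypothetical stand-in for download_audio, the value 3 for max_simultaneous_youtube_downloads is an assumed example, and the returned file names are made up.

```python
import asyncio

# Assumed example value; the README names this setting but gives no default.
max_simultaneous_youtube_downloads = 3


async def fake_download(url: str, semaphore: asyncio.Semaphore, stats: dict) -> str:
    """Hypothetical stand-in for download_audio: wait for a slot, then 'download'."""
    async with semaphore:  # only a limited number of coroutines run this block at once
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # placeholder for the real network work
        stats["active"] -= 1
    return f"{url}.mp4"  # made-up output path


async def process_all(urls: list[str]) -> tuple[list[str], int]:
    """Start one task per URL; the semaphore caps how many download concurrently."""
    semaphore = asyncio.Semaphore(max_simultaneous_youtube_downloads)
    stats = {"active": 0, "peak": 0}
    paths = await asyncio.gather(*(fake_download(u, semaphore, stats) for u in urls))
    return list(paths), stats["peak"]


if __name__ == "__main__":
    paths, peak = asyncio.run(process_all([f"video_{i}" for i in range(10)]))
    print(len(paths), "downloads finished; peak concurrency:", peak)
```

All tasks are created up front, but each one blocks at the semaphore until a slot is free, so the peak number of in-flight downloads never exceeds the configured limit.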
The script runs model.transcribe on a separate thread using asyncio.to_thread to maintain the asynchronous nature of the script. This function performs the actual audio-to-text transcription.
Transcription uses a beam_size of 10 and activates the vad_filter. The beam_size parameter affects the trade-off between accuracy and speed during transcription: a higher value can lead to more accurate results but requires more computational resources. The vad_filter (Voice Activity Detection filter) helps in ignoring non-speech segments in the audio, focusing the transcription process on relevant audio parts.
Sentence splitting is handled by sophisticated_sentence_splitter.
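The threaded model.transcribe call with beam_size=10 and vad_filter=True might look roughly like the sketch below. It assumes the model object behaves like faster_whisper.WhisperModel, whose transcribe method returns a lazily evaluated sequence of segments plus an info object. The names run_transcription, transcribe_async, StubSegment and StubModel are hypothetical; the stub exists only so the example runs without a GPU or model files.

```python
import asyncio
from typing import Any


def run_transcription(model: Any, audio_path: str) -> list[str]:
    """Blocking helper: transcribe one file and collect the segment texts.

    `model` is assumed to expose transcribe() like faster_whisper.WhisperModel.
    """
    segments, _info = model.transcribe(audio_path, beam_size=10, vad_filter=True)
    # Segments are produced lazily, so iterating here keeps the slow decoding
    # inside the worker thread rather than on the event loop.
    return [segment.text.strip() for segment in segments]


async def transcribe_async(model: Any, audio_path: str) -> list[str]:
    """Run the blocking transcription in a separate thread."""
    return await asyncio.to_thread(run_transcription, model, audio_path)


class StubSegment:
    """Hypothetical segment with only the attribute this sketch reads."""

    def __init__(self, text: str) -> None:
        self.text = text


class StubModel:
    """Hypothetical stand-in for a loaded WhisperModel."""

    def transcribe(self, audio_path: str, beam_size: int = 5, vad_filter: bool = False):
        self.last_call = {"beam_size": beam_size, "vad_filter": vad_filter}
        return (StubSegment(t) for t in [" Hello there. ", " General remarks. "]), None


if __name__ == "__main__":
    print(asyncio.run(transcribe_async(StubModel(), "example.mp4")))
```

Consuming the segments inside the worker thread is a choice made for this sketch; the actual script may iterate over them elsewhere.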
The splitting method is chosen by the use_spacy_for_sentence_splitting flag.
When SpaCy is used, the script obtains the SpaCy English model (en_core_web_sm) using download_spacy_model. This model is optimized for English language processing, focusing on tasks like tokenization, lemmatization, and sentence boundary detection.
sophisticated_sentence_splitter, when using SpaCy, processes the transcript text to extract sentences. This process involves removing pagination breaks, splitting the text into sentences using SpaCy's model, and trimming whitespace.
The script uses asyncio.Semaphore to control the number of simultaneous downloads, ensuring that system resources are not overwhelmed.
This tool represents a comprehensive solution for transcribing YouTube videos and playlists. By leveraging state-of-the-art technologies in machine learning, natural language processing, and asynchronous programming, it offers an efficient and reliable way to convert audio content into structured text data. Whether for accessibility, content analysis, educational purposes, or archival, this script provides a robust framework to meet a wide range of transcription needs.
We welcome contributions to this project! Whether you're interested in fixing bugs, adding new features, or improving documentation, your help is greatly appreciated.
When contributing, please adhere to the existing coding style and add unit tests for any new functionality. If you have any questions or need assistance, feel free to open an issue in the repository.
MIT License
Copyright (c) 2023 by Jeffrey Emanuel
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
See my other open-source projects at https://github.com/Dicklesworthstone.