Download YouTube subtitles (closed captions, CC) as txt or JSON, with support for translation and proxies. Available via pip 🐍. Try it online in Google Colab!
3.0.0 fixes the download error and finally supports downloading an entire playlist! See Download the caption of entire playlist.
Try it online with Google's free Python runtime! Pro tip: you can download the output file from the sidebar, so nothing needs to be installed on your machine.
https://colab.research.google.com/drive/1oseD2yEsScx0YYOZ1x1F8GSG9iJ4x3qi?usp=sharing
Due to changes in the YouTube API, you need to UPGRADE to 3.0.0; see Install and Run.
Download YouTube subtitles (closed captions, CC) or SRT as txt or JSON.
Use --caption_num and --caption_num_second to choose which captions are displayed as the original and the translation transcript.
A Python version of algolia/youtube-captions-scraper: fetches user-submitted YouTube captions, falling back to auto-generated captions.
dl-youtube-cc https://www.youtube.com/watch?v=wgNiGj1nGYE --translation 'ru'
or
dl-youtube-cc wgNiGj1nGYE --translation 'ru'
will be saved as Version1.5SpecialProgramGenshinImpact.txt
video_link https://www.youtube.com/watch?v=wgNiGj1nGYE
original code="zh-Hans" name="Chinese (Simplified)"
translation ru
---------00:00----------
从前,有一对双胞胎结伴在宇宙中旅行
Давным-давно, два близнеца вместе путешествовали по Вселенной.
---------00:05----------
但有一天,他们前路遇阻
Однажды путь им преградило неизвестное божество
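To post-process the txt output in a script, the layout shown above can be read back into timestamped cues. This is a minimal sketch that assumes the file looks exactly like the sample: a few header lines, then dashed separator lines carrying a timestamp, each followed by the original and translated text. The names `Cue` and `parse_transcript` are hypothetical and not part of the package, and the optional hours field in the pattern is an assumption for longer videos.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Separator lines look like "---------00:05----------" in the sample output.
# The optional ":SS" third field (for hours) is an assumption.
TIMESTAMP_LINE = re.compile(r"^-+(\d{2}:\d{2}(?::\d{2})?)-+$")


@dataclass
class Cue:
    time: str
    lines: List[str] = field(default_factory=list)


def parse_transcript(text: str) -> List[Cue]:
    """Group the lines that follow each timestamp separator into one cue."""
    cues: List[Cue] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        match = TIMESTAMP_LINE.match(line)
        if match:
            cues.append(Cue(time=match.group(1)))
        elif cues:
            # Lines before the first separator (the header) are skipped.
            cues[-1].lines.append(line)
    return cues
```

With a translation enabled, each cue would hold two lines: the original text first, then the translation.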
dl-youtube-cc wgNiGj1nGYE --translation ru --to_json=True
will be saved as Version1.5SpecialProgramGenshinImpact.json
{
"original": [
{
"start": "0",
"dur": "5056",
"text": "从前,有一对双胞胎结伴在宇宙中旅行"
},
// continue
],
"translation": [
{
"start": "0",
"dur": "5056",
"text": "Давным-давно, два близнеца вместе путешествовали по Вселенной."
},
// continue
],
"merged": [
{
"start": "0",
"dur": "5056",
"text": "从前,有一对双胞胎结伴在宇宙中旅行",
"translate_text": "Давным-давно, два близнеца вместе путешествовали по Вселенной."
},
// continue
]
}
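Since the JSON export carries `start` and `dur` for every cue, it can be turned into an SRT file with a few lines of Python. The sketch below is not part of the package; `ms_to_srt_time`, `merged_to_srt` and `json_file_to_srt` are hypothetical helper names. It assumes `start` and `dur` are numeric strings in milliseconds, which the sample suggests (a `dur` of 5056 and a next cue at 00:05) but does not confirm.

```python
import json
from typing import Dict, List


def ms_to_srt_time(ms: int) -> str:
    """Format milliseconds as an SRT timestamp, HH:MM:SS,mmm."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def merged_to_srt(merged: List[Dict[str, str]]) -> str:
    """Build SRT text from the "merged" list, one numbered cue per entry."""
    blocks = []
    for index, entry in enumerate(merged, start=1):
        # Assumption: both values are numeric strings in milliseconds.
        start = int(float(entry["start"]))
        end = start + int(float(entry["dur"]))
        lines = [entry["text"]]
        if entry.get("translate_text"):
            lines.append(entry["translate_text"])
        blocks.append(
            f"{index}\n{ms_to_srt_time(start)} --> {ms_to_srt_time(end)}\n"
            + "\n".join(lines)
        )
    return "\n\n".join(blocks) + "\n"


def json_file_to_srt(path: str) -> str:
    """Load a file written with --to_json=True and convert its "merged" list."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    return merged_to_srt(data["merged"])
```

For example, `json_file_to_srt("Version1.5SpecialProgramGenshinImpact.json")` would return the SRT text for the file produced by the command above.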
All available captions will be displayed. Use --caption_num and --caption_num_second to choose which captions are displayed as the original and the translation transcript.
>> dl-youtube-cc "wgNiGj1nGYE" --caption_num=0 --caption_num_second=3 --output_file="0,3-zh,fr.txt"
INFO: available caption(s):
INFO: #0 ✔ as original code="zh-Hans" name="Chinese (Simplified)"
INFO: #1 ⭕ code="zh-Hant" name="Chinese (Traditional)"
INFO: #2 ⭕ code="en-US" name="English (United States)"
INFO: #3 ✔ as translation code="fr" name="French"
INFO: #4 ⭕ code="de" name="German"
INFO: #5 ⭕ code="id" name="Indonesian"
INFO: #6 ⭕ code="pt" name="Portuguese"
INFO: #7 ⭕ code="ru" name="Russian"
INFO: #8 ⭕ code="es" name="Spanish"
INFO: #9 ⭕ code="th" name="Thai"
INFO: #10 ⭕ code="vi" name="Vietnamese"
INFO: given by --caption_num default to 0 as original
INFO: Save to 0,3-zh,fr.txt
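If you want to pick caption numbers by language code instead of reading the listing by eye, the log lines above can be parsed. This is only a sketch: it assumes the listing has been captured as text (whether the tool writes it to stdout or stderr is not stated here) and that each line keeps the `#N ... code="..." name="..."` shape shown in the sample. `caption_numbers` is a hypothetical helper, not part of the package.

```python
import re
from typing import Dict

# Matches e.g.: INFO: #3 ✔ as translation code="fr" name="French"
CAPTION_LINE = re.compile(r'#(\d+)\s.*?code="([^"]+)"\s+name="([^"]+)"')


def caption_numbers(log_text: str) -> Dict[str, int]:
    """Map each language code in the listing to its caption number."""
    numbers: Dict[str, int] = {}
    for line in log_text.splitlines():
        match = CAPTION_LINE.search(line)
        if match:
            numbers[match.group(2)] = int(match.group(1))
    return numbers
```

With the listing above, looking up `"zh-Hans"` and `"fr"` would give the values passed as `--caption_num=0` and `--caption_num_second=3`.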
pip install download-youtube-subtitle
or pip install download-youtube-subtitle --user
dl-youtube-cc -h
or uninstall first, then reinstall the new version
pip uninstall download-youtube-subtitle -y
dl-youtube-cc -h
will show the following.
NAME
dl-youtube-cc - download youtube closed caption(subtitles) by videoID
SYNOPSIS
dl-youtube-cc VIDEOID <flags>
DESCRIPTION
Examples:
dl-youtube-cc -h # to see this helpful infomation
dl-youtube-cc wgNiGj1nGYE --translation 'ru' # use russian translation, see ./lang_code for full list
dl-youtube-cc wgNiGj1nGYE --caption_num=1 --translation 'ru' # choose the caption num for original transcript and use russian translation,
dl-youtube-cc wgNiGj1nGYE --caption_num=1 --caption_num_second=2 # manually choose the original and translation transcript from available caption list
dl-youtube-cc wgNiGj1nGYE --translation False # without translation
dl-youtube-cc wgNiGj1nGYE --save_to_file=False # print stuff in console
dl-youtube-cc wgNiGj1nGYE --output_file='test.txt' # print stuff in named file
dl-youtube-cc wgNiGj1nGYE --to_json=True # print stuff in json
POSITIONAL ARGUMENTS
VIDEOID
Type: str
the video link or the id of youtube video, the string after 'v=' in a youtube video link
FLAGS
--translation=TRANSLATION
Type: typing.Union[str, bool]
Default: 'zh-Hans'
which will be displayed as original transcript, default to 'zh-Hans' for simplified Chinese, see ./lang_code.json for full list, or pass False to disable translation
--caption_num=CAPTION_NUM
Type: int
Default: 0
choose the caption which will be displayed as original transcript
--caption_num_second=CAPTION_NUM_SECOND
Type: Optional[int]
Default: None
will surpass translation option, choose the caption which will be displayed as translation transcript
--output_file=OUTPUT_FILE
Type: Optional[str]
Default: None
default to video title
--save_to_file=SAVE_TO_FILE
Type: bool
Default: True
pass False to print in console
--to_json=TO_JSON
Type: bool
Default: False
pass True to export caption to json
--remove_font_tag=REMOVE_FONT_TAG
Type: bool
Default: True
remove font tag
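The help text says VIDEOID may be either the full video link or the string after 'v='. How the tool normalizes the two forms internally is not shown here; the sketch below illustrates one way to do it with the standard library, in case you need the bare id in your own scripts. `extract_video_id` is a hypothetical name.

```python
from urllib.parse import parse_qs, urlparse


def extract_video_id(link_or_id: str) -> str:
    """Return the id after 'v=' for a full link, or the input if already an id."""
    parsed = urlparse(link_or_id)
    if parsed.scheme and parsed.netloc:
        values = parse_qs(parsed.query).get("v")
        if not values:
            raise ValueError(f"no 'v=' parameter in {link_or_id!r}")
        return values[0]
    # No scheme or host: treat the input as a bare video id.
    return link_or_id
```

Both `extract_video_id("https://www.youtube.com/watch?v=wgNiGj1nGYE")` and `extract_video_id("wgNiGj1nGYE")` yield the same id, matching the two equivalent commands in the examples.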
dl-youtube-cc-playlist -h
will show the following.
NAME
dl-youtube-cc-playlist - download youtube closed caption(subtitles) by playlist. To figure most of params, please use dl-youtube-cc to download one video first before downloading the entire playlist.
SYNOPSIS
dl-youtube-cc-playlist PLAYLIST_URL <flags>
DESCRIPTION
Examples:
dl-youtube-cc-playlist -h # to see this helpful infomation
dl-youtube-cc-playlist PLS1QulWo1RIaJECMeUT4LFwJ-ghgoSH6n
dl-youtube-cc-playlist PLS1QulWo1RIaJECMeUT4LFwJ-ghgoSH6n 0 3 # download the first 3 videos
dl-youtube-cc-playlist https://www.youtube.com/playlist?list=PLS1QulWo1RIaJECMeUT4LFwJ-ghgoSH6n
POSITIONAL ARGUMENTS
PLAYLIST_URL
Type: str
the playlist link or the id of youtube playlist, the string after 'list=' in the url
FLAGS
--start=START
Default: 0
the index number in the playlist to start downloading, starting from 0
-e, --end=END
Type: Optional[]
Default: None
the index number in the playlist to end downloading, exclusively
--translation=TRANSLATION
Type: Optional[typing.Union[st...
Default: None
which will be displayed as original transcript, default to 'zh-Hans' for simplified Chinese, see ./lang_code.json for full list, or pass False to disable translation
--caption_num=CAPTION_NUM
Type: int
Default: 0
choose the caption which will be displayed as original transcript
--caption_num_second=CAPTION_NUM_SECOND
Type: Optional[int]
Default: None
will surpass translation option, choose the caption which will be displayed as translation transcript
-o, --output_file=OUTPUT_FILE
Type: Optional[str]
Default: None
default to video title
--save_to_file=SAVE_TO_FILE
Type: bool
Default: True
pass False to print in console
--to_json=TO_JSON
Type: bool
Default: False
pass True to export caption to json
-r, --remove_font_tag=REMOVE_FONT_TAG
Type: bool
Default: True
remove font tag
NOTES
You can also use flags syntax for POSITIONAL ARGUMENT
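The --start and --end description (0-based start, exclusive end) matches Python slice semantics, which is why `0 3` selects the first three videos. A small sketch of that reading follows; `extract_playlist_id` and `select_range` are hypothetical helpers for illustration and do not show how the tool itself is implemented.

```python
from typing import List, Optional, Sequence
from urllib.parse import parse_qs, urlparse


def extract_playlist_id(link_or_id: str) -> str:
    """Return the string after 'list=' for a full link, or the input itself."""
    parsed = urlparse(link_or_id)
    if parsed.scheme and parsed.netloc:
        values = parse_qs(parsed.query).get("list")
        if not values:
            raise ValueError(f"no 'list=' parameter in {link_or_id!r}")
        return values[0]
    return link_or_id


def select_range(
    video_ids: Sequence[str], start: int = 0, end: Optional[int] = None
) -> List[str]:
    """Pick videos from index `start` (0-based) up to, but not including, `end`."""
    return list(video_ids[start:end])
```

Leaving `end` as `None` keeps everything from `start` to the end of the playlist, which corresponds to the documented default.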
import download_youtube_subtitle.common as common
import download_youtube_subtitle.main as download_youtube_subtitle
# ...
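The import snippet above elides the actual function calls, so they are not guessed at here. As an alternative that relies only on the documented command line, a script can invoke `dl-youtube-cc` through `subprocess`. This is a sketch under that assumption: `build_command` and `download` are hypothetical names, and only flags listed in the help text are used.

```python
import subprocess
from typing import List, Optional


def build_command(
    video_id: str,
    translation: Optional[str] = None,
    to_json: bool = False,
    output_file: Optional[str] = None,
) -> List[str]:
    """Assemble a dl-youtube-cc argument list from documented flags only."""
    command = ["dl-youtube-cc", video_id]
    if translation is not None:
        command.append(f"--translation={translation}")
    if to_json:
        command.append("--to_json=True")
    if output_file is not None:
        command.append(f"--output_file={output_file}")
    return command


def download(video_id: str, **options) -> subprocess.CompletedProcess:
    """Run the CLI; raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_command(video_id, **options), check=True)
```

For instance, `download("wgNiGj1nGYE", translation="ru", to_json=True)` runs the same command as the JSON example earlier, provided the package is installed and `dl-youtube-cc` is on your PATH.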
pip install 'fire' 'requests' 'IPython' 'sure' 'pytube' 'progiter'
pip install -e .
python main.py -h
python main.py VIDEOID
cd tests
./run.sh
./test_cli.sh
Packaging Python Projects — Python Packaging User Guide
./nb/notebook2script.py from course-v3/nbs/dl2 at master · fastai/course-v3