Yt Videos List Versions

Create and **automatically** update a list of all videos on a YouTube channel (in txt/csv/md form) via YouTube bot with end-to-end web scraping - no API tokens required. Multi-threaded support for YouTube videos list updates.

v0.6.7

7 months ago
  • BUGFIX

    • fix pip installation problem due to incorrectly formatted version specifiers
    • update video duration extraction to correctly extract the duration of each video and avoid writing 'N/A'
  • FEATURE IMPROVEMENTS

    • improve identification of seen videos in csv files by
      • avoiding potentially brittle regular expression matching
      • parsing each row of the csv file and extracting the (Video ID|Video URL) value from the corresponding column directly
    • normalize whitespace to avoid including newlines, carriage returns, and multiple consecutive whitespace characters in the video title
    • improve logging messages by including time.time() and time.perf_counter() when logging the time taken to perform an operation
  • PERFORMANCE IMPROVEMENTS

    • increase write efficiency by completely avoiding writing to a temporary file when no new videos are found for an existing file
  • INTERNAL IMPROVEMENT

    • the following change does not affect the functionality of the program
      • add unit tests for the video title whitespace normalization
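The two FEATURE IMPROVEMENTS above (reading seen videos by parsing csv rows instead of regex matching, and normalizing whitespace in video titles) can be sketched with the standard library. This is a minimal illustration, not the package's code: the function names are hypothetical, and the column names are taken from the csv header described elsewhere in these notes.

```python
import csv
import io
from typing import Set


def normalize_title(title: str) -> str:
    """Collapse newlines, carriage returns, tabs, and runs of spaces into single spaces."""
    # str.split() with no argument splits on any run of whitespace and drops
    # leading/trailing whitespace, so joining with " " normalizes the title.
    return " ".join(title.split())


def seen_video_ids(csv_text: str) -> Set[str]:
    """Return the values in the Video ID (or Video URL) column of a csv file's contents."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    # Assumption: the column is named "Video ID" when only IDs were saved,
    # otherwise it is named "Video URL".
    column = "Video ID" if "Video ID" in fieldnames else "Video URL"
    # The csv module handles quoted titles containing commas or quotes,
    # which is where pattern matching on raw lines tends to break.
    return {row[column] for row in reader if row.get(column)}
```

Parsing by column means a title such as `"Hello, ""world"""` cannot be mistaken for the URL field.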

v0.6.6

1 year ago

v0.6.5

1 year ago
  • BINARY UPDATES
    • Mozilla Firefox
      • geckodriver v0.32.0 (Firefox versions ≥ 104)
      • geckodriver v0.31.0 (Firefox versions ≥ 99)
    • Opera Stable 82, 83, 84, 85, 88, 89, 90, 91, 92 & 93
      • operadriver v.107.0.5304.88 (Opera Stable 93)
      • operadriver v.106.0.5249.119 (Opera Stable 92)
      • operadriver v.105.0.5195.102 (Opera Stable 91)
      • operadriver v.104.0.5112.81 (Opera Stable 90)
      • operadriver v.103.0.5060.66 (Opera Stable 89)
      • operadriver v.102.0.5005.61 (Opera Stable 88)
      • there was no operadriver release specifically for version 101 (Opera Stable 87)
      • there was no operadriver release specifically for version 100 (Opera Stable 86)
      • operadriver v.99.0.4844.51 (Opera Stable 85)
      • operadriver v.98.0.4758.82 (Opera Stable 84)
      • operadriver v.97.0.4692.71 (Opera Stable 83)
      • operadriver v.96.0.4664.45 (Opera Stable 82)
    • Google Chrome version 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, & 108 (updated version 97 binaries)
      • chromedriver 108.0.5359.22
      • chromedriver 107.0.5304.62
      • chromedriver 106.0.5249.61
      • chromedriver 105.0.5195.52
      • chromedriver 104.0.5112.79
      • chromedriver 103.0.5060.134
      • chromedriver 102.0.5005.61
      • chromedriver 101.0.4951.41
      • chromedriver 100.0.4896.60
      • chromedriver 99.0.4844.51
      • chromedriver 98.0.4758.102
      • chromedriver 97.0.4692.71 (previously 97.0.4692.20)
    • Brave Browser version 96, 97, 98, 99, 102, 103, 104, 105, 106, & 107
      • bravedriver v.107.0.5304.88 (uses operadriver binaries)
      • bravedriver v.106.0.5249.119 (uses operadriver binaries)
      • bravedriver v.105.0.5195.102 (uses operadriver binaries)
      • bravedriver v.104.0.5112.81 (uses operadriver binaries)
      • bravedriver v.103.0.5060.66 (uses operadriver binaries)
      • bravedriver v.102.0.5005.61 (uses operadriver binaries)
      • there was no operadriver release specifically for version 101
      • there was no operadriver release specifically for version 100
      • bravedriver v.99.0.4844.51 (uses operadriver binaries)
      • bravedriver v.98.0.4758.82 (uses operadriver binaries)
      • bravedriver v.97.0.4692.71 (uses operadriver binaries)
      • bravedriver v.96.0.4664.45 (uses operadriver binaries)
    • Microsoft Edge version 100, 101, 102, 103, 104, 105, 106, 107, 108, & 109 (updated version 96, 97, & 98 binaries)
      • msedgedriver 109.0.1481.0
      • msedgedriver 108.0.1462.15
      • msedgedriver 107.0.1418.42
      • msedgedriver 106.0.1370.52
      • msedgedriver 105.0.1343.53
      • msedgedriver 104.0.1293.91
      • msedgedriver 103.0.1264.77
      • msedgedriver 102.0.1245.62
      • msedgedriver 101.0.1210.53
      • msedgedriver 100.0.1185.60
      • there was no msedgedriver release specifically for version 99
      • msedgedriver 98.0.1085.0 (previously 98.0.1086.0)
      • msedgedriver 97.0.1072.76 (previously 97.0.1072.8)
      • msedgedriver 96.0.1054.75 (previously 96.0.1054.26)
  • MINOR BUGFIXES
    • Update URL for Quanta Magazine channel (commit 06fa9d8fd0ae52022912aa93cb313363e248ff6e)
    • Update time duration for video 130 in test reference files (commit b8641f75288192152e22f010fbf79281c523c4fb)
    • Use call command to properly run helper batch script (commit d519edfc83f2a06eb5ca507a7cd2f485ffc68b63)
    • Make browser version detection more robust
      • commit 4fbaa6f8794b3be89a45258fa6593ca10cc06d4f
      • commit 9b8ba0177271ac5243fc57898b6498fecc44abba
      • commit 90c532208ffbea1df33d8523de4d368ac205fbb0
      • commit c297e21dbb234ab13756e7fc4bc503448da21195
      • commit 02584be29e185b213a9f26ad7fdb0545a1f1ec65
  • INTERNAL IMPROVEMENTS
    • Update save_thread_result package dependency version number → 0.0.9 (commit b5a9f14b51fd5219fdcc2eebfa38ce32fd20a640)
    • Support browser versions up to 120 (commit a155c05d3d6189efe0551590c155badb68b24994)
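The Opera entries above pair each Opera Stable release with an operadriver build, with gaps where no driver was released. A lookup of that kind might look like the sketch below. The table copies the versions listed in these notes; `operadriver_for` is a hypothetical helper, not the package's browser-detection code.

```python
from typing import Dict, Optional

# operadriver builds listed for v0.6.5, keyed by Opera Stable major version.
# None marks versions with no matching operadriver release.
OPERADRIVER_FOR_OPERA: Dict[int, Optional[str]] = {
    93: "107.0.5304.88",
    92: "106.0.5249.119",
    91: "105.0.5195.102",
    90: "104.0.5112.81",
    89: "103.0.5060.66",
    88: "102.0.5005.61",
    87: None,
    86: None,
    85: "99.0.4844.51",
    84: "98.0.4758.82",
    83: "97.0.4692.71",
    82: "96.0.4664.45",
}


def operadriver_for(opera_version: str) -> str:
    """Map a detected Opera version string (e.g. '93.0.4585.21') to an operadriver build."""
    major = int(opera_version.split(".")[0])
    if major not in OPERADRIVER_FOR_OPERA:
        raise LookupError(f"Opera Stable {major} is not covered by the bundled binaries")
    driver = OPERADRIVER_FOR_OPERA[major]
    if driver is None:
        raise LookupError(f"there is no operadriver release for Opera Stable {major}")
    return driver
```

Raising for the gap versions (86 and 87) makes an unsupported browser fail loudly instead of silently picking a mismatched driver.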

v0.6.4

1 year ago
  • BUGFIXES

    update XPath for blocking cookies button
    • commit 62464aa9cf803c8f8b45bbffcb748f691595bf1c
    make url a required positional argument
    • commit 93029fababed9506677505e9485197c9f4ce9498
  • FEATURE IMPROVEMENTS

    raise an error instead of printing a message and then calling sys.exit()
    • see commits with a commit message starting with "Raise"
    • also, see commit d43ef6a7b0721e2b8c1660fd2a2255ac9c8dbd71
    use explicit exception chaining
    • see the following commits for more details:
      • commit e6057bba68bee85db57c66209a49fdc4e6d7c6fa
      • commit 784ef61b41269acde38b8041d937ab1386c244b7
      • commit 57c6a99bf7ce4e7a1881b20a07cbd59005f5b838
    show warning for users on unsupported operating systems
    • see the following commits for more details:
      • commit 67f519602e6d05913d5554138ad5a14ce639540b
      • commit cdcb5f48acb87e5fd7b1271d7e62f666013e1821
      • commit 01314c63f043af617bbbf07d6ac073b4a88fd0c3
      • commit c146baf39efa3866723a25573e02e29dd3e2296a
      • commit a5fddafca47e791a76b8e75147cff2e54c5e2385
      • commit 77ffc95237fb37676c2b4a2b2042c8df4f9107c9
      • commit 136dba035c43b3bb0427c04c4894406c2899ee8f
      • commit 912e54a6d1ba9a1e7594078d63d42d530daf91d0
      • commit c94a5a6594a03bbbdbb58d20a4a6bc7e9ee1aa23
    include the real time taken by the program
    • see commits with a commit message
      • starting with "Include real time"
      • including log_time_taken
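The two error-handling changes above, raising an exception instead of printing a message and calling sys.exit(), and chaining it explicitly with `raise ... from`, look roughly like this in plain Python. The exception and function names are hypothetical, and the version-string format is an assumption made for the example.

```python
class BrowserVersionError(RuntimeError):
    """Hypothetical error type used only for this illustration."""


def parse_browser_major_version(version_output: str) -> int:
    """Extract the major version from output such as 'Mozilla Firefox 108.0.1'."""
    try:
        last_token = version_output.strip().split()[-1]
        return int(last_token.split(".")[0])
    except (IndexError, ValueError) as err:
        # Raising (instead of printing and calling sys.exit()) lets callers
        # catch and handle the failure; "from err" records the original
        # exception as __cause__, so both tracebacks are shown.
        raise BrowserVersionError(
            f"could not determine the browser version from {version_output!r}"
        ) from err
```

A caller embedding the package can now wrap the call in `try/except BrowserVersionError` rather than having the whole interpreter exit.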
  • PERFORMANCE IMPROVEMENTS

    optimize multithreading for create_list_from function
    • see commit 67d94a0886fbcd1a200694926dd40ca3a466cb14 for more details
      • NOTE: create_thread_from mentioned in this commit message was a typo and should be create_list_from
    • see the following commits for related changes:
      • commit 22a77d1bae6f3d9cc97ba463506b08fc9bdc7a3b
      • commit 41d6a962d7f7342fd885f93645d5f25eec0441cf
      • commit 67d94a0886fbcd1a200694926dd40ca3a466cb14
      • commit b3a902d2ff8b6bdd823449e9b8f8a3ee794dfefb
      • commit 4a07eff4b33c6d661442b5025f3ccff54d9807d7
      • commit 53cfd1a1b73578497e0d680194d5280117262d34
      • commit e46ebf84309d390d9b708b04559646a0d11dcc35
      • commit 5062cef05086e7d6cde3147364c4153e3e0ec073
      • commit b56720a2cfd9a28bc8ab77e8397e31861391607a
      • commit ff0806bfbe6f178854b1456160a0a9833b019b1e
      • commit 6aa54715ab85a1d79e5bdad95fd46b73568cfd31
      • commit d6c6e1a5c17b59f782eece2190c0a5e072001d51
      • commit cb4485be5c13aa18d2b475fda68a78ccf255ae47
      • commit 9f6049e4dbb7b49ead7da44e8f97986e94e437f5
      • commit 99380adb738c41fcf8767a4f43433e64a49f7666
      • commit c7f4ab86912f56672c025cf952141da491608d37
      • commit e182962a46aa94ae2018197b36343a86e1f8146a
      • commit ee1094a25e5741cb0ec29c78dba2b9a5984ce738
      • commit 5c5ef4ed6f79370ac180960371de6dfce1cd66d3
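create_list_from processes many channels in one run. The general technique, a bounded pool of threads that collects each channel's result as it finishes, can be sketched with `concurrent.futures`. This is not the package's implementation; `run_for_channels` and the stand-in worker are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def run_for_channels(
    urls: Iterable[str], worker: Callable[[str], str], max_workers: int = 4
) -> Dict[str, str]:
    """Run worker(url) for every channel URL on a bounded pool of threads."""
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the URL it was submitted for.
        futures = {pool.submit(worker, url): url for url in urls}
        # Collect results as each thread finishes, not in submission order.
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as err:  # one failed channel should not stop the rest
                results[url] = f"failed: {err}"
    # Leaving the with-block waits for every thread, so none are left unfinished.
    return results
```

Bounding `max_workers` keeps the number of simultaneous browser sessions under control, which matters more for a scraper than raw parallelism.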
  • INTERNAL IMPROVEMENTS

    these changes do not affect the functionality of the program
    • interesting changes
      • commit 96398e7f22fad73c9117acf3f22885eba2c73e35
      • commit 1211f1a4216e35fe2b3860ffc94469f8acd9f15b
      • commit 34cc678578acbc1bdfe090ad5bff936d913c0dd7
      • commit 7f4d28a0fae0a10cb6ea6334a5161312f54172c0
      • commit b84795ca237728f35dfd297623ddb3d9ff7bf5d8
      • commit 3bdd29a4b40e39e040f73bb28dba913f9e0b6fa3
    • displayed debugging information changes
      • commit cea29fee822e0cbbf4aa77a38b1d788dc3e59b5b
      • commit f838764d1a862c4d29ab57451f8d0b3bfb7bb075
      • commit 3da2ce66c9a200bd3c2064cf19d15c463c5fa6bf
    • testing/building changes
      • commit 155c06e1633276e6039316a528a3f514e4ed6d04
      • commit b5177051d6287bdca8e30bc88ef5c93dc21040e2
        • bug from commit 773fcb613d2f93946419143a051413a15dc6224c
      • commit 1f80d6b009965bc517bb8cab7d77f9558c8376bc
      • commit 3cf637f37555ab65e902734b17f5a5febc1f5ec2
      • commit 23cec086011c150c6c119fa3a1385a48c3e4ee3b
      • commit 512e7a7d024c02f5fc21160fbaab745f26990d5b
      • commit 90eecb3be4dd1658efe249bd10240cd99f0737ad
      • commit 1c14e1dc2bf68d0a2d546cb56058c4f4287bd8eb
      • commit 9d32a8fb2b810da97aded35dcb7492c2488f627c
      • commit 8e54376d43ef3b2e7b13f8b0eb22b676c609f3f9
      • commit 722eeb76e2bd406dac33b15a0026940babed8ccf
      • commit 02f92d87c979b99bb9fcbbaa11be8e03fb65c56e
      • commit e9c6fb3bf369cae6317fe61adc8daddd65e39a5b
    • refactoring changes
      • rename variables to be more descriptive
      • rename functions to be more descriptive
      • reorganize code for readability
      • remove unnecessary intermediate variables
      • add intermediate variables for clarity
    • documentation changes
      • improve error messages
      • improve README
      • improve docstrings

v0.6.3

2 years ago
  • BINARY UPDATES
    • Mozilla Firefox
      • geckodriver v0.30.0 (Firefox versions ≥ 92)
    • Opera Stable 77, 78, 79, 80, & 81
      • operadriver v.95.0.4638.54 (Opera Stable 81)
      • operadriver v.94.0.4606.61 (Opera Stable 80)
      • operadriver v.93.0.4577.63 (Opera Stable 79)
      • operadriver v.92.0.4515.107 (Opera Stable 78)
      • operadriver v.91.0.4472.77 (Opera Stable 77)
    • Google Chrome version 92, 93, 94, 95, 96, & 97 (updated version 91 binaries)
      • chromedriver 97.0.4692.20
      • chromedriver 96.0.4664.45
      • chromedriver 95.0.4638.69
      • chromedriver 94.0.4606.113
      • chromedriver 93.0.4577.63
      • chromedriver 92.0.4515.107
      • chromedriver 91.0.4472.101 (previously 91.0.4472.19)
    • Brave Browser version 91, 92, 93, 94, & 95
      • operadriver v.95.0.4638.54 (uses operadriver binaries)
      • operadriver v.94.0.4606.61 (uses operadriver binaries)
      • operadriver v.93.0.4577.63 (uses operadriver binaries)
      • operadriver v.92.0.4515.107 (uses operadriver binaries)
      • operadriver v.91.0.4472.77 (uses operadriver binaries)
    • Microsoft Edge version 93, 94, 95, 96, 97, & 98 (updated version 90, 91, & 92 binaries)
      • msedgedriver 98.0.1086.0
      • msedgedriver 97.0.1072.8
      • msedgedriver 96.0.1054.26
      • msedgedriver 95.0.1020.53
      • msedgedriver 94.0.992.58
      • msedgedriver 93.0.961.52
      • msedgedriver 92.0.902.84 (previously 92.0.881.0)
      • msedgedriver 91.0.864.71 (previously 91.0.864.19)
      • msedgedriver 90.0.818.66 (previously 90.0.818.56)
  • MINOR BUGFIXES
    • handle videos with no "Video Duration" field (commit 2f538e14e574f9435c13a544fb3018d347890f4a)
      • this is an extremely rare edge case
        • based on anecdotal data, occurs about 1 in every 70000 videos
    • update URLs shown in exception messages (commit 3f0961283836713cbc65592c6171693dcc808705 & commit 99ed682abec835304664e40e18d220fa0d4204b9)
    • correctly handle unfinished threads in create_list_from() method (commit aa4ff3de648f84891a96a5e7a33a8efb00fc0b19)
    • generalize URL normalization for removing trailing parameters (commit 0789a3e66b9965f4d29259c23860488714d81a3a)
      • this removes any trailing tracking parameters that might be associated with a video URL
        • e.g. youtube.com/watch?v=abcdefghijk?pp=sAQB → youtube.com/watch?v=abcdefghijk
    • verify page has videos (commit 82a48563a1a6bcdcac944d51516dcec079d0691e)
      • prevents crashing on channels with 0 public videos
  • LOGGING IMPROVEMENTS
    • include total number of videos in each file after each run
      • commit 83827694160542cba28e1e7f7b24e6c5526ece1d
      • commit 3282089af25b2188206b0457475d55e21fcd0689
      • commit 94f5acb7e2b5ac2d0d41311aa1e9847665ff401e
      • commit c9f2671ae6d827c3b9d35013016610bda799590e
  • INTERNAL CHANGES
    • refactor code to:
      • reduce code duplication
      • make variable and function names more context specific
      • place repeated code inside variables
      • make browser naming more specific (commit 81144cb88f5ac8c99a9fe27198b6cebf9bafdde2)
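The URL normalization described under MINOR BUGFIXES keeps a video URL up to the video ID and drops anything appended after it. A minimal sketch, assuming video IDs are 11 characters drawn from letters, digits, "-" and "_"; `strip_trailing_parameters` is a hypothetical name, not the package's function.

```python
import re

# Assumption: a video URL contains "watch?v=" followed by an 11-character ID
# made of letters, digits, "-" and "_".
_VIDEO_URL = re.compile(r"watch\?v=[A-Za-z0-9_-]{11}")


def strip_trailing_parameters(url: str) -> str:
    """Keep everything up to and including the video ID; drop whatever follows."""
    match = _VIDEO_URL.search(url)
    if match is None:
        raise ValueError(f"not a recognizable video URL: {url!r}")
    return url[: match.end()]
```

Cutting at the end of the ID handles both the `?pp=...` form from the example above and ordinary `&t=...` style parameters.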

v0.6.2

2 years ago
  • see the following commits for more details:
    • commit e21a79ccc27c7cdc8741ade01162012e1cc718a8
    • commit 3e6262d3166c41350f300cbfca423197d689c627
    • commit 0c817f228041e91a56ff0345bc157752125f5edf
    • commit 5453b14719eb241e08f01d0c1fe5af96f1e4b6ef

v0.6.1

2 years ago
  • BREAKING CHANGE

    • BEFORE:
      • create_list_for() returned a str containing the name of the file the program wrote to
    • NOW:
      • create_list_for() returns a tuple containing
        • a list of lists containing the video information found by the program for the current run
          • by default, returns dummy video data to avoid cluttering the output
          • to return the actual video data, set the video_data_returned ListCreator attribute to True
            • dummy data: [[0, '', '', '']]
        • a tuple containing a str with the name of the channel (taken from the channel's heading) and a str with the name of the file written to
          • ('The Channel Name', 'the_name_of_the_file')
          • ('The Channel Name', '') if the ListCreator attributes are txt=False, csv=False, md=False, AND video_data_returned=True
      • see the NEW FEATURES section below for more details about video_data_returned
    • access the full documentation for the updated create_list_for method with help(ListCreator.create_list_for) in the python interpreter
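Code written against the old return value (a plain file name) has to unpack the new tuple instead. A minimal sketch: `describe_result` is a hypothetical helper, and its input mirrors the shapes documented above (in practice it would be the value returned by create_list_for).

```python
from typing import Any, List, Tuple

# Dummy data documented above for runs with video_data_returned=False.
DUMMY_VIDEO_DATA: List[List[Any]] = [[0, "", "", ""]]


def describe_result(result: Tuple[List[List[Any]], Tuple[str, str]]) -> str:
    """Summarize the (video_data, (channel_name, file_name)) tuple from create_list_for()."""
    video_data, (channel_name, file_name) = result
    if video_data == DUMMY_VIDEO_DATA:
        data_note = "dummy video data"
    else:
        data_note = f"{len(video_data)} videos"
    # file_name is '' when txt, csv, and md are all False.
    written_to = file_name if file_name else "no file written"
    return f"{channel_name}: {data_note}, {written_to}"
```

Comparing against the documented dummy value is what tells a caller whether real video data came back.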
  • BUGFIX

    • fixes cookie_consent blocking logic for new HTML in GDPR regions
      • YouTube updated the HTML formatting for blocking cookie consent, and the previous cookie consent blocking logic broke
      • this release fixes the blocking logic to work with the new HTML formatting
  • NEW FEATURES

    • an overview of the new ListCreator attributes is given here; run help(ListCreator) in the Python interpreter or read the "More API information" section in the Python README for the full documentation:
      • file_suffix allows more control over the file naming (True by default)
      • all_video_data_in_memory scrapes the ENTIRE YouTube channel's videos page, EVEN if files exist for the channel already (False by default)
        • must also set the video_data_returned attribute to True to actually get this information
      • video_data_returned returns the video data for all videos the program scraped (False by default)
        • data returned depends on a number of factors, see full documentation for more details
      • video_id_only saves only the video ID instead of the entire URL (False by default)
        • example: saves 'abcdefghijk' instead of 'https://www.youtube.com/watch?v=abcdefghijk'
    • an overview of the updated file_name argument options for the create_list_for method is given here; run help(ListCreator.create_list_for) in the Python interpreter for the full documentation:
      • file_name='auto' names the output file(s) using the name that shows up under the banner when you navigate to the channel's homepage (with spaces removed)
      • file_name='id' names the output file(s) using the identifier from the URL provided to the url argument
        • run help(ListCreator.create_list_for) for a comprehensive list of examples
        • using file_name='id' is very useful when multiple channels have the SAME channel name
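The video_id_only and file_name='auto' descriptions above come down to two small string transformations. The sketch below shows one way to do them with the standard library; the helper names are hypothetical. The file names in the benchmark logs also carry a `_reverse_chronological_videos_list` suffix, which this sketch leaves out.

```python
from urllib.parse import parse_qs, urlparse


def video_id_from_url(url: str) -> str:
    """Return the value of the v= query parameter, e.g. 'abcdefghijk'."""
    ids = parse_qs(urlparse(url).query).get("v")
    if not ids:
        raise ValueError(f"no video ID found in {url!r}")
    return ids[0]


def auto_file_name(channel_name: str) -> str:
    """Channel heading with all whitespace removed, e.g. 'Corey Schafer' -> 'CoreySchafer'."""
    return "".join(channel_name.split())
```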
  • PERFORMANCE IMPROVEMENTS

    • BEFORE:
      • the program pulled the video data from the selenium instance and wrote to the file(s) directly
    • NOW:
      • the program loads the video data from the selenium instance into memory, THEN writes the saved video data from memory to the file(s)
        • the performance improvement is more noticeable when writing more information
          • for example:
            • writing information for 200 videos to just a csv file: negligible performance difference between writing to csv file directly and loading to memory & THEN writing to csv file
            • writing information for 200 videos to csv, txt, md files: slight performance difference between writing to files directly and loading to memory & THEN writing to files, but still not much of a performance difference
            • writing information for 20000 videos to just a csv file: noticeable performance difference between writing to csv file directly and loading to memory & THEN writing to csv file
            • writing information for 20000 videos to csv, txt, md files: significant performance difference between writing to files directly and loading to memory & THEN writing to files
          • summary:
            • the performance difference between writing to ONE file directly and loading to memory & THEN writing to ONE file is barely noticeable for small jobs and more noticeable for larger jobs
            • the performance difference between writing to MULTIPLE files directly and loading to memory & THEN writing to MULTIPLE files is more noticeable for small jobs (compared to writing to only ONE file) and SIGNIFICANT for larger jobs
    • logs from tests used to benchmark performance included below:
for https://www.youtube.com/user/schafer5 (small channel, 230 videos)
writing to 1 file directly with csv=True, txt=False, md=False
  • to create the file:
It took 9.240757292005583            seconds to find 230 videos from https://www.youtube.com/user/schafer5/videos
It took 4.265756259999762            seconds to write all 230 videos to CoreySchafer_reverse_chronological_videos_list.csv
This program took 19.537945401003526 seconds to complete.
  • to update the file:
It took 0.8453300589972059          seconds to find 60 videos from https://www.youtube.com/user/schafer5/videos
It took 0.6392399440010195          seconds to write the 0 ***NEW*** videos to the pre-existing CoreySchafer_reverse_chronological_videos_list.csv
This program took 7.754261410002073 seconds to complete.
writing to 1 file by loading video information into memory THEN writing to the file with csv=True, txt=False, md=False
  • to create the file:
It took 9.163404727999989            seconds to find 230 videos from https://www.youtube.com/user/schafer5/videos
It took 4.260267737000007            seconds to load information for 230 videos into memory
It took 0.002389371999996115         seconds to write all 230 videos to CoreySchafer_reverse_chronological_videos_list.csv
This program took 19.483281371000004 seconds to complete.
  • to update the file:
It took 0.8521808300000089          seconds to find 60 videos from https://www.youtube.com/user/schafer5/videos
It took 1.0964175420000117          seconds to load information for 60 videos into memory
It took 0.0015745449999826633       seconds to write the 0 ***NEW*** videos to the pre-existing CoreySchafer_reverse_chronological_videos_list.csv
This program took 7.985743492000012 seconds to complete.
writing to 3 files directly with csv=True, txt=True, md=True
  • to create the files:
It took 9.166668037003546            seconds to find 230 videos from https://www.youtube.com/user/schafer5/videos
It took 10.160974278995127           seconds to write all 230 videos to CoreySchafer_reverse_chronological_videos_list.txt
It took 10.164936708999448           seconds to write all 230 videos to CoreySchafer_reverse_chronological_videos_list.csv
It took 10.168633003995637           seconds to write all 230 videos to CoreySchafer_reverse_chronological_videos_list.md
This program took 25.594990328005224 seconds to complete.
  • to update the files:
It took 0.8503098270011833          seconds to find 60 videos from https://www.youtube.com/user/schafer5/videos
It took 1.5225159670007997          seconds to write the 0 ***NEW*** videos to the pre-existing CoreySchafer_reverse_chronological_videos_list.csv
It took 1.5322243859991431          seconds to write the 0 ***NEW*** videos to the pre-existing CoreySchafer_reverse_chronological_videos_list.txt
It took 1.5359413480036892          seconds to write the 0 ***NEW*** videos to the pre-existing CoreySchafer_reverse_chronological_videos_list.md
This program took 8.472728426997492 seconds to complete.
writing to 3 files by loading video information into memory THEN writing to files with csv=True, txt=True, md=True
  • to create the files:
It took 9.367390958000005      seconds to find 230 videos from https://www.youtube.com/user/schafer5/videos
It took 4.218187391999997      seconds to load information for 230 videos into memory
It took 0.003894963000000473   seconds to write all 230 videos to CoreySchafer_reverse_chronological_videos_list.md
It took 0.005060710999998719   seconds to write all 230 videos to CoreySchafer_reverse_chronological_videos_list.csv
It took 0.006283445999997639   seconds to write all 230 videos to CoreySchafer_reverse_chronological_videos_list.txt
This program took 18.754924324 seconds to complete.
  • to update the files:
It took 0.8672965029999986          seconds to find 60 videos from https://www.youtube.com/user/schafer5/videos
It took 1.0901944209999996          seconds to load information for 60 videos into memory
It took 0.005667658999996661        seconds to write the 0 ***NEW*** videos to the pre-existing CoreySchafer_reverse_chronological_videos_list.csv
It took 0.008393589000000645        seconds to write the 0 ***NEW*** videos to the pre-existing CoreySchafer_reverse_chronological_videos_list.txt
It took 0.008197031000001687        seconds to write the 0 ***NEW*** videos to the pre-existing CoreySchafer_reverse_chronological_videos_list.md
This program took 8.090583961999997 seconds to complete.
for https://www.youtube.com/c/KhanAcademy (medium channel, 8095 videos)
writing to 1 file directly with csv=True, txt=False, md=False
  • to create the file:
It took 322.72226654399856          seconds to find 8095 videos from https://www.youtube.com/c/KhanAcademy/videos
It took 256.63442500399833          seconds to write all 8095 videos to KhanAcademy_reverse_chronological_videos_list.csv
This program took 585.4076739919983 seconds to complete.
  • to update the file:
It took 0.8482559289986966          seconds to find 60 videos from https://www.youtube.com/c/KhanAcademy/videos
It took 0.5600300389996846          seconds to write the 0 ***NEW*** videos to the pre-existing KhanAcademy_reverse_chronological_videos_list.csv
This program took 7.653723870003887 seconds to complete.
writing to 1 file by loading video information into memory THEN writing to the file with csv=True, txt=False, md=False
  • to create the file:
It took 316.9717323640002       seconds to find 8095 videos from https://www.youtube.com/c/KhanAcademy/videos
It took 248.92245618300012      seconds to load information for 8095 videos into memory
It took 0.07691853599999376     seconds to write all 8095 videos to KhanAcademy_reverse_chronological_videos_list.csv
This program took 572.114162118 seconds to complete.
  • to update the file:
It took 0.8459371520000332          seconds to find 60 videos from https://www.youtube.com/c/KhanAcademy/videos
It took 0.9670944140000302          seconds to load information for 60 videos into memory
It took 0.02941359300007207         seconds to write the 0 ***NEW*** videos to the pre-existing KhanAcademy_reverse_chronological_videos_list.csv
This program took 8.209143252000104 seconds to complete.
writing to 3 files directly with csv=True, txt=True, md=True
  • to create the files:
It took 314.01985485899786          seconds to find 8095 videos from https://www.youtube.com/c/KhanAcademy/videos
It took 519.1903085960002           seconds to write all 8095 videos to KhanAcademy_reverse_chronological_videos_list.txt
It took 519.1941804189992           seconds to write all 8095 videos to KhanAcademy_reverse_chronological_videos_list.csv
It took 519.197644068001            seconds to write all 8095 videos to KhanAcademy_reverse_chronological_videos_list.md
This program took 839.4073893879977 seconds to complete.
  • to update the files:
It took 0.8488957250010571          seconds to find 60 videos from https://www.youtube.com/c/KhanAcademy/videos
It took 1.580211615000735           seconds to write the 0 ***NEW*** videos to the pre-existing KhanAcademy_reverse_chronological_videos_list.csv
It took 1.681963879003888           seconds to write the 0 ***NEW*** videos to the pre-existing KhanAcademy_reverse_chronological_videos_list.txt
It took 1.6842712280049454          seconds to write the 0 ***NEW*** videos to the pre-existing KhanAcademy_reverse_chronological_videos_list.md
This program took 8.823843261001457 seconds to complete.
writing to 3 files by loading video information into memory THEN writing to files with csv=True, txt=True, md=True
  • to create the files:
It took 316.342601403           seconds to find 8095 videos from https://www.youtube.com/c/KhanAcademy/videos
It took 261.87072707100003      seconds to load information for 8095 videos into memory
It took 0.1363127509999913      seconds to write all 8095 videos to KhanAcademy_reverse_chronological_videos_list.csv
It took 0.1775351439999895      seconds to write all 8095 videos to KhanAcademy_reverse_chronological_videos_list.md
It took 0.18588107000005039     seconds to write all 8095 videos to KhanAcademy_reverse_chronological_videos_list.txt
This program took 584.703847726 seconds to complete.
  • to update the files:
It took 0.8483775499998956          seconds to find 60 videos from https://www.youtube.com/c/KhanAcademy/videos
It took 1.0671216570001434          seconds to load information for 60 videos into memory
It took 0.17331316700006028         seconds to write the 0 ***NEW*** videos to the pre-existing KhanAcademy_reverse_chronological_videos_list.csv
It took 0.22995445900005507         seconds to write the 0 ***NEW*** videos to the pre-existing KhanAcademy_reverse_chronological_videos_list.txt
It took 0.23345572800008085         seconds to write the 0 ***NEW*** videos to the pre-existing KhanAcademy_reverse_chronological_videos_list.md
This program took 8.503321469999833 seconds to complete.
for https://www.youtube.com/user/NBCNews/videos (large channel, ~32550 videos)
writing to 1 file directly with csv=True, txt=False, md=False
  • to create the file:
It took 3420.0639533489993          seconds to find 32347 videos from https://www.youtube.com/user/NBCNews/videos
It took 4988.648231769999           seconds to write all 32347 videos to NBCNews_reverse_chronological_videos_list.csv
This program took 8414.909623333002 seconds to complete.
  • to update the file:
# forgot to run this test :D
writing to 1 file by loading video information into memory THEN writing to the file with csv=True, txt=False, md=False
  • to create the file:
It took 3367.386001154002       seconds to find 32357 videos from https://www.youtube.com/user/NBCNews/videos
It took 4880.191474030002       seconds to load information for 32357 videos into memory
It took 0.24478799300050014     seconds to write all 32357 videos to NBCNews_reverse_chronological_videos_list.csv
This program took 8253.73690525 seconds to complete.
  • to update the file:
It took 0.8474488579995523          seconds to find 60 videos from https://www.youtube.com/user/NBCNews/videos
It took 1.1012943870009622          seconds to load information for 60 videos into memory
It took 0.11654774600174278         seconds to write the 5 ***NEW*** videos to the pre-existing NBCNews_reverse_chronological_videos_list.csv
This program took 8.668505469999218 seconds to complete.
writing to 3 files directly with csv=True, txt=True, md=True
  • to create the files:
It took 3396.025502143               seconds to find 32347 videos from https://www.youtube.com/user/NBCNews/videos
It took 7683.585577874001            seconds to write all 32347 videos to NBCNews_reverse_chronological_videos_list.txt
It took 7683.592947972               seconds to write all 32347 videos to NBCNews_reverse_chronological_videos_list.md
It took 7684.030176524999            seconds to write all 32347 videos to NBCNews_reverse_chronological_videos_list.csv
This program took 11086.336240618999 seconds to complete.
  • to update the files:
It took 0.8738655359993572          seconds to find 60 videos from https://www.youtube.com/user/NBCNews/videos
It took 1.8775347520004289          seconds to write the 0 ***NEW*** videos to the pre-existing NBCNews_reverse_chronological_videos_list.csv
It took 2.120259861001614           seconds to write the 0 ***NEW*** videos to the pre-existing NBCNews_reverse_chronological_videos_list.txt
It took 2.132926509999379           seconds to write the 0 ***NEW*** videos to the pre-existing NBCNews_reverse_chronological_videos_list.md
This program took 9.435579917999348 seconds to complete.
writing to 3 files by loading video information into memory THEN writing to files with csv=True, txt=True, md=True
  • to create the files:
It took 3478.1540728540003          seconds to find 32353 videos from https://www.youtube.com/user/NBCNews/videos
It took 5022.493407319              seconds to load information for 32353 videos into memory
It took 0.5065521739998076          seconds to write the 6 ***NEW*** videos to the pre-existing NBCNews_reverse_chronological_videos_list.csv
It took 0.587243801997829           seconds to write all 32353 videos to NBCNews_reverse_chronological_videos_list.txt
It took 0.6058889249979984          seconds to write all 32353 videos to NBCNews_reverse_chronological_videos_list.md
This program took 8507.703900004002 seconds to complete.
  • to update the files:
It took 0.8569685050024418         seconds to find 60 videos from https://www.youtube.com/user/NBCNews/videos
It took 1.1060196290018212         seconds to load information for 60 videos into memory
It took 0.5880495099991094         seconds to write the 4 ***NEW*** videos to the pre-existing NBCNews_reverse_chronological_videos_list.csv
It took 0.8386826800015115         seconds to write the 4 ***NEW*** videos to the pre-existing NBCNews_reverse_chronological_videos_list.txt
It took 0.8496009250011411         seconds to write the 4 ***NEW*** videos to the pre-existing NBCNews_reverse_chronological_videos_list.md
This program took 9.45503293100046 seconds to complete.
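The PERFORMANCE IMPROVEMENTS change in this release separates scraping from writing: video data is gathered into a list once, and each output file is then written from that list. A minimal sketch of the write side, with a hypothetical function name and simplified txt/md layouts (not the package's actual formats):

```python
import csv
from typing import Dict, List, Sequence

# Illustrative subset of the csv columns described in these notes.
HEADER = ["Video Number", "Video Title", "Video Duration", "Video URL"]


def write_from_memory(videos: List[Sequence[object]], base_path: str) -> Dict[str, str]:
    """Write already-scraped rows to base_path.csv, .txt and .md; return the paths."""
    paths: Dict[str, str] = {}
    for ext in ("csv", "txt", "md"):
        path = f"{base_path}.{ext}"
        with open(path, "w", newline="", encoding="utf-8") as f:
            if ext == "csv":
                writer = csv.writer(f)
                writer.writerow(HEADER)
                writer.writerows(videos)
            elif ext == "md":
                f.write("| " + " | ".join(HEADER) + " |\n")
                f.write("|" + " --- |" * len(HEADER) + "\n")
                for row in videos:
                    f.write("| " + " | ".join(str(cell) for cell in row) + " |\n")
            else:  # txt: one "Header: value" line per field, blank line between videos
                for row in videos:
                    lines = (f"{name}: {cell}" for name, cell in zip(HEADER, row))
                    f.write("\n".join(lines) + "\n\n")
        paths[ext] = path
    # No browser calls happen here: every file is rendered from the same in-memory list.
    return paths
```

This matches the pattern in the logs: the expensive step (loading information into memory) happens once, so adding txt and md output costs fractions of a second rather than repeating the extraction for each file.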

v0.6.0

2 years ago
  • compare changes to previous version
  • if you are an existing user, skim through the BREAKING CHANGE and NON-BREAKING CHANGES sections below
    • if you are a new user, you do not need to worry about these sections - just skip to the NEW FEATURES section at the bottom and read the python README to get started
  • BREAKING CHANGE
    • the program now extracts the video duration for every video uploaded by a channel
      • this will likely cause problems when updating pre-existing csv files, since
        • the video duration information goes in a new column
        • csv file renderers expect consistent column formatting throughout the file
          • BUT a pre-existing csv file will only have the Video Number,Video Title,Video URL,Watched,Watch again later,Notes columns
          • so after an update, newly extracted videos will have the Video Number,Video Title,Video Duration,Video URL,Watched,Watch again later,Notes columns, while the already extracted videos will still have only the Video Number,Video Title,Video URL,Watched,Watch again later,Notes columns (no Video Duration column)
          • therefore, updating a pre-existing csv file will result in the newly extracted videos having 7 columns, while pre-existing videos will have only 6 columns
      • if you want to continue using your pre-existing csv file and do NOT WANT TO INCLUDE the video duration for previously extracted videos:
        • if you have NOT yet updated the pre-existing csv file:
          • APPROACH 1: use a csv file editor such as Excel, Google Sheets, Numbers, IDE extension, etc.
            • open the csv file
            • insert the Video Duration column between the Video Title and Video URL columns
            • save the file
              • the csv editor should automatically format the existing rows to include the Video Duration column
              • therefore, all rows should now have an empty cell for the Video Duration column
          • APPROACH 2: use a simple text editor/IDE
            • open the csv file
            • insert the Video Duration column between the Video Title and Video URL columns
            • text editors will NOT automatically format the existing rows to include the Video Duration column
              • so you will need to manually format the existing rows to include the Video Duration column
              • the simplest way to do this would be to use a Find and Replace operation:
                • Find all occurrences of: ,https://
                • Replace with: ,,https://
                  • this assumes the only urls in the csv file are in the Video URL column!
                    • if you have manually added/modified parts of the file and this is no longer true, you will have to modify this approach slightly to meet your needs
        • if you have ALREADY updated the pre-existing csv file:
          • you will not be able to use APPROACH 1 from above
          • you will need to use APPROACH 2 with slight modifications:
            • Find all occurrences of (with regular expression mode enabled): ([^:][^\d]{2}),https://
            • Replace with: $1,,https:// (depending on your editor, you may need to substitute $1 with \1 or something else)
              • looks for ,https:// where it is NOT preceded by :\d\d
                • since the most recently extracted videos will have the video duration but the already existing videos will not have the video duration
                • so this only adds a comma for previously extracted videos without the video duration
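                • as with APPROACH 2, this also assumes the only urls in the csv file are in the Video URL column!
                  • if you have manually added/modified parts of the file and this is no longer true, you will also have to modify this approach slightly to meet your needs
                • this pattern also skips any row whose title ends in a digit (e.g. ...Part 2,https://), since [^\d]{2} requires both characters before the comma to be non-digits, so check for such rows and add their extra comma by hand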
            • if the file is a chronological_videos_list file (as opposed to a reverse_chronological_videos_list file):
              • you will ALSO need to insert the Video Duration column between the Video Title and Video URL columns in the csv header
                • since chronological_videos_list files use the csv header from the pre-existing csv file
                  • NOTE that the program updates the reverse_chronological_videos_list csv header every time the program looks for new videos when rerun on a previously scraped channel
                  • but usually this csv header update is not noticeable since the header does not change
                  • the csv header update is noticeable this time, however, since there is a new column (Video Duration)
                  • for chronological_videos_list files, however, the program never updates the csv header
      • if you want to continue using your pre-existing csv file and WANT TO INCLUDE the video duration for previously extracted videos:
        • rerun the program for the channel (in a different directory)
        • copy over any notes you took in the pre-existing file to the new file with the video duration information
      • if you do NOT want/care about using the pre-existing csv file
        • just delete the pre-existing csv file and rerun the program on the channel again (or run the program on the same channel from a different directory)
          • NOTE that if the channel deleted a video OR unlisted a video between
            • the time the video information was originally scraped
            • and the time you rerun the program after installing release 0.6.0+
            • the deleted/unlisted video(s) will not show up (no workaround for this - this is how YouTube displays videos)
  • NON-BREAKING CHANGES
    • txt and md files now also include the video duration information
      • this is simply an extra line in the output file and will not cause any rendering issues, since txt and md files do not depend on consistent formatting the way csv files do
    • txt and md files now use slightly different formatting, such as
      • fewer newlines
      • md files use h3 headings for video information instead of bullet points (the bullet points were also improperly formatted previously, but since they are no longer used, this is not an issue)
    • NOTE that if you want these files to contain the video duration information, you will still need to rerun the program on the channel from scratch (either in a different directory, or after deleting the pre-existing files in the current directory)
  • NEW FEATURES
    • verify_page_bottom_n_times attribute
      • for more information, see
        • commit a68f8f62e5c343cbb0641125e271bb96cc4f0750
        • commit 5b361de36f4d38704d4f4f5cb079162ace2bcb6c
        • commit 916f0502420f9f9ccc64b370cf6f531cad6f24c7
        • commit 6a02bfeb177089d8c21c7ffe2d2c67a979dbc4b2 (documentation)
    • file_buffering attribute
      • for more information, see
        • commit 0730cdb6852c064bb667ab3df22f52830ad8e065
        • commit 38b83177f867e1cf796e78be0efaabe2359b800c (documentation)
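As an alternative to the text-editor find-and-replace approaches described under BREAKING CHANGE, the csv migration can be sketched with Python's standard csv module. This module parses quoted titles correctly and does not depend on which characters come before each URL. The following is a minimal, hypothetical sketch and is not part of yt_videos_list. It assumes UTF-8 files, the 6-column header shown above, and that a row needs the new column exactly when it has 6 fields. The helper names add_duration_column and migrate_file are illustrative. Back up the file before running it.

```python
import csv
import io

# Assumption: files written before release 0.6.0 have 6 columns:
# Video Number,Video Title,Video URL,Watched,Watch again later,Notes
OLD_COLUMN_COUNT = 6
# The Video Duration column goes between Video Title and Video URL.
DURATION_INDEX = 2


def add_duration_column(csv_text):
    """Insert a Video Duration column into every 6-column row of csv_text.

    The 6-column header row gets the "Video Duration" label, other 6-column
    rows get an empty cell, and rows with any other number of fields (e.g.
    newly scraped 7-column rows) are copied unchanged.
    """
    reader = csv.reader(io.StringIO(csv_text))
    output = io.StringIO()
    # Assumption: "\n" line endings; change lineterminator to match your file.
    writer = csv.writer(output, lineterminator="\n")
    for row in reader:
        if len(row) == OLD_COLUMN_COUNT:
            is_header = row[0] == "Video Number"
            row.insert(DURATION_INDEX, "Video Duration" if is_header else "")
        writer.writerow(row)
    return output.getvalue()


def migrate_file(path):
    """Rewrite the csv file at path in place (make a backup first)."""
    with open(path, newline="", encoding="utf-8") as f:
        text = f.read()
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write(add_duration_column(text))
```

For example, `migrate_file("NBCNews_reverse_chronological_videos_list.csv")` would update that file in place. Rows that already have 7 columns are left alone, so the same function should work whether or not the file has already been updated. It also labels a 6-column header, which covers chronological_videos_list files, where the program never updates the header.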

v0.5.9

2 years ago
  • compare changes to previous version
  • creates new file I/O threads if writing to more than 1 file
    • see commit 58c5faba14da25b89e104a50d380489a30d8df71 for details
  • supports scraping multiple channels from a txt file containing urls
    • see the Scraping multiple channels from a file simultaneously with multi-threading section in the python README for usage details
    • see the __init__.py file for code changes
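Scraping multiple channels from a txt file follows a common pattern. The program reads one channel url per line and then hands each url to a pool of worker threads. The sketch below shows only that generic pattern with Python's concurrent.futures. It is not the library's implementation. scrape_channel is a hypothetical placeholder for the per-channel work, and parse_channel_urls, read_channel_urls and scrape_all are illustrative names. For the program's real entry point and options, see the python README.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_channel_urls(lines):
    """Return the non-blank lines, stripped of surrounding whitespace."""
    return [line.strip() for line in lines if line.strip()]


def read_channel_urls(path):
    """Read channel urls from a txt file containing one url per line."""
    with open(path, encoding="utf-8") as f:
        return parse_channel_urls(f)


def scrape_channel(url):
    """Hypothetical placeholder for scraping one channel and writing its files."""
    return f"finished {url}"


def scrape_all(urls, max_workers=4):
    """Run scrape_channel for each url on up to max_workers threads at once.

    Executor.map returns results in the same order as urls, even though the
    channels may finish in a different order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(scrape_channel, urls))
```

Usage would look like `scrape_all(read_channel_urls("channels.txt"))`, where channels.txt is a hypothetical file name. Threads are a reasonable fit here because each task spends most of its time waiting on the browser and network rather than running Python code.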

v0.5.8

3 years ago
  • compare changes to previous version
  • no changes in API or functionality
    • these changes should have been part of release 0.5.7, but these bugs were missed during testing
  • see the following commits for bug fixes:
    • commit b41081485c3d599856f4431bcee01e6bb79146da
    • commit aa3b6243883bc22a1c52ef09639de0c8b50d40b9
    • commit cd65c5c73d945db487743b4679f2997a6f1d06e4
    • commit e5d16d7a87b6907789615776b12eb68d860524e4