Languagepod101 Scraper

Python scraper for Language Pods such as Japanesepod101.com :japanese_ogre: :japan: :sushi: Compatible with Japanese, Chinese, French, German, Italian, Korean, Portuguese, Russian, Spanish and many more! ✨

:zap: languagepod101-scraper :zap:

languagepod101-scraper is a resource for dozens of language learning courses and study material for FREE.


:mortar_board: About

languagepod101-scraper helps you download full language courses and save them to a local directory. The courses are produced and distributed by Innovative Language, which offers language learning courses in dozens of languages. Each lesson is usually 10-20 minutes long.

To get started, choose one of the language courses offered by Innovative Language and create a free account.

:pushpin: Usage

To use the script, meet the requirements and follow the example below.

:electric_plug: Requirements

:bookmark_tabs: Example

As an example, this section walks through downloading a course from Japanese Pod 101.

Japanese Pod 101 and all the other sites share a similar structure, which looks like this:

Japanesepod101
├─ Level 1 - Absolute Beginner
│  ├─ Newbie Season 1
│  │  ├─ lesson 01
│  │  ├─ lesson 02
│  │  ├─ lesson 03
│  │  ├─ ...
│  ├─ Newbie Season 2
│  ├─ ...
├─ Level 2 - Beginner
│  ├─ Lower Beginner Season 1
│  │  ├─ lesson 01
│  │  ├─ lesson 02
│  │  ├─ lesson 03
│  │  ├─ ...
│  ├─ ...
├─ Level 3 - Intermediate
│  ├─ ...
│  │  ├─ ...
│  │  ├─ ...
│  ├─ ...
│  ├─ ...
├─ Level 4 - Upper Intermediate
│  ├─ ...
├─ Level 5 - Advanced
│  ├─ ...
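
If you plan to download several courses, one option is to mirror this hierarchy in local directories, one per course. The following is only a minimal sketch: `course_dir` and the `downloads` root are hypothetical names chosen for illustration, and nothing in this README says the scraper creates such directories itself.

```python
from pathlib import Path


def course_dir(root: Path, site: str, level: str, season: str) -> Path:
    """Create (if needed) and return the directory root/site/level/season."""
    path = root / site / level / season
    # parents=True builds the intermediate level directories as well;
    # exist_ok=True makes repeated calls harmless.
    path.mkdir(parents=True, exist_ok=True)
    return path


if __name__ == "__main__":
    # Names copied from the tree above; "downloads" is an arbitrary local root.
    target = course_dir(
        Path("downloads"),
        "Japanesepod101",
        "Level 2 - Beginner",
        "Lower Beginner Season 1",
    )
    print(target)
```

The returned path can then serve as the directory you change into before running the script.
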
  • To download Lower Beginner Season 1, use your web browser to navigate to lesson 01 of this course (any other lesson URL from the same course works too).

    The navigation path looks like this: Japanesepod101 → Level 2 - Beginner → Lower Beginner Season 1 → lesson 01.

    Save the URL for lesson 01 from the address bar, as you will have to provide it to the script later on.

  • Create a directory on your PC for this course and change into it.

  • Run the language101_scraper.py script, and follow the instructions. You will have to provide:

    • the email you used to sign up for the course
    • your password for the course
    • the lesson URL you saved earlier (in our example: lesson 01 of the Lower Beginner Season 1 course).
  • Alternatively, you can pass the data as parameters when invoking the script:

    ./language101_scraper.py -u $USERNAME -p $PASSWORD --url YOUR_LESSON_URL
    
  • The script will start downloading the MP3/MP4/M4V files into the current directory. Any errors will be printed out.

  • The output inside the folder should look like this:

    ├─01 - A Formal Japanese Introduction - JapanesePod101 - Dialogue.mp3
    ├─01 - A Formal Japanese Introduction - JapanesePod101 - Review.mp3
    ├─01 - A Formal Japanese Introduction - JapanesePod101 - Main Lesson.mp3
    ├─02 - Which Famous Tokyo Tower is That - JapanesePod101 - Dialogue.mp3
    ├─02 - Which Famous Tokyo Tower is That - JapanesePod101 - Main Lesson.mp3
    ├─02 - Which Famous Tokyo Tower is That - JapanesePod101 - Review.mp3
    ├─03 - Networking in Japan - JapanesePod101 - Dialogue.mp3
    ├─03 - Networking in Japan - JapanesePod101 - Main Lesson.mp3
    ├─03 - Networking in Japan - JapanesePod101 - Review.mp3
    ├─...
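
The invocation and the output listing above can be sketched in Python as follows. This is a hedged illustration, not part of the script: `build_command` and `group_by_lesson` are hypothetical helpers, the flags are the ones shown in the command above, and the filename layout (`<number> - <title> - <site> - <part>.<ext>`) is inferred from the example listing only, so it may not hold for every course.

```python
from collections import defaultdict
from pathlib import Path

# Script name and flags as they appear in the command shown above.
SCRIPT = "./language101_scraper.py"

# Media types the README says the script downloads.
MEDIA_SUFFIXES = {".mp3", ".mp4", ".m4v"}


def build_command(email, password, lesson_url):
    """Return the argument list for a non-interactive run of the script."""
    return [SCRIPT, "-u", email, "-p", password, "--url", lesson_url]


def group_by_lesson(filenames):
    """Map lesson number -> sorted list of parts (e.g. 'Dialogue', 'Review').

    Assumes names shaped like '<number> - <title> - <site> - <part>.<ext>';
    anything that does not fit this shape is ignored.
    """
    lessons = defaultdict(list)
    for name in filenames:
        path = Path(name)
        if path.suffix.lower() not in MEDIA_SUFFIXES:
            continue
        parts = path.stem.split(" - ")
        # Need at least number, title, site and part; number must be digits.
        if len(parts) < 4 or not parts[0].isdigit():
            continue
        lessons[int(parts[0])].append(parts[-1])
    return {number: sorted(found) for number, found in lessons.items()}


if __name__ == "__main__":
    print(build_command("you@example.com", "your-password", "YOUR_LESSON_URL"))
    downloaded = [p.name for p in Path(".").iterdir() if p.is_file()]
    for number, found in sorted(group_by_lesson(downloaded).items()):
        print(f"lesson {number:02d}: {', '.join(found)}")
```

The list returned by `build_command` can be passed to `subprocess.run` from inside the course directory, and `group_by_lesson` gives a quick way to spot lessons that are missing a Dialogue, Review, or Main Lesson file after a run.
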
    

:clipboard: Disclaimer and known issues

  • Use of the script is solely the user's responsibility. Users of the script must act in accordance with the site's terms.

  • As of today, Innovative Language's terms of use do not forbid the use of crawlers or scrapers on any of their sites. This may change in the future, so be aware.

  • If you like the services Innovative Language provides, you should consider a monthly subscription. Basic programs start at around $5 per month and include support from native speaker teachers.

  • As with all websites, the site's structure may change in the future and, as often happens with scraping scripts, break this one. It is not really a question of if the site's source code will change but rather when (so enjoy it while it's still working :grin:).

:lock: License

All of the content presented on the websites belongs to the original creators (Innovative Language) and I have nothing to do with it.

The license below refers only to the script and not to the downloaded content.

License - MIT

:speech_balloon: Status and changelog

  • 23.03.2022: Added support for basic video downloading (nothing fancy, just m4v and mp4 files). Added error handling for when a lesson library/lesson contents URL is used instead of the first lesson (the user is now warned).
  • 11.05.2021: Headers and waiting time added, script is alive again.
Open Source Agenda is not affiliated with "Languagepod101 Scraper" Project. README Source: nedlir/languagepod101-scraper
