Advanced Web Scraping Tutorial

The Zipru scraper developed in the Advanced Web Scraping Tutorial.

Project README

Advanced Web Scraping Tutorial Project

This repository is a companion to the article Advanced Web Scraping: Bypassing captcha, "403 Forbidden," and more. Please refer to the article for further details.

This is a Scrapy web scraper for the fictional Zipru torrent site. It is designed to bypass four distinct anti-scraping mechanisms:

  1. User agent filtering.
  2. Obfuscated JavaScript redirects.
  3. Captchas.
  4. Header consistency checks.
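
As a rough illustration of mechanisms 1 and 4, the fragment below sketches how a Scrapy project's settings.py might present a browser-like user agent together with matching default headers. `USER_AGENT` and `DEFAULT_REQUEST_HEADERS` are standard Scrapy settings, but the specific values here are assumptions chosen for illustration and are not taken from this repository.

```python
# settings.py fragment: a minimal sketch, not this repository's actual configuration.

# Scrapy's default user agent identifies itself as "Scrapy/<version>", which
# user agent filters can reject. The string below is a hypothetical
# browser-like value; substitute one copied from a real browser.
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)

# Header consistency checks compare the other request headers against what
# the claimed browser would send, so the defaults should be plausible for
# the user agent above. These values are illustrative assumptions.
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}
```

Settings alone would not address the JavaScript redirects or captchas; those would typically need custom handling, for example in a downloader middleware.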

The scraper is not actually functional because Zipru is not a real site. The code, however, is otherwise complete and can easily be adapted to work on other sites.

Open Source Agenda is not affiliated with "Advanced Web Scraping Tutorial" Project. README Source: sangaline/advanced-web-scraping-tutorial
Stars: 423
Open Issues: 2
Last Commit: 7 years ago