A curated list of awesome packages, articles, and other cool resources from the Scrapy community. Scrapy is a fast, high-level web crawling and scraping framework for Python.
scrapyscript Run a Scrapy spider programmatically from a script or a Celery task - no project required.
scrapyd A service daemon to run Scrapy spiders.
scrapyd-client A command-line client for the Scrapyd server.
python-scrapyd-api A Python wrapper for working with Scrapyd's API.
SpiderKeeper A scalable admin UI for spider services.
scrapyrt An HTTP server that provides an API for scheduling Scrapy spiders and making requests with spiders.
Gerapy A distributed crawler management framework based on Scrapy, Scrapyd, Scrapyd-Client, Scrapyd-API, Django, and Vue.js.
ScrapydWeb Scrapyd cluster management, Scrapy log analysis and visualization, basic auth, auto-packaging, timer tasks, email notifications, and a mobile UI.
scrapy-sentry Logs Scrapy exceptions to Sentry.
scrapy-statsd-middleware StatsD integration middleware for Scrapy.
scrapy-jsonrpc An extension to control a running Scrapy web crawler via JSON-RPC.
scrapy-fieldstats A Scrapy extension that logs item field coverage when the spider closes.
spidermon An extension that provides useful tools for data validation, stats monitoring, and notification messages.
HttpProxyMiddleware A Scrapy middleware that changes the HTTP proxy from time to time.
scrapy-proxies Processes Scrapy requests through a random proxy from a list to avoid IP bans and improve crawling speed.
scrapy-rotating-proxies Use multiple proxies with Scrapy.
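The proxy packages above all plug into Scrapy's downloader-middleware interface and set `request.meta['proxy']` before the download happens. A minimal sketch of the idea, with a hypothetical `PROXY_LIST` setting (the real packages add extras such as ban detection and proxy retirement):

```python
import random


class RandomProxyMiddleware:
    """Assigns a random proxy to each outgoing request.

    A simplified sketch of what scrapy-proxies / scrapy-rotating-proxies
    do; the proxy list and settings key here are illustrative.
    """

    def __init__(self, proxies):
        self.proxies = proxies

    @classmethod
    def from_crawler(cls, crawler):
        # Real middlewares read the list from project settings.
        return cls(crawler.settings.getlist("PROXY_LIST"))

    def process_request(self, request, spider):
        # Scrapy's built-in HttpProxyMiddleware honours request.meta['proxy'].
        request.meta["proxy"] = random.choice(self.proxies)
        return None  # let the request continue through the middleware chain
```

Enabled via `DOWNLOADER_MIDDLEWARES` in `settings.py`; the real packages also drop proxies that keep getting banned.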
scrapy-random-useragent Scrapy middleware that sets a random User-Agent for every request.
scrapy-fake-useragent Random User-Agent middleware based on fake-useragent.
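Both User-Agent middlewares follow the same pattern: pick an agent string per request and set the header before the request is downloaded. A hedged sketch (the agent strings below are illustrative placeholders; scrapy-fake-useragent draws real, current strings from the fake-useragent database):

```python
import random


class RandomUserAgentMiddleware:
    """Sets a random User-Agent header on every outgoing request (sketch)."""

    USER_AGENTS = [
        # Illustrative values only; real middlewares use up-to-date strings.
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
        "Mozilla/5.0 (X11; Linux x86_64)",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
    ]

    def process_request(self, request, spider):
        request.headers["User-Agent"] = random.choice(self.USER_AGENTS)
```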
scrapy-crawlera Crawlera routes requests through a pool of IPs, throttling access by introducing delays and discarding IPs from the pool when they get banned by certain domains or have other problems.
scrapy-elasticsearch A Scrapy pipeline that sends items to an Elasticsearch server.
scrapy-mongodb MongoDB pipeline for Scrapy.
scrapy-mysql-pipeline A pipeline that persists items to MySQL databases.
scrapy-s3pipeline A Scrapy pipeline that stores chunked items in an AWS S3 bucket.
scrapy-sqs-exporter A Scrapy extension that outputs scraped items to an Amazon SQS queue.
scrapy-kafka-export A Scrapy extension that writes crawled items to Kafka.
scrapy-rss-exporter An RSS exporter for Scrapy.
scrapy-djangoitem A Scrapy extension to write scraped items using Django models.
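All of the storage and export packages above implement Scrapy's item-pipeline interface: `open_spider`, `process_item`, `close_spider`. A toy JSON Lines pipeline showing that interface (the filename is hypothetical; the real packages write to databases, queues, or S3 instead of a local file):

```python
import json


class JsonLinesPipeline:
    """Writes each scraped item as one JSON line.

    A minimal stand-in for the database/queue pipelines listed above,
    illustrating the three hooks Scrapy calls on a pipeline.
    """

    def open_spider(self, spider):
        # Called once when the spider starts; open the output resource here.
        self.file = open("items.jl", "w", encoding="utf-8")

    def process_item(self, item, spider):
        self.file.write(json.dumps(dict(item)) + "\n")
        return item  # pipelines must return the item for later stages

    def close_spider(self, spider):
        # Called once when the spider finishes; release the resource.
        self.file.close()
```

The pipeline is activated through the `ITEM_PIPELINES` setting, with an integer order that decides where it sits in the chain.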
scrapy-deltafetch Scrapy spider middleware that ignores requests to pages containing items seen in previous crawls.
scrapy-crawl-once A Scrapy middleware that avoids re-crawling pages that were already downloaded in previous crawls.
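Both dedup packages work by persisting a fingerprint of what has already been fetched and skipping duplicate work on later runs. The core idea, sketched with an in-memory set (the class name is hypothetical; the real middlewares persist fingerprints to an on-disk database between runs):

```python
import hashlib


class CrawlOnceSketch:
    """Skips URLs whose fingerprint has been seen before (sketch)."""

    def __init__(self):
        self.seen = set()  # real packages store this on disk across runs

    def fingerprint(self, url):
        # A stable hash of the URL; Scrapy itself fingerprints the full request.
        return hashlib.sha1(url.encode("utf-8")).hexdigest()

    def should_crawl(self, url):
        fp = self.fingerprint(url)
        if fp in self.seen:
            return False  # already downloaded in a previous crawl
        self.seen.add(fp)
        return True
```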
scrapy-magicfields Scrapy middleware that adds extra fields to items, such as timestamps, response fields, and spider attributes.
scrapy-pagestorage A Scrapy extension that stores request and response information in a storage service.
itemloaders A library to populate items using XPath and CSS selectors, with a convenient API.
itemadapter An adapter that provides a common interface for handling objects of different types in a uniform manner.
scrapy-poet A Page Object pattern implementation that enables writing reusable and portable extraction and crawling code.
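The Page Object pattern behind scrapy-poet separates extraction logic from crawling: a page object receives the page content and exposes a `to_item()` method, so the same extraction code can be reused and unit-tested without running a crawl. A plain-Python sketch of the pattern (the class, fields, and regex-based parsing are illustrative and independent of the real web-poet API):

```python
import re


class BookPage:
    """A hypothetical Page Object: extraction logic lives here, not in the
    spider, so it can be reused across spiders and tested in isolation."""

    def __init__(self, html):
        self.html = html

    def to_item(self):
        # Toy regex extraction; real page objects use proper selectors.
        title = re.search(r"<h1>(.*?)</h1>", self.html)
        price = re.search(r'<span class="price">(.*?)</span>', self.html)
        return {
            "title": title.group(1) if title else None,
            "price": price.group(1) if price else None,
        }
```

A spider would then simply return `BookPage(response.text).to_item()`, keeping the crawl logic and the extraction logic decoupled.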
Web Scraping in Python using Scrapy (with multiple examples)
Advanced Web Scraping: Bypassing "403 Forbidden," captchas, and more