Scrapy, a fast high-level web crawling & scraping framework for Python.
Highlights:
Feed exports now support Google Cloud Storage as a storage backend
The new FEED_EXPORT_BATCH_ITEM_COUNT setting allows delivering output items in batches of up to the specified number of items. It also serves as a workaround for delayed file delivery, which causes Scrapy to only start item delivery after the crawl has finished when using certain storage backends (S3, FTP, and now GCS). See the settings sketch after this list.
The base implementation of item loaders has been moved into a separate library, itemloaders, allowing usage from outside Scrapy and a separate release schedule (standalone usage sketch after this list).
The startproject command no longer makes unintended changes to the permissions of files in the destination folder, such as removing execution permissions.
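For illustration, a minimal settings.py sketch combining the two feed-export features above; the bucket, path, and project ID are placeholders, and the google-cloud-storage client library is assumed to be installed and authenticated:

```python
# settings.py -- minimal sketch; bucket, path, and project ID are placeholders
GCS_PROJECT_ID = "my-gcp-project"

FEEDS = {
    # With batching enabled, the URI must contain a %(batch_id)d or
    # %(batch_time)s placeholder so each batch is delivered to its own object
    "gs://my-bucket/exports/items-%(batch_id)d.json": {
        "format": "json",
    },
}

# Deliver a new file after every 100 items instead of waiting for the
# crawl to finish
FEED_EXPORT_BATCH_ITEM_COUNT = 100
```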
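And a short sketch of using the itemloaders package without Scrapy; ProductLoader and its fields are made up for the example:

```python
from itemloaders import ItemLoader
from itemloaders.processors import MapCompose, TakeFirst

class ProductLoader(ItemLoader):
    default_output_processor = TakeFirst()  # keep a single value per field
    name_in = MapCompose(str.strip)         # input processor for "name"

loader = ProductLoader()                    # loads into a plain dict by default
loader.add_value("name", "  Widget  ")
loader.add_value("price", "9.99")
print(loader.load_item())                   # {'name': 'Widget', 'price': '9.99'}
```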
Highlights:
TextResponse.json method (see the sketch after this list)
bytes_received signal that allows canceling a response download (example after this list)
CookiesMiddleware fixes
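A minimal sketch of the new method; the endpoint and the shape of the JSON payload are assumptions:

```python
import scrapy

class ApiSpider(scrapy.Spider):
    name = "api"
    start_urls = ["https://example.com/api/items"]  # placeholder endpoint

    def parse(self, response):
        data = response.json()  # deserializes the JSON response body
        for entry in data.get("items", []):  # assumed payload shape
            yield {"id": entry["id"]}
```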
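And a sketch of canceling a download from a bytes_received handler; raising StopDownload(fail=False) hands the partial response to the normal callback, and the URL is a placeholder:

```python
import scrapy
from scrapy import signals
from scrapy.exceptions import StopDownload

class HeadersOnlySpider(scrapy.Spider):
    name = "headers_only"
    start_urls = ["https://example.com"]  # placeholder URL

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = super().from_crawler(crawler, *args, **kwargs)
        crawler.signals.connect(spider.on_bytes_received,
                                signal=signals.bytes_received)
        return spider

    def on_bytes_received(self, data, request, spider):
        # Cancel the download after the first chunk arrives
        raise StopDownload(fail=False)

    def parse(self, response):
        self.logger.info("got %d bytes", len(response.body))
```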
Highlights:
New FEEDS setting to export to multiple feeds (settings sketch after this list)
New Response.ip_address attribute (spider sketch after this list)
Response.follow_all now supports an empty URL iterable as input (#4408, #4420)
Removed top-level reactor imports to prevent errors about the wrong Twisted reactor being installed when setting a different Twisted reactor using TWISTED_REACTOR (#4401, #4406)
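A settings sketch exercising the two new settings mentioned above; the output file names are placeholders:

```python
# settings.py -- write the same items to two feeds in a single crawl
FEEDS = {
    "items.json": {"format": "json"},
    "items.csv": {"format": "csv"},
}

# Opt in to the asyncio-based Twisted reactor
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```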
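And a spider sketch touching Response.ip_address and Response.follow_all; the URL and CSS selector are placeholders:

```python
import scrapy

class PaginationSpider(scrapy.Spider):
    name = "pagination"
    start_urls = ["https://example.com"]  # placeholder

    def parse(self, response):
        # New attribute: the IP address the response was fetched from
        self.logger.info("served from %s", response.ip_address)
        # follow_all builds one request per matched link; an empty match
        # (or an empty iterable passed as urls) simply yields no requests
        yield from response.follow_all(css="a.next-page", callback=self.parse)
```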
Highlights:
Revert the fix for #3804 (#3819), which has a few undesired side effects (#3897, #3976).
Enforce lxml 4.3.5 or lower for Python 3.4 (#3912, #3918)
Fix Python 2 support (#3889, #3893, #3896)