Crawlee: a web scraping and browser automation library for Node.js for building reliable crawlers in JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP, in both headful and headless mode, with proxy rotation.

v3.6.2

3.6.2 (2023-11-26)

Bug Fixes

prevent race condition in KeyValueStore.getAutoSavedValue() (#2193) (e340e2b)
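
The entry does not describe how the race arose. As a rough illustration only, one common form of this bug is two concurrent callers each loading and caching their own copy of the same value. The sketch below avoids that by caching the in-flight promise. `AutoSavedCache` and `Loader` are hypothetical names introduced here; this is not Crawlee's implementation or its actual fix.

```typescript
// Hypothetical stand-in for a slow storage read; not part of Crawlee.
type Loader<T> = (key: string) => Promise<T | null>;

class AutoSavedCache<T extends object> {
    private readonly load: Loader<T>;
    // Cache the in-flight promise rather than the resolved value, so that
    // concurrent callers for the same key share a single load.
    private readonly pending = new Map<string, Promise<T>>();

    constructor(load: Loader<T>) {
        this.load = load;
    }

    getAutoSavedValue(key: string, defaultValue: T): Promise<T> {
        let entry = this.pending.get(key);
        if (entry === undefined) {
            // The map entry is set synchronously, before any await,
            // so a second caller cannot slip in and start another load.
            entry = this.load(key).then((stored) => stored ?? defaultValue);
            this.pending.set(key, entry);
        }
        return entry;
    }
}
```

With this shape, two simultaneous calls for the same key resolve to the same object, so changes made through one reference are visible through the other.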

v3.6.1

3.6.1 (2023-11-15)

Bug Fixes

ts: ignore import errors for got-scraping (012fc9e)
ts: specify type explicitly for logger (aec3550)

Features

puppeteer: enable new headless mode (#1910) (7fc999c)

v3.6.0

3.6.0 (2023-11-15)

Bug Fixes

add skipNavigation option to enqueueLinks (#2153) (118515d)
BrowserPool: ignore --no-sandbox flag for webkit launcher (#2148) (1eb2f08), closes #1797
core: respect some advanced options for RequestList.open() + improve docs (#2158) (c5a1b07)
declare missing dependency on got-scraping in the core package (cd2fd4d)
provide more detailed error messages for browser launch errors (#2157) (f188ebe)
retry incorrect Content-Type when response has blocked status code (#2176) (b54fb8b), closes #1994

Features

core: add crawler.exportData() helper (#2166) (c8c09a5)
got-scraping v4 (#2110) (2f05ed2)

v3.5.8

3.5.8 (2023-10-17)

Bug Fixes

MemoryStorage: ignore invalid files for request queues (#2132) (fa58581), closes #1985
refactor extractUrls to split the text line by line first (#2122) (7265cd7)
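
A minimal sketch of the line-by-line idea named in the extractUrls entry: split the text into lines first, then run the pattern on each line, so every regex run works on a bounded chunk of input. The function name `extractUrlsByLine` and the simplified `URL_PATTERN` are assumptions for illustration; Crawlee ships its own pattern and implementation, and the entry does not state the motivation for the change.

```typescript
// Simplified URL pattern for illustration only (an assumption, not Crawlee's regex).
const URL_PATTERN = /https?:\/\/[^\s"'<>]+/g;

function extractUrlsByLine(text: string, pattern: RegExp = URL_PATTERN): string[] {
    const urls: string[] = [];
    // Split on both Unix and Windows line endings before matching.
    for (const line of text.split(/\r?\n/)) {
        // With a global pattern, match() returns every match in the line, or null.
        const matches = line.match(pattern);
        if (matches) urls.push(...matches);
    }
    return urls;
}
```

A URL that is itself broken across two lines would not be recovered by this approach; the sketch accepts that trade-off.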

v3.5.7

3.5.7 (2023-10-05)

Bug Fixes

add warning when we detect use of RequestList (RL) and RequestQueue (RQ), but RQ is not provided explicitly (#2115) (6fb1c55), closes #1773
ensure the status message cannot get the crawler stuck (#2114) (9034f08)
RQ request count is consistent after migration (#2116) (9ab8c18), closes #1855

v3.5.6

3.5.6 (2023-10-04)

Bug Fixes

types: re-export RequestQueueOptions as an alias to RequestProviderOptions (#2109) (0900f76)

Features

add incapsula iframe selector to the blocked list (#2111) (2b17d8a), closes apify/store-website-content-crawler#154

v3.5.5

3.5.5 (2023-10-02)

Bug Fixes

allow using any version of puppeteer or playwright (#2102) (0cafceb), closes #2101
session pool leaks memory on multiple crawler runs (#2083) (b96582a), closes #2074 #2031
templates: install browsers on postinstall for playwright (#2104) (323768b)
types: make return type of RequestProvider.open and RequestQueue(v2).open strict and accurate (#2096) (dfaddb9)

Features

experimental support for request locking (Request Queue v2) (#1975) (70a77ee), closes #1365

v3.5.4

3.5.4 (2023-09-11)

Bug Fixes

core: allow explicit calls to purgeDefaultStorage to wipe the storage on each call (#2060) (4831f07)
various helpers opening a KeyValueStore (KVS) now respect Configuration (#2071) (59dbb16)

Features

remove side effect from the deprecated error context augmentation (#2069) (f9fb5c4)

v3.5.3

3.5.3 (2023-08-31)

Bug Fixes

browser-pool: improve error handling when browser is not found (#2050) (282527f), closes #1459
clean up inProgress cache when delaying requests via sameDomainDelaySecs (#2045) (f63ccc0)
crawler instances with different StorageClients do not affect each other (#2056) (3f4c863)
pin all internal dependencies (#2041) (d6f2b17), closes #2040
respect current config when creating implicit RequestQueue instance (845141d), closes #2043

Features

core: add default dataset helpers to BasicCrawler (#2057) (e2a7544)

v3.5.2

3.5.2 (2023-08-21)

Bug Fixes

make the Request constructor options typesafe (#2034) (75e7d65)
pin @crawlee/* packages versions in crawlee metapackage (#2040) (61f91c7)
support DELETE requests in HttpCrawler (#2039) (7ea5c41), closes #1658

Features

Add options for custom HTTP error status codes (#2035) (b50ef1a), closes #1711
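
To make the idea of custom HTTP error status codes concrete, here is a self-contained sketch of how such options could feed into a status check. The option names are assumed to mirror those introduced by #2035 and should be verified against the API reference. The default rule (5xx counts as an error) and the precedence (ignore wins over additional) are assumptions of this sketch, and `isHttpError` is a hypothetical helper, not a Crawlee function.

```typescript
interface StatusCodeOptions {
    // Extra status codes to treat as errors (assumed option name).
    additionalHttpErrorStatusCodes?: number[];
    // Status codes that should never be treated as errors (assumed option name).
    ignoreHttpErrorStatusCodes?: number[];
}

// Hypothetical helper: decides whether a response status should count as an error.
function isHttpError(status: number, options: StatusCodeOptions = {}): boolean {
    const additional = new Set(options.additionalHttpErrorStatusCodes ?? []);
    const ignored = new Set(options.ignoreHttpErrorStatusCodes ?? []);
    if (ignored.has(status)) return false; // explicit opt-out wins in this sketch
    if (additional.has(status)) return true; // explicit opt-in comes next
    return status >= 500; // assumed default: server errors only
}
```

For example, a site that answers 404 for pages that should be retried could list 404 as an additional error code, while a site that serves real content with a 500 status could list 500 as ignored.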