Apify JS Versions

Crawlee is a web scraping and browser automation library for Node.js, used to build reliable crawlers in JavaScript and TypeScript. It extracts data for AI, LLMs, RAG, or GPTs, and downloads HTML, PDF, JPG, PNG, and other files from websites. It works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP, in both headful and headless mode, with proxy rotation.

v3.9.1

2 weeks ago

3.9.1 (2024-04-11)

Features

browserPerProxy browser launch option (#2418) (df57b29)
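
As a rough illustration of the new option, the sketch below enables browserPerProxy on a PlaywrightCrawler. The entry above does not say where the option goes; this assumes it is set on launchContext, that the crawlee and playwright packages are installed, and that the proxy URLs (placeholders here) are replaced with real ones.

```typescript
import { PlaywrightCrawler, ProxyConfiguration } from 'crawlee';

// Placeholder proxy URLs, not real endpoints.
const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: [
        'http://proxy-1.example.com:8000',
        'http://proxy-2.example.com:8000',
    ],
});

const crawler = new PlaywrightCrawler({
    proxyConfiguration,
    launchContext: {
        // Assumption: the option from #2418 is set on the launch context.
        browserPerProxy: true,
    },
    requestHandler: async ({ request, page, log }) => {
        log.info(`${request.url}: ${await page.title()}`);
    },
});
```

Calling crawler.run() with a list of start URLs would then launch the crawl. A separate browser per proxy may mean more browser instances than usual, so it is worth watching memory use.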

v3.9.0

2 weeks ago

3.9.0 (2024-04-10)

Bug Fixes

include actual key in error message of KVS' setValue (#2411) (9089bf1)
notify autoscaled pool about newly added requests (#2400) (a90177d)
puppeteer: allow passing networkidle to waitUntil in gotoExtended (#2399) (5d0030d), closes #2398
sitemaps support application/xml (#2408) (cbcf47a)

Features

createAdaptivePlaywrightRouter utility (#2415) (cee4778), closes #2407
tieredProxyUrls for ProxyConfiguration (#2348) (5408c7f)
better newUrlFunction for ProxyConfiguration (#2392) (330598b), closes #2348 #2065
expand #shadow-root elements automatically in parseWithCheerio helper (#2396) (a05b3a9)
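
A minimal sketch of how tieredProxyUrls might be configured, assuming the crawlee package is installed. The proxy URLs are placeholders, and the comments on tier ordering reflect an assumption (earlier tiers are tried first, later tiers are fallbacks) rather than anything stated in the entry above.

```typescript
import { CheerioCrawler, ProxyConfiguration } from 'crawlee';

const proxyConfiguration = new ProxyConfiguration({
    // Each inner array is one tier of proxy URLs (placeholders).
    tieredProxyUrls: [
        // Assumed: the first tier holds the cheapest proxies.
        ['http://datacenter-1.example.com:8000', 'http://datacenter-2.example.com:8000'],
        // Assumed: later tiers are fallbacks for sites that block the first tier.
        ['http://residential-1.example.com:8000'],
    ],
});

const crawler = new CheerioCrawler({
    proxyConfiguration,
    requestHandler: async ({ request, $, log }) => {
        log.info(`${request.url}: ${$('title').text()}`);
    },
});
```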

v3.8.2

1 month ago

3.8.2 (2024-03-21)

Bug Fixes

core: solve possible deadlocks in RequestQueueV2 (#2376) (ffba095)
correctly report gzip decompression errors (#2368) (84a2f17)
puppeteer: improve detection of older versions (98d4e86), closes #2370
use 0 (number) instead of false as default for sessionRotationCount (#2372) (667a3e7)

Features

implement global storage access checking and use it to prevent unwanted side effects in adaptive crawler (#2371) (fb3b7da), closes #2364

v3.8.1

2 months ago

3.8.1 (2024-02-22)

Bug Fixes

fix crawling context type in router.addHandler() (#2355) (d73c202)

v3.8.0

2 months ago

3.8.0 (2024-02-21)

Bug Fixes

createRequests works correctly with exclude (and nothing else) (#2321) (048db09)
puppeteer: add 'process' to the browser bound methods (#2329) (2750ba6)
puppeteer: replace page.waitForTimeout() with sleep() (52d7219), closes #2335
puppeteer: support puppeteer@v22 (#2337) (3cc360a)

Features

KeyValueStore.recordExists() (#2339) (8507a65)
accessing crawler state, key-value store and named datasets via crawling context (#2283) (58dd5fc)
adaptive playwright crawler (#2316) (8e4218a)
add Sitemap.tryCommonNames to check well known sitemap locations (#2311) (85589f1), closes #2307
core: add userAgent parameter to RobotsFile.isAllowed() + RobotsFile.from() helper (#2338) (343c159)
support plain-text sitemap files (sitemap.txt) (#2315) (0bee7da)
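
For context on the sitemap.txt entry: a plain-text sitemap lists one absolute URL per line. The sketch below shows one way such a file could be parsed. parseTextSitemap is a hypothetical helper written for illustration, not Crawlee's implementation or API.

```typescript
// Hypothetical helper, not part of Crawlee: extracts http(s) URLs from a
// plain-text sitemap body, which lists one absolute URL per line.
function parseTextSitemap(body: string): string[] {
    const urls: string[] = [];
    for (const rawLine of body.split(/\r?\n/)) {
        const line = rawLine.trim();
        if (line === '') continue; // skip blank lines
        try {
            // new URL() throws on anything that is not an absolute URL.
            const url = new URL(line);
            if (url.protocol === 'http:' || url.protocol === 'https:') {
                urls.push(url.href);
            }
        } catch {
            // Ignore lines that are not valid URLs.
        }
    }
    return urls;
}

const sample = [
    'https://example.com/',
    '',
    '  https://example.com/about  ',
    'not a url',
    'ftp://example.com/file.txt',
].join('\n');

console.log(parseTextSitemap(sample));
// → [ 'https://example.com/', 'https://example.com/about' ]
```

Invalid lines are skipped silently here so that one bad line does not discard the rest of the file. A stricter parser could collect them and report them instead.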

v3.7.3

2 months ago

3.7.3 (2024-01-30)

Bug Fixes

enqueueLinks: filter out empty/nullish globs (#2286) (84319b3)
pass on an invisible CF turnstile (#2277) (d8734e7), closes #2256

v3.7.2

3 months ago

3.7.2 (2024-01-09)

Bug Fixes

RequestQueue: always clear locks when a request is reclaimed (#2263) (0fafe29), closes #2262

v3.7.1

3 months ago

3.7.1 (2024-01-02)

Bug Fixes

ES2022 build compatibility and move to NodeNext for module (#2258) (7fe1e68), closes #2257

v3.7.0

4 months ago

3.7.0 (2023-12-21)

Bug Fixes

retryOnBlocked doesn't override the blocked HTTP codes (#2243) (81672c3)
browser-pool: respect user options before assigning fingerprints (#2190) (f050776), closes #2164
filter out empty globs (#2205) (41322ab), closes #2200
make CLI work on Windows too with --no-purge (#2244) (83f3179)
make SessionPool queue up getSession calls to prevent overruns (#2239) (0f5665c), closes #1667
MemoryStorage: lock request JSON file when reading to support multiple process crawling (#2215) (eb84ce9)

Features

allow configuring crawler statistics (#2213) (9fd60e4), closes #1789
check enqueue link strategy post redirect (#2238) (3c5f9d6), closes #2173
log cause with retryOnBlocked (#2252) (e19a773), closes #2249
robots.txt and sitemap.xml utils (#2214) (fdfec4f), closes #2187

v3.6.2

5 months ago

3.6.2 (2023-11-26)

Bug Fixes

prevent race condition in KeyValueStore.getAutoSavedValue() (#2193) (e340e2b)