Apify JS Versions

Crawlee: a Node.js web scraping and browser automation library for building reliable crawlers in JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP, in both headful and headless mode, with proxy rotation.

v3.9.1

3.9.1 (2024-04-11)

Features

v3.9.0

3.9.0 (2024-04-10)

Bug Fixes

  • include actual key in error message of KVS' setValue (#2411) (9089bf1)
  • notify autoscaled pool about newly added requests (#2400) (a90177d)
  • puppeteer: allow passing networkidle to waitUntil in gotoExtended (#2399) (5d0030d), closes #2398
  • sitemaps support application/xml (#2408) (cbcf47a)

Features

v3.8.2

3.8.2 (2024-03-21)

Bug Fixes

  • core: solve possible dead locks in RequestQueueV2 (#2376) (ffba095)
  • correctly report gzip decompression errors (#2368) (84a2f17)
  • puppeteer: improve detection of older versions (98d4e86), closes #2370
  • use 0 (number) instead of false as default for sessionRotationCount (#2372) (667a3e7)

Features

  • implement global storage access checking and use it to prevent unwanted side effects in adaptive crawler (#2371) (fb3b7da), closes #2364

v3.8.1

3.8.1 (2024-02-22)

Bug Fixes

  • fix crawling context type in router.addHandler() (#2355) (d73c202)

v3.8.0

3.8.0 (2024-02-21)

Bug Fixes

  • createRequests works correctly with exclude (and nothing else) (#2321) (048db09)
  • puppeteer: add 'process' to the browser bound methods (#2329) (2750ba6)
  • puppeteer: replace page.waitForTimeout() with sleep() (52d7219), closes #2335
  • puppeteer: support puppeteer@v22 (#2337) (3cc360a)
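Puppeteer v22 removed `page.waitForTimeout()`, which is why the fix above switches to a `sleep()` helper. Crawlee's utilities include their own `sleep()`; the standalone sketch below only shows the equivalent promise-based delay pattern and is not that helper's source. The `demo()` function is a hypothetical usage example.

```typescript
// Minimal promise-based delay, equivalent in spirit to the removed
// page.waitForTimeout(ms). Standalone sketch, not Crawlee's implementation.
function sleep(ms: number): Promise<void> {
    return new Promise<void>((resolve) => setTimeout(() => resolve(), ms));
}

// Hypothetical usage: where code previously called
// `await page.waitForTimeout(50)`, it can await a plain timer instead,
// which does not depend on the installed Puppeteer version.
async function demo(): Promise<number> {
    const started = Date.now();
    await sleep(50);
    return Date.now() - started; // elapsed milliseconds
}
```

Because the delay no longer goes through the `page` object, the same code works whether the crawler runs on an older Puppeteer release or on v22.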

Features

  • KeyValueStore.recordExists() (#2339) (8507a65)
  • accessing crawler state, key-value store and named datasets via crawling context (#2283) (58dd5fc)
  • adaptive playwright crawler (#2316) (8e4218a)
  • add Sitemap.tryCommonNames to check well known sitemap locations (#2311) (85589f1), closes #2307
  • core: add userAgent parameter to RobotsFile.isAllowed() + RobotsFile.from() helper (#2338) (343c159)
  • Support plain-text sitemap files (sitemap.txt) (#2315) (0bee7da)
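Under the sitemaps.org protocol, a plain-text sitemap is a UTF-8 file that lists one absolute URL per line and nothing else. The sketch below shows one way to read such a file. `parseTextSitemap` is a hypothetical helper written for illustration, not Crawlee's `Sitemap` API; it assumes blank lines should be tolerated and skips anything that does not parse as an absolute http(s) URL.

```typescript
// Hypothetical helper (not part of Crawlee): extract URLs from the body of a
// plain-text sitemap such as https://example.com/sitemap.txt.
function parseTextSitemap(content: string): string[] {
    const urls: string[] = [];
    for (const rawLine of content.split(/\r?\n/)) {
        const line = rawLine.trim();
        if (line.length === 0) continue; // tolerate blank lines
        try {
            const url = new URL(line); // throws on relative or malformed input
            if (url.protocol === "http:" || url.protocol === "https:") {
                urls.push(line);
            }
        } catch {
            // Not an absolute URL; the format allows only URLs, so skip it.
        }
    }
    return urls;
}

const body = "https://example.com/\nhttps://example.com/blog\n\n  https://example.com/about  \r\n";
console.log(parseTextSitemap(body));
```

Splitting on `\r?\n` handles files saved with either Unix or Windows line endings.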

v3.7.3

3.7.3 (2024-01-30)

Bug Fixes

v3.7.2

3.7.2 (2024-01-09)

Bug Fixes

  • RequestQueue: always clear locks when a request is reclaimed (#2263) (0fafe29), closes #2262

v3.7.1

3.7.1 (2024-01-02)

Bug Fixes

  • ES2022 build compatibility and move to NodeNext for module (#2258) (7fe1e68), closes #2257

v3.7.0

3.7.0 (2023-12-21)

Bug Fixes

  • retryOnBlocked doesn't override the blocked HTTP codes (#2243) (81672c3)
  • browser-pool: respect user options before assigning fingerprints (#2190) (f050776), closes #2164
  • filter out empty globs (#2205) (41322ab), closes #2200
  • make CLI work on Windows too with --no-purge (#2244) (83f3179)
  • make SessionPool queue up getSession calls to prevent overruns (#2239) (0f5665c), closes #1667
  • MemoryStorage: lock request JSON file when reading to support multiple process crawling (#2215) (eb84ce9)
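The SessionPool fix above queues up `getSession()` calls so that concurrent callers cannot all observe spare capacity at the same moment and create more sessions than the pool allows. The sketch below illustrates that general idea with a promise chain. `SerialQueue` and `TinyPool` are hypothetical names for illustration, not Crawlee classes, and the actual fix may be implemented differently.

```typescript
// Hypothetical sketch: run async tasks one at a time, in call order.
class SerialQueue {
    private tail: Promise<unknown> = Promise.resolve();

    run<T>(task: () => Promise<T>): Promise<T> {
        const result = this.tail.then(task);
        // Keep the chain usable even if this task rejects.
        this.tail = result.catch(() => undefined);
        return result;
    }
}

// Hypothetical pool with a hard size limit. Creating a session is async, so
// without the queue, several concurrent callers could all pass the size check
// before any of them pushes a new session, overrunning maxSize.
class TinyPool {
    private readonly sessions: string[] = [];
    private readonly queue = new SerialQueue();
    private readonly maxSize: number;
    private nextIndex = 0;

    constructor(maxSize: number) {
        this.maxSize = maxSize;
    }

    getSession(): Promise<string> {
        return this.queue.run(async () => {
            if (this.sessions.length < this.maxSize) {
                // Simulated async session creation.
                await new Promise<void>((resolve) => setTimeout(() => resolve(), 5));
                const id = `session_${this.sessions.length}`;
                this.sessions.push(id);
                return id;
            }
            // Pool is full: hand out existing sessions round-robin.
            const id = this.sessions[this.nextIndex % this.sessions.length];
            this.nextIndex += 1;
            return id;
        });
    }

    get size(): number {
        return this.sessions.length;
    }
}
```

Firing ten `getSession()` calls concurrently against `new TinyPool(3)` leaves `size` at 3. Without the queue, all ten calls would pass the length check before the first simulated creation finished, and the pool would grow to 10.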

Features

v3.6.2

3.6.2 (2023-11-26)

Bug Fixes

  • prevent race condition in KeyValueStore.getAutoSavedValue() (#2193) (e340e2b)