Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.

v3.6.2

5 months ago

3.6.2 (2023-11-26)

Bug Fixes

  • prevent race condition in KeyValueStore.getAutoSavedValue() (#2193) (e340e2b)
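To illustrate the kind of race this entry refers to: if two callers ask for the same auto-saved value before the first load finishes, each may end up with its own object, and changes made to one are lost. A minimal sketch of one common remedy, caching the in-flight promise instead of the resolved value, is shown below. The class `AutoSavedCacheSketch` and its `get` method are hypothetical names used for illustration; this is not Crawlee's actual implementation and not necessarily how #2193 fixed the problem.

```typescript
// Hypothetical sketch, not Crawlee's code: cache the pending promise so that
// concurrent callers share a single load and a single resulting object.
class AutoSavedCacheSketch<T> {
    private pending: Map<string, Promise<T>> = new Map();

    // Deliberately not `async`: the same promise instance is returned to
    // every caller that asks for the same key.
    get(key: string, load: () => Promise<T>): Promise<T> {
        let promise = this.pending.get(key);
        if (!promise) {
            // The promise is stored synchronously, before any await,
            // so a second caller cannot slip in and trigger a second load.
            promise = load();
            this.pending.set(key, promise);
        }
        return promise;
    }
}
```

Because the promise is stored before any `await`, two back-to-back calls with the same key trigger only one load and resolve to the same object.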

v3.6.1

5 months ago

3.6.1 (2023-11-15)

Bug Fixes

  • ts: ignore import errors for got-scraping (012fc9e)
  • ts: specify type explicitly for logger (aec3550)

Features

v3.6.0

5 months ago

3.6.0 (2023-11-15)

Bug Fixes

  • add skipNavigation option to enqueueLinks (#2153) (118515d)
  • BrowserPool: ignore --no-sandbox flag for webkit launcher (#2148) (1eb2f08), closes #1797
  • core: respect some advanced options for RequestList.open() + improve docs (#2158) (c5a1b07)
  • declare missing dependency on got-scraping in the core package (cd2fd4d)
  • provide more detailed error messages for browser launch errors (#2157) (f188ebe)
  • retry incorrect Content-Type when response has blocked status code (#2176) (b54fb8b), closes #1994
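The last entry above can be pictured as a small decision rule: a response with an unexpected Content-Type normally fails outright, but if its status code also signals blocking, retrying is more useful. The sketch below is an assumption-laden illustration, not Crawlee's code: the function `decideOnResponse`, the `ResponseDecision` type, and the default lists of blocked status codes and supported MIME types are all hypothetical choices made for this example.

```typescript
// Hypothetical sketch, not Crawlee's implementation.
type ResponseDecision = "process" | "retry" | "fail";

function decideOnResponse(
    statusCode: number,
    contentTypeHeader: string,
    // Assumed defaults, chosen for illustration only.
    blockedStatusCodes: number[] = [401, 403, 429],
    supportedMimeTypes: string[] = ["text/html", "application/xhtml+xml"],
): ResponseDecision {
    // Drop parameters such as "; charset=utf-8" and normalise case.
    const mimeType = (contentTypeHeader.split(";")[0] ?? "").trim().toLowerCase();

    // Supported content is handed on; handling of blocked statuses for
    // supported content is out of scope for this sketch.
    if (supportedMimeTypes.includes(mimeType)) return "process";

    // Unsupported content on a blocked status is likely a block page,
    // so it is worth retrying (for example with another session or proxy).
    if (blockedStatusCodes.includes(statusCode)) return "retry";

    // Unsupported content on a normal status is treated as a hard failure.
    return "fail";
}
```

For example, a 403 response carrying `application/json` would be retried under these assumptions, while a 200 response with the same header would fail without a retry.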

Features

v3.5.8

6 months ago

3.5.8 (2023-10-17)

Bug Fixes

  • MemoryStorage: ignore invalid files for request queues (#2132) (fa58581), closes #1985
  • refactor extractUrls to split the text line by line first (#2122) (7265cd7)
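As a rough picture of what "split the text line by line first" means, the sketch below runs a URL pattern over each line separately rather than over the whole input at once, which keeps the work the regular expression does bounded by line length. The helper `extractUrlsByLine` and the simplified `URL_PATTERN` are hypothetical and written for this example; the pattern Crawlee actually uses is more elaborate.

```typescript
// Hypothetical sketch, not Crawlee's extractUrls implementation.
// Simplified pattern: http(s) scheme followed by characters that are not
// whitespace, quotes, or angle brackets.
const URL_PATTERN = /https?:\/\/[^\s"'<>]+/g;

function extractUrlsByLine(text: string): string[] {
    const urls: string[] = [];
    // Split first, then match each line on its own.
    for (const line of text.split(/\r?\n/)) {
        const matches = line.match(URL_PATTERN);
        if (matches) urls.push(...matches);
    }
    return urls;
}
```

Calling `extractUrlsByLine` on a two-line string with one URL per line returns both URLs in order of appearance.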

v3.5.7

7 months ago

3.5.7 (2023-10-05)

Bug Fixes

  • warn when both RequestList (RL) and RequestQueue (RQ) are in use but the RQ is not provided explicitly (#2115) (6fb1c55), closes #1773
  • ensure the status message cannot cause the crawler to get stuck (#2114) (9034f08)
  • RQ request count is consistent after migration (#2116) (9ab8c18), closes #1855

v3.5.6

7 months ago

3.5.6 (2023-10-04)

Bug Fixes

  • types: re-export RequestQueueOptions as an alias to RequestProviderOptions (#2109) (0900f76)

Features

v3.5.5

7 months ago

3.5.5 (2023-10-02)

Bug Fixes

  • allow using any version of puppeteer or playwright (#2102) (0cafceb), closes #2101
  • session pool leaks memory on multiple crawler runs (#2083) (b96582a), closes #2074 #2031
  • templates: install browsers on postinstall for playwright (#2104) (323768b)
  • types: make return type of RequestProvider.open and RequestQueue(v2).open strict and accurate (#2096) (dfaddb9)

Features

  • experimental support for request locking (Request Queue v2) (#1975) (70a77ee), closes #1365
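The idea behind request locking can be sketched as follows: when a consumer fetches a request, the request is locked for a period of time so that other consumers skip it, and the lock lapses if the request is never marked as handled. The class `LockingQueueSketch` and its methods are hypothetical names invented for this illustration; they do not reflect the Request Queue v2 API or its storage behaviour.

```typescript
// Hypothetical sketch of request locking, not the Request Queue v2 API.
interface LockedEntry {
    url: string;
    lockedUntil: number; // timestamp in ms; 0 means never locked
    handled: boolean;
}

class LockingQueueSketch {
    private entries: LockedEntry[] = [];
    private now: () => number;

    // The clock is injectable so the behaviour can be checked deterministically.
    constructor(now: () => number = Date.now) {
        this.now = now;
    }

    add(url: string): void {
        this.entries.push({ url, lockedUntil: 0, handled: false });
    }

    // Returns the first unhandled request whose lock has expired and locks it
    // for `lockMs` milliseconds, or null when nothing is currently available.
    fetchAndLock(lockMs: number): string | null {
        const current = this.now();
        const entry = this.entries.find((e) => !e.handled && e.lockedUntil <= current);
        if (!entry) return null;
        entry.lockedUntil = current + lockMs;
        return entry.url;
    }

    markHandled(url: string): void {
        const entry = this.entries.find((e) => e.url === url);
        if (entry) entry.handled = true;
    }
}
```

With this design, a second consumer calling `fetchAndLock` while the lock is active gets `null`, and a request abandoned by a crashed consumer becomes available again once its lock expires.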

v3.5.4

8 months ago

3.5.4 (2023-09-11)

Bug Fixes

  • core: allow explicit calls to purgeDefaultStorage to wipe the storage on each call (#2060) (4831f07)
  • various helpers that open a KeyValueStore (KVS) now respect Configuration (#2071) (59dbb16)

Features

  • remove side effect from the deprecated error context augmentation (#2069) (f9fb5c4)

v3.5.3

8 months ago

3.5.3 (2023-08-31)

Bug Fixes

  • browser-pool: improve error handling when browser is not found (#2050) (282527f), closes #1459
  • clean up inProgress cache when delaying requests via sameDomainDelaySecs (#2045) (f63ccc0)
  • crawler instances with different StorageClients do not affect each other (#2056) (3f4c863)
  • pin all internal dependencies (#2041) (d6f2b17), closes #2040
  • respect current config when creating implicit RequestQueue instance (845141d), closes #2043

Features

  • core: add default dataset helpers to BasicCrawler (#2057) (e2a7544)

v3.5.2

8 months ago

3.5.2 (2023-08-21)

Bug Fixes

  • make the Request constructor options typesafe (#2034) (75e7d65)
  • pin @crawlee/* packages versions in crawlee metapackage (#2040) (61f91c7)
  • support DELETE requests in HttpCrawler (#2039) (7ea5c41), closes #1658

Features