Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.
setValue (#2411) (9089bf1)
networkidle to waitUntil in gotoExtended (#2399) (5d0030d), closes #2398
application/xml (#2408) (cbcf47a)
RequestQueueV2 (#2376) (ffba095)
createRequests works correctly with exclude (and nothing else) (#2321) (048db09)
page.waitForTimeout() with sleep() (52d7219), closes #2335
puppeteer@v22 (#2337) (3cc360a)
KeyValueStore.recordExists() (#2339) (8507a65)
userAgent parameter to RobotsFile.isAllowed() + RobotsFile.from() helper (#2338) (343c159)
retryOnBlocked doesn't override the blocked HTTP codes (#2243) (81672c3)
--no-purge (#2244) (83f3179)
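
The entry that pairs page.waitForTimeout() with sleep() appears to describe swapping a page-bound wait for a plain promise-based delay. The sketch below shows what such a swap could look like at a call site. It is only an illustration under stated assumptions: the sleep function here is a local stand-in written for this example, not Crawlee's own export, and pauseThenRun is a hypothetical helper name that does not come from the changelog.

```typescript
// Local stand-in for a sleep() utility (an assumption for illustration,
// not the library's implementation): resolves after `ms` milliseconds.
function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Hypothetical helper showing the call-site change.
async function pauseThenRun<T>(ms: number, action: () => Promise<T>): Promise<T> {
  // Before (page-bound wait): await page.waitForTimeout(ms);
  // After (no page object needed):
  await sleep(ms);
  return action();
}
```

A call such as pauseThenRun(500, async () => 'done') waits roughly half a second and then resolves with the action's result. Because the delay no longer depends on a page object, the same helper can be used in code paths that run without a browser.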