Distributed crawler powered by Headless Chrome
- Pass `previousUrl` to the `onSuccess` argument.
- Pass `options`, `depth` and `previousUrl` to errors.
- Support `customCrawl` for `HCCrawler.connect()` and `HCCrawler.launch()`'s options.
- Emit the `newpage` event.
- Make the `requestfinished` event's argument as described in the API reference.
- Support `cookies` for `crawler.queue()`'s options.
- Make `onSuccess` pass `cookies` in the response.
- Support `viewport` and `skipRequestedRedirect` for `crawler.queue()`'s options.
- Emit the `requestdisallowed` event.
- Make `onSuccess` pass `redirectChain` in the response.
- Support `waitFor` for `crawler.queue()`'s options.
- Support `slowMo` for `HCCrawler.connect()`'s options.
- Support the `timeout` option per request.
- Emit the `newpage` event.
- Support `deniedDomains` and `depthPriority` for `crawler.queue()`'s options.
- Allow the `allowedDomains` option to accept a list of regular expressions.
- Support `followSitemapXml` for `crawler.queue()`'s options.
- Emit the `requestretried` event.
- Use the `cache` option not only for remembering already requested URLs but also as a request queue for distributed environments.
- Move the `onSuccess`, `onError` and `maxDepth` options from `HCCrawler.connect()` and `HCCrawler.launch()` to `crawler.queue()`.