Distributed crawler powered by Headless Chrome
- Support `obeyRobotsTxt` for `crawler.queue()`'s options.
- Support `persist` for `RedisCache`'s constructing options.
- Make the `cache` option required for `HCCrawler.connect()` and `HCCrawler.launch()`.
- Provide the `skipDuplicates` option to remember and skip duplicate URLs, instead of passing `null` to the `cache` option.
- Support the `BaseCache` interface.
- Emit `requeststarted`, `requestskipped`, `requestfinished`, `requestfailed`, `maxdepthreached`, `maxrequestreached` and `disconnected` events.
- Allow the `onSuccess` and `evaluatePage` options to be `null`.
- Change `crawler.isPaused`, `crawler.queueSize`, `crawler.pendingQueueSize` and `crawler.requestedCount` from read-only properties to methods.
- Support `maxDepth` for `crawler.queue()`'s options.
- Rename `ensureCacheClear` to `persistCache` for `HCCrawler.connect([options])` and `HCCrawler.launch([options])`'s options.
- Support the `maxRequest`, `allowedDomains` and `userAgent` options for `crawler.queue([options])`.
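The `skipDuplicates` option described above amounts to remembering which normalized URLs have already been requested and refusing repeats. A minimal sketch of that idea, using a plain in-memory `Set` (the `DuplicateFilter` class and `shouldFetch` method are illustrative names, not the library's actual cache implementation):

```javascript
// Illustrative sketch only: an in-memory duplicate-URL filter.
// Not the library's actual cache classes; names are hypothetical.
class DuplicateFilter {
  constructor() {
    this._seen = new Set();
  }

  // Drop the fragment and a trailing slash so trivially different
  // spellings of the same page compare equal.
  _normalize(url) {
    const u = new URL(url);
    u.hash = '';
    return u.toString().replace(/\/$/, '');
  }

  // True the first time a URL is seen, false for any duplicate.
  shouldFetch(url) {
    const key = this._normalize(url);
    if (this._seen.has(key)) return false;
    this._seen.add(key);
    return true;
  }
}

const filter = new DuplicateFilter();
console.log(filter.shouldFetch('https://example.com/page'));     // true
console.log(filter.shouldFetch('https://example.com/page#top')); // false, same page
```

A persistent cache (such as the Redis-backed one) plays the same role as the `Set` here, but survives across crawler sessions.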
- Rename `shouldRequest` to `preRequest` for `HCCrawler.launch()`'s options.
- Support the `extraHeaders` option.