Headless Chrome Crawler Versions

Distributed crawler powered by Headless Chrome

1.2.5

6 years ago

Added

Changed

  • Make cache a required option for HCCrawler.connect() and HCCrawler.launch().
  • Provide skipDuplicates to remember and skip duplicate URLs, instead of passing null to the cache option.
  • Modify the BaseCache interface.
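
As a rough illustration of what remembering and skipping duplicate URLs involves, the sketch below pairs a tiny in-memory cache with a skipDuplicates flag. MemoryCache, its get/set methods, and shouldVisit are hypothetical names invented for this example; they are not the library's BaseCache interface or its internal logic.

```javascript
// Hypothetical in-memory cache; NOT the library's BaseCache interface.
class MemoryCache {
  constructor() {
    this.store = new Map();
  }
  get(key) {
    return this.store.get(key);
  }
  set(key, value) {
    this.store.set(key, value);
  }
}

// Returns true when the URL should be requested. With skipDuplicates enabled,
// the URL is recorded so that a later call with the same URL returns false.
function shouldVisit(cache, url, { skipDuplicates = true } = {}) {
  if (!skipDuplicates) return true; // duplicates allowed: always request
  if (cache.get(url)) return false; // already seen: skip
  cache.set(url, true);
  return true;
}

const urls = [
  'https://example.com/',
  'https://example.com/a',
  'https://example.com/',
];

const sharedCache = new MemoryCache();
const visited = urls.filter((url) => shouldVisit(sharedCache, url));
const visitedAll = urls.filter((url) =>
  shouldVisit(sharedCache, url, { skipDuplicates: false }));
```

In this sketch the cache always exists and duplicate handling is a separate switch, which mirrors the direction of the change described above.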

1.2.4

6 years ago

Added

  • Support CSV and JSON Lines formats for exporting results
  • Emit requeststarted, requestskipped, requestfinished, requestfailed, maxdepthreached, maxrequestreached and disconnected events.
  • Improve debug logs by tracing public APIs and events.
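
For readers unfamiliar with the two export formats, here is a minimal sketch of how a list of results could be serialized as JSON Lines and as CSV. The record fields and the helper names (toJsonLines, toCsv, csvField) are assumptions made for this example and do not reflect the library's exporter classes.

```javascript
// Illustrative records; the field names are assumptions for this sketch.
const results = [
  { url: 'https://example.com/', title: 'Example Domain' },
  { url: 'https://example.com/a', title: 'Page "A", draft' },
];

// JSON Lines: one JSON object per line, each line terminated by a newline.
function toJsonLines(records) {
  return records.map((record) => JSON.stringify(record)).join('\n') + '\n';
}

// CSV: quote a field when it contains a comma, a double quote, or a line
// break, and double any embedded double quotes.
function csvField(value) {
  const text = String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

function toCsv(records, fields) {
  const header = fields.map(csvField).join(',');
  const rows = records.map((record) =>
    fields.map((field) => csvField(record[field])).join(','));
  return [header, ...rows].join('\n') + '\n';
}

const jsonl = toJsonLines(results);
const csv = toCsv(results, ['url', 'title']);
console.log(jsonl);
console.log(csv);
```

JSON Lines keeps nested values intact and can be appended to one record at a time, while CSV needs a fixed list of fields up front.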

Changed

  • Allow the onSuccess and evaluatePage options to be null.
  • Change crawler.isPaused, crawler.queueSize, crawler.pendingQueueSize and crawler.requestedCount from read-only properties to methods.

Fixed

  • Fix a bug where the maxDepth option was ignored.

1.2.3

6 years ago

[1.2.3] - 2017-12-17

Changed

  • Refactor by changing the style of requiring the cache directory.

Fixed

  • Fix a bug where more crawlers than maxConcurrency were started when requests failed.
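
One common way to keep the number of in-flight requests bounded even when some of them fail is to release the slot in a finally block. The sketch below shows that pattern with a fixed pool of workers. It is an illustration only, not the library's actual fix; runWithLimit and the simulated tasks are hypothetical.

```javascript
// Runs async tasks with at most `maxConcurrency` in flight at any time.
// A failed task is recorded and its slot is released like any other.
async function runWithLimit(tasks, maxConcurrency) {
  let active = 0; // tasks currently running
  let peak = 0; // highest value `active` ever reached
  let next = 0; // index of the next task to start
  const outcomes = new Array(tasks.length);

  async function worker() {
    while (next < tasks.length) {
      const index = next++;
      active++;
      peak = Math.max(peak, active);
      try {
        outcomes[index] = { ok: true, value: await tasks[index]() };
      } catch (error) {
        outcomes[index] = { ok: false, error };
      } finally {
        active--; // always free the slot, on success or failure
      }
    }
  }

  const workerCount = Math.min(maxConcurrency, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return { outcomes, peak };
}

// Simulated requests: every even-numbered one fails after a short delay.
const tasks = Array.from({ length: 6 }, (_, i) => () =>
  new Promise((resolve, reject) => {
    setTimeout(() => {
      if (i % 2 === 0) reject(new Error(`request ${i} failed`));
      else resolve(i);
    }, 5);
  }));

const done = runWithLimit(tasks, 2);
done.then(({ outcomes, peak }) => {
  const failures = outcomes.filter((outcome) => !outcome.ok).length;
  console.log(`peak concurrency: ${peak}, failures: ${failures}`);
});
```

Because a failure never spawns an extra worker, the number of concurrent tasks cannot exceed the size of the pool.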

1.2.2

6 years ago

Added

  • Automatically collect and follow links found in the requested page.
  • Support maxDepth for crawler.queue()'s options.
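
The interaction between link following and a depth limit can be sketched as a breadth-first traversal. The example below walks a hypothetical in-memory link graph instead of real pages, and it assumes that the start URL counts as depth 1, so a limit of 1 means no links are followed. The crawl function and the links table are invented for this illustration.

```javascript
// Hypothetical link graph standing in for real pages: URL -> links found on it.
const links = {
  '/': ['/a', '/b'],
  '/a': ['/a/1', '/'],
  '/b': [],
  '/a/1': ['/a/1/x'],
  '/a/1/x': [],
};

// Breadth-first traversal. The start URL has depth 1 (an assumption of this
// sketch), links found on it have depth 2, and so on.
function crawl(startUrl, maxDepth) {
  const seen = new Set([startUrl]);
  const queue = [{ url: startUrl, depth: 1 }];
  const visited = [];
  while (queue.length > 0) {
    const { url, depth } = queue.shift();
    visited.push(url);
    if (depth >= maxDepth) continue; // at the limit: do not follow links
    for (const link of links[url] || []) {
      if (!seen.has(link)) {
        seen.add(link);
        queue.push({ url: link, depth: depth + 1 });
      }
    }
  }
  return visited;
}

console.log(crawl('/', 1));
console.log(crawl('/', 3));
```

Tracking the depth alongside each queued URL keeps the check local to the point where new links are enqueued.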

1.2.1

6 years ago

Added

1.2.0

6 years ago

Changed

1.1.2

6 years ago

Added

1.1.1

6 years ago

[1.1.1] - 2017-12-09

Added

  • Add CHANGELOG
  • Automatically dismiss dialogs
  • Enrich unit tests

Changed

  • Refactor by separating HCCrawler and Crawler classes
  • Make preparation of pages parallel

1.1.0

6 years ago

  • Refactor by separating HCCrawler and Crawler classes
  • Change the public API for launching a browser; you can now launch a browser with HCCrawler.launch()
  • Rename shouldRequest to preRequest
  • Modify README according to new public API
  • Modify examples according to new public API
  • Support extraHeaders option
  • Refactor handlers for options
  • Add comments in JSDoc style

1.0.0

6 years ago