Distributed crawler powered by Headless Chrome
- Support `obeyRobotsTxt` for `crawler.queue()`'s options.
- Support `persist` for `RedisCache`'s constructing options.
- Make the `cache` option required for `HCCrawler.connect()` and `HCCrawler.launch()`.
- Provide the `skipDuplicates` option to remember and skip duplicate URLs, instead of passing `null` to the `cache` option.
- Support the `BaseCache` interface.
- Emit `requeststarted`, `requestskipped`, `requestfinished`, `requestfailed`, `maxdepthreached`, `maxrequestreached` and `disconnected` events.
- Allow the `onSuccess` and `evaluatePage` options to be `null`.
- Change `crawler.isPaused`, `crawler.queueSize`, `crawler.pendingQueueSize` and `crawler.requestedCount` from read-only properties to methods.
- Support `maxDepth` for `crawler.queue()`'s options.
- Rename `ensureCacheClear` to `persistCache` for `HCCrawler.connect([options])` and `HCCrawler.launch([options])`'s options.
- Support the `maxRequest`, `allowedDomains` and `userAgent` options for `crawler.queue([options])`.
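The `skipDuplicates` option described above amounts to remembering which normalized URLs have already been requested and refusing repeats. A minimal sketch of that idea, using a plain in-memory `Set` (the `DuplicateFilter` class and `shouldFetch` method are illustrative names, not the library's actual cache implementation):

```javascript
// Illustrative sketch only: an in-memory duplicate-URL filter.
// Not the library's actual cache classes; names are hypothetical.
class DuplicateFilter {
  constructor() {
    this._seen = new Set();
  }

  // Drop the fragment and a trailing slash so trivially different
  // spellings of the same page compare equal.
  _normalize(url) {
    const u = new URL(url);
    u.hash = '';
    return u.toString().replace(/\/$/, '');
  }

  // True the first time a URL is seen, false for any duplicate.
  shouldFetch(url) {
    const key = this._normalize(url);
    if (this._seen.has(key)) return false;
    this._seen.add(key);
    return true;
  }
}

const filter = new DuplicateFilter();
console.log(filter.shouldFetch('https://example.com/page'));     // true
console.log(filter.shouldFetch('https://example.com/page#top')); // false, same page
```

A persistent cache (such as the Redis-backed one) plays the same role as the `Set` here, but survives across crawler sessions.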
- Rename `shouldRequest` to `preRequest` for `HCCrawler.launch()`'s options.
- Support the `extraHeaders` option.