A web crawling framework written in Kotlin
Updated kotlinx.coroutines. This required an update to some of the places where coroutine builders were called internally.
Krawler#removeUrlsByRootPage and Krawler#removeUrlsByAge
KrawlQueueEntry. Queues will always attempt to pop the lowest-priority entry available. Priority can be assigned by overriding the Krawler#assignQueuePriorty method.
0.4.1 (2017-8-15)
Updated kotlinx.coroutines to 0.17.
0.4.0 (2017-5-17)
Rewrote the core crawl loop to use Kotlin 1.1 coroutines, effectively turning the crawl process into a multi-stage pipeline. This architecture change reduces resource contention between threads and, with it, the need for some locking.
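As a rough illustration of a coroutine-based, multi-stage crawl pipeline, the sketch below chains three lazy stages (frontier, fetch, parse) using the Kotlin standard library's sequence builder, whose yield is a suspending call. It is not Krawler's code: the stage functions, the Fetched/Parsed types, and the in-memory map that stands in for the network are all hypothetical. Everything runs on one thread, whereas a concurrent pipeline would more typically connect stages with kotlinx.coroutines channels. It does show the point made above: because each stage hands its output directly to the next and the shared frontier is touched from a single thread, no lock is needed.

```kotlin
// Hypothetical page types passed between stages.
data class Fetched(val url: String, val body: String)
data class Parsed(val url: String, val links: List<String>)

// Stage 0: emit URLs from the frontier. Sequences are lazy, so the frontier is
// re-checked every time a downstream stage pulls the next element.
fun frontierStage(frontier: ArrayDeque<String>): Sequence<String> = sequence {
    while (frontier.isNotEmpty()) yield(frontier.removeFirst())
}

// Stage 1: "fetch" a page. A Map stands in for the network so the sketch runs offline.
fun fetchStage(urls: Sequence<String>, web: Map<String, String>): Sequence<Fetched> = sequence {
    for (url in urls) {
        val body = web[url] ?: continue // unknown URL: drop it, as a failed fetch would be
        yield(Fetched(url, body))       // suspends until the next stage pulls again
    }
}

// Stage 2: "parse" a page; whitespace-separated tokens starting with "http" count as links.
fun parseStage(pages: Sequence<Fetched>): Sequence<Parsed> = sequence {
    for (page in pages) {
        yield(Parsed(page.url, page.body.split(" ").filter { it.startsWith("http") }))
    }
}

// Final stage: record each visit and feed newly discovered links back into the frontier.
fun crawl(seed: String, web: Map<String, String>, maxPages: Int = 100): List<String> {
    val frontier = ArrayDeque(listOf(seed))
    val seen = mutableSetOf(seed)
    val visited = mutableListOf<String>()
    for (parsed in parseStage(fetchStage(frontierStage(frontier), web))) {
        visited += parsed.url
        if (visited.size >= maxPages) break
        for (link in parsed.links) if (seen.add(link)) frontier.addLast(link)
    }
    return visited
}
```

For example, with web = mapOf("http://a" to "home http://b http://c", "http://b" to "http://a http://d", "http://c" to "no links here", "http://d" to "http://c"), calling crawl("http://a", web) visits http://a, http://b, http://c and http://d in breadth-first order.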
Updated the build file to build the simple example as a runnable JAR.
Minor bug fixes in the KrawlUrl class.
Fixed several bugs that could crash a worker thread. A crashed thread led to an incorrect count of crawled pages and slowed crawling, because fewer worker threads remained.
Added a new utility function that wraps doCrawl and logs any uncaught exceptions raised during crawling.
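The entry does not give doCrawl's signature, so the minimal sketch below wraps an arbitrary crawl step passed as a lambda. The name logUncaught, the logger name, and the url parameter are hypothetical. It catches Exception rather than Throwable so that JVM errors such as OutOfMemoryError still propagate, and it rethrows CancellationException, which coroutine code uses to signal cancellation, so wrapping a step does not silently swallow a cancelled crawl.

```kotlin
import java.util.concurrent.CancellationException
import java.util.logging.Level
import java.util.logging.Logger

// Hypothetical logger name for this sketch.
private val log: Logger = Logger.getLogger("krawler.sketch")

// Hypothetical wrapper around one crawl step for `url`.
// Returns the step's result, or null if it threw; the exception is logged instead of
// propagated, so a single bad page does not take down the worker that called it.
fun <T> logUncaught(url: String, step: () -> T): T? =
    try {
        step()
    } catch (e: CancellationException) {
        throw e // cancellation is a control signal, not a crawl failure
    } catch (e: Exception) {
        log.log(Level.SEVERE, "Uncaught exception while crawling $url", e)
        null
    }
```

A worker loop could then call something like logUncaught(url) { doCrawl(...) } and simply move on to the next URL when the result is null.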