Trandoshan Io Crawler Save Abandoned

Go process used to crawl websites

Project README

crawler

Build Status Go Report Card Maintainability

Crawler is a Go written program designed to crawl website

features

  • use tor SOCKS proxy to crawl hidden services
  • fast, built using valyala/fasthttp (up to 10x faster than net/http)
  • extract both absolute and relative URLs
  • use scalable messaging protocol (nats)

how it work

  • The Crawler process connect to a nats server (specified by env variable NATS_URI) and set-up a subscriber for message with tag todoSubject
  • When an URL is received the crawler start crawling
  • When crawling is done, the crawler will publish content to nats server with subject contentSubject and found urls with subject doneSubject
Open Source Agenda is not affiliated with "Trandoshan Io Crawler" Project. README Source: trandoshan-io/crawler
Stars
150
Open Issues
0
Last Commit
4 years ago
License

Open Source Agenda Badge

Open Source Agenda Rating