A simple but powerful web crawler library for .NET
X-Robots-Tag
header and <meta name="robots" />
tag)async
/await
systemInfinity Crawler is licensed under the MIT license. It is free to use in personal and commercial projects.
There are support plans available that cover all active Turner Software OSS projects. Support plans provide private email support, expert usage advice for our projects, priority bug fixes and more. These support plans help fund our OSS commitments to provide better software for everyone.
The crawler is built around fast but "polite" crawling of website. This is accomplished through a number of settings that allow adjustments of delays and throttles.
You can control:
crawl-delay
is defined for the User-agent, that will be the minimum)using InfinityCrawler;
var crawler = new Crawler();
var result = await crawler.Crawl(new Uri("http://example.org/"), new CrawlSettings {
UserAgent = "MyVeryOwnWebCrawler/1.0",
RequestProcessorOptions = new RequestProcessorOptions
{
MaxNumberOfSimultaneousRequests = 5
}
});