A simple web crawler, built on Abot, that indexes page content into Azure AI Search.
Azure AI Search (formerly Azure Search) is a search service often used to ground generative AI applications in your own content. This project helps you get content from a website into an Azure AI Search index. It uses Abot to crawl the site, extracts the text of each page in a customizable way, and indexes that text into Azure AI Search.
This project is intended as a demo or a starting point for a real crawler. At a minimum, you'll want to replace the console messages with proper logging, and customize the text extraction to improve results for your use case.
To adjust what content is extracted and indexed from each page, implement your own TextExtractor subclass. See the class documentation for more information.
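As a rough illustration of the subclassing pattern, here is a minimal, self-contained sketch. It does not use the project's actual TextExtractor: SketchTextExtractor, MainContentTextExtractor, and the ExtractText(string) method are hypothetical stand-ins, and the real class may expose different members, so check the class documentation before adapting it. The regex-based HTML handling is also a simplification chosen to keep the example dependency-free.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the project's TextExtractor base class.
// The real class may have different method names and parameters.
public class SketchTextExtractor
{
    // Returns plain text for the HTML of one page.
    public virtual string ExtractText(string html)
    {
        // Drop script and style blocks, then replace remaining tags with spaces.
        string noScripts = Regex.Replace(html, @"<(script|style)\b[^>]*>.*?</\1>", " ",
            RegexOptions.Singleline | RegexOptions.IgnoreCase);
        string noTags = Regex.Replace(noScripts, @"<[^>]+>", " ");

        // Decode HTML entities and collapse runs of whitespace.
        string decoded = WebUtility.HtmlDecode(noTags);
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }
}

// Example customization: discard navigation and footer markup before extracting,
// so that repeated site chrome is not indexed with every page.
public class MainContentTextExtractor : SketchTextExtractor
{
    public override string ExtractText(string html)
    {
        string trimmed = Regex.Replace(html, @"<(nav|footer)\b[^>]*>.*?</\1>", " ",
            RegexOptions.Singleline | RegexOptions.IgnoreCase);
        return base.ExtractText(trimmed);
    }
}

public static class Demo
{
    public static void Main()
    {
        string html = "<html><body><nav>Home | About</nav>"
            + "<p>Hello &amp; welcome</p><footer>(c) Example</footer></body></html>";
        Console.WriteLine(new MainContentTextExtractor().ExtractText(html));
    }
}
```

The subclass only narrows the input and then defers to the base implementation, which keeps the customization small and easy to replace when the site's markup changes.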
The Abot crawler is configured by the method Crawler.CreateCrawlConfiguration, which you can adjust to your liking.
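For orientation, the settings below are the kind of values you might adjust in such a method. This is a sketch, not the project's actual configuration: the class name CrawlSettingsSketch and all numeric values are illustrative assumptions, and the namespace assumes Abot 2.x (Abot 1.x uses Abot.Poco instead).

```csharp
using Abot2.Poco; // assumption: Abot 2.x; in Abot 1.x the namespace is Abot.Poco

// Hypothetical helper showing commonly tuned CrawlConfiguration properties.
public static class CrawlSettingsSketch
{
    public static CrawlConfiguration Create()
    {
        return new CrawlConfiguration
        {
            MaxConcurrentThreads = 5,                 // parallel page requests
            MaxPagesToCrawl = 500,                    // hard cap on pages per crawl
            MaxCrawlDepth = 3,                        // link hops from the root page
            MinCrawlDelayPerDomainMilliSeconds = 250, // politeness delay per domain
            CrawlTimeoutSeconds = 600,                // stop the whole crawl after 10 minutes
            IsExternalPageCrawlingEnabled = false     // stay on the starting site
        };
    }
}
```

Lower concurrency and a longer per-domain delay reduce load on the target site at the cost of a slower crawl.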