ScrapeGPT is a RAG-based Telegram bot that scrapes and analyzes websites, then answers questions based on the scraped content. It combines web scraping with Retrieval-Augmented Generation (RAG) to return natural-language answers to user queries.
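To make the retrieval step concrete, here is a minimal, dependency-free sketch of the general RAG idea: split scraped text into chunks, rank the chunks against the question, and place the best ones into a prompt for the language model. It is not the code from scrapeGPT.py. All function names are hypothetical, and plain bag-of-words cosine similarity stands in for the embedding model and vector store that a real pipeline would use.

```python
import math
import re
from collections import Counter


def chunk_text(text, size=40, overlap=10):
    """Split text into overlapping chunks of at most `size` words."""
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def tokenize(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question (stand-in for a vector search)."""
    q = tokenize(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, tokenize(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, context_chunks):
    """Assemble the prompt that would be sent to the language model."""
    context = "\n---\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

In a full pipeline, the string returned by `build_prompt` would be passed to the model (for example, one served by Ollama), and `retrieve` would be replaced by a similarity search over stored embeddings.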
The idea came from a job interview in which I was given the assignment to "study our company website and provide new product recommendations". I decided to automate the process of studying a whole website and built ScrapeGPT.
Note:
You don't have to use it as a Telegram bot: at the very end of scrapeGPT.py there is a commented-out main() function that lets you use it as a CLI. There is also an experimental file, scrapeGPT_gradio_app.py, which implements the same functionality as a Gradio app using the Qdrant vector database, Ollama, and GPT4AllEmbeddings.
To get started with the bot, follow these steps:
git clone https://github.com/LexiestLeszek/scrapeGPT.git
cd scrapeGPT
pip install -r requirements.txt
ollama pull qwen:1.8b
Replace API_TOKEN in the script with your actual token, and don't forget to set the bot's Telegram nickname.
Then start the bot:
python scrapeGPT.py
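As an alternative to pasting the token directly into the script, the value could be read from an environment variable so that it never ends up in version control. This is only a sketch: the variable name TELEGRAM_API_TOKEN and the helper load_token are assumptions and are not part of scrapeGPT.py.

```python
import os

# Hypothetical variable name; the original script hard-codes API_TOKEN instead.
TOKEN_ENV_VAR = "TELEGRAM_API_TOKEN"


def load_token(env=None):
    """Return the bot token from the environment, or raise if it is missing."""
    env = os.environ if env is None else env
    token = env.get(TOKEN_ENV_VAR, "").strip()
    # Reject an empty value and the unedited placeholder alike.
    if not token or token == "API_TOKEN":
        raise RuntimeError(f"Set {TOKEN_ENV_VAR} to your Telegram bot token")
    return token
```

With this in place, the script would assign `API_TOKEN = load_token()` after the variable has been exported in the shell that runs the bot.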
Once the bot is running, interact with it on Telegram:
Send the /start command to initialize the bot.
Contributions are welcome! To contribute:
This project is licensed under the terms of the MIT license. See the LICENSE file for details.
If you have any questions or suggestions, feel free to open an issue on GitHub.