Proxy Web Crawler

Automates the process of repeatedly searching for a website via scraped proxy IP and search keywords


Search for a website with a different proxy each time

This script automates searching for a target website by keyword on the DuckDuckGo search engine, paging through results until the site is found.

Pass a complete URL and at least one keyword as command-line arguments to run the program:
python proxy_crawler.py -u <url> -k <keyword(s)>
python proxy_crawler.py -u "https://www.whatsmyip.org" -k "my ip"

Add the -x option to run headless (no GUI):
python proxy_crawler.py -u "https://www.whatsmyip.org" -k "my ip" -x
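The command-line interface above could be wired up with argparse along these lines (a minimal sketch; the option names match the README, but the flag spellings like --url and --headless are assumptions, not taken from the script itself):

```python
import argparse

# Hypothetical sketch of the CLI described above.
parser = argparse.ArgumentParser(
    description="Search for a website via a different proxy each time")
parser.add_argument("-u", "--url", required=True,
                    help="complete URL of the site to find")
parser.add_argument("-k", "--keywords", required=True, nargs="+",
                    help="search keyword(s)")
parser.add_argument("-x", "--headless", action="store_true",
                    help="run headless (no GUI)")

# Example invocation mirroring the README's usage line.
args = parser.parse_args(["-u", "https://www.whatsmyip.org", "-k", "my ip", "-x"])
print(args.url, args.keywords, args.headless)
```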

  • A list of proxies is first scraped from sslproxies.org
  • Then, using a new proxy socket for each iteration, the specified keyword(s) are searched until the desired website is found
  • The website is then visited, and one random link within it is clicked
  • The bot is deliberately slowed down, and will also run fairly slowly due to proxy connections
  • Browser windows may open and close repeatedly during runtime (due to connection errors) until a healthy/valid proxy is encountered
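The proxy list comes from sslproxies.org, which publishes proxies in an HTML table. One hypothetical way to pull ip:port pairs out of such a table is a regex like the following (a sketch only; the actual script's parsing may differ):

```python
import re

# Matches adjacent <td>IP</td><td>PORT</td> cells, as found in an
# sslproxies.org-style table (hypothetical pattern, not from the script).
PROXY_RE = re.compile(r"<td>(\d{1,3}(?:\.\d{1,3}){3})</td><td>(\d{2,5})</td>")

def extract_proxies(html):
    """Return a list of 'ip:port' strings found in the given HTML."""
    return [f"{ip}:{port}" for ip, port in PROXY_RE.findall(html)]

sample = "<tr><td>203.0.113.7</td><td>8080</td></tr>"
print(extract_proxies(sample))  # ['203.0.113.7:8080']
```

Each extracted entry can then be fed to a fresh browser session, discarding the proxy and retrying with the next one whenever the connection fails.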

  • Requirements:
    • python3
    • selenium
    • Firefox browser
    • geckodriver
  • Download the latest geckodriver from Mozilla
  • Unzip the file and place geckodriver somewhere on your PATH
  • Ensure selenium is installed: pip install -r requirements.txt

Author: rootVIII 2018-2023
Open Source Agenda is not affiliated with "Proxy Web Crawler" Project. README Source: rootVIII/proxy_web_crawler
