Proxy Web Crawler

Automates the process of repeatedly searching for a website via scraped proxy IP and search keywords


Search for a website with a different proxy each time

This script automates searching for a target website by keyword on the DuckDuckGo search engine, paging through results until the site is found.

Pass a complete URL and at least one keyword as command-line arguments to run the program:
python proxy_crawler.py -u <url> -k <keyword(s)>
python proxy_crawler.py -u "https://www.whatsmyip.org" -k "my ip"

Add the -x option to run headless (no GUI):
python proxy_crawler.py -u "https://www.whatsmyip.org" -k "my ip" -x
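The command-line interface above could be wired up with argparse along these lines (a minimal sketch; the option names match the README, but the flag spellings like --url and --headless are assumptions, not taken from the script itself):

```python
import argparse

# Hypothetical sketch of the CLI described above.
parser = argparse.ArgumentParser(
    description="Search for a website via a different proxy each time")
parser.add_argument("-u", "--url", required=True,
                    help="complete URL of the site to find")
parser.add_argument("-k", "--keywords", required=True, nargs="+",
                    help="search keyword(s)")
parser.add_argument("-x", "--headless", action="store_true",
                    help="run headless (no GUI)")

# Example invocation mirroring the README's usage line.
args = parser.parse_args(["-u", "https://www.whatsmyip.org", "-k", "my ip", "-x"])
print(args.url, args.keywords, args.headless)
```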

  • A list of proxies is first scraped from sslproxies.org
  • Then, using a new proxy socket for each iteration, the specified keyword(s) are searched until the desired website is found
  • The website is then visited, and one random link within it is clicked
  • The bot is deliberately slowed down, and will also run fairly slowly due to proxy connections
  • Browser windows may open and close repeatedly during runtime (due to connection errors) until a healthy/valid proxy is encountered
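The proxy list comes from sslproxies.org, which publishes proxies in an HTML table. One hypothetical way to pull ip:port pairs out of such a table is a regex like the following (a sketch only; the actual script's parsing may differ):

```python
import re

# Matches adjacent <td>IP</td><td>PORT</td> cells, as found in an
# sslproxies.org-style table (hypothetical pattern, not from the script).
PROXY_RE = re.compile(r"<td>(\d{1,3}(?:\.\d{1,3}){3})</td><td>(\d{2,5})</td>")

def extract_proxies(html):
    """Return a list of 'ip:port' strings found in the given HTML."""
    return [f"{ip}:{port}" for ip, port in PROXY_RE.findall(html)]

sample = "<tr><td>203.0.113.7</td><td>8080</td></tr>"
print(extract_proxies(sample))  # ['203.0.113.7:8080']
```

Each extracted entry can then be fed to a fresh browser session, discarding the proxy and retrying with the next one whenever the connection fails.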

  • Requirements:
    • python3
    • selenium
    • Firefox browser
    • geckodriver
  • Download the latest geckodriver from Mozilla
  • Unzip the file and place geckodriver somewhere on your PATH
  • Ensure selenium is installed: pip install -r requirements.txt

Author: rootVIII 2018-2023
Open Source Agenda is not affiliated with "Proxy Web Crawler" Project. README Source: rootVIII/proxy_web_crawler
