Uscrapper

Uscrapper Vanta: Dive deeper into the web with this powerful open-source tool. Extract valuable insights with ease and efficiency, from both surface and deep web sources. Empower your data mining and analysis with Vanta's advanced capabilities. Fast, reliable, and user-friendly, Uscrapper Vanta is the ultimate choice for researchers and analysts.


Uscrapper Vanta


Introducing Uscrapper Vanta: unleashing the power of open-source intelligence. Vanta dives deeper into the web and unlocks a new level of data extraction. It reaches into the uncharted territory of the dark web and uses a keyword extraction model to pinpoint relevant content. Vanta retains the core strengths of its predecessor: it harvests personal information, including email addresses, social media links, author names, geolocations, phone numbers, and usernames, from both hyperlinked and non-hyperlinked sources. It combines multithreading with advanced modules that work around anti-web-scraping defenses, so you can reach the data you need. Vanta supports 'crawl and scrape' within the same domain, gathering information from every relevant corner of a website. It also generates comprehensive reports that organize the extracted data and turn raw information into actionable insights.
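You can approximate the same-domain 'crawl and scrape' step with Python's standard library. This sketch is not Uscrapper's code. The function names and the `limit` parameter, which stands in for the `-c` crawl cap, are hypothetical. The sketch collects `<a href>` values from one page and resolves them against the page URL. It then drops `#fragments` and keeps only http(s) links on the same host.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_domain_links(base_url, html, limit):
    """Return up to `limit` unique absolute links on the same host as base_url."""
    parser = LinkCollector()
    parser.feed(html)
    base_host = urlparse(base_url).netloc
    links = []
    for href in parser.hrefs:
        # Resolve relative links, then drop the #fragment so duplicates collapse
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        parsed = urlparse(absolute)
        # Keep only http(s) links on the starting host (skips mailto:, other domains)
        if parsed.scheme in ("http", "https") and parsed.netloc == base_host:
            if absolute not in links:
                links.append(absolute)
        if len(links) >= limit:
            break
    return links


page = '<a href="/about">About</a> <a href="https://other.org/x">Ext</a> <a href="/about#team">Team</a>'
print(same_domain_links("https://example.com/", page, limit=10))
# ['https://example.com/about']
```

A full crawler would fetch each returned link and repeat the process until it reaches the overall cap. Uscrapper also spreads this work across multiple threads.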

🤩 What's New:

Uscrapper Vanta:

Dark Web Support: Uscrapper Vanta now has the capability to handle .onion or dark web links. This expanded functionality enables users to extract crucial information from previously inaccessible sources, providing a more comprehensive view of the digital landscape.
Keyword-Based Scraping: With the introduction of a new model, Uscrapper Vanta now allows users to scrape web pages for specific keywords or a list of keywords. This tailored approach enhances the tool's versatility, enabling users to focus on extracting only the information relevant to their needs.
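The README does not describe how the keyword model works internally. As a rough illustration only, a plain keyword scan over a page's text might look like the sketch below. The function name and the `context` window size are hypothetical. The sketch does case-insensitive whole-word matching with regular expressions and returns a short snippet around each hit.

```python
import re


def find_keywords(text, keywords, context=30):
    """Return {keyword: [snippets]} for case-insensitive whole-word matches."""
    results = {}
    for keyword in keywords:
        # \b anchors stop "scrap" from matching inside "scraper";
        # re.escape treats the keyword literally (dots, plus signs, etc.)
        pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
        snippets = []
        for match in pattern.finditer(text):
            # Grab up to `context` characters on each side of the match
            start = max(match.start() - context, 0)
            end = min(match.end() + context, len(text))
            snippets.append(text[start:end])
        if snippets:
            results[keyword] = snippets
    return results


text = "Contact the security team. Security reports go to the team lead."
print(find_keywords(text, ["security", "budget"], context=5))
```

The `\b` anchors assume keywords start and end with word characters. A term such as `c++` would need a different boundary rule.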

💡 Extracted Details:

Uscrapper extracts the following details from the provided website:

Email Addresses: Displays email addresses found on the website.
Social Media Links: Displays links to various social media platforms found on the website.
Author Names: Displays the names of authors associated with the website.
Geolocations: Displays geolocation information associated with the website.
Non-Hyperlinked Details: Displays non-hyperlinked details found on the website, including email addresses, phone numbers, and usernames.
Keyword-Based Extraction: Displays relevant data matching specified terms or a curated keyword list.
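Non-hyperlinked details, such as an email address typed as plain text rather than a `mailto:` link, are typically found with regular expressions. The Contribution section below invites improvements to exactly these. The pattern and function name here are an illustrative assumption, not the project's actual expressions.

```python
import re

# Deliberately simple pattern: it covers common addresses but is not a full
# RFC 5322 validator and can miss unusual-but-valid forms.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text):
    """Return unique email-like strings from plain text, in first-seen order."""
    found = []
    for match in EMAIL_RE.findall(text):
        # Lower-casing is a simplification: domains are case-insensitive,
        # but local parts technically may not be.
        address = match.lower()
        if address not in found:
            found.append(address)
    return found


sample = "Write to Info@Example.com or sales@example.co.uk. (info@example.com again)"
print(extract_emails(sample))
# ['info@example.com', 'sales@example.co.uk']
```

The trailing period after `example.co.uk` is not captured because the final `\.[A-Za-z]{2,}` group must end on letters.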


🛠️ Installation Steps:

git clone https://github.com/z0m31en7/Uscrapper.git

cd Uscrapper/install/
chmod +x ./install.sh && ./install.sh      # For Unix/Linux systems

🔮 Usage:

To run Uscrapper-vanta, use the following command-line syntax:

python Uscrapper-vanta.py [-h] [-u URL] [-O] [-ns] [-c CRAWL] [-t THREADS] [-k KEYWORDS [KEYWORDS ...]] [-f FILE]

Arguments:

-u URL, --url URL (URL of the website)
-O, --generate-report (Generate a report)
-ns, --nonstrict (Display non-strict usernames; may show inaccurate results)
-c CRAWL, --crawl CRAWL (Maximum number of links to crawl and scrape within the same scope)
-t THREADS, --threads THREADS (Number of threads to use while crawling; default: 4)
-k KEYWORDS [KEYWORDS ...], --keywords KEYWORDS [KEYWORDS ...] (Keywords to search for, as space-separated arguments)
-f FILE, --file FILE (Path to a text file containing keywords)
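To show how these flags fit together, here is a hypothetical `argparse` setup that mirrors the documented interface. It is not taken from the project's source. The `load_keywords` helper and its one-keyword-per-line file format for `-f` are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented Uscrapper-vanta flags."""
    parser = argparse.ArgumentParser(prog="Uscrapper-vanta.py")
    parser.add_argument("-u", "--url", help="URL of the website")
    parser.add_argument("-O", "--generate-report", action="store_true",
                        help="Generate a report")
    parser.add_argument("-ns", "--nonstrict", action="store_true",
                        help="Display non-strict usernames (may show inaccurate results)")
    parser.add_argument("-c", "--crawl", type=int,
                        help="Maximum number of links to crawl and scrape within the same scope")
    parser.add_argument("-t", "--threads", type=int, default=4,
                        help="Number of threads to use while crawling (default: 4)")
    parser.add_argument("-k", "--keywords", nargs="+", help="Keywords to search for")
    parser.add_argument("-f", "--file", help="Path to a text file containing keywords")
    return parser


def load_keywords(args):
    """Merge -k keywords with those from -f (assumed: one keyword per line)."""
    keywords = list(args.keywords or [])
    if args.file:
        with open(args.file, encoding="utf-8") as handle:
            keywords += [line.strip() for line in handle if line.strip()]
    return keywords


args = build_parser().parse_args(["-u", "https://example.com", "-c", "10", "-k", "login", "admin"])
print(args.url, args.crawl, args.threads, load_keywords(args))
```

Because `-k` uses `nargs="+"`, it consumes every following token until the next flag. Placing `-k` last, as in the usage line above, avoids surprises.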

📜 Note:

Uscrapper relies on web scraping techniques to extract information from websites. Make sure to use it responsibly and in compliance with the website's terms of service and applicable laws.
The accuracy and completeness of the extracted details depend on the structure and content of the website being analyzed.
To bypass some anti-web-scraping measures, we use Selenium, which can slow down the overall process.
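One small, practical step toward responsible use is checking a site's robots.txt before requesting a page. The README does not say whether Uscrapper does this, so treat the sketch below as a general illustration rather than a description of the tool. It uses Python's standard `urllib.robotparser`. The rules are passed in as text so the example runs offline. In practice you would first download the site's `/robots.txt`. The `allowed_by_robots` name is hypothetical.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt, url, user_agent="*"):
    """Check a URL against robots.txt rules supplied as text (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


rules = """
User-agent: *
Disallow: /private/
"""
print(allowed_by_robots(rules, "https://example.com/blog/post"))     # True
print(allowed_by_robots(rules, "https://example.com/private/data"))  # False
```

robots.txt is only one signal. A site's terms of service and applicable laws still apply, as the note above says.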

💌 Contribution:

Want a new feature to be added?

Make a pull request with all the necessary details, and it will be merged after review.
You can contribute by making the regular expressions more efficient and accurate, or by suggesting some more features that can be added.

🛡️ License:

This project is licensed under the MIT License.

Open Source Agenda is not affiliated with "Uscrapper" Project. README Source: z0m31en7/Uscrapper
