Abrade

A fast Web API scraper written in C++ and built on Boost ASIO

Abrade is a coroutine-based web scraper suitable for querying the existence (a HEAD request) or the contents (a GET request) of a web resource with a sequential, numerical pattern.
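
To make the HEAD/GET distinction concrete, here is a minimal sketch of the request text a client sends in each case. The helper name build_request is hypothetical and is not taken from Abrade's source; it only illustrates that the two requests differ in the method token, while a HEAD response carries headers and no body.

```cpp
#include <string>

// Hypothetical helper (not part of Abrade): builds the text of a minimal
// HTTP/1.1 request. Pass "HEAD" to check existence, "GET" to fetch contents.
std::string build_request(const std::string& method,
                          const std::string& host,
                          const std::string& path) {
    std::string request;
    request += method + " " + path + " HTTP/1.1\r\n";  // request line
    request += "Host: " + host + "\r\n";               // required in HTTP/1.1
    request += "Connection: close\r\n";
    request += "\r\n";                                 // blank line ends the headers
    return request;
}
```

Calling build_request("HEAD", "example.com", "/1") and build_request("GET", "example.com", "/1") yields two requests that are identical apart from the first token.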

Check out the blog post at http://lospi.net for usage and examples.

> abrade -h
Usage: abrade host pattern:
  --host arg                            host name (eg example.com)
  --pattern arg (=/)                    format of URL (eg ?mynum={1:5}&myhex=0x
                                        {hhhh}). See documentation for
                                        formatting of patterns.
  --agent arg (=Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0)
                                        User-agent string (default: Firefox 47)
  --out arg                             output path. dir if contents enabled.
                                        (default: HOSTNAME)
  --err arg                             error path (file). (default:
                                        HOSTNAME-err.log)
  --proxy arg                           SOCKS5 proxy address:port. (default:
                                        none)
  --screen arg                          omits 200-level response if contents
                                        contains screen (default: none)
  -d [ --stdin ]                        read from stdin (default: no)
  -t [ --tls ]                          use tls/ssl (default: no)
  -s [ --sensitive ]                    complain about rude TCP teardowns
                                        (default: no)
  -o [ --tor ]                          use local proxy at 127.0.0.1:9050
                                        (default: no)
  -r [ --verify ]                       verify ssl (default: no)
  -l [ --leadzero ]                     output leading zeros in URL (default:
                                        no)
  -e [ --telescoping ]                  do not telescope the pattern (default:
                                        no)
  -f [ --found ]                        print when resource found (default:
                                        no). implied by verbose
  -v [ --verbose ]                      prints gratuitous output to console
                                        (default: no)
  -c [ --contents ]                     read full contents (default: no)
  --test                                no network requests, just write
                                        generated URIs to console (default: no)
  -p [ --optimize ]                     Optimize number of simultaneous
                                        requests (default: no)
  -i [ --init ] arg (=1000)             Initial number of simultaneous requests
  --min arg (=1)                        Minimum number of simultaneous requests
  --max arg (=25000)                    Maximum number of simultaneous requests
  --ssize arg (=50)                     Size of velocity sliding window
  --sint arg (=1000)                    Size of sampling interval
  -h [ --help ]                         produce help message
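
The pattern syntax itself is described in the documentation, not here. Purely as an illustration, the sketch below ASSUMES that a group such as {1:5} stands for the inclusive integer range 1 through 5 and expands the first such group into one URL per value. The function expand_range is hypothetical, is not Abrade's parser, and ignores other group types such as {hhhh}, leading zeros, and telescoping.

```cpp
#include <string>
#include <vector>

// Hypothetical sketch (not Abrade's parser): expands the first "{lo:hi}"
// group of a pattern, ASSUMING the range is inclusive at both ends.
// Patterns without such a group are returned unchanged.
// Note: std::stol throws if lo or hi is not a number.
std::vector<std::string> expand_range(const std::string& pattern) {
    const auto npos = std::string::npos;
    const auto open = pattern.find('{');
    const auto close = pattern.find('}', open);
    const auto colon = pattern.find(':', open);
    if (open == npos || close == npos || colon == npos || colon > close) {
        return {pattern};
    }
    const long lo = std::stol(pattern.substr(open + 1, colon - open - 1));
    const long hi = std::stol(pattern.substr(colon + 1, close - colon - 1));
    const std::string prefix = pattern.substr(0, open);
    const std::string suffix = pattern.substr(close + 1);
    std::vector<std::string> urls;
    for (long n = lo; n <= hi; ++n) {
        urls.push_back(prefix + std::to_string(n) + suffix);
    }
    return urls;
}
```

Under that assumption, expand_range("?mynum={1:5}") produces five strings, from ?mynum=1 to ?mynum=5. The --test option listed above is the way to see what Abrade itself generates for a given pattern.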

v0.2

You can now pipe URLs to Abrade via the --stdin option:

echo "/anything/a/b/c?d=123" | abrade httpbin.org --stdin --contents --verbose

You must omit the pattern positional argument to pipe from stdin.

You can also use the --screen option to detect error landing pages that still return a 200-level response. Responses whose contents contain the screen string are screened out and are not written to disk during a --contents scrape.
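
The screening rule can be sketched as a small predicate. This is an illustration based only on the option's description; the name is_screened and its signature are hypothetical and do not come from Abrade's source.

```cpp
#include <string>

// Hypothetical helper (not Abrade's code): returns true when a response
// should be screened out, i.e. it has a 200-level status and its contents
// contain the screen string. An empty screen string screens nothing.
bool is_screened(int status, const std::string& contents,
                 const std::string& screen) {
    const bool is_200_level = status >= 200 && status < 300;
    if (!is_200_level || screen.empty()) {
        return false;
    }
    return contents.find(screen) != std::string::npos;
}
```

For example, a site that answers every missing page with a 200 status and a body containing "Page not found" could be scraped with that phrase as the screen, so only real pages are kept.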

Linux ELF

Windows EXE

Docker Image

docker pull jlospinoso/abrade:v0.2.0

or

docker pull quay.io/jlospinoso/abrade:v0.2.0

v0.1

Linux ELF

Windows EXE

Docker Image

docker pull jlospinoso/abrade:v0.1.0

or

docker pull quay.io/jlospinoso/abrade:v0.1.0

Building Abrade

  1. Abrade uses cmake, so you'll need to install it.
  2. Clone abrade.
  3. Navigate to the checked out directory.
  4. Make a build subdirectory.
  5. Navigate to the build directory.
  6. Invoke cmake.
  7. Use make (*nix) or Visual Studio (Windows) to build the project.

For example, on *nix:

git clone git@github.com:JLospinoso/abrade.git
cd abrade
mkdir build
cd build
cmake ..
make

On Windows, you'll need to open the abrade.sln file and build.

Open Source Agenda is not affiliated with "Abrade" Project. README Source: JLospinoso/abrade