A fast Web API scraper written in C++ and built on Boost ASIO
Abrade is a coroutine-based web scraper suitable for querying the existence (a HEAD request) or the contents (a GET request) of a web resource with a sequential, numerical pattern.
Check out the blog post at http://lospi.net for usage and examples.
> abrade -h
Usage: abrade host pattern:
  --host arg                 host name (eg example.com)
  --pattern arg (=/)         format of URL (eg ?mynum={1:5}&myhex=0x{hhhh}). See documentation for formatting of patterns.
  --agent arg (=Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0)
                             User-agent string (default: Firefox 47)
  --out arg                  output path. dir if contents enabled. (default: HOSTNAME)
  --err arg                  error path (file). (default: HOSTNAME-err.log)
  --proxy arg                SOCKS5 proxy address:port. (default: none)
  --screen arg               omits 200-level response if contents contains screen (default: none)
  -d [ --stdin ]             read from stdin (default: no)
  -t [ --tls ]               use tls/ssl (default: no)
  -s [ --sensitive ]         complain about rude TCP teardowns (default: no)
  -o [ --tor ]               use local proxy at 127.0.0.1:9050 (default: no)
  -r [ --verify ]            verify ssl (default: no)
  -l [ --leadzero ]          output leading zeros in URL (default: no)
  -e [ --telescoping ]       do not telescope the pattern (default: no)
  -f [ --found ]             print when resource found (default: no). implied by verbose
  -v [ --verbose ]           prints gratuitous output to console (default: no)
  -c [ --contents ]          read full contents (default: no)
  --test                     no network requests, just write generated URIs to console (default: no)
  -p [ --optimize ]          Optimize number of simultaneous requests (default: no)
  -i [ --init ] arg (=1000)  Initial number of simultaneous requests
  --min arg (=1)             Minimum number of simultaneous requests
  --max arg (=25000)         Maximum number of simultaneous requests
  --ssize arg (=50)          Size of velocity sliding window
  --sint arg (=1000)         Size of sampling interval
  -h [ --help ]              produce help message
You can now pipe URLs to Abrade via the --stdin option:
echo /anything/a/b/c?d=123 | abrade httpbin.org --stdin --contents --verbose
You must omit the pattern positional argument to pipe from stdin.
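As a rough sketch of how stdin mode could be fed from a script, the snippet below generates a small batch of paths and pipes them to Abrade. The `gen_paths` helper, the `/anything/` prefix, and the 1 to 5 range are illustrative assumptions and not part of Abrade. It also assumes Abrade reads one path per line from stdin. Only `--stdin`, `--contents`, and `--verbose` are taken from the help output.

```shell
#!/bin/sh
# Hypothetical helper (not part of Abrade): print one path per line
# for every integer from $1 to $2, each prefixed with $3.
gen_paths() {
  first="$1"; last="$2"; prefix="$3"
  i="$first"
  while [ "$i" -le "$last" ]; do
    printf '%s%s\n' "$prefix" "$i"
    i=$((i + 1))
  done
}

# Pipe the generated paths to Abrade only if it is installed.
# The pattern argument is omitted because the paths come from stdin.
# Assumption: Abrade accepts one path per line on stdin.
if command -v abrade >/dev/null 2>&1; then
  gen_paths 1 5 /anything/ | abrade httpbin.org --stdin --contents --verbose
fi
```

Generating the list outside Abrade is useful when the URLs do not follow a sequential numeric pattern, for example when they come from a file or another tool.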
You can also use the --screen option to detect error landing pages that still return 200 responses. Such responses are screened out and are not written to disk during a --contents scrape.
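To make the screening rule concrete, here is a minimal sketch of the check as the help text describes it: a response is dropped when its status is 200-level and its body contains the screen string. The `is_screened` function and its arguments are hypothetical and only illustrate the rule. Abrade's actual implementation may differ.

```shell
#!/bin/sh
# Hypothetical illustration of the --screen rule (not Abrade's code).
# Usage: is_screened STATUS BODY SCREEN
# Succeeds (exit 0) when the response would be screened out.
is_screened() {
  status="$1"; body="$2"; screen="$3"
  # With no screen string configured, nothing is screened.
  [ -n "$screen" ] || return 1
  case "$status" in
    2??) ;;            # only 200-level responses are candidates
    *) return 1 ;;
  esac
  # Fixed-string match, so the screen text is not treated as a regex.
  printf '%s' "$body" | grep -qF -- "$screen"
}

# Example: a "soft 404" page served with a 200 status.
if is_screened 200 '<h1>Page not found</h1>' 'Page not found'; then
  echo "screened: soft 404"
fi
```

A good screen string is a phrase that appears on the site's error landing page but not on real resources.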
docker pull jlospinoso/abrade:v0.2.0
or
docker pull quay.io/jlospinoso/abrade:v0.2.0
docker pull jlospinoso/abrade:v0.1.0
or
docker pull quay.io/jlospinoso/abrade:v0.1.0
Run CMake from a build subdirectory, then use make (*nix) or Visual Studio (Windows) to build the project. For example, on *nix:
git clone git@github.com:JLospinoso/abrade.git
cd abrade
mkdir build
cd build
cmake ..
make
On Windows, you'll need to open the abrade.sln file and build.