Gargantua Versions

The fast website crawler

v0.5.0-alpha

3 years ago
  • Ignore invalid SSL certificates
  • Log response headers

v0.4.1-alpha

3 years ago

This release fixes a bug I introduced in the previous release, which caused gargantua to visit the same URL more than once.

v0.4.0-alpha

3 years ago

You can specify a log file with the --log argument:

gargantua crawl --url https://www.sitemaps.org/sitemap.xml --workers 5 --log "gargantua.log"

Example log output:

Date and time       #worker   Status Code     Bytes   Response Time   URL                                                          Parent URL
2020/11/05 09:23:14 #001:     200             4403    148.759000ms    https://www.sitemaps.org                                     https://www.sitemaps.org/ko/faq.html
2020/11/05 09:23:14 #002:     200             4403    290.536000ms    http://www.sitemaps.org/                                     https://www.sitemaps.org/ko/faq.html
2020/11/05 09:23:14 #003:     200            45077    283.243000ms    https://www.sitemaps.org/protocol.html                       https://www.sitemaps.org/ko/faq.html
2020/11/05 09:23:14 #004:     404             1245    155.376000ms    https://www.sitemaps.org/protocol.htm                        https://www.sitemaps.org/ko/faq.html
2020/11/05 09:23:14 #005:     200             4403    155.577000ms    https://www.sitemaps.org/index.html                          https://www.sitemaps.org/ko/faq.html
2020/11/05 09:23:14 #001:     200             2591    286.451000ms    http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd    https://www.sitemaps.org/ko/faq.html
2020/11/05 09:23:14 #003:     200            10839    143.738000ms    https://www.sitemaps.org/terms.html                          https://www.sitemaps.org/ko/faq.html
2020/11/05 09:23:14 #005:     200            15681    141.580000ms    https://www.sitemaps.org/faq.html                            https://www.sitemaps.org/ko/protocol.html
2020/11/05 09:23:14 #002:     404             1245    286.175000ms    http://www.sitemaps.org/protocol.htm                         https://www.sitemaps.org/ko/faq.html

v0.3.0-alpha

4 years ago

You can now set a custom user agent for the crawler with the --user-agent argument:

gargantua crawl \
    --url https://www.sitemaps.org/sitemap.xml \
    --workers 5 \
    --user-agent "gargantua bot / iPhone"

v0.2.0-alpha

7 years ago

I fixed the bug that prevented the UI from exiting after the crawler was done, and I made a quick YouTube video showing how gargantua works:

Video: gargantua in action, crawling a website

Blog post: https://andykdocs.de/!gargantua-prototype

v0.1.0-alpha

7 years ago

「 gargantua 」crawls websites from your command line and displays the results and statistics live via a text-based UI:

gargantua crawl --url https://www.sitemaps.org/sitemap.xml --workers 5

Screenshot of gargantua v0.1.0-alpha crawling sitemaps.org