Best 27 WARC Open Source Projects

🗃 Open source self-hosted web archiving. Takes URLs/browser history/book...

Heritrix is the Internet Archive's open-source, extensible, web-scale, a...

Collect and revisit web pages.

The archivist's web crawler: WARC output, dashboard for all crawls, dyna...

InterPlanetary Wayback: A distributed and persistent archive replay syst...

Run a high-fidelity browser-based crawler in a single Docker container

Webrecorder Player for Desktop (OSX/Windows/Linux). (Built with Electron...

WarcDB: Web crawl data as SQLite databases.

Streaming WARC/ARC library for fast web archive IO

🐋 Web Archiving Integration Layer: One-Click User Instigated Pres...

Bitextor generates translation memories from multilingual websites

Chrome extension to "Create WARC files from any webpage"

CoCrawler is a versatile web crawler built using modern tools and concur...

A toolkit for CDX indices such as Common Crawl and the Internet Archive'...

An Apache Spark framework for easy data processing, extraction as well a...