pdfHTML is an iText add-on for C# (.NET) that allows you to easily conve...
Browsertrix is the hosted, high-fidelity, browser-based crawling service...
📜 A CLI toolkit for extracting and working with your digital history
An open-source archive that gathers, saves, shares and analyzes news hom...
Library of Alexandria (LoA for short) is a project that aims to collect a...
Archive all your favorite podcasts
Bash Utility for Creating Stage 4 Tarballs
Scripts and other things for working with DEVONthink, a personal informa...
Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC co...
The news homepage archive
Archive and unarchive databases as flat text files
A command-line tool to archive a git repository from GitHub to the Inter...
A Jupyter/JupyterLab extension to make, download and extract archive files.
A dockerized, queued, high-fidelity web archiver based on Squidwarc
Tools for tracking stories on news homepages