Datatogether Warc

Golang WARC (Web ARChive) Library


warc


warc is an implementation of ISO 28500 (WARC 1.0), the Web ARChive specification. It provides readers, writers, and structs for working with WARC records.

From the spec:

The WARC (Web ARChive) file format offers a convention for concatenating multiple resource records (data objects), each consisting of a set of simple text headers and an arbitrary data block into one long file. The WARC format is an extension of the ARC File Format [ARC] that has traditionally been used to store "web crawls" as sequences of content blocks harvested from the World Wide Web. Each capture in an ARC file is preceded by a one-line header that very briefly describes the harvested content and its length. This is directly followed by the retrieval protocol response messages and content. The original ARC format file is used by the Internet Archive (IA) since 1996 for managing billions of objects, and by several national libraries.

Affero General Public License v3

Getting Involved

We would love involvement from more people! If you notice any errors or would like to submit changes, please see our Contributing Guidelines.

We use GitHub issues for tracking bugs and feature requests, and Pull Requests (PRs) for submitting changes.

Usage

import "github.com/datatogether/warc"
