containerd meets ipfs to distribute content
Containerd meets IPFS. Peer-to-peer distribution of content blobs.
Converting a manifest from DockerHub to p2p manifest:
# Term 1: Start a IPFS daemon
$ make ipfs
# Term 2: Start a rootless containerd backed by ipcs.
$ make containerd
# Term 3: Convert alpine to a p2p manifest
$ make convert
2019/06/04 13:54:40 Resolved "docker.io/library/alpine:latest" as "docker.io/library/alpine:latest@sha256:769fddc7cc2f0a1c35abb2f91432e8beecf83916c421420e6a6da9f8975464b6"
2019/06/04 13:54:40 Original Manifest [456] sha256:769fddc7cc2f0a1c35abb2f91432e8beecf83916c421420e6a6da9f8975464b6:
// ...
2019/06/04 13:54:41 Converted Manifest [456] sha256:9181f3c247af3cea545adb1b769639ddb391595cce22089824702fa22a7e8cbb:
// ...
2019/06/04 13:54:41 Successfully pulled image "localhost:5000/library/alpine:p2p"
Converting two manifests from DockerHub to p2p manifests, and then comparing the number of shared IPLD nodes (layers chunked into 262KiB blocks):
# Term 1: Start a IPFS daemon
$ make ipfs
# Term 2: Start a rootless containerd backed by ipcs.
$ make containerd
# Term 3: Convert ubuntu:bionic and ubuntu:xenial into p2p manifests, then bucket IPLD nodes into nodes unique to each image, and nodes inside intersect.
$ make compare
// ...
2019/06/04 13:51:33 Comparing manifest blocks for "docker.io/library/ubuntu:xenial" ("sha256:8d382cbbe5aea68d0ed47e18a81d9711ab884bcb6e54de680dc82aaa1b6577b8")
2019/06/04 13:51:34 Comparing manifest blocks for "docker.io/titusoss/ubuntu:latest" ("sha256:cfdf8c2f3d5a16dc4c4bbac4c01ee5050298db30cea31088f052798d02114958")
2019/06/04 13:51:34 Found 322 blocks
docker.io/library/ubuntu:xenial: 4503
docker.io/library/ubuntu:xenial n docker.io/titusoss/ubuntu:latest: 87550251
docker.io/titusoss/ubuntu:latest: 76117824
// 87550251 shared bytes in IPLD nodes
IPFS backed container image distribution is not new. Here is a non-exhaustive list of in-the-wild implementations:
P2P container image distribution is also implemented with different P2P networks:
The previous IPFS implementations all utilize the Docker Registry HTTP API V2 to distribute. However, the connection between containerd pulling the image and registry is not peer-to-peer, and if the registry was ran as a sidecar the content would be duplicated twice in the local system. Instead, I chose to implement it as a containerd content plugin for the following reasons:
content.Store
, no duplication.content.Store
interface will allow using ipcs
to also back the buildkit cache.IPFS imposes a 4 MiB limit for blocks because it may be run in a public network with adversarial peers. Since its not able to verify hashes until all the content has arrived, an attacker can send gibberish flooding connections and consuming bandwidth. Chunking data into smaller blocks also aids in deduplication:
IPCS implements containerd's content.Store
interface and can be built as a golang plugin to override containerd's default local store. A converter implementation is also provided that converts a regular OCI image manifest to a manifest where every descriptor is replaced with the descriptor of the root DAG node added to IPFS. The root node is the merkle root of the 262KiB chunks of the layer.
Although the IPFS daemon or its network may already have the bytes for all image's P2P content, containerd has a boltdb metadata store wrapping the underlying content.Store
.
A image pull, starting from the client side goes through the following layers:
proxy.NewContentStore
(content.Store)So in the case of this project ipcs
, a pull is simply flushing through its content.Store
layers to register the image in containerd's metadata stores. Note that the majority of the blocks don't need to be downloaded into IPFS's local storage in order to complete a pull, and can be delayed until unpacking the layers into snapshots.
Collected data on: 7/11/2019
Systems:
Configuration:
yamux
to mplex
: export LIBP2P_MUX_PREFS="/mplex/6.7.0"
sync
to false
StrategicProviding
Comparison:
Image | Total size (bytes) | IPFS blocks | DockerHub pull (secs) | IPFS pull (secs) | Diff (Hub/IPFS) |
---|---|---|---|---|---|
docker.io/library/alpine:latest | 2759178 | 14 | 1.744165732 | 0.7662775298 | 227.62% |
docker.io/ipfs/go-ipfs:latest | 23545678 | 103 | 1.791054265 | 1.633165299 | 109.67% |
docker.io/library/ubuntu:latest | 28861894 | 38 | 2.720580011 | 1.629809674 | 116.93% |
docker.io/library/golang:latest | 296160075 | 380 | 4.687380759 | 6.015498289 | 77.92% |
IPFS's performance seems to slow down as the number of nodes (size of total image) goes up. There was a recent regression in go-ipfs
v0.4.21 that was fixed in this commit on master
:
As seen from make compare
, there also doesn't seem to be any improvements in deduplication between IPFS chunks as opposed to OCI layers:
$ GO111MODULE=on IPFS_PATH=./tmp/ipfs go run ./cmd/compare docker.io/library/alpine:latest docker.io/library/ubuntu:latest docker.io/library/golang:latest docker.io/ipfs/go-ipfs:latest
// ...
2019/06/04 13:39:55 Found 1381 blocks
docker.io/ipfs/go-ipfs:latest: 46891351
docker.io/library/alpine:latest: 5516903
docker.io/library/golang:latest: 828096081
docker.io/library/ubuntu:latest: 57723854
// Zero block intersection, they are very different images though.
Serious usage of p2p container image distribution should consider Dragonfly and Kraken, because IPFS suffers from performance issues:
Related benchmarking:
Explore deduplication by adding each layer's uncompressed, untared files into IPFS to get chunked-file-granular deduplication. IPFS's Unixfs (UNIX/POSIX fs features implemented via IPFS) needs the following:
diff.Comparer
and diff.Applier
to apply custom IPFS layer mediatype to containerd's tmpmount.Explore IPFS-FUSE mounted layers for lazy container rootfs:
Explore IPFS tuning to improve performance
badgerds
is hard to shard for benchmarking purposes, because GC doesn't remove data on disk: https://github.com/ipfs/go-ds-badger/issues/54