Introducing Attic, a self-hostable Nix Binary Cache server

And now we have chunking! In this new model, NARs are backed by a sequence of content-addressed chunks in the Global Chunk Store. Newly-uploaded NARs will be split into chunks with FastCDC, and only new chunks will be uploaded to the storage backend. NARs that existed before chunking was introduced will be converted to have a single chunk.
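To make the deduplication model concrete, here is a minimal sketch of how uploading a NAR against a content-addressed chunk store might work. It assumes the chunks have already been produced (for example by FastCDC), uses SHA-256 hex digests as chunk addresses, and models the Global Chunk Store plus storage backend as a plain dict. upload_nar and the dict-based store are hypothetical illustrations, not Attic's actual code.

```python
import hashlib


def upload_nar(chunks: list[bytes], store: dict[str, bytes]) -> tuple[list[str], int]:
    """Record a NAR as an ordered list of chunk hashes, uploading only unseen chunks.

    `store` is a hypothetical stand-in for the Global Chunk Store and the
    storage backend: it maps a chunk's SHA-256 hex digest to the chunk bytes.
    """
    manifest = []  # ordered chunk hashes that make up this NAR
    uploaded = 0   # how many chunks were actually new
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            store[digest] = chunk  # only previously unseen chunks reach the backend
            uploaded += 1
        manifest.append(digest)
    return manifest, uploaded
```

In this model, a NAR uploaded before chunking existed corresponds to calling upload_nar with a one-element list containing the whole NAR.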

When upgrading to the new version, you need to add the following to the configuration:

```toml
# Data chunking
#
# Warning: If you change any of the values here, it will be
# difficult to reuse existing chunks for newly-uploaded NARs
# since the cutpoints will be different. As a result, the
# deduplication ratio will suffer for a while after the change.
[chunking]
# The minimum NAR size to trigger chunking
#
# If 0, chunking is disabled entirely for newly-uploaded NARs.
# If 1, all newly-uploaded NARs are chunked.
nar-size-threshold = 131072 # chunk files that are 128 KiB or larger

# The preferred minimum size of a chunk, in bytes
min-size = 65536            # 64 KiB

# The preferred average size of a chunk, in bytes
avg-size = 131072           # 128 KiB

# The preferred maximum size of a chunk, in bytes
max-size = 262144           # 256 KiB
```
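For intuition about what these four settings control, here is a simplified content-defined chunker in the spirit of FastCDC: a gear rolling hash, no cutpoints within the first min-size bytes, a stricter mask before avg-size and a looser one after it (normalized chunking), and a forced cut at max-size. The function name chunk_nar, the seeded gear table, and testing the top hash bits are assumptions made for illustration. This is not Attic's implementation, and its cutpoints will not match Attic's. It also assumes avg_size is a power of two.

```python
import random

# Gear table: 256 pseudo-random 64-bit values, seeded so that cutpoints are
# reproducible across runs. (Real FastCDC implementations ship a fixed table.)
_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]
MASK64 = (1 << 64) - 1


def _top_bits_mask(n: int) -> int:
    # Mask selecting the n most significant bits of a 64-bit hash.
    return ((1 << n) - 1) << (64 - n)


def chunk_nar(data: bytes, threshold: int = 131072, min_size: int = 65536,
              avg_size: int = 131072, max_size: int = 262144) -> list[bytes]:
    """Split data into content-defined chunks (simplified FastCDC-style sketch).

    Assumes avg_size is a power of two and min_size < avg_size <= max_size.
    """
    # nar-size-threshold semantics: 0 disables chunking; otherwise only NARs
    # at least `threshold` bytes long are chunked.
    if threshold == 0 or len(data) < threshold:
        return [data]
    bits = avg_size.bit_length() - 1         # log2(avg_size)
    mask_s = _top_bits_mask(bits + 1)        # stricter: fewer cuts before avg_size
    mask_l = _top_bits_mask(bits - 1)        # looser: more cuts after avg_size
    chunks, start = [], 0
    while start < len(data):
        if len(data) - start <= min_size:    # tail too short to split further
            chunks.append(data[start:])
            break
        end = min(start + max_size, len(data))  # hard cut at max_size
        normal = min(start + avg_size, end)     # where the masks switch
        fp, cut = 0, end
        # Never cut inside the first min_size bytes of a chunk.
        for i in range(start + min_size, end):
            fp = ((fp << 1) + GEAR[data[i]]) & MASK64
            mask = mask_s if i < normal else mask_l
            if (fp & mask) == 0:             # content-defined cutpoint
                cut = i + 1
                break
        chunks.append(data[start:cut])
        start = cut
    return chunks
```

Because each cutpoint depends only on nearby content, inserting a few bytes near the start of a NAR typically changes only the first chunk or so, while later chunks keep the same bytes and therefore the same addresses. It also shows why the warning in the config applies: changing min-size, avg-size or max-size moves the cutpoints, so new chunks stop lining up with old ones.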

During a download, atticd reassembles the entire NAR from constituent chunks by streaming from the storage backend. This means that traffic is proxied through the machine running atticd.
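The reassembly path can be pictured as a generator that walks the NAR's ordered chunk list and forwards each chunk's bytes as they arrive. fetch_chunk and the in-memory backend below are hypothetical stand-ins for a storage-backend read; this is a sketch of the data flow, not atticd's code.

```python
from typing import Callable, Iterable, Iterator


def stream_nar(manifest: Iterable[str],
               fetch_chunk: Callable[[str], Iterable[bytes]]) -> Iterator[bytes]:
    """Yield a NAR's bytes in order by streaming each chunk from the backend."""
    for digest in manifest:
        # Every byte passes through this process on its way to the client,
        # which is why downloads of chunked NARs are proxied by the server.
        # With a lazy fetch_chunk, the whole NAR is never buffered in memory.
        yield from fetch_chunk(digest)


# Usage with a hypothetical in-memory stand-in for the storage backend.
backend = {"h1": b"hello ", "h2": b"world"}


def fetch_chunk(digest: str) -> Iterator[bytes]:
    data = backend[digest]
    for i in range(0, len(data), 4):  # emit the chunk in 4-byte blocks
        yield data[i:i + 4]


nar = b"".join(stream_nar(["h1", "h2"], fetch_chunk))  # b"hello world"
```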

If you don’t want chunking and would still like downloads to always stream directly from S3, Attic has you covered as well. You can set nar-size-threshold to 0 to disable chunking entirely. With this configuration, all new NARs will be uploaded as a single chunk, and atticd will directly return presigned S3 URLs for NARs that have only one chunk.
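The resulting download decision can be sketched as follows. Redirect, Proxy, serve_nar and the presign callback are hypothetical names; presign stands in for whatever produces a presigned S3 URL for a stored chunk.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class Redirect:
    url: str             # client fetches the NAR straight from S3


@dataclass
class Proxy:
    manifest: List[str]  # the server streams and reassembles these chunks


def serve_nar(manifest: List[str],
              presign: Callable[[str], str]) -> Union[Redirect, Proxy]:
    """Choose how to serve a NAR given its ordered list of chunk hashes."""
    if len(manifest) == 1:
        # The single chunk's stored object holds the entire NAR, so a
        # presigned URL for it can be handed to the client directly.
        return Redirect(presign(manifest[0]))
    # Multiple chunks must be stitched together, so the bytes are proxied.
    return Proxy(manifest)
```

Returning a redirect keeps NAR bytes off the atticd host entirely, which is the point of disabling chunking in this setup.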

Some new entries to the FAQ:

Why chunk NARs instead of individual files?

In the current design, chunking is applied to the entire uncompressed NAR file instead of to the individual files it contains. The big NARs that benefit the most from chunk-based deduplication (e.g., VSCode, Zoom) often contain hundreds or thousands of small files. During NAR reassembly, fetching thousands of files to reconstruct a single NAR is often uneconomical or impractical to do at scale. By chunking the NAR as a whole, the average chunk size can be set to a larger value, ignoring file boundaries and lumping small files together. This is also the approach casync has taken.
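As a rough illustration with made-up numbers (a hypothetical NAR holding 3,000 files of 4 KiB each, ignoring NAR framing overhead and the min/max chunk bounds), compare the number of backend fetches needed to reassemble it per file versus with whole-NAR chunks at the example avg-size of 128 KiB:

$$
\begin{aligned}
\text{NAR size} &\approx 3000 \times 4096\,\text{B} = 12{,}288{,}000\,\text{B} \approx 11.7\,\text{MiB} \\
\text{per-file fetches} &= 3000 \\
\text{whole-NAR chunk fetches} &\approx \frac{12{,}288{,}000\,\text{B}}{131{,}072\,\text{B}} = 93.75 \approx 94
\end{aligned}
$$

In this example that is about 32 times fewer requests per download.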

You may have heard that the Tvix store protocol chunks individual files instead of the NAR. The design of Attic is driven by the desire to effectively utilize existing platforms with practical limitations [0], while looking forward to the future.

[0] In more concrete terms, I want to use Cloudflare Workers for the sweet, sweet free egress :smiley:

What happens if a chunk is corrupt/missing?

When a chunk is deleted from the database, all dependent .narinfo and .nar files will become unavailable (HTTP 503). However, this is recovered from automatically as soon as any NAR containing that chunk is uploaded.
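This failure and recovery behaviour follows from content addressing. In this minimal sketch, again with a dict standing in for the chunk store and SHA-256 digests as addresses (both assumptions), a NAR is servable only when all of its chunks are present, and uploading any NAR that contains a missing chunk restores it for every NAR that shares it. nar_available and reupload are hypothetical helpers.

```python
import hashlib


def nar_available(manifest: list[str], store: dict[str, bytes]) -> bool:
    # A NAR (and its .narinfo) can be served only if every chunk is present;
    # a single missing chunk makes the whole NAR unavailable.
    return all(digest in store for digest in manifest)


def reupload(chunks: list[bytes], store: dict[str, bytes]) -> None:
    # Uploading any NAR that contains the missing chunk puts it back in the
    # store, which makes every NAR that depends on it servable again.
    for chunk in chunks:
        store.setdefault(hashlib.sha256(chunk).hexdigest(), chunk)
```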

At the moment, Attic cannot automatically detect when a chunk is corrupt or missing, since it’s difficult to correctly distinguish between transient and persistent failures. The atticadm utility will have the functionality to kill/delete bad chunks.
