Introducing Attic, a self-hostable Nix Binary Cache server

Hmm, we probably want to tweak the retry configs in the AWS SDK. But even then, it should automatically clean up stray multipart uploads upon failure.

2 Likes

Question on the design. Does the attic client know to try downloading from the storage service directly if possible or is it proxied via the server?

E.g. am I downloading directly from B2/R2/S3?

1 Like

Downloads aren’t proxied and the Attic server just returns a 307 redirect to a presigned URL on S3. Uploads are streamed through the server because compression is handled server-side.

2 Likes

Awesome! For my use case, this is much better than previous implementations.

I see that you’re working on compression, kudos for that too!

4 Likes

Hi all, it’s been a week since the public release and I really appreciate the responses so far :slight_smile: Meanwhile, there are a few new things:

  • Attic can now be built statically (nix build .#attic-static), with a fix submitted upstream to Nix. This should make installation in CI environments much easier.
  • There is now a NixOS module for atticd that you can use via nixos/atticd.nix, or as nixosModules.atticd via flakes.
  • attic login without a token no longer overwrites existing tokens.
  • Fixed a bug where ~/.config/attic/config.toml was created with default permissions (should have been 600).

9 Likes

How do you generate the JWT for ATTIC_SERVER_TOKEN_HS256_SECRET_BASE64?

It’s not a JWT, just some random data to serve as the secret; openssl rand 64 | base64 -w0 should work. I should write more documentation on this.
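For what it’s worth, if your base64 doesn’t support -w0 (e.g., on macOS), a Python equivalent of that openssl pipeline works too. This is just an alternative sketch, not an officially documented method:

```python
# Generate a 64-byte random secret and base64-encode it on one line,
# equivalent to `openssl rand 64 | base64 -w0`.
import base64
import secrets

secret = base64.b64encode(secrets.token_bytes(64)).decode()
print(secret)
```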

2 Likes

I got atticd running with the nix module, but I didn’t get a root token. Any way to get one?

jan 11 00:47:39 nixos atticd[153175]: Attic Server 0.1.0 (release)
jan 11 00:47:39 nixos atticd[153175]: Running migrations...
jan 11 00:47:39 nixos atticd[153175]: Starting API server...
jan 11 00:47:39 nixos atticd[153175]: Listening on [::]:8484...
1 Like

The OOBE sequence isn’t triggered if you already have a config file. You can generate tokens with atticadm -f path/to/server.toml make-token (with the ATTIC_SERVER_TOKEN_HS256_SECRET_BASE64 environment variable set).

2 Likes

@zhaofengli Looks cool. Are you planning to support watch mode (aka cachix watch-store)?

1 Like

Yes, that’s definitely on the todo list! There is some refactoring I need to take care of before I get to it, though.

3 Likes

Downloads aren’t proxied and the Attic server just returns a 307 redirect to a presigned URL on S3. Uploads are streamed through the server because compression is handled server-side.

Actually this may be changing with the addition of chunking, which requires the NAR to be assembled on the server. Additional caching (e.g., a CDN) may be implemented outside atticd, and the assembly can even be done on FaaS platforms like Cloudflare Workers. From my experiments, chunking can improve the efficiency of storing large unfree paths (e.g., vscode, zoom-us) by a huge amount.

As an example, I follow nixos-unstable, and zoom-us (~500 MiB uncompressed, ~160 MiB compressed) frequently gets rebuilt while remaining at the same version. For the 5 store paths of zoom-5.12.9.367 picked from my store, chunked with FastCDC at an average chunk size of 5 MB (coarse-grained to help reassembly from S3), the common chunks weigh ~470 MiB in total (~160 MiB compressed individually). In this scenario, the deduplication ratio for 5 paths is 0.25x (~210 MiB vs ~800 MiB).

I’m inclined to make chunking the only supported flow to simplify maintenance. Existing NAR files will be converted into a single chunk. Any thoughts?

3 Likes

When doing chunking, please also make sure that there is a way to recover from broken chunks easily. This will reduce the dedup factor, but will save the cache from becoming unusable because of a single flipped bit.

2 Likes

And now we have chunking! In this new model, NARs are backed by a sequence of content-addressed chunks in the Global Chunk Store. Newly-uploaded NARs will be split into chunks with FastCDC and only new chunks will be uploaded to the storage backend. NARs that have existed prior to chunking will be converted to have a single chunk.

When upgrading to the new version, you need to add the following to the configuration:

# Data chunking
#
# Warning: If you change any of the values here, it will be
# difficult to reuse existing chunks for newly-uploaded NARs
# since the cutpoints will be different. As a result, the
# deduplication ratio will suffer for a while after the change.
[chunking]
# The minimum NAR size to trigger chunking
#
# If 0, chunking is disabled entirely for newly-uploaded NARs.
# If 1, all newly-uploaded NARs are chunked.
nar-size-threshold = 131072 # chunk files that are 128 KiB or larger

# The preferred minimum size of a chunk, in bytes
min-size = 65536            # 64 KiB

# The preferred average size of a chunk, in bytes
avg-size = 131072           # 128 KiB

# The preferred maximum size of a chunk, in bytes
max-size = 262144           # 256 KiB

During a download, atticd reassembles the entire NAR from constituent chunks by streaming from the storage backend. This means that traffic is proxied through the machine running atticd.

If you don’t want chunking and would still like downloads to always stream directly from S3, Attic has you covered as well. Set nar-size-threshold to 0 to disable chunking entirely. With this configuration, all new NARs will be uploaded as one chunk, and atticd will directly return presigned S3 URLs for NARs that only have a single chunk.
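The download dispatch described above can be pictured roughly like this. This is a hypothetical sketch, not Attic’s actual code; the Store class and its methods are invented for illustration:

```python
# Toy model of the download path: single-chunk NARs get a 307 redirect to
# a presigned URL, multi-chunk NARs are reassembled by streaming the
# chunks in order through the server.

class Store:
    """In-memory stand-in for an S3-like backend (hypothetical)."""
    def __init__(self, blobs):
        self.blobs = blobs

    def presign(self, key):
        # A real backend would return a time-limited signed URL here.
        return f"https://bucket.example/{key}?X-Amz-Signature=..."

    def get(self, key):
        return self.blobs[key]

def serve_nar(chunk_keys, store):
    if len(chunk_keys) == 1:
        # One chunk is the whole NAR: hand the client a direct S3 URL.
        return 307, store.presign(chunk_keys[0])
    # Otherwise the server streams the concatenation of all chunks.
    return 200, b"".join(store.get(k) for k in chunk_keys)

store = Store({"c1": b"hello ", "c2": b"world"})
print(serve_nar(["c1", "c2"], store))  # reassembled NAR, proxied
print(serve_nar(["c1"], store))        # redirect to a presigned URL
```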

Some new entries to the FAQ:

Why chunk NARs instead of individual files?

In the current design, chunking is applied to the entire uncompressed NAR file instead of individual constituent files in the NAR. Big NARs that benefit the most from chunk-based deduplication (e.g., VSCode, Zoom) often have hundreds or thousands of small files. During NAR reassembly, it’s often uneconomical or impractical to fetch thousands of files to reconstruct the NAR in a scalable way. By chunking the entire NAR, it’s possible to configure the average chunk size to a larger value, ignoring file boundaries and lumping small files together. This is also the approach casync has taken.

You may have heard that the Tvix store protocol chunks individual files instead of the NAR. The design of Attic is driven by the desire to effectively utilize existing platforms with practical limitations [0], while looking forward to the future.

[0] In more concrete terms, I want to use Cloudflare Workers for the sweet, sweet free egress :smiley:

What happens if a chunk is corrupt/missing?

When a chunk is deleted from the database, all dependent .narinfo and .nar files will become unavailable (HTTP 503). However, this can be recovered from automatically when any NAR containing the chunk is uploaded.

At the moment, Attic cannot automatically detect when a chunk is corrupt or missing, since it’s difficult to correctly distinguish between transient and persistent failures. The atticadm utility will have the functionality to kill/delete bad chunks.
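The self-healing behaviour might be sketched like this (purely illustrative; the names and data structures are invented and not Attic’s code):

```python
# During upload, a chunk is skipped only if it's already tracked AND its
# data is present. A chunk that was lost therefore gets re-uploaded the
# next time any NAR containing it is pushed, restoring dependent NARs.

def ingest_chunk(chunk_hash, data, index, blobs):
    if chunk_hash in index and chunk_hash in blobs:
        return "dedup"            # chunk is healthy, nothing to do
    blobs[chunk_hash] = data      # (re)write the chunk body
    index.add(chunk_hash)         # dependent NARs become available again
    return "stored"

index, blobs = {"abc"}, {}        # "abc" is tracked, but its blob is lost
print(ingest_chunk("abc", b"data", index, blobs))  # the chunk is restored
```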

16 Likes

This only improves storage requirements on the server side but does not speed up downloading for the client, right?

1 Like

Currently it does not, but the attic client could serve as a local server that performs NAR reassembly client-side. We can also extend the Binary Cache API to let Nix itself support chunked downloads, but that would be a much longer process.

3 Likes

But that requires the attic client to cache the chunks, increasing storage demand. Or at least keep a mapping from hashes to store paths to reassemble NARs on demand. Or just be optimistic and choose a store path with a similar name (after the hash) to get a reference NAR…

I think the Tvix protocol might suit this case better, as it probably can integrate with the auto-optimise-store machinery.

I’m sure there will be a use case for each of the variants, depending on client storage, bandwidth, etc.
Maybe even a multistage solution with a Tvix/Attic bridge in the middle.

How is the chunking performed? Just by byte-count on the (uncompressed) NAR stream?

One of the benefits of single-file-based downloads is that basically all unchanged files dedup properly. Without this, a single-byte length change will shift all later stream content and prevent successful dedup.

If the chunking protocol were aware of file boundaries in the NAR stream, and chose chunk boundaries to align with those, those boundaries become a point where the offset can be corrected and successfully dedup following identical content.

It’s Content-Defined Chunking based on a rolling hash (see the FastCDC slides for a quick overview), and it solves exactly the boundary-shift problem that would make Fixed-Size Chunking useless (adding a byte to the beginning or somewhere in the middle).
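To make the resynchronization concrete, here’s a toy Gear-style content-defined chunker. It is a simplification of what FastCDC does; the parameters and code are illustrative, not Attic’s implementation:

```python
# Toy content-defined chunker using a Gear rolling hash. Cutpoints depend
# only on a sliding window of recent bytes, so inserting a byte mid-stream
# disturbs nearby chunks instead of shifting every later boundary.
import random

MIN_SIZE, MAX_SIZE = 64, 1024
MASK = 0x3F  # cut when (hash & MASK) == 0, i.e. roughly every 64 bytes

_rng = random.Random(42)
GEAR = [_rng.getrandbits(64) for _ in range(256)]  # per-byte random table

def chunk(data: bytes) -> list:
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFFFFFFFFFF
        length = i - start + 1
        if (length >= MIN_SIZE and (h & MASK) == 0) or length >= MAX_SIZE:
            chunks.append(data[start : i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks

original = random.Random(0).randbytes(8192)
edited = original[:4000] + b"X" + original[4000:]  # one inserted byte

shared = set(chunk(original)) & set(chunk(edited))
print(f"{len(shared)} of {len(chunk(original))} chunks unchanged")
```

With fixed-size chunking, every chunk after the insertion point would differ; here the chunker resynchronizes a short distance past the edit, so most chunks still dedup.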

6 Likes

A couple of new things:

  • attic watch-store is here. It tries to batch paths together so the number of expensive operations (computing closures, querying missing paths) is minimized.
  • When uploading a path, the .narinfo will now be uploaded as part of the PUT payload if it’s larger than 4 KiB. This makes Attic more usable behind reverse proxies with header size limits. The server must be updated to support this.

There may be some API breakages soon to support client-side compression (the NAR stream will still be decompressed and recompressed server-side), as well as a better way to handle API compatibility. Hopefully we can stabilize everything and cut the first release soon.

8 Likes