Celler: An Attic Fork

tl;dr: There is a fork of the Attic binary cache that aims to continue its stalled development (seven months without activity).

Why Fork Attic?

Here at Cyberus Technology, we tried to set up a large Attic instance for CTRL-OS last year. We eventually switched back to a simpler solution. While much of the blame goes to the underlying blob storage provider (cough Hetzner cough), we noticed that some crucial features are lacking for a production setup. Most notably for our case, there is no good way to troubleshoot an Attic instance beyond instrumenting the code. Or to put it in more economic terms: while Attic saves us roughly 60% on storage costs, it also massively increases our maintenance costs. Not a win!

So while production is happily chugging along on a plain old S3 bucket for now, I would really love to switch back to Attic, both for the storage savings and because it offers a nice way to have per-developer or per-project caches on our own infrastructure. But upstream Attic development has stalled. So instead of letting our changes sit in unmerged PRs, there is now celler. (“Celler” is Catalan for wine cellar, but pronounce it as you please. :wine_glass:)

Our Plans

So what are the plans for Celler? Initially, I have been working on updating all dependencies. This was harder than planned, because Attic uses C++ FFI bindings to libnixstore, and these bindings would subtly break on updates. The first step was getting rid of them. Instead, we now use native Rust code to talk to the Nix daemon (via gorgon’s nix-daemon crate) where possible and shell out to the Nix CLI where necessary.
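To make that concrete, here is a minimal sketch (not Celler’s actual code) of the “shell out to the Nix CLI” fallback path; the function name and usage are made up:

```rust
use std::process::Command;

/// Query metadata for a store path by shelling out to `nix path-info --json`.
/// Illustrative only; error handling and JSON parsing are kept minimal.
fn query_path_info(store_path: &str) -> Result<String, Box<dyn std::error::Error>> {
    let output = Command::new("nix")
        .args(["--extra-experimental-features", "nix-command"])
        .args(["path-info", "--json", store_path])
        .output()?;

    if !output.status.success() {
        return Err(format!(
            "nix path-info failed: {}",
            String::from_utf8_lossy(&output.stderr)
        )
        .into());
    }

    // The JSON payload would normally be deserialized (e.g. with serde_json);
    // here it is returned as-is.
    Ok(String::from_utf8(output.stdout)?)
}
```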

With the updates largely done, we can look towards features that are crucial for production setups. My most important goals are to add proper (error) logging and metrics to the code base. Anyone operating Celler should see in their ops dashboard how things are going.
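As a rough illustration of the direction (nothing like this exists in Celler yet), structured error logging with the tracing crate could look roughly like this; the handler and its fields are made up:

```rust
use tracing::{error, info, instrument};

/// Hypothetical upload handler, only here to show structured logging.
#[instrument(skip(payload))]
fn handle_upload(cache: &str, payload: &[u8]) -> Result<(), std::io::Error> {
    info!(cache, bytes = payload.len(), "received NAR upload");
    // ... store the object, record a metric, etc. ...
    Ok(())
}

fn main() {
    // Emit JSON logs that an ops dashboard can ingest
    // (requires tracing-subscriber with the "json" feature).
    tracing_subscriber::fmt().json().init();

    if let Err(err) = handle_upload("dev-cache", b"...") {
        error!(%err, "upload failed");
    }
}
```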

A bit further in the future, I’m looking towards OIDC integration. Using sudo on the server to create user credentials is not a workflow that scales!
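To sketch where this could go (purely illustrative, not an implemented design), validating a short-lived token issued by an OIDC provider with the jsonwebtoken crate might look like this; issuer, audience, and claim names are placeholders:

```rust
use jsonwebtoken::{decode, Algorithm, DecodingKey, Validation};
use serde::Deserialize;

/// Placeholder claims; a real deployment would map these to cache permissions.
#[derive(Debug, Deserialize)]
struct Claims {
    sub: String,
    exp: usize,
}

/// Verify an RS256-signed token against the identity provider's public key.
fn verify_token(
    token: &str,
    issuer_pubkey_pem: &[u8],
) -> Result<Claims, jsonwebtoken::errors::Error> {
    let mut validation = Validation::new(Algorithm::RS256);
    // Hypothetical issuer/audience values; these would come from configuration.
    validation.set_issuer(&["https://id.example.org"]);
    validation.set_audience(&["celler"]);

    let key = DecodingKey::from_rsa_pem(issuer_pubkey_pem)?;
    decode::<Claims>(token, &key, &validation).map(|data| data.claims)
}
```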

What Are Your Ideas?

I’m genuinely curious what other people are missing from Attic. What features would you love to have? Which PR from the attic repository deserves to be merged?


Is this a drop-in replacement at this time, or has the DB schema changed?

I see this PR for loading tokens from files instead of environment variables. I would find this useful if it hasn’t been addressed yet.

I don’t think a fork of an attic is a celler. Garret would be a much better name given the project concept.

Can we have Celler expose a gRPC API for a generic CAS (content addressable storage) for other remote building / CI/CD systems? :slight_smile:

  • Long-lived JWTs in particular are a footgun, and Attic provides little remediation short of rotating signing keys. OIDC and short-lived tokens are the correct approach here. That would also allow integrating with Forgejo (Details), if I understand correctly.
  • We can inspect caches if we know their name. Why can’t we list existing caches?
  • Fix the horrible CLI naming (attic is fine, atticd-atticadm isn’t)

I very much appreciate the work going into this code base.


GC pinning (i.e. the software is released, don’t purge it without pulling the pin), and potentially looking at options to drop the database entirely. I mean, at that point it’s probably a rewrite, but I’d love to see a design that stores bulk metadata updates as a WAL in the blob store (with eventual compaction into an index), such that you can have a stateless cache server plus a blob store and you are golden. I know S3 can do this safely with conditional writes; I just don’t know whether other S3-compatible blob stores can do that kind of thing. Anyway, a good GC pinning story would be incredible, and if there were a pot I would try to throw money into it for that feature.

EDIT: I mean, maybe you could do it without conditional writes, but I think the story gets much more complicated. I’d also be prepared to run three servers instead of one with RDS (i.e. it’s way cheaper to run three t4g micros than one HA RDS, and just replicate transaction metadata into a Raft log written back to S3), but now I’ve strayed into @bme’s fantasy cache server.
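For readers who haven’t seen the conditional-write trick: S3’s `If-None-Match: *` semantics let a writer create an object only if it doesn’t exist yet, which is the primitive such a blob-store WAL would lean on. A rough sketch with the aws-sdk-s3 crate (assuming its put_object builder exposes if_none_match; bucket and key names are made up):

```rust
use aws_sdk_s3::{primitives::ByteStream, Client};

/// Sketch: write a WAL segment only if that segment key does not exist yet.
/// `If-None-Match: *` makes the PUT fail when another writer got there first,
/// so writers can safely retry with the next segment number.
async fn append_wal_segment(
    client: &Client,
    segment: u64,
    payload: Vec<u8>,
) -> Result<bool, aws_sdk_s3::Error> {
    let result = client
        .put_object()
        .bucket("celler-metadata")         // made-up bucket name
        .key(format!("wal/{segment:020}")) // zero-padded for ordering
        .if_none_match("*")
        .body(ByteStream::from(payload))
        .send()
        .await;

    match result {
        Ok(_) => Ok(true), // we won the race for this segment
        Err(err) => {
            // A 412 Precondition Failed means someone else wrote the segment
            // first; real code would distinguish that from other errors.
            eprintln!("conditional PUT rejected or failed: {err}");
            Ok(false)
        }
    }
}
```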

Glad someone picked up the work.

I noticed that Prometheus was planned. I would go further (and simplify at the same time) and suggest OpenTelemetry support instead! Since it can be configured to be compatible with Prometheus exporters, it should be the best of both worlds, while giving way more observability into the application.


What concrete problem are you looking to solve?

Consider generic CI/CD systems or local build systems that want to push artifacts directly to the cache, not necessarily Nix binary cache artifacts.

If Celler is this multi-tenant generic system for content-addressable artifacts, with special support for the Nix binary cache, we could reuse it as an interface in many other pieces of software without locking ourselves into the HTTP Nix binary cache protocol, which is very naive.
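To illustrate the kind of interface this is asking for (purely hypothetical, nothing like it exists in Celler today), a generic CAS trait could look roughly like this:

```rust
/// Hypothetical content-addressed storage interface, independent of the
/// Nix binary cache protocol. Digests are raw SHA-256 hashes here.
#[async_trait::async_trait]
pub trait ContentAddressedStore {
    type Error;

    /// Store a blob and return its content hash.
    async fn put(&self, data: &[u8]) -> Result<[u8; 32], Self::Error>;

    /// Fetch a blob by its content hash, if present.
    async fn get(&self, digest: &[u8; 32]) -> Result<Option<Vec<u8>>, Self::Error>;

    /// Check for existence without transferring the blob.
    async fn contains(&self, digest: &[u8; 32]) -> Result<bool, Self::Error>;
}
```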

The database schema is still identical, so you can migrate by pointing Celler at your Attic database. This is not tested, so please do so at your own risk. I’ve opened this ticket to document the process: Migration Guide for Attic · Issue #19 · blitz/celler · GitHub

Loading secrets via systemd LoadCredential= and friends would be really neat. It’s not done yet. I’ve opened: Use systemd credentials · Issue #18 · blitz/celler · GitHub
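For reference, consuming a credential passed in via LoadCredential= is straightforward; a minimal sketch (the credential name celler-token is a made-up example):

```rust
use std::{env, fs, io, path::Path};

/// Read a secret provided by systemd's LoadCredential=. systemd exports the
/// directory via $CREDENTIALS_DIRECTORY, and the file is named after the
/// credential ID given in the unit file.
fn load_systemd_credential(name: &str) -> io::Result<String> {
    let dir = env::var("CREDENTIALS_DIRECTORY").map_err(|_| {
        io::Error::new(io::ErrorKind::NotFound, "not started with LoadCredential=")
    })?;
    let secret = fs::read_to_string(Path::new(&dir).join(name))?;
    Ok(secret.trim_end().to_owned())
}

fn main() -> io::Result<()> {
    let token = load_systemd_credential("celler-token")?;
    println!("loaded a token of {} bytes", token.len());
    Ok(())
}
```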

Totally agree about the long-lived JWTs. The way this is currently handled also leaves them in your shell history, which makes it extra bad.

Regarding listing caches, I’ve opened: Allow listing existing caches · Issue #20 · blitz/celler · GitHub

atticd-atticadm (and cellerd-celleradm) is just the result of lacking proper authentication support and should go away in this form once OIDC is supported.

I like the idea of adding GC roots. I’ve tracked this here: Per-Cache Garbage Collection Roots · Issue #21 · blitz/celler · GitHub

Regarding re-architecting Celler to remove the RDS: sounds interesting, but that would require restructuring the codebase substantially. The nice thing about the RDS approach is its conceptual simplicity.

That sounds like a brainstorming session in waiting. Let’s get back to that when we have an opportunity to chat.


I’m wondering if Celler should join forces with kalbasit/ncps (Nix binary cache proxy service with local caching and signing). I use both (well, ncps and Attic).

I think there is some feature overlap.

Support for uploading and fetching build logs would be useful.

Related:
