Upcoming Garbage Collection for cache.nixos.org

I keep seeing assertions being made, mostly in passing, about the “value” of old data.

Without meaning to imply any position on the topic, or invite discussion directly here (this announcement is not the thread for it), has the effort so far included some attempt to qualify or quantify that value?

The storage and egress fees set a bound on that value; that's part¹ of the point of them existing, after all. There are lots of optimisations and options, but ultimately we need some way to rank the value of different retained data a little more directly². That ranking perhaps also needs a "value to whom" or "for what use case" dimension, which could then point to funding mechanisms for it.

Has this kind of formula been covered in the working group materials and write-up so far? I'd be happy to see a pointer to it, if so. I'm also happy if the answer is something along the lines of "that's on the other side of this first big hump", as an explicit scoping decision.

  1. Of course the actual number includes other elements (like profit and lock-in), and concentrates on a particularly narrow and specific unit of measurement.

  2. The current clean-up proposal is clearly doing this by identifying the least-valuable / most-garbage store paths, but it feels like an indirect (though valid) initial approach.


If you want to help, I've created a dedicated Matrix channel for this conversation: #archivists:nixos.org (moved from #nixos-archivist:numtide.com).

The indexing is almost done, and we'll be running the first GC script in 1-2 weeks. As a first pass, it focuses on ISOs and AMIs, since that's a quick win and low risk.