2024-07-10 Long-term S3 cache solutions meeting minutes #7

I’ve spent time with @edef today to help everyone keep tabs on what’s happening. Here are some notes.

  • Very good news: Got read-through caching to work on the test bucket after some initial hiccups
  • Added latency is negligible, on the order of 1-2 ms, probably dominated by the handshake
  • Side notes:
    • AWS logs effectively show which software is used, and how much
      • we could in theory use that to optimise maintenance efforts
      • slightly tricky to work with that sensitive data (IP addresses), but can be done
      • we even have a machine in there for just that
      • right now we’re doing remarkably little with the huge amount of data we have at our disposal
    • currently the “data/archivists” team is just @edef
      • mostly figuring out which questions to ask
      • lots of time goes into data cleaning
      • trying to analyse what we can do with the cache data
    • there’s a difficulty with mapping store path hashes to packages
      • if we have the hash in the store, we get a narinfo (see the lookup sketch after these notes)
        • there’s a narinfo dump tool from last year
      • otherwise all we get is a store path
      • only a few % of all store derivations are cached
        • not clear what the criterion for keeping them is, or whether we started saving all of them at some point
        • ~430k drvs as of end 2023, but 200M store paths
    • we’ve recently started collecting very granular long-term AWS cost data
      • there’s something to be gleaned from that for sure
      • e.g. we only serve ~1 TB/mo of traffic from the bucket directly, costing a bit under $100/mo
  • since Tigris claims they copy asynchronously, this would mean we’d serve each object twice initially
  • the front-end for all this is Fastly
  • when moving the cache, we’d likely break the Tsinghua University cache replication mechanism
  • Next steps:
    • @edolstra @ron: we need a credit card to pay for the Tigris account
      • 5GB of free allowance, but that is obviously too little
    • ideally we’d not hit S3 for the 404 path
      • need to serve 404s very fast
      • currently we’re serving those 404s from S3, which is bad
        • we should be able to do a lot better (see the membership-check sketch after these notes)
        • only ~5GiB of data is required to answer whether to 404
        • we pay S3 per request, but fairly little, so cost is a secondary concern
      • narinfo is on the critical path for end-user experience
      • this is optimisation for later though
    • and we don’t want to hit Tigris with the narinfo workload yet
    • have to think about costs of uploading to Glacier
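
On the hash-to-narinfo mapping mentioned in the side notes: given a 32-character store-path hash, the public cache exposes a `.narinfo` document that names the full store path. Here is a minimal lookup sketch in Python; the example hash is made up, purely for illustration:

```python
import urllib.error
import urllib.request

CACHE = "https://cache.nixos.org"

def fetch_narinfo(store_hash: str) -> dict | None:
    """Fetch and parse the narinfo for a 32-character store-path hash.

    Returns None if the cache answers 404, i.e. the path is not cached.
    """
    try:
        with urllib.request.urlopen(f"{CACHE}/{store_hash}.narinfo") as resp:
            text = resp.read().decode()
    except urllib.error.HTTPError as e:
        if e.code == 404:
            return None
        raise
    # narinfo is a simple "Key: value" text format (StorePath, URL, NarHash, ...)
    return dict(line.split(": ", 1) for line in text.splitlines() if ": " in line)

# Illustrative only: real hashes are the first 32 characters after /nix/store/.
info = fetch_narinfo("0000000000000000000000000000000a")
print(info["StorePath"] if info else "not cached")
```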
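
On the fast-404 idea from the next steps: the point of the ~5GiB figure is that the full set of cached narinfo keys is small enough to answer “does this path exist at all?” without ever asking S3. Below is a hedged sketch of that membership check, assuming a hypothetical dump file with one store-path hash per line (e.g. something the narinfo dump tool could produce):

```python
# Sketch only: decide "fast 404 or forward to origin" from an in-memory set
# instead of asking S3. "known_hashes.txt" is a hypothetical dump with one
# 32-character store-path hash per line.

def load_known_hashes(path: str = "known_hashes.txt") -> frozenset[str]:
    with open(path) as f:
        return frozenset(line.strip() for line in f if line.strip())

KNOWN = load_known_hashes()

def decide(request_path: str) -> str:
    """Handle a GET for /<hash>.narinfo without touching the bucket."""
    if not request_path.endswith(".narinfo"):
        return "forward"                                  # NARs etc.: unchanged
    store_hash = request_path.lstrip("/").removesuffix(".narinfo")
    return "forward" if store_hash in KNOWN else "404"    # miss: answer instantly

print(decide("/0000000000000000000000000000000a.narinfo"))
```

An exact set gives definite answers in both directions as long as the dump is kept current; a Bloom filter over the same data would be much more compact, at the cost of occasional false positives that simply fall through to the origin.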
12 Likes

Unfortunately we weren’t able to make much progress since the last update. We were in touch with Tigris, who were keen to make sure we have everything we need. Sadly @edef’s capacity to do work is severely reduced due to ongoing health issues, and my ability to support her is limited to being there to talk.

Since, thanks to @ron’s effort, Amazon significantly raised our allowance, the pressure is off for the time being. The foundation currently has no storage costs, to my knowledge. But the ultimate goal is still to substantially reduce the resource consumption, as we can’t rely on sponsorship forever, which in principle may end at any time.

Right now there’s no one really “owning” the cache in the sense of being capable, available, responsible, and accountable to make and implement decisions, including big changes. The arrangement set up at the end of last year with @flokli and @edef was supposed to change that, but it didn’t work out for multiple reasons. I picked this up in mid-2024 to see if we can keep things on track, so I still keep an eye on it – but I can’t do the actual work; this is not my area of expertise, and I have other commitments that prevent me from leaning into it.

@zimbatm recently gave @Mic92 the necessary permissions to be able to do (small?) things in the meantime as needed. That is only a solution for maintenance mode and therefore merely a temporary measure. This is why I’m supporting @Erethon to onboard into the infra team; he has the relevant experience, already helped us a great deal with devops for Summer of Nix, and has proven to be a reliable collaborator. He’ll assist us in getting the work-in-progress security tracker deployed, which could be a stepping stone to get the cache migration moving again. I’ll take care of managing the foundation’s financial capacity to make this economically viable and sustainable, in consultation with @ron until the steering committee becomes operational.

5 Likes

Always happy to see new people who are interested in helping with the NixOS infrastructure. Since I am meeting flokli and edef in person next week, I was planning to dedicate the whole week to making progress on the binary cache front.

7 Likes

Okay, so I also had a chat with @edef1c, who is working on a more sound GC algorithm.
@zimbatm also documented the current tools and what needs to be done (hopefully this will soon be published somewhere more public).
From that I saw that it’s actually not so trivial to get the list of all live store paths for this big bucket.
So here is my plan B in case we cannot make the gc work soon enough:

  1. We would create a new S3 bucket next to the old one and make Hydra push to the new bucket. Old store paths would remain accessible through the old bucket, and new data would go into the new bucket. Fastly will check both buckets sequentially when a request hits cache.nixos.org (a sketch of this lookup order follows the list).
  2. We remove the old bucket from Fastly after some transition period.
  3. After some testing period, we send the whole old bucket to Glacier.
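
To make step 1 concrete, here is a minimal sketch of the intended lookup order (Python standing in for what the Fastly configuration would do; the bucket endpoints are placeholders, not the real names):

```python
import urllib.error
import urllib.request

# Placeholder endpoints; the real bucket names live in the infra configuration.
NEW_BUCKET = "https://new-cache-bucket.example.com"
OLD_BUCKET = "https://old-cache-bucket.example.com"

def fetch(key: str) -> bytes | None:
    """Try the new bucket first, then fall back to the old one on a 404."""
    for origin in (NEW_BUCKET, OLD_BUCKET):
        try:
            with urllib.request.urlopen(f"{origin}/{key}") as resp:
                return resp.read()
        except urllib.error.HTTPError as e:
            if e.code != 404:
                raise               # only a clean miss should trigger the fallback
    return None                     # missing from both buckets: genuine 404
```

In Fastly terms this would presumably be a restart to the second origin when the first answers 404; the actual configuration is in the PR linked below.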

The configuration for this plan is here: Introduce staging binary cache to reduce our binary cache size by Mic92 · Pull Request #492 · NixOS/infra · GitHub

If someone wants to suggest a technically better solution, they are welcome to implement it.

7 Likes