I’ve spent time with @edef today to help everyone keep tabs on what’s happening. Here are some notes.
- Very good news: got read-through to work on the test bucket after some initial hiccups
  - added latency is negligible, on the order of 1-2 ms, probably dominated by the handshake
- Side notes:
- AWS logs effectively show which software is used, and how much (see the log-analysis sketch after these notes)
  - we could in theory use that to optimise maintenance efforts
  - slightly tricky to work with that sensitive data (IP addresses), but it can be done
  - we even have a machine in there just for that
- right now we’re doing remarkably little with the huge amount of data we have at our disposal
- currently the “data/archivists” team is just @edef
- mostly figuring out which questions to ask
- lots of time goes into data cleaning
- trying to do some analysis of what we can do with the cache data
- there’s a difficulty with mapping store path hashes to packages
  - if we have the hash in the store, we get a narinfo (see the narinfo lookup sketch after these notes)
  - there’s a narinfo dump tool from last year
  - otherwise all we get is a store path
- only a few % of all store derivations are cached
  - it’s not clear what the criterion for keeping them is, or whether we started saving all of them at some point
  - ~430k drvs as of the end of 2023, but 200M store paths
- we’ve recently started collecting very granular long-term AWS cost data
  - there’s something to be gleaned from that for sure
  - e.g. we only serve ~1 TB/mo of traffic from the bucket directly, costing a bit under $100/mo (see the back-of-the-envelope check after these notes)
- since Tigris claims they copy async, this would mean we’d serve each object twice initially
- the front-end for all this is Fastly
- when moving the cache, we’d likely break the Tsinghua University cache replication mechanism
  - they’re hitting the S3 bucket directly for releases: https://github.com/tuna/tunasync-scripts/blob/c2051ee938594b22423f6d18e92690c9b11763fa/nix.sh
    - cache URLs are taken from Fastly: https://github.com/tuna/tunasync-scripts/blob/c2051ee938594b22423f6d18e92690c9b11763fa/nix-channels.py
  - our S3 bucket is not a public API and may change in breaking ways at any time, but it’s set to “requester pays”, so anyone paying AWS for the traffic can access it however they like (see the requester-pays sketch after these notes)
- once we migrate off AWS, all of that will be gone, except for a copy of everything in Glacier for disaster recovery purposes (see the Glacier copy sketch after these notes)
  - this would incur acceptable ongoing cost, and the cost of recovery is still lower than the cost of recovering data that doesn’t exist
- if we want to change how we deliver data to them (the Tsinghua mirror), we want to generally improve the replication story for everyone
- contacted @NickCao to coordinate the transition
- @dramforever got in touch with the infra team to collaborate – thank you very much!
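To make the log-analysis point above concrete, here is a minimal sketch. It assumes the raw S3 access logs have already been parsed into a CSV with `ip,key,bytes_sent` columns; the file name and column layout are placeholders, and the client IPs are only kept as salted hashes so the sensitive addresses never end up in the aggregated output:

```python
# Sketch: aggregate parsed S3 access-log records into per-object usage stats,
# keeping client IPs only as salted hashes (raw addresses never leave this step).
# "parsed-access-log.csv" and its ip,key,bytes_sent columns are placeholders.
import csv
import hashlib
from collections import Counter

SALT = b"rotate-me-regularly"  # placeholder salt for pseudonymising IPs

def pseudonymise(ip: str) -> str:
    """Replace an IP address with a short salted hash."""
    return hashlib.sha256(SALT + ip.encode()).hexdigest()[:12]

downloads = Counter()   # object key -> request count
bytes_out = Counter()   # object key -> total bytes sent
unique_clients = set()  # salted IP hashes, never the raw addresses

with open("parsed-access-log.csv", newline="") as f:
    for row in csv.DictReader(f):
        downloads[row["key"]] += 1
        bytes_out[row["key"]] += int(row["bytes_sent"] or 0)
        unique_clients.add(pseudonymise(row["ip"]))

print(f"{len(unique_clients)} distinct clients (pseudonymised)")
for key, count in downloads.most_common(20):
    print(f"{count:>8} requests  {bytes_out[key]:>14} bytes  {key}")
```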
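On the hash-to-narinfo point: the binary cache serves a small `<hash>.narinfo` text document per cached store path, so given the 32-character hash prefix of a `/nix/store` path we can look up its metadata directly, and a 404 tells us it isn’t cached. A minimal sketch (the example hash is just a placeholder):

```python
# Sketch: map a store path hash to its narinfo on cache.nixos.org.
# A narinfo is a small "Key: value" text document (StorePath, URL, NarSize,
# References, ...); a 404 means the path is not in the cache.
import urllib.request
import urllib.error

def fetch_narinfo(store_path_hash: str, cache="https://cache.nixos.org"):
    """Return the narinfo fields for a store path hash, or None on a 404."""
    url = f"{cache}/{store_path_hash}.narinfo"
    try:
        with urllib.request.urlopen(url) as resp:
            text = resp.read().decode()
    except urllib.error.HTTPError as e:
        if e.code == 404:
            return None  # not cached
        raise
    return dict(line.split(": ", 1) for line in text.splitlines() if ": " in line)

# The argument is the 32-character hash prefix of a /nix/store path
# (placeholder value here, substitute a real one).
info = fetch_narinfo("0cbr49kyjjyr622sqgb97zjj7l0c3jq2")
print(info["StorePath"] if info else "not in cache")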
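For the traffic cost figure above, a back-of-the-envelope check. The $0.09/GB rate is an assumption (the typical published S3 data-transfer-out rate for the first 10 TB in us-east-1); request charges are ignored:

```python
# Rough sanity check of "~1 TB/mo for a bit under $100/mo".
tb_per_month = 1
rate_per_gb = 0.09  # assumed S3 data-transfer-out rate, first 10 TB tier
print(f"${tb_per_month * 1024 * rate_per_gb:.2f}/month")  # ~$92/month
```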
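To illustrate the “requester pays” point: anyone with an AWS account can read from such a bucket by flagging the request accordingly, and their account is billed for the request and transfer. A sketch with boto3; the bucket name is the commonly referenced one and the region is my assumption, and none of this is a stable API:

```python
# Sketch: reading from a requester-pays S3 bucket with boto3.
# The requester's own AWS account pays for the request and the data transfer.
# Bucket name, region and key are examples; the bucket layout may change.
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

resp = s3.get_object(
    Bucket="nix-cache",        # commonly referenced bucket name
    Key="nix-cache-info",      # small metadata file at the cache root
    RequestPayer="requester",  # accept the transfer charges yourself
)
print(resp["Body"].read().decode())
```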
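On the Glacier copy: one way to keep a disaster-recovery copy is to copy every object into an archival storage class. A sketch with boto3, with placeholder bucket names; in practice a lifecycle rule or a batch job would be the saner tool than a per-object loop:

```python
# Sketch: copy objects into a Glacier-class bucket for disaster recovery.
# Bucket names are placeholders; DEEP_ARCHIVE is the cheapest storage class,
# with correspondingly slow and costly retrieval (fine for last-resort copies).
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

SOURCE_BUCKET = "example-nix-cache"           # placeholder
ARCHIVE_BUCKET = "example-nix-cache-archive"  # placeholder

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=SOURCE_BUCKET):
    for obj in page.get("Contents", []):
        s3.copy_object(
            Bucket=ARCHIVE_BUCKET,
            Key=obj["Key"],
            CopySource={"Bucket": SOURCE_BUCKET, "Key": obj["Key"]},
            StorageClass="DEEP_ARCHIVE",  # Glacier Deep Archive
        )
```

Note that `copy_object` only handles objects up to 5 GB; anything larger needs a multipart copy.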
- Next steps:
- @edolstra @ron: we need a credit card to pay for the Tigris account
  - 5 GB of free allowance, but that is obviously too little
- ideally we’d not hit S3 for the 404 path (see the membership-check sketch at the end)
  - need to serve 404s very fast
  - currently we’re serving them from S3, which is bad
  - we should be able to do a lot better
  - there’s only 5 GiB of data required to answer whether to 404
  - we’re paying S3 for these requests, but fairly little, so cost is a secondary concern
  - narinfo requests are on the critical path for end-user experience
  - this is an optimisation for later, though
  - and we don’t want to hit Tigris with the narinfo workload yet
- have to think about costs of uploading to Glacier
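On the 404 fast path: the idea is that the full set of cached store path hashes is small enough (the ~5 GiB mentioned above) to answer “is this in the cache?” locally, so misses never have to touch S3. A minimal sketch of that membership check, using an in-memory set built from a hypothetical dump of all cached hashes; the file name and the upstream helper are placeholders:

```python
# Sketch: answer narinfo 404s from a local index instead of hitting S3.
# "cached-hashes.txt" is a hypothetical dump with one store path hash per line;
# a real deployment would likely use a Bloom filter or an mmap'd sorted file
# rather than a plain Python set.
def load_index(path="cached-hashes.txt") -> set[str]:
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}

CACHED = load_index()

def handle_narinfo_request(store_path_hash: str):
    """Return (status, body) for GET /<hash>.narinfo."""
    if store_path_hash not in CACHED:
        return 404, b""  # answered locally, no S3 round trip
    return fetch_from_upstream(store_path_hash)  # cache hit: go to the real store

def fetch_from_upstream(store_path_hash: str):
    # Placeholder for the actual proxying logic (S3/Tigris behind Fastly).
    raise NotImplementedError
```

A Bloom filter would trade a small false-positive rate for far less memory; a false positive only means an occasional unnecessary upstream lookup, never a wrong 404.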