2024-06-06 re: Long-term S3 cache solutions meeting minutes #6

tomberek · June 6, 2024, 11:00pm

S3 Cache Discussion

Agenda

updates
topics
questions
next steps

(all numbers are estimates)

updates

[ron]: meeting with AWS in 45 min to get additional credits and/or time (12-24 months)
[ron]: experiment with Tigris Data
[edef]:
- excited, they are on Fly. Can become a smarter server over time
- Storage provider and durability questions
- $300/mo for Glacier to hedge the risk
- small objects
[tom]:
- narinfo? 90GB
- does this drive cost, or just inefficiency?
- can we accept the inefficiency?
[edef]:
- permissions
- Tigris will require egress bandwidth
- availability requirements, SLA, and durability
- Tigris is new

Raw bucket requests (releases.nixos.org)

[edef]: discussion regarding rising costs
[tom]: still a problem? No

Current problem is resolved.
Blocked on preventing future re-occurrence via policy.
- needs debugging Fastly and AWS policy
- proposal: monitor and alert for re-occurrence

S3 costs (cache.nixos.org)

[edef]: underlying bucket is requestor pays
- Fastly has accepted the cost so far
- approx. 2PB/month ~= 10Gbps
[tom]: chip away at this by convincing companies to shift to s3-requestor-pays
- who are the primary drivers?
- CI + GitHub (Azure)
- TODO: numbers? percentage?
- TODO: contact GitHub
- they do caching for Docker layers
- TODO: give us some internal-Azure support
- TODO: give us Azure block storage accounts
[edef]: can do DNS tricks to provide smarter storage
- Hydra writes to S3
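
As a sanity check on the "2PB/month ~= 10Gbps" estimate above, the sustained average rate can be worked out directly. This is a minimal sketch assuming decimal units (1 PB = 10^15 bytes) and a 30-day month; the function name is illustrative and not from the meeting.

```python
def average_gbps(bytes_per_month: float, days_in_month: int = 30) -> float:
    """Average throughput in Gbit/s for a given monthly transfer volume."""
    seconds = days_in_month * 24 * 60 * 60
    bits = bytes_per_month * 8
    return bits / seconds / 1e9


# 2 PB/month in decimal units (assumption)
print(f"{average_gbps(2e15):.1f} Gbps average")
```

Under these assumptions the average comes out at roughly 6 Gbps, the same order of magnitude as the figure in the notes; peak traffic would sit above the average.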

Contact GitHub
Contact large users to use S3-requestor-pays
- open question: should this be encouraged?
- have a discussion, need a way to do social coordination
- need a “heavy cache users group”-type thing
Maintain Fastly relationship
- Currently we are okay. Above measures will help, but are not yet critical.

[edef]: apx. 2x costs of storage vs bandwidth
- requests $30/day
- 404s are ???/day (see current efforts to resolve)
- concern that changes in serving behavior can cause problems
- cache misses increase latency for every novel build

Glacier

[edef]:

blocker, 250 million objects
12.5K to put narinfos into Glacier. (x2)
storage
- S3 cost is size-driven
- Glacier cost has a #-objects component
10x reduction?
criticisms:
- dedup: costly and not yet ready
- re-compress: costly
small NARs + narinfo
- ~73% of objects are <128K and are charged at 128K == 15TB billed, for about 3TB of real data. Annoying, but bearable.
- $9k is small objects for transfer
- 500 TB
simple: spend $3-4k to move large things
large stuff
- 485 TB (total)
- move it over
gc for cache via tracing and 90GB parquet exists
- given roots it can estimate savings
  https://cs.tvl.fyi/depot/-/tree/tvix/tools/weave
- live data needed to ensure we do not throw
- contact Jonas (zimbatm) for access via Archivist bucket
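
The two cost drivers noted above (a per-object component and a minimum billable size for small objects) can be sketched as follows. The per-1,000-request price is an assumed parameter, chosen only because it reproduces the 12.5K figure for 250 million objects if that figure is read as dollars; the 128 KiB minimum follows the notes. Check current AWS pricing before relying on either.

```python
from typing import Iterable

# Per the notes: objects under 128K are charged as if they were 128K.
MIN_BILLABLE_BYTES = 128 * 1024


def request_cost_usd(num_objects: int, usd_per_1000_requests: float) -> float:
    """Cost of issuing one PUT/transition request per object."""
    return num_objects / 1000 * usd_per_1000_requests


def billed_bytes(object_sizes: Iterable[int], minimum: int = MIN_BILLABLE_BYTES) -> int:
    """Total bytes billed when every object is rounded up to a minimum size."""
    return sum(max(size, minimum) for size in object_sizes)


# 250 million objects at an assumed $0.05 per 1,000 requests
print(round(request_cost_usd(250_000_000, 0.05)))

# Toy example: a small narinfo-sized object next to a larger NAR
sizes = [1_000, 200_000]
print(sum(sizes), billed_bytes(sizes))
```

The request cost scales with the number of objects regardless of their size, which is why the small objects dominate that line item while contributing little real data.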

[tom]: what portions? and tradeoff?
[tom]: send everything to Glacier, be more stringent in deletion from S3

top-priority. Move everything to Glacier.
What are the tradeoffs between roots vs. savings?
reachability analysis
last accessed analysis
- S3 logs (based on last 6 months)
- Fastly logs

set-union (3) of (4) except for old FODs w/o CA
build dependencies might be needed
- can solve this manually?
  [edef]:
turn on versioning
delete once known to be in Glacier
- delete is soft
- hard delete once confident
test
- modify glibc, examine derivation tree
- can we fetch all FODs?
- get FOD list from croughan
need a “data team”
[tom]: TODO ask via Marketing: post where the data is, ask for help.
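
One way to read the retention idea above is: keep everything reachable from a chosen root set, plus everything seen in recent access logs, and propose the rest for deletion. The sketch below assumes that reading; the data structures (a dict of references per store path, a set of recently accessed paths) and all names are hypothetical stand-ins for the real narinfo/parquet data.

```python
from collections import deque
from typing import Dict, Iterable, Set


def reachable(roots: Iterable[str], references: Dict[str, Set[str]]) -> Set[str]:
    """Transitive closure of `roots` over the reference graph."""
    seen: Set[str] = set()
    queue = deque(roots)
    while queue:
        path = queue.popleft()
        if path in seen:
            continue
        seen.add(path)
        queue.extend(references.get(path, ()))
    return seen


def retention_set(
    roots: Iterable[str],
    references: Dict[str, Set[str]],
    recently_accessed: Set[str],
) -> Set[str]:
    """Paths to keep: reachable from the roots, or accessed recently."""
    return reachable(roots, references) | recently_accessed


# Toy graph: store path -> paths it references
references = {"app": {"glibc"}, "glibc": set(), "old-tool": {"glibc"}, "stale": set()}
keep = retention_set(["app"], references, recently_accessed={"old-tool"})
delete_candidates = set(references) - keep
print(sorted(keep), sorted(delete_candidates))
```

This does not cover the exceptions raised in the notes (old FODs, build-time dependencies); those would need to be added to the roots or handled separately.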

Long Run

CA store
dedup
Tigris evaluation
- try it out, see if there is external usage
- S3-like interface
- eventually move infra to be near storage
- $10k/month
- “Cloud Act” provides free egress
- no egress
- QUESTION: are there case studies of anyone using Tigris for S3?

next steps

Provide edef an approved discretionary budget.
Focus on Glacier transfer.
- Copy first, EVERYTHING
- Need approval for the plan?
  - Have technical access
  - have approval for storage
  - need approval for transfer itself (~$4k)
Build a “Root Set for Retention”
Build a “Proposal for Deletion”
Post analysis sources to Discourse

zimbatm · June 7, 2024, 6:41am

This will limit our backend evolution, for example if we want to move the narinfos to a faster KV store. But it is easy to implement.

Another alternative is to develop a caching proxy solution we could give to companies to install on their premises, a bit like Netflix’s Open Connect Appliances. This can absorb a lot of the traffic.

picnoir · June 7, 2024, 7:06am

Guix created the nar-herder that kind of implements such a setup for the bordeaux.guix.gnu.org binary cache. We could potentially re-use parts of the tool. Maybe it could make sense to reach out to the author of this tool who’s been working on this problem space for quite a while.

flokli · June 7, 2024, 8:00am

Any fetch-through caching solution (even an nginx with some caching config), deployed at a site doing a lot of requests, with that endpoint configured on each machine, should help reduce the number of requests for the same contents. We could help providing example configs, as well as documentation on how to test it is working - in addition to ensuring failure modes in Nix are well-understood.

Anything doing something more fancy than that “closer to the client” becomes a bit of a liability on our side, as we need to be aware of such deployments when changing anything in the path. Currently the only stable public interface exposed by cache.nixos.org is the HTTP protocol; we need to be very careful before encouraging people to rely on anything else, as that would then also become a public API that needs to be supported for a considerable amount of time.
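
The fetch-through idea described above boils down to "serve from local storage if present, otherwise fetch upstream once and keep a copy". In practice this would be an off-the-shelf HTTP proxy such as nginx; Python is used here only to show the logic. The upstream fetch is injected as a function so the sketch runs without network access, and all names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Callable


class FetchThroughCache:
    """Serve requests from a local directory, fetching upstream only on a miss."""

    def __init__(self, cache_dir: Path, fetch_upstream: Callable[[str], bytes]) -> None:
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.fetch_upstream = fetch_upstream
        self.hits = 0
        self.misses = 0

    def _entry(self, path: str) -> Path:
        # Hash the request path so it cannot escape the cache directory.
        return self.cache_dir / hashlib.sha256(path.encode()).hexdigest()

    def get(self, path: str) -> bytes:
        entry = self._entry(path)
        if entry.exists():
            self.hits += 1
            return entry.read_bytes()
        self.misses += 1
        body = self.fetch_upstream(path)
        # Write to a temporary file first so readers never see a partial entry.
        tmp = entry.with_suffix(".tmp")
        tmp.write_bytes(body)
        tmp.replace(entry)
        return body
```

Keeping entries forever assumes the content behind a given URL never changes, which is an assumption of this sketch; narinfo lookups and 404 responses would likely need an expiry.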

fricklerhandwerk · June 7, 2024, 9:23am

I’m in for putting up such a tutorial or guide in a prominent place on nix.dev.

samrose · June 12, 2024, 2:10pm

@zimbatm a while back @RaitoBezarius also mentioned this in “How big is the binary cache? Would it be reasonable to make a mirror?” (#3 by RaitoBezarius)

could be a useful transfer mechanism for people who are thinking about self-hosting as you mentioned

srd424 · June 12, 2024, 5:31pm

FWIW, I did this a few months back just for my local home network (old habits, really - having grown up in the 90s, the lack of a local cache always feels like a decadent waste of bandwidth!)

Despite lack of familiarity with modern caching solutions, etc., I cobbled something together and haven’t hit any weird failure modes so far.

We could help providing example configs, as well as documentation on how to test it is working - in addition to ensuring failure modes in Nix are well-understood.

Simple docker images, VM images, nix configurations, etc would be an obvious way of supplying a (software) “appliance.”

samrose · June 13, 2024, 12:27pm

This seems like a really good immediate and possibly ongoing solution.

If it’s true that the order of substituters in your nix.conf determines the priority in which they are used, this could be kind of a solved problem: we show people how to set up a cache at S3, Cachix, Tigris, Cloudflare, etc., make cache.nixos.org the last-listed, last-resort cache, and show how to make sure they push to and use their own cache every time they successfully build. That would not change the size of the nixpkgs cache, but it can decrease the load on it.

Setting up your own cache is an underrated and underrepresented feature of nix anyway. More people should know about it and leverage it.
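
A small sketch of the ordering idea, assuming list order is what decides which cache is tried first (the post leaves this open, and Nix may also weigh a per-cache priority). `substituters` is the real nix.conf setting; the private cache URL is a placeholder and the helper name is made up for illustration.

```python
from typing import Iterable

UPSTREAM = "https://cache.nixos.org"


def substituters_line(own_caches: Iterable[str], upstream: str = UPSTREAM) -> str:
    """Render a nix.conf `substituters` line with the upstream cache listed last."""
    ordered = [url for url in own_caches if url != upstream]
    ordered.append(upstream)
    return "substituters = " + " ".join(ordered)


# Placeholder URL for a self-hosted cache (assumption)
print(substituters_line(["https://cache.example.org"]))
```

Clients would also need the private cache's public key listed under `trusted-public-keys` for its paths to be accepted.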

fricklerhandwerk · October 10, 2024, 8:44am

By the way, this is now done!

flokli · October 20, 2024, 6:57am

This does not document a fetch-through binary cache that caches things from cache.nixos.org, but how to host and sign things from a local Nix store. I think it would make sense to document both.

fricklerhandwerk · October 28, 2024, 7:55am

@flokli it would be enough to set up the remote builds with builders-use-substitutes = true and the HTTP cache on the same machine (more precisely: to use the same store) and disable the default cache on the client, right? I can add that.