S3 Cache Discussion
Agenda
- updates
- topics
- questions
- next steps
(all numbers are estimates)
updates
[ron]: meeting with AWS in 45 min to get additional credits and/or time (12-24 months)
[ron]: experiment with Tigris Data
[edef]:
- excited; Tigris runs on Fly.io and could become a smarter server over time
- Storage provider and durability questions
- $300/mo for Glacier to hedge the risk
- small objects
[tom]:
- narinfos? ~90 GB
- does this drive cost, or just inefficiency?
- can we accept the inefficiency?
[edef]:
- permissions
- Tigris will require egress bandwidth
- availability requirements, SLA, and durability
- Tigris is new
Raw bucket requests (releases.nixos.org)
[edef]: discussion regarding rising costs
[tom]: is this still a problem? No
- Current problem is resolved.
- Blocked on preventing future recurrence via policy.
- needs debugging of Fastly and AWS policy
- proposal: monitor and alert for recurrence (sketch below)
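As a concrete form of that proposal, a minimal monitoring sketch assuming request metrics get enabled on the bucket; the bucket name, threshold, SNS topic ARN, and filter id are placeholders:

```python
import boto3

s3 = boto3.client("s3")
cw = boto3.client("cloudwatch")

BUCKET = "releases.nixos.org"   # assumed bucket name
FILTER_ID = "all-requests"      # hypothetical metrics-configuration id

# CloudWatch request metrics for S3 are off by default; enable them
# for the whole bucket so the AllRequests metric becomes available.
s3.put_bucket_metrics_configuration(
    Bucket=BUCKET,
    Id=FILTER_ID,
    MetricsConfiguration={"Id": FILTER_ID},
)

# Alarm when hourly request volume exceeds a placeholder threshold.
cw.put_metric_alarm(
    AlarmName="releases-raw-bucket-request-spike",
    Namespace="AWS/S3",
    MetricName="AllRequests",
    Dimensions=[
        {"Name": "BucketName", "Value": BUCKET},
        {"Name": "FilterId", "Value": FILTER_ID},
    ],
    Statistic="Sum",
    Period=3600,
    EvaluationPeriods=1,
    Threshold=100_000,  # placeholder; tune to the observed baseline
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:000000000000:infra-alerts"],  # placeholder topic
)
```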
S3 costs (cache.nixos.org)
[edef]: the underlying bucket is requester-pays
- Fastly has accepted the cost so far
- approx. 2 PB/month (~10 Gbps)
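For context, a minimal sketch of how a heavy consumer accesses a requester-pays bucket so the request and egress charges land on their own account; the bucket name and key are assumptions:

```python
import boto3

s3 = boto3.client("s3")

# RequestPayer="requester" acknowledges that the caller, not the bucket
# owner, pays the request and data-transfer charges for this access.
resp = s3.get_object(
    Bucket="nix-cache",      # assumed bucket name for cache.nixos.org
    Key="nix-cache-info",    # example key
    RequestPayer="requester",
)
print(resp["Body"].read().decode())
```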
[tom]: chip away at this by convincing companies to shift to s3-requestor-pays
- who are the primary drivers?
- CI + GitHub (Azure)
- TODO: numbers? percentage?
- TODO: contact GitHub
- they do caching for Docker layers
- TODO: give us some internal-Azure support
- TODO: give us Azure Blob Storage accounts
[edef]: can do DNS tricks to provide smarter storage
- Hydra writes to S3
- Contact GitHub
- Contact large users to use S3 requester-pays
- open question: should this be encouraged?
- have a discussion, need a way to do social coordination
- need a “heavy cache users group”-type thing
- Maintain Fastly relationship
- Currently we are okay. Above measures will help, but are not yet critical.
[edef]: approx. 2x cost of storage vs. bandwidth
- requests: ~$30/day
- 404s: ???/day (see current efforts to resolve)
- concern that changes in serving behavior can cause problems
- cache misses increase latency for every novel build
Glacier
[edef]:
- blocker: 250 million objects
- ~$12.5K to put narinfos into Glacier (x2); worked estimate below
- storage
- S3 cost is size-driven
- Glacier cost has a per-object component
- 10x reduction?
- criticisms:
- dedup: costly and not yet ready
- re-compress: costly
- small NARs + narinfo
- ~73% of objects are <128 KB but are billed at the 128 KB minimum: ~15 TB billed vs. about 3 TB of real data. Annoying, but bearable.
- ~$9K of the transfer cost is for small objects
- 500 TB
- simple: spend $3-4K to move the large objects
- large objects
- 485 TB (total)
- move them over
- GC for the cache via tracing; a 90 GB parquet dataset exists
- given roots it can estimate savings (illustrative sketch below)
- https://cs.tvl.fyi/depot/-/tree/tvix/tools/weave - needs live data to ensure we do not throw away anything still in use
- contact Jonas (zimbatm) for access via the Archivist bucket
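Not the weave tool itself, just an illustrative reachability sketch under an assumed schema (a parquet with one row per store path and a list of references, plus a plain-text root list) to show how a root set yields a live/dead split; file and column names are hypothetical, and the real 90 GB dataset would need a streaming approach:

```python
from collections import deque
import pyarrow.parquet as pq

# Hypothetical inputs: a parquet of narinfo references and a plain-text root list.
table = pq.read_table("narinfo-refs.parquet", columns=["store_path", "references"])
refs = dict(zip(table.column("store_path").to_pylist(),
                table.column("references").to_pylist()))
roots = set(open("roots.txt").read().split())

# Breadth-first closure over the reference graph: everything reachable
# from the roots is "live"; the rest is a candidate for deletion once it
# is safely archived in Glacier.
live, queue = set(roots), deque(roots)
while queue:
    for dep in refs.get(queue.popleft(), []) or []:
        if dep not in live:
            live.add(dep)
            queue.append(dep)

print(f"live: {len(live)} / total: {len(refs)} "
      f"-> {len(refs) - len(live)} deletion candidates")
```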
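Worked estimate behind the ~$12.5K transition figure, assuming roughly $0.05 per 1,000 transition requests (verify against current pricing for the chosen Glacier storage class):

```python
# Rough cost model for moving ~250M objects into Glacier-class storage.
# Rates are assumptions to be checked against current AWS pricing.
objects = 250_000_000        # estimate from the discussion
transition_per_1000 = 0.05   # assumed $ per 1,000 transition requests

transition_cost = objects / 1000 * transition_per_1000
print(f"transition requests: ${transition_cost:,.0f}")  # -> $12,500

# Note: some Glacier-class tiers also bill a 128 KiB minimum object size,
# and all of them add per-object index overhead, so very small narinfos
# inflate the billed storage relative to the real data.
```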
[tom]: what portions, and what is the tradeoff?
[tom]: send everything to Glacier, be more stringent in deletion from S3
- top priority: move everything to Glacier.
- What is the tradeoff between the root set and the savings?
- (1) reachability analysis
- (2) last-accessed analysis
- (3) S3 logs (based on the last 6 months)
- (4) Fastly logs
- set union of (3) and (4), except for old FODs w/o CA
- build dependencies might be needed
- can solve this manually?
[edef]:
- turn on versioning (see the sketch after this list)
- delete once known to be in Glacier
- delete is soft
- hard delete once confident
- test
- modify glibc, examine the derivation tree
- can we fetch all FODs?
- get FOD list from croughan
- need a “data team”
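A minimal sketch of the versioning-based safety net described above; bucket name and key are placeholders: enable versioning, let deletes create recoverable delete markers, and only remove specific versions once the Glacier copy is confirmed.

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "nix-cache"  # assumed bucket name

# 1. Turn on versioning so that ordinary deletes become soft deletes
#    (they only add a delete marker).
s3.put_bucket_versioning(
    Bucket=BUCKET,
    VersioningConfiguration={"Status": "Enabled"},
)

# 2. Soft delete: the object stays recoverable behind a delete marker.
s3.delete_object(Bucket=BUCKET, Key="nar/example.nar.xz")  # example key

# 3. Hard delete, only once the Glacier copy is confirmed: deleting a
#    specific version removes that data permanently.
# s3.delete_object(Bucket=BUCKET, Key="nar/example.nar.xz",
#                  VersionId="placeholder-version-id")
```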
[tom]: TODO: via Marketing, post where the data is and ask people for help.
Long Run
- CA store
- dedup
- Tigris evaluation
- try it out, see if there is external usage
- S3-like interface (see the sketch after this list)
- eventually move infra to be near storage
- $10k/month
- “Cloud Act” provides free egress
- no egress fees
- QUESTION: are there case studies of anyone using Tigris as an S3 replacement?
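Because Tigris exposes an S3-compatible API, evaluation should mostly be a matter of pointing existing tooling at a different endpoint; a minimal sketch, with the endpoint URL as documented by Tigris at the time (verify) and bucket/credentials as placeholders:

```python
import boto3

# Same boto3 code path as S3; only the endpoint and credentials change.
tigris = boto3.client(
    "s3",
    endpoint_url="https://fly.storage.tigris.dev",  # Tigris S3-compatible endpoint (verify)
    aws_access_key_id="PLACEHOLDER_KEY_ID",
    aws_secret_access_key="PLACEHOLDER_SECRET",
)

tigris.put_object(Bucket="cache-experiment",  # placeholder bucket
                  Key="nix-cache-info",
                  Body=b"StoreDir: /nix/store\n")
print(tigris.get_object(Bucket="cache-experiment",
                        Key="nix-cache-info")["Body"].read())
```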
next steps
- Provide edef an approved discretionary budget.
- Focus on Glacier transfer.
- Copy first, EVERYTHING
- Need approval for the plan?
- Have technical access
- Have approval for storage
- Need approval for the transfer itself (~$4k)
- Build a “Root Set for Retention”
- Build a “Proposal for Deletion”
- Post analysis sources to Discourse