2023-11-14 re: Long-term S3 cache solutions meeting minutes #4

This is a follow-up to 2023-10-24 re: Long-term S3 cache solutions meeting minutes #1, 2023-10-30 re: Long-term S3 cache solutions meeting minutes #2, and 2023-11-07 re: Long-term S3 cache solutions meeting minutes #3.

Full details can be found in NixOS Cache GC Meeting - HedgeDoc.

Quick Recap

  • Work is still ongoing on analyzing the impact of 2–3 channel bumps and their deduplication. There were some woes regarding the amount of disk space available; this is now fixed, as the EBS volume has been authorized to be resized to 1 TB.
  • Bucket log ingestion is now automated: every day at 3 AM UTC, a new ~300 MB Parquet file is uploaded to the Archeology bucket, enabling incremental consumption of the bucket logs: feat(users/flokli/nixos/archeology-ec2): automate bucket log parsing · Gerrit Code Review. Here’s an example of the data we can get out of those log files: num requests and bytes sent per hour, nix-cache, 2023-11-10 · GitHub (see the sketch after this list for how such a file can be aggregated).
  • Now that Requester Pays has been enabled, we know that the only users of the S3 bucket are Hydra, Fastly, or the archeologists themselves. As Fastly performs chunk requests, we see many more requests for the same file (multiple requests for different chunks of the same file).
  • We discussed CDN optimizations regarding whether to fetch ranges or not; the history is documented in cache.nixos.org: fastly<->s3 throttled? · Issue #212 · NixOS/infra · GitHub, as we use a Fastly feature called Streaming Miss.
  • We analyzed the costs of the current S3 bucket and uncovered that it is more expensive than predicted, even after the Requester Pays implementation. I will let @zimbatm comment on this with more precise information, as I don’t have the figures to elaborate on it.
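
To give an idea of how one of those daily Parquet files can be turned into the per-hour request/bytes numbers linked above, here is a minimal pandas sketch. The file name and the "timestamp", "bytes_sent" and "key" column names are assumptions, not the actual schema of the bucket logs.

```python
# Minimal sketch: hourly request counts and bytes sent from one day of
# bucket logs. Column names are assumed, not the real log schema.
import pandas as pd

logs = pd.read_parquet("2023-11-10.parquet")                  # one day of bucket logs
logs["hour"] = pd.to_datetime(logs["timestamp"]).dt.floor("H")

hourly = logs.groupby("hour").agg(
    num_requests=("key", "count"),                            # requests per hour
    bytes_sent=("bytes_sent", "sum"),                         # bytes served per hour
)
print(hourly)
```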

Next steps

  • @flokli will do the classical S3 engineering on the bucket logs (e.g. a lifecycle expiration rule; see the first sketch after this list) to avoid keeping them around for too long, though @zimbatm says their cost is negligible compared to the rest of the cache.
  • @flokli and @edef continue to analyze the data by narrowing the request rates down to cold paths only, instead of hot paths. This is a JOIN against @edolstra’s SQLite database (see the second sketch after this list).
  • @flokli and @edef continue to perform a deduplication test on those 2–3 channel bumps on the EC2 archeology box.
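
For the bucket-log housekeeping mentioned above, the natural tool would be an S3 lifecycle expiration rule; below is a minimal boto3 sketch. The bucket name, key prefix and retention period are placeholders, not the actual configuration.

```python
# Minimal sketch: expire old bucket-log objects via an S3 lifecycle rule.
# Bucket name, prefix and retention window are placeholders.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="nix-cache-log",                    # hypothetical log bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-old-bucket-logs",
                "Filter": {"Prefix": "log/"},  # hypothetical key prefix
                "Status": "Enabled",
                "Expiration": {"Days": 90},    # assumed retention window
            }
        ]
    },
)
```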
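
And a minimal sketch of the cold-path narrowing, assuming the request logs expose the store-path hash at the start of their object key and that @edolstra’s SQLite database has a table mapping hashes to releases. The database file, table and column names here are hypothetical, not the real schema.

```python
# Minimal sketch: keep only requests for "cold" paths, i.e. paths not
# referenced by any known release. All names below are hypothetical.
import sqlite3
import pandas as pd

logs = pd.read_parquet("2023-11-10.parquet")                            # bucket request logs
logs["hash"] = logs["key"].str.extract(r"^([0-9a-z]{32})", expand=False)  # store-path hash

with sqlite3.connect("releases.sqlite") as db:                          # hypothetical DB file
    known = pd.read_sql_query("SELECT hash FROM release_paths", db)

cold = logs[~logs["hash"].isin(known["hash"])]                          # requests for cold paths
print(len(cold), "requests for cold paths")
```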

cc @ron @delroth
