S3 Update and Recap of Community Call

We wanted to share the latest updates and highlights of the recent S3 Community Call. We discussed our next steps as LogicBlox, the longstanding sponsor of https://cache.nixos.org, is ending its sponsorship. More context can be found here → The NixOS Foundation’s Call to Action: S3 Costs Require Community Support - Announcements - NixOS Discourse

Small note - We are still gathering information on a few of the short-term options and will share it as soon as we have it.

Meeting Overview

Our agenda covered an introduction to the issue, a review of our financial situation, potential options, and an open discussion.

Financials

We have over 200k EUR in the Foundation bank account, some of which is already committed to prior obligations. We can sustain the current costs for at least one year, but the termination of the LogicBlox sponsorship represents a significant cost increase, as discussed previously.

Options Discussed

A host of solutions were proposed, from the Foundation covering the short-term costs while developing a long-term solution, to reaching out to AWS to subsidize costs. Other options included Cloudflare R2 with a potential promise of sponsorship, exploring trusted vendors like Wasabi, Telnix, and Storj, and considering self-hosting.

We had guests from Storj and Cloudflare present at the meeting who shared insights about their platforms, emphasizing their competitive cost structures, migration support, and potential for replacing Fastly.

Key Discussions

We discussed topics ranging from data storage and deduplication to the longer-term options.

The overall approach is to split the work between the short term and the long term.

Short term will prioritize the matters we need to handle to have a solid resolution in place for at least the next 12 months. This will allow the Infra team and the general community to explore a longer term solution with optimizations and improvements which are out of scope for the timeframe we currently have.

Key Priorities for the Short Term

  1. 0 Risk - Provide and move forward with a resolution that involves the least amount of change and risk. As an example, our preference would be to stay on AWS in the short term if feasible.
  2. Costs - While the Foundation has taken measures to always be in a financial position to support events such as this, we prefer to find a resolution that is cost-effective, so those funds can be used for more community-related matters. Should it come to it, we have made sure to have the reserves and can support the current state for almost a full year.
  3. Longevity/scalability - We’d prefer a resolution that is potentially better aligned with the next 5 to 10 years, with proper SLAs/guarantees so that we have time to migrate properly when we need to. With that in mind, we are still optimizing in the short term for the two priorities above.

Short-term Solutions Discussed and General Update

  1. Storj - During the call: Storj has offered an 80% discount for pricing until we double our usage and described their system as being similar to Tahoe-LAFS
  2. Cloudflare R2 - The foundation had an initial call with the Cloudflare team on June 7th. We provided an overarching review of Nix, community, the current situation and dove into the opportunities to collaborate based on the Cloudflare OSS Program.
    1. Next Steps: The Cloudflare team is taking the discussion internal and will get back to us in the coming days.
  3. AWS S3 - We have been in contact with AWS throughout the week and are awaiting a response about meeting with the OSS team. We expect further information in the coming days, with the primary goal of exploring sponsorship.
  4. Migrate billing to foundation

You can view the full discussion/options in the call notes, discourse thread and GitHub issues in the links below.

Long-term Solutions

For long-term solutions, discussions ranged from self-hosting with community hardware and adopting more efficient data structures to implementing deduplication methods. We believe that once the immediate problem is solved, we can provide a runway of approximately 1 year for the infra team to work on these longer-term solutions.

Thanks to all the participants and contributors. The effort and care from so many folks are nothing short of incredible.

Please comment/add any further topics and items!

Links:


Was Backblaze B2 one of the options you considered, or did you reach out to them? They were briefly mentioned in the previous thread as an option due to 0 egress cost from S3 and competitive pricing: The NixOS Foundation's Call to Action: S3 Costs Require Community Support - #77 by hexa


What about the option of deleting unreachable paths? That would free up between 79% and 94% of the space, potentially in conjunction with the other solutions.
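For anyone unfamiliar with the term: a path is "unreachable" when nothing that is still wanted refers to it, directly or transitively. Below is a minimal sketch of that computation, assuming a hypothetical in-memory reference graph and a hypothetical set of roots. The path names are made up, and how roots would actually be chosen for the cache is not something this thread specifies.

```python
from collections import deque

def reachable_paths(roots, references):
    """Return every path reachable from `roots` by following references."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        path = queue.popleft()
        for ref in references.get(path, ()):
            if ref not in seen:
                seen.add(ref)
                queue.append(ref)
    return seen

def unreachable_paths(all_paths, roots, references):
    """Paths that no root refers to, directly or transitively."""
    return set(all_paths) - reachable_paths(roots, references)

# Hypothetical toy data: each path maps to the paths it references.
references = {
    "hello-2.12": ["glibc-2.37"],
    "glibc-2.37": [],
    "old-tool-1.0": ["old-lib-0.9"],
    "old-lib-0.9": [],
}
roots = ["hello-2.12"]  # assumption: the paths we still want to serve

garbage = unreachable_paths(references.keys(), roots, references)
print(sorted(garbage))
```

Whatever ends up in `garbage` is only a candidate list; it says nothing about whether a path is the last surviving copy of something.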


Domen had a bad time using B2 for Cachix and had to migrate off it. So I wouldn’t be surprised if people are nervous about it.

AIUI (I don’t have any decision-making capacity here), deletion is on the table but needs to be handled carefully, because this bucket might have the last copy of sources that no longer exist. I know this has already happened because I found some instances when testing fonts in nixpkgs for reproducibility.


Adding on to this, options that might impose a wider change are being split into the “long term” bucket just so we can have a more structured approach giving more time for the Infra team and community members to explore/evaluate/test.
Our current goal is just to keep things as close to as-is as possible, with minimal impact on funds.


Perhaps we need to develop a retention policy over the next year that spells out clearly which derivations must always be kept and which can be deleted after a certain period of time.

Here’s an example policy:

Easy wins for deletion:

  • NixOS Release ISOs/RPi SD images/Virtualbox VHDs - They can be trivially rebuilt from other cached packages, take up space with duplicate content, and are rarely if ever substituted from cache.
    • In the long term, we could consider a dockerTools-like streamed image approach that doesn’t put the final image into the store at all.
  • Unreachable binaries for older (N-2?) releases of NixOS - These should be very rarely requested, and we can still keep the metadata and checksums to aid in statistics generation and future research.

Paths to never delete:

  • Sources for software. These generally don’t change as often as the binaries built from them, so shouldn’t be as expensive to cache.
  • Patches. Most patches should exist in the nixpkgs repo, but there are likely patches that were pulled with fetch* functions at some point, and I’m not sure how one would disambiguate them.
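
To make the example policy concrete, here is a rough sketch of it as a decision function. Everything in it is an assumption for illustration: the `CachedPath` fields, the `kind` labels, and the threshold of two releases (taken from the "N-2?" guess above) are hypothetical, and classifying real store paths into these kinds is exactly the hard part that is not solved here.

```python
from dataclasses import dataclass
from enum import Enum

class Decision(Enum):
    KEEP = "keep"
    DELETE = "delete"

@dataclass(frozen=True)
class CachedPath:
    name: str
    kind: str             # hypothetical labels: "source", "patch", "image", "binary"
    releases_behind: int  # 0 = current release, 1 = previous release, ...
    reachable: bool       # still referenced by something we want to keep

def decide(path, max_releases_behind=2):
    """Apply the example retention policy to one cached path."""
    # Never delete: sources and patches may be the last surviving copy.
    if path.kind in ("source", "patch"):
        return Decision.KEEP
    # Easy win: ISOs, SD images and VHDs can be rebuilt from cached packages.
    if path.kind == "image":
        return Decision.DELETE
    # Easy win: unreachable binaries for old releases. Metadata and
    # checksums would be kept elsewhere; only the binary itself goes.
    if (path.kind == "binary"
            and not path.reachable
            and path.releases_behind >= max_releases_behind):
        return Decision.DELETE
    # Anything the policy does not mention is kept by default.
    return Decision.KEEP

print(decide(CachedPath("nixos-minimal.iso", "image", 0, True)))
print(decide(CachedPath("font-src.tar.gz", "source", 5, False)))
```

Defaulting to keep means a misclassified path costs storage rather than losing data, which seems like the safer failure mode given the concern about last copies of sources.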

I thought about it too, would love to see this feature implemented…