Reading through the options, a lot of the suggestions are wildly unproven. The main considerations for a volunteer organization like this should be that when you move, you move for a good long while, and that the destination is as easy to operate as possible (within your budget).
Given the financial reserves, the short-term solution (3-6 months) should probably be to stay on S3 with some tweaks and eat the $9k/month. If that $9k can be significantly reduced with some of the measures discussed here, staying becomes even more attractive.
For longer term I would suggest doing a lot of due diligence before actually moving anywhere.
To be up front, I work for the company I am going to suggest. American Cloud, https://americancloud.com, does not charge egress fees, and we offer an Object Storage service similar to the S3 experience. We would also be more than happy to assist in the migration for free. If you are interested, please email me at firstname.lastname@example.org and I will get a meeting scheduled with our CEO, COO, and CTO.
There could also be an option that allows sharing only the packages one has installed, in order to save disk space. I’d be up for that. It would also make this interesting to organizations that want to spin up multiple NixOS machines in different datacenters.
I think it would also make it much easier for the community to contribute: “activate this flag and you contribute” vs “please donate money into this pot, it’s probably going where we say it is”.
I’m with @vs49688: we probably want to hold all source archives and patches, but in general we will have to be very careful when doing this. I’m sure there are other derivations in the cache, built from nixpkgs, that cannot be rebuilt from scratch. (It took a long time before people noticed that fetchZip/postBuild changes broke many fonts.)
If you set up a VPC with an S3 Gateway Endpoint (free), you then get free transfers for S3<->VPC. So you could shove a bunch of machines in there (which you still pay for, of course) to do this without paying to egress the entire contents of the bucket.
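To put a rough number on why this matters, here is a small cost comparison. The $0.09/GB figure is AWS’s first-tier internet egress rate; the bucket size used is a made-up placeholder, not the real cache size:

```python
# Rough comparison: processing the bucket from EC2 inside a VPC with an
# S3 Gateway Endpoint (no data charge for S3<->VPC transfer) versus
# egressing the same data over the internet.
INTERNET_EGRESS_PER_GB = 0.09   # USD/GB, AWS first pricing tier
GATEWAY_ENDPOINT_PER_GB = 0.0   # S3 Gateway Endpoints carry no data charge

def transfer_cost(size_gb: float, rate_per_gb: float) -> float:
    """Data-transfer cost for moving size_gb out of S3 at the given rate."""
    return size_gb * rate_per_gb

size_gb = 100_000  # placeholder: 100 TB, NOT the real bucket size
print(f"internet egress: ${transfer_cost(size_gb, INTERNET_EGRESS_PER_GB):,.0f}")
print(f"via gateway endpoint: ${transfer_cost(size_gb, GATEWAY_ENDPOINT_PER_GB):,.0f}")
```

So the compute you run inside the VPC is the only cost you keep paying; the per-GB transfer charge drops out entirely.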
Every time I’ve looked at S3 Intelligent Tiering in my own work, the $0.0025 per 1,000 objects automation fee makes me nervous. According to @edolstra, there are 667M objects in the cache.nixos.org bucket, so you’re paying $1667.50/month in automation fees, and 3/4 of the bucket is already in Infrequent-Access tier by some mechanism or other. So Intelligent Tiering needs to move a lot of stuff to smarter storage classes to come out ahead (or we only turn it on for large NARs, or something).
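As a sanity check of that fee figure, using AWS’s published $0.0025 per 1,000 objects monitoring rate and the 667M object count quoted above (note that AWS does not charge the monitoring fee for objects under 128 KB, so the real figure could be lower):

```python
# Back-of-the-envelope check of the Intelligent-Tiering automation fee.
AUTOMATION_FEE_PER_1000 = 0.0025   # USD per 1,000 monitored objects/month
OBJECTS_IN_BUCKET = 667_000_000    # @edolstra's count for cache.nixos.org

monthly_fee = OBJECTS_IN_BUCKET / 1000 * AUTOMATION_FEE_PER_1000
print(f"${monthly_fee:,.2f}/month")  # → $1,667.50/month
```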
I’m aware of some mirrors of Hydra, which is very helpful. It should be easier to set up a mirror, and mirroring should be actively promoted. There are more concerns than just storage, such as DNS blocks, which affect reproducibility (another benefit of using something like BitTorrent/IPFS as a substitution mechanism).
I host a website on IPFS (I also ran a few IPFS nodes manually), and it’s been quite unstable overall and generally slow (which is the reason Dhall moved away from it as well; see dhall-lang/dhall-lang#162, “Use GitHub as official source of Prelude?”). However, that was 5 years ago; if IPFS actually works reasonably now, I’m all for this proposal. Different parts of the cache could be hosted on different systems, and I already sacrifice my SSD to the Nix gods, so I might as well contribute whatever cache I have to help with hosting costs. I wonder if anyone has tested it enough to see whether it works well in practice.
There have been many suggestions here, but it seems like feedback from those who make the decisions is missing (or at least I can’t find it).
How/where can we follow the decision-making process? Are there calls we can listen in to? Maybe even recordings thereof, since the community is worldwide and thus not in the same timezone.
Maybe, due to the spread-out nature of the community, decisions are made async? (That’d be awesome, BTW.) If so, where do we follow this?
Maybe some people here work in the industry or have server space and bandwidth they could share, but aren’t jumping in because things seem quite opaque. It’s even possible that a decision has already been made, and somebody with a solution who is just getting back to the forum isn’t posting because of that assumption.
I have a question: do people’s /nix directories also contain this information in a form useful for recreating parts of the cache? If so, perhaps that could be used in a scenario where there is a desire to recreate the cache with lower egress costs. There could be a tool that, given a list of wanted derivations, checks what is present locally and uploads that.
Hashes could be computed and compared to validate integrity, and in the end only what could not be retrieved from the community would need to be exported.
Cumbersome, and hopefully unnecessary, but possibly an option?
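The tool described above could look something like this sketch. Everything here is hypothetical: the want-list format (name → expected sha256) and the single-file hashing are invented stand-ins for Nix’s real NAR hashing, which `nix path-info --json` exposes as `narHash`:

```python
import hashlib
from pathlib import Path

def file_sha256(path: Path) -> str:
    """Hash one file's bytes -- a stand-in for real NAR hashing."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def plan_upload(wanted: dict[str, str], store_dir: Path):
    """wanted maps a store-path name to its expected sha256.
    Returns (uploadable, missing): entries present locally with a
    matching hash, and entries that must come from the old cache."""
    uploadable, missing = [], []
    for name, expected in wanted.items():
        p = store_dir / name
        if p.exists() and file_sha256(p) == expected:
            uploadable.append(name)
        else:
            missing.append(name)
    return uploadable, missing
```

Each volunteer would run this against their own store and upload only the `uploadable` set; the union across the community shrinks the `missing` set that still has to be egressed from S3.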