The NixOS Foundation's Call to Action: S3 Costs Require Community Support

bmabsout · June 13, 2023, 4:24am

What about a model where companies that pull more than x gigabytes per month from the cache pay some amount per gigabyte beyond x, instead of relying purely on donations? If that leads them to host the cache themselves, that increases the chance of having mirrors, which would offset the hosting costs in the first place.
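The proposed rule is simple arithmetic: only traffic beyond the free allowance is billed. Here is a minimal Python sketch. The function name, the 100 GB allowance and the 0.05-per-GB price are hypothetical placeholders, not proposed figures:

```python
def monthly_charge(used_gb: float, free_gb: float, price_per_gb: float) -> float:
    """Bill only the gigabytes pulled beyond the free monthly allowance.

    All parameters are hypothetical; no such pricing exists for cache.nixos.org.
    """
    billable_gb = max(0.0, used_gb - free_gb)
    return billable_gb * price_per_gb


# Hypothetical example: 100 GB free per month, 0.05 per GB after that.
print(monthly_charge(used_gb=40, free_gb=100, price_per_gb=0.05))    # below the allowance: 0.0
print(monthly_charge(used_gb=1500, free_gb=100, price_per_gb=0.05))  # 1400 GB billable
```

Small users stay free. Heavy users either pay or run their own mirror, which is the incentive the post describes.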

adam248 · June 13, 2023, 5:25am

We need an opt-in peer-to-peer cache-sharing system…
basically torrents, with built-in hash checking.
Users of Nix Packages and NixOS can opt in to sharing their private bandwidth and storage to distribute cached builds.
Then we can use cache.nixos.org to manage the torrent links, and add garbage collection to the cloud storage to clean up old data.
I am sure there are many Nix fans who will gladly share their raw bandwidth and data storage to help the NixOS foundation out.
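The "built-in hash checking" is what makes untrusted peers safe to use. The downloader knows the expected hash of every piece in advance and discards any piece that does not match. Below is a minimal Python sketch of that idea. The tiny piece size and the function names are illustrative only. SHA-256 is used for simplicity, although BitTorrent v1 actually uses SHA-1 per piece:

```python
import hashlib
from typing import Optional


def piece_hashes(data: bytes, piece_size: int) -> list[str]:
    """Hash each fixed-size piece; this list plays the role of the trusted metainfo."""
    return [
        hashlib.sha256(data[i:i + piece_size]).hexdigest()
        for i in range(0, len(data), piece_size)
    ]


def assemble(offers: list[list[bytes]], trusted: list[str]) -> Optional[bytes]:
    """For each piece index, keep the first peer offer whose hash matches.

    offers[i] holds candidate bytes for piece i from different, untrusted peers.
    Returns None if the piece count is wrong or some piece has no valid offer.
    """
    if len(offers) != len(trusted):
        return None
    pieces = []
    for index, candidates in enumerate(offers):
        valid = next(
            (c for c in candidates if hashlib.sha256(c).hexdigest() == trusted[index]),
            None,
        )
        if valid is None:
            return None
        pieces.append(valid)
    return b"".join(pieces)


original = b"hello nix cache!"
trusted = piece_hashes(original, piece_size=4)
# One peer sends a corrupted first piece; another peer sends the real one.
offers = [[b"XXXX", b"hell"], [b"o ni"], [b"x ca"], [b"che!"]]
print(assemble(offers, trusted) == original)  # True
```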

wamserma · June 13, 2023, 6:34am

NobbZ · June 13, 2023, 3:27pm

Does Trustix work? Is it moving forward? I tried asking a question in their Matrix room a couple of weeks ago. After a couple of days, all I had was an emoji reaction…

All I wanted to know was how I can use my machines as substituters for each other and/or contribute to a larger global network of trust.

vcunat · June 13, 2023, 3:48pm

Trustix and P2P don’t solve the “guaranteed storage” problem, i.e. the S3 replacement. We have signatures by trusted Hydra infra (and they’re small). We have a CDN distributing all data worldwide and fast (for free).
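Some context on why the signatures are small: each .narinfo file on the cache carries a `Sig:` line, which is an Ed25519 signature over a short fingerprint string. The Python sketch below rebuilds that fingerprint. It assumes the `1;<StorePath>;<NarHash>;<NarSize>;<comma-separated references>` layout that, as I understand it, Nix uses. Actually verifying a signature would also require an Ed25519 library and the cache's public key, and this sketch leaves both out:

```python
import base64


def parse_narinfo(text: str) -> dict[str, str]:
    """Parse the 'Key: value' lines of a .narinfo file (last value wins on repeats)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key] = value
    return fields


def signing_fingerprint(info: dict[str, str], store_dir: str = "/nix/store") -> str:
    """Rebuild the string the Sig line signs (assumed layout, see above).

    References in a .narinfo are base names, so they are expanded to full paths.
    """
    refs = [f"{store_dir}/{ref}" for ref in info.get("References", "").split()]
    return ";".join(["1", info["StorePath"], info["NarHash"], info["NarSize"], ",".join(refs)])


def split_signature(sig_field: str) -> tuple[str, bytes]:
    """Split 'key-name:base64' into the key name and the raw signature bytes."""
    key_name, _, encoded = sig_field.partition(":")
    return key_name, base64.b64decode(encoded)
```

An Ed25519 signature is 64 bytes, so the per-path overhead is tiny compared with the NAR it vouches for.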

adam248 · June 14, 2023, 7:09am

Given that one of the official solutions offered was garbage collection, I don’t think anyone should treat cache.nixos.org’s S3 as a guaranteed storage solution.
It is there for our convenience and ease of use.
If you want guaranteed storage, you should host your own copies of vital binaries.
If you are doing that, then please share them with the community (unless they are just custom builds).
And it is my understanding (someone correct me if I’m wrong) that, if push comes to shove, Nix can compile everything from source.
So as long as the source code/binaries are available somewhere on the internet, we will be fine in the long run. (I am talking about Linux kernel binaries and the like.)
Finally, if a binary is in high demand and commonly used, then P2P solutions generally reflect that, so we avoid the Tragedy of the Commons.

matthewcroughan · June 14, 2023, 2:00pm

This is mostly true, but in my opinion the most important and vital part of the Nixpkgs infrastructure is tarballs.nixos.org, which hosts the tarballs for source code that has gone missing. Before nix build even touches the real internet to get the source code for derivations in Nixpkgs, it talks to tarballs.nixos.org. Thankfully, as I found out via Marsnix, the sources (inputs) are only ~400 GB for a given revision of Nixpkgs, and there is a lot of overlap between Nixpkgs versions. It should be easier to host this yourself, and I’d love to provide a mechanism. Imagine services.nix-mirror = { enable = true; revisions = [ "nixos-23.05" inputs.nixpkgs.rev ]; tarballs = true; outputs = false; }, but I have very little time to work on these things, since I’m getting paid to work on other problems instead of this one.
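The "talks to tarballs.nixos.org first" step works because such mirrors are content-addressed: a file is looked up by its expected hash rather than by its original name. The rough Python sketch below shows the resulting URL order. It assumes the `<mirror>/<hashAlgo>/<hash>` layout that, as far as I know, nixpkgs' `fetchurl` uses for its hashed mirrors. `candidate_urls` and `mirror.example.org` are hypothetical:

```python
def candidate_urls(
    upstream_urls: list[str],
    hash_algo: str,
    output_hash: str,
    hashed_mirrors: tuple[str, ...] = ("https://tarballs.nixos.org",),
) -> list[str]:
    """Return URLs in try-order: content-addressed mirrors first, then upstream."""
    mirrored = [f"{m.rstrip('/')}/{hash_algo}/{output_hash}" for m in hashed_mirrors]
    return mirrored + list(upstream_urls)


# A self-hosted mirror (hypothetical host) can simply be tried before the official one.
for url in candidate_urls(
    ["https://example.org/hello-2.12.tar.gz"],
    "sha256",
    "<expected-hash>",
    hashed_mirrors=("https://mirror.example.org", "https://tarballs.nixos.org"),
):
    print(url)
```

Under this assumption, the path depends only on the hash. A mirror like the imagined services.nix-mirror would therefore only need to serve files under the same layout.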

7c6f434c · June 14, 2023, 2:45pm

I think there might be some FODs that are in the cache but not in tarballs.nixos.org? It would be nice to check; the move-FODs-to-tarballs step was not always performed on a strict schedule.

(Separately, right now Nix makes it impossible to GC-pin just the FODs your system needs, and makes pinning all build deps about as impractical as it can get…)

matthewcroughan · June 14, 2023, 3:26pm

I believe if you run Marsnix on your system toplevel it’ll just work today. Again, Marsnix is just a crappy shell script I wrote, but it mostly does the right thing, querying all of the FODs associated with a given drv and realising them.

7c6f434c · June 14, 2023, 3:40pm

I don’t see any GC root management there, though? Or is the idea that you have the resulting directory of symlinks per generation and manually remove? (My complaint was specifically about GC; I do have some shell scripts that do almost the right thing for my setup, but Nix is not helping here)

matthewcroughan · June 14, 2023, 3:42pm

I could imagine a version of Marsnix that is implemented in pure Nix and creates a gcroot in a single outpath. It would be just a small tweak of the existing script, so it is all possible. I’ve just tested that it works, although my script currently assumes it is passed a nixpkgs path; I haven’t added an argument for arbitrary drvs.
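Here is an illustrative Python sketch of the Marsnix-style step. It picks the fixed-output derivations (FODs) out of a recursive derivation dump and produces one `nix-store --realise <drv> --add-root <path>` command per FOD, so that each fetched source gets its own GC root. The sketch assumes JSON shaped like the output of `nix derivation show --recursive` (`nix show-derivation` in older Nix), where an FOD's output carries a `hash` field. The exact JSON layout has changed between Nix versions. `fixed_output_drvs`, `gc_root_commands` and the root directory are hypothetical names, not part of Marsnix:

```python
from pathlib import PurePosixPath


def fixed_output_drvs(show_json: dict) -> list[str]:
    """Return the .drv paths whose outputs declare a fixed hash (i.e. FODs)."""
    return sorted(
        drv
        for drv, info in show_json.items()
        if any("hash" in output for output in info.get("outputs", {}).values())
    )


def gc_root_commands(fods: list[str], root_dir: str) -> list[list[str]]:
    """Build one realise-and-root command per FOD (printed, not executed, here)."""
    commands = []
    for drv in fods:
        name = PurePosixPath(drv).name.removesuffix(".drv")
        commands.append(["nix-store", "--realise", drv, "--add-root", f"{root_dir}/{name}"])
    return commands


# Tiny hand-written stand-in for `nix derivation show --recursive` output.
sample = {
    "/nix/store/aaa-hello-2.12.tar.gz.drv": {
        "outputs": {"out": {"path": "/nix/store/xxx-hello-2.12.tar.gz",
                            "hashAlgo": "sha256", "hash": "0000"}}
    },
    "/nix/store/bbb-hello-2.12.drv": {
        "outputs": {"out": {"path": "/nix/store/yyy-hello-2.12"}}
    },
}
for cmd in gc_root_commands(fixed_output_drvs(sample), "/var/lib/fod-roots"):
    print(" ".join(cmd))
```

With one root per FOD, deleting that root directory would release exactly those sources to the garbage collector. This might be one way to get the per-FOD pinning discussed above.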

matthewcroughan · June 14, 2023, 3:52pm

Is this like Elon Musk’s version of the Twitter blue tick?

chrisaw · June 14, 2023, 6:38pm

Hah, I see your point.

On the other hand - you can’t deny that it raised money!

balazs.lengyel · June 15, 2023, 5:29pm

TL;DR: please don’t dismiss any potential solution just because it’s not instantly a perfect fit. I'll illustrate this with the BitTorrent argument.

Long version:
I dislike the “BitTorrent does not solve the guaranteed long-term storage” argument, since it is wrong as long as the seeding node(s) keep(s) the files available. Since the initial seeders are controlled by the Nix Foundation, that is their decision. This is the same as the decision whether or not to GC on S3/…

Pros:

RE Storage & CPU: Modern machines can store and serve a lot of torrents at the same time for little money. The underlying filesystem can potentially compress/deduplicate raw data transparently
RE Traffic: Outgoing traffic also doesn’t have to explode. Applying aggressive bandwidth limits for the individual torrent on the seeding node after a few downloads should reduce traffic by a huge amount.
Heritage: files are available (potentially faster than today) for currently popular packages and slow for unpopular packages. But any independent/private mirror helps out with almost zero need for coordination.
Redundancy: Rent a similar machine and storage on the other side of the world.
Can keep current distribution model in parallel, switch gradually or only parts of the data.

Cons:

Performance follows mainstream usage, so odd packages will suffer → any company/sub-community can set up a custom BitTorrent mirror for important packages independent of upstream.
Would potentially reduce cost savings via the currently cheap CDN
Sounds illegal to the uninitiated and some ISPs might be jerks.
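The bandwidth-limit idea from the pros list is a per-torrent policy on the foundation's seeder. The seeder serves at full speed until enough peers have a complete copy, then throttles so that the swarm carries most of the load. Below is a minimal sketch of such a policy. The threshold and rates are made-up placeholders, and `seed_rate_limit_kib_s` is a hypothetical name, not a setting of any particular BitTorrent client:

```python
def seed_rate_limit_kib_s(
    completed_downloads: int,
    threshold: int = 5,
    full_rate: int = 10_240,
    throttled_rate: int = 64,
) -> int:
    """Upload cap (KiB/s) for one torrent on the origin seeder.

    Full speed until `threshold` peers have completed a download (so the swarm
    has other seeds), then a small trickle so the origin still serves the file,
    just slowly. All numbers are placeholders.
    """
    return full_rate if completed_downloads < threshold else throttled_rate


for n in (0, 4, 5, 500):
    print(n, seed_rate_limit_kib_s(n))
```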

wmertens · June 19, 2023, 9:19am

There are definitely valid arguments here, but I think that long-term it would be better to have something like IPFS seeding combined with a smarter nix-store that works with deduplicated blocks instead of with NAR files.

I’m confident that this would result in a severalfold reduction in stored size and bandwidth, and hosting doesn’t even need to happen in the cloud (like with bittorrent).

(Note that because of the store path hashes, lots of duplicate blocks aren’t detected by sliding-window dedupers. So the store paths need to be removed entirely from the files before deduplicating (so that their variable length doesn’t matter), and restored after reduplication. See here for more.)
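A small Python sketch of the stripping step shows why it matters. Two files that differ only in an embedded store hash share no chunks in raw form, but they become identical once the hashes are cut out and recorded separately. The sketch makes two simplifications: it removes only the 32-character hash part of each store path rather than the whole path, and it uses fixed-size chunks instead of a real content-defined chunker. The function names are hypothetical:

```python
import hashlib
import re

# Store hashes are 32 characters of Nix's base32 alphabet (no e, o, t, u).
STORE_HASH = re.compile(rb"/nix/store/([0-9a-df-np-sv-z]{32})-")


def strip_hashes(data: bytes) -> tuple[bytes, list[tuple[int, bytes]]]:
    """Cut every store hash out of `data`, recording (offset, hash) for restoring."""
    out = bytearray()
    holes = []
    last = 0
    for match in STORE_HASH.finditer(data):
        start, end = match.span(1)
        out += data[last:start]
        holes.append((len(out), match.group(1)))  # offset in the stripped stream
        last = end
    out += data[last:]
    return bytes(out), holes


def restore_hashes(stripped: bytes, holes: list[tuple[int, bytes]]) -> bytes:
    """Reinsert the recorded hashes at their offsets (the 'reduplication' step)."""
    out = bytearray()
    last = 0
    for offset, store_hash in holes:
        out += stripped[last:offset] + store_hash
        last = offset
    out += stripped[last:]
    return bytes(out)


def chunk_ids(data: bytes, size: int = 64) -> list[str]:
    """Fixed-size chunk hashes; a real deduper would use content-defined chunking."""
    return [hashlib.sha256(data[i:i + size]).hexdigest() for i in range(0, len(data), size)]


old = b"#!/nix/store/" + b"a" * 32 + b"-bash-5.2/bin/bash\necho hi\n"
new = old.replace(b"a" * 32, b"b" * 32)  # same script, different dependency hash
print(chunk_ids(old) == chunk_ids(new))  # False
print(chunk_ids(strip_hashes(old)[0]) == chunk_ids(strip_hashes(new)[0]))  # True
```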

T313C0mun1s7 · June 24, 2023, 4:10pm

I really wish I had more to offer than I do. I don’t know what a solution is going to look like. However, I do know what is not a solution, and it is taking up too much energy in these discussions.

BitTorrent is not a solution, as it simply does not address the problem in any meaningful way. BitTorrent addresses distribution, not storage. Look at @balazs.lengyel’s answer just above mine. He suggests that Nix maintain a seeder. The only way to do that is to keep a copy of the seeded data. Thus you are not addressing the storage issue; you are only addressing a distribution issue.

I think the reasoning of the people who went sideways by suggesting BitTorrent may have gone like this:

Come up with a buzzword that sounds good - decentralization
What is a good decentralized solution? - BitTorrent
Suggest BitTorrent as a solution

The issue with that, of course, is that if you have to maintain a seed to ensure the item remains available, you now have to store both the data being seeded and the torrent metadata itself. In other words, you actually increase your storage requirement rather than reduce it.

While BitTorrent is a great solution to the problem of distribution, that is not the issue at hand. Storage is the problem that really needs a solution.

T313C0mun1s7 · June 24, 2023, 4:18pm

How does this address storage? A cache is a temporary copy of something that must already exist. It does not in any way relieve Nix of the need to store the original. You are providing a distribution solution to a storage issue.

adam248 · June 25, 2023, 2:35am

The way I see it, the main problem is actually three problems.

Storage.
Bandwidth.
Growth.
Currently, cache.nixos.org has to handle all this by itself and is charged by AWS.

Regarding bandwidth:
Using a peer-to-peer system allows for the distribution of bandwidth across multiple systems. This directly reduces the bandwidth load on cache.nixos.org and therefore reduces costs.

Regarding growth:
As the Nix community grows, so does the demand for the cache. This is a direct relationship and, if left unsolved, it could turn our success into failure. Ballooning costs are not what anyone wants to see. But with a peer-to-peer system, as the Nix community grows, so does the cache, as more and more people opt in to share their resources.

Finally, to storage:
The cache is not a life-and-death storage system for the Nix community. If a Nix system needs a package which is unavailable at cache.nixos.org, it can build it itself. This is how it comes into existence. However, it is generally faster to download a pre-built package from the cache than compile everything from the source code.
The cache is also very dynamic and is subject to change in packages that are in demand by the community. And the more a cache entry is in demand in a peer-to-peer system, the more copies will be kept on many nodes/mirrors. The more nodes/mirrors holding a package, the more bandwidth is now available for that package.
And vice versa: as a package becomes obsolete, fewer nodes/mirrors will hold copies of it, until no one holds any copies. This is called organic garbage collection. It directly reflects the community’s needs, instead of the Nix Foundation guessing which packages should be garbage collected and which should be kept.
Also, a package can return from the dead by nodes/mirrors rebuilding it and sharing again. This is also dynamically and organically done according to the community’s needs.

EDIT: I forgot to mention that all cache.nixos.org needs to maintain is the hashes of builds, so that it can verify a build out in the peer-to-peer system is valid. And a hash does not take up much storage. So it is not a pure peer-to-peer system, but rather a hybrid validator/peer-to-peer system. This will require the Foundation to officially choose a solution. A few peer-to-peer Nix binary caches have already been built, but for this to work, the Foundation must ultimately make the executive decision. We don’t want a fragmented solution. (The Foundation might be able to create an OFFICIAL API standard that different peer-to-peer solutions can use.)
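Here is a sketch of the hybrid validator/peer-to-peer fetch described in the EDIT. It assumes that the central service publishes only a trusted mapping from store path to content hash. In real Nix, that role is played by the signed `NarHash` in each .narinfo. Here, a plain hex SHA-256 and the hypothetical names `fetch_verified`, `peers` and `fallback` stand in for it:

```python
import hashlib
from typing import Callable, Iterable, Optional

# A fetcher returns the bytes for a store path, or None if it does not have them.
Fetcher = Callable[[str], Optional[bytes]]


def fetch_verified(
    store_path: str,
    trusted_index: dict[str, str],
    peers: Iterable[Fetcher],
    fallback: Fetcher,
) -> bytes:
    """Try untrusted peers first; accept only bytes matching the trusted hash.

    `fallback` stands for the central cache (or a local build) when no peer
    has a valid copy.
    """
    expected = trusted_index[store_path]  # KeyError: path is unknown to the validator

    def valid(blob: Optional[bytes]) -> bool:
        return blob is not None and hashlib.sha256(blob).hexdigest() == expected

    for peer in peers:
        blob = peer(store_path)
        if valid(blob):
            return blob
    blob = fallback(store_path)
    if not valid(blob):
        raise LookupError(f"no valid copy of {store_path}")
    return blob


nar = b"pretend NAR bytes"
index = {"/nix/store/abc-hello": hashlib.sha256(nar).hexdigest()}


def honest(path: str) -> Optional[bytes]:
    return nar


def tampered(path: str) -> Optional[bytes]:
    return b"malicious bytes"


def offline(path: str) -> Optional[bytes]:
    return None


print(fetch_verified("/nix/store/abc-hello", index, [offline, tampered, honest], offline) == nar)  # True
```

The only thing the central service has to keep permanently is the small index. The bulk data can live anywhere, because a tampered copy is simply rejected.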

Finally, I believe a peer-to-peer system is a natural solution for a distributed cache system. If we were talking about a much more sensitive and critical part of the infrastructure, I would tend to agree with you, but in this case, I am fully convinced of my point and have yet to see a genuine counter-point that brings me concern.

Yes, many people think and use decentralisation as a buzzword without knowing the tradeoffs. But here, the tradeoffs are all in the Foundation’s and the Community’s favour.

I prefer that the Nix Foundation spend its donations on talented people to help manage the Nix ecosystem instead of raw compute resources. A far better use of actual money donations if you ask me. I am sure we can all agree leadership/management is what Nix really needs to move from niche to mainstream. (Not a bloated cache system)

I appreciate the criticism of my suggestion, which allowed me to spell out my thoughts more clearly. So thank you, @T313C0mun1s7, for your comments.

adam248 · June 25, 2023, 3:12am

I think we urgently need an OFFICIAL working solution that is directly endorsed by the Foundation. We don’t want a fragmented peer-to-peer system, as that will be troublesome for the community.
Optimization can be looked at later. (unless the implementation is easy to do from the start.)
But I am definitely on-board with the optimization of the cache system!!!
Excellent points.
But for all those who are experimenting with distributed cache systems, I thank you.
You are unofficially named the Nix Scientists!

adam248 · June 25, 2023, 3:52am

I just realized there is already a working group on the task.

If you have more questions regarding a peer-to-peer solution, I suggest we continue the discussion there.