NixOS S3 Long Term Resolution - Phase 1

Why do you need approval to add an experimental option in nixpkgs? Nixpkgs is open for anyone to contribute to, and there’s no need to get buy-in from anyone beyond the code owners / maintainers, etc.

The Foundation is clearly not one of them, FYI.

4 Likes

Ok, thanks for that. So we just need a PR then? That clears things up for me. Thank you very much.

Correct, but again, I strongly recommend starting in a namespace other than nix.experimental; I am not sure the Nix code owners will appreciate that usage. Once the solution has proven interesting and can be adopted, it can move to a more official namespace, of course. :slight_smile:

Again, YMMV because I am not a Nix developer.

3 Likes

Thanks for the tip. The namespace is obviously a point to discuss with the Nix devs. (I was just using it as a placeholder. :slight_smile:)

1 Like

Amazon just announced that, in preparation for the European Data Act, they will waive any fees for moving out of AWS. Does this impact our decisions and plans? It sounds like a huge game changer.

10 Likes

This may be shortsighted, but I would wager that most users aren’t benefiting from a good deal of the historic caching. Holding on to so much history is most likely to benefit businesses maintaining projects with pinned dependencies.

If things are not sustainable I do not see an issue in culling some old cache. There is nothing stopping these users from maintaining their own cache if building from source is an issue for them.

I am personally fine building from source more often. I have a number of options for caching my own projects’ dependencies if need be.

While I agree that P2P would be a good solution, existing attempts such as Trustix seem to be somewhat of a WIP, & it sounds like this needs to be solved with fairly immediate action.

Additionally, S3 seems massively overpriced for what is essentially a gated FTP API. It doesn’t seem to align with Nix’s values. Perhaps on-prem storage backed by a FOSS S3-compatible server like MinIO could be an alternative for a smaller cache?

The community can contribute to & experiment with P2P solutions for caching niche dependencies. Not building from source is a luxury & the money could be spent incentivizing core contributors to improve the ecosystem further.

8 Likes

Also want to mention that Wasabi (see “Cloud Storage Pricing: Wasabi vs Azure, Google & AWS S3 Pricing”) has much better pricing for S3-compatible storage.

1 Like

Personally, I think the best value-for-money option is Ceph Object Storage. It is a drop-in replacement for S3, meaning that the highest-traffic objects could be migrated first, dropping costs, and then new storage could be added every month until all data has been migrated away.

Ceph is used at CERN and is seriously powerful.

I suggest at least 3 mirror locations, chosen for global ping time, plus some sort of routing server to point users to the closest live mirror.

I also suggest reaching out to universities and companies and offering them a portion of the storage capacity in exchange for power and rack space.

Many of the costs in S3 come from IO rather than from long-term storage. In Ceph S3, the cost of IO is electricity, network and disk wear and tear. This is why moving or complementing some of the higher-traffic objects first makes sense.
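To make the "higher-traffic objects first" idea concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration: the `ObjectStats` record, the per-object traffic numbers, and the greedy rule (rank by monthly egress per stored byte, then fill the new cluster until its capacity is used up). It is not how the cache is actually analysed today.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectStats:
    """Hypothetical per-object statistics, e.g. derived from access logs."""
    key: str
    size_bytes: int
    monthly_egress_bytes: int


def pick_objects_to_migrate(objects, capacity_bytes):
    """Greedily pick the objects with the most egress per stored byte.

    Objects are ranked by monthly egress divided by size, then added in
    that order as long as they still fit into `capacity_bytes`.
    """
    ranked = sorted(
        objects,
        key=lambda o: o.monthly_egress_bytes / max(o.size_bytes, 1),
        reverse=True,
    )
    chosen, used = [], 0
    for obj in ranked:
        if used + obj.size_bytes <= capacity_bytes:
            chosen.append(obj)
            used += obj.size_bytes
    return chosen
```

Each month, as more disks are added, the same function could be re-run with a larger `capacity_bytes` to decide what to move next.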

The clever bit, determining the best data centre to use and the health of the various Ceph clusters, can even be handled client side.
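A minimal sketch of what that client-side selection could look like, in Python. The mirror hostnames are made up, and treating "TCP connect time to port 443" as both the health check and the latency measure is my assumption; a real client would probably want something smarter.

```python
import socket
import time
from typing import Callable, Iterable, Optional

# A probe returns the measured latency in seconds, or None if the mirror is down.
Probe = Callable[[str], Optional[float]]


def tcp_probe(host: str, port: int = 443, timeout: float = 2.0) -> Optional[float]:
    """Time a TCP connect to the mirror; None means unreachable."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


def pick_mirror(mirrors: Iterable[str], probe: Probe = tcp_probe) -> Optional[str]:
    """Return the reachable mirror with the lowest latency, or None if all are down."""
    best_host, best_latency = None, None
    for host in mirrors:
        latency = probe(host)
        if latency is None:
            continue  # skip unhealthy mirrors
        if best_latency is None or latency < best_latency:
            best_host, best_latency = host, latency
    return best_host
```

For example, `pick_mirror(["cache-eu.example.org", "cache-us.example.org", "cache-ap.example.org"])` would probe each (hypothetical) host and return the fastest one that answered. The probe is passed in as a parameter so it can be swapped for an HTTP health check or a fake in tests.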

The real win would be using Nix or Kubernetes and Rook Ceph to be cloud agnostic because Ceph supports the S3 API.

I would suggest talking to ISPs about a Netflix-style arrangement where storage and caching boxes could be geo-located for the benefit of both parties. This would require intense sharding, but that could benefit reliability in the long run.

Currently at work I have moved some of our Kubernetes workload off cloud for cost savings and we are making extensive use of Ceph for Block/Shared File System/S3 object storage as well as for serving of container images in a highly available way.

3 Likes

Appreciate the suggestion. For full transparency, I’ve shifted some priorities around and am working with the community on other areas you might have seen.
I’ll try to check whether we can make progress in parallel on the S3 items, as I’ve also met recently with providers so that we have a wider array of options as we review this.

Bottom line: if someone working on S3 needs anything or is blocked, please feel free to ping me in more than one channel or even directly.
Apologies!

7 Likes