The NixOS Foundation's Call to Action: S3 Costs Require Community Support

jade · June 3, 2023, 7:02am

I have a suggestion with respect to how to get it out, although it has the potential risk of costing even more money in the process: can we deduplicate the data by stuffing it into a content addressed store on top of s3 before pulling that deduplicated content address store out? It might be worthwhile looking at a random sample (also, how large would be necessary?) of the stores to see how much that would actually save before trying it at a full scale however.

deliciouslytyped · June 3, 2023, 8:01am

Free solutions are Cool and Good however: I don’t have an understanding the scale of Nix in relation to cloud users in general, but it would probably be necessary to discuss with any platform we migrate to, especially if they aren’t huge, because the providers do in fact have costs, and the free solutions probably aren’t meant for a sudden ingress of 150TB and increasing, in an ongoing manner.

deliciouslytyped · June 3, 2023, 8:21am

I specifically remember seeing some EU related open science / reproducible research related information, which may be a string worth pulling on in the long term. I don’t know what to expect, maybe these solutions will not be a good fit for our requirements. Perhaps the NLNet connection can yield some information here?

A cursory search yielded this https://open-research-europe.ec.europa.eu/for-authors/data-guidelines#approvedrepositories , which has some lists of recommended data providers. Though I expect most scientists wont be using hundreds of terabytes either.

These look superficially interesting

https://b2share.eudat.eu
https://zenodo.org/ this is especially interesting, since it came out of CERN and IIUC they have actually big data requirements

These also look superficially interesting from policy / community perspective:

Unrelatedly, I also found About RDA | RDA (might be a global thing).

Federico · June 3, 2023, 8:25am

Yes. Providers which credibly belong to the bandwidth alliance for the foreseeable future can be considered.

Scaleway also has an opensource sponsoring program, though its budget may be limited (presumably it’s possible to ask for more than the base credit of 2400 €, but not sure). Their object storage charges 12 €/TB-month, so it would seem to be significantly less expensive than AWS and R2.

Is a mix of the three possible? I mean, using the new storage for the new objects and the most frequently accessed objects (presumably the 27 or 91 TiB of store paths mentioned above), while keeping AWS S3 for the rest until the egress costs for it can be handled. (This doesn’t help much if the storage alone costs more than the egress.)

One provider which on paper offers such a service is BunnyCDN’s perma-cache. Put it between Fastly and AWS S3, get it populated gradually as objects are requested (meaning no egress costs except what you’d have had anyway, if people behave the same), then at some point use it as source for migration to the destination storage.

Solene · June 3, 2023, 9:32am

I personally can’t give you advice about the migration process to something cheaper. But I encourage people to think about the size of the repo. It’s currently 425 TiB of data that need to be served in high availability.

If you look at nixpkgs commit graph Contributors to NixOS/nixpkgs · GitHub it’s easy to see the activity increased drastically over the last few years. Even nixpkgs tarball size went from 25 MiB to 36 MiB in just one year!

In my opinion, the project must consider pruning some data from archives, it would be interesting to see the size of the biggest unused closures instead of pruning everything that is past a certain date or hasn’t been accessed for a while.

The current storage rate isn’t sustainable in the long run, except if you want to throw money away. This also has some ecological implications, the more you store, the more hard drives must be spinning (and it’s not linear due to redundancy).

vs49688 · June 3, 2023, 9:44am

In my opinion, the project must consider pruning some data from archives, it would be interesting to see the size of the biggest unused closures instead of pruning everything that is past a certain date or hasn’t been accessed for a while.

Perhaps just keep source archives and patches? We don’t need to keep every single build of every single channel revision - these can always be rebuilt, but not if their sources are gone.
Case in point: fakeroue had its source removed, but it was still cached and thus could be recovered. Situations like this are where having the cache is invaluable.

NobbZ · June 3, 2023, 11:25am

I remember that at the beginning of 2022 there have been figures around 300 TiB in the S3 buckets floating around. Mostly rumors though. And if those rumors are even close to the truth, that means that also the cache increased by about 50% in size, not only the nixpkgs tarball…

Yes, as much as my already hour long bisection session will hate this, though I have to agree…

I think, that applying GC/prune only on the “cold” storage would be a good first step. And even there, we could start with a block level deduplication, though I am not yet sure how that should look like.

Static block sizes on the FS level will probably not get us far enough, as the NARs are compressed, and a single byte change in the containing data might change the compressed result of all compressed blocks after that, and if it is just due to a slight alignment change.

Then we have deduplication with floating block sizes, like the restic backup repository format uses. There again though I am not aware of any easy to use wrapper that would be able to apply a similar deduplication method to “dynamic” filesystems.

I remember though, that recently (Q4’22/Q1’23) a software got announced, that can do some partial NAR deduplication, but didn’t provide any parity or reconstruction mechanisms. Sadly I do not remember the name, nor did I follow the development and whether or not those things got fixed.

Solene · June 3, 2023, 11:34am

If there is interest to keep legacy packages for some reasons, maybe a community / 3rd party substituter could be created, Software Heritage seems to have some kind of interest into Nix, they could host this.

nagy · June 3, 2023, 11:42am

Then we have deduplication with floating block sizes, like the restic backup repository format uses. There again though I am not aware of any easy to use wrapper that would be able to apply a similar deduplication method to “dynamic” filesystems.
I remember though, that recently (Q4’22/Q1’23) a software got announced, that can do some partial NAR deduplication,

I assume, you are talking about this ?

It does content-defined chunking and deduplication on the uncompressed NARs using fastcdc.

graham33 · June 3, 2023, 12:21pm

@domenkozar @ron et al, I probably missed it above, but just wondering if we have Intelligent Tiering (Amazon S3 Intelligent-Tiering Storage Class | AWS) on the buckets?

vhodges · June 3, 2023, 12:38pm

Another option would be a move to Backblaze’s B2 @ $5/TB/Month and $0 egress via Fastly. Note, I’ve only used them for a small side project where it worked fine, but others may have experience with them at scale that would be good to hear.

But I also think a sensible data retention policy and prune/dedupe/GC would help control future growth rates.

pcs3rd · June 3, 2023, 12:44pm

I’m curious if this would be possible by straight-up using torrents.
transmission provides a cli and there appears to be FUSE filesystems that will pull torrents on the fly.
With ~~512Tb, it would be 4,400 users if they all provision 128gb and ~2,000 for caches of 512gb~~.
I can’t do math apparently. 425 TiB is ~58412GB, so about 460 users with a 128GB cache, or 115 users with a 512GB cache
Torrenting is usually fast for me, but the only thing I’d be really worried about is residential internet upload speeds.

If this were to happen, I’d happily donate a terabyte or two on a machine I have sitting.

deliciouslytyped · June 3, 2023, 12:50pm

In my opinion the problem with distributing storage load to users continues to be that you need “guaranteed” availability, and I think you don’t really have that with a swarm. There are really several issues here that will probably get conflated on-and-off.

bandwidth costs
hot storage costs
cold storage costs

This can be further subdivided by someone that has a clue (though see above, this has probably been largely discussed above).

I would very much hope it isn’t necessary to GC the archives (note we dont actually have full reproducibility for a lot of things, though I’m not sure how important this is; definitely keep the sources), but perhaps putting infrequently accessed data in higher latency storage will decrease costs?

Also, due to the growth curve, I expect a lot of the oldest data is actually not that much of the storage usage? How much could we save by GCing how much?

Edit: and of course, if one checks the finances thread it actually has already been stated that a lot of the data is in colder storage, and Eelco says that GC-ing would remove a lot of data. IIUC.

rnhmjoj · June 3, 2023, 12:52pm

The cache of all packages from, say, the last couple of year is much less than that: it’s should be totally feasible. The rest of the cache is accessed much less frequently and could probably be stored and served from a single host without a CDN.

rnhmjoj · June 3, 2023, 12:56pm

Also, an interesting point is: Half a petabyte in storage and 3 TB transfer a day? Shit. That's nothing, unless... | Hacker News.
Basically, if we agree we don’t care about 99.9999whatever% reliability we could self-host the cache and cut a substantial cost: it’s not that much data or bandwidth as it may seem.

joepie91 · June 3, 2023, 1:57pm

I think we should seriously consider running and/or organizing our own infrastructure here. I’m not a fan of being dependent on the goodwill of corporations - they do fundamentally exist to turn a profit, and “paying for a FOSS project’s storage” doesn’t generally do that, so eventually the tap runs dry. In this case we got advance notice, but if we switch over to another sponsored provider, there’s really no certainty that we won’t have much worse luck next time.

This amount of storage is well within range of what can be organized at small scale, and 3TB of transfer is essentially nothing; even a $5 VPS gives you that amount of traffic for free nowadays. At the very least we should probably be paying for a cheap storage/hosting provider with some sort of hard contractual obligation - whether it’s something like Backblaze B2 or just a plain ol’ server with lots of hard drives at Hetzner/OVH/Leaseweb/etc.

I don’t think decentralized storage is a viable option for a backing store; availability is a notorious problem for that, and the likelihood of data loss is very high.

Edit: A typical price for non-AWS/Azure/GCP storage at various providers is $5/TB/month.

vcunat · June 3, 2023, 2:01pm

No. (for running an S3 replacement on our own) We barely manage to maintain the infra we already have. And in one month we’re supposed to start maintaining this? If this doesn’t work for an hour, noone will be able to pull stuff from cache.nixos.org (except lucky stuff cached by Fastly I guess).

joepie91 · June 3, 2023, 2:05pm

This is why I also mentioned options like Backblaze B2; they charge $5 per TB per month in storage, and $10 per TB of traffic, if I’m not mistaken. For 500TB of data and 3TB of monthly transfer, this works out to around $2500 per month; which is significantly less than the current $9000/month estimate.

The “running your own infrastructure” option is for if costs need to be reduced further than that.

Of course there’s no reason we can’t move to something like B2 for an immediate cost reduction, and then in the long term work out something more sustainable, without the “shutdown in one month” pressure.

joepie91 · June 3, 2023, 2:11pm

Also, something that just occurred to me: we may be able to significantly reduce transfer-out costs by sending it through AWS Lightsail (their VPS service). If I’m not mistaken, traffic between S3 and Lightsail is free, and Lightsail itself has cheaper egress.

rnhmjoj · June 3, 2023, 2:27pm

We barely manage to maintain the infra we already have.

It sounds like you’re overworked: have you ever publicly called for volunteers to join the infrastructure team?
I think managing NixOS infrastructure is genuinely cool: given a large base of the NixOS users do sys admin either for work or as a hobby it shouldn’t be too hard to find someone.