Styx - alternate binary substitution mechanism

dnr · May 17, 2024, 9:12am

After working on nix-sandwich to use differential compression to address redundancy in the Nix substitution mechanism’s data transfer, I wanted to try addressing redundancy in local storage as well.

The main motivation is that many packages are quite similar, and it’s a waste to store and transfer all that data redundantly.

I think what I came up with is a bit crazy, but has some pretty interesting ideas. The highlights:

Breaks up files into chunks and only downloads new chunks.
But also: can use differential compression on sequences of chunks to get more benefit.
Materializes substituted store paths as read-only filesystems that share chunks behind the scenes.
But also: chunks are fetched on-demand and cached, without further userspace involvement. (This is like FUSE but better, using some bleeding-edge Linux features.)
A Styx-only binary cache would store much less data than a nar-based binary cache, though more than a CDC-based one. However, the benefits go all the way to local disk.
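The first highlight (split files into chunks, download only new ones) can be sketched in a few lines. This is a minimal illustration, not Styx's code. The 64 KiB chunk size matches the figure dnr mentions later in the thread. SHA-256, the function names, and the in-memory set that stands in for the local chunk store are assumptions for illustration.

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # assumed fixed chunk size (64 KiB)


def chunk_digests(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split data into fixed-size chunks aligned to the start of the file
    and return one hex digest per chunk (the last chunk may be short)."""
    return [
        hashlib.sha256(data[off:off + chunk_size]).hexdigest()
        for off in range(0, len(data), chunk_size)
    ]


def missing_chunks(digests: list[str], local_store: set[str]) -> list[str]:
    """Return the digests that still need downloading, skipping chunks that
    are already stored locally and duplicates within the same request."""
    seen: set[str] = set()
    needed = []
    for d in digests:
        if d not in local_store and d not in seen:
            seen.add(d)
            needed.append(d)
    return needed


# Two "versions" of a 200 KiB file that share their first 128 KiB.
old = bytes(200 * 1024)
new = old[:128 * 1024] + b"x" * (72 * 1024)
store = set(chunk_digests(old))
print(len(chunk_digests(new)), len(missing_chunks(chunk_digests(new), store)))  # 4 2
```

Only the two chunks that contain the changed bytes need to be fetched. The two leading chunks are already present locally.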

There’s a ton more detail in the README in the repo, please take a look:

It’s still pretty experimental (let’s call it pre-alpha), but all the major features work. You can try it out yourself in a VM (or on bare metal if you’re brave). Note that the default configuration uses resources in my AWS account, and I may turn it off if it gets too expensive. But there’s Terraform config to run your own.

Given all the recent happenings, I should add: so far I’ve integrated Styx with Nix 2.18, but I’m very open to integration with other implementations. The actual Nix glue is quite small; most of Styx runs in its own daemon and in the cloud.

Mic92 · May 24, 2024, 4:57pm

Nice. Do you intend to use this for a production workload yourself or is this more of a hobby project?

ryantm · May 25, 2024, 2:50am

Sounds a lot like the tvix-store.

Mic92 · May 25, 2024, 7:27am

The main advantage here is the lack of FUSE. This should make accessing the filesystem a lot faster. In my experience, the main issue with FUSE is not the additional context switches it requires but the double caching before and after FUSE, as well as the additional userspace ↔ kernel copies that make it slower than in-kernel filesystems. Since erofs doesn’t need to allow any mutations, it is potentially also faster than other filesystems that need locks in place for this.

flokli · May 25, 2024, 12:07pm

tvix-store is not FUSE-only. It’s another internal data model, and you can “view” it as a FUSE filesystem (that’s what we do today, mostly because it was the easiest to get going, and the virtiofs backend came for free too).

Nothing is preventing it from also “rendering out erofs”, like styx does, and exposing a filesystem in some scenarios without FUSE, if the performance becomes a problem (and all obvious bottlenecks are addressed). It’s just that no one has written that code so far.

flokli · May 25, 2024, 12:21pm

On that note, I’d love to collaborate; styx looks cool.

It would get very interesting if styx could also create erofs images from the tvix data model elements, though we mostly use blake3 digests for data and our own Merkle structure for directory listings, rather than a modified NAR .ls format.

dnr · May 28, 2024, 5:40am

It’s a hobby project for the moment. I’d like to get it to a point where I can use it for my personal computers/servers. After that I’m not sure. This sort of system makes even more sense in a private cluster deployment scenario, since the server-side parts are more expensive to run than a regular binary cache (at least if you include the differential compression bits). I’d be happy to talk with and help anyone who wants to use it for something like that.

dnr · May 28, 2024, 5:48am

Thanks!

I definitely took some inspiration from the tvix store! I decided to try to get something usable with plain nix first, and was a little less ambitious with the overall protocol, partly for expediency and partly because of erofs limitations.

I’m certainly interested in collaborating or adapting some of these techniques for tvix. I just took another look at the castore docs, and some initial thoughts:

The digest algo and directory protocol are easy to adapt.
Currently Styx is definitely an example of “the chunking parameters … ‘bleed’ into the root hash of the entire data structure itself”, as described in the castore doc. A fixed chunk size in the fs images is forced by erofs (for now), so it was expedient to use the same throughout the server side too. It may be possible to change, though.
I don’t think it’s possible to just run styx against a castore right now: The key requirement is to get a digest for each 64KiB (or whatever fixed size) chunk of each file, aligned from the start of the file. I see in StatBlobRequest you can ask for more granular chunks, but there’s no guarantee on what you get, or even a hint to ask for a particular chunk size. Maybe the Bao could help here though? I haven’t looked into that part.
It would be nice to just take whatever chunks the server wants to give out there and pass them through to erofs and have it compose a file out of them. As far as I know, erofs might be able to do that, but only using its compression features and mapped devices, rather than indexed devices and chunks. The mapped vs indexed thing should be pretty easy to change, though it introduces some limitations and might break when you hit 16 TiB of unique data. The compression stuff, though, is completely undocumented and I didn’t even try to figure it out yet. I’m not totally sure what’s possible there.
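The castore point about chunking parameters “bleeding” into the root hash shows up even in a toy example. If a file's identity is computed over its list of aligned fixed-size chunk digests, the same bytes produce different roots for different chunk sizes. This sketch makes several assumptions: it uses SHA-256 and a simple hash over the concatenated chunk digests. As described in the thread, tvix instead identifies blobs by a BLAKE3 digest and uses its own Merkle structure, and Styx's real manifest format is not shown here.

```python
import hashlib


def chunked_root(data: bytes, chunk_size: int) -> str:
    """Hash each aligned fixed-size chunk, then hash the concatenated chunk
    digests. The chunk size therefore affects the resulting root."""
    h = hashlib.sha256()
    for off in range(0, len(data), chunk_size):
        h.update(hashlib.sha256(data[off:off + chunk_size]).digest())
    return h.hexdigest()


data = bytes(range(256)) * 1024  # 256 KiB of sample data
print(chunked_root(data, 64 * 1024) == chunked_root(data, 128 * 1024))  # False
```

Decoupling the two means keeping a file identity that does not depend on chunk boundaries and treating any particular chunk list as separate metadata. That extra metadata is what the 64 KiB-aligned erofs chunks would need.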

kiara · May 28, 2024, 12:38pm

@dnr

I definitely took some inspiration from the tvix store! I decided to try to get something usable with plain nix first

hm, i thought The Virus Lounge’s lazy-deps did do something like this for plain nix.
if so, how does Styx compare?

fricklerhandwerk · May 28, 2024, 3:30pm

This seems completely unrelated. lazy-deps merely delays realisation of executables in derivations by wrapping them into shell scripts. It’s quite simple-minded. I’ve recently built a slightly more generic variant that strikes a different balance between laziness and correctness by construction: GitHub - fricklerhandwerk/lazy-drv: Realise Nix derivations on demand.

Styx is apparently about the store layer, where lazy realisation is a possible feature.

dnr · May 28, 2024, 4:13pm

Right, Styx is working at the filesystem layer and only on derivations substituted from a binary cache (it has no effect on locally-built derivations). When using it, you’d see the entire store path show up on disk, but the contents of files would be downloaded lazily. Actually even parts of files, e.g. very large executables might not need to be fully downloaded.
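Partial-file laziness works because a read at some offset only touches the chunks that overlap that byte range. The sketch below maps a read onto fixed-size chunk indices and fetches only the uncached ones. `LazyFile` and the `fetch_chunk` callback are hypothetical names. In Styx the kernel (erofs) drives these requests, not a userspace object like this one.

```python
from typing import Callable


class LazyFile:
    """A file split into fixed-size chunks that are fetched on demand
    through a caller-supplied fetch_chunk(index) callback and then cached."""

    def __init__(self, size: int, chunk_size: int,
                 fetch_chunk: Callable[[int], bytes]):
        self.size = size
        self.chunk_size = chunk_size
        self.fetch_chunk = fetch_chunk
        self.cache: dict[int, bytes] = {}  # chunk index -> chunk bytes

    def read(self, offset: int, length: int) -> bytes:
        """Return bytes [offset, offset+length), clamped to the file size,
        fetching only chunks that overlap the range and aren't cached."""
        end = min(offset + length, self.size)
        if offset >= end:
            return b""
        first = offset // self.chunk_size
        last = (end - 1) // self.chunk_size
        parts = []
        for idx in range(first, last + 1):
            if idx not in self.cache:
                self.cache[idx] = self.fetch_chunk(idx)
            parts.append(self.cache[idx])
        start = offset - first * self.chunk_size
        return b"".join(parts)[start:start + (end - offset)]


content = bytes(range(256)) * 4096  # a 1 MiB "executable"
fetched = []


def fetch(idx: int) -> bytes:
    fetched.append(idx)
    return content[idx * 65536:(idx + 1) * 65536]


f = LazyFile(len(content), 65536, fetch)
assert f.read(70000, 100) == content[70000:70100]
print(fetched)  # [1]
```

Reading 100 bytes from the middle of a 1 MiB file pulls in just one 64 KiB chunk. The other 15 are never downloaded unless something reads them.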

Since lazy-drv used man pages as an example, the parallel example for Styx would be: man pages and docs could be packaged alongside programs, with no need for separate outputs. They’d appear on disk and only be downloaded or consume local disk space when read.

flokli · May 31, 2024, 5:33am

Each blob / “regular or executable file content” is its own element, rather than the entire store path contents. You can fetch most of this lazily and only if you need it (only fetch the directories once you care about file listings, maybe even fetch all in bulk once, only fetch blob contents once you access, …).

That’s as far as the tvix data model goes. I see erofs as another representation/lens into this data. Maybe we don’t need so many things to be lazy, at least for starters when it comes to “rendering” erofs? You could have fetched the directories, you could even have some basic idea about chunking, and you might even have some of the blob data fetched (as it was already accessed previously, …)

dnr · October 21, 2024, 7:19am

Hey, it’s been a while, I thought I’d share some updates on the Styx project:

CI! There’s now a CI system building a basic system closure with Styx at each NixOS channel bump, and generating Styx manifests for all packages too. This has a few consequences:
- You don’t have to build a custom kernel yourself, nor a patched Nix, nor Styx itself.
- You don’t have to wait for manifests to be generated on-demand for many common packages.
Support for bare files.
Diffing through .gz (for man pages) or through .xz (for Linux kernel modules).
Improvements to chunk selection for diffing. There’s still room for improvement but it’s a little smarter now. For example, it automatically requests larger groups of chunks for repeated reads to large files.
styx prefetch to force fetching a package or specific files.
styx materialize to use differential compression but create files on a “real” filesystem (with extent sharing so it doesn’t double space usage!).
styx vaporize to import existing packages into Styx for use as compression bases.
Various other internal improvements and refactoring, more tests and test framework improvements.
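Diffing through .gz helps because a small change to the uncompressed text tends to alter much of the compressed stream after it. Two man pages can share almost all of their content while their .gz files share few bytes. Below is a minimal sketch of the idea: decompress both versions, delta-encode the new content against the old, and rebuild it on the client. It uses `difflib` as a stand-in for a real delta compressor, which is an assumption since the thread doesn't say what Styx's differ does internally. The function names are hypothetical. Producing byte-identical .gz output afterwards would also require recompressing with exactly the original compressor and settings, and this sketch does not attempt that.

```python
import difflib
import gzip


def make_delta(old: bytes, new: bytes) -> list:
    """Encode `new` as ops against `old`:
    ("copy", start, end) copies old[start:end]; ("insert", data) is literal."""
    sm = difflib.SequenceMatcher(None, old, new, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))
        # "delete": bytes only present in old, nothing to emit
    return ops


def apply_delta(old: bytes, ops: list) -> bytes:
    """Rebuild the new content from the old content plus the delta ops."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


def diff_through_gzip(old_gz: bytes, new_gz: bytes) -> list:
    """Delta-encode the decompressed contents rather than the .gz bytes."""
    return make_delta(gzip.decompress(old_gz), gzip.decompress(new_gz))


old_page = b".TH FOO 1\n" + b"Some text about options.\n" * 50
new_page = old_page.replace(b"FOO 1", b"FOO 2", 1)
ops = diff_through_gzip(gzip.compress(old_page), gzip.compress(new_page))
assert apply_delta(old_page, ops) == new_page
literal_bytes = sum(len(op[1]) for op in ops if op[0] == "insert")
print(literal_bytes)  # 1
```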

I’ve been using Styx on my personal laptop for a few months now, configured to include about a third of the packages in my system (leaf packages that aren’t required to boot into a basic environment and recover if something breaks). It’s been working well so far, though the benefits are limited since it’s not being used on core packages. I haven’t completely broken my system yet, though I came close a couple of times, so that’s good.

Styx still can’t be used on packages required early in boot, like the kernel itself or linux-firmware. I’m hoping the new “materialize” feature will allow getting many of the benefits of Styx even for those packages (minus on-demand fetching), but I haven’t tested that on a real system yet.
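Materialize depends on extent sharing. On filesystems such as Btrfs or XFS (with reflink enabled), a file can be cloned so that both copies point at the same on-disk extents until one of them is modified. Here is a minimal Linux-only sketch using the `FICLONE` ioctl (request number 0x40049409). It falls back to a plain copy where cloning isn't supported, for example on ext4 or tmpfs. The function name is hypothetical, and this illustrates the kernel mechanism only, not how Styx itself implements materialize.

```python
import fcntl
import os
import shutil
import tempfile

FICLONE = 0x40049409  # _IOW(0x94, 9, int) from <linux/fs.h>


def clone_or_copy(src_path: str, dst_path: str) -> bool:
    """Create dst_path with the same content as src_path.
    Returns True if extents were shared via FICLONE, or False if the
    filesystem (or OS) doesn't support it and the bytes were copied."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        try:
            fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
            return True
        except OSError:
            # e.g. EOPNOTSUPP, EXDEV, ENOTTY: no extent sharing available here.
            src.seek(0)
            shutil.copyfileobj(src, dst)
            return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        a, b = os.path.join(d, "a"), os.path.join(d, "b")
        with open(a, "wb") as f:
            f.write(b"hello" * 1000)
        print("extents shared" if clone_or_copy(a, b) else "fell back to copy")
```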

In the near term, I’m planning to work on:

Testing materialize on the rest of my system
GC of the local chunk store
Larger chunks for larger packages to reduce overhead
Using Styx to speed up downloading nixpkgs tarballs
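The “larger chunks for larger packages” item is about per-chunk overhead. Each chunk needs at least its own digest, so the digest count grows with package size divided by chunk size. Here is a quick back-of-the-envelope helper. It assumes 32-byte digests (e.g. SHA-256) and treats the package as one contiguous run of data, ignoring per-file rounding. The chunk sizes are illustrative.

```python
def digest_overhead(package_bytes: int, chunk_size: int, digest_len: int = 32):
    """Return (number of chunks, bytes of digests) for a package."""
    n_chunks = -(-package_bytes // chunk_size)  # ceiling division
    return n_chunks, n_chunks * digest_len


GiB, KiB = 1 << 30, 1 << 10
for cs in (64 * KiB, 256 * KiB, 1024 * KiB):
    n, overhead = digest_overhead(1 * GiB, cs)
    print(f"{cs // KiB:>5} KiB chunks: {n:>6} chunks, {overhead // KiB:>4} KiB of digests")
```

For a 1 GiB package that works out to 16384 chunks (512 KiB of digests) at 64 KiB, versus 4096 chunks (128 KiB) at 256 KiB. The tradeoff is coarser deduplication: a one-byte change invalidates a larger chunk.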

If anyone wants to try out Styx on a personal machine or VM, or hack on it, please get in touch! The README has some instructions but of course this is still alpha-quality software with rough edges.

If anyone working on binary caches or CI systems or similar wants to talk about possible integration, offering Styx as a feature of their cache, etc., also please get in touch!

arianvp · October 22, 2024, 9:00am

I was curious how the announced deprecation of erofs-over-fscache affects Styx? Do the new primitives they were replaced with suffice for your use case?

dnr · October 23, 2024, 6:03am

Good question! I’m pretty sure the result will be a big improvement in several ways:

The fscache stuff works, but it’s pretty, uh, quirky. Styx goes through various contortions and does stuff you’re not supposed to do with fscache, like directly manipulate the cache backing files. The cache registration is also fragile: if the fd ever gets closed, then new data can’t be supplied to existing mounts, so Styx has to use the systemd fd store to keep the fd safe.
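For readers unfamiliar with the systemd fd store: a service can hand a file descriptor to systemd by sending an `FDSTORE=1` notification to the socket named in `$NOTIFY_SOCKET`, with the descriptor attached via SCM_RIGHTS. systemd then passes it back on restart through `$LISTEN_FDS`/`$LISTEN_FDNAMES`. Below is a minimal Python sketch of that notification. The function name and the default `fdname` are hypothetical, and this is not Styx's own code. The unit also needs `FileDescriptorStoreMax=` set to a non-zero value and notification access (for example `Type=notify`).

```python
import os
import socket


def store_fd(fd: int, fdname: str = "fscache") -> bool:
    """Ask systemd to keep `fd` in this service's fd store.
    Returns False if not running under systemd (no $NOTIFY_SOCKET)."""
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return False
    if addr.startswith("@"):
        addr = "\0" + addr[1:]  # abstract-namespace socket (Linux)
    msg = f"FDSTORE=1\nFDNAME={fdname}".encode()
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.connect(addr)
        # Send the message with the descriptor attached as SCM_RIGHTS data.
        socket.send_fds(sock, [msg], [fd])
    return True
```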

Also, fscache requires the daemon running to mount any erofs even if all the data is present in the cache already, which is pretty unfortunate and makes it hard to support early in boot.

I agree with the erofs developers that fanotify pre-content and plain files is a cleaner way to do things, and I’m hopeful it’ll let me get rid of some of the more fragile hacks. Plus it appears to be going in mainline so it won’t need an experimental kernel.

It’ll take a bit of work to migrate to the new mechanism but it shouldn’t be a huge deal.