Styx - alternate binary substitution mechanism

After working on nix-sandwich to use differential compression to address redundancy in the Nix substitution mechanism’s data transfer, I wanted to try addressing redundancy in local storage as well.

The main motivation is that many packages are quite similar, and it’s a waste to store and transfer all that data redundantly.

I think what I came up with is a bit crazy, but has some pretty interesting ideas. The highlights:

  • Breaks up files into chunks and only downloads new chunks.
  • But also: can use differential compression on sequences of chunks to get more benefit.
  • Materializes substituted store paths as read-only filesystems that share chunks behind the scenes.
  • But also: chunks are fetched on-demand and cached, without further userspace involvement. (This is like FUSE but better, using some bleeding-edge Linux features.)
  • A Styx-only binary cache would store much less data than a nar-based binary cache, though more than a CDC-based one. However, the benefits go all the way to local disk.
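To make the first highlight concrete, here is a minimal sketch of chunk-level deduplication. It is only an illustration, not Styx's actual code: the 64 KiB chunk size, the use of SHA-256, and the helper names `chunk_digests` and `missing_chunks` are all assumptions made for the example.

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # assumed fixed chunk size, for illustration only


def chunk_digests(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split data into fixed-size chunks aligned from offset 0 and hash each one."""
    return [
        hashlib.sha256(data[off:off + chunk_size]).hexdigest()
        for off in range(0, len(data), chunk_size)
    ]


def missing_chunks(wanted: list[str], local: set[str]) -> list[str]:
    """Return the digests that still need downloading, in order, without duplicates."""
    seen = set(local)
    todo = []
    for digest in wanted:
        if digest not in seen:
            seen.add(digest)
            todo.append(digest)
    return todo


# Two hypothetical "package versions" that differ only in their last chunk.
old = b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE + b"old tail"
new = b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE + b"new tail"

local_store = set(chunk_digests(old))          # chunks already on disk
to_fetch = missing_chunks(chunk_digests(new), local_store)
print(len(chunk_digests(new)), "chunks total,", len(to_fetch), "to download")
```

In this toy case only the final chunk differs between the two versions, so only that one chunk would be fetched. Differential compression across sequences of chunks is a separate layer on top of this and is not shown.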

There’s a ton more detail in the README in the repo, please take a look:

It’s still pretty experimental (let’s call it pre-alpha), but all the major features work. You can try it out yourself in a VM (or on bare metal if you’re brave). Note that the default configuration uses resources in my AWS account, and I may turn it off if it gets too expensive. But there’s Terraform to run your own.

Given all the recent happenings, I should add: so far I’ve integrated Styx with Nix 2.18, but I’m very open to integration with other implementations. The actual Nix glue is quite small; most of Styx runs in its own daemon and in the cloud.


Nice. Do you intend to use this for a production workload yourself or is this more of a hobby project?

Sounds a lot like the tvix-store.

The main advantage, though, is the lack of FUSE. This should make it a lot faster to access the filesystem. In my experience the main issue with FUSE is not the additional context switches but the double caching before and after the FUSE layer, as well as the additional userspace ↔ kernel copies, which make it slower than in-kernel filesystems. Since erofs doesn’t need to allow any mutations, it is potentially also faster than other filesystems that need locks in place for this.


tvix-store is not FUSE only. It’s another internal data model, and you can “view” it as a FUSE filesystem (that’s what we do today, mostly because it was the easiest to get going, and the virtiofs backend came for free too).

Nothing is preventing it from also “rendering out erofs”, like styx does, and exposing a filesystem without FUSE in some scenarios, if the performance becomes a problem (and all obvious bottlenecks are addressed). It’s just that no one has written that code so far :slight_smile:


On that note, I’d love to do some collaboration, styx looks cool :slight_smile:

It would get very interesting if styx could also create erofs from the tvix data model elements, though we mostly use blake3 digests for data and our own merkle structure for directory listings, rather than a modified NAR .ls format.


It’s a hobby project for the moment. I’d like to get it to a point where I can use it for my personal computers/servers. After that I’m not sure. This sort of system makes even more sense in a private cluster deployment scenario, since the server-side parts are more expensive to run than a regular binary cache (at least if you include the differential compression bits). I’d be happy to talk with and help anyone who wants to use it for something like that.

Thanks!

I definitely took some inspiration from the tvix store! I decided to try to get something usable with plain nix first, and was a little less ambitious with the overall protocol, partly for expediency and partly because of erofs limitations.

I’m certainly interested in collaborating or adapting some of these techniques for tvix. I just took another look at the castore docs, and some initial thoughts:

  • The digest algo and directory protocol are easy to adapt.

  • Currently Styx is definitely an example of “the chunking parameters … ‘bleed’ into the root hash of the entire data structure itself”, as described in the castore doc. A fixed chunk size in the fs images is forced by erofs (for now), so it was expedient to use the same throughout the server side too. It may be possible to change, though.

  • I don’t think it’s possible to just run styx against a castore right now: The key requirement is to get a digest for each 64KiB (or whatever fixed size) chunk of each file, aligned from the start of the file. I see in StatBlobRequest you can ask for more granular chunks, but there’s no guarantee on what you get, or even a hint to ask for a particular chunk size. Maybe the Bao could help here though? I haven’t looked into that part.

  • It would be nice to just take whatever chunks the server wants to give out there, pass them through to erofs, and have it compose a file out of them. As far as I know, erofs might be able to do that, but only using its compression features and mapped devices, rather than indexed devices and chunks. The mapped vs indexed thing should be pretty easy to change, though it introduces some limitations and might break when you hit 16 TiB of unique data. The compression stuff, though, is completely undocumented and I haven’t even tried to figure it out yet. I’m not totally sure what’s possible there.
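The point about chunking parameters “bleeding” into the root hash can be shown with a small sketch. This is not the Styx or castore format: `manifest_digest` is a hypothetical root identifier built by hashing the concatenated per-chunk digests, and SHA-256 stands in for whatever digest a real system uses.

```python
import hashlib


def chunk_digests(data: bytes, chunk_size: int) -> list[bytes]:
    """Digest each fixed-size chunk, aligned from the start of the file."""
    return [
        hashlib.sha256(data[off:off + chunk_size]).digest()
        for off in range(0, len(data), chunk_size)
    ]


def manifest_digest(data: bytes, chunk_size: int) -> str:
    """Hypothetical root identifier: hash of the concatenated chunk digests."""
    h = hashlib.sha256()
    for digest in chunk_digests(data, chunk_size):
        h.update(digest)
    return h.hexdigest()


data = bytes(range(256)) * 1024  # 256 KiB of sample content

root_64k = manifest_digest(data, 64 * 1024)
root_128k = manifest_digest(data, 128 * 1024)
flat = hashlib.sha256(data).hexdigest()  # independent of any chunking

# Same bytes, different chunk size, different root identifier.
print(root_64k != root_128k)
```

A client that needs digests for 64 KiB aligned chunks therefore cannot reuse a root computed with other parameters, while a flat hash of the content (or a tree hash with parameters fixed by its specification) stays stable whatever chunks the server hands out.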


@dnr

I definitely took some inspiration from the tvix store! I decided to try to get something usable with plain nix first

hm, i thought The Virus Lounge’s lazy-deps did do something like this for plain nix.
if so, how does Styx compare?

This seems completely unrelated. lazy-deps merely delays realisation of executables in derivations by wrapping them into shell scripts. It’s quite simple-minded. I’ve recently built a slightly more generic variant that strikes a different balance between laziness and correctness by construction: GitHub - fricklerhandwerk/lazy-drv: Realise Nix derivations on demand.

Styx is apparently about the store layer, where lazy realisation is a possible feature.


Right, Styx is working at the filesystem layer and only on derivations substituted from a binary cache (it has no effect on locally-built derivations). When using it, you’d see the entire store path show up on disk, but the contents of files would be downloaded lazily. Actually even parts of files, e.g. very large executables might not need to be fully downloaded.

Since lazy-drv used man pages as an example, the parallel example for Styx would be: man pages and docs could be packaged alongside programs, with no need for separate outputs. They’d appear on disk and only be downloaded or consume local disk space when read.
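The lazy behaviour described above can be modelled in a few lines of userspace code. This is only a conceptual sketch: in Styx the on-demand fetching happens through the kernel, not through a class like this, and `LazyFile`, `fetch_chunk` and the 64 KiB chunk size are assumptions for illustration.

```python
CHUNK_SIZE = 64 * 1024  # assumed fixed chunk size


class LazyFile:
    """A file whose size is known up front but whose chunks are fetched on first read."""

    def __init__(self, size, fetch_chunk):
        self.size = size
        self.fetch_chunk = fetch_chunk  # hypothetical callback: chunk index -> bytes
        self.cache = {}                 # chunk index -> bytes already fetched
        self.fetches = 0

    def _chunk(self, index):
        if index not in self.cache:
            self.cache[index] = self.fetch_chunk(index)
            self.fetches += 1
        return self.cache[index]

    def read(self, offset, length):
        end = min(offset + length, self.size)
        out = bytearray()
        pos = offset
        while pos < end:
            index, start = divmod(pos, CHUNK_SIZE)
            take = min(CHUNK_SIZE - start, end - pos)
            out += self._chunk(index)[start:start + take]
            pos += take
        return bytes(out)


# Stand-in for a remote chunk store holding a 10-chunk "executable".
remote = bytes(i % 251 for i in range(10 * CHUNK_SIZE))


def fetch(index):
    return remote[index * CHUNK_SIZE:(index + 1) * CHUNK_SIZE]


f = LazyFile(len(remote), fetch)
header = f.read(0, 100)                # touches only the first chunk
tail = f.read(len(remote) - 50, 50)    # touches only the last chunk
print(f.fetches, "of", len(remote) // CHUNK_SIZE, "chunks fetched")
```

Reading the first and last few bytes pulls in two chunks out of ten; the rest never leave the server unless something reads them, and repeated reads are served from the cache.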


Each blob / “regular or executable file content” is its own element, rather than the entire store path contents. You can fetch most of this lazily and only if you need it (only fetch the directories once you care about file listings, maybe even fetch all in bulk once, only fetch blob contents once you access, …).

That’s as far as the tvix data model is going. I see erofs as another representation/lens into this data. Maybe we don’t need so many things to be lazy, at least for starters, when it comes to “rendering” erofs? You could have fetched the directories, you could even have some basic idea about chunking, and you might even have some of the blob data fetched (as it was previously accessed already, …)