Reducing stat calls for library loading during application startup?

pennae · January 1, 2023, 9:26pm

@winter recently ported a guix feature to nixos that avoids a lot of stat (and other) syscalls by the dynamic linker during application startup. we kind of didn’t like the approach guix took (creating a per-package ld.so.cache file) because it doesn’t work well (or even at all) when cross-compiling and has some other drawbacks, so we went a bit wild and made a variant that creates a new kind of cache and patchelf code to write that new kind of cache. in our tests so far this has worked out very well, cutting switch-to-configuration time in a qemu-emulated armv7 build by almost 50%.
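for readers new to the problem, here is a rough model of where the stat calls come from. it is a simplified sketch, not glibc's actual algorithm: it assumes the loader probes each search directory in order for every needed library and stops at the first hit, and it ignores hwcaps subdirectories, LD_LIBRARY_PATH and any caching. the directory and library names are made up for illustration.

```python
def count_probes(search_dirs, needed, contents):
    """Count filesystem probes under a simplified first-match search.

    search_dirs: ordered list of directory names (the rpath/runpath).
    needed:      list of library names to resolve.
    contents:    dict mapping directory name -> set of libraries it holds.
    """
    probes = 0
    for lib in needed:
        for directory in search_dirs:
            probes += 1  # one lookup, whether it fails or succeeds
            if lib in contents.get(directory, set()):
                break
    return probes


# Hypothetical package: 20 libraries, each in its own store directory.
dirs = [f"/nix/store/hash{i}-dep{i}/lib" for i in range(20)]
contents = {d: {f"libdep{i}.so"} for i, d in enumerate(dirs)}
needed = [f"libdep{i}.so" for i in range(20)]

print(count_probes(dirs, needed, contents))  # 210 probes for 20 libraries
```

under this model the number of probes grows roughly with the square of the number of dependencies, whereas a cache that maps each library name directly to its path would need one lookup per library.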

this is a bit of a signal boost for these attempts. does nixos/nixpkgs want to have something like this? which one does it want? does nixos maybe want something else completely? @fzakaria has shrinkwrapping on offer that has the same optimization effects, at the cost of making LD_LIBRARY_PATH less useful.

or does nixos want nothing of those at all? should this go through the architecture team instead? we don’t know. (we’re just very reluctant to put in a lot of effort to add tests, corner case handling etc to our draft if it’ll just be thrown away, but we will complete it if there’s agreement to have it!)

winter · January 1, 2023, 9:27pm

cc @Infinisil wrt NAT comment

blaggacao · January 1, 2023, 10:30pm

I understand that you want to improve this and are asking the community for a mandate to go ahead.

Unfortunately that’s probably not something the community will provide (as nobody may feel in possession of a mandate that they could pass on).

Therefore, I think it is important to put some effort into facilitating and forming a consensus.

Maybe you could host a couple of special interest meetings with the current most prominent stakeholders.

In proper meeting minutes in the style of the NAT you could show and teach the broader community about the considerations and decision making that you have chosen as a “group of experts”.

A decision prepared in this way is likely to mobilize enough legitimacy and momentum to be swiftly and jointly implemented.

I don’t know if this is particularly helpful advice, but this is how I would try to unblock your momentum and motivation to fix this, and to ensure that you won’t get stuck with unstructured decision making or a lack of mandate.

I hope this meta opinion helps in some way, though, because it always bothers me when good initiatives get stuck. And I fully hear you on being wary of spending effort without a mandate.

So let’s grow yourselves a mandate!

qyliss · January 2, 2023, 2:17am

Something that I’ve thought about before but never taken further would be to just have a directory somewhere in a package output that had symlinks to all the libraries used by the package. Each binary in the package would then be linked via rpath only to that directory, so it would only be a single stat per library.

The advantage to doing it this way would be that it completely avoids dynamic linker hacks, and would therefore work the same between e.g. musl and Glibc. Do you think that would work?
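A minimal sketch of the symlink-directory idea, assuming the libraries have already been resolved to absolute paths at build time. The directory name `lib-links` and the library names are hypothetical, and a temporary directory stands in for the store.

```python
import os
import tempfile


def build_link_dir(link_dir, resolved):
    """Create one symlink per library inside link_dir.

    resolved: dict mapping soname -> absolute path of the real file.
    """
    os.makedirs(link_dir, exist_ok=True)
    for soname, target in resolved.items():
        os.symlink(target, os.path.join(link_dir, soname))


with tempfile.TemporaryDirectory() as root:
    # Stand-ins for two dependencies living in separate store paths.
    resolved = {}
    for name in ("libfoo.so.1", "libbar.so.2"):
        dep_dir = os.path.join(root, "store", name + "-dep", "lib")
        os.makedirs(dep_dir)
        path = os.path.join(dep_dir, name)
        open(path, "w").close()
        resolved[name] = path

    # Hypothetical per-package directory that the rpath would point at.
    link_dir = os.path.join(root, "store", "pkg", "lib-links")
    build_link_dir(link_dir, resolved)
    linked = sorted(os.listdir(link_dir))
    print(linked)  # every needed library is reachable from one directory
```

With the rpath set to that single directory, each library lookup would hit on the first directory searched.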

pennae · January 2, 2023, 2:27am

we did think about that and it would work, but we decided not to do it because there are actually a lot of drawbacks:

- it’d require one unique directory per dso in a package to work properly (or rpaths would intermingle)
  - with a globally unique name, or merging (buildEnv, symlinkJoin etc) will cause problems
  - ~~with the same location problems as the guix ld.so.cache hack (which restricts caching to dsos located at a specific depth in the tree)~~ (yeah that’s nonsense if the directory is in rpath)
  - needs a lot of extra inodes (most of which can be merged away with store optimization, but that’s not on by default)
- still causes a lot of extraneous stats for the hwcaps variants glibc prefers over non-hwcaps libraries (we don’t ship anything using hwcaps)

linker hacks seemed like a much better option

addendum: also, setting rpath to only that directory could break dlopen, so the best we could do is add this directory to the head of rpath (adding even more penalty to dlopen lookups). it would also obscure the actual location of libraries given by ldd (which isn’t that bad) and somewhat violate glibc’s assumptions about how libraries are linked (x.so → x.so.N → x.so.A.B.C).

qyliss · January 2, 2023, 2:55am

Please bear with me, as I’m not a dynamic linker expert, so I might just be incorrect / missing something obvious.

What’s wrong with every DSO in a package having the same rpath?

If the package has rpath set to something in its store path, using that package in a buildEnv or symlinkJoin wouldn’t change that its DSOs’ rpath is set to its own store path, right?

Shouldn’t be a problem on modern filesystems that don’t limit inodes though, right?

That is indeed unfortunate — but maybe it would be worth having a portable solution that does quite a good job (O(n) → O(2k)), and then having a Glibc-specific linker hack that takes that to O(k)?

This is the part where I really feel I’m lacking understanding of dynamic linkers, but I don’t understand how this would be any different to how we use rpath today. Rather than adding a bunch of /nix/store paths, we’d just add a single one. How would that change how dlopen works, when all libraries that were accessible from the previous list of rpaths are now all accessible from the same one?

pennae · January 2, 2023, 3:16am

it’s fine as long as all DSOs had the same rpath before that transformation. guess we should’ve said “one directory per unique DT_RUNPATH”, not “one per DSO”, sorry about that. this is especially important when buildEnv gets involved since it’ll merge directories, thus (indirectly) merging rpaths if we don’t give each rpath proxy directory a unique name.

it would not change the rpath, but if we used a static path for this link directory we’ll run into the same problem that the guix approach has: multiple packages when buildEnv’d could have symlinks to different versions of the same library, causing buildEnv to fail. so ultimately we need something like one directory per rpath hash, with a name that includes that hash, to not run into trouble at some point.
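a sketch of what "one directory per rpath hash" could look like. the `rpath-links-` prefix, the choice of sha256 and the 16-character truncation are all assumptions made for illustration, not something the draft implements.

```python
import hashlib


def proxy_dir_name(runpath_entries):
    """Derive a directory name from the ordered list of runpath entries.

    Identical runpaths share a name; different runpaths get different
    names, so merged trees do not mix their symlinks.
    """
    joined = ":".join(runpath_entries)
    digest = hashlib.sha256(joined.encode()).hexdigest()[:16]
    return f"rpath-links-{digest}"  # hypothetical naming scheme


# Placeholder store paths, not real ones.
a = ["/nix/store/aaaa-glibc/lib", "/nix/store/bbbb-zlib/lib"]
b = ["/nix/store/aaaa-glibc/lib", "/nix/store/cccc-zlib/lib"]
print(proxy_dir_name(a))
print(proxy_dir_name(b))
```

since the name depends only on the runpath contents, two DSOs with the same runpath can share one directory, and buildEnv never sees two different directories under the same name.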

once you have thousands of these directories around, the cost of inodes does add up. we don’t remember the exact statistics, but there’s a maximum number of bytes ext4 can inline into its inodes before it has to spill into an entire disk block to contain the target for a symlink. that limit for an mkfs call without special configuration is about 60 bytes, which isn’t even enough to hold the shortest path we found on coreutils. store optimization would make this problem less severe, but it’d still be about 4k storage overhead per DSO in the store in that case.
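to get a feel for the numbers: the /nix/store/ prefix plus the 32-character hash already take 43 bytes, so little room is left for a package name and library path. the sketch below checks a symlink target against the roughly 60-byte figure quoted above. the exact boundary is treated as an assumption here, and the store path is a placeholder with an invented package name.

```python
import os

INLINE_LIMIT = 60  # bytes, the approximate figure quoted above


def fits_inline(target, limit=INLINE_LIMIT):
    """Return True if the symlink target is short enough to be stored inline."""
    return len(os.fsencode(target)) < limit


# Placeholder store path: a hash of 32 zeros and an invented package name.
example = "/nix/store/" + "0" * 32 + "-example-1.0/lib/libexample.so.1"
print(len(example), fits_inline(example))
print(fits_inline("libexample.so.1"))
```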

we’d have to research how all the linkers involved behave exactly, but in principle there’s nothing wrong with that approach. (as mentioned earlier, and apologies if we added that while you were typing, glibc expects a different layout of .so links and other linkers might as well. if we’re not careful such a directory layout may actually break stuff.)

if all of them are accessible, in theory not at all! but that requires even more linking than covering only the DT_NEEDED entries. we’d also have to check each linker individually whether it assumes a special layout of .so links, and how the original path a DSO was found at is used to derive $ORIGIN.

pennae · January 2, 2023, 8:32am

@qyliss we’ve checked musl’s ldso. looks like musl does not do any symlink resolution to determine $ORIGIN, so we’d have to leave alone all rpath entries that could potentially point to a DSO that uses $ORIGIN in its own rpath. (we’d also have to leave alone all entries that use any of the other linker variables, and that’s all assuming that all the sets of dsos reachable from rpath entries are pairwise disjoint). we’d also completely break $ORIGIN in LD_LIBRARY_PATH (and all other places it’s allowed in), so proxy directories have even worse side-effects than the guix per-package ld.so.cache
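to illustrate why symlink resolution matters for $ORIGIN, here is a small sketch that compares the two possible interpretations for a library reached through a proxy symlink. it only models the path arithmetic; which interpretation a given loader uses is taken from the observation above, not verified by this code. all directory and file names are hypothetical.

```python
import os
import tempfile


def origin_unresolved(path):
    """$ORIGIN as the directory of the path the library was found at."""
    return os.path.dirname(path)


def origin_resolved(path):
    """$ORIGIN as the directory of the file after following symlinks."""
    return os.path.dirname(os.path.realpath(path))


with tempfile.TemporaryDirectory() as tmp:
    root = os.path.realpath(tmp)  # in case the temp dir is itself a symlink
    real_dir = os.path.join(root, "store", "dep", "lib")
    proxy_dir = os.path.join(root, "store", "pkg", "lib-links")
    os.makedirs(real_dir)
    os.makedirs(proxy_dir)

    real_lib = os.path.join(real_dir, "libfoo.so")
    open(real_lib, "w").close()
    proxy_lib = os.path.join(proxy_dir, "libfoo.so")
    os.symlink(real_lib, proxy_lib)

    unresolved = origin_unresolved(proxy_lib)
    resolved = origin_resolved(proxy_lib)
    print(unresolved == proxy_dir)
    print(resolved == real_dir)
```

without resolution, a library that refers to $ORIGIN in its own rpath would search relative to the proxy directory rather than the directory it actually lives in.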

qyliss · January 2, 2023, 3:52pm

Okay, thanks for investigating!

You obviously know way more about this than me, so I’m happy to accept that it wouldn’t work well.

Thanks for your work on this!

danielbarter · January 2, 2023, 10:23pm

It is worth noting that for the average user running nix on a laptop with an SSD, the stat storms are annoying because they slow down application startup, but they aren’t a huge problem. They are much more troublesome when you try and scale up.

Imagine that you have a nix store on an NFS share which is mounted on each node in a cluster. If you try to spin up 1000 pythons across the cluster, and each needs to perform 50,000 stats while looping through the rpath searching for shared objects, it creates an enormous amount of network traffic.
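As a back-of-the-envelope check of that scenario, using only the two figures given above:

```python
processes = 1000            # interpreters started across the cluster
stats_per_process = 50_000  # stat calls each one performs at startup

total_stats = processes * stats_per_process
print(f"{total_stats:,}")  # 50,000,000
```

That is fifty million lookups against the shared filesystem for a single batch of startups, before any real work begins.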

For spack (which is a package manager designed to run on clusters), pretty much every user is impacted by the issue. As far as I am aware, nix is mostly used on PCs and VM instances, where the issue is much less serious.

I do think it is worth solving, but the glibc/patchelf patches are pretty large and will need maintaining. With the increasing popularity of nix-style package managers (e.g. guix and spack), it feels like trying to upstream a solution into glibc might be the way to go. I wonder if it would be possible for guix, spack and nix to work together on this?

pennae · January 2, 2023, 10:47pm

once you strip the unnecessary debug logging we haven’t removed yet it’s not that large, and the infrastructure it uses is unlikely to change. and if anything does change we can just remove the patch until someone finds time to fix it up without breaking things that worked before, only slowing them down a bit. that’s not nothing, but it does seem like a very low risk all things considered

upstreaming something like this would be pretty nice. maybe musl will adopt it too after a while and just transparently work? unfortunately we can’t really make that effort, but we can get things sorted within nixos and hand off to someone else.

RaitoBezarius · January 3, 2023, 12:09am

To be fair, I run a medium-sized home infra (>20 NixOS VMs) backed by HDDs, with the DBs etc. sometimes accelerated by SSDs, and if the stat storm were solved, it would be amazing for me.

RaitoBezarius · January 3, 2023, 12:10am

I’m willing to help upstreaming such things if needed at all.

danielbarter · January 3, 2023, 2:34am

This is a solvable problem now, using either fzakaria/shrinkwrap (a tool that embosses the needed dependencies on the top-level executable) to fix the binaries causing you issues, or fzakaria/nix-harden-needed (which bubbles up the correct paths to your shared object libraries in Nix) to fix everything at the cost of rebuilding everything. You do lose the ability to shim shared libs, as mentioned in the PRs.

Somewhere, there is a perfect solution to the problem. Excited to see what we end up doing

pennae · January 3, 2023, 9:28pm

much appreciated! will come back to you if good things happen