But during nixos-rebuild switch, nvidia-fabricmanager fails to start with a bunch of undefined symbol errors:
the following new units were started: nvidia-persistenced.service
warning: the following units failed: nvidia-fabricmanager.service
× nvidia-fabricmanager.service - Start NVIDIA NVLink Management
Loaded: loaded (/etc/systemd/system/nvidia-fabricmanager.service; enabled; preset: ignored)
Active: failed (Result: exit-code) since Sat 2025-12-20 16:45:35 WET; 5s ago
Invocation: 1a9b517bb5d2402ea68ab4d05ff67b21
Process: 12849 ExecStart=/nix/store/9h003hg7a5gg7vi76ad8pf6fpxxcfhxj-fabricmanager-570.172.08/bin/nv-fabricmanager -c /nix/store/nv8agj4yys737f704scl6macw6vz5a5b-fabricmanager.conf (code=exited, status=127)
IP: 0B in, 0B out
IO: 0B read, 0B written
Mem peak: 1.7M
CPU: 6ms
Dec 20 16:45:35 compute01 nv-fabricmanager[12849]: /nix/store/9h003hg7a5gg7vi76ad8pf6fpxxcfhxj-fabricmanager-570.172.08/bin/nv-fabricmanager: /nix/store/9h003hg7a5gg7vi76ad8pf6fpxxcfhxj-fabricmanager-570.172.08/bin/nv-fabricmanager: no version information available (required by /nix/store/9h003hg7a5gg7vi76ad8pf6fpxxcfhxj-fabricmanager-570.172.08/bin/nv-fabricmanager)
Dec 20 16:45:35 compute01 nv-fabricmanager[12849]: /nix/store/9h003hg7a5gg7vi76ad8pf6fpxxcfhxj-fabricmanager-570.172.08/bin/nv-fabricmanager: /nix/store/9h003hg7a5gg7vi76ad8pf6fpxxcfhxj-fabricmanager-570.172.08/bin/nv-fabricmanager: no version information available (required by /nix/store/9h003hg7a5gg7vi76ad8pf6fpxxcfhxj-fabricmanager-570.172.08/bin/nv-fabricmanager)
...
...
Dec 20 16:45:35 compute01 nv-fabricmanager[12849]: /nix/store/9h003hg7a5gg7vi76ad8pf6fpxxcfhxj-fabricmanager-570.172.08/bin/nv-fabricmanager: symbol lookup error: /nix/store/9h003hg7a5gg7vi76ad8pf6fpxxcfhxj-fabricmanager-570.172.08/bin/nv-fabricmanager: undefined symbol:
Dec 20 16:45:35 compute01 systemd[1]: nvidia-fabricmanager.service: Control process exited, code=exited, status=127/n/a
Dec 20 16:45:35 compute01 systemd[1]: nvidia-fabricmanager.service: Failed with result 'exit-code'.
Dec 20 16:45:35 compute01 systemd[1]: Failed to start Start NVIDIA NVLink Management.
Command 'systemd-run -E LOCALE_ARCHIVE -E NIXOS_INSTALL_BOOTLOADER --collect --no-ask-password --pipe --quiet --service-type=exec --unit=nixos-rebuild-switch-to-configuration /nix/store/i4mj5wsxnad6ryj98b2qrx8d78rp248v-nixos-system-compute01-25.11.1948.c6f52ebd45e5/bin/switch-to-configuration switch' returned non-zero exit status 4.
I have tried setting hardware.nvidia.package = config.boot.kernelPackages.nvidiaPackages.dc, but the error is the same. If I use hardware.nvidia.package = config.boot.kernelPackages.nvidiaPackages.production instead, then I guess nvidia.nix is incompatible with that package, because I get errors like:
error: lib.meta.getExe': The first argument is of type set, but it should be a derivation instead.
What is strange is that my configuration.nix is incredibly simple (the rest of the file is networking stuff and so on) and I never change the kernel or anything, yet I cannot find anyone else complaining about such undefined symbols.
By the way, if I run ldd /nix/store/9h003hg7a5gg7vi76ad8pf6fpxxcfhxj-fabricmanager-570.172.08/bin/nv-fabricmanager, I get the same “no version information available” messages as before.
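For anyone trying to narrow down a similar failure, here is a minimal sketch of how the loader output could be summarised. The function name summarise_loader_output is made up for illustration; it only counts the three kinds of messages seen in this thread and reads them from stdin, so it also works on a saved journal excerpt.

```shell
# Hypothetical helper (not part of NixOS or nixpkgs): count loader problems
# in `ldd`-style output read from stdin.
summarise_loader_output() (
  missing=0 unversioned=0 undefined=0
  while IFS= read -r line; do
    case "$line" in
      # A library listed by ldd that could not be located at all.
      *"=> not found"*) missing=$((missing + 1)) ;;
      # A symbol version mismatch warning from the dynamic loader.
      *"no version information available"*) unversioned=$((unversioned + 1)) ;;
      # A symbol that no loaded library provides.
      *"undefined symbol"*) undefined=$((undefined + 1)) ;;
    esac
  done
  printf 'missing_libs=%d unversioned=%d undefined_symbols=%d\n' \
    "$missing" "$unversioned" "$undefined"
)
```

It could be fed with something like ldd -r /path/to/nv-fabricmanager 2>&1 | summarise_loader_output (the -r flag asks ldd to process relocations and report unresolved symbols). Running patchelf --print-rpath or readelf -d on the same binary shows which library search path it actually ended up with.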
I managed to work around this. I am posting this in case it helps others.
I fixed the problem by packaging my own nvidia-fabricmanager (I think there is something wrong with fabricmanager.nix because the binary complains about undefined symbols, but I am still a novice with Nix).
I have not yet been able to test this suspicion, but I think one issue with the official fabricmanager.nix is that the lib/*.so files are not being patched with patchelf:
for d in include lib;do
mv $d $out/.
done
I think patchelf, which is applied to bin/* earlier in the install step, should also be applied to lib/* here…
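A rough sketch of what that could look like, purely as an illustration of the suspicion above and not as the upstream fix. The helper name patch_shared_libs and the PATCHELF override are assumptions made for this example; only patchelf --set-rpath is a real patchelf option.

```shell
# Sketch only: patch_shared_libs is a hypothetical helper, not part of nixpkgs.
# It sets the given RPATH on every shared object directly under a directory.
# PATCHELF can be overridden (for example with "echo patchelf") for a dry run.
patch_shared_libs() (
  libdir=$1
  rpath=$2
  for so in "$libdir"/*.so*; do
    # Skip an unexpanded glob and symlinks; the link target is patched itself.
    [ -f "$so" ] && [ ! -L "$so" ] || continue
    # Left unquoted on purpose so that a multi-word override still works.
    ${PATCHELF:-patchelf} --set-rpath "$rpath" "$so"
  done
)
```

In an install step this would be called as patch_shared_libs lib "$rpath" before the mv loop, with $rpath standing for whatever RPATH value is already used for bin/*. Whether this actually makes the undefined symbols go away is untested.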
I’ve got a PR open that fixes this. The explicitly defined phase list was dropped from the fabricmanager module. Adding them back made some extra phases run that broke the build. Things still built successfully, though, since there was no check phase defined. I added one and threw in a few minor changes. It should be fixed in 25.11 once this is merged to master and backported!