Hello,
I am configuring NixOS 25.11 on a server with several A100 GPUs.
These GPUs require nvidia-fabricmanager to be running, so my understanding is that I need hardware.nvidia.datacenter.enable = true.
Right now, I have this:
nixpkgs.config.nvidia.acceptLicense = true;
hardware.nvidia = {
  open = false;
  datacenter.enable = true;
  nvidiaPersistenced = true;
  nvidiaSettings = false;
};
hardware.graphics.enable = true;
But during nixos-rebuild switch, nvidia-fabricmanager fails to start with undefined-symbol errors:
the following new units were started: nvidia-persistenced.service
warning: the following units failed: nvidia-fabricmanager.service
× nvidia-fabricmanager.service - Start NVIDIA NVLink Management
Loaded: loaded (/etc/systemd/system/nvidia-fabricmanager.service; enabled; preset: ignored)
Active: failed (Result: exit-code) since Sat 2025-12-20 16:45:35 WET; 5s ago
Invocation: 1a9b517bb5d2402ea68ab4d05ff67b21
Process: 12849 ExecStart=/nix/store/9h003hg7a5gg7vi76ad8pf6fpxxcfhxj-fabricmanager-570.172.08/bin/nv-fabricmanager -c /nix/store/nv8agj4yys737f704scl6macw6vz5a5b-fabricmanager.conf (code=exited, status=127)
IP: 0B in, 0B out
IO: 0B read, 0B written
Mem peak: 1.7M
CPU: 6ms
Dec 20 16:45:35 compute01 nv-fabricmanager[12849]: /nix/store/9h003hg7a5gg7vi76ad8pf6fpxxcfhxj-fabricmanager-570.172.08/bin/nv-fabricmanager: /nix/store/9h003hg7a5gg7vi76ad8pf6fpxxcfhxj-fabricmanager-570.172.08/bin/nv-fabricmanager: no version information available (required by /nix/store/9h003hg7a5gg7vi76ad8pf6fpxxcfhxj-fabricmanager-570.172.08/bin/nv-fabricmanager)
Dec 20 16:45:35 compute01 nv-fabricmanager[12849]: /nix/store/9h003hg7a5gg7vi76ad8pf6fpxxcfhxj-fabricmanager-570.172.08/bin/nv-fabricmanager: /nix/store/9h003hg7a5gg7vi76ad8pf6fpxxcfhxj-fabricmanager-570.172.08/bin/nv-fabricmanager: no version information available (required by /nix/store/9h003hg7a5gg7vi76ad8pf6fpxxcfhxj-fabricmanager-570.172.08/bin/nv-fabricmanager)
...
...
Dec 20 16:45:35 compute01 nv-fabricmanager[12849]: /nix/store/9h003hg7a5gg7vi76ad8pf6fpxxcfhxj-fabricmanager-570.172.08/bin/nv-fabricmanager: symbol lookup error: /nix/store/9h003hg7a5gg7vi76ad8pf6fpxxcfhxj-fabricmanager-570.172.08/bin/nv-fabricmanager: undefined symbol:
Dec 20 16:45:35 compute01 systemd[1]: nvidia-fabricmanager.service: Control process exited, code=exited, status=127/n/a
Dec 20 16:45:35 compute01 systemd[1]: nvidia-fabricmanager.service: Failed with result 'exit-code'.
Dec 20 16:45:35 compute01 systemd[1]: Failed to start Start NVIDIA NVLink Management.
Command 'systemd-run -E LOCALE_ARCHIVE -E NIXOS_INSTALL_BOOTLOADER --collect --no-ask-password --pipe --quiet --service-type=exec --unit=nixos-rebuild-switch-to-configuration /nix/store/i4mj5wsxnad6ryj98b2qrx8d78rp248v-nixos-system-compute01-25.11.1948.c6f52ebd45e5/bin/switch-to-configuration switch' returned non-zero exit status 4.
I have tried setting hardware.nvidia.package = config.boot.kernelPackages.nvidiaPackages.dc, but the error is the same. If I instead use hardware.nvidia.package = config.boot.kernelPackages.nvidiaPackages.production, evaluation fails with errors like the one below, so I guess nvidia.nix is incompatible with that package:
error: lib.meta.getExe': The first argument is of type set, but it should be a derivation instead.
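For reference, the nvidiaPackages.dc attempt mentioned above amounts to roughly the following. This is a minimal sketch that assumes the usual `{ config, ... }:` module header and leaves out the unrelated networking parts of the file; only the `package` line differs from the configuration shown earlier.

```nix
{ config, ... }:
{
  nixpkgs.config.nvidia.acceptLicense = true;

  hardware.nvidia = {
    open = false;
    datacenter.enable = true;
    # Explicitly select the data center driver set instead of the default.
    package = config.boot.kernelPackages.nvidiaPackages.dc;
    nvidiaPersistenced = true;
    nvidiaSettings = false;
  };

  hardware.graphics.enable = true;
}
```

Swapping `dc` for `production` on the `package` line is the variant that produces the evaluation error quoted above.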
What is strange is that my configuration.nix is very simple (the rest of the file is networking setup and so on) and I never change the kernel or anything, yet I cannot find anyone else reporting such undefined symbols.
By the way, if I run ldd /nix/store/9h003hg7a5gg7vi76ad8pf6fpxxcfhxj-fabricmanager-570.172.08/bin/nv-fabricmanager, it prints the same “no version information available” messages as before.