I just updated a server system with an NVIDIA T400 GPU to latest nixos-unstable, rev 08f22084e6085d19bcfb4be30d1ca76ecb96fe54 (though it hasn't seen an update in a month or so, and I unfortunately don't know the previous rev). The system uses a podman container with `nvidia-container-toolkit` for GPU passthrough, and this broke after the update.

The GPU shows up on the host machine and is found by `nvidia-ctk`:
```
# nvidia-smi -L
GPU 0: NVIDIA T400 4GB (UUID: GPU-cc33bb28-454b-5df7-0f48-13948ad02329)

# nvidia-ctk cdi list
INFO[0000] Found 2 CDI devices
nvidia.com/gpu=0
nvidia.com/gpu=all
```
However, trying to run a container with the GPU passed through yields an error (same result with `--gpus=all`):
```
# podman run --rm -it --device=nvidia.com/gpu=all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi
Error: setting up CDI devices: unresolvable CDI devices nvidia.com/gpu=all
```
The `nvidia-container-toolkit-cdi-generator.service` systemd unit also runs without errors, and the generated JSON file looks fine, containing mentions of `nvidia.com/gpu=0` and `nvidia.com/gpu=all`.
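For context on where I've been looking: as far as I understand, podman resolves `--device=nvidia.com/gpu=...` against CDI spec files in the standard CDI directories, so a mismatch between where the generator writes its spec and where podman looks for it would produce exactly this "unresolvable CDI devices" error. A quick sanity check I ran (assuming the default locations `/etc/cdi` and `/var/run/cdi`):

```shell
# List CDI spec files in the default static and dynamic spec
# directories; the generated spec must be in one of these for
# podman to resolve nvidia.com/gpu=all.
ls -l /etc/cdi /var/run/cdi 2>/dev/null

# Show which device names the specs actually define.
grep -h '"name"' /etc/cdi/*.json /var/run/cdi/*.json 2>/dev/null
```

In my case the spec file is present, so the mismatch must be somewhere else.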
Relevant configuration:

```nix
services.xserver.videoDrivers = [ "nvidia" ];

hardware.graphics.enable = true;

hardware.nvidia = {
  open = true;
  modesetting.enable = true;
  package = config.boot.kernelPackages.nvidiaPackages.production;
};

hardware.nvidia-container-toolkit.enable = true;

systemd.services.nvidia-container-toolkit-cdi-generator.environment.LD_LIBRARY_PATH =
  "${lib.getLib config.hardware.nvidia.package}/lib";
```
(Don't ask me what that last line is for; I don't remember where it came from.)
Does anyone have any ideas why this isn’t working?