Huh, so on a p3.2xlarge instance running Ubuntu 20.04.4, I get
ubuntu@threedoo:~$ lsmod | grep -i nvidia
nvidia_uvm 1052672 2
nvidia_drm 61440 2
nvidia_modeset 1159168 2 nvidia_drm
nvidia 39059456 217 nvidia_uvm,nvidia_modeset
drm_kms_helper 253952 1 nvidia_drm
drm 557056 6 drm_kms_helper,nvidia,nvidia_drm
ubuntu@threedoo:~$ ls /dev/nvidia*
/dev/nvidia-modeset /dev/nvidia-uvm-tools /dev/nvidiactl
/dev/nvidia-uvm /dev/nvidia0
But on a p3.2xlarge instance running NixOS, I get
❯ lsmod | grep -i nvidia
nvidia_uvm 1183744 0
nvidia_drm 69632 0
nvidia_modeset 1163264 1 nvidia_drm
nvidia 39100416 2 nvidia_uvm,nvidia_modeset
drm_kms_helper 270336 4 cirrus,nvidia_drm
drm 614400 5 drm_kms_helper,nvidia,cirrus,nvidia_drm
i2c_core 102400 5 drm_kms_helper,nvidia,psmouse,i2c_piix4,drm
❯ ls /dev/nvidia*
/dev/nvidia-modeset /dev/nvidia-uvm-tools /dev/nvidiactl
/dev/nvidia-uvm /dev/nvidia1
❯ nix-shell -p nix-info --run "nix-info -m"
- system: `"x86_64-linux"`
- host os: `Linux 5.10.106, NixOS, 21.11 (Porcupine)`
- multi-user?: `yes`
- sandbox: `yes`
- version: `nix-env (Nix) 2.3.16`
- channels(root): `"nixos-21.11.336674.e80f8f4d833, nix-ld"`
- channels(skainswo): `"home-manager, nixpkgs-unstable-22.05pre343295.adf7f03d3bf"`
- nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`
So for some reason the GPU device node is numbered /dev/nvidia1 instead of /dev/nvidia0. Perhaps this is the culprit? At least, the strace of nvidia-smi shows that it only looks for /dev/nvidia0.
Does anyone know how these devices get named/numbered?
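One way to dig into the numbering a bit further is to compare the index in each node's name against the major/minor numbers the node actually carries. Below is a minimal Python sketch of that check. The helper names `gpu_index` and `list_nvidia_nodes` are made up for illustration, and the idea that the minor number should match the N in /dev/nvidiaN is my assumption about the driver, not something confirmed above.

```python
import glob
import os
import re
import stat


def gpu_index(path):
    """Return N for a path like /dev/nvidiaN, or None for other nodes
    such as /dev/nvidiactl or /dev/nvidia-uvm."""
    match = re.fullmatch(r"nvidia(\d+)", os.path.basename(path))
    return int(match.group(1)) if match else None


def list_nvidia_nodes(pattern="/dev/nvidia*"):
    """Collect (path, index_from_name, major, minor) for every character
    device matching the glob pattern. Non-device entries are skipped."""
    nodes = []
    for path in sorted(glob.glob(pattern)):
        st = os.stat(path)
        if not stat.S_ISCHR(st.st_mode):
            continue  # skip directories or regular files that match the glob
        nodes.append(
            (path, gpu_index(path), os.major(st.st_rdev), os.minor(st.st_rdev))
        )
    return nodes


if __name__ == "__main__":
    for path, index, major, minor in list_nvidia_nodes():
        # Assumption: for /dev/nvidiaN the minor number is expected to equal N.
        note = ""
        if index is not None and index != minor:
            note = "  <- name and minor number disagree"
        print(f"{path}: index={index} major={major} minor={minor}{note}")
```

Running this on both the Ubuntu and the NixOS instance and comparing the output would show whether only the name differs or whether the underlying device number differs too.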