Why won't nvidia-smi recognize my GPU?

I have a NixOS 20.09 machine running on an AWS g4dn.xlarge instance. I’m having trouble accessing the GPU and using CUDA. Following the guide in the wiki, I added

  nixpkgs.config.allowUnfree = true;
  services.xserver.videoDrivers = [ "nvidia" ];

to my /etc/nixos/configuration.nix and rebuilt.

The GPU shows up fine in lspci:

❯ nix-shell -p pciutils
[nix-shell:~/dev/research/research/lottery]$ lspci
...
00:1e.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
...

However, I’m not able to get nvidia-smi to recognize it:

❯ nix-shell -p cudatoolkit
[nix-shell:~/dev/research/research/lottery]$ nvidia-smi
No devices were found

What do I need to do to get the GPU recognized and used?

As a secondary question: I frequently resize this VM to instance types that have no GPU. Does that cause any problems for the driver configuration?

Restarting the instance fixed it.

That makes sense. Unless it is blacklisted, the nouveau driver is loaded by default, so the system was probably still using nouveau until the reboot.
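If you want the blacklist to be explicit in your own configuration, a minimal sketch follows. It assumes the same configuration.nix as above and uses the standard NixOS option `boot.blacklistedKernelModules`. As far as I know, the NixOS NVIDIA module already adds nouveau to this list when "nvidia" is in `videoDrivers`, so the extra line mainly documents intent.

  # Sketch for /etc/nixos/configuration.nix (assumption: making the
  # nouveau blacklist explicit; the NVIDIA module may already add it).
  nixpkgs.config.allowUnfree = true;
  services.xserver.videoDrivers = [ "nvidia" ];
  # List options merge, so this does not conflict with the module's entries.
  boot.blacklistedKernelModules = [ "nouveau" ];

The blacklist only takes effect when the kernel boots, so `nixos-rebuild switch` on its own will not unload a nouveau module that is already running. A reboot is still needed, which matches what fixed the problem here. After rebooting, `lsmod` should list nvidia rather than nouveau.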
