I have the following NixOS configuration:

```nix
{ config, pkgs, ... }:
{
  nixpkgs.config.allowUnfree = true;

  imports = [
    ./nvidia.nix
  ];

  environment.systemPackages = with pkgs; [
    docker-compose
    # nvidia-docker
  ];

  virtualisation.docker = {
    enable = true;
    enableNvidia = true;
    extraOptions = "--default-runtime=nvidia";
  };
}
```
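To confirm that the `extraOptions = "--default-runtime=nvidia"` setting actually reached the Docker daemon, it may help to inspect the daemon's view of its runtimes. This is a diagnostic sketch, not part of the setup above; it assumes the Docker CLI is on the path and the daemon is running:

```shell
# Show which runtimes the daemon knows about and which one is the default.
# If the option took effect, the output should include a "Runtimes:" line
# listing "nvidia" and a "Default Runtime: nvidia" line.
docker info | grep -iE 'runtime'
```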
and `./nvidia.nix`:

```nix
{ config, pkgs, ... }:
{
  # Nvidia specific
  nixpkgs.config.allowUnfree = true;

  environment.systemPackages = with pkgs; [
    # cudaPackages_12.cudatoolkit
  ];

  # Some programs need SUID wrappers, can be configured further or are
  # started in user sessions.
  # programs.mtr.enable = true;
  # programs.gnupg.agent = {
  #   enable = true;
  #   enableSSHSupport = true;
  # };

  # List services that you want to enable:

  # Enable the OpenSSH daemon.
  # services.openssh.enable = true;

  # Open ports in the firewall.
  # networking.firewall.allowedTCPPorts = [ ... ];
  # networking.firewall.allowedUDPPorts = [ ... ];
  # Or disable the firewall altogether.
  # networking.firewall.enable = false;

  # This value determines the NixOS release from which the default
  # settings for stateful data, like file locations and database versions
  # on your system were taken. It's perfectly fine and recommended to leave
  # this value at the release version of the first install of this system.
  # Before changing this value read the documentation for this option
  # (e.g. man configuration.nix or on https://nixos.org/nixos/options.html).
  # system.stateVersion = "unstable"; # Did you read the comment?

  # REGION NVIDIA / CUDA

  # Enable OpenGL
  hardware.opengl = {
    enable = true;
    driSupport = true;
    driSupport32Bit = true;
  };

  # Load nvidia driver for Xorg and Wayland
  services.xserver.videoDrivers = [ "nvidia" ];

  # See https://nixos.wiki/wiki/Nvidia#CUDA_and_using_your_GPU_for_compute
  hardware.nvidia = {
    prime = {
      offload = {
        enable = true;
        enableOffloadCmd = true;
      };
      # Make sure to use the correct Bus ID values for your system!
      amdgpuBusId = "PCI:6:0:0";
      nvidiaBusId = "PCI:1:0:0";
    };

    # Modesetting is required.
    modesetting.enable = true;

    # Nvidia power management. Experimental, and can cause sleep/suspend to fail.
    powerManagement.enable = true;

    # Fine-grained power management. Turns off GPU when not in use.
    # Experimental and only works on modern Nvidia GPUs (Turing or newer).
    powerManagement.finegrained = false;

    # Use the NVidia open source kernel module (not to be confused with the
    # independent third-party "nouveau" open source driver).
    # Support is limited to the Turing and later architectures. Full list of
    # supported GPUs is at:
    # https://github.com/NVIDIA/open-gpu-kernel-modules#compatible-gpus
    # Only available from driver 515.43.04+
    # Currently alpha-quality/buggy, so false is currently the recommended setting.
    open = false;

    # Enable the Nvidia settings menu, accessible via `nvidia-settings`.
    nvidiaSettings = true;

    # Optionally, you may need to select the appropriate driver version
    # for your specific GPU.
    package = config.boot.kernelPackages.nvidiaPackages.stable;
  };
  # ENDREGION
}
```
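Since the PRIME block warns about using the correct Bus IDs, one way to double-check them on the host is to list the PCI graphics devices. A hedged aside, assuming `lspci` (from pciutils) is available:

```shell
# List VGA/3D controllers with their PCI addresses. An address like
# "01:00.0" corresponds to "PCI:1:0:0" in the NixOS option; note that
# lspci prints hexadecimal values while the Nix option expects decimal,
# so e.g. "0c:00.0" would become "PCI:12:0:0".
lspci | grep -E 'VGA|3D'
```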
and `docker-compose.yaml`:

```yaml
services:
  test:
    image: nvidia/cuda:12.3.0-runtime-ubuntu22.04
    command: nvidia-smi
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
```
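As a cross-check against the compose `deploy.resources.reservations.devices` request, the same image can be started directly with the standard `--gpus` flag; the behaviour should match the compose service (a sketch, not part of the original setup):

```shell
# Run the same image with explicit GPU access and no compose indirection,
# removing the container afterwards.
docker run --rm --gpus all nvidia/cuda:12.3.0-runtime-ubuntu22.04 nvidia-smi
```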
Within the container I get the following output:

```
test-1 |
test-1 | ==========
test-1 | == CUDA ==
test-1 | ==========
test-1 |
test-1 | CUDA Version 12.3.0
test-1 |
test-1 | Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
test-1 |
test-1 | This container image and its contents are governed by the NVIDIA Deep Learning Container License.
test-1 | By pulling and using the container, you accept the terms and conditions of this license:
test-1 | https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
test-1 |
test-1 | A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
test-1 |
test-1 | WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
test-1 | Use the NVIDIA Container Toolkit to start this container with GPU support; see
test-1 | https://docs.nvidia.com/datacenter/cloud-native/ .
test-1 |
test-1 | Mon Dec 11 17:48:10 2023
test-1 | +---------------------------------------------------------------------------------------+
test-1 | | NVIDIA-SMI 545.29.06              Driver Version: 545.29.06    CUDA Version: 12.3     |
test-1 | |-----------------------------------------+----------------------+----------------------+
test-1 | | GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
test-1 | | Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
test-1 | |                                         |                      |               MIG M. |
test-1 | |=========================================+======================+======================|
test-1 | |   0  NVIDIA GeForce RTX 3070 ...    Off | 00000000:01:00.0  On |                  N/A |
test-1 | | N/A   50C    P8             18W / 115W  |     42MiB / 8192MiB  |      0%      Default |
test-1 | |                                         |                      |                  N/A |
test-1 | +-----------------------------------------+----------------------+----------------------+
test-1 |
test-1 | +---------------------------------------------------------------------------------------+
test-1 | | Processes:                                                                            |
test-1 | |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
test-1 | |        ID   ID                                                             Usage      |
test-1 | |=======================================================================================|
test-1 | +---------------------------------------------------------------------------------------+
test-1 exited with code 0
```
How is it that nvidia-smi works within the container, yet the warning above says the NVIDIA driver was not detected? Note that the nvidia-smi output matches what I get outside the container.
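One way to narrow this down might be to bypass the image's entrypoint (which is what prints the warning) and inspect what the runtime actually mounted into the container. A diagnostic sketch, assuming the `test` service name from the compose file above:

```shell
# Start the test service with a plain shell instead of the NVIDIA entrypoint
# and look at what a driver check could plausibly find inside the container.
docker compose run --rm --entrypoint /bin/bash test -c '
  command -v nvidia-smi                 # is the binary mounted into PATH?
  ls /dev/nvidia* 2>&1                  # device nodes injected by the runtime
  ls /proc/driver/nvidia/version 2>&1   # kernel driver interface
'
```

If nvidia-smi and the device nodes are present but whatever path the entrypoint's check inspects is not, that would explain the warning appearing alongside a working nvidia-smi.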