CUDA 12.8 support in nixpkgs

So this will take a little bit of explanation. For any of you who run nixos-rebuild switch with the latest kernel and NVIDIA driver: you will be using CUDA version 12.8 globally. You will mostly be fine if you are only developing in Python, as this is explained quite well by Claude:

This is because libraries like PyTorch and Numba are built to handle CUDA version compatibility more gracefully:

  1. PyTorch and Numba use the CUDA Runtime API in a more abstracted way:
  • They don’t directly initialize CUDA devices like our raw CUDA C code

  • They include version compatibility layers

  • They dynamically load CUDA libraries at runtime

However, if you are developing in raw CUDA C inside a shell environment, you will run into unexplained CUDA errors, mostly caused by a CUDA version mismatch.

And the reason is that the latest CUDA toolkit (cudaPackages) nixpkgs can give you is 12.4.

AND THERE YOU HAVE IT, PEOPLE. If I am forced to do C development using a container like Docker on NixOS, that would be very silly, people, that would be very silly.

Worse yet, the CUDA wiki, written by whoever, is largely incompetent and very annoying, with little useful information to say the least.

And when I found out that Arch Linux has already updated its repo to include CUDA 12.8, I genuinely felt sad for NixOS.

https://archlinux.org/packages/extra/x86_64/cuda/

Is there anything I have done wrong?

This is my shell code:

with import (fetchTarball "https://github.com/NixOS/nixpkgs/tarball/9aed71348bff9c6e142cc3f64206a128388494b9") {
  config = {
    allowUnfree = true;
  };
};

mkShell {
  buildInputs = [
    # CUDA 12.4 and matching NVIDIA drivers
    cudaPackages.cuda_cudart
    cudaPackages.cuda_nvcc
    cudaPackages.cuda_cccl
    linuxPackages.nvidia_x11

    # Other development tools
    gcc12
    gdb
    cmake
    gnumake
    ninja
    clang-tools
    valgrind
    libGLU libGL
    xorg.libXi xorg.libXmu freeglut
    xorg.libXext xorg.libX11 xorg.libXv xorg.libXrandr zlib
    pkg-config
    binutils
  ];

  shellHook = ''
    # NVIDIA Driver and CUDA setup
    export NVIDIA_VISIBLE_DEVICES=all
    export NVIDIA_DRIVER_CAPABILITIES=compute,utility
    export CUDA_VISIBLE_DEVICES=0

    # Path setup
    export PATH="${pkgs.gcc12}/bin:$PATH"
    export PATH=${pkgs.cudaPackages.cuda_nvcc}/bin:$PATH

    # CUDA setup
    export CUDAHOSTCXX="${pkgs.gcc12}/bin/g++"
    export CUDA_HOST_COMPILER="${pkgs.gcc12}/bin/gcc"
    export CUDA_HOME=${pkgs.cudaPackages.cuda_cudart}
    export CUDA_PATH=${pkgs.cudaPackages.cuda_cudart}

    # Library paths with specific NVIDIA driver
    export LD_LIBRARY_PATH=${pkgs.linuxPackages.nvidia_x11}/lib
    export LD_LIBRARY_PATH=${pkgs.cudaPackages.cuda_cudart}/lib64:${pkgs.cudaPackages.cuda_cudart}/lib:$LD_LIBRARY_PATH
    export LD_LIBRARY_PATH=${stdenv.cc.cc.lib}/lib:$LD_LIBRARY_PATH

    export LIBRARY_PATH=${pkgs.cudaPackages.cuda_cudart}/lib64:${pkgs.cudaPackages.cuda_cudart}/lib:$LIBRARY_PATH

    # OpenGL driver path
    export LD_LIBRARY_PATH=${pkgs.linuxPackages.nvidia_x11}/lib:$LD_LIBRARY_PATH

    echo "CUDA C/C++ development environment ready"
    echo "GCC version in use:"
    gcc --version
    echo "NVCC version:"
    nvcc --version
    echo "SHELL NVIDIA driver version:"
    cat ${pkgs.linuxPackages.nvidia_x11}/lib/nvidia/version
  '';
}

And this is the C code for testing purposes:

#include <stdio.h>
#include <cuda_runtime.h>  // nvcc pulls this in implicitly for .cu files, but being explicit is clearer

int main() {
    int deviceCount;
    cudaError_t error = cudaGetDeviceCount(&deviceCount);

    if (error != cudaSuccess) {
        printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(error));
        return -1;
    }

    printf("Number of CUDA devices: %d\n", deviceCount);

    for (int i = 0; i < deviceCount; i++) {
        cudaDeviceProp prop;
        error = cudaGetDeviceProperties(&prop, i);

        if (error != cudaSuccess) {
            printf("Failed to get properties for device %d: %s\n",
                   i, cudaGetErrorString(error));
            continue;
        }

        printf("Device %d: %s\n", i, prop.name);
        printf("  Compute Capability: %d.%d\n",
               prop.major, prop.minor);
        printf("  Total Global Memory: %lu MB\n",
               prop.totalGlobalMem / (1024*1024));
    }

    return 0;
}

// compile with: nvcc -I${CUDA_PATH}/include -L${CUDA_PATH}/lib64 -L${CUDA_PATH}/lib hello.cu -o hello

I really wish I had made some silly mistake and am very wrong about NixOS. If anyone can point me in the right direction and free me from this frustration, that would be great.

I want to hear your opinion on this, thank you.

1 Like

Most likely your fetchTarball is pinning your shell’s nixpkgs to an ancient version which is way behind your system, so you get an old CUDA version in the shell.

Edit: Nope, you’re correct, nixpkgs doesn’t currently package that version. The latest currently available is 12.6, you could do the legwork of adding the new version upstream.
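
In the meantime, if you only need a newer toolkit inside the dev shell, you can also import a second, newer nixpkgs just for the CUDA bits while leaving the rest of your pin alone. A rough sketch (using the nixos-unstable branch tarball as an example source; in practice you would pin a specific commit):

let
  # the existing pin stays in charge of everything else
  pinned = import <nixpkgs> { config.allowUnfree = true; };
  # second, newer nixpkgs used only for its cudaPackages
  newer = import (fetchTarball "https://github.com/NixOS/nixpkgs/tarball/nixos-unstable") {
    config.allowUnfree = true;
  };
in pinned.mkShell {
  packages = [
    newer.cudaPackages.cuda_nvcc
    newer.cudaPackages.cuda_cudart
    newer.cudaPackages.cuda_cccl
  ];
}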

2 Likes

Before I change my shell code, how do I even verify whether nixpkgs already includes the latest CUDA version, which is 12.8?

I googled ‘nix search’ and I sure did not find anything.

Is there any chance that the unstable channel includes CUDA 12.8, and that an overlay on the cuda package would come to the rescue?

Tbh, it would be my honor to do so, except for one small problem:

I don’t know how.

And I thought there was a dedicated Nix CUDA team doing this sort of thing, no?

https://search.nixos.org/packages
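
If you prefer checking from the command line, something like this should list the CUDA package sets a given nixpkgs exposes (a sketch that assumes the cudaPackages_* attribute naming convention; point <nixpkgs> at whatever revision you pin):

nix-instantiate --eval --strict -E '
  builtins.filter
    (n: builtins.match "cudaPackages_.*" n != null)
    (builtins.attrNames (import <nixpkgs> { config.allowUnfree = true; }))
'

# and, if I remember the attribute name right, the default toolkit version:
nix-instantiate --eval -E '(import <nixpkgs> { config.allowUnfree = true; }).cudaPackages.cudaVersion'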

Well, good time to learn! Create a fork, make a feature branch, follow the docs, depend on your feature branch and see if you can get it to work. If you do, you can try to upstream a PR; there are contributing docs in the repo root.

This is an open source project; if you’re paid to maintain the CUDA ecosystem, you’re lucky. Mostly it’ll be volunteers who need it for something, and clearly until now most people have been OK with slightly older versions for their use cases, or presumably been using cuda_compat. The CUDA team is also just a few people who decided to get together to build some tooling they needed - if you’re experienced with CUDA and its packaging, and have an interest in maintaining it, you’re probably welcome to join.

You might not entirely appreciate how much legwork is being done by a comparatively small number of people here. We’re all working together on a really complex project largely for the fun of it (though you will find the occasional commercial interest), demanding stuff be done for you without putting in at least the effort to understand what needs to be done - or paying someone to do that for you - is a quick way to burn any community goodwill you may have and not get any of what you want.

12 Likes

Otherwise I could downgrade the NVIDIA driver to 550 with CUDA 12.6 instead, just so there are no conflicts.

But at the moment I can’t think of any better alternatives.

12.6 was merged yesterday: cudaPackages_12_6: init at 12.6.0 by danieldk · Pull Request #377991 · NixOS/nixpkgs · GitHub

No PR yet for 12.8
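
Once your nixpkgs pin includes that PR, selecting the newer package set explicitly in a shell should look roughly like this (a sketch; cudaPackages_12_6 is the attribute name from the PR title, and your own pin may differ):

with import <nixpkgs> { config.allowUnfree = true; };

mkShell {
  packages = [
    cudaPackages_12_6.cuda_nvcc
    cudaPackages_12_6.cuda_cudart
    cudaPackages_12_6.cuda_cccl
  ];
}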

1 Like

Hey @lucassong3000,

This is not an appropriate way to communicate on this forum. Generally speaking, try to keep in mind open source maintainers don’t owe you anything.

The wiki is edited by the NixOS community. That includes you. If you feel like something is incorrect, it’s up to you to correct it.

Calling out people for being “incompetent” on public forums without contributing yourself is a bad look for you. Please refrain from doing that again.

Let’s manage your expectations there.

As for most NixOS teams, the members are mostly nerds doing volunteer work in their spare time.

You’re free to help them through bumping the CUDA packages yourself. Or, alternatively, feel free to try to contract some of them to bump the packages for you.

I removed some subthreads escalating the discussion in hope to keep the conversation constructive. I’ll close the thread if it continues to escalate.

22 Likes

> And I thought there was a dedicated Nix CUDA team doing this sort of thing, no?

Much of the dedicated team is busy securing resources to fix more CUDA issues in Nixpkgs.

Generally speaking, pkgs.linuxPackages from a random Nixpkgs instance pkgs is not compatible with the running kernel. The userspace driver libcuda.so has to match the version string in the kernel module (.ko) exactly, otherwise the driver short circuits into an error. You never want to reference pkgs.linuxPackages.nvidia_x11, except as config.boot.kernelPackages.nvidia_x11 in a NixOS configuration. On NixOS CUDA programs load the kernel-compatible driver from /run/opengl-driver/lib/libcuda.so (grep Nixpkgs for usages of autoAddDriverRunpath). Outside NixOS the current status quo is to use wrappers (nixglhost or nixGL).

Note that LD_LIBRARY_PATH takes priority over DT_RUNPATH. Setting LD_LIBRARY_PATH means you disable the deterministic/“static” dynamic loading as configured by Nixpkgs, and force the program instead to load a potentially incompatible library from elsewhere.
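
Concretely, a dev shell that follows these two points might look roughly like this (a minimal sketch under those assumptions, not a vetted recipe; the cudaPackages attribute names are the current Nixpkgs ones, but check the revision you pin):

{ pkgs ? import <nixpkgs> { config.allowUnfree = true; } }:

pkgs.mkShell {
  packages = with pkgs; [
    cudaPackages.cuda_nvcc
    cudaPackages.cuda_cudart
    cudaPackages.cuda_cccl
    gcc12  # a host compiler this nvcc accepts
  ];

  shellHook = ''
    # No linuxPackages.nvidia_x11 and no LD_LIBRARY_PATH exports here: libcuda.so
    # should come from /run/opengl-driver/lib, which matches the running kernel
    # module, not from a driver pinned by this shell's nixpkgs.
    export CUDAHOSTCXX=${pkgs.gcc12}/bin/g++
    # If an ad-hoc nvcc-compiled binary cannot find libcuda.so.1, prefer giving it
    # an rpath to /run/opengl-driver/lib over exporting LD_LIBRARY_PATH.
  '';
}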

> PyTorch and Numba use the CUDA Runtime API in a more abstracted way

To the best of my knowledge, they do not.

> They dynamically load CUDA libraries at runtime

To the best of my knowledge they do: they either dlopen("cudart", ...) or even directly dlopen("cuda", ...) (not sure why, possibly linking pieces of static cudart); please link if there’s something I’m not aware of.

> They include version compatibility layers

To the best of my knowledge they do not; they rely on the same compatibility guarantees promised by NVIDIA: cudart version at runtime >= cudart at build time && cudart at runtime <= cudaDriverGetVersion() && libcuda.so version == nvidia.ko (or equivalent) version, unless using cuda_compat. EDIT (2025-02-24): the original message had the wrong sign for cudaRuntimeGetVersion vs cudaDriverGetVersion.

> you will run into unexplained CUDA errors, mostly caused by a CUDA version mismatch

Likely caused by LD_LIBRARY_PATH, but hard to tell without logs.

The wiki article hasn’t been updated in years, recent contributors have been mostly putting stuff in the Nixpkgs manual (which would still benefit from more work).

There are instructions for updating the package set in the Nixpkgs manual (admittedly the current process is more involved and cumbersome than it should be).

> I don’t know how.

This can be helped: it takes time, search, engagement with other users and contributors. Among other places, conversations in https://matrix.to/#/#cuda:nixos.org may be relevant.

9 Likes

Hello, thank you for this breakdown of my original post. I want to start by posting the shell.nix that I currently use for Python development. Please note that this shell builds a virtual environment with CUDA 12.4, which is older than the CUDA version of the NVIDIA driver, 12.8. Without addressing any of the concerns you mentioned above, all GPU-related Python tests have passed flawlessly without returning any errors; I have tested PyTorch with CUDA, vLLM, and Numba with CUDA so far.


with import (fetchTarball "https://github.com/NixOS/nixpkgs/tarball/9aed71348bff9c6e142cc3f64206a128388494b9") {
  config = {
    allowUnfree = true;
  };
};

let
  py-pkgs = python312Packages;
in pkgs.mkShell rec {
  name = "impurePythonEnv";
  venvDir = "./.venv";
  buildInputs = [
    # Python and venv
    py-pkgs.python
    py-pkgs.venvShellHook
    py-pkgs.zlib-ng

    # System tools
    gitRepo
    gnupg
    autoconf
    curl
    procps
    gnumake
    util-linux
    m4
    gperf
    unzip

    # CUDA and OpenGL
    cudatoolkit
    libGLU libGL
    xorg.libXi xorg.libXmu freeglut
    xorg.libXext xorg.libX11 xorg.libXv xorg.libXrandr zlib
    ncurses5
    stdenv.cc
    binutils
  ];

  # Run this command, only after creating the virtual environment
  postVenvCreation = ''
    unset SOURCE_DATE_EPOCH
    pip install -r requirements.txt
  '';

  # Now we can execute any commands within the virtual environment.
  postShellHook = ''
    # allow pip to install wheels
    unset SOURCE_DATE_EPOCH

    # CUDA setup
    export CUDA_HOME=${pkgs.cudatoolkit}
    export CUDA_PATH=${pkgs.cudatoolkit}

    # Library path setup - using system NVIDIA drivers
    export LD_LIBRARY_PATH=${stdenv.cc.cc.lib}/lib:/run/opengl-driver/lib
    export LD_LIBRARY_PATH="${pkgs.cudatoolkit}/lib64:${pkgs.cudatoolkit}/lib:$LD_LIBRARY_PATH"
    export LD_LIBRARY_PATH="${pkgs.lib.makeLibraryPath buildInputs}:$LD_LIBRARY_PATH"

    # Add CUDA bins to PATH
    export PATH=${pkgs.cudatoolkit}/bin:$PATH

    # CUDA specific flags
    export EXTRA_LDFLAGS="-L/lib -L${pkgs.cudatoolkit}/lib64 -L${pkgs.cudatoolkit}/lib"
    export EXTRA_CCFLAGS="-I/usr/include -I${pkgs.cudatoolkit}/include"
  '';
}


The trouble comes as soon as I want to convert this shell file into something suitable for C: I am unable to properly set up a working CUDA environment for C.

If logs can help identify the exact problem I am willing to try, but at the moment I think the best way to solve the problem is either:

learn how to package the source and upstream a PR to nixpkgs,

or

wait until CUDA 12.8 is merged.

With that said, I DO want to tell you that the CUDA package source I found on the official NVIDIA download page does not list NixOS as a supported distribution.

Fedora and Ubuntu people can be laughing at us.

With that said,

it will be my honor to do anything I can to help merge CUDA 12.8 into official nixpkgs. I will try to learn how to get started.

And thanks again for such a detailed breakdown.

I do want to apologize for my very inappropriate attitude towards OSS contributors; I am in no position to judge their work, especially given that I have benefited from it.

I hope I can contact the Nix CUDA team to get some more professional advice on solving this CUDA version compatibility issue.

It is my honor to be part of this community, because NixOS has given me a completely different, yet very elegant in its own unique way, experience of the Linux world, and I am loving it. I will keep learning from other people and from my own mistakes, and hopefully become a contributor myself.

Thank you again for pointing out my bad behavior; I shall refrain from it in this community.

Good day.

10 Likes

So it has been a crazy night for me so far: the local CUDA 12.8 build finally succeeded. This is the approach and the shell.nix:


# shell.nix
let
  pkgs = import <nixpkgs> { config.allowUnfree = true; };
  cuda128 = pkgs.stdenv.mkDerivation rec {
    name = "cudatoolkit-12.8.0";
    version = "12.8.0";
    src = /home/alice7/.dev/test-cuda_with_C/cuda_12.8.0_570.86.10_linux.run;
    nativeBuildInputs = [ pkgs.autoPatchelfHook pkgs.makeWrapper pkgs.coreutils pkgs.bash ];
    buildInputs = [
      pkgs.stdenv.cc.cc.lib  # libgcc_s, libc
      pkgs.libxml2           # libxml2.so.2
      pkgs.cudaPackages.cuda_cupti  # libcupti.so.12 (from nixpkgs, might be 12.4, but should work)
      pkgs.rdma-core         # libibverbs.so.1, librdmacm.so.1
      # libmlx5.so.1 not directly in nixpkgs; part of Mellanox OFED, ignore for now
    ];
    autoPatchelfIgnoreMissingDeps = [
      "libmlx5.so.1"       # RDMA-specific, optional
      "libcuda.so.1"       # Driver stub, provided by NVIDIA driver at runtime
    ];
    unpackPhase = ''
      echo "Unpacking Makeself CUDA 12.8.0 archive from $src"
      cp $src cuda.run
      chmod +x cuda.run
      mkdir -p $out
      ./cuda.run --tar xvf -C $out
      echo "Extracted contents before rearrangement:"
      ls -lh $out/builds
      mkdir -p $out/bin $out/lib64 $out/include
      mv $out/builds/cuda_nvcc/bin/* $out/bin/ 2>/dev/null || true
      mv $out/builds/*/lib64/* $out/lib64/ 2>/dev/null || true
      mv $out/builds/*/include/* $out/include/ 2>/dev/null || true
      rm -rf $out/builds
      echo "Final extracted contents:"
      ls -lh $out/bin $out/lib64 $out/include
    '';
    installPhase = ''
      echo "Installing CUDA 12.8.0"
      ls -lh $out/bin
      for bin in $out/bin/*; do
        if [ -f "$bin" ] && [ -x "$bin" ]; then
          wrapProgram "$bin" --prefix LD_LIBRARY_PATH : "$out/lib64"
        fi
      done
    '';
    postFixup = ''
      echo "Patching libraries:"
      ls -lh $out/lib64
      for lib in $out/lib64/*.so; do
        patchelf --set-rpath "$out/lib64" $lib
      done
    '';
  };
in
pkgs.mkShell {
  buildInputs = [ cuda128 ];
}


And this is the result of a quick test within the shell:


alice7@nixos ~/.d/test-cuda_with_C> nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Wed_Jan_15_19:20:09_PST_2025
Cuda compilation tools, release 12.8, V12.8.61
Build cuda_12.8.r12.8/compiler.35404655_0
alice7@nixos ~/.d/test-cuda_with_C> nvidia-smi
Sat Feb 22 11:15:52 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.86.16              Driver Version: 570.86.16      CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4060 ...    Off |   00000000:01:00.0 Off |                  N/A |
| N/A   44C    P8              2W /   80W |      15MiB /   8188MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            1877      G   ...me-shell-47.2/bin/gnome-shell          2MiB |
+-----------------------------------------------------------------------------------------+



I am very close to making this work:


// hello.cu
#include <stdio.h>
__global__ void kernel() { printf("Hello from GPU!\n"); }
int main() {
  kernel<<<1,1>>>();
  cudaDeviceSynchronize();
  printf("Hello from CPU!\n");
  return 0;
}

// compile with: nvcc -I{$CUDA_PATH}/include -L{$CUDA_PATH}/lib64 -L{$CUDA_PATH}/lib hello.cu -o hello


It returns:


alice7@nixos ~/.d/test-cuda_with_C> nvcc -I{$CUDA_PATH}/include -L{$CUDA_PATH}/lib64 -L{$CUDA_PATH}/lib hello.cu -o hello
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
sh: line 1: /nix/store/1wvizkjikwvzdy8ni3gxqifflnlmjdw4-cudatoolkit-12.8.0/bin/../nvvm/bin/cicc: No such file or directory
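
My best guess: the unpackPhase above only keeps bin/, lib64/ and include/ and then deletes builds/, but nvcc looks for nvvm/bin/cicc relative to its install prefix (hence the bin/../nvvm path in the error). Assuming the .run archive ships that tree under builds/cuda_nvcc/nvvm (I have not verified the exact layout), keeping it might be enough, e.g. by adding this line before the rm -rf $out/builds:

      # keep nvcc's companion nvvm/ tree (cicc, libdevice) so $out/bin/../nvvm/bin/cicc resolves
      mv $out/builds/cuda_nvcc/nvvm $out/nvvm 2>/dev/null || true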


1 Like

Perhaps the info on CUDA in this video might be helpful also?

1 Like

Then let that carry over to all platforms; it seems your poor attitude persists over at Reddit. Very sad.

1 Like

Well, OP has gotten one of the “dedicated” CUDA team members defending themselves from OP’s unwarranted criticism, which I dislike, but OP has issued an apology, and if it’s sincere then fine; if not, whatever. Let bygones be bygones, as they say.

Changed post title (previous was “NixOS has no love for CUDA”) to reflect the new discussion.

And, yeah, the CUDA maintainers are pretty cool folks. :slight_smile:

6 Likes

thanks for the edit, so much to learn :hugs:

1 Like