Overlaying packages that use cuda to use zluda

recently we had a zluda package hit nixpkgs, and i was wondering what a good way to use it might be.

as per its contents, it looks like a bunch of cuda drop-in .so files:

tree /nix/store/i7sqr79gd0i1m9ml2cv6vb8d7k5b2wqv-zluda-3/
/nix/store/i7sqr79gd0i1m9ml2cv6vb8d7k5b2wqv-zluda-3/
└── lib
    ├── libcublas.so
    ├── libcublas.so.10 -> libcublas.so
    ├── libcublas.so.11 -> libcublas.so
    ├── libcuda.so -> libnvcuda.so
    ├── libcuda.so.1 -> libnvcuda.so
    ├── libcudnn.so
    ├── libcudnn.so.7 -> libcudnn.so
    ├── libcudnn.so.8 -> libcudnn.so
    ├── libcufft.so
    ├── libcufft.so.10 -> libcufft.so
    ├── libcusparse.so
    ├── libcusparse.so.11 -> libcusparse.so
    ├── libnccl.so
    ├── libnccl.so.2 -> libnccl.so
    ├── libnvcuda.so
    ├── libnvidia-ml.so -> libnvml.so
    ├── libnvidia-ml.so.1 -> libnvml.so
    ├── libnvml.so
    └── libzluda_dump.so

based on this, i spent a while trying to write an overlay that points anything looking for cuda at zluda instead:

{pkgs, lib, ...}:

final: prev: {
  cudaPackages = prev.cudaPackages.overrideScope (_final: _prev:
    let
      inherit (prev) zluda;
    in
    # substitute zluda for every cuda library a build might ask for
    lib.genAttrs [ "cudatoolkit" "cudnn" "libcusparse" "libcufft" "libnccl" "cuda_nvml_dev" ] (_: zluda)
    // {
    backendStdenv = prev.stdenv;
    cuda_cccl = {
      dev = zluda;
    };
    cuda_cudart = {
      dev = zluda;
      lib = zluda;
      static = zluda;
    };
    libcublas = {
      dev = zluda;
      lib = zluda;
      static = zluda;
    };
  });
}

i tried this with local-ai for a bit, but that one may still be somewhat ambitious, if not outside the scope of the libraries zluda currently patches.
i guess my question is: would such an overlay even be a sensible approach? or would it make more sense to tackle zluda-patching at the level of the individual package derivations?

has anyone found packages that were easier to patch for zluda? :smile:

/cc @errnoh


Try reaching out to the CUDA Team | Nix & NixOS (@samuela, @SergeK and @ConnorBaker)

If they find any solution, it would be nice to have it in the Nixpkgs Reference Manual


Looks interesting! Unfortunately I’m wrapped up in cuda-modules: fixed output derivations and new modules by ConnorBaker · Pull Request #306172 · NixOS/nixpkgs · GitHub at the moment.

But it’d be interesting to see how to package it similar to the CUDA redistributables (not the CUDA Toolkit runfile installer, which is gigantic).

Every package that properly uses CUDA via the driver runpath should be able to discover ZLUDA instead when you add it to hardware.opengl.extraPackages in place of CUDA.
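
A rough sketch of what that looks like in a NixOS config (assuming the package is exposed as pkgs.zluda):

{ pkgs, ... }:

{
  hardware.opengl = {
    enable = true;
    # expose zluda's drop-in CUDA libraries via /run/opengl-driver/lib
    extraPackages = [ pkgs.zluda ];
  };
}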

Though do note that a dep of ZLUDA that takes insanely long to build (rocmPackages.composable_kernel) is currently uncached or broken. You can test using the original commit though.


@Atemu i guess nixpkgs packages would tend to take cuda as a build input rather than pick it up via that setting, right?

i def noticed the zluda build :sob:, and felt kinda confused that stuff was uncached on nixpkgs-unstable. when i woke up in the morning i found that not only was my laptop still building, i could also smell it overheating.

i figured i should prob push my zluda build to a cachix instance if that’s what it takes to safely rebuild after a reinstall, then filed explain cache misses · Issue #10695 · NixOS/nix · GitHub to address the confusion around cache misses, tho it seems that turned out not to be viable.

This is interesting. My understanding is that zluda ships a number of host libraries that can either transpile cuda PTX code into something that can be interpreted by ROCm devices, or directly delegate to analogous ROCm kernels e.g. in case of BLAS functions. Zluda doesn’t seem to ship headers or static libraries. One builds a program with PTX kernels using nvcc, and then uses zluda’s libcuda “driver” at runtime. I’m not immediately sure if there’s much to be gained by rewriting cudaPackages with zluda compared to exposing the latter through the higher-priority /run/opengl-driver/lib as @Atemu suggests (or LD_LIBRARY_PATH, as shown in the zluda PR). This would also cause fewer rebuilds.
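
For a single program, a rough sketch of that LD_LIBRARY_PATH route (untested; my-cuda-app is just a placeholder for whatever CUDA-consuming package one actually runs):

{ pkgs }:

# wrap a CUDA-consuming binary so it picks up zluda's drop-ins at runtime;
# "my-cuda-app" is a placeholder name, not a real nixpkgs attribute
pkgs.writeShellScriptBin "my-cuda-app-zluda" ''
  export LD_LIBRARY_PATH=${pkgs.zluda}/lib''${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
  exec ${pkgs.my-cuda-app}/bin/my-cuda-app "$@"
''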

If you go with the overlay approach, you’d need to modify lib but keep the dev outputs intact. I’m not sure how to go about static. E.g. cmake-based projects will probably expect libcudart_static.a to exist, but libcudart_static.a from the original cudaPackages.cuda_cudart will contain nvidia’s real code which might or might not result in a conflict. I tried patching lib independently of other outputs, but magma currently fails with this overlay and I can’t iterate on this just now: zluda-pkgs.nix · GitHub
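
For the record, a rough sketch of what I mean by swapping only lib, shown here for libcublas (which zluda does ship) while leaving $dev and $static untouched; untested, and the gist above may go about it differently:

final: prev: {
  cudaPackages = prev.cudaPackages.overrideScope (_cudaFinal: cudaPrev: {
    libcublas = cudaPrev.libcublas.overrideAttrs (old: {
      postFixup = (old.postFixup or "") + ''
        # replace only the runtime library with zluda's drop-in;
        # headers and cmake config in $dev stay as nvidia ships them
        rm -f $lib/lib/libcublas.so*
        ln -s ${prev.zluda}/lib/libcublas.so* $lib/lib/
      '';
    });
  });
}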
