Petition to build and cache unfree packages on cache.nixos.org

Thanks, I’ll add that to the derivation I’m preparing for eventual submission to nixpkgs. It’s an application, so reverse dependencies are unlikely. Upstream bundles Wine on Darwin with the Windows binaries, so I’m working to replace those with the equivalent from nixpkgs. Redistributing the upstream Windows binaries is not permissible, and I wouldn’t want to create any possibility of confusion by having them cached.

I guess I’m confused because the thread is discussing CUDA packages, which are marked licenses.unfree, not licenses.unfreeRedistributable. I assume those won’t get cached, but isn’t that what’s being requested?

the llama man strikes again! Nice Work!

Jeff Minter would be proud of you.

1 Like

This is currently a largely unsolvable problem due to how Nix works. It is not just the unfree aspect – both the CUDA and the MKL license do not allow any modification of the binaries. So anyone who is distributing patchelf-ed binaries without NVIDIA or Intel’s blessing is violating their licenses and opens themselves up to legal action. FHS-based distributions do not have this problem, because they can distribute the CUDA libraries unmodified, which is why Ubuntu and EPEL (Fedora/RHEL) can redistribute CUDA. Also, in the case of Ubuntu, Canonical probably has enough clout to make their own licensing agreements with NVIDIA.

They link in CUDA static libraries unmodified and are therefore not distributing CUDA libraries in violation of the license. Also, I am pretty sure that PyTorch (Facebook) and Tensorflow (Google) can negotiate their own licensing agreement with NVIDIA.

IANAL, but I think the only approach that could work for Nix is to build packages that link dynamically against CUDA, while not caching the CUDA packages themselves. The user would then only have to build the CUDA packages locally. Since the derivation hashes don’t change, the dependent libraries (which would be cached) would not be affected. However, this requires Hydra work, since Hydra caches everything (or nothing). It should be easy to set up with Cachix, since you have more control over what is uploaded.

I think at this point NVIDIA couldn’t care less about Nix or NixOS. Nix not supporting CUDA is not going to break their monopoly. The only people that are affected by such policies are end users.

It is easy to say use OpenCL, use AMD/ROCm. But as someone who has suffered through using PyTorch and Tensorflow with ROCm – the performance is simply not there for a lot of applications and the path is riddled with terrible bugs. Reporting them is useless, AMD simply doesn’t care. (Perhaps they do if you drop millions on Instinct Accelerators, but they will run into the same bugs as researchers, so :person_shrugging:.)

If you are a ML researcher and do not want to waste your life debugging for AMD (who don’t care anyway), you basically have two options: 1. NVIDIA CUDA as the default, and 2. an M1 Pro/Max Mac for training very small networks and doing development [1]

Despite Nix being really nice for reproducibility, I have decided to stop wasting time and now just use Fedora on local GPU machines and Ubuntu on cloud instances. :sob:

[1] The AMX in M1 CPUs can do insanely fast matrix multiplication and is accessible through the standard gemm interface, so it’s the happy path in many machine learning libraries. I often use the M1 for training smaller convolution networks or sometimes small transformers.

9 Likes

They link in CUDA static libraries unmodified and are therefore not distributing CUDA libraries [pytorch, tensorflow] in violation of the license.

They ship dynamically-linked libraries (same as you suggest we should do).
Conda at least, unless I’m greatly mistaken, does not redistribute cuda.
It ships a program that automatically downloads it from nvidia.com and installs it on the user’s machine - same as we do.
Furthermore, I think conda’s cudatoolkit does modify the RPATH (it does need its libraries to prefer things in the env over those installed globally):

$ patchelf --print-rpath .conda/pkgs/cudatoolkit-11.3.1-ha36c431_9/lib/libcudart.so
$ORIGIN/.

IANAL, but I think the only approach that could work for Nix is to build packages that link dynamically against CUDA, while not caching the CUDA packages themselves

I think this is exactly what we want

I have decided to stop wasting time and now just use Fedora on local GPU machines and Ubuntu on cloud instances

I thought the same, but a few weeks ago my conda-shipped pytorch stopped noticing the GPU again and I’m just tired of debugging that :man_facepalming:

2 Likes

I think I was confused too :sweat_smile: I need to take a step back and re-think if I can make the CUDA part work. For now, the main branch is only building unfree+redistributable packages so that’s fine.

1 Like

By the way, I believe the cudatoolkit derivation can also be reworked to make an accidental redistribution less likely:

  • Currently cudatoolkit puts the .so and .a files in the same output. We could move the .a files into a separate "static" output, to make sure that pytorch doesn’t for whatever reason choose to link a piece of cuda statically
  • There are preferLocalBuild = true and allowSubstitutes = false options that, as I understand it, tell nix not to look for a pre-built derivation in a remote cache (see the sketch below)

(I’d open an issue in nixos/nixpkgs right away, but I haven’t done enough research)
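
To make the second point concrete, something like the following overlay is what I have in mind. This is only a sketch, not a tested change: preferLocalBuild and allowSubstitutes are the real derivation attributes, everything else here is illustrative.

final: prev: {
  # Sketch only: mark cudatoolkit as "build locally, never substitute", so nix
  # neither fetches it from a binary cache nor has a reason to push it to one.
  cudatoolkit = prev.cudatoolkit.overrideAttrs (old: {
    preferLocalBuild = true;   # prefer building on the local machine
    allowSubstitutes = false;  # never look this derivation up in a substituter
  });
  # Splitting the .a files into a separate "static" output would additionally
  # need an outputs change plus a postInstall move; that part is not shown here.
}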

Last time I checked, upstream PyTorch packages (uploaded to PyPi) definitely link CUDA and MKL statically. I don’t know about Conda, because I don’t use it.

Based on my understanding of everything, this seems like the best path forward. It doesn’t trigger massive rebuilds for users, it’s sparkly clean on the legal front, and it doesn’t require modifications to any of the packages.

3 Likes

Hm. I’m not sure how to reliably verify this. Pytorch’s own CI and conda-forge’s pytorch feedstock look rather terse; I couldn’t infer anything from the actual code. I also can’t say anything about MKL for now.

I compared which shared libraries in pypi’s torch and nix-built pytorch link dynamically to cuda (patchelf --print-needed).

PyPi:

~/.conda/envs/test-tf/lib/python3.9/site-packages/torch/lib/libc10_cuda.so:
libcudart-80664282.so.10.2
~/.conda/envs/test-tf/lib/python3.9/site-packages/torch/lib/libcaffe2_detectron_ops_gpu.so:
libcudart-80664282.so.10.2
~/.conda/envs/test-tf/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so:
libcudart-80664282.so.10.2
~/.conda/envs/test-tf/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so:
libcudart-80664282.so.10.2
~/.conda/envs/test-tf/lib/python3.9/site-packages/torch/lib/libcaffe2_nvrtc.so:
libcuda.so.1

Nix:

/nix/store/.../torch/lib/libc10_cuda.so:
libcudart.so.11.0
/nix/store/.../torch/lib/libcaffe2_detectron_ops_gpu.so:
libcudart.so.11.0
/nix/store/.../torch/lib/libtorch_cpu.so:
libcudart.so.11.0
/nix/store/.../torch/lib/libtorch_cuda.so:
libcudart.so.11.0
/nix/store/.../torch/lib/libtorch_global_deps.so:
libcudart.so.11.0
/nix/store/.../torch/lib/libtorch_python.so:
libcudart.so.11.0
/nix/store/.../torch/lib/libcaffe2_nvrtc.so:
libcuda.so.1

I think the parts that are linked statically, when pytorch is what’s referred to as “built statically”, are cudart and cudnn/cublas.
I think libcuda.so (the “cuda driver”, whatever that means) always links dynamically.
I think pypi’s torch links dynamically.

That being said, nvcc may be linking some pieces statically. Rather, it almost surely is copying some pieces of the cudatoolkit into target binaries.

… NVIDIA hereby grants you …
3. Distribute those portions of the SDK that are identified in this Agreement as distributable, as incorporated in object code format into a software application that meets the distribution requirements indicated in this Agreement.

  • We never modify the object code in .a files (confirmed by reading nix edit nixpkgs#cudatoolkit).
  • We incorporate this object code into the resulting application (pytorch) by running NVIDIA’s own toolchain (nvcc &c) as a black box to produce the resulting shared libraries, archives, and executables
  • We do not (plan to) redistribute the patched libcudart.so &c (because we patch it), the driver libcuda.so (because it may not be distributed except in docker images derived from “nvidia docker containers”), or the headers except the few mentioned in the EULA. To that end we can
    • Filter cudatoolkit out before pushing the cache to cachix (we do not need to touch hydra or the root nixpkgs repo)
    • Manually verify the absence of cuda-things in cachix via its search function available to the owners of the cache
    • Add the local build flags to the cudatoolkit derivation as a preventive measure

Ping Add NVIDIA licenses by tbenst · Pull Request #76233 · NixOS/nixpkgs · GitHub
Ping @zimbatm @samuela @danieldk

1 Like

My point here is that it is not our moral problem. We are not bound by a moral or legal contract to serve unfree, non-redistributable, or otherwise annoying stuff. If we do such things completely for free, it is because we want to.

As you yourself have said, there are also legal problems with patchelfing Nvidia stuff, so we can’t cache it anyway. The end users will be affected either way.

Further, I have said what I think is the best solution: outsource this.
Pick someone interested in creating an overlay repo, outside the NixOS organization, containing all this annoying unfree stuff.
Also don’t forget to pick up all the satellite stuff - servers, programmers, lawyers…

1 Like

For the sake of argument, forget my personal rant about working for free.

What do we need to do? Sort out bandwidth, disk space, processor cycles, and legal advice. Preferably outside Nixpkgs, in the form of an overlay.

I’m willing to put in the personal energy and time, and to take on the legal liability myself. But I need storage and compute.

There’s already a bunch of CUDA stuff in nixpkgs since it’s so fundamental, but ultimately I don’t care if it’s in nixpkgs or an overlay/flake as long as it’s easy to use for users.

4 Likes

For example:

% nm python3.10/site-packages/torch/lib/libtorch_cuda_cpp.so | grep " cublasSgemm"
00000000086cb010 T cublasSgemmBatched
000000000876eec0 T cublasSgemmEx
00000000086b0850 T cublasSgemm_largek
00000000086cb530 T cublasSgemmStridedBatched
000000000867aa90 T cublasSgemm_v2

T means that the symbol is in the code (text) section.

Yes, because libcuda.so is bound to the kernel driver version, while the rest of the toolkit isn’t.

3 Likes

Linking against dynamic libraries produces a derivative work according to some interpretations, and the cuda license doesn’t permit redistributing derivative works. See also why LGPL exists alongside GPL. IP law is extremely confusing sometimes.

IANAL, but I think this isn’t as clear-cut as it may seem based on what I know about other similar situations.

1 Like

The agreement specifically introduces the term “[licensee’s] application”. It specifically allows distributing the runtime shared libraries (by giving a list of file names ending unequivocally with .so and .dll) together with an “application”. How else could an “application” interact with these runtime libraries, if not by linking dynamically?

https://docs.nvidia.com/cuda/eula/index.html

P.S. @danieldk I didn’t post earlier because I didn’t want to create noise in the thread, but thank you a lot for your clarifications, they were very insightful; I suspected I should use nm for comparison, but I didn’t know what exactly to look at

1 Like

Layer 9 problems basically stifle any innovation at the lower layers…

1 Like

If this were the case then tensorflow, pytorch, jaxlib, and every other package on PyPI that ever links against CUDA would be illegal. I am also not a lawyer, but I’m fairly certain that Google/Facebook’s lawyers must have signed off on their distribution.

1 Like

Relevant thread:

2 Likes

I understand this is around 2 years old, but it seems like the issue is still ongoing, especially for tensorflowWithCuda. I even tried using https://cuda-maintainers.cachix.org, but it still forced a full recompile of everything when setting cudaSupport = true.
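
For reference, the setup I was testing looked roughly like this (NixOS module form; the trusted-public-keys value below is a placeholder for the key published on the cache’s page):

{
  nixpkgs.config = {
    allowUnfree = true;
    cudaSupport = true;   # this is the switch that triggered the full local rebuild
  };
  nix.settings = {
    substituters = [ "https://cuda-maintainers.cachix.org" ];
    trusted-public-keys = [ "cuda-maintainers.cachix.org-1:<key from the cache page>" ];
  };
}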

I got around it a different way, by changing my TensorFlow derivation to use the wheel:

  mtensorflowWithCuda = buildPythonPackage rec {
    pname = "tensorflow";
    version = "2.14.1";
    format = "wheel";

    src = fetchurl {
      name = "${pname}-${version}-py3-none-any.whl";
      url = "https://files.pythonhosted.org/packages/99/77/4f31cd29cab69ebc344a529df48b91a14543a83b6fb90efbf82db29a34be/tensorflow-2.14.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl";
      sha256 = "sha256-mpVcQhZO/011FzLBJ0ykvwWdtgyeI2IJjOHu1xd8P+k=";
    };

    nativeBuildInputs = [ pyPkgs.wheel ] ++ lib.optionals stdenv.isLinux [ autoPatchelfHook ];

    # tensorflow/tools/pip_package/setup.py
    propagatedBuildInputs = with pyPkgs; [
      absl-py
      # abseil-cpp
      astunparse
      flatbuffers
      gast
      google-pasta
      grpcio
      h5py
      keras-preprocessing
      numpy
      opt-einsum
      packaging
      # protobuf-python
      six
      tensorflow-estimator-bin
      termcolor
      typing-extensions
      wrapt
      scipy
      dm-tree
      # No longer in 310 packages, had to be copied
      # from upstream's 311 packages
      (pyPkgs.callPackage (import ./mldtypes.nix) { })
      (pyPkgs.callPackage (import ./keras.nix) { })
    ] ++ lib.optionals withTensorboard [
      tensorboard
    ];

    # During installation pip can't find the deps provided above, so we disable
    # its dependency resolution; we can still confirm the module works afterwards.
    pipInstallFlags = "--no-deps";

    postFixup = ''
      find $out -type f \( -name '*.so' -or -name '*.so.*' \) | while read lib; do
        # addOpenGLRunpath "$lib"
        echo [MANUAL] patching $lib

        patchelf --set-rpath "${cudatoolkit}/lib64:${cudatoolkit.lib}/lib:${cudnn_8_7}/lib:${nccl}/lib:$(patchelf --print-rpath "$lib")" "$lib"
      done
    '';

    doCheck = false;
  };

I had to override a couple of packages to get specific versions that work with TF 2.14, but it worked ok so far and did not require a full recompile.
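
For illustration, the overrides were of roughly this shape; the package name, version, and hash below are placeholders, not the exact pins I used:

pyPkgs = (python310.override {
  packageOverrides = self: super: {
    # Hypothetical example: pin one dependency to a TF-2.14-compatible release.
    somepackage = super.somepackage.overridePythonAttrs (old: rec {
      version = "1.2.3";
      src = super.fetchPypi {
        pname = "somepackage";
        inherit version;
        hash = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";  # placeholder
      };
    });
  };
}).pkgs;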

I’ve shared this for two reasons

  1. In case anyone else is still looking for a hacky solution that avoids switching to a virtualenv and keeps a full nix build
  2. To see if we can take advantage of wheels more often to avoid building from source. As great as it would be to build from source, for certain components like this that aren’t cached by Hydra it’s infeasible, and it would be great if we could pull in the wheel along with its dependencies (it took me nearly a full day to get to this point, mainly stuck with getting the cuda bindings to eventually work)
3 Likes