Petition to build and cache unfree packages on cache.nixos.org

reckenrode · February 19, 2022, 11:12pm

Is there a way to opt out of being cached? Something like license = lib.licenses.nonfreeYesActuallyForReal?

vcunat · February 19, 2022, 11:13pm

hydraPlatforms = []; is possible, but that won’t prevent reverse dependencies from causing it to be built.

NobbZ · February 19, 2022, 11:18pm

If the license has distributable = false, then it shouldn’t get built or cached, on either the regular cache or nixpkgs-unfree.

reckenrode · February 19, 2022, 11:19pm

Thanks, I’ll add that to the derivation I’m preparing for eventual submission to nixpkgs. It’s an application, so reverse dependencies are unlikely. Upstream bundles Wine on Darwin with the Windows binaries, so I’m working to replace those with the equivalent from nixpkgs. Redistributing the upstream Windows binaries is not permissible, and I wouldn’t want to create a possibility of confusion by having them cached.

reckenrode · February 19, 2022, 11:34pm

I guess I’m confused because the thread is discussing CUDA packages, which are marked licenses.unfree not licenses.unfreeRedistributable. I assume those won’t get cached, but isn’t that what’s being requested?
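To make the distinction concrete, here is a minimal sketch of the caching policy as it is described in this thread. The License class and cache_target function are hypothetical stand-ins, not nixpkgs or Hydra code, and the assumption that an unset redistributable flag falls back to the value of free is mine.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class License:
    """Hypothetical stand-in for a nixpkgs license attribute set."""
    name: str
    free: bool = True
    # Assumption: when unset, redistributable falls back to the value of `free`.
    redistributable: Optional[bool] = None

    @property
    def may_redistribute(self) -> bool:
        if self.redistributable is None:
            return self.free
        return self.redistributable


def cache_target(lic: License) -> str:
    """Where a build output could be served from, per the policy discussed here."""
    if lic.free:
        return "cache.nixos.org"
    if lic.may_redistribute:
        return "nixpkgs-unfree"
    return "local build only"


mit = License("mit")
unfree_redistributable = License("unfreeRedistributable", free=False, redistributable=True)
unfree = License("unfree", free=False)

for lic in (mit, unfree_redistributable, unfree):
    print(f"{lic.name}: {cache_target(lic)}")
```

Under these assumptions a package marked licenses.unfree, such as the CUDA ones, ends up in the "local build only" bucket, which is the source of the confusion above.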

nixinator · February 19, 2022, 11:50pm

the llama man strikes again! Nice Work!

Jeff Minter would be proud of you.

danieldk · February 20, 2022, 10:41am

This is currently a largely unsolvable problem due to how Nix works. It is not just the unfree aspect – both the CUDA and the MKL license do not allow any modification of the binaries. So anyone who is distributing patchelf-ed binaries without NVIDIA or Intel’s blessing is violating their licenses and opens themselves up to legal action. FHS-based distributions do not have this problem, because they can distribute the CUDA libraries unmodified, which is why Ubuntu and EPEL (Fedora/RHEL) can redistribute CUDA. Also, in the case of Ubuntu, Canonical probably has enough clout to make their own licensing agreements with NVIDIA.

They link in CUDA static libraries unmodified and are therefore not distributing CUDA libraries in violation of the license. Also, I am pretty sure that PyTorch (Facebook) and Tensorflow (Google) can negotiate their own licensing agreement with NVIDIA.

IANAL, but I think the only approach that could work for Nix is to build packages that link dynamically against CUDA, without caching the CUDA packages themselves. Then the user would only have to build the CUDA packages locally. Since the derivation hashes don’t change, the dependent libraries (which would be cached) would not be affected. However, this requires Hydra work, since Hydra caches everything (or nothing). It should be easy to set up with Cachix, since you have more control over what is uploaded.

I think at this point NVIDIA couldn’t care less about Nix or NixOS. Nix not supporting CUDA is not going to break their monopoly. The only people that are affected by such policies are end users.

It is easy to say “use OpenCL” or “use AMD/ROCm”. But as someone who has suffered through using PyTorch and Tensorflow with ROCm – the performance is simply not there for a lot of applications and the path is riddled with terrible bugs. Reporting them is useless, AMD simply doesn’t care. (Perhaps they do if you drop millions on Instinct Accelerators, but those customers will run into the same bugs as researchers.)

If you are an ML researcher and do not want to waste your life debugging for AMD (who don’t care anyway), you basically have two options: 1. NVIDIA CUDA as the default, and 2. an M1 Pro/Max Mac for training very small networks and doing development [1].

Despite Nix being really nice for reproducibility, I have decided to stop wasting time and now just use Fedora on local GPU machines and Ubuntu on cloud instances.

[1] The AMX in M1 CPUs can do insanely fast matrix multiplication and is accessible through the standard gemm interface, so it’s the happy path in many machine learning libraries. I often use the M1 for training smaller convolution networks or sometimes small transformers.

SergeK · February 20, 2022, 11:14am

They link in CUDA static libraries unmodified and are therefore not distributing CUDA libraries [pytorch, tensorflow] in violation of the license.

They ship dynamically-linked libraries (same as you suggest we should do).
Conda at least, unless I’m greatly mistaken, does not redistribute cuda.
It ships a program that automatically downloads it from nvidia.com and installs it on the user’s machine - same as we do.
Furthermore, I think cuda does modify RPATH (it does need its libraries to prefer things in the env over those installed globally):

$ patchelf --print-rpath .conda/pkgs/cudatoolkit-11.3.1-ha36c431_9/lib/libcudart.so
$ORIGIN/.

IANAL, but I think the only approach that could work for Nix is to build packages that link dynamically against CUDA, without caching the CUDA packages themselves

I think this is exactly what we want

I have decided to stop wasting time and now just use Fedora on local GPU machines and Ubuntu on cloud instances

I thought the same, but a few weeks ago my conda-shipped pytorch stopped noticing the GPU again and I’m just tired of debugging that.

zimbatm · February 20, 2022, 11:36am

I think I was confused too. I need to take a step back and rethink whether I can make the CUDA part work. For now, the main branch is only building unfree+redistributable packages, so that’s fine.

SergeK · February 20, 2022, 12:14pm

By the way, I believe the cudatoolkit derivation can also be reworked to make an accidental redistribution less likely:

- Currently cudatoolkit puts .so and .a in the same output. We could move .a into a separate "static" output to ensure that pytorch doesn’t, for whatever reason, choose to link a piece of cuda statically.
- There are preferLocalBuild = true and allowSubstitutes = false options that, as I understand it, could tell nix not to look for a built derivation in the remote cache.

(I’d open an issue in nixos/nixpkgs right away, but I haven’t done enough research)

danieldk · February 20, 2022, 2:00pm

Last time I checked, upstream PyTorch packages (uploaded to PyPI) definitely link CUDA and MKL statically. I don’t know about Conda, because I don’t use it.

samuela · February 20, 2022, 10:53pm

Based on my understanding of everything, this seems like the best path forward. It doesn’t trigger massive rebuilds for users, is sparkly clean on the legal front, and doesn’t require modifications to any of the packages.

SergeK · February 24, 2022, 2:13am

Hm. I’m not sure how to reliably verify this. Pytorch’s own CI and conda-forge’s pytorch feedstock look rather terse; I couldn’t infer anything from the actual code. I also can’t say anything about MKL for now.

I compared which shared libraries in pypi’s torch and nix-built pytorch link dynamically to cuda (patchelf --print-needed).

PyPi:

~/.conda/envs/test-tf/lib/python3.9/site-packages/torch/lib/libc10_cuda.so:
libcudart-80664282.so.10.2
~/.conda/envs/test-tf/lib/python3.9/site-packages/torch/lib/libcaffe2_detectron_ops_gpu.so:
libcudart-80664282.so.10.2
~/.conda/envs/test-tf/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so:
libcudart-80664282.so.10.2
~/.conda/envs/test-tf/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so:
libcudart-80664282.so.10.2
~/.conda/envs/test-tf/lib/python3.9/site-packages/torch/lib/libcaffe2_nvrtc.so:
libcuda.so.1

Nix:

/nix/store/.../torch/lib/libc10_cuda.so:
libcudart.so.11.0
/nix/store/.../torch/lib/libcaffe2_detectron_ops_gpu.so:
libcudart.so.11.0
/nix/store/.../torch/lib/libtorch_cpu.so:
libcudart.so.11.0
/nix/store/.../torch/lib/libtorch_cuda.so:
libcudart.so.11.0
/nix/store/.../torch/lib/libtorch_global_deps.so:
libcudart.so.11.0
/nix/store/.../torch/lib/libtorch_python.so:
libcudart.so.11.0
/nix/store/.../torch/lib/libcaffe2_nvrtc.so:
libcuda.so.1

I think the parts that are linked statically when pytorch is what’s referred to as “built statically” are the cudart and cudnn/cublas.
I think libcuda.so (the “cuda driver”, whatever that means) always links dynamically.
I think pypi’s torch links dynamically.
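For anyone who wants to repeat the comparison, here is a rough sketch of how the two listings could be parsed and diffed. It assumes the layout shown above (a line ending in ":" names the file, the lines after it are the needed libraries); the helper names are made up for this example, and the sample data is a trimmed subset of the listings with shortened paths.

```python
import os
from typing import Dict, List


def parse_needed(listing: str) -> Dict[str, List[str]]:
    """Group needed libraries under the file path line that precedes them."""
    needed: Dict[str, List[str]] = {}
    current = None
    for raw in listing.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            current = line[:-1]
            needed[current] = []
        elif current is not None:
            needed[current].append(line)
    return needed


def cuda_needed_by_file(listing: str) -> Dict[str, List[str]]:
    """Map each file's basename to the needed libraries that start with 'libcuda'."""
    return {
        os.path.basename(path): [lib for lib in libs if lib.startswith("libcuda")]
        for path, libs in parse_needed(listing).items()
    }


# Trimmed subset of the listings above, with shortened paths.
PYPI = """\
torch/lib/libc10_cuda.so:
libcudart-80664282.so.10.2
torch/lib/libcaffe2_nvrtc.so:
libcuda.so.1
"""

NIX = """\
torch/lib/libc10_cuda.so:
libcudart.so.11.0
torch/lib/libtorch_python.so:
libcudart.so.11.0
torch/lib/libcaffe2_nvrtc.so:
libcuda.so.1
"""

pypi = cuda_needed_by_file(PYPI)
nix = cuda_needed_by_file(NIX)

# Files that link dynamically to CUDA only in the nix build.
only_in_nix = sorted(set(nix) - set(pypi))
print(only_in_nix)
```

Note that this only shows what is linked dynamically. It cannot tell whether other pieces were also linked in statically.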

That being said, nvcc may be linking some pieces statically. Rather, it almost surely is copying some pieces of the cudatoolkit into target binaries.

… NVIDIA hereby grants you …
3. Distribute those portions of the SDK that are identified in this Agreement as distributable, as incorporated in object code format into a software application that meets the distribution requirements indicated in this Agreement.

We never modify the object code in .a files (confirmed by reading nix edit nixpkgs#cudatoolkit).
We incorporate this object code into the resulting application (pytorch) by running NVIDIA’s own toolchain (nvcc &c) as a blackbox to produce the resulting shared libraries, archives, and executables.
We do not (plan to) redistribute the patched libcudart.so &c (because we patch it), the driver libcuda.so (because it may not be distributed except in docker images derived from “nvidia docker containers”), or the headers except the few mentioned in the EULA. To that end we can
- Filter cudatoolkit out before pushing the cache to cachix (we do not need to touch hydra or the root nixpkgs repo)
- Manually verify the absence of cuda-things in cachix via its search function available to the owners of the cache
- Add the local build flags to the cudatoolkit derivation as a preventive measure
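The filtering step could look roughly like the sketch below. It is only an illustration: the blocklist pattern, the helper names, and the store paths (with placeholder hashes) are all hypothetical, and a real setup would need to cover every package whose license forbids redistribution.

```python
import re
from typing import Iterable, List

# Hypothetical blocklist: package names that must never be uploaded.
BLOCKED_NAME_PATTERNS = [re.compile(r"^cudatoolkit(-|$)")]


def package_name(store_path: str) -> str:
    """Strip '/nix/store/<hash>-' from a store path, leaving 'name-version'."""
    base = store_path.rstrip("/").rsplit("/", 1)[-1]
    _hash, _, name = base.partition("-")
    return name


def pushable(store_paths: Iterable[str]) -> List[str]:
    """Keep only the store paths whose package name matches no blocked pattern."""
    return [
        path
        for path in store_paths
        if not any(pat.search(package_name(path)) for pat in BLOCKED_NAME_PATTERNS)
    ]


# Placeholder hashes; these are not real store paths.
closure = [
    "/nix/store/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-cudatoolkit-11.3.1",
    "/nix/store/bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb-python3.9-pytorch-1.10.2",
    "/nix/store/cccccccccccccccccccccccccccccccc-glibc-2.33",
]

for path in pushable(closure):
    print(path)
```

Matching on package names is crude, so the manual check via the cache search would still be worth doing afterwards.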

Ping https://github.com/NixOS/nixpkgs/pull/76233
Ping @zimbatm @samuela @danieldk

AndersonTorres · February 25, 2022, 1:59am

My point here is that it is not our moral problem. We are not bound to a moral or legal contract to serve unfree non-redistributable or otherwise annoying stuff. If we do such things completely for free, it is because we want to do this.

As you yourself have said, there are also legal problems patchelfing Nvidia stuff, therefore we can’t cache them anyway. The end users will be affected anyway.

Further, I have said what I think is the best solution: outsource this.
Pick someone interested in creating an overlay repo, outside the NixOS organization, containing all that annoying unfree stuff.
Also don’t forget to pick all the satellite stuff - servers, programmers, lawyers…

AndersonTorres · February 25, 2022, 2:04am

For the sake of argument, forget my personal rant about working for free.

What do we need to do? Verify bandwidth, disk space, processor cycles and legal advice. Preferably, outside Nixpkgs, in the form of an overlay.

samuela · February 25, 2022, 5:07am

I’m willing to put up the personal energy, time, and legal liability myself. But I need storage and compute.

There’s already a bunch of CUDA stuff in nixpkgs since it’s so fundamental, but ultimately I don’t care if it’s in nixpkgs or an overlay/flake as long as it’s easy to use for users.

danieldk · February 26, 2022, 8:03am

For example:

% nm python3.10/site-packages/torch/lib/libtorch_cuda_cpp.so | grep " cublasSgemm"
00000000086cb010 T cublasSgemmBatched
000000000876eec0 T cublasSgemmEx
00000000086b0850 T cublasSgemm_largek
00000000086cb530 T cublasSgemmStridedBatched
000000000867aa90 T cublasSgemm_v2

T means that the symbol is in the text (code) section.
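The same check can be scripted. Below is a small sketch that scans nm output for symbols of type T with a given prefix; a symbol that is merely imported from a shared library would show up as U (undefined) with no address instead. The function name is made up for this example, and the cudaMalloc line in the sample is a hypothetical addition to illustrate the U case; the other lines are copied from the output above.

```python
from typing import List


def defined_text_symbols(nm_output: str, prefix: str) -> List[str]:
    """Return names of symbols defined in the text section that start with prefix."""
    symbols = []
    for line in nm_output.splitlines():
        parts = line.split()
        # Defined symbols have three fields: address, type, name.
        # Undefined ones ('U') have no address, so they split into two fields.
        if len(parts) == 3 and parts[1] == "T" and parts[2].startswith(prefix):
            symbols.append(parts[2])
    return symbols


SAMPLE = """\
00000000086cb010 T cublasSgemmBatched
000000000876eec0 T cublasSgemmEx
00000000086b0850 T cublasSgemm_largek
00000000086cb530 T cublasSgemmStridedBatched
000000000867aa90 T cublasSgemm_v2
                 U cudaMalloc
"""

found = defined_text_symbols(SAMPLE, "cublasSgemm")
print(len(found), "cublasSgemm symbols are defined in this library")
```

A non-empty result for a cuBLAS prefix suggests that the cuBLAS code itself was linked into the library, rather than being resolved from a separate shared object at load time.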

Yes, because libcuda.so is bound to the kernel driver version, while the rest of the toolkit isn’t.

https://github.com/pytorch/builder/blob/e7cc06b8d99e672841877db4c228e7c5a8caa4bf/manywheel/build_cuda.sh#L12

          set -ex
          
          SCRIPTPATH="$( cd "$(dirname "$0")" ; pwd -P ))"
          
          export TORCH_NVCC_FLAGS="-Xfatbin -compress-all"
          export NCCL_ROOT_DIR=/usr/local/cuda
          export TH_BINARY_BUILD=1
          export USE_STATIC_CUDNN=1
          export USE_STATIC_NCCL=1
          export ATEN_STATIC_CUDA=1
          export USE_CUDA_STATIC_LINK=1
          export INSTALL_TEST=0 # dont install test binaries into site-packages
          
          # Keep an array of cmake variables to add to
          if [[ -z "$CMAKE_ARGS" ]]; then
              # These are passed to tools/build_pytorch_libs.sh::build()
              CMAKE_ARGS=()
          fi
          if [[ -z "$EXTRA_CAFFE2_CMAKE_FLAGS" ]]; then
              # These are passed to tools/build_pytorch_libs.sh::build_caffe2()

https://github.com/pytorch/pytorch/blob/45a042037fc54ce31284e8cea6e28e309804241a/aten/src/ATen/CMakeLists.txt#L388

            set(CMAKE_C_FLAGS_DEBUG ${OLD_CMAKE_C_FLAGS_DEBUG})
            set(CMAKE_CXX_FLAGS ${OLD_CMAKE_CXX_FLAGS})
          
            # Set these back. TODO: Use SLEEF_ to pass these instead
            set(BUILD_SHARED_LIBS ${__aten_sleef_build_shared_libs} CACHE BOOL "Build shared libs" FORCE)
            set(BUILD_TESTS ${__aten_sleef_build_tests} CACHE BOOL "Build tests" FORCE)
          endif()
          
          if(USE_CUDA AND NOT USE_ROCM)
            if($ENV{ATEN_STATIC_CUDA})
              list(APPEND ATen_CUDA_DEPENDENCY_LIBS
                ${CUDA_LIBRARIES}
                ${CUDA_TOOLKIT_ROOT_DIR}/lib64/libcusparse_static.a
                ${CUDA_TOOLKIT_ROOT_DIR}/lib64/libcurand_static.a
                ${CUDA_TOOLKIT_ROOT_DIR}/lib64/libcublas_static.a
                ${CUDA_TOOLKIT_ROOT_DIR}/lib64/libcufft_static_nocallback.a
                ${CUDA_TOOLKIT_ROOT_DIR}/lib64/libcusolver_static.a
                ${CUDA_TOOLKIT_ROOT_DIR}/lib64/liblapack_static.a     # needed for libcusolver_static
                )
            else()

TLATER · March 1, 2022, 4:12pm

Linking against dynamic libraries is derivative work according to some interpretations, which the cuda license doesn’t permit redistributing. See also why there is LGPL vs GPL. IP law is extremely confusing sometimes.

IANAL, but I think this isn’t as clear-cut as it may seem based on what I know about other similar situations.

SergeK · March 1, 2022, 5:13pm

The agreement specifically introduces the term “[licensee’s] application”. It specifically allows distributing the runtime shared libraries (by giving a list of file names ending unequivocally with .so and .dll) together with an “application”. How else can an “application” interact with these runtime libraries, if not by linking dynamically?

https://docs.nvidia.com/cuda/eula/index.html

P.S. @danieldk I didn’t post earlier because I didn’t want to create noise in the thread, but thank you a lot for your clarifications, they were very insightful; I suspected I should use nm for comparison, but I didn’t know what exactly to look at

nixinator · March 1, 2022, 5:47pm

Layer 9 problems, basically stifle any innovation at the lower layers…