Petition to build and cache unfree packages on cache.nixos.org

Then you have to beware of keeping non-redistributable stuff in closures. Otherwise the current substitution algorithm wouldn’t work well, I think, even if you somehow convinced the cache to keep some holes in its closures.

Ok, now I’m even more confused… It appears that python3Packages.pytorch is cached on cache.nixos.org even though it links against CUDA/cuDNN as well. Why is PyTorch cached but tensorflowWithCuda and jaxlibWithCuda are not?

For example,

$ nix-shell -p python3Packages.pytorch
...
copying path '/nix/store/klmb0q1kyd8k00n0mfzyhjmvkz54vzvf-python3.9-pytorch-1.9.0' from 'https://cache.nixos.org'...
copying path '/nix/store/f5ghfax668nswhh1r60j4c5dyxbyhjps-python3.9-pytorch-1.9.0-dev' from 'https://cache.nixos.org'...
...
1 Like

From the pytorch-bin package:

    # Includes CUDA and Intel MKL, but redistributions of the binary are not limited.
    # https://docs.nvidia.com/cuda/eula/index.html
    # https://www.intel.com/content/www/us/en/developer/articles/license/onemkl-license-faq.html
    license = licenses.bsd3;

I agree, it’s odd though.

So presumably tensorflowWithCuda and jaxlibWithCuda could be made redistributable as well?

Well, I included the license bit to demonstrate that it’s odd: it has a bsd3 license when it should probably have an unfreeRedistributable license.

One option would be to have a config.acceptCudaLicense and just have the meta check it before someone attempts to install.
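A hypothetical sketch of what that could look like, loosely modeled on nixpkgs’ existing allowUnfree machinery (acceptCudaLicense is not a real nixpkgs option; the package and all names here are purely illustrative):

```nix
# Illustrative only: gate a CUDA-linked package behind a config flag,
# similar in spirit to how nixpkgs handles allowUnfree.
{ lib, stdenv, config }:

# Fail at evaluation time unless the user has opted in with
#   nixpkgs.config.acceptCudaLicense = true;
assert lib.assertMsg (config.acceptCudaLicense or false)
  "Set nixpkgs.config.acceptCudaLicense = true to use CUDA-linked packages.";

stdenv.mkDerivation {
  pname = "some-cuda-package";   # hypothetical package
  version = "1.0";

  meta = {
    # the built artifact links CUDA, so mark it accordingly
    license = lib.licenses.unfreeRedistributable;
  };
}
```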

2 Likes

The way we handle licenses in Nixpkgs is just not correct. The license that is added in meta is typically the license of the source. The license of the built artifact is not necessarily the same.
https://github.com/NixOS/nixpkgs/issues/106471

8 Likes

The meta.license system was never designed to handle such details. It was just a dumb string not too long ago.

Ok, my solution for now is just to give up on the source builds entirely (related commit). This may not be the nix-iest way but these massive builds make it impossible for me to be productive otherwise. I’ve already spent way too much time on this…
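For anyone in the same boat, the binary-package route looks roughly like this (a minimal shell.nix sketch; the pytorch-bin attribute matches nixpkgs of this era and may have been renamed since):

```nix
# shell.nix: use the prebuilt upstream PyTorch wheels (pytorch-bin)
# instead of compiling pytorch and its CUDA dependencies from source.
{ pkgs ? import <nixpkgs> { config.allowUnfree = true; } }:

pkgs.mkShell {
  buildInputs = [
    (pkgs.python3.withPackages (ps: [ ps.pytorch-bin ]))
  ];
}
```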

It’s a shame, because I really believe that Nix/NixOS has a lot to offer the reproducible research community. But without a “just works” solution for CUDA packages I can’t recommend Nix to other ML people in good conscience.

8 Likes

I think that’s fair. But I would also like to point out that CUDA and similar frameworks have always been awkward to use in tandem with FOSS. If NVIDIA had released CUDA under something like the MPL, where they reserved patent rights but otherwise allowed permissive use, this would be a non-issue.

6 Likes

Yeah, NVIDIA is not without fault here. They haven’t been the best community members in the Linux/OSS world, to say the least…

Unfortunately dealing with CUDA is just a necessary reality for the kind of work that I do. And as much as possible I’d like to do that work with NixOS :stuck_out_tongue:

3 Likes

@samuela if this is solvable for your particular use case with resources, and a binary cache of a few libraries is the last barrier to making reproducible research in (some area of) ML much better (using Nix), there are certainly those of us in academia who would be able to help out.

What is expensive for an individual to front can often be really negligible in an academic setting (especially given free bandwidth and spare servers).

Please DM or email me if you want to see if we can make this happen!

14 Likes

I will be a bit egotistical here, but why should we cache unfree software? Our lives are already harder because it is not open source.

Why should we make the lives of unfree software projects easier by giving them bandwidth, disk space, and processor cycles for free?

3 Likes

@AndersonTorres How would you suggest that users do machine learning on NixOS?

(Also not to nitpick but in many cases unfree != open source != non-redistributable.)

4 Likes

Do not ask me. However, you have asked, so I will answer.

I am on the same side as Linus and his advice to Nvidia here. We should not bow to them merely because they monopolize the market.

If you are taking a more pragmatic approach, just try to convince people who have the bandwidth, the disk space, the money to pay the eventual lawyers, and the money to pay the eventual sysadmins to back your idea. I am only a humble user of an AMD Ryzen while I am trying to save some money for a RISC-V desktop…

1 Like

If people wanted to run an in-house Hydra server that builds the unfreeRedistributable jobs, an example can be found at https://hydra.jonringer.us/jobset/nixpkgs/nixpkgs-master-unfree

New jobs: https://hydra.jonringer.us/eval/211?compare=master#tabs-new
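Clients would then consume such a cache by adding it as an extra substituter, along these lines in nix.conf (the hydra.example.org host and its key are placeholders; use the values the cache operator publishes):

```
# nix.conf: trust an additional binary cache alongside cache.nixos.org.
# hydra.example.org and its key are placeholders for illustration.
substituters = https://cache.nixos.org https://hydra.example.org
trusted-public-keys = cache.nixos.org-1:6NCHdD59X431o0gWypbMrAURkbJ16ZPMQFGspcDShjY= hydra.example.org-1:<public key>
```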

Don’t get me wrong, I’m not a huge fan of NVIDIA either, but I don’t think telling people to fuck off is a practical solution. Whether we like it or not, a lot of people are forced to work with CUDA.

4 Likes

I previously worked at Azure ML, so I sympathize. CUDA is largely ubiquitous in the machine learning community.

If you’re doing any model training, you have to use GPU acceleration; and right now that means NVIDIA’s CUDA toolkit.

8 Likes
  1. Long-term, these people do not push open-source upstreams to OpenCL adoption that much.

  2. Short-term, these are the people who throw quite a lot of compute at problems anyway, they should be able to cooperate for a CUDA-specific buildfarm.

From the perspective of non-ML Nix users, ML people addressing either of these points is preferable to Hydra building CUDA stuff (the first because of general free-software availability, the second because it would give us a large, cooperatively operated example of an independent Hydra and binary cache, presumably with things to learn from it)…

The only solution is for the NixOS Foundation to buy Nvidia outright, open-source all their stuff, get rid of the registration and paywalls, and then just put it on Hydra… job done.

8 Likes

OpenCL vendor implementations are generally not open source. OpenCL is slower than CUDA. Even OpenCL’s own author, Apple, has abandoned it in favor of Metal now. OpenCL/OpenGL/<other OSS toolchain> just don’t cut it these days, esp. in ML.

I feel that there’s this misconception that everyone in ML must be rich and have money/compute to blow left and right. As an ML researcher in academia, I can assure you this is far, far from the truth. I cannot afford to blow 80 CPU-hours recompiling tensorflowWithCuda every time some tiny package in its dependency tree changes. It’s just not sustainable.

2 Likes