The current state of ML on Nix and NixOS is an unbearable mess, and we really need to improve it. I would like to use this topic to discuss how we should split up ML derivations so that build servers do not have to do as much work.
Issues
I have worked on getting ML packages to work privately for the past month, and I do think I have gained some understanding on how the build process works. I have identified the following issues:
Derivations do not properly check that the build environment is valid: CUDA is picky about compiler versions, and using the wrong compiler creates crashes that are sporadic, hard to debug, and usually happen only after hours of 100% CPU and RAM utilisation. We must fail early.
Nix derivations sometimes depend on a specific version of a dependency, even though they would support a newer version. This fails because we end up pulling in the same dependency at different versions, or worse, at the same version but with different derivation inputs. We must separate the Nix derivation dependency tree from the Python package dependency tree, so that this can be resolved without having to recompile expensive packages.
Currently only the main CUDA version supported by PyTorch is available in nixpkgs. This unfortunately means that some GPUs are too old and cannot be used with NixOS for ML development.
cudaPackages is not as straightforward as nixpkgs assumes: some parts of the CUDA toolchain have their own restrictions on supported compute capabilities and supported architectures (yes, those are separate things, I know…).
While there is shared code for handling different Python versions, nothing comparable exists for CUDA-specific differences, even though CUDA dependencies exist and are messy enough that this is really needed.
Updating part of the ML packages should not be able to leave nixpkgs in a broken state, i.e. with supported packages that are incompatible with each other, as currently happens frequently and then requires manual intervention.
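To illustrate the fail-early idea from the first point: a derivation can assert the compiler version at evaluation time instead of crashing hours into the build. This is only a sketch; the names and the version bound below are illustrative, not actual nixpkgs code.

```nix
# Illustrative fail-early check (the gcc bound is an example; consult the
# CUDA release notes for the real supported range).
{ lib, stdenv, cudaVersion }:

let
  gccMajor = lib.versions.major stdenv.cc.version;
in
assert lib.assertMsg (lib.versionOlder gccMajor "14")
  "CUDA ${cudaVersion} builds are not expected to work with gcc ${gccMajor}";
stdenv.mkDerivation {
  pname = "example-cuda-pkg";
  version = "0.1";
  # ... actual build inputs and phases would go here ...
}
```

The point is that the assertion fires during evaluation, before any compilation starts, turning a multi-hour sporadic failure into an immediate, readable error.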
What I imagine the solution looks like:
Each python package derivation is split into four derivations:
a) source-only derivation
b) build derivation (this has two flavours: build from source, or use existing wheel/bin)
c) high-level derivation
d) python-level derivation
Build derivations depend only on packages that they MUST have for building. This way we avoid having to expensively recompile them (especially relevant for build servers, where this otherwise explodes combinatorially). They may depend on other source-only derivations (preferred), or if need be on other build derivations. A build derivation always depends on its own source's source-only derivation.
High-level derivations depend on nothing but other high-level derivations. They do no heavy work, only fail-early validation where that is possible at this level. They also do not by themselves provide any build results (as they cannot). In other words, to create a Python environment, they must be passed to a function that concretises them.
Python-level derivations are created by the concretise function, which resolves high-level derivations into build derivations and makes each specific instance of a python-level derivation depend on both its high-level derivation and the concrete build derivation.
The concretise function is always called by the end user. It can only be called with a set of high-level derivations, a CUDA version (or the absence of CUDA support), a compiler set, the required compute capability, and a Python version. If an expression contains packages created via two different calls to the concretise function, this should, if possible, be detected and result in an early failure.
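A minimal sketch of what calling such a concretise function could look like. Every name here (`mlPackages`, `concretise`, `highLevel`, the attribute names) is hypothetical; nothing like this exists in nixpkgs today.

```nix
# Hypothetical usage sketch of the proposed concretise interface.
let
  env = mlPackages.concretise {
    # Only high-level derivations may be passed in.
    packages = with mlPackages.highLevel; [ torch flash-attn ];
    cudaVersion = "12.4";         # or null for a CPU-only environment
    stdenv = pkgs.gcc12Stdenv;    # compiler set CUDA is known to accept
    computeCapability = "8.6";
    pythonVersion = "3.12";
  };
in
env
```

The key design property is that all global choices (CUDA version, compiler, compute capability, Python version) are made exactly once, at the single concretise call, so mixing packages from two incompatible calls can be detected and rejected.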
Other goals
Whenever possible, we should support precompiled wheels.
Relevant Links
This is a merged PR that implements separating Python runtime and build dependencies. It seems extremely useful here; I have yet to properly investigate it, though.
Purpose of this thread
Please help me scrutinise these ideas, link other relevant discussions related to this, and share any insights you have.
Will you be at SCALE this week? Happy to chat about your challenges.
Nix derivations sometimes depend on a specific version of a dependency, even though they would support a newer version. This fails because we end up pulling in the same dependency at different versions, or worse, at the same version but with different derivation inputs.
Nix is designed to pin everything to make it 100% reproducible. This is a feature, not a bug.
Nix PhD thesis
https://edolstra.github.io/pubs/phd-thesis.pdf
Nix original paper
https://edolstra.github.io/pubs/nspfssd-lisa2004-final.pdf
NixOS paper
https://edolstra.github.io/pubs/nixos-jfp-final.pdf
However, if you do want to use newer versions of libraries, you absolutely can. In your flake you can do something like this:
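The code block appears to be missing from this post; a typical shape for it would be an overlay that overrides a package's source to a newer release. Everything here (`somePackage`, the owner/repo, version, hash) is a placeholder, not a real package.

```nix
# Hypothetical flake bumping a package to a newer version via an overlay.
{
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";

  outputs = { self, nixpkgs }: {
    overlays.default = final: prev: {
      somePackage = prev.somePackage.overrideAttrs (old: {
        version = "2.0.0";  # newer than what nixpkgs currently pins
        src = final.fetchFromGitHub {
          owner = "example-owner";
          repo = "somePackage";
          rev = "v2.0.0";
          # placeholder; fill in with the real hash after a first failed fetch
          hash = nixpkgs.lib.fakeHash;
        };
      });
    };
  };
}
```

Any consumer that applies this overlay then gets the newer version everywhere, keeping the closure consistent instead of mixing two versions of the same dependency.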
And of course, if you do find an outdated package in nixpkgs, once you have your flake version working, it would be awesome if you could submit a pull request to nixpkgs.
I am not aware of this event. I am located in Munich/Germany though. Where is it?
I am aware, and that is fine. What I am saying is that often we do not need to recompile: we do not actually depend on the inputs that changed, but as things currently stand we indirectly do. By splitting this up, we can stay reproducible while reusing some compiled derivations, since in that context they depend on exactly the same inputs.
Depending on a specific git commit is fine, and I know that can work. However, for nearly all common ML libraries, wheels are already provided.
Just saying I know how Nix works, and I do read the nixpkgs source. I am doing the same as is done there, using pre-built wheels, and that works. But that is not the issue I am talking about. The issue is that there are so many cases where stuff doesn't work, and I end up having to fix most packages in the same way, i.e. repetitive work that can be generalised.
The issue is that there are so many cases where stuff doesn't work, and I end up having to fix most packages in the same way, i.e. repetitive work that can be generalised.
Yeah, I mean, nixpkgs is hard. There are tons of broken packages. With unlimited resources and time, it would be great to constantly scan the repo, mark broken packages, and use some automation to create pull requests to fix them. Similarly, some LLM help on nixpkgs pull request reviews would also be great.
But it's all a matter of priorities and resources. For example, is fixing broken packages more important than fixing broken runners? E.g. this PR has been stuck waiting for runners for >24 hours.
This is exactly what I am suggesting: create infrastructure that makes maintaining LLM stuff easier and catches broken derivations as soon as possible.
Anyway, what I described above is the end goal. I have been working on this for my own use, and will share what I have once it works for the packages I use.
It may even make sense to maintain the LLM packages outside of the nixpkgs tree, since for LLM development it is important that the packages I need work together, not that they are the latest versions. And it should not matter which recent version of Nix/NixOS one is on.
Right now I am focusing exclusively on the interplay of CUDA, torch, Python, and torch-dependent packages.
But I do plan to make sure in the future that local LLM runners also work, because I will need that in about two months' time.
Neither do I, but this mostly becomes relevant once stuff is upstreamed into nixpkgs. Before that, it would essentially just mean building the Nix derivations locally. I have some root servers with a lot of RAM, so my own builds will probably happen there. Then there is Cachix, which, as far as I understand, provides a free Nix cache for open-source derivations, so the build artefacts would be pushed there and be available for the general public to use.
…Yes, it is five years of non-stop merging of technical debt, after half a decade of dormant, zombie-like existence.
There's nothing there to be supported; you can already use them as is, e.g. with nix-ld, as well as with python3XPackages and with autoPatchelf. They are no more broken on NixOS than on Ubuntu.
Absolutely amazing! I will take a look at this and hope I didn't reimplement half of what is done there, because I did not find this preexisting work!
What I meant to say is that pulling in the wheels, i.e. having a -bin variant of a Python package, should be standard, and the infrastructure should make it easy to maintain such derivations. Of course, you can always write your own derivation and add it.
What I have done so far is write a Python script that, based on GitHub release tags, parses all wheels and their hashes and generates a binary-hashes.nix that encodes the parsed information and makes it easy to resolve the wheels, to then be used with autoPatchelf, similar to how it is done in nixpkgs for torch-bin.
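For readers unfamiliar with this pattern, a generated binary-hashes.nix is typically just a nested attribute set keyed by version and wheel tag. The layout below is a hypothetical sketch, not the author's actual file; the URLs and hashes are placeholders.

```nix
# Hypothetical shape of a generated binary-hashes.nix: version -> wheel
# tag (Python ABI, platform, CUDA variant) -> fetchable source.
{
  "2.4.0" = {
    "cp312-linux_x86_64-cu124" = {
      url = "https://example.com/releases/download/v2.4.0/pkg-2.4.0-cp312-cp312-linux_x86_64.whl";
      hash = "sha256-placeholderplaceholderplaceholderplacehold=";
    };
    "cp311-linux_x86_64-cu121" = {
      url = "https://example.com/releases/download/v2.4.0/pkg-2.4.0-cp311-cp311-linux_x86_64.whl";
      hash = "sha256-placeholderplaceholderplaceholderplacehold=";
    };
  };
}
```

A -bin derivation can then look up the entry matching its Python version and CUDA variant, `fetchurl` the wheel, and run autoPatchelf over the result, keeping the repetitive per-wheel bookkeeping in one generated file.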
I have now implemented quite a lot of the above goals (yes, I use LLMs, but I do read the code afterwards; still, this is WIP, so it is only on a best-effort basis).
I am now able to build, from binaries or from source, a compatible set of torch, flash-attn, mamba-ssm, and causal-conv1d, while being able to specify the Python, CUDA, and torch versions, plus a special cuda-packages version for Pascal GPUs.
(The comment says otherwise, but CUDA 13.0 is technically supported; only a cuda-packages v13 set is missing.)
For now, I will just keep this working for myself, meaning for the packages that I use.