If you don’t mind I would love to hear how this goes. Good luck!
I am going to be contrarian and say that they are better served by Conda. You don’t provide any background, but I assume that they are scientists who do some form of numerical computing (e.g. machine learning or some other kind of modeling). And given that they are using Conda, it is likely that they use Python for their work. Nix has the following issues that Conda does not have:
We do not build scientific/ML libraries with CUDA or MKL enabled, because they are unfree. So to get the best performance, you need to rebuild all packages with CUDA or MKL support. Unfortunately these are typically also the packages with the worst compile times. So besides switching to Nix, you will need to set up a binary cache, and get everyone to use it. And then hope that they don’t use another revision for which no builds are cached.
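To make that concrete, rebuilding against MKL usually means an overlay along these lines; `blasProvider`/`lapackProvider` are the nixpkgs switch for the BLAS/LAPACK implementation, and everything downstream of them rebuilds from source unless a cache already has the exact same derivations. This is a sketch, not a drop-in solution:

```nix
# overlay.nix — sketch only. Swapping in MKL requires allowUnfree
# and causes every dependent package (NumPy, SciPy, PyTorch, ...)
# to rebuild unless a binary cache serves these exact derivations.
self: super: {
  blas = super.blas.override { blasProvider = self.mkl; };
  lapack = super.lapack.override { lapackProvider = self.mkl; };
}
```

Used e.g. via `import <nixpkgs> { overlays = [ (import ./overlay.nix) ]; config.allowUnfree = true; }`.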
Our macOS support is pretty good given the barriers that Apple introduces, but it is not great. Things break regularly. ‘Fringe’ scientific packages probably do not get a lot of testing on macOS. And then there are issues like our macOS MKL version being quite far behind (for annoying reasons, IIRC, they now distribute MKL as APFS disk images).
Python package versioning is a mess. You don’t really notice this when you are in the upstream Python ecosystem, because Python package managers use version constraint solvers and it is usually possible to find some solution for a given set of version constraints. Nixpkgs, however, carries only one version of each Python package. So you will often find that you need a version of a package that is not compatible with what we have in nixpkgs, and people end up writing their own overrides and/or maintaining their own derivation repositories.
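As a sketch of what such an override looks like in practice (the package name, version, and hash below are purely illustrative):

```nix
# Pin a different version of a single Python package than the one
# nixpkgs ships; all names, versions, and hashes are illustrative.
let
  pkgs = import <nixpkgs> { };
  python = pkgs.python3.override {
    packageOverrides = pyself: pysuper: {
      somepkg = pysuper.somepkg.overridePythonAttrs (old: rec {
        version = "1.2.3";
        src = pyself.fetchPypi {
          pname = "somepkg";
          inherit version;
          sha256 = "0000000000000000000000000000000000000000000000000000";
        };
      });
    };
  };
in
python.withPackages (ps: [ ps.somepkg ])
```

The catch is that dependencies of the overridden package often need overriding too, which is exactly the maintenance burden described above.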
As you said, if you can’t convince HPC cluster maintainers to install Nix, it’s going to be a bad experience. My experience with HPC is that they tend to run very old OSes (one or two generations old CentOS or Scientific Linux). Perhaps you can get somewhere with nix-user-chroot, but often these older systems do not support user namespaces or do not have support for user namespaces enabled.
If they run into problems, it is much more likely that they will get relevant help quickly with Conda, since it has a very large community. Moreover, if they get code from or collaborate with other scientists, it’s also likely that they use Conda.
A significant part of this (more than half?) is a demonstration of how to install NixOS. I find this an odd choice (but then I don’t know the context of this presentation), because my angle is more along the lines of:
I don’t care what OS you use, if you install Nix on it, I can guarantee that everything you ever need in this project Just Works, without you having to lift a finger.
Yes, I also need to look into exactly what the deal is with enabling SSE, AVX, etc. Any pointers on how to write derivations which will use the most performant versions matching the underlying hardware? (And how does Conda deal with this problem?)
I’m assuming that this will involve providing a binary cache with various versions. Which brings me to another can of worms: As I said above, my selling point is “if you install Nix, I can take care of everything else”, but if I then have to turn around and say things like “oh, but you have to enable this cache on each machine you want to use, yourself”, that drastically reduces the value of the whole proposition, not only for them, but also for me as provider. I imagine that there’s no way that a cache can be enabled from within an arbitrary derivation, as the security implications seem quite drastic.
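For what it’s worth, enabling an extra cache is only a couple of lines in nix.conf (the sci-cache name and key below are placeholders), but it does have to happen on each machine, and only trusted users may add substituters — which is exactly the friction described above:

```
# /etc/nix/nix.conf (placeholder cache name and key)
substituters = https://cache.nixos.org https://sci-cache.example.org
trusted-public-keys = cache.nixos.org-1:6NCHdD59X431o0gWypbMrAURkbJ16ZPMQFGspcDShjY= sci-cache.example.org-1:0000000000000000000000000000000000000000000=
```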
As for Conda, I’ve been avoiding it like the plague for almost 2 years, trying to do everything I can with Nix. Maybe my memory is hazy, or maybe things have changed in the meantime, but I recall that non-Python software installed via Conda tended to be a bit hit-or-miss: sometimes it just didn’t run at all, sometimes it did, but not very well. And even Python stuff that depended on Qt wasn’t all that reliable. Nix seems much more robust in that respect. Having said that, I did manage to get a ****ing
qt.qpa.plugin: Could not find the Qt platform plugin "xcb" in ""
again, this week (for the first time in many months).
Edit: and the
Reinstalling the application may fix this problem.
that follows it, is an amusing insult added to the injury, when you’re on Nix!
These are excellent questions and I’ve been meaning to look into how Nix can/could be better than Conda.
I’d start by emailing your audience in advance (if you have that option) and ask them 1-3 problems they find in Conda. Or google around for blog posts complaining about Conda. That should give you good insight into what to talk about in context of how Nix can do it better.
I’m working on a macOS funding campaign to which a few companies have already committed to contribute. The goal is to fund Nix development related to macOS so that, with time, we’ll have excellent support, which as you said is really important for developer environments.
Oh, and there’s also GitHub - nix-community/nix-data-science: Standard set of packages and overlays for data-scientists [maintainer=@tbenst], with a Slack that already has some people working on sci packages.
I’m happy to jump on a call to see if I can help somehow.
Most good libraries select optimized kernels at runtime based on the CPUID instruction, but there are exceptions. You can also build derivations with specific -march/-mtune flags. There was a recent discussion about this:
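For a single package, something like the following overlay is the usual approach (flags and package choice are illustrative; packages that already do their own CPUID dispatch won’t benefit):

```nix
# Rebuild one package tuned for a specific microarchitecture.
# The resulting store paths differ from the public cache, so this
# always builds locally; -march=haswell is just an example target.
self: super: {
  fftwTuned = super.fftw.overrideAttrs (old: {
    NIX_CFLAGS_COMPILE =
      (old.NIX_CFLAGS_COMPILE or "") + " -march=haswell -mtune=haswell";
  });
}
```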
They distribute MKL:
IANAL, but I think the MKL license would allow us to redistribute it. The only icky part is that it requires that MKL is not modified, and the question is whether patchelf'ing counts as modification. The other roadblock is that Hydra does not build non-free software packages.
It’s pretty much the same story with CUDA. The CUDA Toolkit is distributed through Anaconda:
Licensing is more complicated for the whole CUDA stack, as far as I remember you cannot redistribute the driver libraries outside very specific circumstances (through the NVIDIA-provided Docker container). But I think most of the toolkit can be redistributed.
Same here. I strongly prefer Nix and I think it is better, but it requires an investment. It’s like writing machine learning code in Rust: I love it, but I would still recommend that most of my colleagues stick with Python + PyTorch for the time being. Doing machine learning in Rust is going off the beaten path, and as a result you have to do a lot of the plumbing yourself.
There are many paper cuts, but our current biggest problem is that we do not have packages built against MKL and CUDA in our binary cache. This makes the user experience miserable. With Conda or plain old pip, you are up and running in a few seconds. With nixpkgs, you are first off to one or two hours of builds. That is, if the packages build at all; since we don’t build them in Hydra, regressions are not always noticed.
CUDA is pretty much essential to any kind of high-performance numerical computing. MKL is still the dominant BLAS/LAPACK library, because it is faster in a lot of scenarios. E.g. a few months ago I benchmarked some of my transformer models again with various BLAS libraries; MKL was ~2 times faster than the best of the competition (OpenBLAS), thanks to specific optimizations (e.g. batched GEMM). Another issue is that OpenBLAS is not usable in some cases, because it cannot safely be used in multi-threaded applications.
Another issue is that even without MKL and CUDA, builds of some libraries frequently fail, because there are still some pre-SSE4.2 machines in the Hydra cluster. So, even if you intend to use e.g. our PyTorch builds without MKL or CUDA, it frequently gets built locally because it is not in the binary cache.
 In my experience, machine learning + NLP pays my bills.
As others already said there isn’t a general yes / no answer to this. One data point: we run a lot of numerical code and for us nix is great because:
- some people on the team know Nix
- we almost exclusively run Linux
- we run a lot of other stuff that’s not python/numerical code
But I know that one of our competitors tried nix and it was universally reviled:
- bad experience on OSX
- no windows support
- often difficult to run software where you don’t have source (ubiquitous /bin/bash)
- if you don’t have someone championing nix in the team and willing to iron out rough edges it’ll be a downward spiral of frustration and failure to bend nix to your will
IMO the last point is probably the most important. Just dumping Nix on a team will almost certainly fail, but being there and helping with whatever weird situation people get stuck in could work. If you can’t provide that, go with Conda.
One thing to remember about conda is they changed their licensing last year:
“We clarified our definition of commercial usage in our Terms of Service in an update on Sept. 30, 2020. The new language states that use by individual hobbyists, students, universities, non-profit organizations, or businesses with less than 200 employees is allowed, and all other usage is considered commercial and thus requires a business relationship with Anaconda.”
(from Anaconda | Anaconda Commercial Edition FAQ). This is a big difference.
I did use conda in the past. It was definitely better than plain pip but still had its own issues. See my reply in a similar thread Data Science on nixos nix, poetry, pip, mach-nix, pynixify: - all fail? - #29 by alexv. The summary of my setup is in the same thread: Data Science on nixos nix, poetry, pip, mach-nix, pynixify: - all fail? - #11 by alexv. I also remember reading something about the low quality of the SAT solver they use to figure out the dependencies and the inconsistent versioning if you use additional channels which explained those massive downgrades I experienced.
We really should retire those super-old pre SSE4.2 machines.
I’m not following this: why is our policy not to distribute those, while pip and Conda can? We do have unfreeRedistributable for such cases.
builds of some libraries frequently fail, because there are still some pre-SSE4.2 machines in the Hydra cluster.

This can be mitigated either by using some of the existing machine features, such as big-parallel, or by adding another one like SSE42 to skip such hardware.
The good news is that all of these are solvable problems! The biggest unknown is Windows support; the rest is something we’re making progress on.
I agree, the biggest blocker for Nix adoption is that knowledge spreads from person to person rather than through text, audio or video.
As far as I understand, unfreeRedistributable is not built on Hydra, since it sets free = false, the exception being unfreeRedistributableFirmware? The same applies to issl, the license used for MKL.
If we were allowed to build unfreeRedistributable, that would be awesome! We could ship all the relevant libraries with MKL enabled, or at least offer a
CUDA licensing seems to be more complicated, since the license is really oriented towards including CUDA in an application. I guess someone from the foundation could contact NVIDIA to ask them about redistribution as part of a Linux distribution (email@example.com).
I guess a technical workaround would be to allow CUDA/cuDNN to be built on Hydra, but not upload the output paths (preferLocalBuild?). Then all the dependencies could be built, but CUDA itself would still be built locally (which is fast anyway).
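If that route were taken, the derivation-side markers would presumably be the existing ones (a sketch; whether Hydra’s upload policy can actually be controlled this way is the open question):

```nix
# Inside the cudatoolkit derivation (sketch): keep the build local
# and forbid substitution of the resulting path.
stdenv.mkDerivation {
  pname = "cudatoolkit";      # illustrative
  version = "11.2";           # illustrative
  # ... src, unpackPhase, installPhase elided ...
  preferLocalBuild = true;    # build where it is needed
  allowSubstitutes = false;   # never fetch this path from a cache
}
```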
Oh, that’s a great idea. How does this work? Can we add requiredSystemFeatures = [ "sse42" ]; to the relevant derivations and then convince a Hydra admin to add this feature to several appropriate machines in the cluster? Or does the feature need to be added to Nix, like the kvm feature?
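For reference, system features are plain strings — kvm and big-parallel are just conventions — so nothing needs to change in Nix itself; the derivation declares the feature and each builder advertises it. A sketch (builder address and settings illustrative):

```
# In the derivation (nixpkgs side):
requiredSystemFeatures = [ "sse42" ];

# In Hydra's /etc/nix/machines (builder side):
# uri                      system       key maxJobs speed features
ssh://builder.example.org  x86_64-linux -   8       1     sse42,big-parallel
```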
Opened a PR: https://github.com/NixOS/nixpkgs/pull/111892
Edit: @Gaelan pointed out that the policy is that Hydra should not be used to build unfree software outside unfree redistributable firmware: https://github.com/NixOS/nixpkgs/issues/83433#issuecomment-608614380
I just want to summarize my experience with Conda and why I can’t recommend it to anyone:
- updates can run in circles and you don’t have a stable package set
- if a package is not available for your Python version, Conda thinks about it for 15 minutes and then tells you so
That’s extremely sad. I have sponsored the nix-data binary cache on Cachix at https://discourse.nixos.org/t/re-improving-nixos-data-science-infrastructure-ci-for-mkl-cuda/5352/5
Maybe @tbenst could say more about what’s going on with that effort - either way I’m willing to sponsor the resources to make this work.
I get questions about unfree packages and poetry2nix from time to time, and the lack of precompiled MKL packages is a major hassle for these users, to the point where some have gone with Conda instead of Nix.
We are not doing anyone any service by sticking to not pre-building these packages, and we are hampering adoption in a space where Nix could truly shine.
As I’ve noted in the linked issue, we also incentivize packaging unfree software from binaries instead of from source, which is probably the opposite of what we want.
About 6 years ago I switched from Conda to Nix, for several reasons: reproducibility, because I found it easier to write expressions, because the scientific code I wrote had to run on several machines (unfortunately I could not use it on our cluster), and because it “made sense” to package all packages, regardless of language, with the same manager. It took quite some effort at the time: cleaning up the spread of Python packages across the tree, untangling the mix of 2 and 3, packaging Jupyter packages, and so on, but we’ve come a long way since and I’d argue things are a lot easier nowadays, especially also with
For myself, the performance improvement with MKL is no longer relevant; however, given all the effort that’s been put in to make MKL work with our packages, together with a package set that is known to work and doesn’t require “solving”, we should really take that leap and provide MKL-compiled packages as well.
Having a binary cache and a build machine to build scientific non-free packages would be great. The package set could be relatively small yet still provide a lot of value.
Seeing also the activity surrounding pytorch and tensorflow in Nixpkgs, I think it would be really good having a separate Nix Scientific Computing community/organization that has the resources for building packages, of course with MKL.
Would it make sense for such an organization to provide a Nix version recompiled with a different Nix store location, which could be deployed on HPC clusters without root access, along with a binary cache to go with it, rather than everyone rolling their own?
The store location is a bigger issue than MKL (for me at least). Unless you have a good relationship with the cluster administrator, it makes Nix a non-starter in a lot of HPC environments (or you have to run it via a Singularity container).