Comparing Nix and Conda

Here’s a video from an HPC group using Nix:

4 Likes

A significant part of this (more than half?) is a demonstration of how to install NixOS. I find this an odd choice (but then I don’t know the context of this presentation), because my angle is more along the lines of:

I don’t care what OS you use, if you install Nix on it, I can guarantee that everything you ever need in this project Just Works, without you having to lift a finger.

Yes, I also need to look into exactly what the deal is with enabling SSE, AVX, etc. Any pointers on how to write derivations which will use the most performant versions matching the underlying hardware? (And how does Conda deal with this problem?)

I’m assuming that this will involve providing a binary cache with various versions. Which brings me to another can of worms: as I said above, my selling point is “if you install Nix, I can take care of everything else”, but if I then have to turn around and say things like “oh, but you have to enable this cache yourself, on each machine you want to use”, that drastically reduces the value of the whole proposition, not only for them, but also for me as the provider. I imagine there’s no way a cache can be enabled from within an arbitrary derivation, as the security implications would be quite drastic.
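For concreteness, the per-machine step I’d be asking users for looks roughly like this in nix.conf (a sketch; the second cache and its key are placeholders, not a real cache):

# /etc/nix/nix.conf -- example.cachix.org and its key are hypothetical
substituters = https://cache.nixos.org https://example.cachix.org
trusted-public-keys = cache.nixos.org-1:6NCHdD59X431o0gWypbMrAURkbJ16ZPMQFGspcDShjY= example.cachix.org-1:<placeholder-key>

Only two lines, but they need root (or a trusted user) on every machine, which is exactly the friction I’d like to avoid.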

As for Conda, I’ve been avoiding it like the plague for almost 2 years, trying to do everything I can with Nix. Maybe my memory is hazy, or maybe things have changed in the meantime, but I recall that non-Python software installed via Conda tended to be a bit hit-or-miss: sometimes it just didn’t run at all, sometimes it did, but not very well. And even Python stuff that depended on Qt wasn’t all that reliable. Nix seems much more robust in that respect. Having said that, I did manage to get a ****ing

qt.qpa.plugin: Could not find the Qt platform plugin "xcb" in ""

again this week (for the first time in many months).

Edit: and the

Reinstalling the application may fix this problem.

that follows it is an amusing insult added to injury when you’re on Nix!
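For the record, the workaround that usually gets me unstuck is pointing Qt at the plugin directory of the qtbase the application was built against. A minimal sketch, assuming nixpkgs’ qtbase still exposes a qtPluginPrefix attribute:

# shell.nix -- hedged workaround, not a guaranteed fix
{ pkgs ? import <nixpkgs> {} }:
pkgs.mkShell {
  buildInputs = [ pkgs.qt5.qtbase ];
  # mkShell exports extra attributes as environment variables
  QT_QPA_PLATFORM_PLUGIN_PATH =
    "${pkgs.qt5.qtbase.bin}/${pkgs.qt5.qtbase.qtPluginPrefix}";
}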

These are excellent questions and I’ve been meaning to look into how Nix can/could be better than Conda.

I’d start by emailing your audience in advance (if you have that option) and asking them for the top 1–3 problems they run into with Conda. Or google around for blog posts complaining about Conda. That should give you good insight into what to talk about, in the context of how Nix can do it better.

I’m working on a macOS funding campaign to which a few companies have already committed. The goal is to fund Nix development related to macOS so that with time we’ll have excellent support, which as you said is really important for developer environments.

Oh, and there’s also GitHub - nix-community/nix-data-science: Standard set of packages and overlays for data-scientists [maintainer=@tbenst], with a Slack that already has some people working on scientific packages.

I’m happy to jump on a call to see if I can help somehow.

8 Likes

Most good libraries select optimized kernels at runtime based on the CPUID instruction, but there are exceptions. You can also build derivations with specific -march or -mtune flags. There was a recent discussion about this:
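To make the flags route concrete, here’s a sketch of rebuilding one package with tuned flags (the package and the skylake-avx512 target are just illustrative):

# tuned-fftw.nix -- illustrative sketch, adjust the target to your hardware
{ pkgs ? import <nixpkgs> {} }:
pkgs.fftw.overrideAttrs (old: {
  NIX_CFLAGS_COMPILE = toString (old.NIX_CFLAGS_COMPILE or "")
    + " -march=skylake-avx512 -mtune=skylake-avx512";
})

The result is only valid on machines supporting those instructions, which is exactly why a cache would need several variants.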

They distribute MKL:

https://docs.anaconda.com/mkl-optimizations/index.html

IANAL, but I think the MKL license would allow us to redistribute it. The only icky part is that it requires that MKL not be modified, and the question is whether patchelf’ing counts as modification. The other roadblock is that Hydra does not build non-free software packages.

It’s pretty much the same story with CUDA. The CUDA Toolkit is distributed through Anaconda:

https://anaconda.org/anaconda/cudatoolkit

Licensing is more complicated for the whole CUDA stack; as far as I remember, you cannot redistribute the driver libraries outside very specific circumstances (e.g. through the NVIDIA-provided Docker container). But I think most of the toolkit can be redistributed.

Same here. I strongly prefer Nix and I think it is better, but it requires an investment. It’s like writing machine learning code in Rust: I love it, but I would still recommend that most of my colleagues stick with Python + PyTorch for the time being. Doing machine learning in Rust means going off the beaten path, and as a result you have to do a lot of the plumbing yourself.

There are many paper cuts, but our current biggest problem [1] is that we do not have packages built against MKL and CUDA in our binary cache. This makes the user experience miserable. With Conda or plain old pip, you are up and running in a few seconds. With nixpkgs, you are first off to one or two hours of builds. That is, if the packages build at all: since we don’t build them on Hydra, regressions are not always noticed.

CUDA is pretty much essential to any kind of high-performance numerical computing. MKL is still the dominant BLAS/LAPACK library, because it is faster in a lot of scenarios. E.g. a few months ago I benchmarked some of my transformer models again with various BLAS libraries; MKL was ~2 times faster than the best of the competition (OpenBLAS), due to having specific optimizations (e.g. batched GEMM). Another issue is that OpenBLAS is not usable in some cases, because it cannot be used in multi-threaded applications.

Another issue is that even without MKL and CUDA, builds of some libraries frequently fail, because there are still some pre-SSE4.2 machines in the Hydra cluster. So even if you intend to use e.g. our PyTorch builds without MKL or CUDA, PyTorch frequently gets built locally because it is not in the binary cache.

[1] In my case, machine learning + NLP pays the bills.

5 Likes

As others have already said, there isn’t a general yes/no answer to this. One data point: we run a lot of numerical code, and for us Nix is great because:

  • some people on the team know Nix
  • we almost exclusively run Linux
  • we run a lot of other stuff that’s not Python/numerical code

But I know that one of our competitors tried Nix, and it was universally reviled:

  • bad experience on OSX
  • no Windows support
  • often difficult to run software where you don’t have the source (the ubiquitous hard-coded /bin/bash, for instance)
  • if you don’t have someone championing Nix on the team and willing to iron out rough edges, it’ll be a downward spiral of frustration and failure to bend Nix to your will

IMO the last point is probably the most important. Just dumping Nix on a team will almost certainly fail, but being there and helping with whatever weird situation people get stuck in could work. If you can’t provide that, go with Conda.

6 Likes

One thing to remember about Conda is that they changed their licensing last year:

“We clarified our definition of commercial usage in our Terms of Service in an update on Sept. 30, 2020. The new language states that use by individual hobbyists, students, universities, non-profit organizations, or businesses with less than 200 employees is allowed, and all other usage is considered commercial and thus requires a business relationship with Anaconda.”

(from Anaconda | Anaconda Commercial Edition FAQ). This is a big difference.

I did use Conda in the past. It was definitely better than plain pip, but it still had its own issues. See my reply in a similar thread: Data Science on nixos nix, poetry, pip, mach-nix, pynixify: - all fail? - #29 by alexv. The summary of my setup is in the same thread: Data Science on nixos nix, poetry, pip, mach-nix, pynixify: - all fail? - #11 by alexv. I also remember reading about the low quality of the SAT solver Conda uses to resolve dependencies, and about inconsistent versioning when you use additional channels, which would explain the massive downgrades I experienced.

3 Likes

We really should retire those super-old pre-SSE4.2 machines.

2 Likes

I’m not following this: why is it our policy not to distribute those, while pip and Conda can? We do have unfreeRedistributable for such cases.

builds of some libraries frequently fail, because there are still some pre-SSE4.2 machines in the Hydra cluster.

This can be mitigated either by using some of the existing machine features, such as big-parallel, or by adding another one like sse42 to skip such hardware.

The good news is that all of these are solvable problems! The biggest unknown is Windows support; the rest is something we’re making progress on.

I agree: the biggest blocker for Nix adoption is that knowledge spreads from person to person rather than through text, audio, or video.

7 Likes

As far as I understand, unfreeRedistributable is not built on Hydra, since it sets free = false, with the exception being unfreeRedistributableFirmware? The same applies to issl, the license used for MKL.

If we permitted building issl and/or unfreeRedistributable, that would be awesome! We could ship all the relevant libraries with MKL enabled, or at least offer a withMKL flavor.
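For what it’s worth, my understanding is that with the BLAS/LAPACK provider mechanism in nixpkgs, a withMKL world could be expressed as an overlay along these lines (a sketch, assuming the blasProvider/lapackProvider arguments):

# overlay: swap the default BLAS/LAPACK implementation for MKL
# (needs allowUnfree, since MKL is unfree)
self: super: {
  blas = super.blas.override { blasProvider = super.mkl; };
  lapack = super.lapack.override { lapackProvider = super.mkl; };
}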

CUDA licensing seems to be more complicated, since the license is really oriented towards including CUDA in an application. I guess someone from the foundation could contact NVIDIA to ask them about redistribution as part of a Linux distribution (nvidia-compute-license-questions@nvidia.com).

I guess a technical workaround would be to let Hydra build against CUDA/cuDNN without uploading their output paths (preferLocalBuild?). Then all the dependent packages could be built, while CUDA itself would still be built locally (which is fast anyway).
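In derivation terms, I imagine that would look something like this (a sketch):

# sketch: hint that CUDA should be built on the local machine
# rather than farmed out to remote builders
{ pkgs ? import <nixpkgs> { config.allowUnfree = true; } }:
pkgs.cudatoolkit.overrideAttrs (old: {
  preferLocalBuild = true;
})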

Oh, that’s a great idea. How does this work? Can we add

requiredSystemFeatures = [ "sse42" ];

to the relevant derivations, and then convince a Hydra admin to add this feature to the appropriate machines in the cluster? Or does the feature need to be added to Nix itself, like the existing kvm feature?
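For completeness, here’s my guess at what the derivation side would look like (the sse42 feature name is hypothetical until someone defines it):

# toy derivation that refuses to build on machines lacking the feature
{ pkgs ? import <nixpkgs> {} }:
pkgs.runCommand "needs-sse42" { requiredSystemFeatures = [ "sse42" ]; } ''
  echo ok > $out
''

with, presumably, a matching system-features = big-parallel kvm sse42 line in the builders’ nix.conf.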

2 Likes

Opened a PR: https://github.com/NixOS/nixpkgs/pull/111892

Edit: @Gaelan pointed out that the policy is that Hydra should not be used to build unfree software outside unfree redistributable firmware: Find a solution to distribute `mongodb` despite its license restrictions · Issue #83433 · NixOS/nixpkgs · GitHub

2 Likes

I just want to summarize my experience with Conda and why I can’t recommend it to anyone:

  • updates can run in circles, and you don’t have a stable package set
  • if a package is not available for your Python version, Conda thinks about it for 15 minutes and then tells you so
5 Likes

That’s extremely sad. I have sponsored the nix-data binary cache on Cachix: https://discourse.nixos.org/t/re-improving-nixos-data-science-infrastructure-ci-for-mkl-cuda/5352/5

Maybe @tbenst could say more about what’s going on with that effort; either way, I’m willing to sponsor the resources to make this work.

4 Likes

I get questions about unfree packages and poetry2nix from time to time, and the lack of precompiled MKL packages is a major hassle for these users, to the point where some have gone with Conda instead of Nix.

We are not doing anyone a service by refusing to pre-build these packages, and we are hampering adoption in a space where Nix could truly shine.

12 Likes

As I’ve noted in the linked issue, we also incentivize packaging unfree software from binaries instead of from source, which is probably the opposite of what we want.

9 Likes

About 6 years ago I switched from Conda to Nix for several reasons: reproducibility, because I found it easier to write expressions, because the scientific code I wrote had to run on several machines (unfortunately I could not use it on our cluster), and because it “made sense” to package all packages, regardless of language, with the same manager. It took quite some effort at the time: cleaning up the spread of Python packages across the tree mixing 2 and 3, packaging the Jupyter packages, and so on, but we’ve come a long way since, and I’d argue things are a lot easier nowadays, especially with poetry2nix.

For myself, the performance improvement with MKL is no longer relevant. However, given all the effort that’s been put into making MKL work with our packages, together with a package set that just works, with no “solving” required, we should really take that leap and provide MKL-compiled packages as well.

Having a binary cache and a build machine to build scientific non-free packages would be great. The package set could be relatively small yet still provide a lot of value.

Seeing also the activity around PyTorch and TensorFlow in Nixpkgs, I think it would be really good to have a separate Nix Scientific Computing community/organization with the resources for building these packages, of course with MKL.

8 Likes

Would it make sense for such an organization to provide a Nix version recompiled with a different Nix store location, which could be deployed on HPC clusters without root access, along with a binary cache to go with it, rather than everyone rolling their own?
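Concretely, I’m imagining something like this (a sketch, assuming nixpkgs’ Nix expression still accepts a storeDir argument):

# relocated-nix.nix -- the store location below is a made-up example
{ pkgs ? import <nixpkgs> {} }:
pkgs.nix.override {
  storeDir = "/opt/ourlab/store";
}

The obvious cost is losing cache.nixos.org, whose store paths are only valid under /nix/store, hence the need for a matching binary cache.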

3 Likes

This video seems to be very much in line with your presentation:

1 Like

The store location is a bigger issue than MKL (for me, at least). Unless you have a good relationship with the cluster administrator, it makes Nix a non-starter in a lot of HPC environments (or you have to run it via a Singularity container).

2 Likes

Yes, I found this one earlier. Not too long, and it uses the time available rather well, with decent short demonstrations of Conda’s instability in the first 6 minutes.

I would also like to add that the devops experience is a lot nicer with a Nix stack than with a Docker/OCI-based setup. I’m not sure how relevant this is to HPC scenarios, as I don’t work in that space, but if you have a user-facing inference service, the points below are still relevant.

  • You can share build outputs, so even if you target a Docker image, you don’t have to deal with recomputing expensive Docker layers.
    • Dockerfile’s imperative steps vs Nix’s dockerTools.buildImage { contents = [ a b c ]; } (see the sketch after this list)
  • If you have access to bare metal, you can also export your package as a service/NixOS module. Especially nice with NixOps.
  • You can create a [private] binary cache of commonly shared packages (dev environments, deployment artifacts)
  • Agreement between development, CI/CD, and deployment environments
  • Nix as a “build-aware” configuration language also gives you the freedom to configure multiple aspects of your stack in one place (e.g. ports between machines, domains, etc.)
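To illustrate the dockerTools point from the list above, a minimal sketch (the image name and contents are made up):

# image.nix -- build an OCI image declaratively, no Dockerfile needed
{ pkgs ? import <nixpkgs> {} }:
pkgs.dockerTools.buildImage {
  name = "inference-service";
  tag = "latest";
  contents = [ pkgs.python3 ];
  config.Cmd = [ "${pkgs.python3}/bin/python3" "-m" "http.server" ];
}

nix-build image.nix produces a tarball you can docker load; unchanged dependencies are served from the Nix store instead of being rebuilt as Docker layers.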

For me, the freedom to change, pin, or upgrade any dependency when building a package is not something I’ve ever found easy with other package managers. This comes with some complexity when using Nix, but at the very least it’s possible and a first-class citizen.

Also, with all the talk about CentOS in the news, I’m not sure how the “stable/LTS distros” deal with situations like needing the latest stable rustc or Python (a Docker image?). With Nix, you can “opt in” to pulling certain packages from unstable at pinned points in time. For example:
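(A sketch; the revision and hash are placeholders you’d fill in yourself, e.g. with nix-prefetch-url --unpack:)

# pinned-unstable.nix -- <pinned-rev> and the sha256 are placeholders
let
  unstable = import (fetchTarball {
    url = "https://github.com/NixOS/nixpkgs/archive/<pinned-rev>.tar.gz";
    sha256 = "<fill in>";
  }) {};
  pkgs = import <nixpkgs> {};
in pkgs.mkShell {
  # stable environment, but rustc/cargo come from the pinned unstable snapshot
  buildInputs = [ unstable.rustc unstable.cargo pkgs.python3 ];
}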

When I was working at Microsoft, the ability to run tests against many versions of Python (even pre-releases) was appealing as well. It’s painful when you distribute a client SDK but don’t have a good way to test for breakages in upcoming releases.
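A sketch of how that can look with Nix (the interpreter attribute names assume those versions exist in your pinned nixpkgs):

# matrix.nix -- one shell per interpreter for compatibility testing
{ pkgs ? import <nixpkgs> {} }:
pkgs.lib.genAttrs [ "python37" "python38" "python39" ] (name:
  pkgs.mkShell { buildInputs = [ pkgs.${name} ]; })

Then nix-shell matrix.nix -A python38 drops you into an environment with just that interpreter.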

1 Like