Comparing Nix and Conda

TLDR

Why is Nix better than Conda?

Background

I’ll be giving a short presentation about Nix, to an audience of scientists who have to dabble in programming. By ‘dabble’ I mean that programming is a crucial part of their day jobs, but nevertheless tends to be viewed as a necessary evil that gets in the way of getting on with real science; as such establishing good programming/development practices/environments (which would pay off handsomely in the long run) can be viewed as a waste of precious time and effort. In this context, some might see Nix as a pointlessly esoteric self-aggrandising exercise by IT people, which has little relevance to their goals of doing science, and would only complicate their lives unnecessarily.

The purpose of the presentation is to get them to consider using Nix instead of Conda, which they have been using (wrapped in ad-hoc shell scripts) to automate installation of a fair subset of the software they use and write.

I would like to be able to enunciate a cogent answer to the hypothetical question:

Conda works fine, why should we swap this for Nix (which I can’t even install on HPC clusters)?

But I also want it to be a fair comparison, and genuine exploration of whether or not using Nix would be beneficial.

Without going into any details whatsoever, here are, very broadly, the main advantages of the two sides, as I see them:

Pro Conda:

  • Easier installation, without need for admin privileges
  • Better support for macOS and Windows (My audience cares not one hoot about Windows, but macOS is important.)
  • Easier for non-experts

Pro Nix:

  • More robust
  • Better reproducibility
  • More general

My fear is that the pro-Conda points are high-level and easily appreciated by people for whom computers are a necessary evil, while their eyes are likely to glaze over during explanations of the pro-Nix points.

  • I find the direnv/nix-shell combination amazingly helpful. An important, and perhaps easily overlooked, part of this is that checking out an older version of a project automatically switches the dependencies to those used with that version. This is vital when trying to reproduce or verify older results. Does something equivalent (and equally robust) exist for Conda?

  • They need to run their code on High Performance Computing hardware on which they don’t have admin rights. This throws a huge spanner in the works for Nix. I know that there are various possible workarounds, but the issue simply doesn’t arise in Conda.

  • I seem to recall that Conda is inspired by Nix, but I couldn’t find a reference to it in a quick search. (Just a curiosity, probably not useful as a persuasive argument.)
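
As a minimal sketch of the direnv/nix-shell combination mentioned above (the <commit> placeholder and the package list are assumptions for illustration, not taken from any real project): a shell.nix like the following sits in the project root, next to an .envrc file containing the single line "use nix". Because both files are version-controlled, checking out an older commit of the project also checks out the older pin.

```nix
# shell.nix: a sketch only; <commit> and the package list are placeholders.
let
  # Pin nixpkgs to one exact commit, so that every checkout of this file
  # resolves to the same package set.
  pkgs = import (fetchTarball {
    url = "https://github.com/NixOS/nixpkgs/archive/<commit>.tar.gz";
    # sha256 = "<hash>";  # optional: fill in to make the download verifiable
  }) { };
in
pkgs.mkShell {
  # Illustrative dependencies; replace with what the project really needs.
  buildInputs = [
    (pkgs.python3.withPackages (ps: [ ps.numpy ps.scipy ]))
  ];
}
```

After a one-time "direnv allow", entering the directory is enough to get the environment; there is no separate activation step to remember.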

Anyway, that’s the context in which I’m asking the questions: Is Nix better than Conda? Why?

TLDR
Nix is Great
Flakes are Good

Nix can build, configure and customise an entire operating system from source; Conda is just a package manager. There’s a lot of stuff in the Nix ecosystem that is not apparent when you first start playing with it. Hydra and its CI/CD stuff is just next level. Whenever I wonder ‘can Nix do X, Y, Z?’, I find that it’s in there, or that someone has had a good crack at solving that problem before… or at least there’s something you can build on.

When you start to dream about how you can configure large clusters of machines, and keep them well configured and manageable over the lifecycle of the software, with all of its associated technical debt (which can be VERY long and very large for scientific software)… it’s an exact fit for HPC running long-term projects.

If you are trying to ‘sell’ the idea of Nix to non-computer scientists, then IMHO the ability to pin code to certain commits, and to have dependency isolation with regard to shared libraries without using containerisation technology, has got to be a benefit. Scientific code that can compile/build not only today, but 10 years in the future…

We all know secretly that Docker was invented so people could actually get TensorFlow to build from source ;-). (This may or may not be true.)

Nix is rather like Unix: it’s pure genius, you just have to be a genius to see it. Hopefully some of the new tooling and improvements will lower the bar so that non-Unix-gods can get good results out of it.

Personally, Nix has performed outstandingly… and saved my bacon a few times.

Good luck, I hope they ‘buy’ it.

If you don’t mind I would love to hear how this goes. Good luck!

I am going to be contrarian and say that they are better served by Conda. You don’t provide any background, but I assume that they are scientists who do some form of numerical computing (e.g. machine learning or some other form of modeling). And given that they are using Conda, it is likely that they use Python for their work. Nix has the following issues that Conda does not have:

  • We do not build scientific/ML libraries with CUDA or MKL enabled, because they are unfree. So to get the best performance, you need to rebuild all packages with CUDA or MKL support. Unfortunately these are typically also the packages with the worst compile times. So besides switching to Nix, you will need to set up a binary cache, and get everyone to use it. And then hope that they don’t use another revision for which no builds are cached.

  • Our macOS support is pretty good given the barriers that Apple introduces, but it is not great. Things break regularly. ‘Fringe’ scientific packages probably do not get a lot of testing on macOS. And then there are issues like our macOS MKL version being quite far behind (for annoying reasons, IIRC, they now distribute MKL as APFS disk images).

  • Python package versioning is a mess. However, you don’t really notice this when you are in the upstream Python ecosystem, because Python package managers use version constraint solvers and usually it’s possible to find some solution with a given set of version constraints. However, given how nixpkgs works, we only have one version of each Python package. So, you will often find that you need a version of a package that is not compatible with what we have in nixpkgs. So, people need to make their own overrides and/or make their own derivation repositories.

  • As you said, if you can’t convince HPC cluster maintainers to install Nix, it’s going to be a bad experience. My experience with HPC is that they tend to run very old OSes (one or two generations old CentOS or Scientific Linux). Perhaps you can get somewhere with nix-user-chroot, but often these older systems do not support user namespaces or do not have support for user namespaces enabled.

  • If they run into problems, it is much more likely that they will get relevant help quickly with Conda, since it has a very large community. Moreover, if they get code from or collaborate with other scientists, it’s also likely that they use Conda.
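
To make the single-version point above concrete: when a project needs a different version of a Python package than the one nixpkgs carries, the usual route is an override along these lines. This is only a sketch; the choice of requests as the package, and the <version> and <hash> placeholders, are stand-ins rather than a real requirement.

```nix
# Sketch only: "requests", <version> and <hash> are illustrative placeholders.
let
  pkgs = import <nixpkgs> { };

  python = pkgs.python3.override {
    # packageOverrides changes the whole Python package set, so every package
    # that depends on requests is rebuilt against the overridden version.
    packageOverrides = final: prev: {
      requests = prev.requests.overridePythonAttrs (old: rec {
        version = "<version>";
        src = prev.fetchPypi {
          pname = "requests";
          inherit version;
          sha256 = "<hash>";
        };
      });
    };
  };
in
# An interpreter environment built from the modified package set.
python.withPackages (ps: [ ps.requests ])
```

Unlike a constraint solver, nothing here checks that the rest of the set is compatible with the new version; that is left to the person writing the override, and the rebuilt packages will not come from the binary cache.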

Here’s a video from an HPC group using Nix:

A significant part of this (more than half?) is a demonstration of how to install NixOS. I find this an odd choice (but then I don’t know the context of this presentation), because my angle is more along the lines of:

I don’t care what OS you use, if you install Nix on it, I can guarantee that everything you ever need in this project Just Works, without you having to lift a finger.

Yes, I also need to look into exactly what the deal is with enabling SSE, AVX, etc. Any pointers on how to write derivations which will use the most performant versions matching the underlying hardware? (And how does Conda deal with this problem?)

I’m assuming that this will involve providing a binary cache with various versions. Which brings me to another can of worms: As I said above, my selling point is “if you install Nix, I can take care of everything else”, but if I then have to turn around and say things like “oh, but you have to enable this cache on each machine you want to use, yourself”, that drastically reduces the value of the whole proposition, not only for them, but also for me as provider. I imagine that there’s no way that a cache can be enabled from within an arbitrary derivation, as the security implications seem quite drastic.

As for Conda, I’ve been avoiding it like the plague for almost 2 years, trying to do everything I can with Nix. Maybe my memory is hazy, or maybe things have changed in the meantime, but I recall that non-Python software installed via Conda tended to be a bit hit-or-miss: sometimes it just didn’t run at all, sometimes it did, but not very well. And even Python stuff that depended on Qt wasn’t all that reliable. Nix seems much more robust in that respect. Having said that, I did manage to get a ****ing

qt.qpa.plugin: Could not find the Qt platform plugin "xcb" in ""

again, this week (for the first time in many months).

Edit: and the

Reinstalling the application may fix this problem.

that follows it, is an amusing insult added to the injury, when you’re on Nix!

These are excellent questions and I’ve been meaning to look into how Nix can/could be better than Conda.

I’d start by emailing your audience in advance (if you have that option) and asking them for 1-3 problems they have with Conda. Or google around for blog posts complaining about Conda. That should give you good insight into what to talk about in the context of how Nix can do it better.

I’m working on a macOS funding campaign to which a few companies have already committed to contribute. The goal is to fund Nix development related to macOS so that with time we’ll have excellent support, which, as you said, is really important for developer environments.

Oh and there’s also nix-community/nix-data-science on GitHub (standard set of packages and overlays for data scientists, maintainer @tbenst), with a Slack where some people are already working on sci packages.

I’m happy to jump on a call to see if I can help somehow.

Most good libraries select optimized kernels based on the CPUID instruction, but there are exceptions. You can also build derivations with specific march or mtune flags. There was a recent discussion about this:
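
To illustrate the march/mtune option: a rough sketch of a helper that rebuilds a single package with architecture-specific compiler flags. The helper name withArch, the choice of haswell, and the use of hello as a stand-in package are all assumptions for the example; a real use would target a numerical library instead.

```nix
# Sketch only: withArch is a hypothetical helper, hello is a stand-in package.
let
  pkgs = import <nixpkgs> { };

  # Append -march/-mtune to whatever compiler flags the package already sets.
  # NIX_CFLAGS_COMPILE is read by the compiler wrapper in nixpkgs.
  withArch = arch: pkg:
    pkg.overrideAttrs (old: {
      NIX_CFLAGS_COMPILE =
        toString (old.NIX_CFLAGS_COMPILE or "")
        + " -march=${arch} -mtune=${arch}";
    });
in
withArch "haswell" pkgs.hello
```

The trade-off is that the result is a different derivation from the one Hydra built, so it is compiled locally, and the binary may not run on CPUs older than the chosen architecture.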

They distribute MKL:

https://docs.anaconda.com/mkl-optimizations/index.html

IANAL, but I think the MKL license would allow us to redistribute it. The only icky part is that it requires that MKL is not modified, and the question is whether patchelf’ing counts as modification. The other roadblock is that Hydra does not build non-free software packages.

It’s pretty much the same story with CUDA. The CUDA Toolkit is distributed through Anaconda:

https://anaconda.org/anaconda/cudatoolkit

Licensing is more complicated for the whole CUDA stack. As far as I remember, you cannot redistribute the driver libraries outside very specific circumstances (through the NVIDIA-provided Docker container). But I think most of the toolkit can be redistributed.

Same here. I strongly prefer Nix and I think it is better, but it requires an investment. It’s like writing machine learning code in Rust. I love it, but I would still recommend most of my colleagues to stick with Python + PyTorch for the time being. Doing machine learning in Rust is going off the beaten path and as a result, you have to do a lot of the plumbing yourself.

There are many paper cuts, but our current biggest problem [1] is that we do not have packages built against MKL and CUDA in our binary cache. This makes the user experience miserable. With conda or plain old pip, you are up and running in a few seconds. With nixpkgs, you are first off to one or two hours of builds. That is, if the packages build at all: since we don’t build them in Hydra, regressions are not always noticed.
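
For reference, opting in to MKL and CUDA looks roughly like the sketch below. The blas/lapack override is the provider-switching mechanism described in the nixpkgs manual; which packages honour config.cudaSupport depends on the nixpkgs revision, so treat that line as an assumption to check. The package list is illustrative only.

```nix
# Sketch only: the package list is illustrative, and cudaSupport coverage
# varies between nixpkgs revisions.
let
  # Overlay that switches the BLAS and LAPACK providers to MKL.
  mklOverlay = final: prev: {
    blas = prev.blas.override { blasProvider = final.mkl; };
    lapack = prev.lapack.override { lapackProvider = final.mkl; };
  };

  pkgs = import <nixpkgs> {
    config = {
      allowUnfree = true; # MKL and CUDA are unfree
      cudaSupport = true; # ask CUDA-aware packages to build with CUDA
    };
    overlays = [ mklOverlay ];
  };
in
pkgs.python3.withPackages (ps: [ ps.numpy ps.scipy ])
```

Everything that depends on BLAS, LAPACK or CUDA now has a different derivation hash from what Hydra built, so none of it comes from the public binary cache; that is where the hours of local builds come from.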

CUDA is pretty much essential to any kind of high-performance numerical computing. MKL is still the dominant BLAS/LAPACK library, because it is faster in a lot of scenarios. E.g. a few months ago I benchmarked some of my transformer models again with various BLAS libraries; MKL was ~2 times faster than the best competition (OpenBLAS), due to having specific optimizations (e.g. batched GEMM). Another issue is that OpenBLAS is not usable in some cases, because it cannot be used in multi-threaded applications.

Another issue is that even without MKL and CUDA, builds of some libraries frequently fail, because there are still some pre-SSE4.2 machines in the Hydra cluster. So, even if you intend to use e.g. our PyTorch builds without MKL or CUDA, it frequently gets built locally because it is not in the binary cache.

[1] In my experience, machine learning + NLP pays my bills.

As others already said there isn’t a general yes / no answer to this. One data point: we run a lot of numerical code and for us nix is great because:

  • some people in the team know nix
  • we almost exclusively run Linux
  • we run a lot of other stuff that’s not python/numerical code

But I know that one of our competitors tried nix and it was universally reviled:

  • bad experience on OSX
  • no windows support
  • often difficult to run software where you don’t have source (ubiquitous /bin/bash)
  • if you don’t have someone championing nix in the team and willing to iron out rough edges it’ll be a downward spiral of frustration and failure to bend nix to your will

IMO the last point is probably the most important. Just dumping nix on a team will almost certainly fail, but being there and helping with whatever weird situation people get stuck in could work. If you can’t provide that go with conda.

One thing to remember about conda is they changed their licensing last year:

“We clarified our definition of commercial usage in our Terms of Service in an update on Sept. 30, 2020. The new language states that use by individual hobbyists, students, universities, non-profit organizations, or businesses with less than 200 employees is allowed, and all other usage is considered commercial and thus requires a business relationship with Anaconda.”

(from Anaconda | Anaconda Commercial Edition FAQ). This is a big difference.

I did use conda in the past. It was definitely better than plain pip but still had its own issues. See my reply in a similar thread Data Science on nixos nix, poetry, pip, mach-nix, pynixify: - all fail? - #29 by alexv. The summary of my setup is in the same thread: Data Science on nixos nix, poetry, pip, mach-nix, pynixify: - all fail? - #11 by alexv. I also remember reading something about the low quality of the SAT solver they use to figure out the dependencies and the inconsistent versioning if you use additional channels which explained those massive downgrades I experienced.

We really should retire those super-old pre SSE4.2 machines.

I’m not following this, why is our policy not to distribute those, while pip and conda can? We do have unfreeRedistributable for such cases.

builds of some libraries frequently fail, because there are still some pre-SSE4.2 machines in the Hydra cluster.

This can be mitigated either by using one of the existing machine features, such as big-parallel, or by adding another one, like SSE42, to skip such hardware.

The good news is that all of these are solvable problems! The biggest unknown is Windows support; the rest is something we’re making progress on.

I agree, the biggest blocker for Nix adoption is that knowledge spreads from person to person rather than through text, audio or video.

As far as I understand, unfreeRedistributable is not built on Hydra, since it sets free = false, with the exception being unfreeRedistributableFirmware? The same applies to issl, the license used for MKL.

If we would permit building issl and/or unfreeRedistributable, that would be awesome! We could ship all the relevant libraries with MKL enabled, or at least offer a withMKL flavor.

CUDA licensing seems to be more complicated, since the license is really oriented towards including CUDA in an application. I guess someone from the foundation could contact NVIDIA to ask them about redistribution as part of a Linux distribution (nvidia-compute-license-questions@nvidia.com).

I guess a technical workaround would be to allow CUDA/cuDNN to be built on Hydra, but not uploading the output paths to Hydra (preferLocalBuild?). Then all the dependencies could be built, but CUDA would still be built locally (which is fast anyway).

Oh, that’s a great idea. How does this work? Can we add

requiredSystemFeatures = [ "sse42" ];

to the relevant derivations and then convince a Hydra admin to add this feature to several appropriate machines in the cluster? Or does the feature need to be added to Nix, like there is a kvm feature now?
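
For context, a sketch of how such a feature would be wired up, assuming the feature name sse42 proposed above (it is not an existing feature) and using hello as a stand-in package. As far as I know, system features are plain strings: a derivation lists them in requiredSystemFeatures, and a builder advertises them through the system-features setting in its nix.conf, so a new name should not need support in Nix itself.

```nix
# Sketch only: "sse42" is the hypothetical feature name from this thread,
# and hello stands in for a package that needs SSE4.2 at build time.
let
  pkgs = import <nixpkgs> { };
in
pkgs.hello.overrideAttrs (old: {
  # Keep any features the package already requires and add the new one.
  # Only builders that advertise every listed feature will accept the build.
  requiredSystemFeatures =
    (old.requiredSystemFeatures or [ ]) ++ [ "sse42" ];
})
# A builder that should accept this derivation would need a nix.conf line
# such as the following, listing every feature that machine offers:
#   system-features = sse42 big-parallel kvm
```

On a machine that does not advertise the feature, the build is refused rather than attempted, which is the behaviour wanted for the pre-SSE4.2 builders.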

Opened a PR: https://github.com/NixOS/nixpkgs/pull/111892

Edit: @Gaelan pointed out that the policy is that Hydra should not be used to build unfree software outside unfree redistributable firmware: Find a solution to distribute `mongodb` despite its license restrictions · Issue #83433 · NixOS/nixpkgs · GitHub

I just want to summarise my experience with conda and why I can’t recommend it to anyone:

  • updates can run in circles and you don’t have a stable package set
  • if a package is not available for your python version, Conda thinks about it for 15 minutes and then tells you so

That’s extremely sad. I have sponsored a nix-data binary cache on Cachix at https://discourse.nixos.org/t/re-improving-nixos-data-science-infrastructure-ci-for-mkl-cuda/5352/5

Maybe @tbenst could say more about what’s going on with that effort - either way I’m willing to sponsor the resources to make this work.

I get questions about unfree packages and poetry2nix from time to time, and the lack of precompiled MKL packages is a major hassle for these users, to the point where some have gone with Conda instead of Nix.

We are not doing anyone any service by sticking to not pre-building these packages, and we are hampering adoption in a space where Nix could truly shine.

As I’ve noted in the linked issue, we also incentivize packaging unfree software from binaries instead of from source, which is probably the opposite of what we want.

About 6 years ago I switched from conda to nix for several reasons: reproducibility; because I found it easier to write expressions; because the scientific code I wrote had to run on several machines (unfortunately I could not work with it on our cluster); and because it “made sense” to package all packages, regardless of language, with the same manager. It took quite some effort at the time: cleaning up the spread of Python packages across the tree mixing 2 and 3, packaging Jupyter packages, and so on. But we’ve come a long way since, and I’d argue things are a lot easier nowadays, especially with poetry2nix.

For myself, the performance improvement with MKL is no longer relevant. However, given all the effort that’s been put in to make MKL work with our packages, and given that we have a set of packages that work together without requiring “solving”, we should really take that leap and provide MKL-compiled packages as well.

Having a binary cache and a build machine to build scientific non-free packages would be great. The package set could be relatively small yet still provide a lot of value.

Seeing also the activity surrounding pytorch and tensorflow in Nixpkgs, I think it would be really good having a separate Nix Scientific Computing community/organization that has the resources for building packages, of course with MKL.

Would it make sense for such an organization to provide a nix version recompiled with a different nix store location, that could be deployed on HPC clusters without root access, and a binary cache to go with it, rather than everyone rolling their own?
