Comparing Nix and Conda

This video seems to be very aligned with your presentation

1 Like

The store issue is a bigger issue than MKL (for me at least). Unless you have a good relationship with the cluster administrator, it makes Nix a non-starter in a lot of HPC environments (or you have to run it via a Singularity container).

2 Likes

Yes, I found this one earlier. Not too long, and uses the time available rather well. Decent short demonstrations of Conda’s instability, in the first 6 minutes.

I would also like to add that the devops experience is a lot nicer in a Nix stack as opposed to a Docker/OCI-based configuration. I’m not sure how relevant this is to HPC scenarios, as I don’t work in that space, but if you have a user-facing inference service, these points are still relevant.

  • You can share build outputs, so even if you target a Docker image, you don’t have to deal with recomputing expensive Docker layers.
    • Imperative Dockerfile steps vs Nix’s dockerTools.buildImage { contents = [ a b c ]; }
  • If you have access to bare metal, you can also export your package as a service/NixOS module. Especially nice with NixOps.
  • You can create a [private] binary cache of commonly shared packages (dev environments, deployment artifacts).
  • Development / CI/CD / deployment environments agree with each other.
  • Nix as a “build-aware” configuration language also gives you the freedom to configure multiple aspects of your stack in one place (e.g. ports between machines, domains, etc.).
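As a rough sketch of the declarative alternative to a Dockerfile (the image name and package choices are purely illustrative; pkgs is assumed to be a pinned nixpkgs):

```nix
# Sketch: a declarative image; contents are store paths, so unchanged
# dependencies are shared from the store instead of rebuilt as Docker layers.
pkgs.dockerTools.buildImage {
  name = "inference-service";               # illustrative name
  contents = [ pkgs.python3 pkgs.bash ];
  config.Cmd = [ "${pkgs.python3}/bin/python3" "-m" "http.server" ];
}
```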

For me, the freedom to change, pin, or upgrade any dependency when building a package is not something I’ve ever found to be easy with other package managers. This comes at the cost of some complexity in using Nix, but at the very least it’s possible and a first-class citizen.
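For instance, swapping or pinning a dependency for everything that uses it can be done with an overlay; a minimal sketch (the pinned attribute is just an example):

```nix
# Overlay: every package that depends on openssl now builds against the
# 1.1 branch, without editing any individual package definition.
self: super: {
  openssl = super.openssl_1_1;
}
```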

Also, with all the talk about CentOS in the news, I’m not sure how the “stable/LTS distros” deal with situations like needing the latest stable rustc or Python (a Docker image?). Whereas with Nix, you can “opt in” to pulling certain packages from unstable at pinned points in time.
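A minimal sketch of that opt-in, assuming a pinned nixpkgs-unstable snapshot (the commit and hash placeholders are deliberate):

```nix
# Pull rustc and python3 from an unstable snapshot pinned to one commit,
# while everything else stays on the stable channel.
let
  unstable = import (fetchTarball {
    url = "https://github.com/NixOS/nixpkgs/archive/<commit>.tar.gz";
    sha256 = "<sha256>";
  }) { };
in [ unstable.rustc unstable.python3 ]
```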

When I was working at Microsoft, the ability to run tests against many versions of Python (even pre-releases) was appealing as well. It is painful when you distribute a client SDK but don’t have a good way to test for breakages in upcoming releases.
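In Nix terms, such a test matrix is just a map over interpreter attributes; a sketch, assuming these interpreter attributes exist in the pinned nixpkgs:

```nix
# One dev shell per Python interpreter; run the SDK test suite inside each.
let pkgs = import <nixpkgs> { };
in map (py: pkgs.mkShell { buildInputs = [ py ]; })
       [ pkgs.python38 pkgs.python39 pkgs.python310 ]
```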

1 Like

I concur. Having something like /opt/nix is much, much more palatable than /nix for traditional IT, but this has been brought up before, even for non-HPC cases. For instance, I have a centrally managed Mac, so I can’t use the Catalina workarounds to install Nix because I can’t create volumes.

I don’t have any experience with Singularity. It still requires a good relationship with the administrator, doesn’t it? [“Singularity must be installed as root to function properly,” though these aren’t the most recent docs, perhaps.] Or have you found that it’s easier to persuade administrators to install Singularity than to install Nix?

Do you have anything noteworthy to report on your experience with Singularity?

It’s more that Singularity is very commonly installed on clusters, as it doesn’t require a Docker daemon running as root (which is apparently a security concern). I don’t particularly like Singularity apart from its availability.

They typically use a module system like lmod, which allows you to switch between different software environments. Many commonly-used scientific packages are provided through the module system as well as newer Python versions, etc.

Rust is easy ;): cluster nodes typically have a shared home directory, so you can just install Rust with rustup, as is common on non-NixOS systems (and you only need it on one node anyway, since Rust binaries are often easy to deploy). At any rate, I don’t think Rust has much uptake in HPC/science yet.

The module system is very clunky, but at least they offer MKL, CUDA, the Intel compilers, etc., and libraries compiled against those, out of the box.

I left academia 6 months ago, but when we would run something on an HPC cluster or grid, we would just build the software on an old CentOS version (whatever version was supported). On larger clusters, like the European E-Science Grid, you can write job specifications in which you specify a CentOS/Scientific Linux version and query how many CPUs are available for that OS version. Luckily, the last group I worked in had the funding to buy large machines for ourselves, so I was relieved of old CentOS versions and could install Nix.

At any rate, HPC clusters mostly run Linux as it was 5-10 years ago, and most people just accept that. For some reason, those maintaining clusters are very conservative. I guess at least stuff doesn’t break very often.

I agree though that Nix would be a much better choice, if we improved our scientific software story.

7 Likes

I don’t think this will help much in HPC, where /opt typically contains software shared between machines through e.g. NFS or AFS. AFAIK multi-user Nix does not work in that scenario. Maybe not everywhere, but /opt has been on some networked storage everywhere I have worked.

The holy grail is allowing arbitrary install paths, since then people could just install Nix into their home directories as long as cluster admins do not do cluster-wide installs. As far as I know, CentOS 8 (in contrast to CentOS 7) enables user namespaces by default, so there is hope!

3 Likes

Concerning Singularity and Nix:

1 Like

Apparently this is possible now? No clue how badly it will mess up nixpkgs.

5 Likes

Apparently this is possible now? No clue how badly it will mess up nixpkgs.

I think the point is to have that while still using the binary cache? Otherwise it looks like people generally succeed whenever they invest a week or two into figuring out the details, even without the now-available static Nix.

Of course, if non-privileged access to user/mount namespaces is available, there is no problem with this (and apparently their availability is still widening).

1 Like

I think that saving them that week or two by providing a switch is very worthwhile.

That does seem to be a more expensive problem, so a solution to it is even more important. IIUC, the default binary cache is tied to a fully qualified location of the Nix store.

  • Will this be true for any binary cache? Or can a cache be configured with a relative path?
  • If full-qualification is unavoidable, is there some fully-qualified location that might be usable by at least a large proportion, if not all, of people in this situation?

The point being to provide a binary cache for a non-standard nix store location, that can be used by the greatest number of users on HPCs-without-admin-rights.

2 Likes

Hmm, and how about a proxy that receives requests for a non-standard location and forwards them to the standard cache (or maybe even an arbitrary cache) with a translation of the path?
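The metadata half of such a proxy is easy to sketch; here is a minimal Python version (the store roots are illustrative assumptions, and it deliberately handles only the textual .narinfo metadata — the NAR payloads themselves would still need binary rewriting):

```python
def translate_narinfo(narinfo: str,
                      old_root: str = "/nix/store",
                      new_root: str = "/opt/nix/store") -> str:
    """Rewrite store-root prefixes in the textual .narinfo metadata that a
    binary cache serves. Text is the easy part; patching paths inside the
    binary NAR payloads is the hard part and is not attempted here."""
    return narinfo.replace(old_root, new_root)
```

For example, translate_narinfo("StorePath: /nix/store/abc-hello") returns "StorePath: /opt/nix/store/abc-hello".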

The problem is that you cannot just overwrite paths in binaries. If the new path is longer than the embedded path, you will overwrite other data. Also, paths may be stored with their lengths (rather than being 0-delimited).
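The length constraint is easy to demonstrate; a Python sketch of the usual workaround, which only accepts a replacement no longer than the original and NUL-pads it so offsets stay fixed (and which, as noted above, still breaks for length-prefixed paths):

```python
def rewrite_embedded_path(blob: bytes, old: bytes, new: bytes) -> bytes:
    """Replace an embedded path in a binary blob without shifting offsets.

    Only safe when the new path fits in the old one's space; the remainder
    is NUL-padded so C-style 0-terminated readers still see a valid string.
    """
    if len(new) > len(old):
        raise ValueError("replacement path is longer than the original")
    return blob.replace(old, new + b"\x00" * (len(old) - len(new)))
```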

One could create a shim for glibc that rewrites paths for all path-based functions (open, etc.), but then you run into the problem that some programs use syscalls directly (e.g. Go programs).

I agree with @7c6f434c that namespaces are the ultimate solution. Move the store to a more acceptable location such as /opt/nix or somewhere in /var (to make the path more acceptable for global installs) and then use namespaces for unprivileged installs. Probably something like pam_namespace can be used to set up a namespace when the user logs in.

2 Likes

I think that saving them that week or two by providing a switch is very worthwhile.

Well, it is sometimes hard to separate out how much of that time goes into learning Nix in general; a switch will also need to be documented and will not remove the need to learn Nix.

But apparently it is now easier to get started, which is indeed nice.

  • Will this be true for any binary cache? Or can a cache be configured with a relative path?

Well, programs use absolute paths defined at compile time, and sometimes things get compressed, so it is not a textual substitution. For various reasons various people hope to estimate just how bad it is; probably it is not always a huge problem, but there is risk. I guess an opportunistically rewriting proxy and a wiki of things that break could be set up…

1 Like

I was not able to get nix-user-chroot working on a cluster due to namespace issues and an old kernel version.

On the other hand, I have Nix running on a CentOS 6 cluster with /nix mounted using PRoot. It didn’t require root access or containers. IIRC the only thing I needed to change was

use-sqlite-wal = false

in ~/.config/nix/nix.conf to avoid corrupting the Nix database on NFS.

1 Like

Just a small data point, but one of my colleagues spent about two weeks trying to get scikits-odes to install, as it requires LAPACK and SUNDIALS, with SUNDIALS built against LAPACK. I wrote a Nix derivation, saving everyone else two weeks of their lives.

https://scikits-odes.readthedocs.io/en/stable/installation.html#id1

https://scikits-odes.readthedocs.io/en/stable/installation.html#troubleshooting

https://scikits-odes.readthedocs.io/en/stable/installation.html#using-nix
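The core of such a derivation is short; a hedged sketch (the exact CMake flag name for the LAPACK toggle varies between SUNDIALS versions, so treat it as an assumption):

```nix
# Rebuild nixpkgs' SUNDIALS with LAPACK enabled, so scikits-odes can link it.
with import <nixpkgs> { };
sundials.overrideAttrs (old: {
  buildInputs = (old.buildInputs or [ ]) ++ [ lapack ];
  cmakeFlags = (old.cmakeFlags or [ ]) ++ [ "-DLAPACK_ENABLE=ON" ];
})
```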

5 Likes

Based on my overlap in being a physicist and using Nix: you’re really, really underselling reproducibility. The ability to reproduce, say, a graph in a past paper from its inputs is precious, as in, one step away from reputations being at stake.

14 Likes

I don’t want to go too much off-topic, but apparently this problem is largely solved by mamba, which uses a much faster SAT solver.

(I don’t have any experience with Mamba, since I do not really use the Python ecosystem outside Jupyter + PyTorch notebooks for validation of my Rust code.)

3 Likes