Mach-nix: Create python environments quick and easy

To clarify, the reason I mentioned both chroot and static is that

  • static linking is easier to distribute than the default dynamic linking;
  • chroot/proot/namespaces avoid the need for a rebuild when the store is placed somewhere other than /nix/store, which in turn avoids the need for superuser rights.

It would have to be a blocker in nixos/nix then first.

Released conda-beta (21 Sep 2020)

Conda provider, Conda parser

TL;DR

This version introduces a new provider for conda packages and the capability of parsing conda’s environment.yml file. Those are independent features, meaning you can access packages from conda without using the conda environment.yml format, or you can build from an environment.yml file while taking packages from pypi or nixpkgs.

The default provider order is updated to: conda,wheel,sdist,nixpkgs
The provider conda is a shorthand that combines the two conda channels conda/main and conda/r.

Other conda channels are also available. Just use the channel name as a provider, like:
conda-forge,wheel,nixpkgs
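
As a rough illustration of the shorthand described above (a sketch only, not mach-nix's actual code), a provider string can be expanded into a flat, ordered list of providers. The function name expand_providers and the lookup table are hypothetical; only the mapping of conda to conda/main and conda/r comes from the text above.

```python
# Hypothetical lookup table; the only entry taken from the post is that
# "conda" stands for the two channels conda/main and conda/r.
PROVIDER_SHORTHANDS = {"conda": ["conda/main", "conda/r"]}


def expand_providers(order):
    """Split a comma separated provider string and expand known shorthands."""
    expanded = []
    for provider in order.split(","):
        provider = provider.strip()
        # Unknown names (e.g. a channel such as conda-forge) are kept as-is.
        expanded.extend(PROVIDER_SHORTHANDS.get(provider, [provider]))
    return expanded
```

Under these assumptions, the default order conda,wheel,sdist,nixpkgs expands to five entries, with both conda channels ahead of wheel.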

If a conda channel is not yet known by mach-nix, add it via condaChannelsExtra.

System dependencies of conda packages are satisfied via the anaconda repo. Those system libraries will take up additional closure space. Therefore the conda provider might not be the optimal choice for container scenarios.

Motivation

Obviously, being able to transfer conda-based environments to nix comes in handy. But people who have never used conda before might also benefit from an improved user experience thanks to this update.

I’d like to mention a few important advantages of the conda ecosystem which make me believe that the conda provider should become the default provider for future stable versions of mach-nix.

Convertibility

Compared to pypi, conda packages/environments seem to be much easier to convert to nix. All metadata of packages (dependencies, etc.) is provided by anaconda via a single file per repo called repodata.json. This file contains all information necessary to convert conda requirements to nix expressions.
Another important factor is that conda packages declare their non-python dependencies.
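
To give an idea of what that file looks like, here is a minimal sketch that reads data shaped like a repodata.json and lists the builds and declared dependencies of one package. The sample entries (including the package libfoo) are made up for illustration, and the helper candidates is hypothetical; only the general layout (a packages mapping keyed by file name, with name, version, build and depends fields) reflects the conda format.

```python
import json

# Made-up excerpt in the shape of a conda repodata.json file.
SAMPLE_REPODATA = json.loads("""
{
  "info": {"subdir": "linux-64"},
  "packages": {
    "requests-2.24.0-py_0.tar.bz2": {
      "name": "requests",
      "version": "2.24.0",
      "build": "py_0",
      "depends": ["python", "urllib3 >=1.21.1,<1.26", "certifi >=2017.4.17"]
    },
    "libfoo-1.0-h1234_0.tar.bz2": {
      "name": "libfoo",
      "version": "1.0",
      "build": "h1234_0",
      "depends": []
    }
  }
}
""")


def candidates(repodata, name):
    """Return (filename, version, build, depends) for every build of `name`."""
    return [
        (fname, meta["version"], meta["build"], meta["depends"])
        for fname, meta in sorted(repodata["packages"].items())
        if meta["name"] == name
    ]
```

Everything a resolver needs (versions, build variants, dependency constraints) is available from this one structure without downloading or unpacking any package.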

Build variants

Of course nothing can compete with nixpkgs in terms of flexibility, but python packages from nixpkgs create a major problem in conjunction with mach-nix. Since mach-nix always changes some dependencies/attributes, every package must be re-built locally which is not much fun for larger packages.
Like nixpkgs, conda provides different build variants for some packages and those are cheap to install.

Non-python dependencies

Large binary packages from pypi are difficult to install. They are likely to break due to unfulfilled dependencies or wrong dependency versions.
The problem is that the manylinux wheel standard cannot be fulfilled by many large packages, since they require non-python dependencies which are not available from pypi.

Fulfilling non-python dependencies for conda packages, on the other hand, is easy, since those are fully declared and available from anaconda.

Independent from my infrastructure

Updating the databases for pypi releases requires constant crawling of meta data.
If I stopped running these crawlers, mach-nix users would be stuck with an outdated version of pypi metadata until someone decides to rehost the infrastructure (which is open sourced via nixops template).
For conda, mach-nix just requires repodata.json files. Those can be downloaded from anaconda anytime and do not require any extra infrastructure to maintain.

aarch64 support

It seems that some community-maintained conda channels, for example conda-forge, provide a lot more binary releases for aarch64 than pypi does.

Examples

Import mach-nix with conda support

let mach-nix = import (builtins.fetchGit {
  url = "https://github.com/DavHau/mach-nix";
  ref = "refs/heads/conda-beta";
}) {}; in
...

Build conda environment defined via environment.yml

... # import
mach-nix.mkPython {
  requirements = builtins.readFile ./environment.yml;
}

Select build variants and channels

... # import
mach-nix.mkPython {
  requirements = ''
    tensorflow >=2.3.0 mkl*
    blas * mkl*
    requests
  '';
  providers.requests = "conda-forge";
}

Include extra conda channels

Channels added via condaChannelsExtra are automatically appended to the default providers. This example uses impure fetching for simplicity. It’s better to manually download the repodata.json files and reference them locally.

let mach-nix = import (builtins.fetchGit {
  url = "https://github.com/DavHau/mach-nix";
  ref = "refs/heads/conda-beta";
}) {
  condaChannelsExtra.bioconda = [
    (builtins.fetchurl "https://conda.anaconda.org/bioconda/linux-64/repodata.json")
    (builtins.fetchurl "https://conda.anaconda.org/bioconda/noarch/repodata.json")
  ];
}; in
mach-nix.mkPython {
  requirements = ''
    kraken2
  '';
}

Released 3.1.0 (27 Nov 2020)

flakes lib, cli improvements, bugfixes

Features

  • expose the following functions via flakes lib:
    • mkPython / mkPythonShell / mkDockerImage / mkOverlay / mkNixpkgs / mkPythonOverrides
    • buildPythonPackage / buildPythonApplication
    • fetchPypiSdist / fetchPypiWheel
  • Properly manage and lock versions of nixpkgs and mach-nix for environments created via mach-nix env command.
  • Add example on how to use mach-nix with jupyterWith

Improvements

  • Improve portability of mach-nix env generated environments. Replace the platform specific compiled nix expression with a call to mach-nix itself, which is platform agnostic.
  • Mach-nix now produces the same result no matter if it is used through flakes or legacy interface. The legacy interface now loads its dependencies via flakes.lock.

Fixes

  • mkDockerImage produced corrupt images.
  • non-python packages passed via packagesExtra were not available during runtime. Now they are added to the PATH.
  • remove <nixpkgs> impurity in the dependency extractor used in buildPythonPackage.

Released 3.1.1 (27 Nov 2020)

fix cli

Fixes

  • Fix missing flake.lock error when using mach-nix cli.

Released 3.2.0 (11 Mar 2021)

bugfixes, ignoreCollisions

Features

  • add argument ignoreCollisions to all mk* functions

  • add passthru attribute expr to the result of mkPython, which is a string containing the internally generated nix expression.

  • add flake output sdist, to build pip compatible sdist distribution of mach-nix

Fixes

  • Sometimes wrong package versions were inherited when using the nixpkgs provider, leading to collision errors or unexpected package versions. Now, python dependencies of nixpkgs candidates are automatically replaced recursively.

  • When cross building, mach-nix attempted to generate the nix expression using the target platform’s python interpreter, resulting in failure.

Package Fixes

  • cartopy: add missing build inputs (geos)

  • google-auth: add missing dependency six when provider is nixpkgs

I implemented an alternative way of fetching python packages that doesn’t require an index or a dependency database. It is a fixed output derivation which just uses pip. Reproducibility is ensured by a local proxy that filters pypi.org responses by date, to provide a snapshot-like view of pypi.
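
The date filtering idea can be sketched as follows. This is not the proxy from the PR, just a minimal illustration under the assumption that the data looks like the releases mapping of PyPI's JSON API, where each file carries an upload_time_iso_8601 field. The function name snapshot_releases and the sample data are hypothetical.

```python
from datetime import datetime, timezone


def snapshot_releases(releases, cutoff):
    """Keep only files uploaded at or before `cutoff`.

    `releases` maps version strings to lists of file entries. Versions that
    have no file left after filtering are dropped entirely.
    """
    visible = {}
    for version, files in releases.items():
        kept = []
        for entry in files:
            # PyPI uses a trailing "Z"; normalise it for fromisoformat.
            stamp = entry["upload_time_iso_8601"].replace("Z", "+00:00")
            if datetime.fromisoformat(stamp) <= cutoff:
                kept.append(entry)
        if kept:
            visible[version] = kept
    return visible


# Made-up sample data in the shape of the "releases" field.
SAMPLE_RELEASES = {
    "1.0": [{"filename": "demo-1.0.tar.gz",
             "upload_time_iso_8601": "2020-06-17T18:24:09.573063Z"}],
    "2.0": [{"filename": "demo-2.0.tar.gz",
             "upload_time_iso_8601": "2021-03-02T08:00:00.000000Z"}],
}
CUTOFF = datetime(2021, 1, 1, tzinfo=timezone.utc)
```

With this cutoff, only version 1.0 stays visible, so a resolver behind such a filter sees the same set of releases no matter when it runs.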

The Disadvantages I see:

  • all packages have to be re-downloaded after each change in requirements
  • outputHash needs to be updated after each change in requirements

The Benefits I see:

  • doesn’t require storing big index files (they can be annoying as flake inputs etc.)
  • dependency resolution is exactly like in pip.
  • doesn’t require maintaining the resolver/crawlers

In case you’re interested, have a look at the nixpkgs PR: fetchPythonRequirements: init (fixed output pypi fetcher) by DavHau · Pull Request #121425 · NixOS/nixpkgs · GitHub
It includes an example for jupyterlab

Maybe a tool could be built around this which offers comfort similar to mach-nix, but without a lot of its complexity. (Packages could somehow be cached locally to solve the re-downloading issue, etc.)

Not having to maintain the resolver and pypi crawlers would be a big game changer I think.
Just using pip directly is a lot easier than trying to imitate its behavior.

Let me know about your thoughts.

This is awesome! Great work! This is inspiring me for other downstream use-cases, specifically, integrating third-party pip dependencies with Bazel.

Recently I have undertaken some major refactoring of the crawler architecture which is about to be finished.
This happened outside the mach-nix repo, namely in pypi-deps-db and nix-pypi-fetcher.
As you may know, mach-nix depends on both these projects being updated regularly to be able to compute dependency graphs and fetch packages reproducibly.

The motivations behind the changes were:

  • improve maintainability
  • add data for python 3.9 and 3.10
  • simplify the process of introducing new python versions
  • remove any non public infrastructure parts
  • remove the requirement of trusting in me hosting the crawlers
  • make it easy for people to fork and maintain their own data

The following changes have been made:

  • remove the requirement of an SQL database. All update cycles now operate directly on the json files contained in the repo.
  • both projects contain a flake app that updates the data on a local checkout.
  • both projects contain a github action cron job that updates the data regularly
  • python versions can be added / removed by slightly modifying the flake.nix
  • a new directory ./sdist-errors is added to pypi-deps-db, containing information about why extracting requirements of a specific sdist package failed.

If the projects are forked on github, the data should continue to update itself without further interaction as the workflow file will be forked with the project.

On any non-GitHub CI system it should be as simple as installing nix with flakes and then executing the included flake app regularly to keep the data updated.

The newest version of pypi-deps-db now supports python 3.9 and 3.10 while 3.5 was removed.
I still kept python 2.7 despite it being EOL. My gut tells me there is still too much software around depending on it. Does anybody still need 2.7?

Released 3.3.0 (22 May 2021)

bugfixes, improvements

Changes

  • The flakes cmdline api has been changed. New usage:
    nix (build|shell) mach-nix#gen.(python|docker).package1.package2...
    
    (Despite this change being backward incompatible, I did not bump the major version since everything flakes related should be considered experimental anyways)

Improvements

  • Mach-nix (used via flakes) will now throw an error if the selected nixpkgs version is newer than the dependency DB since this can cause conflicts in the resulting environment.
  • When used via flakes, it was impossible to select the python version because the import function is not used anymore. Now python can be passed to mkPython alternatively.
  • For the flakes cmdline api, collisions are now ignored by default
  • The simplified override interface did not deal well with non-existent values.
    • Now the .add directive automatically assumes an empty list/set/string when the attribute to be extended doesn’t exist.
    • Now the .mod directive will pass null to the given function if the attribute to modify doesn’t exist.
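
The new behaviour of the two directives can be modelled roughly like this. It is a Python sketch of the semantics described above, not the nix implementation; the helper names apply_add and apply_mod are hypothetical, and Python's None stands in for nix's null.

```python
def apply_add(attrs, key, extra):
    """Model of `.add`: extend attrs[key] with `extra`.

    A missing attribute starts from an empty value of the same type as
    `extra` (empty list, dict or string).
    """
    base = attrs.get(key, type(extra)())
    if isinstance(extra, dict):
        merged = {**base, **extra}   # sets are merged
    else:
        merged = base + extra        # lists and strings are concatenated
    return {**attrs, key: merged}


def apply_mod(attrs, key, fn):
    """Model of `.mod`: call `fn` with the old value, or None if it is missing."""
    return {**attrs, key: fn(attrs.get(key))}
```

A function passed to the .mod model therefore has to handle the missing case itself, for example by treating None as an empty list.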

Fixes

  • Generating an environment with a package named overrides failed due to a variable name collision in the resulting nix expression.
  • When used via flakes, the pypiData was downloaded twice, because the legacy code path for fetching was still used instead of the flakes input.
  • nix flake show mach-nix failed because it required IFD for foreign platforms.
  • For environments generated via mach-nix env ... the python command referred to the wrong interpreter.
  • When checking wheels for compatibility, the minor version for python was not respected which could lead to invalid environments.
  • Some python modules in nixpkgs propagate unnecessary dependencies which could lead to collisions in the final environment. Now mach-nix recursively removes all python dependencies which are not strictly required.
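
To illustrate why the minor version matters for the wheel compatibility fix above, here is a simplified sketch of such a check based on the wheel file name layout from PEP 427. It is not mach-nix's actual code and ignores platform tags and non-CPython interpreters; the function name wheel_matches_python is hypothetical and the file names in the usage note serve only as sample input.

```python
def wheel_matches_python(filename, major, minor):
    """Return True if the wheel's python/abi tags allow CPython major.minor.

    File name layout (PEP 427):
    name-version[-build]-pythontag-abitag-platformtag.whl
    """
    parts = filename[: -len(".whl")].split("-")
    python_tags = parts[-3].split(".")
    abi_tags = parts[-2].split(".")
    for tag in python_tags:
        impl, digits = tag[:2], tag[2:]
        if not digits.isdigit() or int(digits[0]) != major:
            continue
        tag_minor = int(digits[1:]) if len(digits) > 1 else None
        if impl == "py":
            # Generic tags: py3 fits any 3.x, py36 fits 3.6 and newer.
            if tag_minor is None or tag_minor <= minor:
                return True
        elif impl == "cp" and tag_minor is not None:
            # CPython specific tags need the exact minor version,
            # unless the wheel uses the stable ABI (abi3).
            if tag_minor == minor:
                return True
            if "abi3" in abi_tags and tag_minor <= minor:
                return True
    return False
```

For example, a file named numpy-1.19.4-cp38-cp38-manylinux1_x86_64.whl is accepted for python 3.8 but rejected for 3.9, which is exactly the distinction that gets lost when only the major version is compared.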

Package Fixes

  • cryptography: remove rust related hook when version < 3.4

Excellent work! mach-nix was the easiest way to feed in a requirements.txt and get a Nix environment.

I’m currently experimenting with python’s import system.
The goal is to allow packages to have private dependencies which are not propagated into the global module scope. If this works, we could build python environments containing more than one version of the same library. This in turn would make dependency resolution trivial/unnecessary and could solve the patching madness in nixpkgs.

I somehow cannot believe that nobody ever tried this, but I could not find such attempts online. In case anybody knows about such attempts or has any input regarding this, I’d appreciate it.

@costrouc looked into this in the past

@DavHau this is the specific code that does the import rewrites on files nixpkgs-pytools/import_rewrite.py at 70c7b9db33ea5e31d35d0b67c9171757e4d74bd0 · nix-community/nixpkgs-pytools · GitHub. There are some shortcomings of this approach. Mainly that it can’t touch shared libraries to do the rewrites and I wrote a few others in things that @FRidh linked. The approach seemed pretty robust though when I tried it out with packages.

Thanks for that. I have actually taken a look into your approach @costrouc a while ago and it was definitely inspiring. I forgot to mention that earlier. Now I am planning to implement something that doesn’t require any modification of library code.

My current idea is to replace builtins.__import__, which is called on every import. This new import function would then inspect the caller’s location. Depending on which location it is called from, it chooses from a different set of dependencies (every package would bring its own site-packages). Each imported module would get a new unique name in sys.modules to prevent clashes.

My goal is to get rid of sys.path/PYTHONPATH completely and only use the new style of packaging/importing.
The system would be smart enough to detect if two modules depend on the same version and only instantiate that module version once.

In the last few days, I have already implemented something similar via importlib's PathFinder and FileFinder. But that was too hacky and fragile. Later I found out it is possible to just override builtins.__import__. This should make it easier.

So now I’m starting over with a blank page and thought I’d reach out to you guys first, to avoid ending up in another if-I-had-only-known situation.

I haven’t seen the discussion on the python forum so far. That is definitely interesting.

I do monkeypatch __builtins__.__import__ in resholve, though it may not be a helpful reference since I’m using it to force a namespace on the Oil shell’s python2.7 codebase (and not for Nix/nixpkgs-specific reasons): resholve/resholve at 591ae30b839d0bae6f02ec7abd852004c6acbbb8 · abathur/resholve · GitHub

Conda support is now merged into master. It is disabled by default, but can be enabled by adding conda to the providers, for example like this:

providers._default = "conda,wheel,sdist,nixpkgs"

or by passing the content of an environment.yml file to requirements, which automatically enables the provider.

Supporting conda required me to implement a custom requirements parser, as the new format allows both pip and conda formats and even mixes of these.
Therefore, even if you don’t use conda, there is a chance that you might discover a bug. There is extensive unit testing on these changes, but edge cases could still come up. I might leave this on master for a bit longer and see if anything gets reported by you guys.

Awesome! What should one pass as condaChannelsExtra to enable conda-forge?

conda-forge is included by default. Just add it to the providers (like "conda-forge,conda,nixpkgs"). But good point, docs are missing for some of the new stuff ;). In the meantime, scroll up this topic to when conda-beta was released. There are some examples included.

Is there any way to use mach-nix and have the resulting env use a modified or custom python? Something like python = (enableDebugging pkgs.python39); in the call to mkPythonShell? I want to set up an environment where I can use cygdb in cython, and that requires a debugging python.

You can make an overlay for nixpkgs which replaces python39 with your debugging python. Then import nixpkgs with that overlay and pass it to mach-nix during import via pkgs argument. If you need further assistance, feel free to open an issue on github.