Mach-nix: Create python environments quick and easy

aargh… it got just killed by the OOM killer!
I know that the plone dependency tree is huge, but I really hoped that with some amount of memory and time, it could complete the env task

That doesn’t look too good :wink: I opened an issue for this matter here: Building Plone runs out of memory · Issue #158 · DavHau/mach-nix · GitHub
It would be great if you could comment on the issue and post the exact nix expression or command you used for that build. I wasn’t able to reproduce it.

Released 3.0.0 (14 Oct 2020)

flakes pypi gateway, R support, new output formats, more packages for python 3.5/3.6, improved providers nixpkgs/wheel

IMPORTANT NOTICE

The UI has been reworked. It is backward compatible with a few exceptions. Most importantly, when importing mach-nix, an attribute set must be passed. It can be empty. Example:

let
  mach-nix = import (builtins.fetchGit {
    url = "https://github.com/DavHau/mach-nix/";
    ref = "refs/tags/3.0.0";
  }) {
    # optionally bring your own nixpkgs
    # pkgs = import <nixpkgs> {};

    # or specify the python version
    # python = "python38";
  };
in
...

Features

  • Flakes gateway to pypi. Get a nix shell with arbitrary python packages. Example:

    nix develop github:davhau/mach-nix#shellWith.requests.tensorflow.aiohttp

  • or a docker image
    nix build github:davhau/mach-nix#dockerImageWith.package1.package2 ...

  • or a python derivation
    nix build github:davhau/mach-nix#with.package1.package2 ...

  • New output formats:

    • mkDockerImage → produces layered docker image containing a python environment
    • mkNixpkgs → returns nixpkgs which conforms to the given requirements
    • mkOverlay → returns an overlay function to make nixpkgs conform to the given requirements
    • mkPythonOverrides → produces pythonOverrides to make python conform to the given requirements.
  • New functions fetchPypiSdist and fetchPypiWheel. Example:

    mach-nix.buildPythonPackage {
      src = mach-nix.fetchPypiSdist "requests" "2.24.0";
    };
    
  • When using the mach-nix cmdline tool, the nixpkgs channel can now be picked via:

    mach-nix env ./env -r requirements.txt --nixpkgs nixos-20.09
    
  • R support (experimental): R packages can be passed via packagesExtra. Mach-nix will set up rpy2 accordingly. See usage example.

  • Non-python packages can be passed via packagesExtra to include them into the environment.
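
A minimal sketch of how the new output formats and packagesExtra might be combined. It assumes that mkDockerImage accepts the same arguments as mkPython and that your own nixpkgs is passed in via pkgs; the chosen packages (requests, jq) are only illustrative.

```nix
let
  pkgs = import <nixpkgs> { };
  mach-nix = import (builtins.fetchGit {
    url = "https://github.com/DavHau/mach-nix/";
    ref = "refs/tags/3.0.0";
  }) { inherit pkgs; };

  # arguments shared by both output formats (illustrative values)
  envArgs = {
    requirements = ''
      requests
    '';
    # a non-python package included into the environment
    packagesExtra = [ pkgs.jq ];
  };
in
{
  # plain python environment
  env = mach-nix.mkPython envArgs;
  # layered docker image containing the same environment
  image = mach-nix.mkDockerImage envArgs;
}
```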

Improvements

  • rework the logic for inheriting dependencies from nixpkgs
  • fixes.nix: allow alternative mod function signature with more arguments:
    key-to-override.mod = pySelf: oldAttrs: oldVal: ...;
  • allow derivations passed as src argument to buildPythonPackage
  • stop inheriting attribute names from nixpkgs, instead use normalized package names
  • rework the public API of mach-nix (largely downwards compatible)
  • add example on how to build aarch64 image containing a mach-nix env
  • tests are now enabled/disabled via global override which is more reliable
  • raise error if the python version of any package in packagesExtra doesn’t match that of the environment

Fixes

  • nixpkgs packages with identical versions swallowed
  • pname/version null in buildPythonPackage
  • update dependency extractor to use “LANG=C.utf8” (increases available packages for python 3.5 and 3.6)
  • wheel provider picked wheels incompatible to python version
  • unwanted python buildInput inheritance when overriding nixpkgs
  • properly parse setup/install_requires if they are strings instead of lists

Package Fixes

  • rpy2: sdist: remove conflicting patch for versions newer than 3.2.6
  • pytorch from nixpkgs was not detected as torch
  • pyqt5: fix for providers nixpkgs and wheel
  • httpx: remove patches

@DavHau my requirements have some packages like h5py and theano. I’m using Mach-Nix 3.0.0 and when I activate my shell.nix it shows this message:

Multiple nixkgs attributes found for h5py-2.10.0: ['h5py', 'h5py-mpi']
Picking 'h5py' as base attribute name.
Multiple nixkgs attributes found for python-dateutil-2.8.1: ['dateutil', 'python-dateutil']
Picking 'dateutil' as base attribute name.
Multiple nixkgs attributes found for theano-1.0.5: ['Theano', 'TheanoWithCuda', 'TheanoWithoutCuda']
Picking 'Theano' as base attribute name.

It says there are more than one package to that requirement. How can I force Mach-Nix to pick TheanoWithCuda, for instance?

Good question!
First up, if you use the wheel dependency provider (which is the default), then those attribute names have absolutely no effect.

Only if you use the sdist or nixpkgs provider, mach-nix will inherit the capabilities of the nixpkgs package.

Currently there is no support for selecting those variants.
It automatically prioritizes the nixpkgs candidate with the version most similar to the selected one. If multiple nixpkgs candidates have the same version, the one with the shortest attribute name will be picked.
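
The tie-break can be illustrated with a small pure-Nix sketch. This is not mach-nix's actual code; pickShortest is a hypothetical helper that only covers the case where all candidates share the same version, and it does not model the version-similarity step.

```nix
let
  # hypothetical helper: among candidates with identical versions,
  # pick the one with the shortest attribute name
  pickShortest = names:
    builtins.head (builtins.sort
      (a: b: builtins.stringLength a < builtins.stringLength b)
      names);
in
{
  theano = pickShortest [ "Theano" "TheanoWithCuda" "TheanoWithoutCuda" ];
  h5py = pickShortest [ "h5py" "h5py-mpi" ];
  dateutil = pickShortest [ "dateutil" "python-dateutil" ];
}
```

Evaluating this picks Theano, h5py and dateutil, which matches the messages quoted above.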

Manually selecting these variants sounds like a good idea. I will try to add this feature soon.

In the meantime you could, for example, include an override via overridesPre which removes the normal Theano from the attribute set, or marks it broken or disabled. Or set its source/pname/version to some horrendous value so mach-nix has no chance to identify the package properly.

Of course if you do that, you will probably also break TheanoWithCuda since it inherits from Theano. Therefore you’d need to fix the earlier messed up values again with another override for TheanoWithCuda.


Released 3.0.1 (21 Oct 2020)

bugfixes, return missing packages

Fixes

  • Some sdist packages were missing from the dependency DB due to a corrupt index in the SQL DB used by the crawler.
  • When automatically fixing circular deps, removed deps could trigger a No matching distribution found error in higher level parent packages. Now --no-dependencies is set recursively for all parents of removed deps.
  • Mapping out the resulting dependency DAG to a tree for printing could exhaust the system’s resources, due to complexity. Now, when printing dependencies, sub-trees are trimmed and marked via (…) if they have already been printed earlier.

Improvements

  • optimized autoPatchelfHook for faster processing of large wheel packages (see upstream PR)
  • networkx is now used for dealing with some graph related problems

Released 3.0.2 (27 Oct 2020)

bugfixes

Fixes

  • fixed “\u characters in JSON strings are currently not supported” error, triggered by some packages using unicode characters in their file names
  • mach-nix cmdline tool didn’t use specified python version
  • wheel provider was broken for macOS resulting in 0 available packages
  • several issues triggering infinite recursions

I’m excited to announce that conda support is just around the corner. It will probably be released in form of a beta version very soon.


Quick call for your opinions/recommendations.

In the near future I’d like to concentrate on improving the general UX for mach-nix for non nix users.
I believe nix has the potential to attract many more python users if we make it as user friendly as pip and conda.

One important part which I’m not quite happy about yet is the installation itself.

Currently mach-nix requires nix to be installed via:
curl -L https://nixos.org/nix/install | sh

I see a problem with this because it:

  • seems scary
  • asks for super user rights
  • makes a lot of assumptions on the environment and often crashes (try executing inside ubuntu docker etc…)

It would be optimal to package nix itself as a pypi package and/or conda package.
I’ve seen some instructions floating around on how to install nix without sudo, so I guess it should be possible.

Is there any technical obstacle that makes this idea infeasible? I’m happy for any material or idea that helps me realize this.


You could create a statically linked Nix and distribute that via say PyPI. Still, you will need /nix/store and thus super user. If your users are fine with working in a subshell, you could use a chroot store instead: nix shell nixpkgs#python3 --store ~/nix. Note this seems to be broken with nix 2.4pre20201102_550e11f.

Thanks. It seems like chroot won’t work on multiple distros by default since it requires user namespaces.
An alternative seems to be proot which doesn’t require any privileges.

Sadly nix 3.x fails when building statically via pkgs.pkgsStatic.nixFlakes.
I guess I cannot rely on the pkgsStatic set, since it is not evaluated by hydra.

Would it be possible to add a static nix (pkgs.nixStatic) to nixpkgs, so any breakage becomes a blocker?

An alternative to not rely on static builds would be nix-bundle. It currently also relies on chroot which I’d have to replace with proot first (WIP PR).

…but I feel like using a static build might be the better solution since it is slimmer and contains less stuff. Also I’m not sure if the licenses of the tools contained in nix-bundle will make distribution more complicated.

I’m aware that the nix build sandbox also cannot be used if user namespaces are not available, but I assume for most python related things this shouldn’t matter too much.

Users using the portable nix solution could be warned with a suggestion to do an actual installation of nix before going to production.

To clarify, the reason I mentioned both chroot and static is that

  • static is easier to distribute than the default dynamically linked build;
  • chroot/proot/namespaces is to avoid needing a rebuild when putting the store elsewhere than /nix/store, thereby avoiding super user rights.

It would have to be a blocker in nixos/nix then first.

Released conda-beta (21 Sep 2020)

Conda provider, Conda parser

TL;DR

This version introduces a new provider for conda packages and the capability of parsing conda’s environment.yml file. Those are independent features, meaning you can access packages from conda without using the conda environment.yml format, or you can build from an environment.yml file while taking packages from pypi or nixpkgs.

The default provider order is updated to: conda,wheel,sdist,nixpkgs
The provider conda is a shorthand combining two conda channels conda/main,conda/r.

Other conda channels are also available. Just use the channel name as a provider, like:
conda-forge,wheel,nixpkgs
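
As a rough sketch, such a provider order could be set for all packages like this. It assumes the providers._default key is available on the conda-beta branch; requests is only an example requirement.

```nix
let
  mach-nix = import (builtins.fetchGit {
    url = "https://github.com/DavHau/mach-nix";
    ref = "refs/heads/conda-beta";
  }) { };
in
mach-nix.mkPython {
  requirements = ''
    requests
  '';
  # try conda-forge first, then pypi wheels, then nixpkgs
  providers._default = "conda-forge,wheel,nixpkgs";
}
```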

If a conda channel is not yet known by mach-nix, add it via condaChannelsExtra

System dependencies of conda packages are satisfied via the anaconda repo. Those system libraries will take up additional closure space. Therefore the conda provider might not be the optimal choice for container scenarios.

Motivation

Obviously it comes in handy being able to transfer conda based environments to nix. But people who have never used conda before might also benefit from an improved user experience thanks to this update.

I’d like to mention a few important advantages of the conda ecosystem, which make me believe the conda provider should become the default provider for future stable versions of mach-nix.

Convertibility

Compared to pypi, conda packages/environments seem to be much easier to convert to nix. All meta data of packages (dependencies, etc.) is provided by anaconda via a single file per repo called repodata.json. This file contains all information necessary to convert conda requirements to nix expressions.
Another important factor is that conda packages declare their non-python dependencies.
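
To give an idea of what that metadata looks like from the nix side, here is a small sketch that parses a hand-written stand-in for repodata.json. The package examplepkg, its dependency libexample and all values are made up; only the general layout (a packages attribute keyed by file name, with name, version, build and depends fields) follows the real format.

```nix
let
  # tiny stand-in for a real repodata.json (illustrative values only)
  repodata = builtins.fromJSON ''
    {
      "packages": {
        "examplepkg-1.0-py38_0.tar.bz2": {
          "name": "examplepkg",
          "version": "1.0",
          "build": "py38_0",
          "depends": ["python >=3.8,<3.9.0a0", "libexample >=2.0"]
        }
      }
    }
  '';
in
# keep only the fields relevant for dependency resolution
builtins.mapAttrs
  (file: meta: { inherit (meta) name version depends; })
  repodata.packages
```

Note how the python and non-python dependencies appear side by side in depends, so no extra crawling or build step is needed to discover them.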

Build variants

Of course nothing can compete with nixpkgs in terms of flexibility, but python packages from nixpkgs create a major problem in conjunction with mach-nix. Since mach-nix always changes some dependencies/attributes, every package must be re-built locally which is not much fun for larger packages.
Like nixpkgs, conda provides different build variants for some packages and those are cheap to install.

Non-python dependencies

Large binary packages from pypi are difficult to install. They are likely to break due to unfulfilled dependencies or wrong dependency versions.
The problem is that the manylinux wheel standard cannot be fulfilled by many large packages, since they require non-python dependencies which are not available from pypi.

Fulfilling non-python dependencies for conda packages, on the other hand, is easy, since those are fully declared and available from anaconda.

Independent from my infrastructure

Updating the databases for pypi releases requires constant crawling of meta data.
If I stopped running these crawlers, mach-nix users would be stuck with an outdated version of pypi metadata until someone decides to rehost the infrastructure (which is open sourced via nixops template).
For conda, mach-nix just requires repodata.json files. Those can be downloaded from anaconda anytime and do not require any extra infrastructure to maintain.

aarch64 support

It seems like some community maintained conda channels, like for example conda-forge, provide a lot more binary releases for aarch64 than pypi does.

Examples

Import mach-nix with conda support

let mach-nix = import (builtins.fetchGit {
  url = "https://github.com/DavHau/mach-nix";
  ref = "refs/heads/conda-beta";
}) {}; in
...

Build conda environment defined via environment.yml

... # import
mach-nix.mkPython {
  requirements = builtins.readFile ./environment.yml;
}

Select build variants and channels

... # import
mach-nix.mkPython {
  requirements = ''
    tensorflow >=2.3.0 mkl*
    blas * mkl*
    requests
  '';
  providers.requests = "conda-forge";
}

Include extra conda channels

Channels added via condaChannelsExtra are automatically appended to the default providers. This example uses impure fetching for simplicity. It’s better to manually download the repodata.json files and reference them locally.

let mach-nix = import (builtins.fetchGit {
  url = "https://github.com/DavHau/mach-nix";
  ref = "refs/heads/conda-beta";
}) {
  condaChannelsExtra.bioconda = [
    (builtins.fetchurl "https://conda.anaconda.org/bioconda/linux-64/repodata.json")
    (builtins.fetchurl "https://conda.anaconda.org/bioconda/noarch/repodata.json")
  ];
}; in
mach-nix.mkPython {
  requirements = ''
    kraken2
  '';
}

Released 3.1.0 (27 Nov 2020)

flakes lib, cli improvements, bugfixes

Features

  • expose the following functions via flakes lib:
    • mkPython / mkPythonShell / mkDockerImage / mkOverlay / mkNixpkgs / mkPythonOverrides
    • buildPythonPackage / buildPythonApplication
    • fetchPypiSdist / fetchPypiWheel
  • Properly manage and lock versions of nixpkgs and mach-nix for environments created via mach-nix env command.
  • Add example on how to use mach-nix with jupyterWith
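
A minimal flake.nix sketch using the flakes lib. It assumes the functions are exposed per system under lib.<system> and hard-codes x86_64-linux; the requirements are only an example.

```nix
{
  inputs.mach-nix.url = "github:DavHau/mach-nix/3.1.0";

  outputs = { self, mach-nix, ... }:
    let
      # assumption: adjust to your platform
      system = "x86_64-linux";
    in
    {
      # python environment, e.g. for `nix build`
      defaultPackage.${system} = mach-nix.lib.${system}.mkPython {
        requirements = ''
          requests
        '';
      };

      # shell with the same requirements, e.g. for `nix develop`
      devShell.${system} = mach-nix.lib.${system}.mkPythonShell {
        requirements = ''
          requests
        '';
      };
    };
}
```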

Improvements

  • Improve portability of mach-nix env generated environments. Replace the platform specific compiled nix expression with a call to mach-nix itself, which is platform agnostic.
  • Mach-nix now produces the same result no matter if it is used through flakes or legacy interface. The legacy interface now loads its dependencies via flake.lock.

Fixes

  • mkDockerImage produced corrupt images.
  • non-python packages passed via packagesExtra were not available during runtime. Now they are added to the PATH.
  • remove <nixpkgs> impurity in the dependency extractor used in buildPythonPackage.

Released 3.1.1 (27 Nov 2020)

fix cli

Fixes

  • Fix missing flake.lock error when using mach-nix cli.

Released 3.2.0 (11 Mar 2021)

bugfixes, ignoreCollisions

Features

  • add argument ignoreCollisions to all mk* functions

  • add passthru attribute expr to the result of mkPython, which is a string containing the internally generated nix expression.

  • add flake output sdist, to build pip compatible sdist distribution of mach-nix
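
A short sketch of the two new mkPython related features. It assumes the passthru attribute can be read directly from the result as env.expr; requests is only an example requirement.

```nix
let
  mach-nix = import (builtins.fetchGit {
    url = "https://github.com/DavHau/mach-nix/";
    ref = "refs/tags/3.2.0";
  }) { };

  env = mach-nix.mkPython {
    requirements = ''
      requests
    '';
    # tolerate file collisions between packages in the final environment
    ignoreCollisions = true;
  };
in
# the internally generated nix expression as a string, handy for debugging
env.expr
```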

Fixes

  • Sometimes wrong package versions were inherited when using the nixpkgs provider, leading to collision errors or unexpected package versions. Now, python dependencies of nixpkgs candidates are automatically replaced recursively.

  • When cross building, mach-nix attempted to generate the nix expression using the target platform’s python interpreter, resulting in failure

Package Fixes

  • cartopy: add missing build inputs (geos)

  • google-auth: add missing dependency six when provider is nixpkgs


I implemented an alternative way of fetching python packages that doesn’t require an index or a dependency database. It is a fixed output derivation which just uses pip. Reproducibility is ensured by a local proxy that filters pypi.org responses via date, to provide a snapshot-like view on pypi.

The Disadvantages I see:

  • all packages have to be re-downloaded after each change in requirements
  • outputHash needs to be updated after each change in requirements

The Benefits I see:

  • doesn’t require to store big index files (They can be annoying as flake inputs etc.)
  • dependency resolution is exactly like in pip.
  • doesn’t require to maintain the resolver/crawlers

In case you’re interested, have a look at the nixpkgs PR: fetchPythonRequirements: init (fixed output pypi fetcher) by DavHau · Pull Request #121425 · NixOS/nixpkgs · GitHub
It includes an example for jupyterlab

Maybe a tool could be built around this which allows similar comfort like mach-nix, but without a lot of its complexity. (packages could somehow be cached locally, to solve the re-downloading issue, etc.)

Not having to maintain the resolver and pypi crawlers would be a big game changer I think.
Just using pip directly is a lot easier than trying to imitate its behavior.

Let me know about your thoughts.


This is awesome! Great work! This is inspiring me for other downstream use-cases, specifically, integrating third-party pip dependencies with Bazel.

Recently I have undertaken some major refactoring of the crawler architecture which is about to be finished.
This happened outside the mach-nix repo, namely in pypi-deps-db and nix-pypi-fetcher.
As you may know, mach-nix depends on both these projects being updated regularly to be able to compute dependency graphs and fetch packages reproducibly.

The motivations behind the changes were:

  • improve maintainability
  • add data for python 3.9 and 3.10
  • simplify the process of introducing new python versions
  • remove any non public infrastructure parts
  • remove the requirement of trusting in me hosting the crawlers
  • make it easy for people to fork and maintain their own data

The following changes have been made:

  • remove the requirement of an SQL database. All update cycles now operate directly on the json files contained in the repo.
  • both projects contain a flake app that updates the data on a local checkout.
  • both projects contain a github action cron job that updates the data regularly
  • python versions can be added / removed by slightly modifying the flake.nix
  • a new directory ./sdist-errors is added to pypi-deps-db, containing information about why extracting requirements of a specific sdist package failed.

If the projects are forked on github, the data should continue to update itself without further interaction as the workflow file will be forked with the project.

On any non-GitHub CI system it should be as simple as installing nix with flakes and then executing the included flake app regularly to keep the data updated.

The newest version of pypi-deps-db now supports python 3.9 and 3.10 while 3.5 was removed.
I still kept python 2.7 despite it being EOL. My gut tells me there is still too much software around depending on it. Does anybody still need 2.7?


Released 3.3.0 (22 May 2021)

bugfixes, improvements

Changes

  • The flakes cmdline api has been changed. New usage:
    nix (build|shell) mach-nix#gen.(python|docker).package1.package2...
    
    (Despite this change being backward incompatible, I did not bump the major version since everything flakes related should be considered experimental anyways)

Improvements

  • Mach-nix (used via flakes) will now throw an error if the selected nixpkgs version is newer than the dependency DB since this can cause conflicts in the resulting environment.
  • When used via flakes, it was impossible to select the python version because the import function is not used anymore. Now python can be passed to mkPython alternatively.
  • For the flakes cmdline api, collisions are now ignored by default
  • The simplified override interface did not deal well with non-existent values.
    • Now the .add directive automatically assumes an empty list/set/string when the attribute to be extended doesn’t exist.
    • Now the .mod directive will pass null to the given function if the attribute to modify doesn’t exist.
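
A hedged sketch of how the python argument and the simplified override interface might be used together. pillow, zlib and the patch name unwanted.patch are placeholders picked for illustration, and the null check reflects the new .mod behaviour described above.

```nix
let
  pkgs = import <nixpkgs> { };
  mach-nix = import (builtins.fetchGit {
    url = "https://github.com/DavHau/mach-nix/";
    ref = "refs/tags/3.3.0";
  }) { inherit pkgs; };
in
mach-nix.mkPython {
  # python version passed to mkPython instead of the import function
  python = "python38";
  requirements = ''
    pillow
  '';
  # .add: extends buildInputs; a missing attribute counts as an empty list
  _.pillow.buildInputs.add = [ pkgs.zlib ];
  # .mod: receives the old value, or null if the attribute does not exist
  _.pillow.patches.mod = old:
    if old == null then [ ]
    else builtins.filter (p: baseNameOf (toString p) != "unwanted.patch") old;
}
```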

Fixes

  • Generating an environment with a package named overrides failed due to a variable name collision in the resulting nix expression.
  • When used via flakes, the pypiData was downloaded twice, because the legacy code path for fetching was still used instead of the flakes input.
  • nix flake show mach-nix failed because it required IFD for foreign platforms.
  • For environments generated via mach-nix env ... the python command referred to the wrong interpreter.
  • When checking wheels for compatibility, the minor version for python was not respected which could lead to invalid environments.
  • Some python modules in nixpkgs propagate unnecessary dependencies which could lead to collisions in the final environment. Now mach-nix recursively removes all python dependencies which are not strictly required.

Package Fixes

  • cryptography: remove rust related hook when version < 3.4