Mach-nix: Create python environments quickly and easily

  1. New attribute names are supposed to be normalized names. Older packages may not follow this rule. Additionally, packages whose names start with a number cannot follow it, so they are prefixed with something (I think py). The latter could be solved by having the attributes be strings, if I recall correctly. Bindings that we include that are not on PyPI but may shadow a PyPI package can be difficult; those should be fixed. Renaming old packages to follow the normalized-names rule would ideally also be done, but would break backwards compatibility unless aliases are added.
  2. Versions should be fine I think. You are right we do not enforce a rule, but I highly doubt this is an issue with Python packages.
  3. Multiple versions indeed have a convention, but I can imagine it can be hard to use in an automated way. In principle we should not have multiple versions of packages in the Python package set.
  4. That’s quite rare and primarily exists because of 3). The other cause could be variants (see e.g. h5py-mpi). I think those should be removed and a parameter should be added to the original expression.
  5. Cleanup is indeed needed here, and ideally this part would be automated further. Separating expressions into automatically generated and manually overridden ones is still an idea, but contrary to other package sets in Nixpkgs we have quite a lot of manually added code, especially for testing.
  6. No, there is not. I wish there was. There are some different views on this as well. Hopefully when/if we get flakes we can have a good discussion around that.
  7. You could evaluate it: nix eval -f . python3.pkgs.numpy.version

It’s quite cool that one could theoretically have all python packages from nixpkgs in one environment, something that may not be possible with pip/poetry/conda. But practically, as a developer, I want to use the exact versions as specified by authors. Scipy 1.19 and 1.23 have breaking changes between them for example, and this happens all over the place. Not all packages follow semantic versioning, and an override written for one package version may not work for another version. I strongly feel that we must support users installing the precise version that they desire. Using multiple virtualenvs is industry standard and required by most developers.

I understand from the nixpkgs perspective that we have limited build resources, etc, and I certainly wouldn’t expect everything to be cached / go through CI. But does it really matter if the scipy folder in nixpkgs has 10 or 20 or 100 nix files in it?

I have a feeling that both poetry2nix and Mach will have high overlap in failure cases: the packages that need buildInputs. I understand that it may not be possible to combine them, but it would be great to identify a concrete way in which the two projects could avoid duplicating work.

Claim: I think it’s in everyone’s interest if we can have one set of overrides that are used by both projects. Each project would benefit, by having more people contribute to these overrides, and maintainers benefit by not having to duplicate work. If this doesn’t happen, my concern is that we’ll (or at least I’ll…) be jumping between the two approaches when a particular package isn’t supported by one or the other.

I would like to see python packages that can be automatically generated by poetry2nix, pypi2nix, Mach, and/or python-package-init NOT in nixpkgs. Only python packages requiring manual attention should go in, and ideally this work should be reused by the respective projects.

If the needs of current nixpkgs python users and those of more automated tooling are too far apart for now, then perhaps Nix-Community could host a shared overlay/overrides. Maybe @zimbatm has some thoughts.

Edit: or at least make the subset of overrides shared by both projects as large as possible…

When you say a version of a package, you need to ensure not only that the expression of that package is correct, but also its dependencies. This is the real (combinatoric) problem.

Aside from that, it is work. We can want a lot, but it needs to be done (read: maintained) as well. Given the fact that many core packages already don’t get as much attention as they require, I’d say Nixpkgs is not the place for this.

True, although I’d bet that in almost all cases we can restrict versioning to the other python packages. There are notable exceptions for sure, e.g. tensorflow and bazel or CUDA, but my impression is most native dependencies are pretty stable, i.e. BLAS, ffmpeg, etc. Over the course of six years on Ubuntu 14.04, despite using cutting-edge python versions, I can count on one hand the number of times a native dependency has been improperly versioned for a python package.

  • have no requirement to manually write derivations
  • support poetry.lock for collaborating with non-nix users

These are very express goals of poetry2nix :slight_smile:

  • support per-package override versioning

What would you like the API for this to look like?
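To make the question concrete, here is one hypothetical shape such an API could take: overrides keyed by version specifiers. Nothing below exists in either project today; the overrideForVersions name, the specifier syntax, and the pyarrow inputs are invented purely for illustration.

```nix
# Hypothetical only: per-version overrides, where each fix is applied
# only when the resolved version of pyarrow matches the specifier.
self: super: {
  pyarrow = super.pyarrow.overrideForVersions {
    "<0.14" = old: {
      # the old build, before the restructuring
      nativeBuildInputs = old.nativeBuildInputs ++ [ self.pkgs.cmake ];
    };
    ">=0.14" = old: {
      # the restructured build, with different native inputs
      buildInputs = old.buildInputs ++ [ self.pkgs.arrow-cpp ];
    };
  };
}
```

The point of the sketch is only that an override would carry a version range instead of implicitly targeting whatever single version nixpkgs happens to ship.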

  • leverage existing work in nixpkgs where possible rather than recreating

I think to do this in a way that works better for py2nix solutions you’d have to restructure the current python.pkgs sets a fair bit.

I see a few issues with the way Python packaging is done in nixpkgs that makes this hard:

  1. Take the pyarrow override for example. The problem is that a later version substantially changed the build and poetry2nix needs to be able to also build the later version.
    In nixpkgs this is a non-issue because there is only ever one version of a package.
  2. Nixpkgs has no split between python & native dependencies. We don’t want to blindly apply whatever inputs a nixpkgs python derivation has.
  3. Nixpkgs doesn’t always use canonical pypi names, making collisions more likely if we were to pick build inputs based on information from nixpkgs.

A solution to this may be to make the nixpkgs python builds two-staged by creating a nixpkgs internal poetry2nix style overlay that is only concerned with adding native inputs & touching up upstream packaging bugs like this.
That way we could achieve maximum sharing. It comes at great cost to maintaining nixpkgs expressions though.

On that subject, I would absolutely love to have a more principled & shared approach to conditional things like numpy taking a pluggable BLAS implementation.
This is hacky & brittle.

ultimately be mainlined into NixOS/nixpkgs

poetry2nix is already in nixpkgs :wink:

mach-nix 2.0.0 released

I’m excited to announce the release of mach-nix 2.0.0! (Changelog)
It comes with several new features which significantly increase the success rate out of the box, plus it gives you some nice tools to fix problems in case there are any. Besides python wheel support and improved nixpkgs support, it brings capabilities for overlays and python overrides, which make it composable with any other python overrides or nixpkgs overlays.

At its core, mach-nix now simply takes a requirements.txt and an arbitrary pkgs.python, and returns a set of python overrides which, if applied, make your pkgs.python conform to your requirements.txt.

Using mach-nix is no longer an ‘in or out’ decision. Since it harmonizes with the nixpkgs way of building python environments, it can be mixed, extended and modified in the usual way, or even applied on top of an existing configuration.

The feature I’m most excited about is the concept of Providers which allows you to freely prioritize the origin and buildSystem for your packages on a granular basis.

The following 3 providers are available in version 2.0.0:

  1. nixpkgs: Provides packages directly from nixpkgs without modifying their sources. Has only a few versions available, but has a high success rate and all the nix features, like cudaSupport for tensorflow for example.
  2. sdist: Provides all package versions available from pypi which support setuptools and builds them via nixpkgs overlays wherever possible to resolve external dependencies. It still supports the nixpkgs-specific features no matter which package version is selected, but chances of a build failing are higher than with the nixpkgs provider.
  3. wheel: Provides all linux-compatible wheel releases from pypi. Wheels can contain binaries; mach-nix autopatches them to work on nix. Wheels are super quick to install and work quite reliably, therefore this provider is preferred by default.

Mach-nix builds environments by mixing packages from all 3 providers. You decide which providers should be preferred for which packages, or which providers shouldn’t be used at all.
The default preferred order of providers is wheel, sdist, nixpkgs.

Providers can be disabled/enabled/preferred like in the following examples:

  • A provider specifier like "wheel,sdist,nixpkgs" means that the resolver will first try to satisfy the requirements with candidates from the wheel provider. If a resolution is impossible or a package doesn’t provide a wheel release, it falls back to sdist/nixpkgs for a minimal number of packages. In general it will choose as many packages from wheel as possible, then sdist, then nixpkgs.

  • "nixpkgs,sdist" means that nixpkgs candidates are preferred, but mach-nix falls back to building from source (sdist). wheel is not listed and therefore wheels are disabled.

A full provider config passed to mach-nix looks like this:

{
  # The default for all packages which are not specified explicitly
  _default = "nixpkgs,wheel,sdist";

  # Explicit settings per package
  numpy = "wheel,sdist";
  tensorflow = "wheel";
}

Mach-nix will always satisfy your requirements.txt fully with the configured providers or fail with a ResolutionImpossible error.

If a mach-nix build fails, most of the time it can be resolved by just switching the provider of a package, which is simple and doesn’t require writing a lot of nix code. For some more complex scenarios, check out the following examples.

Examples:

1. Tensorflow with SSE/AVX/FMA support

I have a complex set of requirements including tensorflow. I’d like to have tensorflow with the usual nix features enabled like SSE/AVX/FMA, which I cannot get from pypi, therefore I must take tensorflow from nixpkgs. For everything else I keep the default, which means wheels are preferred. This allows for quicker installation of dependencies.

let
  mach-nix = import (builtins.fetchGit {
    url = "https://github.com/DavHau/mach-nix/";
    ref = "2.0.0";
  });
in mach-nix.mkPython {

  requirements = ''
    # bunch of other requirements
    tensorflow
  '';

  providers = {
    # force tensorflow to be taken from nixpkgs
    tensorflow = "nixpkgs"; 
  };
}

This only works if the restrictions in requirements.txt allow for the tensorflow version from nixpkgs. Sadly, in this specific case tensorflow needs to be rebuilt and cannot be retrieved from the nix cache. The reason for this is that nixpkgs uses a wrong version of gast for tensorflow. Mach-nix notices that and corrects it, which leads to a rebuild. I could include some dont_fixup_nixpkgs option if there is demand for it, but for now I preferred to keep the API simple.

2. Recent Tensorflow quick install

I’d like to install a more recent version of tensorflow which is not available from nixpkgs. Also I hate long build times and therefore I want to install tensorflow via wheel. Usually most wheels work pretty well out of the box, but the tensorflow wheel has an issue which I need to fix with an override.

let
  mach-nix = import (builtins.fetchGit {
    url = "https://github.com/DavHau/mach-nix/";
    ref = "2.0.0";
  });
in mach-nix.mkPython {

  requirements = ''
    # bunch of other requirements
    tensorflow == 2.2.0rc4
  '';

  # no need to specify provider settings since wheel is the default anyways

  # Fix the tensorflow wheel
  overrides_post = [( pythonSelf: pythonSuper: {
    tensorflow = pythonSuper.tensorflow.overridePythonAttrs ( oldAttrs: {
      postInstall = ''
        rm $out/bin/tensorboard
      '';
    });
  })];
}

3. Recent PyTorch with nixpkgs dependencies, overlays, and custom python

I’d like to use a recent version of PyTorch from wheel, but build the rest of the requirements from sdist or nixpkgs, since I’ve already written overlays for those packages which I’d like to continue using. Also, I require python 3.6.

let
  mach-nix = import (builtins.fetchGit {
    url = "https://github.com/DavHau/mach-nix/";
    ref = "2.0.0";
  });
  overlays = []; # some very useful overlays
in mach-nix.mkPython rec {

  requirements = ''
    # bunch of other requirements
    torch == 1.5.0
  '';

  providers = {
    # disallow wheels by default
    _default = "nixpkgs,sdist";
    # allow wheels only for torch
    torch = "wheel";
  };

  # Include my own overlay. (Caution! nixpkgs >= 20.03 required for wheel support)
  pkgs = import <nixpkgs> { config = { allowUnfree = true; }; inherit overlays; };

  # Select custom python version (Must be taken from pkgs with the overlay applied)
  python = pkgs.python36;
}

4. Use overrides from poetry2nix

I have a complex requirements.txt which includes imagecodecs. It is available via wheel, but I prefer to build everything from source. This package has complex build dependencies and is not available from nixpkgs. Luckily, the poetry2nix overrides make it work. The poetry2nix overrides depend on nixpkgs-unstable.

let
  mach-nix = import (builtins.fetchGit {
    url = "https://github.com/DavHau/mach-nix/";
    ref = "2.0.0";
  });
in mach-nix.mkPython rec {

  requirements = ''
    # bunch of other requirements
    imagecodecs
  '';

  providers = {
    _default = "sdist";
  };

  # Use unstable channel
  pkgs = import <unstable> {};

  # Import overrides from poetry2nix
  # Caution! Use poetry2nix overrides only in `overrides_post`, not `overrides_pre`.
  overrides_post = [
      (
        import (builtins.fetchurl {
          url = "https://raw.githubusercontent.com/nix-community/poetry2nix/1cfaa4084d651d73af137866622e3d0699851008/overrides.nix";
        }) { inherit pkgs; }
      )
    ];
}

Of course those examples are unsafe since they don’t include hashes, but I hope they demonstrate well enough how you can build complex python environments while using nixpkgs-specific features, fixing up packages, speeding up builds, and including your own overlays and overrides.

My plans for upcoming versions:

  • Github Provider: Some complex packages, like tensorflow for example, don’t publish sdist releases, and therefore mach-nix cannot build arbitrary versions of them from source. The only way to use them with nixpkgs-specific build features is to take exactly the versions from nixpkgs, which builds them from github. If a github provider were implemented in mach-nix, it could build arbitrary versions of tensorflow from source including all nix features.
  • Draw Dependency Graph: For debugging purposes it would be nice to have a command-line tool to display the resolved dependency graph of a given requirements.txt + provider settings.
  • buildPythonPackage: Fully automatic buildPythonApplication and buildPythonPackage builders, which are able to extract a package’s requirements from the package source tree.
  • Additional requirement formats: It would be cool to support more requirement formats than just requirements.txt. I have in mind:
    • pyproject.toml for minimal build dependencies.
    • pyproject.toml poetry style
    • Pipfile / Pipfile.lock
    • setup.py / setup.cfg… the usual setuptools stuff
    • … Are there any other important ones?
  • Compression for Database: Add compression for the package index and dependency graph to make mach-nix more CI-friendly.

Let me know if you have any other ideas in mind!

Also if you’d like to contribute or just share your ideas/problems, I would love to see some collaboration on github!

In case any builds fail, please don’t hesitate to open issues on github. This is the only way we can fix these issues or improve docs in case there are some misunderstandings.

If you figure out that some package only works with specific providers, please commit your provider config for that package to This File. Those are the builtin mach-nix defaults. It will help other people to build those packages out of the box.

I did not yet include any default overrides. A lot of energy has already been invested in fixing python packages in nixpkgs (which we are reusing here) and in poetry2nix. I’d like to not open another project with overrides unless it is really necessary. I guess it is in the interest of all 3 projects to agree on a common format for python fixes. Would it be possible to work together on a specification for this? @FRidh @adisbladis Of course with the ultimate goal to use that format also in nixpkgs.

My requirements for fixes are:

  1. Separate treatment for different package formats like sdist, wheel, github/other.
  2. Different treatment for different versions. Fixes could include a version specifier like “>=1.2.0, <2.0.0”, so if my package has version “1.5.0”, I know I need to apply this fix.
  3. Separation between handling python inputs and native inputs + fixes, as already mentioned by you guys.
  4. A defined mapping from pypi package names to nix attribute names.
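As a straw man for discussion, a format meeting all four requirements might look like the sketch below. Every attribute name here is invented; this shows only the shape, not concrete names anyone has agreed on.

```nix
# Hypothetical shared fix format: one entry per pypi package name.
{
  # 4. defined mapping from pypi names to nix attribute names
  attrNames."jupyter-core" = "jupyter_core";

  fixes.pillow = [{
    # 1. which package formats this fix applies to
    formats = [ "sdist" ];
    # 2. which versions this fix applies to
    versions = ">=6.0.0, <8.0.0";
    # 3. native inputs kept separate from python inputs
    nativeInputs = pkgs: [ pkgs.libjpeg pkgs.zlib ];
    pythonInputs = pyPkgs: [ pyPkgs.olefile ];
    # free-form derivation fixes
    override = old: { doCheck = false; };
  }];
}
```

Each consumer (nixpkgs, poetry2nix, mach-nix) could then translate such declarative entries into its own overlay/override mechanism.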

Unrelated sidenote: I’m a freelancer and looking for some cool projects to collaborate.


This is awesome progress, and I’m playing around a bit more. This might be too big of an ask, but it would be amazing to have support for conda’s environment.yml format / conda packages. It is used a ton in the data science world. It seems your approach is well abstracted to support multiple providers, which is incredibly cool!

Sounds like an interesting idea. I didn’t have a look at conda’s environment.yml format so far, but everything which is convertible to a list of package names and versions should be easy to add. Maybe there already exists some parser we can use. Not sure if I will find time for this soon, but I’ll keep it in mind.

New version released! It solves a bunch of problems, improves build speed, build-time closure and build success rate, and fixes the disable_checks option. It also comes with an updated python package index. I highly recommend upgrading!

With the previous release (2.0.0) I broke the crawler infrastructure which was supposed to constantly update the dependency graph with newly released python packages. This is now fixed, and from now on the pypi-deps-db repo should receive daily updates again.

changelog:

2.0.1 (29 Jun 2020)

Fixes:

  • fix: disable_checks did not work for packages built via the sdist provider if the required version matches exactly the version used in nixpkgs.
  • fix: some dependencies with markers were ignored completely
  • fix: providers nixpkgs and sdist inherited many unneeded build inputs from nixpkgs, leading to bloated build-time closures, an increased failure rate and an ineffective disable_checks option. After this fix, only non-python build inputs are inherited from nixpkgs.
  • mach-nix now sets pname + version for python packages instead of name

Released version 2.1.0.

Bug fixes + new feature buildPythonPackage / buildPythonApplication

Mach-nix now supports buildPythonPackage and buildPythonApplication, which allow building python modules or applications from a local source tree or a github project, for example. It’s not yet fully automatic: you still need to provide the list of requirements manually, but often you can just load the requirements.txt from the project.

Example

Building a python application from github can now be done like this:

let
  mach-nix = import (builtins.fetchGit {
    url = "https://github.com/DavHau/mach-nix/";
    ref = "2.1.0";
  });
in mach-nix.buildPythonPackage rec {
  pname = "projectname";
  version = "1.0.0";
  src = builtins.fetchGit{
    url = "https://github.com/user/projectname";
    ref = "master";
    # rev = "put_commit_hash_here";
  };
  doCheck = false;
  doInstallCheck = false;
  requirements = builtins.readFile "${src}/requirements.txt";
}

buildPythonPackage / buildPythonApplication accept the same arguments as the identically named functions in nixpkgs, in addition to all the arguments you usually pass to mach-nix.mkPython.

changelog

Fixes:

  • fix ‘value is null while a set was expected’ error when a python package is used which is set to null in nixpkgs (like ipaddress / enum / futures)

Features:

  • buildPythonPackage / buildPythonApplication: Interface to build python packages from their source code + requirements.txt

Hi @DavHau! Good work! A question: is it possible to use nixpkgs python packages inside mach-nix.mkPython? Some python packages don’t have a wheel or aren’t on PyPI, but are packaged in nixpkgs, like VTK or GDCM. Thanks!

Yes, all packages from pypi and nixpkgs should be available inside the requirements.txt. In case a package is available from both sources, mach-nix will prefer pypi, but you can change that. Check the providers section of the readme.
For packages which are neither on pypi nor in nixpkgs, you can make them available by including them via overrides_pre (see optional arguments) and then selecting them inside the requirements.txt.
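A minimal sketch of that overrides_pre approach (the package name mytool and its local source path are made up for illustration):

```nix
let
  mach-nix = import (builtins.fetchGit {
    url = "https://github.com/DavHau/mach-nix/";
    ref = "2.1.0";
  });
in mach-nix.mkPython {
  # Make a package available that is neither on pypi nor in nixpkgs,
  # so it can then be selected in requirements.
  overrides_pre = [( pythonSelf: pythonSuper: {
    mytool = pythonSuper.buildPythonPackage {
      pname = "mytool";
      version = "0.1.0";
      src = ./mytool; # local source tree
    };
  })];

  requirements = ''
    mytool
  '';
}
```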

Released 2.1.1 (30 Jul 2020)

Fix broken wheel packages

Fixes:

  • Some wheel packages could break through patchelf if they already contained stripped binaries. Packages like numpy wouldn’t work because of this. This is now fixed by passing dontStrip to the autoPatchelf routine.

Thanks for your contributions!

Sadly I cannot edit the original post of this thread anymore. I would like to update the example there to point to a newer version. Is there anything we can do about this?


Released 2.2.0 (09 Aug 2020)

Improved success rate, MacOS support, bugfixes, optimizations

Features

  • Improved selection of wheel releases. MacOS is now supported and architectures besides x86_64 should be handled correctly.
  • Whenever mach-nix resolves dependencies, a visualization of the resulting dependency tree is printed on the terminal.
  • The dependency DB is now accessed through a caching layer which reduces the resolver’s CPU time significantly for larger environments.
  • The python platform context is now generated from the nix build environment variable ‘system’. This should decrease the chance of impurities during dependency resolution.

Fixes

  • The requires_python attribute of wheels was not respected. This led to failing builds, especially for older python versions. Now requires_python is part of the dependency graph and affects resolution.
  • Detecting the correct package name for python packages in nixpkgs often failed since the attribute names don’t follow a fixed schema. This led to a handful of different errors in different situations. Now the package names are extracted from the pypi url inside the src attribute, which is much more reliable. For packages which are not fetched from pypi, the pname attribute is used as a fallback.
  • Fixed a bug which led to the error attribute 'sdist' missing if a package from the nixpkgs provider was used which doesn’t publish its source on pypi (for example tensorflow).

Other Changes

  • Mach-nix now uses a revision of the nixpkgs-unstable branch instead of nixos-20.03 as the base for the tool and the nixpkgs provider.
  • Updated revision of the dependency DB

hello,

what about this error?
[screenshot: resolution error involving dask]

In general good strategies to resolve ResolutionImpossible errors are:

  • relax your top-level requirements (the ones specified in your requirements.txt), for example by removing version specifiers like ==.
  • pin other versions of the requirements which are identified as problematic by the error message. In your case the message contains datashader==0.10.0. You could, for example, force an older version of datashader by adding datashader<0.10.0 or similar to your requirements.txt.
  • Sometimes the conflict is caused by some sub-dependency, which makes it more difficult to narrow down. In this case, try removing top-level requirements until the error disappears, so you can get a feeling for which of the top-level requirements causes the error.
  • open an issue for mach-nix
  • change your provider config
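For example, the pinning strategy would just add one line to the requirements (the bound comes from the datashader==0.10.0 mentioned in the error message; mach-nix is assumed to be imported as in the earlier examples):

```nix
mach-nix.mkPython {
  requirements = ''
    dask
    # pin datashader below the version reported as problematic
    datashader < 0.10.0
  '';
}
```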

Released 2.2.1 (11 Aug 2020)

Handle circular dependencies, fix python 3.8 wheels, improve error message

Features

  • Print more detailed info when the resolver raises a ResolutionImpossible error.
  • Warn on circular dependencies and fix them automatically.

Fixes

  • Fix crash on circular dependencies.
  • Python 3.8 wheels have the abi tag cp38, not cp38m. This was not considered before, which prevented finding suitable manylinux wheels for python 3.8.

Development

  • Added integration tests under ./tests/

Note that cp38 versus cp38m depends on how Python is built. We should actually declare in the passthru how the interpreter is built with regard to pymalloc.


I need you guys’ opinion on something.

Currently mach-nix uses this huge pypi package index mainly to get the sha256 hash for each package.
I’d really like to get rid of this dependency, since it cannot be compressed to smaller than 130 MB and is several times larger when unpacked.
It always takes time to download, is very unfriendly for CI, and it really doesn’t add any trust: the official pypi index is probably 10x more trustworthy than an index maintained by me.

I’m currently thinking about removing the index completely and instead implementing my own fetcher which doesn’t require a sha256, or just using builtins.fetchTarball without a hash.

As far as I understand, pypi prevents replacing releases and therefore ensures integrity for each pair of pname + version. Therefore I don’t see any problem with this.

Are there any downsides of fetching without hash checking which I’m not seeing right now?
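For reference, the hashless variant under discussion would look roughly like this (the URL is just an illustrative pypi sdist location, not something mach-nix generates today):

```nix
# Sketch: fetching an sdist without a sha256. Without a hash this is
# not a fixed-output derivation, so Nix cannot verify the download and
# will re-fetch it after its cache entry expires.
builtins.fetchTarball {
  url = "https://pypi.io/packages/source/r/requests/requests-2.24.0.tar.gz";
}
```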

AFAIK, not passing a sha256 attribute means the expression will need internet access after tarball-ttl timeout. So it will break offline use. (Please correct me if I’m wrong.)