Mach-nix: Create Python environments quickly and easily

Released version 2.1.0.

Bug fixes + new feature: buildPythonPackage / buildPythonApplication

Mach-nix now supports buildPythonPackage and buildPythonApplication, which allow building Python modules or applications from a local source tree or a GitHub project, for example. It’s not yet fully automatic: you still need to provide the list of requirements manually, but often you can just load the requirements.txt from the project.

Example

Building a python application from github can now be done like this:

let
  mach-nix = import (builtins.fetchGit {
    url = "https://github.com/DavHau/mach-nix/";
    ref = "2.1.0";
  });
in mach-nix.buildPythonPackage rec {
  pname = "projectname";
  version = "1.0.0";
  src = builtins.fetchGit{
    url = "https://github.com/user/projectname";
    ref = "master";
    # rev = "put_commit_hash_here";
  };
  doCheck = false;
  doInstallCheck = false;
  requirements = builtins.readFile "${src}/requirements.txt";
}

buildPythonPackage / buildPythonApplication accept the same arguments as the identically named functions in nixpkgs, in addition to all arguments you usually pass to mach-nix.mkPython.

Changelog

Fixes:

  • fix 'value is null while a set was expected' error when a python package is used which is set to null in nixpkgs (like ipaddress / enum / futures)

Features:

  • buildPythonPackage / buildPythonApplication: Interface to build python packages from their source code + requirements.txt

Hi @DavHau! Good work! A question: Is it possible to use NixOS python packages inside mach-nix.mkPython? Some python packages don’t have a wheel or aren’t on Pypi, but are packaged in NixOS, like VTK or GDCM. Thanks!

Yes, all packages from pypi and nixpkgs should be available inside the requirements.txt. In case a package is available from both sources, mach-nix will prefer pypi, but you can change that. Check the providers section of the readme.
For packages which are neither on pypi nor in nixpkgs, you can make them available by including them via overrides_pre (see optional arguments) and then selecting them inside the requirements.txt.
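As a sketch of what such a provider configuration might look like (attribute names should be checked against the providers section of the readme; vtk is just an illustrative package here):

```nix
mach-nix.mkPython {
  requirements = builtins.readFile ./requirements.txt;

  # Preference order for package sources; "_default" applies to all
  # packages, per-package keys override it.
  providers = {
    _default = "wheel,sdist,nixpkgs";  # prefer pypi wheels, then sdists, then nixpkgs
    vtk = "nixpkgs";                   # force vtk to come from nixpkgs only
  };
}
```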

Released 2.1.1 (30 Jul 2020)

Fix broken wheel packages

Fixes:

  • Some wheel packages could break during patchelf if they already contained stripped binaries. Packages like numpy wouldn’t work because of this. This is now fixed by passing dontStrip to the autoPatchelf routine.
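The fix corresponds roughly to this kind of override (a sketch only; the actual change lives inside mach-nix’s wheel build logic, and somePythonPackage is a placeholder):

```nix
# Sketch: disable stripping for a wheel-based package so autoPatchelfHook
# only patches ELF interpreter/rpaths and leaves pre-stripped binaries alone.
somePythonPackage.overrideAttrs (old: {
  dontStrip = true;
})
```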

Thanks for your contributions!

Sadly I cannot edit the original post of this thread anymore. I would like to update the example there to point to a newer version. Is there anything we can do about this?


Released 2.2.0 (09 Aug 2020)

Improved success rate, MacOS support, bugfixes, optimizations

Features

  • Improved selection of wheel releases. MacOS is now supported and architectures besides x86_64 should be handled correctly.
  • Whenever mach-nix resolves dependencies, a visualization of the resulting dependency tree is printed on the terminal.
  • The dependency DB is now accessed through a caching layer which reduces the resolver’s CPU time significantly for larger environments.
  • The python platform context is now generated from the nix build environment variable system. This should decrease the chance of impurities during dependency resolution.

Fixes

  • The requires_python attribute of wheels was not respected. This led to failing builds, especially for older python versions. Now requires_python is part of the dependency graph and affects resolution.
  • Detecting the correct package name for python packages in nixpkgs often failed, since the attribute names don’t follow a fixed schema. This led to a handful of different errors in different situations. Now the package names are extracted from the pypi url inside the src attribute, which is much more reliable. For packages which are not fetched from pypi, the pname attribute is used as a fallback.
  • Fixed a bug which led to the error attribute 'sdist' missing if a package from the nixpkgs provider was used which doesn’t publish its source on pypi (for example tensorflow).

Other Changes

  • Mach-nix now uses a revision of the nixpkgs-unstable branch instead of nixos-20.03 as the base for the tool and the nixpkgs provider.
  • Updated revision of the dependency DB

hello,

what about this error?
[screenshot: ResolutionImpossible error involving dask]

In general good strategies to resolve ResolutionImpossible errors are:

  • relax your top-level requirements (the ones specified in your requirements.txt), for example by removing version specifiers like ==.
  • pin other versions of the requirements which are determined as problematic by the error message. In your case the message contains datashader==0.10.0. You could for example force an older version of datashader by adding datashader<0.10.0 or similar to your requirements.txt.
  • Sometimes the conflict is caused by some sub-dependency, which makes it more difficult to narrow down. In this case try removing top-level requirements until the error disappears, so you can get a feeling for which of them causes the error.
  • open an issue for mach-nix
  • change your provider config
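For the pinning and provider-config strategies, a call might look like this (package names taken from the error above; the provider choice is purely illustrative, and the exact providers syntax should be checked against the readme):

```nix
mach-nix.mkPython {
  requirements = ''
    dask
    datashader<0.10.0   # pin below the version the resolver flagged
  '';
  providers = {
    datashader = "nixpkgs";  # or "wheel"/"sdist"; see the providers docs
  };
}
```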

Released 2.2.1 (11 Aug 2020)

Handle circular dependencies, fix python 3.8 wheels, improve error message

Features

  • Print more detailed info when the resolver raises a ResolutionImpossible error.
  • Warn on circular dependencies and fix them automatically.

Fixes

  • Fix crash on circular dependencies.
  • Python 3.8 wheels have the abi tag cp38, not cp38m. This was not considered before, which prevented finding suitable manylinux wheels for python 3.8.

Development

  • Added integration tests under ./tests/

Note that cp38 versus cp38m depends on how Python is built. We should actually declare in the passthru how the interpreter is built with regard to pymalloc.
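A minimal sketch of what such a passthru declaration could look like (abiFlags is a hypothetical attribute name, not an existing nixpkgs convention):

```nix
# Sketch: expose the pymalloc-related ABI suffix on the interpreter,
# so tooling can derive the correct wheel abi tag (cp37m vs cp38).
python37.overrideAttrs (old: {
  passthru = (old.passthru or {}) // {
    abiFlags = "m";  # "" for CPython >= 3.8, which dropped the pymalloc suffix
  };
})
```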


I need you guys’ opinion on something.

Currently mach-nix uses this huge pypi package index mainly to get the sha256 hash for each package.
I’d really like to get rid of this dependency since it cannot be compressed smaller than 130 MB and is even multiple times larger when unpacked.
It always takes time to download, is very unfriendly for CI and it really doesn’t add any trust. The official pypi index is probably 10x more trustworthy than an index maintained by me.

I’m currently thinking about removing the index completely and instead implementing my own fetcher which doesn’t require a sha256, or just using builtins.fetchTarball without a hash.

As far as I understand, pypi prevents replacing releases and therefore ensures integrity for each pair of pname + version, so I don’t see any problem with this.

Are there any downsides of fetching without hash checking which I’m not seeing right now?
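For reference, fetching without a hash boils down to something like this (the URL is purely illustrative); builtins.fetchTarball without sha256 is impure and its result is only cached for a limited time:

```nix
let
  # No sha256 given: nix downloads at evaluation time and caches the
  # result for tarball-ttl seconds before re-fetching.
  src = builtins.fetchTarball {
    url = "https://files.pythonhosted.org/packages/source/r/requests/requests-2.24.0.tar.gz";
  };
in src
```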

AFAIK, not passing a sha256 attribute means the expression will need internet access after tarball-ttl timeout. So it will break offline use. (Please correct me if I’m wrong.)

Released 2.2.2 (17 Aug 2020)

Fixes

  • Packages with a dot in their name led to an invalid nix expression
  • Problem generating error message for resolution impossible errors
  • buildPythonPackage of mach-nix failed if arguments like pkgs were passed.
  • When overriding packages, mach-nix now falls back to using overrideAttrs if overridePythonAttrs is not available.

Package Fixes:

  • pip: installation failed. Fixed by forcing nixpkgs provider
  • gdal: building from sdist doesn’t work. Fixed by forcing nixpkgs provider

Development

  • Merged project pypi-crawlers into mach-nix (was a separate project before)

This here is why I opened [RFC 0067] Common override interface derivations by FRidh · Pull Request #67 · NixOS/rfcs · GitHub. Typically one should not need to know what override function needs to be used, it should “just work”.

The reason overrideAttrs had to be used is because makeOverridablePythonPackage is not applied to toPythonModule. I think I can fix that.

If you notice any more inconsistencies, please let me know.


How large is the graph without checksums?

They have also pulled packages before; the version of a package you are looking for may no longer exist.

This here is why I opened https://github.com/NixOS/rfcs/pull/67 . Typically one should not need to know what override function needs to be used, it should “just work”.

@FRidh Thanks for noticing. I was always wondering about those different override methods. So if I understand your RFC correctly, and assuming the problem with toPythonModule gets fixed, then I should deprecate the use of overridePythonAttrs in favor of overrideAttrs.
But what if a user wants to use mach-nix on top of nixpkgs-19.09? Since which nixpkgs version does python follow the overrides RFC?

Note that cp38 versus cp38m depends on how Python is built. We should actually declare in the passthru how the interpreter is built with regard to pymalloc.

I regexed the DB and it seems that, with a few exceptions, the whole python world has moved away from pymalloc since python 3.8. For python 3.8 there are only 204 wheel releases supporting the cp38m abi, while there are 24404 releases for the cp38 abi.
For python 3.7 it’s the other way round: 10 releases for cp37 vs. 57605 for cp37m.
Therefore I don’t think it’s necessary to make that information available through passthru.


They have also pulled packages before; the version of a package you are looking for may no longer exist.

That’s ok I guess and also no reason to keep the index. If they are gone, they are gone.


How large is the graph without checksums?

Current sizes in MB:

|       | zip | raw |
|-------|-----|-----|
| graph | 38  | 508 |
| index | 142 | 500 |

The compressed size of the graph is only 38M.
Since the graph is only accessed by python, I can do an optimization to access the zip file directly without unpacking it to disk.
I cannot do the same for the index, since it needs to be accessed by nix and nix doesn’t support such magic as far as I know.
Therefore the best would be to just get rid of the index completely.
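The direct-from-zip access is straightforward in Python via the zipfile module, which decompresses members as streams without writing them to disk. A minimal self-contained sketch (the graph content here is made up for illustration):

```python
import io
import json
import zipfile

# Build a small in-memory zip standing in for the compressed dependency graph.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("graph.json", json.dumps({"requests": ["urllib3", "idna"]}))

# Read a member directly: streamed decompression, no temp files on disk.
with zipfile.ZipFile(buf) as zf:
    with zf.open("graph.json") as f:
        graph = json.load(f)

print(graph["requests"])  # → ['urllib3', 'idna']
```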

I assume it must be possible to implement a custom fetcher that omits the checksum check but still allows permanent caching?
I see that target has implemented some fetchers which seem to behave similarly: GitHub - target/nix-fetchers: A set of morally pure fetching builtins for Nix.
I didn’t take a deeper look into this yet, but it gives me hope.

By keeping the graph compressed and getting rid of the index, the disk space requirement would decrease from 1 GB to 38 MB.

Released 2.3.0 (26 Aug 2020)

simplified override system, autodetect requirements, improved success rate

Features

  • Simplified generic override system via _ (underscore) argument for mkPython.
    Example: _.{package}.buildInputs.add = [...]
  • buildPythonPackage now automatically detects requirements. Therefore the requirements argument becomes optional.
  • buildPythonPackage now automatically detects package name and version. Therefore those attributes become optional.
  • buildPythonPackage can now be called while only passing a tarball url or a path
  • mkPython allows including python packages from arbitrary sources via the new argument extra_pkgs
  • mkPython can now be called while only passing a list of tarball urls or paths

Fixes

  • More bugs introduced by packages with a dot in their name
  • Definitions from overrides_pre were sometimes disregarded due to wrong use of with-statements inside a recursive attrset.
  • Fix installation of the mach-nix tool via pip. (requirements were missing)
  • Packages which use a non-normalized version triggered an evaluation error, since mach-nix tried to reference their source via the normalized version.
  • wheels removed from pypi were not removed from the dependency graph which could result in environments failing to build

The following simplified ways to call buildPythonPackage are now possible:

In all of the following examples, mach-nix will detect and resolve the requirements of all included packages automatically.

To build a package directly from github or from a local path:

mach-nix.buildPythonPackage "https://github.com/psf/requests/tarball/2a7832b5b06d"

To select extras:

mach-nix.buildPythonPackage {
  src = "https://github.com/psf/requests/tarball/2a7832b5b06d";
  extras = "socks";
}

The following ways to call mkPython are now possible:

To add packages from arbitrary sources to your existing requirements:

mach-nix.mkPython {
  requirements = builtins.readFile ./requirements.txt;
  extra_pkgs = [
      "https://github.com/psf/requests/tarball/2a7832b5b06d"   # from tarball url
      ./some/local/project                                     # from local path
      (mach-nix.buildPythonPackage { ... })                    # from package
    ];
}

In this case, all requirements specified via requirements and the ones extracted from packages inside extra_pkgs will be merged and resolved as one final python environment.

Also mkPython can now be called in a lazy way if no extra arguments are required:

mach-nix.mkPython [
"https://github.com/psf/requests/tarball/2a7832b5b06d"  # from tarball url
  ./some/local/project                                    # from path
  (mach-nix.buildPythonPackage { ... })                   # from package
]

Usage of new Underscore override system

with mach-nix.nixpkgs;
mach-nix.mkPython {

  requirements = "some requirements";

  _.{package}.buildInputs = [...];             # replace buildInputs
  _.{package}.buildInputs.add = [...];         # add buildInputs
  _.{package}.buildInputs.mod =                # modify buildInputs
      oldInputs: filter (inp: ...) oldInputs;         

  _.{package}.patches = [...];                 # replace patches
  _.{package}.patches.add = [...];             # add patches
  ...
}

The examples use buildInputs and patches, but this can be applied to any attribute.


I think that, given packages are fetched over https and asserting the consistency guarantees of pypi, you could drop the index. There are opinions (including my own) that advocate not being too eager in replicating an ecosystem’s native trust guarantees (similar: checksums contained in Cargo.lock and go.sum).

But since pypi is a central authority and stores all packages it serves, and given their policies, name-version is a valid content hash. Funnily, it has even stronger uniqueness guarantees than any content hash algorithm, which, though extremely costly (ca. 60k USD for SHA-1), can be crafted to clash.

The nix ecosystem might one day choose to genuinely trust valid content hashes, and to date I haven’t heard of any argument against doing so (I would have remembered).

Keeping the hashes also means that nix doesn’t have to go over the network to re-fetch them.

I think @DavHau would like to get a green light for regarding name-version (which he uses as identifiers in the graph database) as genuinely valid hashes (instead of nix-style hashes).

Not only do the clear-text hashes of the graph database compress better, they are also arguably more secure than cryptographic hashes, given the central pypi authority as the only accepted source.

I think those are the kind of guarantees that should provide the peace of mind to deviate from standard nix practice (using nix hashes), which, in the face of those arguments, reduces to a form factor rather than adding any material value.

In harmonious glory, it will save around 500MB of disk space for every mach-nix database version fetched by a user.

I think the blessing of this idea is warranted.


If this didn’t make sense yet, one can think of cryptographic hashes as nothing more than content-addressable pointers. And so are name-version pairs, with pypi as the central naming authority.


The necessary and sufficient side conditions are that transport security to pypi can be reasonably enforced, that pypi upholds its “force-push prohibition” policy, and that pypi is not hacked at some point.


Guarding against the hack scenario: a cryptographic hash can be calculated and independently validated against some (trusted) nix-community infrastructure, or even from github through an extremely hacky folder layout.