Mach-nix: Create python environments quick and easy

How large is the graph without checksums?

They have also pulled packages, so the version of a package you are looking for may no longer exist.

This is why I opened https://github.com/NixOS/rfcs/pull/67. Typically, one should not need to know which override function to use; it should “just work”.

@FRidh Thanks for noticing. I had always wondered about those different override methods. So if I understand your RFC correctly, and assuming the problem with toPythonModule gets fixed, I should deprecate the use of overridePythonAttrs in favor of overrideAttrs.
But what if a user wants to use mach-nix on top of nixpkgs-19.09? Starting from which nixpkgs version does Python follow the overrides RFC?

Note that cp38 versus cp38m depends on how Python is built. We should actually declare in passthru whether the interpreter is built with pymalloc.

I regexed the DB, and it seems that, with a few exceptions, the whole Python world has dropped the pymalloc (“m”) ABI flag since Python 3.8. For Python 3.8 there are only 204 wheel releases supporting the cp38m ABI, while there are 24404 releases for the cp38 ABI.
For Python 3.7 it’s the other way round: 10 × cp37 vs. 57605 × cp37m.
Therefore, I don’t think it’s necessary to make that information available through passthru.
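A minimal sketch of how such counts could be gathered from a list of wheel filenames. The helpers `parse_wheel_filename` and `count_abi_tags` are hypothetical and not mach-nix code. They rely on the PEP 427 filename layout, where the ABI tag is the second-to-last dash-separated field. The example filenames are illustrative only.

```python
from collections import Counter


def parse_wheel_filename(filename: str) -> dict:
    """Split a wheel filename into its PEP 427 components (hypothetical helper).

    Layout: {name}-{version}(-{build})?-{python tag}-{abi tag}-{platform tag}.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename}")
    # the last three fields are always the compatibility tags
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {"name": parts[0], "version": parts[1],
            "python": python_tag, "abi": abi_tag, "platform": platform_tag}


def count_abi_tags(filenames, abis) -> Counter:
    """Count wheels per ABI tag; compressed tag sets like 'cp37.cp37m' are split on '.'."""
    counts = Counter()
    for fn in filenames:
        for abi in parse_wheel_filename(fn)["abi"].split("."):
            if abi in abis:
                counts[abi] += 1
    return counts


# illustrative filenames, not taken from the mach-nix DB
wheels = [
    "numpy-1.19.1-cp38-cp38-manylinux1_x86_64.whl",
    "numpy-1.19.1-cp37-cp37m-manylinux1_x86_64.whl",
    "example-1.0-cp38-cp38m-linux_x86_64.whl",
]
print(count_abi_tags(wheels, {"cp37", "cp37m", "cp38", "cp38m"}))
```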


They have also pulled packages, so the version of a package you are looking for may no longer exist.

That’s ok I guess and also no reason to keep the index. If they are gone, they are gone.


How large is the graph without checksums?

Current sizes in MB:

|       | zipped | raw (unpacked) |
|-------|--------|----------------|
| graph | 38     | 508            |
| index | 142    | 500            |

The compressed size of the graph is only 38 MB.
Since the graph is only accessed by Python, I can optimize this by reading the zip file directly without unpacking it to disk.
I cannot do the same for the index, since it needs to be accessed by Nix, and as far as I know Nix doesn’t support such magic.
Therefore the best would be to just get rid of the index completely.
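The graph-side optimization could look roughly like this: open the archive and parse a member in memory, so nothing is written to disk. This is a minimal sketch using only the standard library. The member name `graph.json` and its structure are hypothetical, not mach-nix's actual data layout.

```python
import io
import json
import zipfile


def load_json_from_zip(zip_source, member: str):
    """Parse a JSON member of a zip archive without extracting it to disk.

    zip_source may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as zf:
        with zf.open(member) as fh:  # streams the decompressed bytes
            return json.load(fh)     # json accepts UTF-8 bytes directly


# build a small in-memory archive standing in for the real graph file (hypothetical layout)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("graph.json", json.dumps({"requests": {"2.24.0": ["idna", "chardet"]}}))
buf.seek(0)

graph = load_json_from_zip(buf, "graph.json")
print(graph["requests"]["2.24.0"])
```

The member is still fully decompressed in memory; only the disk footprint shrinks.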

I assume it must be possible to implement a custom fetcher that omits the checksum check but still allows permanent caching!?
I see that target has implemented some fetchers which seem to behave similarly: target/nix-fetchers on GitHub (“A set of morally pure fetching builtins for Nix”).
I haven’t taken a deeper look into this yet, but it gives me hope.

By keeping the graph compressed and getting rid of the index, the disk space requirement would decrease from 1 GB to 38 MB

2.3.0 (26 Aug 2020)

simplified override system, autodetect requirements, improved success rate

Features

  • Simplified generic override system via _ (underscore) argument for mkPython.
    Example: _.{package}.buildInputs.add = [...]
  • buildPythonPackage now automatically detects requirements. Therefore the requirements argument becomes optional.
  • buildPythonPackage now automatically detects package name and version. Therefore those attributes become optional.
  • buildPythonPackage can now be called with only a tarball URL or a path
  • mkPython allows including Python packages from arbitrary sources via the new argument extra_pkgs
  • mkPython can now be called with only a list of tarball URLs or paths

Fixes

  • More bugs introduced by packages with a dot in their name
  • Definitions from overrides_pre were sometimes disregarded due to wrong use of with-statements inside a recursive attrset.
  • Fix installation of the mach-nix tool via pip (requirements were missing).
  • Packages which use a non-normalized version triggered an evaluation error, since mach-nix tried to reference their source via the normalized version.
  • Wheels removed from PyPI were not removed from the dependency graph, which could result in environments failing to build.
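The name-related fixes come down to normalization. For package names, PyPI's rule is PEP 503: lowercase the name and collapse runs of `-`, `_` and `.` into a single `-`. A minimal sketch of that rule follows. The helper name `normalize_name` is made up, and mach-nix's own normalization code may differ.

```python
import re


def normalize_name(name: str) -> str:
    """PEP 503 name normalization (hypothetical helper name).

    Runs of '-', '_' and '.' become a single '-', and the result is lowercased,
    so 'zope.interface' and 'Zope_Interface' map to the same key.
    """
    return re.sub(r"[-_.]+", "-", name).lower()


print(normalize_name("zope.interface"))
print(normalize_name("Foo__Bar.baz"))
```

Version normalization (PEP 440) is more involved. For example, `1.0-RC1` normalizes to `1.0rc1`. The `packaging` library's `Version` class implements it.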

The following simplified ways to call buildPythonPackage are now possible.

In all of the following examples, mach-nix will detect and resolve the requirements of all included packages automatically.

To build a package directly from GitHub or from a local path:

mach-nix.buildPythonPackage "https://github.com/psf/requests/tarball/2a7832b5b06d"

To select extras:

mach-nix.buildPythonPackage {
  src = "https://github.com/psf/requests/tarball/2a7832b5b06d";
  extras = "socks";
}

The following ways to call mkPython are now possible:

To add packages from arbitrary sources to your existing requirements:

mach-nix.mkPython {
  requirements = builtins.readFile ./requirements.txt;
  extra_pkgs = [
    "https://github.com/psf/requests/tarball/2a7832b5b06d"   # from tarball url
    ./some/local/project                                     # from local path
    (mach-nix.buildPythonPackage { ... })                    # from package (parentheses needed inside a list)
  ];
}

In this case, all requirements specified via requirements and the ones extracted from packages inside extra_pkgs will be merged and resolved as one final python environment.

mkPython can now also be called in a shorthand way if no extra arguments are required:

mach-nix.mkPython [
  "https://github.com/psf/requests/tarball/2a7832b5b06d"  # from tarball url
  ./some/local/project                                    # from path
  (mach-nix.buildPythonPackage { ... })                   # from package
]

Usage of the new underscore override system

with mach-nix.nixpkgs;
mach-nix.mkPython {

  requirements = "some requirements";

  _.{package}.buildInputs = [...];             # replace buildInputs
  _.{package}.buildInputs.add = [...];         # add buildInputs
  _.{package}.buildInputs.mod =                # modify buildInputs
      oldInputs: filter (inp: ...) oldInputs;         

  _.{package}.patches = [...];                 # replace patches
  _.{package}.patches.add = [...];             # add patches
  ...
}

The example uses buildInputs and patches, but this can be applied to any attribute.
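To make the three forms (plain value = replace, `.add` = extend, `.mod` = transform) concrete, here is a Python model of their semantics. It is loosely modelled on the `combine` helper in the Nix function posted with 2.4.1. It is a sketch on plain dicts, not mach-nix's implementation, and `apply_overrides` is a hypothetical name.

```python
from typing import Any, Dict


def combine(old: Any, new: Any) -> Any:
    """`.add` semantics: extend lists, merge dicts, concatenate strings."""
    if isinstance(new, list):
        return old + new
    if isinstance(new, dict):
        return {**old, **new}
    if isinstance(new, str):
        return old + new
    raise TypeError(".add only accepts list, dict or str")


def apply_overrides(attrs: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `attrs` with replace / add / mod overrides applied (hypothetical helper)."""
    result = dict(attrs)
    for key, val in overrides.items():
        if isinstance(val, dict) and "add" in val:
            result[key] = combine(attrs.get(key), val["add"])   # _.pkg.key.add = ...
        elif isinstance(val, dict) and callable(val.get("mod")):
            result[key] = val["mod"](attrs.get(key))            # _.pkg.key.mod = old: ...
        else:
            result[key] = val                                   # _.pkg.key = ... (replace)
    return result


attrs = {"buildInputs": ["zlib"], "patches": ["a.patch"], "doCheck": True}
new = apply_overrides(attrs, {
    "buildInputs": {"add": ["openssl"]},
    "patches": {"mod": lambda old: [p for p in old if p != "a.patch"]},
    "doCheck": False,
})
print(new)
```

As in the Nix version, a replacement value that happens to be an attribute set containing `add` or a function-valued `mod` would be interpreted as an operation rather than a plain value.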


I think that, given packages are fetched over HTTPS and given PyPI’s consistency guarantees, you could drop the index. Some people (including myself) advocate not being too eager to replicate an ecosystem’s native trust guarantees (compare the checksums contained in Cargo.lock and go.sum).

But since PyPI is a central authority that stores all the packages it serves, and given its policies, name-version is a valid content hash. Funnily, it even has stronger uniqueness guarantees than a content hash algorithm, which can be crafted to collide, although at an extreme cost (about 60k USD for SHA-1).

The Nix ecosystem might one day choose to genuinely trust valid content hashes; to date, I haven’t heard of any argument against doing so (I would have remembered).

Keeping the hashes also means that Nix doesn’t have to go over the network to re-fetch them.

I think @DavHau would like to get a green light to regard name-version (which he uses as identifiers in the graph database) as genuinely valid hashes (instead of Nix-style hashes).

Not only do the clear-text hashes of the graph database compress better, they are also more secure than cryptographic hashes, given the central PyPI authority as the only accepted source.

I think those are the kind of guarantees that should provide the peace of mind to deviate from standard Nix practice (using Nix hashes), which, in the face of those arguments, reduces to a formality rather than adding any material value.

In harmonious glory, it will save around 500 MB of disk space for every mach-nix database version fetched by a user.

I think the blessing of this idea is warranted.


If this doesn’t make sense yet, one can think of cryptographic hashes as nothing more than content-addressable pointers. And so is name-version, with PyPI as the central naming authority.


The necessary and sufficient side conditions are that transport security to PyPI can be reasonably enforced, that PyPI upholds its “force-push prohibition” policy, and that PyPI is not hacked at some point.
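A toy model of the “force-push prohibition” side condition, not PyPI's implementation: a registry that refuses to overwrite an existing entry, so the pointer, once published, always resolves to the same bytes. PyPI enforces this per file name, and a release can gain new files later. For that reason the sketch keys on the filename rather than on name-version alone. The class name `AppendOnlyRegistry` is hypothetical.

```python
class AppendOnlyRegistry:
    """Toy package index in which a published file can never change (hypothetical class)."""

    def __init__(self) -> None:
        self._files: dict = {}  # filename -> bytes

    def upload(self, filename: str, content: bytes) -> None:
        if filename in self._files:
            # the "force-push prohibition": an existing file is immutable
            raise ValueError(f"{filename} already exists")
        self._files[filename] = content

    def fetch(self, filename: str) -> bytes:
        return self._files[filename]


registry = AppendOnlyRegistry()
registry.upload("requests-2.24.0.tar.gz", b"original sdist bytes")
try:
    registry.upload("requests-2.24.0.tar.gz", b"tampered bytes")
except ValueError as err:
    print("rejected:", err)
```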


Guarding against the hack scenario: a cryptographic hash can be calculated and independently validated against some (trusted) nix-community infrastructure, or even against GitHub through an extremely hacky folder layout.
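The validation step could look like the following sketch: hash the downloaded bytes and compare the digest against a record kept by an independent party. The `trusted_hashes` mapping stands in for such community infrastructure, and `verify_artifact` is a hypothetical helper; neither refers to an existing service.

```python
import hashlib


def verify_artifact(filename: str, data: bytes, trusted_hashes: dict) -> bool:
    """Compare the sha256 of downloaded bytes with an independently recorded digest.

    trusted_hashes maps filenames to hex sha256 digests (hypothetical trusted source).
    """
    expected = trusted_hashes.get(filename)
    if expected is None:
        raise KeyError(f"no trusted hash recorded for {filename}")
    return hashlib.sha256(data).hexdigest() == expected


trusted = {"requests-2.24.0.tar.gz": hashlib.sha256(b"abc").hexdigest()}
print(verify_artifact("requests-2.24.0.tar.gz", b"abc", trusted))
print(verify_artifact("requests-2.24.0.tar.gz", b"tampered", trusted))
```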

If the index is down, evaluation will fail. It also means that if the index goes down permanently, reproducibility is gone, because you cannot substitute the artifact when you don’t know which artifact you need.

The target fetchers mentioned earlier use the exec functionality of Nix, which is disabled by default and, I assume, discouraged.

Therefore, sadly, the only options that don’t destroy caching entirely are to either provide hashes for tarballs or to increase the tarball-ttl option. But I’m not sure it’s a good idea to force users to change their Nix config just for mach-nix.

Therefore my current favorite approach is to keep the hash index but:

  1. replace sha256 hashes with sha1, since they require less memory. Sounds crazy, but don’t forget: we’re not trying to solve any trust issue here, we just want to trick Nix into doing the damn caching.
  2. use base64 or base32 encoding for the hashes instead of hex
  3. optimize the JSON file format to require less memory (unpacked): add a dictionary for Python versions / ABIs / file endings, since that is basically 90% redundant information.
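To put numbers on items 1 and 2, this sketch computes the per-hash string length for each algorithm and encoding, with padding stripped. It uses the RFC 4648 alphabets from Python's `base64` module. Nix's own base32 uses a different alphabet, but a sha256 in Nix base32 is likewise 52 characters.

```python
import base64
import hashlib


def encoded_lengths(algo: str, data: bytes = b"") -> dict:
    """Length in characters of one digest in hex, base32 and base64 (padding stripped)."""
    digest = hashlib.new(algo, data).digest()
    return {
        "hex": len(digest.hex()),
        "base32": len(base64.b32encode(digest).rstrip(b"=")),
        "base64": len(base64.b64encode(digest).rstrip(b"=")),
    }


for algo in ("sha256", "sha1"):
    print(algo, encoded_lengths(algo))
```

Going from sha256 in hex (64 characters) to sha1 in base64 (27 characters) cuts each hash string by more than half. The rest of each index entry is unaffected.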

I guess all optimizations together should reduce the raw size of the index by more than half. Only the raw size counts, since the index must be unpacked in order to be evaluated by Nix.
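Item 3 (a dictionary for repeated Python versions, ABIs and file endings) is classic dictionary encoding. This sketch replaces repeated string values with small integer indices plus one lookup table per field. The record fields are hypothetical, since the post does not show the index's real JSON layout.

```python
import json


def dictionary_encode(records, fields):
    """Replace repeated values in `fields` with indices into per-field tables."""
    tables = {f: [] for f in fields}
    index = {f: {} for f in fields}
    rows = []
    for rec in records:
        row = dict(rec)
        for f in fields:
            value = rec[f]
            if value not in index[f]:
                index[f][value] = len(tables[f])
                tables[f].append(value)
            row[f] = index[f][value]
        rows.append(row)
    return {"tables": tables, "rows": rows}


def dictionary_decode(doc):
    """Inverse of dictionary_encode."""
    tables = doc["tables"]
    return [{k: (tables[k][v] if k in tables else v) for k, v in row.items()}
            for row in doc["rows"]]


# hypothetical index entries with highly redundant fields
records = [{"file": f"pkg-{i}.whl", "pyver": "cp38", "abi": "cp38",
            "platform": "manylinux2014_x86_64"} for i in range(100)]
encoded = dictionary_encode(records, ["pyver", "abi", "platform"])
print(len(json.dumps(records)), len(json.dumps(encoded)))
```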

Let me know in case you have concerns with that approach.

Trick nix into caching

  • Calculate hashes from hashes?
  • In other words: unique name-version.

I haven’t thought it through in detail, but wouldn’t that work?

Released 2.4.0 (20 Sep 2020)

Global conditional overrides, simple overrides for buildPythonPackage, improved metadata extraction, fix wheel selection

TL;DR:
Mach-nix now has a global override system (similar to poetry2nix). Please commit your fixes to ./mach_nix/fixes.nix. The format is designed to be human-readable, conditional, and reusable by other projects.

Features

  • Global conditional overrides: Similar to the overrides from poetry2nix, this allows users to upstream their ‘fixes’ for Python packages. However, a special format is used here, which is optimized for human readability and allows defining a condition for each fix. Fixes are therefore applied on a granular basis depending on each package’s metadata, like its version, Python version, or provider. The format is designed so that it could easily be reused by projects other than mach-nix. Please contribute your fixes to ./mach_nix/fixes.nix
  • Simplified overrides are now also available for buildPythonPackage (underscore argument)
  • Inherit passthru from nixpkgs: Reduces risk of missing attributes like numpy.blas.
  • Allow passing a string to the python argument of mkPython: Values like "python38" are now accepted, in which case pkgs.python38 will be used. The intention is to reduce the risk of accidentally mixing multiple nixpkgs versions.
  • Improved error handling while extracting metadata from python sources in buildPythonPackage.

Fixes

  • Selecting extras when using buildPythonPackage didn’t have any effect
  • The passthru argument for buildPythonPackage was ignored
  • The propagatedBuildInputs argument for buildPythonPackage was ignored
  • Wheels with multiple python versions in their filename like PyQt5-...-cp35.cp36.cp37.cp38-...whl were not selected correctly.
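The wheel-selection bug is a PEP 425 detail: a tag field may hold a “compressed tag set” joined by dots, so comparing the whole field against a single tag fails. A minimal sketch contrasting the naive and the correct check follows. The helper names and the filename are hypothetical.

```python
def naive_supports(wheel_filename: str, tag: str) -> bool:
    """Buggy check: compares the whole python-tag field with one tag."""
    return wheel_filename[: -len(".whl")].split("-")[-3] == tag


def python_tags(wheel_filename: str) -> set:
    """Python tags declared by a wheel; compressed sets are split on '.'."""
    python_tag = wheel_filename[: -len(".whl")].split("-")[-3]
    return set(python_tag.split("."))


def supports(wheel_filename: str, tag: str) -> bool:
    return tag in python_tags(wheel_filename)


# hypothetical filename with a compressed python tag set
fn = "examplepkg-1.0-cp35.cp36.cp37.cp38-abi3-manylinux2014_x86_64.whl"
print(naive_supports(fn, "cp37"), supports(fn, "cp37"))
```

The ABI and platform fields can be compressed the same way and would need the same split.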

Package Fixes:

  • tensorflow: collision related to tensorboard
  • orange3: broken .so file caused by fixupPhase (probably due to shrinking)
  • ldap0: add missing build inputs.

Sorry, some major stuff was broken in 2.4.0. Here is a quick bugfix release:

2.4.1 (21 Sep 2020)

bugfixes

Fixes

  • extra_pkgs was broken: Packages didn’t end up in final environment
  • null value error when inheriting passthru for disabled packages
  • Wrong provider detected for sdist packages in fixes.nix
  • overrides from fixes.nix didn’t apply for buildPythonPackage

Package Fixes

  • pip: allow from sdist provider
  • pip: remove reproducible.patch for versions < 20.0

In case maintainers of other tools would like to use the fixes.nix overrides from mach-nix, the following function can be used to transform the custom format into conventional nixpkgs Python overrides
(a package’s provider is assumed to be “nixpkgs” if the package doesn’t define passthru.provider):

# assumes `pkgs` is in scope
with pkgs.lib;
let
  # fixes in mach-nix's custom format: { <pkg> = { <fix-name> = { _cond = ...; <attr> = ...; }; }; }
  fixes = import ./fixes.nix { inherit pkgs; };

  # evaluate a fix's condition against the metadata of the package being overridden
  meets_cond = oa: condition:
    let
      # packages without passthru.provider are assumed to come from nixpkgs
      provider = oa.passthru.provider or "nixpkgs";
    in
      condition { prov = provider; ver = oa.version; pyver = oa.pythonModule.version; };

  # semantics of `<attr>.add`: append lists, merge attrsets, concatenate strings
  combine = pname: key: val1: val2:
    if isList val2 then val1 ++ val2
    else if isAttrs val2 then val1 // val2
    else if isString val2 then val1 + val2
    else throw "_.${pname}.${key}.add only accepts list or attrs or string.";
in
  # a flat list of python overrides (pySelf: pySuper: ...), one per fix
  flatten (
    mapAttrsToList (pkg: p_fixes:
      mapAttrsToList (fix: keys: pySelf: pySuper:
        # fixes without a _cond always apply
        let cond = keys._cond or ({ prov, ver, pyver }: true); in
        if ! hasAttr pkg pySuper then {} else
        {
          "${pkg}" = pySuper."${pkg}".overrideAttrs (oa:
            mapAttrs (key: val:
              trace "\napplying fix '${fix}' for ${pkg}:${oa.version}\n" (
                if isAttrs val && hasAttr "add" val then
                  combine pkg key oa."${key}" val.add     # <attr>.add = ...
                else if isAttrs val && hasAttr "mod" val && isFunction val.mod then
                  val.mod oa."${key}"                     # <attr>.mod = old: ...
                else
                  val                                     # plain replacement
              )
            ) (filterAttrs (k: v: k != "_cond" && meets_cond oa cond) keys)
          );
        }
      ) p_fixes
    ) fixes
  )
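For readers less fluent in Nix, this Python model mirrors how the function above decides which fixes apply. `_cond` receives `prov`, `ver` and `pyver`, a missing provider defaults to "nixpkgs", and a fix without `_cond` always applies. The package record and the pip-style fix are hypothetical, and `applicable_fixes` is not part of mach-nix.

```python
from typing import Any, Callable, Dict


def meets_cond(pkg: Dict[str, Any], cond: Callable[..., bool]) -> bool:
    # packages without an explicit provider are treated as coming from nixpkgs
    provider = pkg.get("passthru", {}).get("provider", "nixpkgs")
    return cond(prov=provider, ver=pkg["version"], pyver=pkg["pyver"])


def applicable_fixes(pkg: Dict[str, Any], pkg_fixes: Dict[str, Dict[str, Any]]):
    """Yield (fix_name, attrs_to_override) for every fix whose _cond holds (hypothetical helper)."""
    for fix_name, keys in pkg_fixes.items():
        cond = keys.get("_cond", lambda prov, ver, pyver: True)
        if meets_cond(pkg, cond):
            yield fix_name, {k: v for k, v in keys.items() if k != "_cond"}


def vtuple(v: str) -> tuple:
    """Naive numeric version parsing, sufficient for this example only."""
    return tuple(int(x) for x in v.split("."))


# hypothetical fix, loosely modelled on the pip entry in the 2.4.1 notes
pip_fixes = {
    "remove-reproducible-patch": {
        "_cond": lambda prov, ver, pyver: vtuple(ver) < (20, 0),
        "patches": [],
    }
}
print(list(applicable_fixes({"version": "19.3.1", "pyver": "3.8.5"}, pip_fixes)))
```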

Hmm, I’m currently trying to use mach-nix to install Plone and its dependencies, but it has been running for 50 minutes and is currently consuming 50.8 GB of resident memory… It printed the dependency tree, but nothing since then… Is there any way to find out more about what it is doing?

aargh… it just got killed by the OOM killer!
I know that the Plone dependency tree is huge, but I really hoped that, given enough memory and time, it could complete the env task.

That doesn’t look too good ;) I opened an issue for this: “Building Plone runs out of memory” (DavHau/mach-nix issue #158 on GitHub).
It would be great if you could comment on the issue and post the exact nix expression or command you used for that build. I wasn’t able to reproduce it.

Released 3.0.0 (14 Oct 2020)

flakes pypi gateway, R support, new output formats, more packages for python 3.5/3.6, improved providers nixpkgs/wheel

IMPORTANT NOTICE

The UI has been reworked. It is backward compatible with a few exceptions. Most importantly, when importing mach-nix, an attribute set must be passed. It can be empty. Example:

let
  mach-nix = import (builtins.fetchGit {
    url = "https://github.com/DavHau/mach-nix/";
    ref = "refs/tags/3.0.0";
  }) {
    # optionally bring your own nixpkgs
    # pkgs = import <nixpkgs> {};

    # or specify the python version
    # python = "python38";
  };
in
...

Features

  • Flakes gateway to pypi. Get a nix shell with arbitrary python packages. Example:

    nix develop github:davhau/mach-nix#shellWith.requests.tensorflow.aiohttp

  • or a docker image
    nix build github:davhau/mach-nix#dockerImageWith.package1.package2 ...

  • or a python derivation
    nix build github:davhau/mach-nix#with.package1.package2 ...

  • New output formats:

    • mkDockerImage → produces layered docker image containing a python environment
    • mkNixpkgs → returns nixpkgs that conform to the given requirements
    • mkOverlay → returns an overlay function to make nixpkgs conform to the given requirements
    • mkPythonOverrides → produces pythonOverrides to make python conform to the given requirements.
  • New functions fetchPypiSdist and fetchPypiWheel. Example:

    mach-nix.buildPythonPackage {
      src = mach-nix.fetchPypiSdist "requests" "2.24.0";
    }
    
  • When using the mach-nix cmdline tool, the nixpkgs channel can now be picked via:

    mach-nix env ./env -r requirements.txt --nixpkgs nixos-20.09
    
  • R support (experimental): R packages can be passed via packagesExtra. Mach-nix will set up rpy2 accordingly. See the usage example.

  • Non-python packages can be passed via packagesExtra to include them into the environment.

Improvements

  • rework the logic for inheriting dependencies from nixpkgs
  • fixes.nix: allow alternative mod function signature with more arguments:
    key-to-override.mod = pySelf: oldAttrs: oldVal: ...;
  • allow derivations passed as src argument to buildPythonPackage
  • stop inheriting attribute names from nixpkgs, instead use normalized package names
  • rework the public API of mach-nix (largely backward compatible)
  • add an example of how to build an aarch64 image containing a mach-nix env
  • tests are now enabled/disabled via global override which is more reliable
  • raise an error if the Python version of any package in packagesExtra doesn’t match that of the environment

Fixes

  • nixpkgs packages with identical versions were swallowed
  • pname/version null in buildPythonPackage
  • update dependency extractor to use “LANG=C.utf8” (increases available packages for python 3.5 and 3.6)
  • wheel provider picked wheels incompatible with the Python version
  • unwanted python buildInput inheritance when overriding nixpkgs
  • properly parse setup_requires/install_requires if they are strings instead of lists
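setuptools accepts `install_requires` either as a list or as a single newline-separated string, so an extractor has to handle both. This is a minimal sketch of that normalization. It is modelled on how `pkg_resources` skips blank and `#` lines and strips ` #` inline comments. It is not mach-nix's extractor code, and `normalize_requires` is a hypothetical name.

```python
from typing import List, Optional, Sequence, Union


def normalize_requires(value: Optional[Union[str, Sequence[str]]]) -> List[str]:
    """Turn a setup_requires/install_requires value into a list of requirement strings."""
    if value is None:
        return []
    if isinstance(value, str):
        value = value.splitlines()  # string form: one requirement per line
    reqs = []
    for line in value:
        line = line.strip()
        if " #" in line:  # drop inline comments
            line = line[: line.find(" #")].strip()
        if line and not line.startswith("#"):  # skip blanks and comment lines
            reqs.append(line)
    return reqs


print(normalize_requires("requests>=2.0\n  # comment\nidna  # why\n"))
print(normalize_requires(["six", "attrs>=19.1"]))
```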

Package Fixes

  • rpy2: sdist: remove conflicting patch for versions newer than 3.2.6
  • pytorch from nixpkgs was not detected as torch
  • pyqt5: fix for providers nixpkgs and wheel
  • httpx: remove patches

@DavHau my requirements include some packages like h5py and theano. I’m using mach-nix 3.0.0, and when I activate my shell.nix, it shows this message:

Multiple nixkgs attributes found for h5py-2.10.0: ['h5py', 'h5py-mpi']
Picking 'h5py' as base attribute name.
Multiple nixkgs attributes found for python-dateutil-2.8.1: ['dateutil', 'python-dateutil']
Picking 'dateutil' as base attribute name.
Multiple nixkgs attributes found for theano-1.0.5: ['Theano', 'TheanoWithCuda', 'TheanoWithoutCuda']
Picking 'Theano' as base attribute name.

It says there is more than one package for that requirement. How can I force mach-nix to pick TheanoWithCuda, for instance?

Good question!
First of all, if you use the wheel dependency provider (which is the default), those attribute names have no effect at all.

Only if you use the sdist or nixpkgs provider will mach-nix inherit the capabilities of the nixpkgs package.

Currently there is no support for selecting those variants.
It automatically prioritizes the nixpkgs candidate with the version most similar to the selected one. If multiple nixpkgs candidates have the same version, the one with the shortest attribute name will be picked.
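A sketch of that selection rule follows. The post does not define “most similar version”, so `version_similarity` is a hypothetical stand-in that counts shared leading version components. Ties go to the shortest attribute name, with an alphabetical tie-break added for determinism. This reproduces the choices in the log above, but it is not mach-nix's code.

```python
from typing import List, Tuple


def version_similarity(a: str, b: str) -> int:
    """Hypothetical metric: number of leading version components two versions share."""
    shared = 0
    for x, y in zip(a.split("."), b.split(".")):
        if x != y:
            break
        shared += 1
    return shared


def pick_candidate(selected_version: str, candidates: List[Tuple[str, str]]) -> str:
    """Pick a nixpkgs attribute name from (attribute name, version) pairs.

    Most similar version wins; ties go to the shortest name, then alphabetical order.
    """
    attr, _ = min(
        candidates,
        key=lambda c: (-version_similarity(c[1], selected_version), len(c[0]), c[0]),
    )
    return attr


print(pick_candidate("1.0.5", [("Theano", "1.0.5"), ("TheanoWithCuda", "1.0.5"),
                               ("TheanoWithoutCuda", "1.0.5")]))
```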

Manually selecting these variants sounds like a good idea. I will try to add this feature soon.

In the meantime you could, for example, include an override via overridesPre that removes the normal Theano from the attribute set, or marks it as broken or disabled. Or set its source/pname/version to some horrendous value so mach-nix has no chance of identifying the package properly.

Of course, if you do that, you will probably also break TheanoWithCuda, since it inherits from Theano. Therefore you’d need to restore the messed-up values with another override for TheanoWithCuda.


Released 3.0.1 (21 Oct 2020)

bugfixes, return missing packages

Fixes

  • Some sdist packages were missing from the dependency DB due to a corrupt index in the SQL DB used by the crawler.
  • When automatically fixing circular deps, removed deps could trigger a No matching distribution found error in higher level parent packages. Now --no-dependencies is set recursively for all parents of removed deps.
  • Mapping out the resulting dependency DAG to a tree for printing could exhaust the system’s resources due to the complexity. Now, when printing dependencies, sub-trees are trimmed and marked via (…) if they have already been printed earlier.
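Two of these fixes are small graph algorithms. The first is printing a DAG as a tree while trimming sub-trees that were already expanded. Each node is then expanded at most once, so the output stays linear in the number of edges instead of blowing up with shared dependencies. The second is collecting every ancestor of the removed deps, so `--no-dependencies` can be set on all of them. A sketch of both on a plain adjacency dict follows. The function names and the example graph are hypothetical. mach-nix itself now uses networkx for such problems, which offers `networkx.ancestors`.

```python
from typing import Dict, List, Set


def print_tree(graph: Dict[str, List[str]], root: str) -> List[str]:
    """Render a dependency DAG as indented lines, trimming repeated sub-trees with '(…)'."""
    lines: List[str] = []
    expanded: Set[str] = set()

    def visit(node: str, depth: int) -> None:
        indent = "  " * depth
        children = graph.get(node, [])
        if node in expanded and children:
            lines.append(f"{indent}{node} (…)")  # already printed above
            return
        lines.append(f"{indent}{node}")
        expanded.add(node)
        for child in children:
            visit(child, depth + 1)

    visit(root, 0)
    return lines


def ancestors(graph: Dict[str, List[str]], nodes: Set[str]) -> Set[str]:
    """All packages that (transitively) depend on any of `nodes`."""
    parents: Dict[str, Set[str]] = {}
    for pkg, deps in graph.items():
        for dep in deps:
            parents.setdefault(dep, set()).add(pkg)
    result: Set[str] = set()
    stack = list(nodes)
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


graph = {"app": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["d"], "d": []}
print("\n".join(print_tree(graph, "app")))
print(sorted(ancestors(graph, {"c"})))
```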

Improvements

  • optimized autoPatchelfHook for faster processing of large wheel packages (see upstream PR)
  • networkx is now used for dealing with some graph related problems

Released 3.0.2 (27 Oct 2020)

bugfixes

Fixes

  • fixed “\u characters in JSON strings are currently not supported” error, triggered by some packages using unicode characters in their file names
  • mach-nix cmdline tool didn’t use specified python version
  • wheel provider was broken for MacOS resulting in 0 available packages
  • several issues triggering infinite recursions

I’m excited to announce that conda support is just around the corner. It will probably be released in the form of a beta version very soon.
