Another good way to package Python packages from PyPI

Hi guys, I’ve been working on this for a while now,
would love to hear your thoughts

Some of us know that Python packaging is hard:

For packaging Python we currently have some options:

There is a third option: it helps you create a Python environment with dependencies from https://pypi.org/

It works like this:

  1. You start with a list of requirements you want to install:

    # /path/to/requirements.yaml
    aioextensions: "*"
    Django: ">3.2"
    
  2. You use Makes to generate all the information required to package the dependencies with Nix.

    The process is automatic, just execute:

    $ m github:fluidattacks/makes@21.09 /utils/makePythonPypiEnvironmentSources \
      "${python_version}" \
      /path/to/requirements.yaml \
      /path/to/sources.yaml  # This file will be generated
    

    The generated file is a bunch of links and hashes:

    $ cat /path/to/sources.yaml
          
       links:
         - name: Django-3.2.6-py3-none-any.whl
           sha256: 04qzllkmyl0g2fgdab55r7hv3vqswfdv32p77cgjj3ma54sl34kz
           url: https://pypi.org/packages/py3/D/Django/Django-3.2.6-py3-none-any.whl
       ...
    
  3. Then use in your project:

    # /path/to/default.nix
    let makes = import "${builtins.fetchGit {
      url = "https://github.com/fluidattacks/makes";
      rev = "7a2b256168a7a5b58cf79d9383e5b463dae9c5e5";
    }}/src/args/agnostic.nix";
    
    in makes.makePythonPypiEnvironment {
      name = "example";
      sourcesYaml = /path/to/sources.yaml;
    }
    
  4. nix-build

  5. Source the output, and your shell now has the specified packages available!

Good things:

  • works on Linux and macOS
  • every dependency installer is fetched and cached separately (fixed-output derivations)
  • it’s secure against supply-chain attacks (hashes everywhere)
  • but you never compute hashes manually
  • works with --option sandbox true
  • you can specify any dependency version of the packages you like
  • you can use packages that are not yet on nixpkgs
  • the generator script checks for dependency conflicts
  • the environment is fully pinned and stable, even if you start with lax selectors like pkg==* or pkg>=1.0
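
To make the last bullet concrete, here is a minimal Python sketch of how lax selectors can be turned into exact pins. It is not the Makes generator itself: the `pin` helper, the tiny selector grammar (only `*`, `==`, `>=`, `<=`, `>`, `<` on plain numeric versions) and the `available` version lists (made-up index data) are assumptions for illustration only.

```python
import operator
import re

OPERATORS = {
    "==": operator.eq,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_version(text: str) -> tuple:
    """Turn "3.2.6" into (3, 2, 6); trailing zeros are dropped so 3.2.0 == 3.2."""
    parts = [int(part) for part in text.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def satisfies(version: str, selector: str) -> bool:
    """Check one version against one selector such as "*" or ">3.2"."""
    if selector.strip() == "*":
        return True
    match = re.fullmatch(r"\s*(==|>=|<=|>|<)\s*(\d+(?:\.\d+)*)\s*", selector)
    if match is None:
        raise ValueError(f"unsupported selector: {selector!r}")
    op, wanted = match.groups()
    return OPERATORS[op](parse_version(version), parse_version(wanted))


def pin(requirements: dict, available: dict) -> dict:
    """Pick the newest available version that satisfies each selector."""
    pinned = {}
    for name, selector in requirements.items():
        candidates = [v for v in available[name] if satisfies(v, selector)]
        if not candidates:
            raise LookupError(f"no version of {name} matches {selector!r}")
        pinned[name] = max(candidates, key=parse_version)
    return pinned


# Same shape as requirements.yaml above; the version lists are hypothetical.
requirements = {"aioextensions": "*", "Django": ">3.2"}
available = {
    "aioextensions": ["1.0.0", "1.1.0"],
    "Django": ["3.1.13", "3.2", "3.2.6"],
}
print(pin(requirements, available))
# {'aioextensions': '1.1.0', 'Django': '3.2.6'}
```

Once the exact versions are written down together with their hashes, re-running the build later cannot silently pick up a newer release, which is the point of pinning.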

So that’s it, we are using this in production at app.fluidattacks.com (see an example here)

Would be nice to hear your thoughts! bye

There is another option: mach-nix (GitHub - DavHau/mach-nix: Create highly reproducible python environments), which seems to be pretty good (I haven’t had to use it personally; I left a Python job before I was aware of it).

Also, you can use nix-template to ease some of the one-off packages:

$ nix-template python -u https://pypi.org/project/libagent/ --stdout
Determining latest release for libagent
{ lib, buildPythonPackage, fetchPypi }:

buildPythonPackage rec {
  pname = "libagent";
  version = "0.14.2";

  src = fetchPypi {
    inherit pname version;
    sha256 = "62aae671df342923475323cf0677bfcef796cc48e6989039a20f29c8e4a9e5b6";
  };

  propagatedBuildInputs = [ ];

  pythonImportsCheck = [ "libagent" ];

  meta = with lib; {
    description = "Using hardware wallets as SSH/GPG agent";
    homepage = "http://github.com/romanz/trezor-agent";
    license = licenses.CHANGE;
    maintainers = with maintainers; [ jonringer ];
  };
}

Mach-nix is working very well for me. I tried poetry2nix too and it worked for me but, unfortunately, it’s not using wheels anymore even with preferWheels = true.

@kamadorueda I needed to make a modification to get nix-build to work:

let
  makes = import "${builtins.fetchGit {
    url = "https://github.com/fluidattacks/makes";
    ref = "refs/tags/21.09";
  }}/src/args/agnostic.nix" {};
in
makes.makePythonPypiEnvironment {
  name = "example";
  sourcesYaml = ./sources.yaml;
}

But with this requirements.yaml:

Cython: "*"
matplotlib: ">=3.2.2"
numpy: ">=1.18.5"
opencv-python: ">=4.1.2"
Pillow: "*"
PyYAML: ">=5.3.1"
scipy: ">=1.4.1"
tensorboard: ">=1.5"
torch: "==1.7.0"
torchvision: "==0.8.1"
tqdm: ">=4.41.0"
seaborn: ">=0.11.0"
pandas: "*"
thop: "*"
pycocotools: "==2.0"

And creating sources.yaml this way:

m github:fluidattacks/makes@21.09 /utils/makePythonPypiEnvironmentSources "3.8" $PWD/requirements.yaml $PWD/sources.yaml

I’m having this problem:

❯ nix-build                                                                                                          
error: hash mismatch in file downloaded from 'https://files.pythonhosted.org/packages/1f/bb/5d3246097ab77fa083a61bd8d3d527b7ae063c7d8e8671b1cf8c4ec10cbe/colorama-0.4.4.tar.gz':
         specified: sha256:16w62sm95hmh55rqxn4zwdz0bkh3fqm1qnz9cwi3s510iasb4har
         got:       sha256:05kc902fcqc4xpzj9ph08ia52dzyc9rpdnn855syy7i3fc4fdxc3
(use '--show-trace' to show detailed location information)

I just checked this by downloading the file outside of Nix and computing the hash.
The specified one (16w62...) is correct.
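
For anyone who wants to repeat that kind of check, here is a minimal Python sketch that hashes a downloaded file and prints the digest in the base32 form that appears in the error message. The encoding follows my understanding of how Nix prints base32 hashes; the function names are made up for this example, and `nix-hash --flat --base32 --type sha256 <file>` remains the authoritative tool.

```python
import hashlib

# Nix's base32 alphabet (it omits the letters e, o, t and u).
NIX_BASE32_ALPHABET = "0123456789abcdfghijklmnpqrsvwxyz"


def nix_base32(digest: bytes) -> str:
    """Encode a raw digest in Nix-style base32 (assumed algorithm, see above)."""
    length = (len(digest) * 8 - 1) // 5 + 1
    chars = []
    # Emit 5-bit groups, starting with the group that holds the highest bits.
    for n in range(length - 1, -1, -1):
        byte_index, bit_offset = divmod(n * 5, 8)
        group = digest[byte_index] >> bit_offset
        if byte_index < len(digest) - 1:
            # The group may spill into the next byte.
            group |= digest[byte_index + 1] << (8 - bit_offset)
        chars.append(NIX_BASE32_ALPHABET[group & 0x1F])
    return "".join(chars)


def nix_sha256_of_file(path: str) -> str:
    """Hash a file in 1 MiB chunks and return the Nix-style base32 sha256."""
    sha = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            sha.update(chunk)
    return nix_base32(sha.digest())
```

Calling `nix_sha256_of_file("colorama-0.4.4.tar.gz")` on a file downloaded with a browser or curl gives a 52-character string that can be compared with the "specified" and "got" values in the error.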

This is how I was able to package your dependencies. I added some flags to nix-build to reduce the tarball cache TTL, so hopefully your hash mismatch goes away. On my machine I don’t get the hash mismatch.

Full GitHub Gist here

Let me know if it works for you

Thanks for trying the tool and the feedback!


Thanks @kamadorueda, but now I’m having this problem:

❯ nix-build --show-trace --option tarball-ttl 1 --option narinfo-cache-negative-ttl 1 --option narinfo-cache-positive-ttl 1 environment.nix
fatal: couldn't find remote ref refs/heads/master
error: program 'git' failed with exit code 128

       … while fetching the input 'git+https://github.com/fluidattacks/makes?rev=801523692d3e09c3f95884ad004ad5786c4f3368'

I’m using NixOS-Unstable with flakes activated.

that one is sad, yeah
for some reason builtins.fetchGit uses git from the OS (it’s not self-contained)

this is more portable:

  makesSrc = nixpkgs.fetchzip {
    url = "https://github.com/fluidattacks/makes/archive/801523692d3e09c3f95884ad004ad5786c4f3368.tar.gz";
    sha256 = "0xflpvwpz8l67wzlvm5xz6vp8gbbcbkgwpi8q8z4mbmr1wzp0kh6";
  };

Probably the thing you are trying to use a wheel for doesn’t have a wheel available? Probably due to using a newer version of Python.

As the option name implies, it’s about preferring a wheel, but if a compatible one can’t be found we’ll still build from source.

Our wheel tests are still passing so I suspect it’s just a misunderstanding of what the option does.

Poetry2nix is way more focused on 100% correctness than most other tooling in the space is.

Hi @adisbladis with this pyproject.toml:

[tool.poetry]
name = "inv_stats"
version = "0.1.0"
description = ""
authors = ["brogos <brogos@gmail.com>"]

[tool.poetry.dependencies]
python = "^3.8"
duckduckpy = "^0.2"
python-whois = "^0.7.3"
lxml = "^4.6.3"
requests = "^2.25.1"
geoip2 = "^4.1.0"
pandas = "^1.2.4"


[tool.poetry.dev-dependencies]
ipython = "^7.23.1"

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

and shell.nix:

{ pkgs ? import <nixpkgs> {} }:
let
  myAppEnv = pkgs.poetry2nix.mkPoetryEnv {
    projectDir = ./.;
    preferWheels = true;
  };
in myAppEnv.env

It compiles numpy, cython and pandas.

I’m having that problem with hash mismatch again:

❯ nix-build --option tarball-ttl 1 --option narinfo-cache-negative-ttl 1 --option narinfo-cache-positive-ttl 1 environment.nix 
error: hash mismatch in file downloaded from 'https://files.pythonhosted.org/packages/ec/30/8707699ea6e1c1cbe79c37e91f5b06a6266de24f699a5e19b8c0a63c4b65/Cython-0.29.24-py2.py3-none-any.whl':
         specified: sha256:11c3fwfhaby3xpd24rdlwjdp1y1ahz9arai3754awp0b2bq12r7r
         got:       sha256:18c7r4nb3j8ymcrylf6hg0nlsg7a4ybckwm644ksb597gw8mrfpn
(use '--show-trace' to show detailed location information)

yeah, with that pyproject.toml poetry2nix should pick one of these wheels:

pandas-1.3.2-cp38-cp38-macosx_10_9_x86_64.whl
pandas-1.3.2-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
pandas-1.3.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
pandas-1.3.2-cp38-cp38-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl

maybe poetry2nix only picks generic wheels? (xxx-py3-none-any.whl)
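
To show what "picking a wheel" involves, here is a rough Python sketch that parses wheel filenames and selects the first one compatible with a target interpreter, returning nothing when the caller would have to build from source. This is not poetry2nix's implementation: the helper names and the `CP38_LINUX_X86_64` tag sets are assumptions, and matching three independent sets is a simplification (pip matches an ordered list of python/abi/platform triples).

```python
from typing import Iterable, NamedTuple, Optional


class WheelTags(NamedTuple):
    name: str
    version: str
    pythons: frozenset
    abis: frozenset
    platforms: frozenset


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split name-version[-build]-python-abi-platform.whl into its tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):  # 6 when an optional build tag is present
        raise ValueError(f"unexpected wheel filename: {filename!r}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    # Each tag may be a dot-separated "compressed" set of alternatives.
    return WheelTags(
        name=parts[0],
        version=parts[1],
        pythons=frozenset(python_tag.split(".")),
        abis=frozenset(abi_tag.split(".")),
        platforms=frozenset(platform_tag.split(".")),
    )


def is_compatible(wheel: WheelTags, target: dict) -> bool:
    """True when the wheel shares at least one tag of each kind with the target."""
    return bool(
        wheel.pythons & target["pythons"]
        and wheel.abis & target["abis"]
        and wheel.platforms & target["platforms"]
    )


def choose_wheel(filenames: Iterable[str], target: dict) -> Optional[str]:
    """Return the first compatible wheel, or None (meaning: build from source)."""
    for filename in filenames:
        if filename.endswith(".whl") and is_compatible(
            parse_wheel_filename(filename), target
        ):
            return filename
    return None


# Assumed, simplified tag sets for CPython 3.8 on x86_64 Linux.
CP38_LINUX_X86_64 = {
    "pythons": frozenset({"cp38", "py38", "py3"}),
    "abis": frozenset({"cp38", "abi3", "none"}),
    "platforms": frozenset(
        {"manylinux_2_17_x86_64", "manylinux2014_x86_64", "linux_x86_64", "any"}
    ),
}

PANDAS_WHEELS = [
    "pandas-1.3.2-cp38-cp38-macosx_10_9_x86_64.whl",
    "pandas-1.3.2-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
    "pandas-1.3.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
    "pandas-1.3.2-cp38-cp38-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl",
]
print(choose_wheel(PANDAS_WHEELS, CP38_LINUX_X86_64))
# pandas-1.3.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
```

Under these assumptions both a platform-specific wheel like the pandas one and a generic `py3-none-any` wheel count as compatible, so a tool that only ever selects the generic kind would end up compiling pandas from source.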


I’m gonna check the hash mismatches, 11c3f... (specified) is the correct one this time, too
I’ll let you know what I find

I’m not able to reproduce the bug on my machine, the CI/CD Linux/macOS machines, or other devs’ machines

Reading around the following may help:

  1. rm -rf ~/.cache/nix
  2. nix-build --option tarball-ttl 1
  3. builtins.fetchurl -> nixpkgs.fetchurl

for (3) I made a small modification to the framework:

  makesSrc = nixpkgs.fetchzip {
    url = "https://github.com/fluidattacks/makes/archive/1f535fdedafce35a339ae0ac8baffb8ba3c689db.tar.gz";
    sha256 = "0f88sxrvbzl75kvm8d3xsii96cs9vaia037vpwaz1xqhkscy1snf";
  };

let me know if it worked

even if it works on our machines today I don’t want this bug to appear later,
so I really want to find a solution that works for all of us including you

thanks!

Thanks @kamadorueda! It created the new env. But there are two other problems:

  1. It did not install numpy, see:
$ python
Python 3.8.11 (default, Jun 28 2021, 10:57:31) 
[GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'numpy'

Numpy is not listed in site-packages:

absl                                future                                     matplotlib                        pyasn1-0.4.8.dist-info             rsa                                      torch
absl_py-0.13.0.dist-info            future-0.18.2.dist-info                    matplotlib-3.4.3.dist-info        pyasn1_modules                     rsa-4.7.2.dist-info                      torch-1.7.0.dist-info
cachetools                          google                                     matplotlib-3.4.3-py3.8-nspkg.pth  pyasn1_modules-0.2.8.dist-info     scipy                                    torchvision
cachetools-4.2.2.dist-info          google_auth-1.35.0.dist-info               mpl_toolkits                      __pycache__                        scipy-1.7.1.dist-info                    torchvision-0.8.1.dist-info
caffe2                              google_auth-1.35.0-py3.9-nspkg.pth         oauthlib                          pycocotools                        scipy.libs                               torchvision.libs
certifi                             google_auth_oauthlib                       oauthlib-3.1.1.dist-info          pycocotools-2.0.0.dist-info        seaborn                                  tqdm
certifi-2021.5.30.dist-info         google_auth_oauthlib-0.4.5.dist-info       opencv_python-4.5.3.56.dist-info  pylab.py                           seaborn-0.11.2.dist-info                 tqdm-4.62.1.dist-info
charset_normalizer                  grpc                                       opencv_python.libs                pyparsing-2.4.7.dist-info          six-1.16.0.dist-info                     typing_extensions-3.10.0.0.dist-info
charset_normalizer-2.0.4.dist-info  grpcio-1.39.0.dist-info                    pandas                            pyparsing.py                       six.py                                   typing_extensions.py
colorama                            idna                                       pandas-1.3.2.dist-info            python_dateutil-2.8.2.dist-info    tensorboard                              urllib3
colorama-0.4.4.dist-info            idna-3.2.dist-info                         past                              pytz                               tensorboard-2.6.0.dist-info              urllib3-1.26.6.dist-info
cv2                                 kiwisolver-1.3.1.dist-info                 PIL                               pytz-2021.1.dist-info              tensorboard_data_server                  werkzeug
cycler-0.10.0.dist-info             kiwisolver.cpython-38-x86_64-linux-gnu.so  Pillow-8.3.1.dist-info            PyYAML-5.4.1.dist-info             tensorboard_data_server-0.6.1.dist-info  Werkzeug-2.0.1.dist-info
cycler.py                           libfuturize                                Pillow.libs                       requests                           tensorboard_plugin_wit                   _yaml
dataclasses-0.6.dist-info           libpasteurize                              protobuf-3.17.3.dist-info         requests-2.26.0.dist-info          tensorboard_plugin_wit-1.8.0.dist-info   yaml
dataclasses.py                      markdown                                   protobuf-3.17.3-py3.8-nspkg.pth   requests_oauthlib                  thop
dateutil                            Markdown-3.3.4.dist-info                   pyasn1                            requests_oauthlib-1.3.0.dist-info  thop-0.0.31.post2005241907.dist-info
  2. pytorch needs libstdc++.so.6. I tried to add nixpkgs.stdenv.cc.cc.lib to searchPaths.bin but it did not work.

Awesome!

I made a modification so numpy and libstdc++.so.6 are propagated to the final environment

I can now import torch and numpy, can you?:

let
  makesSrc = nixpkgs.fetchzip {
    url = "https://github.com/fluidattacks/makes/archive/a2271af3b65e817d66b4e2e9a766ad2c3a0c6d49.tar.gz";
    sha256 = "04zbjiv42p444xhfpvzqmzymchwzcrnpd9svhnvkrgzwwvykinqs";
  };
  makes = import "${makesSrc}/src/args/agnostic.nix" { };
  nixpkgs = import <nixpkgs> { };
in
makes.makePythonPypiEnvironment {
  name = "example";
  searchPaths = {
    bin = [ nixpkgs.gcc ];
    rpath = [ nixpkgs.gcc.cc.lib ];
  };
  sourcesYaml = ./sources.yaml;
  withCython_0_29_24 = true;
  withNumpy_1_21_2 = true;
  withWheel_0_37_0 = true;
}

Thanks for the bug report and helping me improve the thing!

Thanks @kamadorueda now it’s working!

@adisbladis I submitted a bug report to poetry2nix: Poetry2nix not using wheels even with `preferWheels = true` · Issue #362 · nix-community/poetry2nix · GitHub. I tested with Python 3.7, 3.8 and 3.9, and Cython and NumPy have wheels.

BTW, as we have so many great tools for managing Python packages, could we make top-level/python-packages.nix smaller?
Most of the packages there are actually dependencies of tensorflow, ceph, searx, calibre, … which could be vendored.

Currently, it is difficult to upgrade tensorflow without causing a mass rebuild, because it requires a newer pytest or even wheel than the one in top-level/python-packages.nix.
The packages attrset already has an override parameter, which big Python apps like tensorflow could use to upgrade and add packages. This way, top-level/python-packages.nix could be made smaller and cleaner by turning python+deps in tensorflow, ceph, calibre … into environments with a requirements file and pinned versions in a lock file, keeping only the most essential packages in top-level/python-packages.nix.

@volth https://github.com/kamadorueda/nixpkgs-python :smiley:
