RFC on setup.nix Python development tool

Hi! My team has been using Nix for Python development for a few years already, but I have kind of failed to find and connect with this greater Nix user community for feedback and ideas…

My use case for Nix in brief is

  • reproducible development environments for developers to develop
  • reproducible CI environment for CI to test, build and release
  • Docker-based deployment with Nix-built docker images (no Nix on servers yet)

The main challenge from the very beginning has been, of course, Python packaging. While Python support in nixpkgs has been getting better and better, the main issue remains: Python in Nixpkgs is designed for Python users. When building software with Python, we need a layer on top of Nixpkgs to pin specific versions and add missing or private packages into the mix (and I’m fine with this).

I started by writing everything by hand on top of Nixpkgs and building Docker images with a custom approach. Then came the various Python-to-Nix generators and Nixpkgs dockerTools.

The last time I looked into Python-to-Nix generators, pypi2nix seemed the most popular, but it also seemed to duplicate work by defining packages independently of nixpkgs. I really wanted to be able to reuse nixpkgs’ well-maintained rules for most of the Python packages.

Finally, from NixCon 2017 reports I learned about pip2nix, which did exactly what I wanted: it resolved Python packages from any pip-supported sources, resulting in an overlay on top of nixpkgs pythonPackages.

Then I wanted to abstract as much of the boilerplate as possible into a centrally managed, versioned location, resulting in setup.nix (datakurre/setup.nix on GitHub, maintained by @datakurre): Nixpkgs-based build tools for declarative Python packages.

Here’s the workflow I’ve been designing setup.nix for:

  • each project has a requirements.txt defining ALL development, build, test and runtime requirements of a project (or environment)

  • most projects have a setup.cfg to define the project as a Python package with only real build, test and runtime requirements and console scripts (here setup.py is just a wrapper for executing setuptools to read package configuration from setup.cfg)

  • pip2nix is used to turn requirements.txt into requirements.nix from PyPI or from a private repository

  • finally, every project has a setup.nix file, which usually just calls setup.nix (datakurre/setup.nix on GitHub) with the current pkgs, pythonPackages, src and possible overrides (to be applied on top of requirements.nix).
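
As a rough illustration of that last step, a per-project setup.nix file might look like the sketch below. The argument names (pkgs, pythonPackages, src, overrides) follow the description above; fetching the master branch as a tarball and the shape of the overrides function are assumptions, so check the setup.nix README for the exact interface and pin a specific revision in real use.

```nix
# setup.nix (per-project file): a sketch, not the authoritative interface.
{ pkgs ? import <nixpkgs> { }
, pythonPackages ? pkgs.python3Packages
  # Assumption: the shared setup.nix function is imported from a GitHub tarball.
, setup ? import (builtins.fetchTarball
    "https://github.com/datakurre/setup.nix/archive/master.tar.gz")
}:

setup {
  inherit pkgs pythonPackages;
  src = ./.;
  # Assumption: overrides is an overlay-style function applied on top of
  # the pip2nix-generated requirements.nix.
  overrides = self: super: { };
}
```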

So far so good. Because naming is hard, I named my wrapper setup.nix after setup.py and followed that path to make it provide targets similar to some setup.py commands (develop, bdist_wheel), but also some extra commands. My usual use cases are:

  • nix-shell setup.nix -A develop to give me a nix-shell with all packages in requirements.nix and the project’s Python package in development mode (reusing the nixpkgs Python package development shell)

  • nix-build setup.nix -A env to build a Python environment with everything from requirements.nix so that I can configure that as the project interpreter in PyCharm

  • nix-build setup.nix -A build to build the evaluated version of the project

  • nix-build setup.nix -A bdist_wheel to build a Python wheel of the project

  • nix-build setup.nix -A bdist_docker to build a dockerTools based image from the project

Then, time adds complexity.

The raw pip2nix requirements.nix overlay simply replaces nixpkgs packages with new ones, but setup.nix reuses the existing nixpkgs definitions when available, updating their name and src with pip2nix-generated data.

When a package is not yet defined in nixpkgs, setup.nix cannot guess the build dependencies for it, and I still end up adding a lot of trivial (native) build dependencies such as pytest-runner, setuptools-scm, unzip, etc. for a lot of packages.
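
To make the kind of manual fix concrete, here is a minimal sketch of such an override. The package name somepackage is hypothetical and assumed to come from the generated requirements.nix; the attribute names setuptools-scm and pytest-runner vary between nixpkgs releases, so treat them as assumptions too.

```nix
# Overlay-style overrides (self: super:), e.g. passed as "overrides".
self: super: {
  # Hypothetical package that pip2nix resolved but nixpkgs does not define.
  somepackage = super.somepackage.overridePythonAttrs (old: {
    # Append the build-time tools that could not be guessed automatically.
    nativeBuildInputs = (old.nativeBuildInputs or [ ]) ++ [
      self.setuptools-scm
      self.pytest-runner
    ];
  });
}
```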

Finally, I have a use case with upstream packages with circular dependencies, which I handle by building packages without dependencies at all - assuming they are only used together anyway.
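
A minimal sketch of that workaround, assuming two hypothetical packages package-a and package-b that require each other upstream:

```nix
self: super: {
  package-a = super.package-a.overridePythonAttrs (old: {
    # Drop all declared Python dependencies, which breaks the cycle.
    propagatedBuildInputs = [ ];
    # Tests would need the dependencies that were just removed.
    doCheck = false;
  });
  package-b = super.package-b.overridePythonAttrs (old: {
    propagatedBuildInputs = [ ];
    doCheck = false;
  });
}
```

Both packages then have to be installed into the same environment explicitly, since neither pulls in the other any more.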

I guess my current main issues are the last two: Should I follow the pypi2nix practice of adding the usual build inputs for all packages whether they need them or not? And are there already any “wheelhouse”-style approaches for packaging Python products without making every package its own derivation at first?

I’m just a newbie on Nix(OS) so I’ll not be able to answer your questions, I just want to add my (newbie’s) thinking on the matter. Until now I’ve only deployed a small application based on Django whose dependencies I’ve compiled by hand, reusing nixpkgs packages when possible, with a bit of customization done by overriding the src and other attributes and defining new packages for those that aren’t in nixpkgs.

For the next applications that I have to deploy I’ve tried your setup.nix and I found it effective. At the same time I’ve tried pypi2nix with its off-nixpkgs package tree. I’ve suspended any further development to take some time to try and understand which is the best approach to packaging. Is an approach like that of pypi2nix inevitable? Rok Garbas briefly expressed this opinion on IRC; on the other hand, I think it would be nice to keep the development in nixpkgs if possible, with a broader community effort.

Would you mind pasting Rok’s opinion here? I was unable to find it.

I’ve been thinking about the same issue and I don’t have a final opinion on it yet. So far, building on top of nixpkgs has been working for me. Of course, sometimes the package version I need differs too much from the version maintained in nixpkgs, and it is not always trivial to fix the build rule.

Ideally we would have a tool for automatically generating the package set in Nixpkgs, and a way for users to generate a set on top of that, or entirely separately. pypi2nix is a good tool; however, because it uses pip to resolve dependencies, it won’t work for Nixpkgs because of the typically overstrict constraints in Python packages. As long as there is no curated set of packages for Python, quite some manual work will be needed.

pip2nix also depends on pip for resolving packages, but that also allows it to resolve packages from multiple sources, including private repositories (with authentication). Sure, it is not the ideal solution, but I can live with it as a compromise as long as it results in something that works with just Nix (which it currently does).

I’m not holding my breath for a tool that can fully automatically generate package sets for nix, for any possible package versions.

I’m very, very happy with how well you manage to maintain the nixpkgs Python packages. Yet, in practice I need to pin package versions in our projects, and those versions rarely completely match the versions in nixpkgs, so something custom is required.

For a realistic example: if I wanted to package and maintain a Plone distribution within nixpkgs, how should I do that properly? A supported distribution requires specific package versions and I’m not sure if it makes sense to package every dependency separately? Or does it? Have you ever tried packaging a “package set” as a single nix derivation and would you have anything to share about that?

Yet, in practice I need to pin package versions in our projects, and those versions rarely completely match the versions in nixpkgs, so something custom is required.

And this is where it gets tricky. Overriding the Nixpkgs package set, for example, may suddenly start breaking other packages, so that’s not very desirable. On the other hand, by generating the whole set yourself, you may have to repeat fixing things and have lower test coverage.

What is the reason you need to pin to different versions? To match the environment of co-workers? Or simply because the code was developed for those versions?

A supported distribution requires specific package versions and I’m not sure if it makes sense to package every dependency separately? Or does it? Have you ever tried packaging a “package set” as a single nix derivation and would you have anything to share about that?

I never tried that.

I can do better :) See the log of #nixos around midnight on the 6th of March.

I agree; maybe the problem comes from the fact that there can be multiple versions of the same package on PyPI (with one wheel per version-interpreter[-platform] combination), but this isn’t reflected in the packages available in nixpkgs.

Agreed. Neither direction is perfect. That said, the three years we have been doing this (building on top of nixpkgs) have been fine.

The latter. To avoid surprises. When we start a new project, we want to build on top of the latest stable nixpkgs (NixOS) release, but still use the latest Python package versions (or sometimes specific versions, because of dependencies). Yet, once the project has been released, we do want to get possible security updates for the underlying Python and libraries by updating nixpkgs, but we don’t want sudden (often breaking) updates to Python packages (unless they have security issues).

For example, most of our projects have been depending on aiohttp and that package (and its ecosystem) has gone through quite a lot of changes in three years.

Good to know I haven’t missed any helpers for that, given that they don’t exist yet :)

To be honest, my only use case where that could really make a difference is Plone with hundreds of Python packages. Packaging every package separately seems to add a few seconds to startup time (compared to a non-nix install).

Thanks. I believe that with setup.nix you will not be as “blindly merging” two separate python paths as Rok assumes, because setup.nix tries to align those paths, and I’ve not seen “conflicting paths” for some time now.

(I’ve lately been iterating with Plone once again and the only conflicts were real ones, not caused by Nix. I’ve been telling myself that if setup.nix can build Plone, it should be able to build anything ;) )

Well, actually, this is still not perfect, but a recent example would help:

There I still need to add “pyopenssl” explicitly to requirements.txt (needed by coveralls on Python 2) to make setup.nix discover it from requirements-python2.nix and replace colliding dependencies with the versions from requirements-python2.nix.

Together with cachix, I also get reasonable build times on Travis CI.

I’m in a slightly different position maintaining an app deployed on Heroku that needs to install its dependencies via requirements.txt in production, test, and (at least until I complete a compelling nix alternative) Vagrantfile based development setups.

I’ve enjoyed developing it on Nix, but I’m also not keen on having to duplicate all dependency updates, and knowing that over time mistakes will inevitably be made because there’s no longer a single source of truth.

I think when I first built the requirements.nix I tried using pip2nix first but found it wasn’t handling something well; the one I’m using now was generated by pypi2nix but it took quite a bit of troubleshooting and manual modification to get that one working as well.

I’ve been thinking this might all be more maintainable if the Nix copy was the source of truth, and we had a good toolchain for generating requirements.txt files in a well-controlled way (e.g., we have a requirements_production.txt, requirements_common.txt, requirements_dev.txt, and regular requirements.txt which get composed to build dev, test, and production dependencies).
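
One way to picture the “Nix as source of truth” idea is the sketch below, which renders a pinned requirements.txt from a list of nixpkgs Python packages. This only illustrates the idea under stated assumptions: the dependency list is made up, transitive dependencies are not included, and the nixpkgs pname does not always match the PyPI project name.

```nix
{ pkgs ? import <nixpkgs> { } }:

let
  # Assumption: the dependencies are listed once, here.
  deps = with pkgs.python3Packages; [ requests aiohttp ];

  # Render one "name==version" line per package.
  toLine = p: "${p.pname}==${p.version}";
in
# Builds a store path containing the generated requirements.txt.
pkgs.writeText "requirements.txt"
  (pkgs.lib.concatMapStringsSep "\n" toLine deps + "\n")
```

Separate lists for production, common and dev dependencies could be rendered the same way, one file per list.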

I’ve been meaning to start a thread on this topic, but other small tyrannies have been demanding time. :)

If you could create a dummy public project with similar dependencies, I could make a setup.nix based version of it for discussion purposes.

I believe that we could get as close to a single source of truth as possible, considering that Heroku still uses requirements.txt.

I’ll try to carve out a little time this week. Feel free to nag me with mentions if it doesn’t show up!

I’m not certain it’s meaningful here, but your argument syntax reminded me of another thought I’ve had regarding development environments. The impact isn’t huge so it’s pretty far down my list.

Sometimes I’ll have a language/environment-specific tool (pipdeptree is the best example I have off the top of my head) that I’d like to have available any time I’m working on a Python project whether or not that project includes it in its dependencies.

For now I just put these in system/user packages, but it might be nice to list environment-specific packages in the system/user config that will be included or excluded based on how the environment/shell is loaded (args to control whether it loads unmodified environment, environment+utils, pure environment, pure environment+utils) without manually peppering different shell/requirements.nix files with unrelated tool dependencies.

pypi2nix is a good tool; however, because it uses pip to resolve dependencies, it won’t work for Nixpkgs because of the typically overstrict constraints in Python packages.

@FRidh thank you for this response. Can you say more about why pip won’t work as a source resolver for python packages in nixpkgs?

Oftentimes the version constraints in setup.py files are overly restrictive, such as requests==2.1.3 or requests>2.*, for example. But when you run the test suite, these restrictions turn out not to be necessary. This is possibly what he means?
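
In nixpkgs-style packaging, such a pin is typically relaxed by patching the source before the build. A minimal sketch, assuming a hypothetical package somepackage whose setup.py contains the literal string requests==2.1.3:

```nix
self: super: {
  somepackage = super.somepackage.overridePythonAttrs (old: {
    # Keep any existing postPatch and append the relaxation.
    postPatch = (old.postPatch or "") + ''
      substituteInPlace setup.py \
        --replace "requests==2.1.3" "requests"
    '';
  });
}
```

Newer nixpkgs releases prefer stricter variants of the --replace flag, so the exact flag may need adjusting.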

Also, a shameless plug: I have been developing a tool, nixpkgs-pytools (nix-community/nixpkgs-pytools on GitHub, maintained by @costrouc), for removing the tedious nature of creating nixpkgs derivations. It can now automatically add a Python package to nixpkgs (this requires a little user input).

I would also add the common (?) case when the package name from Pip is not the same as the name in pythonPackages. Example: scikit-learn (pypi) vs. sklearn (import statements) vs. scikitlearn (pythonPackages).

@costrouc Thanks for sharing the tool! Does it support recursive behavior when a dependency is on a package which is also not in nixpkgs? Can it find common issues with the tests (like hardcoded paths or test-only data files)? I have a few packages (mainly data science-related such as shap) I have added to our environment which I don’t have time to submit to nixpkgs due to disabled tests.

Yes, package name normalization is a big issue. I would like to fix this and I know others would like to as well. The blocker is that the nix formatters are too specific and we need something that will allow renaming all the Python package names in an automated way.
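
For reference, the normalization rule itself (as described in PEP 503: lowercase the name and collapse runs of “-”, “_” and “.” into a single “-”) is easy to express in Nix. The sketch below is only the string transformation, for example mapping Scikit_Learn to scikit-learn; it does not solve the renaming of existing nixpkgs attributes.

```nix
let
  lib = (import <nixpkgs> { }).lib;

  # Split on runs of "-", "_" or ".", keep the string pieces (builtins.split
  # also returns match groups as lists), join with "-", then lowercase.
  normalize = name:
    lib.toLower (builtins.concatStringsSep "-"
      (builtins.filter builtins.isString (builtins.split "[-_.]+" name)));
in
normalize "Scikit_Learn"
```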

The tool I wrote does not automate recursive dependencies (the main blocker is the issue above). If you use the tool you will find that it is actually quite quick: you can package 5-20 new packages in less than an hour. I am working on the tool to make this easier and easier. If the package has only Python dependencies, the tool creates a derivation that is nearly complete.

Aside from overly restrictive requirements there is also the issue of state. In order to resolve using pip we need PyPI. We need to resolve for multiple platform/Python combinations, and that means multiple runs. While there are valid reasons for having differences, you will also capture differences due to a change of PyPI state (package “foo” got updated in the meantime).

Anyway, overly restrictive requirements are a far bigger issue, one that makes it completely impossible. I’ve still been wanting to set something up to fetch all requirements from all the packages on PyPI and put them somewhere in a file or repo. Then use e.g. conda’s SAT solver to resolve, using that file, a list of requirements, with the possibility of passing overrides.

Yes, that’s unfortunate. It’s the reason why new packages should use the “normalized” name. Unfortunately, that still allows for names to start with e.g. a number. The only solution here is what pypi2nix does, and that is recording names as strings:

{
  "foo" = buildPythonPackage ...
}

By the way, there is a discussion on Python’s Discourse on a standardized lock file. There have been attempts to write pipenv and poetry to Nix converters, but there are some issues with the lock files they generate. I wrote on the discussion what I think we need.

I’m currently revisiting the internals of my toolchain and I’m having encouraging results with the following approach:

  • I use pip (my WIP pip2nix version) to resolve a working project-specific package set that almost always has somewhat different versions than the current nixpkgs. (Years ago I chose pip2nix over pypi2nix, because I needed to support private repos.)

  • I choose none-any wheels when possible to simplify and speed up building the package set.

  • I use packageOverrides to merge my custom Python package set with the current nixpkgs package set.

  • In packageOverrides, for each custom package, I reuse the existing nixpkgs definition with new src, format and merged inputs (using overridePythonAttrs).

  • Finally, I still need to manually solve packages that have different naming in nixpkgs and in my custom package set, and clear patches and postPatch from packages where they break the build with my package version.

This approach seems to merge my project-specific packages and versions properly with nixpkgs without conflicts.
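
A minimal sketch of what one such override might look like, using requests as the example package. The version, the wheel tags and the placeholder hash are assumptions for illustration; on recent nixpkgs releases, where packages set pyproject instead of format, the override would need adjusting.

```nix
{ pkgs ? import <nixpkgs> { } }:

let
  python = pkgs.python3.override {
    packageOverrides = self: super: {
      # Reuse the nixpkgs definition, swapping in a pinned none-any wheel.
      requests = super.requests.overridePythonAttrs (old: rec {
        version = "2.31.0";
        format = "wheel";
        src = self.fetchPypi {
          pname = "requests";
          inherit version format;
          python = "py3";
          dist = "py3";
          # Placeholder; the first build reports the real hash.
          sha256 = pkgs.lib.fakeSha256;
        };
        # Clear patches written for the nixpkgs version of the package.
        patches = [ ];
        postPatch = "";
      });
    };
  };
in
# An interpreter environment that uses the overridden package set.
python.withPackages (ps: [ ps.requests ])
```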

Currently this looks too good to be true…
