Several comments about priorities and (new) policies in the Python ecosystem

So lately I’ve found myself surprised and perplexed by a few changes the Python ecosystem in Nixpkgs has gone through, and I’d like to discuss them with more people (besides those who commented on some PRs).

build-system vs. nativeBuildInputs

What information or accuracy do we get from this division? What was wrong with having only nativeBuildInputs? I do appreciate the idea of making nativeBuildInputs provide only executables that don’t pollute PYTHONPATH, but if that’s not ready yet, why bother adding the build-system attribute?

dependencies vs. propagatedBuildInputs

Hmm OK, but that’s just a semantic change, so why bother? The only place I found dependencies being evaluated is here:

Even if the plan is to eventually treat these lists differently, treating them differently shouldn’t break packages (at least in the beginning), and requiring people to change their expressions should come only after the behavioral change is actually apparent.

Putting dependencies both in dependencies and in build-system

This topic is related to this review comment, which for some reason links to this tracking issue: Python: Remove test & runtime dependencies from build time closure tracking issue · Issue #272178 · NixOS/nixpkgs · GitHub.

The general questions / considerations that guide me towards this discussion are:

  • Why would an upstream package author put the same dependency both in setup_requires and install_requires?
  • If an upstream author doesn’t do it, but they import one of the install_requires dependencies in setup.py, is that a good reason to add the dependency to both the build-system and dependencies lists?
  • What if upstream imports the dependency in their setup.py in order to dynamically get the include/ directory in which some headers of the dependency are available? In that case we have to patch the package so that we can manually specify the include/ directory of the hostPlatform’s package - and in that case, again, we don’t need to put the same package in both dependencies and build-system.
  • What if upstream uses an executable from a package that also provides headers? An example is scipy, which uses the f2py executable from numpy; we also support cross compiling it by specifying the headers of the host numpy, and thankfully what the build platform’s f2py generates is compatible with the host platform’s numpy.
  • Is the (seemingly semantic) distinction between build-system and nativeBuildInputs supposed to help distinguish between the above two scenarios?

How to deal with multiple versions of packages?

So the numpy 2.x release came a few months ago, and although not many backwards-incompatible changes were apparent, many packages probably broke simply due to version constraints in setup.py and pyproject.toml. Hence the attributes numpy_2 and numpy_1 == numpy were introduced, and now it’s an open question how to deal with the propagation of numpy == numpy_1 by dependent packages.

I raised this issue on my (currently draft) attempt to add a few Python packages that support only numpy_2, which raises a clash between these two principles:

  1. Spawning a python3.withPackages (ps: [ ps.XXX ]) interpreter must make XXX importable - i.e. not raise an ImportError.
  2. Using a package XXX that requires either the dependency YYY_1 or YYY_2 should not require you to use packageOverrides and rebuild your set of python3Packages.

Another interesting case to compare is qtpy, which is a compatibility layer between 4 different Python Qt packages. Spawning a python3.withPackages (ps: [ ps.qtpy ]) interpreter hence always fails with an ImportError, but that of course is the point of the package - you should be able to choose a Python Qt implementation and add it to your ps: [] list.
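
For illustration, a minimal sketch of what a working environment looks like (assuming pyqt5 as the chosen binding; any of the supported bindings would do):

# qtpy alone raises ImportError on `import qtpy`; pairing it with a
# concrete Qt binding makes it importable:
python3.withPackages (ps: [ ps.qtpy ps.pyqt5 ])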

In comparison, scipy explicitly states that it can be built against numpy 2.x, and it does specify “numpy” in its install_requires, so pip users can uninstall the numpy 2.x that gets installed after they pip install scipy and then pip install numpy==1.26. With Nix, however, we currently don’t have that privilege.

These two examples raise the question: should install_requires be blindly mirrored in the dependencies attribute? What if a package doesn’t support a new version of a dependency, and this isn’t even explicitly described with a version constraint - should Nixpkgs try to fix that elegantly, to spare users as many attribute rebuilds as possible?

How long to wait for packages to support new versions of their dependencies?

Somewhat related to the above, this general question raises another competition between priorities:

  1. Always make the latest version of every package the default version of the original attribute
  2. Make sure that as many leaf Python packages as possible build.

In an attempt to update quantities: 0.15.0 -> 0.16.0, we noticed that the upstream maintainer of a dependent package is aware that their package doesn’t support the latest version of the dependency. That upstream issue is 2 weeks old. Hence I ask:

  • For how long will we wait for them?
  • What if someone really wants to use the latest quantities - how many rebuilds should they suffer because some other leaf package they don’t care about requires an old version of quantities?

I am no authority over the Python package set; this is only how I understand it.

build-system vs. nativeBuildInputs

build-system mirrors the entry in pyproject.toml and should not include non-Python dependencies. build-system is added to passthru, while nativeBuildInputs are not.

dependencies vs. propagatedBuildInputs

Same as above: dependencies is also added to passthru, and this distinction will help facilitate splitting Python builds into (1) a wheel build derivation and (2) a wheel unpack, install and check derivation, solving most issues tied to circular dependencies and reducing the frequency of mass rebuilds.
Once this is in place we can finally also stop propagating the deps of buildPythonApplication packages outside python3Packages, while retaining the ability to use propagatedBuildInputs.
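
To make the mapping concrete, here is a minimal sketch (hypothetical package, assuming a setuptools build backend) of how the new attributes mirror the pyproject.toml metadata:

# pyproject.toml of the hypothetical package declares:
#   [build-system]
#   requires = ["setuptools"]
#   [project]
#   dependencies = ["requests"]
buildPythonPackage {
  pname = "example";
  version = "1.0";
  pyproject = true;
  # src, meta, etc. omitted for brevity

  build-system = [ setuptools ];      # Python build requirements; exposed via passthru
  dependencies = [ requests ];        # runtime requirements; exposed via passthru, propagated

  nativeBuildInputs = [ pkg-config ]; # non-Python build tools stay here
}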

How to deal with multiple versions of packages?

python3Packages must be globally consistent; this is a limitation of Python. If you want to use numpy 2, then you will have to override numpy.
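
For example, a minimal sketch of such an override, which rebuilds the whole set against numpy 2:

let
  python = python3.override {
    # swap the default numpy for numpy_2; every dependent package
    # in the set is rebuilt against it
    packageOverrides = self: super: { numpy = super.numpy_2; };
  };
in
python.withPackages (ps: [ ps.scipy ])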

I am in favor of adding a couple of overrides to the release set so that expensive packages like scipy and torch get built with numpy 2 on Hydra.
Packages like torchWithRocm exist just for cache and nixpkgs-review purposes, but this pattern really should be reconsidered IMO.

I am also in favor of adding the alias python3PackagesWithNumpy2, which is disabled if config.allowAliases = false, to improve the UX tied to this transition.

For packages that specify numpy 2 as a build-system dep while still supporting numpy 1, see python3Packages.relaxBuildSystemRequiresHook: init by natsukium · Pull Request #331315 · NixOS/nixpkgs · GitHub
If a package only supports numpy 2, then it should be disabled if lib.versionOlder numpy.version "2".
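
In the package expression, that would look something like:

buildPythonPackage {
  # ...
  # refuse to build (and mark as unsupported) against numpy 1.x
  disabled = lib.versionOlder numpy.version "2";
}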

How long to wait for packages to support new versions of their dependencies?

In general it has been python-updates that pushes through breakages, and those are well timed w.r.t. NixOS stable releases.
buildPythonApplication packages are free to downgrade their inputs; conflicts within the python3Packages scope must consider breakages on a package-by-package basis.


Seems to be underway already.


That proposal is being opposed by the de facto Python maintainers.

My mind skipped a section

Putting dependencies both in dependencies and in build-system

  • If an upstream author doesn’t do it, but they import one of the install_requires dependencies in setup.py, is that a good reason to add the dependency to both the build-system and dependencies lists?

For most cases, yes, as it is required for cross.
nativeBuildInputs are added to PATH while buildInputs are used for includes and linking.
Likewise, build-system dependencies will be compiled for the host platform and are thus importable during cross builds, but are not propagated nor added to $out/bin wrappers. dependencies inputs are not necessarily importable at build time, but do get propagated and added to wrappers.
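
So for the setup.py-imports-a-runtime-dep case from the quoted question, the sketch (for a hypothetical package) looks like:

buildPythonPackage {
  # ...
  build-system = [ numpy ];   # importable while setup.py runs
  dependencies = [ numpy ];   # propagated into the runtime closure and wrappers
}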

The edge cases you list, however, are tricky and strike me as patch territory.
That is, if the package does not manage to make use of the headers from the host version of the dep while linking to the target version of the dep.


How, in your opinion, should the migration be carried out? To me it makes sense that we start by preparing the ecosystem - the inert and distributed ball of entangled stuff - and then do an atomic switch.

Side note: IMO it’s not so much about the PYTHONPATH as about creating a sparser and flatter dependency graph and not forcing so many false-positive rebuilds.

One for the build platform, one for the host platform to be used at runtime - although in the case of Python packages, more likely just by mistake.

numpy2

I’ll reserve the right to comment on numpy2 later, but I’ll mention that I think we shouldn’t just accept the narrow compatibility constraints of the Python interpreter and the excessive rebuilds, that we need more granular control over the Python import system’s cache, and that the links in Else Someone: "@corbin@defcon.social @stargirl@hachyderm.io Nice…" - Mastodon are probably relevant.

Interesting. Could you elaborate on what exactly is being opposed, and maybe attach references? I believe I genuinely fell out of the loop.


See the comments on python3PackagesWithNumpy2: init by dotlambda · Pull Request #339657 · NixOS/nixpkgs · GitHub.


Off the top of my head:

  • Another package set combination that we need to build and maintain
  • Lots of noise (newly failing packages) because packages want either numpy 1 or 2
  • We’ll eventually end up with lots of pins to that package set, for which cleanup will likely fall to the package set maintainers
  • No data on whether we can make the switch, and researching it is somewhat costly, comparable with python-updates

I proposed sticking with numpy 1 for 24.11 and evaluating the switch after branch-off.


I didn’t write my long post for you to repeat what is written clearly in the docs. I asked for someone to explain the motivation for creating this distinction.

That still can be done without the semantic rename of the attributes, by treating propagatedBuildInputs exactly as you now treat dependencies.

Why? Who said so?

Again, this is not a “Help needed” thread; I want to discuss why we design things this way.

And what do you think of the option of not propagating any numpy package with torch, and letting the user choose either numpy_1 or numpy_2? That was suggested in python3Packages: don't propagate numpy_1 if also numpy_2 is supported by doronbehar · Pull Request #338334 · NixOS/nixpkgs · GitHub. Note also that there the dependency on numpy is artificial - no shared objects are linked to numpy’s shared objects.

$ nix why-depends --all --precise nixpkgs#python312Packages.{torch.dev,numpy}
/nix/store/d427afxv9w8qcxx07az6crcfvmf3qmv7-python3.12-torch-2.3.1-dev
├───nix-support/propagated-build-inputs: …thon3.12-click-8.1.7 /nix/store/jc6k07xdz70lfwbfiffz9m2wn6ag3699-python3.12-numpy-1.26.4 /nix/st…
│   → /nix/store/jc6k07xdz70lfwbfiffz9m2wn6ag3699-python3.12-numpy-1.26.4
└───nix-support/propagated-build-inputs: …hon3.12-future-1.0.0 /nix/store/h91y6mxwmrb2qa71pjk315zsmzglgrlk-python3.12-tensorboard-2.17.0 /…
    → /nix/store/h91y6mxwmrb2qa71pjk315zsmzglgrlk-python3.12-tensorboard-2.17.0
    └───bin/.tensorboard-wrapped: …3.12/site-packages','/nix/store/jc6k07xdz70lfwbfiffz9m2wn6ag3699-python3.12-numpy-1.26.4/lib/pyt…
        bin/tensorboard: …':'}.PATH=${PATH/':''/nix/store/jc6k07xdz70lfwbfiffz9m2wn6ag3699-python3.12-numpy-1.26.4/bin'':'…
        nix-support/propagated-build-inputs: …hon3.12-markdown-3.6 /nix/store/jc6k07xdz70lfwbfiffz9m2wn6ag3699-python3.12-numpy-1.26.4 /nix/st…
        → /nix/store/jc6k07xdz70lfwbfiffz9m2wn6ag3699-python3.12-numpy-1.26.4
$ nix why-depends --all --precise nixpkgs#python312Packages.{torch.lib,numpy}
'flake:nixpkgs#python312Packages.torch.lib' does not depend on 'flake:nixpkgs#python312Packages.numpy'
$ nix why-depends --all --precise nixpkgs#python312Packages.{torch.out,numpy}
'flake:nixpkgs#python312Packages.torch.out' does not depend on 'flake:nixpkgs#python312Packages.numpy'

Thanks for the link; that seems like an excellent hook I support adding. However, it’s not really helpful for this discussion - a decision still has to be made regarding whether it makes sense to put Python dependencies in both dependencies and build-system.

What do you mean by “push through breakages”?

So you mean that we should strive to break packages only around NixOS stable releases? If so, what makes a stable release more stable than nixos-unstable? :slight_smile: If I understood correctly, what is the point of using nixos-unstable if not getting the latest and greatest of all packages, with the risk of leaf packages breaking?

You are again explaining to me the meaning of these attributes as if this is a “Help requested” thread, when it isn’t.

I think you meant “the build platform”.

I agree, and I made this argument in this discussion, but the Python maintainers ignored this option, so it seems.

Hmm OK I understand. The docs could have communicated that better. Instead of the current two lines (quoted already), I’d write:

  • build-system ? []: Build-time-only Python dependencies; the items listed in build-system.requires / setup_requires.
  • nativeBuildInputs ? []: Build-time-only dependencies, typically executables. Currently the two above are treated the same, but in the future we plan to stop polluting the PYTHONPATH with derivations from nativeBuildInputs, populating it only from build-system.

How does that help reduce false-positive rebuilds? Changed nativeBuildInputs will also trigger rebuilds…

Still waiting for that comment :).

That’s very optimistic and far-fetched :slight_smile: I hope to focus in the near future on simpler solutions.

If I understood correctly what you mean by “pins”, I don’t believe that many people will start adding buildPythonApplication-based packages that rely on that package set. I see this package set as a middle ground for the near future, until we distribute a single numpy == numpy_1.

I’m surprised by this part - isn’t it easy to get that list of breakages from Hydra, once it builds python3PackagesWithNumpy2, and compare that to the regular python312Packages?

I’m sorry for explaining rather than discussing, I slipped into the wrong mode there.

There is no semantic guarantee that propagatedBuildInputs dependencies can be omitted from the wheel build stage; dependencies, however, are explicit “install requirements”.

Torch itself may not depend on numpy, but 99% of data loaders people write for torch do.
This change would break downstream code where the dependency on numpy is implicit.
Interestingly, we already explicitly do not propagate pytest, and this is a recurring gotcha that has a 50% chance of being caught in review.

I prefer keeping a sound dependency graph, and solving version preferences through overrides instead. The ability to apply an overlay and then forget about it is one of the core strengths of nix and nixpkgs.
The issues tied to this, however, are (1) the UX and (2) the cache.

Addressing (1), this is why I proposed making python3PackagesWithNumpy2 an alias that is not recursed into.
This means the set is not built by Hydra, nor will any other package in nixpkgs be able to reference and pin against it.
It is explicitly a neat UX kinda thing rather than a “we maintain and support this” deal.

Addressing (2), I voiced that I’m in favor of adding some overrides to the “release set”, but that was worded somewhat vaguely.
To expand upon the idea, I want a separate attribute set whose only purpose is to be recursed into, but which disallows being referenced internally in nixpkgs, much like aliases do.
torchWithRocm is not meant to be used in python3.withPackages as I understand it; it is rather a crude way to ensure torch is cached when config.rocmSupport = true;.
I want to take that concept and run with it, adding override configurations that package maintainers explicitly commit to maintain.
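
Purely as a hypothetical sketch (names invented for illustration; assuming scipy and torch accept numpy as an override argument), that could look like:

# Recursed into by Hydra for caching, but - much like aliases -
# not referenceable from inside nixpkgs:
pythonCacheOnlyVariants = lib.recurseIntoAttrs {
  scipy-numpy2 = python3Packages.scipy.override { numpy = python3Packages.numpy_2; };
  torch-numpy2 = python3Packages.torch.override { numpy = python3Packages.numpy_2; };
};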

This should completely address Several comments about priorities and (new) policies in the Python ecosystem - #8 by hexa

IMO it makes perfect sense to add a dependency to both, and with my response I hoped to show reasons why one might want to do so (cross), and why one might want to avoid doing so (patching out numpy>=2 build-system requirements).
setuptools, for example, is frequently added to both due to runtime use of its pkg_resources.
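
E.g. a minimal sketch of that common case:

buildPythonPackage {
  # ...
  build-system = [ setuptools ];   # needed to build the wheel
  dependencies = [ setuptools ];   # needed at runtime for `import pkg_resources`
}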

Single-package bump PRs targeting master, which as a result break other packages with no apparent fix, have a tendency to stall.
This is because the majority (I assume/project) do not feel they have the authority to merge a breaking change.
python-updates on the other hand (and likewise large PRs targeting staging, now that I think about it) is a comparatively bigger hammer that inevitably ends up bumping packages that then require follow-up fixes.
This is known and taken into consideration when nearing a stable release.

I’m not saying this is the way it should be; it is rather the “solution” that has emerged.
I’m all for establishing a clear time window after which we explicitly allow breaking packages that do not keep up.

After release, not right before release (I assume you meant that; it was just slightly ambiguous).
And python-updates don’t just happen right after a release, although that is usually when they are the most visible.
The newest one was merged in August.

nixos-unstable, as I understand it, is a rolling but quite stable release, not an absolutely bleeding-edge-above-all-else release.
The branch name nixos-unstable has frequently been noted to be misleading.


OK, I’ve been convinced of one thing: if we built wheels and handled dependency conflicts at the stage where python.withPackages environments are created, rather than at the build stage of packages, all of my suffering would have been spared - choosing numpy_2 instead of numpy_1 would not require rebuilding heavy packages like scipy that don’t link directly against numpy’s shared objects. That’s still very far in the future, so with that outlook I guess it’s not that bad to keep propagating a single numpy version, even though both are supported and the propagated version is outdated.

Now I understand that all of the topics I raised above really sum up to the policies the Python maintainers are trying to enforce. Still, these could be communicated a bit better, and that can be handled with better documentation, which hopefully the Python maintainers will agree to improve.

I think the python.section.md file should explain the goals of long, ongoing processes such as this one, even if they are not complete, and explain the drawbacks of alternative workarounds to issues such as those discussed here. This kind of documentation is not common in our ecosystem, which is unfortunate IMO, but that’s a topic for a different thread.
