Package management is particularly challenging for scientific Python projects, so I started the Python Package Management Rodeo (GitHub - nbren12/python-packaging-rodeo).
The goal is to find a tool that can install all three of the following:
cartopy, a library for making maps (it depends on several C libraries)
tensorflow
<random pypi package>, i.e. a package that pip can easily install but that isn’t in nixpkgs, Debian, conda, etc.
I set up some simple CI that tries to install these packages with several package managers and then import them. There is currently no tool that succeeds at this.
Both poetry2nix and mach-nix have the potential to survive this rodeo, but I cannot currently get them to work, and would really appreciate contributions.
Cartopy and Tensorflow are both in Nixpkgs. Poetry with Poetry2nix is a good way to install non-nixpkgs things. Do you need help installing a particular package?
Did you look at the CI statuses? I tried poetry2nix, and it cannot install shapely, a dependency of cartopy. poetry2nix doesn’t automatically work just because cartopy and tensorflow are in nixpkgs: the version in the poetry lock file differs from the nixpkgs one. If you can get poetry2nix to work, feel free to submit a PR.
The point is not about any particular package. I use these three packages as a litmus test. Currently no tool is crossing the finish line. It would be interesting to know how much effort is required to get mach-nix/poetry2nix to work.
The root causes of mach-nix failing on cartopy are:
nixpkgs’ sphinx package is missing certifi as a dependency. Nixpkgs maintainers are often not aware of missing deps, since some sub-dependency has the missing dependency inside checkInputs, which makes the dependency present for the Hydra build. But as soon as doCheck is disabled, those dependencies are missing and the build fails.
cartopy doesn’t specify its dependencies on PyPI in a way that is recognized by the crawler maintaining mach-nix’s dependency DB. Usually mach-nix would automatically fix the missing dependencies, but it cannot in this case since there is no data about them.
This problem is easily fixable by manually adding some of cartopy’s dependencies to the requirements, which makes mach-nix aware of their sub-dependencies.
More specifically, since building sphinx and fiona fails with a missing dependency error, we just need to add sphinx and fiona to the requirements manually. By doing this, we make mach-nix aware of those packages and their sub-dependencies, and therefore allow it to build a better Nix expression containing all necessary dependencies instead of trusting nixpkgs.
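To make the checkInputs leakage concrete, here is a hypothetical nixpkgs-style derivation illustrating the pattern (this is not an actual nixpkgs package, just a sketch of the mechanism):

# Hypothetical example, for illustration only.
{ buildPythonPackage, requests, certifi, pytest, src }:

buildPythonPackage {
  pname = "example";
  version = "1.0.0";
  inherit src;

  # requests is declared as a runtime dependency ...
  propagatedBuildInputs = [ requests ];

  # ... but certifi, which is also needed at runtime, only shows up
  # among the test inputs:
  checkInputs = [ pytest certifi ];

  # With doCheck = true (as on Hydra), certifi is present in the build
  # sandbox and everything passes. Once a consumer like mach-nix sets
  # doCheck = false, certifi silently drops out and dependents break.
  doCheck = true;
}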
I extended your expression a bit. This one should work:
let
  mach-nix = import (builtins.fetchGit {
    url = "https://github.com/DavHau/mach-nix/";
    ref = "refs/tags/3.2.0";
  }) {};
in
mach-nix.mkPython {
  requirements = ''
    cartopy
    tensorflow>=2
    docrep

    # manually add some of cartopy's deps to make mach-nix aware of their sub-deps
    sphinx == 3.1.1
    fiona == 1.8.18

    # add scipy to speed up the build, since mach-nix will use the wheel now
    scipy
  '';
}
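(If I’m not mistaken, saving this as default.nix and running nix-build should give you a ./result/bin/python in which import cartopy and import tensorflow work.)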
Takeaways:
It would be nice if we could permanently fix the checkInputs dependency leakage in nixpkgs.
It could be better to pin to a certain nixpkgs version (the channel could be unstable or NUR …); see the sketch below.
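For example (just a sketch: the nixpkgs tarball URL is an illustrative placeholder, and the pkgs argument is the optional one described in mach-nix’s README):

let
  # Pin the exact nixpkgs that mach-nix builds against.
  pkgs = import (builtins.fetchTarball {
    url = "https://github.com/NixOS/nixpkgs/archive/nixos-20.09.tar.gz";
  }) {};
  mach-nix = import (builtins.fetchGit {
    url = "https://github.com/DavHau/mach-nix/";
    ref = "refs/tags/3.2.0";
  }) {
    inherit pkgs;  # use the pinned package set instead of the default
  };
in
mach-nix.mkPython {
  requirements = ''
    cartopy
  '';
}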
For sure, I was just starting with a simple baseline along the lines of what the docs describe. I think this is what most beginners would start with. I didn’t e.g. pin the conda or pip installations either.
Why doesn’t mach-nix know to grab the scipy wheels without it being in the requirements? The docs seem to imply that wheels are the default strategy.
BTW, one issue I have had with numpy wheels is that they bundle their own version of libgfortran, which can conflict with the nixpkgs version. This can cause problems when wrapping other Fortran codes with Python under some linker flags.
The downside is that it takes 40 minutes to compile gdal, and small changes to the requirements/providers/etc. often require a recompile. A 40-minute iteration loop makes it hard to hack on the mach-nix configuration (e.g. to add a new package, change a provider, switch to a GPU-enabled package, etc.).
Can mach-nix use the cached versions from nixpkgs? Setting providers.numpy = "nixpkgs"; still forces a recompile. In principle, it seems like mach-nix could reuse the NixOS cache if it allowed different versions at build and runtime, as the wheel/conda providers do. It could then use dependency resolution to infer which runtime dependencies are compatible with the nixpkgs build.
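For reference, the kind of expression I’m testing looks roughly like this (a sketch, with mach-nix bound as in the earlier example):

mach-nix.mkPython {
  requirements = ''
    cartopy
    tensorflow>=2
  '';
  providers = {
    _default = "wheel,sdist,nixpkgs";
    numpy = "nixpkgs";  # still triggers a rebuild rather than a cache hit
  };
}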
Since mach-nix doesn’t have information about cartopy in its database, it doesn’t even know about the scipy dependency. Scipy is only in the closure because the cartopy in nixpkgs has it as a dependency. Mach-nix is not aware of this and therefore doesn’t replace it with a wheel build. If you manually add scipy to the requirements, mach-nix will actively handle that dependency and override it.
I didn’t know about this issue. Maybe you can open an issue for mach-nix with some details on how to reproduce the problem.
This is unfortunate, and difficult to fix. The problem here is that Python’s limitation of having only one global module scope per environment won’t allow us to make use of one of the best features of Nix, which is having several different versions of a package inside our dependency tree.
For Python, even adding another top-level requirement can have an effect on very low-level dependencies, since the requirement specifications of the new package plus all of its dependencies need to be merged together. Any little change in the requirements is likely to affect many other dependencies.
Earlier versions of mach-nix were optimized to increase cache hits whenever the nixpkgs provider is used, but I have given up on that, because those optimizations often led to instabilities. Instead, I now optimize the derivations for building successfully, by modifying them more aggressively in general, at the cost of not hitting the cache. One example is that mach-nix disables tests globally, since tests in nixpkgs are often patched and likely to fail when the package version is replaced, etc.
Also, as mentioned in the previous paragraph, whenever you add any dependency to your requirements, it is likely that mach-nix has to replace some low-level dependency, so we’re likely to miss the cache anyway.
As far as I understand, one long-term goal of the nixpkgs Python ecosystem is to split Python derivations into several phases/derivations, so that the build is independent of the runtime dependencies. Once that is done, we will be able to hit the cache without crazy hacks.
We could of course already hack our way around it, by just taking the original nixpkgs package and patching it, like we patch binaries from PyPI or conda. Maybe we’d also need to replace some paths contained in wrapper scripts, etc.
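Purely as a sketch of that idea (nothing mach-nix actually does today; scipy is just an arbitrary example):

let
  pkgs = import <nixpkgs> {};
  scipy = pkgs.python3Packages.scipy;  # comes from the binary cache
in
# Copy the cached build output and rewrite it, instead of rebuilding
# the package from source.
pkgs.runCommand "scipy-patched" {} ''
  cp -r ${scipy} $out
  chmod -R +w $out
  # Here we would rewrite store paths in wrapper scripts, RPATHs, etc.
  # -- the tricky part mentioned above, hand-waved in this sketch.
''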
Feel free to work on this during your upcoming 40-minute waiting times.
If I ever manage to finalize the conda support and get it into the stable version, this would probably already more or less solve the problem, since conda provides many more binary releases than PyPI.
Thanks for all the replies. The main sources of inflexibility seem to be:
Python has a single global module namespace
nixpkgs Python packages do not correctly model the fact that Python runtime requirements frequently don’t need to be present at build time
mach-nix does not recurse into nixpkgs dependencies or explicitly represent their dependency graph during dependency resolution
We could of course already hack our way around it, by just taking the original nixpkgs package and patching it, like we patch binaries from PyPI or conda.
Is this required? It seems that, by merging the PyPI and nixpkgs dependency graphs, it should be possible to avoid modifying anything in nixpkgs and only add packages from PyPI.
I’d be happy to contribute here if it doesn’t sound too hard.
If I ever manage to finalize the conda support and get it into the stable version, this would probably already more or less solve the problem, since conda provides many more binary releases than PyPI.
I agree. Conda packages are often cleaner too, since they don’t bundle e.g. libgfortran in numpy. BTW, mach-nix’s dependency resolution is about 100000 times faster than conda’s.
No problem! 2.4 needs even more patching. I have this in a branch, but wanted to make a commit that could be cherry-picked to release 20.09 (which is v2.1) first. Didn’t expect the merge to take so long…