Mach-nix makes the process of creating a python environment from a requirements.txt file a no-brainer.
Motivation: When I started using NixOS, I was amazed! (Thanks BTW!) But what I was missing was a quick way to set up a python environment from a requirements.txt file. There are amazing tools like pypi2nix which simplify the task a lot, but the whole experience is still far from perfect. Non-pythonic build inputs are not handled, which often makes it a long manual process of rebuilding stuff over and over again and fixing errors by adding inputs. Compared to a simple pip install -r requirements.txt, this was still quite unpleasant. To gain back the comfort, I started working on my own tool a few months ago. The whole problem turned out to be a bit more complex, but I kept working on it until it did what I wanted. I optimized the UI to be super simple. I'm confident that this could be the perfect tool for nix beginners who quickly need to load a python environment, and also for anyone else because it is just super simple to use.
The main differences to similar projects like pypi2nix are:
Speed: mach-nix resolves python dependencies via an internal graph and does not need to download and try-build packages first.
Reproducibility: If you run the same version of mach-nix against the same requirements.txt, it will always produce the same result.
No tool required: mach-nix can be called directly via nix expression (see below)
Extra build inputs are handled: Many python packages require non-pythonic build inputs and are difficult to build from scratch. Mach-nix reuses the definitions from nixpkgs to build those packages smoothly
Be patient on first use. It needs to fetch the dependency graph, which is around 200 MB.
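To illustrate why a pre-computed graph makes resolution fast, here is a minimal Python sketch of walking such a graph without downloading or building anything. The `GRAPH` data and the `resolve` helper are hypothetical and are not taken from mach-nix; the sketch also ignores versions and markers, which a real resolver has to handle.

```python
from collections import deque

# Hypothetical, hand-written excerpt of a dependency graph:
# package name -> direct dependencies. Versions and markers are ignored here.
GRAPH = {
    "demo-app": ["requests", "click"],
    "requests": ["idna", "urllib3", "chardet", "certifi"],
    "click": [],
    "idna": [],
    "urllib3": [],
    "chardet": [],
    "certifi": [],
}


def resolve(roots, graph):
    """Return every package reachable from `roots` using only graph lookups."""
    seen = set()
    queue = deque(roots)
    while queue:
        name = queue.popleft()
        if name in seen:
            continue
        if name not in graph:
            raise KeyError(f"{name} is missing from the dependency graph")
        seen.add(name)
        queue.extend(graph[name])
    return sorted(seen)


closure = resolve(["demo-app"], GRAPH)
print(closure)
```

Because the walk only consults the graph, its cost depends on the number of packages involved, not on how long each package takes to build.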
Alternatively, mach-nix can be installed and used as a cmdline tool. Its interface is similar to python's venv. It is supposed to be newcomer friendly and does not require any knowledge about nix.
A big thanks goes to the guys from genesiscloud.com who supported me with the necessary computation resources to generate that massive pypi dependency graph. Check out their service! They offer cheap GPU based cloud computing. Currently 50% off due to beta.
There is still a lot to do to make this tool perfect. Suggestions / ideas / bugs / issues or any other feedback is very welcome!
I was thinking just this morning about a related concept: Why do we need to write an almost identical buildPythonPackage or buildPythonApplication derivation every time we want a new Python package in Nixpkgs? Almost always it doesn't contain anything specific besides the requirements as specified by upstream. Since pypi makes that link by itself anyway, what's the point in regenerating that data on our end?
While I was merely thinking about it, you've done it: you automated the process of linking the dependency data from pypi and made it available to Nix. Plus, you went one step further and wrapped it nicely in a cleaner fetchpypi function. That's just one benefit this brings.
Perhaps your project is only an implementation of this idea for Python. Rust has crates.io, Golang has its module system where most modules are documented and (I think) indexed, etc.
I have some comments / questions:
You wrote:
Extra build inputs are handled: Many python packages require non-pythonic build inputs and are difficult to build from scratch. Mach-nix reuses the definitions from nixpkgs to build those packages smoothly
Just to make sure: if there's a Python package available on Pypi that has an external dependency and it's not packaged in Nixpkgs, Mach-nix isn't capable of knowing how that external dependency is named in Nixpkgs, right?
Do you use Pypiās shas, or do you generate it yourself? It should be available according to this.
I forgot to make the pypi-crawlers project public. Thanks for noticing. It should now be available.
Pypi actually does not provide any information about dependencies. Please correct me if I'm wrong, because this would make my project a lot simpler. As far as I could figure out, getting information about python dependencies is non-trivial. It was in fact one of the major points to tackle while developing mach-nix.
To sum it up: Dependencies of a python package are only revealed during its installation. Regexing the setup.py for definitions like "install_requires =" or using the abstract syntax tree module failed utterly, since there are too many variants of how these variables can be defined. Sometimes requirements are loaded from a txt file, etc. Also, over the years, python introduced additional methods to specify dependencies which are unrelated to setup.py. It's basically a jungle.
It appeared to me that there is no simple way of mining dependency information without executing the actual installation.
The current strategy of the crawler maintaining the dependency graph is to run all packages through a nix builder which fake-installs them through a patched python version. That means it executes each project's setup.py, which is untrusted code. Can I trust the nix sandbox? Should I add additional encapsulation layers?
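The fake-install idea can be sketched in a few lines of Python. This is not the pypi-crawlers implementation (which uses a patched python inside a nix builder); it is a simplified stand-in that swaps in a stub `setuptools` module so that calling `setup()` only records its arguments. The helper name `extract_requirements` and the example setup.py are made up for illustration. Note that it still executes the setup.py source, so it must only be run inside a sandbox.

```python
import sys
import types


def extract_requirements(setup_py_source):
    """Execute setup.py source against a stub setuptools and return setup()'s kwargs.

    WARNING: this runs the given code; use it only inside a sandbox.
    """
    captured = {}

    def fake_setup(**kwargs):
        # Record the arguments instead of installing anything.
        captured.update(kwargs)

    stub = types.ModuleType("setuptools")
    stub.setup = fake_setup
    stub.find_packages = lambda *args, **kwargs: []

    saved = sys.modules.get("setuptools")
    sys.modules["setuptools"] = stub
    try:
        exec(compile(setup_py_source, "setup.py", "exec"), {"__name__": "__main__"})
    finally:
        # Restore whatever was registered before the call.
        if saved is None:
            del sys.modules["setuptools"]
        else:
            sys.modules["setuptools"] = saved
    return captured


# A setup.py that builds its requirement list dynamically, which is the kind
# of case that defeats regex- or AST-based extraction.
EXAMPLE_SETUP_PY = '''
from setuptools import setup
deps = ["idna", "urllib3"]
setup(name="demo", version="1.0", install_requires=deps + ["chardet"])
'''

info = extract_requirements(EXAMPLE_SETUP_PY)
print(info["install_requires"])
```

Since the requirement list is computed at run time, only executing the file reveals the final value, which matches the observation that static analysis of setup.py is unreliable.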
I need to add that I'm only handling sdist python distributions so far and ignore wheels. Using wheels could make this process easier since they might contain some useful extra metadata. I started out handling only sdist because not all python packages on pypi have wheels and also it seemed python packages in nixpkgs are usually built using sdist. I guess that's for a reason. Only a few of them use wheels. I need to learn more about wheels, so any information regarding this is welcome. I would like to support wheels in the future since some projects like tensorflow only release wheels.
That is correct. A package which requires external dependencies and is itself not specified in nixpkgs will fail. During this project I already started working on a tool with the goal to build a mapping from "build error messages" to nix package attributes to build a database for missing inputs. But it appeared that nearly all of these difficult-to-build python packages are already packaged in nixpkgs. Therefore I trashed that project again and decided to just rely on nixpkgs as a general base. In case my filter bubble view is wrong and external dependencies will still be a problem for many people, we could start working on a solution for that. But currently I have the feeling that other things would bring more benefit, like adding support for wheels for example.
Indirectly I'm using the hashes coming from pypi. But as mach-nix itself is running as a nix builder, I cannot and should not do arbitrary API requests, since we cannot trust the integrity of the data coming from pypi. Therefore I built my own pypi fetcher tool nix-pypi-fetcher. It includes a mapping from (pkg_name, pkg_version) to URL and sha256. The mapping is updated twice a day (by querying pypi). Mach-nix pins one specific version of that fetcher together with one specific version of the dependency graph.
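The (pkg_name, pkg_version) to URL and sha256 mapping can be pictured with a small Python sketch. The `INDEX` contents, the `example.invalid` URL, and the helper names are hypothetical and do not reflect the actual data format of nix-pypi-fetcher; the point is only that a pinned hash lets a download be checked without trusting the server at fetch time.

```python
import hashlib

# Stand-in for the bytes of a downloaded sdist (hypothetical).
SDIST_BYTES = b"pretend this is demo-1.0.tar.gz"

# Hypothetical pinned index: (name, version) -> download URL and expected hash.
INDEX = {
    ("demo", "1.0"): {
        "url": "https://example.invalid/demo-1.0.tar.gz",
        "sha256": hashlib.sha256(SDIST_BYTES).hexdigest(),
    },
}


def lookup(name, version):
    """Return the pinned URL and sha256 for one exact release."""
    return INDEX[(name, version)]


def verify(name, version, payload):
    """Check downloaded bytes against the pinned sha256."""
    return hashlib.sha256(payload).hexdigest() == lookup(name, version)["sha256"]


print(verify("demo", "1.0", SDIST_BYTES))
print(verify("demo", "1.0", b"tampered bytes"))
```

Pinning one snapshot of such an index together with one snapshot of the dependency graph is what keeps the result stable for a given mach-nix version.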
It's interesting to think that other languages' dependency websites have learned the lesson, I guess, and made it easy to discover the dependency graph. But Python, which is super popular, hasn't.
Reading your explanation, I realise how much work you've done. Well done.
Since PEP 517/518 there is now support for backends other than setuptools for building a wheel. This means there are now tools for building wheels from projects that do not have a setup.py. Several backends already exist and are in use. Often these backends declare their dependencies somehow, but they don't have to; it's really up to the back-end. Thus, the only reliable way for extracting dependencies is from wheels. Clearly one does not always want to build a wheel, so indeed in case of setuptools monkey-patching is a popular choice and should work fine.
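As a sketch of what extracting dependencies from a wheel looks like: a wheel is a zip archive whose `*.dist-info/METADATA` file lists runtime requirements as `Requires-Dist` headers. The Python below builds a tiny fake wheel in memory (the package `demo` and its requirements are made up) and reads those headers with the standard library. The helper name `wheel_requirements` is hypothetical.

```python
import io
import zipfile
from email.parser import Parser


def wheel_requirements(wheel_bytes):
    """Return the Requires-Dist entries from a wheel's METADATA file."""
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as zf:
        meta_name = next(
            n for n in zf.namelist() if n.endswith(".dist-info/METADATA")
        )
        text = zf.read(meta_name).decode("utf-8")
    # METADATA uses an email-style header format, so the stdlib parser can read it.
    return Parser().parsestr(text).get_all("Requires-Dist") or []


# Build a minimal fake wheel in memory; real wheels contain more files.
METADATA = (
    "Metadata-Version: 2.1\n"
    "Name: demo\n"
    "Version: 1.0\n"
    "Requires-Dist: idna (<3,>=2.5)\n"
    "Requires-Dist: urllib3\n"
    "\n"
    "Long description goes here.\n"
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("demo-1.0.dist-info/METADATA", METADATA)

reqs = wheel_requirements(buf.getvalue())
print(reqs)
```

No code from the package is executed here, which is the main difference from running setup.py.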
let pkgs = import <nixpkgs> { };
in pkgs.mkShell {
  buildInputs = let
    python-with-pkgs = pkgs.python3.withPackages (ps:
      with ps;
      [
        # Figure out what to put here based on requirements.txt
      ]);
  in [ python-with-pkgs ];
}
I can now put this shell.nix alongside requirements.txt:
I just checked that thread and none of these solutions are similar to what I use. All solutions proposed there require you to do a full installation of the package, which is exactly what I wanted to avoid. Installing costs far more computational resources than my current solution, which is implemented and explained in more detail in the pypi-crawlers project.
Thanks, but I could not find any information about dependencies there. Can you?
I saw that the actual dump produced by your make-pypi-dump project got taken down by GitHub with the notice "Repository unavailable due to DMCA takedown".
Do you think I need to worry about such complications regarding the data I'm publishing? I guess your data contained wrong information about some project's license and that could have been the problem.
The only thing Iām publishing are project names, their dependency relations, and download URLs.
Could this lead to any trouble of that kind? It would be sad if the project broke because of that. I'd like to make sure to take any measures needed to prevent that.
Yes, but it seems you still need to crawl them: API Documentation - Libraries.io. I also wonder whether their search is capable of giving you all pypi projects there are…
It got taken down because a certain company had uploaded an apparently internal package. Even though they removed it, the package description was still in the PyPI database and thus in my dump.
Cool, I now played around with their API a little bit. Their general collection of python packages seems quite complete. But concerning the requirements, their data seems to be less complete than mine. First of all, they are not differentiating between install_requires, setup_requires, tests_require, extras_require. Information about markers or differentiation between python versions is also missing. Checking the dependencies for requests I noticed that they are missing the requirements idna and urllib3. Not sure how they mine their data. I also checked scipy, for which my crawler failed to extract the requirements. And there they also don't have the data at all.
As far as I know they collect their data from the wheels. Because wheels are already produced artifacts, they won't contain setup_requires, because that's setuptools specific for building the wheel. The same goes for tests_require, which upstream still wants to remove, but that issue has stalled.
I'm currently trying to understand what benefit supporting wheels would bring. For example scipy and tensorflow are currently unsupported by mach-nix. But I'm not sure if wheel support will help.
They both release manylinux{x} wheels. I've seen that manylinux wheels are supported since nixos 20.03. I tested using their wheels with buildPythonPackage. They build, but then fail during import because they link against libstdc++.so.6. This proposal here seems to target these linking issues, but it has been closed. Not sure why. If this could be accomplished, it would be a really nice thing for mach-nix.
In general I assume that, with the current state of nixpkgs, wheels which include pre-compiled binaries will most likely not work and therefore supporting wheels won't help for those python packages.
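Whether a wheel ships pre-compiled binaries can usually be read off its file name, which follows the pattern `{name}-{version}(-{build})?-{python tag}-{abi tag}-{platform tag}.whl`. The Python sketch below treats a platform tag of `any` as "pure python". The helper `parse_wheel_filename` is hypothetical, and the two file names are used only as illustrative examples of the naming scheme.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its tags and flag pure-python wheels."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # Five parts without a build tag, six with one.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel file name: {filename}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
        # Platform "any" means no platform-specific binaries are expected.
        "pure": platform_tag == "any",
    }


binary = parse_wheel_filename("tensorflow-2.1.0-cp37-cp37m-manylinux2010_x86_64.whl")
pure = parse_wheel_filename("entrypoints-0.3-py2.py3-none-any.whl")
print(binary["platform"], binary["pure"])
print(pure["platform"], pure["pure"])
```

A check like this could separate the manylinux wheels that run into linking problems from the pure-python wheels that should not have them.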
Apart from that, I'm aware that wheel is the current distribution standard. Therefore there are libraries out there which only release wheels and no sdist, even though they don't contain any binaries. For these libraries, wheel support in mach-nix would make them available. It would be interesting to know how high the number of these libraries actually is. I assume it's low. In nixpkgs 19.09 the number of libraries using wheels as an installation method is less than 10. But of course nixpkgs might not represent the general situation well. Maybe it's time for another pypi crawling session.
All in all, supporting wheels no longer sounds like as big an advantage as I originally thought.
Maybe it might be more beneficial to use nixpkgs itself as a provider for the resolver and take packages directly from there. Currently only sdist releases on pypi are considered. That makes tensorflow drop out which would actually be available in nixpkgs and could easily be included. Of course then only that specific version of tensorflow which is in nixpkgs would be available and as soon as you specify anything else in your requirements.txt, it will fail again.
Correct, they need to be patched with this method and additional libraries that are used need to be included as well.
Currently that count is very low, but given the new backends that number will go up. New packages I create (though they're private/work) seldom use setuptools. Also, there are already widely used packages that have such packages as dependencies. An example of such a dependency is the entrypoints package, which includes a generated setup.py for compatibility reasons and is used in packages such as flake8 and nbconvert.
Thanks so much! That's amazing! I now managed to build tensorflow from a wheel. And it's for sure much much faster than building from source. One of the problems of mach-nix currently is that build times can be very long. Should I consider using wheels by default wherever possible? Or are there any troubles ahead I'm not seeing right now?
As you've seen already, wheels are build artifacts, thus they do not list build-time dependencies. That's no problem if your users are fine using those pre-built wheels. In Nixpkgs we prefer source builds.
Then I should probably let the user decide if using sdist releases should be preferred/enforced.
Of course wheel support might blow up the dataset for the dependency graph. To reduce the size, can I make the following 2 assumptions?
1. Since my current dependency extraction uses setup.py and fails on anything else, I know that the build backend for all packages in my current database must be setuptools.
2. If a package's sdist release uses setuptools as build backend, the requirements specified via install_requires are exactly the requirements of the wheel release for this package on pypi.
If this is true, I don't need to store any dependency information for the wheel release if I already know the dependencies of the sdist.
Also, I would only need to download and analyze wheels for packages which either don't have an sdist or whose dependency extraction failed on their sdist.
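The selection rule described above can be written down as a short Python sketch. It only makes sense if the two assumptions hold. The `Package` record, its field names, and the sample entries are hypothetical; the requests, tensorflow, and scipy cases mirror the situations mentioned earlier in the thread.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Package:
    name: str
    has_sdist: bool
    has_wheel: bool
    # None means extraction from the sdist failed or was never possible.
    sdist_requirements: Optional[List[str]]


def needs_wheel_analysis(pkg):
    """True if the wheel must be downloaded to learn the package's requirements."""
    if not pkg.has_wheel:
        return False
    return (not pkg.has_sdist) or pkg.sdist_requirements is None


packages = [
    Package("requests", True, True, ["idna", "urllib3", "chardet", "certifi"]),
    Package("tensorflow", False, True, None),  # wheel-only release
    Package("scipy", True, True, None),        # sdist extraction failed
]

to_analyze = [p.name for p in packages if needs_wheel_analysis(p)]
print(to_analyze)
```

Packages whose sdist already yielded requirements are skipped, which is what keeps the extra crawling and the size of the dependency graph down.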