Could we have implemented multi-output packages better?

doronbehar · April 7, 2020, 8:39pm

After quite some time contributing packages and fixes to Nixpkgs, I’ve come to the conclusion that multiple outputs are not implemented well enough. I’d like to hear some opinions and experiences around this. Here’s my own:

I once tried to package a library that installs CMake files, headers, and shared objects. The CMake files referenced the full paths of the headers and the shared objects. From the perspective of the library’s Makefile, it installs everything to the prefix ($out). However, if you set outputs, then after all the files are installed, the fixupPhase moves the headers to $dev, the libraries to $lib, etc.

How could our package know that it actually installs its files to different prefixes? I’m just curious what the expectation behind this idea was.

When I tried to use that library, with its outputs split, as a dependency of another package, I found that the package didn’t find the library’s CMake files, which are located at $out/share/<lib-name>/cmake rather than at $out/lib/cmake, where multiple-outputs.sh expects them. Even when I managed to tell the package exactly where to find the library’s CMake files, the build still failed because the CMake files pointed to where the headers were originally installed: the original $out rather than ${<lib-name>.dev}.

Today I tried to slightly improve the closure size of a package that splits outputs, and I got hit by this error, which I can’t even explain.

Adding to the frustration, these shell errors are hard to debug: they happen at the end of the build, and since there is no context, you have no idea which shell function exactly failed.

If I were to design this feature for our ecosystem from the ground up, I would do this:

Set the same value for $dev, $out, $lib, etc. for all packages.
Teach Nix itself to distinguish between outputs, for example:
- $out/include should be downloaded only if you want to build something with this library.
- $out/share/man should be downloaded only if you use this package directly.
Make Nix itself download only the needed files from cache.nixos.org, according to the context of the request:
- If it’s nix-env -iA or nixos-rebuild, download only the outputs that are actually needed.
- If it’s nix build or nix-shell, download everything unless told otherwise.
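To make the proposal a bit more concrete, here is a minimal Python sketch of such a context-dependent download filter. It is purely hypothetical: Nix has no per-file download mode, and the names `files_to_fetch`, `EXCLUDED_FOR_INSTALL`, and `INSTALL_COMMANDS` are made up for illustration. It assumes one store path is described as a list of relative file paths.

```python
# Hypothetical sketch only: Nix does NOT implement this behaviour.
# Path prefixes (relative to a store path) that would be skipped for
# dependencies pulled in by install-style commands.
EXCLUDED_FOR_INSTALL = ("include/", "share/man/")

# Commands treated as "install-style" in this sketch.
INSTALL_COMMANDS = {"nix-env", "nixos-rebuild"}


def files_to_fetch(files, command, requested_directly):
    """Return the subset of `files` (relative paths in one store path) to download.

    - build-style commands (e.g. nix-build, nix-shell) fetch everything;
    - packages the user asked for directly are fetched in full;
    - other references fetched by install-style commands skip the excluded prefixes.
    """
    if command not in INSTALL_COMMANDS or requested_directly:
        return list(files)
    return [f for f in files if not f.startswith(EXCLUDED_FOR_INSTALL)]


files = [
    "lib/libfoo.so",
    "include/foo.h",
    "share/man/man3/foo.3",
    "share/foo/cmake/FooConfig.cmake",
]
# A dependency pulled in by nixos-rebuild: headers and man pages are skipped.
print(files_to_fetch(files, "nixos-rebuild", requested_directly=False))
# nix-build (or a package the user asked for directly) gets everything.
print(files_to_fetch(files, "nix-build", requested_directly=False))
```

The hard part is not the filter itself but deciding, from store-path references alone, which dependencies count as "requested directly" and which files a dependent actually uses.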

I believe this approach could bring closure size reductions we have never dreamed of.

Is there an implementation problem I’m not seeing here?

jtojnar · April 8, 2020, 1:18am

We do not actually move libraries to the lib output (search the setup hook for moveToOutput) because that would cause precisely the incorrect references you mention. Instead, we pass the proper --libdir (-DCMAKE_INSTALL_LIBDIR for CMake packages; see NixOS/nixpkgs pull request #52859, “cmake: use multiple outputs for GNUInstallDirs” by jtojnar) and rely on the projects’ build scripts to work properly. Unfortunately, writing CMakeLists.txt is so hard that almost no one in the world can do it correctly, so CMake projects are full of incorrect references.

It is generally fine to move headers, as they are rarely referenced by absolute path. The main exceptions are pkg-config files, which we patch automatically, and CMake modules, which should just work as mentioned in the first paragraph.

As such, if you see an incorrect reference, it will be caused by one of the following:

The project uses some non-standard build system flags (i.e. not GNUInstallDirs in CMake, or not the Autotools-style flags of a ./configure script). Then we should set cmakeFlags, configureFlags, or makeFlags appropriately.
The project uses those flags, but they have some issues (e.g. expecting only relative paths in GNUInstallDirs variables, or not joining paths properly because CMake ~~is terrible~~ does not support it). Then we should open an issue or send a patch upstream.
It is a Meson project and it does not handle multiple prefixes correctly. Then we can try sending a patch upstream, but Meson officially forbids multiple prefixes, so projects are not wrong to reject our patches. We need to lobby for Meson to revert this weird decision, but we might need to carry in-tree patches until then.
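The path-joining problem from the second point can be illustrated with a small Python model (not CMake code) of two ways a build script might combine the install prefix with a GNUInstallDirs variable. It assumes the variable holds an absolute path into another output, and the store paths below are made up.

```python
import posixpath


def naive_join(prefix: str, install_dir: str) -> str:
    # Mirrors a common CMakeLists.txt pattern:
    #   "${CMAKE_INSTALL_PREFIX}/${CMAKE_INSTALL_INCLUDEDIR}"
    # which silently assumes install_dir is relative.
    return prefix + "/" + install_dir


def full_dir(prefix: str, install_dir: str) -> str:
    # Leaves an absolute install_dir untouched and only prepends the prefix
    # to relative ones, like GNUInstallDirs does for CMAKE_INSTALL_FULL_<dir>.
    if posixpath.isabs(install_dir):
        return install_dir
    return posixpath.join(prefix, install_dir)


out = "/nix/store/aaaa-foo-1.0"                      # hypothetical $out
includedir = "/nix/store/bbbb-foo-1.0-dev/include"   # hypothetical path into $dev

print(naive_join(out, includedir))        # /nix/store/aaaa-foo-1.0//nix/store/bbbb-foo-1.0-dev/include
print(full_dir(out, includedir))          # /nix/store/bbbb-foo-1.0-dev/include
print(full_dir("/usr/local", "include"))  # /usr/local/include
```

In CMake terms, using CMAKE_INSTALL_FULL_<dir> instead of gluing the prefix onto CMAKE_INSTALL_<dir> by hand avoids the broken first result.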

Your design sounds nice, but most projects that work outside of the FHS already handle multiple outputs properly. As such, it would make Nix much, much more complex and harder to understand for little benefit:

Currently, it is immediately clear which output something is in, and you can consider the store paths atomic for copying, etc. That would not be possible with virtual outputs merged into a single store path.
You could also no longer have the same file with different contents in different outputs (useful for signalling files like nix-support/propagated-build-inputs).
Currently, Nix is rather stupid (which is a good thing) and simply tracks all referenced store paths. If you wanted to handle the virtual prefixes, you would need to somehow determine the use case of each reference. That might not even be possible without complex source code analysis: which paths would you depend on, given source code like join_path(LIBDIR, "my-app", "libfoo.so")?
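Here is a rough Python model of that reference tracking. As far as I know, Nix scans build outputs for the hash parts of candidate store paths, so a recorded reference always points at a whole store path. The sketch ignores how candidates are chosen and how the real scanner is implemented; the store paths and the "binary" are made up, with the binary mimicking the join_path(LIBDIR, "my-app", "libfoo.so") example.

```python
import posixpath


def hash_part(store_path: str) -> str:
    # "/nix/store/<hash>-<name>" -> "<hash>"
    return posixpath.basename(store_path).split("-", 1)[0]


def scan_references(contents: bytes, candidates: list[str]) -> set[str]:
    """Return the candidate store paths whose hash part occurs anywhere in `contents`."""
    return {p for p in candidates if hash_part(p).encode() in contents}


# Made-up store paths (real hashes are 32 characters of Nix's base-32 alphabet).
foo = "/nix/store/0123456789abcdfghijklmnpqrsvwxyz-foo-1.0"
bar = "/nix/store/zyxwvsrqpnmlkjihgfdcba9876543210-bar-2.0"

# A compiled binary that stores LIBDIR and the other path components separately,
# as join_path(LIBDIR, "my-app", "libfoo.so") would require.
binary = b"\x00" + foo.encode() + b"/lib\x00my-app\x00libfoo.so\x00"

# Only foo is found, and only as a whole store path: nothing in the bytes
# says which file under it (libfoo.so, headers, man pages) is actually used.
print(scan_references(binary, [foo, bar]))
```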

doronbehar · April 8, 2020, 9:08am

@jtojnar, I think that because you are somewhat experienced with the build systems themselves, you imagine that with ideal usage of every build system, we could package every project with multiple outputs using nothing more than outputs = [..];.

I’m not idealistic: I expect projects to have the worst possible support for every standard.

Could you please explain what you mean by “projects that work outside of FHS”?

I too think that store paths should almost always be considered atomic. I was only suggesting, as a start, not downloading $out/share/man and $out/include for store paths that are downloaded by nix-env and nixos-rebuild. For someone using nix-build, they should naturally be downloaded.

Hmm, that sounds like an edge case where my design definitely needs more thinking. Do you have an example of this?

I’m not sure I understand what you mean by “use case for each reference” and “virtual prefixes”. Perhaps I could have explained my idea better:

It doesn’t necessarily relate to multiple outputs. I’m only suggesting introducing a super simple behaviour to Nix: when nixos-rebuild or nix-env is being used, don’t download the references’ $out/include and $out/share/man (this is a mere example, and it should be configurable). If it’s a package the user refers to directly (be it in configuration.nix or via nix-env), all paths should be downloaded.

The only relation this idea has to multiple outputs is that having this ability at hand would make multiple outputs seem pointless, as users could always ask Nix to garbage-collect, or not download, paths of their choice.

jtojnar · April 8, 2020, 11:33am

Of course, projects are often less than ideal, but we should strive to make the world better and more portable. We can open issues, create patches or, in the worst case, use a single output for such a broken package. Fortunately, in the majority of cases it just works.

Some projects still assume that every system has /usr/local and that stuff can just be copied there. I encourage everyone introducing new packages to fix incorrect assumptions and send patches upstream. Upstreams are usually pretty receptive; see a few recent examples of improvements we spearheaded: kissfft patch, smartdns issue, xow issue…

Yeah, I am mostly talking about the binary cache use case here; for building, we currently need everything anyway because the outputs do not exist yet at build time. But from the cache’s point of view, the store paths are either atomic, or you divide a path into overlapping virtual outputs (which are atomic too, just at the file level rather than the store path level). Then you need to come up with a method to decide what depends on what at the file level, which I think would be much more complex than working at the store path level (see the example I mentioned).

It is useful for things like Python bindings, which you want to split away from the main outputs but for which you might still need to propagate some dependencies. I just pushed an example: NixOS/nixpkgs commit af47659, “python3.pkgs.libmodulemd: init bindings”.


Well, there is no difference between what nixos-rebuild, nix-env, and nix-build do. In fact, nixos-rebuild calls nix-build internally, and nix-env calls the same functions as nix-build. nix-build produces a store path, and nixos-rebuild/nix-env just create symlinks to it in appropriate locations. Every output (store path) is created by building some derivation, and the build result can be cached so that users do not have to run the builds themselves.

Currently, the behaviour is extremely simple: the run-time closure of an output (the things you need to download) is determined by which store paths are referenced in that output. In order to use your path heuristics, you would need to somehow track which use case you are serving: the daemon would need to know “should this command from nix build obtain just the library, or the headers as well?”. And I haven’t even mentioned the trouble of finding out whether we need just the library, or possibly also the GUI app in /bin. Garbage collection would become a nightmare as well.
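The store-path-level behaviour can be modelled as a plain graph walk: once each store path’s references are known, what needs to be downloaded (and what garbage collection must keep) is everything reachable from the roots. A minimal Python sketch, with made-up short names standing in for store paths and a hypothetical reference graph:

```python
from collections import deque


def runtime_closure(roots, references):
    """Return every store path reachable from `roots` by following `references`."""
    seen = set()
    queue = deque(roots)
    while queue:
        path = queue.popleft()
        if path in seen:
            continue
        seen.add(path)
        queue.extend(references.get(path, ()))
    return seen


# Hypothetical reference graph for a library split into out and dev outputs.
references = {
    "app": ["libfoo"],         # the app's binary only mentions libfoo's $out
    "libfoo-dev": ["libfoo"],  # headers and .pc files point back at $out
    "libfoo": ["glibc"],
    "glibc": [],
}

print(sorted(runtime_closure(["app"], references)))         # ['app', 'glibc', 'libfoo']
print(sorted(runtime_closure(["libfoo-dev"], references)))  # ['glibc', 'libfoo', 'libfoo-dev']
```

Because nothing in app refers to libfoo-dev, the dev output never enters the app’s closure; that is the whole saving multiple outputs provide, with no per-file or per-command knowledge required.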

It is hard to imagine a system more elegant or reliable than what Nix currently does for dependency resolution. The few projects where it fails are annoying, but even from a pragmatic point of view, fixing them is still the more reasonable solution in my opinion.

doronbehar · April 24, 2020, 8:02am

Just to note more examples of multiple outputs causing issues:

https://github.com/NixOS/nixpkgs/pull/73940#issuecomment-618859434

https://github.com/NixOS/nixpkgs/issues/65325

https://github.com/NixOS/nix/issues/3538

jtojnar · April 30, 2020, 8:37pm

After encountering NixOS/nixpkgs pull request #84117 (“yuzu: init at 482” by IvarWithoutBones), I rage-wrote jtojnar/cmake-snips on GitHub: a collection of portability problems I frequently encounter in projects using CMake.

deubeuliou · June 2, 2020, 10:11am

I witnessed a variant of this too: some projects install their CMake package config file under lib/<name>/cmake/, and this isn’t supported either. My understanding, however, is that this is not a design problem and can definitely be fixed by following the same rules as the search procedure described in CMake’s documentation.
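Both locations mentioned in this thread, share/<name>/cmake and lib/<name>/cmake, appear among the directories CMake itself searches under a prefix. Below is a rough Python model of a simplified subset of the Unix entries in the find_package() config-mode search procedure, based on my reading of the CMake docs rather than CMake’s actual code. The lib/<arch> alternative and the Windows/Apple entries are omitted, and `is_searched` is a made-up helper name.

```python
from fnmatch import fnmatchcase

# Simplified subset of the Unix entries in CMake's documented
# find_package() config-mode search procedure, relative to one prefix.
ENTRIES = [
    ("{libdir}", "cmake", "{name}*"),
    ("{libdir}", "{name}*"),
    ("{libdir}", "{name}*", "cmake"),
    ("{libdir}", "{name}*", "CMake"),
]
LIBDIRS = ("lib*", "share")


def _component_matches(part: str, raw: str, libdir: str, name: str) -> bool:
    glob = raw.format(libdir=libdir, name=name)
    if "{name}" in raw:
        # CMake documents <name> as case-insensitive.
        return fnmatchcase(part.lower(), glob.lower())
    return fnmatchcase(part, glob)


def is_searched(relative_dir: str, name: str) -> bool:
    """True if <prefix>/<relative_dir> matches one of the simplified entries."""
    parts = [p for p in relative_dir.split("/") if p]
    for libdir in LIBDIRS:
        for entry in ENTRIES:
            # Match component by component so "*" never spans a "/".
            if len(entry) == len(parts) and all(
                _component_matches(part, raw, libdir, name)
                for part, raw in zip(parts, entry)
            ):
                return True
    return False


# Expected: lib/cmake/Foo True, share/foo/cmake True, lib/foo/cmake True, include/foo False
for d in ["lib/cmake/Foo", "share/foo/cmake", "lib/foo/cmake", "include/foo"]:
    print(d, is_searched(d, "Foo"))
```

A setup hook that only looks in lib/cmake covers just the first entry, which is why the other two layouts are missed.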

I am a complete newbie regarding nixpkgs, but I have slightly more experience with Yocto, and compared to the latter, I was surprised that multi-output was not the default: in Yocto, packages are automatically split following rules that are probably similar to those in nixpkgs (you can, of course, manually tweak the split when needed).

What about making multi-output the default (with at least “out” and “dev”), with the option to opt out when it does not work? It would not fix any issue per se, but it could encourage better packaging practices and therefore reduce closure sizes (note that since I’m new to Nix/nixpkgs, I don’t know the current state of affairs). That would surely be a lot of work, though. Along the same lines, why not make separateDebugInfo the default?