Meta attribute for binary packages

timokau · August 11, 2018, 10:21pm

We currently document the license of packages and let users disable unfree packages. Personally, I care far more about whether the package is compiled from source than about its license. But we currently have no way to disable binary packages, or even to tell whether a package is binary without reading its Nix expression.

Most of the time unfree and binary overlap, but not always. There was some discussion about this on IRC. Some points from there:

Making the distinction is not entirely straightforward. To remain practical, we would have to make exceptions for bootstrapping.
Maybe it would be worth generalizing this to a set of “anti-features” like F-Droid does.

vcunat · August 12, 2018, 7:57am

The distinction does seem difficult around bootstrapping. A binary tool would be considered “bad” and the same tool bootstrapped with the binary version would be OK? And other stuff compiled with the binary tool directly would be “bad” as well? EDIT: and not just bootstrapping – what if I download a PDF instead of building it from text-like files?

timokau · August 12, 2018, 9:06am

While it’s hard to come up with a set of rules in theory, I think the distinction isn’t that hard in practice. The bootstrapping exception could be limited to a small set of compilers. I think bootstrapping a compiler is very different from fetching some pre-compiled end-user application and putting it in the correct directories.

While I generally prefer to build PDFs etc. from source (mostly because I don’t trust the author to remember updating them every time they make a release), I think we should limit this to executable files.

vcunat · August 12, 2018, 10:29am

TL;DR: Yes, I’m over-engineering it. There’s a strong push to build packages from source anyway, and most of the other users probably don’t care, so there’s little use for any precise tracking and we could start with some rough approximations.

Intuitively, I would go for a property expressing if the build output contains “binaries” (i.e. machine code?) copied from some of its inputs. By default I’d expect you want to avoid those in runtime closures without minding build-time closures (of your system and user profiles), though to make this distinction we would most likely have to modify nix itself (which is probably not worth the work ATM).

Such a property might later become an attribute of each /nix/store path. The thing is that everything was ultimately built from source, and the important information is who made this transition. Using upstream’s binary is akin to using some other binary cache that is perhaps “less trusted” (in some sense) than binaries built on our farm or binaries that you build yourself.

timokau · August 12, 2018, 11:15am

Yes, pretty much. Basically just ask yourself “would a Gentoo user want to install this package?” Although I don’t think we should allow anything in the build-time closure to be “binary”.

Yes, for me a big bonus of a distribution is that I’m centralizing my trust. I trust one build cluster instead of trusting 500 different package maintainers and their distribution mechanisms. Also, it is possible to disable binary substitutions and use Nix as a sort of “Gentoo with benefits”.

Edit: Also, in the future there is the possibility of reproducible builds verified by multiple sources.

zimbatm · August 12, 2018, 12:30pm

@timokau what is the motivation for disabling binary packages only? Is it to not allow users to install applications on the machine?

timokau · August 12, 2018, 1:09pm

Same as for allowUnfree. The user might not want binary packages and currently it is easy to accidentally install them.

7c6f434c · August 12, 2018, 1:48pm

Yes, for me a big bonus of a distribution is that I’m centralizing my trust. I trust one build cluster instead of trusting 500 different package maintainers and their distribution mechanisms. Also, it is possible to disable binary substitutions and use Nix as a sort of “Gentoo with benefits”.

«Yes, but» — please remember that we as a project generally trust the apparent upstream source distribution mechanism anyway, so in terms of upstream compromise risks you can only cut the problem in half.

7c6f434c · August 12, 2018, 1:50pm

I would hope that the question that flag would answer is «do we understand how to apply a patch if desired».

(Of course, drawing the line is annoying in some cases — what about Java packages built from source with dependencies fetched as JARs…)

Ekleog · August 12, 2018, 2:22pm

Idea: split dependencies between “binary dependencies” and “source dependencies.”

A binary dependency is a dependency which is used as a binary. A source dependency is a dependency which is only used as source code.

With this split, for instance (not complete, just for explanation):

gcc has as source-dep gcc-source and as binary-dep gcc-bootstrap-stuff (the binaries-downloaded-from-upstream)
tree has as source-dep tree-source and as binary-deps gcc and coreutils
firefox-bin has as binary-deps firefox-bin-source and coreutils

With this distinction made, it becomes possible, for a package, to backtrack all its transitive dependencies in order to check whether the binary-deps are trusted or not.

This would make Nix more robust against trusting-trust attacks than pretty much any other tool I know of, because it would let one say “OK, I’m allowing only gcc-bootstrap as binary-dep,” and be guaranteed not to be able to install anything else. Thus, if one wants rustc, they’d have to bootstrap it via mrustc. And why not set nasm as the only allowed binary-dep, and build gcc-bootstrap from some chain of compilers?

Obviously, the issue of trusting the nix executable itself, as well as the underlying OS / CPU microcode / hardware remains, but at least the build chain could be as clean as possible, while not restricting uses for other people.

Now, as for how to technically do this… I’d say maybe just add meta.binaryDeps that would include all dependencies used as binaries? I think this would not trigger any issues with cross-compilation, as anyway whether one trusts a binary should be independent of whether it’s on arch X or Y, but I’m not sure, and maybe this would require the whole binaryBuildInputs, binaryNativeBuildInputs, etc. series?
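A minimal sketch of the backtracking idea, in Python rather than Nix for brevity. The package graph extends the three examples above; the coreutils entry, the `*-source` leaves, the `Pkg` type and the `binary_roots` / `installable` names are all assumptions made for illustration and do not exist in nixpkgs. It reads “trusted binary-deps” as: every leaf (something fetched rather than built) that is reached through a binary-dep edge must be on the allow list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pkg:
    source_deps: tuple = ()   # used only as source code
    binary_deps: tuple = ()   # executed or linked as binaries


# Hypothetical graph; leaves (no deps at all) stand for fetched inputs.
PKGS = {
    "gcc-source": Pkg(),
    "gcc-bootstrap-stuff": Pkg(),
    "gcc": Pkg(("gcc-source",), ("gcc-bootstrap-stuff",)),
    "coreutils-source": Pkg(),
    "coreutils": Pkg(("coreutils-source",), ("gcc",)),
    "tree-source": Pkg(),
    "tree": Pkg(("tree-source",), ("gcc", "coreutils")),
    "firefox-bin-source": Pkg(),
    "firefox-bin": Pkg((), ("firefox-bin-source", "coreutils")),
}


def is_leaf(name, pkgs=PKGS):
    pkg = pkgs[name]
    return not pkg.source_deps and not pkg.binary_deps


def binary_roots(name, pkgs=PKGS):
    """Collect every fetched leaf that is used as a binary in the closure."""
    roots, seen, stack = set(), set(), [name]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        pkg = pkgs[current]
        # A leaf only counts when it is reached through a binary-dep edge.
        roots.update(dep for dep in pkg.binary_deps if is_leaf(dep, pkgs))
        stack.extend(pkg.source_deps + pkg.binary_deps)
    return roots


def installable(name, allowed, pkgs=PKGS):
    """True when all upstream binaries in the closure are on the allow list."""
    return binary_roots(name, pkgs) <= set(allowed)
```

With only `gcc-bootstrap-stuff` allowed, `installable("tree", {"gcc-bootstrap-stuff"})` holds while `installable("firefox-bin", {"gcc-bootstrap-stuff"})` does not, because `firefox-bin-source` is itself an upstream binary.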

timokau · August 13, 2018, 12:09am

Yes, but the problem is still less when building from source. When building from source, you can inspect the source or at least hope that somebody does it. When building a binary, you can’t.

That’s also a good metric. I think the question should be relatively straightforward in most cases. Just some edge cases with compilers.

In that case I’d say that the dependencies are binary packages. Therefore the package shouldn’t install if a user set enableBinary = false.

Would that be any different from a meta.binary attribute that “infects” dependencies? A way to selectively opt-out would be nice anyways. For example I’d like to explicitly whitelist unfree packages.

Why would gcc and coreutils be binary packages, assuming they were built from source (excluding gcc bootstrap)?

Yes, that would be awesome. Also maybe not very practical as bootstrapping would take forever when somebody changes something at the root of the tree.

Ekleog · August 13, 2018, 2:43am

Hmm… My initial idea was to tag how packages are used, rather than how packages are made. The aim being to support things like using glib as a dependency, while only relying on the headers part of it. glib could be a binary-only dependency, it wouldn’t matter much, as only source code is used from it. This also answers why gcc and coreutils would be binary dependencies: they are used as binaries.

Now, upon giving it a bit more thought, I’m no longer sure this is really useful. I guess I missed the “infection” part of your proposal, when writing this message!

Indeed, so with the infecting meta.binary attribute, I guess one option would be to have a nixpkgs configuration option allowBinary: derivation -> bool, that would allow (or not) installing a binary by doing something like all (d: !(isBinary d) || allowBinary d) transitiveDeps on the transitive dependencies of each to-be-installed package?
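One possible reading of that check, sketched in Python. The `DERIVATIONS` table, its package names and the helper names are hypothetical; the condition used is that every derivation in the transitive closure is either not binary or explicitly allowed by the predicate.

```python
# Hypothetical derivations: name -> meta.binary flag and direct dependencies.
DERIVATIONS = {
    "glibc":       {"binary": False, "deps": []},
    "hello":       {"binary": False, "deps": ["glibc"]},
    "blob-driver": {"binary": True,  "deps": ["glibc"]},
    "game":        {"binary": False, "deps": ["blob-driver", "glibc"]},
}


def transitive_deps(name, drvs=DERIVATIONS):
    """Return the derivation itself plus everything reachable through deps."""
    seen, stack = [], [name]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(drvs[current]["deps"])
    return seen


def is_binary(name, drvs=DERIVATIONS):
    return drvs[name]["binary"]


def can_install(name, allow_binary, drvs=DERIVATIONS):
    """A package installs only if each binary in its closure is allowed."""
    return all(
        not is_binary(dep, drvs) or allow_binary(dep)
        for dep in transitive_deps(name, drvs)
    )
```

Here `can_install("game", lambda d: False)` is rejected because of `blob-driver`, while passing `lambda d: d == "blob-driver"` as the predicate lets it through.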

Indeed, I’m not really suggesting we turn this on by default on hydra (although this may be something that works, so long as the root of the tree is “carved in stone” – it shouldn’t require many changes anyway, so maybe the cost would be palatable?). However, it may make sense for paranoid-enough individuals.

Now comes the question of how to migrate from the current “don’t tag anything” to tagging all derivations as either binary or built-from-source. I think that, for safety, meta.binary should default to true. But requiring a change to all derivations in nixpkgs sounds impossible.

So maybe just set meta.binary that defaults to true for the fetching builtins? With this, all packages that come from outside would be considered as binary by default, and derivations built from other derivations would default to non-binary, which seems to make sense. It would still require quite a bit of work, though.

7c6f434c · August 13, 2018, 6:22am

When building from source, you can inspect the source or at least hope that somebody does it. When building a binary, you can’t.

You can inspect the source; in many cases you cannot realistically hope someone else does it. Also, with modern optimisers there are some cases where inspecting binaries via disassembly gives more confidence than inspecting source (of course, inspecting disassembly and syscall traces is only practical for local inspection, but reading the entire source code is also not always practical)…

7c6f434c:

(Of course, drawing the line is annoying in some cases — what about Java packages built from source with dependencies fetched as JARs…)

In that case I’d say that the dependencies are binary packages. Therefore the package shouldn’t install if a user set enableBinary = false.

If there are separate dependencies. If they are fetched inside the main tarball, that’s more annoying to discover.

(Also, is a fully-source-based build of LibreOffice feasible for us?)

Why would gcc and coreutils be binary packages, assuming they were built from source (excluding gcc bootstrap)?

Not binary by origin, but used-as-binary.

Another approach is to say that some output of a binary package might still be source-only while the main output is binary.

Yes, that would be awesome. Also maybe not very practical as bootstrapping would take forever when somebody changes something at the root of the tree.

The very lack of practicality in the bootstrap should be enough to reduce the deep changes!

Ekleog · August 13, 2018, 12:53pm

I think a difference is that it is considerably easier (not saying it’s easy at all!) to check for a backdoor in a source-based package than in a binary-based package. This means an attacker takes more risk by putting a backdoor in the source code than in a prebuilt binary, if only because a would-be contributor might stumble on the backdoor by pure chance.

Then, we do agree it’s not a silver bullet, just a step.

Hmm… do you know if things like that happen often? I haven’t met with any in packages I’m packaging yet.

If it happens rarely enough, it would be possible to have the main tarball’s fixed-output (FO) derivation be meta.binary = true, then have a derivation that extracts only the source from that tarball with meta.binary = false, which would stop the backtracking. Or, as you say, to split the meta.binary attribute along the different outputs (which would have the nice property of allowing a pre-built documentation PDF to be marked as meta.binary while keeping the source code output non-binary).

If it happens often, maybe that would mean my idea of binary-by-destination would be better after all? But I fear it would lead to a much slower propagation of the binary-related flags.
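A rough Python sketch of how an explicit meta.binary = false could stop the backtracking. The three-state flag (`True`, `False`, or `None` for unset), the derivation names and the `infected` helper are assumptions for illustration only; per-output flags are not modelled here.

```python
# Hypothetical derivations. "binary" is True/False when set explicitly in
# meta, or None when unset, in which case it is derived from the deps.
DERIVATIONS = {
    # Fixed-output fetch of a tarball that mixes sources and prebuilt JARs.
    "bundle-tarball": {"binary": True, "deps": []},
    # Extracts only the source tree and is explicitly declared clean.
    "bundle-source": {"binary": False, "deps": ["bundle-tarball"]},
    # Built from the extracted sources: nothing set, so it inherits.
    "app": {"binary": None, "deps": ["bundle-source"]},
    # Uses the tarball directly: nothing set, so it gets infected.
    "app-prebuilt": {"binary": None, "deps": ["bundle-tarball"]},
}


def infected(name, drvs=DERIVATIONS):
    """Effective binary status; an explicit flag ends the backtracking."""
    flag = drvs[name]["binary"]
    if flag is not None:
        return flag
    # Unset: binary as soon as any dependency is (graph assumed acyclic).
    return any(infected(dep, drvs) for dep in drvs[name]["deps"])
```

In this model `app` comes out as non-binary because the search stops at `bundle-source`, whereas `app-prebuilt` is still flagged.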

7c6f434c · August 13, 2018, 3:25pm

7c6f434c:

If there are separate dependencies. If they are fetched inside the main tarball, that’s more annoying to discover.

Hmm… do you know if things like that happen often? I haven’t met with any in packages I’m packaging yet.

Hopefully that’s rare outside specific ecosystems (I would expect more of that in Java world) and maybe monster packages (LibreOffice happens to fetch the dependencies separately, but TDF does host binaries of some of LO dependencies — you guessed it, Java ones).

Also, some compilers like CCL ship sources pre-wrapped with a bootstrap compiler.

On the other hand, it’s easy to miss such things if they are not monstrous (but I hope small packages just don’t do that).

timokau · August 13, 2018, 4:44pm

[quote="Ekleog, post:12, topic:657"]

Hmm… My initial idea was to tag how packages are used, rather than how packages are made. The aim being to support things like using glib as a dependency, while only relying on the headers part of it. glib could be a binary-only dependency, it wouldn’t matter much, as only source code is used from it. This also answers why gcc and coreutils would be binary dependencies: they are used as binaries.
[/quote]

Is that distinction common enough to warrant the difference?

[quote="Ekleog, post:12, topic:657"]
Now, upon giving it a bit more thought, I’m no longer sure this is really useful. I guess I missed the “infection” part of your proposal, when writing this message!
[/quote]

I think I didn’t make that explicit. But that’s the same way unfree packages currently work: if allowUnfree is false, I can’t build anything that has an unfree package anywhere in its closure.

[quote="Ekleog, post:12, topic:657"]

Indeed, so with the infecting meta.binary attribute, I guess one option would be to have a nixpkgs configuration option allowBinary: derivation -> bool, that would allow (or not) installing a binary by doing something like all (d: !(isBinary d) || allowBinary d) transitiveDeps on the transitive dependencies of each to-be-installed package?
[/quote]

I’m guessing (no clue how it actually works) the current allowUnfree mechanism could be re-used. But for whitelisting, such a function would be awesome (also for allowUnfree). Probably best to give a default that just checks a simple whitelist (allowed-binary = [pkgs.not-a-virus pkgs.foobar]) though.
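A small Python sketch of that default, loosely modelled on how allowUnfree can be paired with a predicate. The option names `allowBinaryPredicate` and `allowedBinary` are hypothetical and are not existing nixpkgs settings.

```python
def resolve_allow_binary(config):
    """Pick the predicate that decides whether a binary package is allowed.

    A user-supplied function wins; otherwise fall back to a plain whitelist
    of package names, which is empty (deny everything) when unset.
    """
    if "allowBinaryPredicate" in config:
        return config["allowBinaryPredicate"]
    allowed = frozenset(config.get("allowedBinary", []))
    return lambda name: name in allowed


# Simple case: just list the packages.
simple = resolve_allow_binary({"allowedBinary": ["not-a-virus", "foobar"]})

# Advanced case: arbitrary logic, here "anything whose name ends in -bin".
custom = resolve_allow_binary(
    {"allowBinaryPredicate": lambda name: name.endswith("-bin")}
)
```

The whitelist covers the common case without asking users to write a function, and the function form stays available for anything more elaborate.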

[quote="Ekleog, post:12, topic:657"]

Indeed, I’m not really suggesting we turn this on by default on hydra (although this may be something that works, so long as the root of the tree is “carved in stone” – it shouldn’t require many changes anyway, so maybe the cost would be palatable?). However, it may make sense for paranoid-enough individuals.
[/quote]

Yes, it might be; we wouldn’t know before trying. It would certainly be awesome to have the possibility. A great effort too, though.

[quote="Ekleog, post:12, topic:657"]
Now comes the question of how to migrate from the current “don’t tag anything” to tagging all derivations as either binary or built-from-source. I think that, for safety, meta.binary should default to true. But requiring a change to all derivations in nixpkgs sounds impossible.
[/quote]

I think, given that the vast majority of nixpkgs should be built from source right now, just defaulting to false and gradually marking the binary packages is more practical. It won’t be reliable right away, but it would be minimally invasive, better than the current state and “eventually consistent”.

[quote="Ekleog, post:12, topic:657"]
So maybe just set meta.binary that defaults to true for the fetching builtins? With this, all packages that come from outside would be considered as binary by default, and derivations built from other derivations would default to non-binary, which seems to make sense. It would still require quite a bit of work, though.
[/quote]

Don’t basically all derivations fetch their src from outside? Packages using fetchFromVCS should normally be built from source though.

timokau · August 13, 2018, 4:49pm

It’s about the possibility. Many-eyes has its flaws, but “maybe-one-or-two-eyes” is better than “definitely-no-eye-at-all”. It is also better for debugging, patching, understanding the software’s behaviour…

So source and binary dependencies in the same tarball? That should probably be avoided anyways. Java…

Bootstrapping everything from nasm is a very extreme version of my proposal and (while cool) not really what I’m suggesting.

timokau · August 13, 2018, 4:52pm

Then, we do agree it’s not a silver bullet, just a step.

The perfect summary of my arguments

zimbatm · August 13, 2018, 5:46pm

I’m skipping over a lot of the discussion because I am lazy today

My recommendations:

Agree on a meta attribute to mark the binary packages. This is generally interesting information.
Make a PR to convert nixpkgs and post it to this thread. It has to include documentation and review guide updates.
Profit!

pkgs/stdenv/generic/check-meta.nix has a generic predicate system to allow/disallow derivations. It’s not that hard to write your own (drv.meta.hasBinary or false) predicate.

stdenv.mkDerivation also has a mechanism to inherit meta attributes from the src so the attribute could be pushed even further if we have a special binary fetcher.
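To make the two points above concrete, here is a hedged Python analogue. `has_binary` mirrors the Nix expression `drv.meta.hasBinary or false`; `fetch_binary`, `fetch_source` and this simplified `mk_derivation` are hypothetical stand-ins, and the inheritance rule is only a guess at how such a scheme could behave, not a description of what stdenv.mkDerivation actually does.

```python
def has_binary(drv):
    """Analogue of `drv.meta.hasBinary or false`: missing means False."""
    return drv.get("meta", {}).get("hasBinary", False)


def fetch_source(name):
    """Ordinary fetcher: says nothing about binaries."""
    return {"name": name, "meta": {}}


def fetch_binary(name):
    """Hypothetical special fetcher that tags whatever it downloads."""
    return {"name": name, "meta": {"hasBinary": True}}


def mk_derivation(name, src, meta=None):
    """Build a derivation record, inheriting hasBinary from its src.

    An explicit value in `meta` overrides the inherited one.
    """
    merged = {}
    if has_binary(src):
        merged["hasBinary"] = True
    merged.update(meta or {})
    return {"name": name, "src": src, "meta": merged}


hello = mk_derivation("hello", fetch_source("hello.tar.gz"))
firefox_bin = mk_derivation("firefox-bin", fetch_binary("firefox.tar.bz2"))
```

With a tagging fetcher, packagers would not need to remember the attribute by hand; `has_binary(firefox_bin)` is true simply because of how its src was fetched.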

samueldr · August 13, 2018, 6:03pm

This can be done even if nothing is done yet with the information; the information can be tagged right now, and later consumed.