Right now, many Nixpkgs packages will take a package licensed under Apache-2.0 OR MIT and assign meta.license = with lib.licenses; [ asl20 mit ].
This is incorrect: the reason that meta.license accepts a list is to mention the licenses for different parts of the same package.
The correct way to specify such a license is meta.license = with lib.licenses; OR [ asl20 mit ], which uses the lib.licenses.OR function. This feature was merged into nixpkgs a few months ago, so it is not widely adopted yet.
After seeing code in nixpkgs like this:
meta.license = with lib.licenses; [
  asl20 # or
  mit
];
I think something needs to be done to remedy the issue.
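To make the distinction concrete, here is a side-by-side sketch of the two cases. It assumes lib.licenses.OR is called exactly as written in this post; the attribute names dualLicensed and mixedFiles are made up for illustration and are not real packages.

```nix
{ lib }:
{
  # Upstream offers a choice ("Apache-2.0 OR MIT"): use the OR operator.
  dualLicensed.meta.license = with lib.licenses; OR [ asl20 mit ];

  # Different files of the same package are under different licenses:
  # a plain list is the right tool here.
  mixedFiles.meta.license = with lib.licenses; [ asl20 mit ];
}
```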
Proposal
I have an example documentation change in my personal nixpkgs fork, which properly documents the use of these operators. Going through the entire nixpkgs repository is difficult, as each one of these packages must be manually checked with upstream. For example, there are packages which actually include some files licensed under Apache-2.0 and other files licensed under MIT, despite the fact that those two licenses are the most common use of OR.
While this is important, I don’t think a tracking issue makes sense, as every package with more than one license listed in meta.license is a candidate. It’s also not really urgent, as the failure case is that we treat a package as if it had a more restrictive license, which is fine.
Instead, please check the packages that you personally maintain to see if their license is actually an SPDX compound license expression, and if so, use the lib.licenses.AND, OR, WITH, or PLUS operators.
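If you want a quick way to find candidates among your own packages, something along these lines might help. It is a minimal sketch in plain Nix that runs against a toy package set; isCandidate and toyPkgs are hypothetical names, and running it over real nixpkgs attributes would likely need extra care around packages that fail to evaluate.

```nix
let
  # Hypothetical helper: true when meta.license is a list with two or
  # more entries, i.e. the package needs a manual check against upstream.
  isCandidate = pkg:
    let
      license = pkg.meta.license or null;
    in
    builtins.isList license && builtins.length license > 1;

  # Toy stand-in for a package set; only meta.license matters here.
  toyPkgs = {
    a = { meta.license = [ { shortName = "asl20"; } { shortName = "mit"; } ]; };
    b = { meta.license = { shortName = "mit"; }; };
    c = { }; # no meta at all
  };
in
# Names of the packages whose license should be double-checked.
builtins.filter (name: isCandidate toyPkgs.${name}) (builtins.attrNames toyPkgs)
```

With the toy set above this evaluates to [ "a" ]. A hit only means "look at upstream"; it does not tell you whether the list or OR is correct.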
This feature is already merged into nixpkgs here, but some things that should probably be in the documentation or mentioned as footguns:
Checking license properties is complex
As an example, the ci.nix file in the NUR templates repository uses something similar to the following expression to determine if a package has a free license:
# WRONG! DO NOT COPY
isFree = license: license.free or true;
If a package uses meta.license = with lib.licenses; AND [ bsl11 fsl11Asl20 ] (both licenses have free = false), then this predicate will incorrectly report that the resulting license is free, because the compound license has no free attribute of its own and the or true fallback kicks in!
The correct way to write such a predicate is using the license helpers:
# CORRECT
isFree = license: lib.licenses.evaluateNamedProperty "free" true license;
# This is also predefined
isFree = lib.licenses.isFree;
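To illustrate why the naive predicate goes wrong, here is a self-contained toy model in plain Nix. It does not use the real nixpkgs representation: toyAND, toyOR, the operator/children attributes, and toyIsFree are all assumptions made up for this sketch, and the rule "AND is free only if every child is free, OR is free if any child is free" is my own interpretation, not necessarily what lib.licenses.isFree does.

```nix
let
  # Toy stand-ins for simple licenses; only `free` matters here.
  bsl11 = { shortName = "bsl11"; free = false; };
  fsl11Asl20 = { shortName = "fsl11Asl20"; free = false; };
  mit = { shortName = "mit"; free = true; };

  # Hypothetical compound representation: an operator plus children,
  # and no top-level `free` attribute.
  toyAND = children: { operator = "AND"; inherit children; };
  toyOR = children: { operator = "OR"; inherit children; };

  # The naive predicate: falls back to `true` when `free` is missing.
  naiveIsFree = license: license.free or true;

  # A recursive predicate that descends into compound licenses.
  toyIsFree = license:
    if license ? operator then
      if license.operator == "AND"
      then builtins.all toyIsFree license.children
      else builtins.any toyIsFree license.children
    else
      license.free or true;
in
{
  naive = naiveIsFree (toyAND [ bsl11 fsl11Asl20 ]); # true (wrong)
  recursive = toyIsFree (toyAND [ bsl11 fsl11Asl20 ]); # false
  choice = toyIsFree (toyOR [ bsl11 mit ]); # true
}
```

In real code, prefer the predefined helper rather than rolling your own traversal like this.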
While you can look at the more complex functions in the ./lib/licenses/helpers.nix file, the functions you normally want are:
I’m personally not convinced that meta.license in nixpkgs is a good place to express detailed intricacies around licensing. The main distinction there (to me) is whether it’s free and, if not, perhaps whether it’s redistributable (including how that may change with build settings exposed by the derivation).
That’s a good opportunity to discuss who wants what from Nixpkgs in that regard.
In the software provenance team we briefly touched on how SBOMs (in the sense of “what’s in there”) are related to knowing the license “flow” (for example to answer “is it tainted?”). So for that purpose, encoding and being able to extract precise license information is a good thing to have, even if there are many possible ways to do that. I can imagine for many personal use cases people will not care at all, or maybe only care for “is it all free?”. Maintainers may have yet other concerns, such as how much noise reviews or discussion or CI errors will produce. People dealing with the infrastructure side of things will ask how much, when scaling up such a pattern, it would add to resource consumption for evaluation.
I think this is a good change since it really does help with SBOMs. I’ve had to write my own SPDX expression generator, but if meta.license can give me a simple structure whose elements I can just join together, that greatly simplifies things.
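As a rough idea of what "just join the elements together" could look like, here is a sketch in plain Nix. It assumes a made-up tree shape (operator, children) and that every leaf carries an spdxId; the real attrsets produced by the lib.licenses operators may look different, and some nixpkgs licenses have no SPDX identifier at all.

```nix
let
  # Hypothetical renderer: turn a toy license tree into an SPDX-style
  # expression string, parenthesizing every compound node.
  toSpdx = license:
    if license ? operator then
      "(" + builtins.concatStringsSep " ${license.operator} " (map toSpdx license.children) + ")"
    else
      license.spdxId;

  # Toy leaves; only spdxId is used.
  mit = { spdxId = "MIT"; };
  asl20 = { spdxId = "Apache-2.0"; };
  bsd3 = { spdxId = "BSD-3-Clause"; };
in
toSpdx {
  operator = "AND";
  children = [
    { operator = "OR"; children = [ mit asl20 ]; }
    bsd3
  ];
}
```

This evaluates to "((MIT OR Apache-2.0) AND BSD-3-Clause)". Always parenthesizing is more verbose than necessary, but it avoids having to reason about operator precedence.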
how much noise reviews or discussion or CI errors will produce
While license-fix PRs could clog up the PR list, I don’t think reviews or CI errors would be an issue. A package’s license is very hard to argue with, so it’s not like there will be long back-and-forth arguments about it.
how much, when scaling up such a pattern, it would add to resource consumption for evaluation
This was mentioned in the original PR thread, but I really don’t think this will be noticeable. The operator functions don’t perform any sort of conditional; they’re just building up attrsets.
I’m interested in future developments here. I remember from some Nix conference that good SBOM support is a reason why some major players (I think it was an Amazon employee(?)) are looking at Nix. Nixpkgs has some references to SBOMs in the documentation but I haven’t really taken a hard look at it since I don’t need to do so personally.
Having just taken a cursory look, I do see that buildNimSbom can take an SBOM as input instead of generating Nix like crate2nix does: it would be interesting to come up with a confident “Use SBOMs for generated lockfiles” statement across the Nix ecosystem as it does seem like a good fit.
If you are required to generate SBOMs, how much of an issue are “incorrect” meta.license expressions (assuming that “MIT OR Apache-2.0” is essentially translated to “MIT AND Apache-2.0”)? I’m not sure if, for compliance purposes, that distinction matters.
If there are scenarios in which it matters, it may be worth creating a tracking issue that lists “Packages That Actually Have Different Files Under Multiple Licenses”, so nixpkgs can work towards the eventual goal of correctly labeling all packages. There could even be a bot that just makes a comment on new PRs, saying “I see you listed multiple licenses in meta.license, please make sure that the package includes different files licensed under each license. If the package offers a choice, use lib.licenses.OR” to remind users of the new feature.
So far, it was only LLVM that broke iirc. I scan for a few thousand packages. There may be others. But now that we have compound licenses, I can look into using them.
To anyone else thinking along and wondering why we don’t just make a composite license eval to a piece of structured license data where you could simply read the free attribute like with any other license:
Notice the true argument; whether a composite license is free requires some interpretation and this interpretation can be different for each license property.
How should I write the license for a package where the source code is asl20 (the license of the software) but the build process downloads a proprietary binary from the creator of the package (for license checking)? The creator of the software allowed redistribution for nixpkgs. My guess would be something like this:
license = with lib.licenses; AND [
  asl20
  unfreeRedistributable
];
Maybe; though I think getting users to use specific functions in lib.licenses is more future proof. It could also be more evaluation work to force, though I have no sense of that.
(More future proof because there may be properties that rely on more complex information than Boolean and/or on children, where we would need to change the implementation of the isX predicate)
I didn’t take a close look, but I’d assume that should stay as a list because the proprietary binary is a separate file from the Apache files (i.e. someone could partition the outputs into “asl20” and “unfreeRedistributable”).
If we wanted to improve the state of this in the long term, we could ban lists in meta.license, and mandate usage of lib.licenses.OR / lib.licenses.AND (which produce an attrset). However, this would be a massive refactor.
SPDX licenses apply to artifacts, which could mean files. So I think a list still makes sense, but I have two ideas.
SPDX has two different license concepts: that of the inputs and that of the outputs. Right now nixpkgs doesn’t add license information to fetchers, but that could be done, and nixpkgs files that have done so can be checked-off as “complete”.
The other observation: Nix packages’ outputs can have different licenses. If there was interest, a per-output license field could be possible, which would also allow us to determine if license information has been “modernized”.
This really depends on how people consume the information though; if anyone that uses FlakeBOM knows the SPDX data model it would be useful to chime in.