Package URLs (purl) for Nix packages

We should leave as little room for interpretation as possible (that is, no default based on values of other fields) as downstream consumers will misinterpret these.

That is why I suggested an output-hash qualifier. Aside from that, output hashes do not necessarily reflect the build tool/package definition used. A reproducible static binary may be produced by Nix or by OpenEmbedded (assuming same source repo and similar compiler versions/settings). Both just provide the build infrastructure, then delegate to make.

Not core nix devs, but people working on Nix + SBOMs:
@henrirosten (GitHub - tiiuae/sbomnix: A suite of utilities to help with software supply chain challenges on nix targets)

Quick comment from the vulnerability scanners’ viewpoint, which is one potential downstream use of this data:

Most current vulnerability scanners identify packages based on CPE since that’s what NVD supports.
OSV is one example that supports purl; however, the Nix ecosystem is not currently supported in OSV.

IMHO, mapping nix packages to purl will be useful and this discussion is surely needed. However, CPE seems to be the most widely used identifier currently; therefore, the ability to map nix packages to CPEs as accurately as possible would have more concrete benefits right now.

I agree with @wamserma that we should not default to NixOS/Nixpkgs (or GitHub) and be explicit instead.

Would it be an option to have two purl types for Nix? One corresponding to an evaluation attribute, and another to the evaluated derivation?

That could be another purl type, entirely independent from Nix. pkg:ca/<hash>, or instead of ca the hash type used.

For Python packaging there is now a PEP for describing external (native) dependencies, using purl. Discussion

After reading this discussion and the PURL spec, I’ve come up with my own proposal.

The main consideration was that it’s not always possible to locate a given Nix package, since the provenance is not always recorded, as is the case with channels; and a given exact package might be present in multiple locations, such as different nixpkgs revisions or forks.

Also, I believe the “main” triplet of namespace/name@version must uniquely correspond to a particular derivation.

As such, I think we should disregard using namespace and instead rely on optional qualifiers to locate the package if possible.

Here is my proposal:

Let package be a Nix derivation output (e.g. nixpkgs#pkgs.hello.out);

  • type is a constant nix;
  • No namespace
  • name is the package name; precisely speaking, it is the output of (builtins.parseDrvName package.name).name (e.g. hello)
  • version is the file name of the package derivation file, sans the .drv extension; precisely it is builtins.concatStringsSep "." (lib.init (lib.splitString "." (builtins.baseNameOf package.drvPath))). (e.g. qb6j8v8z50shmrgsj2pk4fwrk2ff5jpn-hello-2.12.1)
  • Optional qualifier flakeRef is the url-encoded locked flake-ref of the flake from which this package was evaluated, if applicable&known. Mutually exclusive with channelUrl (e.g. github%3Anixos%2Fnixpkgs%2F074522643cc9ccbb871ca3b31ed599e9b1b7b5a2)
  • Optional qualifier channelUrl is the url-encoded absolute http(s) URL pointing to a tar.gz archive containing default.nix, from which package was evaluated, if applicable&known. Mutually exclusive with flakeRef (e.g. https%3A%2F%2Fgithub.com%2Fnixos%2Fnixpkgs%2Farchive%2F074522643cc9ccbb871ca3b31ed599e9b1b7b5a2.tar.gz)
  • Optional qualifier attrPath is the (fully-qualified for flakes) attribute path from which the package was evaluated, if known and applicable (e.g. legacyPackages.x86_64-linux.hello.out for flakes or hello.out for channels)
  • Required qualifier outputName is package.outputName (e.g. out)
  • Optional qualifier outPathCA is the url-encoded output path, only present if the package is content-addressed and the output path is relevant&known; precisely package.outPath
  • Optional qualifier substituter is a url-encoded URL pointing to a substituter (binary cache) in which the output path is present.

Versions should be considered opaque and non-ordered, as is the current practice in nixpkgs; as such, the only possible comparison is of version equality.
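The name and version rules above can be sketched outside of Nix as follows. This is only an approximation under stated assumptions: parse_drv_name mimics builtins.parseDrvName by splitting at the first dash that is not followed by a letter, and purl_version_from_drv_path mirrors the lib.init / lib.splitString expression by dropping the last dot-separated component of the file name. Both function names are made up for this illustration.

```python
import posixpath


def parse_drv_name(full_name: str) -> tuple[str, str]:
    """Approximate builtins.parseDrvName (assumption: the split happens at
    the first dash that is not followed by a letter)."""
    for i, ch in enumerate(full_name):
        if ch == "-" and i + 1 < len(full_name) and not full_name[i + 1].isalpha():
            return full_name[:i], full_name[i + 1:]
    # No version part found: the whole string is the name.
    return full_name, ""


def purl_version_from_drv_path(drv_path: str) -> str:
    """File name of the derivation file without its last dot-separated
    component (the .drv extension), as in the proposal."""
    base = posixpath.basename(drv_path)
    return ".".join(base.split(".")[:-1])


name, _ = parse_drv_name("hello-2.12.1")
version = purl_version_from_drv_path(
    "/nix/store/qb6j8v8z50shmrgsj2pk4fwrk2ff5jpn-hello-2.12.1.drv"
)
print(name, version)
```

Only the last component is dropped, so the dots inside 2.12.1 survive.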

Here is the example for nixpkgs#hello.out in full: pkg:nix/hello@qb6j8v8z50shmrgsj2pk4fwrk2ff5jpn-hello-2.12.1?flakeRef=github%3Anixos%2Fnixpkgs%2F074522643cc9ccbb871ca3b31ed599e9b1b7b5a2&attrPath=legacyPackages.x86_64-linux.hello.out&outputName=out

And here is an example for nix-build '<nixpkgs>' -A hello: pkg:nix/hello@qb6j8v8z50shmrgsj2pk4fwrk2ff5jpn-hello-2.12.1?attrPath=hello&outputName=out

Notice how even though the source of the package is different, and in the second case it’s impossible for Nix to locate the package source on the internet, we still end up with the same first part of the PURL.

For cases when it is possible to locate the source (whether flake or not), we provide our consumers with a way to fetch it and evaluate the package.
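To make the two examples above concrete, here is a minimal sketch of how such a purl could be assembled. The function name nix_purl is hypothetical, and the sketch assumes that qualifier values are percent-encoded and emitted in the order given. As far as I understand the purl spec, the canonical form wants lowercase, sorted qualifier keys; the sketch keeps the camelCase keys of the proposal as written.

```python
from urllib.parse import quote


def nix_purl(name: str, version: str, qualifiers: dict[str, str]) -> str:
    """Assemble a pkg:nix purl following the proposal (illustrative only)."""
    if "outputName" not in qualifiers:
        raise ValueError("outputName is a required qualifier")
    if "flakeRef" in qualifiers and "channelUrl" in qualifiers:
        raise ValueError("flakeRef and channelUrl are mutually exclusive")
    # Percent-encode every qualifier value; keys are kept as given.
    query = "&".join(f"{k}={quote(v, safe='')}" for k, v in qualifiers.items())
    return f"pkg:nix/{quote(name, safe='')}@{quote(version, safe='')}?{query}"


VERSION = "qb6j8v8z50shmrgsj2pk4fwrk2ff5jpn-hello-2.12.1"

# Evaluated from a locked flake: the source can be located.
flake_purl = nix_purl("hello", VERSION, {
    "flakeRef": "github:nixos/nixpkgs/074522643cc9ccbb871ca3b31ed599e9b1b7b5a2",
    "attrPath": "legacyPackages.x86_64-linux.hello.out",
    "outputName": "out",
})

# Evaluated via nix-build '<nixpkgs>' -A hello: no locatable source.
channel_purl = nix_purl("hello", VERSION, {
    "attrPath": "hello",
    "outputName": "out",
})

print(flake_purl)
print(channel_purl)
```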

Thanks for your thoughtful input!

Notice how even though the source of the package is different, and in the second case it’s impossible for Nix to locate the package source on the internet, we still end up with the same first part of the PURL.

For cases when it is possible to locate the source (whether flake or not), we provide our consumers with a way to fetch it and evaluate the package.

I think those are sensible properties.

Also, I believe the “main” triplet of namespace/name@version must uniquely correspond to a particular derivation.

Why? I don’t think this is true for other purl types (i.e. pkg:deb/debian/curl@7.50.3-1?arch=i386&distro=jessie seems like it may be slightly different depending on when/where/how you build it).

I think including the derivation hash makes the version field overly specific: for instance, if I want to express, “hello 2.12.1 is vulnerable to CVE-2024-foo”, how would I do that? Enumerating all derivation hashes of hello 2.12.1 is unfeasible. It’s true that just saying “hello 2.12.1 is affected” is also imprecise, as some derivations of 2.12.1 may have a patch for CVE-2024-foo applied - but I think that should be solved by taking patch information into account like sbomnix and vulnix do - by no means perfect, but I don’t see another way, and I don’t see including the derivation hash in the version as helpful. For those cases where it’s useful, though I can’t think of any, you could still get it from the attribute.

(the above is my main response, what comes below is perhaps more nitpicky)

Optional qualifier flakeRef is the url-encoded locked flake-ref of the flake from which this package was evaluated, if applicable&known

I wonder if we should have this information in one field or split it out into its parts

Optional qualifier channelUrl is the url-encoded absolute http(s) URL pointing to a tar.gz archive containing default.nix, from which package was evaluated, if applicable&known

We should probably also allow pointing to other things than .tar.gz’s here, e.g. git repositories?

Optional qualifier attrPath is the (fully-qualified for flakes) attribute path from which the package was evaluated, if known and applicable (e.g. legacyPackages.x86_64-linux.hello.out for flakes or hello.out for channels)

I wonder if we should remove .out here (as it can be derived from the outputName), and perhaps similarly derive legacyPackages.x86_64-linux from system?

Not so sure about that, it doesn’t seem to be easy to normalize, is it? Should we take non-canonical representatives of the set of equivalent system classes?

@raboof thanks for the detailed feedback!

Yes, but AFAIU that’s mainly because Debian doesn’t provide a common “hash of all dependencies” in the same way as Nix does. For an example where there is such a hash, see OCI images (The version is the sha256:hex_encoded_lowercase_digest of the artifact and is required to uniquely identify the artifact.) and Docker images (The version should be the image id sha256 or a tag. Since tags can be moved, a sha256 image id is preferred.).

I think including the derivation hash makes the version field overly specific: for instance, if I want to express, “hello 2.12.1 is vulnerable to CVE-2024-foo”, how would I do that? Enumerating all derivation hashes of hello 2.12.1 is unfeasible.

Maybe something like *-2.12.1 would work? And then, for cases when only a small list of particular derivations is affected (e.g. a dependency update/patch quickly solved the issue without a version update), you have the ability to list those precisely.
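A rough sketch of how a scanner could apply this idea: treat each "affected" entry as a shell-style glob over the opaque version string, so that *-2.12.1 covers every derivation of that upstream version while a full derivation name pins one exact build. The function is_affected and the advisory list are hypothetical; nothing like this exists in purl tooling as far as I know.

```python
from fnmatch import fnmatchcase


def is_affected(version: str, affected_patterns: list[str]) -> bool:
    """Return True if the opaque version string matches any pattern.

    A pattern without wildcards only matches by exact equality, which is
    the only comparison the proposal defines for versions."""
    return any(fnmatchcase(version, pattern) for pattern in affected_patterns)


# Hypothetical advisory: all builds of 2.12.1 are affected.
advisory = ["*-2.12.1"]

print(is_affected("qb6j8v8z50shmrgsj2pk4fwrk2ff5jpn-hello-2.12.1", advisory))
print(is_affected("qb6j8v8z50shmrgsj2pk4fwrk2ff5jpn-hello-2.12.2", advisory))
```

The match is on the version field only, so the package name still has to be compared separately.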

For those cases where it’s useful, though I can’t think of any, you could still get it from the attribute.

Indeed, I agree it won’t often be useful for vulnerability scanning, but it is useful in the general context of purl, in that it identifies the package more precisely and uniquely.

I’ve spent some time deliberating on where exactly to split the full flakeRef. I think it makes more sense to keep the reference to the flake as a single field, because the flake-ref syntax is quite complex and has various URI “parts” depending on the “scheme”. Inheriting this complexity into purl can lead to the need for updates of the schema in the future, which I think we should avoid if possible. However, I’ve come to the conclusion that it’s better to split off the attrPath from the flakeRef to keep compatibility with channelUrl and lack of any source.

I think this should be something which nix accepts as a --file argument or part of NIX_PATH. AFAIR there’s no documentation on that, and it only accepts local files and http(s) URLs pointing to tar.gz files, but I might be wrong here.

I was considering the same, but I think it makes sense to keep the output here and also as outputName, since it’s not clear how to derive one from another and vice versa (consider a flake which re-exports zlib.dev as outputs.myPackageScheme.zlib-dev, overriding dev to be something else; then, just appending outputName to the attrPath would possibly break things, and it’s impossible to infer outputName from the path). Ditto for system: the flake output schema is not strict, there can easily be a custom output schema which doesn’t use ${system} in the attrPath at all.

I’m not sure I’m following; the system qualifier is simply copied from the system attribute of a derivation, which is part of the derivation hash. Why would we normalize it at all?

Well, maybe a more general question: does the purl format you propose have a canonical representation? Do we have purl(p) == purl(p') <=> p = p' for some reasonable =, where == means canonicalize(purl(p)) = canonicalize(purl(p'))?

It seems to me that system is not canonical, so you do not have this property, which seems undesirable (?).

Hi, sorry to interrupt. As a relatively new Nix user (but one very interested in making SBOMs from Nix derivations), I’m not sure I understand.

I agree that it is desirable to have a canonical form for nix purls.

It wasn’t immediately obvious to me whether system is canonical. Could it be that x86_64-linux vs amd64-linux are both valid and refer to the same thing?

Why do we need system as a required qualifier? It’s not required to uniquely identify the package. It’s just needed to build the package. But the required qualifiers alone are still not enough to build the package (attrPath and channelUrl / flakeRef are missing). So system might as well be optional?

After all, we cannot hope to make a canonical purl which includes a flakeRef or attrPath, since there are likely many equivalent flakeRefs and attrPaths which cannot easily be determined.

All these examples are torturous, ugly, and vague because Nix and purl are antithetical to each other. It’s never going to work.

If you are listing a Nix path in an SBOM then use a component with properties in a nix: namespace - Pre-RFC: CycloneDX BOM taxonomy.

I don’t think it’s possible to have a canonical purl for Nix; there can be multiple flakeRefs that produce the same package. However, I don’t think system plays any role here; p = p' ⇒ p.system = p'.system.

And, if we disregard all qualifiers, then purl(p) = purl(p') ⟺ p = p', and there’s no need for any canonicalization.
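In code, "disregarding all qualifiers" could look like the sketch below: strip everything after ? (and any #subpath) and compare what is left. The helper names purl_core and same_package are made up for this illustration.

```python
def purl_core(purl: str) -> str:
    """Return the type/name@version part of a purl, dropping the optional
    #subpath and ?qualifiers."""
    without_subpath = purl.split("#", 1)[0]
    return without_subpath.split("?", 1)[0]


def same_package(a: str, b: str) -> bool:
    """Two purls identify the same derivation output name-wise if their
    core parts are equal, regardless of where they were evaluated from."""
    return purl_core(a) == purl_core(b)


from_flake = (
    "pkg:nix/hello@qb6j8v8z50shmrgsj2pk4fwrk2ff5jpn-hello-2.12.1"
    "?flakeRef=github%3Anixos%2Fnixpkgs%2F074522643cc9ccbb871ca3b31ed599e9b1b7b5a2"
    "&attrPath=legacyPackages.x86_64-linux.hello.out&outputName=out"
)
from_channel = (
    "pkg:nix/hello@qb6j8v8z50shmrgsj2pk4fwrk2ff5jpn-hello-2.12.1"
    "?attrPath=hello&outputName=out"
)

print(same_package(from_flake, from_channel))
```

Note that this also ignores outputName, so two outputs of the same derivation compare equal here; whether that is desired depends on how strictly "package" is meant.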

I think I tend to agree here; there’s no real need for a separate system qualifier, as it is already part of the “version” in a way. I’ll drop it from the proposal.

I agree that purl and Nix aren’t made for each other, and the work you’re linking looks better and probably works better too. However, I think we can still have some purl for Nix, even if it’s ugly; the two solutions are not mutually exclusive.

I agree with @balsoft here. Sure, the information is more nicely represented in the CycloneDX properties, but that doesn’t mean PURLs are useless. For one, they are more cross-platform; SPDX supports PURLs too, for instance. Also, as much as I don’t like it, many vulnerability scanners rely on the PURL to perform their lookups.

Perhaps we should actually harmonize both efforts, and have the same fields as purl attributes and as CycloneDX properties as much as possible?

We might be able to unify some fields. But generally, I don’t think that would work. The properties are just that, properties of the component. The PURL is supposed to uniquely identify a package, without any other data.

Sorry if this is too lazy of a question, but what are examples of tools that would benefit from being able to refer to nix packages via PURLs?

RE: Hashes

Just an observation: with Nix it’s trivial to identify “concrete” packages, but Nixpkgs doesn’t offer much tooling for reasoning about “different versions of the same package”