Package URLs (purl) for Nix packages

There are all kinds of distro-agnostic tools and file formats that try to talk about software packages and versions - recently, there has been a lot of activity around SBOMs and security scanning.

Identifying software is a recurring problem for those. One promising emerging standard for this is package-url (purl). I think it is time we start defining how to refer to Nix packages using purls.

This has been discussed in a couple of places, such as Add guix and nix as package types · Issue #149 · package-url/purl-spec · GitHub, The future of the vulnerability roundups, Things to learn from tea.xyz, Add Nix cataloger by wagoodman · Pull Request #1696 · anchore/syft · GitHub and in the #slsa:nixos.org matrix channel (not sure if there’s any public history I can link to?).

To give a super-quick introduction to purl, a purl is typically of the structure scheme:type/namespace/name@version?qualifiers#subpath, where scheme is always pkg, type is a type from the registry at https://github.com/package-url/purl-spec/blob/c02b002f09bdc88a501f62259eec18761957828a/PURL-TYPES.rst, and namespace, name and qualifiers are type-specific. We could define a nix purl type and decide how to populate it.
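
As a rough illustration of that structure, here is a minimal Python sketch that splits a purl into its components. It is a simplification, not a complete implementation of the purl spec (no validation or normalisation beyond lowercasing the type), and `Purl`, `parse_purl` and `_split_right` are names made up for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple
from urllib.parse import unquote


@dataclass
class Purl:
    type: str
    namespace: Optional[str]
    name: str
    version: Optional[str] = None
    qualifiers: Dict[str, str] = field(default_factory=dict)
    subpath: Optional[str] = None


def _split_right(text: str, sep: str) -> Tuple[str, Optional[str]]:
    """Split on the last occurrence of sep; the tail is None if sep is absent."""
    head, found, tail = text.rpartition(sep)
    return (head, tail) if found else (text, None)


def parse_purl(purl: str) -> Purl:
    """Simplified split of scheme:type/namespace/name@version?qualifiers#subpath."""
    scheme, found, rest = purl.partition(":")
    if not found or scheme != "pkg":
        raise ValueError("a purl must start with 'pkg:'")
    # Peel off the optional trailing parts first: subpath, then qualifiers.
    rest, subpath = _split_right(rest, "#")
    rest, query = _split_right(rest, "?")
    # The first path segment is the type.
    ptype, found, rest = rest.lstrip("/").partition("/")
    if not found or not rest:
        raise ValueError("a purl needs at least a type and a name")
    rest, version = _split_right(rest, "@")
    # The last remaining segment is the name; anything before it is the namespace.
    namespace, _, name = rest.rpartition("/")
    qualifiers: Dict[str, str] = {}
    for pair in (query or "").split("&"):
        key, found, value = pair.partition("=")
        if found and value:
            qualifiers[key.lower()] = unquote(value)
    return Purl(
        type=ptype.lower(),
        namespace=namespace or None,
        name=unquote(name),
        version=unquote(version) if version else None,
        qualifiers=qualifiers,
        subpath=subpath,
    )


if __name__ == "__main__":
    print(parse_purl("pkg:nix/wget@1.21.3?ref=nixos-unstable"))
```

Qualifier values are decoded with `unquote` rather than a form-style query parser, so a `+` in a value is kept as-is and not turned into a space.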

You’ll notice ‘the same’ software could be present in multiple types. This is intentional and useful: that way you can distinguish between information about ‘software X’ generally and information about ‘software X as packaged in NixOS’.

I think the nix purl type should be symbolic enough that tools have enough information to perform some level of ‘fuzzy matching’, but should also contain all the information needed to know exactly how to recreate that specific build of a package.

Of course, we already have a format to refer to Nix packages: flake URIs. As a straw man to get the discussion started, I would like to propose a definition of the nix purl type as a sort of different representation of the flake URI (since they have slightly different rules). I came up with some rules for defaults to make it succinct to refer to nixpkgs packages, while keeping things general enough to also use this type to refer to any 3rd-party nix package:

pkg:nix/[<org>/]<attr>?<qualifiers>

Where:

  • org defaults to NixOS when not specified
  • attr is the attribute path to the package

And the following qualifiers can be added:

  • type: corresponds to the Flake type. For the NixOS org, for now default to github (though we can reserve the right to change this default in the future, as long as history is kept across forges)
  • repo: the GitHub repo under the org. Defaults to nixpkgs when the org is NixOS, otherwise to (the first segment of) the attribute path
  • ref: tag or branch in the repo
  • rev: revision, which must be part of the ref tree
  • output: the derivation output, defaults to out

This leads to the following examples (purl and flake syntax side-by-side):

| purl | flake |
| --- | --- |
| pkg:nix/wget | github:NixOS/nixpkgs#wget |
| pkg:nix/wget@1.21.3?ref=nixos-unstable&rev=897876e4c484f1e8f92009fd11b7d988a121a4e7 | github:NixOS/nixpkgs?rev=897876e4c484f1e8f92009fd11b7d988a121a4e7#wget |
| pkg:nix/tiiuae/sbomnix?type=github | github:tiiuae/sbomnix#sbomnix |
| pkg:nix/tiiuae/nixgraph?type=github&repo=sbomnix | github:tiiuae/sbomnix#nixgraph |
| pkg:nix/python3Packages.enamlx | github:NixOS/nixpkgs#python3Packages.enamlx |
| pkg:nix/eicas/omeka-s?type=git+https://codeberg.org&rev=bfe132f6540a175beb432c2c95472f929cbf310f | git+https://codeberg.org/eicas/omeka-s-flake?rev=bfe132f6540a175beb432c2c95472f929cbf310f#omeka-s |
| pkg:nix/grub2@2.06?output=doc&ref=nixos-unstable&rev=897876e4c484f1e8f92009fd11b7d988a121a4e7 | github:NixOS/nixpkgs?rev=897876e4c484f1e8f92009fd11b7d988a121a4e7#grub2!out |
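
To make the default rules concrete, here is a small Python sketch of how a tool might turn the parts of such a purl into a flake reference. It takes already-split components rather than parsing the purl string, and `nix_purl_to_flake_ref` is a made-up name. Several choices in it are assumptions, not part of the proposal: `type` is treated as mandatory outside the NixOS org, `rev` wins over `ref` when both are given (as in the pinned wget example), URL-like types are joined with `/` instead of `:`, and the `output` qualifier is ignored.

```python
from typing import Dict, Optional


def nix_purl_to_flake_ref(
    attr: str,
    org: Optional[str] = None,
    qualifiers: Optional[Dict[str, str]] = None,
) -> str:
    """Build a flake reference from the parts of a proposed nix purl (sketch)."""
    q = dict(qualifiers or {})
    org = org or "NixOS"  # org defaults to NixOS when not specified
    if org == "NixOS":
        forge = q.get("type", "github")  # default type for the NixOS org
        repo = q.get("repo", "nixpkgs")  # default repo for the NixOS org
    else:
        if "type" not in q:
            # Assumption: no default forge for third-party orgs.
            raise ValueError("type must be given explicitly outside the NixOS org")
        forge = q["type"]
        # repo defaults to the first segment of the attribute path
        repo = q.get("repo", attr.split(".")[0])
    # Assumption: URL-like types (git+https://host) join with '/',
    # forge shorthands such as github join with ':'.
    sep = "/" if "://" in forge else ":"
    ref = f"{forge}{sep}{org}/{repo}"
    # Assumption: an exact revision takes precedence over a branch or tag.
    if "rev" in q:
        ref += f"?rev={q['rev']}"
    elif "ref" in q:
        ref += f"?ref={q['ref']}"
    return f"{ref}#{attr}"


if __name__ == "__main__":
    print(nix_purl_to_flake_ref("wget"))
    print(nix_purl_to_flake_ref("nixgraph", "tiiuae", {"type": "github", "repo": "sbomnix"}))
```

The version part of the purl is not needed to build the flake reference, which is one reason it could be treated as purely informational.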

Now this is different from what’s being proposed in syft: they seem to just take the pname (?) and add the output hash. I can see how that is much easier for a filesystem scanning tool such as syft to discover, but it also seems much less useful: it is almost impossible from such a purl to ‘work backwards’ and find the exact derivation without additional context.

Should we ‘allow’ both ‘output-centric’ and ‘input-centric’ purls for the nix type? It seems that while this would make ‘creating’ purls much easier in some cases, it might also make doing anything useful with them much harder…

I’m not quite convinced the “pkg source code” makes up a good package identifier.

  • Multiple revisions of nixpkgs have exactly the same package recipe / literal .drv contents, so you end up with a lot of purls describing the same thing, so it’s not a unique identifier.
  • The “source” of a package is not clear. Usually, you start with a nixpkgs repo, and then build on top. You rarely bootstrap everything on your own. Normally you have a nixpkgs pin, and slightly override it inside your own override in your own repo. You probably don’t want to lose all references to the nixpkgs pin used, when building a static binary without any references.

The .drv hash, or the output hash(es) however uniquely describe the package and allow tracing back to the build recipe. It’s just not very nice UX to look it up: cache.nixos.org seems to populate the Deriver field, but doesn’t upload the derivations themselves. IMHO, we should do that, and additionally work on some plugins/tooling to include auxiliary metadata into container images etc - but I wouldn’t want to abuse the purl for that.
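
For reference, the Deriver field lives in the `.narinfo` metadata that a binary cache serves per store path, as simple `Key: value` lines. A minimal Python sketch of reading it could look as follows. The sample text uses placeholder hashes, not real cache contents, and the sketch assumes the field holds the base name of the `.drv` file without the store prefix. `parse_narinfo` and `deriver_path` are names made up for this example.

```python
from typing import Dict, Optional

# Placeholder narinfo text; the hashes are made up for illustration only.
SAMPLE_NARINFO = """\
StorePath: /nix/store/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-wget-1.21.3
Deriver: bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb-wget-1.21.3.drv
"""


def parse_narinfo(text: str) -> Dict[str, str]:
    """Collect the 'Key: value' lines of a narinfo document into a dict."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, found, value = line.partition(": ")
        if found:
            fields[key] = value
    return fields


def deriver_path(text: str, store_dir: str = "/nix/store") -> Optional[str]:
    """Return the full path of the deriving .drv, or None if it is not recorded."""
    deriver = parse_narinfo(text).get("Deriver")
    if not deriver:
        return None
    # Assumption: the field holds a base name that is relative to the store directory.
    return f"{store_dir}/{deriver}"


if __name__ == "__main__":
    print(deriver_path(SAMPLE_NARINFO))
```

This only recovers the name of the derivation. Without the `.drv` file itself being available somewhere, the build recipe still cannot be inspected.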

Agreed: it is unique in that it precisely points to a particular version of the software, but indeed there would be many identifiers that differ only in the rev/ref fields and “point to the same thing”. I’m not sure that is necessarily a problem compared to using something like output hashes: after all, many changes that will change the output hash will be ‘irrelevant’ in the context of a given tool/use case, so they’ll need to deal with different purl’s for ‘essentially the same’ package anyway.

The “source” of a package is not clear. Usually, you start with a nixpkgs repo, and then build on top. You rarely bootstrap everything on your own. Normally you have a nixpkgs pin, and slightly override it inside your own override in your own repo. You probably don’t want to lose all references to the nixpkgs pin used, when building a static binary without any references.

I agree that’s a valid use case: it is useful when the identifier can express not only ‘wget from nixpkgs rev xyz’, but also ‘wget as overridden by Alice in project X’. With this you’ve convinced me that we probably do indeed want to have a way to refer to things that don’t have an attribute path.

The .drv hash, or the output hash(es) however uniquely describe the package

Agreed

and allow tracing back to the build recipe

I’m not entirely convinced there: maybe that works(/can be made to work) for everything that’s in cache.nixos.org, but as you mention above we want to support people slightly overriding things and not lose references. Also I think the nix purl type should support pointing to things that are not in nixpkgs at all (similar to flakes).

IMHO, we should do that, and additionally work on some plugins/tooling to include auxiliary metadata into container images etc - but I wouldn’t want to abuse the purl for that.

I agree we will likely want to do work on plugins/tooling to ‘close the loop’. I’m not sure what exactly you mean by “abuse the purl for that”, but I do think it’s on-topic for this thread to discuss what’s feasible, since that might inform what a useful purl format can look like.

I’d prefer to move a bit from the qualifiers to the namespace and do away with some of the defaults in favour of explicitly stating things:

namespace = org : packageset

where

packageset = repo | repo : branch.

Example: pkg:nix/NixOS:nixpkgs:release-22.11/ponysay@3.0.4?rev=...
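
A minimal Python sketch of splitting such a namespace into its parts, following the org : repo [: branch] grammar above. `PackageSet` and `parse_namespace` are names made up for this example, and rejecting empty segments is an assumption of the sketch.

```python
from typing import NamedTuple, Optional


class PackageSet(NamedTuple):
    org: str
    repo: str
    branch: Optional[str]


def parse_namespace(namespace: str) -> PackageSet:
    """Split 'org:repo' or 'org:repo:branch' into its components."""
    parts = namespace.split(":")
    # Assumption: exactly two or three non-empty segments are allowed.
    if len(parts) not in (2, 3) or not all(parts):
        raise ValueError(f"expected org:repo or org:repo:branch, got {namespace!r}")
    branch = parts[2] if len(parts) == 3 else None
    return PackageSet(parts[0], parts[1], branch)


if __name__ == "__main__":
    print(parse_namespace("NixOS:nixpkgs:release-22.11"))
```

With this shape nothing is derived from other fields: the org and repo are always spelled out, and only the branch is optional.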

Multiple purls pointing to effectively the same package will also occur after release branch-off, not only by rev changing on a single branch. This is actually an important aspect w.r.t. BOMs, as this may signal a change in the transitive dependencies of a package and, e.g. in the case of static linking, a difference between a vulnerable and a patched version of a binary.
The output hash/part of the store path could still be adopted as an optional qualifier, which helps downstream consumers (e.g. syft) to conflate purls for vulnerability tracking. Plus: a store path can then (probably) be realized from a purl. 🙂
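
As a sketch of what that conflation could look like for a consumer, the following Python groups purl strings by an output-hash qualifier. The qualifier name `output_hash`, the helper names, and the hash and rev values in the usage example are all placeholders, not something that has been agreed on.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def qualifier(purl: str, key: str) -> Optional[str]:
    """Return the raw value of one qualifier, or None if it is absent."""
    _, _, query = purl.partition("?")
    query = query.partition("#")[0]  # drop an optional subpath
    for pair in query.split("&"):
        k, _, v = pair.partition("=")
        if k == key:
            return v
    return None


def group_by_output_hash(purls: List[str]) -> Dict[str, List[str]]:
    """Group purls that carry the same (hypothetical) output_hash qualifier."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for purl in purls:
        output_hash = qualifier(purl, "output_hash")
        if output_hash is not None:
            groups[output_hash].append(purl)
    return dict(groups)


if __name__ == "__main__":
    # Placeholder rev and hash values, for illustration only.
    purls = [
        "pkg:nix/NixOS:nixpkgs:release-22.11/ponysay@3.0.4?rev=1111&output_hash=aaaa",
        "pkg:nix/NixOS:nixpkgs:release-22.11/ponysay@3.0.4?rev=2222&output_hash=aaaa",
    ]
    print(group_by_output_hash(purls))
```

Two purls that differ only in rev but share the same output hash end up in one group, so a scanner could treat them as the same artifact.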

There’s another, slightly different way of looking at this that could become quite useful, since it matches the work currently scheduled by the Nixpkgs Architecture Team.

| purl | PkgFun / PkgMod |
| --- | --- |
| pkg:nix/wget | wget PkgMod w/ default version |
| pkg:nix/wget@1.21.3 | wget PkgMod w/ version = 1.21.3 |

This perspective echoes this sentiment:

There is an emerging tension where the above concern is true for a Nixpkgs Style cataloguing repository, but where it is (mostly) not true for a flake style in-tree nix-base build pipeline.

I think that distinction may be important.

Thank you for getting the discussion going again!

I like the : notation. I’m not sure about removing some of the defaults, I found the succinct id’s for common cases pretty nice, but I’m not opposed to it. It would be nice not to be too GitHub-centric, so while for NixOS/nixpkgs we could default to github, this should probably remain mandatory to specify for other nix purls?

Yes, I’d say a change in transitive dependencies should definitely be expressible in the purl. @flokli’s criticism of using rev is that rev (unlike the output_hash) may change even when the transitive dependencies don’t change.

Thanks for raising this for discussion. I think it’s great we can flesh out some of the key decisions early before submitting to the PURL spec repo.

Some of the existing nixpkgs versioning discussion would come in useful here too.

Is this something where content-addressed nix could come in really useful?

It might also be good to involve some of the core nix devs in its design.

We should leave as little room for interpretation as possible (that is, no default based on values of other fields) as downstream consumers will misinterpret these.

That is why I suggested an output-hash qualifier. Aside from that, output hashes do not necessarily reflect the build tool/package definition used. A reproducible static binary may be produced by Nix or by OpenEmbedded (assuming same source repo and similar compiler versions/settings). Both just provide the build infrastructure, then delegate to make.

Not core nix devs, but people working on Nix + SBOMs:
@henrirosten (GitHub - tiiuae/sbomnix: A suite of utilities to help with software supply chain challenges on nix targets)

Quick comment from the vulnerability scanners’ viewpoint, which is one potential downstream use of this data:

Most current vulnerability scanners identify packages based on CPE since that’s what NVD supports.
OSV is one example that supports purl; however, the nix ecosystem is not currently supported in OSV.

IMHO, mapping nix packages to purl will be useful and this discussion is surely needed. However, CPE seems to be the most widely used identifier currently, so the ability to map nix packages to CPEs as accurately as possible would have more concrete benefits right now.

I agree with @wamserma that we should not default to NixOS/Nixpkgs (or GitHub) and be explicit instead.

Would it be an option to have two purl types for Nix? One corresponding to an evaluation attribute, and another to the evaluated derivation?

That could be another purl type, entirely independent from Nix. pkg:ca/<hash>, or instead of ca the hash type used.

For Python packaging there is now a PEP for describing external (native) dependencies, using purl. Discussion