Security fixes should bypass staging — unbound CVE batch as a case study

Three weeks ago, unbound 1.25.1 was released, fixing 11 CVEs, including a CVSS 10.0 (cache poisoning) and a CVSS 9.8 (RCE during DNSSEC validation).

The nixpkgs security tracker flagged it within 24 hours.

The fix was two lines: bump the version and update the hash. ofborg CI passed all 36 checks across all platforms. It has been sitting in staging ever since.

As of today, June 9th 2026, 20 days after release, users on both nixos-unstable and nixos-26.05 are still running the vulnerable 1.25.0. The fix is merged and sitting in staging-next-26.05.

The rebuild count that forced it into staging was reported as 5k+ packages. The actual number of packages that link libunbound is a fraction of that — a significant part of the count is nixosTests.* VM infrastructure that is not shipped software and cannot be broken by a DNS resolver point release with no ABI changes.

The staging process exists to protect against cascading breakage from dependency tree changes. A security point release with no API/ABI changes, a passing test suite, and a fixed changelog is the lowest-risk change imaginable. The heuristic of “rebuild count > N → staging” breaks down entirely here and the tooling applies it anyway.

Concrete proposal:

  1. Exclude nixosTests.* from rebuild counts — they are validation artifacts, not shipped software
  2. Add a security fast-lane: CVSS ≥ 9.0 + no ABI changes + passing tests → eligible for direct master merge with NixOS Security Team sign-off
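The two proposal items can be read as a small decision rule. The following is only a sketch of that rule, not existing nixpkgs tooling: the `SecurityFix` record, its field names, and the `rebuild_threshold` parameter (the "N" in the heuristic, which is not given here) are all hypothetical.

```python
from dataclasses import dataclass

# Threshold taken from the proposal text (CVSS >= 9.0).
CVSS_FAST_LANE_THRESHOLD = 9.0


@dataclass(frozen=True)
class SecurityFix:
    """Hypothetical description of a security PR; field names are illustrative."""
    package: str
    max_cvss: float
    abi_changed: bool
    tests_passed: bool
    security_team_signoff: bool


def shipped_rebuild_count(attr_paths):
    """Proposal item 1: count rebuilds while ignoring nixosTests.* attributes."""
    return sum(1 for attr in attr_paths if not attr.startswith("nixosTests."))


def fast_lane_eligible(fix):
    """Proposal item 2: all four conditions must hold."""
    return (
        fix.max_cvss >= CVSS_FAST_LANE_THRESHOLD
        and not fix.abi_changed
        and fix.tests_passed
        and fix.security_team_signoff
    )


def target_branch(fix, rebuild_count, rebuild_threshold):
    """Pick a branch; rebuild_threshold is a caller-supplied assumption."""
    if fast_lane_eligible(fix):
        return "master"
    return "staging" if rebuild_count > rebuild_threshold else "master"
```

Under this sketch, a fix with a high score but no sign-off still follows the ordinary rebuild-count rule, so the fast lane only ever adds a path to master and never removes the existing one.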

I believe it exists to prevent mass rebuilds from impacting infra, not to prevent breakage.
Pure ABI changes shouldn’t matter when everything gets rebuilt.

The 5k+ counts do not include any nixosTests.*

Or more precisely, infra wouldn’t be able to manage such an amount of rebuilds (i.e. long stretches without binaries, similar to what you see now if you follow staging). Even the moment of merging to master would be unpleasant, as people, PRs etc. would be missing huge amounts of binaries. Mass-rebuild security fixes get merged on a daily basis nowadays.

fair enough, but if the mechanism exists, why didn’t it apply here? this seems like a clear-cut case for exactly that.

I think it might be helpful to re-read vcunat’s reply a few times. As far as I can tell, nothing they said implies the existence of some mechanism that did not get applied for libunbound.

PR merging is distinct from building is distinct from channel updates, despite all these things being related. Keep in mind that nix is input addressed, thus any referrer (including indirect referrers) MUST be rebuilt; this is the whole rationale for staging to exist.
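To make the "any referrer must be rebuilt" point concrete, here is a minimal sketch that computes the set of direct and indirect referrers of a changed package in a toy dependency graph. The graph is invented for illustration; only the gnutls-to-unbound edge is mentioned in this thread, and `app-a`, `app-b` and `unrelated` are hypothetical names.

```python
from collections import defaultdict, deque


def reverse_closure(deps, changed):
    """Return every package that directly or indirectly depends on `changed`.

    deps maps each package name to an iterable of its direct build inputs.
    """
    # Invert the edges: dependency -> set of direct referrers.
    referrers = defaultdict(set)
    for pkg, inputs in deps.items():
        for dep in inputs:
            referrers[dep].add(pkg)

    # Breadth-first walk over referrers, starting from the changed package.
    to_rebuild = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for ref in referrers[current]:
            if ref not in to_rebuild:
                to_rebuild.add(ref)
                queue.append(ref)
    return to_rebuild


# Toy graph: package -> direct inputs (hypothetical except gnutls -> unbound).
toy_deps = {
    "unbound": [],
    "gnutls": ["unbound"],
    "app-a": ["gnutls"],
    "app-b": ["app-a"],
    "unrelated": [],
}

rebuilds = reverse_closure(toy_deps, "unbound")
```

In this toy graph, changing `unbound` forces `gnutls`, `app-a` and `app-b` to be rebuilt even though only `gnutls` references it directly; `unrelated` is left alone.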

i understand that part, but my point is that a lag of 20 days for a cve of this severity is too long. i did misunderstand the process a bit, but why is current nixpkgs, even unstable, still on the vulnerable version?

Per CONTRIBUTING.md, ‘critical security fixes’ should be merged to staging-next, not staging.

I don’t know who makes the decision about whether a security fix is critical per this policy, or what the relevant considerations are. So perhaps the process was followed here, as designed; perhaps not. But in principle, if it’s a big deal, it ‘should’ be processed faster than this.

There was no staging-next at that point. It got merged to master about 1 day before the unbound PR got opened. Some very critical fixes were merged directly to master in the past, but that won’t speed up getting the fixes into the big channels too much in comparison to the usual process. (it speeds up the -small channels)

Note that the rebuilds are expensive for the infra. One staging* cycle takes 2 weeks, and almost all of that is blocked by waiting for the infra to process everything. And right now, we’re supporting 3 different branches at once (!) (unstable/master, 26.05 and 25.11), so the worst-case lag is more than 3*2 weeks. The overlap only lasts for a few weeks, but even without it there’s about 1 month in the worst case.
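The arithmetic behind those numbers, as a rough sketch. It assumes the cycles for the supported branches are effectively serialized on the same build capacity, which is one reading of the post rather than a stated fact, and it gives a lower bound only.

```python
WEEKS_PER_CYCLE = 2  # "One staging* cycle takes 2 weeks"


def worst_case_lag_weeks(active_branches, weeks_per_cycle=WEEKS_PER_CYCLE):
    """Lower bound on worst-case lag if branch cycles run one after another."""
    return active_branches * weeks_per_cycle


during_overlap = worst_case_lag_weeks(3)   # unstable/master, 26.05, 25.11
outside_overlap = worst_case_lag_weeks(2)  # two supported branches
```

With three branches this gives 6 weeks, and with two it gives 4 weeks, which matches the "about 1 month" figure for the period outside the release overlap.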

I believe we can improve some things, though. A few relevant points:

  1. unbound in particular. I suspect that if we removed it from gnutls (default) build inputs, we would significantly reduce the amount of rebuilds which it’s causing. There it’s only used for ${gnutls.out}/lib/libgnutls-dane.so* which I suspect to be very niche (would be nice to really confirm) so we could avoid building it in the default version (and e.g. provide a gnutls-dane derivation instead).

  2. x86_64-darwin is being dropped as of 26.11 (current unstable/master/staging-next). That will decrease the load on the infra.

  3. in infra chat we recently seem to have reached a consensus that we wouldn’t build nixosTests anymore. Of course we’d keep those which are chosen to be channel blockers, and perhaps there would be some other aspect to compromise on, but currently the costs/benefits of building all of them all the time don’t seem favorable. (they take a large fraction of Hydra’s total time)

Note that we can expect higher frequency of security fixes for Unbound now “thanks” to LLMs. Public info about that e.g. here: OARC 46 (Edinburgh, Scotland) (16-17 May 2026): Brace for Upgrades - Plural · DNS-OARC (Indico)

I’d say that due to the nature of Nix, doing these security fixes quickly is just not possible.

For NixOS, an alternative way could be to maintain a module (something like system.applySecurityPatches) with a list of hotfixes to be applied with system.replaceDependencies. The list would be updated as soon as a fix PR is merged, and each entry eventually removed once the staging cycle is complete.

For the last point, we could alternatively enable auto-allocate-uids (if it’s not already the case?) on the infra and push people to use container tests for the application tests and only use VM tests when strictly required.

We’ve been doing this in an out-of-tree project, and the gains in terms of CI time are pretty massive.

The build infra has supported container tests since May 23rd. Container tests can dramatically improve runtime speed, but I think what bogs Hydra down is the large closures NixOS tests have. In the classic queue-runner model they are funnelled through the coordinator.

Is this something the Rust rewrite of the Hydra queue runner addresses, or something else?

I would defer that answer to @Conni2461, who designed and authored the new queue-runner.
