Fixing GCC-14-related build failures in Nixpkgs

A while back, nixpkgs changed stdenv to default to GCC 14. This version turns several warnings into errors by default, which caused plenty of builds to fail, and I noticed that a lot of them are still broken today. I did some Hydra-mining and found 297 packages that broke after the stdenv change, are still broken now, and whose logs indicate build failures caused by the stricter defaults in GCC 14.

I compiled these into a list: Fixing broken builds after change to GCC14

Potential fixes to this issue (roughly in order of preference), thanks to @emily:

  1. Check if the issue is fixed upstream and we are just missing an update. If so, bump the version, check that everything builds and works, and it should be good to go.
  2. Search for existing patches (other distros have the same issues) or write your own, fixing the cause of the warnings. Consider contributing the patch upstream. Fix the package by adding the patches to the derivation.
  3. If patching the code for some reason is too much effort, you can disable the errors and turn them back into warnings. This is done by adding something like this to the derivation:
env.NIX_CFLAGS_COMPILE = lib.concatStringsSep " " [
  "-Wno-error=int-conversion"
  "-Wno-error=incompatible-pointer-types"
  "-Wno-error=implicit-function-declaration"
];
  4. Switch to a stdenv with an older GCC (e.g. gcc13Stdenv). This should almost never be needed and is only a last resort (e.g. when hitting a compiler bug; if you do, please report it).
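As a sketch of what option 2 looks like in practice: a derivation can pull in a patch found in another distro's packaging repo with fetchpatch. The package, URL, and hash below are hypothetical placeholders, not a real patch:

```nix
# Hypothetical fragment of a package derivation; URL and hash are placeholders.
{
  patches = [
    (fetchpatch {
      name = "fix-gcc14-implicit-function-declarations.patch";
      # e.g. a fix carried by another distro's packaging:
      url = "https://example.org/somepkg/gcc14-fix.patch";
      hash = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";
    })
  ];
}
```

The hash is checked at fetch time, so after adding a real patch you would build once to get the correct hash from the error message.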

So, if you want to help out Nixpkgs, or are looking to get some contributing experience while having plenty of examples of how to fix similar issues already in place, this is a good opportunity. Help out a package in need.

To find commits where others fixed the issue, you could run

git log --oneline | grep gcc14

to find plenty of examples.


I think we should employ a Linux-Kernel-like model. "If your change breaks any other subsystem(s)/package(s), you should be the one to fix it."


I am happy to see people work on fixing these issues! But I would kindly but firmly request this happen in the opposite order. Pinning an old compiler version should never happen unless absolutely necessary (and requires care to handle non-GCC platforms), and we should always prefer updating or fixing to silencing errors when possible – especially as those flags are likely to persist even if the issues are fixed upstream. These errors to catch invalid code were added for a reason!

There's also another intermediate step after updating but before writing patches ourselves – getting patches from other distros, multiple of which have already moved to GCC 14 and handled a substantial amount of the fallout. Fetching those patches is often an easy and quick way to handle these failures.

This could work for, say, packages that have 50 others depending on them. For compiler upgrades, the affected packages are literally every package in Nixpkgs. If you demanded that bumping the default GCC version never broke a package, we would be on a version of GCC from 2006.

The work that the people involved in the staging cycle do to handle changes to core packages like GCC, glibc, curl, OpenSSL, etc. necessarily affects more packages than any one person can possibly take responsibility for. Yet it's also work that is absolutely critical to the health of NixOS and our ability to support new versions of leaf packages, keep on top of the most critical security fixes, and so on. It also disproportionately falls on a small number of people to do: I'd say a dozen would be an overly optimistic estimate for the number of people active in staging work.

As the person who made the PRs to bump to GCC 14 and LLVM 19, and helped shepherd the immediate work to fix the problems that resulted, I can say that we did a lot of work to test and fix builds before Hydra even started building packages. You can see some of that work in the LLVM 19 PR, though there were lots of follow-ups from the same few people as well. GCC 14 had been partially prepared before the last release (before we had to roll it back), and this time the follow-up work happened in separate PRs, so it's harder to pull up exhaustive links, but it was still a handful of people fixing a large number of packages before anyone even knew anything was broken. We had people building their entire system configurations with the new compilers before that point. But we have tens of thousands of packages, and at some point you have no choice but to distribute the effort for the long tail.

Are there process improvements we could make? Sure – e.g. if we could ping maintainers when their package breaks during staging-next then more things would probably get fixed before hitting the channels. OTOH, we have no expected SLA for maintainers and often the listed maintainers are not who actually takes care of a package, which is its own problem. You would be surprised how many packages are effectively unmaintained but continue to stay just above water because of the work of a very small number of people who handle issues in packages they don't use or care about when they come up during staging-next. In some cases I suspect those packages may have no users at all…

I find this work fun, but it can be very burnout-inducing. I know we have had a Rust maintainer put huge amounts of individual effort into mitigating the fallout of two Rust changes that caused mass problems with packages across the tree and still mostly get a negative response for the things that were left over both times. People don't understand just how much bigger the task is than something one person can achieve, don't understand how unsuitable much of our tooling is for these kinds of changes, don't understand how much work gets done by individuals all the same, and then usually the only reaction from users and downstream package maintainers is annoyance that their builds broke and asking why it was allowed to happen. Here's a loooong comment I wrote a while back about the amount of work that goes into these migrations and the degree to which taking the easy way out as a downstream package maintainer amplifies the already large burden of working on these mass upgrades further.

Anyway, I appreciate the effort people put in to fix these things, and if you would like to help out with the early stages of the next compiler bump, please join the staging room on Matrix!


I also think it's fantastic to see these kinds of initiatives in nixpkgs! I've created a tracking PR to monitor the progress:


No, that won't work for a distro that accepts just about any package (even from very low-quality upstreams).


You are completely correct. These were the things that came to mind first, but I should have spent the time to put them into proper order. I updated that section of the initial post.

Many thanks for the insights into the world of Nix staging development. I want to say that making and publishing this list was by no means meant to be accusatory or to assign blame. I just noticed very similar issues in a number of packages, decided to take a closer look, and found quite a few more packages than I could tackle myself. So I made this list as a guideline for fellow "nixpkgs janitors".

What you wrote about keeping (effectively) unmaintained packages alive resonated with me. A lot of fixes I do are for packages that I have no interest in using myself. But the act of fixing them is engaging, educational, strangely relaxing, and sometimes leads to the discovery of a software gem. It's my way of giving back, since the time I can spend on nixpkgs is very spotty.

However, I'd be very happy to join efforts (similar to ZHF) to fix packages on the fringes of nixpkgs that break due to larger changes in staging. I'm not sure whether a list of volunteers or just an announcement in the forums (like ZHF) would be best, though. Right now I don't really notice what's going on in staging-next until it hits master.

Many thanks for that tracking PR. I was pondering how to keep that list up to date, but I'll just link there instead.


No worries at all 🙂 I didn't get that from your post in the slightest and I appreciate you doing this! I've just written out the response to "can't we just make sure updates don't break any package?" so many times now that it's something of a reflex.

There's a cycle every few weeks, so unfortunately I think the only real way to keep on top of things is to hang out on Matrix and use the reports. That's quite high-commitment, which is unfortunate. I really would like a system where maintainers get early warning of build failures on staging-next, but it'd take a lot of work to make that happen in a sensible way.
