RFC: More C errors by default in GCC 14

TL;DR: I want to propose a GCC 14 change which will impact distributions, so I’d like to gather some feedback from NixOS.

Clang has disabled support for a few historic C features by default over the last few releases. This mirrors a process that Apple has begun in Xcode even earlier (perhaps motivated in part by their AArch64 Darwin ABI, which is pretty much incompatible with some of the C89-only features).

These changes bring real benefits to C programmers because errors are much harder to miss during the build than warnings. In many cases, the compiler is not able to generate correct code when such issues are present, and programmers who look at the generated machine code suspect a compiler bug. And all this happens because they missed a warning. So we want this change for GCC, too.

On the other hand, many distributions use GCC as the system compiler, and there the focus is not so much on developing software, but on building the sources as they exist today. This is somewhat different from the usual GCC C++ updates (both language changes and libstdc++ header changes) because it impacts pre-build feature probing (mostly autoconf). If a probe goes wrong due to a new compiler error, it’s possible that a build still succeeds and passes its test suite, but lacks the intended feature or ABI because parts got automatically disabled due to the failing configure check. With C++ transitions, that seems rarer (C++ programs, if they use autoconf, often run the checks with the C compiler).
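As a rough illustration of how a probe can go wrong, here is a sketch of a feature check in the style of old configure scripts. The legacy form appears only in the comment and is hypothetical, not taken from a specific package; `probe_strlen` is likewise an illustrative name. The probe body is written as an ordinary function rather than `main` so that it can be called from other code.

```c
#include <string.h>   /* declares strlen */

/*
 * Legacy probe, roughly as an old configure script might generate it:
 *
 *     main () { exit (strlen ("abc") != 3); }
 *
 * It relies on implicit int (no return type on main) and on implicit
 * declarations of exit and strlen. Once those become errors, the probe
 * fails to compile, and configure concludes that the feature is missing
 * even though the function exists and works.
 */

/* Modernized probe body: explicit return type, declarations from headers.
 * Returns 0 on success, following the exit-status convention of probes. */
int probe_strlen(void)
{
    return strlen("abc") != 3;
}
```

The important point is that the legacy probe fails for a reason unrelated to the feature being tested, which is what makes the resulting build change silent.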

Specifically, I’m investigating the following changes:

  • -Werror=implicit-function-declaration: Functions can no longer be called without being declared first. Fixing this may need additional prototypes in package header files, or inclusion of additional header files (both package-specific and system headers).
  • -Werror=implicit-int: int types can no longer be omitted in old-style function definitions, function return types, or variable declarations or definitions. Fixing that involves adding the int type (or the correct type if it is not actually int). If there is already a matching declaration in scope that has a different type, that needs to be resolved somehow, too.
  • (tentative) -Werror=int-conversion: Conversion between pointer and integer types without an explicit cast is now a compiler error. Usually fixed by one of the two things above. I do not have data yet how many other cases remain after the initial issues are fixed, but should have that in the coming weeks. (Quite frankly, I’m amazed that we created 64-bit ports without making this an error.)
  • (very tentative) -Werror=incompatible-pointer-types: GCC will no longer automatically convert between pointer values of unrelated pointer types (except when one of them is void * or its qualified versions). The fallout from this could be quite large, I do not have data yet. Clang does this for function pointer types only (probably based on the Apple ABI issues), but I’m not sure if it’s a reasonable distinction for GCC-supported ABIs (the powerpc64le cases I’ve seen had explicit casts and no warnings, so no difference there).
  • For -Wdiscarded-qualifiers (e.g., using const pointers as non-const), and -Wpointer-sign (using char * as unsigned char *) I do not have any plans.
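To make the first three items concrete, here is a minimal sketch of what the typical fixes look like. The legacy forms are shown only in comments, and the names `copy_string`, `call_count` and `next_count` are made up for illustration.

```c
#include <stdlib.h>   /* declares malloc; without it, malloc is implicitly
                         declared as returning int */
#include <string.h>   /* declares strlen and memcpy */

/*
 * Legacy form (hypothetical):
 *
 *     copy_string(s) char *s; { char *p = malloc(strlen(s) + 1); ... }
 *
 * The return type is omitted (implicit int), malloc and strlen are called
 * without declarations (implicit function declaration), and the int that
 * the undeclared malloc appears to return is stored in a pointer
 * (int-conversion). On 64-bit targets that last step can truncate the
 * pointer.
 */
char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Legacy form (hypothetical): `static call_count;` with the type omitted. */
static int call_count = 0;

int next_count(void)
{
    return ++call_count;
}
```

In this sketch, adding the two headers and spelling out the types resolves all three diagnostics at once, which matches the observation that int-conversion issues are usually fixed by one of the first two changes.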

I want to propose at least the first two for GCC 14.

The exact mechanism how the default is changed may not be -Werror=, perhaps something along the lines of -fpermissive for C++. The C89 modes (-std=gnu89 etc.) will likely still enable all these features (even if they are not part of C89). As an opt-out mechanism, -std=gnu89 is insufficient because there are packages out there which use features which are C99-and-later-only in GCC (predominantly C99-style inlining, I believe) together with implicit-int/implicit-function-declaration.

For Fedora, we are using an instrumented compiler to find packages that need fixing. More details on the process are here (but please ask if you are interested in specifics):

The numbers so far don’t look great, but are manageable. Fedora had 23,101 source packages last time I looked. We have fixed 796 of them, and 85 are still pending investigation (with 5-10 false positives expected among those remaining). This puts the per-package failure rate at 3.8%. I don’t have data on silent failures; most issues do not seem to be silent, and fully-silent packages are rare. Silent output-changing issues definitely exist, but they are usually accompanied by something else. Those 3.8% also include some packages which we fixed by removing C89 constructs, but where the relevant autoconf tests failed for other reasons.

Fedora would be hit pretty hard if we made the GCC switch without this preparation because we do a mass rebuild of the entire distribution right after importing a new GCC upstream release. I have considered automating some of the autoconf updates, but usually it’s some generic autoconf issue (long since fixed in autoconf) plus a package-specific issue, so that doesn’t seem to be particularly helpful.

The changes we have made in Fedora are captured here:

In general, if there is an upstream reference for a change (bug tracker, mailing list), we have not filed downstream bugs. The same applies if the issue is the result of an old autoconf bug. I don’t know how useful this data is going to be for other distributions.

Gentoo has been fixing various packages for building with Clang, which covers a superset of the issues that need to be addressed:

IIRC, Gentoo has its own mechanism to detect silent build breakage, but I think it’s mostly focused on autoconf, so it’s less comprehensive, and also fixes the stuff that is actually relevant to the distribution.

Like the Fedora effort, they try to upstream patches (if an upstream is still around). Xcode/Homebrew/Macports users have upstreamed some patches as well, but perhaps less consistently so. Most upstreams are receptive to the changes. If they reject them, it’s mostly because of CLA processes. But for Fedora, there’s a large overlap between impacted packages and packages without an active upstream maintainer, which is perhaps not unexpected.

My questions

How would you manage this transition in NixOS? How do you normally consume new GCC upstream releases? How do you think this will impact you? Do you have any questions?

I’m going to summarize the discussion here for the c-std-porting list, and later for the actual GCC proposal (which will cover other distribution feedback as well).


I would say it’s a worthy change. And very painful to handle.

I think it’s very important to have a switch in gcc-14 that reverts to the exact previous behavior, so that individual packages can easily fall back to the previous state. I would say -std= tends to change too much and can’t be thrown into packages without much thought. With a specific flag, we can at least upgrade to gcc-14 without fixing the whole world.

nixpkgs normally switches default gcc (or clang) from one version to another in one step when enough packages are known to work.

I think we had a somewhat similar experience with -fno-common transition. NixOS took the following approach:

  • set -fcommon option via specs for a few gcc releases downstream (and patched llvm) and masked the problems
  • forget about it for two years (most obvious fixes landed upstream in that time)
  • create a separate CI job to build full repository by disabling downstream -fcommon override
  • fix or work around about 240 package failures found by CI job
  • once all packages are fixed disable -fcommon hack in gcc and merge the work to the main branch

The whole process of active fixing took about 2 months. This is 100 packages per month.

The main lesson for me here was that having a clean switch to revert to the previous behavior is very important: if the package fix is not trivial, we can report the bug upstream without the fix and stick -fcommon in CFLAGS (nixpkgs has a reliable way to pass compiler options to the gcc driver).

If NixOS follows a similar path, it will probably take about 1.5 years to fully transition to a new default: judging by Gentoo’s tracker at https://bugs.gentoo.org/870412, the impact of the implicit-declaration errors (~1800 packages) is about twice that of the -fno-common tracker, bug 705764 (~850 packages).


As implied by Trofi, in nixpkgs we tend to switch the default gcc a bit later, avoiding many of the issues through “naturally occurring” package updates. I really appreciate you/Fedora cleaning up such garbage code in upstreams. For example, GCC 12 became the default during February 2023, and I think this time it was sooner than average for us.

I believe the constructs in the first two bullets certainly shouldn’t be accepted by default nowadays (I write C for my day job).

And yes, such upgrades get helped by having flags like -fcommon or -Wno-error=foo. Typically there are some cases that are hard to fix, and mixing compiler versions tends to be a worse option (especially if C++ is involved).
