Pre-RFC: Gradual Transition of NixOS x86_64 Baseline to x86-64-v3 with an Intermediate Step to x86-64-v2

Sure, but I don’t think that in itself needs an RFC; that’d just be an uncontroversial improvement like any other.

Taking advantage of it generally would require a switch to clang/LLVM as the default compiler. This is somewhat easy for us to do technically, but it may require a bit more community input, given that GCC has been the standard compiler for most Linux distros (including us) for almost their entire history, and that it’s copyleft FLOSS versus Clang’s permissive FOSS license.

Even there, I’d only open an RFC once there are enough people who care about it enough to block a PR implementing it.

1 Like

Where would someone propose this though without it getting buried?

You’d make a PR, state/show the benefits of it and start tagging people who might feel “responsible” or are knowledgeable about it.

To gather community feedback, you could also create an issue and tag it or a discourse thread like this one. This one certainly did not get buried :wink:

1 Like

My (home) desktop is an Athlon II X4 620, so only v1 support.[1]

Yes, it is old and I don’t want to recompile e.g. Firefox on it on a regular basis. But it is perfectly capable of web browsing, mail and editing Nix configs. Actually, having this modest machine was one of the reasons I got into NixOS. :slight_smile:

I’m all for optimizing software that benefits from these instruction set extensions. But as NixOS can be a lightweight distro (aside from disk space), keeping compatibility with older machines would be nice.

How about introducing a “max-microarch-level” option, forcing it to x86_64_v1 for a core subset of derivations (such as the kernel or Python), and maybe having two builds for a few selected huge packages (Firefox, Chrome, LibreOffice, VLC) so that older systems can still benefit from the cache (and those are the systems that profit most)?

[1] I don’t want to sound selfish here, just throwing in one of my machines as an example. Imho NixOS is a very fine system to keep older hardware alive and usable.

Edit/Addendum: The installer could make use of HenrikBengtsson/x86-64-level (https://github.com/HenrikBengtsson/x86-64-level) to get the x86-64 microarchitecture level of the current machine and set the max-microarch-level in hardware-configuration.nix.
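
To make that concrete, here is a minimal sketch of what the installer could write into hardware-configuration.nix, assuming the proposed max-microarch-level option existed. It does not exist today; the option name and value are purely illustrative.

```nix
# hardware-configuration.nix -- sketch only; "max-microarch-level" is the
# option proposed above and does not exist in NixOS today.
{ ... }:
{
  # The installer would run `x86-64-level`, which reports the highest
  # microarchitecture level (1-4) the current CPU supports, and map that
  # number to the corresponding level name:
  nixpkgs.max-microarch-level = "x86_64_v1";   # e.g. an Athlon II X4 620 is v1-only
}
```

(The closest existing knob is probably `nixpkgs.hostPlatform.gcc.arch`, but that raises the level you build for locally rather than capping what the binary cache is expected to serve.)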

2 Likes

Sadly, I don’t think that is reasonable. A GUI ISO contains a couple thousand derivations. Now, not all of those are packages, obviously, but I’d have called such an effort infeasible even if it were just hundreds of packages.

As far as I’m aware, we are not using -O3 because it sometimes decreases performance (through aggressive unrolling, etc.).

But has anyone ever benchmarked a full -O3 system against an -O2 system?

It might be more relevant than x86-64-v2 since it doesn’t involve dropping hardware.
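
For what it’s worth, one way to get some numbers without rebuilding a whole system is to build a single package both ways and benchmark the two store paths. A minimal sketch follows; the package choice is arbitrary, and appending to NIX_CFLAGS_COMPILE is a blunt instrument that some build systems may partially override.

```nix
# o3-compare.nix -- build the same package with the nixpkgs default (-O2)
# and with -O3 appended, so the two results can be benchmarked side by side.
let
  pkgs = import <nixpkgs> { };

  withO3 = drv:
    drv.overrideAttrs (old: {
      # Blunt approach: build systems that hard-code their own -O level
      # may still override this.
      NIX_CFLAGS_COMPILE = toString (old.NIX_CFLAGS_COMPILE or "") + " -O3";
    });
in
{
  baseline = pkgs.x265;   # as served by cache.nixos.org
  o3 = withO3 pkgs.x265;  # same package, rebuilt locally with -O3
}
```

Built with `nix-build o3-compare.nix -A baseline -A o3`, the resulting out-links can then be fed to whatever benchmark matters to you.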

-O3 includes optimisations that can potentially produce unsafe and/or wrong code. We won’t use that.

2 Likes

Hm… Do we have metrics on misses on cache.nixos.org? If a lot of users have the max microarch set to v1 in their configs then we should see a lot of misses for the relevant packages and can selectively choose what to build.
Or we push on the development of Trustix to get a (distributed) community cache of v1 builds.

1 Like

TL;DR Ubuntu is looking into this topic as well and has come to the same conclusion as we have: there is no reliable data on the performance benefit, and we need actual real-world people to test their workloads on real systems to find out.

2 Likes

My router is a firewall appliance with a Celeron J3060 running NixOS. I could set up a builder if the change is made, but I’d really prefer not to have to do that.

2 Likes

Phoronix has a decent set of benchmarks on Ubuntu’s experimental x86_64-v3 ISO:

Summary:

The Ubuntu x86-64-v3 performance benefits overall were typically small but consistent. In some workloads the x86-64-v3 applications obtained from the archive could be a great deal faster but ultimately it comes down to a subset of software that will really benefit.

2 Likes

Now Fedora is looking into providing optimized packages too.

3 Likes

RHEL 10 will also go for it, apparently: https://developers.redhat.com/articles/2024/01/02/exploring-x86-64-v3-red-hat-enterprise-linux-10

But note that distros with very long support cycles (5–10 years) can be more aggressive about this than e.g. NixOS/Nixpkgs, where after just several months you’d be left without compatible binaries on any maintained version.

1 Like

From the Fedora 40 announcement:

Systemd will be modified to insert the additional directories into the $PATH environment variable (affecting all programs on the system) and the equivalent internal mechanism in systemd (affecting what executables are used by services). Individual packages can provide optimized libraries via the glibc-hwcaps mechanism and optimized executables via the extended search path.

This is an interesting workaround for the “doesn’t work with static executables” limitation of glibc-hwcaps that we should be able to trivially replicate.
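
As a rough sketch of what replicating the library half of that could look like in Nixpkgs: a wrapper derivation could lay out a baseline build and an x86-64-v3 build of the same library side by side, so that glibc’s loader (2.33+) picks the optimised one on capable CPUs. The names `someLibrary` / `someLibrary_v3` are stand-ins, and how the v3 variant gets built is left out here.

```nix
# hwcaps-wrap.nix -- sketch: merge a baseline and an x86-64-v3 build of a
# library into one lib/ tree. The dynamic loader searches
# <libdir>/glibc-hwcaps/x86-64-v3/ before <libdir>/ on v3-capable CPUs.
{ runCommand, someLibrary, someLibrary_v3 }:

runCommand "${someLibrary.pname or someLibrary.name}-hwcaps" { } ''
  mkdir -p $out/lib/glibc-hwcaps/x86-64-v3

  # Baseline objects stay at the path dependents already link against.
  for f in ${someLibrary}/lib/lib*.so*; do
    ln -s "$f" "$out/lib/$(basename "$f")"
  done

  # Optimised objects only get used when the CPU actually supports v3.
  for f in ${someLibrary_v3}/lib/lib*.so*; do
    ln -s "$f" "$out/lib/glibc-hwcaps/x86-64-v3/$(basename "$f")"
  done
''
```

The executable side would still need the $PATH trick from the quoted proposal, since glibc-hwcaps only covers dynamically loaded libraries.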

1 Like

tbf we could also do that, but just announce the changes now and implement them 5 years later (e.g. announcing a v2 baseline this year and then actually raising it in 2029)

I’m afraid I fail to see an advantage in announcing such a thing years in advance.

Providing specific packages that make good use of the optimizations would be valuable for saving build time and storage space. However, being Nix, it’s already possible to build packages for specific CPU micro-architectures. This prevents using the cache as would be expected, but it’s also just really confusing to use, and depending on how you do it, it could involve rebuilding everything all the way down to the bootstrap tools just to optimize one package.

Before changing the default, I’d suggest we make it easier for users to select which micro-architecture or level to build individual packages for. If the package in question is a library, then dependent packages could use it via overlays, though this would also require rebuilding those dependent packages.
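
As a sketch of the overlay route for a library (with `someLibrary` as a stand-in name, and appending `-march` via NIX_CFLAGS_COMPILE as just one possible mechanism):

```nix
# overlay.nix -- sketch: rebuild one library for x86-64-v3. Dependents taken
# from the overlaid package set will link against this variant, but they get
# rebuilt locally instead of being substituted from cache.nixos.org.
final: prev: {
  someLibrary = prev.someLibrary.overrideAttrs (old: {
    NIX_CFLAGS_COMPILE =
      toString (old.NIX_CFLAGS_COMPILE or "") + " -march=x86-64-v3";
  });
}
```

Wired in via e.g. `nixpkgs.overlays = [ (import ./overlay.nix) ];`, which is exactly the kind of ceremony that could be made friendlier.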

Another advantage we have, being Nix, is that changing the default does not mean completely dropping support. Much like how you can already set a micro-architecture higher than the baseline, increasing the baseline should still allow using older hardware, at the cost of specifying so explicitly and less ability to use the caches. Assuming we can make setting the micro-architecture simple enough that people actually understand how to do it, we could just tell them to do so if they have older hardware. They’d need to compile manually, but it should be possible for Hydra to build at least stdenv for multiple micro-architectures, even if not all of nixpkgs.

TLDR: Increasing the baseline isn’t a compatibility problem for us like it is with other distros, but rather a documentation problem.

Also, Nixpkgs will simply deny v4 optimizations when using GCC 12, which is the default on unstable right now. Staging has updated to GCC 13, but it could be some time before that propagates to other channels.

1 Like

There’s rarely such a thing as “optimising one package”. The vast majority of packages use libraries for much (if not most) of their work. This means these libraries also influence the package’s performance.

E.g. optimising ffmpeg itself would likely gain very little (if any) Vorbis encoding performance, because ffmpeg uses a dedicated library for the performance-critical part of that.

*no ability to use the caches.

You likely wouldn’t even be able to boot an installer ISO on an unsupported CPU.

Raising the default baseline will exclude the majority of users of then-unsupported hardware. There’s no way around it.

Just the stdenv wouldn’t really help anyone. If you’re going to be building your average desktop closure, having the stdenv cached or not doesn’t make a huge difference.

Suffice it to say, this is not a feasible option if you want users of then-unsupported hardware to still be able to reasonably use NixOS, especially considering that their hardware is likely highly unsuited to compiling modern software.

If anyone should be building their own packages, it’s the users of powerful modern CPUs who need the handful of % mean improvement for …what reason exactly?


Anyhow, I think it’s reasonably well established that:

  • There is a point to this. With v3, the increase is small and highly package-dependent, but it’s there, it’s real, and it’s likely desirable, especially for those few packages.
  • There are many users who would be effectively excluded from using the distro were the baseline raised to v3 and even some who would be excluded by v2.
  • We can have our cake and eat it too, at the expense of a closure-size increase (how much?), using glibc-hwcaps in addition to the $PATH workaround for the executables themselves.

The next step is to build code infrastructure for glibc-hwcaps usage and apply it to the packages and libraries that benefit.
Until that is done and well established, I think we can pause any discussion on raising the baseline.

7 Likes

Would this change only apply to Linux or also include Darwin?

Darwin has two levels of support:

  • x86_64: Implies SSE3 but not necessarily SSE4.2.
  • x86_64h: This appears equivalent to x86-64-v3. It targets Haswell CPUs or newer.

The only caveat is that Rosetta 2 does not support AVX, so it can’t really benefit from x86_64h. However, the most likely approach would be to build fat binaries supporting both architectures, so it could continue to run x86_64 code. And if you’re really concerned about performance on an aarch64 Mac, you should be using native code anyway.

(Whether it’s worth the effort is a separate question. My inclination is to say it’s not.)