Pre-RFC: Gradual Transition of NixOS x86_64 Baseline to x86-64-v3 with an Intermediate Step to x86-64-v2

nyanbinary · November 23, 2023, 5:45pm

Summary

This pre-RFC proposes a phased update of NixOS’s x86_64 baseline architecture, transitioning to x86-64-v2 in 2024 and subsequently to x86-64-v3 in 2027. This strategy is designed to gradually optimize NixOS for modern CPU architectures, balancing performance improvements with compatibility and transition challenges.

Background

NixOS currently supports a broad range of x86_64 processors. However, this extensive compatibility limits the utilization of advancements in newer CPU architectures. A gradual transition would allow NixOS to progressively adopt modern CPU features while maintaining broader hardware support.

Phase 1: Transition to x86-64-v2 (2024)

Implementation

Compiler Updates: Shift compiler flags to -march=x86-64-v2, targeting CPUs that support SSE3, SSSE3, SSE4.1 and SSE4.2, plus POPCNT and CMPXCHG16B.
Evaluation: Assess compatibility and performance impacts on the existing package ecosystem.

Rationale

Intermediate Compatibility: x86-64-v2 offers slightly improved performance over the current baseline while maintaining broad compatibility, including with some older CPUs.
Building Foundations for Phase 2: This phase sets the stage for the subsequent transition to x86-64-v3, allowing time for testing and community adaptation.

Phase 2: Transition to x86-64-v3 (2027)

Implementation

Compiler Adjustments: Update to -march=x86-64-v3, optimizing for CPUs with support for AVX2 and additional bit-manipulation instructions.
Legacy Support Strategy: Develop and maintain strategies for supporting systems incompatible with x86-64-v3.

Rationale

Maximizing Performance: The shift to x86-64-v3 aims to fully leverage modern CPU capabilities, significantly boosting performance for various applications.
Alignment with Industry Trends: The transition aligns NixOS with the trend among Linux distributions towards utilizing more advanced CPU features. This includes Arch Linux, CentOS, Red Hat, and openSUSE.

Drawbacks and Challenges

Compatibility

Reduced Compatibility: Each phase might render NixOS incompatible with some older systems, potentially impacting a subset of the user base.
System Types vs. Features: There’s an ongoing discussion about whether these architectural levels should be system types or features within NixOS, with implications for package compatibility and system configuration.

Migration Complexity

Implementation Challenges: The phased approach requires meticulous planning, testing, and execution.
Toolchain and Build System Adjustments: Accommodating new compiler flags and architecture levels in the NixOS toolchain and build systems.

Maintenance Workload

Increased Burden: Switching architectures adds to the maintenance workload.
Ofborg Support: Making Ofborg work nicely with the different microarchitecture levels and making sure all the tests run smoothly as well.

Contingency Plan

Staged Rollouts with Community Feedback: Implement changes in stages, with active monitoring and adjustments based on feedback.
Enhanced Testing and Validation: Prioritize extensive testing and validation before each phase.
Community-Led Support for Legacy Systems: Develop initiatives for supporting users with older, incompatible hardware.
Transparent Communication: Continuous communication with the community on transition status and issues.
Adaptation Rather Than Reversion: Focus on adapting the approach based on encountered issues.

Alternatives

Single-Step Transition to x86-64-v3: Potential for greater compatibility issues.
Extended Support Periods: Prolonging support for each architecture level for smoother community adaptation.

Benchmarking

NixOS Benchmarks

Existing Data: Limited benchmarks are available specifically for NixOS regarding x86_64 microarchitecture levels. This scarcity underlines the need for NixOS-specific benchmarking initiatives.
PedroHLC
SuperSandro2000 PR
Proposed Actions: Conduct a series of comprehensive benchmarks, focusing on key performance metrics across various workloads and comparing x86-64-v1, v2, and v3.

CachyOS Benchmarks

Overview: CachyOS, an Arch Linux-based distribution, provides valuable insights with its v3 packages & kernels.
Reference Use: Analyze CachyOS benchmark data to infer potential performance improvements in NixOS under similar conditions.

Arch Linux Benchmarks

Findings: Benchmarks posted by an Arch Linux maintainer on a mailing list have shown a 10–20% improvement when transitioning from v1 to v3.
Implications for NixOS: These results suggest possible performance gains for NixOS, emphasizing the need for a detailed comparison under NixOS’s unique environment.

Additional Sources

Community Contributions: Encourage NixOS community members to share their benchmarking results.
External Comparisons: Look at benchmarks conducted by other Linux distributions or independent reviewers to gain a broader understanding of the performance impacts.

Unresolved Questions

Community Engagement: How to effectively communicate and implement the transition plan within the NixOS community?
Hardware Compatibility Checks: Mechanisms for users to easily determine the compatibility of their hardware with each new baseline.
Support Mechanisms: Strategies to support users and maintainers during each transition phase, especially those with older hardware.
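
One possible shape for the hardware compatibility check mentioned above, as a minimal sketch rather than a proposed tool: it reads the flag names Linux exposes in /proc/cpuinfo and reports the highest x86-64 level whose required flags are all present. The function names (read_cpu_flags, highest_level) are made up for illustration, and the flag lists are my reading of the psABI levels mapped onto cpuinfo names (for example, SSE3 appears as pni and LZCNT as abm), so they should be double-checked before relying on them.

```python
from pathlib import Path

# Flags every x86_64 CPU is expected to have (psABI baseline, cpuinfo names).
BASELINE = {"lm", "cmov", "cx8", "fpu", "fxsr", "mmx", "syscall", "sse", "sse2"}

# Additional flags per level, lowest level first. A level only counts if
# all lower levels are satisfied too.
LEVEL_FLAGS = [
    ("x86-64-v2", {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}),
    ("x86-64-v3", {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}),
    ("x86-64-v4", {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}),
]


def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the flag set of the first CPU listed, or an empty set if unavailable."""
    try:
        text = Path(path).read_text()
    except OSError:
        return set()
    for line in text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def highest_level(flags):
    """Return the highest supported level name, or None if even the baseline is missing."""
    if not BASELINE <= flags:
        return None
    level = "x86-64"
    for name, required in LEVEL_FLAGS:
        if not required <= flags:
            break
        level = name
    return level
```

On a Linux machine, highest_level(read_cpu_flags()) gives the level for the local CPU. A CPU without AVX, such as the low-power Celerons discussed later in this thread, would be expected to stop at x86-64-v2.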

RaitoBezarius · November 23, 2023, 6:03pm

It would be nice if anyone wanting to work on this would take the time to read systems/architecture: bump default architecture to x86-64-v2 by SuperSandro2000 · Pull Request #202526 · NixOS/nixpkgs · GitHub, summarize it, and add the important objectives that we highlighted in that PR to make this pre-RFC realistic.

As of now, it contains no useful information for NixOS stakeholders.

nyanbinary · November 23, 2023, 11:33pm

Arch Linux Community Discussions: On the Arch Linux mailing list, there’s a discussion about the shift to x86_64-v3 microarchitectures. See the Benchmark here.
Sunnyflunk’s Analysis: A GitHub user named Sunnyflunk provides a comprehensive analysis of the x86-64-v3’s performance, revealing a varied/mixed bag impact across different applications. Refer to the analysis here.
CachyOS Performance Insights: Phoronix tested CachyOS, an Arch Linux-based distribution with v3 support, and reported some notable performance improvements. You can go through the data here.
CentOS ISA Performance Investigation: The CentOS ISA Special Interest Group conducted an extensive review of different ISA levels, including x86-64-v3, offering valuable insights into performance changes. Their detailed findings can be found here.
Red Hat’s Strategy with RHEL 9: Red Hat discusses their decision to build Red Hat Enterprise Linux 9 for the x86-64-v2 microarchitecture level, considering various CPU incompatibilities and performance. The blog post is available here.

Atemu · November 24, 2023, 10:11am

First of all, thank you for looking into this topic. Although I am highly sceptical of the benefits, I also think we should take advantage of them should they actually exist, so I support any efforts towards clearing up this matter.

It’s been reasonably well shown that generic compiler optimisations can provide significant benefit for many applications. Clear Linux significantly outperforms most generic Linux distros across a wide range of tasks by using package-specific optimisations. These often include march flags as well but I don’t think it’s clear whether these drive the performance benefit.

The benchmarks I’ve seen on raising march have not been very convincing so far. They usually have massive biases (i.e. selecting only packages which are known to benefit from generic compiler optimisations), do not actually test µArch optimisations in isolation but in combination with unsafe optimisations (-O3, which we will not use), and none of them demonstrate benefits for users, only higher (or lower) numbers in a collection of semi-synthetic benchmarks.

Based on this rather poor quality data, you can already tell that the benefit is highly dependent on the specific package. The number of packages that benefit significantly appears to be rather low, possibly less than 50%.

This is conjecture but I additionally do not believe that most of these synthetic benchmarks necessarily reflect a better user-experience.
I think we should instead focus on applications which users actually need to be performant. On the desktop, this would include commonly interacted tools such as coreutils, browsers, text editors, word processors and the like.

A note on hardware:

The thought that compiler optimisations such as AVX only exclude “older systems” (as in: decades old) is wrong.

There is hardware released as recently as this year which does not support AVX of any kind: Tremont (microarchitecture) - Wikipedia. Moving to v3 in 2027 would exclude this hardware 4 years after release.

Such low power Celerons are somewhat popular in the homelab scene for their extreme power efficiency and low prices (commonly available in used thin clients).

My NAS uses a Celeron J4105 from 2017 (Goldmont Plus) and I know that @musicmatze uses a similar chip as well.

I am of the opinion that, if compiler optimisations only really help a small group (or category) of packages, they should be applied to those specific packages only. Ideally by switching the code paths at runtime which many packages already do.
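
To illustrate what switching code paths at runtime means, here is a toy sketch. Real packages typically do this in native code, so this only shows the selection logic. Both sum_squares functions and the CANDIDATES table are hypothetical stand-ins, and the "AVX2" variant is ordinary Python that merely represents a tuned kernel.

```python
def sum_squares_generic(values):
    """Portable fallback that works on any CPU."""
    return sum(v * v for v in values)


def sum_squares_avx2(values):
    """Stand-in for a hand-tuned AVX2 kernel (same result, hypothetically faster)."""
    return sum(v * v for v in values)


# Ordered from most to least demanding; the last entry requires nothing,
# so a match is always found.
CANDIDATES = [
    ({"avx2", "fma"}, sum_squares_avx2),
    (set(), sum_squares_generic),
]


def select_impl(cpu_flags):
    """Pick the first implementation whose required CPU flags are all present."""
    for required, impl in CANDIDATES:
        if required <= cpu_flags:
            return impl
    raise RuntimeError("no implementation matched")
```

The point of the pattern is that one binary serves every CPU: the baseline path stays available, and only the hot function gets a specialised variant.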

If you can show data which shows a more wide-spread significant increase in performance (let’s say a median improvement >5% across 10^2-10^3 “desirable” packages), I’d revise that opinion as that’s too many to reasonably “optimise” by hand.
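
A rough sketch of how that criterion could be checked once per-package timings exist. The 5% threshold and the minimum of 100 packages come from the paragraph above; the function names and the input shape (package name mapped to baseline and candidate runtimes in seconds) are assumptions made for illustration.

```python
from statistics import median


def relative_improvement(baseline_seconds, candidate_seconds):
    """Fraction of runtime saved by the candidate build; negative means slower."""
    return (baseline_seconds - candidate_seconds) / baseline_seconds


def passes_threshold(timings, threshold=0.05, min_packages=100):
    """Check whether the median improvement across enough packages exceeds the threshold.

    timings maps a package name to (baseline_seconds, candidate_seconds).
    """
    if len(timings) < min_packages:
        return False
    improvements = [relative_improvement(b, c) for b, c in timings.values()]
    return median(improvements) > threshold
```

Using the median rather than the mean matters here: a handful of packages with very large speedups cannot pull the result over the line if most packages do not change at all.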

All in all, this whole endeavour gets a big rejection from me until there is clearer data showing the benefits.
As a general purpose distro, we should not take excluding hardware lightly. Even if it actually is very old, there might still be uses left for it. Not to mention people who aren’t quite as socioeconomically privileged as most of us whose only access to anything resembling a PC is ancient hardware we threw away.

It is also worth mentioning that the people who really need it (i.e. HPC people or misguided Gentoomen) can and could always apply these generic tree-wide optimisations themselves for their environments.

Flakebi · November 24, 2023, 10:15am

Many of these analyses are also mentioned in the issue @RaitoBezarius posted.
Unfortunately, many of them have problems that make it hard to know if x86-64-v3 is beneficial for NixOS:

Sunnyflunk’s Analysis: As mentioned at the end of the blog post, it didn’t just compare x86-64 to x86-64-v3 but also changed other compiler flags, so the results are useless if we want to show that x86-64-v3 is worth it.

However, this post was intended to be more about x86-64-v3, but some quick tests (which requires further analysis) suggest that CachyOS using -O3 is what’s actually responsible for some of the larger gains rather than x86-64-v3.

CachyOS Performance Insights: Same comparison as 2., same problems.
CentOS ISA Performance Investigation: As mentioned in the blog post, it didn’t just compare x86-64 to x86-64-v3 but also changed the compiler version, so the results are useless if we want to show that x86-64-v3 is worth it.

Given that we changed both the compiler version and the baseline, we dug into which of those variables contributed the most impactful change to the results. For the latter two benchmarks, we saw a 2.2x speed up. Mocassin seems to benefit the most from the auto-vectorization that GCC12 does.

As also mentioned in the aforementioned issue, I think we need benchmarks of a proposed NixOS change, so we can see what performance uplift we really get.

gytis-ivaskevicius · November 24, 2023, 12:52pm

I feel like waiting till 2027 to migrate to x86_64-v3 puts Nix quite behind the industry. Does the NixOS Foundation have enough resources to run v3 or maybe even v4 builds in addition to standard x86_64? If that’s the case, it might be the right way to go about it purely from a marketing perspective.

RaitoBezarius · November 24, 2023, 1:07pm

Nope, we don’t have the resources to do so.

Atemu · November 24, 2023, 2:34pm

I don’t believe we should follow “industry trends” just for the sake of following them.

In the current moment, there are very clear downsides to moving to new march targets and little to no good data on the benefits.

“But everyone is doing it” does not count as a benefit IMHO.

nixinator · November 24, 2023, 5:11pm

what resources do you need?

maybe the companies that are interested in these optimisations can help with those resources ?

RaitoBezarius · November 24, 2023, 5:13pm

~100-400TB extra of S3 storage and maybe something like 1000ish cores of compute time to mass rebuild all of that when needed?

Given that no one proved they bring anything to the table, that’s highly uncertain.

nixinator · November 24, 2023, 5:14pm

whoa…

ok, that’s a big ask.

Agreed.

delroth · November 24, 2023, 7:08pm

(Playing devil’s advocate, I’m not personally convinced that -v2 does much performance-wise.)

FWIW, x86_64-v2 doesn’t mandate AVX. Steam’s hardware survey (one of the best public data sources to look at, IMO) shows SSE4.2 at 99.52% availability, vs. AVX at 97.28%.
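
Turning those two availability figures into the share of surveyed machines that would be left out, as a small sketch. The numbers are the ones quoted above; the function name is made up. Note that x86-64-v3 requires AVX2 and more, not just AVX, so the AVX figure is only a lower bound on what v3 would exclude.

```python
# Availability percentages as quoted from the Steam hardware survey above.
SSE42_AVAILABILITY = 99.52
AVX_AVAILABILITY = 97.28


def excluded_share(availability_percent):
    """Percentage of surveyed machines lacking the feature, rounded to two decimals."""
    return round(100.0 - availability_percent, 2)


if __name__ == "__main__":
    print("lacking SSE4.2:", excluded_share(SSE42_AVAILABILITY), "%")
    print("lacking AVX:   ", excluded_share(AVX_AVAILABILITY), "%")
```

By these figures, requiring AVX would cut off between five and six times as many surveyed machines as requiring SSE4.2.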

I don’t know if I generally agree with that. We probably exclude more users and interesting use cases by not supporting ARMv7 than we’d do by moving the x86_64 baseline to -v2. NixOS makes it trivial for users to build from source if they have specific architecture constraints, so “someone might need it” doesn’t seem like a super strong argument to me.

hexa · November 24, 2023, 8:08pm

For machines out there that are used for gaming. Skips over lots of machines out there: servers, routers, workstations.

I’m pretty sure that, due to growing system requirements, the average gaming PC is more modern than everything else out there.

pauldoo · November 24, 2023, 8:51pm

How do enthusiastic NixOS users go about testing the impact of these flags themselves? The last time I tried to set build flags to optimize for a specific x86-64 psABI level for my whole system following the guide on the wiki, it simply didn’t seem to work: Nix CPU global CPU flags - #2 by pauldoo

RaitoBezarius · November 24, 2023, 8:53pm

I would argue the only real reason we don’t support ARMv7 is that it’s hard to have it in CI; we have a… surprisingly good and active maintenance of ARMv7 in NixOS (yes, people are running systemd with it and what not).

RaitoBezarius · November 24, 2023, 8:56pm

In the past, I did the work to look into this, you can use two of my branches towards this:

They simulate what would be the changes to nixpkgs if we bumped the minimal baseline.

I built them over https://hydra.newtype.fr/jobset/nixos/trunk-combined-x86_64-v2 and https://hydra.newtype.fr/jobset/nixos/trunk-combined-x86_64-v3, but I think I recently removed the binaries because I wanted to bump to a recent unstable and use the new timeout features for tests, as my Hydra often ended up stuck in NixOS tests for no reason.

If people are interested, I can rebase, clean up and ask Hydra to re-evaluate.

(The Hydra links are IPv6-only, I am sorry for people who may not have IPv6, I do not have money to spare on IPv4.)

pauldoo · November 24, 2023, 8:59pm

Also, with glibc-hwcaps, shouldn’t it be possible to provide multiple compiled libraries in a single package? One could be for x86-64, another for x86-64-v2, and possibly a third for x86-64-v3.

It would also allow for this to be enabled or disabled at a package level. Some performance sensitive packages could build for multiple levels (media codecs, compression libraries, etc), while others might opt to build only for the baseline (a basic text editor, mkfs, lots of other examples).
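
For anyone unfamiliar with glibc-hwcaps, here is a sketch of the lookup as I understand it: for each library directory, the dynamic loader first looks in glibc-hwcaps/<level> subdirectories, from the highest level the CPU supports down to the lowest, and only then falls back to the plain directory. As far as I know this applies to shared libraries, not executables. The function name and the example paths are illustrative and not taken from glibc itself.

```python
from pathlib import PurePosixPath

# Levels that can have a glibc-hwcaps subdirectory on x86_64, lowest first.
HWCAPS_LEVELS = ["x86-64-v2", "x86-64-v3", "x86-64-v4"]


def hwcaps_search_order(lib_dir, soname, cpu_level):
    """List the candidate paths for one library, in the order they would be tried.

    cpu_level is "x86-64" (baseline) or one of HWCAPS_LEVELS.
    """
    base = PurePosixPath(lib_dir)
    if cpu_level in HWCAPS_LEVELS:
        usable = HWCAPS_LEVELS[: HWCAPS_LEVELS.index(cpu_level) + 1]
    else:
        usable = []
    # Most specific variant first, then less specific ones, then the baseline copy.
    candidates = [base / "glibc-hwcaps" / level / soname for level in reversed(usable)]
    candidates.append(base / soname)
    return [str(path) for path in candidates]
```

So a package would ship its baseline library as usual and add optimised copies under the subdirectories only where that pays off; machines that cannot use them simply never look there.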

RaitoBezarius · November 24, 2023, 9:03pm

This doesn’t change the storage costs.

If only someone could come up with a list of packages that benefit from it.

pauldoo · November 24, 2023, 10:31pm

Surely it must. There is more to a compiled package than the binaries and libraries. There are all sorts of other assets. Using glibc hwcaps only the libraries and binaries are duplicated, not the entire package.

Atemu · November 25, 2023, 6:45am

The note was pertaining to x86_64-v3.

v2 is a much easier pill to swallow as hardware without SSE4 really is getting to the point of not being useful anymore as even basic ARM SoCs outperform the best CPUs from that era nowadays. Even there I’d err on the side of caution though.

With v2 however, the benefits are even more questionable than with v3.

I’m all for supporting armv7l-linux too. I’ve got two older RPIs that I’d like to put NixOS on.

Difference is that we never supported armv7l-linux to any decent capacity while x86_64-v1 has pretty much always been supported.

The problem is that we don’t know who might need it. It could be literally no one or thousands; we’re blind here.

That could probably happen organically.

For example, let’s say someone wants to compress their music library to a higher FLAC level to save on storage. Being a typical NixOS user, they might spend an unreasonable time optimising the re-encode to be a few minutes faster. Assuming such a flag optimising for separate HWCAPS was already proliferated in Nixpkgs, they might try it out to see whether it makes a difference and whip up a quick PR if it shows a significant benefit.

What I also like about the glibc HWCAPS approach is that we could optimise packages for even higher levels (i.e. x86_64-v4 with AVX512) where I wouldn’t be surprised if gains were quite significant without breaking the other >90% of users’ systems.