Pre-RFC: Gradual Transition of NixOS x86_64 Baseline to x86-64-v3 with an Intermediate Step to x86-64-v2

So even for computationally demanding software, the performance difference between v2 and v3 is mostly within the error margins.

My takeaway: Stick with current settings and add a good guide that:

  • explains how to optimize individual packages by setting architecture feature flags
  • explains how to confirm that there is an actual speed-up (which is not guaranteed, see zstd compression in the CachyOS benchmarks) on the target machine and with the expected workload.

But I guess people that have a real requirement for these optimizations (e.g. in the HPC context) are already tuning their packages.
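As a rough illustration of the second bullet above, here is a minimal Python sketch of how one might check whether a measured speed-up exceeds run-to-run noise. The function names, the sample timings, and the `k = 2` threshold are all hypothetical choices, not something from the CachyOS benchmarks. It only accounts for random variation between runs, not for systematic measurement bias.

```python
import statistics


def compare_runs(baseline, candidate):
    """Return (relative_speedup, relative_noise) for two lists of wall-clock times.

    relative_speedup > 0 means the candidate build was faster on average.
    relative_noise is the combined standard error of both means,
    expressed relative to the baseline mean.
    """
    mean_b = statistics.mean(baseline)
    mean_c = statistics.mean(candidate)
    rel_speedup = (mean_b - mean_c) / mean_b

    # Standard error of each mean (sample stdev / sqrt(n)).
    se_b = statistics.stdev(baseline) / len(baseline) ** 0.5
    se_c = statistics.stdev(candidate) / len(candidate) ** 0.5
    rel_noise = (se_b ** 2 + se_c ** 2) ** 0.5 / mean_b
    return rel_speedup, rel_noise


def is_significant(baseline, candidate, k=2.0):
    """True if the difference is larger than k times the combined noise.

    k = 2.0 is an arbitrary threshold picked for illustration.
    """
    rel_speedup, rel_noise = compare_runs(baseline, candidate)
    return abs(rel_speedup) > k * rel_noise
```

One would feed it timings of the same workload from the default build and the tuned build, gathered on the target machine, e.g. `is_significant(times_default, times_v3)`. More runs shrink the noise term, but they do nothing about bias that affects every run the same way.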


Came here to post this too.

The general observation seems to be that there is not much difference except in some particular cases, which generally matches the discussion so far.

But who would have guessed that PHP was one of those cases?!

Yep, I had this intuition when I was doing performance diagnostics on Nextcloud deployments and looking at perf for a while. I think PHP is not exploiting the hardware in any serious way via cpuid alas.

But this is an interesting observation: it makes a compelling case for encouraging people to use PHP built for a higher tier if they need that performance boost on modest CPUs that nonetheless support advanced instruction sets.

Though, I guess now it’d be interesting to do real world benchmarks on PHP applications. :slight_smile:

Just a footnote:

performance difference between v2 and v3 is mostly within the error margins.

Ok, but we’re not even enabling v2 yet IIRC. It seems an easy win to enable compilation for v2 given its broad compatibility (any post-2009 CPU). This would allow us to test the waters and identify misbehaving packages/toolchains early, instead of relying on random end-user builds to discover packages that break with arch-specific flags and then having to patch in exceptions for those packages.
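For anyone wondering which level their own machine would fall into, here is a small Python sketch that classifies a set of CPU flags. The flag names are my assumption of how the level requirements appear in Linux `/proc/cpuinfo` (e.g. `pni` for SSE3, `abm` for LZCNT), and the helper names are made up for this example, so treat it as a rough check rather than an authoritative one.

```python
# Assumed /proc/cpuinfo spellings of the features each level adds.
V2_FLAGS = {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}
V3_FLAGS = V2_FLAGS | {
    "abm", "avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "movbe", "xsave",
}


def microarch_level(flags):
    """Return the highest of baseline/v2/v3 whose flags are all present."""
    flags = set(flags)
    if V3_FLAGS <= flags:
        return "x86-64-v3"
    if V2_FLAGS <= flags:
        return "x86-64-v2"
    # Any x86_64 CPU is assumed to meet the baseline level.
    return "x86-64 (baseline)"


def read_cpu_flags(path="/proc/cpuinfo"):
    """Read the flag set of the first CPU listed (Linux only)."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()
```

On a Linux box, `microarch_level(read_cpu_flags())` gives the answer for the local machine.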

Also eventually I’d love to see PGO/BOLT enabled packages on NixOS but it seems unlikely we’d be able to get there if we can’t stabilize builds enabling post-2009 cpu features.

I don’t see any data suggesting a significant “win” if you account for the insane error margins that the measurement bias imposes.

PGO/BOLT don’t exclude any hardware to my knowledge. They are an entirely tangential topic.


Pre-RFC: Gradual Transition of NixOS x86_64 Baseline to x86-64-v3 with an Intermediate Step to x86-64-v2 - #38 by riceicetea
we are not using O3 because it sometimes decreases performance (by aggressive unrolling, etc).

But, has anyone ever benchmarked a full-o3 system to an o2 system?

I’ve asked the CachyOS people (ricers, I know!) about this kind of thing over telegram, and their response is that 2020s CPUs actually respond well to -O3, but only if you also use x86-64-v3.

See also Sunnyflunk.

Pre-RFC: Gradual Transition of NixOS x86_64 Baseline to x86-64-v3 with an Intermediate Step to x86-64-v2 - #39 by Atemu
O3 potentially includes optimisations that produce unsafe and/or wrong code.

Are you confusing -O3 with -Ofast, or are you talking about undefined behavior getting surfaced? Either way, it’s not totally the compiler’s fault, and buggy packages can be marked as O2-only anyways until more investigation is done.

(There are some more commonly used UBs, but compilers have accommodating flags like -fwrapv. Some use -fno-strict-aliasing too.)

That’s nice but that’s not data.

There are a ton of things you can unknowingly do wrong when producing such data (see my previous post), so even if it were data, it’d have to be extremely clear and plentiful, not just a minor difference in a subset of benchmarks.

I’m not entirely sure where I got that from, so it might not be true (anymore?), but I don’t mean -Ofast. -Ofast enables things that knowingly break certain aspects of specifications/standards.

Long ago gcc -O3 really used to produce buggy code relatively commonly IIRC. I don’t think that should hold anymore. But if -O3 was a good default in general (e.g. for a whole distro), I wonder why gcc is still keeping -O2 as the default.
