Feature Request: CachyOS-like optimized pre-built binaries for various platforms

Hi,

It would be great if Nixpkgs had pre-built binaries optimized for various architectures, from Zen 2/3 onwards for AMD, and similarly for Intel, ARM, and RISC-V (?) architectures.

Assuming you mean beyond what the cache currently offers, I think only a vanishingly small number of programs would see a measurable performance improvement, and nowhere near enough to outweigh the additional storage and compute requirements. Trust me on this one, I used to do full -funroll-loops rice back in the 2007-2009 Gentoo heyday, had to deal with 24+ hour compile times and saw barely any improvement :v

That being said, offering an “x86-64+” pseudo-architecture that allowlists newer CPU instructions (think AVX & friends) for the small subset of packages that do benefit from them (e.g. media encoders like ffmpeg, though I think they already handle this somewhat on their own), while defaulting to regular x86-64 for everything else, could be discussed.

ARM and RISC-V platforms are too fragmented to attempt this, IMHO.

3 Likes

I was also a fan of that at some point (and maintained a cache of v3 builds for my system), but I dropped it because of the extra effort needed.

Unless something obvious changes (that I am not aware of), I think it will not happen, given the effort and infrastructure cost compared with the benefit.

It’s basically the same discussion as here: https://discourse.nixos.org/t/pre-rfc-gradual-transition-of-nixos-x86-64-baseline-to-x86-64-v3-with-an-intermediate-step-to-x86-64-v2/

1 Like

Does it need funding? Maybe frame.work guys or some other sponsors can help with that.

I think frame.work would love to see NixOS running blazing fast on their systems.

Maybe even System76–what a lazy and dull company name. :stuck_out_tongue:

As already discussed elsewhere, there are no good data to indicate that this results in a notable performance improvement. Why would we lock out a significant chunk of the userbase to make no improvement?

If you have specific programs that would benefit, burn the trees needed to recompile those; otherwise it’s a waste of time.

5 Likes

Less of this, please.

10 Likes

According to Phoronix:

It’s not too entirely surprising given the aggressive stance that the CachyOS Linux distribution has taken on out-of-the-box performance, but for those curious, it continues largely leading over the newly-released Ubuntu 26.04 LTS and Fedora Workstation 44 distributions for the leading performance on modern hardware.

I myself have also installed CachyOS next to NixOS and it seems a bit more responsive.

Plus, it doesn’t lock out any of the userbase, as CachyOS—for example—also has x86-64-v3 binaries for older CPUs.

Modern CPUs have many instructions that increase performance but are left unused by generic binaries. Dedicated binaries for each new architecture make sense.

This leads to the Gentoo paradox: if someone’s system is relatively old, compiling packages locally takes a very long time. CachyOS addresses that paradox by building the binaries on the server side.

The NixOS foundation is a non-profit with limited resources.

The small performance gain for a small subset of users would unnecessarily either:

  1. Lock many users out, since they would need to compile for their older (and likely slower) systems that don’t meet the new baseline, or
  2. Greatly increase the cost of build infrastructure and binary cache usage, and add maintenance burden, by building for both new and old architectures.

If the performance gain really is that important for your workload, you have several options:

  1. Use CachyOS, as it is designed for your use case
  2. Configure nixpkgs to compile a subset of your desired programs with CPU optimizations
  3. Go full Gentoo and compile your entire system with CPU optimizations. If your CPU would benefit from those options, it is likely fast enough to do so

2 and 3 are just a few lines in your NixOS config.
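
For reference, a rough sketch of what options 2 and 3 could look like. This is an illustration under assumptions, not a tested config: it assumes an x86_64-linux machine targeting x86-64-v3, `ffmpeg` is only an example package, and `pkgsV3` is a name made up here. Anything built this way is not in cache.nixos.org, so it gets compiled locally together with its dependency closure.

```nix
# Hypothetical NixOS module, e.g. imported from configuration.nix.
{ config, pkgs, ... }:
let
  # Option 2: a second nixpkgs instance whose compiler targets x86-64-v3.
  # `pkgsV3` is an illustrative name, not something nixpkgs provides.
  pkgsV3 = import pkgs.path {
    localSystem = {
      system = "x86_64-linux";
      gcc.arch = "x86-64-v3";
    };
    config = config.nixpkgs.config;
  };
in
{
  # Derivations built with gcc.arch require the builder to advertise a
  # matching "gccarch-*" feature. Setting this option replaces the
  # defaults, so the usual features are listed again.
  nix.settings.system-features = [
    "nixos-test"
    "benchmark"
    "big-parallel"
    "kvm"
    "gccarch-x86-64-v3"
  ];

  # Option 2: only the selected packages come from the optimized set.
  environment.systemPackages = [ pkgsV3.ffmpeg ];

  # Option 3 (instead of the above): rebuild the whole system for v3.
  # nixpkgs.hostPlatform = {
  #   system = "x86_64-linux";
  #   gcc.arch = "x86-64-v3";
  # };
}
```

Note that even option 2 rebuilds the compiler toolchain and every dependency of the chosen package, so "a few lines" of config can still mean many hours of compiling.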

3 Likes

I think that if someone cares enough to initiate additional variants they can do the work to make that happen. And I think they’d be very welcome to do so, by the vast majority of this project.

What such work would entail is beyond me. Consider contacting the Nixpkgs team and declaring your skillset and availability towards this endeavor.

I, for one, want faster software.

1 Like

AFAIK CachyOS responsiveness is mostly due to some kernel config and a boot param (preempt=full, which I think also needs kernel build support, i.e. CONFIG_PREEMPT_DYNAMIC), plus sched_ext/scx-based schedulers. I think the main one is CONFIG_HZ=1000 in the kernel build, but you should be able to find more params if you want. I think the other packages see mostly marginal benefits from their compiler optimizations.
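
On NixOS those two knobs could be set roughly like this. Treat it as a sketch under assumptions: the patch name `hz-1000` is made up, I have not checked whether the stock NixOS kernel config already sets a conflicting HZ value (in which case `lib.mkForce` may be needed), and changing the kernel config means the kernel is built locally instead of coming from the cache.

```nix
# Hypothetical NixOS module fragment.
{ lib, ... }:
{
  # Boot parameter; only takes effect if the running kernel was built
  # with CONFIG_PREEMPT_DYNAMIC.
  boot.kernelParams = [ "preempt=full" ];

  # A config-only "patch" that switches the timer frequency to 1000 Hz.
  boot.kernelPatches = [
    {
      name = "hz-1000"; # illustrative name
      patch = null;     # no source patch, only config changes
      extraStructuredConfig = with lib.kernel; {
        HZ_1000 = yes;
        HZ = freeform "1000";
      };
    }
  ];
}
```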

2 Likes

I agree that if it’s financially not feasible, it’s a no go.

I disagree that architecture-optimized binaries and kernels do not lead to a boost in performance.

I’ve already enabled preempt=full, but that’s just like the tip of the iceberg.

Looks like the out of the box scheduler is a “tuned EEVDF scheduler”.

But:

The linux-cachyos kernel ships with a tuned scheduler for responsive interactivity, plus options for BORE, sched-ext, BMQ, and RT. All kernel builds are CPU-optimized with x86-64-v3/v4, Zen4 and LTO.

I was just requesting a feature, not asking for alternatives, as NixOS is my preferred distro, not CachyOS or Gentoo.

I think it’s better to close this thread as it’s going off the rails.

It’s not a matter of opinion; it’s a fact that it generally doesn’t. The data show that compiling for -v3 yields basically no improvement in most cases, and this holds across distros (Ubuntu found a 1% improvement in their case, for example). The computer architecture isn’t affected by whether you disagree.

2 Likes

You should be able to get the CachyOS kernel/scheduler, ananicy rules, and proton-cachyos in Nix via Chaotic’s Nyx (recently revived).

I was curious so I looked at https://www.phoronix.com/review/ubuntu-2510-amd64v3 for some benchmarks.

A brief glance over the figures supports the idea that the benefit (or regression) from enabling v3 was slight in most situations.

There was a big win with RawTherapee that’s probably worth noting and experimenting with. The effect doesn’t seem to generalize by software type, though, given that DarkTable actually performed ever so slightly worse with v3.

2 Likes