Why is the nix-compiled Python slower?

I stumbled upon the regression section on the new wiki today.

I repeated the synthetic benchmark between Ubuntu Noble (3.12.2, in Docker) and NixOS Unstable (3.12.2) and apparently we are faster now?

root@18cfe19cdf22:/# python3 -c "import timeit; print(timeit.Timer('for i in range(100): oct(i)', 'gc.enable()').repeat(5))"
[4.54713627404999, 4.525180662050843, 4.5404908519703895, 4.536089212982915, 4.561447929008864]
❯ nix-build -A python312
/nix/store/fmwqa8nvva4sh18bqayzrilrzxq9fm0f-python3-3.12.2
❯ ./result/bin/python3.12 -c "import timeit; print(timeit.Timer('for i in range(100): oct(i)', 'gc.enable()').repeat(5))"
[3.88701227796264, 3.9353582749608904, 3.904149228008464, 3.8978957710787654, 3.8842451529344544]
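For anyone who wants to compare the two runs above numerically, here is a minimal sketch. It takes the minimum of each list, which is what the `timeit` documentation suggests for repeated runs. The names `run_benchmark` and `speedup` are made up for illustration; the timings are copied from the output above.

```python
import timeit

# Timings copied verbatim from the two runs quoted above.
ubuntu_noble = [4.54713627404999, 4.525180662050843, 4.5404908519703895,
                4.536089212982915, 4.561447929008864]
nixos_unstable = [3.88701227796264, 3.9353582749608904, 3.904149228008464,
                  3.8978957710787654, 3.8842451529344544]


def run_benchmark(repeat=5, number=1_000_000):
    """Run the same statement as the one-liner; returns one timing per repeat."""
    timer = timeit.Timer('for i in range(100): oct(i)', 'gc.enable()')
    return timer.repeat(repeat=repeat, number=number)


def speedup(baseline, candidate):
    """Ratio of the best times; a value above 1.0 means `candidate` is faster."""
    return min(baseline) / min(candidate)


print(f"nix vs. Ubuntu: {speedup(ubuntu_noble, nixos_unstable):.3f}x")
```

Calling `run_benchmark()` on each interpreter and feeding both results to `speedup` reproduces the comparison on your own machine.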

Additionally, I’ve also corrected some assumptions in that section, notably that LTO was disabled entirely, which is not the case. It is however only enabled on Linux x64.

LTO had initially been enabled unconditionally back in 2021-05, so I’m not sure why the post said it wasn’t enabled at all.

Feel free to fact-check me.

https://wiki.nixos.org/w/index.php?title=Python&diff=11303&oldid=11301

Awesome, thanks for checking and updating! FWIW, I did check if we had LTO enabled and from what I could tell, on the channel/derivation we were trying to use, it is disabled, see:

enableOptimizations = false;
enableLTO = false;

Also, at least in our internal testing of various Python versions, I found that 3.6 was slower than 3.8 and 3.8 was slower than 3.10. So it could make sense that 3.12 is faster than 3.8 or 3.10. :)

Regarding the original post, at least in my own testing, the performance of the nix 3.8 Python matches a non-LTO, non-PGO build on an Ubuntu 18.04 x64 system.
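One way to check how a given interpreter was built is to look at the configure arguments it recorded. This is only a sketch: it assumes the build went through CPython's `./configure`, where PGO is requested with `--enable-optimizations` and LTO with `--with-lto`, and the helper name `optimization_flags` is made up for illustration.

```python
import sysconfig


def optimization_flags(config_args):
    """Report whether the PGO/LTO configure flags appear in a CONFIG_ARGS string."""
    args = config_args or ""
    return {
        "pgo": "--enable-optimizations" in args,
        "lto": "--with-lto" in args,
    }


# CONFIG_ARGS is None on builds that did not go through ./configure (e.g. Windows).
print(optimization_flags(sysconfig.get_config_var("CONFIG_ARGS")))
```

Running this with both the distro Python and the nix-built one shows whether the two were configured the same way. A distro that applies PGO or LTO through its own packaging machinery rather than these flags would not be detected.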

Your wiki update looks fine to me!

That’s python3Minimal, which you more or less shouldn’t be using unless you need to.

thanks, that was helpful!

btw, here is a buildFHSUserEnv version if someone needs it:

{ pkgs ? import <nixpkgs> {} }:
(pkgs.buildFHSUserEnv {
  # Derivation names must not contain spaces.
  name = "python-optimized";
  targetPkgs =
    let
      # Rebuild CPython with optimizations enabled; this gives up the reproducible build.
      python310FullOptimized = pkgs.python310Full.override {
        enableOptimizations = true;
        reproducibleBuild = false;
        self = python310FullOptimized;
      };
    in pkgs: (with pkgs; [
      python310FullOptimized
      # Other stuff you want to include from pkgs ...
      python310Packages.pip
    ]);
  multiPkgs = pkgs: (with pkgs; [ ]);
  profile = "";
  runScript = "bash";
}).env

However, on my machine, there was no speedup when I ran this benchmark: python -c "import timeit; print(timeit.Timer('for i in range(100): oct(i)', 'gc.enable()').repeat(5))". In fact, it was even slightly slower… It could be because 3.10 is faster anyway, so the optimization has a smaller impact…

Another dimension to note: nixpkgs generally disables architecture-specific optimization for fear of creating binaries that are incompatible with older hardware, so the CPython interpreter may be less optimized as well. I'm not sure how other package managers handle this, though.
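A rough way to see whether an interpreter was compiled with CPU-specific options is to scan the compiler flags it recorded. This is a sketch under assumptions: it only sees flags that ended up in the `CFLAGS` config variable, so anything injected by a compiler wrapper would not show up, and the helper name `arch_flags` is made up for illustration.

```python
import sysconfig


def arch_flags(cflags):
    """Return any -march=/-mtune=/-mcpu= options found in a compiler flag string."""
    prefixes = ("-march=", "-mtune=", "-mcpu=")
    return [tok for tok in (cflags or "").split() if tok.startswith(prefixes)]


# CFLAGS may be None on platforms without a Makefile-based build.
print(arch_flags(sysconfig.get_config_var("CFLAGS")))
```

An empty list means no such option was recorded, in which case the compiler's default baseline for the target most likely applies.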

MacPorts, for one, does deal with this, but it has a different release and build for each version of the macOS SDK, and macOS itself drops older hardware, so the later releases can optimise for newer hardware.

If you have old hardware, you use the older MacPorts version. The older MacPorts versions will keep getting MacPorts upgrades.