Hello friends,
We’re trying to replace a custom-built toolchain package manager with nix. So far it’s going well, but we hit a pretty large roadblock with the performance of the Python interpreter that nix provides. On average, running the nix Python 3.8 interpreter results in a ~20% performance penalty for most operations.
I’ve narrowed this down to GCC, or something else related to how the interpreter is compiled. However, the compilation flags used for the nix Python and the Ubuntu Python are the same. Any guidance or insight into why the Python interpreter is slower under nix would be much appreciated.
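To check the claim that the flags match, it may help to dump each interpreter's recorded build configuration and diff the output. The minimal sketch below uses the standard `sysconfig` and `platform` modules. The choice of variables in `KEYS` is an assumption, and flags injected by a compiler wrapper outside `./configure` would not necessarily appear here.

```python
"""Print the build configuration of the running interpreter so that two
builds (e.g. Ubuntu's python3.8 and nixpkgs' python38) can be diffed."""
import platform
import sys
import sysconfig

# Config variables that record how CPython was configured and compiled
# (this selection is an assumption; add more as needed).
KEYS = ["CC", "CFLAGS", "PY_CFLAGS_NODIST", "LDFLAGS", "CONFIG_ARGS", "Py_ENABLE_SHARED"]


def build_info():
    info = {
        "version": sys.version.split()[0],
        "compiler": platform.python_compiler(),
    }
    for key in KEYS:
        # get_config_var returns None for variables this build does not define
        info[key] = sysconfig.get_config_var(key)
    return info


if __name__ == "__main__":
    for key, value in build_info().items():
        print(f"{key}: {value}")
```

Running it once with the Ubuntu `python3.8` and once with the nix `python3.8`, then diffing the two outputs, would show differences in compiler version, `CONFIG_ARGS` or `Py_ENABLE_SHARED` that a comparison of `CFLAGS` alone might miss.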
With a little benchmark script, we see the difference more clearly:
Configurations:
- Ubuntu: Python 3.8.0 (via apt-get) + Ubuntu 18.04 + GCC 7.5.0
- Nix: Python 3.8.12 (via python38) + Nix 2.3.15 + Nixpkgs 21.11 + GCC 10.3.0

| Benchmark | Ubuntu (s) | Nix (s) |
|---|---:|---:|
| oct | 7.592594 | 10.383724 |
| iter str | 1.802671 | 2.246434 |
| list str | 1.616609 | 2.022653 |
| map | 1.426657 | 1.788243 |
| gen 1000000 | 1.073226 | 1.342692 |
| gen 10000000 | 11.625021 | 14.281956 |
| small json dump | 1.694831 | 2.212560 |
| small json load | 0.945864 | 0.956572 |
| big json dump | 16.688743 | 21.813559 |
| big json load | 11.152032 | 11.343651 |
| small pickle dump | 0.357358 | 0.325251 |
| small pickle load | 0.384992 | 0.425216 |
| big pickle dump | 3.645764 | 3.472391 |
| big pickle load | 4.759644 | 5.079145 |
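To quantify the gap per benchmark, here is a small sketch that divides each nix timing by the matching Ubuntu timing and takes the geometric mean. The numbers are copied from the results above; the helper names (`slowdowns`, `geometric_mean`) are hypothetical.

```python
"""Per-benchmark slowdown of the nix build relative to the Ubuntu build,
using the timings posted above (seconds)."""
import math

# (ubuntu_seconds, nix_seconds) copied from the posted results
RESULTS = {
    "oct": (7.592594, 10.383724),
    "iter str": (1.802671, 2.246434),
    "list str": (1.616609, 2.022653),
    "map": (1.426657, 1.788243),
    "gen 1000000": (1.073226, 1.342692),
    "gen 10000000": (11.625021, 14.281956),
    "small json dump": (1.694831, 2.212560),
    "small json load": (0.945864, 0.956572),
    "big json dump": (16.688743, 21.813559),
    "big json load": (11.152032, 11.343651),
    "small pickle dump": (0.357358, 0.325251),
    "small pickle load": (0.384992, 0.425216),
    "big pickle dump": (3.645764, 3.472391),
    "big pickle load": (4.759644, 5.079145),
}


def slowdowns(results):
    """Return {name: nix_time / ubuntu_time}; values > 1 mean nix is slower."""
    return {name: nix / ubuntu for name, (ubuntu, nix) in results.items()}


def geometric_mean(values):
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    ratios = slowdowns(RESULTS)
    # Print the largest slowdowns first
    for name, ratio in sorted(ratios.items(), key=lambda kv: -kv[1]):
        print(f"{name:18s} {ratio:6.3f}x")
    print(f"geometric mean     {geometric_mean(ratios.values()):6.3f}x")
```

In the posted numbers, `oct` shows the largest gap (roughly 37%), the two pickle dump benchmarks are the only ones where nix is faster, and the two json load benchmarks differ by less than 2%, so the penalty is not uniform across workloads. The geometric mean over all 14 benchmarks comes out at roughly 15%.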
This benchmark script runs a set of tests in a loop using only built-in modules; there are no third-party imports yet.
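The script itself is not shown in the post, so the following is a hypothetical reconstruction, not the actual script: the payloads (`SMALL`, `BIG`), the loop counts and the `scale` parameter are made up for illustration, and only the labels mirror the results above. It times standard-library workloads with `time.perf_counter` and prints lines in the same `<seconds>s (<label>)` format.

```python
"""A hypothetical, minimal stand-in for the benchmark script: time a few
built-in-module workloads in a loop and print one line per test."""
import json
import pickle
import time

# Hypothetical payloads; the real script's data is not shown in the post.
SMALL = {"id": 1, "name": "item", "tags": ["a", "b", "c"], "score": 3.5}
BIG = [dict(SMALL, id=i) for i in range(10_000)]


def oct_loop():
    # Same statement as the (oct) timeit one-liner
    for i in range(100):
        oct(i)


def bench(label, func, loops):
    """Call func() `loops` times and print the elapsed wall-clock time."""
    start = time.perf_counter()
    for _ in range(loops):
        func()
    elapsed = time.perf_counter() - start
    print(f"{elapsed:.6f}s ({label})")
    return elapsed


def run_all(scale=1):
    small_json, big_json = json.dumps(SMALL), json.dumps(BIG)
    small_pkl, big_pkl = pickle.dumps(SMALL), pickle.dumps(BIG)
    n_small, n_big = 1000 * scale, scale
    return {
        "oct": bench("oct", oct_loop, n_small),
        "small json dump": bench("small json dump", lambda: json.dumps(SMALL), n_small),
        "small json load": bench("small json load", lambda: json.loads(small_json), n_small),
        "big json dump": bench("big json dump", lambda: json.dumps(BIG), n_big),
        "big json load": bench("big json load", lambda: json.loads(big_json), n_big),
        "small pickle dump": bench("small pickle dump", lambda: pickle.dumps(SMALL), n_small),
        "small pickle load": bench("small pickle load", lambda: pickle.loads(small_pkl), n_small),
        "big pickle dump": bench("big pickle dump", lambda: pickle.dumps(BIG), n_big),
        "big pickle load": bench("big pickle load", lambda: pickle.loads(big_pkl), n_big),
    }


if __name__ == "__main__":
    run_all(scale=100)
```

Absolute numbers from a sketch like this will not match the posted results; only the relative comparison between two interpreters on the same machine is meaningful.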
To isolate things further, I downloaded the official Python 3.8.13 source from python.org and compiled it both inside and outside a nix-shell. The results were the same: the nix-built interpreter was slower. This points to something in how nix compiles Python. You can reproduce this on an Ubuntu machine with the following, which runs the same benchmark as (oct) above:
```sh
mkdir /tmp/py38; cd /tmp/py38
wget https://www.python.org/ftp/python/3.8.13/Python-3.8.13.tgz
tar -xvf Python-3.8.13.tgz
cd Python-3.8.13
./configure --enable-optimizations
make -s -j
./python -c "import timeit; print(timeit.Timer('for i in range(100): oct(i)', 'gc.enable()').repeat(5))"
```
On my machine (Intel® Xeon® Gold 5118 @ 2.30GHz), this averages about 7.6-7.7s.
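The one-liner prints the raw list of five repeat times. A slightly longer sketch that reports both the minimum and the mean may make the two interpreters easier to compare; the `timeit` documentation suggests the minimum is usually the more meaningful figure, since higher values tend to reflect interference from other processes rather than the interpreter itself. The function name `oct_benchmark` is hypothetical.

```python
"""Run the (oct) micro-benchmark from the commands above and summarize the
repeats, so results from two interpreters can be compared directly."""
import statistics
import timeit


def oct_benchmark(repeat=5, number=1_000_000):
    # Same statement and setup as the one-liner; number=1_000_000 is timeit's
    # default loop count, so each repeat matches what the one-liner measures.
    timer = timeit.Timer("for i in range(100): oct(i)", "gc.enable()")
    times = timer.repeat(repeat=repeat, number=number)
    return {"min": min(times), "mean": statistics.mean(times), "all": times}


if __name__ == "__main__":
    result = oct_benchmark()
    print(f"min {result['min']:.3f}s  mean {result['mean']:.3f}s")
```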
Then running the same test on nix:
```sh
cd /tmp/py38/Python-3.8.13
make clean
nix-shell --pure -I nixpkgs=http://nixos.org/channels/nixos-21.11/nixexprs.tar.xz -p stdenv
# then, inside the nix-shell:
./configure --enable-optimizations
make -s -j
./python -c "import timeit; print(timeit.Timer('for i in range(100): oct(i)', 'gc.enable()').repeat(5))"
```
The results will be around 10.4s on average when doing this inside nix-shell.
This all points to something below the Python layer, perhaps how GCC is invoked. We don’t really understand why basic operations like the `oct()` loop are affected this much when both interpreters are compiled with full optimizations.
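Since the source build behaved differently inside and outside nix-shell with identical `./configure` arguments, one thing that may be worth checking (an assumption, not something the benchmarks above establish) is what the nix compiler wrapper adds on its own. In nixpkgs, the `gcc` on PATH inside a nix-shell is a wrapper script that can inject extra flags, for example from `NIX_CFLAGS_COMPILE` or the hardening settings, and such flags would not necessarily show up in what `configure` records. The sketch below only prints those environment variables and which `gcc` would be used; the variable list in `NIX_VARS` is an assumption to adjust.

```python
"""Print compiler-related environment variables visible to a build, e.g. when
run from inside nix-shell before ./configure, to see what the wrapper may add."""
import os
import shutil

# Variables that nixpkgs' stdenv / cc-wrapper commonly uses; treat this list as
# an assumption and extend it as needed. Missing variables are reported as unset.
NIX_VARS = ["NIX_CFLAGS_COMPILE", "NIX_LDFLAGS", "NIX_HARDENING_ENABLE", "CC", "CFLAGS"]


def compiler_env(environ=os.environ):
    report = {name: environ.get(name, "<unset>") for name in NIX_VARS}
    # Which `gcc` would ./configure pick up from PATH?
    report["gcc on PATH"] = shutil.which("gcc") or "<not found>"
    return report


if __name__ == "__main__":
    for name, value in compiler_env().items():
        print(f"{name}={value}")
```

Running it inside the `nix-shell --pure ... -p stdenv` session before `./configure`, and again in a plain Ubuntu shell, would show whether any extra compile or link flags are in play.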
Thanks in advance, and if any info is needed / tests suggested, I’m happy to try!