Hello friends,
We’re trying to replace a custom-built toolchain package manager with nix. So far it’s going well, but we hit a pretty large roadblock with the performance of the Python interpreter that nix provides. On average, running the nix Python 3.8 interpreter results in a ~20% performance penalty for most operations.
I’ve narrowed this down to GCC, or to something else involved in compiling the interpreter. However, the compilation flags used for the nix Python and the Ubuntu Python are the same. Any guidance or insight into why the Python interpreter is slower with nix would be much appreciated.
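For anyone who wants to double-check the flag comparison, here is a minimal sketch that dumps the build-time settings recorded by an interpreter, so the output from the Ubuntu Python and the nix Python can be diffed. The choice of variables in `BUILD_VARS` is my own assumption about what might matter, not an exhaustive list, and any variable a build does not record shows up as `null`.

```python
import json
import platform
import sys
import sysconfig

# Build-time variables that may influence interpreter speed.
# This selection is an assumption; extend it as needed.
BUILD_VARS = [
    "CC",
    "CFLAGS",
    "OPT",
    "PY_CFLAGS",
    "PY_CFLAGS_NODIST",
    "LDFLAGS",
    "PY_LDFLAGS_NODIST",
    "CONFIG_ARGS",
    "Py_ENABLE_SHARED",
]


def build_report():
    """Collect interpreter identity plus the selected build variables."""
    report = {
        "executable": sys.executable,
        "version": platform.python_version(),
        "compiler": platform.python_compiler(),
    }
    for name in BUILD_VARS:
        # get_config_var returns None when the variable was not recorded.
        report[name] = sysconfig.get_config_var(name)
    return report


# Print as sorted JSON so two reports can be compared line by line.
print(json.dumps(build_report(), indent=2, sort_keys=True))
```

Running it once with each interpreter and saving the output to two files gives something that a plain `diff` can compare.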
With a little benchmark script, we see the difference more clearly:
Python 3.8.0 (via apt-get) + Ubuntu 18.04 + GCC 7.5.0
7.592594s (oct)
1.802671s (iter str)
1.616609s (list str)
1.426657s (map)
1.073226s (gen 1000000)
11.625021s (gen 10000000)
1.694831s (small json dump)
0.945864s (small json load)
16.688743s (big json dump)
11.152032s (big json load)
0.357358s (small pickle dump)
0.384992s (small pickle load)
3.645764s (big pickle dump)
4.759644s (big pickle load)
Python 3.8.12 (via python38) + Nix 2.3.15 + Nixpkgs 21.11 + GCC 10.3.0
10.383724s (oct)
2.246434s (iter str)
2.022653s (list str)
1.788243s (map)
1.342692s (gen 1000000)
14.281956s (gen 10000000)
2.212560s (small json dump)
0.956572s (small json load)
21.813559s (big json dump)
11.343651s (big json load)
0.325251s (small pickle dump)
0.425216s (small pickle load)
3.472391s (big pickle dump)
5.079145s (big pickle load)
This benchmark script runs a bunch of tests in a loop for built-in modules; there are no 3rd party imports yet.
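To make the gap easier to read, the two result sets above can be turned into per-test slowdown percentages. The sketch below only uses the numbers already listed; the dictionary and function names are just for illustration. The slowdown is not uniform: it is roughly 37% for (oct), while the json load and pickle rows barely move.

```python
# Timings in seconds, copied from the two result listings above.
ubuntu = {
    "oct": 7.592594, "iter str": 1.802671, "list str": 1.616609,
    "map": 1.426657, "gen 1000000": 1.073226, "gen 10000000": 11.625021,
    "small json dump": 1.694831, "small json load": 0.945864,
    "big json dump": 16.688743, "big json load": 11.152032,
    "small pickle dump": 0.357358, "small pickle load": 0.384992,
    "big pickle dump": 3.645764, "big pickle load": 4.759644,
}
nix = {
    "oct": 10.383724, "iter str": 2.246434, "list str": 2.022653,
    "map": 1.788243, "gen 1000000": 1.342692, "gen 10000000": 14.281956,
    "small json dump": 2.212560, "small json load": 0.956572,
    "big json dump": 21.813559, "big json load": 11.343651,
    "small pickle dump": 0.325251, "small pickle load": 0.425216,
    "big pickle dump": 3.472391, "big pickle load": 5.079145,
}


def slowdown_percent(baseline, candidate):
    """Return how much slower candidate is than baseline, in percent per test."""
    return {
        name: (candidate[name] / baseline[name] - 1.0) * 100.0
        for name in baseline
    }


# Positive values mean the nix build was slower for that test.
for name, pct in slowdown_percent(ubuntu, nix).items():
    print(f"{name:>18}: {pct:+6.1f}%")
```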
To further isolate things, I downloaded the official Python 3.8.13 source from python.org and compiled it both inside and outside a nix-shell. The results were the same, with the nix version being slower. This points to something in how nix compiles Python. You can reproduce this on an Ubuntu client with the commands below, which run the same benchmark as (oct) above:
mkdir /tmp/py38; cd /tmp/py38
wget https://www.python.org/ftp/python/3.8.13/Python-3.8.13.tgz
tar -xvf Python-3.8.13.tgz
cd Python-3.8.13
./configure --enable-optimizations
make -s -j
./python -c "import timeit; print(timeit.Timer('for i in range(100): oct(i)', 'gc.enable()').repeat(5))"
On my machine (Intel® Xeon® Gold 5118 @ 2.30GHz), this results in an average of about 7.6-7.7s.
Then running the same test on nix:
cd /tmp/py38/Python-3.8.13
make clean
nix-shell --pure -I nixpkgs=http://nixos.org/channels/nixos-21.11/nixexprs.tar.xz -p stdenv
./configure --enable-optimizations
make -s -j
./python -c "import timeit; print(timeit.Timer('for i in range(100): oct(i)', 'gc.enable()').repeat(5))"
The results will be around 10.4s on average when doing this inside nix-shell.
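The one-liner prints a raw list of five timings, so the averages quoted here were read off by hand. A small sketch that runs the same statement and reports the minimum and mean directly is shown below; the function name `summarise_oct` is made up for this example, and the defaults match what `timeit` uses in the one-liner (1,000,000 executions per run, 5 runs).

```python
import statistics
import timeit

# Same statement and setup as the one-liner used for the (oct) test.
STMT = "for i in range(100): oct(i)"
SETUP = "gc.enable()"


def summarise_oct(repeat=5, number=1_000_000):
    """Time STMT `repeat` times and summarise the per-run totals in seconds."""
    timer = timeit.Timer(STMT, SETUP)
    runs = timer.repeat(repeat=repeat, number=number)
    return {
        "min": min(runs),                 # least noisy single figure
        "mean": statistics.mean(runs),    # the "average" quoted in the text
        "runs": runs,
    }
```

Calling `summarise_oct()` with the defaults from each freshly built `./python` gives directly comparable numbers; passing a smaller `number` is handy for a quick smoke test.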
This all points to something below the Python layer, maybe how GCC is invoked. We don’t really understand why operations as basic as the (oct) loop are impacted this much when both interpreters are compiled with all optimizations.
Thanks in advance, and if any info is needed / tests suggested, I’m happy to try!