why is nix-build using too many cpu cores?
why is nix-build failing with random errors?
is nix-build running out of memory?
lets try to answer these questions
stress-test.nix
a build file to waste some cpu and memory
# stress-test.nix
# nix-build ./stress-test.nix
{ pkgs ? import <nixpkgs> { } }:
let
nix-build-profiler = (pkgs.callPackage (pkgs.fetchFromGitHub {
owner = "milahu";
repo = "nix-build-profiler";
rev = "60efb7314532fc21b5b67518f50fabfb71d10740";
sha256 = "sha256-zeSasab1jA34To9bd995HPBf+BKLqfL98xVVt9AAXMQ=";
}) { });
in
pkgs.stdenv.mkDerivation {
name = "stress-test";
dontUnpack = true; # no source
nativeBuildInputs = with pkgs; [
stress-ng
nix-build-profiler
];
# https://www.mankier.com/1/stress-ng#Examples
buildPhase = ''
(
set -x
echo stressing 90% of $NIX_BUILD_CORES of $(nproc) cores
stress-ng --cpu-method all --cpu-load 90 --timeout 10s --cpu $NIX_BUILD_CORES
echo stressing 90% of $(nproc) of $(nproc) cores
stress-ng --cpu-method all --cpu-load 90 --timeout 10s --cpu 0
echo stressing 80% of available memory
stress-ng --vm 8 --vm-bytes 80% --timeout 10s
)
exit 1 # force rebuild
'';
}
save it as stress-test.nix
and run nix-build ./stress-test.nix
now you should see the output of nix-build-profiler
nix-build-profiler
https://github.com/milahu/nix-build-profiler
this is a python script based on psutil, running inside the nix-build sandbox.
every second, it dumps the process tree to the build log
the output of nix-build-profiler is interleaved with the build log.
advantage: we see performance side-by-side with the build status
example output
example build:
nix-build -A qt6.qtwebengine
problems:
htop
shows a load of 130…150 but should be 32.
random build errors with export NINJAFLAGS="-j32 -l32"
hypothesis: nix-build is running out of memory
load mem rss proc @ 2022-06-24 15:13:15 +0000
127.9 1 4G bash info={'child_procs': 2, 'cmdline': ['bash', '-e', '/nix/store/9krlzvny65gdc8s7kpb6lkx8cd02c25b-default-builder.sh'], 'cwd': '/build/qtwe>
127.9 1 4G ninja info={'child_procs': 15, 'cmdline': ['ninja', '-j32', '-l32']}
1.0 0 36M g++ src/webenginequick/ui/CMakeFiles/WebEngineQuickDelegatesQml.dir/webenginequickdelegatesqml_qmltyperegistrations.cpp.o -o src/webengin>
0.9 0 36M g++ src/webenginequick/ui/CMakeFiles/WebEngineQuickDelegatesQml.dir/.rcc/qmlcache/WebEngineQuickDelegatesQml_qmlcache_loader.cpp.o -o src>
0.9 0 36M g++ src/webenginequick/ui/CMakeFiles/WebEngineQuickDelegatesQml.dir/.rcc/qmlcache/WebEngineQuickDelegatesQml_AlertDialog_qml.cpp.o -o src>
0.9 0 36M g++ src/webenginequick/ui/CMakeFiles/WebEngineQuickDelegatesQml.dir/.rcc/qmlcache/WebEngineQuickDelegatesQml_AuthenticationDialog_qml.cpp>
115.5 1 3G sh info={'child_procs': 1, 'cmdline': ['sh', '-c', 'cd /build/qtwebengine-everywhere-src-6.3.1/build/src/gn && /nix/store/vrsv842wx54vknd>
115.5 1 3G cmake info={'child_procs': 1, 'cmdline': ['cmake', '--build', '.']}
115.5 1 3G ninja info={'child_procs': 1, 'cmdline': ['ninja']}
115.5 1 3G ninja info={'child_procs': 130, 'cmdline': ['ninja', '-C', '/build/qtwebengine-everywhere-src-6.3.1/build/src/gn/Release', 'gn'], '>
0.9 0 25M g++ -c ../../../../src/3rdparty/gn/src/base/command_line.cc -o src/base/command_line.o
0.9 0 25M g++ -c ../../../../src/3rdparty/gn/src/base/environment.cc -o src/base/environment.o
1.0 0 24M g++ -c ../../../../src/3rdparty/gn/src/base/files/file.cc -o src/base/files/file.o
1.0 0 25M g++ -c ../../../../src/3rdparty/gn/src/base/files/file_enumerator.cc -o src/base/files/file_enumerator.o
0.9 0 24M g++ -c ../../../../src/3rdparty/gn/src/base/files/file_path.cc -o src/base/files/file_path.o
0.8 0 23M g++ -c ../../../../src/3rdparty/gn/src/base/files/file_path_constants.cc
for gcc and g++, most arguments are hidden, to make the process tree more compact
the whole build process uses only about 1% memory
(rss 4G / total 256G = 1.56%)
so the “out of memory” hypothesis seems false
ninjas are overloading
the total load is around 128 on a 128 core machine
but $NIX_BUILD_CORES is 32 …
so the build is using too many cpu cores
we see 2 ninja procs. the first ninja has 15 child processes …
ninja info={‘child_procs’: 15, ‘cmdline’: [‘ninja’, ‘-j32’, ‘-l32’]}
… and the second ninja has 130 child processes
ninja info={‘child_procs’: 130, ‘cmdline’: [‘ninja’, ‘-C’, ‘/build/qtwebengine-everywhere-src-6.3.1/build/src/gn/Release’, ‘gn’],
… because the flags -j32 -l32
were not passed to the second ninja
when ninja calls ninja, they dont negotiate cpu limits
because the default ninja version does not implement the gnumake jobserver protocol
solution:
use ninja-tokenpool with jobclient and jobserver support (via)
and start the main build process with ninja --tokenpool-master
builders are overloading
when a builder wants to run multiple processes or threads in parallel (2 or more)
then it should use the gnumake-tokenpool protocol
to negotiate the number of parallel workers with the main build process (jobserver)
before and after
before: build was using up to 128 cores
(todo: graph)
after: max load is 32 == NIX_BUILD_CORES
see also