splitBuildInstall: split buildPhase and installPhase for large packages

use case: qtbase takes 2 hours to compile
the buildPhase is passing, but the installPhase is broken

by caching the result of buildPhase,
i reduce the feedback loop from 2 hours to 2 minutes : )

obvious question: do we have something like this in nix already?
this very much feels like reinventing some wheel …

related: google takes this build-caching even further
by caching every compilation object
edit: i mean the bazel incremental build tool (via ycombinator)
… but one step at a time ; )

full working prototype in
3e50ae7 qt6.qtbase: implement splitBuildInstall, make qtbase build

in the commit, i left some comments
to demonstrate how crazy hard this task is with the ninja build tool
short answer: when we run ninja install,
ninja verifies all the build files by mtime and murmurhash64
(hash of build command or hash of output file, not sure),
which all change when we patch the output paths …

challenge: edit the installPhase without causing a rebuild.
currently, i simply copy-paste the installPhase from drv1 to drv2,
modify the installPhase only in drv2, and when it works, move it back to drv1

todo: replace only the basename of /nix/store/hhhh-name
(replace -name with a temporary random hash)
to handle the edge-case, where only the basename appears in the build files

and now …
a 150-line sample implementation of the option splitBuildInstall

{ stdenv, cmake }:

let
splitBuildInstall = true;
buildPhaseResult =

# the original mkDerivation
stdenv.mkDerivation {
  pname = "sample";
  src = "...";

  # this splitBuildInstall implementation works only with cmake
  nativeBuildInputs = [ cmake ];

  # boilerplate code for splitBuildInstall ...
  # ideally should be hidden in stdenv.mkDerivation
  # or stdenv.mkDerivationSplitBuildInstall,
  # to avoid rebuilding ALL packages
  dontInstall = splitBuildInstall;
  dontFixup = splitBuildInstall;
  # maybe more phases must be disabled
  phases = if (!splitBuildInstall) then todoGetTheDefaultPhasesOfMkDerivation
  else "${todoGetTheDefaultPhasesOfMkDerivation} splitBuildInstallPhase";
  splitBuildInstallPhase = ''
    # magic is here :)
    # part 1: replace the output paths $out $dev $bin ...
    # 1. to fix: error: cycle detected in build
    # 2. to use the output paths of the second derivation

    echo debug installPhase: copy /build to $out
    cp -r /build $out

    echo "debug: create empty outputs bin + dev"
    # fix: builder failed to produce output path for output 'bin'
    mkdir -v $bin $dev

    nixStoreEscaped=$(date +%s.%N | sha512sum -)
    #                 v bug in the discourse.nixos.org nix syntax highlighter. wasnt me! :P
    nixStoreEscaped=''${nixStoreEscaped:0:11}
    echo "debug: nixStoreEscaped = $nixStoreEscaped"

    outHash=''${out:11:32}
    binHash=''${bin:11:32}
    devHash=''${dev:11:32}

    echo "debug: store escaped paths in $out/buildPhaseEscapedPaths"
    cat >$out/buildPhaseEscapedPaths <<EOF
    outHash=$outHash
    binHash=$binHash
    devHash=$devHash
    nixStoreEscaped=$nixStoreEscaped
    EOF

    # a: /nix/store/a3vjswd3i42xy5hzxras78z0m40g9jk7-qtbase-6.2.0
    # b: xxxxxxxxxxxa3vjswd3i42xy5hzxras78z0m40g9jk7-qtbase-6.2.0
    #    ^          ^ outHash: 32 chars
    #    ^ nixStoreEscaped: 11 chars

    echo "debug: regex = s,/nix/store/($outHash|$binHash|$devHash),$nixStoreEscaped\1,g"

    # note: the output paths also appear in binary files = *.so, etc
    # so we use the same length as the original path
    # tr -d '\0': fix "ignored null byte" when replacing binary files
    (
    cd $out
    find . -type f | while read f
    do
      if [ -n "$(sed -i -E "s,/nix/store/($outHash|$binHash|$devHash),$nixStoreEscaped\1,g w /dev/stdout" "$f" | tr -d '\0')" ]
      then
        # file was replaced
        echo "$f" >>$out/patched-files-with-escaped-output-paths.txt
      fi
    done
    )
    echo "debug: replaced install paths in $(wc -l $out/patched-files-with-escaped-output-paths.txt | cut -d' ' -f1) files. see $out/patched-files-with-escaped-output-paths.txt"

  '';
}

in
if (!splitBuildInstall) then buildPhaseResult
else (buildPhaseResult // stdenv.mkDerivation {

  buildInputs = [ buildPhaseResult ]; # not sure if this is needed
  inherit (buildPhaseResult) pname version outputs nativeBuildInputs;
  # TODO just inherit everything ... override? something more elegant

  src = buildPhaseResult.out;

  # TODO replace qtbase-everywhere-src-6.2.0 with sourceRoot from buildPhaseResult
  unpackPhase = ''
    # magic is here :)
    # part 2: replace the output paths $out $dev $bin ...

    echo "installing from cached build ${buildPhaseResult}"

    # this takes about 30 seconds for qtbase. we must copy to get write access
    echo "copying cached build files ..."
    t1=$(date +%s)
    cp -r ${buildPhaseResult}/qtbase-everywhere-src-6.2.0 /build/
    echo "copying cached build files done in $(($(date +%s) - $t1)) seconds"

    chmod -R +w /build

    # set: nixStoreEscaped outHash binHash devHash
    source ${buildPhaseResult}/buildPhaseEscapedPaths

    outHashNew=''${out:11:32}
    binHashNew=''${bin:11:32}
    devHashNew=''${dev:11:32}

    # replace install paths
    echo "replacing output hashes:"
    echo "  out: $outHash -> $outHashNew"
    echo "  bin: $binHash -> $binHashNew"
    echo "  dev: $devHash -> $devHashNew"

    (
    cd /build
    cat ${buildPhaseResult}/patched-files-with-escaped-output-paths.txt | while read f
    do
      if [ "$f" = "./env-vars" ]; then continue; fi
      if [ ! -e "$f" ]
      then
        echo "fatal error: no such file: $f"
        exit 1
      fi
      if [ -z "$(
        sed -i -E "s,$nixStoreEscaped$outHash,/nix/store/$outHashNew,g w /dev/stdout" "$f" | tr -d '\0'
        sed -i -E "s,$nixStoreEscaped$binHash,/nix/store/$binHashNew,g w /dev/stdout" "$f" | tr -d '\0'
        sed -i -E "s,$nixStoreEscaped$devHash,/nix/store/$devHashNew,g w /dev/stdout" "$f" | tr -d '\0'
      )" ]
      then
        echo "fatal error: no paths replaced in $f"
        exit 1
      fi
    done
    )
  '';

  installPhase = ''
    cd /build/qtbase-everywhere-src-6.2.0/build
    cmake -P cmake_install.cmake
  '';
  # "make install" calls "cmake -P ..."
})
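
the path trick from the two phases above can be tried outside of nix. this is only a minimal sketch, not the code from the commit: the "new" output path, the fixed placeholder `xxxxxxxxxxx` and the function names `escape_paths` / `unescape_paths` are made up for illustration (the real implementation derives the placeholder from a random hash). it assumes GNU sed for `sed -i`.

```shell
#!/usr/bin/env bash
# sketch of the same-length path escaping; paths and names are illustrative
set -eu

# hypothetical output paths of the build derivation (old) and the install derivation (new)
old_out=/nix/store/a3vjswd3i42xy5hzxras78z0m40g9jk7-qtbase-6.2.0
new_out=/nix/store/0123456789abcdfghijklmnpqrsvwxyz-qtbase-6.2.0

# "/nix/store/" is 11 characters, the hash after it is 32 characters
old_hash=${old_out:11:32}
new_hash=${new_out:11:32}

# 11-character stand-in for "/nix/store/" (assumption: fixed here, random in the post)
placeholder=xxxxxxxxxxx

# escape_paths FILE: hide the old output path behind the placeholder
escape_paths() {
  sed -i -E "s,/nix/store/($old_hash),$placeholder\1,g" "$1"
}

# unescape_paths FILE: turn the escaped old path into the new output path
unescape_paths() {
  sed -i -E "s,$placeholder$old_hash,/nix/store/$new_hash,g" "$1"
}

workdir=$(mktemp -d)
printf 'prefix=%s\n' "$old_out" > "$workdir/sample.txt"
size_before=$(wc -c < "$workdir/sample.txt")

escape_paths "$workdir/sample.txt"
escaped=$(cat "$workdir/sample.txt")

unescape_paths "$workdir/sample.txt"
restored=$(cat "$workdir/sample.txt")
size_after=$(wc -c < "$workdir/sample.txt")

echo "escaped:  $escaped"
echo "restored: $restored"
echo "size before/after: $size_before/$size_after"
```

both replacements keep the string length, so the file size stays the same. that is the property that makes the replacement usable on binary files too.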

In other distributions I was able to use ccache to avoid most of the rebuild time for C and C++ projects. It pays off especially when one applies small code changes. The nice thing is that most of the time ccache can be enabled transparently, without changing the build process.

I’m not sure how good the cache hit rate would be for nix builds, given that the underlying store paths change after even small derivation changes, and given that nix does quite a bit of magic under the gcc wrapper. If those details could be made to work reliably, getting the benefits of caching would not require changing the original derivations much. It would probably also give cache hits in cases where ninja would still trigger a full rebuild.


I think a generic solution would be nice, one that allows you to split off arbitrary phases, or even to run every single phase in its own derivation.

For example, I’d find it useful to split off the unpackPhase. For some builds the unpackPhase is a substantial part of the computation, and it would be nice if its result could be cached.


caching the unpackPhase is much simpler
cos there is no need to patch the output paths

vague concept:

{ stdenv }:

let
unpackPhaseDrv = stdenv.mkDerivation {
  src = "...";
  phases = "unpackPhase unpackPhaseExport";
  unpackPhase = "...";
  unpackPhaseExport = ''
    mkdir $out
    cp -r "/build/$sourceRoot" $out
    echo "sourceRoot=$sourceRoot" >$out/sourceRoot.env.sh
  '';
};
in

stdenv.mkDerivation {
  src = unpackPhaseDrv;
  /* probably not needed:
  prePhases = "unpackPhaseImport";
  dontUnpack = true;
  unpackPhaseImport = ''
    . ${unpackPhaseDrv}/sourceRoot.env.sh
    cp -r "${unpackPhaseDrv}/$sourceRoot" "/build/$sourceRoot"
    cd "/build/$sourceRoot"
  '';
  */
  buildPhase = "...";
}
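
the export/import steps of this concept can be dry-run in plain shell. again just a sketch under assumptions: the three temp dirs stand in for /build of the first derivation, its $out, and /build of the second derivation, and `sample-1.0` with its `main.c` is a made-up source tree.

```shell
#!/usr/bin/env bash
# sketch of unpackPhaseExport / unpackPhaseImport outside of nix
set -eu

# stand-ins (assumptions): /build of drv1, $out of drv1, /build of drv2
build1=$(mktemp -d)
out=$(mktemp -d)
build2=$(mktemp -d)

# pretend the unpackPhase produced this source tree
sourceRoot=sample-1.0
mkdir -p "$build1/$sourceRoot"
echo 'int main(void) { return 0; }' > "$build1/$sourceRoot/main.c"

# export: copy the unpacked tree to the output and remember its name
cp -r "$build1/$sourceRoot" "$out"
echo "sourceRoot=$sourceRoot" > "$out/sourceRoot.env.sh"

# import: a later derivation restores the name and copies the tree back
unset sourceRoot
. "$out/sourceRoot.env.sh"
cp -r "$out/$sourceRoot" "$build2/$sourceRoot"
# store paths are read-only, so the copy must be made writable before building
chmod -R +w "$build2/$sourceRoot"
cd "$build2/$sourceRoot"

imported_files=$(ls)
echo "imported $sourceRoot: $imported_files"
```

no output paths are written into the files at this stage, so no path patching is needed here.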