Deferred builds (for training & deploying machine learning services on NixOS)

Fair warning, this may seem a little bit frivolous.

Piqued by “Experiment: Machine learning in Nix”, I started playing with a deferred builder service today; it came out looking like this gist.

Roughly this boils down to:

  • Generate some /nix/store/…-model.drv output as part of a regular nixops deployment.
  • Separately, in a systemd service:
    • nix-store --realise /nix/store/…-model.drv to generate a trained model output
    • start the corresponding inference service using the trained model as input (tensorflow-serving or whatever)

The eventual idea is to “build” (train) a machine learning model described by a particular derivation, and feed the result into the corresponding inference service as an input. Since only that service depends on the model, there is no need to block NixOS system activation (or the nixops deployment) while waiting for training to finish. Instead, I simply want to kick off the inference service for each model once it has been fully trained.
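
The two steps of the service can be sketched as a small shell wrapper. This is only a rough sketch, not the script from the gist: the function name `start_when_trained`, the `NIX_STORE` override variable and the `some-inference-server` command are made up for illustration, and it assumes the derivation has a single output.

```shell
#!/bin/sh
# Sketch of a deferred build wrapper that a systemd unit could run.
# Hypothetical names: start_when_trained, NIX_STORE (override, handy for testing).
set -eu

NIX_STORE="${NIX_STORE:-nix-store}"

# start_when_trained DRV CMD [ARGS...]
# Realise DRV (the slow "training" step), then run CMD with the
# realised path appended as its last argument.
start_when_trained() {
  drv="$1"
  shift
  # nix-store --realise prints the resulting path on stdout;
  # a single-output derivation is assumed here.
  model="$("$NIX_STORE" --realise "$drv")"
  "$@" "$model"
}

# Only act when called with arguments, e.g. (hypothetical server command):
#   deferred-start /nix/store/…-model.drv some-inference-server --model-path
if [ "$#" -gt 0 ]; then
  start_when_trained "$@"
fi
```

Because the realise step runs inside the service rather than during the system build, activation returns immediately and only this unit waits for the model.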

I’m curious to see if there’s a more thoroughly thought out approach than my fast & loose experiment. Anyone else looked into this sort of thing?

Somewhat related is Nix issue #693.

I’m also using nixops, distributed build machines, etc., but don’t want to muddy the conversation with those details.

Nixpkgs issue #33486 would seem to suggest something like this for tying the lazy build output to the current generation’s gcroot.

system.extraDependencies = [
  (builtins.unsafeDiscardStringContext package.outPath)
];

Of course, this wouldn’t work as written, since system.extraDependencies takes a list of packages (instantiated like this), not a list of paths.

This makes me wonder whether #33486 should be revisited. I could imagine wanting to populate something like nix.gc.roots with symlink paths that don’t come directly from /nix/store.

In case anyone is following along, I think this is roughly the strategy I’ve settled on for dealing with gcroots for lazy build outputs. I’ll do much the same thing as system.extraDependencies, except with unsafeDiscardStringContext applied to the output paths. I had to refresh my memory on how Nix tracks run-time dependencies.

To be honest I’m a little shocked that this works. Tested with:

  unbuildable = pkgs.runCommand "example-unbuildable" {} "echo 'example-unbuildable' > $out";
  buildable = pkgs.runCommand "example-buildable" {} "echo 'example-buildable' > $out";
  system.extraSystemBuilderCmds = ''
    echo 'unbuildable drv: ${builtins.unsafeDiscardOutputDependency unbuildable.drvPath}'
    echo 'buildable drv: ${builtins.unsafeDiscardOutputDependency buildable.drvPath}'

    echo '${builtins.unsafeDiscardStringContext unbuildable.outPath}' > $out/extra-lazy-dependencies
    echo '${builtins.unsafeDiscardStringContext buildable.outPath}' >> $out/extra-lazy-dependencies
  '';

Checking that it works:

$ nixos-rebuild test

$ cat /run/current-system/extra-lazy-dependencies 

$ ls /nix/store/7cmbb5q3w9djzdsq8ia1jwvv7c3na6f8-example-buildable
ls: cannot access '/nix/store/7cmbb5q3w9djzdsq8ia1jwvv7c3na6f8-example-buildable': No such file or directory

$ nix-store -r /nix/store/vxmfvnvg0n1mly9q4y5p6j4hhj5np9aa-example-buildable.drv
these derivations will be built:
building '/nix/store/vxmfvnvg0n1mly9q4y5p6j4hhj5np9aa-example-buildable.drv'...
warning: you did not specify '--add-root'; the result might be removed by the garbage collector

$ nix-store -q --roots /nix/store/7cmbb5q3w9djzdsq8ia1jwvv7c3na6f8-example-buildable

$ nix-collect-garbage
finding garbage collector roots...
deleting garbage...

$ ls /nix/store/7cmbb5q3w9djzdsq8ia1jwvv7c3na6f8-example-buildable

$ nix-store --gc --print-live | grep buildable
finding garbage collector roots...
determining live/dead paths...
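
To see at a glance which lazy outputs still need building, the extra-lazy-dependencies file from the test configuration can be walked line by line. A rough sketch: the function name `list_unrealised` is made up here, and it assumes the file holds one store path per line, as written by the builder commands above.

```shell
#!/bin/sh
# Sketch: report lazy dependencies that have not been realised yet.
# Hypothetical name: list_unrealised.
set -eu

# list_unrealised FILE
# Print every path listed in FILE (one per line) that does not exist yet.
list_unrealised() {
  # The extra test keeps a final line without a trailing newline.
  while IFS= read -r path || [ -n "$path" ]; do
    [ -n "$path" ] || continue
    [ -e "$path" ] || printf '%s\n' "$path"
  done < "$1"
}

# Only act when called with a file argument, e.g.:
#   list-unrealised /run/current-system/extra-lazy-dependencies
if [ "$#" -gt 0 ]; then
  list_unrealised "$1"
fi
```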

A further concern I had was that exporting closures might break due to missing run-time dependencies. However, because Nix tracks run-time dependencies implicitly, this does not seem to be a problem. So far, so good…

nix-store --export $(nix-store -qR $(readlink /run/current-system)) > out

…seems OK.
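
One way to probe this concern further is to check whether a lazy path is listed in the system closure at all, since the export command above only receives the paths that `nix-store -qR` prints. A sketch under assumptions: `in_closure` and the `NIX_STORE` override variable are made-up names, and the check simply compares exact lines of the closure listing.

```shell
#!/bin/sh
# Sketch: test whether a store path is part of another path's closure.
# Hypothetical names: in_closure, NIX_STORE (override, handy for testing).
set -eu

NIX_STORE="${NIX_STORE:-nix-store}"

# in_closure ROOT PATH
# Succeed if PATH appears in the closure of ROOT (same listing as nix-store -qR).
in_closure() {
  "$NIX_STORE" --query --requisites "$1" | grep -Fxq -- "$2"
}

# Only act when called with exactly two arguments: ROOT and PATH.
if [ "$#" -eq 2 ]; then
  if in_closure "$1" "$2"; then
    echo "in closure: $2"
  else
    echo "not in closure: $2"
  fi
fi
```

Running it against `$(readlink /run/current-system)` and each line of extra-lazy-dependencies would show whether the lazy outputs take part in the exported closure.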