Trampolining Nix with genericClosure

Four years ago, @sternenseemann showed that builtins.genericClosure can serve as a trampoline in Nix, encoding each step of a tail-recursive function as a “node” and letting the C++ worklist loop drive iteration instead of Nix call frames. [1] It worked at 10,000 iterations where plain recursion overflows, but the overhead from copying state at every step led him to call the approach “pretty much useless.” Worse, thunks accumulate in the non-key fields and overflow the C++ stack when you force the result. “I’ve not toyed with this,” he wrote.

I’ve been building nix-effects, an algebraic effects library for Nix, and the interpreter needs to chain a lot of operations through a freer monad. The thunk problem was the wall.

The fix is one line:

key = builtins.deepSeq newState (step.key + 1);

genericClosure forces the key field at every step for deduplication. Embed deepSeq newState inside that expression and you piggyback on the forcing. No thunk chain forms. Paste into nix repl:

let
  trampoline = init: step:
    let
      steps = builtins.genericClosure {
        startSet = [{ key = 0; state = init; }];
        operator = s:
          let result = step s.state;
          in if result.done then []
             else [{
               key = builtins.deepSeq result.state (s.key + 1);
               state = result.state;
             }];
      };
    in (builtins.elemAt steps (builtins.length steps - 1)).state;
in (trampoline { i = 0; total = 0; } (s:
    if s.i >= 100000
    then { done = true; state = s; }
    else { done = false; state = { i = s.i + 1; total = s.total + 1; }; })).total

Returns 100000. Now remove the builtins.deepSeq result.state from the key line and run it again. genericClosure churns through all 100,000 steps without complaint. Then .total tries to force a chain 100,000 thunks deep and dies. Same trampoline, same steps. The failure shows up when you try to use the result, not during the computation. Looks like the trampoline is broken when it isn’t. That misdirection is why sternenseemann shelved it.

Trade-offs. genericClosure’s std::map adds O(log n) per step for key dedup, and it shows: at 1M iterations the trampoline takes about a second versus ~90ms for foldl' doing the same work (roughly 11x overhead, benchmarked with hyperfine, Nix 2.31). If you know the iteration count, use foldl'. The trampoline is for when you don’t, and for compound state that foldl' can’t handle without building its own thunk chains.

Full writeup in a blog post: Trampolining Nix with genericClosure (the thunk trap mechanism, a real freer monad interpreter, benchmarks).

To the best of my knowledge, nobody has documented the deepSeq-in-key trick before. I could be wrong; it’s the kind of thing that might live in someone’s dotfiles and never get written up. Has anyone hit the stack limit in production Nix code, and is there a realistic path for builtins.trampoline (#8430)?

[1] @sternenseemann, Tail Call Optimization in Nix Today, February 2022.

12 Likes

Taking the time to share this is much appreciated :folded_hands:

1 Like

putting the error here as well for those putting it in the search that may have use for this solution:

error: stack overflow; max-call-depth exceeded
2 Likes

Hey @mikabo, I’m having a great time reading the docs of nix-effects, thanks for all the fun (as a nerd, to me that is much better than watching netflix :D), also thanks for mentioning my nfx as prior art. really excited to see how Nix can benefit from this kind of libs <3

1 Like

I just reinvented trampolining with genericClosure today… so that was a timely tip, in my case. Thanks! :slight_smile:

3 Likes