The following is some kind of reflection on Nix code beyond stdenv.mkDerivation - or general purpose Nix, if you will. What I present is not necessarily practical, but may be interesting.
With Nix being a purely functional programming language, recursion is the obvious way to solve most problems. However, since the Nix evaluator doesn’t optimize, handling arbitrarily big input can be an issue:
- Stack size is limited, so recursion can’t be arbitrarily deep. This situation is worsened by the fact that thunks reserve a lot of space on the stack, which can easily be seen by comparing `builtins.valueSize {}` and `let foo = {}; in builtins.deepSeq foo (builtins.valueSize foo)`. The available workaround, `deepSeq`ing at every recursion step, sadly kills any semblance of performance.
- Since Nix is pure and non-optimizing, any operation on a data structure copies it, even if it would be possible to do it in place. Thus extending or updating lists and attribute sets is quite expensive.
As a result, the art of Nix programming is (ab)using `builtins` to the greatest extent possible, since these are implemented in C++ and thus not constrained by the semantics of the Nix language: some things may happen in place, and builtins like `foldl'` rely on loops instead of recursion, eliminating the stack overflow issue.
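For instance, a sum written with explicit recursion nests one call per list element, while the same fold via `builtins.foldl'` runs as a loop inside the evaluator. A minimal sketch (the names here are mine):

```nix
let
  # Explicit recursion: one nested call per list element, so a
  # sufficiently long list will exhaust the stack.
  sumRec = xs:
    if xs == [ ] then 0
    else builtins.head xs + sumRec (builtins.tail xs);

  # builtins.foldl' iterates in C++ and forces each intermediate
  # result, so the stack depth stays constant.
  sumFold = builtins.foldl' (acc: x: acc + x) 0;
in
sumFold (builtins.genList (x: x) 100000)
```

As I understand it, `sumRec` would hit the evaluator’s stack limit well before a list of this size, while `sumFold` completes without trouble.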
One very powerful, but undocumented, such primop is `builtins.genericClosure`, which was initially conceived to compute the dependency closure of, say, a package. It receives a `startSet` (a list of attribute sets whose elements all need to have a unique `key` attribute) and an `operator` (a function that receives one such element and returns a list of new such elements). The `operator` is then repeatedly applied to the elements of the growing `startSet` until no new elements are produced (`operator` is only applied to each unique element once). There’s also a nice usage example in a Nix issue (concerning its documentation) which may help you get a sense of its workings. Apart from its intended use (computing a dependency closure of sorts), it can be used for basically any recursive algorithm that produces a list in the end; e.g. using it made my pure Nix UTF-8 decoder much more efficient (where much is an understatement).
It’s relatively obvious that we can in fact express any recursive algorithm in terms of `builtins.genericClosure`. But instead of leaving it as the proverbial exercise to the reader, we can show it for real by writing a Nix function that converts a recursive function into one that uses `builtins.genericClosure`. This will also constitute (semi-manual) tail call optimization for Nix, hence this post’s title.
For this we’ll have to impose the following restrictions on the function we’ll take as an input:
- We’ll need to inject some custom code to break up the normal recursion of the function, so we require that the self reference necessary for the recursion is realized via a fixed point we can manipulate (as opposed to Nix’s lexical scope).
- Also, the function needs to be tail recursive (i.e. its return value must either be a call to itself in the outermost position or a value that doesn’t depend on the function’s self reference), otherwise we can’t really inject anything useful to us.
So we may need to rewrite our recursive functions a bit, for example:
let
  # instead of
  facSimple = n: if n == 0 then 1 else n * facSimple (n - 1);
  # we'll use a fixed point and tail recursion
  fac = self: acc: n: if n == 0 then acc else self (n * acc) (n - 1);
in
# Compute 15! without any kind of optimization
lib.fix fac 1 15
The basic idea for the optimization is to pass a fake version of the function that merely captures and returns all information pertaining to the call.
let
  fakeFac = acc: n: { __tailCall = true; args = [ acc n ]; };
in
fac fakeFac 1 15
# => { __tailCall = true; args = [ 15 14 ]; }
fac fakeFac 1307674368000 0
# => 1307674368000
As you can see, if the function tries to recurse, it just returns an attribute set containing the arguments it wanted to pass. When we reach the base case, however, it returns normally.
This will allow us to limit the number of recursion steps to a single one when calling the function in `operator`. The next recursion step would then be performed in the next invocation of `operator`, using the arguments we just captured. The `__tailCall` attribute is an imperfect way of distinguishing an ordinary return value from a captured tail call.
This should already allow us to wire something up with `builtins.genericClosure`, but it’s not really pretty that we have to set up `fakeFac` manually, so we’ll take a small detour and figure out some hacks in order to do this automatically.
First of all, we’ll need to figure out how many arguments a function receives. We’ll do that by feeding it `builtins.throw` until it starts to explode:
let
  argCount = f:
    let
      # N.B. since we are only interested in whether the result of calling f
      # is a function, as opposed to a normal value or evaluation failure, we
      # never need to check success, as value will be false (i.e. not a
      # function) in the failure case.
      called = builtins.tryEval (
        f (builtins.throw "You should never see this error message")
      );
    in
    if !(builtins.isFunction f || builtins.isFunction (f.__functor or null))
    then 0
    else 1 + argCount called.value;
in
argCount (lib.fix fac)
# => 2
Then we’ll need a way to make a function that collects its n arguments into a list. I’ve called this `unapply` because of a) a lack of imagination on my part and b) the fact that it can be used to form the identity relation together with `apply` (which I’ll show later):
let
  unapply =
    let
      unapply' = acc: n: f: x:
        if n == 1
        then f (acc ++ [ x ])
        else unapply' (acc ++ [ x ]) (n - 1) f;
    in
    unapply' [ ];
in
unapply 3 lib.id 1 2 3
# => [ 1 2 3 ]
This allows us to generate the `fakeFac` function we had above programmatically:
unapply (argCount (lib.fix fac)) (args: {
  __tailCall = true;
  inherit args;
})
`apply` can be used to call a function with such a list of arguments - which we’ll have to do eventually:
let
  apply = f: args: builtins.foldl' (f: x: f x) f args;
in
apply builtins.sub [ 10 20 ]
# => -10
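This is also where the promised identity relation shows up: composing the two functions in either order gets you back where you started. A small check, reusing the definitions from above:

```nix
let
  apply = f: args: builtins.foldl' (f: x: f x) f args;
  unapply =
    let
      unapply' = acc: n: f: x:
        if n == 1
        then f (acc ++ [ x ])
        else unapply' (acc ++ [ x ]) (n - 1) f;
    in
    unapply' [ ];
in
# unapply 2 (apply f) re-curries f after apply un-curried it,
# so the round trip behaves just like f itself:
unapply 2 (apply builtins.sub) 10 20
# => -10
```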
Now it’s time to unceremoniously paste the entire monstrosity that puts everything together. I’ve tried to comment the code a bit with explanations, but feel free to ask about anything unclear below.
let
  tailCallOpt = f:
    let
      argc = argCount (lib.fix f);
      # This function simulates being f for f's self reference. Instead of
      # recursing, it will just return the arguments received as a specially
      # tagged set, so the recursion step can be performed later.
      fakef = unapply argc (args: {
        __tailCall = true;
        inherit args;
      });
      # Pass fakef to f so that it'll be called instead of recursing, ensuring
      # only one recursion step is performed at a time.
      encodedf = f fakef;
      # This is the main function, implementing the “optimized” recursion
      opt = args:
        let
          steps = builtins.genericClosure {
            # This is how we encode a (tail) call: a set with final == false
            # and the list of arguments to pass to be found in args.
            startSet = [
              {
                key = "0";
                id = 0;
                final = false;
                inherit args;
              }
            ];
            operator =
              { id, final, ... }@state:
              let
                # Generate a new, unique key to make genericClosure happy
                newIds = {
                  key = toString (id + 1);
                  id = id + 1;
                };
                # Perform recursion step
                call = apply encodedf state.args;
                # If call encodes a new call, return the new encoded call,
                # otherwise signal that we're done.
                newState =
                  if builtins.isAttrs call && call.__tailCall or false
                  then newIds // {
                    final = false;
                    inherit (call) args;
                  } else newIds // {
                    final = true;
                    value = call;
                  };
              in
              if final
              then [ ] # end condition for genericClosure
              else [ newState ];
          };
        in
        # The returned list contains intermediate steps we need to ignore
        (builtins.head (builtins.filter (x: x.final) steps)).value;
    in
    # make it look like a normal function again
    unapply argc opt;
in
tailCallOpt fac 1 15
# => 1307674368000
Okay, so we have something that behaves just as `lib.fix fac` does - but does it help us anywhere beyond that? In terms of stack overflows, `fac` is not a good example, since integers take very little space on the stack, so the returned integer will overflow long before the stack does.
So let’s take a function that has something big on the stack (i.e. a thunk):
let
  emphasize = self: acc: before: after: n:
    if n == 0
    then before + " " + acc + after
    else self (acc + "very ") before after (n - 1);
in
lib.fix emphasize "" "I like Nix" "much" 2
# => "I like Nix very very much"
How does it measure up against its optimized version? Well, let’s see:
let
  # pass the starting accumulator already for convenience
  emphPlain = lib.fix emphasize "";
  emphOpt = tailCallOpt emphasize "";
in
emphPlain "This is" "cursed" 20 == emphOpt "This is" "cursed" 20
# => true
emphPlain "This is" "cursed" 10000
# error: stack overflow (possible infinite recursion)
emphOpt "This is" "cursed" 10000
# => "This is very very very very very very very very very very very very…
So there it is, tail call optimization in Nix! To return to the initial point about practicality, however, it is pretty much useless. In most situations where this would help you, the “optimized” algorithm would probably be dealing with an attribute set or list and be slowed down incredibly by the constant copying involved in list/set operations. Additionally, any hand rolled use of `builtins.genericClosure` is probably considerably faster, as it saves on a lot of function applications and intermediate data structures for a large enough n. It was a fun exercise, though, and hopefully interesting to you as well.
As an additional note, it is still possible to overflow the stack despite `tailCallOpt` if you pick a big enough n: my theory is that the accumulated thunk can grow so much that it’s capable of overflowing the available stack space without any recursion. You can work around this by using `builtins.deepSeq` on your accumulator argument, and it’s even possible to make a strict version of `tailCallOpt` that does this for you. I’ve not toyed with this, since in practice I’d just run out of memory before the computation finished. Not sure if there’s a way past that with the right GC settings.