Attribute interpolation breaks referential transparency

vazae · July 6, 2023, 9:10am

I’m very new to Nix and the Nix expression language and I’m trying to understand the semantics for declaring attribute sets.

Running nix-instantiate --eval --strict on

let x = "y";
in
{
  z = { ${x} = true; };
  z = { y = false; };
}

errors with error: attribute 'y' already defined as one would expect.

However, switching the order of the attributes to

let x = "y";
in
{
  z = { y = false; };
  z = { ${x} = true; };
}

silently discards the second declaration, evaluating to

{ z = { y = false; }; }

This also violates referential transparency, as replacing x with its value "y" as in either

{
  z = { ${"y"} = true; };
  z = { y = false; };
}

or

{
  z = { y = false; };
  z = { ${"y"} = true; };
}

both error as one would expect.

This behavior violates my two intuitions that the order of attribute declarations in an attribute set shouldn’t matter and that substituting variable names in the body of a let with their values shouldn’t change the value of the expression.

Indeed, the manual says

Attributes can appear in any order. An attribute name may only occur once.

Is this a bug? I think the root cause is that the syntax needs to support things like

{
  z.a = true;
  z.b = false;
}

to evaluate to

{ z = { a = true; b = false; }; }

as opposed to what the seemingly contradictory statements

{
  z = { a = true; };
  z = { b = false; };
}

would suggest when expanded out. To me the above seems to redefine z, but I guess there’s a way to sensibly “merge” the two definitions as long as they don’t share any names. Is there really a merging process? What’s the best way to think about this behavior?

Thanks!

fricklerhandwerk · July 6, 2023, 12:20pm

Thanks for pointing that out. The wording in the manual is misleading here, because it conflates syntax with semantics. Please open a documentation issue about it. The behavior you’re describing certainly is surprising to me, and we should at least document it. Please open an issue on that one, too. It probably can’t be changed as that will almost certainly break existing expressions in subtle ways.

vazae · July 6, 2023, 12:46pm

I submitted #8658, but I’m not sure how to phrase the documentation issue. If I understand correctly, the “syntax” part is the fact that an attribute name can only occur once; if it occurs multiple times, that is a syntax error. The claim that attributes can be permuted without changing the value of the attribute set, on the other hand, is a claim about the behavior of attribute sets, i.e. a claim about their semantics.

Is that what you mean by syntax versus semantics? Or more precisely, how could the documentation be improved?

sternenseemann · July 7, 2023, 2:48pm

I think you are actually experiencing two separate issues. I hope I can point those out below and illuminate what is going on behind the scenes.

First of all, I have to agree with you that Nix breaks referential transparency in a sense, even without involving attribute set internal merging. This has to do with a syntax rewrite based optimization Nix does, namely that a dynamic identifier that just holds a string literal is rewritten to a literal identifier. This happens at parse time:

> nix-instantiate --parse -E '{ x = 2; }'
{ x = 2; }
> nix-instantiate --parse -E '{ ${"x"} = 2; }'
{ x = 2; }

Consequently, this optimization ignores statically known names that are not attainable by a simple AST rewrite, since they’d require a scope lookup:

> nix-instantiate --parse -E 'let name = "x"; in { ${name} = 2; }'
(let name = "x"; in { "${name}" = 2; })

Since it happens at parse time, it of course doesn’t compute values either:

> nix-instantiate --parse -E '{ ${"x" + ""} = 2; }'
{ "${("x" + "")}" = 2; }

This would be okay if dynamic and non-dynamic attribute names behaved the same at run time. This is not the case, since dynamic attributes are subject to different merging behavior (I’ll discuss this in more detail below).

We can illustrate the problem by replacing the literal attribute name in the following expression with a computed expression of the same value:

{
  foo.x = 1;
  ${"foo"} = { y = 2; };
}
# Evaluates to
# => { foo = { x = 1; y = 2; }; }

(Note that the merged attribute set is already available at parse time, i.e. nix-instantiate --eval --strict and nix-instantiate --parse give the same output.)

Now consider:

{
  foo.x = 1;
  ${"f" + "oo"} = { y = 2; };
}
# Crashes with
# => error: dynamic attribute 'foo' at (string):3:3 already defined at (string):2:3

Since the optimization doesn’t kick in, the attribute name is not static here and behaves differently.

The fix for this is quite simple: Treat ${…} (dynamic attribute names) always as dynamic just like is done for "${…}" (string interpolation) where this optimization doesn’t exist:

> nix-instantiate --parse -E '{ "${"x"}" = 2; }'
{ "${("x")}" = 2; }

As it turns out, there is already an issue for this open although no one noticed the issue of referential intransparency before (to my knowledge).

Now for internal attribute set merging which is a bit peculiar. The need for this exists because of the attribute path syntax that is arguably necessary to make writing NixOS configurations bearable:

{
  services.openssh.enable = true;
  services.pipewire = {
    enable = true;
    pulse.enable = true;
  };
}
==
{
  services = {
    openssh = { enable = true; };
    pipewire = {
      enable = true;
      pulse = { enable = true; };
    };
  }; 
}

This merge happens at parse time (which was surprising to me). The main merging logic is implemented in the addAttr function in the parser which constructs an attribute set expression representation for the evaluator. I’ll try to summarize the merging logic below, but it will be a simplification: attribute sets are a quite complicated topic, since their implementation is intertwined with rec { … } attribute sets (that also have an accompanying scope) and let … in … bindings.

One key thing about the merging logic is that only statically inferrable merges are done, i.e. attribute path syntax is just syntactic sugar. This simplifies the implementation significantly (attribute sets are complicated enough as it stands) and allows it to be implemented more efficiently. (I’m sure there are also cases where dynamic merging would be confusing, although I don’t know any off the top of my head.) It does violate the symmetry between dynamic and static attribute names, though, as we’ll see.

When parsing an attribute set expression to create an ExprAttrs structure, we can treat it as a list of bindings, each of which has one of the following forms:

inherit attr or inherit (from) attr: These are inserted if attr isn’t already bound in the ExprAttrs structure we are building. If they are already present, parsing fails. This is simplified tremendously by the fact that dynamic attribute names are disallowed in inherit. (Merging { inherit (foo) bar; bar = { x = 1; }; } isn’t possible since we can’t tell statically if foo is an attribute set and what keys it contains.)
attr = value or attr.path = value (with any number of attribute path parts) are subject to the merging logic.

ExprAttrs contains two separate lookup tables for bindings:

attrs, mapping statically known single attribute names (not paths) to an expression representing their eventual value.
dynamicAttrs, a list of expressions representing attribute names, each coupled with an expression representing its eventual value.

Because each table always maps a single attribute name (or name expression) to a value expression, the parser needs to synthesize ExprAttrs structs for attribute paths. E.g. { foo.bar.baz = 1; } can only be represented by three nested ExprAttrs structs that correspond to { foo = { bar = { baz = 1; }; }; } (this is why nix-instantiate --parse prints the latter form rather than the original expression). This is also the reason why explicitly written attribute sets get merged (like in your original example): the parser would not be able to distinguish between explicitly created ExprAttrs and ones synthesized from attribute paths.

There is no special code for this synthesis; in fact, it is just addAttr at work, which operates like this:

It looks at the head of the attribute path:

If it is a dynamic entry, it is pushed onto dynamicAttrs. If necessary, a new attribute set is created and populated with the remaining attribute path. Finally, the attribute value expression is inserted where appropriate. No merging is done across these and merging of static attribute paths can go only up to the first dynamic attribute name (if any). (Note that this description doesn’t correspond to the implementation, but, since it is actually buggy, I thought I’d only describe it conceptually.)
```
> nix-instantiate --parse -E '{ ${"fo"+"o"}.bar = 3; ${"fo"+"o"}.baz = 3; ${"fo"+ "o"} = 3; }'
{ "${("fo" + "o")}" = { bar = 3; }; "${("fo" + "o")}" = { baz = 3; }; "${("fo" + "o")}" = 3; }
```
For static entries, it looks at the current ExprAttrs. If no matching one exists, it is inserted into attrs and merging of the rest of the attrpath continues as normal, creating empty ExprAttrs as necessary (this is how nested ExprAttrs are synthesized in the example above). If a matching entry already exists, there are two possibilities:
1. It is not an ExprAttrs, parsing fails due to duplicate attribute definitions.
2. If it is an ExprAttrs, the rest of the attribute path (with the entry we just looked at split off) is merged into that ExprAttrs.
The algorithm is notably expressed without recursion, but can probably best be understood in terms of recursion.
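
To make the static part of this concrete, here is a rough recursive model written in Nix itself. It is only a sketch and not the parser’s actual code: bindings are represented as hypothetical { path, value } records, a Nix attribute set stands in for an ExprAttrs, any other value stands in for an arbitrary expression, and build is a made-up helper name. Dynamic attribute names, inherit and rec are left out entirely.

```nix
let
  # Model of the static merging logic (illustrative, not the real addAttr).
  # `attrs` is the set built so far, `path` a non-empty list of static names.
  addAttr = attrs: path: value:
    let
      head = builtins.head path;
      rest = builtins.tail path;
      existing = attrs.${head}; # only forced when `attrs ? ${head}` holds
      dup = throw "attribute '${head}' already defined";
    in
    if rest != [ ] then
      # More path components follow: descend, synthesizing `{ }` if needed.
      if !(attrs ? ${head}) then
        attrs // { ${head} = addAttr { } rest value; }
      else if builtins.isAttrs existing then
        attrs // { ${head} = addAttr existing rest value; }
      else dup
    else if !(attrs ? ${head}) then
      attrs // { ${head} = value; }
    else if builtins.isAttrs existing && builtins.isAttrs value then
      # Both sides are attribute set literals: merge them entry by entry.
      attrs // {
        ${head} = builtins.foldl'
          (acc: name: addAttr acc [ name ] value.${name})
          existing
          (builtins.attrNames value);
      }
    else dup;

  # Hypothetical helper: fold a list of bindings into one set.
  build = builtins.foldl' (acc: b: addAttr acc b.path b.value) { };
in
build [
  { path = [ "z" "a" ]; value = true; }       # z.a = true;
  { path = [ "z" ]; value = { b = false; }; } # z = { b = false; };
]
```

Evaluated with nix-instantiate --eval --strict, this should give { z = { a = true; b = false; }; }. Changing the second binding to define a as well makes the model throw, mirroring the duplicate attribute error for static names.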

This means that after parsing, ExprAttrs contains an expression representation of the eventual attribute set structure as far as it is statically inferrable (across their respective attrs fields), with all attribute path lists eliminated. Dynamic attributes are inserted into the dynamicAttrs of the innermost ExprAttrs that can be determined statically.

{
  foo.bar = 1;
  foo.baz = { jdf = 2; };
  ${"ab" + "cd"}.rr = 3;
  foo.${"ba" +"z"}.ghf = 4;
}
==
# after parsing
{
  foo = { bar = 1; baz = { jdf = 2; }; "${("ba" + "z")}" = { ghf = 4; }; };
  "${("ab" + "cd")}" = { rr = 3; };
}

How do the dynamic attributes end up in the attribute set value, though? This happens at evaluation time (necessarily) and happens in the following steps:

First the attribute set is constructed according to attrs (not recursively though, since Nix is lazy!).
Scoping related things happen for rec { … } sets (and the obscure __overrides feature is handled). Dynamic attributes are ignored in the rec { … } scope!
Dynamic attributes are added to the attribute set. Notably, they are not merged into the attribute set. If an attribute with the same name as the dynamic one already exists in the attribute set, evaluation fails regardless of the attribute being another attribute set or not. We have already seen this in our earlier investigation of referential transparency.
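
A small expression to illustrate the rec { … } point, as far as I understand the steps above. This is a sketch to try out rather than a specification of the behavior:

```nix
rec {
  a = 1;
  # The value of a dynamic attribute can still refer to the static
  # attribute `a`, since it is evaluated inside the rec scope.
  ${"b" + ""} = a + 1;
  # The reverse direction is the problem: adding a static `c = b;` here
  # would fail with an undefined variable error, because the dynamic
  # attribute `b` never becomes part of the rec scope.
}
```

With nix-instantiate --eval --strict this should evaluate to { a = 1; b = 2; }.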

Now for the merging bug you’ve experienced: All of these should evaluate to the attribute set { x = { y = 3; z = 2; }; }, but the third element in the list does not:

[
  { x.y = 3; x.${"z" + ""} = 2; }
  { x.${"z" + ""} = 2; x.y = 3; }
  { x = { y = 3; }; x = { ${"z" + ""} = 2; }; }
  { x = { ${"z" + ""} = 2; }; x = { y = 3; }; }
]

Thanks to nix-instantiate --parse we can further determine that this seems to be some sort of parser bug, perhaps in addAttr:

# > nix-instantiate --parse tmp.nix
[ ({ x = { y = 3; "${("z" + "")}" = 2; }; }) ({ x = { y = 3; "${("z" + "")}" = 2; }; }) ({ x = { y = 3; }; }) ({ x = { y = 3; "${("z" + "")}" = 2; }; }) ]

Notice that the third element becomes ({ x = { y = 3; }; }) after parsing.

vazae · July 7, 2023, 4:00pm

Thanks for the very thorough response! It made a lot of sense and was very satisfying to read.

The fix for this is quite simple: Treat ${…} (dynamic attribute names) always as dynamic just like is done for "${…}" (string interpolation) where this optimization doesn’t exist

I assume this won’t be done due to possible breakage in existing programs? It’s not too big of a deal either way as I assume ${"x"} is quite an obscure thing to write.

One key thing about the merging logic is that only statically inferrable merges are done, i.e. attribute path syntax is just syntactic sugar.

This makes a lot of sense to me, as you mention, for convenience while writing configuration.

I’m sure there are also cases where dynamic merging would be confusing

In my opinion writing things like

{
  z.a = true;
  z.a.b = false;
  z.c = true;
}

makes a lot of sense just from natural intuition, but with something like

{
  z = someFunction { };
  z = anotherFunction { };
  z.a = yetAnotherFunction { };
}

it seems non-obvious how the definitions would merge if, hypothetically, dynamic and static attribute names were to behave the same. Instead, the current behavior mandates explicitly declaring merging with something like

{
  x = builtins.foldl' lib.attrsets.recursiveUpdate { } [
    { ... }
    { ... }
    ...
  ];
}

for example, or by using imports from the module system.
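
A filled-in version of that pattern might look like the following. It assumes <nixpkgs> is resolvable through NIX_PATH so that lib can be imported, and the sets in the list are made-up placeholders:

```nix
let
  # Assumption: <nixpkgs> is on NIX_PATH; only its lib is needed here.
  lib = import <nixpkgs/lib>;
in
{
  # Explicit merge: fold recursiveUpdate over the list, starting from the
  # empty set. On conflicting leaves, later list entries win.
  x = builtins.foldl' lib.attrsets.recursiveUpdate { } [
    { a.b = 1; }
    { a.c = 2; }
    { d = 3; }
  ];
}
```

This should evaluate to { x = { a = { b = 1; c = 2; }; d = 3; }; }. Unlike the parse-time merging, it also works when the sets come from function calls, because the merge happens at evaluation time.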

Thanks again for clearly uncovering the heart of the issue!

sternenseemann · July 7, 2023, 4:17pm

It could break programs like let ${"foo"} = 13; in foo, but we could special case let bindings, since the optimization is unproblematic here (there’s no truly dynamic counterpart to it). I can’t think of anything else that would be broken at the moment.