Nix interpreter and Nix language quirks

djhshih · February 1, 2023, 3:18am

I thought it would be fun to implement a Nix interpreter (lexer + parser + interpreter) from scratch… but what I didn’t anticipate is how easy it was to do in Typescript:

I guess this speaks to how simple the Nix language is. But in this process, I noticed some quirks of the Nix language that I’d like to discuss…

Why do equality on functions always return false? Would it not make more sense to do a reference-based comparison?

> isNull == isNull
false

Why aren’t all attribute sets rec by default? I implemented the self-reference set and the non-self reference set. While the code for the self-reference set is more complex, it was not more computationally expensive compared to the non-self reference set if no attribute relies on self-reference. If a non-self reference set relies on self-reference, a runtime error would result. Therefore, there is little computational efficiency benefit to disable self-reference set by default. The main difference is the attribute scope, as illustrated below:

> let x = 10; in { x = 2; z = x + 1; }
{ x = 2; z = 11; }

> let x = 10; in rec { x = 2; z = x + 1; }
{ x = 2; z = 3; }

> let x = 10; in let x = 2; z = x + 1; in { inherit x z; }
{ x = 2; z = 3; }

However, I am not sure this behaviour would be preferred.
The attribute scope in the non-self reference set would probably lead to unintended behaviour. The non-self reference set’s behaviour deviates from the let expression, which is probably not desirable, either.

The undocumented or operator. It does not appear to work like the or operator in lua yet, and it only seems to work with set select expressions.

> { x = 1; }.x or 3
1

> { z = 1; }.x or 3
3

> x or 3
error: undefined variable 'or'

       at «string»:1:1:

            1| x or 3
             | ^

Is this an unfinished operator?

Is it desirable to allow true, false, and null to be shadowed?

> let true = false; in true
false

> let false = true; in false
true

> let null = 1; in null
1

On a related note, my implementations of rec set and let expression are probably suboptimal, and I was wondering whether anyone have any suggestions?

Here is my approach:

Attempt to evaluate each attribute in the rec set or let, if an undefined identifier is encountered, the evaluate would result in a ‘dependent/undefined’ value that stores the yet-to-be-defined identifier. Binary operations on involving multiple ‘dependent/undefined’ values combine the yet-to-defined identifiers, so that we can keep track of attribute dependencies.
During the evaluation attempt, create an attribute dependency graph.
Use Kahn’s algorithm to sort the dependency graph to find an topological order so that the attributes can be successfully evaluated in this order. (If a cycle exists, throw an error.)

Is there a better way to generate the dependency graph without attempting to evaluate the rec set or let expression?

I would love to hear your thoughts.

tejing · February 1, 2023, 3:50am

A reference-based comparison would expose internal quirks of the interpreter as part of the language. It’s fairly common in purely functional contexts to completely disallow equality comparison on functions. In haskell, for example, even attempting to compare them is a type error.

Because, the programmer usually doesn’t want them to be. It’s quite common to create attrsets with attributes in them with the same names as existing variables, which you don’t want to shadow. Also, it’s generally preferable to use some kind of fixed-point system for overrideability rather than rec, even if you do want them as variables.

Frankly, in my view, rec is basically an anti-pattern.

or isn’t really an operator. foo.bar or baz is a single trinary operator. It causes some very unintuitive parsing in some cases. See nix#7405.

Eww. I didn’t know that was possible…

djhshih · February 1, 2023, 1:26pm

or isn’t really an operator. foo.bar or baz is a single trinary operator. It causes some very unintuitive parsing in some cases. See nix#7405 .

Oh… this makes sense now.
I guess the non-intuitive behaviour exists because select . has precedence over apply , though this does not have to mean that the s . x or y ternary operator also takes precedence over function application.
The ternary operator also seems like non-intuitive syntactic sugar for if s ? x then s.x else y.

Wouldn’t it just be better to have an or operator? e.g. x or y
The precedence of or can then be lower than , which would remove the non-intuitive behaviour reported above.

Frankly, in my view, rec is basically an anti-pattern.

Are there examples in nixpkgs that rely on the currently set scoping? i.e. any examples that rely on the below behaviour?

> let x = 10; in { x = 2; z = x + 1; }
{ x = 2; z = 11; }

Most package definitions that I see use stdenv.mkDerivation rec { ... }

Also, I just came across nickle today, and it looks like they use self-reference set by default:

nickel> let x = 10 in { x = 2, z = x + 1}
{ z = 3, x = 2 }

I don’t see any rec in their ott source file:

github.com

tweag/nickel/blob/b930544978e657dc2f214a9357bcbb06979095d7/spec/type-system/grammar.ott

% Nickel types and expressions grammar

metavar var, x, y, z ::=
metavar uvar, d {{ tex {?a} }}, f {{ tex {?b} }} ::=
metavar labels, l, l' ::=
metavar typevar, a {{ tex \alpha }}, b {{ tex \beta }}, c {{ tex \gamma }} ::=
metavar rowvar, A {{ tex \theta }}, B {{ tex \iota }} ::=
indexvar i, j, k, n, m ::=

grammar
type_vars :: 'type_vars_' ::=
 | a1 ... an :: :: Many

row_vars :: 'row_vars_' ::=
 | A1 ... An :: :: Many

rows_notail {{ tex \rho }} :: 'rows_notail_' ::=
 | empty                   ::  :: Empty
 | l1 : t1 , ... , ln : tn  ::  :: Many
 | rows_notail , l : t     ::  :: Append

This file has been truncated. show original

tejing · February 1, 2023, 1:35pm

Yes, it would, but you’d need a concept of exception catching for the semantics of that operator to make sense, so it’s not a trivial change.

djhshih:

Are there examples in nixpkgs that rely on the currently set scoping? i.e. any examples that rely on the below behaviour?
> let x = 10; in { x = 2; z = x + 1; }
{ x = 2; z = 11; }
Most package definitions that I see use stdenv.mkDerivation rec { ... }

Anything that uses the newer function syntax of mkDerivation, which is better than rec because overrides affect the self-references:

stdenv.mkDerivation (finalAttrs: with finalAttrs; {
   ...
})

Also, all-packages.nix and similar overrideable package sets.

sternenseemann · February 1, 2023, 4:00pm

Don’t be too quick to judge that! It has plenty of quirks and subtle behaviors. How far are you into passing the C++ Nix test suite? There are also plenty of (sometimes very weird edge) cases we discovered while implementing the Tvix Nix evaluator here. Especially attribute sets and scoping are very subtle.

Sadly due to the current lack of spec, finding out the correct behavior mostly involves trying to break your interpreter compared to the C++ one or reading the latter’s code.

Comparing functions does not make sense. However, for historical reasons, functions can be compared by pointer in certain circumstances.

jade · February 1, 2023, 9:40pm

Here’s another fun one with toString and equality of attrsets:

let a = { type = "derivation"; outPath = { type = "derivation"; __toString = _: "nix is a totally sane language"; outPath = "bar"; }; }; b = { type = "derivation"; outPath = { type = "derivation"; outPath = "bar"; }; }; in [ (a == b) (builtins.stringLength a) (builtins.stringLength b) ]
[ true 30 3 ]

Apparently the true/false/null shadowing is because true is builtins.true in a similar way that indexOf is builtins.indexOf, but you can also shadow builtins >:)

nix-repl> let builtins' = builtins // {true = 2;}; true = 1; in let builtins = b
uiltins'; in (true == builtins.true) == (1 == 1)
false

pennae · February 2, 2023, 4:16am

yup. nix doesn’t have very many keywords. what we would normally call a keyword (true, false, null) is just another binding for a builtin constant.

but in the same way nix doesn’t have very many operators either, so you can go a lot further and shadow some of those. please don’t though.

nix-repl> let __lessThan = a: b: b - a; in 1 > 2  
-1

nix-repl> let __div = c: map (__mul c); in 2 / [ 1 2 3 ]            
[ 2 4 6 ]

amjoseph · February 2, 2023, 9:21pm

Hey, it’s great that you’re trying this out, but let me warn you: the first 95% is the easy part.

If you’re serious about creating something that can handle Nix expressions found “in the wild” you should definitely follow all the links in sterni’s post, and follow the development of Tvix generally. At this point Tvix (and the work that has emerged from it) is the best documentation of the details of the Nix language – by a very large margin.

Two more gotchas:

let a=1; in with { a=2; }; a evaluates to 1, not 2 (unlike javascript)
the details of default arguments are completely crazy.

Non-rec sets can be implemented without a thunk, which provides performance and memory usage benefits.
rec sets can’t shadow:

{ lib
, stdenv
, patches ? []
}:

stdenv.mkDerivation rec {
  patches = patches ++ [ ./patches/frob-all-the-things.patch ];
  # the line above causes infinite recursion

Definitely.

The alternative is having to maintain a list of “special” things that can’t be shadowed. Frankly that’s even ickier. What happens if you need to expand that list?

If you can’t shadow it, then it’s actually a keyword, not an identifier. Like let and inherit. Nix keeps the number of keywords to a bare minimum.

asymmetric · February 7, 2023, 8:05pm

Could you explain why?

amjoseph · February 15, 2023, 7:11am

There’s a pretty good explanation in this section of the nixpkgs manual; the crux of it is

The rec keyword works at the syntax level and is unaware of overriding.

Anything which can be written as rec { ... } can also be written as (see footnote):

lib.fix (final: { with final; ... })

Once you apply fix, you “seal off” the recursive set, preventing any overriding. So it’s strictly more general to pass around the final: { ... } and defer applying fix as long as possible, because you can’t “unapply it”.

Unfortunately the rec syntax is much more ergonomic. I say “unfortunately” because the rec style doesn’t offer a non-fix-prefixed form. So its ergonomicity encourages premature fixpointing.

I bet there is an alternate universe out there where Nix has a nice ergonomic rec syntax that doesn’t automatically fix.

[footnote] this isn’t exactly correct because of Nix’s quirky with semantics, but it’s close.

tejing · February 15, 2023, 9:42am

lib.fix (final: with final; { ... }), but yes.

asymmetric · February 16, 2023, 10:51am

Thanks! I also found this article (linked from this Nix pill) super helpful.