Ideas for a "data only" subset of nix

In the discussions for RFC 193, @matklad and I independently had the idea of formalizing a “data only” subset of Nix, similar to ZON (Zig Object Notation) files in the Zig ecosystem (see their comment and mine).

This format would:

  1. Be 100% valid Nix code, but
  2. guarantee linear evaluation complexity by
  3. disallowing the definition and usage of functions and most other language features

This could be very useful for defining the “data only” portion of a Flake (which is exactly what RFC 193 is about), but also more generally to replace JSON and TOML in the Nix ecosystem: in flake.lock, in the lockfiles of npins or niv, or in profile manifests, which are already stored as an informally specified .nix file when using nix-env, but as .json when using nix profile. It could even replace ATerm as the serialization format for derivations!

This file format needs a new file extension (I liked the suggestion .nox, for Nix Object eXpression), and the evaluator then needs to raise errors if any disallowed language constructs are used within these files. They could still be imported from regular .nix files without a problem, of course.

New builtin functions toNOX and fromNOX would also be provided.
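For comparison, here is a small sketch of the round trip that the existing JSON builtins already provide. builtins.toJSON and builtins.fromJSON exist in Nix today; toNOX and fromNOX are only proposed here and would presumably play the same role for the new format.

```nix
let
  # Plain data: only attribute sets, strings, integers and null.
  data = { a = "a"; b = { c = null; d = 3; }; };

  # Serialize with the existing builtin.
  json = builtins.toJSON data;
in
  # Parsing the serialized string again yields an equal attribute set,
  # so this expression evaluates to true.
  builtins.fromJSON json == data
```

A toNOX/fromNOX pair would be expected to satisfy the same round-trip property, with the serialized text itself being valid Nix.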

Some questions that remain:

  1. What should the name of the file format be? (.nox, .non, .nxn, .nixon)
  2. What syntax should actually be allowed in this format?

The second question seems much harder to answer. A minimal subset is easy to think about:

{
  a = "a";
  # Comments are important
  b = { c = null; d = 3; };
  e.f.g = true; # Are we sure this is a good idea though?
}

Attribute sets, strings, integers, booleans and null must all be supported. Comments most likely as well.

Nested attribute names (e.f.g in the example above, maybe there’s a better technical term) are already less clear-cut. They are useful to reduce indentation and make some things easier to write, but they also make serialization ambiguous and make it harder to write tooling that can work with this format. However, this hasn’t kept TOML from exploding in popularity, so maybe it is not much of an issue.
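To make the serialization ambiguity concrete: in Nix as it exists today, the dotted form and the fully nested form denote the same value, so a serializer has to pick one spelling and cannot tell which one the author originally wrote.

```nix
# Two spellings of the same attribute set; the comparison evaluates to true.
{ e.f.g = true; } == { e = { f = { g = true; }; }; }
```

A tool that reads such a file and writes it back would either normalize everything to one style or need to preserve the original syntax tree, which is the extra tooling burden mentioned above.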

But then it becomes less clear; what about floats? let ... in? import of other files of the same format? Every new feature makes this format more useful, but of course, it also makes it much harder to properly serialize, deserialize and modify programmatically.

So, what are your thoughts on this? I am feeling kinda motivated to write an RFC, just for having the format itself. Whether we put it in all the different places I mentioned above is a completely separate story.

Hi. What primary use-case do you have in mind?

It might be that after a few rounds of “why”s you’ll end up with “pure/restricted eval support”. If that is the case, maybe we should instead discuss “how do we specify which files are allowed to (fetch &) import external nix sources” and/or “how to specify which files are allowed to specify FOD hashes”.

The primary motivation is to formalize the constraints that the evaluator puts on some attributes of flakes. In the context of RFC 193, this means moving inputs and metadata to a separate file with a format that does not allow anything except “simple values”, which means something like JSON or TOML. Which raises the question: why not Nix? Nix has a good syntax for attribute sets, but the full language is simply too complex to compete with something like JSON or TOML.

The use cases for such a format in Nix itself are plentiful. Just to name the examples I can come up with off the top of my head:

  1. nix.conf (bespoke INI-like format with significant whitespace)
  2. /etc/nix/registry.json
  3. flake.lock (currently JSON)
  4. RFC 193’s flake.toml
  5. manifest.nix (from nix-env, is already a subset of nix, but serialization to nix is very hacky and not formalized)
  6. manifest.json (from nix profile, uses JSON specifically because full nix is a bad format for a data file)
  7. .drv files (currently ATerm, I’m aware a switch like this is very unlikely to happen)
  8. lockfiles or output of other nix-related tools like npins or niv

But there are probably many more. If there were a “data file” Nix subset, there wouldn’t even be a question about which format to use for data files in the Nix ecosystem. But because there isn’t, it’s unclear whether JSON, TOML or sometimes even YAML should be used.

Of course this file format is pure, but that’s not why I care about it. The objective here is somewhat ideological, I’ll admit as much. It feels wrong that nix is such a powerful language that allows you to configure absolutely everything with it, except itself.

Leaving aside the existing work on JSON serialization of derivations, {to,from}JSON builtins, reproducibility of toXXXX, and other blah-blah that others will surely bring up, the question I feel you haven’t fully addressed is “what is the use-case that isn’t addressed today, that would be solvable with this feature?” (in contrast to “what use-cases can be served in a nicer/cleaner way?”).

E.g. “specify trusted paths for restricted-eval on per-project basis” is one such task that isn’t really addressed today: iirc restricted-eval trusts CWD and <channels>. Another one is “how do you restrict allow{Unfree,Insecure}Predicates in transitive (flake or non-flake) dependencies”. One benefit of “static” metadata (as in flake.nix or similar) and pure eval is that you don’t have to scan third party projects for import (fetchurl …)

Defining the subset of nix used for flake input specification clearly makes sense in either case, but I don’t think it should become generally used as a data format within the nix world.

I think the question would still be valid, especially since you’re proposing features like let bindings and imports. I think nox would be chosen out of patriotism, rather than technical merit, and that the choice would be worse than the alternatives.

A large benefit of using an existing data format is interoperability with other ecosystems. To modify a .toml file with a script, you do not need to parse nix (or nox).

It’d probably be easy enough to implement a parser for a truly dumb nox format, which does not permit anything but representing data with static values, but I don’t see the benefit of that over JSON5. It’s just data markup, and Nix’s syntax isn’t meaningfully distinct; in fact, it’s almost identical to JSON5. There’s no need for yet another basic markup language.

Beyond that, the more features this sublanguage permits, the less feasible it becomes to script against. Modifying files written in a language that permits variable bindings is horrendously complicated; you now need to track where a value comes from and have a parser that understands context.
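A small illustration of the problem, assuming the sublanguage allowed let bindings. The attribute names foo and bar and the value of rev are made up for this example.

```nix
let
  # A single binding shared by two entries.
  rev = "abc123";
in {
  foo.rev = rev;
  bar.rev = rev;
}
```

A tool asked to “set bar.rev to a new value” now has to choose between editing the let binding (which silently changes foo.rev as well) and replacing the reference with a literal (which breaks the sharing the author intended). With a format that only permits literal values, the same edit is a plain substitution at a known location.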

Even with the most basic feature set, the rest of the world would need to implement nox parsers, and we’ve placed yet another barrier on nix adoption.

Compare this to JSON - efficient parsers exist in every language. I wouldn’t be surprised if you could parse it with whitespace.

I feel like this is a fallacy of some kind. Of course nix can configure itself, we do so regularly. This does not mean that our data representation format needs to universally be nix - especially not a version of nix that isn’t nix at all.

There’s nothing wrong with using what the wider world has produced.