Text-based IR for Nix formatters

Many different tools are being developed for Nix formatting, and at the moment I am still unsure which one would be best to use. Closely related to formatting, I see automated transformations of Nix code as an equally important topic, and they could take advantage of the parsers written for the formatters. For example, with nixpkgs derivations I would like to:

  • sort all of the packages in all-packages.nix alphabetically within groups
  • enforce that all python-modules specify nativeBuildInputs, buildInputs, propagatedBuildInputs and checkInputs in this specific order
  • convert all http:// URLs to https:// where the https:// URL exists
  • standardize all Python package names to lowercase with dashes
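As a rough illustration of how small two of these transformations become once the code is available as data, here is a Python sketch that only looks at binding names. The helper names are hypothetical, and it assumes that a real tool would move each binding together with its attached comments and whitespace. The name normalization follows the PEP 503 style (lowercase, runs of `-`, `_` and `.` collapsed into one dash).

```python
import re

# Canonical order from the list above.
INPUT_ORDER = ["nativeBuildInputs", "buildInputs", "propagatedBuildInputs", "checkInputs"]


def reorder_inputs(binding_names: list[str]) -> list[str]:
    """Put the *Inputs bindings into INPUT_ORDER, leaving all other bindings in place.

    The input bindings that are present reuse the slots that input bindings
    already occupied, so the rest of the derivation is untouched.
    """
    slots = [i for i, name in enumerate(binding_names) if name in INPUT_ORDER]
    present = sorted((binding_names[i] for i in slots), key=INPUT_ORDER.index)
    result = list(binding_names)
    for slot, name in zip(slots, present):
        result[slot] = name
    return result


def normalize_pname(name: str) -> str:
    """Lowercase a Python package name and collapse runs of '-', '_' or '.' into one '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()
```

Reusing the existing slots is a deliberate choice: it keeps the diff limited to the lines that were actually out of order.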

All of these applications need an AST that represents the code while preserving comments, whitespace, etc., so they are tightly tied to the concrete syntax trees in the Nix formatters. I can imagine that not everyone would like to write their transformations in Rust, Haskell, etc. I believe it is important to have a text-based IR (similar to LLVM’s) that lets tools perform transformations that a Nix formatter would not specifically think about. All this requires is that the formatter can print an IR representation of the concrete syntax of the code, and read the IR back in to perform the formatting.
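To make the idea a bit more tangible, here is a minimal Python sketch of what such an IR could look like. The format is entirely made up for illustration: one node per line, indentation for nesting, and JSON-quoted text on leaves, with whitespace and comments kept as ordinary leaves so that the source can be reproduced exactly.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    kind: str  # must not contain spaces in this toy format
    text: Optional[str] = None  # only leaves (tokens, whitespace, comments) carry text
    children: List["Node"] = field(default_factory=list)


def to_ir(node: Node, depth: int = 0) -> List[str]:
    """Print one node per line: indentation encodes nesting, leaf text is JSON-quoted."""
    pad = "  " * depth
    if node.text is not None:
        return [f"{pad}token {node.kind} {json.dumps(node.text)}"]
    lines = [f"{pad}node {node.kind}"]
    for child in node.children:
        lines.extend(to_ir(child, depth + 1))
    return lines


def from_ir(lines: List[str]) -> Node:
    """Read the IR back, using a stack of (depth, node) pairs to find each parent."""
    root = None
    stack = []
    for line in lines:
        depth = (len(line) - len(line.lstrip(" "))) // 2
        tag, kind, *rest = line.strip().split(" ", 2)
        node = Node(kind, json.loads(rest[0])) if tag == "token" else Node(kind)
        while stack and stack[-1][0] >= depth:
            stack.pop()
        if stack:
            stack[-1][1].children.append(node)
        else:
            root = node
        stack.append((depth, node))
    return root


def source(node: Node) -> str:
    """Concatenating the leaves in order gives back the original text exactly."""
    if node.text is not None:
        return node.text
    return "".join(source(child) for child in node.children)
```

Any language that can read lines and JSON strings could rewrite the `token` lines and hand the result back to the formatter.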

If we achieve a standardized text-based IR, then we can use whatever language we like to transform the code.

Anyway, I wanted to post this here and hear others’ thoughts!


I skimmed @arianvp’s thesis (https://dspace.library.uu.nl/handle/1874/380853) the other day.
There are mentions of a stack machine, and edit scripts for editing trees.

Depending on what one wants to do, this might be a direction to look into.
Also of interest may be jq (jqlang/jq on GitHub, a command-line JSON processor) and its DSL (the jq Language Description on the jqlang/jq wiki).
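To make the edit-script idea a bit more concrete, here is a much simpler, path-addressed variant in Python. It is not the stack-machine representation from the thesis: each edit names a path of child indices into a JSON-like tree (the kind of structure jq also works on). The operation names and the tree shape are made up for illustration.

```python
import copy


def apply_edit_script(tree: dict, script: list) -> dict:
    """Apply (op, path, payload) edits in order to a copy of a JSON-like tree.

    Nodes look like {"kind": ..., "text": ...} or {"kind": ..., "children": [...]}.
    A path is a list of child indices from the root; its last index selects the
    position inside the parent's "children" list.
    """
    tree = copy.deepcopy(tree)
    for op, path, *payload in script:
        parent = tree
        for index in path[:-1]:
            parent = parent["children"][index]
        siblings = parent["children"]
        last = path[-1]
        if op == "update":
            siblings[last]["text"] = payload[0]
        elif op == "delete":
            del siblings[last]
        elif op == "insert":
            siblings.insert(last, copy.deepcopy(payload[0]))
        else:
            raise ValueError(f"unknown edit operation: {op}")
    return tree
```

Because edits are applied in order, later paths refer to the tree as modified by earlier edits, which is one of the things a more careful edit-script design has to pin down.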

There are probably good resources and research on this kind of tree editing, if you know what to look for, which I do not. A good literature review, or finding/asking an expert, would be warranted I think.

Tree editing is a special case of graph editing, so that may also be worth a look.

I can think of at least two ways to go about this (or a combination of both):

  • operating on a tree data structure
  • operating on some sort of serialized format

(if this distinction doesn’t make sense, maybe someone else can phrase it better?)

So maybe a DSL for pattern matching over a set of nodes and being able to specify edits.
(This is starting to sound dangerously like jq? I don’t know about the editing part.)
Or is the whole point to use grep and sed? :smiley:
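A pattern-matching DSL could bottom out in something like this Python sketch: a bottom-up walk (similar in spirit to jq’s `walk`) that applies an edit wherever a pattern matches. The `rewrite` and `kind_is` names and the node shape are hypothetical.

```python
from typing import Callable, Optional

Pattern = Callable[[dict], bool]
Edit = Callable[[dict], dict]


def rewrite(node: dict, pattern: Pattern, edit: Edit) -> dict:
    """Rebuild the children first, then apply `edit` to this node if `pattern` matches.

    Nodes are {"kind": ..., "text": ...} leaves or {"kind": ..., "children": [...]};
    the input tree is never mutated.
    """
    if "children" in node:
        node = {**node, "children": [rewrite(child, pattern, edit) for child in node["children"]]}
    return edit(node) if pattern(node) else node


def kind_is(kind: str, text: Optional[str] = None) -> Pattern:
    """Tiny pattern combinator: match on node kind and, optionally, exact leaf text."""
    return lambda node: node["kind"] == kind and (text is None or node.get("text") == text)
```

For example, `rewrite(tree, kind_is("ident", "oldName"), lambda n: {**n, "text": "newName"})` renames one identifier while every other node, including whitespace and comments, passes through unchanged.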

That’s how canonix (hercules-ci/canonix on GitHub, an experiment in Nix formatting) works, but it’s not production-ready yet.

Canonix is based on top of Tree Sitter and @cstrahan’s tree-sitter-nix parser.

The nice thing about Tree Sitter is that it has C FFI bindings to many languages, so a parser written for it can easily be used from all of those languages. Another aspect is that it was developed for Atom, and as a side effect quite a lot of people work on it. One more thing is that the Tree Sitter AST preserves all the information, including whitespace and comments (comments appear as nodes, and whitespace can be recovered from each node’s byte range), which is very important for formatters and code highlighters.

The only thing I haven’t seen yet is a tree-sitter ↔ JSON converter. That would be the ultimate, if slow, form of interop.
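A minimal Python sketch of the tree-sitter to JSON direction. It only assumes the node attributes that py-tree-sitter exposes (`type`, `children`, `start_byte`, `end_byte`), so with py-tree-sitter you would pass `tree.root_node` and the source bytes. Because tree-sitter does not store whitespace as nodes, every byte gap between siblings becomes a made-up `trivia` leaf, which keeps the JSON lossless. The function names and the JSON shape are hypothetical, not an existing tool.

```python
import json
from typing import Any


def node_to_json(node: Any, source: bytes) -> dict:
    """Convert a tree-sitter-style node (type/children/start_byte/end_byte) to a dict."""
    if not node.children:
        return {"kind": node.type, "text": source[node.start_byte:node.end_byte].decode("utf-8")}
    children = []
    cursor = node.start_byte
    for child in node.children:
        if child.start_byte > cursor:  # whitespace between siblings
            children.append({"kind": "trivia", "text": source[cursor:child.start_byte].decode("utf-8")})
        children.append(node_to_json(child, source))
        cursor = child.end_byte
    if node.end_byte > cursor:  # whitespace before the node's own end
        children.append({"kind": "trivia", "text": source[cursor:node.end_byte].decode("utf-8")})
    return {"kind": node.type, "children": children}


def tree_to_json(root: Any, source: bytes) -> str:
    """Wrap the root so leading and trailing whitespace of the file survive as well."""
    lead = source[:root.start_byte].decode("utf-8")
    trail = source[root.end_byte:].decode("utf-8")
    children = [{"kind": "trivia", "text": lead}] if lead else []
    children.append(node_to_json(root, source))
    if trail:
        children.append({"kind": "trivia", "text": trail})
    return json.dumps({"kind": "document", "children": children})


def json_to_source(tree: dict) -> str:
    """Concatenate leaf text in order; for an unmodified tree this reproduces the file."""
    if "text" in tree:
        return tree["text"]
    return "".join(json_to_source(child) for child in tree["children"])
```

The way back is just `json_to_source` followed by a reparse, which is part of why this kind of interop would be slow.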

Another interesting tool is nix-linter (Synthetica9/nix-linter on GitHub, a linter for the Nix expression language), which focuses more on semantic rules than on formatting itself. In fact, it seems to use the hnix parser, which doesn’t preserve comments or whitespace.

I should probably also mention that @matklad (who is the real hero on this project) and I have been working on nixpkgs-fmt, which was designed specifically to format nixpkgs.

The main design difference from most of the other tools out there was to use rewrite rules instead of pretty-printing the AST. This allows us to leave most of the original Nix code untouched, except where it matters like indent and spacing between elements. That way we will be able to create a more consistent style in nixpkgs without rewriting all of the files completely. And because the tool is based on rewrite rules, it’s quite easy to add more conversions that are specific to nixpkgs (if you know how to read Rust code).
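To illustrate the difference from pretty-printing, here is a toy Python sketch of the rewrite-rule idea. It is not nixpkgs-fmt’s code (that is Rust, working on rnix’s syntax tree): the tokenizer is deliberately crude and `space_rule` is a made-up rule that only normalizes spaces around `=` and before `;`. Every whitespace run that no rule claims, including all line breaks, is copied through unchanged, as are all other tokens and comments.

```python
import re
from typing import Optional

TOKEN = re.compile(
    r"\s+"                                # whitespace: the only thing rules may change
    r"|#[^\n]*"                           # line comment
    r'|"(?:\\.|[^"\\])*"'                 # double-quoted string (toy: no nested quotes)
    r"|==|!=|<=|>=|&&|\|\||->|//|\+\+"    # multi-character operators
    r"|[A-Za-z_][\w'-]*"                  # identifier
    r"|\d+"                               # integer
    r"|.",                                # anything else, one character at a time
    re.S,
)


def space_rule(prev: str, ws: str, nxt: str) -> Optional[str]:
    """Return replacement whitespace between two tokens, or None to leave it alone."""
    if "\n" in ws:
        return None  # line breaks are the developer's decision
    if nxt == ";":
        return ""
    if prev == "=" or nxt == "=":
        return " "
    return None


def format_source(src: str) -> str:
    prefix = ""
    words: list = []
    gaps: list = []  # gaps[i] is the whitespace after words[i] ("" if there was none)
    for tok in TOKEN.findall(src):
        if tok.isspace():
            if words:
                gaps[-1] += tok
            else:
                prefix += tok
        else:
            words.append(tok)
            gaps.append("")
    for i in range(len(words) - 1):
        new = space_rule(words[i], gaps[i], words[i + 1])
        if new is not None:
            gaps[i] = new
    return prefix + "".join(word + gap for word, gap in zip(words, gaps))
```

For instance, `"{\n  a=1;\n  b   =  2 ; # keep   me\n}\n"` becomes `"{\n  a = 1;\n  b = 2; # keep   me\n}\n"`: the layout and the comment are untouched, only the claimed gaps change.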

In terms of formatting style, the tool puts more weight on minimizing diffs between commits. For example, it doesn’t try to do fancy things like vertically aligning values, because that generates larger diffs. It also leaves it to the developer to decide whether a container should be on a single line or span multiple lines (and thus doesn’t enforce line lengths), and leaves empty lines alone in case the developer wants to use them to separate code blocks.
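The diff argument against vertical alignment is easy to check with Python’s `difflib`. In this sketch (the bindings and helper names are invented for illustration), adding one longer attribute name re-pads every aligned line, while the unaligned layout only changes the added line.

```python
import difflib


def render(bindings: dict, align: bool) -> list:
    """Render `name = value;` lines, optionally padding names so the `=` signs line up."""
    width = max(len(name) for name in bindings) if align else 0
    return [f"  {name.ljust(width)} = {value};" for name, value in bindings.items()]


def changed_lines(before: list, after: list) -> int:
    """Count added plus removed lines in a unified diff, skipping the file headers."""
    diff = difflib.unified_diff(before, after, lineterm="")
    return sum(
        1
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )
```

For a two-binding attrset (`pname`, `version`) that gains a `nativeBuildInputs` binding, `changed_lines` reports 5 changed lines for the aligned layout and 1 for the unaligned one.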

If you want to play with it, we also have a WebAssembly version over here: nixpkgs-fmt-wasm

A good first step would be to just converge on the names of the grammar productions. Is there an up-to-date version of "Nix: A System for Software Deployment" that also names the intermediate productions?

Adding a JSON output format to rnix should also be straightforward.

I love this discussion. I look forward to when these formatters have a text-based IR (JSON counts in my mind).

Then I will be able to contribute a Python script that automatically updates all http → https URLs, alphabetizes all-packages.nix, and much more.
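For the http:// to https:// part, such a script could be quite small once a JSON IR exists. This sketch assumes a hypothetical IR where leaves look like `{"kind": ..., "text": ...}` and string and URI literals have the kinds `string` and `uri`. Deciding whether the https:// variant of a URL actually exists is delegated to a caller-supplied `https_available` check.

```python
import json
import re
import sys
from typing import Callable

HTTP_URL = re.compile(r"http://[^\s\"';]+")


def upgrade_urls(node: dict, https_available: Callable[[str], bool]) -> dict:
    """Return a copy of the tree with http:// URLs in string/URI leaves switched to https://."""
    if "children" in node:
        return {**node, "children": [upgrade_urls(c, https_available) for c in node["children"]]}
    if node["kind"] not in ("string", "uri"):  # hypothetical leaf kinds; comments are skipped
        return node

    def fix(match: re.Match) -> str:
        url = match.group(0)
        return "https://" + url[len("http://"):] if https_available(url) else url

    return {**node, "text": HTTP_URL.sub(fix, node["text"])}


def main() -> None:
    """Filter: read the (hypothetical) JSON IR on stdin, write the edited IR to stdout."""
    tree = json.load(sys.stdin)
    # Placeholder policy: trusts every URL. A real script should verify each one first.
    json.dump(upgrade_urls(tree, lambda url: True), sys.stdout)
```

Calling `main()` from a script turns this into a stdin-to-stdout filter that could sit between a formatter’s IR output and its IR input.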