Nixfmt-rs: towards nicer parser error messages

One thing that I always liked about the rust compiler was the nice error reporting. Now we had this initiative a couple of years ago for runtime errors in Nix to produce nicer error traces with reference to the source. However using a parser generator for the actual nix code parsing often still comes at the cost of not easy to read errors.

In nixfmt-rs, I therefore chose to implement a hand-written parser. It’s more code but I think the result speaks for itself:

We can hopefully use this output to also enhance feedback in editors. I haven’t look yet at what output the require to show this information inline.

39 Likes

Amazing, thanks for this ! Going to use it by default now :slight_smile:

Great work! Curious how hard it would be to make the formatter configurable like ocamlformat (e.g. different profiles)? Would it just be adding a bunch of flags and do matches on those flags when producing the final output string (number of spaces, how the lines should be split, etc.)?

I still mainly use nixpkgs-fmt bc imo it reads more ergonomically

I tried to make it fast so performance doesn’t suck when using in nixpkgs while maintaining compatibility. So I rather not want to introduce indirection that make it slower. Also would make testing more difficult. I don’t have too many opinions about what the code should look like as long as it is consistent. Nixpkgs-fmt was fine and I also got used to nixfmt.

2 Likes

Having a manually written recursive descent parser is probably the right way forward for upstream nix too - that would certainly help a lot with error messages. Though the lexer could probably be something like re2c - flex is definitely not the fastest.

4 Likes

Is this the chance to get tab indentation in for variants like Lix where tabs are handled correctly in the magic strings? Does the benchmarks include calling nixfmt (Haskell) with GNU Parallel (or xargs) since it is single-threaded & operating on many files which could be a more reasonable comparison? Can there be an overlay exposed for stable Nix users?

2 Likes

nixfmt-tree is the parallel benchmark in the table and recommended interface of the formatting team, you can see that that userland CPU time between that and the single-threaded benchmark add up roughly. For legacy nix users, pkgs.callPackage <nixfmt-rs/nix/package.nix> {}; should work without introducing the evaluation overhead of an overlay. Just now I am not planning on deviating from how original nixfmt behaves, so this feature needs to lands there first.

2 Likes

Do you mean to publish the parser separately for general use?

2 Likes

Given there is GitHub - nix-community/tree-sitter-nix: Nix grammar for tree-sitter [maintainer=@cstrahan] · GitHub what value does this parser adds?

Ah, nixfmt said they weren’t touching tabs until Nix C++ addresses its bug; since this project looks to not deviate any at this point, that makes sense. It’s good to see no special dependencies are required in the dependencies for it to be easily callPackageable for stable Nix.

What is bug with tabs about?

I’ve had a look at the code and it looks like a 1:1 LLM translation of the existing nixfmt Haskell codebase. Some honesty and clarification on that fact would be greatly appreciated here. As much as I don’t really like Haskell myself, I appreciate that it is a very good tool for the job, and none of the new (unidiomatic) Rust code looks maintainable to me personally.

8 Likes

It seems that the definition of maintainable is changing :smiling_face_with_tear:

1 Like

In continuation of this point, it feels a bit icky that love, labor and energy went into the Haskell version, meanwhile the Rust version was put together using an LLM. Instead of working with the Haskell people and the existing project, now we have two competing ones. I will personally not be switching to nixfmt-rs due to that. And i hope that the Haskell version remains the official version.

6 Likes

Could you please consider joining than the formatting team, because afaik everyone that was able to write Haskell fluently is no longer active (maintenance issue). There is another other lingering issue that we have for years with ghc in nixpkgs: Riscv64 support? · Issue #168 · NixOS/nixfmt · GitHub
I am not a Haskell person so contributing to a Haskell project is out of my comfort zone. I also don’t think upstream would have accepted my radical parser architecture change because it would have been a risky migration. I find the result however worth it (see error messages), so I am glad I did it. I feel more much more comfortable debugging this because if there is ever an issue, I can just drop a dbg!()into the parser and also performance debugging makes more fun because I can just drop in perf or sample.

I also discovered some issues with nixfmt around idempotency and I created a bigger fuzzer corpus reaching now practical 100% coverage that could be used to reduce regressions in the future: nixfmt-rs/fuzz at main · Mic92/nixfmt-rs · GitHub

The Readme there tells you what the issue with nixfmt is.

7 Likes

(Speaking at formatting team lead, I have not talked to other members of the team yet)

  • It’s awesome that you found some bugs in the Haskell implementation. Please do file them upstream as issues. It’s crazy how difficult it is to write a formatter that’s actually idempotent.
  • I’m glad to see experimentation in the space, and I think the goal of remaining drop-in compatible with the official implementation will help both projects improve.

Speaking as a treefmt maintainer: have you investigated the performance difference between nixfmt-rs --check . and treefmt driving nixfmt-rs? Is the cost entirely in exec-ing nixfmt-rs, or is treefmt’s go code just slower at walking the filesystem (relevant: GitHub - jfly/polyglot-walks: Experiments with walking the filesystem in many languages · GitHub)?

1 Like

treefmt is a single-threaded filesystem walker that uses git as a filelist, whereas treefmt-rs uses 12 threads and pipelines formatting while scanning the tree (most of logic is coming from the ignore crate that was written for ripgrep). It also only has to do 1 stat per file, whereas treefmt has to do a stat() after to check if a file has changed.

1 Like

pipelines formatting while scanning the tree

treefmt does this as well

The other differences you cite are accurate, but are you sure all of them are significant? In GitHub - jfly/polyglot-walks: Experiments with walking the filesystem in many languages · GitHub, I experimented with walking the filesystem across a much larger dataset than just nixpkgs, and saw much less of a time difference.

Is it possible the entire difference is that treefmt has to shell out to git to figure out which files to include?

(Sorry for the skepticism and wild guesses without doing any actual experiments myself.)

1 Like

Looks like I got confused because I looked at syscalls with perf stat, not code… So it was actually git that was the sequential/blocking part. Now the overhead of treefmt is much smaller…

Also found a deadlock, that can potential trigger if you press ctrl-C

2 Likes

Perhaps this is SIGINT (ctrl-c) ignored and treefmt (appears to?) hang during long formatting runs · Issue #662 · numtide/treefmt · GitHub? Or something else?

Edit: nevermind, I see this is also in your PR.