Nixfmt-rs: towards nicer parser error messages

Mic92 · May 2, 2026, 5:27pm

One thing that I always liked about the rust compiler was the nice error reporting. Now we had this initiative a couple of years ago for runtime errors in Nix to produce nicer error traces with reference to the source. However using a parser generator for the actual nix code parsing often still comes at the cost of not easy to read errors.

In nixfmt-rs, I therefore chose to implement a hand-written parser. It’s more code but I think the result speaks for itself:

We can hopefully use this output to also enhance feedback in editors. I haven’t look yet at what output the require to show this information inline.

drupol · May 2, 2026, 5:51pm

Amazing, thanks for this ! Going to use it by default now

StepBroBD · May 2, 2026, 6:04pm

Great work! Curious how hard it would be to make the formatter configurable like ocamlformat (e.g. different profiles)? Would it just be adding a bunch of flags and do matches on those flags when producing the final output string (number of spaces, how the lines should be split, etc.)?

I still mainly use nixpkgs-fmt bc imo it reads more ergonomically

Mic92 · May 2, 2026, 6:25pm

I tried to make it fast so performance doesn’t suck when using in nixpkgs while maintaining compatibility. So I rather not want to introduce indirection that make it slower. Also would make testing more difficult. I don’t have too many opinions about what the code should look like as long as it is consistent. Nixpkgs-fmt was fine and I also got used to nixfmt.

xokdvium · May 2, 2026, 6:28pm

Having a manually written recursive descent parser is probably the right way forward for upstream nix too - that would certainly help a lot with error messages. Though the lexer could probably be something like re2c - flex is definitely not the fastest.

toastal · May 2, 2026, 8:29pm

Is this the chance to get tab indentation in for variants like Lix where tabs are handled correctly in the magic strings? Does the benchmarks include calling nixfmt (Haskell) with GNU Parallel (or xargs) since it is single-threaded & operating on many files which could be a more reasonable comparison? Can there be an overlay exposed for stable Nix users?

Mic92 · May 2, 2026, 8:45pm

nixfmt-tree is the parallel benchmark in the table and recommended interface of the formatting team, you can see that that userland CPU time between that and the single-threaded benchmark add up roughly. For legacy nix users, pkgs.callPackage <nixfmt-rs/nix/package.nix> {}; should work without introducing the evaluation overhead of an overlay. Just now I am not planning on deviating from how original nixfmt behaves, so this feature needs to lands there first.

mightyiam · May 3, 2026, 1:20am

Do you mean to publish the parser separately for general use?

Mic92 · May 3, 2026, 7:23am

Given there is GitHub - nix-community/tree-sitter-nix: Nix grammar for tree-sitter [maintainer=@cstrahan] · GitHub what value does this parser adds?

toastal · May 3, 2026, 8:46am

Ah, nixfmt said they weren’t touching tabs until Nix C++ addresses its bug; since this project looks to not deviate any at this point, that makes sense. It’s good to see no special dependencies are required in the dependencies for it to be easily callPackageable for stable Nix.

Mic92 · May 3, 2026, 9:36am

What is bug with tabs about?

piegames · May 3, 2026, 9:37am

I’ve had a look at the code and it looks like a 1:1 LLM translation of the existing nixfmt Haskell codebase. Some honesty and clarification on that fact would be greatly appreciated here. As much as I don’t really like Haskell myself, I appreciate that it is a very good tool for the job, and none of the new (unidiomatic) Rust code looks maintainable to me personally.

mightyiam · May 3, 2026, 10:06am

It seems that the definition of maintainable is changing

MagicRB · May 3, 2026, 10:48am

In continuation of this point, it feels a bit icky that love, labor and energy went into the Haskell version, meanwhile the Rust version was put together using an LLM. Instead of working with the Haskell people and the existing project, now we have two competing ones. I will personally not be switching to nixfmt-rs due to that. And i hope that the Haskell version remains the official version.

Mic92 · May 3, 2026, 12:13pm

Could you please consider joining than the formatting team, because afaik everyone that was able to write Haskell fluently is no longer active (maintenance issue). There is another other lingering issue that we have for years with ghc in nixpkgs: Riscv64 support? · Issue #168 · NixOS/nixfmt · GitHub
I am not a Haskell person so contributing to a Haskell project is out of my comfort zone. I also don’t think upstream would have accepted my radical parser architecture change because it would have been a risky migration. I find the result however worth it (see error messages), so I am glad I did it. I feel more much more comfortable debugging this because if there is ever an issue, I can just drop a dbg!()into the parser and also performance debugging makes more fun because I can just drop in perf or sample.

I also discovered some issues with nixfmt around idempotency and I created a bigger fuzzer corpus reaching now practical 100% coverage that could be used to reduce regressions in the future: nixfmt-rs/fuzz at main · Mic92/nixfmt-rs · GitHub

The Readme there tells you what the issue with nixfmt is.

jfly · May 3, 2026, 6:29pm

(Speaking at formatting team lead, I have not talked to other members of the team yet)

It’s awesome that you found some bugs in the Haskell implementation. Please do file them upstream as issues. It’s crazy how difficult it is to write a formatter that’s actually idempotent.
I’m glad to see experimentation in the space, and I think the goal of remaining drop-in compatible with the official implementation will help both projects improve.

Speaking as a treefmt maintainer: have you investigated the performance difference between nixfmt-rs --check . and treefmt driving nixfmt-rs? Is the cost entirely in exec-ing nixfmt-rs, or is treefmt’s go code just slower at walking the filesystem (relevant: GitHub - jfly/polyglot-walks: Experiments with walking the filesystem in many languages · GitHub)?

Mic92 · May 3, 2026, 7:54pm

treefmt is a single-threaded filesystem walker that uses git as a filelist, whereas treefmt-rs uses 12 threads and pipelines formatting while scanning the tree (most of logic is coming from the ignore crate that was written for ripgrep). It also only has to do 1 stat per file, whereas treefmt has to do a stat() after to check if a file has changed.

jfly · May 3, 2026, 8:15pm

pipelines formatting while scanning the tree

treefmt does this as well

The other differences you cite are accurate, but are you sure all of them are significant? In GitHub - jfly/polyglot-walks: Experiments with walking the filesystem in many languages · GitHub, I experimented with walking the filesystem across a much larger dataset than just nixpkgs, and saw much less of a time difference.

Is it possible the entire difference is that treefmt has to shell out to git to figure out which files to include?

(Sorry for the skepticism and wild guesses without doing any actual experiments myself.)

Mic92 · May 3, 2026, 8:29pm

github.com/numtide/treefmt

walk/git: split ls-files and stat paths concurrently

main ← git-walker-parallel

opened 08:24PM - 03 May 26 UTC

Mic92

+311 -109

walk/git: run ls-files --cached / --others separately and stat in parallel When… --cached and --others are passed together, git only flushes output after the untracked scan is done. On nixpkgs that is ~1.5s before the first formatter can be spawned. After that we still Lstat every path sequentially. So run the two ls-files queries concurrently (the index-only one streams right away) and do the Lstat calls in a worker pool. On nixpkgs with --no-cache and nixfmt-rs this brings the wall time from 3.26s to 1.79s on a 64-core machine; the first formatter spawns after 80ms instead of 1564ms. The same paths are emitted, just no longer in a deterministic order. As far as I can tell nothing in the scheduler depends on that.

Looks like I got confused because I looked at syscalls with perf stat, not code… So it was actually git that was the sequential/blocking part. Now the overhead of treefmt is much smaller…

Also found a deadlock, that can potential trigger if you press ctrl-C

jfly · May 3, 2026, 8:54pm

github.com/numtide/treefmt

SIGINT (ctrl-c) ignored and treefmt (appears to?) hang during long formatting runs

opened 07:33PM - 03 Feb 26 UTC

closed 06:07PM - 05 Feb 26 UTC

jfly

bug

**Describe the bug** Try running `treefmt` in nixpkgs, and then try to kill it …with ctrl-c. It will spit out a bunch of errors and then hang: ```console $ treefmt INFO config: searching for tree root using tree-root-file: .git-blame-ignore-revs INFO config: tree root: /home/jeremy/src/github.com/NixOS/nixpkgs INFO formatter | keep-sorted: 1024 file(s) processed in 451.296319ms ^CERRO formatter | nixf-diagnose: failed to apply with options '[--auto-fix --ignore=sema-unused-def-lambda-noarg-formal --ignore=sema-unused-def-lambda-witharg-arg --ignore=sema-unused-def-lambda-witharg-formal --ignore=sema-unused-def-let --ignore=or-identifier --ignore=sema-primop-removed-prefix --ignore=sema-primop-overridden --ignore=sema-constant-overridden --ignore=sema-primop-unknown]': signal: interrupt ERRO formatter | nixf-diagnose: failed to apply with options '[--auto-fix --ignore=sema-unused-def-lambda-noarg-formal --ignore=sema-unused-def-lambda-witharg-arg --ignore=sema-unused-def-lambda-witharg-formal --ignore=sema-unused-def-let --ignore=or-identifier --ignore=sema-primop-removed-prefix --ignore=sema-primop-overridden --ignore=sema-constant-overridden --ignore=sema-primop-unknown]': signal: interrupt ... [ this goes on for a while, but treefmt never exits ] ... ``` If I run `treefmt -vv`, I see endless spew that shows we're not making any progress: ```console $ treefmt -vv ... ^C ... DEBU read 0 files DEBU walk | cache: read 0 files from delegate DEBU read 0 files DEBU walk | cache: read 0 files from delegate DEBU read 0 files DEBU walk | cache: read 0 files from delegate DEBU read 0 files DEBU walk | cache: read 0 files from delegate DEBU read 0 files DEBU walk | cache: read 0 files from delegate DEBU read 0 files DEBU walk | cache: read 0 files from delegate DEBU read 0 files DEBU walk | cache: read 0 files from delegate DEBU read 0 files DEBU walk | cache: read 0 files from delegate DEBU read 0 files DEBU walk | cache: read 0 files from delegate ``` **To Reproduce** Steps to reproduce the behavior: 1. Run `treefmt` in a large project (such as `nixpkgs`)` 2. Try to kill it with a SIGINT. **Expected behavior** treefmt should exit nonzero, but it *should exist*. **System information** N/A **Additional context** N/A

Perhaps this is SIGINT (ctrl-c) ignored and treefmt (appears to?) hang during long formatting runs · Issue #662 · numtide/treefmt · GitHub? Or something else?

Edit: nevermind, I see this is also in your PR.