nixfmt alpha release

jagajaga · May 12, 2019, 5:23pm

Today we are happy to announce an alpha release of nixfmt, our formatter for Nix code. The work is still in progress, but we can’t help sharing what we’ve done so far, as we consider the project truly important for the community and for us as Nix users. It is released under the MPL 2.0, and contributions are welcome. The project is available on GitHub at serokell/nixfmt.

At Serokell, we write quite a bit of Nix code, and the lack of tooling surrounding the language has always bothered us. No good formatter existed, and even editor indentation plugins were often lacking. So we set out to change that by ourselves.

We started writing nixfmt, a tool for formatting Nix code in a consistent way. In the project, we’re using popular libraries like megaparsec and text, and we use a custom pretty printer originally based on the prettyprinter library.

Currently, nixfmt parses and formats all of nixpkgs in about 20 seconds on a dual-core CPU and individual files like all-packages.nix in less than a second. To give it a try, clone it and run nix-build. Then run result/nixfmt -w80 some.nix files.nix. For more details, see the README.md.

The tool is not finished yet: the formatting it applies is very consistent and improves overall readability, although the result is still unattractive in many places. We’re working hard to make it produce good results by the end of the month. Our first goal is a nixfmt that generates reasonably pretty code for almost everything; at that point, we will consider it widely usable. Then we’d like to work with the NixOS community to bring the project up to the standard of an officially accepted code formatter.

If you want to see how it works, we deployed it at https://nixfmt.serokell.io. We kindly invite you to help with the final stretch. Feel free to contribute!

tomberek · May 12, 2019, 8:39pm

Let the bikeshedding begin! Consider:

let
  # * What you're seeing here is our nix formatter. It's quite opinionated:
  sample-01 = { lib }: {
    list = [ elem1 elem2 elem3 ] ++ lib.optionals
    stdenv.isDarwin [ elem4 elem5 ]; # and not quite finished
  }; # it will preserve your newlines

Is there a way to have these line continuations be indented further? Otherwise it’s a bit jarring to see a partial statement on a line. I can see that it is far simpler to just re-use the existing context indentation rather than create a new one (either a fixed number of spaces or based on first character after the “=”), but it does look weird.
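To make the two options concrete, here is a minimal Python sketch of a greedy line wrapper. It is not how nixfmt works internally; `wrap_binding` and its parameters are hypothetical names used only for illustration. With `hanging=False` a continuation line reuses the binding's own indentation (which reproduces the sample above), and with `hanging=True` it is aligned with the first character after the “=”.

```python
def wrap_binding(name, tokens, width, base_indent=4, hanging=False):
    """Greedily wrap `name = tokens;` at `width` columns.

    hanging=False: continuation lines reuse the binding's indentation.
    hanging=True:  continuation lines align with the first character after "= ".
    The trailing ";" is appended without re-checking the width (simplification).
    """
    prefix = " " * base_indent + name + " = "
    cont = " " * (len(prefix) if hanging else base_indent)
    lines, current = [], prefix
    for tok in tokens:
        at_line_start = current in (prefix, cont)
        candidate = current + ("" if at_line_start else " ") + tok
        if len(candidate) > width and not at_line_start:
            # The token does not fit: close this line and start a continuation.
            lines.append(current)
            current = cont + tok
        else:
            current = candidate
    lines.append(current + ";")
    return "\n".join(lines)


tokens = "[ elem1 elem2 elem3 ] ++ lib.optionals stdenv.isDarwin [ elem4 elem5 ]".split()
print(wrap_binding("list", tokens, width=50))
print(wrap_binding("list", tokens, width=50, hanging=True))
```

The second call shows the trade-off mentioned above: a hanging indent reads better here, but its depth depends on the length of the attribute name, so long names push continuation lines far to the right.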

Overall: amazing work! Thanks for the effort.

domenkozar · May 13, 2019, 3:39am

Cool! As there are more and more Nix formatters, I think it would be a good idea to compare them.

nixfmt comes with its own parser so compared to:

hnix: a lot of effort went into good error messages, at the cost of performance. Both formatters have a Haskell-specific parser, so nixfmt more or less duplicates the same parser.
nix-tree-sitter: language agnostic parser, written in C for performance, can gradually parse and allow failure (very good for editor support or missing features). Doesn’t work in browsers (yet?).

The good side of nixfmt approach is that:

it can be compiled to JS, although I don’t personally see much benefit in that (as Nix is just not a browser tool), it’s good for demos
supports comments (as it’s implemented as a lexer), contrary to the hnix parser, which lacks comment support

Overall, I still think giving up browser support to use language-agnostic nix-tree-sitter parser is better long term.

justinwoo/format-nix (a simple formatter for Nix using tree-sitter-nix) goes in that direction, but the choice of PureScript is really confusing to me: probably a handful of Nix developers can write code in PS. Targeting JS which won’t work in browser (due to tree-sitter not supporting that) - why? Because we can, sure, but it’s not something I’d wish the community to dedicate itself to.

my 2c

Infinisil · May 13, 2019, 2:26pm

What I noticed about nixfmt in comparison to hnix is that nixfmt’s parser is simpler, because it doesn’t need to report errors for invalid syntax, it can assume the syntax is valid and do the formatting with that.

domenkozar · May 13, 2019, 2:47pm

Depends on the use case. I think editor integration is crucial, so do you really want to parse a big file twice with two parsers?

Infinisil · May 13, 2019, 2:51pm

I just wanted to point out that difference between hnix and nixfmt regarding the parser. But yeah if you need to parse something with full error reporting anyways, then it would be better to use a single one that supports that.

yorickvP · May 13, 2019, 3:18pm

The worst realistic use-case so far is all-packages.nix, which takes 0.5s to format on my computer. Nevertheless, swapping out our parser with a tree-sitter-nix one is probably a good idea.

justinw · May 14, 2019, 7:02am

but the choice of PureScript is really confusing to me: probably a handful of Nix developers can write code in PS.

It hasn’t been that hard for people who didn’t already use PureScript to contribute actual code changes to format-nix so far. There’s just not much that is complicated in the codebase. Otherwise, I have asked people to simply contribute test data with their expectations of what should happen.

Targeting JS which won’t work in browser (due to tree-sitter not supporting that) - why?

Max Brunsfeld posted a demo of tree-sitter in the browser here: x.com

domenkozar · May 14, 2019, 7:21am

Yes, 4 people one of which is me: Contributors to justinwoo/format-nix · GitHub

Either way, that doesn’t answer why PS is a good choice; it is still just “why not”.

justinw · May 14, 2019, 7:46am

I don’t think people become convinced of this either way

gilligan · May 14, 2019, 12:05pm

https://github.com/justinwoo/format-nix goes in that direction, but the choice of PureScript is really confusing to me: probably a handful of Nix developers can write code in PS. Targeting JS which won’t work in browser (due to tree-sitter not supporting that) - why? Because we can, sure, but it’s not something I’d wish the community to dedicate itself to.

I don’t think being able to run the formatter in the browser is in any way part of the objective so I don’t think that matters at all.
Looking at the code I would say PureScript seems like a good fit. Apart from that the Nix community is full of Haskell developers most of which I believe are most probably able to navigate and work on that code base.

While I won’t argue that the language in which the formatter is written doesn’t matter at all, I would indeed argue that there are other factors that I find even more relevant. For example, does the tool use some established library for parsing? (tree-sitter seems like a good fit.) I guess ideally it would use the actual Nix parser itself, which would of course mean it has to be written in C++ (or maybe Rust) as part of Nix.

My personal preference for the formatter would probably be to not incorporate it into Nix (at least not for now) because that would just definitely slow everything down enormously.

Yes, 4 people one of which is me: Contributors to justinwoo/format-nix · GitHub

Which is one more than nixfmt which is at 3 now - shrug

TL;DR: I personally wouldn’t disqualify something simply because it is written in PureScript. I’d be more skeptical towards using a hand-written parser…

Edit: Actually, I think I want to rephrase that: IMHO the Nix formatter should either use the Nix parser (i.e. as part of the Nix code) or use/be based on some existing, maintained library/tooling such as tree-sitter. I think I might actually find that more important than the language used…

domenkozar · May 14, 2019, 12:49pm

Looking at the code I would say PureScript seems like a good fit. Apart from that the Nix community is full of Haskell developers most of which I believe are most probably able to navigate and work on that code base.

So why not just use Haskell then?

Alright, Robert is going to write a formatter using Haskell nix-tree-sitter next week.

That should preserve whitespace and give best editor support. Formatting is actually the easiest part

ryantm · May 14, 2019, 1:02pm

@domenkozar

Why do you want to make your own formatter when the nixfmt creators agree that it is a good idea to swap their parser, and the formatter is written in Haskell?

gilligan · May 14, 2019, 1:14pm

Alright, Robert is going to write a formatter using Haskell nix-tree-sitter next week.

Out of all possible outcomes in the universe, that is probably the last one I wanted to achieve.

domenkozar · May 14, 2019, 1:15pm

I’m done arguing. It seems best that everyone writes their own formatter and we see which of the 54 ends up most complete.

PS: I want to explain that I just think it’s better to deliver the proposal rather than endlessly discuss, especially since the prototype should be easy to do.

Lucus · May 14, 2019, 1:36pm

I have found formatting to be the hardest part by far. Parsing is just following a specification that is already designed to be handled by computers. Formatting means generating something that humans find acceptable, for which there is no specification.

Note that the parser is based on Megaparsec, a popular parser combinator library. Specifying the Nix grammar for tree sitter should be about the same amount of work.

Regarding hnix, because I needed to store parsed comments somewhere, I decided to store all syntactical tokens, including operators, etc, which introduced enough difference with the hnix parser that I preferred to rewrite it.
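As a rough illustration of what “storing all syntactical tokens” together with comments can look like, here is a small Python sketch. It is not nixfmt's actual data structure (nixfmt is written in Haskell); `Token`, `tokenize` and the tiny token grammar are assumptions made up for this example. Every token, including operators, is kept, and comments are attached to the token that follows them so a printer can emit them again.

```python
import re
from dataclasses import dataclass, field

# A deliberately tiny token grammar; real Nix has many more token kinds.
TOKEN_RE = re.compile(r"""
    (?P<comment>\#[^\n]*)          # line comment
  | (?P<space>\s+)                 # whitespace
  | (?P<word>[A-Za-z_][\w.'-]*)    # identifier or attribute path (simplified)
  | (?P<symbol>\+\+|[=;{}\[\]:])   # a few operators and delimiters
""", re.VERBOSE)


@dataclass
class Token:
    kind: str
    text: str
    leading_comments: list = field(default_factory=list)


def tokenize(source):
    """Return (tokens, trailing_comments); no token or comment is dropped."""
    tokens, pending = [], []
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}: {source[pos]!r}")
        pos = m.end()
        kind = m.lastgroup
        if kind == "comment":
            pending.append(m.group())      # hold until the next real token
        elif kind != "space":
            tokens.append(Token(kind, m.group(), pending))
            pending = []
    return tokens, pending


tokens, trailing = tokenize("# greeting\nname = value; # trailing\n")
for tok in tokens:
    print(tok)
print(trailing)
```

A parser that builds its tree from tokens like these keeps enough information to print comments back out, which an expression-only AST does not.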

Actually, the nixfmt parser does report errors, this is all handled by the megaparsec library. There’s currently a bug that causes these errors to be printed incompletely though.

Lucus · May 14, 2019, 2:17pm

This is the biggest remaining issue. Indenting all expressions or all assignments would result in too much indentation in many cases. It’s hard to design a good algorithm for this, but I certainly intend to solve it.

Infinisil · May 14, 2019, 2:21pm

Should I open an issue for all cases where standard nix reports a parse error but nixfmt doesn’t then?

Lucus · May 14, 2019, 2:43pm

Matching the same level of error reporting is not a priority right now. Once the formatting becomes acceptable and the interface is improved a little, we can discuss if this should be a goal for nixfmt. Feel free to create an issue to discuss this.

domenkozar · May 15, 2019, 2:59am

Technically you can use rewrite rules if formatting is also meant to refactor a bit (e.g. uri → string):
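A rewrite rule of the “uri → string” kind could be sketched in Python roughly as below. This is only an illustration, not code from any of the formatters discussed: `quote_uris` is a hypothetical name, the URI pattern is an approximation loosely modeled on the one in the Nix lexer, and the scanner only knows about double-quoted strings and `#` line comments (indented `''…''` strings and block comments are not handled).

```python
import re

# Alternatives are tried left to right, so strings and comments are consumed
# whole before the URI pattern gets a chance to match inside them.
RULE = re.compile(
    r'''(?P<string>"(?:\\.|[^"\\])*")'''
    r'''|(?P<comment>#[^\n]*)'''
    r'''|(?<![\w.+'-])(?P<uri>[a-zA-Z][a-zA-Z0-9+.-]*:[a-zA-Z0-9%/?:@&=+$,_.!~*'-]+)'''
)


def quote_uris(source):
    """Wrap bare URI literals in double quotes; leave everything else as is."""
    def replace(match):
        if match.group("uri") is not None:
            return '"' + match.group("uri") + '"'
        return match.group(0)  # string or comment: unchanged
    return RULE.sub(replace, source)


src = 'url = https://example.org/a.tar.gz; # see http://x.y\nhome = "http://quoted";\n'
print(quote_uris(src))
```

Working on raw text like this is fragile; a real implementation would apply the rule to the parsed tree, where a URI literal is already a distinct node.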

As far as the format goes, it’s actually quite simple:

either you decide on one format and design its philosophy (I prefer vertical spacing for avoiding ugly git conflicts)
or you go with configurability
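The point about vertical spacing and git conflicts can be shown with a short Python sketch. `format_list` and `changed_lines` are hypothetical helpers written for this example, and the `vertical` flag stands in for the kind of option a configurable formatter might expose. Adding one element to a vertically formatted list touches a single line, while the horizontal layout rewrites the whole line.

```python
import difflib


def format_list(name, items, vertical=True, indent=2):
    """Format `name = [ ... ];` either one element per line or on one line."""
    if not vertical:
        return [f"{name} = [ {' '.join(items)} ];"]
    pad = " " * indent
    return [f"{name} = ["] + [pad + item for item in items] + ["];"]


def changed_lines(before, after):
    """Count lines removed or added between two formatted versions."""
    diff = difflib.ndiff(before, after)
    return sum(1 for line in diff if line.startswith(("- ", "+ ")))


for vertical in (True, False):
    before = format_list("deps", ["a", "b"], vertical=vertical)
    after = format_list("deps", ["a", "b", "c"], vertical=vertical)
    print("vertical" if vertical else "horizontal", changed_lines(before, after))
```

Two people appending different elements to the same horizontal list will both rewrite that line, which is where the merge conflicts tend to come from.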