Nix 2.4, and what’s next?

I’d like to give some insight from the point of view of a Nix developer as to what’s happening there, and also try and spot what can be improved and how.

Note that this reply is just my personal opinion as a Nix contributor. I’m not speaking on behalf of any “Nix dev team” or whatever (if only because, to my great sadness, no such thing exists).

I also certainly don’t intend this as a defense of the current state of things.
I do agree that while there’s some awesome work being done, there are also issues (although I might not agree with all the ones that are exposed here), and I’m eager to work on improving whatever can be.

Answering the letter point-by-point:

Backward incompatible changes

Basic stable features and interfaces have received backwards-incompatible changes.

That’s indeed true (with the caveat that well, it had been several years without a release and the diff between 2.3 and 2.4 is insanely big).

I think there are several reasons for these:

  • Some of these bugs are actually hitting some under-specified part of the Nix semantics. For example I’m not sure that the behavior broken in https://github.com/NixOS/nix/issues/4785 has ever been explicitly intended. So the breakage couldn’t really have been caught in any other way than having people use it and notice that it was broken for them.
  • Part of these are just due to the testsuite being generally bad: it’s both quite slow and has low coverage (though I think it has improved a lot during the few years I’ve been involved in the development of Nix).
  • Some are also just hard problems to track. In particular, ensuring the compatibility between every version of the client and the daemon in every possible situation is a lot of work. Not to say that there’s no room for improvement (and I think that to some extent it’s work that’s really worth doing, if only to specify what should happen in all these cases), but it’s not a trivial problem.
    Also, this part of the codebase suffers from some heavy technical debt because the protocol used between the different Nix instances is a custom hard-to-debug thing (trust me, I’ve suffered from it). While it was probably a sensible choice when it was invented, it’s a technical debt that we have to bear now.

Additionally, response time to the breaking changes has been bad.
The issue that needs to be addressed is the apparent lack of work to fix backward incompatible changes.

That’s true. And I think this is pointing to a crucial problem in the development of Nix.
There are some people who try to tackle these issues, but nobody (except of course @edolstra) has any legitimacy or duty to actually do that.
Which means (at least for me) that I don’t really feel enticed to even look at issues that aren’t within my immediate reach (either because they are tied to some part of the codebase that I’m not utterly familiar with, or because they involve some design decision that I’m not 100% confident to make by myself).
And since @edolstra hasn’t (yet, to the best of my knowledge) developed any superpower, most of these stay unanswered or at least unsolved because there’s nobody to take care of them.
The nix core team was an attempt at fixing this issue (more than 3y ago already), but it didn’t go anywhere unfortunately and got disbanded. Maybe it would be time to resurrect something similar.

nix command

I haven’t been directly involved in this (as far as I remember, most of the changes took place before I started touching Nix), so I won’t comment too much on this.

I definitely think that there has been a big communication issue indeed. And that’s unfortunate.

Maybe something to take out of this is that Nix developers (me included ofc) should be more careful with what ends up on master, and even more so with what ends up in releases.
That’s why the --experimental-features flag has been introduced. I have the feeling that it’s a good trade-off (though it could be refined in many ways) between releasing unfinished stuff and having to deal with long-running forks when developing a big feature.
But if ppl have better ideas, I’d be interested to hear them.

Many nix commands are now Flakes-centric, even when Flakes are not enabled

Indeed. But that’s also the whole point of the new CLI.
Whether this is a good choice or not is obviously debatable, but Flakes probably wouldn’t make any sense without being a first-class citizen on the primary interface that the CLI represents.

See also

Not strictly breaking changes, but lack of response to major behaviour change
Temporary build directories not cleaned up because they are not empty · Issue #5207 · NixOS/nix · GitHub

I definitely wouldn’t qualify that as a “major behaviour change”.
It’s definitely a somewhat annoying bug (I hit it every once in a while too), but I don’t think this issue really deserves its place here.

Poor testing without experimental features

Tests for Nix always run with flakes and nix-command experimental features enabled.
This is not how experimental features should be tested.

Yup, this brings us back to my point about “the testsuite being generally bad” above.
The way everything works would require a giant test matrix, running the entire testsuite (as much as makes sense) along all of the following axes:

  • Client version
  • Daemon version
  • Remote builder version
  • Xp features on the client
  • Xp features on the daemon
  • Xp features on the remote builder

(plus the same thing for the non-daemon case)
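To give a rough feeling of how quickly this blows up, here’s a minimal sketch that just enumerates such a matrix. The version list and the sets of experimental features are placeholder assumptions picked for illustration (not what the Nix CI actually uses), and the non-daemon case is left out:

```python
from itertools import product

# Hypothetical axis values, for illustration only: the real lists of
# versions and feature combinations would have to be decided upstream.
VERSIONS = ["2.3", "2.4"]
XP_FEATURE_SETS = [
    frozenset(),                            # no experimental feature
    frozenset({"nix-command", "flakes"}),
    frozenset({"ca-derivations"}),
]

# One entry per axis of the matrix listed above.
AXES = {
    "client_version": VERSIONS,
    "daemon_version": VERSIONS,
    "builder_version": VERSIONS,
    "client_xp": XP_FEATURE_SETS,
    "daemon_xp": XP_FEATURE_SETS,
    "builder_xp": XP_FEATURE_SETS,
}


def test_matrix(axes):
    """Yield one configuration (a dict) per combination of axis values."""
    names = list(axes)
    for combo in product(*(axes[name] for name in names)):
        yield dict(zip(names, combo))


configs = list(test_matrix(AXES))
print(len(configs))  # 2**3 * 3**3 = 216 runs of the whole testsuite
```

Even with only two versions and three feature sets per component, that’s already 216 full runs, which is why pruning the matrix to the combinations that make sense matters as much as building it.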

This is not fundamentally impossible, but way beyond what we have right now, which is:

  • Only one client version
  • A couple of daemon versions
  • Only one remote builder version
  • Everything tested with the same set of XP features, except locally (the testsuite for ca-derivations in particular replicates most of the standard testsuite but with the ca-derivations feature enabled).

In addition to the technical difficulties in making all that work, there’s also an issue with the CI wall time (which is already way too long in my opinion), and potentially we’d also reach some scalability issues wrt. the free tier of GitHub Actions, etc.

Finally, some basic functions are broken

Well, that’s actually a point for the section above about the nix command.
The UX of nix search without flakes is indeed awful, but that’s not really a matter of (automated) testing.

“Breaking” nature of Flakes development

We believe the current approach at implementing Nix Flakes is made at the cost of non-Flakes Nix use.

I think the main motivation behind this argument is the fact that the new CLI is very flakes-oriented.
Meaning that non-flake users can’t use it.

Let me first explain this from a technical point-of-view:

This is true, but is also missing the point.
Flakes are not really a thing by themselves. They only make sense as part of a global cohesive interface.
The design of the CLI is indeed tied to flakes, but the design of flakes is also tied to the constraints of the CLI.
So developing them separately would be a mistake because you’d end up with two different levels with two different sets of abstractions.
Obviously, there could have been a separate flake-specific CLI, but that means doubling the maintenance work to handle both of them.
And it happens that the main developer of flakes is also the only real maintainer of Nix, and there’s only so much one man can do.
(and you could blame him for developing flakes rather than working on other stuff, but well…).
So keeping flakes external wouldn’t have prevented “having to opt into Flakes to benefit from the basic improvements”. It would just have prevented the improvements from happening because of a lack of manpower (or organisation, but then let’s tackle the organisational issue rather than just fighting over the red herring that flakes represent).

Development of the feature in the main development branch was tolerated, under the assumption that it would not cause deleterious effects on the quality of Nix itself.

What (except again for the CLI changes) are these “deleterious effects on the quality of Nix” (genuine question, I don’t see them, but I trust they exist)?

What should be done right now?

We will be brutally honest. The upcoming NixOS upgrade with 2.4 will be bad

Well, at least it is. Much better than being stuck in oblivion forever.
Numbering it 3.0 would probably have been better, but what’s done is done.

We fear the trust of stable Nix, and stable NixOS users may be irreparably breached if Nix 2.4, a minor release, ends-up being incompatible with their existing setups.

That’s a pretty bad message indeed, but much better than having a project actively developed but without any release.
And except for ppl being hit by the CLI change, most of the highlighted issues only touch a handful of users.
A lot of stuff needs to be improved, but the world isn’t coming to an end either.

Where to go next from here?

We also believe that the broad issues here show a problem with the development practices, and it must be figured out

Yes.

Still, here’s an unordered list of propositions to start from

  • Testing needs to consider all experimental features, and needs to test for correct behaviour with and without the feature enabled.
  • Changing basic language features (builtins included) should be done through compartmentalized change sets for better community review and testing. The changes should also come with tests especially for added conditionals.

Yes, and yes

  • Plugins should be used to provide experimental changes where new language-based semantics isn’t enough.

Well, there’s only so much that plugins can do, and except in some very rare cases (a big change in one of the few areas that plugins cover), that would probably be a huge amount of work for little savings (IMHO plugins barely add any value compared to just forking Nix).
But that’s why the experimental features machinery has been added, for the cases where maintaining a fork is either too complex or not worth it.

We believe it would be better to put the new and exciting features work on a brief pause, and sort out the last of the testing infrastructure, to get a better handle on what is changing.

Well, I half agree here (but if you’re a pessimist, you’ll notice that I also half-disagree).
Except maybe for some protocol-related issues (which I’m probably the most guilty of as the ca-derivations work had to extend it a lot), most of the issues you mention here aren’t due to the “new and exciting features work”, but to either some deeply needed refactorings (which I think prevented overall way more bugs than what they created) or some external contributions from individuals which were fixing some concrete issues. And I certainly wouldn’t want to reject these.
(Note that this isn’t blaming occasional contributors at all. I rather think that regular contributors are much less affected by the legitimacy issue I mentioned at the beginning when the bug is due to their own work, so these issues get fixed sooner).

So although I agree that working on the test infra is much needed, I don’t think just stopping the development of new features is gonna substantially help.
If nothing else it will bore the few regular contributors, and we won’t be any better off.

We sincerely believe that the upcoming change to calendar-based versioning could be harmful to the sustained quality of Nix without first tightening up the development workflow such that it will not cause constant breakage for stable features.

I strongly disagree here.
I think that the calendar-based versioning will help a lot:

  • Properly handling all the breaking changes accumulated during 2y of development with an insufficient test infrastructure is plain impossible (hence the arguably bad 2.4 release).
    OTOH, handling the breaking changes introduced in a 6w cycle is totally manageable. Obviously that’s not correct math at all, but assuming there have been 15 serious breaking changes during the last release (which if we exclude duplicates is quite an over-approximation), that means less than one every 6w cycle. Which, again, is obviously a wrong approximation (that’s definitely not a linear thing), but my point is that it would make releases much more manageable (though still painful).
  • Having frequent(-ish) slightly painful releases will be a very efficient reminder if the test infrastructure and process aren’t good enough.
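For what it’s worth, the back-of-the-envelope computation above can be spelled out as follows. It’s only a sketch: it assumes that “2y” means 104 weeks and that the breaking changes are spread evenly over time (which, as said, they aren’t):

```python
# Rough figures taken from the reasoning above; the even spread over
# time is a simplifying assumption, not a measured fact.
RELEASE_GAP_WEEKS = 2 * 52   # about 2 years of development
CYCLE_WEEKS = 6              # length of one release cycle
BREAKING_CHANGES = 15        # deliberate over-approximation

cycles = RELEASE_GAP_WEEKS / CYCLE_WEEKS
per_cycle = BREAKING_CHANGES / cycles

print(f"{cycles:.1f} cycles, {per_cycle:.2f} breaking changes per cycle")
# prints "17.3 cycles, 0.87 breaking changes per cycle"
```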