Nix 2.4, and what’s next?

:information_source: Preface

Let us start with this message for people less involved in Nix things:

We like Nix, we are overall positive about Nix, and this is why we are critical in this piece. Do not let any of this deter any of you from using Nix. Nix is great. We believe Nix will continue to be great, but diminished, even if none of these concerns are addressed.

This is written with the intent to be a wake-up call. There is no cause for alarm at this moment, especially if you’re not deeply embedded into Nix.


We, signatories of this open letter, think it’s time to take a look at what has happened with the 2.4 development cycle, the 2.4 release, and how it will affect users.

This is written from the point of view of end-users, and not from the point of view of Nix developers. End-users use the Nix tools, and write Nix expressions.

First, we want to acknowledge that the development cycle was stuck in a bad place. We understand the desire to release something. A new Nix release was long overdue. We assume it was a hard situation to be in.

The 2.4 release has been divisive. It seems a lot of the community was already using nixUnstable, and we expect none of those users have had any issues with the non-upgrade to “nixStable” 2.4.

Users coming from 2.3 and upgrading to 2.4 were thrown into an ice-cold pool.

Overall, these issues need to be discussed and resolved.

  • Backward incompatible changes
  • Poor testing without experimental features
  • “Breaking” nature of Flakes development

Ending with

  • What should be done right now?
  • Where to go next from here?

Note that we are purposefully not talking about the design of Flakes. The design of Flakes is out of scope.


Backward incompatible changes

Basic stable features and interfaces have received backwards-incompatible changes. Additionally, response time to the breaking changes has been bad.

We understand that breakage can happen. The issue that needs to be addressed is the apparent lack of work to fix backward incompatible changes.

nix command

The messaging on whether to embrace or avoid the experimental features starting with the nix command was unclear. In practice, users started using the nix commands.

While some of us are understanding of the change making the nix command experimental, it was a de-facto interface that users relied on. It would have been better to have an intermediary release that marked it as an experimental feature without changing its use. This would have reduced the scope of user-perceived breakage in a subsequent Nix 2.5 release that would have been exactly like the 2.4 we got.

Like it or not, far too many users were impacted by breakage in its basic behaviour. Users relied on the behaviour. These became closer to de-facto stable through usage in official Nixpkgs and NixOS documentation. It may have been too late to do a major re-design of the UX.

We understand that this is an irreconcilable truth; the command has been purposefully listed as unstable, and the UX specifically listed as to be revised and finalised.

Many nix commands are now Flakes-centric, even when Flakes are not enabled.

As a central example in user reports, the nix search UX is broken by default without buying into Flakes.

See also

Not strictly breaking changes, but lack of response to major behaviour change.

Poor testing without experimental features

Tests for Nix always run with flakes and nix-command experimental features enabled.

This is not how experimental features should be tested.

First, commands were marked with the wrong feature, since it was not tested whether the command worked with the minimal subset of features:

Then, additional features were unclearly defined:

Then, it was found that some experimental features were not marked as such:

Finally, some basic functions are broken

“Breaking” nature of Flakes development

Reminder: This is not about the design of Flakes.

We believe the current approach at implementing Nix Flakes is made at the cost of non-Flakes Nix use.

Flakes has not been accepted as the future of Nix development through the RFC process.

Development of the feature in the main development branch was tolerated, under the assumption that it would not cause deleterious effects on the quality of Nix itself.

In reality, Flakes was implemented “from the top down”. Basically shoved into Nix, transforming it towards a vaguely specified vision.

The community is still divided on whether the Flakes vision is the right way forward or not. The community is, as far as we understand, not against most of the overarching goals of Flakes.

Having to opt into Flakes to benefit from the basic improvements is a sign of lack of care about abstractions.

Also note previous grievances about the nix command-line being changed beyond recognition in the name of Flakes.

What should be done right now?

We will be brutally honest. The upcoming NixOS upgrade with 2.4 will be bad. We think that Nix should be reverted back to 2.3 in NixOS, or issues resolved in an impossible timeline.

Users who were not following the nixUnstable development are less likely to have migrated their infrastructure to whatever was needed for Flakes. They will be the worst-hit individuals. We fear the trust of stable Nix and stable NixOS users may be irreparably breached if Nix 2.4, a minor release, ends up being incompatible with their existing setups.

From this moment, we expect to see announced, first, a stop-gap plan to help with the current situation, and later a proper plan to prevent such a situation from happening again.

Where to go next from here?

We also believe that the broad issues here show a problem with the development practices, and it must be figured out. We do not have a single easy solution. It is a harder problem that needs to be worked from multiple angles. And really, it should be worked on from within.

Still, here’s an unordered list of propositions to start from:

  • Testing needs to consider all experimental features, and needs to test for correct behaviour with and without the feature enabled.
  • Changing basic language features (builtins included) should be done through compartmentalized change sets for better community review and testing. The changes should also come with tests especially for added conditionals.
  • New features shouldn’t be implemented “from the top down”.
    • Basic generic building blocks should be added, if missing, to the language or to the interpreter.
    • Plugins should be used to provide experimental changes where new language-based semantics isn’t enough.
    • Implementation of abstractions should be prototyped in Nix on top of the new abstractions.
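
The first proposition can be illustrated with a minimal sketch. It is written in Python rather than in Nix's own test suite, and everything in it is a hypothetical model: the command names, the `DECLARED` and `ACTUAL` tables and the `run` stand-in are assumptions made up for illustration, not Nix's real implementation. Only the three feature names are taken from this discussion. The idea is to try each command against every subset of experimental features, smallest first, and compare the minimal working set with what the command is marked as requiring.

```python
from itertools import combinations

# Feature names mentioned in this thread; the rest of this sketch is hypothetical.
FEATURES = ("nix-command", "flakes", "ca-derivations")

# What each (made-up) command is *marked* as requiring.
DECLARED = {
    "legacy-build": frozenset(),
    "new-search": frozenset({"nix-command"}),
    "flake-show": frozenset({"flakes"}),  # deliberately mis-marked for the demo
}

# What the toy implementation *actually* requires; stands in for running the CLI.
ACTUAL = {
    "legacy-build": frozenset(),
    "new-search": frozenset({"nix-command"}),
    "flake-show": frozenset({"nix-command", "flakes"}),
}


def run(command, enabled):
    """Pretend to run a command; it succeeds only if every needed feature is enabled."""
    return ACTUAL[command] <= frozenset(enabled)


def minimal_feature_set(command):
    """Try feature subsets from smallest to largest and return the first that works."""
    for size in range(len(FEATURES) + 1):
        for subset in combinations(FEATURES, size):
            if run(command, subset):
                return frozenset(subset)
    return None


def mismarked_commands():
    """Commands whose declared features differ from the minimal set that works."""
    return sorted(c for c in DECLARED if minimal_feature_set(c) != DECLARED[c])
```

In this toy model, `mismarked_commands()` flags `flake-show`, because it is marked as needing only `flakes` while it also needs `nix-command`. A suite that always runs with every feature enabled could not notice that.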

We want to acknowledge there were in fact many improvements to the test suite and developer infra, exactly the kind of work needed to prevent these problems. This is a good start. We believe it would be better to put the new and exciting features work on a brief pause, and sort out the last of the testing infrastructure, to get a better handle on what is changing.

We sincerely believe that the upcoming change to calendar-based versioning could be harmful to the sustained quality of Nix without first tightening up the development workflow such that it will not cause constant breakage for stable features. Paradoxically, we also believe that the same calendar-based versioning is key to preventing scope creep similar to what happened with Nix 2.4 from happening again.


Signatories

(Alphabetically ordered)

:arrow_right: Note

Community members are invited to edit the first reply to this thread to add their names to the list of signatories, if they agree with the message. Signing pseudonymously with your usual community pseudonym is accepted.

Additional signatories

  • Peter Hoeg ( peterhoeg )
  • Norbert Melzer ( NobbZ )
  • nicoo ( nicoo )
  • NinjaTrappeur ( ninjatrappeur )
  • Julien Moutinho ( julm )
  • Yureka Lilian (yureka)
  • Martin Weinelt (hexa)
  • sterni (sternenseemann)
  • Daniel Olsen (Dandellion)
  • Profpatsch (Profpatsch)
  • Janne Heß (das_j)
  • Tobias Pflug (gilligan)
  • Ryan Lahfa (RaitoBezarius)
  • Ben Siraphob (siraben)
  • Daniël de Kok (danieldk)
  • Thomas Depierre (dianaolympos)
  • Nicolas B. Pierron (nbp)

(on behalf of RMs, this was coordinated and drafted beforehand) The user experience of non-flakes 2.4 is a great concern and there are enough regressions in behavior to make us reconsider. The reports of breakages are useful and help Nix developers prioritize fixes. This will also give the new experimental features a chance to mature prior to the next 22.05 release. Our current preference is to revert the default Nix version to 2.3 ONLY for the 21.11 release, which will show up as nixos-21.11, nixos-21.11-small, and nixpkgs-21.11-darwin. Users of the nixpkgs-unstable, nixos-unstable, and nixos-unstable-small branches will continue to use Nix 2.4 as default. The community benefits from testing by users who deploy with various combinations of old/new client/daemon or complex setups. Please send in reports and make issues! The reports of breakages from users provide valuable feedback and provide the right kind of pressure for continued development.

The actual proposed fix will likely be something this simple, plus an entry in the release notes.

diff --git a/pkgs/tools/package-management/nix/default.nix b/pkgs/tools/package-management/nix/default.nix
index f365348607a..463ebe49f3d 100644
--- a/pkgs/tools/package-management/nix/default.nix
+++ b/pkgs/tools/package-management/nix/default.nix
@@ -220,7 +220,7 @@ in rec {
 
   nix = nixStable;
 
-  nixStable = nix_2_4;
+  nixStable = nix_2_3;
 
   nix_2_3 = callPackage common (rec {
     pname = "nix";

Also nix log is/was broken for the longest time for remote builds.

I am not sure how this passed through. I have been doing nix-shell() { command nix-shell -p nix_2_3 --run "nix search $*" ; } for the longest time now.


I recently started to convert to flakes. It may surprise you but I have absolutely no idea about functional programming languages. My experience was rocky at best and I spent evenings creating two basic flake.nix files which do nothing more than importing my configuration.nix and home.nix. For someone that is so deeply involved in Nix and has 10s of repos as examples, this was not great.

And it’s fine!

My only experience with functional programming was through functional programming concepts in imperative languages (map, reduce and similar things) before using Nix. So no, I’m not surprised. And I think no one can assume FP experience from any new users :).

My experience was rocky at best and I spent evenings creating two basic flake.nix files which do nothing more than importing my configuration.nix and home.nix. For someone that is so deeply involved in Nix and has 10s of repos as examples, this was not great.

This is a problem with Nix in general (i.e. the language has a steep learning curve) and something we’re thinking about in the UX team. Ultimately the goal is to move away from the need to write code in a functional language, e.g. it should be possible to write a simple flake.toml without any magic boilerplate code.

As someone who had very little functional programming experience before nix I’m really not sure why I hear so many bad things about the language. I like it because it’s such a simple language to work with. Just me I guess :man_shrugging: I hope nix will continue to remain an option.

As always, thanks for everything @edolstra :sparkling_heart:

I’d like to give some perspective from the point of view of a Nix developer as to what’s happening there, and also try and spot what can be improved and how.

Note that this reply is just my personal opinion as a Nix contributor. I’m not speaking on behalf of any “Nix dev team” or whatever (if only because, to my great sadness, no such thing exists).

I also certainly don’t intend this as a defense of the current state of things.
I do agree that while there’s some awesome work being done, there are also issues (although I might not agree with all the ones that are exposed here), and I’m eager to work on trying to improve whatever can be.

Answering the letter point-by-point:

Backward incompatible changes

Basic stable features and interfaces have received backwards-incompatible changes.

That’s indeed true (with the caveat that well, it had been several years without a release and the diff between 2.3 and 2.4 is insanely big).

I think there are several reasons for these:

  • Some of these bugs are actually hitting some under-specified part of the Nix semantics. For example I’m not sure that the behavior broken in https://github.com/NixOS/nix/issues/4785 has ever been explicitly intended. So the breakage couldn’t really have been caught in any other way than having people use it and notice that it was broken for them.
  • Some of these are just due to the testsuite being generally bad: it’s both quite slow and has low coverage (though it has improved a lot, I think, during the few years I’ve been involved in the development of Nix).
  • Some are also just hard problems to track. In particular ensuring the compatibility between every version of the client and the daemon in every possible situation is a lot of work. Not to say that there’s not some room for improvements (and I think that to some extent it’s some work that’s really worth doing if only to specify what should happen in all these cases), but it’s not a trivial problem.
    Also, this part of the codebase suffers from some heavy technical debt because the protocol used between the different Nix instances is a custom hard-to-debug thing (trust me, I’ve suffered it). While it was probably a sensible choice when it was invented, it’s a technical debt that we have to bear now.

Additionally, response time to the breaking changes has been bad.
The issue that needs to be addressed is the apparent lack of work to fix backward incompatible changes.

That’s true. And I think this is pointing to a crucial problem in the development of Nix.
There are some people who try to tackle these issues, but nobody (except of course @edolstra) has any legitimacy/duty in actually doing that.
Which means (at least for me) that I don’t really feel enticed to even look at issues that aren’t within my immediate reach (either because they are tied to some part of the codebase that I’m not utterly familiar with, or because they involve some design decision that I’m not 100% confident to make by myself).
And since @edolstra didn’t (yet, to the best of my knowledge) develop any super-power, most of these stay unanswered or at least unsolved because there’s nobody to take care of them.
The nix core team was an attempt at fixing this issue (more than 3y ago already), but it didn’t go anywhere unfortunately and got disbanded. Maybe it would be time to resurrect something similar.

nix command

I haven’t been directly involved in this (as far as I remember, most of the changes actually took place before I started touching Nix actually), so I wouldn’t comment too much on this.

I definitely think that there has been a big communication issue indeed. And that’s unfortunate.

Maybe something to take out of this is that Nix developers (me included ofc) should be more careful with what ends up on master, and even more so with what ends up in releases.
That’s why the --experimental-features flag has been introduced. I have the feeling that it’s a good trade-off (though it could be refined in many ways) between releasing unfinished stuff and having to deal with long-running forks when developing a big feature.
But if ppl have better ideas, I’d be interested to hear them.

Many nix commands are now Flakes-centric, even when Flakes are not enabled

Indeed. But that’s also the whole point of the new CLI.
Whether this is a good choice or not is obviously debatable, but Flakes probably wouldn’t make any sense without being a first-class citizen on the primary interface that the CLI represents.

See also

Not strictly breaking changes, but lack of response to major behaviour change
Temporary build directories not cleaned up because they are not empty · Issue #5207 · NixOS/nix · GitHub

I definitely wouldn’t qualify that as a “major behaviour change”.
It’s definitely a somewhat annoying bug (I hit it every once in a while too), but I don’t think this issue really deserves its place here.

Poor testing without experimental features

Tests for Nix always run with flakes and nix-command experimental features enabled.
This is not how experimental features should be tested.

Yup’, this brings us back to my point about “the testsuite being generally bad” above.
The way everything works would require a giant test matrix, running the entire testsuite (as much as makes sense) along all the following axes:

  • Client version
  • Daemon version
  • Remote builder version
  • Xp features on the client
  • Xp features on the daemon
  • Xp features on the remote builder

(plus the same thing for the non-daemon case)
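
To give a feel for how quickly that matrix grows, here is a minimal sketch in Python. The version numbers and feature combinations are placeholders chosen for illustration, not the set Nix would actually need to cover, and `AXES`, `test_matrix` and `matrix_size` are hypothetical names.

```python
from itertools import product

# Placeholder values: two versions and three feature sets per component.
VERSIONS = ["2.3", "2.4"]
FEATURE_SETS = ["none", "nix-command", "nix-command flakes"]

# One entry per axis from the list above (daemon case only).
AXES = {
    "client_version": VERSIONS,
    "daemon_version": VERSIONS,
    "builder_version": VERSIONS,
    "client_features": FEATURE_SETS,
    "daemon_features": FEATURE_SETS,
    "builder_features": FEATURE_SETS,
}


def test_matrix(axes):
    """Yield one dict per configuration: the Cartesian product of all axes."""
    names = list(axes)
    for values in product(*(axes[name] for name in names)):
        yield dict(zip(names, values))


def matrix_size(axes):
    """Number of configurations, computed without enumerating them."""
    size = 1
    for values in axes.values():
        size *= len(values)
    return size
```

With just two versions and three feature sets per axis, `matrix_size(AXES)` is already 216 full testsuite runs for the daemon case alone, before adding the non-daemon runs.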

This is not fundamentally impossible, but way beyond what we have right now, which is:

  • Only one client version
  • A couple of daemon versions
  • Only one remote builder version
  • Everything tested with the same set of XP features, except locally (the testsuite for ca-derivations in particular replicates most of the standard testsuite but with the ca-derivations feature enabled).

In addition to the technical difficulties in making all that work, there’s also an issue with the CI wall time (which is already way too long in my opinion), and potentially we’d also reach some scalability issues wrt. the free tier of Gh actions, etc…

Finally, some basic functions are broken

Well, that’s actually a point for the section above about the nix command.
The UX of nix search without flakes is indeed awful, but that’s not really a matter of (automated) testing.

“Breaking” nature of Flakes development

We believe the current approach at implementing Nix Flakes is made at the cost of non-Flakes Nix use.

I think the main motivation behind this argument is the fact that the new CLI is very flakes-oriented.
Meaning that non-flake users can’t use it.

Let me first explain this from a technical point-of-view:

This is true, but is also missing the point.
Flakes are not really a thing by themselves. They only make sense as part of a global cohesive interface.
The design of the CLI is indeed tied to flakes, but the design of flakes is also tied to the constraints of the CLI.
So developing them separately would be a mistake because you’d end up with two different levels with two different sets of abstraction.
Obviously, there could have been a separate flake-specific cli, but that means doubling the maintenance work to handle both of them.
And it happens that the main developer of flakes is also the only real maintainer of Nix, and there’s only so much one man can do.
(and you could blame him for developing flakes rather than working on other stuff, but well…).
So keeping flakes external wouldn’t have prevented “having to opt into Flakes to benefit from the basic improvements”. It would just have prevented the improvements from happening because of a lack of manpower (or organisation, but then let’s tackle the organisational issue rather than just fighting over the red herring that flakes represent).

Development of the feature in the main development branch was tolerated, under the assumption that it would not cause deleterious effects on the quality of Nix itself.

What (except again for the CLI changes) are these “deleterious effects on the quality of Nix” (genuine question, I don’t see them, but I trust they exist)?

What should be done right now?

We will be brutally honest. The upcoming NixOS upgrade with 2.4 will be bad

Well, at least it is. Much better than being stuck in oblivion forever.
Numbering it 3.0 would probably have been better, but what’s done is done.

We fear the trust of stable Nix and stable NixOS users may be irreparably breached if Nix 2.4, a minor release, ends up being incompatible with their existing setups.

That’s a pretty bad message indeed, but much better than having a project actively developed but without any release.
And except for ppl being hit by the CLI change, most of the highlighted issues only touch a handful of users.
A lot of stuff needs to be improved, but the world isn’t coming to an end either.

Where to go next from here?

We also believe that the broad issues here show a problem with the development practices, and it must be figured out

Yes.

Still, here’s an unordered list of propositions to start from

  • Testing needs to consider all experimental features, and needs to test for correct behaviour with and without the feature enabled.
  • Changing basic language features (builtins included) should be done through compartmentalized change sets for better community review and testing. The changes should also come with tests especially for added conditionals.

Yes, and yes

  • Plugins should be used to provide experimental changes where new language-based semantics isn’t enough.

Well, there’s only so much that plugins can do, and except in some very rare cases (a big change in one of the few areas that plugins cover), that would probably be a huge amount of work for little savings (Imho plugins barely add any value compared to just forking Nix).
But that’s why the experimental features machinery has been added, for the cases where maintaining a fork is either too complex or not worth it.

We believe it would be better to put the new and exciting features work on a brief pause, and sort out the last of the testing infrastructure, to get a better handle on what is changing.

Well, I half agree here (but if you’re a pessimist, you’ll notice that I also half-disagree).
Except maybe for some protocol-related issues (which I’m probably the most guilty of as the ca-derivations work had to extend it a lot), most of the issues you mention here aren’t due to the “new and exciting features work”, but to either some deeply needed refactorings (which I think prevented overall way more bugs than what they created) or some external contributions from individuals which were fixing some concrete issues. And I certainly wouldn’t want to reject these.
(Note that this isn’t blaming occasional contributors at all. I rather think that usual contributors are much less hit by the legitimacy issue I mention at the beginning when the bug is due to their work, so these issues are fixed sooner).

So although I agree that working on the test infra is much needed, I don’t think just stopping the development of new features is gonna substantially help.
If nothing else it will bore the few regular contributors, and we won’t be any better off.

We sincerely believe that the upcoming change to calendar-based versioning could be harmful to the sustained quality of Nix without first tightening up the development workflow such that it will not cause constant breakage for stable features.

I strongly disagree here.
I think that the calendar-based versioning will help a lot:

  • Properly handling all the breaking changes accumulated during 2y of development with an insufficient test infrastructure is plain impossible (hence the arguably bad 2.4 release).
    Otoh, handling the breaking changes introduced in a 6w cycle is totally manageable. Obviously that’s not correct math at all, but assuming there have been 15 serious breaking changes during the last release (which if we exclude duplicates is quite an over-approximation), that means less than one every 6w cycle. Which, again, is obviously a wrong approximation (that’s definitely not a linear thing), but my point is that it would make releases much more manageable (though still painful).
  • Having frequent(-ish) slightly painful releases will be a very efficient reminder if the test infrastructure and process aren’t good enough.
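
The back-of-the-envelope estimate in the first point can be written out as a short Python sketch. It keeps the same admittedly wrong assumption that breaking changes arrive at a uniform rate, and it takes “2y” to mean 104 weeks; `changes_per_cycle` is a hypothetical helper name.

```python
def changes_per_cycle(total_changes, dev_weeks, cycle_weeks):
    """Average number of breaking changes per release cycle, assuming a uniform rate."""
    cycles = dev_weeks / cycle_weeks
    return total_changes / cycles


# 15 breaking changes over roughly two years, split into 6-week cycles.
rate = changes_per_cycle(total_changes=15, dev_weeks=2 * 52, cycle_weeks=6)
```

Two years hold a bit more than 17 six-week cycles, so `rate` comes out at roughly 0.87, in line with the “less than one every 6w cycle” figure.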

I don’t think the problem is inherent in making flakes a “first-class citizen” but rather:

  • non-flakes users are now de facto second-class citizens; there are a lot of ways one could avoid breaking the UX, but apparently people didn’t care to:
    • make the nix search CLI default to searching nixpkgs when not given an “installable” (going with the manpage’s nomenclature here) ;
    • similarly provide sensible defaults on other subcommands, when not in a flakes context ;
    • not fetching/caching/evaluating flake-registry when not doing flakey things (amusingly, the only change in that registry over the past year has been to remove the “legacy nixpkgs entry”; wonder how useful that thing really is)
  • the flakes functionality was developed in a way that caused lots of issues to creep into the nix 2.4 release, including a massive performance regression (that I already reported):
    (on my system) calls to the nix CLI take 9s to do absolutely nothing (even giving out an error over an unknown CLI flag), or ~30s if flake-registry needs to be “refreshed”.

I believe you missed the next sentence in the OP:

Yes, having an experimental feature cause development of Nix to diverge (from stable releases) for 2 years is bad, and shipping that many changes in a single release is bound to cause issues.

As far as I understand, the proposal is to first fix the development workflow of nix, including QA and managing experimental features, before moving to a calendar-based release schedule.

That lines up pretty well with my experience of what it takes to make calendar versioning work:

  • continuous QA (incl. but not limited to CI) so the development branch is ~always in a release-worthy state;
  • managing larger (esp. user-facing) changes so they can be gated behind explicit opt-in, and structuring the test suite in such a way that all (relevant) functionality is tested with and without the new feature;
  • getting people to break down larger ideas into smaller components that can each be developed and shipped within the time-frame of a single release, rather than drag along large, long-diverging feature branches.

It was nice reading @thufschmitt’s reply, because @thufschmitt has done more work to shore up the development workflow of Nix in recent times than anyone else. I think.

I guess for me (and I am trying to not let my general anti-flakes sentiments seep in here :slight_smile:), it would have been ideal if 2.4 had contained all the shoring up and CA derivations work, but little flakes work.

Some of these bugs are actually hitting some under-specified part of the Nix semantics.

Exactly. The libnixstore stuff is now well understood, but we have all these peripheral builtins.fetch*, extra caches, and other things, which I think is just too much new functionality too quickly.

And yes, while the controversies picked up with flakes, to me the builtins.fetch* stuff in Nix 2.0 is where I started to get uncomfortable. I am just a modularity/layering purist who doesn’t want “non-replaceable batteries included” tools. Less of this stuff in Nix 2.3 would have also prevented 2.4 from being as breaking, simply because there was less to break!

If we had spent the same effort that went into flakes on the further improvements to the test suite @thufschmitt outlined, and on cleaning up technical debt more broadly, we would have a really strong and really uncontroversial release.

We could always have instead done Flakes afterwards for Nix 2.5, now with a much better foundation in 2.4, and --experimental-features nix-command there to make clear that, yes, the nix command is indeed changing.

Obviously, there could have been a separate flake-specific cli, but that means doubling the maintenance work to handle both of them.

I’ve been thinking about this, and I am increasingly in favor of trying to revive nix-with-flakes as a separate executable. I think it need not be so prohibitively expensive thanks to the way Installable works: we can share much of the same CLI, and have only the “with flakes” exe parse flake # arguments for the upstream commands that they both share.

The revert is becoming a bit more involved than expected and has the potential to cause its own breakages. I’d like some more review of [21.11] nixStable: 2.4 -> 2.3.16 by tomberek · Pull Request #147511 · NixOS/nixpkgs · GitHub.

(While we’re at it, nixosTests.keymap.qwertz fails on aarch64-linux · Issue #147294 · NixOS/nixpkgs · GitHub could also use some help for aarch64.)

I agree that from a user experience point of view flakes are conceptually needed even when not activated as a command, one example is of course the new semantics of nix search, but is keeping the 2.4 release “in limbo” a good answer to that?
The people that want to use the 2.3 release can do it by explicitly installing it.
From my point of view there are so many new things in the 2.4 release that some breakage was inevitable and from what I can understand this is due to the exceptionally long time since the last major release.
The other thing I wish I knew better is the “economics and funding” of Nix development and its relation with the NixOS Foundation, if they are related at all. I’m at the margin of the community, but sometimes it seems to me like there isn’t a common goal between the people who spend their time developing Nix (for which I’m very grateful) and the community, which mumbles about what would be better but maybe lacks the resources, in terms of (paid?) time, knowledge and skills, to sit down and actually make it happen.
Maybe better funding could help in steering the development towards a better common goal?

This captures really concisely something I noticed about Flakes when I first tried them, prior to the 2.4 release:

Because the Flakes interface requires you to write many more things explicitly, having a concise and portable flake.nix file requires you to experience Nix-as-code to a much greater degree than pre-flakes Nix configurations, where if your needs are simple, Nix can really feel like a mostly inert configuration language (even though, of course, it isn’t).

Imo the interface presented by flake-utils-plus goes 60-70% of the way toward restoring that balance where when your needs are simple, Nix can feel like a simple configuration language again.

(I say all of this as an overall fan of the Nix language (and lately, of Flakes, too), and as someone who picked up Nix in college when I had little to no FP experience.)

I’m considering myself a newbie. When I (recently) switched to NixOS my only requirement was that it should “just work” out of the box, but actually this wasn’t the case. From my POV Nix always was an experiment and I doubt this will change in a foreseeable future.

I started from its UI, then I needed to learn its language and now I’m even reading edolstra’s PhD. I think it’s worth that time investment. When I upgraded to a previous version of NixOS, I was already required to adapt my config.nix to account for incompatible changes anyway. I still consider myself an end-user, but I don’t think that anyone could avoid such breaking changes. That’s how NixOS development is done. Just look at GitHub issues and PRs; many are untouched for years.

I think that too rosy expectations are just unrealistic. You just can’t beat QA of a well financed Linux distribution. It’s already amazing how much is done with so little financial support.

Just my two cents.

That’s actually a very good point.
It’s interesting to note that most of the “big” recent Nix changes (the “new and exciting features”) happened because some company funded the work (Flakes initial design and work by Target, CA derivations first by Groq and then Tweag and I think the IPFS foundation, Eelco’s current work on ACLs by Flox, etc.).

But getting funding for just the grunt maintenance work is much harder, because it’s not really something that most companies will be willing to spend their own money on. I think that most of the companies that actually employ the ppl working on Nix also give them some time to work on some more generic Nix maintenance (at least that’s what Tweag does), and that’s probably how most of the Nix maintenance happens, but these are generally small-ish consultancies that only have a limited amount of available money.

Maybe that’s something where the community could indeed help, like what’s been done with the Nix 🖤 macOS Open Collective.

12 Likes

I kind of agree with @thufschmitt that maintenance is an issue. I don’t think that money is a solution to the problem at all right now. At least not until we solve various other issues.

The primary issue from my perspective is that there aren’t enough people empowered to help move the Nix project forward.

We rely on a very small set of people, where only a few are actually merging changes that aren’t typos or CI fixes.

Guiding contributors through the process of making a change is crucial. If the experience isn’t great, they will not stick around. One of the key features of NixOS that I often hear about is how easy and welcoming it is to contribute (in comparison). When I started out, that was certainly true. GitHub is one ingredient that currently makes it very easy to contribute to both Nix and Nixpkgs (and I’m usually the one favouring emails over PRs…) but it isn’t a silver bullet.

The biggest issue right now is human bandwidth. We are essentially always waiting for Eelco to approve anything. That isn’t a feasible long-term situation to be in. Making everything depend on him isn’t in the best interest of the project. (This is not because of distrust, but because we can’t clone him.)

The following points are what I think is crucial right now (in rough order of priority):

  • First and foremost the most critical part: We need a team of empowered maintainers that have the capacity & capability to look at changes, ideas, issues and are individually in a position to accept those. This means merging changes that meet the contribution criteria, considering a bug a major blocker and so on. This team has to be aligned about the future of Nix and the contribution guidelines they enforce.
  • Contribution guidelines have to be documented. Right now, we only have documentation on how to build Nix in various ways. What are the standards that are aimed for? What is a valid change? Should I open an issue before implementing a feature? What should my commit messages look like? Is “Fix issue.” a valid commit message? What checks should I run before submitting a change? Are all platforms equally important? Do I have to write tests as part of each change? What do I do if my change hasn’t been reviewed for a while? Whom can I ping? Documenting requirements and goals allows involving more people, as you can always refer to the documents when in doubt. Changes can always be reverted when they aren’t actually aligned. Not formalizing what the common goals are might lead to chaos and stagnation instead.
  • Project vision should be communicated in regular intervals. What is going well with the project? Which issues are (high) on the radar of the maintainers? What are the features that we identified are most significant to be added next? Keep the community involved. Be vocal about where the journey is going. It might encourage someone to step up and look into specific topics.
  • Enforce peer review for all changes that are made. This is the part where the aligned maintainer team will be reviewing each other and “outside” contributors. Everyone should adhere to the contribution standards and follow the same process. The response times here are as crucial for “external” contributions as they are for changes within the team. There shouldn’t be two classes of contributions (ones that just push/merge and those that always have to wait for feedback or approval).
  • Define the maintainer team lifecycle. Who decides who is in? What are the requirements? When does someone drop out of the team?
  • Define a stability policy for Nix. The policy should answer what parts can be relied on to work across all versions of Nix. For example, whether the language and the built-ins are permitted to break between releases, whether the CLI is permitted to change, whether expressions are supposed to be reproducible, what the … Which kinds of changes are valid release blockers? This will help users understand what they can rely on between releases and what might change. It will also guide maintainers in deciding whether a change is a valid fix or breaks compatibility guarantees.

I admit this will be a lot of work, but it is work we can’t postpone forever. Nix has a lot of potential but is really missing a governing body. Most of the time we focus on the technical issues, but we should also consider the human and community side of things. To some degree, Nix is a victim of its own success…

32 Likes

Yeah, I basically agree with @andir. There is actually a virtuous cycle where newly deputized maintainers will feel more comfortable merging bug fixes, “uncontroversial” changes, and fancy feature work, and more people will contribute once the lead time in getting such things merged is less bad.

The new Haskell Foundation plans on taking a more active role than the NixOS Foundation does, which is perhaps something we could pursue next, but first we need to empower more outside contributors to Nix. GHC has a hellish CI to get PRs past, but even so has more outside contributors than Nix itself. We should at least reach that level of diversified contribution.

6 Likes

As I think has been outlined in some of the responses, it might be more accurate to say that flake.nix has been designed to accommodate a smarter Nix CLI, by defining a schema for the kind of metadata it takes to support a smarter CLI. (It’s probably fair to say that this has involved some entanglement, as flakes and the new CLI have been designed together.)

@thufschmitt emphasized this here:

I think that if we think of flakes essentially as metadata designed to support (1) pure evaluation and (2) a better CLI, it becomes much easier to motivate flakes.

Because I think the discussion in this thread is intended to be more focused on process issues that are not necessarily specific to flakes and its implementation, I’ve split the elaboration of this thought into a separate thread. (Mods, feel free to merge it if putting it in a separate thread takes up too much space.)

I don’t think what I have in mind is especially novel or non-obvious, but I do think it’s potentially relevant and I’d be grateful for any thoughts on it, especially from regular Nix contributors and the signatories of the OP.

1 Like

This is my exact sentiment. I started using Nix/NixOS fully around the time that flakes were proposed at NixCon 2019, so I started my configs fully with nix 2.3. I had a hard time understanding the general concepts (let alone what was a bug and what was just a user-induced error). I had just started to learn and use pinned nixpkgs in my system configs when I decided to move to flake configurations. So I never really even finished learning the way to deal with the old interface.

But the flakes/pure approach to configurations was a much, much, much simpler introduction to nix, especially given introductions like nixflk/DevOS. Before flakes, there were 15 (exaggerated) different ways to pin and reproduce configurations and no official answer to the problem, so all new users would have their own factions depending on their choice.

I think the conclusion is that flakes suck for master pre-flake Nixers, but are a blessing to new users introduced to nix.

However, I agree that we shouldn’t pull the rug from under all of the pre-flake nix users with a stable release of NixOS defaulting to nix 2.4. The RMs made a good decision with this.

15 Likes

Now let me preface this by saying that Nix has come a long way, especially in the last two years and especially in terms of knowledge transfer to, and communication with, that part of the user-base that stays relatively silent, does not live and breathe the nixpkgs repo and does not spend every free second on Discourse, Discord and IRC.
Just to add my voice, I think you guys are doing great and the “breaking changes” in 2.4 might have been blown a bit out of proportion as seen from the perspective of such a mere mortal. Nix 2.4 took many steps in the right direction and flakes are very useful, not to mention the manual and the website. If I started out with nix today, I’d be in a much better place than 5 years ago, when I actually did. A bit of breakage was to be expected and will be sorted out; some stuff just needs to hit the broader public, which in turn will create pressure or take up arms for it to get sorted out. That is fine! Things take time and problems are necessary to make progress.

However, this quote resonated with me:

The primary issue with nix, from my point of view as a long-time silent user, is that it is really hard to become one of these empowered users. While it is true that the bus-factor is very low and the whole project seems to stand and fall with maybe 20 people pouring their blood and youth into it (who seem to burn out and rotate quite quickly, I might add), that just seems like a symptom to me.
You see that quite often in the software industry. When a software company grows too large for its own shoes, most of the time that is evidenced by a growing workload, difficulty in hiring, overworked employees and not much progress despite everyone being very busy. That also means turnover rises as demand grows but profit shrinks. In short, things just don’t quite scale as they used to and everything gets inefficient. That is mostly due to a lack of switching to processes that scale better, a lack of early knowledge transfer and a failure to reduce complexities right as they occur at the start, i.e. technical debt, “we’ll do it later” type of things.
I’m not saying that’s the case with nix, or that I have a solution. I’m just saying that if I was someone to pour my free Friday into the nix project, I’d fail, because there’s no one who’d have the time to mentor me and no track or program in place that would make me effective quickly, let alone painlessly. There are just so many inter-dependent features and construction sites to look out for; in short, it’s such an enormous monolithic project that it’s hard to sink your teeth into just a small part and learn that. And the people in the know, who could make sense of it, are putting out fires and doing cutting edge (read: niche) stuff, when maybe training apprentices, writing tutorials and implementing processes would be more conducive. Nix has exploded technically, but organization, planning, training and communication have not kept up. Just like tech start-ups cannot keep being start-ups forever, nix cannot keep being a cool tech-demo of loosely organized individuals forever. But you people know that, and you’re making progress with nix 2.4, or the summer of nix, or the marketing team, docu and website. These efforts just need to keep pace with the technical side of things.

Edit:
Let me link Graham’s epitaph here.

17 Likes