Can we please stop breaking stuff willy-nilly?

As an absolutely tiny-fish contributor to nixpkgs who would not be impacted by the workload of this being implemented (primarily just a user):

It seems like a lot of this could be addressed with a compromise solution: We don’t need to carry e.g. services.fcgiwrap-2, services.fcgiwrap-3, services.fcgiwrap-4 and so on forever but… It sure would be nice if services.fcgiwrap-2 and services.fcgiwrap-3 would be separated and services.fcgiwrap-2 would be carried forward through at least one NixOS stable release cycle with deprecation warnings (e.g. deprecate in 24.05, remove in 24.11).

That way, it would be possible for stable users to adapt to breaking changes incrementally instead of having to do it all at once, and even users who run unstable would get some transition period.
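For illustration, a deprecation cycle like that could look roughly like the module sketch below. `services.fcgiwrap-2` and `services.fcgiwrap-3` are only the hypothetical names from above, not real NixOS options; `warnings` is the regular NixOS option for evaluation-time warnings, and the release numbers are the example ones from this post.

```nix
# Hypothetical sketch: keep a deprecated module alive for one release cycle.
# `services.fcgiwrap-2` is an illustrative name, not an existing NixOS module.
{ config, lib, ... }:

let
  cfg = config.services.fcgiwrap-2;
in
{
  options.services.fcgiwrap-2.enable =
    lib.mkEnableOption "the legacy fcgiwrap module (deprecated)";

  config = lib.mkIf cfg.enable {
    # Each string in `warnings` is printed during evaluation;
    # the system still builds, so users can migrate at their own pace.
    warnings = [
      "services.fcgiwrap-2 is deprecated since 24.05 and will be removed in 24.11; please migrate to services.fcgiwrap-3."
    ];

    # ... the unchanged legacy implementation would stay here ...
  };
}
```

The point of the sketch is only that the old option path keeps evaluating while telling the user what to do next; how the legacy implementation is kept working is a separate question.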

I’ll note that I, too, am one of those people that runs unstable primarily because I’m after a rolling distro on my desktop systems (and in fact, “stable” releases are often more broken than “unstable” on the desktop in my experience).

Lots to catch up on, after the weekend …

@waffle8946

Can you elaborate on your usecase for using unstable, while still wanting the stability? Then perhaps we can come up with a reasonable solution here.

As I said above: I’m helping flush out issues early, before they hit the next stable release. The reasonable solution is to recognize that our stable users will be hit by the same breakage that we are, only delayed, and that there is little qualitative difference between the channels, except for maturity.

Just going “why don’t you just use stable then?” completely misses that point.

@raboof

Please do not see this as “taking the L”

Why not? I started a fight. I thought it was about stability of our user experience. Turns out it was also about the tone we find acceptable. I don’t have a problem with changing my approach and “losing” some of the sass, if it means that the community wins in civility. I only ask that we still take this issue seriously, even if I roll off the gas.

the particular case that triggered this thread seemed fairly reasonably considered, though?

Hard disagree. As detailed above, it was the most egregious case I’ve seen in a long time:

  • It removed working options for security reasons, even though they could be configured safely
  • It introduced a submodule in its place, in a way that prevented both a proper error message and any kind of backwards compatibility, even as an afterthought [1]
  • It seems to have done this in a paradoxical way, where
    • we’re done after migrating internal users, because external users should expect to be broken on unstable
    • we’re so concerned about our external users’ security, even after having migrated all internal users, that we can’t let them have their toys any more, not even with a warning. Do we realize that this way we’re disincentivizing users from upgrading, which is even worse for security?

Perhaps ‘declarative’ was the wrong word to use, but did you really not understand what I was writing here?

I can see your hands waving, but I actually don’t know precisely what your point is here. Is it “because we can detect a certain class of breakage (that which produces eval errors) earlier, it’s more acceptable to break our users”?

postponing your update

That goes exactly to the heart of the issue: Why is the NixOS security team deciding for me that I can’t get my latest CVE fixes before having reviewed all uses of fcgiwrap in my systems? In the name of fixing a local privilege escalation?

FFS! Local root exploits are a dime a dozen. Those should never hold up more important security fixes!

The maintenance burden […] all those should keep working […] do they share code

I agree that those aren’t easy problems. I’ll stand by my point that if we just stopped breaking things, then the maintenance burden will be a lot less than what should be expected, judging based on other systems.

I take some issue with your implication that anybody who disagrees with you must just not be very experienced.

Oh, plenty of more experienced people will disagree with me on any number of things. Just not about stability, because people who don’t value that tend to burn out and change career. [2]

Also, I am taking issue with your implication that I’m in any way looking down at or thinking less of somebody inexperienced. If anything, I want to help them avoid some of the mistakes I’ve seen and made myself.

@polygon

If that is your end goal, I’d immediately stop maintaining.

With respect, but issuing all-or-nothing threats like this makes me question if your current level of involvement is at a healthy place. If you need someone to talk to, my DMs are open.

I have scaled back my own involvement with NixOS at various points, where I noticed that I cared “too much” for it to be healthy, and now that I’m feeling better about it, I’m back. Hello. Most of my old stuff still works and that’s how I like it.

@srd424

would it also be possible / worth introducing a separate set of ‘backcompat’ modules translating old options to new? This way they’d be out of the main code base, and could be maintained by those with more interest in long-term backward compatibility?

This is a sensible approach IMO, also because it would allow overburdened maintainers to have a place to move what they’d consider “legacy” to, while still keeping half an eye on avoiding unnecessary breakage.
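A minimal sketch of what such a ‘backcompat’ module could contain, assuming a made-up service `foo` whose option names are purely hypothetical. `lib.mkRenamedOptionModule` and `lib.mkRemovedOptionModule` are the existing helpers from the nixpkgs module library; the file location is an assumption.

```nix
# backcompat/foo.nix (hypothetical file; all `services.foo.*` names are made up)
{ lib, ... }:
{
  imports = [
    # Forward definitions of the old option to the new one and print a
    # rename warning at evaluation time.
    (lib.mkRenamedOptionModule
      [ "services" "foo" "user" ]
      [ "services" "foo" "runAsUser" ])

    # For an option that really cannot be translated: fail evaluation with
    # instructions instead of an "option does not exist" error.
    (lib.mkRemovedOptionModule
      [ "services" "foo" "socketPath" ]
      "Use services.foo.listenAddress instead.")
  ];
}
```

Because the file only contains translations, it could live apart from the main module and be looked after by whoever cares most about long-term compatibility.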

@b-m-f

If security concerns can arise, we would need some mechanism to warn of this when the configuration is being evaluated, no? Any ideas how to solve this in code @bendlas?

We could handle that the same way as we handle e.g. outdated electron or node versions: By blacklisting, while allowing users to whitelist the known-insecure thing back.

There is also a NixOS option for warnings.
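For packages, the existing mechanism is `meta.knownVulnerabilities` together with `nixpkgs.config.permittedInsecurePackages`. A module-level analogue could be sketched roughly as follows; the service `foo` and the opt-in option `allowInsecureLegacySetup` are hypothetical names, while `assertions` and `warnings` are the regular NixOS options.

```nix
# Hypothetical sketch: a known-insecure legacy setup that is blocked by
# default but can be explicitly allowed again by the user.
{ config, lib, ... }:

let
  cfg = config.services.foo;
in
{
  options.services.foo = {
    enable = lib.mkEnableOption "the legacy foo setup";

    # Assumed opt-in switch, analogous to permittedInsecurePackages.
    allowInsecureLegacySetup = lib.mkOption {
      type = lib.types.bool;
      default = false;
      description = "Acknowledge the known security issue and keep using the legacy setup.";
    };
  };

  config = lib.mkIf cfg.enable {
    # Blocked by default: evaluation fails with an explanation ...
    assertions = [
      {
        assertion = cfg.allowInsecureLegacySetup;
        message = "services.foo is known to be insecure in this configuration; set services.foo.allowInsecureLegacySetup = true to use it anyway.";
      }
    ];

    # ... and once allowed, every rebuild still prints a warning.
    warnings = lib.optional cfg.allowInsecureLegacySetup
      "services.foo is running in a known-insecure legacy configuration.";
  };
}
```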


[1] btw, I think implementing services.<service>.<instance> instead of services.<service>.instances.<instance> has repeatedly caused issues. I think that could be something to put in an RFC, to always leave room for top-level options in services.

[2] yes, of course, there are nuances, especially in a project as big as NixOS, which is why I’m careful to qualify with stuff like “unless absolutely necessary” and “where is the line”. Also, I believe @raboof when they say they have extensive experience providing compatibility, not all of it good, but given how fast people seem to be with defending unnecessary breakage, I don’t think we’re erring on that side right now.

I didn’t say that and would prefer that you don’t put words in my mouth.
Of course I missed the point, which is why I asked for clarification on your usecase in the first place.

In any case, if we’re discussing being the “vanguard” then I would say that the goal should be to catch broken-by-design modules. The benefit to running stable in this case would be, you don’t get hit with such modules because they were fixed prior to the release. I would not advocate eliminating breakage that improves modules - as long as the breakage is adequately documented in the release notes and via adequate warnings/errors.

PS:

If you select some text in another’s comment and click “Quote” it will provide a blockquote that can link back to the relevant comment. Makes it easier to follow the conversation.

Guess a good ad-hominem always trumps actually answering the argument?

Also, if “just stopp[ed] breaking things” is not all-or-nothing, I don’t know what is.

That’s super good to know, thank you very much!

I did answer a common sentiment ITT in the context of answering your question, but I didn’t mean to make it sound like you said it …

But would you agree that there are levels of improvement that fall below the threshold where it’s worth breaking consumers, and that should therefore only be made in a compatible way?

How would you approach defining that threshold, to keep the slope from becoming too slippery i.e. without resorting to the 'ole “I’ll know it when I see it”?

How do you like my definition of “Only break if there is just no way around it, without compromising other functionality”?

I guess I did see your bringing up your involvement as maintainer, as an invitation to discuss your involvement as a maintainer. But I can also see now how my wording may have been too close for comfort, so my apologies for that.

I would ask you to also refrain from wordings like “You worry so much about …”, because that makes it easier for me as well, to not “go there”.

I think what I’m trying to say is: If you must hypothetically quit because of a hypothetical community commitment to stability, then you’d hypothetically have to do what you’d have to do; it’s all volunteer work after all. I’d hypothetically much rather keep you involved, though, or hypothetically even increase your involvement, so …

… I’d like to find out, what you’d need for that, when faced with an increased (or even absolute) requirement to keep your packages and modules stable, as a maintainer.

I didn’t answer the argument you’re referring to, because I still think it’s asked and answered. But let me reiterate the points where you put question marks:

Depends on what the fix looks like. If it means replacing module or package names with incompatible successors, then yes. Make a new thing.

That the new thing has all the fixes and works better and users will love it, especially when you warn them about the possibility/necessity to upgrade, while they can rely on the old thing remaining there.

With NixOS, this is possible. That’s the huge innovation. That’s the reason we’re patching RPATHs. The worst case will always be that an ancient version needs to be fully instantiated in a container, from git history, with the closure bloat that involves, which could just be another warning. That could be the final resting place, where old modules and packages go to die: an overlay that knows about the final commit where something still works. But that’s just one possibility to handle this.

Whatever we do, I think we should fully commit to a deprecation/warning cycle of at least one release, hopefully two for important functionality.

Yes, that means that introducing attrsOf submodule in place of what used to be named options will not be acceptable.

No, a maintainer would maintain old stuff indefinitely, because it’s policy.
Also, as long as nobody breaks the old stuff, it’s basically free.

Yes, we would need support structures, for maintainers to say: “I don’t want to deal with another gcc update” and just kick something out of their maintenance responsibility.

But maybe instead of deleting the thing, there is a standard method to pin its inputs to various degrees (up to and including a historic nixpkgs revision) and anybody needing the old thing would be expected to keep up with the required emulation degree / closure sizes.
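One way the "pin to a historic nixpkgs revision" end of that spectrum could look is an overlay like the sketch below. This is only an assumption about how it might be done: the `legacySrc` argument and the `fcgiwrap-legacy` attribute name are hypothetical, and the caller has to supply the source of the last nixpkgs revision where the package still worked (for example a flake input or a pinned `builtins.fetchTarball`).

```nix
# legacy-overlay.nix (hypothetical): take one package from a pinned,
# historic nixpkgs checkout and expose it next to the current package set.
legacySrc:

final: prev:
let
  legacyPkgs = import legacySrc {
    inherit (prev.stdenv.hostPlatform) system;
    # The historic package set deliberately gets none of the current
    # overlays or config, so it evaluates as it did back then.
    overlays = [ ];
    config = { };
  };
in
{
  # `fcgiwrap` is just the example package from this thread.
  fcgiwrap-legacy = legacyPkgs.fcgiwrap;
}
```

It would be used as `nixpkgs.overlays = [ (import ./legacy-overlay.nix legacySrc) ];`. The cost is what was mentioned above: the old package drags in its own historic closure.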

BTW, please also note how we can even only start having this conversation without having to bring up VMs, because we can 99.9% rely on the linux kernel not breaking us.

Can I ask you the same question I asked @waffle8946: How do you like my definition of “Only break if there is just no way around it, without compromising other functionality”?

I mean, point taken. I’m offering an extreme and contrarian viewpoint here. But we have a zero-tolerance policy on various forms of communication within the community, so why not explore a zero-tolerance policy for breakage as well?

How do you feel about my concession to “break when absolutely necessary”?

Interesting. Was that with the help of NixOS? Can you share some of your experience / estimation of how much leverage you get in the advantage / cost relation, by basing such an environment on Nix, with its referential transparency and other related properties?

I guess I just find it difficult to agree with the premise that there’s unnecessary breakage that needs to be prevented. Sure, I’ve had my gripes with breakage, like the pipewire configPackages fiasco, but again that was only a major issue because people didn’t know how to migrate to the new option - i.e. it was a (within-module) documentation issue - hence my stipulation that “breakage is adequately documented”. I don’t see a need to avoid the breakage entirely; if we made a mistake with the interface originally or didn’t address some usecase, let’s just fix it.

If the community sees it more like you do, then that should be reflected in a policy developed via RFC, and my individual opinion wouldn’t matter in that case.

You can discuss my involvement without speculating about the state of my mental health and insinuating that my answers are what they are because said state is rather poor. I’d call this borderline abusive behavior. That being said, apology accepted.

You want maintainers to do more work so users have to do less work. You don’t accept any policy where old stuff is eventually forced out so everyone has to switch - at least I assume that because your argument “And what happens in 6 months when unstable becomes the new stable, and old stable gets abandoned?” can be made for any deprecation policy. These sound like the unenjoyable parts of a job. So the answer is probably money.

This seems a bit naive to me. If one is the type of user that wants to set up a thing until it works and then, ideally, never touch again, with a policy like that in place, these users will probably never switch. Because why, it works, and others do the unpaid, unsatisfying work of keeping it in this state (or maybe the ones left after this policy is introduced actually enjoy it, what do I know). Whereas if you are the type of user usually excited about new things, this whole “we break things occasionally because we only now realized that our design was kind of a dead-end” is a non-issue to start with.

Go on, we are getting to the important topics now, what do you have in mind here? Can I just assign maintenance of old module versions to @bendlas? Or more realistically a “keep deprecated stuff running” task force?

And according to various sources, the majority of regular Kernel developers are paid for their contributions. I wonder why. That being said, changes to the userspace-facing Kernel interfaces carry a significantly larger downstream update burden than spending half an hour every six months to change a few lines in a NixOS configuration file.

That was without NixOS. I don’t think leveraging Nix would be much help for avoiding the desire for binary compatibility there: this was in the Scala ecosystem, where it’s common to have ‘deep’ trees of transitive dependencies managed in independent repositories.

This is a contrast with Nix modules, where there’s typically just one or two layers of modules depending on each other across repositories, reducing the urgency of compatibility for module options.

I posted what I believe the best approach for such cases is:

I’d be down to be part of something like that.

Besides this answer, I’ve put the rest of what I typed for you into a gist, because I noticed that it took up the majority of my post again, with comparatively little new information: gist:c3ae238ef7c23c9a31aa04ebc36ba23e · GitHub

Feel free to comment or also quote me on it, if you feel that there’s anything worth responding to in this thread.

Ha! I think Scala actually illustrates my point perfectly: Look at the graveyard of Lift 1.x applications, that were dependency-locked to a degree where they just had to be rewritten. Seems like they have learned their lesson: The Scala 3 compatibility story - VirtusLab

Can I ask how you go from the (probably traumatic?) experience of maintaining compatibility layers in such an environment, to “if anything we’re being too conservative with breaking unstable”?

Clojure OTOH committed to not-breaking with 1.0, and as a result I can still run code from 15 years ago in the most recent version [3]. Not meaning to brag here, I’m fully aware how this is much easier to pull off with a dynamically typed language, then again, look at how python3 is going. I’m sure we’ll be ready to drop python2 any minute now …

see also angular vs react. I swear, once you see it, you can’t unsee it

[3] @polygon and yet, people still update their dependencies regularly, which is usually effortless, because library authors tend to respect the “dont-break-make-a-new-thing” rule.

/rant

Well … if I could have just a single new rule, it would be: “Avoid re-using a namespace with regular options for an attrsOf submodule.”

attrsOf submodule is one of the more brittle parts of the module system and it should almost always have a dedicated key.
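A rough sketch of the "dedicated key" layout, using a made-up service `foo` with hypothetical options. The instances live under `services.foo.instances.<name>`, so `services.foo` itself stays free for regular top-level options.

```nix
# Hypothetical module: `services.foo` and its options are made up.
{ lib, ... }:
{
  options.services.foo = {
    # Regular top-level options can coexist with the instances,
    # because the attrsOf submodule has its own key below.
    enable = lib.mkEnableOption "foo";

    instances = lib.mkOption {
      default = { };
      description = "Named foo instances.";
      type = lib.types.attrsOf (lib.types.submodule {
        options.socket = lib.mkOption {
          type = lib.types.str;
          description = "Socket path for this instance.";
        };
      });
    };
  };
}
```

With this shape, a later rename or removal of a top-level option can still get a proper warning or error message, which is much harder when every attribute name under `services.foo` is already interpreted as an instance name.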

Thank you very much for taking the effort to collect that information. I think it reflects current consensus very well.

And thanks to everyone, who’s helping to mitigate the damage from my hothead outburst.

Same. I only use unstable because of a rolling distro. I DO NOT want breakage unless absolutely necessary and unavoidable.
Unfortunately, people in this community don’t share the same mindset.

I’ve heard this kind of statement many, many times in the community, ultimately because people have different definitions and expectations of unstable. Without a clearly defined policy and commitment to backward compatibility, I can only expect this to happen again and again: one group of people complaining “if anything we’re being too conservative with breaking unstable” and another complaining about “breaking stuff willy-nilly”.

Below is another recent, perfect example of what I call “breaking stuff willy-nilly”, while a certain group of people think “we’re being too conservative”.

I cannot appreciate enough how Clojure stands by its commitment to not breaking things.

@bendlas Do you think an RFC can be drafted in that regard? Unless a policy is defined and consensus is clearly reached within the community, I can only imagine the same thing will happen again and again.

@jjpe @snow40479 I am also totally with you. I actually asked about nixos-stable a while ago: “Why does NixOS not have a rolling release system?” The discussion may be of interest to you!

The whole design of NixOS needs to be redone honestly. This isn’t trivial because of how intertwined everything is.

Not just the technical design, where it’s easy to see warts all over the place. The organization behind NixOS also needs a from-scratch rebuild, with more human-centric values, if the way some of my recent posts on this very Discourse have been handled is anything to go by.
Not that that should be surprising: the organization (insofar there actually is a coordinated organization at all) makes the software. That is to say, to tackle the technical problems, the root of the problems must be tackled.

That’s not to say I think the entirety of the org is rotten though, I view it as more like a couple of poisonous apples in an otherwise fairly beautiful garden. But those apples are not just poisonous, they’re emitting noxious fumes and need to be removed lest the entire garden go the same direction. Especially people actually creating and merging PRs are most probably fine; they just need a better base to work from and with.
Of course I’m just one voice.

@dschrempf interesting read!

Yes, that’s exactly why we are currently doing exactly that. If you want to talk about this, you can follow instructions in Zulip for governance discussions.

That should be possible, I think. Though probably not easy, because:

  • nixpkgs gets incoming breakage all the time, so the RFC would have to be in large part about how to manage incoming breakage in a backwards-compatible way
  • nixos is a massive project to begin with, so finding and addressing all needs and wants towards such a management system, will probably require a few passes
  • there is also breakage, that you want to incur and fix. e.g. downgrading a database version

Maybe the reasonable thing would be to start with the back-stop, a la “make historical packages and modules available from git history …” and then propose a procedure to maintain compatibility levels (vm, container, original, buildinput-rebased) in a separate thing?

Interesting, thanks for the pointer! The idea for instantiating git history comes up there as well.

I honestly get the sentiment, and in a way I agree: If somebody has a good idea for it, let’s do it, let’s reorganize everything. BUT: The existing organisation under nixpkgs/nixos stays in place, and as we port modules into the new structure, we’re leaving “forwarder” modules behind.

I will also say: Given the scope that NixOS manages to pull off, I think its design is pretty clean.

It could be a 1-2 combo:

  1. Create a new project, with a from-scratch technical design that keeps the good stuff, but learns from the mistakes made in the past. This could make on-boarding new users easier, and ideally would also reduce the maintenance burden where possible, which would help those seeking to give back to the community.

  2. That same project could then also adopt a more equitable governance structure (think RFC 175 and initiatives like it), complete with checks and balances. That would be a perfect moment because it’s increasingly clear such a change is unwelcome by the powers that be in the nix community as it exists today, to the detriment of the community as a whole.

Meanwhile, nixpkgs/nixos are left as-is, meaning such an initiative wouldn’t break anyone’s existing setups.

Without getting too much into the details of the governance & community culture debates here, as I did so elsewhere:

A good option to keep breaking changes to services you care about to a minimum is to contribute a NixOS vm test for it. Those are run before nixos-unstable is updated (after nixpkgs-unstable, which only runs package tests).

If changes then break the tests, they need to be handled before they land on your machines 🙂
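As a rough idea of how small such a test can be, here is a minimal smoke test sketch. The file name and test name are hypothetical and nginx is only a stand-in for whatever service you care about; the attribute-set form shown here is the one accepted by `pkgs.testers.runNixOSTest`, and how exactly it would be wired into nixpkgs (for example via the in-tree test list or a package's `passthru.tests`) is left out.

```nix
# my-service-test.nix (hypothetical file name)
# Can be built with: pkgs.testers.runNixOSTest ./my-service-test.nix
{
  name = "nginx-smoke-test";

  # One VM with the service enabled the way you use it yourself.
  nodes.machine = { ... }: {
    services.nginx.enable = true;
  };

  # The test script is Python, run by the NixOS test driver.
  testScript = ''
    machine.wait_for_unit("nginx.service")
    machine.wait_for_open_port(80)
  '';
}
```

Even a test that only checks "the unit starts and the port opens" would catch an option being removed or renamed, because the VM configuration would stop evaluating.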
