Can we please stop breaking stuff willy-nilly?

sirphobos · July 5, 2024, 4:46pm

Is it an off-topic question to ask how such changes are usually handled in a release?

From my user experience I’ve noticed that deprecated options usually cause warnings on rebuild, so it’s easy to locate and fix the problematic part, but who adds those checks?

Is it the person, who introduced the breaking change in the first place? (like the author of the PR in question here)
Or are there dedicated people who look over all packages with breaking changes before the release and add all the deprecation warnings/errors to them?

bendlas · July 5, 2024, 4:59pm

keeping your frustration in check somewhat - as a fellow-member you also have a responsibility in that regard.

Yeah, you know what? I’m happy to take the L on that one, if it means that we can discuss different stances towards backwards compatibility.

Regarding making it a pleasant experience for everyone: Yes, a thousand times yes! There have been many times now where we forced poorly-justified config reworks on everyone. Many times, where I have kept quiet and just updated the thing. I do value maintenance burden a lot higher than user burden. A lot. And I also rein in entitled users. Still. It hurts. Every time. And if we hypothetically multiply my pain/displeasure by every downstream user of a given thing that we break needlessly, it adds up quickly.

At what ratio of maintenance burden / (effort x userbase) do we start empathising with everyone we break? I gave a lower limit of “one dev changing all the options for no reason”, which @polygon dismissed as “slippery slope” (if I may quote you in this).

To me there is a real question there: Where should the line be? My stance is: Exactly because it’s a slippery slope, we take the radical approach: Only break if there is just no way around it, without compromising other functionality. And yes, if we’re calling something services.fcgiwrap-2, that’s still better than breaking the old thing.

Remember: We all got here, because Nix dared to take the radical approach and started patching RPATHs …

One of the great advantages of our declarative approach is that you will discover incompatibilities before actually rolling out changes to your machine

Counter point: If declarative was enough, we’d be doing ansible. One of the great advantages of our config language being a programming language is, that it’s much easier to translate and adapt data, including config languages. Of course it’s still not free, also in maintenance burden, but consider this: If we just stopped breaking things, where would the maintenance burden come from?

bendlas · July 5, 2024, 5:06pm

From my user experience I’ve noticed that deprecated options usually cause warnings on rebuild, so it’s easy to locate and fix the problematic part, but who adds those checks?

The usual approach is for the author of the change to add warnings and descriptive errors as necessary. That didn’t happen in my linked case here, because they couldn’t use the convenient library facilities for such deprecation notices [1] and seem not to have been aware of other ways to achieve this.

Also, they pasted a new option structure over the old one, removing the previous options. I’d have been cooler with putting a deprecation for one release, though I’d still prefer just … not breaking.

[1] apparently due to the introduction of a submodule, and btw this is the second time in quick succession where I’m seeing subpar user experience in conjunction with changes in a submodule, possibly there’s an opportunity to improve there …
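
For readers unfamiliar with what "warnings and descriptive errors" look like in a module, here is a minimal sketch using the generic NixOS `warnings` and `assertions` options. The service `services.example` and all of its options are hypothetical and exist only for illustration; this is not the code from the linked PR.

```nix
{ config, lib, ... }:
let
  cfg = config.services.example; # hypothetical service, for illustration only
in
{
  options.services.example = {
    enable = lib.mkEnableOption "the example service";

    # Hypothetical deprecated option: still accepted, but triggers a warning.
    socketPath = lib.mkOption {
      type = lib.types.nullOr lib.types.str;
      default = null;
      description = "Deprecated, use services.example.socket.path instead.";
    };

    # Hypothetical removed option: kept as a stub so that setting it
    # yields a readable error instead of an unrelated evaluation failure.
    user = lib.mkOption {
      type = lib.types.nullOr lib.types.str;
      default = null;
      visible = false;
      description = "Removed, the service no longer supports a custom user.";
    };
  };

  config = lib.mkIf cfg.enable {
    # Soft deprecation: evaluation succeeds and the message is shown on rebuild.
    warnings = lib.optional (cfg.socketPath != null)
      "services.example.socketPath is deprecated, use services.example.socket.path instead.";

    # Hard failure: evaluation aborts and prints the message.
    assertions = [
      {
        assertion = cfg.user == null;
        message = "services.example.user was removed, please drop it from your configuration.";
      }
    ];
  };
}
```

The warning path keeps old configurations evaluating for a transition period, while the assertion path turns a removed option into an explicit migration hint.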

TLATER · July 5, 2024, 6:01pm

I still don’t agree with the premise that there are too many breaking changes. The last release had such in ~70 modules or so: nixpkgs/nixos/doc/manual/release-notes/rl-2405.section.md at 6febe3f41500c64d6ffbf1977e9072fcdc791970 · NixOS/nixpkgs · GitHub

The slippery slope argument is definitely the fallacy it’s usually invoked as here, because current policy does not impose any limitation on breaking changes, and the above is the result. Nobody is (seriously) proposing to break more, so we can’t expect this to worsen significantly.

I don’t know exactly how many independent modules NixOS has, but that’s probably under the 1% range. In a given 6 month period, assuming you use completely random modules, that’s roughly how often you’ll run into a backwards incompatibility. Many of them are just deprecations caused by upstream, too, where the module author has no choice.

These are obviously rather crude statistics - more maintenance availability for a module will naturally mean more changes (and hence more breaking ones, too). Most modules are used significantly less than a core few, too, so the number of things you see per release is likely concentrated around a subset.

I’d also like to see how many of these changes could have actually been skipped, to get an estimate of how much restricting incompatibility would actually improve, but that’d require more knowledge of each of those modules.

Still, I think this illustrates my point: Is the current situation really that bad, especially given that we usually have warnings from nix and the fact that there is an existing stable branch? What would a “better” state be, assuming 0 is not possible?

I’d also propose a different solution, given the above: Could limiting breaking changes be done on a set of “core” modules that are likely to cause users to struggle with upgrades?

This comes up in other contexts too; I think having such a “core” set of modules and packages that people can rely on, even on unstable, could help industrial users. This is already done informally, to a degree, but lack of explicit policy means you don’t know if it covers your use cases, and no way to propose marking new things as “core”.

You could have a very specific compatibility promise then, e.g. promise to have deprecation warnings for at least one release, as you mention. This would also likely change culture around those modules, as they’re classed as more important, and more of a pain to change, which would have some of the “better” effect you want.

endocrimes · July 5, 2024, 6:13pm

I’m sorely disappointed by some of the conduct in this thread - please treat your fellow community members and developers with respect. I appreciate that things can be frustrating, but taking that frustration out by lambasting others is unacceptable and will not continue.

If you wish to change NixOS branch policies - open an RFC. If you wish to ensure more things are caught, try finding or building tooling that can catch them.

sirphobos · July 5, 2024, 6:43pm

Please, correct me if I’m wrong, but I understood the original question to address exactly that: were some existing policies violated in that PR or not.

I guess the gist of it is that the PR started as a small fix/refactor, which requires one level of attention, but turned out to introduce a breaking change, which maybe requires a different level of attention from reviewers?
And it does seem that it was merged in a not quite finished state: missing warnings and such to ease the migration.

Though that problem now seems to be on its way to being resolved, judging from the PR discussion on GitHub.

bendlas · July 5, 2024, 11:34pm

I’m sorely disappointed by some of the conduct in this thread

I get the disappointment, but don’t be too hard on everybody. Many people have yet to experience the compounding-interest effect, that comes with just not breaking stuff … like being able to run code from a decade ago unchanged. Many of the newer members haven’t even been developers for that long.

I still don’t agree with the premise that there are too many breaking changes.

With respect, that’s not the premise, but the point I’m trying to make in this thread.

And I do think that your number of 70 cases last release makes it better than any rhetoric could. That’s 70 times the number of affected users in little tragedies, where we took some precious time out of someone’s day. Were all of these cases really necessary? Judging by the level of enthusiasm on display here, for breaking other people, I can hardly imagine proper due diligence being done on all of these …

Still, I think this illustrates my point: Is the current situation really that bad, especially given that we usually have warnings from nix and the fact that there is an existing stable branch?

Yeah, no, I agree - and had already stated - that when the current conventions are followed, it’s not that bad. Not optimal, but not that bad.

It is that bad when even the weak conventions we have, go out of the window: Stuff just gets removed without deprecation period, leaving only cryptic unrelated errors; complainers get tone policed and told to RTFM and test updates in a container (like I need advice on that; I traced the cryptic error back to the offending PR, didn’t I?)

What would a “better” state be, assuming 0 is not possible?

See, the "0 is not possible", that’s a premise. And I would very much like to call it into question.

It is a goal that I think is achievable and that I’d like us to take seriously, at least when it comes to not-strictly-necessary breaks.

Statements like “if anything we’re too conservative on unstable” make me physically sick, and it’s against that stance that I’m taking a stand. With respect, if you want a quagmire, you shall find it with fedora, ubuntu, arch, npm, docker, flathub, or indeed most platforms …

industrial users

I also wanted to address this in particular, because the last time I brought this up on matrix (in a much more polite way, let me assure you), I immediately got accused of wanting free maintenance on behalf of an employer, when I’ve only ever applied NixOS for my personal stuff and in open source contexts.

I don’t quite get how any of this would particularly help commercial users. My impression is, that individual users and small associations would profit from a stable base at least as much - if not more - than deep pocket corpos with change to throw at maintenance.

waffle8946 · July 5, 2024, 11:55pm

It was already mentioned upthread that the stability guarantees are only given for stable. Can you elaborate on your use case for using unstable while still wanting the stability? Then perhaps we can come up with a reasonable solution here.

euxane · July 6, 2024, 12:02am

Hello,

I’m the author of the pull request cited in this thread.

I brought some clarifications, context, and summary about:

the purpose of the changes (and security implications),
the reasons for the API breakage / refactor (necessary for the fix),
the handling of migrations (not straightforward to do implicitly).

Please see: https://github.com/NixOS/nixpkgs/pull/318599#issuecomment-2211515713

I agree that breakage should be avoided on the stable channel, which this
pull request does not introduce. Mentions of backports are about trying to
find a non-disruptive partial mitigation for the stable branch.

Breaking changes are to be expected on unstable. This is not to say that they
should not be motivated, and that any required migration should be made as
painless as possible.

This set of changes was not without motivation. I can assure you that time was
indeed spent on minimising breakage and limiting the needed migration,
documented under the “breaking changes” section of the release notes.

I understand that the latter can easily be missed, and that tracing back the
source of an evaluation failure is indeed painful. I mentioned trouble adding
proper error messages in the pull request, but we eventually found a way
(thanks to @minijackson) to add those in this follow-up pull request:
https://github.com/NixOS/nixpkgs/pull/324923

raboof · July 6, 2024, 8:09am

Please do not see this as “taking the L”, that was not the intent - the point is just to improve the experience for all of us.

So first, I think this is partly a matter of expectations. As a user, especially when tracking unstable, I think it is entirely reasonable to expect to have to make updates to your configuration. Of course these should ideally be easy to make and not “poorly-justified” (the particular case that triggered this thread seemed fairly reasonably considered, though?). On unstable, you will sometimes see breakage that does not provide a good update experience or that is indeed “poorly-justified”. In these cases, we should do better, but especially as a member you should not be “hurt”/“pained” - this is an opportunity to collaborate with your fellow-members, come up with improvements such as nixos/fcgiwrap: add option migration instruction errors by pacien · Pull Request #324923 · NixOS/nixpkgs · GitHub, leading by example and sharing knowledge with the contributors that slipped up. Not scolding them (unless perhaps it was some egregious negligence - but that does not seem the case here).

I think that is undesirable. For example, sometimes, the old approach is error-prone (i.e. makes it easy to lead to broken or even insecure installations). Additionally, having ‘more than one way to do it’ is confusing: is there a meaningful difference? which do you choose? Why are documentation sources contradicting each other? Even if there is “a way around it”, in such cases IMO it is better to move forward and break compatibility.

Perhaps ‘declarative’ was the wrong word to use, but did you really not understand what I was writing here? Ansible is more likely to break mid-update leaving you with a broken system. In contrast, backwards incompatibilities in NixOS module options just means you get to choose between solving those right away or postponing your update.

The maintenance burden mainly comes from:

When having services.fcgiwrap and services.fcgiwrap-2 (and -3, and -4…), all those should keep working. They should be tested, documented, etc.
All of them essentially do the same thing, so there is overlap in their functionality. That brings you to a dilemma: do they share code or not? If they don’t, you’ll have code duplication, which is not great for maintainability. If they do, that means the shared code needs to be so flexible it can cater to each variation. Likely that means adding all kinds of options and flags to the shared code, making that shared code harder to understand and evolve.

I take some issue with your implication that anybody who disagrees with you must just not be very experienced. I’ll resist enumerating my credentials here, but I’ve spent years in an environment providing strong binary compatibility guarantees, so I’m well aware of the advantages - but also the costs.

I don’t see anyone being ‘enthusiastic’ about breakage, of course it should be well-reasoned.

I stand by that opinion. Of course, that does not mean I think we should be “breaking stuff willy-nilly”: breakage should be well-considered and provide a good upgrade experience. But we should still do it. And yes, on unstable it will occasionally break things - and then we fix them.

polygon · July 6, 2024, 9:19am

If that is your end goal, I’d immediately stop maintaining. No longer possible to fix old mistakes? What would be the incentive to put out an improved version anyways? I’d still have to maintain the old stuff indefinitely because there’s five users who still use the old variant and refuse to switch.

You worry so much about not putting any work (or as you call it “little tragedies”) on the users that you seem to forget the ongoing dread for maintainers that this would cause.

Edit: The “you” was directed towards @bendlas here, not raboof.

jjpe · July 6, 2024, 10:04am

Without commenting on the rest of the thread, personally I’m using nixos-unstable because it’s a rolling distro, so I get updates a lot faster. And even though I need to update my config every once in a while as available options change, that doesn’t mean I want breakage, especially when it can be avoided. What I want is the rolling distro aspect, and I accept that some breakage comes with that as development on various derivations and the projects behind them progresses.

But the idea that that should be license to go nuts with breaking things willy nilly is misguided at its very best.

Also, try to look at it this way:
The more things break (on stable or unstable), the more the value of your own system flakes and/or configuration.nix deteriorates, because you can’t assume that you can just take that and have it work as-is e.g. on a more recent version of NixOS stable (by contrast, unstable is always a moving target). So that’s a negative externality to such a view.

raboof · July 6, 2024, 10:15am

Come on, nobody is saying that…

I think I’ve been the most blunt in outright saying you should expect breakage on unstable (and that that is OK), and even I have been stating the obvious each time that breaking compatibility should still always be a well-considered decision and that care should be taken to make the update process as smooth as possible.

jjpe · July 6, 2024, 11:22am

I didn’t mean to imply that breakage at this point in time is done willy nilly. I in fact do not hold that view at all. I included the term because that’s obviously the view some hold in this topic, and ultimately this topic is all I have to go on unless I’m willing to play archeologist on GitHub, which I am neither willing to do nor have time for.

My post was meant as an answer to the question it quotes, no more, no less.

But as mentioned, breakage policy is a bit of a slippery slope:

1. We don’t break things.
2. Ok it turns out that breakage is actually necessary in some cases if we want to fix the issue
3. Well, we have a policy that allows breakage, and in this case it is the cleaner solution even if not strictly necessary, so…
4. [I think you can see where this is going]

And importantly, each successive step is likely not to be taken by the same set of people, which is a nontrivial part of the mechanism of a slippery slope in social contexts I think.
If it was the same set of people, they’d presumably guard against step 3 and definitely everything below it. Not least because they have the institutional knowledge (and motivation) to keep that little snowball from rolling downhill, and that institutional knowledge is imperfectly transferred to would-be successors, for a variety of reasons.

srd424 · July 6, 2024, 1:52pm

I’m still pretty new to Nix, so I don’t know if this is viable, but: while still trying to avoid unnecessary breaking changes anyway, would it also be possible / worth introducing a separate set of ‘backcompat’ modules translating old options to new? This way they’d be out of the main code base, and could be maintained by those with more interest in long-term backward compatibility?

I’m thinking that maybe larger projects or orgs with many existing configurations might be more motivated to help here. It would be some sort of a way that people who want/need stronger backcompat could “put their money [time] where their mouth is”? And there might be a natural sunset period - once there was no longer interest in maintaining the oldest sets of backcompat shims, that would be a reasonable signal that they were no longer required.
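
To make the idea a bit more concrete, here is a rough sketch of what one such out-of-tree shim could look like, assuming a hypothetical service whose old `services.example.port` option was replaced upstream by `services.example.settings.listen`. Both option names are made up for illustration, and the sketch assumes the main module already declares the new option.

```nix
# Hypothetical standalone "backcompat" module, imported next to the main one.
{ config, lib, ... }:
let
  oldPort = config.services.example.port;
in
{
  # Re-declare the old option so that existing configurations still evaluate.
  options.services.example.port = lib.mkOption {
    type = lib.types.nullOr lib.types.port;
    default = null;
    visible = false; # keep it out of the generated documentation
    description = "Old option, translated to services.example.settings.listen.";
  };

  config = lib.mkIf (oldPort != null) {
    # Translate old data into the new structure. mkDefault lets an
    # explicitly set new option take precedence over the translated value.
    services.example.settings.listen =
      lib.mkDefault "127.0.0.1:${toString oldPort}";

    warnings = [
      "services.example.port is provided by a compatibility shim, please migrate to services.example.settings.listen."
    ];
  };
}
```

A shim like this only works while the new option can still express everything the old one could, which is one reason such translations would need their own maintainers.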

waffle8946 · July 6, 2024, 1:54pm

I’ve seen that done in the modules themselves before.
Might be a good idea to formalise that.

raboof · July 6, 2024, 2:22pm

You’re likely thinking of:

aliases for packages that were renamed or removed, formalized in nixpkgs/pkgs at master · NixOS/nixpkgs · GitHub
mkRenamedOptionModuleWith and mkRemovedOptionModule for options that were renamed/removed, formalized in nixpkgs/nixos at master · NixOS/nixpkgs · GitHub
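
As a small sketch of how these helpers are typically used from a module's `imports`: the option paths, the release number, and the message below are hypothetical. As noted upthread, the fcgiwrap change could not use them this directly because of the submodule involved.

```nix
{ lib, ... }:
{
  imports = [
    # Old path keeps working and forwards to the new one.
    # sinceRelease names the first release containing the rename.
    (lib.mkRenamedOptionModuleWith {
      sinceRelease = 2411; # hypothetical release number
      from = [ "services" "example" "socketPath" ];
      to = [ "services" "example" "socket" "path" ];
    })

    # Setting the removed option fails evaluation with this message.
    (lib.mkRemovedOptionModule [ "services" "example" "user" ]
      "The option was dropped, please remove it from your configuration.")
  ];
}
```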

I haven’t really dug into the particular PR that sparked this issue, but I understand they ran into complexities applying those in this particular case, decided to merge without them initially, and later followed up with https://github.com/NixOS/nixpkgs/pull/324923 when a way was found to apply them after all.

While of course ideally those would have been added immediately along with the initial PR, I think the way things went was overall fairly reasonable.

waffle8946 · July 6, 2024, 5:07pm

Nope, but I’ll look for some examples later.

Edit: here’s one: nixos/davfs2: fix rfc42 conversion, make settings and extraConfig mutually exclusive, and other cleanup by eclairevoyant · Pull Request #302689 · NixOS/nixpkgs · GitHub , where the original change was nixos/davfs2: Convert extraConfig to freeform type (RFC42) by onny · Pull Request #297014 · NixOS/nixpkgs · GitHub

b-m-f · July 7, 2024, 5:27am

I like the MR.
Changes are described in the Release Notes.

Looks good to me.
If security concerns can arise, we would need some mechanism to warn of this when the configuration is being evaluated, no?

Any ideas how to solve this in code @bendlas ?

clhodapp · July 8, 2024, 7:51pm

As an absolutely tiny-fish contributor to nixpkgs who would not be impacted by the workload of this being implemented (primarily just a user):

It seems like a lot of this could be addressed with a compromise solution: We don’t need to carry e.g. services.fcgiwrap-2, services.fcgiwrap-3, services.fcgiwrap-4 and so on forever but… It sure would be nice if services.fcgiwrap-2 and services.fcgiwrap-3 would be separated and services.fcgiwrap-2 would be carried forward through at least one NixOS stable release cycle with deprecation warnings (e.g. deprecate in 24.05, remove in 24.11).

That way, it would be possible for stable users to adapt to breaking changes incrementally instead of having to do it all at once, and even users who run unstable would get some transition period.

I’ll note that I, too, am one of those people that runs unstable primarily because I’m after a rolling distro on my desktop systems (and in fact, “stable” releases are often more broken than “unstable” on the desktop in my experience).