Re: recent breaking changes - lockscreen/PAM

It wouldn’t really help the people most likely to struggle with breakage, because people put nonsense values for stateVersion (like “unstable”) and change it when we tell them not to 4 different ways.

We should reduce our reliance on stateVersion and where possible avoid the breakage in the first place. I still observe the attitude to breaking changes being “it’s unstable so we can break anything we want without warning”.

3 Likes

My understanding is stateVersion is for what it says: state.

The PAM config is not state. Like other things within NixOS, these options can change in breaking ways.

I do not believe we need a warning for changes like this. The change was made an unstable, an announcement was made about it. If one does not want to deal with changes like these, one should refer to the stable branches, and utilize the release notes when upgrading between versions.

Proving my exact point about the attitude to breaking changes on unstable.
Yes, technically, breaking changes can happen at any point, but I have to think there’s better options than pointlessly breaking PRs like this.

If breakages must happen, sure.
But breaking things for funsies is just wasting people’s time.

1 Like

It was mentioned above that these changes happened as part of refactoring of a very messy moudle. Technical debt is real and while it might not be very tangible at times, refactoring to get rid of technical debt is not something done for funsies.

4 Likes

I originally took the comment at their word as I had assumed the change was done for good cause; without checking the context originally, I even suggested myself that stable should be used to avoid issues. However, reading through the actual PR, the only justification is that it’s “unneeded PAM stuff”, which is not a particularly strong case IMO.

The PR is satisfying one person’s compulsions to have a minimal system to the detriment of a notable number of other users. Yet said user could just disable said options on their own system instead of imposing the change on NixOS overall. IMO we need a better policy regarding changed defaults for core functionality like the init system, kernel, graphics, etc.

9 Likes

So, trying to gain some insights from this. What can be done to better handle this? It seems largely caused by downstream modules implicitly relying on PAM being used a certain way. I don’t see a nice way to resolve that from upstream. Communication could maybe be a little better but the PAM changes weren’t necessarily a surprise either; I’ve seen plenty of talk about it on discourse.

Not having breaking changes is not an option. Whether or not something justifies a breaking change is inherently up for interpretation. Somebody’s always going to deem a particular change as not worth breaking. So the only question that I think can be asked here is: How can breaking changes be better communicated to users (such as the discourse topic) and to downstream projects (such as version fields). At the end of the day, it’s still on module authors to watch out for changes and update their modules accordingly.

I also don’t agree that “unstable” means things should break on a whim. In a vacuum that may be fine. But the reality is that unstable is and has been treated like a rolling release: Plenty of third party flakes/modules assume unstable, not all security fixes make it into stable, unstable is needed for the latest updates to user facing apps which have very little effect on the system.

For what it’s worth I think version fields would make a mess out of NixOS systems, and I’m rather against it. Maybe an email list for proposed breaking changes is better? You could send out an email for your module two weeks (or some other agreed upon time) before the changes merge into unstable? If the affected modules are put in the subject header it would be trivial for module authors to setup a seive filter for the components they care about. Anyone making these kinds of implementation assumptions could then subscribe to that module to watch for changes. At least for very important modules like PAM or networking.

1 Like

I’m literally on discourse every day for several months and didn’t hear about it. I also don’t see any discourse links to or from the PR prior to merge. Where was this discussed prior to merge?

It got one approval and then a self merge.

4 Likes

We do effectively have this already: it’s the Breaking changes announcement for unstable thread which I originally posted to! Although my post there was a reaction to experiencing a breaking change, the intent is that changes are proactively posted there before being merged to give people some time to adjust their configurations.

To me, it seems like one problem was that neither the author nor reviewer of the change saw it as potentially-breaking. (This is backed up by the release notes change going into the “Other Notable Changes” section, not the “Backward Incompatibilities” section.) Unfortunately I don’t know that this can be fixed in the general case: every change breaks somebody’s workflow. Perhaps a link to that thread could be included in the pull request template checklist, though (a checkbox for “posted in the breaking changes thread”, in the same vein as “added a release notes entry”), to encourage people to evaluate it as an option?

A difficulty here is that people are in general not aware of all of the software on their machines. (And I won’t accept that this is a problem that needs to – or can – be fixed.) Equally, not every part of nixpkgs has an active maintainer, and even active maintainers can miss things (for example, the hypothetical maintainer of the i3lock module probably uses that module, so breakage as a result of not using that module is likely to slip by them). In a universe where messaging that “some PAM modules will be conditionally enabled now” had gone out ahead of time, would that have been sufficient to alert anyone of the impending breakage?

On the other hand, messaging like that, of “changes we don’t think will break anything but we’ll broadcast it just in case”, can make it easier to root-cause a problem if one does occur. I was lucky to stumble across the responsible PR while searching for “PAM” in recently-merged PRs; if that failed I would have had to bisect nixpkgs which is far more annoying.

Lastly I think it might be valuable to have a little more documented guidance about what to do in the event of unexpected breaking changes. There were at least a couple of people calling for the breaking part of the PR to be reverted, and the author was happy with that possibility, but (to my knowledge) nobody submitted a PR to do so. Was that the right call? Should we revert the change now, even though it’s been almost two weeks? (Perhaps there is some bystander effect at work here: were people waiting for others to submit a revert? or was the consensus truly that a revert wasn’t needed?)

4 Likes

The way I saw it, this was part of an ongoing refacter of PAM, as discussed in this thread. On review, that’s a bit of a stretch for me to make. Worth noting that the PR was reviewed and approved by someone very familiar with NixOS PAM. Saying it was self merged, at least to me, suggests nobody else looked at it.

So this is where I don’t think it’s quite enough. I don’t think it’s reasonable to expect all nix users to be active on the discourse. And I don’t think it’s reasonable to expect people to filter out what’s relevant from a firehose like that (or at least, theoretically a firehose if we documented breaking changes better).

Let me add a little context which hopefully illustrates what I’m saying: I maintain a nix flake for managing wiregaurd connections. It happened during the 23.05 release that the module interface for wirguard over networkd changed. When I encountered this on my desktop I quickly fixed the problem and carried on. However, as a flake maintainer, it would have been great if I could have been alerted to that change ahead of time.

Now imagine this scenario: As a maintainer of a NixOS flake, I know very well that breaking changes to /nixos/modules/system/boot/networkd.nix could break my downstream module. So I send an email to ~breaking/nixos/modules/system/boot/networkd+subscribe@lists.nixos.org (or some similar agreed upon schema). Lists.nixos.org is a firehose of breaking changes because contributors make sure to send out notifications when they break something, but I’m only concerned with networkd so I don’t get very many emails. Then someone decides to update the networkd module in a breaking way (as happened in 23.05). I get an email 2 weeks before about these changes, so as a downstream module provider I can prepare a fix and release it at the time the networkd changes hit nixpkgs unstable. None of my hypothetical users need to break their systems by having the misfortune of updating their flake inputs before I broke my own system and fixed it (or worse, I’m on vacation). Similarly, this could extend to PAM and the home-manager devs, or to regular NixOS users who know they’re doing something irregular and just want a heads up. Most users will just break their systems and fix it on the spot, which is fine, but I think giving downstream flake/module authors a way to stay informed would alleviate a lot of pain.

3 Likes

This sounds a lot like the OWNERS mechanism for automatically requesting review from people or teams when a PR touches a file/directory. (Not suggesting that this is necessarily directly appropriate for your usecase – although it is once again a corner of nixpkgs that doesn’t afaict have any documentation to inform its use – but it is potentially interesting prior art. Would you see value in something similar to that which only notified you about e.g. PRs labelled with a “breaking” label, for example?)

1 Like

I think something like that would be very helpful! Although ideally it would be more dynamic. People should be able to add and remove themselves without creating a PR to nixpkgs for it, since they’re not necessarily involved in nixpkgs at all. That’s why I thought a sourcehut style mailing list could be helpful.

That being said there’s a balance to be had between ease of use and ease of implementation, and if I had to make a nixpkgs PR to watch networkd network manager, and wg-quick, that wouldn’t stop me. Although it would probably stop me from watching parts of nixpkgs that might break my personal system, such as kanidm or step-ca services.

There’d probably need to be some sort of interest survey and rfc process I imagine, I don’t really follow nixpkgs development so I’m not sure how it works. I just know there’s certain modules I reach pretty deep into and it would be great to get pinged when they have breaking changes.

2 Likes

By self-merge I literally mean the PR author clicked the merge button, irrespective of reviews. I’m strongly of the opinion that that should be disallowed in nixpkgs. Though actually enforcing that probably requires some RFC :weary:

4 Likes

Hi. I approved this PAM PR. I’d like to share some thoughts on this discussion.

1. The author did everything right. The author posted a clean PR. They included a release notes entry about the change. They got an approval from the maintainer listed in the PAM module. They posted a breaking changes announcement. (Edit: See replies.) When an issue was posted, the author engaged and offered to revert. I don’t think it’s productive to accuse the author of being self-centered or sloppy. (I especially don’t see the “self merge” as a problem. The author only merged after approval.)

2. This change helps security.pam. The change was not, as some have suggested, a coordinated part of the broader PAM refactoring work. But it is in line with that work, because moving module-specific rules out of the core PAM module is one of the main objectives. The PR does not explain in depth why this change is helpful, but as the reviewer, I didn’t need that explanation.

3. The problem here is change management, not the end state. Requiring i3lock rules to be enabled separately when rolling a custom module is fine. The issue is with how we got into that state. This change management problem is nixpkgs-wide.

4. It’s not an easy problem. We could not have simply added a warning, because that warning would have to be shown to every NixOS user. If we had an automated way to detect i3lock users who weren’t using programs.i3lock, then we would simply enable the PAM rules with that logic.

We also could not have added a stateVersion check. Experienced maintainers would reject that approach as an abuse of stateVersion. Personally, I think we should have an option versioning system so that we can change defaults more easily. But for better or for worse, the current consensus is that such changes should hit unstable with a breaking changes announcement instead. If you disagree with that approach, I’m with you! But you should not aim your frustration at the author who complied with the established approach.

How else could (for example) i3lock usage be detected? And are those detection methods reasonable ways to enable PAM rules? I would find it surprising if simply adding a package to environment.systemPackages would somehow modify the PAM stack.

5. The breakage only affected unsupported setups. The NixOS-supported way to use i3lock is with the programs.i3lock module. Using i3lock and not using that module and not defining your own PAM rules is off the beaten path. I understand now that many users didn’t know that.

6. …or did it? The above was my impression from reading this thread and this issue. But services.xserver.windowManager.i3.enable doesn’t seem to enable programs.i3lock, and it does have i3lock in the default value of extraPackages. It would be great if someone who uses i3 could confirm that i3lock can be used (prior to the breaking change) with services.xserver.windowManager.i3.enable and no custom i3lock module. If there are gaps like this in nixpkgs, we absolutely should close them, and it would be reasonable to temporarily restore these default PAM rules until nixpkgs is internally consistent.

9 Likes

re: self-merges - If this was someone’s personal project, they can run it as they want; however, I don’t think self-merges are appropriate in a collaborative repo like nixpkgs. But I have no steam to push an RFC for something like that, I’ve just simply pointed out the problem. And for breaking changes, they should be properly justified rather than just doing them to do them :person_shrugging: But evidently I’ve not convinced anyone of that, so I’ll give up here.

3 Likes

Self-merges in Nixpkgs are generally considered acceptable when at least one other relevant person has reviewed and approved it. Why not just have the reviewer merge it? Well sometimes they don’t have the merge privilege. But perhaps more importantly, it’s sometimes basically just a communication strategy. e.g. The reviewer is informing the author that it looks good as is, but the author is free to make changes and request re-review before merging if they’d like. As long as some other relevant person has approved, I don’t see the benefit in preventing the author from hitting the merge button. In practice it is quite useful, and effectively equivalent to waiting for someone else to merge.

13 Likes

This is not true; I did the announcement, after discovering the problem. I agree that everything else went fine, and regardless our processes should be resilient against human error.

I don’t really see how you can make this argument, at least not without acknowledging that we have no coherent and discoverable notion of “supported” setups. The problem affects e.g. xscreensaver as well as i3lock, and the xscreensaver module was apparently added six years after the package:

and was not mentioned in any release notes as far as I can see (until the 25.05 release notes documenting this PAM change). How was @aij meant to discover that their setup was “unsupported”? It’s not practical to grep for every piece of software I use in man configuration.nix every six months just in case a module was added.

I can confirm this: I have services.xserver.windowManager.i3.enable set and was still impacted.

I do wonder if @cafkafk’s idea is feasible. I think she suggests an i3lock derivation like:

{
  # usual deps here, plus:
  __usingModule ? false,
}:

lib.warnIf !__usingModule "programs.i3lock.enable required" stdenv.mkDerivation { ... }

and then default programs.i3lock.package to i3lock.override { __usingModule = true; }. Would that work?

9 Likes
  1. If this has to be moved from the core of PAM (reasonable), one also needs to justify why not create a pam-legacy-entries.nix module and move stuff there (or even pam-automatic-entries.nix if there are still entries that make more sense there).

  2. To claim that just installing i3lock is unsupported, any direct use of this package should warn. programs.i3lock.enable can override the package arguments to dismiss the warning (as can any alternative setup that takes care of PAM).

  3. Objecting in a blanket manner to self-merges of changes reviewed by someone else is absolutely going too far. You are feeling weary about creating an RFC about it because it will sink as it should.

  4. We probably have an unbalanced set of expectations about breaking changes in unstable in general, and unstable carries an inherent risk of breaking changes, but this specifically is a rather pointless breaking change for a breakage that only happens at runtime.

3 Likes

Not that it makes a difference to the point being made, but just to set the record straight, the module was added six years after I switched to NixOS (and started using the already well established xscreensaver package). The package itself appears to have been added in 2007, which would be 16 years before the module.

3 Likes

Thanks, I missed that. Edited my post.

Yeah, that is fair, and I didn’t mean this in a legalistic “this is Officially Unsupported and therefore you have no recourse” sort of way. It’s more that the modules weren’t broken by the change, and there was a reasonable-on-the-face-of-it assumption that users would be using the module. It is usually the case in nixpkgs that if a module exists for a piece of software, then that is the only fully supported way to use it. Rolling your own solution comes with a higher maintenance burden.

In this case, some users didn’t migrate when a module became available, and didn’t know that they were now “rolling their own solution”. All I’m trying to say here is that it’s an understandable mistake all around with some interesting lessons.

Thanks. This is a bug that we should fix before 25.05 release. (We can do this by having services.xserver.windowManager.i3.enable in turn set programs.i3lock.enable.) We should look for similar bugs in other affected modules, too. It sounds like we should also improve the release notes entry to specifically warn about certain usage patterns.

Technically, yes - but I wouldn’t feel comfortable approving that without sign-off from more experienced reviewers. Nixpkgs is more than just NixOS, and that approach blurs the boundary between packages and modules.

I might be misreading this, but it seems there’s an implication here and in other posts that this was deliberately a breaking change where it was deemed an acceptable tradeoff to break lock screens. I don’t think that’s the case. There was an unfortunate misunderstanding by both users and the author about how these tools are configured, and the degree of breakage was unexpected.

3 Likes