A recent kernel update broke my system. I found a workaround by upgrading the kernel to the latest version (6.6.2). But since I changed the kernel version, sudo (and su, for that matter) is broken. Both fail instantly without asking for a password, and for sudo the log shows that all three attempts failed.
Currently, the only option I see is to go back to the last working kernel version, 6.1.55, but I don’t know how to do that.
What could be the cause of this, and how can I fix it?
After a bit of looking around, I found that /etc/pam.d/sudo and /etc/pam.d/su were both empty after nixos-rebuild. It looks like something in the config file is wiping those files. I haven’t checked all the pam.d/* files yet, but this is definitely not normal! For now, as a workaround, I am writing the PAM config by hand in my configuration.nix file by copying the previous values. I’ll post an update on what exactly I have done tomorrow.
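To check the remaining pam.d/* files in one go, something like the following might help. It is only a sketch: the function name list_empty_pam_files is made up for illustration, and it assumes a find that supports -L and -maxdepth (GNU findutils does). The -L flag matters on NixOS because the entries under /etc/pam.d are symlinks, so the size has to be read from the link target.

```shell
# Hypothetical helper: print every zero-byte file directly inside a directory.
# -L follows symlinks, so a symlink pointing at an empty file is reported too.
list_empty_pam_files() {
  dir="${1:-/etc/pam.d}"
  find -L "$dir" -maxdepth 1 -type f -size 0
}

# Example call (no root needed, the files are world-readable):
#   list_empty_pam_files /etc/pam.d
```

If this prints nothing, no file in the directory is empty. If it lists only sudo and su, the problem is limited to those two services.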
I’ll post a link to the issue I opened on GitHub, which has a fix for this bug, even though it is not meant to be permanent.
I’ll add that any new generation built after the kernel upgrade wipes those files, so this seems like an issue with NixOS. I highly doubt that my config file is the cause, because I did not touch sudo in any way. I do wonder whether everyone has those files on their system, though I’m pretty sure everyone does.
Having built thousands of generations over the years with workstations tracking unstable and servers on whatever is the current stable - and having built latest unstable just now, I have neither seen empty /etc/pam.d/* files nor heard anyone mention that. This is either a misconfiguration on your part or faulty hardware.
You can check your hardware with tools like memtest (RAM) and smartmontools (HDD).
To rule out a misconfiguration, strip your configuration.nix file down to the minimum (you can comment parts out with /* */ and/or #),
then re-add one config option at a time.
If your system config is done as a flake, you can also try this:
cd $where_your_config_is
nix repl
# Then inside the repl, load the flake config
:lf .
nixosConfigurations.your-host-name.config.environment.etc."pam.d/su".text
# this should give you null. If you have an empty string instead, you know something is messing with it
nixosConfigurations.your-host-name.config.environment.etc."pam.d/su".source
# this should give you a derivation (your store path will differ from mine)
«derivation /nix/store/y8zw8708p64f579kz6pfi0bfszhnl419-su.pam.drv»
# now take a look at it outside the repl
cat /nix/store/y8zw8708p64f579kz6pfi0bfszhnl419-su.pam.drv
# look for the filename following "out" and cat that:
cat /nix/store/12wav088kp22snm01wc4pbkbfy27mfz4-su.pam
If the last file has (correct) data, it’s something to do with the activation scripts.
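That last check can be put into a small script. This is a rough sketch, not an official tool: the function name check_pam_file and its return codes are invented here, and the two arguments are the built file from the store and the live file under /etc/pam.d.

```shell
# Hypothetical helper: compare the built PAM file with the one that is live.
# Returns 0 if both match, 1 if the built file is empty or missing,
# and 2 if the built file has data but the live file differs.
check_pam_file() {
  built="$1"
  live="$2"
  if [ ! -s "$built" ]; then
    echo "built file is empty or missing: the configuration produces it that way"
    return 1
  fi
  if cmp -s "$built" "$live"; then
    echo "live file matches the built file"
    return 0
  fi
  echo "live file differs from the built file: look at the activation scripts"
  return 2
}

# Example call (substitute the "out" path from your own .drv file):
#   check_pam_file /nix/store/12wav088kp22snm01wc4pbkbfy27mfz4-su.pam /etc/pam.d/su
```

A return code of 1 points back at the configuration, while 2 matches the case described above where the store file is fine but /etc is not.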
I already tried this before, and it doesn’t resolve the issue. Besides, everything worked prior to the kernel upgrade without any need to configure sudo explicitly, and the problem doesn’t only come from sudo but also from su.