In December, I switched from Arch Linux to NixOS, definitely one of the best decisions ever.
And since then, I’ve been suffering from a problem that I didn’t have on Arch, which I hoped would be fixed with updates. However, 9 months and a new release (24.04) later, the problem persists.
If I try to log in before the Wi-Fi connection is established, GDM either kicks me back to the login screen or just presents me with a black or gray screen…
What’s maybe worth mentioning is that I’m using Wayland (it’s the default on Gnome) and the entire home directory is unchanged from my Arch Linux install (I have a separate /home partition).
I think I might have stumbled across the same problem. I tested some (ok, a lot of) GNOME Shell extensions, so I wasn’t surprised when I got kicked back to the login screen every now and then. I even got the unresponsive gray or black screen once or twice.
But a few days ago I disabled all extensions and still experienced the problem. Now, after finding this topic, I started waiting for the wifi connection before logging in, and since then it has worked flawlessly every single time. I even started to enable my shell extensions again, one by one.
In some of the failed login attempts (but never when a login worked) I found this in the journal:
Jul 16 22:56:33 nixos .gsd-rfkill-wra[2141]: Could not open rfkill device: Could not open RFKILL control device, please verify your installation
For testing I replaced NetworkManager with systemd-networkd to configure the wifi. The problem still occurs, but less often, probably because it takes less time to establish the wifi connection.
I might have found a different workaround though. Disabling the involved systemd units seems to solve the issue.
It might have side effects on other services, so I can’t recommend it, but I have experienced none so far.
Also, disabling only the first one, network-online.target, is probably sufficient.
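For anyone reading along, here is a minimal sketch of what such a workaround might look like in configuration.nix. The unit names are assumptions taken from the units mentioned in this thread (network-online.target and NetworkManager-wait-online.service), so this is not necessarily the exact configuration used above.

```nix
{ ... }:
{
  # Assumption: these are the units involved, based on the names that
  # appear in this thread. Disabling a unit this way masks it.
  systemd.targets.network-online.enable = false;

  # Possibly unnecessary if the target above is already disabled.
  systemd.services.NetworkManager-wait-online.enable = false;
}
```

As noted, other services that order themselves after network-online.target may behave differently with this in place, so treat it as an experiment rather than a fix.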
I ran systemctl list-dependencies NetworkManager-wait-online.service --all --reverse to see what services were dependent on NetworkManager-wait-online and I was presented with the following result:
It is known that multi-user.target should not depend on network-online.target, and there have been attempts to fix this (though we had to revert them because of some regressions we didn’t catch).
That said, it shouldn’t be causing this problem, so it’s quite surprising to me that disabling it fixes anything. Nothing related to the desktop should be ordered after network-online.target or multi-user.target. So either GDM or GNOME is doing something strange here.
I’m interested, what Wi-Fi card and driver do you use? Maybe it’s related in some way…
You can check that using the lspci -nnk command (found in the pciutils package).
Here’s mine:
I have an odd question… If you wait 5 seconds before attempting to log in, and the wifi still hasn’t connected yet, does it still kick you out?
Here’s why I ask (it’s complicated, and spans several open issues on nixpkgs). The service getty@tty1.service has the systemd property Type=idle, which means its start is delayed until all pending systemd jobs have been dispatched, for up to 5 seconds. In this case, the pending job would be network-online.target. If you log in within those five seconds, GDM might start GNOME on tty1. But if getty@tty1.service starts after that point, it will kick GNOME off the tty and GNOME will crash. If you give getty@tty1.service the five seconds it needs to start, this shouldn’t happen, because GNOME will start on a different tty.
This is only a hunch. I’m going to figure out how to test it on my setup.
EDIT: I have replicated the exact behavior I described by disconnecting my network switch from my router and logging in within those 5 seconds. I experienced the crash as expected. If I wait the 5 seconds, it does not crash.
I just tested it by removing the network-online.enable fix and unplugging my Wi-Fi router from the wall.
If I don’t wait for 5 seconds, GNOME will kick me out. However, if I wait for 5 seconds, it will not kick me out.
Ok, we’re actually currently working on a fix for this for a completely different reason. Turns out it’s the same bug, just discovered in a totally different way. Hopefully, we’re going to switch to putting GDM on tty1 instead of tty7 so this whole mess never happens. There are some complications with that, but hopefully we’ll have it all figured out.
Here’s an old one, and here’s the one I found. You can see these are all similar, but a little different. In mine, it seemed to be caused by systemd-initrd+plymouth, in the old one it was just autologin, and in this case it was network-online.target. In fact, it all comes down to how hard it is to ensure that GNOME and getty@tty1.service don’t fight each other, so the solution to all of this will be to put GDM on tty1 and let its Conflicts=getty@tty${initialVT}.service do the work.
Disabling getty on tty1 doesn’t solve the problem for me, at least not completely. Every now and then GNOME still crashes on login. It doesn’t happen often though, so it took a while to occur again.
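For reference, disabling getty on tty1 is usually written along these lines in configuration.nix. This is a sketch of the commonly used GNOME autologin workaround; the autovt@tty1 line is an assumption beyond what is described above, included because the two units are normally disabled together.

```nix
{ ... }:
{
  # Keep getty from ever being started on tty1, so it cannot take the
  # VT away from a GNOME session that landed there.
  systemd.services."getty@tty1".enable = false;

  # Assumption: also disable the on-demand autovt instance for tty1.
  systemd.services."autovt@tty1".enable = false;
}
```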
Intel 3168NGW and iwlwifi
I tried to check, but it seems that network-manager doesn’t prevent reaching network-online.target if my wifi ap is not running.
Also gnome doesn’t always end up on the same vt. Sometimes gdm keeps displaying the login-screen on vt1 and gnome ends up on vt2 and sometimes gnome replaces gdm on vt1.
Hm, right now it is. Maybe I got the vt gdm is running on wrong. I only guessed from the agettys I was cycling through.
But I am sure gnome didn’t always end up on the same vt.
Right, GNOME starting on an unpredictable VT is sort of where the problems are coming from. You can see which session is using which VT with loginctl list-sessions.