Gnome Display Manager fails to login until Wi-Fi connection is established

Hello,

In December, I switched from Arch Linux to NixOS, definitely one of the best decisions ever. :smiley:
And since then, I’ve been suffering from a problem that I didn’t have on Arch, which I hoped would be fixed with updates. However, 9 months and a new release (24.05) later, the problem persists.

If I try to log in before the Wi-Fi connection is established, GDM either kicks me back to the login screen or just presents me with a black or gray screen…

From this description, you can see that the problem is somewhat similar to this topic: GNOME session sometimes fails to load after login unless wifi is disabled from login screen

However, I don’t use tlp and I tried disabling Bluetooth on boot and the issue still persists.

My entire Nix configuration is publicly available, and you can check it out here: GitHub - MatejaMaric/nix-setup: My Nix Flake configuration

Here’s the output of journalctl --boot from a boot where GDM kicked me back to the login screen once and I was able to log in on the second attempt: boot logs: GDM failes to login until WiFi connection is established, kicked back to login screen · GitHub

What’s maybe worth mentioning is that I’m using Wayland (it’s the default on Gnome) and the entire home directory is unchanged from my Arch Linux install (I have a separate /home partition).

Thanks a lot for any help! :smiley:

Here’s another journalctl --boot log. This time I was presented with a gray screen and a working mouse pointer, so I switched to TTY 3 and saved this log to a file: journalctl --boot: GDM fails to login until WiFi connection is established, presented with a gray screen and a working mouse pointer · GitHub

That, sir, is almost asking for trouble.

I think I might have stumbled across the same problem. I tested some (ok, a lot of) gnome shell extensions, so I wasn’t surprised when I got kicked back at the login screen every now and then. I even got the unresponsive gray or black screen once or twice.

But a few days ago I disabled all extensions and still experienced the problem. Now, after finding this topic, I started waiting for the Wi-Fi connection before I log in, and since then it has worked flawlessly every single time. I even started to enable my shell extensions again, one by one.

In some of the failed login attempts (but never when a login worked) I found this in the journal:

Jul 16 22:56:33 nixos .gsd-rfkill-wra[2141]: Could not open rfkill device: Could not open RFKILL control device, please verify your installation

For testing, I replaced NetworkManager with systemd-networkd to configure the Wi-Fi. The problem still occurs, but less often, probably because it takes less time to establish the Wi-Fi connection.
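
A minimal sketch of what such a swap could look like in a NixOS configuration, for anyone who wants to repeat the test. This is an assumption about the setup, not the configuration actually used above: the unit name 40-wlan is made up, and the Wi-Fi credentials for wpa_supplicant are left out.

```nix
{ ... }:
{
  # Turn NetworkManager off so it does not manage the interface.
  networking.networkmanager.enable = false;

  # wpa_supplicant handles association; networks and credentials are not shown.
  networking.wireless.enable = true;

  # Let systemd-networkd configure addresses on wireless interfaces.
  systemd.network.enable = true;
  systemd.network.networks."40-wlan" = {  # hypothetical unit name
    matchConfig.Type = "wlan";
    networkConfig.DHCP = "yes";
  };
}
```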

I might have found a different workaround though. Disabling the involved systemd units seems to solve the issue:

  systemd.targets = {
    network-online.enable = lib.mkForce false;
  };
  systemd.services = {
    systemd-networkd-wait-online.enable = lib.mkForce false;
    NetworkManager-wait-online.enable = lib.mkForce false;
  };

It might have side effects on other services, so I can’t recommend it, but I have experienced none so far.
Also, disabling just the first one, network-online.target, is probably sufficient.
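
For reference, the narrower variant would look roughly like this. It is only a sketch, assuming the target alone really is enough.

```nix
{ lib, ... }:
{
  # Only the target is disabled; both wait-online services keep their defaults.
  systemd.targets.network-online.enable = lib.mkForce false;
}
```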

Oh wow, it seems that fixes it for me also…

I ran systemctl list-dependencies NetworkManager-wait-online.service --all --reverse to see what services were dependent on NetworkManager-wait-online and I was presented with the following result:

NetworkManager-wait-online.service
● └─network-online.target
●   └─multi-user.target
●     └─graphical.target

Then I searched for why multi-user.target depends on network-online.target and I found this immediately:

And it seems to still be the issue on nixos-24.05:

I’ll file a bug report in the next day or two; this is obviously not normal behavior…

It is known that multi-user.target should not depend on network-online.target, and there have been attempts to fix this (though, we had to revert because of some regressions we didn’t catch).

That said, it shouldn’t be causing this problem, so it’s quite surprising to me that disabling it fixes anything. Nothing related to the desktop should be ordered after network-online.target or multi-user.target. So either GDM or GNOME is doing something strange here.

It may not necessarily mean that it is gdm’s fault, but I couldn’t reproduce the problem with sddm (services.displayManager.sddm.wayland.enable).

With gdm and NetworkManager I mostly get thrown back to the login screen.
With gdm and systemd-networkd I mostly get a grey screen.

With the wifi connection established (if I wait a few seconds before login) or with network-online.target disabled, there is no problem.
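
To repeat the comparison, the display manager swap might look like the sketch below. Only services.displayManager.sddm.wayland.enable is named above; the other two option paths are assumed from NixOS 24.05 and may differ on other releases.

```nix
{ ... }:
{
  # Assumed 24.05 option path for turning GDM off.
  services.xserver.displayManager.gdm.enable = false;

  # SDDM with its greeter running on Wayland.
  services.displayManager.sddm.enable = true;
  services.displayManager.sddm.wayland.enable = true;
}
```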

I’m interested, what Wi-Fi card and driver do you use? Maybe it’s related in some way…
You can check that using the lspci -nnk command (found in the pciutils package).
Here’s mine:

0000:02:00.0 Network controller [0280]: Realtek Semiconductor Co., Ltd. RTL8822CE 802.11ac PCIe Wireless Network Adapter [10ec:c822]
        DeviceName: WLAN
        Subsystem: Hewlett-Packard Company Device [103c:85f7]
        Kernel driver in use: rtw_8822ce
        Kernel modules: rtw88_8822ce

I have an odd question… If you wait 5 seconds before attempting to login, and the wifi still hasn’t connected yet, does it still kick you out?

Here’s why I ask (it’s complicated, and spans several open issues on nixpkgs). The service getty@tty1.service has the systemd property Type=idle, which means that its start is delayed by up to 5 seconds while other systemd jobs are still pending. In this case, the pending job would be network-online.target. If you log in within those five seconds, GDM might start GNOME on tty1. But if getty@tty1.service starts after that point, it will kick GNOME off the tty and GNOME will crash. If you give getty@tty1.service the five seconds to start, this shouldn’t happen, because GNOME will start on a different tty.

This is only a hunch. I’m going to figure out how to test it on my setup.

EDIT: I have replicated the exact behavior I described by disconnecting my network switch from my router and logging in within those 5 seconds. I experienced the crash as expected. If I wait the 5 seconds, it does not crash.

I just tested it by removing the network-online.enable fix and unplugging my Wi-Fi router from the wall.
If I don’t wait for 5 seconds, GNOME kicks me out. However, if I wait for 5 seconds, it does not.

It seems it’s a good hunch. :smiley:

Ok, we’re actually currently working on a fix for this for a completely different reason :stuck_out_tongue: Turns out it’s the same bug, just discovered a totally different way. Hopefully, we’re going to switch to putting GDM on tty1 instead of tty7 so this whole mess never happens. There are some complications with that, but hopefully we’ll have it all figured out.

Nice :smiley:

Is there a Github issue that we can use to follow the progress?

I’m also interested in what the other issue caused by this is… :sweat_smile:

Here’s an old one, and here’s the one I found. You can see these are all similar, but a little different. In mine, it seemed to be caused by systemd-initrd+plymouth, in the old one it was just autologin, and in this case it was network-online.target. In fact, it’s all just because it’s hard to ensure that GNOME and getty@tty1.service don’t fight each other, so the solution to all of this will be to put GDM on tty1 and let its Conflicts=getty@tty${initialVT}.service do the work.

Yup, it’s the same issue. The following also fixes it:

systemd.services."getty@tty1".enable = false;
systemd.services."autovt@tty1".enable = false; 

Cited from:

Disabling getty on tty1 doesn’t solve the problem for me, at least not completely. Every now and then GNOME still crashes on login. It doesn’t happen often though, so it took a while to occur again.

Intel 3168NGW and iwlwifi

I tried to check, but it seems that NetworkManager doesn’t prevent reaching network-online.target if my Wi-Fi AP is not running.

Also gnome doesn’t always end up on the same vt. Sometimes gdm keeps displaying the login-screen on vt1 and gnome ends up on vt2 and sometimes gnome replaces gdm on vt1.

This is odd. GDM should only start on vt7, not vt1. That should be controlled by the initial-vt build parameter.

Hm, right now it is. Maybe I got the vt gdm is running on wrong. I only guessed from the agettys I was cycling through.
But I am sure gnome didn’t always end up on the same vt.

Right, gnome starting on an unpredictable VT is sort of where the problems are coming from. You can see what’s using which VT with loginctl list-sessions