Systemd freezing up in any new generation

I tried installing home-manager because it was recommended to me, but I messed up the installation and uninstalled it. The system ran fine. Days later I retried and failed again (multiple sources say different things, and I got confused and fumbled it), so I uninstalled it again. This time, when I ran nixos-rebuild, systemd just froze: no error messages, nothing. I thought it was just going slow and let it run, but 45 minutes later it was still frozen, so I rebooted into a previous generation, which worked fine.

I thought that rebuilding and switching from this old generation would solve my problems. Nope, the new rebuild also froze. Then I tried deleting all the generations I had created while trying to install home-manager. The console said they were deleted, and it freed up space on my computer, so something must have happened. But after I rebuilt and switched, they were still there, and to top it off, the rebuilt generation also froze.

So now I don’t know what to do. Do you guys have any ideas, or anywhere you can point me to?

System info:
OS: NixOS 22.11.2999.a7cc81913bb (Raccoon) x86_64
Host: ASUSTeK COMPUTER INC. X571GT
Kernel: 5.15.97
Packages: 1099 (nix-system), 861 (nix-user), 112 (nix-default)
Shell: bash 5.1.16
Resolution: 1920x1080
DE: Plasma
WM: KWin
Terminal: .konsole-wrappe
CPU: Intel i5-9300H (8) @ 4.100GHz
GPU: NVIDIA GeForce GTX 1650 Mobile / Max-Q
GPU: Intel CoffeeLake-H GT2 [UHD Graphics 630]
Memory: 16GB


Hopefully you have your configuration in a repository of some kind, so rolling back and trying to build an earlier config should help you identify the offending change(s).

If you don’t (you really should), you can try comparing /etc of a broken and a non-broken generation to see if that shows you anything.

kdiff3 /nix/var/nix/profiles/system-123-link/etc /nix/var/nix/profiles/system-128-link/etc or similar.
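
For a terminal-only comparison, the same check can be sketched with plain diff. This is only a minimal sketch: the function name diff_gen_etc is made up for illustration, and the generation numbers are whatever appears under /nix/var/nix/profiles on your machine.

```shell
# Compare the /etc trees of two system generations and list only the files that differ.
# Usage: diff_gen_etc <profiles-dir> <first-gen> <second-gen>
# (diff_gen_etc is a hypothetical helper name, not an existing tool.)
diff_gen_etc() {
    profiles=$1
    first=$2
    second=$3
    # -r recurses into subdirectories, -q reports only which files differ.
    # diff exits with 1 when differences exist and 2 on errors such as dangling symlinks.
    diff -rq "$profiles/system-${first}-link/etc/" "$profiles/system-${second}-link/etc/"
}
```

For example, diff_gen_etc /nix/var/nix/profiles 123 128 prints one line per differing file, which gives a short list to open individually.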

The only thing I have in a repo is my configuration.nix file

So, I ran sudo kdiff3 /nix/var/nix/profiles/system-39-link/etc /nix/var/nix/profiles/system-34-link/etc to compare the most recent broken generation with the last working generation. I got this:
An error while opening kdiff:
Error: Some files could not be processed
Opening /nix/var/nix/profiles/system-39-link/etc/systemd/system/getty.target.wants/autovt@tty1.service failed. No such file or directory

Number of different files: 12
These were the files with differences:

1: File path /dbus-1/session.conf
2: File path /dbus-1/system.conf
3: File path /systemd/system/accounts-daemon.service.d/overrides.conf
4: File path /systemd/system/dbus.service.d/overrides.conf
5: File path /systemd/system/multi-user.target.wants/cpufreq.service
6: File path /systemd/system/polkit.service.d/overrides.conf
7: File path /systemd/system/systemd-fsck@.service.d/overrides.conf
8: File path /systemd/system/cpufreq.service
9: File path /systemd/user/dbus.service.d/overrides.conf
10: File path /issue
11: File path /os-release
12: And a missing file in both Gens

For all files but 5 and 8 all the differences look something like this:
Gen 34: X-Restart-Triggers=/nix/store/k66r6h5i8cinx5i09lpc5ga04dj67kb6-dbus-1
where k66r6h5i8cinx5i09lpc5ga04dj67kb6 changed to xjfir7861yhmszv05ivb2gakb29aal27 in Gen 39
They don’t change to the same string in all files, but all occurrences in the same file change to the same string

In files 5 and 8 the changes on top of that also include a number change from 97 to 99 in all occurrences of cpupower-5.15.97

Files Issue and os-release change nix BUILD-ID from 2999.a7cc81913bb to 3042.5eb98948b66

When you talk about keeping my config in a repo: if it’s not just the configuration.nix file, what else should I include?

You shouldn’t have to run kdiff3 with sudo.

The only thing I have in a repo is my configuration.nix file

You are not importing any other files? The installer as an example also creates a hardware specific file. Did that change? Or did you just incorporate the changes into configuration.nix?

At some point it is no longer feasible to keep everything in one file. From $JOB configuration repo:

 $ find . -name '*.nix' | wc -l
520

You can also try running the installer again to see if you can at least bring that up successfully. You can then add your changes bit by bit.

Could your kernel have changed between builds?

Yesterday I rebuilt an x86_64 system using a slightly newer revision of nixpkgs. The system appears to freeze during boot. (What actually happens is that the process responsible for switching from (or maybe to) the kernel framebuffer display system is hung.)

That same system config built using nixpkgs from last week worked fine. In fact, that same config using nixpkgs from today works fine. (I’m tracking nixpkgs branch nixos-22.11.)

Maybe if you just rebuild yet again your system will magically work. :wink:

Opening kdiff without sudo gives three errors: Some files could not be processed
Opening /nix/var/nix/profiles/system-39-link/etc/cups/subscriptions.conf failed. Permission denied
Opening /nix/var/nix/profiles/system-39-link/etc/cups/subscriptions.conf.O failed. Permission denied
Opening /nix/var/nix/profiles/system-39-link/etc/systemd/system/getty.target.wants/autovt@tty1.service failed. No such file or directory
But yeah, it still opens, sorry.

No, I’m not importing anything else, and the hardware nix file didn’t change at all between any gens I tested

So I’ve noticed that the cpupower-5.15.97 string that changed in the cpufreq files actually lines up with the Linux kernel version, and all generations with the most recent .99 version don’t work.

So that’s why no new generation works when I build: they all build against the new Linux kernel version. How can I force new generations to either stop building against the new kernel, or make the new kernel work?

I tried building today and nothing changed. The new generation still builds against the .99 kernel and still freezes on startup.
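
One way to double-check this kind of observation is to print the kernel each generation points at. The sketch below assumes every system generation has a kernel symlink at its top level (as /run/current-system/kernel does); list_gen_kernels is a made-up name.

```shell
# Print each system generation next to the kernel image it boots.
# Usage: list_gen_kernels [profiles-dir]   (defaults to /nix/var/nix/profiles)
# (list_gen_kernels is a hypothetical helper name, not an existing tool.)
list_gen_kernels() {
    profiles=${1:-/nix/var/nix/profiles}
    for link in "$profiles"/system-*-link; do
        # Skip generations without a kernel link (and the unexpanded glob, if nothing matched).
        [ -e "$link/kernel" ] || continue
        # The resolved store path contains the kernel version, e.g. ...-linux-5.15.97/bzImage.
        printf '%s\t%s\n' "${link##*/}" "$(readlink -f "$link/kernel")"
    done
}
```

Running list_gen_kernels with no arguments should show at a glance which generations were built against 5.15.97 and which against 5.15.99.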

Add to your config boot.kernelPackages = pkgs.linuxPackages_5_15_97 where 5_15_97 represents the kernel version in your nix channel closest to the one you want. You can pull from a different channel if what you want isn’t available, but describing how to do that is more complicated. (I did this a bit last year when Intel kept breaking their wifi/bt driver; it was annoying to maintain.)

If you already have a line for boot.kernelPackages, first try removing that. Otherwise, I’d probably just try linuxPackages_6 to see if that works without further qualification.

For details, see NixOS manual Kernel Config.

I tried using both pkgs.linuxPackages_VERSION and pkgs.linuxKernel.packages.linux_VERSION,
but when I specify the full version 5_15_97, both give me this error:

`error: attribute 'linux_5_15_97' missing

   at /etc/nixos/configuration.nix:98:25:

       97|   # Chooses boot kernel
       98|   boot.kernelPackages = pkgs.linuxKernel.packages.linux_5_15_97;
         |                         ^
       99|

(use '--show-trace' to show detailed location information)`

Using --show-trace with nixos-rebuild gives this massive wall of text: --show-trace results - JustPaste.it

I also tried Linux kernel version 6, and that did not seem to freeze on systemd. To my disappointment, though, as soon as the system came up I had no desktop environment and it just booted me into the NixOS console. I could log in as my user just fine, but it was all console, so yeah.

I also tried building with version 5_15_99, which is the version where systemd freezes, but got the same error as the first one.
I also tried building with just 5_15 and no follow-up version specified. It built, but it was a .99 and it froze like the others.
I also tried 5_15_102, because I saw it was available on the kernel.org site, but still got the same attribute missing error.
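
Rather than guessing attribute names, it may be quicker to ask nixpkgs which kernel package sets it defines. The sketch below is an assumption-laden illustration: it relies on pkgs.linuxKernel.packages (the set used in the error above) being listable with builtins.attrNames, on <nixpkgs> resolving to the channel nixos-rebuild uses, and both function names are made up.

```shell
# Turn a JSON array of attribute names (read from stdin) into one linux_* name per line.
# (kernel_attrs_from_json and list_kernel_attrs are hypothetical helper names.)
kernel_attrs_from_json() {
    # Split on commas, strip brackets and quotes, keep only the kernel attributes.
    tr ',' '\n' | sed 's/[]["]//g' | grep '^linux_'
}

# Ask the nixpkgs found via NIX_PATH for the attribute names under linuxKernel.packages.
list_kernel_attrs() {
    nix-instantiate --eval --strict --json \
        -E 'builtins.attrNames (import <nixpkgs> {}).linuxKernel.packages' \
        | kernel_attrs_from_json
}
```

If this works as assumed, list_kernel_attrs prints names such as linux_5_15 or linux_6_1 with no patch-level suffix, which would explain why 5_15 builds while 5_15_97 is reported missing.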

I’ve been assuming that you are using a typical NixOS installation based on channel nixos-22.11. Probably we should verify that by examining the output of nix-channel --list.

Reading about nix-channel suggests you could use nix-channel --update to grab the latest revision of nixpkgs for your channel, and use nix-channel --rollback to undo prior updates.

Maybe nix-channel --update is needed to make kernel 5.15.102 available.

Unfortunately I have little experience with nix channels. I’ve always used nix flakes with pinned revisions of nixpkgs. I do my best to suggest things for you to try, but I am not able to actually try them myself.

Looking at nixpkgs source I see a bunch of kernel versions which should be available. However, if I recall correctly, it can take some time before a change to the source tree makes it through Hydra to become available for binary download.

It might be easiest to try some of the kernel minor versions listed in the source, such as 5_10, 6_0, or 6_1. Also, if 5_15_102 is not available, maybe 5_15_101 is. FYI, my main laptop is running kernel v6.1.16 – the latest compatible with ZFS when I upgraded my system yesterday. However, the laptop where I recently had boot problems now is running kernel v5.15.101.

Hopefully nix-channel --update will get you to version 5.15.102 and all your problems will be solved. :wink:

So, I rolled back all my nix-channel updates (there were two), then did a nix-channel --update. The 102 and 101 kernels still couldn’t be found, but OK. Then I tried building 6_0, 6_1, 6_2 and 5_10. All four built fine, but when booted, all of them lacked a graphical environment and dropped me at the NixOS command line: a big black screen with a login prompt.

I searched for how to start KDE or a graphical environment and tried a solution from 2005 using startx and init 5, which didn’t work. Then I went to the KDE site, where it said to use systemsettings to start KDE, but that gave me:

qt.qpa.xcb: could not connect to display

qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "" even though it was found

also tried reinstalling xcb with this exact name as it was listed in the available plugins list and just got
selector 'xcb' matches no derivations
welp, sadge

I suspect something is wrong with loading the necessary graphics drivers. Probably you could see details about that in the console by running journalctl -b, then press G to go to the end, then u to back up through the pages looking for error messages. If you are lucky, they will be red.
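
To skip the paging, journalctl can filter by priority directly. A minimal sketch, with boot_errors as a made-up wrapper name:

```shell
# Show only messages of priority "err" or worse from a single boot.
# Usage: boot_errors [offset]   0 = current boot (default), -1 = the boot before it
# (boot_errors is a hypothetical helper name, not an existing tool.)
boot_errors() {
    journalctl --boot="${1:-0}" --priority=err --no-pager
}
```

After a failed graphical boot, logging in on the console and running boot_errors should list the errors from that boot. After rebooting into a working generation, boot_errors -1 shows the errors from the previous, broken boot, provided the journal is persistent.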

Knowing the details can offer clues but may not really be what you need. Can you share your full configuration.nix (or whatever minimalist version you keep trying)? There ought to be some lines specifying kernel modules for graphics, and some lines for KDE.

You also might try a different Desktop Environment in case KDE is related to the problem (or xserver versus wayland). I use XFCE with a fairly minimalist config along the lines of

  services.xserver.enable = true;
  services.xserver.displayManager.defaultSession = "xfce";
  services.xserver.desktopManager.xfce.enable = true;

My kernel modules also are minimal. I think the only one related to graphics is boot.initrd.availableKernelModules = [ "xhci_pci" ];. But if you have an NVIDIA card, graphics drivers get more complicated.

Hopefully building up from a minimal working system will be easier.

Or, maybe go back to reinstall from scratch if you don’t have any data to lose.

So I was exploring journalctl and found the errors:
I got multiple
Process 1055 (.xdg-desktop-po) of user 175 dumped core.

x86/cpu: SGX disabled by BIOS.

also got this gigantic wall of errors Wall of errors - JustPaste.it

Here is my configuration.nix configuration.nix - JustPaste.it

I do have a nvidia graphics card

If all goes wrong I have all my data on the cloud, so it’s not a problem to reinstall from scratch, but I want to learn more about the system, otherwise I probably would have done that already

xdg-desktop-po dumping core could be relevant – deciding would depend on what’s in the dump.

SGX disabled is reasonable, I think. (It’s just Intel security extensions.)

The “Wall of errors” all look to be the same (or similar). Searching online for dswload2-326 leads to a bunch of links calling them out as irrelevant firmware bugs. This kernel.org thread is an example.

configuration.nix looks fine as far as I can tell.

I think that you have to run sudo nix-channel --update to affect what is used by sudo nixos-rebuild. Without sudo you affect your user environment but not the OS, if I recall correctly. Maybe that will get you the newer kernel.

I reread this thread from the beginning looking for more clues. Here are some bits I want to clean up…

Originally you mentioned deleting generations to free space. My experience with this is limited, but I have found that to remove an OS generation from the boot screen, I must manually delete it from /nix/var/nix/profiles/ and then run sudo nixos-rebuild boot. Any symlink in /nix/var/nix/profiles/ will keep the item in the boot screen list and hold the binaries in the store. (Possibly there is a nix command to remove these system profile links, but I don’t know it.)
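
For what it’s worth, nix-env can operate on the system profile directly, which avoids deleting the links by hand. A hedged sketch follows: the function names are made up, the commands need to run from a root shell, and the generation numbers are examples only.

```shell
# List the generations recorded in the system profile.
# (list_system_generations and delete_system_generations are hypothetical helper names.)
list_system_generations() {
    nix-env --profile /nix/var/nix/profiles/system --list-generations
}

# Delete the given generation numbers, then regenerate the boot menu entries.
# Usage (from a root shell): delete_system_generations 35 36 37
delete_system_generations() {
    nix-env --profile /nix/var/nix/profiles/system --delete-generations "$@" &&
        nixos-rebuild boot
}
```

Deleting a generation only removes its profile link. The store paths it referenced are freed later by nix-collect-garbage, and only once nothing else refers to them.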

When you saw the message error: attribute 'linux_5_15_97' missing, it probably meant that kernel version 5.15.97 was not in the nixpkgs revision specified by your channel. You just have to try a different version to find one which is available (or check the nixpkgs source). Keep in mind that the channel list used is the one for the user running the nix command, which is root when running sudo nixos-rebuild.

If sudo nix-channel --update does not get you kernel v5.15.102, you might try adding the unstable channel with sudo nix-channel --add. If I recall correctly, more kernel versions are available in unstable and the channel really isn’t all that unstable. (Many of us use unstable for user packages while using a release channel for OS builds. I used unstable for OS builds for months until the Intel wifi driver fixes finally landed in a release channel.)

If you get this far and still it is not working, try changing your desktop manager. As I searched for info on bits of this thread (like sddm and qt.qpa.xcb) I was seeing mentions of KDE-related problems. I don’t think KDE is bad, but it’s an easy thing to swap to narrow the problem-space.

Hopefully something I am suggesting will reveal more useful info, if not actually solve the problem.


I was already running it as sudo but to make sure, I added unstable and it also gave me the could not be found error

But I tried swapping DE from KDE to GNOME and it worked, now my 6.1 build is running on a graphical interface just fine.
So now I can actually build my system again yay!
Maybe the problem was a mix of nvidia drivers and KDE clashing in a weird way, I will still investigate more.
But thanks for the help, I learned a lot through this post

Just as one last ask, could you point me to any docs or posts about how to not fk up while switching from a DE to a window manager? I didn’t want to use KDE, and now GNOME, so that would help a lot. Thanks again for the help :slight_smile:

I’ve learned a lot, too. Getting started with NixOS is hard, but the more I use it, the more I do not want to go back.

What I know of nix comes from living with it for the last two years, reading the docs, wiki, forum, various blogs and the source code. Many of the best resources are the links posted here in the forums. Learning nix takes a lot of persistence – at least a thousand hours. Only recently have I felt like I understand enough to do more than minor tweaks. (I built a nix tool to make a system boot image using ZFS so I don’t ever have to run the installer anymore.)

Honestly, I don’t really have much detailed knowledge of the different DEs. I don’t have any good references for you. I do as much as I can in a terminal. My DE use is mostly limited to Firefox, Thunderbird, VLC and some limited VSCode.

What I think I know about Linux desktop environments is that the different DEs should play nice together. You should be able to load any of them and use the UI of the login screen to choose which one you want at that time. Each DE has its own way of managing its settings – they should not conflict.

Years ago I liked KDE for being more modern and performant. But its foundations are different from the others. Researching your issues I found more recent problems linked to KDE than gnome, which is why I encouraged you to try another. Maybe those problems will be gone the next time you try KDE.

Most other common DEs are based on some version of GNOME. A few years ago Ubuntu abandoned their own DE and poured all their DE resources into fixing GNOME. Consequently, GNOME has addressed many of the issues that had held it back. I hear that modern GNOME is a decent, performant DE.

Another important consideration is the migration from the Xorg display system to Wayland. This migration has been long and painful. I hear Wayland works well now but issues still crop up. I’m not sure if your system was running Wayland or X, but KDE seems a bit more aggressive about adopting Wayland. Possibly this is related to the problems you encountered.

Lastly, NVidia has not been supportive of Linux much; although maybe that is changing. You have to choose to use non-free nvidia drivers or the generic Linux alternative. There are trade-offs either way. I do not know what you are running. (Possibly that is in hardware-configuration.nix.) If your hardware supports running the display on the Intel graphics, it’d be worthwhile for you to learn how to disable nvidia for diagnosing problems.

I’ve found a lot of value in frequently reading the journal. This leads to knowing what’s unusual and clues to guessing what might be going wrong. It’s also quite impressive as a bit of developer tooling – something to aspire to in my own projects.