NixOS Gnome installer crashes

Hi everyone.

This is the new account for Sherlock Xia.

Just an update:

Context:
https://discourse.nixos.org/t/help-i-want-to-reinstall-nixos-with-btrfs-subvolumes/58603

I was planning to reinstall NixOS with btrfs subvolumes; however, the ISO installer crashed while I was using it and produced the error messages in the photo.

I would like to know whether it is the same amdgpu problem that @lopter mentioned in the following topic that I made earlier: https://discourse.nixos.org/t/help-nixos-kde-plasma-6-kept-crashing-and-freezing-even-in-tty/58425/36

I asked ChatGPT and it suggested using an installer with a newer kernel, which might already have fixed the issue. I wonder if NixOS has something like that?

Please don’t question my hardware, I have got another nvme ssd with windows 11 installed with ZERO system crashes. I also passed the memory test with AMD EXPO on.

Here is my fastfetch result in my previous build.

Thanks for reading.

BTW, I am getting tired of debugging this system trying to make it work for me.

I was using Linux Mint for the past 5 years with ZERO system crashes. I would like to spend more time actually using the system instead of frustratingly debugging it.

Hello,

The latest traces from the Nix installer you’ve posted aren’t from the amdgpu driver. They are (serious) warnings from the RCU code rather than crashes. This is indicated under -------[ cut here ]-------, and also below that under Call Trace.

You could try running Linux 6.12 by setting boot.kernelPackages = pkgs.linuxPackages_6_12; in configuration.nix and see if it changes anything.

Thanks for your reply.

What might cause the issue?

I undervolted my CPU to 1.1V and it ran fine on Windows 11. Could that be the cause of the issue on NixOS?

Furthermore, all the errors that I have experienced with NixOS in the past 15 days have been really inconsistent, which makes it difficult to pinpoint the issue or issues.

Oh god…

Please don’t make it a cluster of issues, I am just a noob, not a master debugger.

EDIT 1: I asked chatgpt and it still points the possible problem to the linux kernel.

Yes, this can be an additional source of problems. As soon as you do overclocking, undervolting or anything of that sort, you can’t assume that application B will work because application A did.

We aren’t making it that. Computers and software sit at the top of the supply chain; they are inherently a cluster of issues.

We’ve already shown you how to try different kernels.

@lopter

I tried resetting the cpu voltage to default. It will still crash randomly.

Today the problem changed, the gnome desktop in the iso image will crash and return to the login screen.

I would like to, but even the thumbdrive ISO crashes randomly so I can’t reinstall the system.

I would like to have a copy of the ISO image with the latest Linux kernel. Can I ask the NixOS development team to build one?

A politician is a master at lying and is still overshadowed by the utter bullshit chatGPT can confidently spit out in every prompt. Take all of it with a grain of salt.

If you were under normal circumstances, it might be trustworthy but the fact you are running a hardware voltage mod puts you well outside the norm that chatGPT is trained for.
Assume anything you cant already personally verify is bad info.

First, you need to completely reset your BIOS back to factory settings and set your RAM XMP profile (EXPO on AMD) again to be sure your mods are gone, but there’s a chance at this point you’ve already permanently damaged your proc.
Undervolting can be just as bad for hardware as overvolting.

Reading that error code you got, I can tell a register that should have data in it didn’t get loaded correctly for the instruction to fully run, and it errors out with a crash. An undervolt could very well explain that problem: insufficient power when the function is called, so the memory didn’t get charged correctly, or at all.

Then you shouldnt be fucking with clock speed and voltages in the first place.

After you reset your BIOS and confirm your XMP/EXPO profile for memory, make a new USB stick with a new image downloaded fresh.
Odds are, if you live booted that stick and had a full crash (like is indicated above), some of the blocks probably got smashed up and are no longer readable, which is causing your intermittent crashing post volt reset. Again, full BIOS reset. Don’t mess with voltages unless you KNOW what you are doing or you WILL destroy your PC eventually.
I do know what im doing and ive personally destroyed several procs, and once set a GTX Titan on fire in my tower.
Dont fuck with voltages without a fire extinguisher handy.

EDIT to add:

Theres alot to unpack here but the short answer.
No.
Longer answer. Still No.

Linux kernel

By default, the latest LTS linux kernel is installed (Linux Kernel Version History).

Source: Linux kernel - NixOS Wiki
The default kernel is the Linux LTS kernel, which is the extended support kernel that’s standard in a lot of distros.
Compiling with the newest kernel version will not solve your problem; in fact it’s likely to add a lot more.

Thank you for your reply.

I kept the original ISO image on my Windows 11 system and I just reset the BIOS settings.

I will update after I try out the newly made boot thumbdrive.

Let us know how it goes. As for btrfs, it’s better to configure it in a shell, as the GUI can be a bit clunky on that point, especially if you have several drives or are setting up a RAID. For a single drive it should be fine to use the GUI.

Otherwise, for manual btrfs config read this carefully: Btrfs - NixOS Wiki
Be very careful not to skip steps or get ahead of yourself, or it won’t boot right, or at all for that matter. Took me 4 attempts my first time to get it really right, but I was doing a RAID on 3 NVMe’s, which complicated everything a lot, so you probably won’t have as many issues.

@Mephist0phel3s

Thank you so much for pointing out that my iso image was corrupted after the crash.

After hours of work, I successfully reinstalled NixOS with btrfs subvolumes.
I used GParted to format and partition the drive, and then the btrfs command to create and mount the subvolumes, and it all ran well.
The iso crashed after I finished setting up all the partitions so I had to remake the iso thumbdrive and reinstall.
The system was successfully reinstalled yesterday.

However, during test use today it still suffered issues similar to those of the previous install.

I posted the dmesg -w results in the following post as well as my configuration.nix content if anyone can point out any issues or possible improvements to be made.

https://discourse.nixos.org/t/help-system-still-unstable-after-reinstall/58834

I also just updated the bios in Windows 11.

My hardware:
Mobo: Sapphire pulse B650M Wifi
GPU: Sapphire Radeon 7900XT
CPU: AMD 9700X
RAM: Crucial 24GB * 2 6000MHz

I have a strong suspicion you did in fact damage your processor with the undervolt, you need to run a hard stress test to see if it breaks under conditions that are known to be stable.

You still have windows on a separate drive or another drive you can slap in with a windows install?
If so you can use the stress tester built for AMD cpus baked into the ryzen master application. Dont mess with clock or voltages, just do a stability/stress test on the CPU.
FurMark, Cinebench, or Heaven are also good.

Let it run for 4 hours, try to record the output if you can at all. If it can run for 4 hours under stress and still be aight, we good and can narrow the problem down in Nix directly.

If you want to try a stress test in Nix directly instead, that is also possible and you have a few options.
Check out: Stress testing - ArchWiki
But i would personally suggest just using xmrig, its realistic, stable and can be run from a tty console directly so we dont have to mess with kwin or compositors during the test.
Get it booted, go into TTY mode and run xmrig as sudo. You can get a temp package for this, we dont need it forever. Just run:

nix-shell -p xmrig
sudo xmrig --stress

Let it run for 4 hours, you can get a dump of the logs from dmesg and journalctl after you come out of tty or after a crash.
If/when it does crash on nix, get a complete log from dmesg and journalctl from start to finish and put it in a paste bin and copy here.
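
A rough sketch of that log collection step, assuming a systemd journal that persists across reboots. The helper name collect_logs and the default output directory are made up for illustration. After a crash and reboot, dmesg only covers the boot that is currently running, so the crashed boot has to be pulled from the journal with -b -1.

```shell
# collect_logs [OUT_DIR]
# Hypothetical helper: dumps kernel and journal logs into OUT_DIR
# so they can be copied into a pastebin afterwards.
collect_logs() {
    out_dir=${1:-./crash-logs}
    mkdir -p "$out_dir" || return 1

    # Kernel ring buffer of the boot that is running right now.
    dmesg > "$out_dir/dmesg-current-boot.txt" 2>&1 || true

    # Full journal of the previous boot (-b -1), i.e. the one that crashed.
    journalctl -b -1 --no-pager > "$out_dir/journal-previous-boot.txt" 2>&1 || true

    # Kernel messages only (-k) from that same previous boot.
    journalctl -k -b -1 --no-pager > "$out_dir/kernel-previous-boot.txt" 2>&1 || true

    echo "logs written to $out_dir"
}
```

Running it from a root shell is probably the safest bet, since reading dmesg can be restricted for normal users. If a command fails, its error message ends up in the corresponding file instead of aborting the whole run.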

If you run on windows, get a snapshot of the event viewer in admin mode before during and after the stability test. Windows can sometimes compensate for damaged hardware but will normally still produce an error with a stack trace somewhere in event viewer.

@Mephist0phel3s

Just ran the Cinebench 30-minute stability test with no issues, and my test score is higher than even an AMD Threadripper.



At this point I would say something is definitely wrong with my NixOS install.

EDIT: Sorry for the Chinese in the event viewer, there is no error within the last hour when I ran cinebench

Alright, so def a Linux problem then. Since the crashes, have you done a total wipe of the install USB drive?
Not just a copy over format, but a shred?
You mentioned still having issues with stability on the gnome nixos iso, maybe its the USB stick or the ISO so lets rule those out before we move to nix.

First, the USB stick. We need to shred the disk (not literally) to ensure any bad blocks get completely purged and written over. There are a few methods of shredding, we dont need a complex one though so we can just write over the entire disk with 0’s.

First, find how your disk is identified in the /dev directory. Use partition tools to figure it out, you will NOT get an undo button if you nuke your main drive.
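
As a guard against picking the wrong device, here is a small sketch that lists the disks and checks the kernel’s removable flag before anything destructive happens. The name is_removable_disk is invented here, and the check assumes the stick reports itself as removable under /sys, which some USB SSD enclosures do not.

```shell
# List whole disks (no partitions) with size, model and transport,
# so the USB stick can be told apart from the NVMe drives:
#   lsblk -d -o NAME,SIZE,MODEL,TRAN

# is_removable_disk DEVICE
# Hypothetical helper: succeeds only if the kernel flags DEVICE
# (given as /dev/sdX or just sdX) as removable media.
is_removable_disk() {
    name=${1#/dev/}
    flag_file="/sys/block/$name/removable"
    [ -r "$flag_file" ] && [ "$(cat "$flag_file")" = "1" ]
}
```

Something like is_removable_disk /dev/sdX && echo "looks like a USB stick" (with sdX replaced by the real name) gives one extra sanity check; it is not a substitute for reading the lsblk output carefully.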

This part you will need to do from a unix/linux shell, it wont work in windows.
Once you know what the identifier is, we can use a low level tool called dd.

sudo dd if=/dev/zero of=/dev/<your drive here> bs=1M status=progress

This will completely nuke the drive by writing zeros over all of it, including any bytes heading the front or the back of it. dd stops with a "No space left on device" message once it reaches the end of the drive; that is expected. This is the only real sure method of completely erasing a disk after a crash like you had.
Dont partition it after this, leave it unpartitioned.

We could just compare the hash of the current ISO against the expected sha256 from the repo, but that’s too complicated to explain while smoking, so we’re just going to remove GNOME from the equation entirely. GNOME has been having its own issues, so it’s also kinda possible GNOME itself might be the culprit.
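
For anyone who does want to check the hash, a minimal sketch follows. The function name verify_iso is made up for this example, and it assumes the expected SHA-256 digest has been copied from the download site by hand.

```shell
# verify_iso ISO_PATH EXPECTED_SHA256
# Hypothetical helper: returns 0 when the file's SHA-256 digest
# equals the expected hex string, 1 on mismatch, 2 if hashing fails.
verify_iso() {
    iso_path=$1
    expected=$2

    # sha256sum prints "<digest>  <filename>".
    line=$(sha256sum "$iso_path") || return 2
    # Strip everything after the first space to keep only the digest.
    actual=${line%% *}

    if [ "$actual" = "$expected" ]; then
        echo "OK: checksum matches"
    else
        echo "MISMATCH: expected $expected, got $actual" >&2
        return 1
    fi
}
```

A mismatch on the copy kept on disk would point at a bad download rather than a damaged stick. Note that this only checks the ISO file itself, not what actually ended up on the USB drive.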

Grab a copy of the KDE installer for Nix 24.11 >> https://channels.nixos.org/nixos-24.11/latest-nixos-plasma6-x86_64-linux.iso

With the USB /dev path, we are going to use dd again to write the ISO out to the drive directly instead of partitioning and flashing.
As root, or with sudo:

dd if=/path/to/nixos-plasma6.iso of=/dev/<your drive here> bs=1M conv=fsync status=progress

This will take a while to complete, you might be tempted to check on it, or tinker with it, or whatever.
Dont. Its not stuck, it will finish. Give it time. Trust the process.

The ISO image carries its own partition table, boot flags and label, so writing it this way sets all of those at the same time and no other work is required at this point.
Plug it into the problem PC, boot into the stick, and run the installer again.
This time run with the defaults, all defaults. Dont run any customization or enable the btrfs system. Just install, and then boot into the new system.

Run your stability test again, tinker with firefox whatever and see if the problem persists.

EDIT: Side thought, are you dual booting or is the nix drive a completely separate drive? And is it an NVMe, SATA SSD, or NGFF SATA SSD?

Thank you for your reply.

Just to let you know that the KDE version of the ISO does not work at the moment. I tried it the first time I installed NixOS and it wouldn’t start. I also searched the forum and other people reported the same thing.

I have two NVMe slots and two M.2 drives.
So nixOS and Windows 11 are on different disks.

I believe that something is wrong with how the kernel handles this.

Because …

  1. The issue persists across GNOME and KDE, so the desktop environment is unlikely to be the cause.
  2. dmesg -w outputs errors from the kernel, so I can search from there.
  3. Obviously, I believe the kernel by itself doesn’t ship with defects, but it started causing problems when it was integrated into the system. The problem persists even after I switched to a newer kernel version. The problem is still the same.
  4. Software corruption is unlikely, since the problem (erratic and inconsistent errors) still persists after the reinstall. I completely erased the disk from Windows 11.
    Furthermore, it downloads fresh files from the mirror every time I run nixos-rebuild switch --upgrade. Even if the previous build had issues, the problematic files will be overwritten by the freshly downloaded ones from the mirror and subsequently deleted by the garbage collector.
    The manual says that during installation, the command “nixos-install” can be run as many times as you like because if, for example, the download fails, it will output error messages and discontinue the installation. The system will not be installed if any of the tasks specified in the configuration file fails.
    The same thing applies to nixos-rebuild switch: it will not overwrite the system when the upgrade procedure is only half done. It only applies the change once it has finished all the specified tasks. Thus software corruption is highly unlikely.

I believe that this problem only occurs on my specific machine, because otherwise the forum would be bombarded with help posts. So that leaves me with only one possible culprit … something that is unique to my machine fresh after configuration…

configuration.nix

Again, context:
https://discourse.nixos.org/t/nixos-kde-wayland-session-is-unusable/57808

This is my first post on this forum, at the time I just installed nixOS fresh for the first time. I was not even able to boot into the DE initially. So friends on the forum helped me to configure my configuration.nix. It worked, I was able to boot into the DE; however, not long after the install I started experiencing crashes already. See here:
https://discourse.nixos.org/t/nixos-plasmashell-5208-issue-and-possible-wayland-issues/58259

At the time I thought it was because I was using a third party theme; however, the problem still persisted after resetting the desktop theme to Breeze Dark.
After that, I just kept trying to debug the issue but nothing really fixed the problem.
At this point, I am really tired of trying to make this system work without issues. But I don’t need to give up because I have Windows 11 running without issues for now. I can take my time debugging it until I eventually fix the issue.
Thus, I would like to try editing configuration.nix next. I would like all the advice I can get to fix the issue.

The configuration.nix content is on my new post here:

https://discourse.nixos.org/t/help-system-still-unstable-after-reinstall/58834

Thank you for reading and thank you @Mephist0phel3s for pointing out the iso corruption issue. Otherwise I wouldn’t even know.

I see.
This context has shifted my thinking. After reviewing all of it, you’re on the money with a kernel issue. Just had to rule out the basics given the unique situation you’re in.

Hmm. Assuming you’ve been using the off the shelf kernel, lts mainly which is just a fork of the OG kernel.
It works.
But your issue in particular is with the DE more so than system stability.

Also, looking at one of your desktop configurations I notice you have xserver disabled but Wayland with plasma6 enabled, and I imagine it was a similar thing with the GNOME install.
Let’s assume for the sake of argument the kernel is aight. Wayland by itself is inherently unstable and unsupported by a lot. A lot of backports have been made for legacy X11 code but most of those backends need the X11 server itself running.
XWayland is an X server that runs on top of Wayland so that legacy X11 apps keep working.
Set your xserver to enable,

This is how mine is set actually. Use mine as a template. This should be a rather quick rebuild and reboot but if this don’t work I have a custom kernel snip I can also share with you.
On phone, won’t let me code quote for some reason so sorry for bad formatting.

services.xserver.enable = true;
services.displayManager.sddm.autoNumlock = true;
services.displayManager.sddm.enable = true;
services.displayManager.sddm.wayland.enable = true;
services.desktopManager.plasma6.enable = true;
services.displayManager.defaultSession = "plasma";
services.displayManager.autoLogin.user = "jason";
hardware.opengl = {
  enable = true;
  extraPackages = with pkgs; [ libvdpau-va-gl ];
};

services.xserver.videoDrivers = [ "nvidia" ];

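You’ll want to swap "nvidia" for "amdgpu" (the open driver that works with Mesa) if you have an AMD GPU; nouveau is the open Nvidia driver, so it doesn’t apply there. As written it should work with an Nvidia card.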