Chroot 23.11 within 22.11 bare-metal because Nvidia

What makes me think that? Well, for one, it doesn’t work. It doesn’t even start an X session with 23.11. And on Gentoo, legacy 390 blocks kernels > 5.15.
I reckon my best shot is to work out (or be told) how to hold the kernel version down.

What’s the error in the journal?

boot.kernelPackages = pkgs.linuxPackages_5_15;
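
For context, here is a minimal sketch of how that line might sit next to the legacy driver selection in configuration.nix. It assumes the 390 series is chosen through hardware.nvidia.package; the original poster's real configuration was linked rather than quoted, so the surrounding options are illustrative only.

```nix
{ config, pkgs, ... }:
{
  # Hold the kernel at the 5.15 LTS series instead of the release default.
  boot.kernelPackages = pkgs.linuxPackages_5_15;

  # The proprietary NVIDIA driver is unfree, so unfree packages must be allowed.
  nixpkgs.config.allowUnfree = true;

  # Select the proprietary driver and take the legacy 390 build that belongs
  # to the kernel package set chosen above, so module and kernel match.
  services.xserver.videoDrivers = [ "nvidia" ];
  hardware.nvidia.package = config.boot.kernelPackages.nvidiaPackages.legacy_390;
}
```

Going through config.boot.kernelPackages rather than pkgs.linuxPackages keeps the driver module built against whichever kernel is actually pinned.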

I tried 23.11 with a downgraded kernel and the legacy Nvidia drivers; my config is here.
The machine boots and eventually goes to a blank laptop display with just a solid (not blinking) text cursor at the top left. From /var/log/lightdm here are x-0.log and lightdm.log. x-0.log really does end with the line (EE) and nothing else.
The keyboard doesn’t work and ctrl-alt-F1 etc. do nothing; I had to ssh in to get the above information.
Any ideas?

As mentioned, take a look into the journal.

The lightdm log indicates that X segfaulted which would also explain the empty last log line. You should be able to see that happen in the journal and kernel log.

You could try to get a backtrace using coredumpctl after it crashes to analyse further, but I do not believe this is something you could do anything about. You’re using a known troublesome hardware configuration (Optimus) in combination with an EOL driver that was hacked up to work with somewhat modern kernels. Don’t expect it to work.

One more thing you could try is to use an even older kernel. Perhaps the oldest one we have and maybe one in the middle.

Yep, as you suggest: coredump.

I tried downgrading to 5_10 – same problem.

4_14 was the only other choice but when I ran nixos-rebuild switch:

error: linux 4.14 was removed because it will reach its end of life within 23.11

But never mind, because as it happens I have a nominally identical laptop running 22.11 and it works. I just checked on that: kernel 5.15, nvidia 390.151. This 23.11 is trying to install nvidia 390.157. That minor increment should not make a difference but I’d like to try 151 on this 23.11. How do I hold the minor version number down? I see from here that legacy_390 is just a package or short-hand reference of some sort but I’m not yet sufficiently familiar with Nix’s specification DSL to know how to use the declaration there as the basis of my own with a lower minor version number.

Easiest is probably to revert those update commits touching the nvidia driver in a local Nixpkgs checkout of 23.11.

Now I am highly perplexed. I’ve made /etc/nixos/* as similar as possible to those on my working system (differences are stuff like filesystem UUIDs, version of PHP and required changes to users.users.mounty).

It still boots to a blank screen with a steady text cursor, as described above. The files under /var/log/lightdm are the same, including the final (EE). But there is no coredump in journalctl -e. That’s the only difference. No coredump, everything else the same. I am perplexed.

The only other difference is that the non-working system pulls in the Nvidia drivers version 390.157 and the old working system has 390.151. It could be that, or maybe it’s the version of GCC that’s used to build … something, during the nixos-rebuild run.

I’m going to try a 23.11 chroot within a 22.11 bare metal. It seems there is no other way.

Have you given this a shot? That will really be significantly easier IMO

Like, just do a git clone https://github.com/nixos/nixpkgs, check out the 23.11 branch, get in there with git blame on the nvidia package file to figure out which commit did that (or use the GitHub UI if that’s too difficult), git revert that commit, and then build your config from that checkout instead of the channel (I think nixos-rebuild switch -I nixpkgs=/path/to/repo does the trick).

Periodically rebase against upstream to get updates. NixOS being easily manipulated via git is like half the reason to use the distro over alternatives.

No cursed 22.11 + 23.11 mix that will barely work anyway, and your host gets to have actual security updates.

The crux really is knowing that the nvidia driver version is the problem, and that an older one works. Long-term you can consider just overriding the nvidia package in your configuration so you can just use channels as normal.
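
A rough sketch of what such an override could look like, offered with heavy caveats. It assumes the Nixpkgs revision in use exposes the driver builder as nvidiaPackages.mkDriver; if it does not, the local checkout approach above is the fallback. The hashes are deliberate placeholders (lib.fakeSha256), not real values, so the first build is expected to fail and report the hash it actually found. Any kernel-compatibility patches that the stock legacy_390 package carries are not reproduced here.

```nix
{ config, lib, pkgs, ... }:
{
  boot.kernelPackages = pkgs.linuxPackages_5_15;
  services.xserver.videoDrivers = [ "nvidia" ];

  # ASSUMPTION: mkDriver is available in this Nixpkgs revision.
  hardware.nvidia.package = config.boot.kernelPackages.nvidiaPackages.mkDriver {
    # The driver version reported as working on the 22.11 machine.
    version = "390.151";

    # PLACEHOLDERS: replace each one with the hash Nix prints when the
    # corresponding download fails verification.
    sha256_32bit = lib.fakeSha256;
    sha256_64bit = lib.fakeSha256;
    settingsSha256 = lib.fakeSha256;
    persistencedSha256 = lib.fakeSha256;
  };
}
```

The appeal of this shape is that the rest of the system keeps following the normal channel, and only the driver version is held back.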

Well I suppose the answer is that I know chroot, but I don’t (yet) understand playing with Nix repositories. But I’m certainly not happy with a chroot solution so I’ll take the time to understand your recommendation. Thank you.

Wait. I don’t think this is going to work. Did you see Atemu’s comment in another post?
TL;DR: Nvidia drivers 390.151 + kernel 5.15 + 22.11 do work; Nvidia drivers 390.157 + kernel 5.15 + 23.11 do not work.
That is why I think I need a chroot system; because 23.11 uses a newer and likely Nvidia-breaking glibc.
So unless it is possible to hold 23.11 down to glibc 2.35 (or possibly >=2.x and <2.38) I don’t think this is going to work.
If you think it is, where should I start? Can I install 23.11 and hold the version of glibc down?

I updated to 23.05 (glibc 2.37) as a starting point. The display seems to work all right, but when IntelliJ IDEA is the only desktop app running, the .cinnamon-wrapp process consumes around 80% of one core according to top, which compares unfavourably with 0.2% on the 22.11 machine.
Is it possible to move to 23.11 but hold glibc down to 2.37?
I need more help to understand how to do this by manipulating the repository. Is there a more detailed guide anywhere?

In theory, yes. You can revert this commit: “glibc: 2.37-39 -> 2.38-0” (NixOS/nixpkgs@e861529).

In practice, this will mean running an older glibc than what anything has been tested against. You may run into issues with other repackaged binaries. You will also need to compile everything downstream (changing glibc is serious business), which will be painful on a decade-old laptop (for this, using another machine to build and deploying remotely, or using something like peerix may help, but this will probably remain very painful even with a faster build host).
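
As one possible way to push the compilation onto a faster machine, here is a sketch using Nix's distributed builds. This is a different mechanism from building elsewhere and deploying the result remotely (which nixos-rebuild can do with --build-host and --target-host). The host name, user and key path are hypothetical placeholders.

```nix
{ ... }:
{
  # Let the local Nix daemon hand derivations to remote builders.
  nix.distributedBuilds = true;

  nix.buildMachines = [
    {
      hostName = "builder.example";      # hypothetical build host
      system = "x86_64-linux";           # must match the laptop's platform
      sshUser = "nixbuild";              # hypothetical user on the builder
      sshKey = "/root/.ssh/id_builder";  # hypothetical key readable by root
      maxJobs = 8;                       # parallel builds the host may run
      supportedFeatures = [ "big-parallel" ];
    }
  ];
}
```

Even with this in place, a glibc change still means rebuilding essentially the whole system closure, so it only moves the pain rather than removing it.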

I think you’d be best served by any generic git tutorial (no idea if that one in particular is good, I learned this stuff by osmosis), but I’ll try to take some time writing up a more detailed guide.

I think this has run long enough. Even NixOS cannot make software work that is not designed for the situation. I’ll continue to use the laptops with the built-in Intel display, but get something else for multi-screen usage. Thanks Tlater and Atemu for your help but let’s draw a line under this.
BTW, Nouveau isn’t coming to the rescue. There’s been no significant movement on them for almost three years.

That might still be true but note that the xf86 driver is not really all that relevant here as this sort of thing is handled in the kernel driver. Nouveau should work with the generic modesetting driver. Are you using that?

Here’s what I have. The two external displays (Nvidia) are working but the laptop display is blank with a text-mode cursor at top left.

{ config, pkgs, lib, ... }:
{
	services.xserver.videoDrivers = [ "nouveau" ];
	hardware.nvidia.modesetting.enable = true;
}

It seems so close but how do I get the laptop display working too?

  • I did reboot after nixos-rebuild but it didn’t help.
  • hardware.intel.modesetting.enable = true was rejected by nixos-rebuild.
  • services.xserver.videoDrivers = [ "nouveau" "intel" ]; leaves all screens blank after reboot.
  • services.xserver.videoDrivers = [ "nouveau" "kvm-intel" ]; is the same as without "kvm-intel"; i.e., just the two external displays are working and the laptop display is blank with a text-mode cursor at top left.

I am obviously just floundering around and guessing.

Just set services.xserver.videoDrivers to [ "modesetting" ] or reset it back to default.
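
Spelled out as a module, that suggestion might look like the sketch below. Dropping hardware.nvidia.modesetting.enable is my own assumption: as far as I can tell that option belongs to the proprietary driver module and has no effect when nouveau is in use.

```nix
{ config, pkgs, lib, ... }:
{
  # Use Xorg's generic modesetting driver for every GPU; the kernel drivers
  # (i915 for Intel, nouveau for Nvidia) do the hardware-specific work.
  services.xserver.videoDrivers = [ "modesetting" ];

  # hardware.nvidia.modesetting.enable is intentionally not set here
  # (assumption: it only applies to the proprietary nvidia driver).
}
```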

That just switches the display to the laptop and both external screens are now blank.
The configuration is here; is there anything obviously wrong with it?

Not that I know of but I’m not an authority on nouveau on old hardware. I’d ask more specialised circles if you want to pursue nouveau. IIRC nouveau developers have an IRC channel somewhere.

This is a dual-graphics laptop, right? Is it somehow possible to disable the Nvidia GPU entirely in the firmware settings and only use the Intel one? That one will give you a lot less trouble.

I fixed it. The fix: logging in.

The display manager was using the laptop display only, but when I logged in, all three screens were used.

It’s nearly there. If I use settings / display to arrange the screens according to their physical layout, it induces a lot of flickering when windows are moved. I think I can get used to the layout not matching physical.

So thanks again Atemu and Tlater; the laptop is useable with 23.11.

[Later] I fixed the 2560 by 1600 resolution issue (I was using single-link DVI, not dual-link) and that setup works fine, but when I try a 4K display together with a 1920 by 1080 one, the behaviour is very odd. Cinnamon Settings / Display shows the two external displays correctly but they are not driven. A monitor-generated NO INPUT message appears on the 4K. Additionally, the 4K has buttons to set the input source (DP / HDMI1 / HDMI2) but these do nothing whilst the 23.11 laptop is connected. If I try to set the display layout in Settings / Display, the laptop becomes pretty much unresponsive, although the HDD activity LED flashes occasionally. The power button does not initiate a shutdown; I can ssh in and shut down that way. The workaround is to reduce the resolution of the 4K in Settings / Display to 2560 by 1600, which looks rubbish but at least it works.
As I’ve no idea what causes the bug, I’ve no idea if it’s something that might be fixed in time. At this stage, I’m relieved to be able to move on to 23.11.