Nvidia 390 driver not working

I’m having a hard time getting the NVIDIA proprietary driver to work. Normally I boot with nouveau, and this works fine. But I want to use the proprietary driver for performance, because for some reason nouveau just uses the Intel GPU.

I have a laptop that supports Optimus, with both an Intel GPU and an NVIDIA GPU.

$ lspci -v | grep VGA
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller])
01:00.0 VGA compatible controller: NVIDIA Corporation GF119M [Quadro NVS 4200M] (rev a1) (prog-if 00 [VGA controller])

I started with the latest nvidia driver. After rebooting, X wouldn’t start, but dmesg had a nice error message saying that my GPU is only supported by the 390 legacy driver, so I switched to that.

Here is my current attempt:

        services.xserver.videoDrivers = [ "nvidia" ];

        hardware.nvidia = {
          modesetting.enable = true;
          nvidiaSettings = true;
          package = config.boot.kernelPackages.nvidiaPackages.legacy_390;
          prime = {
            intelBusId = "PCI:0:2:0";
            nvidiaBusId = "PCI:1:0:0";
            sync.enable = true;
          };
        };

        nixpkgs.config.nvidia.acceptLicense = true;

And it doesn’t work, and I have no idea why. The system boots normally until it reaches the display manager (lightdm), which just shows a black screen. At this point my system is completely frozen: I can’t switch to a TTY, the cursor isn’t blinking on the screen, and the only thing I can do is shut down my computer.

Here’s the X log when this is happening: https://0x0.st/HEsQ.log

I’ve tried a lot of things to make it work, but nothing did: setting modesetting.enable = false, disabling the i915 driver, using rcutree.rcu_idle_gp_delay=1 as a kernel parameter, removing sync.enable = true from my config, and using bumblebee instead.
If I remove sync.enable, X just fails to start, because in that case the nvidia driver isn’t configured for Optimus. Fiddling with kernel parameters and modesetting didn’t change anything. Bumblebee didn’t work either: it let me boot as I usually do with nouveau, but optirun says it can’t find the GPU. So I think my current config is the closest thing to something working, if it weren’t for that freeze!

Does anyone have an idea how to fix this freeze?

Prime offload is only supported since ~435 (presumably the feature had driver support on Windows before that, which is why your GPU advertises it). Sync should be supported by your GPU though.

There’s no freeze; X does not detect any displays to render on, so it doesn’t render anything. I wonder if it’d work if you connected an external display. Presumably your laptop display is connected to the intel GPU and the whole prime sync thing just isn’t working:

[ 69.812] (--) NVIDIA(0): No enabled display devices found; starting anyway because
[ 69.812] (--) NVIDIA(0): AllowEmptyInitialConfiguration is enabled

What’s in services.xserver.videoDrivers? Also, you almost definitely have tried this, but just to be sure since at least two people around here have failed at the basic step: did you double check the bus IDs are correct?
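On the bus IDs: lspci prints addresses as hexadecimal bus:device.function, while intelBusId and nvidiaBusId expect decimal PCI:bus:device:function. A small sketch of the conversion, in case it helps (the function name is made up for illustration, and it assumes an address without a leading PCI domain):

```shell
# Convert an lspci-style address (hex "bus:device.function") into the
# decimal "PCI:bus:device:function" form used by the prime bus ID options.
lspci_to_bus_id() {
  local addr=$1
  local bus=${addr%%:*}   # part before the colon, e.g. "01"
  local rest=${addr#*:}   # part after the colon, e.g. "00.0"
  local dev=${rest%%.*}   # device, e.g. "00"
  local fn=${rest#*.}     # function, e.g. "0"
  # The 0x prefix makes printf read each field as hexadecimal.
  printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$fn"
}

lspci_to_bus_id 00:02.0   # prints PCI:0:2:0
lspci_to_bus_id 01:00.0   # prints PCI:1:0:0
```

The difference only shows up for values above 9: an address like 0a:00.0 becomes PCI:10:0:0.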

Also, to confirm my theory that this isn’t a freeze, have you attempted switching to a different tty? X says it’s on tty 7, so using e.g. Ctrl+Alt+F2 should bring up a terminal.

Edit: Ah, interesting. Reading through the nvidia manual on the topic, the missing display is expected. Some xrandr magic needs to happen at runtime; the NixOS module does this in the display manager setup: nixpkgs/nixos/modules/hardware/video/nvidia.nix at 0b3d618173114c64ab666f557504d6982665d328 · NixOS/nixpkgs · GitHub

That means if you’re e.g. running greetd this probably won’t happen, so it won’t work at all. Same if you cobbled together your own display manager init. I’m also doubtful whether this is even possible if you run exclusively wayland. What DE/DM do you run?

I didn’t try offload or reverse sync because I saw from the nixos wiki that those were not supported for my driver, so I only focused on sync.

It is a freeze: as I mentioned above, “I can’t switch to TTY”. Pressing Ctrl+Alt+F1 through F6 does nothing. I can only press the power button to shut down my computer. Also, for some reason switching my backlight on and off works, probably because this is handled by the BIOS.

I forgot to say that I have services.xserver.videoDrivers = [ "nvidia" ]; I’ve edited my post to include it. From reading the nvidia module in nixpkgs I saw that you must have “nvidia” in your videoDrivers for the module to do anything, and this is also mentioned on the wiki.

I’m using lightdm as display manager, although it’s a bit weird I must say, because I’ve never explicitly enabled it; I just set some parameters for displayManager and got it by default.

Let me show you:

          services = {
                  xserver = {
                          enable = true;
                          layout = "us";
                          xkbVariant = "";
                          libinput.enable = true;
                          displayManager = {
                                  defaultSession = "none+dwm";
                                  sessionCommands = ''
                                          <some shell commands>
                                  '';
                          };
                          windowManager.dwm.enable = true;
                  };
          };

Apart from setting services.xserver.videoDrivers, I don’t touch services.xserver anywhere else.

Fair, I skimmed past a few things there.

lightdm is the default if you don’t specify anything else. I think that’s set by one of the profiles, if I recall correctly?

Other than that, yeah, this sounds like a pain. Anything from the general journal (i.e. not just X logs)? Is your caps lock flashing by any chance?

My caps lock isn’t flashing, so it’s not a kernel panic.

I’ve tried experimenting a bit more by using startx as “DM”. When running my system that way, here’s what I get in dmesg: https://0x0.st/HEHd.log

When I run startx, I get the freeze as described above. When I then shut down, it isn’t a hard shutdown where I hold the button for 5 seconds and the machine powers off instantly. It’s a soft shutdown: I press the button for 1 second, and for half a second I can even see the systemd “console” logging the services it stops before my computer shuts down.

I had trouble getting the journal from a previous boot, probably because I tried fetching it from a different generation instead of the same one. So here is the complete journal, including when the freeze occurs: https://0x0.st/HEH5.log

Now notice line 1550:

déc. 27 14:19:05 illumination systemd[1]: Created slice Slice /system/systemd-coredump.
déc. 27 14:19:05 illumination systemd[1]: Started Process Core Dump (PID 3642/UID 0).
déc. 27 14:19:05 illumination systemd-coredump[3643]: Resource limits disable core dumping for process 3638 (X).
déc. 27 14:19:05 illumination systemd-coredump[3643]: Process 3638 (X) of user 0 terminated abnormally without generating a coredump.
déc. 27 14:19:05 illumination systemd[1]: systemd-coredump@0-3642-0.service: Deactivated successfully.

It turns out that when I start X, it in fact gets killed, and then I get a freeze. This is probably because X is in the middle of setting things up, grabbing my screen and keyboard, and suddenly terminates without cleaning up. Now my problem is: why is the nvidia driver making it crash like that?
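For anyone repeating this, here is a rough sketch of how lines like these can be pulled out of a saved journal dump without scrolling through it. The function name and the journal.log file name are placeholders I made up; the only thing it relies on is that systemd-coredump logs under its own name followed by a PID in brackets, as in the excerpt above.

```shell
# Print, with line numbers, every message logged by systemd-coredump
# in a saved journal dump. -F treats the pattern as a fixed string, so
# the "[" needs no escaping and "systemd-coredump@...service" lines
# from systemd itself are not matched.
find_coredumps() {
  grep -n -F 'systemd-coredump[' "$1"
}

# Usage, assuming the previous boot's journal was saved first:
#   journalctl -b -1 > journal.log
#   find_coredumps journal.log
```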

Also, searching for nvidia, look at lines 929 and 936; there are some related errors.

déc. 27 14:17:07 illumination (udev-worker)[813]: nvidia: Process '/nix/store/q1c2flcykgr4wwg5a6h450hxbk4ch589-bash-5.2-p15/bin/bash -c 'mknod -m 666 /dev/nvidiactl c 195 255'' failed with exit code 1.

and

déc. 27 14:17:08 illumination (udev-worker)[813]: nvidia: Process '/nix/store/q1c2flcykgr4wwg5a6h450hxbk4ch589-bash-5.2-p15/bin/bash -c 'for i in $(cat /proc/driver/nvidia/gpus/*/information | grep Minor | cut -d \  -f 4); do mknod -m 666 /dev/nvidia${i} c 195 ${i}; done'' failed with exit code 1.

I don’t know if these are somehow related?

I am pretty sure those are red herrings, I’ve been seeing the same on my system for years (with no X coredumps to report).

Have you tried wayland for the sake of argument? At the very least you might get a readable error?

Oh, very good idea. I didn’t know anything about Wayland, so after reading up a bit I managed to get sway running. I had to run it with sway --unsupported-gpu. According to glxinfo it’s using my Intel GPU, and from what I found I have to set these variables:

export GBM_BACKEND=nvidia-drm
export __GL_GSYNC_ALLOWED=0
export __GL_VRR_ALLOWED=0
export __GLX_VENDOR_LIBRARY_NAME=nvidia

Now when I run glxinfo I get something like “stack smashing detected! aborting”. The actually interesting part is in the journal:

déc. 27 16:17:51 illumination systemd[1]: Started Process Core Dump (PID 8826/UID 0).
déc. 27 16:17:52 illumination systemd-coredump[8827]: Process 8824 (glxinfo) of user 1000 dumped core.

                                                       Module libnvidia-glcore.so.390.157 without build-id.
                                                       Module libnvidia-tls.so.390.157 without build-id.
                                                       Module libGLX_nvidia.so.0 without build-id.
                                                       Module libXdmcp.so.6 without build-id.
                                                       Module libXau.so.6 without build-id.
                                                       Module libxcb.so.1 without build-id.
                                                       Module libGLdispatch.so.0 without build-id.
                                                       Module libXext.so.6 without build-id.
                                                       Module libGLX.so.0 without build-id.
                                                       Module libX11.so.6 without build-id.
                                                       Module libGL.so.1 without build-id.
                                                       Module glxinfo without build-id.
                                                       Stack trace of thread 8824:
                                                       #0  0x00007f1c90681d7c __pthread_kill_implementation (libc.so.6 + 0x8cd7c)
                                                       #1  0x00007f1c906329c6 raise (libc.so.6 + 0x3d9c6)
                                                       #2  0x00007f1c9061b8fa abort (libc.so.6 + 0x268fa)
                                                       #3  0x00007f1c9061c767 __libc_message.cold (libc.so.6 + 0x27767)
                                                       #4  0x00007f1c907107f9 __fortify_fail (libc.so.6 + 0x11b7f9)
                                                       #5  0x00007f1c90711aa4 __stack_chk_fail (libc.so.6 + 0x11caa4)
                                                       #6  0x00007f1c9067bdf5 _dlerror_run (libc.so.6 + 0x86df5)
                                                       #7  0x00007f1c9067c271 dlopen@GLIBC_2.2.5 (libc.so.6 + 0x87271)
                                                       #8  0x00007f1c905c9c9a __glXLookupVendorByName (libGLX.so.0 + 0x8c9a)
                                                       #9  0x00007f1c905c44b8 __glXInit (libGLX.so.0 + 0x34b8)
                                                       #10 0x00007f1c909bcebe call_init (ld-linux-x86-64.so.2 + 0x4ebe)
                                                       #11 0x00007f1c909bcfac _dl_init (ld-linux-x86-64.so.2 + 0x4fac)
                                                       #12 0x00007f1c909d2f50 _dl_start_user (ld-linux-x86-64.so.2 + 0x1af50)
                                                       ELF object binary architecture: AMD x86-64
déc. 27 16:17:52 illumination systemd[1]: systemd-coredump@6-8826-0.service: Deactivated successfully.

So anything involving the nvidia driver and GLX crashes. Um, how am I supposed to fix this?

EDIT: I also ran something that needs vulkan to work; it crashes with the same error message, and with the following core dump:

déc. 27 16:07:03 illumination systemd[1]: Started Process Core Dump (PID 6443/UID 0).
déc. 27 16:07:03 illumination systemd-coredump[6444]: Process 6438 (.serioussam-wra) of user 1000 dumped core.

                                                       Module libnvidia-glcore.so.390.157 without build-id.
                                                       Module libnvidia-tls.so.390.157 without build-id.
                                                       Module libGLX_nvidia.so without build-id.
                                                       Module libudev.so.1 without build-id.
                                                       Module libmp3lame.so.0 without build-id.
                                                       Module libmpg123.so.0 without build-id.
                                                       Module libogg.so.0 without build-id.
                                                       Module libopus.so.0 without build-id.
                                                       Module libvorbisenc.so.2 without build-id.
                                                       Module libvorbis.so.0 without build-id.
                                                       Module libFLAC.so.12 without build-id.
                                                       Module libsndfile.so.1 without build-id.
                                                       Module libpulsecommon-16.1.so without build-id.
                                                       Module libpulse.so.0 without build-id.
                                                       Module libpulse-simple.so.0 without build-id.
                                                       Module libcap.so.2 without build-id.
                                                       Module libsystemd.so.0 without build-id.
                                                       Module libdbus-1.so.3 without build-id.
                                                       Module libXdmcp.so.6 without build-id.
                                                       Module libXau.so.6 without build-id.
                                                       Module libXrender.so.1 without build-id.
                                                       Module libxcb.so.1 without build-id.
                                                       Module libXss.so.1 without build-id.
                                                       Module libXrandr.so.2 without build-id.
                                                       Module libXfixes.so.3 without build-id.
                                                       Module libXi.so.6 without build-id.
                                                       Module libXcursor.so.1 without build-id.
                                                       Module libXext.so.6 without build-id.
                                                       Module libX11.so.6 without build-id.
                                                       Module libgcc_s.so.1 without build-id.
                                                       Module libstdc++.so.6 without build-id.
                                                       Module libvulkan.so.1 without build-id.
                                                       Module libz.so.1 without build-id.
                                                       Module libSDL2-2.0.so.0 without build-id.
                                                       Module libEngine.so without build-id.
                                                       Module .serioussam-wrapped without build-id.
                                                       Stack trace of thread 6438:
                                                       #0  0x00007f2e890a4d7c __pthread_kill_implementation (libc.so.6 + 0x8cd7c)
                                                       #1  0x00007f2e890559c6 raise (libc.so.6 + 0x3d9c6)
                                                       #2  0x00007f2e8903e8fa abort (libc.so.6 + 0x268fa)
                                                       #3  0x00007f2e8903f767 __libc_message.cold (libc.so.6 + 0x27767)
                                                       #4  0x00007f2e891337f9 __fortify_fail (libc.so.6 + 0x11b7f9)
                                                       #5  0x00007f2e89134aa4 __stack_chk_fail (libc.so.6 + 0x11caa4)
                                                       #6  0x00007f2e8909edf5 _dlerror_run (libc.so.6 + 0x86df5)
                                                       #7  0x00007f2e8909f271 dlopen@GLIBC_2.2.5 (libc.so.6 + 0x87271)
                                                       #8  0x00007f2e89d71d04 loader_scanned_icd_add (libvulkan.so.1 + 0x2ad04)
                                                       #9  0x00007f2e89d76502 loader_icd_scan (libvulkan.so.1 + 0x2f502)
                                                       #10 0x00007f2e89d806e1 vkCreateInstance (libvulkan.so.1 + 0x396e1)
                                                       #11 0x00007f2e89b41832 _ZN11CGfxLibrary8InitAPIsEv (libEngine.so + 0x141832)
                                                       #12 0x00007f2e89ad4149 _Z13SE_InitEnginePKc8CTString (libEngine.so + 0xd4149)
                                                       #13 0x0000000000421ca0 _Z4InitPvi8CTString (.serioussam-wrapped + 0x21ca0)
                                                       #14 0x0000000000422907 _Z7SubMainPvS_Pci (.serioussam-wrapped + 0x22907)
                                                       #15 0x000000000042375c _Z14CommonMainlinePvS_Pci (.serioussam-wrapped + 0x2375c)
                                                       #16 0x000000000041b0f1 main (.serioussam-wrapped + 0x1b0f1)
                                                       #17 0x00007f2e8903ffce __libc_start_call_main (libc.so.6 + 0x27fce)
                                                       #18 0x00007f2e89040089 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x28089)
                                                       #19 0x000000000041db55 _start (.serioussam-wrapped + 0x1db55)

                                                       Stack trace of thread 6440:
                                                       #0  0x00007f2e8909fc96 __futex_abstimed_wait_common (libc.so.6 + 0x87c96)
                                                       #1  0x00007f2e890aab0b __new_sem_wait_slow64.constprop.0 (libc.so.6 + 0x92b0b)
                                                       #2  0x00007f2e89787c37 SDL_SemWaitTimeout_REAL (libSDL2-2.0.so.0 + 0x187c37)
                                                       #3  0x00007f2e896c1df7 SDL_TimerThread (libSDL2-2.0.so.0 + 0xc1df7)
                                                       #4  0x00007f2e896c1736 SDL_RunThread (libSDL2-2.0.so.0 + 0xc1736)
                                                       #5  0x00007f2e897876e9 RunThread (libSDL2-2.0.so.0 + 0x1876e9)
                                                       #6  0x00007f2e890a3084 start_thread (libc.so.6 + 0x8b084)
                                                       #7  0x00007f2e8912560c __clone3 (libc.so.6 + 0x10d60c)

                                                       Stack trace of thread 6439:
                                                       #0  0x00007f2e891182e6 ppoll (libc.so.6 + 0x1002e6)
                                                       #1  0x00007f2e88bd2089 pa_mainloop_poll (libpulse.so.0 + 0x28089)
                                                       #2  0x00007f2e88bd2666 pa_mainloop_iterate (libpulse.so.0 + 0x28666)
                                                       #3  0x00007f2e88bd2710 pa_mainloop_run (libpulse.so.0 + 0x28710)
                                                       #4  0x00007f2e897443cf HotplugThread (libSDL2-2.0.so.0 + 0x1443cf)
                                                       #5  0x00007f2e896c1736 SDL_RunThread (libSDL2-2.0.so.0 + 0xc1736)
                                                       #6  0x00007f2e897876e9 RunThread (libSDL2-2.0.so.0 + 0x1876e9)
                                                       #7  0x00007f2e890a3084 start_thread (libc.so.6 + 0x8b084)
                                                       #8  0x00007f2e8912560c __clone3 (libc.so.6 + 0x10d60c)
                                                       ELF object binary architecture: AMD x86-64
déc. 27 16:07:03 illumination systemd[1]: systemd-coredump@2-6443-0.service: Deactivated successfully.

Good question! Can’t even throw this at nvidia since they don’t support this driver anymore.

Seems there was a similar discussion over on the arch forums near the start of this year: (resolved) DKMS: nvidia 390.157 rebuild causing xorg to crash / AUR Issues, Discussion & PKGBUILD Requests / Arch Linux Forums

Looks very similar - maybe you can try to reproduce their workaround?

It didn’t work. I’m not sure which libnvidia-tls.so is the proper one, so I took a guess by doing ls /nix/store/*nvidia* … In sway, LD_PRELOAD=/nix/store/89dnx7scyz2pbpa61s440k10sywkvwl0-nvidia-x11-390.157-6.1.68/lib/libnvidia-tls.so glxinfo was still crashing. Strangely though, LD_PRELOAD=/nix/store/89dnx7scyz2pbpa61s440k10sywkvwl0-nvidia-x11-390.157-6.1.68/lib/tls/libnvidia-tls.so glxinfo did not crash (notice the lib/tls/… instead of just lib/…), but glxinfo failed with a strange message containing “Bad Value”. I tried the same thing with the vulkan program, and it dumped core:

déc. 27 17:33:41 illumination systemd[1]: Started Process Core Dump (PID 8751/UID 0).
déc. 27 17:33:42 illumination systemd-coredump[8752]: [🡕] Process 8727 (.serioussam-wra) of user 1000 dumped core.

                                                       Module libnvidia-glcore.so.390.157 without build-id.
                                                       Module libGLX_nvidia.so.0 without build-id.
                                                       Module libGLdispatch.so.0 without build-id.
                                                       Module libGLX.so.0 without build-id.
                                                       Module libGL.so.1 without build-id.
                                                       Module libGame.so without build-id.
                                                       Module libEntities.so without build-id.
                                                       Module libamp11lib.so without build-id.
                                                       Module libvorbisfile.so.3 without build-id.
                                                       Module libxml2.so.2 without build-id.
                                                       Module libncursesw.so.6 without build-id.
                                                       Module libffi.so.8 without build-id.
                                                       Module libudev.so.1 without build-id.
                                                       Module libmp3lame.so.0 without build-id.
                                                       Module libmpg123.so.0 without build-id.
                                                       Module libogg.so.0 without build-id.
                                                       Module libopus.so.0 without build-id.
                                                       Module libvorbisenc.so.2 without build-id.
                                                       Module libvorbis.so.0 without build-id.
                                                       Module libFLAC.so.12 without build-id.
                                                       Module libsndfile.so.1 without build-id.
                                                       Module libpulsecommon-16.1.so without build-id.
                                                       Module libpulse.so.0 without build-id.
                                                       Module libpulse-simple.so.0 without build-id.
                                                       Module libcap.so.2 without build-id.
                                                       Module libsystemd.so.0 without build-id.
                                                       Module libdbus-1.so.3 without build-id.
                                                       Module libXdmcp.so.6 without build-id.
                                                       Module libXau.so.6 without build-id.
                                                       Module libXrender.so.1 without build-id.
                                                       Module libxcb.so.1 without build-id.
                                                       Module libXss.so.1 without build-id.
                                                       Module libXrandr.so.2 without build-id.
                                                       Module libXfixes.so.3 without build-id.
                                                       Module libXi.so.6 without build-id.
                                                       Module libXcursor.so.1 without build-id.
                                                       Module libXext.so.6 without build-id.
                                                       Module libX11.so.6 without build-id.
                                                       Module libgcc_s.so.1 without build-id.
                                                       Module libstdc++.so.6 without build-id.
                                                       Module libvulkan.so.1 without build-id.
                                                       Module libz.so.1 without build-id.
                                                       Module libSDL2-2.0.so.0 without build-id.
                                                       Module libEngine.so without build-id.
                                                       Module libnvidia-tls.so without build-id.
                                                       Module .serioussam-wrapped without build-id.
                                                       Stack trace of thread 8727:
                                                       #0  0x00007fc49b0fdf32 glXGetCurrentContext (libGLX.so.0 + 0x3f32)
                                                       #1  0x00007fc4a094fdec X11_GL_LoadLibrary (libSDL2-2.0.so.0 + 0x14fdec)
                                                       #2  0x00007fc4a0913a06 SDL_GL_LoadLibrary_REAL (libSDL2-2.0.so.0 + 0x113a06)
                                                       #3  0x00007fc4a0d728eb _ZN11CGfxLibrary14InitDriver_OGLEi (libEngine.so + 0x1728eb)
                                                       #4  0x00007fc4a0d44b3b _ZN11CGfxLibrary16StartDisplayModeE10GfxAPITypeiii12DisplayD>
                                                       #5  0x00007fc4a0d44cb9 _ZN11CGfxLibrary14SetDisplayModeE10GfxAPITypeiii12DisplayDep>
                                                       #6  0x00000000004215f5 _Z19TryToSetDisplayMode10GfxAPITypeiiiii12DisplayDepthi (.se>
                                                       #7  0x0000000000421a5c _Z12StartNewMode10GfxAPITypeiiiii12DisplayDepthi (.serioussa>
                                                       #8  0x00000000004224c4 _Z4InitPvi8CTString (.serioussam-wrapped + 0x224c4)
                                                       #9  0x0000000000422907 _Z7SubMainPvS_Pci (.serioussam-wrapped + 0x22907)
                                                       #10 0x000000000042375c _Z14CommonMainlinePvS_Pci (.serioussam-wrapped + 0x2375c)
                                                       #11 0x000000000041b0f1 main (.serioussam-wrapped + 0x1b0f1)
                                                       #12 0x00007fc4a023ffce __libc_start_call_main (libc.so.6 + 0x27fce)
                                                       #13 0x00007fc4a0240089 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x28089)
                                                       #14 0x000000000041db55 _start (.serioussam-wrapped + 0x1db55)

                                                       Stack trace of thread 8748:
                                                       #0  0x00007fc4a03182e6 ppoll (libc.so.6 + 0x1002e6)
                                                       #1  0x00007fc4a06c8089 pa_mainloop_poll (libpulse.so.0 + 0x28089)
                                                       #2  0x00007fc4a06c8666 pa_mainloop_iterate (libpulse.so.0 + 0x28666)
                                                       #3  0x00007fc4a0944172 PULSEAUDIO_PlayDevice (libSDL2-2.0.so.0 + 0x144172)
                                                       #4  0x00007fc4a083428d SDL_RunAudio (libSDL2-2.0.so.0 + 0x3428d)
                                                       #5  0x00007fc4a08c1736 SDL_RunThread (libSDL2-2.0.so.0 + 0xc1736)
                                                       #6  0x00007fc4a09876e9 RunThread (libSDL2-2.0.so.0 + 0x1876e9)
                                                       #7  0x00007fc4a02a3084 start_thread (libc.so.6 + 0x8b084)
                                                       #8  0x00007fc4a032560c __clone3 (libc.so.6 + 0x10d60c)

                                                       Stack trace of thread 8738:
                                                       #0  0x00007fc4a03182e6 ppoll (libc.so.6 + 0x1002e6)
                                                       #1  0x00007fc4a06c8089 pa_mainloop_poll (libpulse.so.0 + 0x28089)
                                                       #2  0x00007fc4a06c8666 pa_mainloop_iterate (libpulse.so.0 + 0x28666)
                                                       #3  0x00007fc4a06c8710 pa_mainloop_run (libpulse.so.0 + 0x28710)
                                                       #4  0x00007fc4a09443cf HotplugThread (libSDL2-2.0.so.0 + 0x1443cf)
                                                       #5  0x00007fc4a08c1736 SDL_RunThread (libSDL2-2.0.so.0 + 0xc1736)
                                                       #6  0x00007fc4a09876e9 RunThread (libSDL2-2.0.so.0 + 0x1876e9)
                                                       #7  0x00007fc4a02a3084 start_thread (libc.so.6 + 0x8b084)
                                                       #8  0x00007fc4a032560c __clone3 (libc.so.6 + 0x10d60c)

                                                       Stack trace of thread 8741:
                                                       #0  0x00007fc4a029fc96 __futex_abstimed_wait_common (libc.so.6 + 0x87c96)
                                                       #1  0x00007fc4a02aab0b __new_sem_wait_slow64.constprop.0 (libc.so.6 + 0x92b0b)
                                                       #2  0x00007fc4a0987c37 SDL_SemWaitTimeout_REAL (libSDL2-2.0.so.0 + 0x187c37)
                                                       #3  0x00007fc4a08c1df7 SDL_TimerThread (libSDL2-2.0.so.0 + 0xc1df7)
                                                       #4  0x00007fc4a08c1736 SDL_RunThread (libSDL2-2.0.so.0 + 0xc1736)
                                                       #5  0x00007fc4a09876e9 RunThread (libSDL2-2.0.so.0 + 0x1876e9)
                                                       #6  0x00007fc4a02a3084 start_thread (libc.so.6 + 0x8b084)
                                                       #7  0x00007fc4a032560c __clone3 (libc.so.6 + 0x10d60c)
                                                       ELF object binary architecture: AMD x86-64
déc. 27 17:33:42 illumination systemd[1]: systemd-coredump@12-8751-0.service: Deactivated successfully.

Trying to startx with this LD_PRELOAD just froze as usual.

EDIT: also, the problem the Arch folks faced was related to switching from glibc 2.36 to 2.37. But in that stack trace I’m seeing __libc_start_main@@GLIBC_2.34, so I suspect I’m using 2.34?

I wouldn’t totally discount it. NixOS 23.11’s default glibc is supposed to be 2.38, so this would mean that you’re either mixing different nixpkgs versions (and for 2.34 one of them would be quite old), or the glibc version denoted there is whatever nvidia compiled their driver against.

Have you tried the minimal reproduction shared in this Arch Linux Forums thread: "(resolved) DKMS: nvidia 390.157 rebuild causing xorg to crash" (AUR Issues, Discussion & PKGBUILD Requests)?

Might of course be a red herring, but at this point you’re trying to debug a segfault caused by a proprietary blob.

Well, I got it working, thanks a lot!

I’ve tried the minimal reproduction. At first I was unsure how to do it, but then I had the idea of setting LD_LIBRARY_PATH to where the nvidia libraries are located in the nix store, and indeed I got a stack smashing error with that. Then I tried the fix with LD_PRELOAD and… it fixed it. I thought this can’t be, and in disbelief I tried again to start X that way. And you will laugh: it didn’t work when I tested earlier because of a stupid typo. I wrote D_PRELOAD instead of LD_PRELOAD, lol! Once I corrected it, it finally worked!

Well, I actually got a black screen, but without a freeze this time, and that was expected since it was a plain xinit without the required setup. I then copied the setup commands from the generated lightdm-display-setup script in the nix store, and it worked! glxinfo showed me that it’s using the nvidia GPU. So I suppose my attempt on wayland failed with a weird error because it was wayland and not X, while nvidia only supports X.
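For anyone wanting to repeat the LD_PRELOAD test, here is a minimal sketch of how the library could be located. It assumes the driver's store path contains a lib/tls directory, as the 390.157 package in this thread does; `find_tls_lib` and `NVIDIA_OUT` are made-up names for illustration, not anything NixOS provides.

```shell
# Print the first libnvidia-tls.so.* found under <driver-output>/lib/tls.
# Assumption: the driver output has a lib/tls directory; other driver
# versions may be laid out differently.
find_tls_lib() {
  for candidate in "$1"/lib/tls/libnvidia-tls.so.*; do
    # An unmatched glob stays literal, so check that the file really exists.
    if [ -e "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Example (commented out, since it starts an X session). NVIDIA_OUT is a
# placeholder for the nvidia-x11 store path on your system:
#   LD_PRELOAD="$(find_tls_lib "$NVIDIA_OUT")" startx
```

Because the variable is set only on that one command line, the preload applies to that single X session and nothing else, which keeps the experiment easy to undo.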

Now I just need a proper way to get this to work with NixOS. Not sure how to proceed… We have 2 solutions from that Arch topic: either set LD_PRELOAD to the libnvidia-tls.so that’s in lib/tls, or delete the libnvidia-tls.so that is in lib and not in lib/tls (EDIT: the reverse, it must be in lib/tls and not in lib). But how to do it the NixOS way? I’ll probably have to look at how the nvidia drivers are packaged…

Otherwise, do you happen to know of a temporary hack I could put in my configuration.nix? One that works with lightdm, that is, rather than having to use startx as I did, lol.

PS: yes, I’m using NixOS 23.11 and it does use glibc 2.38, but somehow next to it I see GLIBC_2.34, in this line:

81040089 __libc_start_main@@GLIBC_2.34 (/nix/store/qn3ggz5sf3hkjs2c797xf7nan3amdxmp-glibc-2.38-27/lib/libc.so.6 + 0x28089)

So yes, maybe that’s the version the nvidia library was compiled with.
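A side note on reading that line: the part after `@@` is a symbol version tag, which records the glibc release in which that version of the symbol was introduced. It does not name the release of the libc that is loaded, which is given by the path in the same line (glibc-2.38-27). A small sketch for pulling the tag out of such a frame label follows; `symbol_version` is a hypothetical helper name.

```shell
# Split a frame label such as "__libc_start_main@@GLIBC_2.34" into its
# symbol-version tag. symbol_version is a hypothetical helper name.
symbol_version() {
  case "$1" in
    *@*) printf '%s\n' "${1##*@}" ;;  # keep the text after the last "@"
    *)   return 1 ;;                   # label carries no version tag
  esac
}

symbol_version '__libc_start_main@@GLIBC_2.34'   # prints GLIBC_2.34

# To list the versioned symbols a given libc exports (needs binutils;
# the path is a placeholder):
#   objdump -T /path/to/libc.so.6 | grep __libc_start_main
```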

Hmm, this is tricky. Maybe it’s as simple as:

hardware.nvidia.package = config.boot.kernelPackages.nvidiaPackages.legacy_390.overrideAttrs (old: {
  postInstall = ''
    mv $out/lib/tls/libnvidia-tls.so $out/lib/libnvidia-tls.so
  '';
});

Looking at the arch thread, that may in fact become the permanent solution.

I certainly don’t recommend fooling around with LD_PRELOAD semi-permanently on NixOS.

Been there, done that ;p Good thing you spotted it this time :)

I’ve corrected my previous message: libnvidia-tls.so must be in lib/tls and not in lib, according to the Arch thread.

Overriding postInstall didn’t work, so I had a look at the package, and it turns out they use a custom builder script which overrides the installPhase without calling the pre and post hooks… So I used postFixup instead:

          package = config.boot.kernelPackages.nvidiaPackages.legacy_390.overrideAttrs (old: {
            postFixup = ''
              rm $out/lib/libnvidia-tls.so*
            '';
          });

But it doesn’t work! It lets me boot X without problems, but glxinfo doesn’t work the way it did with the LD_PRELOAD trick. Sure enough, from the X log:

[    62.538] (EE) Failed to load /nix/store/nh7q72zijdzzf0v29cbd15sg3jw88qb3-nvidia-x11-390.157-6.1.68-bin/lib/xorg/modules/extensions/libglx.so: libnvidia-tls.so.390.157: cannot open shared object file: No such file or directory
[    62.538] (EE) Failed to load module "glx" (loader failed, 0)

$ ldd /nix/store/nh7q72zijdzzf0v29cbd15sg3jw88qb3-nvidia-x11-390.157-6.1.68-bin/lib/xorg/modules/extensions/libglx.so
        linux-vdso.so.1 (0x00007ffe6a9bb000)
        libnvidia-tls.so.390.157 => not found
        libnvidia-glcore.so.390.157 => /nix/store/76n9p37a92vv7z4ygl4j7hzhi06xlwbx-nvidia-x11-390.157-6.1.68/lib/libnvidia-glcore.so.390.157 (0x00007eff8d000000)
        libc.so.6 => /nix/store/qn3ggz5sf3hkjs2c797xf7nan3amdxmp-glibc-2.38-27/lib/libc.so.6 (0x00007eff8ee18000)
        libdl.so.2 => /nix/store/qn3ggz5sf3hkjs2c797xf7nan3amdxmp-glibc-2.38-27/lib/libdl.so.2 (0x00007eff900d2000)
        libm.so.6 => /nix/store/qn3ggz5sf3hkjs2c797xf7nan3amdxmp-glibc-2.38-27/lib/libm.so.6 (0x00007eff8cf20000)
        /nix/store/qn3ggz5sf3hkjs2c797xf7nan3amdxmp-glibc-2.38-27/lib64/ld-linux-x86-64.so.2 (0x00007eff900db000)

So I will have to mess around with the nvidia-x11 package. But the override technique is a good one; I always forget that I can do this. I’ll check if I can override its lib, maybe with patchelf.
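The "not found" check above can be scripted so it is quick to repeat after each rebuild. This is only a sketch: `missing_libs` is a hypothetical helper that filters `ldd` output, and `NVIDIA_BIN` in the usage comment is a placeholder for the driver's -bin store path.

```shell
# Read `ldd` output on stdin and print the names of unresolved libraries,
# i.e. the lines of the form "<name> => not found".
# missing_libs is a hypothetical helper name.
missing_libs() {
  awk '$2 == "=>" && $3 == "not" && $4 == "found" { print $1 }'
}

# Example (NVIDIA_BIN is a placeholder for the -bin output of the driver):
#   ldd "$NVIDIA_BIN/lib/xorg/modules/extensions/libglx.so" | missing_libs
```

An empty result means every dependency of that module resolved; with the output shown above it would list libnvidia-tls.so.390.157.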

Turns out you were right: after re-reading the Arch thread, the patch is to move the lib from $build/tls/libnvidia-tls.so to $out/lib/libnvidia-tls.so, which is what your snippet does. This is what I ended up with:

          package = config.boot.kernelPackages.nvidiaPackages.legacy_390.overrideAttrs (old: {
            postFixup = ''
              mv $out/lib/tls/* $out/lib
              rmdir $out/lib/tls
            '';
          });
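To see what those two postFixup lines do, here is a minimal sketch that replays them on a scratch directory instead of the real package. It assumes the layout discussed in this thread, with one copy of the library directly in lib and another in lib/tls; `flatten_tls` is a hypothetical helper and the file contents are dummies.

```shell
# Apply the same two steps as the postFixup above to an arbitrary directory.
# flatten_tls is a hypothetical helper; "$1" stands in for $out.
flatten_tls() {
  mv "$1"/lib/tls/* "$1"/lib || return 1   # tls copies replace same-named files in lib
  rmdir "$1"/lib/tls                        # fails if anything was left behind
}

# Build a scratch tree that mimics the assumed layout: one copy of the
# library directly in lib and another in lib/tls.
scratch="$(mktemp -d)"
mkdir -p "$scratch/lib/tls"
echo "plain copy" > "$scratch/lib/libnvidia-tls.so.390.157"
echo "tls copy"   > "$scratch/lib/tls/libnvidia-tls.so.390.157"

flatten_tls "$scratch"
cat "$scratch/lib/libnvidia-tls.so.390.157"   # prints "tls copy"
```

The point is that `mv` overwrites the file of the same name in lib, so the library name that libglx.so links against still resolves, but now to the lib/tls variant. That is why this works where simply deleting the lib copy left libglx.so with a missing dependency.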

Now I just have to go and patch nvidia-x11 on NixOS. Thanks again for your help!
