Using nvidia open-source drivers

“As usual” means leaving your current configuration as is and only adding the single line for open.

OK, I wasn’t certain because part of my current configuration seems to specify the proprietary drivers. I’ll just add the hardware.nvidia.open = true; declaration to that file.

You’d better check first whether your hardware is supported by the open-source driver (a full list is at GitHub - NVIDIA/open-gpu-kernel-modules: NVIDIA Linux open GPU kernel module source). It most likely isn’t, as you are currently using the legacy_390 driver.

Yeah, the new drivers should not be usable for your GPU either. Out of interest, when you used nouveau, what was your configuration? It supposedly doesn’t use the same variables as nvidia to manage offload rendering, and I imagine the prime options are either no-ops or problematic.

I suspect by the symptoms that you ended up just using the nvidia GPU. What does glxinfo give you when using the nouveau driver?

Nouveau still doesn’t have power management for anything though, so you probably won’t have a fun time with it either. I think your best bet is just using the proprietary driver in sync mode, which I think it’s capable of (?).

As I’ve mentioned twice already, my current configuration is here.

Yep, I saw that, but that configuration isn’t using nouveau, and afaict there is no branch with nouveau either, so I can’t see what your configuration was when you were using it.

Looking at the nouveau documentation for optimus, and the NixOS implementation, NixOS currently just doesn’t support nouveau for offload rendering. It in fact blacklists the nouveau driver, which is probably why one of your displays isn’t working.

Sorry for my bluntness and thank you for your patience in the face of it. It is born of frustration that Nvidia used to work perfectly, with 21.11, but for other reasons I could not just stay on that release.

The 22.11 Nouveau configuration is here. The fault is that it drives only the two external displays; the laptop display is blank, although with a text-mode blinking input caret at the top-left. Oh, and the 4k screen appears to be scaled to FHD, not in its native resolution.

The official closed-source drivers do kind-of work, but they suck CPU (the laptop fan runs constantly) and one external screen blinks every ten seconds or so. I can ameliorate the CPU load by renice on the X process but top doesn’t show anything else sucking CPU so it must be kernel + drivers.

If you’re completely stuck, maybe you can slowly bisect to see what upstream commit broke things? Instead of going to 22.11 immediately, start by going to 22.05, and then go commit by commit (binary-search style).

I don’t really have the time to get into that. I’ll just put up with it and try Nouveau every six months or so. Thanks again for your help.

I did find the time to install 22.05. It fixes the problem. 22.11 was almost workably slow but 22.05 works just fine.
TLater mentions bisecting. How does that work with NixOS, where, so far as I am aware, the only method of updating is nixos-rebuild?

You’ve effectively already started; bisecting is just the process of building various commits (in a binary search-like pattern) until you find the one that causes it, using some specific git tooling to help do this.

You can use nix-bisect for that. There’s an announcement for it here with some additional info: Nix-bisect -- Bisect Nix Builds

It seems to me it’s going to be difficult to use git-bisect because I can’t really write a test that git can run. I can only bisect onto the problem commit myself, as I have to log in to the desktop and run it for a couple of minutes to work out whether or not the problem is present.
I have here a W520 running 22.05 and doing so well. I understand that rather than using nixos-rebuild switch --upgrade I can somehow use git commands to bisect between 22.05 and 22.11. Can you give me some pointers on that? I was not even aware that nix packages were stored in a versioned repository.

I don’t think you necessarily need to use git here, you can just “upgrade” to intermediate unstable revisions between 22.05 and 22.11 to find the revision that broke the drivers for you.

Basically what nixos-rebuild switch --upgrade does is to upgrade the root channel before running nixos-rebuild switch normally. You can just set the channel to specific revisions, rebuild, and see if the drivers work or not.

Could you show your output of sudo nix-channel --list and nix-channel --list? Just so I can give you more correct instructions.

Also be aware that 23.05 was released in the meantime, so if you try upgrading once more, maybe you’re lucky and it’s fixed again?

Additionally, in case you don’t want to put in the effort into bisecting and just want a working system with the latest software otherwise, you can install older working drivers on a new release of NixOS as well.

I did try 23.05 and the problem is not fixed, so here goes:

[mounty@pingala:~]$ sudo nix-channel --list
nixos https://nixos.org/channels/nixos-22.05

[mounty@pingala:~]$ nix-channel --list

[mounty@pingala:~]$

Thank you.

Eventually I am going to adopt this suggestion, but I’m hoping that if I can find the point at which the bug was introduced, it might be possible to report it and get it fixed. That’s the right way to proceed.

Ok awesome, only the one channel so no ambiguity.

Finding and pinning NixOS releases

On hydra, you can see all the revisions for 22.05 and those for 22.11.
You can only see them in a paginated view because these take a while to generate, but that won’t be much of a problem. You installed 22.05 in May, so the revision you most likely installed is nixos-22.05.4692.50fc86b75d2, released on 2023-04-28.

Let’s check whether that suspicion is correct first:

# Show contents of the .version and .version-suffix files
$ cat /nix/var/nix/profiles/per-user/root/channels/nixos/.version*
22.05
.4692.50fc86b75d2%

(% shows up in zsh to indicate that .version-suffix has no trailing newline; it is not important here)
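
If you’d rather get the release name as a single string that you can compare against the hydra job list, something like this should do. It’s only a sketch: channel_release is a name I made up, and it assumes the two files live in the channel directory shown above and that the release name is just "nixos-" followed by their contents.

```shell
# channel_release [DIR]: print the release name of a channel directory.
# channel_release is a hypothetical helper, not part of Nix.
# DIR defaults to root's "nixos" channel as used in the command above.
channel_release() {
  dir=${1:-/nix/var/nix/profiles/per-user/root/channels/nixos}
  # $(...) strips trailing newlines, so it does not matter that
  # .version ends with one and .version-suffix does not.
  printf 'nixos-%s%s\n' "$(cat "$dir/.version")" "$(cat "$dir/.version-suffix")"
}
```

Calling channel_release with no arguments reads root’s nixos channel; passing a directory lets you point it at any other channel.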

There was a newer revision released on 2023-06-01, so for demonstration purposes, let’s upgrade your system to that explicitly. In the job list, you can see a field “Package/release name”:

In this case, it’s nixos-22.05.4694.380be19fbd2. You can append this to https://releases.nixos.org/nixos/22.05/ and set your nixos channel to the resulting URL:

$ sudo nix-channel --add 'https://releases.nixos.org/nixos/22.05/nixos-22.05.4694.380be19fbd2' nixos
$ sudo nix-channel --update
$ cat /nix/var/nix/profiles/per-user/root/channels/nixos/.version*
22.05
.4694.380be19fbd2%

(Note that --add is a bit of a misnomer as it overwrites the previous definition of nixos)

Now you can nixos-rebuild like you did before and see if the issue already exists or not. This is your basic tool for rebuilding your config based on a fixed channel release.
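
Since you’ll be repeating those steps a lot, you could wrap them in a small function. This is just a sketch under my own assumptions: pin_channel and the DRY_RUN variable are names I invented, and I’m assuming you rebuild with nixos-rebuild switch.

```shell
# pin_channel URL: set root's "nixos" channel to one fixed release and rebuild.
# pin_channel and DRY_RUN are hypothetical names, not part of Nix.
# With DRY_RUN set to anything non-empty, the commands are only printed.
pin_channel() {
  url=$1
  run=${DRY_RUN:+echo}   # expands to "echo" in dry-run mode, to nothing otherwise
  $run sudo nix-channel --add "$url" nixos &&
  $run sudo nix-channel --update &&
  $run sudo nixos-rebuild switch
}
```

Running DRY_RUN=1 pin_channel with a release URL prints the three commands without touching the system, which is a cheap way to check the URL before committing to a rebuild.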

Bisecting

Now, for the strategy of finding the commit that causes your issue: on the 22.11 channel job page, there are 197 releases. They are listed in descending order by their “Finished at” date, so the first one is on the last page, and you can see it was published on 2022-11-22. Just to be sure you’re not wasting time, start with that one first.

If it works, we know that the very first release of 22.11 was ok, and you can proceed with the usual bisect algorithm:

  1. Choose one release roughly halfway between the most recent known good one and the oldest known bad one
  2. Get the release URL as shown above (maybe paste it into your browser once to make sure it doesn’t 404)
  3. Set your nixos channel to the release URL, then run nix-channel --update and nixos-rebuild.
  4. Check if the issue is there
  5. If it is, all newer commits can be considered bad. If it isn’t, all older commits can be considered good.
  6. Select a new release according to step 1 and continue until you find the first bad release.
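
The bookkeeping for step 1 can be sketched as below, assuming you number the releases 1 to N from oldest to newest (so the reverse of how hydra lists them). next_candidate is a made-up name; the function only does the arithmetic, and you still rebuild and test by hand. With 197 releases this should take roughly eight rebuilds.

```shell
# next_candidate GOOD BAD: pick the next release index to test.
# GOOD = index of the newest release known to work,
# BAD  = index of the oldest release known to be broken.
# next_candidate is a hypothetical helper for a manual bisect.
next_candidate() {
  good=$1
  bad=$2
  if [ $((bad - good)) -le 1 ]; then
    # Nothing left in between, so BAD is the first broken release.
    echo "done: first bad release is #$bad"
    return 0
  fi
  # Integer midpoint between the two known results.
  echo $(( (good + bad) / 2 ))
}
```

After each test, replace GOOD or BAD with the index you just tried, depending on whether the issue showed up, and call it again.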

If you have the same issue on the very first release of 22.11, you’ll need to do the bisecting on old releases of nixos-unstable. It is called trunk-combined on hydra, but it works pretty much the same as described above. You just have to know that the base URL is https://releases.nixos.org/nixos/unstable/ and that 22.11pre in a release name means the release was made after the branch-off of 22.05 and before the release of 22.11, so those are the releases you’ll want to look at. I think old releases might be purged from unstable if they’re old enough, or if they were never downloaded. The oldest one that seems to still be up is nixos-22.11pre383837.033bd4fa9a8 from 2022-06-08.
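
To avoid typos when assembling URLs, here is a sketch that derives the URL from the release name. release_url is a name I made up, and the rule it encodes is only inferred from the URLs mentioned in this thread (names containing "pre" live under unstable, everything else under its series such as 22.05 or 22.11), so do paste the result into a browser before relying on it.

```shell
# release_url NAME: guess the releases.nixos.org URL for a release name.
# release_url is a hypothetical helper; the mapping is an assumption
# based on the example URLs in this thread, not an official rule.
release_url() {
  name=$1
  case $name in
    nixos-*pre*)
      # Pre-releases such as nixos-22.11pre383837.033bd4fa9a8
      base=unstable
      ;;
    nixos-*)
      # Series = the first two numeric fields, e.g. 22.05 or 22.11
      base=$(printf '%s\n' "$name" | sed -E 's/^nixos-([0-9]+\.[0-9]+).*/\1/')
      ;;
    *)
      echo "unrecognised release name: $name" >&2
      return 1
      ;;
  esac
  printf 'https://releases.nixos.org/nixos/%s/%s\n' "$base" "$name"
}
```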

I hope this explanation was clear enough, let me know if you have any more questions or issues! Also make sure to click “Reply” on my answer or mention me directly, otherwise I might not get a notification about your reply :)

Thanks for that. I changed to nixos-22.05.4694.380be19fbd2 and the problem is present, so surely doesn’t that mean that it happened in 22.05, not 22.11?

Just to be clear:

  • nixos-22.05.4694.380be19fbd2 – problem present.
  • nixos-22.05.4692.50fc86b75d2 – problem absent.

Yup, exactly. I would have assumed some changes got backported to 22.05 that caused this, but looking at the git log, this seems impossible:

$ git log --decorate --oneline --graph | head -7
*   380be19fbd2 (HEAD, origin/release-22.05, origin/nixpkgs-22.05-darwin, origin/nixos-22.05-small, origin/nixos-22.05-aarch64, origin/nixos-22.05) Merge pull request #235159 from prusnak/bitcoin-22.05
|\  
| * c3a341f4d26 bitcoin: 23.0 -> 23.2
|/  
* 50fc86b75d2 haskellPackages.hs-mesos: throw properly
* abd6db5708e texlive.texinfo: fix hash (#227358)
*   08741686397 Merge pull request #206382 from NixOS/backport-206193-to-release-22.05

You see 50fc86b75d2, which is where the problem is absent, and 380be19fbd2 where the problem is present.

The only change that happened in between was a minor version bump of bitcoin; the full diff just contains the version number and the hash, and searching for “bitcoin” in your config yields 0 results, so you’re not even installing it.

How the hell is that breaking nouveau?

I guess we gotta look deeper. The next thing we should check is the full closure of your system.
I’m not on NixOS, but this should work like a regular profile. First off, let’s see the generation numbers:

nix profile history --profile /nix/var/nix/profiles/system

This should allow you to find the generation numbers for the generations based on 50fc86b75d2 and 380be19fbd2. You’ll have to figure out which is which, but I would assume they’re the last two. Now, let’s say those were 661 and 662, respectively. You’d then execute this command:

nix store diff-closures /nix/var/nix/profiles/system-661-link /nix/var/nix/profiles/system-662-link

This should already show if there’s any difference between the two profiles. If there are some, you can also run

diff --recursive /nix/var/nix/profiles/system-661-link /nix/var/nix/profiles/system-662-link

which will give you more detail. Remember to replace 661 and 662 with your actual generation numbers.
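
To save some typing, the path-building can be wrapped up like this. It’s a sketch only: system_link and compare_generations are names I made up, and it assumes the generation links follow the system-N-link pattern used in the commands above.

```shell
# system_link N: path of system generation N.
# Both function names here are hypothetical helpers, not part of Nix.
system_link() {
  printf '/nix/var/nix/profiles/system-%s-link\n' "$1"
}

# compare_generations OLD NEW: diff the closures of two system generations.
compare_generations() {
  old=$(system_link "$1")
  new=$(system_link "$2")
  for link in "$old" "$new"; do
    # Stop early with a readable message if a generation does not exist.
    [ -e "$link" ] || { echo "no such generation: $link" >&2; return 1; }
  done
  nix store diff-closures "$old" "$new"
}
```

With the example numbers from above you’d run compare_generations 661 662.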

Please post the output of all commands here (but truncate the first one, it would be way too long to post here).

I really hope we find something in those diffs.

I was mistaken. The problem is absent in nixos-22.05.4694.380be19fbd2. Without boring you with the details, I am performing this investigation in less than ideal conditions.
Also, I’m using the Nvidia proprietary closed-source drivers, not Nouveau. Here is my config.
So at the end of the 22.05 line my system is running OK. What should I do next? Move to the start of 22.11? If so, how?

Ah phew, ok.

Ahhh right I looked at the link to your nouveau config, but you already explained that didn’t work. Sorry about that.

Yes. So as I wrote above you can go to the Hydra build page for 22.11 and select the first beta release, so https://releases.nixos.org/nixos/22.11/nixos-22.11beta19.c9538a9b707 is what to set your channel to first. Starting from that, follow the instructions I already posted.

Note that for some of these releases, the commit hash is missing from the release name, and you’ll get a 404 if you try to access it that way. You need to use the full file name of the archive minus the file extension:
