How to troubleshoot/diff channel upgrades?

rolodato · May 21, 2023, 11:37pm

I currently run a NixOS server with Jellyfin and many other applications that have been working great for a very long time. Recently I decided to run a channel upgrade and nixos-rebuild, and this broke some of my Jellyfin clients in a very strange way that I’ve described here: https://github.com/jellyfin/jellyfin-webos/issues/156.

I have been able to work around this problem by rolling back the channel upgrade with nix-channel --rollback + nixos-rebuild, but I would like to find the root cause. How can I get more information on what changed exactly when I ran a rebuild and/or the channel upgrade? Maybe a diff between channel generations or NixOS generations? Or a more detailed log on what is happening during nixos-rebuild?

If I’m reading things correctly, the last good Nix channel was at revision d70f5cd5c3bef45f7f52698f39e7cc7a89daa7f0 and the bad one was at revision 628d4bb6e9f4f0c30cfd9b23d3c1cdcec9d3cb5c.

To clarify, my NixOS configuration itself has not changed; only the channel upgrade plus a rebuild broke my use case.

Thanks!

wamserma · May 22, 2023, 6:44am

You can use nvd and nix-diff to see what changed on your system.
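
As a rough sketch of how the two tools are typically invoked (the generation numbers and the helper name diff_generations are placeholders of mine, and this assumes both tools are available, e.g. through nix-shell -p nvd nix-diff):

```shell
# Hypothetical helper: compare two NixOS system generations.
# $1 = older generation number, $2 = newer generation number (placeholders).
diff_generations() {
  old="/nix/var/nix/profiles/system-$1-link"
  new="/nix/var/nix/profiles/system-$2-link"

  # Package-level summary: added, removed and version-changed packages.
  nvd diff "$old" "$new"

  # Derivation-level diff: shows why the two system derivations differ.
  # Assumes the derivers are still recorded in the Nix store.
  nix-diff "$(nix-store --query --deriver "$old")" \
           "$(nix-store --query --deriver "$new")"
}
```

To find the numbers, nix-env --list-generations --profile /nix/var/nix/profiles/system lists the system generations (it may need root). Then something like diff_generations 41 42 compares the last good generation with the broken one, where 41 and 42 are made-up numbers.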

Atemu · May 22, 2023, 7:27am

Get yourself a checkout of Nixpkgs and start a git bisect (ideally with --first-parent) between the last good and the first bad revision. At each step, run nixos-rebuild build -I nixpkgs=/path/to/nixpkgs, test the result, and tell git whether that revision is good or bad.

You could deploy each revision on the production machine (nixos-rebuild test), but I’d personally use build-vm instead and test against the VM with a problematic device.

Do that a few times until git tells you which commit is the first bad one and ping the author of the commit in your issue.
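
The loop above could look roughly like this. It is only a sketch: the helper names and the NIXPKGS variable are made up for illustration, /path/to/nixpkgs stands for wherever your clone lives, and it assumes the two revisions from the first post really are Nixpkgs commits. Passing --first-parent to git bisect start needs a fairly recent Git.

```shell
# Assumed location of a local Nixpkgs clone; override via the environment.
NIXPKGS="${NIXPKGS:-/path/to/nixpkgs}"

# Start bisecting. $1 = known-bad revision, $2 = known-good revision.
start_bisect() {
  git -C "$NIXPKGS" bisect start --first-parent "$1" "$2"
}

# Build the system against whatever commit git has checked out.
# $1 = nixos-rebuild action, defaults to "build"; "build-vm" gives a VM instead.
build_step() {
  nixos-rebuild "${1:-build}" -I nixpkgs="$NIXPKGS"
}

# Record the verdict for the current commit. $1 = good | bad | skip.
mark_step() {
  git -C "$NIXPKGS" bisect "$1"
}
```

With the revisions from the first post, that would be start_bisect 628d4bb6e9f4f0c30cfd9b23d3c1cdcec9d3cb5c d70f5cd5c3bef45f7f52698f39e7cc7a89daa7f0, then build_step (or build_step build-vm), a manual test with the affected client, and mark_step good or mark_step bad, repeated until git names the first bad commit. git bisect reset in the checkout returns it to where it started.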

If it’s a staging-next merge, you’ll have to bisect that staging cycle in a second step, which will get quite a bit more complicated and time-intensive. As a start, you could manually check whether any change affecting Jellyfin directly was in that cycle.
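
For that manual check, one possible approach is to list the commits in a range that touch any path with "jellyfin" in its name. Again just a sketch: the helper name and the NIXPKGS variable are made up, and the glob pathspec is used so that no particular package directory has to be assumed.

```shell
# Assumed location of a local Nixpkgs clone; override via the environment.
NIXPKGS="${NIXPKGS:-/path/to/nixpkgs}"

# List commits between two revisions that touch Jellyfin-related paths.
# $1 = older (good) revision, $2 = newer (bad) revision.
jellyfin_changes() {
  git -C "$NIXPKGS" log --oneline "$1..$2" -- '*jellyfin*'
}
```

For a merge commit, jellyfin_changes '<merge>^1' '<merge>' should cover everything the merge brought in, with <merge> being whatever commit the bisect ended on. Changes in Jellyfin's dependencies would not show up this way, so an empty result does not rule the cycle out.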