Failed "nixos-rebuild switch --upgrade"

Hello,

I had a system issue in the middle of running “nixos-rebuild switch --upgrade”: something panicked, and the machine rebooted before the rebuild completed.

I can boot into the previous configuration, but not the most recent. No worries, select that one, right?

What no longer works, though, is any “nixos-rebuild switch” command. Each one fails with the following:

sudo nixos-rebuild switch
building Nix...
building the system configuration...
Traceback (most recent call last):
  File "/nix/store/9nnalghn6ffv5j42pnkk6qjkp48z38nl-jr4rls1hlf9caafl7sy9csl2iwzd8rxm-systemd-boot", line 394, in <module>
    main()
  File "/nix/store/9nnalghn6ffv5j42pnkk6qjkp48z38nl-jr4rls1hlf9caafl7sy9csl2iwzd8rxm-systemd-boot", line 377, in main
    install_bootloader(args)
  File "/nix/store/9nnalghn6ffv5j42pnkk6qjkp48z38nl-jr4rls1hlf9caafl7sy9csl2iwzd8rxm-systemd-boot", line 324, in install_bootloader
    remove_old_entries(gens)
  File "/nix/store/9nnalghn6ffv5j42pnkk6qjkp48z38nl-jr4rls1hlf9caafl7sy9csl2iwzd8rxm-systemd-boot", line 222, in remove_old_entries
    bootspec = get_bootspec(gen.profile, gen.generation)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/9nnalghn6ffv5j42pnkk6qjkp48z38nl-jr4rls1hlf9caafl7sy9csl2iwzd8rxm-systemd-boot", line 113, in get_bootspec
    bootspec_json = json.load(boot_json_f)
                    ^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/4rf5qybw37b4lh1g0xczlv14sqdbmnpm-python3-3.11.9/lib/python3.11/json/__init__.py", line 293, in load
    return loads(fp.read(),
           ^^^^^^^^^^^^^^^^
  File "/nix/store/4rf5qybw37b4lh1g0xczlv14sqdbmnpm-python3-3.11.9/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/4rf5qybw37b4lh1g0xczlv14sqdbmnpm-python3-3.11.9/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/4rf5qybw37b4lh1g0xczlv14sqdbmnpm-python3-3.11.9/lib/python3.11/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
warning: error(s) occurred while switching to the new configuration

I have tried both the --install-bootloader and --rollback options, and each fails the same way. Something clearly got a bit mangled, but I’m not sure how to restore it to be able to do a normal switch.

Thoughts?

This is just an idea, but, in the category of “your system is already borked anyway”, what if you ran nixos-rebuild switch --upgrade again?
Worst case, it doesn’t work and you’re where you are now; best case, the --upgrade brings your system back to a happy state.

Under /nix/var/nix/profiles, there should be a number of symlinks called system-*-link. Each of those symlinks should point to a directory containing a boot.json file. At least one of those files is probably corrupt, empty, or missing, most likely the newest one.

Find the most recent one without a bad boot.json file. Run

sudo nix-env -p /nix/var/nix/profiles/system -G NNN

where NNN is the number of the last good system-*-link entry.

Then run

sudo nix-env -p /nix/var/nix/profiles/system --delete-generations XXX YYY ZZZ

where XXX... are all the numbers with bad or missing boot.json files.

Then run nixos-rebuild switch and see if that helps.
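
To spot the bad generations quickly, something like the following might help. It is only a sketch: the function name list_bad_boot_json is made up for illustration, and it only detects a boot.json that is missing or zero bytes, not one that is corrupt in some other way. It assumes the system-*-link layout described above.

```shell
# Hypothetical helper: print the generation number of every
# system-*-link whose boot.json is missing or zero bytes.
# Usage: list_bad_boot_json [PROFILES_DIR]
# The body runs in a subshell so its variables do not leak.
list_bad_boot_json() (
    profiles_dir="${1:-/nix/var/nix/profiles}"
    for link in "$profiles_dir"/system-*-link; do
        # If the glob matched nothing, the literal pattern comes back; skip it.
        [ -e "$link" ] || [ -L "$link" ] || continue
        # Reduce ".../system-NNN-link" to "NNN".
        gen="${link##*/system-}"
        gen="${gen%-link}"
        if [ ! -e "$link/boot.json" ]; then
            echo "$gen missing"
        elif [ ! -s "$link/boot.json" ]; then
            echo "$gen empty"
        fi
    done
)
```

Calling list_bad_boot_json with no argument checks /nix/var/nix/profiles. The numbers it prints are the candidates for --delete-generations. They are listed in glob order, not numeric order, so look through the whole list before picking the last good generation.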

Yeah, I tried that, thanks. Same result.

Oh, will check this out, thanks! I have a longer fix process running currently, but will look at these in parallel while it’s going.

There are definitely empty boot.json files in the last few. (File exists, but zero bytes)

OK, this is weird. I did all that, and it seemed fine: only valid-looking boot.json files were left.
Then I ran sudo nixos-rebuild switch, and there’s a new entry… with an empty boot.json file. (And, of course, the same error message.)

Huh! Do you have an older version of your configuration.nix in version control that you can roll back to?

Well, I have a backup. The changes were minimal, and the result was the same.

And what if you roll back your channel with

sudo nix-env -p /nix/var/nix/profiles/per-user/root/channels --rollback

?

(And then re-delete any bad generations.)

I’ve done that, then kicked off nixos-rebuild switch --repair.

It did find some things to clean up, but has been sitting at this for a very long time:

repairing outputs of ‘/nix/store/0xdgamyxvj6c4akzmixqy8hzcrlj7sv8-nixos-system-peefiddy-24.05.2028.e4509b3a560c.drv’

(where peefiddy is the hostname)

OK, that’s done it. Thanks for the help!
I don’t know what all --repair had to do, but it was doing it for a very long time.
