What you need to do in this case is to manually remove one or two old kernel+initrds that were part of the GC’d generations.
Then you can run nixos-rebuild boot again. It’ll first copy over the current generation’s kernel+initrd and then clear out the rest of the GC’d generations’ files from /boot/ automatically.
Edit: The fault is in the system activation script that gets executed by nixos-rebuild, not nixos-rebuild itself.
I struggled with this a lot, and was frightened by the suggestions to delete kernels that I hoped I wasn’t using. I’d like to test my new understanding and hopefully comfort other people.
nixos-rebuild switch (and maybe some other commands) will make sure that /boot/EFI/nix and /nix/store contain everything needed for all of the system profiles in /nix/var/nix/profiles/. After they’ve done that, they remove any unused files from /boot/EFI/nix.
This explains why it’s not enough to remove old generations/profiles: nothing in /boot is cleaned up until the rebuild runs again.
It also explains why manually deleting kernels doesn’t risk creating broken entries in GRUB – nixos-rebuild will put everything back (as long as you don’t crash or restart while in this state).
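The behaviour described above can be sketched as plain set arithmetic. This is only a toy model in Python, not the real activation script: the function name plan_boot_sync, the file names and the generation numbers are all made up for illustration.

```python
# Toy model only: names and data layout are hypothetical, not NixOS code.
def plan_boot_sync(boot_files, generations):
    """Return (to_copy, to_delete) for a boot directory.

    boot_files:  set of file names currently present in /boot
    generations: dict mapping generation number -> set of files it needs
    """
    # Every file referenced by at least one remaining generation must exist.
    needed = set().union(*generations.values())
    to_copy = needed - boot_files      # referenced but missing: gets restored
    to_delete = boot_files - needed    # present but unreferenced: gets removed
    return to_copy, to_delete


generations = {
    41: {"kernel-6.1", "initrd-6.1"},
    42: {"kernel-6.6", "initrd-6.6"},
}
boot = {"kernel-6.1", "initrd-6.1", "kernel-6.6", "initrd-6.6"}

# Deleting a kernel by hand while generation 41 still exists:
# the model schedules it to be copied back.
print(plan_boot_sync(boot - {"kernel-6.1"}, generations))

# Removing generation 41 instead: its files become unreferenced
# and are scheduled for deletion on the next sync.
del generations[41]
print(plan_boot_sync(boot, generations))
```

In this model, a hand-deleted file that is still referenced lands in to_copy, while files of a removed generation only land in to_delete once the sync runs again.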
If you use flakes, it’s even better than this. You can remove old generations, then re-run nixos-rebuild to rebuild the currently installed version of your config (assuming that you still have it – git-commit is your friend), which requires no new space on /boot/. Maybe this is possible without flakes, but I never figured out how. This frees up space on /boot, and now you can upgrade as normal.
What I meant is that before I managed my system with a flake, when I ran out of space on /boot I was pretty stuck if nixos-rebuild wanted to install a new kernel. I don’t know how to tell it to rebuild the current setup and clean up any unused kernels instead of installing a new kernel first. Thus, I had to remove older generations, and then manually remove the matching kernels to free up space.
With a flake system, though, I can remove old generations of my profile, re-run nixos-rebuild switch --flake ... and have it remove any unused kernels. This feels much less sketchy to me.
At their core, flakes are a standardised method for managing external dependencies. How you do that has no influence on whether the activation script removes unused kernels after you remove old generations; it’s entirely independent.
Oh, great. In that case, why is the standard advice to manually remove files from /boot instead of just removing old generations and then regenerating the current profile, cleaning up /boot at the same time?
Because (at least at the time I wrote said advice), the activation script would first copy new kernels+initrds to /boot/ and only then delete unused ones.
You need to do both: remove the profile and delete enough unused files from /boot/ so that any potential new files can be copied. Only then run the activation script.
If you only deleted old kernels without removing their generations, the activation script would copy them right back, using the space again.
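To make the ordering problem concrete, here is a toy model in Python of a copy-first, delete-afterwards sync. It is a sketch under assumptions, not the actual activation script (whose behaviour may have changed since the advice was written): the names activate and BootFull, the file names, the sizes and the capacity are all invented.

```python
# Toy model only: sizes, names and capacity are hypothetical.
class BootFull(Exception):
    """Raised when a file does not fit into the modelled /boot."""


def activate(boot, needed, capacity):
    """Model a sync that copies new files before deleting unused ones.

    boot:     dict of file name -> size currently in /boot (mutated in place)
    needed:   dict of file name -> size required by all remaining generations
    capacity: total size of the modelled /boot partition
    """
    # Step 1: copy missing files first; unused old files still take up space.
    for name, size in needed.items():
        if name in boot:
            continue
        if sum(boot.values()) + size > capacity:
            raise BootFull(f"no space left for {name}")
        boot[name] = size
    # Step 2: only now delete files that no generation references.
    for name in list(boot):
        if name not in needed:
            del boot[name]


boot = {"kernel-old": 40, "kernel-cur": 40}
needed = {"kernel-cur": 40, "kernel-new": 40}  # old generation already removed

try:
    activate(boot, needed, capacity=100)
except BootFull as err:
    print("activation failed:", err)  # kernel-old is still in the way

del boot["kernel-old"]  # manual cleanup of a file no generation references
activate(boot, needed, capacity=100)
print(sorted(boot))  # ['kernel-cur', 'kernel-new']
```

In this model, removing the generation alone is not enough, because the copy step runs out of room before the delete step is reached. Deleting a file that is still listed in needed would not help either, since step 1 copies it right back.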
But that’s my point. I think that you can do this even when you have already run out of space, by removing old generations (but not kernels) and then rebuilding the current profile, so that no new kernel is required.
The problem in the scenario I described is that you typically only notice that you’ve run out of space when you’ve already built and attempted to activate a new generation with a new kernel/initrd. The current profile would not be in /boot/ yet. (Adding the generation and switching the runtime to it works even if /boot/ is full.)
Activating a previous profile also wouldn’t help in this case as the activation script copies kernels and initrds of all available generations.
Hmm. Next time I run out of space I’m going to take careful note of how I resolve this. I still think that I could remove the new generation and some older generations, rebuild the previous successfully installed generation and resolve the problem. But I’ll let you know.