Any thoughts about how to recover the nix store sqlite db?
I get the following error when running a gc (the package can vary):
error: executing SQLite statement 'delete from ValidPaths where path = '/nix/store/iy9hn7sknd60nk77rf4vrznklhax8m5i-CPAN-Meta-Check-0.014.tar.gz.drv';': database disk image is malformed (in '/nix/var/nix/db/db.sqlite')
The OP’s original question says that just reinitialising the db will try to re-download everything, and the rest of the thread mostly seems to just assume that’s unacceptable or otherwise ignores it.
If I’m OK with re-downloading everything, is there some other problem that means it won’t work? Will all the downloads fail because the paths already exist in the store?
Do I need to boot from removable media, and rebuild both the db and the store together?
Stop the nix-daemon and its socket. We don’t want it accessing the db.
Then you can open the DB in sqlite and try to repair it by hand. Perhaps only a few entries are malformed, and a few SQL queries are enough to fix them.
Otherwise, you can always just re-install NixOS via nixos-install (obviously keep all your data and only wipe /nix). If you’re on btrfs, this could even be done on a live system.
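For the manual inspection step, a minimal sketch of a first check might look like the following. It assumes you have stopped the daemon and are working on a copy of /nix/var/nix/db/db.sqlite rather than the live file; the function name `check_integrity` is made up for illustration. It only runs SQLite's built-in `PRAGMA integrity_check` and reports what comes back; it does not repair anything.

```python
import sqlite3

def check_integrity(db_path):
    """Return the messages reported by PRAGMA integrity_check for db_path.

    Hypothetical helper: intended to be run on a *copy* of the Nix DB.
    """
    # Open read-only via a URI so the check cannot modify the file.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        # A healthy database yields a single row containing "ok".
        return [row[0] for row in conn.execute("PRAGMA integrity_check")]
    except sqlite3.DatabaseError as exc:
        # Severe corruption can make the pragma itself fail.
        return [f"error: {exc}"]
    finally:
        conn.close()
```

If the result is anything other than a single "ok", the messages should give a rough idea of whether the damage is confined to a few pages or indexes, which helps decide whether manual SQL fixes are worth attempting.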
I suspect that since the error is already from trying to delete rows, this isn’t likely to get far. Unless maybe it can free an entire corrupted page in a single transaction that doesn’t try to read the existing contents, which seems unlikely.
But worth some experiments.
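One such experiment, sketched here under assumptions: instead of deleting rows in place, copy whatever rows are still readable into a fresh database and see how far the read gets before SQLite reports corruption. The helper name `salvage_table` is hypothetical, it should be pointed at a copy of the DB, and whether the resulting file would be usable by Nix is untested.

```python
import sqlite3

def salvage_table(src_path, dst_path, table):
    """Copy readable rows of `table` from src to dst.

    Returns (rows_copied, error_message_or_None). Hypothetical helper.
    """
    src = sqlite3.connect(f"file:{src_path}?mode=ro", uri=True)
    dst = sqlite3.connect(dst_path)
    copied, error = 0, None
    try:
        # Recreate the table in dst from the schema stored in sqlite_master.
        schema = src.execute(
            "SELECT sql FROM sqlite_master WHERE type='table' AND name=?",
            (table,),
        ).fetchone()
        if schema is None:
            raise ValueError(f"no such table: {table}")
        dst.execute(schema[0])
        # Stream rows one at a time; stop at the first unreadable page.
        for row in src.execute(f'SELECT * FROM "{table}"'):
            placeholders = ",".join("?" * len(row))
            dst.execute(f'INSERT INTO "{table}" VALUES ({placeholders})', row)
            copied += 1
    except sqlite3.DatabaseError as exc:
        error = str(exc)
    finally:
        # Keep whatever was copied before the failure.
        dst.commit()
        src.close()
        dst.close()
    return copied, error
```

Calling it as `salvage_table("db-copy.sqlite", "salvaged.sqlite", "ValidPaths")` (file names are placeholders) returns how many rows were copied and the error, if any, that stopped the scan. A plain table scan stops at the first damaged page, so this shows where the damage starts rather than recovering everything after it.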
I’m on zfs, and I have renamed the store dataset before. Because it’s mountpoint=legacy, renaming doesn’t change the mount on the running system, and I just need to rebuild with the new dataset path in the fileSystems.* config entry for the next boot.
So, yeah, I could build a whole new store on a new dataset from the running system, the same way I would from removable media.
I’m trying to learn whether that’s necessary, or just a fallback option.
The --keep-going was only because there wasn’t a separate /mnt/boot and it got upset trying to install the bootloader, which I didn’t need.
Interestingly, the new store is rather smaller than the old, which suggests the db issue was preventing gc from working fully. I had already cleaned up as many gc roots (mostly dev profiles) and run nix-collect-garbage -d in hopes that it would delete enough to drop the problematic db records.
This is a super-easy workaround, and helps confirm for me that the nix store is truly ephemeral (and doesn’t need to be backed up, which is where that naming comes from).
It’s assisted, or enabled, by my particular circumstances and setup. It’s not really a general solution to the problem though.