Nix store sqlite db corruption

Any thoughts about how to recover the nix store sqlite db?

I get the following error when running a gc (the package can vary):

error: executing SQLite statement 'delete from ValidPaths where path = '/nix/store/iy9hn7sknd60nk77rf4vrznklhax8m5i-CPAN-Meta-Check-0.014.tar.gz.drv';': database disk image is malformed (in '/nix/var/nix/db/db.sqlite')

I see the GitHub issue “Rebuild sqlite db from scratch?” (NixOS/nix#3091), which didn’t really reach a conclusion about the best path forward.

The original question there says that simply reinitialising the db would try to re-download everything, and the rest of that thread mostly assumes that’s unacceptable or ignores it.

If I’m OK with re-downloading everything, is there some other problem that means it won’t work? Will all the downloads fail because the paths already exist in the store?

Do I need to boot from removable media, and rebuild both the db and the store together?

Is there some other, better, option?

Before you do anything, make a backup of the db.

Grab an interactive sqlite tool.

Stop the nix-daemon and its socket. We don’t want it accessing the db.

Then you can open the DB in sqlite and try to fix its issues by hand. Perhaps it’s just a few malformed entries that a few SQL queries can fix.
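A minimal sketch of those preparation steps, assuming a systemd-managed daemon (units `nix-daemon.socket` and `nix-daemon.service`, as on NixOS), the default database directory `/nix/var/nix/db`, and `sqlite3` already on the PATH (grab it first, as suggested). The backup location is a hypothetical choice. By default it only prints the commands (`DRY_RUN=1`) so you can review them; set `DRY_RUN=0` to actually run them.

```bash
#!/usr/bin/env bash
# Sketch: stop the daemon, back up the Nix db, and check the backup copy.
# Assumptions: systemd units nix-daemon.socket/.service, default db dir,
# sqlite3 on PATH. BACKUP_DIR is a hypothetical location.
set -eu

NIX_DB_DIR="${NIX_DB_DIR:-/nix/var/nix/db}"
BACKUP_DIR="${BACKUP_DIR:-/root/nix-db-backup}"
DRY_RUN="${DRY_RUN:-1}"  # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

prepare_db_inspection() {
  # Stop the socket too, or socket activation could restart the daemon.
  run systemctl stop nix-daemon.socket nix-daemon.service
  # Copy the whole directory so any sidecar files (e.g. -wal/-shm) come along.
  run mkdir -p "$BACKUP_DIR"
  run cp -a "$NIX_DB_DIR/." "$BACKUP_DIR/"
  # Prints "ok" or a list of problems; run it on the copy, not the original.
  run sqlite3 "$BACKUP_DIR/db.sqlite" 'PRAGMA integrity_check;'
}

prepare_db_inspection
```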

Otherwise, you can always just reinstall NixOS via nixos-install (obviously keeping all your data and only wiping /nix). If you’re on btrfs, this could even be done on a live system.

Since the error already comes from trying to delete rows, I suspect this isn’t likely to get far, unless a single transaction can free an entire corrupted page without reading its existing contents, which seems unlikely.

Still, it’s worth some experiments.
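One way to experiment without touching the live database is to work on a backup copy. This sketch uses the `sqlite3` shell’s `.recover` command (available in sqlite3 3.29 and later) to salvage whatever rows are still readable into a fresh file, then checks the result. The file paths are hypothetical. Even a clean `integrity_check` on the output only means the new file is well-formed: rows lost to the corruption stay lost, so the recovered db may no longer match what is actually in /nix/store.

```bash
#!/usr/bin/env bash
# Sketch: salvage readable rows from a *copy* of the Nix db with sqlite3's
# .recover command, then check the salvaged file. Paths are hypothetical.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"  # 1 = only print the commands

recover_copy() {
  local src="$1"  # a backup copy, never /nix/var/nix/db/db.sqlite itself
  local dst="$2"  # a new file; .recover output is replayed into it
  if [ -e "$dst" ]; then
    echo "refusing to overwrite $dst" >&2
    return 1
  fi
  if [ "$DRY_RUN" = 1 ]; then
    printf '+ sqlite3 %s .recover | sqlite3 %s\n' "$src" "$dst"
    printf '+ sqlite3 %s PRAGMA integrity_check;\n' "$dst"
    return 0
  fi
  # .recover emits SQL that rebuilds as much schema and data as it can read.
  sqlite3 "$src" '.recover' | sqlite3 "$dst"
  sqlite3 "$dst" 'PRAGMA integrity_check;'
}

recover_copy "${1:-/root/nix-db-backup/db.sqlite}" "${2:-/root/nix-db-recovered.sqlite}"
```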

I’m on zfs, and I have renamed the store dataset before. Because it uses mountpoint=legacy, renaming doesn’t change the mount on the running system; I just need to rebuild with the new dataset name in the fileSystems.* config entry for the next boot.

So, yeah, I could build a whole new store on a new dataset from the running system, the same way I would from removable media.
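The rename trick depends on the dataset having mountpoint=legacy. A small guard for that, as a sketch: it reads the property with `zfs get -H -o value mountpoint` and only renames when the value is `legacy`. The `-old` suffix is just an example, and the `fileSystems.*` entry still has to name whichever dataset should be mounted at /nix on the next boot. As with the other sketches here, it only prints the rename unless `DRY_RUN=0`.

```bash
#!/usr/bin/env bash
# Sketch: rename the store dataset out of the way, but only if its
# mountpoint property is "legacy" (mounted via fileSystems/fstab, so ZFS
# will not try to remount it). The "-old" suffix is an example name.
set -eu

DRY_RUN="${DRY_RUN:-1}"  # 1 = only print the rename

run() {
  if [ "$DRY_RUN" = 1 ]; then printf '+ %s\n' "$*"; else "$@"; fi
}

# Succeeds only if the given mountpoint property value is "legacy".
is_legacy() {
  [ "$1" = legacy ]
}

retire_store_dataset() {
  local ds="$1" mp
  mp=$(zfs get -H -o value mountpoint "$ds")  # read-only query
  if ! is_legacy "$mp"; then
    echo "refusing: $ds has mountpoint=$mp, not legacy" >&2
    return 1
  fi
  run zfs rename "$ds" "$ds-old"
}

# Only act when a dataset name is passed, e.g. ./retire.sh rpool/fmrl/nix
if [ "$#" -gt 0 ]; then
  retire_store_dataset "$1"
fi
```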

I’m trying to learn whether that’s necessary, or just a fallback option.

Ah, in that case it’d be like 3 commands and probably not even a single download.

I’d honestly just do that and not bother with a corrupted DB. It’s likely this was caused by a past Nix bug and won’t occur again.

ohhh… because nixos-install will copy from the existing store. Cool, yeah, good point.

It was a little more than 3, but the rest are trivial things like making mountpoints.

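```bash
# Move the old store dataset aside; with mountpoint=legacy the running /nix mount is untouched.
zfs rename rpool/fmrl/nix rpool/fmrl/nix-old
# Fresh, empty dataset under the original name, so the fileSystems config needn't change.
zfs create -o mountpoint=legacy -o atime=off -o dedup=on rpool/fmrl/nix
# Throwaway root filesystem to serve as the install target.
zfs create -o mountpoint=legacy rpool/fmrl/dummyroot
mount -t zfs rpool/fmrl/dummyroot /mnt
mkdir /mnt/nix
mount -t zfs rpool/fmrl/nix /mnt/nix
# Copy the current configuration into the target.
mkdir -p /mnt/etc/nixos
rsync -ai /etc/nixos/. /mnt/etc/nixos/.
# Build and install into /mnt, reusing paths already in the running system's store.
nixos-install --keep-going

reboot
```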

Took all of about 4 minutes.

The --keep-going was only because there wasn’t a separate /mnt/boot and it got upset trying to install the bootloader, which I didn’t need.
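For a throwaway target root like this one, where no bootloader is wanted, `nixos-install` also accepts `--no-bootloader` (check the man page for your release), which skips installing the boot loader outright. A sketch of the final step under that assumption, with `--root /mnt` spelled out even though it is the default; it only prints the command unless `DRY_RUN=0`.

```bash
#!/usr/bin/env bash
# Sketch: the final install step, skipping the boot loader entirely.
# Assumes the target root is already mounted (default /mnt).
set -eu

DRY_RUN="${DRY_RUN:-1}"  # 1 = only print the command

run() {
  if [ "$DRY_RUN" = 1 ]; then printf '+ %s\n' "$*"; else "$@"; fi
}

install_store_only() {
  # --no-bootloader: don't try to install a boot loader for the dummy root.
  # --root: the mounted target (default /mnt, spelled out for clarity).
  run nixos-install --root "${1:-/mnt}" --no-bootloader
}

install_store_only "$@"
```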

Interestingly, the new store is much smaller than the old one, which suggests the db issue was preventing gc from working fully. I had already cleaned up as many gc roots as I could (mostly dev profiles) and run nix-collect-garbage -d, hoping it would delete enough to drop the problematic db records.

NAME                   AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
rpool/fmrl/nix          286G  14.1G        0B   14.1G             0B         0B
rpool/fmrl/nix-old      286G  78.1G        0B   78.1G             0B         0B

This is a super-easy workaround, and it helps confirm for me that the nix store is truly ephemeral and doesn’t need to be backed up (which is where the “fmrl” in the dataset name comes from).

It’s assisted, or enabled, by my particular circumstances and setup. It’s not really a general solution to the problem though.

What are the steps exactly for this on btrfs?

If you don’t have enough experience to come up with them yourself, I would not recommend doing it from a live system.

I’m not saying that to gatekeep, but because if you don’t know the mechanics behind it and its other requirements, you’re unlikely to succeed on the first and only attempt, or worse, you could lose data.
It’s the sort of thing where I make a fresh backup and keep a recovery env handy beforehand, even though I do understand it well enough to pull it off.

Just do the simple “reinstall” route from a recovery env unless you explicitly want to use this opportunity for learning (and have a backup).

Is there any documentation on doing the simple ‘reinstall’ route?

It seems there is mostly just the old-ish “Btrfs” page on the NixOS Wiki.

Well, you do a regular manual installation but instead of formatting the drive, you use the filesystems already present and merely wipe the nix store (or rename it if you’re cautious).
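A rough sketch of that route from a NixOS installer or recovery environment, assuming a hypothetical btrfs layout where root and store live in subvolumes named `@` and `@nix` on `/dev/disk/by-label/nixos` (adjust every name to your own layout). It renames the old store subvolume instead of deleting it, creates an empty one, mounts the target under /mnt, and lets `nixos-install` rebuild the store from the existing /mnt/etc/nixos configuration. From an installer, expect most store paths to be downloaded or built again. Like the other sketches, it only prints commands unless `DRY_RUN=0`.

```bash
#!/usr/bin/env bash
# Sketch: "reinstall" onto existing btrfs filesystems, replacing only /nix.
# HYPOTHETICAL layout: subvolumes @ (root) and @nix (store) on $DEVICE.
set -eu

DEVICE="${DEVICE:-/dev/disk/by-label/nixos}"  # hypothetical device path
TOP="${TOP:-/mnt-top}"                        # scratch mount for the top level
DRY_RUN="${DRY_RUN:-1}"                       # 1 = only print the commands

run() {
  if [ "$DRY_RUN" = 1 ]; then printf '+ %s\n' "$*"; else "$@"; fi
}

reinstall_store() {
  # Mount the top-level subvolume (id 5) so the subvolumes are visible by name.
  run mkdir -p "$TOP"
  run mount -o subvolid=5 "$DEVICE" "$TOP"
  # Keep the old store around instead of wiping it, in case something goes wrong.
  run mv "$TOP/@nix" "$TOP/@nix-old"
  run btrfs subvolume create "$TOP/@nix"
  run umount "$TOP"
  # Mount the target system the way the existing config expects.
  run mount -o subvol=@ "$DEVICE" /mnt
  run mkdir -p /mnt/nix
  run mount -o subvol=@nix "$DEVICE" /mnt/nix
  # Also mount a separate /boot (ESP) at /mnt/boot here if your system has one.
  run nixos-install --root /mnt
}

reinstall_store
```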