BTRFS bad tree block error after hibernating on an impermanent system and restarting

Anomalocaris · October 14, 2024, 8:42pm

I recently noticed that after I hibernate my PC, resume, and later restart it, I get btrfs bad tree block errors and have to boot into a live environment to run btrfs rescue super-recover. Resuming from hibernation works fine; the errors only appear once I restart the PC afterwards.

Also I recently noticed I started getting messages saying a file descriptor was leaked on LVM invocation. I do not know if this is related or not, but I figured I might as well mention it.

Important details:

I last updated a few days ago; hibernation worked just fine before that.
I use an impermanent system with a tmpfs.
My hard drive uses a LUKS-encrypted LVM partition with a btrfs volume and a swap volume. The errors occur after decrypting.
I use Disko to manage my disk layout (disko config: dotfiles/disko-configurations at main · Anomalocaridid/dotfiles · GitHub)
I use hypridle as my idle daemon

My entire config: GitHub - Anomalocaridid/dotfiles: My personal dotfiles for NixOS

Atemu · October 15, 2024, 4:49am

When a btrfs filesystem gets corrupted, it’s usually because of bad hardware and sometimes because of btrfs bugs.

Btrfs is a lot more sensitive to hardware issues (or rather: has the capability to detect them) and will refuse to work when the hardware fscks up, rather than carrying on with a possibly bad state as if nothing happened, which is what most other filesystems typically do.

It’s not impossible this is a bug though, especially with hibernation thrown into the mix which is kind of a hairy thing anyways IMHO.

I highly doubt this has anything to do with disko, impermanence or hypridle. Disko merely automates running mkfs and such, which shouldn’t be any different from running those manually, which is what everyone else does. Impermanence is all about userspace, so it couldn’t possibly break btrfs unless btrfs has a bug. Hypridle is also an exclusively userspace component and has absolutely no relation to btrfs.

I think you’d be better off discussing this with the btrfs folks, as this doesn’t have anything to do with NixOS. Just send an email to the mailing list.

Anomalocaris · October 15, 2024, 5:15pm

I don’t think it’s a hardware issue. After seeing your reply, I rolled back to the latest generation I had before I updated (I updated on October 5) and hibernation worked properly without corrupting the superblock.

I agree, I mainly wanted to cover my bases as much as possible and give as much potentially pertinent info as I could. Also I figured showing my Disko config would be helpful in case my system’s particular filesystem layout happened to be related.

I think there’s a very good chance you’re right, but I still feel like there’s some uncertainty. It may not be a NixOS problem, but for all I know it could be, say, an LVM problem, not a btrfs problem. Especially since the file descriptor leak warning appeared after the same update, so I want to at least rule it out.

Apparently the file descriptor leak warning is a fairly recent issue on nixpkgs: https://github.com/NixOS/nixpkgs/issues/342082. I did not see any mention there of issues related to hibernating with swap, though.

Atemu · October 16, 2024, 10:26am

It’s usually pretty hard to ascertain this.

It could be a super subtle issue that only affects btrfs of a newer kernel for instance. What’s the kernel version diff between those gens?

After you’ve done a fresh backup of everything and verified it, you could try booting into the new gen and seeing whether that corrupts the superblock reproducibly or whether it was just a one-off.
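One way to do the "verified it" part, as a rough sketch only: compare SHA-256 checksums of every regular file in the source tree against the copy in the backup. This assumes the backup is a plain directory copy that is mounted and readable; the function names and the paths in the usage note are made up for illustration, and snapshot- or archive-based backups would need their own tooling instead.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(source: Path, backup: Path) -> list[str]:
    """List regular files under `source` that are missing or differ in `backup`."""
    problems = []
    for src_file in sorted(source.rglob("*")):
        # Only regular files are compared; symlinks and directories are skipped.
        if src_file.is_symlink() or not src_file.is_file():
            continue
        relative = src_file.relative_to(source)
        dst_file = backup / relative
        if not dst_file.is_file():
            problems.append(f"missing: {relative}")
        elif sha256_of(src_file) != sha256_of(dst_file):
            problems.append(f"differs: {relative}")
    return problems
```

Calling something like `find_mismatches(Path("/persist"), Path("/mnt/backup/persist"))` (both paths hypothetical) returns an empty list when every source file has an identical copy in the backup. It does not check permissions, ownership, or files that exist only in the backup.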

They’ll be able to figure that out.

Anomalocaris · October 19, 2024, 6:15pm

According to uname -r: 6.10.3-xanmod1 (old) vs 6.10.11-xanmod1 (new). It also happens on the current default kernel, 6.6.53, although that time it corrupted my filesystem to the point where I could not figure out how to fix it and I had to restore my system from a backup.
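As an aside for anyone comparing these release strings in a script: they need to be compared numerically, not as plain text, or 6.10.11 sorts before 6.10.3. A minimal sketch, assuming the strings look like the `uname -r` output above (dotted numbers plus an optional `-suffix`); the helper name `parse_kernel_release` is made up for illustration.

```python
import re


def parse_kernel_release(release: str) -> tuple[tuple[int, ...], str]:
    """Split a `uname -r` style string into numeric parts and an optional suffix."""
    match = re.match(r"^(\d+(?:\.\d+)*)(?:-(.*))?$", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    numbers = tuple(int(part) for part in match.group(1).split("."))
    suffix = match.group(2) or ""
    return numbers, suffix


old_version, old_suffix = parse_kernel_release("6.10.3-xanmod1")
new_version, new_suffix = parse_kernel_release("6.10.11-xanmod1")

# Tuples of integers compare element by element, so 3 < 11 as expected.
print(old_version < new_version)   # True
# Plain string comparison goes character by character and gets it wrong.
print("6.10.3" < "6.10.11")        # False
```

So the old and new generations are both on the 6.10 series with the same xanmod1 suffix, eight patch releases apart, while 6.6.53 parses to `(6, 6, 53)` with an empty suffix.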

I already tested it. It seems to consistently happen every time I hibernate on the new gen.

Fair point. I’ll shoot them an email.

ltrump · November 23, 2024, 4:19pm

Have you found the cause of the problem? I ran into almost exactly the same problem after a normal hibernation resume followed by a reboot. I don’t use disko/LUKS, but I also use an impermanent system (by clearing the root volume on every boot). btrfs-find-root can correctly find the tree root, but all other operations failed, including mount, btrfs check --repair, and btrfs restore.

Atemu · November 23, 2024, 4:22pm

Please don’t ever run that unless specifically instructed to by a btrfs developer.