File system check running for a day after `lvreduce`+`lvextend`

l0b0 · March 21, 2022, 8:02pm

I’ve been running out of space on the root logical volume, so I thought I’d move some space over from /home to /. This is the rough sequence of events:

lvreduce -L 100G /dev/mapper/vg-home
lvextend -l+100%FREE /dev/mapper/vg-root
resize2fs /dev/mapper/vg-home insisted I run e2fsck first.
Rebooted into NixOS 21.11 installer.
Tried running e2fsck, but it just never finished and did not respond to Ctrl-C or SIGUSR1.
Rebooted into “normal” boot to see if that file system check would be more responsive.
Waited 17 hours before posting this.

Is my machine hosed? Is this a known issue?

The system is NixOS 21.11, and it is running on a single ~250 GB SSD. The LVs are formatted as ext4. I’ve not had any storage issues so far, so it’s a bit of a mystery why fsck is so slow. To be clear, the machine is still responsive - there’s a [ *** ]-style waiting indicator going back and forth, and pressing Enter does move the text down. The machine is at room temperature, so it doesn’t seem to be doing anything power-intensive.

(Cross-post)

l0b0 · March 23, 2022, 4:14am

After 49 hours I forced a reboot by holding down Ctrl-Alt-Delete, but the file system check just starts again on the next boot. What do I do?

l0b0 · March 23, 2022, 4:41am

Recording the process so far for reference, in case it works:

Press F1 during GRUB menu, then e to edit the command line.
Add fsck.mode=skip to skip the file system check. This results in the system booting into rescue mode, since it can’t mount /.
Run ls -lt /etc/lvm/archive to find the previous configuration (the second-to-last file).
Run vgcfgrestore --file [path found in previous step] [name of volume group] to restore the volume group configuration.
Reboot.
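
A minimal sketch of the "find the second-to-last file" step, in case it helps. The helper names `second_newest` and `print_restore_cmd` are made up for illustration, and the script only prints the `vgcfgrestore` command instead of running it. Which archive file is the right one depends on how many LVM commands have run since the last good state, so treat "second newest" as an assumption that matched this case; `vgcfgrestore --list [name of volume group]` shows a description for each file.

```shell
#!/bin/sh
# second_newest and print_restore_cmd are hypothetical helpers, not LVM tools.

# Print the second most recently modified entry in a directory.
# Assumes file names without newlines (typical for LVM archive files).
second_newest() {
    ls -t "$1" | sed -n 2p
}

# Print (do not run) the restore command for a given archive directory
# and volume group name.
print_restore_cmd() {
    echo "vgcfgrestore --file $1/$(second_newest "$1") $2"
}

# Example, as root: print_restore_cmd /etc/lvm/archive vg
```

Printing the command first gives a chance to check the chosen file before anything is changed.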

At this point the boot log says “Failed to start File System Check on /dev/disk/by-uuid/[…].”, “Dependency failed for /home.” and “Dependency failed for Local File Systems.”

systemctl status systemd-fsck@[…] says “Inodes that were part of a corrupted orphan linked list found.” To recover:

Run fsck /dev/mapper/[…]
Answer “y” to all queries

After this the system starts up.

Lessons learned:

fsck does not print any useful status information by default.
Do not combine lvreduce and lvextend. They might work separately; future experiment coming Soon™.
At this point, I think my success rate with fsck is about 30% over 10+ years.
The fact that this is recoverable at all is fantastic.
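
On the status information point: e2fsck does have flags for a progress bar and for answering "yes" to every question. A small sketch that only prints such a command line; `fsck_cmd` is a hypothetical helper, and /dev/vg/home is just an example device. Whether a progress bar would have shown anything useful in the hung case above is not known.

```shell
#!/bin/sh
# fsck_cmd is a hypothetical helper: it prints an e2fsck command line
# for the given device and does not run anything.
fsck_cmd() {
    # -f    force a check even if the file system is marked clean
    # -y    assume "yes" as the answer to all questions
    # -C 0  print a completion bar on the terminal
    echo "e2fsck -f -y -C 0 $1"
}

fsck_cmd /dev/vg/home
```

The file system should be unmounted when the printed command is actually run.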

l0b0 · March 23, 2022, 5:00am

To move some storage from one logical volume to another:

lvreduce --size 100G /dev/mapper/[LV name]
Reboot
Wait for fsck (never finishes, screw this)
vgcfgrestore to restore sanity

Third try’s the charm:

Boot into NixOS 21.11 USB key
lvreduce --resizefs --size 100G /dev/vg/home
lvextend --resizefs --extents +100%FREE /dev/vg/root

This took all of a few minutes to run. The --resizefs option is absolutely key here, since it shrinks the file system before shrinking the logical volume. Trying to resize2fs manually after a plain lvreduce does not work: it’ll ask you to run fsck, and running fsck never finishes.
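
For reference, a dry-run sketch of the sequence that worked. It only prints the two commands rather than running them. The function name `plan_move_space` and its arguments are made up for illustration; the LV paths and size are the ones used above. It assumes the commands would be run from a rescue or installer environment with the file systems unmounted.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands, does not run them.
# plan_move_space is a hypothetical helper, not part of LVM.
plan_move_space() {
    shrink_lv=$1   # LV to take space from, e.g. /dev/vg/home
    grow_lv=$2     # LV to give space to, e.g. /dev/vg/root
    new_size=$3    # new size of the shrunk LV, e.g. 100G

    # --resizefs shrinks the file system before the LV is reduced.
    echo "lvreduce --resizefs --size $new_size $shrink_lv"
    # Then grow the other LV and its file system into the freed extents.
    echo "lvextend --resizefs --extents +100%FREE $grow_lv"
}

plan_move_space /dev/vg/home /dev/vg/root 100G
```

The order matters: the shrink has to come first so that there are free extents for the extend to use.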

aanderse · March 23, 2022, 11:01am

something something zfs…