File system check running for a day after `lvreduce`+`lvextend`

I’ve been running out of space on the root logical volume, so I thought I’d move some space over from /home to /. This is the rough sequence of events:

  1. lvreduce -L 100G /dev/mapper/vg-home
  2. lvextend -l+100%FREE /dev/mapper/vg-root
  3. resize2fs /dev/mapper/vg-home insisted I run e2fsck first.
  4. Rebooted into NixOS 21.11 installer.
  5. Tried running e2fsck, but it never finished and did not respond to Ctrl-C or SIGUSR1.
  6. Rebooted into “normal” boot to see if that file system check would be more responsive.
  7. Waited 17 hours before posting this.

Is my machine hosed? Is this a known issue?

The system is NixOS 21.11, and it is running on a single ~250 GB SSD. The LVs are formatted as EXT4. I’ve not had any storage issues so far, so it’s a bit of a mystery why fsck is so slow. To be clear, the machine is still responsive - there’s a [ *** ]-style waiting indicator going back and forth, and pressing Enter does move the text down. The machine is room temperature, so it doesn’t seem to be doing anything power-intensive.

(Cross-post)

After 49 hours I forced a reboot by holding down Ctrl-Alt-Delete, but the file system check just starts again after booting. What do I do?

Recording the process so far for reference, in case it works:

  1. Press F1 during GRUB menu, then e to edit the command line.
  2. Add fsck.mode=skip to skip file system check. This results in the system booting into rescue mode, since it can’t mount /.
  3. Run ls -lt /etc/lvm/archive to find the previous configuration (the second-to-last file).
  4. Run vgcfgrestore --file [path found in previous step] [name of volume group] to restore the volume group configuration.
  5. Reboot.
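
Steps 3 and 4 above can be sketched as a small script. This is only an illustration, assuming "second-to-last" means the second-newest file in the archive directory; the function names and the ARCHIVE_DIR variable are made up for this sketch, and it prints the vgcfgrestore command rather than running it.

```shell
#!/bin/sh
# Sketch only: find the second-newest LVM metadata archive and print the
# restore command instead of running it.
# ARCHIVE_DIR, second_newest and restore_command are illustrative names.

ARCHIVE_DIR="${ARCHIVE_DIR:-/etc/lvm/archive}"

# Print the path of the second-newest file in a directory.
# ls -t sorts newest first, so line 2 is the second-newest entry.
second_newest() {
    dir="$1"
    name=$(ls -t "$dir" | sed -n '2p')
    [ -n "$name" ] && printf '%s/%s\n' "$dir" "$name"
}

# Print (do not run) the restore command for the given volume group.
restore_command() {
    vg="$1"
    file=$(second_newest "$ARCHIVE_DIR") || return 1
    printf 'vgcfgrestore --file %s %s\n' "$file" "$vg"
}
```

Calling `restore_command vg` prints the command to review before running it by hand. If the archive directory holds files for more than one volume group, picking by position is not reliable; `vgcfgrestore --list [name of volume group]` shows the available backups with their descriptions and is probably the safer way to choose.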

At this point the boot log says “Failed to start File System Check on /dev/disk/by-uuid/[…].”, “Dependency failed for /home.” and “Dependency failed for Local File Systems.”

systemctl status systemd-fsck@[…] says “Inodes that were part of a corrupted orphan linked list found.” To recover:

  1. Run fsck /dev/mapper/[…]
  2. Answer “y” to all queries

After this the system starts up.

Lessons learned:

  • fsck does not print any useful status information by default.
  • Do not combine lvreduce and lvextend. They might work separately; future experiment coming Soon™.
  • At this point, I think my success rate with fsck is about 30% over 10+ years.
  • The fact that this is recoverable at all is fantastic.
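
On the first point: e2fsck can be asked for a progress bar with -C 0, and -y answers "yes" to every prompt instead of typing it by hand. A minimal sketch, where fsck_command is a hypothetical helper that only prints the command line:

```shell
#!/bin/sh
# Sketch: build an e2fsck command line with progress reporting.
# fsck_command is a hypothetical helper; it prints the command and runs nothing.
fsck_command() {
    device="$1"
    answer_yes="${2:-no}"
    # -C 0 asks e2fsck to draw a progress bar on the terminal.
    cmd="e2fsck -C 0"
    # -y answers "yes" to every question, like typing y at each prompt.
    [ "$answer_yes" = "yes" ] && cmd="$cmd -y"
    printf '%s %s\n' "$cmd" "$device"
}
```

For example, `fsck_command /dev/mapper/vg-home yes` prints `e2fsck -C 0 -y /dev/mapper/vg-home`. Whether a progress bar would have helped with the hanging check described above is untested.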

Second attempt at moving some storage from one logical volume to another:

  1. lvreduce --size 100G /dev/mapper/[LV name].
  2. Reboot
  3. Wait for fsck (never finishes, screw this)
  4. vgcfgrestore to restore sanity
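
A likely reason this attempt fails, which is my reading rather than something confirmed above: a plain lvreduce shrinks only the logical volume, so the ext4 file system still claims blocks beyond the new end of the device. The check below sketches that arithmetic. fs_fits is a hypothetical helper, and the numbers in the usage note are illustrative, not measured on this machine.

```shell
#!/bin/sh
# fs_fits BLOCK_COUNT BLOCK_SIZE DEVICE_BYTES   (hypothetical helper)
# Succeeds when a file system of BLOCK_COUNT blocks of BLOCK_SIZE bytes
# fits inside a device of DEVICE_BYTES bytes.
#
# On a real system the inputs could come from:
#   dumpe2fs -h DEVICE          ("Block count" and "Block size" lines)
#   blockdev --getsize64 DEVICE (device size in bytes)
fs_fits() {
    fs_bytes=$(( $1 * $2 ))
    [ "$fs_bytes" -le "$3" ]
}
```

With 4096-byte blocks, a 100 GiB volume holds 26214400 blocks, so `fs_fits 26214400 4096 107374182400` succeeds, while a file system that still thinks it is 150 GiB (39321600 blocks) does not fit: `fs_fits 39321600 4096 107374182400` fails.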

Third try’s the charm:

  1. Boot into NixOS 21.11 USB key
  2. lvreduce --resizefs --size 100G /dev/vg/home
  3. lvextend --resizefs --extents +100%FREE /dev/vg/root

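This took only a few minutes to run. The --resizefs option is absolutely key here: it resizes the file system together with the logical volume, shrinking the file system before the volume is reduced. Trying to run resize2fs manually after a plain lvreduce does not work - it’ll ask you to run fsck first - and that fsck never finishes.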
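
The third attempt can be wrapped in a small script. This is a sketch using the same two commands as above; run, move_space and DRY_RUN are names invented for it. It defaults to a dry run that only prints the commands, and it assumes it is started from a live USB system so that neither file system is mounted.

```shell
#!/bin/sh
# Sketch of the third attempt. Defaults to a dry run that only prints the
# commands; set DRY_RUN=no to execute them (as root, from a live system).
DRY_RUN="${DRY_RUN:-yes}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "yes" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# move_space SOURCE_LV NEW_SIZE TARGET_LV   (hypothetical helper)
move_space() {
    src="$1"; size="$2"; dst="$3"
    # --resizefs shrinks the file system before the LV itself is reduced.
    run lvreduce --resizefs --size "$size" "$src" || return 1
    # Hand all free extents to the target LV and grow its file system to match.
    run lvextend --resizefs --extents +100%FREE "$dst"
}
```

`move_space /dev/vg/home 100G /dev/vg/root` prints the two commands from the list above. Stopping after a failed lvreduce is a deliberate choice in this sketch, so that lvextend never runs against a volume group in an unknown state.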

something something zfs :laughing: