Garbage collecting a machine with zero bytes free

Hi, does anyone know if (and how) it’s possible to run garbage collection on a machine with a completely full drive, literally zero bytes free? It’s a VM that I have been using as a staging area to test deployments, and it ran out of space due to accumulating system generations.

Running `nix store gc -v` first failed with this:

❯ nix store gc -v
...
deleting '/nix/store/saqh1rgvycbwx0r2vsyx6c7v77y75fii-etc'
deleting '/nix/store/c22yfb292ahc9kvaf5qfim4s2wf04yla-etc-metadata.erofs'
error (ignored): aborting transaction: SQL logic error, cannot rollback - no transaction is active (in '/nix/var/nix/db/db.sqlite')
45 store paths deleted, 971.3 KiB freed
error: committing transaction: database or disk is full, database or disk is full (in '/nix/var/nix/db/db.sqlite')

Then, running it for a second time, it failed with the following:

❯ nix store gc -v
finding garbage collector roots...
deleting garbage...
0 store paths deleted, 0.0 KiB freed
error: executing SQLite statement 'delete from ValidPaths where path = '/nix/store/bmdyy5k44ccahxrirpl6fyhzc87x172n-etc';': database or disk is full, database or disk is full (in '/nix/var/nix/db/db.sqlite')

Subsequent failures simply core dump:

❯ nix store gc -v
Bus error                  (core dumped) nix store gc -v

Really everything core dumps:

❯ nix repl
Bus error                  (core dumped) nix repl

nix-collect-garbage also fails:

❯ nix-collect-garbage -d
removing old generations of profile /nix/var/nix/profiles/system
error (ignored): writing to file: No space left on device
Bus error                  (core dumped) nix-collect-garbage -d

For reference:

❯ df
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/sda3       33027072 32391856         0 100% /
tmpfs             504836     9632    495204   2% /run
/dev/sda3       33027072 32391856         0 100% /nix
/dev/sda3       33027072 32391856         0 100% /var
/dev/loop0            48       48         0 100% /run/nixos-etc-metadata
overlay         33027072 32391856         0 100% /etc
devtmpfs          100968        0    100968   0% /dev
tmpfs            1009672        8   1009664   1% /dev/shm
tmpfs               1024        0      1024   0% /run/credentials/systemd-journald.service
tmpfs               1024        0      1024   0% /run/credentials/systemd-resolved.service
/dev/sda3       33027072 32391856         0 100% /srv
/dev/sda3       33027072 32391856         0 100% /var/swap
/dev/sda2         523248    55100    468148  11% /efi
tmpfs            1009672     1512   1008160   1% /run/wrappers
tmpfs               1024        0      1024   0% /run/credentials/systemd-networkd.service
tmpfs             201932       20    201912   1% /run/user/175
tmpfs             201932       20    201912   1% /run/user/0

Since it’s a test VM, I can wipe it and reinstall of course, but I really would like to know what one would do if this situation happened on a bare-metal machine. I’ve had this issue a few times now, and would really like to know: What is the intended way to run garbage collection on such machines, to escape this trap, if running garbage collection requires disk space in the first place?

I don’t know of a way to GC with literally zero bytes free, other than deleting some files outside of the Nix store on that partition.

Often, though, when I’m very low on space, smaller GCs succeed where larger ones would fail. So if I have only a few KiB free, I run some ‘warmup’ GCs, starting with `nix-store --gc --max-freed 1M` and steadily increasing the threshold.
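
The warmup approach could be scripted roughly like this. It is only a sketch: the step sizes are arbitrary, and `GC_CMD` is a hypothetical override (not a Nix feature) that lets the loop be dry-run with a stub instead of the real collector.

```shell
#!/bin/sh
# Sketch: run progressively larger garbage collections.
# GC_CMD is a hypothetical override for dry runs; it defaults to nix-store.
warmup_gc() {
    gc_cmd=${GC_CMD:-nix-store}
    # Step sizes are arbitrary assumptions; tune them to how little space is left.
    for limit in 1M 8M 64M 512M; do
        echo "collecting up to $limit" >&2
        # --max-freed stops the collector once roughly this much has been freed.
        $gc_cmd --gc --max-freed "$limit" || return 1
    done
}

# Usage: warmup_gc && nix-store --gc
```

The loop stops at the first failing step, so you can see which threshold was too large and retry from a smaller one.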

I managed to get myself unstuck by vacuuming journald logs:

❯ journalctl --vacuum-time 1h

But then I tried deploying again and now the disk is full again, except this time there are no logs left to remove…

Maybe it’s worth bumping the storage on the VM, or determining why the VM is filling up so much?
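
To see what is actually filling the disk, something like the following might help. `largest_dirs` is a hypothetical helper name, and the sketch assumes a `du` that accepts `-d` for depth.

```shell
#!/bin/sh
# Sketch: list the largest direct children of a directory, sizes in KiB.
largest_dirs() {
    dir=${1:-/}
    count=${2:-10}
    # -x stays on one filesystem, -k reports KiB, -d 1 descends only one level.
    du -x -k -d 1 "$dir" 2>/dev/null | sort -n | tail -n "$count"
}

# Usage: largest_dirs /nix/var 5
```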


nix’s gc is hilariously bad - this is just one very common example. I don’t know how this hasn’t been solved in 20 years, but my hacky workaround would be as follows:

  1. List out generations
  2. Delete old generation(s)
  3. Determine the set of dead (gc-able) paths
  4. Remount store read-write (do not do this on a regular basis for hopefully obvious reasons)
  5. rm -rf said paths

nixos-rebuild list-generations
sudo rm /nix/var/nix/profiles/system-XYZ-link # XYZ is the generation # that you want to delete
nix-store --gc --print-dead > paths.txt # there's apparently no way to do this with nix3, lmao
sudo mount -o remount,rw /nix/store
cat paths.txt | xargs sudo rm -rf

I would then reboot as soon as possible to avoid weird store corruption regarding perms and whatnot (some programs like to write to their install dir, so…).

You’ll probably also want to do a store repair to ensure coherence of the nix db:

sudo nix-store --verify --repair --check-contents
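
Since step 5 runs `rm -rf` as root on whatever is in paths.txt, it may be worth filtering the list first. The following is a minimal sketch, not part of Nix: `filter_store_paths` is a hypothetical helper that only passes through direct children of /nix/store and reports everything else on stderr.

```shell
#!/bin/sh
# Sketch: keep only paths that are direct children of /nix/store.
# Reads one path per line on stdin, prints accepted paths on stdout.
filter_store_paths() {
    while IFS= read -r p; do
        case "$p" in
            # Nested paths and dot entries are never store paths: skip them.
            /nix/store/*/*|/nix/store/.|/nix/store/..)
                echo "skipping suspicious path: $p" >&2 ;;
            /nix/store/?*)
                printf '%s\n' "$p" ;;
            *)
                echo "skipping non-store path: $p" >&2 ;;
        esac
    done
}

# Usage: filter_store_paths < paths.txt | xargs -r sudo rm -rf --
```

On a disk with zero bytes free, writing paths.txt itself can fail, so it may help to put the file on a tmpfs such as /run, which still has free space in the `df` output above.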

Thanks, this worked! It’s kind of a paradox that garbage collection requires disk space… Funnily enough, logging into KDE still worked despite the completely full disk; I expected that to fail.

Also, to answer the earlier reply: I did resize the VM disk and the GPT partition as well, but unfortunately forgot to resize the Btrfs filesystem inside it, so after a reboot and another failed deploy attempt it went back to zero bytes free. I have now resized it properly and set it up to collect garbage automatically after each deploy and keep only the latest generation, so it shouldn’t happen again.
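
A post-deploy hook along these lines is one way to do that. This is a sketch of a variation, not the exact setup described above: it only collects when free space drops below a threshold, and `gc_if_low`, `avail_kib` and the `GC_CMD` override are hypothetical names.

```shell
#!/bin/sh
# Sketch: collect garbage after a deploy when free space is low.

# Print the available KiB on the filesystem that holds the given path.
avail_kib() {
    # -P forces one line per filesystem; column 4 is the available space.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# gc_if_low THRESHOLD_KIB [DIR]: run the collector if DIR has less than
# THRESHOLD_KIB available. GC_CMD is a hypothetical override for dry runs.
gc_if_low() {
    threshold_kib=$1
    store_dir=${2:-/nix}
    gc_cmd=${GC_CMD:-nix-collect-garbage}
    if [ "$(avail_kib "$store_dir")" -lt "$threshold_kib" ]; then
        # -d also removes old profile generations before collecting.
        $gc_cmd -d
    fi
}

# Usage (2 GiB threshold is an arbitrary assumption): gc_if_low 2097152
```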

And I just found the GitHub issue for this, which has been open since 2015 (over 10 years!): “Make garbage collector work if there is no free space” (Issue #564, NixOS/nix).


Because it wants to do some moves to ensure deletions are atomic, and writing to the nix db unfortunately also takes some space… still it would be ideal to have a fallback in low-disk-space scenarios.
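
To illustrate the move-then-delete idea in general terms (this is not Nix’s actual code, and `atomic_delete` and the trash directory are made up for the example): the path is first renamed out of the way in one step, and only then removed recursively. The rename itself still has to update directory metadata, which is where a full disk can get in the way.

```shell
#!/bin/sh
# Sketch: make a recursive delete look atomic by renaming first.
# atomic_delete TARGET TRASH_DIR (both must be on the same filesystem).
atomic_delete() {
    target=$1
    trash_dir=$2
    mkdir -p "$trash_dir" || return 1
    doomed="$trash_dir/$(basename "$target").$$"
    # A rename within one filesystem is atomic: other processes see the
    # target either fully present or gone, never half-deleted.
    mv "$target" "$doomed" || return 1
    # The slow recursive removal now happens out of sight.
    rm -rf "$doomed"
}

# Usage: atomic_delete /some/dir/pkg /some/dir/trash
```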


This should be solved with the /nix/var/nix/db/reserved file. The fact that it’s not helping suggests a bug, I will try and reproduce.
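
For anyone unfamiliar with the idea: a reserve file is a placeholder that is kept around in normal operation and deleted right before the collector runs, so the collector has some room to work with. The sketch below shows the general technique only, with made-up function names and a path and size chosen by the caller; it is not how Nix implements it.

```shell
#!/bin/sh
# Sketch of the reserve-file technique (hypothetical helpers).

# make_reserve FILE SIZE_KIB: create a placeholder of the given size.
make_reserve() {
    file=$1
    size_kib=$2
    # Write real blocks; a sparse file would reserve nothing.
    dd if=/dev/zero of="$file" bs=1024 count="$size_kib" 2>/dev/null
}

# release_reserve_and_run FILE CMD...: delete the placeholder, then run CMD.
release_reserve_and_run() {
    file=$1
    shift
    # Removing the placeholder is meant to give the command room to work.
    rm -f "$file"
    "$@"
}

# Usage: release_reserve_and_run /path/to/reserve nix-store --gc
```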

This machine uses btrfs as the root filesystem, and according to that GitHub issue, deleting the reserved file unfortunately doesn’t actually free any space on a CoW filesystem.

Independently of nixos.org stuff, with btrfs you could also get into a situation where deletions themselves would fail with out-of-space errors, at least in the past (even without any snapshots etc.).
