How do you organize your `/persist`?

prescientmoon · May 17, 2023, 5:10pm

Hi! I am preparing to switch to a setup based on ZFS & the impermanence module where persistence is opt-in (erase your darlings). I am trying to decide how to organize everything.

The setup I came up with last night consists of having three subdirectories for /persist: /persist/state, /persist/cache, and /persist/data. The idea is that /data would contain things like media or other things I really care about, /state would contain persistent state of apps on my current system (stuff like settings made in GUI apps, Bluetooth devices, etc.), and /cache would contain things I should be able to delete at any point. Additionally, /persist/*/home/[username] will contain one directory for each app (so Discord stuff will be saved at /persist/state/home/adrielus/Discord/.config/Discord/*). This seemed fine when I came up with it (I even wrote a NixOS module on top of impermanence to handle everything for me super nicely!), but after some sleep I’m starting to see some possible flaws:

- On one hand, it feels like I am reinventing the wheel. Linux already has stuff like .local/share, .cache, etc.
- Many apps (especially Electron ones) like to throw everything in a single place. Discord and Signal, for example, put all their data in the .config directory, of all places. This means that my /persist/state will also contain a bunch of cache (unless I want to investigate what every single file saved by Discord does, but that sounds like a lot of work).
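For reference, here is a minimal sketch of how the state/data split described above might look with the impermanence module's own options, without the extra per-app directory layer. It assumes the impermanence NixOS module is already imported; the username is taken from the post, and the directories listed are only illustrative choices.

```nix
{ ... }:
{
  # App and system state that should survive a reboot.
  environment.persistence."/persist/state" = {
    hideMounts = true;
    directories = [
      "/var/lib/bluetooth" # paired Bluetooth devices
    ];
    users.adrielus.directories = [
      # Stored under /persist/state/home/adrielus/.config/Discord
      ".config/Discord"
    ];
  };

  # Files that really matter (illustrative choices).
  environment.persistence."/persist/data" = {
    hideMounts = true;
    users.adrielus.directories = [
      "Documents"
      "Pictures"
    ];
  };
}
```

With this approach the filesystem that holds /persist generally also needs `neededForBoot = true` in its `fileSystems` entry, so that it is mounted early enough for the bind mounts to be set up.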

I am curious how y’all organize this kind of stuff!

If you’ve gotten this far, thank you very much for reading & thanks in advance!

nairou · May 18, 2023, 3:06pm

For my setup, the primary difference is the fact that I have /home located on its own disk. This means I could wipe root or change distros and never worry about anything in my home getting touched (very handy when I switched to NixOS).

In addition, I’m using btrfs, so I have /nix, /persist, and /var/log in their own subvolumes, excluding them from a root wipe.
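A layout like this could be declared roughly as follows. This is only a sketch: the device label and the subvolume names are assumptions, not taken from the post, and the separate /home disk is left out.

```nix
{ ... }:
let
  # Hypothetical device and subvolume names; adjust to the real layout.
  device = "/dev/disk/by-label/nixos";
  subvol = name: {
    inherit device;
    fsType = "btrfs";
    options = [ "subvol=${name}" "noatime" ];
  };
in
{
  # The root subvolume is the one that gets wiped.
  fileSystems."/" = subvol "root";

  # These live in their own subvolumes and survive a root wipe.
  fileSystems."/nix" = subvol "nix";
  fileSystems."/persist" = subvol "persist" // { neededForBoot = true; };
  fileSystems."/var/log" = subvol "log" // { neededForBoot = true; };
}
```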

That leaves /persist with the bare minimum of items I want to keep track of:

- /persist/etc for files not in environment.etc, like adjtime and machine-id
- /persist/nixos for system configuration nix files and flake lock
- /persist/var/lib for things like VM configurations
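One possible way to wire such entries back into the live system is sketched below. The post does not say how the links are made, so both mechanisms here are assumptions, and libvirt is used only as a stand-in example for "VM configurations".

```nix
{ ... }:
{
  # Files NixOS does not generate itself: link them from /persist.
  environment.etc."machine-id".source = "/persist/etc/machine-id";
  environment.etc."adjtime".source = "/persist/etc/adjtime";

  # A state directory: bind-mount it from /persist.
  # (libvirt is a hypothetical example of VM configuration state.)
  fileSystems."/var/lib/libvirt" = {
    device = "/persist/var/lib/libvirt";
    options = [ "bind" ];
  };
}
```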

This isn’t a complete list, as I’ve only recently started moving system files and haven’t yet risked a root wipe.

For my home directory, I do something similar, but for a different reason. As you said, programs like to dump files all over the place. So I created a ~/.persist directory, and moved all of my own directories there (documents, pictures, downloads, etc.), with symlinks in ~ to point to them. Anything that doesn’t point in there, I know was created by something else and I can decide whether to move or ignore it.

rnhmjoj · May 18, 2023, 5:36pm

I made my own module for this, I think before impermanence was a thing.
I manage the state like this:


```nix
boot.rootOnTmpfs = true;

# This creates bind mounts like:
#  /var/db/dhcpcd → /nix/state/dhcp-leases
state.directories =
  { "dhcp-leases" = "/var/db/dhcpcd";
    "lvm-backups" = "/etc/lvm/backup";
    "bluetooth"   = "/var/lib/bluetooth";
  };

# Similarly, this creates symlinks:
#  /var/lib/alsa/asound.state → /nix/state/audio
state.files =
  { "audio"    = "/var/lib/alsa/asound.state";
    "printers" = "/var/lib/cups/printers.conf";
  };
```
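The `boot.rootOnTmpfs` and `state.*` options above belong to that custom module, whose source is not shown here. As a rough illustration of what one bind mount and one symlink of this kind amount to, here is a sketch using only stock NixOS options; it is not the module's actual implementation, and the tmpfs size is an arbitrary example.

```nix
{ ... }:
{
  # Root on tmpfs (size chosen arbitrarily for the example).
  fileSystems."/" = {
    device = "none";
    fsType = "tmpfs";
    options = [ "defaults" "size=2G" "mode=755" ];
  };

  # Bind mount: /var/db/dhcpcd is backed by /nix/state/dhcp-leases.
  fileSystems."/var/db/dhcpcd" = {
    device = "/nix/state/dhcp-leases";
    options = [ "bind" ];
  };

  systemd.tmpfiles.rules = [
    # Make sure the parent directory exists on the fresh root.
    "d /var/lib/alsa 0755 root root - -"
    # Symlink: /var/lib/alsa/asound.state -> /nix/state/audio.
    "L /var/lib/alsa/asound.state - - - - /nix/state/audio"
  ];
}
```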

I also wrote a utility to diff the root, in case I installed something new that I want to make persistent:

```
sudo diff-root
--- /run/initial-root      2023-05-16 09:46:00.808825909 +0200
+++ /tmp/current-root.AM8  2023-05-18 19:17:33.361502071 +0200
@@ -275,6 +275,7 @@
 /var/lib/systemd/random-seed
 /var/lib/systemd/timers
 /var/lib/tlp
+/var/lib/tlp/rfkill_saved
 /var/lib/tor
 /var/lib/tor/cached-certs
 /var/lib/tor/cached-microdesc-consensus
```
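The utility's source is not included in the post. A hypothetical reimplementation, assuming it simply compares a file listing recorded at boot with the current one, might look like this. The service name is made up, and the temporary file is placed under /run rather than /tmp so that it does not show up in its own listing.

```nix
{ pkgs, ... }:
let
  # Hypothetical reimplementation, not the original utility.
  diff-root = pkgs.writeShellScriptBin "diff-root" ''
    set -u
    current=$(mktemp /run/current-root.XXXXXX)
    # -xdev keeps find on the root filesystem and skips other mounts
    # such as /proc (and /nix, if it is a separate mount).
    find / -xdev | LC_ALL=C sort > "$current"
    diff -u /run/initial-root "$current"
    rm -f "$current"
  '';
in
{
  environment.systemPackages = [ diff-root ];

  # Record what the freshly booted root looks like.
  systemd.services.record-initial-root = {
    description = "Record the file list of the freshly booted root";
    wantedBy = [ "multi-user.target" ];
    path = [ pkgs.findutils pkgs.coreutils ];
    serviceConfig.Type = "oneshot";
    script = ''
      find / -xdev | LC_ALL=C sort > /run/initial-root
    '';
  };
}
```

Both listings are sorted with `LC_ALL=C` so that a locale difference between the service and the interactive shell does not produce spurious diff lines.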

ericgundrum · May 18, 2023, 11:25pm

I’ve been running with zfs-based impermanence for about two years now. What I found matters most in organizing my datasets is their snapshot (and similar) properties.

Root is wiped on every boot. /nix, /var/log, /var/cache persist but have no recurring snapshots. /home and /etc/nixos are in a dataset that gets frequent automatic snapshots.
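A sketch of how such a dataset split might be declared is shown below. The pool and dataset names are assumptions (they follow the common `local`/`safe` naming convention), the datasets are assumed to use legacy mountpoints, and the rollback hook assumes the scripted (non-systemd) initrd.

```nix
{ lib, ... }:
{
  # Hypothetical pool and dataset names.
  fileSystems."/" = { device = "rpool/local/root"; fsType = "zfs"; };
  fileSystems."/nix" = { device = "rpool/local/nix"; fsType = "zfs"; };
  fileSystems."/var/log" = { device = "rpool/local/log"; fsType = "zfs"; neededForBoot = true; };
  fileSystems."/var/cache" = { device = "rpool/local/cache"; fsType = "zfs"; };
  fileSystems."/home" = { device = "rpool/safe/home"; fsType = "zfs"; };
  fileSystems."/etc/nixos" = { device = "rpool/safe/nixos"; fsType = "zfs"; };

  # Wipe root on every boot by rolling back to an empty snapshot
  # (assumes `zfs snapshot rpool/local/root@blank` was taken at install time).
  boot.initrd.postDeviceCommands = lib.mkAfter ''
    zfs rollback -r rpool/local/root@blank
  '';

  # Only datasets with com.sun:auto-snapshot=true are snapshotted.
  services.zfs.autoSnapshot.enable = true;
}
```

With this service, snapshots are opt-in per dataset, for example via `zfs set com.sun:auto-snapshot=true rpool/safe`, which child datasets inherit; /nix, /var/log and /var/cache are simply left without the property.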

The biggest change I made was to make /var/cache persist and create /var/cache/eric symlinked from /home/eric/.cache. Then I direct heavy, generated content there – compiler output and other such intermediate files. Keeping the garbage out of zfs snapshots became important when my storage started getting full.

I’ve not yet gone so far as to redirect individual applications’ cache directories, although I would if any got particularly large. I also set XDG_CACHE_HOME, XDG_DATA_HOME and friends, but too many programs ignore them.
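The ~/.cache symlink and the XDG variable could be set up declaratively along these lines. This is a sketch, not the configuration actually in use: the `users` group and the choice to set the variable system-wide are assumptions.

```nix
{ ... }:
{
  systemd.tmpfiles.rules = [
    # Per-user directory on the persistent, unsnapshotted /var/cache.
    "d /var/cache/eric 0700 eric users - -"
    # ~/.cache -> /var/cache/eric; plain L leaves an existing ~/.cache alone.
    "L /home/eric/.cache - - - - /var/cache/eric"
  ];

  # Set system-wide, so this only makes sense on a single-user machine.
  environment.sessionVariables = {
    XDG_CACHE_HOME = "/var/cache/eric";
  };
}
```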

And just to be difficult, I set the root of my home dir to read-only. This prevents programs from dropping their turds in there. When this occasionally breaks something, I unlock HOME and see what the broken program is dropping in there, and then lock it again.

prescientmoon · May 19, 2023, 6:13pm

@ericgundrum So if I understand correctly, your app state goes next to your normal files?

Was also wondering if there’s a way to exclude big directories like node_modules from snapshots. One solution would be to have a script which generates a dir somewhere else and symlinks to it. I’d then run that script before setting up any new project.
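Such a script might look like the sketch below. The helper name and the cache location are hypothetical; the location only helps if ~/.cache itself sits outside the snapshotted dataset. The target directory is itself named node_modules so that Node's lookup of sibling packages still works after symlinks are resolved, though some tooling may still behave differently with a symlinked node_modules.

```nix
{ pkgs, ... }:
let
  # Hypothetical helper name and cache location.
  link-node-modules = pkgs.writeShellScriptBin "link-node-modules" ''
    set -eu
    # Projects with the same directory name would share a target.
    target="$HOME/.cache/node-modules/$(basename "$PWD")/node_modules"
    if [ -e node_modules ] && [ ! -L node_modules ]; then
      echo "node_modules already exists as a real directory" >&2
      exit 1
    fi
    mkdir -p "$target"
    ln -sfn "$target" node_modules
  '';
in
{
  environment.systemPackages = [ link-node-modules ];
}
```

Running `link-node-modules` in a new project directory before the first install would then leave only a symlink inside the snapshotted dataset.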

ericgundrum · May 19, 2023, 11:32pm

I’m not sure how best to answer this. Apps vary in how they manage their config, state, cache and more. The XDG standard specifies default subdirectories of $HOME for these and environment variables to redirect them. Unfortunately many apps do not follow the XDG standard.

Apps that follow XDG will use my /var/cache/eric for cache and data. Initially I did this for Rust packages. Later I moved my podman containers there. I also symlink ~/.cache to /var/cache/eric and use it manually for various ephemeral tasks, such as caching videos for later viewing.

I do not use a lot of apps. Still, many do not follow XDG. Usually I just let them do what they want with their files if they don’t use a lot of space. Firefox might be the worst offender, yet it is less than 10% the size of the junk in my Downloads folder.

With so much diversity, I do not think any solution will be comprehensive. I’ve chosen to focus on the apps causing the worst problems, such as podman.

I can imagine how node_modules would be a problem – similar to what I was seeing with rust. It’d be nice if npm or yarn let you specify some other location for node_modules, but I suspect they don’t and you’d have to roll your own. (Also look at what they do for their cache folders.)

A reasonable alternative is to put your coding projects in a separate dataset where you can manage snapshots differently. If you are disciplined about committing and pushing code changes every day, you may not need snapshots at all. Or even if you use frequent snapshots, you can have them expire quickly so that outdated build artifacts do not hang around for more than a week or two. (You’d also have fewer worries about deleting these snapshots if you need space quickly.)
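As one illustration of a short retention policy, here is a sketch using sanoid, which is a swapped-in tool rather than anything named in this thread. The dataset name is hypothetical and the retention counts are only examples.

```nix
{ ... }:
{
  services.sanoid = {
    enable = true;
    # Hypothetical dataset holding coding projects.
    datasets."rpool/safe/projects" = {
      autosnap = true;
      autoprune = true;
      hourly = 24;
      daily = 14; # snapshots expire after about two weeks
      monthly = 0;
      yearly = 0;
    };
  };
}
```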

Likely I’d use this separate dataset approach rather than trying to script symlinks with project creation. There may be other dataset properties worth tuning for the use case such as compression, encryption and write-thru.

I’ll also mention that I do not use my $HOME for anything which I can easily place elsewhere. Instead I use encrypted volumes mounted as needed and outside of $HOME. This is a bit of security by obscurity: any tool I run also has authority to read any file my account can read. That sweet new npm package I want to try might also be scanning $HOME for juicy credentials to my online services. I like to think such malware is less likely to scan from root, but probably I’m fooling myself.