How to optimize btrfs on SSDs (checksum, zstandard compression, discard)

As far as I know, BTRFS filesystems can be optimized in 2 steps, creation and mounting.

NOTE: read corrections below.

Creation

When creating a filesystem, one can define the hashing function to be used. BTRFS does not yet support blake3, but it supports blake2. According to Wikipedia, blake3 is a variant of blake2 that has fewer rounds and is thus many times faster.

official documentation on BTRFS checksum algorithms

Following that documentation, xxhash, offered as an alternative to the default CRC32C, might be the preferred one.

# create LUKS2 container
cryptsetup luksFormat /dev/DEVICE
cryptsetup open /dev/DEVICE root

# create BTRFS filesystem
mkfs.btrfs -L NIXROOT --csum xxhash /dev/mapper/root

There seems to be no way to change the checksum algorithm of an existing filesystem. The default one is fine, but for new deployments this might be relevant.

Mounting

In hardware-configuration.nix I have added the following arguments

fileSystems."/" =
    { device = "/dev/disk/by-uuid/xxxxxxxxx-xxxxx-xxxxx-xxxxx-xxxxxxxxx";
      fsType = "btrfs";
      options = [ "compress=zstd:??" "discard" ];
    };

To my knowledge, the chosen compression is applied for newly created files, so this will change the filesystem over time. The compression is transparent, meaning that files appear uncompressed, but take up less space on disk than their uncompressed sum, and are uncompressed when opened with an application. Only compressible files are compressed, so no unneeded work is done.

Given the current storage situation, a higher compression level may buy you more headroom before you need to upgrade.

The “discard” option, according to the Arch wiki, has the potential to leak a small amount of data, such as the filesystem in use, but improves performance on SSDs.

It might cause performance and durability improvements or issues (on older hardware).


You might also be interested in authenticated BTRFS which might be useful when sharing a filesystem with others that should only read it, and all writes are authenticated.

mkfs.btrfs --csum hmac-sha256 --auth-key 0123456 /dev/disk

mount -t btrfs -o auth_key=btrfs:foo /dev/disk /mnt/point

Are you pasting LLM messages? It looks good on the surface, but you seem to lack the understanding you’d need to be so confident - or well, know any of the things you’re saying to begin with.

I’m not sure why you spend two paragraphs talking about blake hashes only to then conclude (reasonably) with xxhash probably being a good choice over the default on modern hardware. But that is also exactly what the btrfs docs say, why not just link those? This feels like hallmark LLM text.

What experience do you have to back up your recommendation of blake2 over sha256 on btrfs? This seems like an extremely niche thing to do, since it’s neither the default nor even recommended upstream. Did you actually benchmark this? What kind of workload did you benchmark with, and what kind of system is your recommendation for? What are you basing your claim that the hash btrfs uses isn’t security relevant on?


The rest seems full of subtle inaccuracies that I wouldn’t expect from someone who did the kind of testing you seem to imply.

Don’t set the ACL option. It’s the default, and forcing it will cause issues if defaults ever decide that it shouldn’t be on.

Compression level 15 is almost certainly going to cause non-negligible overhead for nearly no disk space gain. Recommending anything but the default without representative benchmarks to back it up seems weird.

Enabling compression at all tends to have fairly little gain on your typical desktop IME, most large files are stored in compressed file formats anyway. This might be useful if you use RAW image or video formats a lot, but then you probably still don’t want level 15 because of the massive overhead on general performance.

This is kind of a really un-nuanced interpretation. It doesn’t just “improve performance”, it syncs freed blocks to the device immediately. This lets the controller potentially balance writes better, which yes, can be a net positive in write performance and wear leveling.

Since it entirely relies on the controller optimizing things after the command is issued, this is very hardware-dependent, though. Especially with older controllers it’s known to just increase wear significantly.

It is indeed often advised for SSDs, but just setting the discard option isn’t enough if you use LUKS - LUKS needs to also be set to allow discards.
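
A quick way to see whether discards actually make it through the LUKS layer is to look at the dm-crypt table for the mapping. A minimal sketch, assuming the mapping is called root as in the cryptsetup open command at the top of the thread; has_allow_discards is a made-up helper name, and it only checks for the allow_discards flag in whatever text it is fed:

```shell
# Hypothetical helper: succeeds if the dm-crypt table fed on stdin
# carries the allow_discards flag.
has_allow_discards() {
    grep -qw 'allow_discards'
}

# Usage (needs root; "root" is the mapping name assumed from the first post):
#   sudo dmsetup table root | has_allow_discards && echo "discards pass through LUKS"
#
# For LUKS2, the flag can be stored in the header so it survives reboots:
#   sudo cryptsetup --allow-discards --persistent refresh root
```

On NixOS the same thing is normally set declaratively with boot.initrd.luks.devices.<name>.allowDiscards rather than by hand.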

The dm-crypt developers seem to be pretty lukewarm about it, though, since they deem its impact relatively small. I’ll agree that the downside is negligible for most users, but if you need plausible deniability you shouldn’t use TRIM (and you should keep your LUKS header off-disk, and a number of other things which are difficult in practice, so personally I don’t think that’s a use case to optimize for).

The recommendation from the Arch wiki is based on 15-year-old data in either case, from before NVMe drives were even common. Getting more up-to-date benchmarks, especially across different devices, seems prudent before giving any recommendation.


Most importantly, in the context of NixOS, the primary recommendations should be things like noatime, or figuring out a good partitioning scheme to deal with the unusual properties of /nix/store. A single btrfs partition feels wrong since you can’t turn off xattrs or set noacl for a partition that explicitly doesn’t use those. There is some interesting stuff to investigate here, but you seem to have a big blind spot.

Sorry to be so negative, but since this is in Guides I don’t want random passersby to put more trust in it than they should. Please don’t paste LLM content here, especially without saying you are doing so - people can ask their LLM of choice themselves if they want that.

I also felt this had the whiff of LLM slop and agree with all your points, but I find the topic itself interesting.

I thought about using xxhash for a new btrfs partition in the past, but everything I’ve read said that the hash algorithm is the last thing holding the filesystem back. I’d love some actual benchmarks and proper testing though.

I’m using zstd compression on all my systems, but am perpetually unsure which compression level would actually be the best fit. I’ve gone with 1 on SSDs most of the time, but felt no difference to the default 3, and then there are the negative levels introduced in more recent kernels that compete with lzo.

Any recommendations I’d give are already covered by the official wiki entry.

I feel like zstd:1 is at least good from my tests, maybe compress-force for certain dirs like cargo target, which is what I tested on. For some nix store paths it would probably be good, I’d try compress-force zstd:1 personally as I compile a LOT of software, including using crate2nix, and can always disable it for certain directories if I need better performance. Definitely NOT zstd:15 though.

Full results from one cargo target directory:

mode            level  elapsed_sec  disk_gb  uncompressed_gb  referenced_gb  savings_pct
compress            1       103.92     6.64            10.40          11.05        36.18
compress            3       136.17     6.55            10.44          11.13        37.24
compress            6       138.72     6.43            10.38          11.03        38.06
compress            9       161.85     6.41            10.36          10.99        38.16
compress           15       596.58     6.30            10.35          10.97        39.13
compress-force      1        64.36     4.24            10.41          11.08        59.28
compress-force      3       105.56     4.17            10.44          11.13        60.06
compress-force      6       118.06     3.93            10.39          11.05        62.22
compress-force      9       140.08     3.90            10.43          11.12        62.60
compress-force     15       795.02     3.61            10.33          10.94        65.07

YMMV though, I would personally use compress-force but I recommend compress usually for other people unless they don’t mind the extra overhead or have lots of files the heuristics skip. Wish you could customize the heuristics.
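
To make the trade-off in the table easier to see, one can work out how much extra time each step up in level costs and how much extra space it saves. A rough awk sketch using the compress-force rows above verbatim; the variable and function names are made up for this example, and it says nothing about run-to-run noise:

```shell
# level elapsed_sec disk_gb, copied from the compress-force rows of the table
results='1 64.36 4.24
3 105.56 4.17
6 118.06 3.93
9 140.08 3.90
15 795.02 3.61'

# Hypothetical helper: for each step to a higher level, print the extra
# seconds spent and the extra GB saved relative to the previous level.
marginal_cost() {
    awk 'NR > 1 { printf "%s->%s: +%.2f s, -%.2f GB\n", prev_level, $1, $2 - prev_time, prev_disk - $3 }
         { prev_level = $1; prev_time = $2; prev_disk = $3 }'
}

printf '%s\n' "$results" | marginal_cost
```

For these numbers, the step from level 9 to level 15 costs about 655 extra seconds for 0.29 GB, while going from 1 to 3 costs about 41 seconds for 0.07 GB.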

Thanks for that. Having a separate compression setting for the nix store would require a different proper partition for that though, as compression settings (like most settings) can’t be changed on a per-subvolume level

That’s unfortunate. Need to run some more /nix/store-like mixed workloads to see if compress-force is as effective there as in Rust build artifacts.

Nushell script I used to test, if you want to run on your own workloads (full disclosure, was generated by gpt-5.5 via ChatGPT web and gpt-5.3-codex via pi):

#!/usr/bin/env nu

def main [
  --mode: string = "compress-force",
  --level: int = 3,
  --target: string = "./target",
  --img: string = "/tmp/btrfs-compression-test.img",
  --mnt: string = "/tmp/btrfs-compression-test-mnt",
  --size: string = "20G",
] {
  # Create sparse image
  truncate -s $size $img

  # Format as Btrfs
  mkfs.btrfs -f $img out+err> /dev/null

  # Mount with requested zstd compression
  sudo mkdir -p $mnt
  sudo mount -o $"loop,($mode)=zstd:($level)" $img $mnt

  # Prepare target dir in test fs
  sudo mkdir -p $"($mnt)/target"
  sudo chown $"($env.USER):($env.USER)" $"($mnt)/target"

  # Capture pre-copy Btrfs stats
  let btrfs_usage_pre = (sudo btrfs filesystem usage -b $mnt | lines)
  let btrfs_df_pre = (sudo btrfs filesystem df -b $mnt | lines)
  let btrfs_du_pre = (sudo btrfs filesystem du -s $"($mnt)/target" | lines)

  # Copy target into it
  let copy_time = (timeit {
    ^cp -a --reflink=never $"($target)/." $"($mnt)/target/"
  })

  # Flush writes before post-copy stats
  sync

  # Capture post-copy Btrfs stats
  let btrfs_usage_post = (sudo btrfs filesystem usage -b $mnt | lines)
  let btrfs_df_post = (sudo btrfs filesystem df -b $mnt | lines)
  let btrfs_du_post = (sudo btrfs filesystem du -s $"($mnt)/target" | lines)

  # Measure compression (bytes output)
  let summary = (sudo /usr/sbin/compsize -b -x $"($mnt)/target" | lines)
  let total = ($summary | where (($it | str upcase) | str starts-with "TOTAL") | last)
  let parsed = ($total | parse -r '^(?i)TOTAL\s+\S+\s+(?<disk>\d+)\s+(?<uncompressed>\d+)\s+(?<referenced>\d+).*' | first)
  let disk = ($parsed.disk | into int)
  let uncompressed = ($parsed.uncompressed | into int)
  let referenced = ($parsed.referenced | into int)

  # Compression savings based on Btrfs data usage delta
  let source_bytes = ((^du -sb $target | lines | first | split row "\t" | first) | into int)
  let data_used_pre = ((($btrfs_df_pre | where ($it | str starts-with "Data,") | first) | parse -r '^Data,\s+\S+:\s+total=\d+,\s+used=(?<used>\d+)$' | first).used | into int)
  let data_used_post = ((($btrfs_df_post | where ($it | str starts-with "Data,") | first) | parse -r '^Data,\s+\S+:\s+total=\d+,\s+used=(?<used>\d+)$' | first).used | into int)
  let btrfs_data_used_delta = ($data_used_post - $data_used_pre)
  let savings_pct = (100 * (1 - (($btrfs_data_used_delta | into float) / ($source_bytes | into float))))

  let result = {
    mode: $mode,
    level: $level,
    disk_bytes: $disk,
    uncompressed_bytes: $uncompressed,
    referenced_bytes: $referenced,
    savings_pct: $savings_pct,
    copy_time: $copy_time,
    source_bytes: $source_bytes,
    btrfs_data_used_delta: $btrfs_data_used_delta,
    compsize_savings_pct: (100 * (1 - (($disk | into float) / ($uncompressed | into float)))),
    btrfs_usage_pre: $btrfs_usage_pre,
    btrfs_usage_post: $btrfs_usage_post,
    btrfs_df_pre: $btrfs_df_pre,
    btrfs_df_post: $btrfs_df_post,
    btrfs_du_pre: $btrfs_du_pre,
    btrfs_du_post: $btrfs_du_post,
  }

  sudo umount $mnt
  rm -f $img
  sudo rmdir $mnt

  $result
}

Hm, it’d be better to use btrfs’ stats. It’s worth noting that you’re running this with nested file systems, whatever manages blocks underneath btrfs might impact your results, too. I also have some concerns with these numbers:

You mean to tell me we’re spending 40 seconds checking the entropy of blocks, and hence almost doubling compilation time?

I’m not saying the results are wrong, but this is the kind of unexpected outlier that should make you quintuple check your method and propose a mechanism by which this happens.

More runs to figure out variance would also help eliminate doubts, but this extreme of an outlier really needs investigation even if you are certain about the measurement.
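
For anyone who wants to do the repeated runs, here is a minimal sketch of summarising them. It assumes you have collected the elapsed time of several identical runs, one number per line; mean_sd is a made-up helper name, not part of any tool mentioned in this thread:

```shell
# Hypothetical helper: reads one elapsed time per line and prints the
# number of runs, their mean, and the sample standard deviation.
mean_sd() {
    awk '{ n++; sum += $1; sumsq += $1 * $1 }
         END {
             if (n < 2) { print "need at least 2 runs"; exit 1 }
             mean = sum / n
             var = (sumsq - n * mean * mean) / (n - 1)
             if (var < 0) var = 0   # guard against rounding noise
             printf "n=%d mean=%.3f sd=%.3f\n", n, mean, sqrt(var)
         }'
}

# Usage, with times.txt being a hypothetical file of timings for one setting:
#   mean_sd < times.txt
```

Two settings only differ in a meaningful way if the gap between their means is clearly larger than the standard deviations; otherwise the difference is within noise.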

Either way, yes, representative benchmarks for full system performance under compressed btrfs become way more complex still, which is why I’m so doubtful of any firm recommendations, especially if they aren’t just “use btrfs defaults”. For real recommendations you’d need long-term stats for a large number of systems.

Good points! Mostly I was only thinking about the savings_pct value. Timings were also generated for the whole script, not just the copy. The underlying filesystem was just tmpfs, so also maybe not best for checking actual performance. Tentatively just recommending compress=zstd:1 if you have reasonably fast HW, as it does not seem to hurt.

Will try to improve methodology later today.

Table from copying extracted nix-store.squashfs from the current nixos-unstable graphical ISO. elapsed_sec is now only time from cp, updated previous post to match new script.

mode            level  elapsed_sec  disk_gb  uncompressed_gb  savings_pct_btrfs  savings_pct_compsize  backing_fs
compress            1      154.937     5.05             9.31             45.92%                45.74%  tmpfs
compress-force      1      145.332     4.54             9.31             51.59%                51.29%  tmpfs
compress            1      120.660     5.09             9.31             45.53%                45.35%  ext4
compress-force      1      134.034     4.55             9.31             51.48%                51.18%  ext4

The faster compress-force probably came from running on tmpfs.

Still only checking a single workload here, not a real mixed /nix/store and /home.

Not sure if I want to spend more time on this, but feel free to try out my script and modify it and run your own tests.

The conclusion I’m getting is that setting compress=zstd:1 is fine if you have decent hardware, and compress-force=zstd:1 is also good if you really want to squeeze out as much storage as you can from your disk and can tolerate the extra overhead. I would not go for higher compression levels, 3 saves slightly more space but is slower.

Also no compression is fine too.

The variance on that is huge. The difference between the two looks to be within variance (and also within variance of all the other compression levels you tested previously, except for level 15), so there looks to be no verifiable speed difference given your benchmarks.

As such, I don’t think the data leads to your conclusion, in fact, it implies the opposite, you should use level 9 and force compression for that extra reduction in disk space use because higher compression levels have no measurable impact on write performance. Not that I think that conclusion is correct (you never benchmarked no compression, anyway, or didn’t share data for it), I’m trying to demonstrate that the data lacks enough precision to come to any conclusion.


No need to continue testing on my behalf of course, but to give extra context for people reading by: Besides having limited accuracy, this is a microbenchmark between two very narrow options and doesn’t really tell us anything about actual system performance.

This benchmark does say with some certainty that there are files the heuristic deems incompressible despite being able to gain measurable extra space (though the docs tell us that, too). It also does give nice numbers for how compressible you can expect the nix store to be, which is a nice experiment!

The scenario tested here doesn’t however cover what the nix store is used for 99% of the time - reading from it. For desktop system partitions read performance tends to matter a lot more than write performance, because binaries (and shared libraries) are also files that have to be read from disk. Over time your page cache will fill up, but it’s quite unlikely that everything you’re running will stay in there all the time even if you have a lot of memory.

So the most noticeable part of disk speeds to the average desktop user is actually how fast your programs load and run. Since that heavily depends on how you use your system at any given moment, as well as details like the size of your system memory, and how many browser tabs you keep open, it’s also the hardest to say anything about with microbenchmarks like this :wink:

Paradoxically, btrfs compression can actually improve read speeds - smaller files get read into memory faster. For very modern NVMEs the CPU is becoming the bottleneck even for basic reads, though, so my conjecture would be that compression probably has a worse performance impact on modern systems than older ones.

If nothing else these statistics are fun to look at, but it’s incredibly hard to predict anything about system performance with them.

Interesting post! I’ll still start with compress=zstd:1 when I get my next machine (current machine still on ext4 on Pop!_OS), since I don’t see much harm in it, but which btrfs compression level is worth it is clearly something that is quite hard to test.

Hello back, changed the post a bit. No LLMs involved lol

I didn’t know of the 2 fastest btrfs hashsums before, so I only thought of blake2 in comparison to sha256, which are the most used for files.

Removed the unneeded part, and indeed the high compression might not be perfect. I read somewhere that it doesn’t matter, but in my experience the highest level can indeed cause some unexpected slowdowns.

I didn’t know acl was on by default; I needed it in the past and it wasn’t there.

I have a separate subvolume for /nix/store, are there any tweaks that can be done on that level? Maybe an LVM could be used to dynamically resize, otherwise having a separate partition sounds like it has too many drawbacks

Sadly no. Per-subvolume settings are one of those things that have been somewhere on the horizon for many years by now, but haven’t materialized yet.

Btrfs uses (afaik) the settings of the first subvolume that gets mounted.

That’s one of the nice things bcachefs was supposed to offer, before any chance of taking it seriously ended when its developer got an LLM girlfriend he thinks is conscious.

Enabling compression at all tends to have fairly little gain on your typical desktop IM[O], most large files are stored in compressed file formats anyway.

It seems to me like btrfs compression is seriously underrated, actually. Here’s the output of compsize /nix/store on my desktop, as an example:

Processed 13091949 files, 2977318 regular extents (7932052 refs), 9075901 inline.
Type       Perc     Disk Usage   Uncompressed Referenced  
TOTAL       55%      227G         410G         896G       
none       100%      165G         165G         353G       
zstd        25%       62G         245G         542G       

Or maybe there’s something very wrong here :)

edit: for context, this is with compress=zstd:8 (not forced)
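
Note that the Perc column is disk usage as a share of the uncompressed size, so the actual saving is its complement. A small awk sketch that pulls this out of the TOTAL line; compsize_savings is a made-up helper name, and it assumes both sizes are printed with the same unit suffix, as they are in the output above:

```shell
# Hypothetical helper: reads compsize output on stdin and prints the space
# saved on the TOTAL line as a percentage of the uncompressed size.
# Assumes both sizes use the same unit suffix, as in the output above.
compsize_savings() {
    awk '$1 == "TOTAL" {
             disk = $3 + 0           # "227G" -> 227 (awk drops the suffix)
             uncompressed = $4 + 0   # "410G" -> 410
             printf "%.1f\n", 100 * (1 - disk / uncompressed)
         }'
}

# TOTAL line copied from the output above
printf 'TOTAL       55%%      227G         410G         896G\n' | compsize_savings
```

For the numbers above this gives a saving of roughly 45% of the uncompressed data. On a live system it could be fed with sudo compsize -b /nix/store, where -b prints plain bytes and sidesteps the mixed-unit problem.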

I think the fact that your nix store is 400GB is the issue; the nix store does seem very compressible, but if it contains more than ~60GB in total something is very wrong, and 30GB on your typical 1TB+ system is not really much.

Other parts of the filesystem compress much less well, IME, for me the space savings are ~10% (with compress=zstd).

But yeah, if you’re letting the nix store eat your entire disk capacity, compression will be very useful!

I’d object here. It really depends on how extensively you use nix on the system.

But I agree with the conclusion: if you have a lot in the store, deduplicating and compressing it will have a much better effect than if it’s barely more than one or two generations of the system itself.
