Awful nix store performance on zfs


I recently switched hard drives on my laptop, thinking my NVMe drive was failing because of some pretty serious locking and freezing that randomly started about a week ago. Normally I would suspect some system-level setting somewhere, but since I am using NixOS, I know for a fact my configuration hasn’t changed.

Anyway, I gave up trying to fix it and decided to move my system to another drive. I decided to give zfs a go, and it seemed to work out pretty well at first. But some of the issues I was having are reappearing. In particular, I ran nix-collect-garbage about 15 minutes ago and it’s still working. This is actually even slower than it was getting on my previous drive. Some of that is to be expected as I downgraded to a sata ssd for now, but I didn’t think it would be this bad.

At least my system is no longer deadlocking and crashing (I think that drive really was having issues) but it seems pretty far from optimized at this point. In particular, I’ve noticed my browser freezing on heavy io tasks like garbage collection and it was never doing this before.

My laptop has a full-featured, desktop-grade i9-9900K, so I really wouldn’t expect anything like this to be caused by my CPU. I was hoping there are some zfs settings I could tune to improve things, and I’m wondering if anyone else has experienced this with zfs.

I have a replacement nvme drive coming and I’m gonna try to continue to use zfs once it arrives but I’m hoping I can at least fix the annoying lock ups by then.


Do you get any meaningful logs? Which kernel are you running? Try booting a few older ones with your system…


Some things come to mind:
Have you hand-tuned the ARC?
Do you have swap on a ZFS dataset?
Do you have swap at all?
Is your kernel version supported by ZFS?
How much free space is in the pool?
What is the pool fragmentation?
Are you using any fancy features? (compression, encryption, deduplication)

  • I haven’t messed with the ARC
  • I don’t have any swap whatsoever (I have 56 GB of RAM in this system)
  • kernel is default kernel for latest nixos-unstable 5.4.104
  • plenty of free space in the pool. current: 73GB with a 2GB reservation
  • Not sure how to check fragmentation, but I doubt the pool is fragmented as I just created it 2-3 days ago
  • I am using compression and encryption. I thought about disabling encryption of my nix store, but unfortunately some of my secrets are provisioned in the store atm. I figured zstd was fast enough not to cause an issue but I’ll try disabling it for the store dataset and see if it helps.
  • I did tune the ashift to 9 on pool creation as I have a 512 block size and read (somewhere) that this was the right value.
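Most of the questions above can be answered from the standard `zpool` and `zfs` tools. A minimal sketch follows, assuming a pool called `rpool` (a placeholder name, not one mentioned in this thread):

```shell
# Print the pool figures asked about above.
# "rpool" is a placeholder pool name; pass your own as the first argument.
pool_report() {
    pool="${1:-rpool}"
    # The fragmentation and capacity columns show how fragmented
    # and how full the pool is.
    zpool list -o name,size,allocated,free,fragmentation,capacity "$pool"
    # ashift is chosen when a vdev is created.
    zpool get ashift "$pool"
    # Per-dataset features that can cost CPU time or extra IO.
    zfs get -r compression,encryption,dedup "$pool"
}
```

Running `pool_report yourpool` as root should show the fragmentation percentage alongside the compression and encryption settings for every dataset.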

When your system seems to be stuck on IO, could you try this command? It helped me in a similar issue, though I haven’t investigated further yet:

# echo 3 > /proc/sys/vm/drop_caches

If your system responds well immediately after that, the problem is probably the ZFS ARC: the system is trying to shrink it but for some reason cannot shrink it enough. It could also be in another part of the kernel, though.
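To see whether the ARC really is sitting near its ceiling, the kstat file that ZFS on Linux exposes can be read directly. This is only a rough sketch; the function name `arc_summary` is made up for illustration:

```shell
# Print the current ARC size and its configured maximum in MiB.
# The default path is where ZFS on Linux publishes ARC statistics.
arc_summary() {
    stats="${1:-/proc/spl/kstat/zfs/arcstats}"
    # Data rows have the form "name type value"; size and c_max are in bytes.
    awk '$1 == "size"  { printf "arc_size_mib=%d\n", $3 / 1048576 }
         $1 == "c_max" { printf "arc_max_mib=%d\n",  $3 / 1048576 }' "$stats"
}
```

If the ARC does need a cap, the usual knob is the `zfs_arc_max` module parameter (a value in bytes) under `/sys/module/zfs/parameters/`.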

512 byte block size would surprise me a bit, though I’m not really up to speed regarding hardware. A while ago pretty much all flash memory used 4k block sizes in the backend, but that may have changed without me noticing.

Yea for some reason almost all consumer hard drives (SSDs usually included) just lie about their sector size. ashift=9 is likely to be a major perf problem with any amount of sustained or random load.
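For reference, ashift is the base-2 logarithm of the sector size ZFS assumes, so 512-byte sectors give 9 and 4K sectors give 12. A small sketch of that arithmetic follows; the helper name `ashift_for`, the pool name `tank` and the device path are placeholders:

```shell
# Compute the ashift value (log2) for a power-of-two sector size in bytes.
ashift_for() {
    bytes="$1"
    bits=0
    while [ "$bytes" -gt 1 ]; do
        bytes=$((bytes / 2))
        bits=$((bits + 1))
    done
    echo "$bits"
}

# What the drive reports (which may not match the real flash page size):
#   lsblk -o NAME,PHY-SEC,LOG-SEC
# Forcing 4K alignment when creating a pool (placeholder names):
#   zpool create -o ashift=12 tank /dev/disk/by-id/EXAMPLE
```

`ashift_for 4096` prints 12, which is the value generally suggested for drives that report 512-byte sectors but use larger pages internally.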

# echo 3 > /proc/sys/vm/drop_caches

This helped immediately, so it looks like I’ll have to tune the ARC, but first I’ll have to get the ashift set to 12. Is it true that ashift cannot be adjusted after pool creation? Thankfully this isn’t too big of a deal since I haven’t really done much on the drive the last few days.

I recreated my zpool with an ashift of 12 and it seems to have resolved everything without tuning anything else. Out of curiosity though, do you think perhaps a larger record size than the default may benefit the nix store dataset?

I doubt there’d be any appreciable improvement from increasing recordsize. A large recordsize is really only useful if you know you’re going to be demanding a ton of large, sequential IO. The default 128K works quite well for most things.
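One way to sanity-check this for a given store is to count how many files are even larger than the default record size, since recordsize is an upper bound and smaller files are stored in smaller blocks. The following is a rough sketch; the function name is made up, and it miscounts file names that contain newlines:

```shell
# Count files under a directory, and how many exceed a size limit in KiB.
# Defaults: /nix/store and 128 KiB (the default recordsize).
count_large_files() {
    dir="${1:-/nix/store}"
    limit_kib="${2:-128}"
    total=$(find "$dir" -type f | wc -l | tr -d ' ')
    large=$(find "$dir" -type f -size +"${limit_kib}"k | wc -l | tr -d ' ')
    echo "total=$total larger_than_${limit_kib}k=$large"
}
```

If only a small share of files is larger than 128K, a bigger recordsize has little to work with.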