High CPU usage from ZFS after 23.11 update

After updating to NixOS 23.11, I’ve noticed performance regressions on both of my machines. There are a lot of processes named z_wr_iss, z_wr_iss_h and z_wr_int, plus the same names with rd instead of wr. These hog most of the cores and drive the load average to 5-10 while the system is otherwise completely idle. This has happened on my laptop and my desktop. On the laptop, opening a shell can now take 10 seconds. I did the update to 23.11 using nixos-rebuild boot instead of nixos-rebuild switch, and the issue started popping up immediately after the reboot.

Both systems run encrypted ZFS on a single SSD, and these processes seem to be ZFS-related. Has anyone noticed something similar? This is quite worrying.

Can you provide your kernel version too? And can you confirm the ZFS version? We have both the 2.1.14 and 2.2.x series in the tree now.
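
For anyone following along, a minimal sketch of how both values can be collected, assuming the zfs command-line tool is on the PATH. The function name report_versions is just illustrative.

```shell
# Print the running kernel release and, if the zfs CLI is installed,
# the userland and kernel-module ZFS versions.
report_versions() {
  echo "kernel: $(uname -r)"
  if command -v zfs >/dev/null 2>&1; then
    zfs version
  else
    echo "zfs: command not found"
  fi
}

report_versions
```

`zfs version` prints one line for the userland tools and one for the kernel module, which is the pair of strings worth quoting in a report like this.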

Sure, the kernel is 6.6.4 and the ZFS version is zfs-2.2.2-1 with zfs-kmod-2.2.2-1.

If your ZFS pool does not make use of ZFS 2.2.x-specific features (notably block cloning), you could try 2.1.14 to see whether this is a 2.2.x-only regression or whether you can also reproduce it with ZFS 2.1.x. Either way, this seems to be an upstream problem.
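
A rough way to check the block cloning feature before trying the older series might look like this. It is only a sketch: "rpool" is a placeholder pool name, and the helper feature_in_use is made up for illustration.

```shell
# Placeholder pool name; pass the real one as the first argument.
pool="${1:-rpool}"

# Succeeds only if the given feature state is "active".
feature_in_use() {
  [ "$1" = "active" ]
}

if command -v zpool >/dev/null 2>&1; then
  # Feature states are "disabled", "enabled" or "active".
  state=$(zpool get -H -o value feature@block_cloning "$pool" 2>/dev/null || echo unknown)
  echo "block_cloning on $pool: $state"
  if feature_in_use "$state"; then
    echo "pool already uses a 2.2.x feature"
  fi
fi
```

A value of active would suggest the pool already depends on the 2.2.x feature, in which case trying 2.1.14 is probably not a good idea. Other features enabled by a zpool upgrade are not covered by this check.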

I cannot confirm this behaviour. I am also on NVMe with native ZFS encryption. Same ZFS version, though I didn’t upgrade my zpool.

It’s worth investigating what’s causing the IO that drives this CPU usage, in case there’s anything obvious (like a scrub) going on.
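
As a starting point, something like the following could show whether a scrub is running and which devices are busy. This is a sketch that assumes the zpool tool is installed; scrub_running is an illustrative helper, not part of ZFS.

```shell
# Succeeds if `zpool status` output on stdin reports a running scrub.
scrub_running() {
  grep -q 'scrub in progress'
}

if command -v zpool >/dev/null 2>&1; then
  if zpool status | scrub_running; then
    echo "a scrub is in progress"
  else
    echo "no scrub in progress"
  fi
  # Per-device IO statistics: three samples, five seconds apart.
  zpool iostat -v 5 3
fi
```

If no scrub shows up but zpool iostat still reports steady traffic, the next step would be to look for whichever userspace process is generating it, for example a file indexer.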

One thing I noticed on desktops was that GNOME 45 seemed to want to reindex everything for the new search updates. This showed up with CPU usage in both zfs and the tracker-miner process, so was fairly obvious - you might need to look for something similar.

This was indeed an unlucky coincidence. The laptop was running a ZFS scrub right after the update, and the desktop had the KDE file indexer run immediately afterwards. What they had in common was high CPU usage by the mentioned ZFS processes. Once both had finished, performance seems to be pretty much back to normal.

Still gotta figure out why ZFS now takes 10 seconds to start after updating, which is what prompted me to look into this in the first place. But I think ZFS can be excluded as the culprit. Thanks for the suggestions.
