Recently I’ve been experimenting with several zfs features, and I learned that zfs can do block-level deduplication.
Has anyone tried it on /nix/store?
Since we already have nix-store --optimise, can I still benefit from block-level deduplication?
zfs’s deduplication implementation lowers write performance, but /nix/store is read-only most of the time, so I think it doesn’t matter. Am I right about that?
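For comparison, here is a quick way to see what hard-link based optimisation (the mechanism nix-store --optimise uses) saves on its own. This is a sketch on a throwaway directory, not /nix/store itself; it relies on GNU du, which counts a hard link once by default and once per link with -l/--count-links:

```shell
# Make a directory with one file and one hard link to it.
dir=$(mktemp -d)
head -c 1048576 /dev/urandom > "$dir/a"   # 1 MiB file
ln "$dir/a" "$dir/b"                      # hard link: no extra blocks on disk

du -sk "$dir"    # on-disk size: the file is counted once
du -skl "$dir"   # naive size: counted once per link (about double here)
rm -r "$dir"
```

Block-level dedup would additionally catch identical blocks inside files that are not whole-file duplicates, which hard links cannot.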
No answer, but a couple of points:
- Disable auto store optimisation for now · NixOS/nix@6c4ac29 · GitHub, because auto-optimise can cause performance degradation (not zfs related). My guess would be that on an enterprise fs like zfs, this won’t happen.
- The nix store is written maybe 99% of the time from cache, and my internet vs. disk speeds differ by 1.x orders of magnitude in favor of disk. Unless deduplication slows writes down by more than 10x, it’s going to be helpful for me.
Have you read this post: ZFS: To Dedupe or not to Dedupe...?
Also this: How To Size Main Memory for ZFS Deduplication
Hoping that infrequent writes would let zfs evict the dedup table from RAM, I see dedup as an almost free saving.
dedup is almost never worth it, but nix store is significantly different than most other use cases.
I would, however, recommend using compression. I get compression ratios around 1.6 to 2, which is definitely nice for users on spinning disks, as it essentially doubles your I/O bandwidth.
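The "doubles your I/O bandwidth" point can be sketched as simple arithmetic: if decompression is cheap relative to the disk, effective read throughput is roughly raw throughput times compressratio. The numbers below are made-up examples, not measurements:

```shell
raw_mbps=100   # e.g. a spinning disk's sequential read speed
ratio=1.8      # e.g. a typical lz4 compressratio on a nix store
awk -v r="$raw_mbps" -v c="$ratio" \
    'BEGIN { printf "%.0f MB/s effective read bandwidth\n", r * c }'
# -> 180 MB/s effective read bandwidth
```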
Despite my /nix/store being optimised, I still get ZFS compression ratios of over 1.5, so compression seems to be worth it. Don’t know about dedup.
Using auto-optimize and lz4 compression now, works great for me
Can you post the output of
sudo zfs get all <MYPOOLNAME> | grep compressratio
and sudo zpool get all <MYPOOLNAME> | grep dedupratio
I don’t know how to check zfs RAM usage, but would you happen to know how it has changed after enabling dedup? (I got the sense from skimming the linked article that a rule of thumb would be 1/200 of the used size of the dataset.)
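As a back-of-envelope check of that rule of thumb (roughly 1/200 of the deduped dataset's used size kept in RAM for the dedup table), with a hypothetical 46 GiB store as the example input:

```shell
dataset_gib=46   # example dataset size in GiB, not a real measurement
awk -v s="$dataset_gib" \
    'BEGIN { printf "~%.2f GiB of RAM for the dedup table\n", s / 200 }'
# -> ~0.23 GiB of RAM for the dedup table
```

So for a typical desktop-sized store the table is small; it's multi-TB pools where this estimate starts to hurt.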
I just migrated my old installation to a new one on zfs.
Old: ext4. New: zfs pool with dedup=on and compression=on.
old: du -sh /nix/store: 46 GB
new (after only copying /nix/store, not actual installation):
➤ sudo zfs get all tank/nix | grep compress
tank/nix compressratio 1.84x -
tank/nix compression on local
tank/nix refcompressratio 1.84x -
➤ sudo zpool get all tank | grep dedup
tank dedupditto 0 default
tank dedupratio 1.70x -
➤ zpool list tank
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
tank 113G 17.1G 95.9G - - 10% 15% 1.69x ONLINE -
Goes without saying that dedup is pretty useful (but only if you don’t use store optimisation; if you do use store optimisation, dedup wins you nothing). That said, my computer was unusable for 3 hours for just 46 GB. It’s an old machine, but I think I regularly get 100 MB/s copying on my HDD. Since /nix/store is read-heavy, I guess I won’t mind the occasional slowdown …
Note also that copying the store is probably not the right approach, since it means your sqlite db is not rebuilt, which means everything is at risk of garbage collection. I did this only for testing … (see: Rebuild sqlite db from scratch? · Issue #3091 · NixOS/nix · GitHub)
According to linux - How can I determine the current size of the ARC in ZFS, and how does the ARC relate to free or cache memory? - Super User, you can check /proc/spl/kstat/zfs/arcstats for ARC metrics.
And it seems to align with my system (256 GB of ram):
[12:00:03] jon@nixos ~
$ awk '/^size/ { print $1 " " $3 / 1048576 }' < /proc/spl/kstat/zfs/arcstats
size 41537.6
$ sudo zfs list
NAME USED AVAIL REFER MOUNTPOINT
nixstore 747G 1.03T 24K /nixstore
nixstore/store 746G 1.03T 746G legacy
tank 328G 6.70T 104K /tank
tank/movies 7.09G 6.70T 7.09G /tank/movies
tank/nixstore 112K 6.70T 112K /tank/nixstore
tank/swap 272G 6.96T 5.55G -
tank/torrents 48.4G 6.70T 48.4G /tank/torrents
I use lz4 compression with nix-store optimize. But I may just switch to zfs dedup
if I do it again.
dedup property can be set at any time …
yes, but I already set my nix store to optimise, and I think it would be largely duplicated work.
However, I may try it next time I run nix-collect-garbage
The big issue with dedup is that it uses quite a lot of memory, and it’s scattered randomly across the disk, making it slow to read (and AFAIK, the entire table has to be read just to import the pool). Like it can take hours just to import a large array made of HDDs.
For the nix store… Eh, for most people I guess it’s small enough to not be such an issue. I wouldn’t count on it being all that much better than auto-optimize though.
ZFS dedup actually works while writing data. It calculates the checksum of a new block and then checks its table if a block with the same checksum is already on the disk. If yes, instead of writing the new block, ZFS just points to the old block in the metadata. This means that ZFS needs the full table of all blocks and their checksums in RAM while writing and a fast CPU. If it does not fit into the ZFS ARC, then ZFS will happily re-read the missing part of the table from the disk for each write. ZFS dedup is heavily biased towards servers with loads of RAM and enabling it without calculating the required RAM and adjusting the ZFS ARC size for it may cause massive performance hits instantly or later on, when the block list becomes too large.
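The write path described above can be illustrated with a toy simulation (this is not ZFS code): hash fixed-size blocks and count uniques to see what block-level dedup could save. It assumes GNU split (for --filter) and uses 128K, the default zfs recordsize, as the block size:

```shell
rec=131072                      # 128K, matching the default recordsize
f=$(mktemp)
head -c "$rec" /dev/urandom > "$f"
cat "$f" "$f" "$f" > "$f.3"     # a file made of three identical blocks

# Hash each block; total vs. unique hashes gives the achievable ratio.
total=$(split -b "$rec" --filter='sha256sum' "$f.3" | wc -l)
unique=$(split -b "$rec" --filter='sha256sum' "$f.3" | sort -u | wc -l)
awk -v t="$total" -v u="$unique" 'BEGIN { printf "dedup ratio %.2fx\n", t / u }'
# -> dedup ratio 3.00x
rm "$f" "$f.3"
```

Real zfs does this per written block against the full dedup table, which is why that table has to be resident in ARC for writes to stay fast.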
Due to the way ZFS ARC size is set, you may not even see a memory increase, as ZFS happily uses ~50% RAM if it would otherwise be free for its own cache. But that cache may shrink due to the size of the block table it needs for dedup, while ARC size remains the same.
ZFS dedup is completely transparent during read, as it’s just a block pointer. If that block is used multiple times, it simply does not matter. For that reason it also should not affect pool import times.
Good thing is that you can just disable it at any time.
Careful with the set dedup=off or set atime=off commands; one of these very likely destroyed my zfs partition (I was on root and issued these after creation and installation)
Also, the link I gave in my first post touches on the RAM usage; it depends on your data size, and it’s not that bad …
EDITED: Sorry my original post was unclear.
EDIT2: I meant running these commands afterwards. I otherwise have great experience with a dedup=off
pool which started as such from the beginning.
I’ve been using atime=off and haven’t had any issues. Essentially it just prevents another write when accessing files. And since the nix store doesn’t care about access time, it seems like a good fit.
This might be just for my use case of doing a lot of reviews, but dedup is really helping.
$ zpool list
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
nixstore 1.81T 462G 1.36T - - 29% 24% 1.79x ONLINE -
$ zfs get all nixstore/store | grep compressra
nixstore/store compressratio 1.85x -
might be able to extend the usefulness of my 2TB well past its original 2TB
This is without store auto-optimise, right? Can someone with auto-optimise on post their store stats? Specifically I’m looking for two things:
- du -sh on the nix store (will take at least 15 minutes)
- the savings optimise reports (typically at the end of garbage collection)
this is with auto-optimise, I believe; at least I had it on, and I’m not aware of a way to disable it.
One thing to note is that my server now floats like a baseline of 100-180 GB of zfs arc + dedup (out of 256GB). However, I haven’t really suffered memory pressure so it hasn’t affected performance too much.
Can’t believe I’m telling you … but what does
nixos-option nix.autoOptimiseStore
return?
I ask because 1.79 with auto-optimise on is very suspicious … There’s nothing zfs dedups beyond what /nix/store optimisation already catches (zfs does block-level dedup, but saying that 79 out of 179 blocks are equal yet the files are not doesn’t seem right to me).
Can you keep an eye out during your next garbage collection and see what savings it reports?
It’s false; I guess I did nix-store --optimise once, but it was just a one-time action, not a persistent setting.
After botching up my previous zfs install, I reinstalled nixos on zfs and this time (by accident) I had both zfs’s dedup
and nix.autoOptimiseStore
on. Amazingly zfs still managed to give me a 1.35x
deduplication. Substantial but not sure if I would like to pay the RAM cost.
Compression ratio went up to 1.95 (maybe just a difference in the data …?)