ZFS: yes or no?

The idea is that you select the kernel you want to use yourself, i.e. the latest LTS kernel.
I have a module here that restores the old behavior:
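A minimal sketch of such a module, assuming the old behavior means tracking the newest kernel the ZFS module supports via the zfs package's latestCompatibleLinuxPackages passthru (this attribute may be deprecated or removed in newer nixpkgs, so check your revision):

```nix
# Sketch only, not the exact module from this thread: follow whatever kernel
# the installed ZFS package declares as its newest supported one.
{ config, lib, ... }:

{
  # latestCompatibleLinuxPackages is a passthru of the zfs package; if your
  # nixpkgs no longer has it, pin a kernel explicitly instead.
  boot.kernelPackages =
    lib.mkDefault config.boot.zfs.package.latestCompatibleLinuxPackages;
}
```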

It’s missing a closing paren, but otherwise feel free to send a PR to word it better.

It now points at the default kernel (pkgs.linuxPackages).

Depends on your needs. If you need something newer than the latest LTS kernel (the default), you need to pin a kernel explicitly.
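For illustration, a hedged sketch of an explicit pin in configuration.nix; the attribute below is only a placeholder, so pick whichever kernel series exists in your nixpkgs revision and is supported by ZFS:

```nix
# Sketch only: pin a specific kernel series instead of the default.
# linuxPackages_6_6 is a placeholder attribute; use one that exists in your
# nixpkgs revision and that the ZFS module actually builds against.
{ pkgs, ... }:

{
  boot.kernelPackages = pkgs.linuxPackages_6_6;
}
```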

If that kernel works for you, then yes. Otherwise, you need to pin another kernel, but that will likely leave you unable to update later, because the pinned kernel is likely to be removed from nixpkgs before ZFS gains support for the next one.

It says to pin a different kernel, not an LTS kernel specifically. If the example is confusing, please send a PR.

With ZFS, if all copies of a piece of metadata are corrupted, …I don’t actually know what would happen because I don’t think I’ve ever seen any sort of recovery path mentioned in the ZFS docs.
I assume you’re just screwed?

I don’t know the details, but on disk ZFS is just a bunch of objects laid out in some arbitrary way, pointing to each other with block pointers (disk number, offset in 512-byte chunks from the end of the second label). So if an object near the leaves gets corrupted, a directory may go poof; if an object near the root gets corrupted, the pool won’t mount. It depends on where the corruption occurs, which is why it’s a good idea to keep two copies of metadata even on a single-disk pool, just to guard against the root of the object tree going poof.

As a side note, if the root goes poof, ZFS will fall back to an older uberblock, so you’ll lose a couple of writes, but it should still mount without any problems. Since everything is CoW, uberblock n and uberblock n - 1 have completely different root nodes.

So in fact, it’ll almost always mount. Yay for ZFS!

This is quite different from how btrfs does it, I believe: btrfs has a forest of large B-trees (hence the name) that contain all of the metadata.

As I understand it, this is required for btrfs’ flexibility. If you’ve ever needed to adapt a ZFS pool to something other than its initial configuration, you’ll know that data written to ZFS is as good as set in stone; you cannot change it later. No defragmentation, no resizing, no (actual) device removal, no out-of-band dedup or recompression, and so on. I guess that makes it a trade-off.

ZFS does this by default (for most metadata), but btrfs also does it (for all metadata), and btrfs can clearly still suffer from metadata corruption in all copies at once (bad RAM would do that), so there’s no difference in this regard.

I’m not sure if btrfs can do this in any meaningful way. That might be quite significant.

I know that you can make it fall back to a different copy of the superblock should one of them get corrupted (only via an explicit opt-in mount flag, which is sane IMHO), but not whether it’d be technically possible to make it fall back to a previous metadata root node revision, or how long those remain valid.

Depending on where your mind places the closing brace (after two words, or after five), this sentence can become extremely confusing, almost meaning the opposite.

It’s supposed to come after two words.

ZFS’s killer feature for my purposes is its native encryption, since that happens per block, unlike btrfs, where you have to rely on fully encrypted partitions with LUKS.

This lets you do zero-knowledge, properly deduplicated backups by just sending blocks to another machine. This is way cleaner than using restic, borg or the like, and gets rid of several layers of indirection.
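A rough sketch of how that can look on NixOS, assuming the services.syncoid module and its sendOptions option (option names, hosts and dataset names below are placeholders, not taken from this thread); "w" maps to zfs send -w, i.e. raw sends where blocks stay encrypted in transit and on the target:

```nix
# Sketch only: replicate raw (still-encrypted) snapshots to a backup host.
# Assumes NixOS's services.syncoid module; double-check the option names and
# substitute your own pool, dataset, and host names.
{
  services.syncoid = {
    enable = true;
    commands."tank/encrypted-data" = {
      target = "backup@backup.example.org:backuppool/encrypted-data";
      sendOptions = "w"; # zfs send -w: raw send, the target never sees plaintext
    };
  };
}
```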

In practice I use btrfs, since that’s a small use case and it’s hard to find cheap backup hosts that’ll take zfs snapshots, but I sure wish I could have this.

I have a couple that cost around US$10 a month for 2 TB. Is that cheap? They’re deals I got as special offers, but special offers come around quite frequently, and also at least one of them is spare and I can transfer it to you if you like.

What is the latest on the ECC recommendation for ZFS?

ECC is recommended in any case, no matter what filesystem.

Just to elaborate a bit on that, it’s a common myth that ZFS “needs” ECC. The reality is that ZFS doesn’t need it any more than any other FS. The myth comes from the fact that ZFS is often used in enterprise servers, where ECC is a must regardless of which file system the server is using.

There’s also the fact that ZFS is better at detecting errors (because checksums fail), but of course by then it’s often too late: it’s hard to know whether the error happened in memory when the data was written, in storage, or when it was read back. ECC makes such errors less likely overall, and the remaining ones are more likely to be in the storage layer, which is what people think they’re using ZFS for.

Sometimes instead they’re using it to learn about bad memory, and that starts out looking like ZFS being unreliable because it’s reporting the consequences. ZFS can solve data corruption in storage (with redundancy, and metadata is redundant by default even on a single disk), but it can’t solve bad memory. That has fed the myth.

So the typical advice is that if you want the integrity protection of ZFS, you also want the integrity protection of ECC. Two good things that go well together. It’s totally valid advice.

However, wanting ECC is often beside the point: it can be hard to get. Basically no laptops, and almost no desktops, support ECC, and we still need to use our data on those machines. ZFS is easy to get, and I will happily use it to help detect when memory might be going bad, especially on machines where I can’t have ECC.

While we’re talking about ZFS, I have to say that this issue is pretty annoying if you want to use ZFS on removable media: zpool commands block when a disk goes missing / pool suspends · Issue #3461 · openzfs/zfs · GitHub

If the physical media goes missing, you can find yourself unable to run any zfs command, even on other filesystems, even if the physical media comes back, up to the point where your reboot process gets stuck unable to unmount the filesystem and you need to hard-power-off the machine.

I use btrfs as my root volume, but use zfs as a “hedge” in case I get bitten by btrfs bugs, which has happened to me from time to time. I wrote about my backup strategy here: GitHub - bmillwood/backups
