Does anyone have experience with NixOS on ZFS with a high `zfs_txg_timeout`?
(Yes, I do have a UPS. No, I would rather not set it to a low value.)
There is a problem where, at the end of `nixos-rebuild switch`/`boot`, adding the new generation is delayed by `zfs_txg_timeout`: e.g. if `zfs_txg_timeout` is 60 seconds, then even a very simple rebuild takes 60 seconds longer. It is apparently waiting for changes to get committed to disk, but perhaps isn’t using sync writes to accomplish this (?), so it waits around until ZFS itself decides to commit the current transaction group.
The same problem happens when shutting down. Unmounting the ZFS datasets takes around `zfs_txg_timeout` seconds, which is `zfs_txg_timeout` seconds more than it needs. Any ideas on what to do about this?
Calling `zpool sync` (from another terminal) when it gets “stuck” like this allows it to continue (which is proof that it waits around until ZFS commits the current writes). Perhaps `zpool sync` could be set up to be called every time the configuration is saved and every time the datasets are unmounted during shutdown? But I don’t know whether that is just a crude fix that could be done better some other way.
I have also been able to determine that the part of `nixos-rebuild` delayed by this is `nix-env -p … --set …`, and that the `sync` command, when run as root, is slowed down the same way (even if there are no writes to flush).
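A crude sketch of that idea, in case it helps anyone: wrap the rebuild in a script that keeps calling `zpool sync` in the background, so nothing has to sit out the full txg timeout. Untested, and the wrapper name is made up:
```nix
# Hypothetical wrapper: nudge the pool with `zpool sync` once a second
# for the duration of the rebuild, releasing anything that is waiting
# on a txg to close.
environment.systemPackages = [
  (pkgs.writeShellScriptBin "nixos-rebuild-synced" ''
    while sleep 1; do ${pkgs.zfs}/bin/zpool sync; done &
    syncer=$!
    trap 'kill "$syncer"' EXIT
    nixos-rebuild "$@"
  '')
];
```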
uep
March 5, 2025, 10:45pm
2
I suspect that updating the Nix store’s SQLite database is the underlying cause in the configuration-switch case. SQLite performs a particular sequence of operations to lock and open the database, which results in a sync and then a wait for the next txg to close: up to 5 s by default, and longer in your case.
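For anyone else reading: the parameter in question is the `zfs_txg_timeout` ZFS module parameter, which on NixOS can be set like this (5 is the upstream default; the OP deliberately runs it much higher):
```nix
# Interval between automatic txg commits, in seconds. Raising it
# lengthens the waits described in this thread.
boot.extraModprobeConfig = ''
  options zfs zfs_txg_timeout=5
'';
```
The OpenZFS issue below shows the same stall in an strace of `rpm` against its SQLite database: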
opened 02:51PM - 15 Dec 22 UTC
Type: Defect
### System information
Type | Version/Name
--- | ---
Distribution Name | Fedora
Distribution Version | 36
Kernel Version | 5.15.82
Architecture | x86_64
OpenZFS Version | 2.1.7
### Describe the problem you're observing
Usually `rpm` commands execute quickly, but pretty often (10% of cases) there is a 4-second extra delay:
```
16:22:55.694445 openat(AT_FDCWD, "/usr/lib/sysimage/rpm/rpmdb.sqlite-shm", O_RDWR|O_CREAT|O_NOFOLLOW|O_CLOEXEC, 0644) = 6 <0.000086>
16:22:55.694563 newfstatat(6, "", {st_dev=makedev(0, 0x1e), st_ino=27231358, st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=131072, st_blocks=18, st_size=32768, st_atime=1671114175 /* 2022-12-15T16:22:55.105109260+0200 */, st_atime_nsec=105109260, st_mtime=1671114175 /* 2022-12-15T16:22:55.106109257+0200 */, st_mtime_nsec=106109257, st_ctime=1671114175 /* 2022-12-15T16:22:55.106109257+0200 */, st_ctime_nsec=106109257}, AT_EMPTY_PATH) = 0 <0.000011>
16:22:55.694692 geteuid() = 0 <0.000010>
16:22:55.694759 fchown(6, 0, 0) = 0 <0.000047>
16:22:55.694855 fcntl(6, F_GETLK, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=128, l_len=1, l_pid=0}) = 0 <0.000010>
16:22:55.694909 fcntl(6, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=128, l_len=1}) = 0 <0.000009>
16:22:55.694953 ftruncate(6, 3) = 0 <4.372489>
```
I have ZFS as the root fs running on top of **LUKS2**, on an NVMe Corsair MP510 (max random write, QD32 IOMeter: up to _440K_ IOPS), with practically zero other IO operations when I run `rpm`.
### Describe how to reproduce the problem
Just try rpm again.
### Include any warning/errors/backtraces from the system logs
`zpool status` shows no errors, and the pool’s dataset properties are:
```
NAME PROPERTY VALUE SOURCE
tank type filesystem -
tank creation Wed Nov 10 15:40 2021 -
tank used 1.39T -
tank available 296G -
tank referenced 96K -
tank compressratio 1.29x -
tank mounted no -
tank quota none default
tank reservation none default
tank recordsize 128K default
tank mountpoint /tank default
tank sharenfs off default
tank checksum on default
tank compression zstd-3 local
tank atime on default
tank devices on default
tank exec on default
tank setuid on default
tank readonly off default
tank zoned off default
tank snapdir hidden default
tank aclmode discard default
tank aclinherit restricted default
tank createtxg 1 -
tank canmount off local
tank xattr sa local
tank copies 1 default
tank version 5 -
tank utf8only on -
tank normalization formD -
tank casesensitivity sensitive -
tank vscan off default
tank nbmand off default
tank sharesmb off default
tank refquota none default
tank refreservation none default
tank guid 8575689710526589949 -
tank primarycache all default
tank secondarycache all default
tank usedbysnapshots 0B -
tank usedbydataset 96K -
tank usedbychildren 1.39T -
tank usedbyrefreservation 0B -
tank logbias latency default
tank objsetid 54 -
tank dedup off default
tank mlslabel none default
tank sync standard default
tank dnodesize auto local
tank refcompressratio 1.00x -
tank written 0 -
tank logicalused 1.77T -
tank logicalreferenced 42K -
tank volmode default default
tank filesystem_limit none default
tank snapshot_limit none default
tank filesystem_count none default
tank snapshot_count none default
tank snapdev hidden default
tank acltype posix local
tank context none default
tank fscontext none default
tank defcontext none default
tank rootcontext none default
tank relatime on local
tank redundant_metadata all default
tank overlay on default
tank encryption off default
tank keylocation none default
tank keyformat none default
tank pbkdf2iters 0 default
tank special_small_blocks 0 default
```
The same underlying issue shows up as similar delays in several other applications, e.g.:
opened 12:59PM - 17 Feb 24 UTC
I was getting extremely slow speeds with the default sqlite backend on zfs; I switched to postgres and got a 24x upload rate speedup. I was also hitting https://github.com/zhaofengli/attic/issues/24 occasionally.
Could there be an option to turn off fsync on the sqlite backend? That should speed it up massively.
opened 04:22PM - 20 Apr 22 UTC
closed 07:14PM - 20 Feb 25 UTC
0.kind: bug
6.topic: kernel
### Describe the bug
ZFS on a desktop system with the default kernel, which is compiled with PREEMPT_VOLUNTARY, causes terrible lag, short hangs and very bad realtime behaviour. This is easy to see with jackd and mixxx, for example.
If the kernel is compiled with these changes, the system behaves much better:
```
boot.kernelPatches = [ {
name = "enable RT_FULL";
patch = null;
extraConfig = ''
PREEMPT y
PREEMPT_BUILD y
PREEMPT_VOLUNTARY n
PREEMPT_COUNT y
PREEMPTION y
'';
} ];
```
### Steps To Reproduce
Steps to reproduce the behavior:
1. Do any ZFS file I/O
2. Run mixxx + jackd, for example
3. Observe the stuttering and underruns
### Expected behavior
Behaviour more similar to other filesystems
### Additional context
Upstream ticket: https://github.com/openzfs/zfs/issues/13128
### Notify maintainers
@wizeman @hmenke @jcumming @jonringer @fpletz @globin
### Metadata
```console
[user@system:~]$ nix-shell -p nix-info --run "nix-info -m"
- system: `"x86_64-linux"`
- host os: `Linux 5.16.20, NixOS, 21.11 (Porcupine)`
- multi-user?: `yes`
- sandbox: `yes`
- version: `nix-env (Nix) 2.4`
- channels(poelzi): `"home-manager-21.11, nixos-21.05.4726.530a53dcbc9"`
- channels(root): `"nixos-21.11.335665.0f316e4d72d"`
- nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`
```
opened 08:06PM - 07 May 23 UTC
closed 08:04AM - 08 May 23 UTC
Edit:
This has become the canonical issue for Atuin/ZFS issues
If you're using ZFS with Atuin, you have likely noticed an error such as the following:
```
Error: pool timed out while waiting for an open connection
Location:
/home/runner/work/atuin/atuin/crates/atuin-client/src/record/sqlite_store.rs:48:20
```
This is due to an issue with ZFS and SQLite. See: https://github.com/openzfs/zfs/issues/14290
There are two workarounds:
1. Use the Atuin daemon
This has not yet been released as stable, but it is mostly without issue. The daemon takes all SQLite writes off of the hot path, thereby avoiding the issue.
Follow the steps here: https://github.com/atuinsh/atuin/issues/952#issuecomment-2121671620
2. Create an ext4 zvol for Atuin
Follow the steps here: https://github.com/atuinsh/atuin/issues/952#issuecomment-1902164562
---
I've just begun using atuin, and I absolutely love it so far. However, there's been a recurring issue for me, which I've found hard to diagnose:
My prompt regularly blocks for between 500 ms and 5 s whenever I run a command. I've narrowed this down to the `_atuin_preexec` function, by manually importing the shell hook generated from `atuin init zsh` and annotating it with logging and `time` calls. Here's a sample `time` output from a run where it hung:
```
Running pre-exec for cd ~
0.00user 0.00system 0:04.93elapsed 0%CPU (0avgtext+0avgdata 8192maxresident)k
52036inputs+1064outputs (15major+512minor)pagefaults 0swaps
Pre-exec done for cd ~
```
Here's how I modified the hook to get the result:
```bash
_atuin_preexec() {
    echo "Running pre-exec for $1\n" >> /tmp/atuin.log
    local id
    id=$(/usr/bin/time -a -o /tmp/atuin.log atuin history start -- "$1")
    export ATUIN_HISTORY_ID="$id"
    echo "\nPre-exec done for $1" >> /tmp/atuin.log
}
```
I've tried to replicate the behavior in cli use outside of the hook using `hyperfine`, and was successful:
```
» hyperfine -r 1000 "atuin search --limit 5"
Benchmark 1: atuin search --limit 5
Time (mean ± σ): 18.3 ms ± 114.8 ms [User: 4.9 ms, System: 8.2 ms]
Range (min … max): 12.5 ms … 2587.9 ms 1000 runs
```
This does not happen on every benchmark, even with 1000 runs. My initial thought was that this had to be contention on the database file, but I saw that you're already using WAL, so concurrent writes/reads should not be a problem. I can also trigger the delay by repeatedly opening the search widget, which should not even be writing to the database; that confuses me even more.
Do you have any idea on how I could gather further data on this?
For shutdown, you could add a `zpool sync` to the relevant systemd unit?
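A minimal sketch of that, untested and with a made-up unit name: since units stop in reverse start order, ordering it after `zfs-mount.service` should make its `ExecStop` fire before the datasets are unmounted:
```nix
systemd.services.zpool-sync-before-unmount = {
  wantedBy = [ "multi-user.target" ];
  after = [ "zfs-mount.service" ];
  serviceConfig = {
    Type = "oneshot";
    # Keep the unit active so its ExecStop runs at shutdown.
    RemainAfterExit = true;
    ExecStart = "${pkgs.coreutils}/bin/true";
    # No pool argument: sync every imported pool.
    ExecStop = "${pkgs.zfs}/bin/zpool sync";
  };
};
```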
1 Like
Thanks for showing me the relevant issues. I was having a very hard time finding anything that mentions the problem. It’s a shame it is still an unsolved problem.
Speaking of the systemd unit, what should I look for? All I have found is “zfs-mount.service” and a bunch of “*.mount” services(?), but I haven’t found anything called “umount” or “unmount”, which is what I’d expect to find.
1 Like
uep
March 6, 2025, 11:17pm
4
There is a `zfs-sync` service. It doesn’t actually call `zpool sync`; it sets a custom property on the root dataset of the pool, which I assume is supposed to trigger a txg close as part of export. You could try adding an explicit `zpool sync` there.
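If you try that, a hedged sketch of what it could look like (assuming the unit really is named `zfs-sync` and does its work in `ExecStart`; check with `systemctl cat zfs-sync.service` first, since the details may differ per setup):
```nix
# Run an explicit sync on top of the property write the unit already does.
systemd.services.zfs-sync.serviceConfig.ExecStartPost = "${pkgs.zfs}/bin/zpool sync";
```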