Docker fails to start

I have a strange problem with starting docker. Two systems which are configured the same (as far as docker is concerned; the boxes have lots of other configuration differences) and one runs (box1) while the other does not (box2). I am hoping someone can point me in the right direction as I have been unable to figure out the problem.

Both boxes are at the same version of NixOS (19.09.2345.6d445f8398d):

[me@box1:~]$ cat /etc/issue
<elided>
<<< Welcome to NixOS 19.09.2345.6d445f8398d (\m) - \l >>>
<elided>
[me@box2:~]$ cat /etc/issue
<elided>
<<< Welcome to NixOS 19.09.2345.6d445f8398d (\m) - \l >>>
<elided>

But docker works on box1 but not on box2:

[me@box1:~]$ docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
[me@box2:~] $ docker ps
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

The docker configurations are exactly the same on both boxes (and in fact are from the same virtualization.nix cloned from a git repo):

[me@box1:~]$ grep -rni docker /etc/nixos/virtualization.nix
virtualization.docker.enable = true;
users.users.me.extraGroups = [ "docker" ];
[me@box2:~]$ grep -rni docker /etc/nixos/virtualization.nix
virtualization.docker.enable = true;
users.users.me.extraGroups = [ "docker" ];

On box1, docker is running fine:

[root@box1:~]# systemctl status docker.service
● docker.service - Docker Application Container Engine
   Loaded: loaded (/nix/store/7qjn8j5cy2imizis9lys222ak7smql34-docker-19.03.5/etc/systemd/system/docker.service; enabled; vendor preset: enabled)
  Drop-In: /nix/store/liy275aqw412ggyd3xzfxj7cn235s1hk-system-units/docker.service.d
           └─overrides.conf
   Active: active (running) since Fri 2020-03-27 04:44:23 EDT; 5 days ago
     Docs: https://docs.docker.com
 Main PID: 21791 (dockerd)
       IP: 1.3G in, 1.4M out
    Tasks: 46 (limit: 4915)
   Memory: 227.7M
      CPU: 41min 57.026s
   CGroup: /system.slice/docker.service
           ├─21791 /nix/store/7qjn8j5cy2imizis9lys222ak7smql34-docker-19.03.5/libexec/docker/dockerd --group=docker --host=fd:// --log-driver=journald --live-restore
           └─21911 containerd --config /var/run/docker/containerd/containerd.toml --log-level info

Apr 01 16:31:34 box1 dockerd[21791]: time="2020-04-01T16:31:34.139669954-04:00" level=info msg="Layer sha256:5969d16c5b6921e0be2f921c0542404f748fd883fb3aef80146cd5a19e02c56a cleaned up"
Apr 01 16:31:42 box1 dockerd[21791]: time="2020-04-01T16:31:42.677811415-04:00" level=info msg="Layer sha256:cb5548570d4596dfd019ae24bd3821d44bd5b1cec116384fae127852fd6569dd cleaned up"
Apr 01 16:31:43 box1 dockerd[21791]: time="2020-04-01T16:31:43.944538688-04:00" level=info msg="Layer sha256:cb5548570d4596dfd019ae24bd3821d44bd5b1cec116384fae127852fd6569dd cleaned up"

While on box2 it does not:

[root@box2:~]# systemctl status docker.service
● docker.service - Docker Application Container Engine
   Loaded: loaded (/nix/store/7qjn8j5cy2imizis9lys222ak7smql34-docker-19.03.5/etc/systemd/system/docker.service; enabled; vendor preset: enabled)
  Drop-In: /nix/store/3r2gm2ik5m393p2jq8zs42xz2kj2aa51-system-units/docker.service.d
           └─overrides.conf
   Active: failed (Result: exit-code) since Wed 2020-04-01 20:39:26 EDT; 1h 11min ago
     Docs: https://docs.docker.com
  Process: 13236 ExecStart=/nix/store/7qjn8j5cy2imizis9lys222ak7smql34-docker-19.03.5/bin/dockerd --group=docker --host=fd:// --log-driver=journald --live-restore (code=exited, status=1/FAILURE)
 Main PID: 13236 (code=exited, status=1/FAILURE)
       IP: 0B in, 0B out
      CPU: 96ms

Apr 01 20:39:26 box2 systemd[1]: docker.service: Scheduled restart job, restart counter is at 3.
Apr 01 20:39:26 box2 systemd[1]: Stopped Docker Application Container Engine.
Apr 01 20:39:26 box2 systemd[1]: docker.service: Start request repeated too quickly.
Apr 01 20:39:26 box2 systemd[1]: docker.service: Failed with result 'exit-code'.
Apr 01 20:39:26 box2 systemd[1]: Failed to start Docker Application Container Engine.

Comparing start up messages in the system journal (showing only the lines which differ, excluding differences in machine names, kernel versions, dates/times, etc), box1 succeeds:

[root@box1:~]# journalctl -u docker.service
<elided>
Mar 27 04:44:23 box1 dockerd[21791]: time="2020-03-27T04:44:23.472232856-04:00" level=warning msg="Usage of loopback devices is strongly discouraged for production use. Please use `--storage-opt dm.thinpooldev` or use `man dockerd` to refer to dm.thinpooldev section." storage-driver=devicemapper
Mar 27 04:44:23 box1 dockerd[21791]: time="2020-03-27T04:44:23.495267796-04:00" level=warning msg="Base device already exists and has filesystem xfs on it. User specified filesystem  will be ignored." storage-driver=devicemapper
Mar 27 04:44:23 box1 dockerd[21791]: time="2020-03-27T04:44:23.531122236-04:00" level=info msg="[graphdriver] using prior storage driver: devicemapper"
Mar 27 04:44:23 box1 dockerd[21791]: time="2020-03-27T04:44:23.531164648-04:00" level=warning msg="[graphdriver] WARNING: the devicemapper storage-driver is deprecated, and will be removed in a future release"
Mar 27 04:44:23 box1 dockerd[21791]: time="2020-03-27T04:44:23.548394425-04:00" level=error msg="layer does not exist, not restoring image sha256:8e090d9629330f49c7b88def487ff984d708463d2022ac1994bb68e3a4b1ae19, sha256:c756f7785bb37606fb32e16ca04758c2a3ba0d53928aef7a1c0431f60ba794f4, linux"
Mar 27 04:44:23 box1 dockerd[21791]: time="2020-03-27T04:44:23.561426329-04:00" level=warning msg="Your kernel does not support cgroup rt period"
Mar 27 04:44:23 box1 dockerd[21791]: time="2020-03-27T04:44:23.561440099-04:00" level=warning msg="Your kernel does not support cgroup rt runtime"
Mar 27 04:44:23 box1 dockerd[21791]: time="2020-03-27T04:44:23.561628650-04:00" level=info msg="Loading containers: start."
Mar 27 04:44:23 box1 dockerd[21791]: time="2020-03-27T04:44:23.688613784-04:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
Mar 27 04:44:23 box1 dockerd[21791]: time="2020-03-27T04:44:23.722178192-04:00" level=info msg="Loading containers: done."
Mar 27 04:44:23 box1 dockerd[21791]: time="2020-03-27T04:44:23.738616490-04:00" level=info msg="Docker daemon" commit=633a0ea838f10e000b7c6d6eed1623e6e988b5bc graphdriver(s)=devicemapper version=19.03.5
Mar 27 04:44:23 box1 dockerd[21791]: time="2020-03-27T04:44:23.738683060-04:00" level=info msg="Daemon has completed initialization"
Mar 27 04:44:23 box1 dockerd[21791]: time="2020-03-27T04:44:23.753080422-04:00" level=info msg="API listen on /run/docker.sock"
Mar 27 04:44:23 box1 systemd[1]: Started Docker Application Container Engine.

While box2 which does not:

[root@box1:~]# journalctl -u docker.service
<elided>
Apr 01 20:39:25 box2 dockerd[13236]: time="2020-04-01T20:39:25.154392684-04:00" level=error msg="[graphdriver] prior storage driver zfs failed: prerequisites for driver not satisfied (wrong filesystem?)"
Apr 01 20:39:25 box2 dockerd[13236]: time="2020-04-01T20:39:25.154712494-04:00" level=info msg="stopping healthcheck following graceful shutdown" module=libcontainerd
Apr 01 20:39:25 box2 dockerd[13236]: time="2020-04-01T20:39:25.154721394-04:00" level=info msg="stopping event stream following graceful shutdown" error="context canceled" module=libcontainerd namespace=plugins.moby
Apr 01 20:39:26 box2 dockerd[13236]: failed to start daemon: error initializing graphdriver: prerequisites for driver not satisfied (wrong filesystem?)
Apr 01 20:39:26 box2 systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Apr 01 20:39:26 box2 systemd[1]: docker.service: Failed with result 'exit-code'.
Apr 01 20:39:26 box2 systemd[1]: Failed to start Docker Application Container Engine.
Apr 01 20:39:26 box2 systemd[1]: docker.service: Scheduled restart job, restart counter is at 3.
Apr 01 20:39:26 box2 systemd[1]: Stopped Docker Application Container Engine.

As noted above, the kernel versions are different, perhaps because I forgot to reboot after nixos-rebuild on box1.

[me@box1:~]$ uname -a
Linux box1 4.19.109 #1-NixOS SMP Wed Mar 11 13:15:13 UTC 2020 x86_64 GNU/Linux
[me@box2:~]$ uname -a
Linux box2 4.19.113 #1-NixOS SMP Wed Mar 25 07:06:15 UTC 2020 x86_64 GNU/Linux

Starting dockerd by hand rather than letting systemd do it gives the same “API listen” message as the successful run on box1:

[root@box2:~]# dockerd  # works
<elided>
INFO[2020-04-01T20:57:23.239202099-04:00] API listen on /var/run/docker.sock

And now docker works:

[me@box2:~]$ docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

Comparing the system journal output for dockerd run by hand vs run by systemd: starting by-hand has errors (related to storage-driver=zfs, sandbox not found and failure to unmount) but in the end it succeeds while starting through systemd fails.

$ diff -ubB <(cut -c44- /tmp/box2.by-hand.log) <(cut -c83- /tmp/box2.systemd.txt)
<elided>
-level=info msg="[graphdriver] using prior storage driver: zfs"
-level=warning msg="Your kernel does not support cgroup rt period"
-level=warning msg="Your kernel does not support cgroup rt runtime"
-level=info msg="Loading containers: start."
-level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
-level=error msg="Error unmounting /var/lib/docker/zfs/graph/78ef607577558ab46ba682c1b18bca91e15b72c61abb7cd4910946ec4288836d: invalid argument" storage-driver=zfs
-level=error msg="Error unmounting /var/lib/docker/zfs/graph/d438772fd056272c9921c7772d39bb89a4b5faaebf633f024744d93da6e9ba82: invalid argument" storage-driver=zfs
-level=warning msg="error locating sandbox id 9b6add6e012710a6421b7fa6bb244b9f507f0f5ab4382ed7e89db05d1fa14878: sandbox 9b6add6e012710a6421b7fa6bb244b9f507f0f5ab4382ed7e89db05d1fa14878 not found"
-level=warning msg="cc5f8114468997d3e70a7e0cce90c18b75d26a46811951e05d8bdf5760235249 cleanup: failed to unmount IPC: umount /var/lib/docker/containers/cc5f8114468997d3e70a7e0cce90c18b75d26a46811951e05d8bdf5760235249/mounts/shm, flags: 0x2: no such file or directory"
-level=warning msg="error locating sandbox id 3f7db32a07a4bffe2a720c3f0a3f639e16eafa57c307b9519ae045b64c9fdff1: sandbox 3f7db32a07a4bffe2a720c3f0a3f639e16eafa57c307b9519ae045b64c9fdff1 not found"
-level=warning msg="d6f798e9a09b19187a6e8a59bbb74bf1e32fa4e6f87ca8e783cb175ba6ed7f7f cleanup: failed to unmount IPC: umount /var/lib/docker/containers/d6f798e9a09b19187a6e8a59bbb74bf1e32fa4e6f87ca8e783cb175ba6ed7f7f/mounts/shm, flags: 0x2: no such file or directory"
-level=warning msg="Failed to unmount 78ef607577558ab46ba682c1b18bca91e15b72c61abb7cd4910946ec4288836d mount /var/lib/docker/zfs/graph/78ef607577558ab46ba682c1b18bca91e15b72c61abb7cd4910946ec4288836d: no such file or directory" storage-driver=zfs
-level=warning msg="Failed to unmount d438772fd056272c9921c7772d39bb89a4b5faaebf633f024744d93da6e9ba82 mount /var/lib/docker/zfs/graph/d438772fd056272c9921c7772d39bb89a4b5faaebf633f024744d93da6e9ba82: no such file or directory" storage-driver=zfs
-level=error msg="d6f798e9a09b19187a6e8a59bbb74bf1e32fa4e6f87ca8e783cb175ba6ed7f7f cleanup: failed to delete container from containerd: no such container"
-level=error msg="Failed to start container d6f798e9a09b19187a6e8a59bbb74bf1e32fa4e6f87ca8e783cb175ba6ed7f7f: error creating zfs mount: mount rpool/root/nixos/d438772fd056272c9921c7772d39bb89a4b5faaebf633f024744d93da6e9ba82:/var/lib/docker/zfs/graph/d438772fd056272c9921c7772d39bb89a4b5faaebf633f024744d93da6e9ba82: no such file or directory"
-level=error msg="cc5f8114468997d3e70a7e0cce90c18b75d26a46811951e05d8bdf5760235249 cleanup: failed to delete container from containerd: no such container"
-level=error msg="Failed to start container cc5f8114468997d3e70a7e0cce90c18b75d26a46811951e05d8bdf5760235249: error creating zfs mount: mount rpool/root/nixos/78ef607577558ab46ba682c1b18bca91e15b72c61abb7cd4910946ec4288836d:/var/lib/docker/zfs/graph/78ef607577558ab46ba682c1b18bca91e15b72c61abb7cd4910946ec4288836d: no such file or directory"
-level=info msg="Loading containers: done."
-level=info msg="Docker daemon" commit=633a0ea838f10e000b7c6d6eed1623e6e988b5bc graphdriver(s)=zfs version=19.03.5
-level=info msg="Daemon has completed initialization"
-level=info msg="API listen on /var/run/docker.sock"
+level=error msg="[graphdriver] prior storage driver zfs failed: prerequisites for driver not satisfied (wrong filesystem?)"
+level=info msg="stopping healthcheck following graceful shutdown" module=libcontainerd
+level=info msg="stopping event stream following graceful shutdown" error="context canceled" module=libcontainerd namespace=plugins.moby

At this point, I am at my wits end. Any help in identifying the problem will be greatly appreciated.

Thanks to @azazel75, I was able to find a solution. See: