How to adjust filesystem max sizes during building with nix-build and dockerTools?

I am attempting to create a Docker image for development to run my project’s code, which uses Python 3.9, PyTorch with CUDA, OpenCV (headless), and some custom/3rd party repositories separate from the project. However, when I run nix-build containers/tmp.nix, the build does not finish because it runs out of space in the temporary filesystem used for building.

Technical details:

  • Hardware: System76 laptop
  • system: “x86_64-linux”
  • host os: NixOS using stable 22.05
  • multiuser: yes
  • virtualisation.docker.enable = true; and user in docker group.

I am trying to make this function without flakes at first, then I will modify it to use flakes. I am new to the Nix ecosystem, so I am trying to take it one step at a time. So no experimental features used atm.

Filesystem usage before building (and after the build fails):

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        1.6G     0  1.6G   0% /dev
tmpfs            16G   11M   16G   1% /dev/shm
tmpfs           7.9G  7.6M  7.8G   1% /run
tmpfs            16G  456K   16G   1% /run/wrappers
/dev/nvme0n1p1  450G   28G  400G   7% /
/dev/nvme0n1p3  487M  137M  350M  29% /boot
tmpfs           3.2G  8.0K  3.2G   1% /run/user/0
tmpfs           3.2G   24K  3.2G   1% /run/user/1000

$ df -hi
Filesystem     Inodes IUsed IFree IUse% Mounted on
devtmpfs         3.9M   561  3.9M    1% /dev
tmpfs            4.0M    29  4.0M    1% /dev/shm
tmpfs            4.0M  1.9K  3.9M    1% /run
tmpfs            4.0M    41  4.0M    1% /run/wrappers
/dev/nvme0n1p1    29M  856K   28M    3% /
/dev/nvme0n1p3      0     0     0     - /boot
tmpfs            799K    32  799K    1% /run/user/0
tmpfs            799K    36  799K    1% /run/user/1000

During the build, watch -d df -h showed a change of only about 1 GB, with 399 GB still available. df -hi showed no change.

Filesystem      Size  Used Avail Use% Mounted on
...
/dev/nvme0n1p1  450G   29G  399G   7% /
...

I thought it was a ulimit issue, but afaik the limits below suggest it is not.

$ ulimit -a
real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) unlimited
data seg size               (kbytes, -d) unlimited
scheduling priority                 (-e) 0
file size                   (blocks, -f) unlimited
pending signals                     (-i) 127756
max locked memory           (kbytes, -l) 64
max memory size             (kbytes, -m) unlimited
open files                          (-n) 1024
pipe size                (512 bytes, -p) 8
POSIX message queues         (bytes, -q) 819200
real-time priority                  (-r) 0
stack size                  (kbytes, -s) 8192
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) 127756
virtual memory              (kbytes, -v) unlimited
file locks                          (-x) unlimited

This is the Nix file I am using to build the Docker image, containers/tmp.nix, written based on my reading of the Nix docs and other sources.

# Build a Docker image using Nix and dockerTools.
{ pkgs ? import <nixpkgs> { }
, pkgsLinux ? import <nixpkgs> { system = "x86_64-linux"; }
}:

let
  python_pkgs = with pkgs.python39Packages; [
    numpy
    scipy
    scikit-learn
    pandas
    h5py
    tqdm
    pytorch
    #torchWithCuda
    torchvision
    #torchsummary
    torchmetrics
    pytorch-lightning
    pyro-ppl
    cython
    hdbscan
    opencv4
    pyaml
    opencv4
    #(opencv4.override { enableGtk2 = true; }) # we use headless
  ];
  non_python_pkgs = with pkgs; [
    git
    gitRepo
    autoconf
    gnumake
    m4
    gperf
    curl
    cudatoolkit
    stdenv.cc
    binutils
    zlib
    libGL
    libGLU
    util-linux
    unzip
  ];
in
pkgs.dockerTools.buildImage {
  name = "arn";
  tag = "0.2.0rc1";

  contents = builtins.concatLists [
    python_pkgs
    non_python_pkgs
  ];

  runAsRoot = ''
    #!${pkgs.runtimeShell}
    pip install -e . -r requirements/arn.txt

    # Install Prijatelj's public fork of `vast` for the Extreme Value
    # Machine and FINCH with recurse-submodules, and get pyflann as dep.
    git clone https://github.com/primetang/pyflann.git

    # Have to 2to3 the pyflann code...
    pip install 2to3==1.0
    2to3 pyflann/
    pip install -e pyflann/

    git clone --recurse-submodules https://github.com/prijatelj/vast
    pip install -e vast/
  '';

  config = {
    #Cmd = [ "/bin/..." ];
    WorkingDir = "/arn";
    #Volumes = { "/arn" = { }; };
  };
}

The output of my most recent nix-build of above.
This is after the first run, where the output from building dependencies already occurred.
Before, I tried to use TMPDIR=/tmp in case it made any difference, but it did not.

$ nix-build containers/tmp.nix
these 3 derivations will be built:
  /nix/store/hpkbvr1ax3rxgd76zkshha50mxhw6dlv-docker-layer-arn.drv
  /nix/store/7lc4k3sbn6y8r7idi49ql1naw6zd217k-runtime-deps.drv
  /nix/store/q16hgv9w4115xlab8x1jr9cy28is4y56-docker-image-arn.tar.gz.drv
building '/nix/store/hpkbvr1ax3rxgd76zkshha50mxhw6dlv-docker-layer-arn.drv'...
Formatting './image/disk-image.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=1073741824 lazy_refcounts=off refcount_bits=16
cSeaBIOS (version rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org)


iPXE (http://ipxe.org) 00:03.0 CA00 PCI2.10 PnP PMM+1FF90EC0+1FEF0EC0 CA00



Booting from ROM...
Probing EDD (edd=off to disable)... ocloading kernel modules...
[    0.109837] Invalid ELF header magic: != ELF
[    0.117452] Invalid ELF header magic: != ELF
[    0.118982] Invalid ELF header magic: != ELF
[    0.120221] Invalid ELF header magic: != ELF
[    0.152394] Invalid ELF header magic: != ELF
[    0.153433] Invalid ELF header magic: != ELF
[    0.155440] Invalid ELF header magic: != ELF
[    0.156742] Invalid ELF header magic: != ELF
[    0.157875] Invalid ELF header magic: != ELF
[    0.158911] Invalid ELF header magic: != ELF
[    0.164057] Invalid ELF header magic: != ELF
[    0.165123] Invalid ELF header magic: != ELF
[    0.165920] Invalid ELF header magic: != ELF
[    0.167031] Invalid ELF header magic: != ELF
[    0.167892] Invalid ELF header magic: != ELF
[    0.193688] Invalid ELF header magic: != ELF
[    0.198097] Invalid ELF header magic: != ELF
[    0.201940] Invalid ELF header magic: != ELF
[    0.204912] Invalid ELF header magic: != ELF
[    0.206697] Invalid ELF header magic: != ELF
mounting Nix store...
mounting host's temporary directory...
starting stage 2 (/nix/store/nyr73kpr8a5pr5ahqcbhaiy7792ppafs-vm-run-stage2)
mke2fs 1.46.5 (30-Dec-2021)
Discarding device blocks: done
Creating filesystem with 262144 4k blocks and 65536 inodes
Filesystem UUID: 6a21546a-4541-45aa-a177-55539e6050e8
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376

Allocating group tables: done
Writing inode tables: done
Writing superblocks and filesystem accounting information: done

Executing pre-mount steps...
Adding contents...
Adding /nix/store/hmqkj2sim8kf6qnxkkr5i8hp42vyjgi8-python3.9-numpy-1.21.5...
Adding /nix/store/09irzral6arab2q7qqf43rz1mpz590kd-python3.9-scipy-1.8.0...
Adding /nix/store/phqnipihri21qs1xzd4li98annni0l22-python3.9-scikit-learn-1.0.2...
Adding /nix/store/1kp31589piagm4hi8j4mccr9lny9xiq8-python3.9-pandas-1.4.2...
Adding /nix/store/x9gghi0l3gcwxyw347hr4y7lprr1zp1d-python3.9-h5py-3.6.0...
Adding /nix/store/9smpd0cspxrg0zfnzgp0kaddrzd9cpf8-python3.9-tqdm-4.64.0...
Adding /nix/store/67n93fa7v0avn1vin063qgj6fp0nsp3s-python3.9-pytorch-1.11.0...
Adding /nix/store/8cgfzx7jw82n2lzkakh5wnz8f2ndb328-python3.9-torchvision-0.11.3...
Adding /nix/store/q2fswjv5klinkq8g0vlfyish4szb2999-python3.9-torchmetrics-0.8.2...
Adding /nix/store/mnz414fzr4caj8m8y8fgnknaq7px7cnr-python3.9-pytorch-lightning-1.6.3...
Adding /nix/store/7z7jvykqkd3hghsq5ilp8l0bc82y8kg7-python3.9-pyro-ppl-1.8.1...
Adding /nix/store/p7hpc52rm3l98cdpg839lsjvaayha40l-python3.9-Cython-0.29.28...
Adding /nix/store/g6k1f0y7b44ww0wrijwaiqg24ry4glhh-python3.9-hdbscan-0.8.27...
Adding /nix/store/lkz65sh3sz37x15f895lhdr63drmr1rq-opencv-4.5.4...
Adding /nix/store/r6an97ybiz4iiykvfvpb0gkpy8sn8nf3-python3.9-pyaml-21.10.1...
Adding /nix/store/lkz65sh3sz37x15f895lhdr63drmr1rq-opencv-4.5.4...
Adding /nix/store/6ycia1xk500pxssx5nk1hppxh6c0rl99-git-2.36.2...
Adding /nix/store/jng6bfmid1n83bx4lmbcy4ys9v0njkh1-git-repo-2.25...
Adding /nix/store/yfnhm2b6plv48i8sgl64sd148b48hcly-autoconf-2.71...
Adding /nix/store/w8d6aji611mfzxfgm8a6zqnp0k8xd577-gnumake-4.3...
Adding /nix/store/ngaqxjbm5dfl953lnr6nlpy2qykmabpb-gnum4-1.4.19...
Adding /nix/store/pby3z2xbswkaki2bx4ayq2va3kan1xm1-gperf-3.1...
Adding /nix/store/ng2i73pq8r045kxkkw46475sw9a5vwf9-curl-7.83.1-bin...
Adding /nix/store/m9gfjhzx0l186bmln5y2169g4i0cqz99-cudatoolkit-11.6.1...
rsync: [receiver] write failed on "/tmp/disk/layer/host-linux-x64/libQt5WebEngineCore.so.5": No space left on device (28)
rsync error: error in file IO (code 11) at receiver.c(379) [receiver=3.2.5]
rsync: [sender] write error: Broken pipe (32)
[  125.192121] reboot: Power down
error: builder for '/nix/store/hpkbvr1ax3rxgd76zkshha50mxhw6dlv-docker-layer-arn.drv' failed with exit code 11;
       last 10 log lines:
       > Adding /nix/store/yfnhm2b6plv48i8sgl64sd148b48hcly-autoconf-2.71...
       > Adding /nix/store/w8d6aji611mfzxfgm8a6zqnp0k8xd577-gnumake-4.3...
       > Adding /nix/store/ngaqxjbm5dfl953lnr6nlpy2qykmabpb-gnum4-1.4.19...
       > Adding /nix/store/pby3z2xbswkaki2bx4ayq2va3kan1xm1-gperf-3.1...
       > Adding /nix/store/ng2i73pq8r045kxkkw46475sw9a5vwf9-curl-7.83.1-bin...
       > Adding /nix/store/m9gfjhzx0l186bmln5y2169g4i0cqz99-cudatoolkit-11.6.1...
       > rsync: [receiver] write failed on "/tmp/disk/layer/host-linux-x64/libQt5WebEngineCore.so.5": No space left on device (28)
       > rsync error: error in file IO (code 11) at receiver.c(379) [receiver=3.2.5]
       > rsync: [sender] write error: Broken pipe (32)
       > [  125.192121] reboot: Power down
       For full logs, run 'nix log /nix/store/hpkbvr1ax3rxgd76zkshha50mxhw6dlv-docker-layer-arn.drv'.
error: 1 dependencies of derivation '/nix/store/q16hgv9w4115xlab8x1jr9cy28is4y56-docker-image-arn.tar.gz.drv' failed to build

I think this section is indicative of the issue:

Formatting './image/disk-image.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=1073741824 lazy_refcounts=off refcount_bits=16

If I am reading size correctly, that is only ~1 GB of space, which matches the roughly 1 GB of change I saw in watch df -h before the build failed.

Or this section:

Creating filesystem with 262144 4k blocks and 65536 inodes
Filesystem UUID: 6a21546a-4541-45aa-a177-55539e6050e8
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376

If I understand correctly, the temporary filesystem created in this section also has only ~1 GiB of space (262144 blocks × 4 KiB = 1073741824 bytes, the same as the qcow2 size above). I know that the image I am trying to build, when built with Docker from the base Nvidia Docker image, is just over 13 GB, not counting any extra space needed during the build.

During my search, I was unable to find any docs or tips detailing what commands I could run to increase this created tmp filesystem size for nix-build. Nothing apparent to me in the man pages.

One guess is that I wrote the nix file wrong such that dependencies aren’t correctly linked and it thinks it needs less space than it actually does.

This same error has occurred when I was attempting to build from the Nvidia docker image using nix and dockerTools (script available upon request).
I have also had this issue occur when I attempted to build this image on another system (Arch) using nix.
In the latter case, I think the build ran into the limit from the second section rather than the first.
However, I am able to build the Docker image via docker using an equivalent Dockerfile.

So the issue seems to be the size of the temporary filesystem that Nix creates, or a related setting for disk-image.qcow2.

Anyone know anything about this?

dockerTools functions accept a diskSize argument that will do exactly what you want. For some reason it is not documented in the manual.
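As a rough sketch of what that might look like (not a tested configuration): the snippet below passes diskSize to buildImage. The unit appears to be MiB, since the default of 1024 lines up with the size=1073741824 in the qemu log above. The value 20480 and the shortened contents list are assumptions chosen for illustration only.

```nix
# Sketch only: a cut-down version of containers/tmp.nix with a larger build disk.
{ pkgs ? import <nixpkgs> { } }:

pkgs.dockerTools.buildImage {
  name = "arn";
  tag = "0.2.0rc1";

  # Size of the temporary VM disk used for the layer build, apparently in MiB.
  # 20480 (about 20 GiB) is an assumed value; pick one that fits the image
  # plus any scratch space the runAsRoot steps need.
  diskSize = 20480;

  # Abbreviated for the example; the real file lists many more packages.
  contents = [ pkgs.git pkgs.curl ];

  runAsRoot = ''
    #!${pkgs.runtimeShell}
    echo "build steps go here"
  '';
}
```

After rebuilding, the "Formatting './image/disk-image.qcow2'" and mke2fs lines in the build log should report the larger size if the argument took effect.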
I would, however, recommend using the available tools for building Python packages in Nix and adding the results to the image, instead of running all build and install steps in runAsRoot. That would make these outputs easier to reuse and avoid running a VM during the build.
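A minimal sketch of that approach, assuming the packages needed are available in nixpkgs: python39.withPackages builds one Python environment, which is then placed in the image without any runAsRoot step. The package selection is shortened, and the Cmd entry is a hypothetical default added for illustration. Packages that are not in nixpkgs (such as pyflann and vast from the question) would each need their own derivation, which is not shown here.

```nix
# Sketch only: build the Python environment with Nix instead of pip in runAsRoot.
{ pkgs ? import <nixpkgs> { } }:

let
  # One environment containing the interpreter and the selected libraries.
  pythonEnv = pkgs.python39.withPackages (ps: with ps; [
    numpy
    scipy
    pandas
    pytorch
    torchvision
  ]);
in
pkgs.dockerTools.buildImage {
  name = "arn";
  tag = "0.2.0rc1";

  # No runAsRoot here, so the layer is assembled without booting a VM.
  contents = [ pythonEnv pkgs.git ];

  config = {
    # Hypothetical default command, not part of the original file.
    Cmd = [ "${pythonEnv}/bin/python" ];
    WorkingDir = "/arn";
  };
}
```

Because the environment is an ordinary store path, it can also be reused outside the image, for example in a development shell.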

Thanks for pointing out that argument! Using it allowed me to increase both the limiting values I noticed above (notably with only that argument). I’ll consider looking into how to submit PRs for the docs/manual once I am out of deadline mode.

I was just trying to get this project’s Docker built with any setup from Nix. I realize it still suffers from lack of reproducibility due to the loose/free versions in my requirements and relying on pip to resolve it. I am still learning all the tools that Nix has to offer and when to use them where. :sweat_smile:

I have a hunch that the ideal for me is to use flakes, mach-nix to nixify the python requirements, and then have the dockerTools import those things for building the container. Although I have no idea how to do that right now.

Now that I can get past this bug, I can debug further and learn. Thanks again!