I am attempting to create a Docker image for development to run my project's code, which uses Python 3.9, PyTorch with CUDA, OpenCV (headless), and some custom/third-party repositories separate from the project. However, when I run nix-build containers/tmp.nix, the build does not finish because it runs out of space within the temporary filesystem used for building.
Technical details:
- Hardware: System76 laptop
- system: “x86_64-linux”
- host os: NixOS using stable 22.05
- multiuser: yes
- virtualisation.docker.enable = true; and user in docker group.
I am trying to make this function without flakes at first, then I will modify it to use flakes. I am new to the Nix ecosystem, so I am trying to take it one step at a time. So no experimental features used atm.
Filesystem state before building (and after failing to build):
$ df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 1.6G 0 1.6G 0% /dev
tmpfs 16G 11M 16G 1% /dev/shm
tmpfs 7.9G 7.6M 7.8G 1% /run
tmpfs 16G 456K 16G 1% /run/wrappers
/dev/nvme0n1p1 450G 28G 400G 7% /
/dev/nvme0n1p3 487M 137M 350M 29% /boot
tmpfs 3.2G 8.0K 3.2G 1% /run/user/0
tmpfs 3.2G 24K 3.2G 1% /run/user/1000
$ df -hi
Filesystem Inodes IUsed IFree IUse% Mounted on
devtmpfs 3.9M 561 3.9M 1% /dev
tmpfs 4.0M 29 4.0M 1% /dev/shm
tmpfs 4.0M 1.9K 3.9M 1% /run
tmpfs 4.0M 41 4.0M 1% /run/wrappers
/dev/nvme0n1p1 29M 856K 28M 3% /
/dev/nvme0n1p3 0 0 0 - /boot
tmpfs 799K 32 799K 1% /run/user/0
tmpfs 799K 36 799K 1% /run/user/1000
During the build, watch -d df -h showed a change of only 1GB, with 399GB still available. There was no change in df -hi.
Filesystem Size Used Avail Use% Mounted on
...
/dev/nvme0n1p1 450G 29G 399G 7% /
...
I thought it might be a ulimit issue, but based on the following it does not seem to be, afaik.
$ ulimit -a
real-time non-blocking time (microseconds, -R) unlimited
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 127756
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 127756
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
The Nix file I am using to build the Docker image, containers/tmp.nix, as informed by my reading of the Nix docs and other sources:
# Build a Docker image using Nix and dockerTools.
{ pkgs ? import <nixpkgs> { }
, pkgsLinux ? import <nixpkgs> { system = "x86_64-linux"; }
}:
let
  python_pkgs = with pkgs.python39Packages; [
    numpy
    scipy
    scikit-learn
    pandas
    h5py
    tqdm
    pytorch
    #torchWithCuda
    torchvision
    #torchsummary
    torchmetrics
    pytorch-lightning
    pyro-ppl
    cython
    hdbscan
    opencv4
    pyaml
    opencv4
    #(opencv4.override { enableGtk2 = true; }) # we use headless
  ];
  non_python_pkgs = with pkgs; [
    git
    gitRepo
    autoconf
    gnumake
    m4
    gperf
    curl
    cudatoolkit
    stdenv.cc
    binutils
    zlib
    libGL
    libGLU
    util-linux
    unzip
  ];
in
pkgs.dockerTools.buildImage {
  name = "arn";
  tag = "0.2.0rc1";
  contents = builtins.concatLists [
    python_pkgs
    non_python_pkgs
  ];
  runAsRoot = ''
    #!${pkgs.runtimeShell}
    pip install -e . -r requirements/arn.txt
    # Install Prijatelj's public fork of `vast` for the Extreme Value
    # Machine and FINCH with recurse-submodules, and get pyflann as dep.
    git clone https://github.com/primetang/pyflann.git
    # Have to 2to3 the pyflann code...
    pip install 2to3==1.0
    2to3 pyflann/
    pip install -e pyflann/
    git clone --recurse-submodules https://github.com/prijatelj/vast
    pip install -e vast/
  '';
  config = {
    #Cmd = [ "/bin/..." ];
    WorkingDir = "/arn";
    #Volumes = { "/arn" = { }; };
  };
}
Below is the output of my most recent nix-build of the above.
This is after the first run, so the output from building the dependencies had already occurred.
Before this, I tried setting TMPDIR=/tmp in case it made any difference, but it did not.
$ nix-build containers/tmp.nix
these 3 derivations will be built:
/nix/store/hpkbvr1ax3rxgd76zkshha50mxhw6dlv-docker-layer-arn.drv
/nix/store/7lc4k3sbn6y8r7idi49ql1naw6zd217k-runtime-deps.drv
/nix/store/q16hgv9w4115xlab8x1jr9cy28is4y56-docker-image-arn.tar.gz.drv
building '/nix/store/hpkbvr1ax3rxgd76zkshha50mxhw6dlv-docker-layer-arn.drv'...
Formatting './image/disk-image.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=1073741824 lazy_refcounts=off refcount_bits=16
cSeaBIOS (version rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org)
iPXE (http://ipxe.org) 00:03.0 CA00 PCI2.10 PnP PMM+1FF90EC0+1FEF0EC0 CA00
Booting from ROM...
Probing EDD (edd=off to disable)... ocloading kernel modules...
[ 0.109837] Invalid ELF header magic: != ELF
[ 0.117452] Invalid ELF header magic: != ELF
[ 0.118982] Invalid ELF header magic: != ELF
[ 0.120221] Invalid ELF header magic: != ELF
[ 0.152394] Invalid ELF header magic: != ELF
[ 0.153433] Invalid ELF header magic: != ELF
[ 0.155440] Invalid ELF header magic: != ELF
[ 0.156742] Invalid ELF header magic: != ELF
[ 0.157875] Invalid ELF header magic: != ELF
[ 0.158911] Invalid ELF header magic: != ELF
[ 0.164057] Invalid ELF header magic: != ELF
[ 0.165123] Invalid ELF header magic: != ELF
[ 0.165920] Invalid ELF header magic: != ELF
[ 0.167031] Invalid ELF header magic: != ELF
[ 0.167892] Invalid ELF header magic: != ELF
[ 0.193688] Invalid ELF header magic: != ELF
[ 0.198097] Invalid ELF header magic: != ELF
[ 0.201940] Invalid ELF header magic: != ELF
[ 0.204912] Invalid ELF header magic: != ELF
[ 0.206697] Invalid ELF header magic: != ELF
mounting Nix store...
mounting host's temporary directory...
starting stage 2 (/nix/store/nyr73kpr8a5pr5ahqcbhaiy7792ppafs-vm-run-stage2)
mke2fs 1.46.5 (30-Dec-2021)
Discarding device blocks: done
Creating filesystem with 262144 4k blocks and 65536 inodes
Filesystem UUID: 6a21546a-4541-45aa-a177-55539e6050e8
Superblock backups stored on blocks:
32768, 98304, 163840, 229376
Allocating group tables: done
Writing inode tables: done
Writing superblocks and filesystem accounting information: done
Executing pre-mount steps...
Adding contents...
Adding /nix/store/hmqkj2sim8kf6qnxkkr5i8hp42vyjgi8-python3.9-numpy-1.21.5...
Adding /nix/store/09irzral6arab2q7qqf43rz1mpz590kd-python3.9-scipy-1.8.0...
Adding /nix/store/phqnipihri21qs1xzd4li98annni0l22-python3.9-scikit-learn-1.0.2...
Adding /nix/store/1kp31589piagm4hi8j4mccr9lny9xiq8-python3.9-pandas-1.4.2...
Adding /nix/store/x9gghi0l3gcwxyw347hr4y7lprr1zp1d-python3.9-h5py-3.6.0...
Adding /nix/store/9smpd0cspxrg0zfnzgp0kaddrzd9cpf8-python3.9-tqdm-4.64.0...
Adding /nix/store/67n93fa7v0avn1vin063qgj6fp0nsp3s-python3.9-pytorch-1.11.0...
Adding /nix/store/8cgfzx7jw82n2lzkakh5wnz8f2ndb328-python3.9-torchvision-0.11.3...
Adding /nix/store/q2fswjv5klinkq8g0vlfyish4szb2999-python3.9-torchmetrics-0.8.2...
Adding /nix/store/mnz414fzr4caj8m8y8fgnknaq7px7cnr-python3.9-pytorch-lightning-1.6.3...
Adding /nix/store/7z7jvykqkd3hghsq5ilp8l0bc82y8kg7-python3.9-pyro-ppl-1.8.1...
Adding /nix/store/p7hpc52rm3l98cdpg839lsjvaayha40l-python3.9-Cython-0.29.28...
Adding /nix/store/g6k1f0y7b44ww0wrijwaiqg24ry4glhh-python3.9-hdbscan-0.8.27...
Adding /nix/store/lkz65sh3sz37x15f895lhdr63drmr1rq-opencv-4.5.4...
Adding /nix/store/r6an97ybiz4iiykvfvpb0gkpy8sn8nf3-python3.9-pyaml-21.10.1...
Adding /nix/store/lkz65sh3sz37x15f895lhdr63drmr1rq-opencv-4.5.4...
Adding /nix/store/6ycia1xk500pxssx5nk1hppxh6c0rl99-git-2.36.2...
Adding /nix/store/jng6bfmid1n83bx4lmbcy4ys9v0njkh1-git-repo-2.25...
Adding /nix/store/yfnhm2b6plv48i8sgl64sd148b48hcly-autoconf-2.71...
Adding /nix/store/w8d6aji611mfzxfgm8a6zqnp0k8xd577-gnumake-4.3...
Adding /nix/store/ngaqxjbm5dfl953lnr6nlpy2qykmabpb-gnum4-1.4.19...
Adding /nix/store/pby3z2xbswkaki2bx4ayq2va3kan1xm1-gperf-3.1...
Adding /nix/store/ng2i73pq8r045kxkkw46475sw9a5vwf9-curl-7.83.1-bin...
Adding /nix/store/m9gfjhzx0l186bmln5y2169g4i0cqz99-cudatoolkit-11.6.1...
rsync: [receiver] write failed on "/tmp/disk/layer/host-linux-x64/libQt5WebEngineCore.so.5": No space left on device (28)
rsync error: error in file IO (code 11) at receiver.c(379) [receiver=3.2.5]
rsync: [sender] write error: Broken pipe (32)
[ 125.192121] reboot: Power down
error: builder for '/nix/store/hpkbvr1ax3rxgd76zkshha50mxhw6dlv-docker-layer-arn.drv' failed with exit code 11;
last 10 log lines:
> Adding /nix/store/yfnhm2b6plv48i8sgl64sd148b48hcly-autoconf-2.71...
> Adding /nix/store/w8d6aji611mfzxfgm8a6zqnp0k8xd577-gnumake-4.3...
> Adding /nix/store/ngaqxjbm5dfl953lnr6nlpy2qykmabpb-gnum4-1.4.19...
> Adding /nix/store/pby3z2xbswkaki2bx4ayq2va3kan1xm1-gperf-3.1...
> Adding /nix/store/ng2i73pq8r045kxkkw46475sw9a5vwf9-curl-7.83.1-bin...
> Adding /nix/store/m9gfjhzx0l186bmln5y2169g4i0cqz99-cudatoolkit-11.6.1...
> rsync: [receiver] write failed on "/tmp/disk/layer/host-linux-x64/libQt5WebEngineCore.so.5": No space left on device (28)
> rsync error: error in file IO (code 11) at receiver.c(379) [receiver=3.2.5]
> rsync: [sender] write error: Broken pipe (32)
> [ 125.192121] reboot: Power down
For full logs, run 'nix log /nix/store/hpkbvr1ax3rxgd76zkshha50mxhw6dlv-docker-layer-arn.drv'.
error: 1 dependencies of derivation '/nix/store/q16hgv9w4115xlab8x1jr9cy28is4y56-docker-image-arn.tar.gz.drv' failed to build
I think this section is indicative of the issue:
Formatting './image/disk-image.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=1073741824 lazy_refcounts=off refcount_bits=16
If I am reading size correctly, that is only ~1GB of space, which correlates with seeing 1GB of change in watch df -h before the build fails.
Or this section:
Creating filesystem with 262144 4k blocks and 65536 inodes
Filesystem UUID: 6a21546a-4541-45aa-a177-55539e6050e8
Superblock backups stored on blocks:
32768, 98304, 163840, 229376
If I understand correctly, 262144 blocks of 4k each is also only ~1GB for the tmp filesystem created in this section. I know that the resulting image I am trying to build, when built with Docker from the base Nvidia Docker image, is just over 13GB, not accounting for any additional space needed during the build.
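As a sanity check on the two log lines quoted above, the arithmetic can be sketched as a small Nix expression. The file name sizes.nix and the attribute names are made up for illustration; the numbers are copied from the qemu-img and mke2fs lines in the log.

```nix
# sizes.nix (hypothetical file name)
# Evaluate with: nix-instantiate --eval --strict sizes.nix
let
  qcow2Bytes = 1073741824; # size= from the "Formatting ... disk-image.qcow2" line
  blocks = 262144;         # block count from the mke2fs line
  blockBytes = 4096;       # "4k blocks" from the mke2fs line
  mib = 1024 * 1024;       # bytes per MiB
in
{
  # Integer division; keep the spaces around "/" so Nix does not parse a path.
  qcow2MiB = qcow2Bytes / mib;
  ext4MiB = blocks * blockBytes / mib;
}
```

Both attributes work out to 1024 MiB, so the two lines appear to describe the same 1 GiB disk rather than two different limits.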
During my search, I was unable to find any docs or tips detailing what commands I could run to increase this created tmp filesystem size for nix-build. Nothing apparent to me in the man pages.
One guess is that I wrote the nix file wrong such that dependencies aren’t correctly linked and it thinks it needs less space than it actually does.
This same error has occurred when I was attempting to build from the Nvidia docker image using nix and dockerTools (script available upon request).
I have also had this issue occur when I attempted to build this image on another system (Arch) using nix.
In the latter case, I think the build ran into the limit from the second section rather than the first.
However, I am able to build the Docker image via docker using an equivalent Dockerfile.
So the issue seems to be the size of the temporary filesystem that gets created, or some related tool Nix uses with respect to disk-image.qcow2.
Anyone know anything about this?
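One thing that may be worth trying, sketched under assumptions: dockerTools.buildImage in nixpkgs accepts a diskSize argument, given in MiB with a default of 1024, which would match the 1 GiB disk seen in the log. The value 20480 below is an arbitrary guess rather than a measured requirement, and contents is cut down to a single package and runAsRoot to a single command to keep the sketch short.

```nix
# Minimal sketch, not the full containers/tmp.nix from above.
{ pkgs ? import <nixpkgs> { } }:

pkgs.dockerTools.buildImage {
  name = "arn";
  tag = "0.2.0rc1";

  # Stand-in for the full python_pkgs ++ non_python_pkgs list.
  contents = [ pkgs.git ];

  # Size in MiB of the VM disk used for the runAsRoot layer.
  # 20480 (~20 GiB) is an assumed value; adjust to what the image needs.
  diskSize = 20480;

  runAsRoot = ''
    #!${pkgs.runtimeShell}
    mkdir -p /arn
  '';

  config = {
    WorkingDir = "/arn";
  };
}
```

If this is the right knob, the "Formatting './image/disk-image.qcow2'" line in the build log should report a correspondingly larger size= value.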