Reduce docker image size

Hi,

I'm building a Docker image from a Nix derivation and noticed that the image is quite large.
This seems to be because all of the inputDrvs end up in the Docker image?

A couple of searches led me to trying something like this:

fio = runCommand "fio" {} "mkdir -p $out/bin; cp ${pkgs.fio}/bin/fio $out/bin/fio";
# previously it was fio = pkgs.fio;
dockerTools.buildImage {
  contents = [ busybox fio ];
  config = { Entrypoint = [ ]; };
};

This seems to work for this test image. The container's /nix/store now contains:

65ys3k6gn2s27apky0a0la7wryg3az9q-zlib-1.2.11
70qnn6rikfjk2yva0yzy004iggmmqy67-libaio-0.3.112
dzyimsdk9yq7x6g24r79ipg3vbalyyy1-libidn2-2.3.1
i1dc1ac2hxjfl59rvsj49vvgvl1nl16s-libunistring-0.9.10
n5vgbvxda4y3b3bfli5dvj6p8s4c93dn-busybox-1.32.1
sbbifs2ykc05inws26203h0xwcadnf0l-glibc-2.32-46

compared with the previous version (fio = pkgs.fio):

0d71ygfwbmy1xjlbj1v027dfmy9cqavy-libffi-3.3
0dbbrvlw2rahvzi69bmpqy1z9mvzg62s-gdbm-1.19
5k0s057y3swq5cqp58m8p4drq06nfd6w-sqlite-3.35.2
5ymjz97754jc6alp50cq1i3iv0jbg8b2-bzip2-1.0.6.0.2
65ys3k6gn2s27apky0a0la7wryg3az9q-zlib-1.2.11
66fbv9mmx1j4hrn9y06kcp73c3yb196r-python3-3.8.9
6kgfmzx90c1a6afqnbkz6qprkzss476k-mime-types-9
70qnn6rikfjk2yva0yzy004iggmmqy67-libaio-0.3.112
9m4hy7cy70w6v2rqjmhvd7ympqkj6yxk-ncurses-6.2
a4yw1svqqk4d8lhwinn9xp847zz9gfma-bash-4.4-p23
bla504khk46p58sv20758f38hxfk0iw7-fio-3.26
dzyimsdk9yq7x6g24r79ipg3vbalyyy1-libidn2-2.3.1
hbm0951q7xrl4qd0ccradp6bhjayfi4b-openssl-1.1.1k
hjwjf3bj86gswmxva9k40nqx6jrb5qvl-readline-6.3p08
i1dc1ac2hxjfl59rvsj49vvgvl1nl16s-libunistring-0.9.10
n5vgbvxda4y3b3bfli5dvj6p8s4c93dn-busybox-1.32.1
nlqz3916vfh4fqwbnky1l5bf02n876y5-expat-2.2.10
rdslqn6gj1a27laa1xcn0hm147v5an7z-xz-5.2.5
sbbifs2ykc05inws26203h0xwcadnf0l-glibc-2.32-46
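To see exactly which store paths the runCommand trick removed, the two listings can be diffed. This is a small sketch that assumes each listing is the plain output of `ls /nix/store` from the container. It compares entries by name with the hash stripped, which keeps the result readable but would treat two builds of the same name and version as identical. The usage below uses a subset of the two listings above.

```python
import re
from typing import List, Set

# "<32-char hash>-<name>"; this regex is looser than Nix's real base32 alphabet
STORE_ENTRY = re.compile(r"^[0-9a-z]{32}-(.+)$")


def store_names(listing: str) -> Set[str]:
    """Names (hash stripped) of the entries in an `ls /nix/store` listing."""
    names = set()
    for line in listing.splitlines():
        m = STORE_ENTRY.match(line.strip())
        if m:
            names.add(m.group(1))
    return names


def removed(before: str, after: str) -> List[str]:
    """Entries present in `before` but missing from `after`, sorted by name."""
    return sorted(store_names(before) - store_names(after))


before = """0d71ygfwbmy1xjlbj1v027dfmy9cqavy-libffi-3.3
66fbv9mmx1j4hrn9y06kcp73c3yb196r-python3-3.8.9
bla504khk46p58sv20758f38hxfk0iw7-fio-3.26
n5vgbvxda4y3b3bfli5dvj6p8s4c93dn-busybox-1.32.1
sbbifs2ykc05inws26203h0xwcadnf0l-glibc-2.32-46"""

after = """n5vgbvxda4y3b3bfli5dvj6p8s4c93dn-busybox-1.32.1
sbbifs2ykc05inws26203h0xwcadnf0l-glibc-2.32-46"""

print(removed(before, after))
# ['fio-3.26', 'libffi-3.3', 'python3-3.8.9']
```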

However, when I try the same thing with another derivation, it doesn't remove any of the "extra" store paths from /nix/store, even though at a glance they don't seem necessary:

spdk_fio_nvme = runCommand "fio_spdk" {} "mkdir -p $out/bin; cp ${libspdk-fio}/fio/spdk_nvme $out/bin";

dockerTools.buildImage {
  contents = [ busybox spdk_fio_nvme ];
  config = { Entrypoint = [ ]; };
};

In this case I expected to see only spdk_nvme (copied from ${libspdk-fio}/fio/spdk_nvme) in /bin, and in /nix/store only the runtime shared libraries it depends on, which according to ldd are:

        linux-vdso.so.1 (0x00007ffd5989d000)
        libnuma.so.1 => /nix/store/kgzm36w74yz066rpqra41r6lrafwabjr-numactl-2.0.14/lib/libnuma.so.1 (0x00007f70eae06000)
        libdl.so.2 => /nix/store/sbbifs2ykc05inws26203h0xwcadnf0l-glibc-2.32-46/lib/libdl.so.2 (0x00007f70eae01000)
        liburing.so.2 => /nix/store/bx4zwjlr67ydz9gg0i4z4v4iljnnb0yh-liburing-2.0/lib/liburing.so.2 (0x00007f70eadfa000)
        librt.so.1 => /nix/store/sbbifs2ykc05inws26203h0xwcadnf0l-glibc-2.32-46/lib/librt.so.1 (0x00007f70eadef000)
        libuuid.so.1 => /nix/store/ichji36r9qndk2yrk3wimx7baipj5jhy-util-linux-2.36.2/lib/libuuid.so.1 (0x00007f70eade4000)
        libssl.so.1.1 => /nix/store/hbm0951q7xrl4qd0ccradp6bhjayfi4b-openssl-1.1.1k/lib/libssl.so.1.1 (0x00007f70ead4d000)
        libcrypto.so.1.1 => /nix/store/hbm0951q7xrl4qd0ccradp6bhjayfi4b-openssl-1.1.1k/lib/libcrypto.so.1.1 (0x00007f70eaa61000)
        libm.so.6 => /nix/store/sbbifs2ykc05inws26203h0xwcadnf0l-glibc-2.32-46/lib/libm.so.6 (0x00007f70ea91e000)
        libaio.so.1 => /nix/store/70qnn6rikfjk2yva0yzy004iggmmqy67-libaio-0.3.112/lib/libaio.so.1 (0x00007f70ea919000)
        libpthread.so.0 => /nix/store/sbbifs2ykc05inws26203h0xwcadnf0l-glibc-2.32-46/lib/libpthread.so.0 (0x00007f70ea8f8000)
        libc.so.6 => /nix/store/sbbifs2ykc05inws26203h0xwcadnf0l-glibc-2.32-46/lib/libc.so.6 (0x00007f70ea735000)
        /nix/store/scd5n7xsn0hh0lvhhnycr9gx0h8xfzsl-glibc-2.34-210/lib64/ld-linux-x86-64.so.2 (0x00007f70edd7e000)
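A quick way to turn ldd output like the above into the set of store paths you would expect in the image is to pull out the /nix/store/<hash>-<name> prefix of every resolved library. This is a minimal sketch. Note that it only covers the direct shared-library dependencies: each of those packages can carry its own runtime references, so the real minimal closure may be somewhat larger than this set.

```python
import re
from typing import Set

# Each ldd line is one of:
#   "libfoo.so.1 => /nix/store/<hash>-<pkg>/lib/libfoo.so.1 (0x...)"
#   "/nix/store/<hash>-<pkg>/lib64/ld-linux-x86-64.so.2 (0x...)"   (the loader)
#   "linux-vdso.so.1 (0x...)"                                       (provided by the kernel, no file)
STORE_DIR = re.compile(r"/nix/store/[0-9a-z]{32}-[^/\s]+")


def ldd_store_dirs(ldd_output: str) -> Set[str]:
    """Top-level store paths that the shared libraries are loaded from."""
    dirs = set()
    for line in ldd_output.splitlines():
        m = STORE_DIR.search(line)
        if m:
            dirs.add(m.group(0))
    return dirs


out = """        linux-vdso.so.1 (0x00007ffd5989d000)
        libnuma.so.1 => /nix/store/kgzm36w74yz066rpqra41r6lrafwabjr-numactl-2.0.14/lib/libnuma.so.1 (0x00007f70eae06000)
        libdl.so.2 => /nix/store/sbbifs2ykc05inws26203h0xwcadnf0l-glibc-2.32-46/lib/libdl.so.2 (0x00007f70eae01000)
        libc.so.6 => /nix/store/sbbifs2ykc05inws26203h0xwcadnf0l-glibc-2.32-46/lib/libc.so.6 (0x00007f70ea735000)"""

print(sorted(ldd_store_dirs(out)))
# ['/nix/store/kgzm36w74yz066rpqra41r6lrafwabjr-numactl-2.0.14', '/nix/store/sbbifs2ykc05inws26203h0xwcadnf0l-glibc-2.32-46']
```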

but in /nix/store I see:

0d71ygfwbmy1xjlbj1v027dfmy9cqavy-libffi-3.3
0dbbrvlw2rahvzi69bmpqy1z9mvzg62s-gdbm-1.19
0irhzkirzh39mridn7s4ipckvmpywzlc-linux-pam-1.5.1
3vllxvfpphanlww2lydmn2hangx3smza-libcap-ng-0.8.2
54klr10i53jdfgn7322mzgza6wsai0q8-gcc-10.3.0-lib
5k0s057y3swq5cqp58m8p4drq06nfd6w-sqlite-3.35.2
5ymjz97754jc6alp50cq1i3iv0jbg8b2-bzip2-1.0.6.0.2
65ys3k6gn2s27apky0a0la7wryg3az9q-zlib-1.2.11
66fbv9mmx1j4hrn9y06kcp73c3yb196r-python3-3.8.9
6kgfmzx90c1a6afqnbkz6qprkzss476k-mime-types-9
70qnn6rikfjk2yva0yzy004iggmmqy67-libaio-0.3.112
9m4hy7cy70w6v2rqjmhvd7ympqkj6yxk-ncurses-6.2
a4yw1svqqk4d8lhwinn9xp847zz9gfma-bash-4.4-p23
am5qwbpriqhp1i9qhp2idid7ympxqb9a-glibc-2.32-46-dev
bla504khk46p58sv20758f38hxfk0iw7-fio-3.26
bx4zwjlr67ydz9gg0i4z4v4iljnnb0yh-liburing-2.0
d32ym7m2p7lfb6gsghq1dhi61f694k0f-glibc-2.32-46-bin
dzyimsdk9yq7x6g24r79ipg3vbalyyy1-libidn2-2.3.1
fqi6xfddlgafbq1q2lw6z8ysx6vs9yjc-linux-headers-5.12
g5dx4y95pnc48p7z8xz2m5l4xi0ig8x1-liburing-2.0-bin
h3f8rn6wwanph9m3rc1gl0lldbr57w3l-gcc-10.3.0
hbm0951q7xrl4qd0ccradp6bhjayfi4b-openssl-1.1.1k
hjwjf3bj86gswmxva9k40nqx6jrb5qvl-readline-6.3p08
i1dc1ac2hxjfl59rvsj49vvgvl1nl16s-libunistring-0.9.10
ichji36r9qndk2yrk3wimx7baipj5jhy-util-linux-2.36.2
ja30x91i1k68xr90cgv2l5j24s8ar8pr-db-4.8.30
kgzm36w74yz066rpqra41r6lrafwabjr-numactl-2.0.14
n5vgbvxda4y3b3bfli5dvj6p8s4c93dn-busybox-1.32.1
nlqz3916vfh4fqwbnky1l5bf02n876y5-expat-2.2.10
p204f1715kiwqbxx292r9ifsmzccmn7k-util-linux-2.36.2-dev
p89kcdr3284fzwilw738043dy1ppaznd-util-linux-2.36.2-bin
rdslqn6gj1a27laa1xcn0hm147v5an7z-xz-5.2.5
sbbifs2ykc05inws26203h0xwcadnf0l-glibc-2.32-46
shsx5lvmj0042s59agk2miwhp8zrxh9l-liburing-2.0-dev
xfm6zgdqk180w7g7z6dyqvlvid8sdn46-shadow-4.8.1

libspdk-fio is basically this: https://github.com/openebs/mayastor/blob/a26a4e9597be3ad5e12eaddbad63b287c84f1828/nix/pkgs/libspdk/default.nix

Any explanation of why this happens would be great, thank you!


I have often been puzzled by how much larger dockerTools-generated images are than images built the standard (non-Nix) Dockerfile way. My usual suspicion is that some (or many) packages use buildInputs/nativeBuildInputs/propagatedBuildInputs/etc. incorrectly, so that build-time-only dependencies end up in the image. If there is a reliable way to find those mistakes, I would be very interested to know it.

My own quest in a similar direction has so far led me to find (and fix) this issue: "Copies of shared libraries instead of symlinks in the store and docker". That gave a 1 GB reduction in image size!


Regarding fio: The fio package builds multiple binaries:

$ ls result/bin
fio             fio-dedupe          fio-histo-log-pctiles.py  fiologparser.py
fio2gnuplot     fio_generate_plots  fio_jsonplus_clat2csv     fio-verify-state
fio-btrace2fio  fio-genzipf         fiologparser_hist.py      genfio

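Some of these, fio2gnuplot for example, depend on Python, which is why python3 and its dependencies are in the closure of pkgs.fio. Using runCommand to copy out only the fio binary drops all of that, which is why the closure shrinks so much.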

Regarding libspdk-fio: it references the complete fio package, so all of fio ends up in your Docker image, not just your minimal bin/fio-only derivation.

One suggestion: fio has a withGnuplot parameter, which you can set to false (via fio.override { withGnuplot = false; }). Also, use a multiple-output derivation for libspdk-fio that installs the header files, static libraries, etc. into their own store paths instead of $out. Then they won't end up in the Docker image.

nix-store --query has a --tree flag (nix-store -q --tree <path>) that helps when trying to find out why a particular store path ends up in your closure.
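Answering "why is X in my closure?" amounts to a shortest-path search over the reference graph, similar in spirit to `nix why-depends` in newer Nix versions. The sketch assumes you have already collected each path's direct references (for example with `nix-store -q --references <path>`) into a dict. The graph in the usage example is hypothetical and only borrows package names from this thread.

```python
from collections import deque
from typing import Dict, List, Optional


def dependency_chain(refs: Dict[str, List[str]], root: str, target: str) -> Optional[List[str]]:
    """Shortest chain root -> ... -> target, or None if target is not in root's closure."""
    parent: Dict[str, Optional[str]] = {root: None}  # BFS tree: node -> node that first referenced it
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node == target:
            # walk back up the BFS tree to reconstruct the chain
            chain = []
            cur: Optional[str] = node
            while cur is not None:
                chain.append(cur)
                cur = parent[cur]
            return chain[::-1]
        for dep in refs.get(node, []):
            if dep not in parent:
                parent[dep] = node
                queue.append(dep)
    return None


# Hypothetical reference graph, for illustration only
refs = {
    "image": ["busybox", "spdk_fio_nvme"],
    "spdk_fio_nvme": ["libspdk-fio"],
    "libspdk-fio": ["fio", "numactl"],
    "fio": ["python3"],
    "python3": ["sqlite"],
}
print(dependency_chain(refs, "image", "python3"))
# ['image', 'spdk_fio_nvme', 'libspdk-fio', 'fio', 'python3']
```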


Why does libspdk-fio pull in fio? This happens even if I remove fio from the build inputs. It seems it gets added simply because fio is referenced, e.g. via ${fio}/includes?
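That suspicion matches how Nix decides runtime dependencies. It does not matter whether an input was listed in buildInputs or elsewhere. After the build, Nix scans the output for the hash part of every store path that was available to the build, and every hash it finds becomes a runtime reference. So a single embedded string such as ${fio}/include, for example in a pkg-config file or an RPATH, is enough to pull all of fio into the closure. The following is a simplified model of that scan. The real scanner works on the whole serialized output, including symlink targets, and the file contents below are made up for illustration.

```python
from typing import Dict, Iterable, Set

STORE_DIR = "/nix/store/"


def hash_part(store_path: str) -> bytes:
    """'/nix/store/<hash>-<name>' -> b'<hash>', the part Nix actually searches for."""
    return store_path[len(STORE_DIR):].split("-", 1)[0].encode()


def scan_references(output_files: Dict[str, bytes], candidates: Iterable[str]) -> Set[str]:
    """Keep every candidate input path whose hash occurs anywhere in the output files."""
    found = set()
    for path in candidates:
        h = hash_part(path)
        if any(h in data for data in output_files.values()):
            found.add(path)
    return found


fio = "/nix/store/bla504khk46p58sv20758f38hxfk0iw7-fio-3.26"
numactl = "/nix/store/kgzm36w74yz066rpqra41r6lrafwabjr-numactl-2.0.14"
gcc = "/nix/store/h3f8rn6wwanph9m3rc1gl0lldbr57w3l-gcc-10.3.0"

# Hypothetical output of a libspdk-fio-like build
outputs = {
    "fio/spdk_nvme": b"\x7fELF RUNPATH=" + numactl.encode() + b"/lib",
    "lib/pkgconfig/spdk.pc": b"Cflags: -I" + fio.encode() + b"/include\n",
}
print(sorted(scan_references(outputs, [fio, numactl, gcc])))
```

Here gcc was an input but its hash never appears in the output, so it is not a runtime reference. fio is, even though it was only used for its headers.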

EDIT: I got the multiple-output derivation working, but even if I only reference my fio output, I still get all the other dependencies.

I was able to reduce the size by an order of magnitude, but I don't like it.
I used multiple outputs to make sure libspdk only pulls in the fio includes; so far, so good.
I used runCommand to pull out only the fio binary I need; so far, so good.

But even with this we're still pulling in seemingly all references from libspdk.out.
So I used nukeReferences to strip every reference that ldd doesn't report as needed. This seems wrong, though:

  # copy only the SPDK fio ioengine plugin out of libspdk-fio
  spdk_fio_engine = runCommand "spdk_fio_engine" { } ''
    mkdir -p $out/lib
    cp ${pkgs.libspdk-fio.fio}/spdk_nvme $out/lib
    # build "-e <path>" for every library ldd resolves
    # (field 3 of "libfoo.so => /nix/store/... (0x...)" lines)
    except=$(${pkgs.glibc.bin}/bin/ldd $out/lib/spdk_nvme | awk '$3 ~ /\.so/ { printf "-e %s ", $3 }')
    # remove all references except the ones in spdk_nvme
    ${pkgs.nukeReferences}/bin/nuke-refs $except $out/lib/spdk_nvme
  '';

  # copy only the fio binary, not the rest of pkgs.fio
  fio = runCommand "fio_bin_only" { } ''
    mkdir -p $out/bin
    cp ${pkgs.fio}/bin/fio $out/bin/fio
  '';

  # run fio with the SPDK engine preloaded
  fio_wrapper = pkgs.writeShellScriptBin "fio" ''
    LD_PRELOAD=${spdk_fio_engine}/lib/spdk_nvme ${fio}/bin/fio "$@"
  '';

This is an interesting approach. I have also noticed that in many cases packages ship far more libraries than their concrete reverse dependencies actually need. I wonder whether it is possible to apply some logic at the level of an image or, better, a buildEnv-like environment that would remove all unused libraries from the closure. The obvious risk is that some libraries are only loaded explicitly at runtime, on demand (e.g. via dlopen).
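One rough way to frame that idea is a reachability check. Start from the binaries you actually run, follow DT_NEEDED entries transitively, and report every library that is never reached. This sketch assumes the DT_NEEDED lists have already been collected, for example with `patchelf --print-needed` or `readelf -d`. The sample data is made up for illustration. Libraries opened with dlopen() never show up in DT_NEEDED, so the result is only a list of removal candidates, not a safe deletion list.

```python
from collections import deque
from typing import Dict, List, Set


def unused_libraries(needed: Dict[str, List[str]], roots: List[str]) -> Set[str]:
    """Libraries in `needed` that no root reaches through DT_NEEDED.

    `needed` maps each ELF file (a binary name or a library soname) to the
    sonames listed in its DT_NEEDED entries. dlopen()-ed libraries are invisible here.
    """
    reachable = set(roots)
    queue = deque(roots)
    while queue:
        for soname in needed.get(queue.popleft(), []):
            if soname not in reachable:
                reachable.add(soname)
                queue.append(soname)
    return {lib for lib in needed if lib not in reachable}


# Made-up DT_NEEDED data, for illustration only
needed = {
    "spdk_nvme": ["libnuma.so.1", "libssl.so.1.1", "libc.so.6"],
    "libssl.so.1.1": ["libcrypto.so.1.1", "libc.so.6"],
    "libcrypto.so.1.1": ["libc.so.6"],
    "libnuma.so.1": ["libc.so.6"],
    "libc.so.6": [],
    "libsqlite3.so.0": ["libc.so.6"],
    "libpython3.8.so.1.0": ["libc.so.6"],
}
print(sorted(unused_libraries(needed, ["spdk_nvme"])))
# ['libpython3.8.so.1.0', 'libsqlite3.so.0']
```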