Docker buildImage workflow improvement

This is to announce a few things I’ve been working on and to see where there is interest or overlap.

  • a modification to buildImageWithNixDb that incorporates all of the parent image’s layers (built up as a list from a fromImage passthru attribute) rather than just the deepest one.

  • modifications to buildImageWithNixDb that allow nix-build and nix-env to work without using runAsRoot (a minimal usage sketch follows this list).

  • use the hack from aszlig (How to get the build-time dependencies of a package? · Issue #1245 · NixOS/nix · GitHub) to include the runtime deps of all build dependencies of the packages needed to nix-build something in CI, and also include the relevant nixpkgs to avoid a fetch (a rough sketch follows the list).

  • a post-build-hook to push build results to a ./cache directory, so that GitLab’s cache mechanism works nicely and the results are available for subsequent builds (hook sketch below the list). Can we avoid this step? The ./cache directory is currently in the form of a binary cache; can it directly be a build-capable store, bypassing the “nix copy” step?

  • a modification to buildLayeredImage adding an attribute that allows a custom number of paths per layer, e.g. “put the 10 most popular paths in a single layer, and each subsequent path in its own layer” (the existing maxLayers interface is sketched below the list for reference).

  • a hack (I’m looking for better ideas) for buildLayeredImage to hint to the popularity metric that a particular package should sit deeper in the stack. This is probably better expressed as a map or a different form of hinting than the previous item.

  • not yet implemented: expose the intermediate build steps of images so that layers can be re-used and cached. This also avoids tar/untar cycles. Perhaps laziness could allow fast building of image directories and uploading to registries without a costly tar.gz, and also avoid pushing all of the data through a single-threaded trip via nix-daemon.

  • todo: update the image manifest to version 2, schema 2.
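
To make the nix-build-capable image concrete, here is a minimal usage sketch against the stock buildImageWithNixDb interface (it takes the same arguments as buildImage). The name and package set are illustrative; the “without runAsRoot” part is the modification described above, not something this sketch itself demonstrates.

```nix
# Minimal sketch of a CI image that can run nix-build / nix-env inside the
# container. Names and contents are illustrative only.
{ pkgs ? import <nixpkgs> { } }:

pkgs.dockerTools.buildImageWithNixDb {
  name = "nix-ci-base";
  tag = "latest";
  contents = with pkgs; [ nix bashInteractive coreutils cacert ];
  config = {
    Env = [
      "SSL_CERT_FILE=${pkgs.cacert}/etc/ssl/certs/ca-bundle.crt"
      # Ship the nixpkgs checkout inside the image to avoid a fetch in CI.
      "NIX_PATH=nixpkgs=${pkgs.path}"
    ];
    Cmd = [ "${pkgs.bashInteractive}/bin/bash" ];
  };
}
```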
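
For the build-dependency item, a rough non-recursive approximation (explicitly not the snippet from the linked issue) is to gather a package’s direct build inputs and hand them to closureInfo, which realizes their runtime closures and produces registration data for a fresh Nix database. `ciPackage` and the attribute list are assumptions for illustration.

```nix
# Rough approximation, NOT the aszlig snippet from nix#1245: collect the
# direct build inputs of one package and realize their runtime closures.
{ pkgs ? import <nixpkgs> { } }:

let
  # The derivation CI is expected to nix-build; pkgs.git is a placeholder.
  ciPackage = pkgs.git;

  # Direct build-time inputs only; implicit inputs (stdenv, fetchers) and
  # deeper recursion are what the linked snippet handles better.
  directBuildInputs = drv:
    (drv.buildInputs or [ ])
    ++ (drv.nativeBuildInputs or [ ])
    ++ (drv.propagatedBuildInputs or [ ]);
in
# closureInfo produces a `store-paths` list and a `registration` file that
# can be loaded with `nix-store --load-db` when populating the image's DB.
pkgs.closureInfo { rootPaths = directBuildInputs ciPackage; }
```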
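
The post-build-hook itself can be essentially the standard one from the Nix manual, pointed at a local file:// cache. The destination path below is a placeholder; the script’s store path then goes into nix.conf as the post-build-hook setting.

```nix
# Hedged sketch of a post-build hook that copies every build result into a
# local binary cache. Point the file:// URL at whatever directory GitLab is
# configured to cache.
{ pkgs ? import <nixpkgs> { } }:

pkgs.writeShellScript "copy-to-local-cache" ''
  set -eu
  set -f          # disable globbing; $OUT_PATHS is space-separated
  export IFS=' '
  # Nix sets $OUT_PATHS (and $DRV_PATH) for post-build hooks.
  echo "Copying $OUT_PATHS to the local cache" >&2
  exec ${pkgs.nix}/bin/nix copy --to "file:///var/cache/nix-ci" $OUT_PATHS
''
```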
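
For reference, the layering knob that buildLayeredImage exposes today is maxLayers; the per-layer path counts and popularity hints above would be new attributes on top of something like this:

```nix
# Baseline buildLayeredImage usage with the existing maxLayers attribute.
# The per-layer path counts and popularity hints proposed above are not
# part of this interface yet.
{ pkgs ? import <nixpkgs> { } }:

pkgs.dockerTools.buildLayeredImage {
  name = "hello-layered";
  tag = "latest";
  contents = [ pkgs.hello ];
  # Upper bound on the number of layers; the most popular store paths get
  # their own layers first and the remainder is packed together at the end.
  maxLayers = 100;
}
```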

An initial implementation of exposing raw image directories and caching the sha256 sums is at
https://github.com/NixOS/nixpkgs/pull/75810

It’s a WIP and should not change semantics, just speed things up. There may be a change in GC behavior; I’m not sure how that works yet, or whether there is a better way to expose the concept.

Upon investigation, I saw that every existing layer went back through sha256sum when each subsequent layer was generated, so building an image with N layers took 1 + 2 + 3 + … + N hashing passes in total, i.e. N(N+1)/2; for a 100-layer image that is 5050 passes instead of 100. Especially if there was a large layer in there, this made for an expensive build. With the PR, changing the last layer should only cost the time it takes to process the new layer.

Nice work!

The dockerTools folder has accumulated quite a bit of code over time. Maybe it’s worth borrowing from it and restarting from scratch. That might make it easier for you to make larger changes without having to worry about breaking backward compatibility.

For example, I think the raw outputs really should be the default, since the tarball output duplicates all of the store content on every build.