Setting up a docker image with virtualenv inside

Hi all,

First, a bit of context: I work on a project (Renku - renkulab.io) that provides “batteries included” interactive computational environments for data science, primarily based on the Jupyter stack and its various extensions. Our collection of “base” Docker images is constantly growing because, as use cases multiply, we are asked to pack more and more specialized applications into them. We can achieve some level of composability with Docker, but Nix seems like it could solve a lot of our problems and make maintenance less of a nightmare.

The problem:

We can’t have end-users deal with nix - ideally we would use nix to build a base image on-demand for them that they can then use with a Dockerfile to add any customizations on top. For this to be as seamless as possible, I’d like to set up the image coming out of nix to have all of the user configuration already in place (e.g. creating a non-privileged user for jupyter/R sessions) and drop straight into a virtualenv where things like pip install just work. Importantly, this virtualenv needs to have the ipykernel package installed so the jupyter session uses the correct environment.

The issue I’m running into is that I can’t find a way to avoid setting up the virtualenv inside the entrypoint. I can do it in runAsRoot, but then I can’t pip install anything (no network). I can do it in a derivation, but then the shared libraries are not found in the container. If I do it in the entrypoint everything is fine, but I don’t want to pip install ipykernel every time a container starts, because that delays the session start even more.

This is what I’m currently trying:

{ name ? "venv", tag ? "latest" }:
let
  system = "x86_64-linux";
  pkgs = import <nixpkgs> { inherit system; };

  # debian is the root
  debian = pkgs.dockerTools.pullImage {
      imageName = "debian";
      finalImageTag = "stretch";
      imageDigest = "sha256:205cce0b204ae98be1723d2df0a188bd254ee79f949b51b35bde5dfa91320b72";
      sha256 = "171a54jdmchii39qbk86xbjbrzjng9cysaw4a2ams952kfs7vz0q";
  };
in with pkgs;
  let
    # set up the entrypoint for the final image
    entrypoint = pkgs.writeTextFile {
      name = "entrypoint.sh";
      executable = true;
      text = ''
        #!/bin/bash
        /venv/bin/python -m ipykernel install --user
        # hand off to the container command, keeping argument quoting intact
        exec "$@"
        '';
    };

    # create a virtualenv for the user
    setup-virtualenv = pkgs.stdenv.mkDerivation {
      name = "configure-virtualenv";
      propagatedBuildInputs = [ pkgs.python39Packages.virtualenv ];
      buildPhase = ''
        ${pkgs.python39Packages.virtualenv}/bin/virtualenv $out/venv
        . $out/venv/bin/activate
        $out/venv/bin/python -m pip install ipykernel
      '';
      installPhase = "echo";
      src = ./.;
    };

    # base image with user configuration
    base-image = pkgs.dockerTools.buildImage {
      contents = [ setup-virtualenv ];
      fromImage = debian;
      name = "base-image";
      tag = "latest";
      runAsRoot = ''
        #!${pkgs.runtimeShell}
        ${pkgs.dockerTools.shadowSetup}
        groupadd -r test-user -g 1000
        useradd -s /bin/bash -m -r -u 1000 -g test-user test-user
        echo "PATH=/venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" >> /home/test-user/.bashrc
        chown -R test-user:test-user /venv
      '';
    };
  in
    # main image with all of the dependencies and metadata
    dockerTools.buildLayeredImage {
      inherit name tag;
      fromImage = base-image;
      contents = [
        bashInteractive
        coreutils
        pkgs.python39Packages.virtualenv
      ];
      config = {
        User = "test-user";
        Env = [
            "LOCALE_ARCHIVE=${glibcLocales}/lib/locale/locale-archive"
            "LANG=en_US.UTF-8"
            "LANGUAGE=en_US:en"
            "LC_ALL=C.UTF-8"
            "LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu"
            "color_prompt=yes"
            "PATH=/venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
            "SHELL=/bin/bash"
        ];
        WorkingDir = "/home/test-user";
        Entrypoint = [ entrypoint ];
      };
    }

In this setup, running /venv/bin/python -m ipykernel doesn’t find libsodium and I suspect there will be other issues with installing packages. Maybe what I’m trying to do fundamentally isn’t compatible with nix and I’m better off doing it in a follow-up step with docker? That would be fine but not as clean as I was hoping.

Do you need to have a python virtualenv inside (so users can still imperatively install packages) or would it be enough to have a fixed immutable set of python packages installed? For the latter you could directly use the python.withPackages infrastructure of nixpkgs.
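
For reference, the immutable variant could look roughly like the sketch below. It is only an illustration: the image name and the package list are made up, and it uses python3 rather than the python39 set from the original post so that it evaluates on a current nixpkgs.

```nix
# Sketch: an image with a fixed, immutable Python environment.
{ pkgs ? import <nixpkgs> { system = "x86_64-linux"; } }:
let
  # Interpreter plus a fixed package set; the list is illustrative.
  pythonEnv = pkgs.python3.withPackages (ps: [
    ps.ipykernel
    ps.numpy
  ]);
in
pkgs.dockerTools.buildLayeredImage {
  name = "fixed-python"; # hypothetical image name
  tag = "latest";
  contents = [ pythonEnv pkgs.bashInteractive pkgs.coreutils ];
  config.Cmd = [ "${pythonEnv}/bin/python" ];
}
```

Saved as, say, image.nix, this can be built with nix-build image.nix and loaded with docker load < result. Packages in such an environment live in the read-only Nix store, so pip install into it will not work, which is the trade-off being asked about.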

“and drop straight into a virtualenv where things like pip install just work”

Ok. I guess this answers my question.

Thanks @knedlsepp - yes, the users need to be able to install packages easily themselves and quickly experiment/iterate. I’d also rather not expose them to nix at all at this point (docker is usually already way too complicated); it will be abstracted away as a base layer.

If you want to let the users install packages with non-Python dependencies, I’d guess you have to install all those external libraries in the default user’s nix profile, then add ~/.nix-profile/lib to LD_LIBRARY_PATH. I am not sure how well it would work for non-wheel packages, since they may include non-Python code that needs to be compiled.

Hi @alexv - there is no nix in the image actually, but it’s based on Debian. So if they want to, they can install packages like they’re used to with apt on top of the base image that I provide for them. That’s the idea anyway, not sure how well that will work.

I meant you can add all you need to the contents attribute. So you want the users to be able to run both pip and apt? Nix is an excellent tool for dealing with dependencies (especially dependencies spanning multiple languages which are common in data science) but if you want to use nix only to create the debian image, then let the users install everything they need without it, I don’t see why you need nix at all. Mixing up nix and non-nix executables and shared libraries is very tricky (speaking from experience).
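
A minimal sketch of that idea, assuming the libraries the users' wheels need are known up front (the nativeLibs list and the image name are hypothetical): the libraries go into contents, and LD_LIBRARY_PATH is derived from the same list with lib.makeLibraryPath instead of pointing at a nix profile.

```nix
# Sketch: ship native libraries in the image and expose them to the loader.
{ pkgs ? import <nixpkgs> { system = "x86_64-linux"; } }:
let
  # Libraries that pip-installed packages are expected to load (illustrative).
  nativeLibs = [ pkgs.libsodium pkgs.zlib ];
in
pkgs.dockerTools.buildLayeredImage {
  name = "venv-with-libs"; # hypothetical image name
  tag = "latest";
  contents = nativeLibs ++ [ pkgs.bashInteractive pkgs.coreutils ];
  config.Env = [
    # Expands to "<libsodium store path>/lib:<zlib store path>/lib".
    "LD_LIBRARY_PATH=${pkgs.lib.makeLibraryPath nativeLibs}"
  ];
}
```

This only helps binaries that are compatible with the glibc these libraries were built against. Pointing a Debian-provided interpreter at Nix-built libraries is exactly the kind of mixing that tends to be fragile.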

Sure, but you mentioned adding ~/.nix-profile/lib to LD_LIBRARY_PATH which assumes that nix is actually installed in the final image.

Our users need a variety of interactive environment “starting points” - some images contain Jupyter, others RStudio or Matlab, still others VNC etc. Often there are combinations of these that are requested and sometimes there are specialized libraries that take some care to add to the final docker images. The idea with nix would be to build up a small set of composable environments that can then be added together to build a base image for the required use-case. Then, the users can add their own customizations on top, but most of the time it would be simply installing R or python packages. If some deeper changes for system libraries would be needed, we could intervene and add them via the nix setup for the particular project. Alternatively, it would in principle be possible for the users to bring their own docker images to which we would just bolt on a few things they need to work within our infrastructure, which would be done with nix.

But yes, your point about mixing nix with some other OS and that causing problems is well-taken and it’s the purpose of my current exercise to figure out how much of a deal-breaker that would be. Thanks for your input, very much appreciated!

How would the tools in the “starter” environments be visible to the users if ~/.nix-profile/bin (or some other nix profile populated by nix-env) is not in their PATH? I have a similar use case (nix for data science) but the users are not allowed to install anything.

Whatever packages are in contents get added to /bin in the image and are available on the PATH.
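
As a small illustration (image name is made up, pkgs.hello is just a stand-in for a real tool), an image like this ends up with /bin/hello as a symlink into the Nix store, without any nix profile being involved:

```nix
# Sketch: packages listed in contents are linked into the image root.
{ pkgs ? import <nixpkgs> { system = "x86_64-linux"; } }:
pkgs.dockerTools.buildLayeredImage {
  name = "contents-demo"; # hypothetical image name
  tag = "latest";
  contents = [ pkgs.hello pkgs.coreutils ];
  # The bin/ directories of the packages above are merged into /bin.
  config.Cmd = [ "/bin/hello" ];
}
```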

btw do you have an example you can share of your setup? Curious to see how you’ve approached it.

I can’t share the code, sorry. One part of it though (derivations for open source packages missing in nixpkgs) I am planning to commit to nixpkgs over time. The general advice which worked well is to keep as much stuff in nix as possible and try not to rely on the base distribution or tools compiled for the base distribution. patchelf-ing of binary-only tools to make them nixpkgs compatible turned out to be rather unreliable.

Yes, I agree – keeping as many things as possible in nix is the idea. Most users only add a small delta on top of our base images, so I think it should work in 90% of the cases.

@rokroskar We’ve managed to do exactly what you want as explained in this comment:

  • a nix python environment is created
  • the user is then dropped into it and can use pip to install extra stuff

cheers! :popcorn:

Thanks @kamadorueda that looks very promising indeed. It seems to me that it would be advantageous to mix the model of mach-nix (install any python package by doing dependency resolution in nix) and on-nix (use a repo of nix derivations for pypi packages). Is that possible?

Fetching arbitrary projects from pypi is the easy part. The hard part is customizing/patching so it works on Nix, for example, see the big “Custom Nix configuration” done for this package:

https://python.on-nix.com/projects/kaleido-0.2.1-python310/

This is the added value of using python.on-nix.com: we already did the customization/patching, so the offered projects can just be copied and they will work.

But yes! I’m open to contributions exploring that integration with mach-nix. The general purpose of On Nix (on GitHub) is to improve the hard/missing pieces in the ecosystem by releasing projects into the public domain.

Agreed, getting the required customizations right is the hard part. What’s nice with mach-nix is that it can take a very long list of dependencies and almost automatically make things work - I really don’t want to write out all of those dependencies by hand and manage them manually every time dependabot updates them in our python project. I’d like our end-users and our developers to be able to use released and unreleased versions of the libraries we write (that’s a bit beyond the scope of the original question of this post though).

Thanks for the link to the example though - I will definitely give that a spin. Though I’m not sure it will solve my problem of the docker image setup… will report back :slight_smile: