Feedback on PR adding Nix support to the reproducible data science tool Binder

Pull Request: Adding support for nix buildpack in repo2docker by costrouc · Pull Request #407 · jupyterhub/repo2docker · GitHub

TL;DR

With this pull request, simply including a default.nix file in any public git repository lets anyone launch a Jupyter notebook with all of its dependencies, for free, on a 1-core CPU in Google Cloud via https://mybinder.org/.

Within the Python data science ecosystem there is a popular tool called Binder, which:

  • reproducibly converts a git repository into a Docker environment
  • runs a Jupyter notebook server
  • only requires a browser to interact with (enabling easy demos of software and quick exploration)

An example is worth a thousand words. Here you can explore the data used to verify gravitational waves, using Python.

Binder

Right now the conda, pip (Python), R, and Julia package managers are being used. As a Nix user, I see several issues with this approach, especially if we are advocating reproducible data science.

  1. How do the package managers pin the packages? (In most instances they don’t.)
  2. How are dependencies handled that do not come from the language-specific package manager? (conda does a somewhat good job at this.)
  3. What if the repository needs to build some small tools before it can serve an interactive notebook?
  4. Does it have package X?

Nix answers these questions by using nix-shell for the environment. We can even pin everything to an exact git commit.

let
  # Pin nixpkgs to a specific commit
  # To get the sha256 of the tarball, use "nix-prefetch-url --unpack <url>"
  commitRev = "5574b6a152b1b3ae5f93ba37c4ffd1981f62bf5a";
  nixpkgs = builtins.fetchTarball {
    url = "https://github.com/NixOS/nixpkgs/archive/${commitRev}.tar.gz";
    sha256 = "1pqdddp4aiz726c7qs1dwyfzixi14shp0mbzi1jhapl9hrajfsjg";
  };
  pkgs = import nixpkgs { config = { allowUnfree = true; }; };
in
pkgs.mkShell {
  buildInputs = with pkgs; [
    python36Packages.numpy python36Packages.scipy
    python36Packages.jupyterlab
  ];

  shellHook = ''
    export NIX_PATH="nixpkgs=${nixpkgs}:."
  '';
}
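
To make questions 2 and 3 above more concrete, here is a minimal sketch of a shell that mixes Python packages, a non-Python dependency, and a small tool built before the notebook starts. It is an illustration, not part of the pull request: the `prepare-data` script is hypothetical, graphviz is just an example of a non-Python package, and `pkgs` defaults to the unpinned `<nixpkgs>` only to keep the sketch short. A real default.nix would use a pinned import like the one above.

```nix
# Sketch only: `pkgs` defaults to the channel's nixpkgs for brevity;
# a real default.nix would use a pinned import instead.
{ pkgs ? import <nixpkgs> { } }:

let
  # One Python interpreter that can see all of the listed packages.
  pythonEnv = pkgs.python3.withPackages (ps: with ps; [
    numpy
    scipy
    jupyterlab
  ]);

  # Hypothetical helper tool, built by Nix before the shell starts.
  prepareData = pkgs.writeShellScriptBin "prepare-data" ''
    echo "preparing data in $PWD"
  '';
in
pkgs.mkShell {
  buildInputs = [
    pythonEnv
    pkgs.graphviz # non-Python dependency (provides the `dot` binary)
    prepareData
  ];
}
```

Running nix-shell in the directory containing this file should put `jupyter`, `dot`, and `prepare-data` on the PATH, all coming from the same nixpkgs revision.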

Questions that I have for the Nix community about the pull request

Please go to the pull request (Adding support for nix buildpack in repo2docker by costrouc · Pull Request #407 · jupyterhub/repo2docker · GitHub) to provide feedback. I would like to allow any repository with a default.nix to be explorable in a web browser.

I think it’s a good idea to use Nix here. However, one part that worries me a bit is the lack of mirroring for all dependencies. For example, I published the Nix expression for building my PhD thesis, but certain (texlive) dependencies are no longer available at the specified mirrors.

@FRidh, in response to that, I was considering adding cachix support. That way the maintainer would only need to build it once. I have never used cachix before, so I am not sure how this would fare.