Git fetcher with each submodule as a separate derivation

Currently, using either builtins.fetchGit with submodules = true or fetchFromGitHub with fetchSubmodules = true on a poor connection is tough: if the connection breaks while a submodule (or a submodule’s submodule) is being fetched, the whole process stops and you have to begin again from the start. Ideally, we’d have a system where something like the following happens:

  1. builtins.fetchGit would do a non-recursive fetch
  2. git submodule and other commands would be called to write the urls, paths and revisions of each submodule to a file.
  3. The .git directory is removed for the purpose of determinism (see NixOS/nixpkgs issue #8567, “`fetchgit` with `leaveDotGit = true` is still not completely deterministic”). The directory is written into the store.
  4. builtins.readFile reads the submodules file, and steps 1 to 3 are performed recursively for each submodule.
  5. Once each submodule has been fetched, a final source directory is constructed using symlinks to the original source, but with the submodules directories replaced with symlinks to each submodule store path.
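
The recursive part of these steps could be sketched roughly as follows. This is only an illustration under stated assumptions: `submodules.json` and its format are hypothetical (builtins.fetchGit does not write such a file today), and `fetchTree` is a made-up name. Step 5, building the final symlinked directory, is left out.

```nix
# Sketch only: `submodules.json` is a hypothetical file that step 2 would
# write into the fetched tree; builtins.fetchGit does not produce it today.
# Assumed format: [ { "path": "...", "url": "...", "rev": "..." }, ... ]
let
  fetchTree =
    { url, rev }:
    let
      # Step 1: non-recursive fetch of just this repository.
      src = builtins.fetchGit { inherit url rev; };
      hasListing = (builtins.readDir src.outPath) ? "submodules.json";
      # Step 4: read the submodule listing, if there is one.
      listing =
        if hasListing then
          builtins.fromJSON (builtins.readFile (src.outPath + "/submodules.json"))
        else
          [ ];
    in
    {
      inherit src;
      # Recurse: each submodule becomes its own independently fetched tree.
      submodules = map (s: {
        inherit (s) path;
        tree = fetchTree { inherit (s) url rev; };
      }) listing;
    };
in
fetchTree
```

Since every repository is fetched by its own builtins.fetchGit call, a dropped connection should only cost the fetch that was in flight, not the ones that already finished.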

Generally it’d be better for this to be a built-in rather than going in nixpkgs, as otherwise you’d need to specify a hash for every single submodule. If you wanted to modify the built-ins as little as possible, you could simply add an extra attr containing the submodule paths, urls and hashes to the output of builtins.fetchGit and implement the rest in nixpkgs.

I’ve attempted to implement an MVP here: NixOS:master...expenses:hacky-git-submodules (a branch comparison on NixOS/nix on GitHub). I had to store a JSON string as opposed to an actual subattr in the output attr, but it works well enough. Using it you can do something like this:

let
  pkgs = import /home/ashley/projects/nixpkgs { };
  fetchRecursive =
    attrs:
    let
      repo = builtins.fetchGit attrs;
      submodules = builtins.fromJSON repo.gitSubmodules;
      mappedSubmodules = builtins.mapAttrs (k: v: fetchRecursive (attrs // v)) submodules;
    in
    {
      inherit mappedSubmodules repo submodules;
    };
in
pkgs.writeText "wow" (
  builtins.toJSON (fetchRecursive {
    url = "https://github.com/expenses/lighthugger";
    rev = "9d6c46b3e339e9673b69b7d9933d6a439051232c";
  })
)
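
To make the `attrs // v` step easier to follow, here is a small network-free sketch with mock data. The JSON shape is an assumption inferred from how the code above uses `repo.gitSubmodules` (an attrset keyed by submodule path, each value holding a url and rev); the example.org URLs, the revs and the `ref` attribute are made up for illustration.

```nix
let
  # Mock arguments for the parent repository (hypothetical values).
  parent = {
    url = "https://example.org/parent.git";
    rev = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";
    ref = "main";
  };
  # Mock of the JSON string the patched fetchGit is assumed to return.
  gitSubmodules = builtins.toJSON {
    "/external/child" = {
      url = "https://example.org/child.git";
      rev = "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb";
    };
  };
  submodules = builtins.fromJSON gitSubmodules;
in
# `//` lets the right-hand side win, so each submodule gets its own url and
# rev, while any other attribute of the parent (here `ref`) is carried over.
builtins.mapAttrs (path: sub: parent // sub) submodules
```

Evaluating this with `nix-instantiate --eval --strict` gives one attrset per submodule path containing the child's url and rev plus the parent's `ref`. Whether inheriting attributes like `ref` from the parent is desirable is worth checking for a real repository.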

I’ve also opened an issue about this on GitHub: “Add some git repo-related info to the output attrs of `builtins.fetchGit` (submodules urls and revs specifically)”, Issue #11636 in NixOS/nix.

Something like this accomplishes the goal of symlinking all the source directories into a final directory. It’s very messy though:

fetchRecursive =
  attrs:
  let
    repo = builtins.fetchGit attrs;
    submodules = builtins.fromJSON repo.gitSubmodules;
    mappedSubmodules = builtins.mapAttrs (k: v: fetchRecursive (attrs // v)) submodules;
    inherit (pkgs.lib.strings) concatStringsSep removePrefix;
    # Symlink `target` at `$out/<rel>`, creating parent directories first.
    link = rel: target: ''
      mkdir -p "$out/$(dirname "${rel}")"
      ln -s "${target}" "$out/${rel}"
    '';
  in
  if mappedSubmodules == { } then
    # No submodules: the plain fetch is already the final source.
    repo
  else
    pkgs.runCommand "patched-source" { } (
      concatStringsSep "\n" (
        # One symlink per file in the parent repository...
        map (f: link (removePrefix (repo.outPath + "/") f) f) (
          pkgs.lib.filesystem.listFilesRecursive repo
        )
        # ...plus one symlink per submodule, pointing at its own store path.
        ++ builtins.attrValues (
          builtins.mapAttrs (k: v: link (removePrefix "/" k) v) mappedSubmodules
        )
      )
    );