How to fetch an LFS-enabled repo with fetchFromGitHub?

I’d like to fetch an LFS-enabled repo from GitHub using fetchFromGitHub, but I haven’t found any examples in nixpkgs. Is it doable?

Theoretically, the only thing you need is a fixed-output derivation that calls git with the right parameters. It doesn’t seem like there is one in nixpkgs right now, though.

I haven’t tried this but you can probably hack it with something like

src = (fetchFromGitHub {
  owner = "foo";
  repo = "bar";
  rev = "revision here";
  sha256 = "hash here";
  fetchSubmodules = true; # needed to use fetchgit internally
  leaveDotGit = true; # needed to preserve the .git dir
  postFetch = ''
    git lfs install
    git lfs fetch
    # anything else needed to check out lfs files
    # possibly delete .git now
  '';
}).overrideAttrs (oldAttrs: {
  nativeBuildInputs = (oldAttrs.nativeBuildInputs or []) ++ [ git-lfs ];
});

It’s probably a little cleaner to skip fetchFromGitHub and call fetchgit directly.

You could also try something that wraps builtins.fetchGit, though that doesn’t take a postFetch script so you’d have to use it as the src for a wrapper derivation and hope that it preserves the .git dir (which I honestly have no idea if it does).

And finally, you could also consider submitting a PR that adds native git-lfs support to fetchFromGitHub or fetchgit directly.

The benefit of the github fetcher is that it will get a tarball. Since LFS needs to use git features directly, you may as well just use fetchgit.

Well, the github fetcher also builds the URL and supports private github instances, but yeah, going with fetchgit would be a cleaner approach.

But fetchFromGitHub doesn’t support fetching via git from a private GitHub repository:
assert private -> !fetchSubmodules;

Oh huh, I wonder why that is.

The benefit of the github fetcher is that it will get a tarball.

This is actually a misconception, depending on what you mean [0]; I had the same misunderstanding myself for a while, because what we actually do is fairly counter-intuitive.

GitHub hosts ${ver}.tar.gz files for every commit and release, and while we do download these via fetchFromGitHub -> fetchzip [1], what we actually hash is the directory that comes from unpacking the tarball (a recursive hash), not the tarball itself. We do this so that the hash stays reproducible even if GitHub changes its compression method or some timestamp metadata [2], but what we really should do is just round-trip it through an unpack → deterministic repack.

Aside from being ~10x larger on disk than the compressed tarball, the unpacked recursive directory hash also cannot be mirrored through hashed mirrors like tarballs.nixos.org or – hopefully someday – Software Heritage [3].

[0] Downloading the tar.gz and unpacking it is still much faster than a git clone with full history, of course, so we benefit today on that dimension.
[1] https://github.com/NixOS/nixpkgs/blob/79969356682e7ea642a0ee934080cc769a689790/pkgs/build-support/fetchgithub/default.nix#L14
[2] https://github.com/NixOS/nixpkgs/blob/79969356682e7ea642a0ee934080cc769a689790/pkgs/build-support/fetchzip/default.nix#L1-L20
[3] Use Software Heritage as a fallback download location · Issue #53653 · NixOS/nixpkgs · GitHub
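
To make the flat-versus-recursive distinction concrete, here is a minimal sketch rather than anything taken from nixpkgs itself. The owner "foo", the repo "bar" and the all-zero revision are placeholders, and lib.fakeHash is only there so that Nix reports the real hash on the first build. Both fetchers download the same archive; they differ in what the hash covers.

```nix
# Sketch only: owner, repo and revision are placeholders, and lib.fakeHash
# must be replaced with the hash Nix reports when the build first fails.
{ pkgs ? import <nixpkgs> { } }:

let
  url = "https://github.com/foo/bar/archive/0000000000000000000000000000000000000000.tar.gz";
in
{
  # Flat hash: covers the bytes of the .tar.gz file itself, so it changes
  # whenever the archive is regenerated with different compression or metadata.
  tarball = pkgs.fetchurl {
    inherit url;
    hash = pkgs.lib.fakeHash;
  };

  # Recursive hash: fetchzip unpacks the archive and hashes the resulting
  # directory tree, so only the file contents and layout matter.
  unpacked = pkgs.fetchzip {
    inherit url;
    hash = pkgs.lib.fakeHash;
  };
}
```

The two attributes will generally need different hashes even though they start from the same download, which is why a tarball mirror keyed on the flat file hash cannot serve the unpacked variant.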

I have been unsuccessful in getting git-lfs to work with fetchgit. The (lack of) progress can be followed here.

The helper I’ve written currently looks like this:

{ # ...
  fetchgitLFS = args:
    let
      args' = args // {
        fetchSubmodules = true;
        leaveDotGit = true;
        deepClone = true;
        postFetch = ''
          cd $out
          git remote add origin ${args.url}
          git lfs install --local
          git lfs fetch
          git lfs checkout ${args.rev}
        '';
      };

    in
      (pkgs.fetchgit args').overrideAttrs (oldAttrs: {
        nativeBuildInputs = oldAttrs.nativeBuildInputs or [] ++ [ pkgs.git-lfs ];
      });
}

but while the commands appear to do their thing, the assets are not actually present in the end.

Any advice on this would be helpful.

Do you have an example repo?

Try this:

let
  # ...

  velorenSrc = fetchgitLFS {
    url = "https://gitlab.com/veloren/veloren";
    branchName = "yusdacra/override-git-lfs";
    rev = "e7eb51cecf3f251071ddecc026118e45c03adfb6";
    sha256 = "sha256-151UCMv/stPp7/gyjB3RHUOwJglyTa4/zU2RZdtbsRM=";
  };

in
  import "${velorenSrc}/nix" { system = "x86_64-linux"; disableGitLfsCheck = true; }

I think that should (not) work as expected.

Change the git lfs fetch line in postFetch to

git lfs fetch origin ${args.branchName}

Don’t forget to invalidate the hash before rebuilding.

EDIT:
Actually, this still doesn’t replace the LFS pointer files with the real contents. I’m not sure why; git lfs fetch should suffice.

This may have to do with fetchgit cleaning up some of the git files before running postFetch.

I essentially had to reinvent the entire wheel, but I was able to do it. I had forgotten that git-lfs changes its behavior depending on whether the related entries in $HOME/.config/git/config are present:

let
  pkgs = import <nixpkgs> { };
  fetchGitLRS = pkgs.callPackage ({ lib, stdenvNoCC, git, git-lfs, cacert, writeText, sha256, url, branchName ? null }:
  stdenvNoCC.mkDerivation rec {
    name = "veloren-src";

    nativeBuildInputs = [ git git-lfs ];
    doBuild = false;

    # git-lfs only smudges pointer files when these filter entries are configured
    gitconfig = writeText "gitconfig" ''
      [filter "lfs"]
        clean = "git-lfs clean -- %f"
        process = "git-lfs filter-process"
        required = true
        smudge = "git-lfs smudge -- %f"
    '';

    preferLocalBuild = true;

    builder = writeText "git-lfs.sh" ''
      source $stdenv/setup
      set -x

      # give git a writable HOME so that it picks up the config above
      export HOME=$TMPDIR
      mkdir -p $HOME/.config/git/
      cp ${gitconfig} $HOME/.config/git/config

      mkdir -p $out
      git clone "${url}" \
        ${if (branchName != null) then "--branch \"${branchName}\"" else ""} \
        $out
      cd $out
    '';

    # fixed-output derivation: network access is allowed, the result must match sha256
    outputHashAlgo = "sha256";
    outputHashMode = "recursive";
    outputHash = sha256;

    GIT_SSL_CAINFO = "${cacert}/etc/ssl/certs/ca-bundle.crt";

    impureEnvVars = lib.fetchers.proxyImpureEnvVars ++ [
      "GIT_PROXY_COMMAND" "SOCKS_SERVER"
    ];
  });
in
  fetchGitLRS {
    url = "https://gitlab.com/veloren/veloren";
    branchName = "yusdacra/override-git-lfs";
    sha256 = "13hr856pkwvvh10823i9yym8wh7vbx7w1b1xnw6f7khx80srga0n";
  }

Thank you for the help. I managed to create a working fetchgitLFS with this information.

PR to add LFS support to fetchgit: fetchgit: add lfs support by jonringer · Pull Request #105998 · NixOS/nixpkgs · GitHub

I merged the PR since a month has passed and the related code doesn’t seem to affect determinism.

That’s great, thank you!
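
For anyone finding this thread later: with that PR in your nixpkgs, the workarounds above should reduce to passing fetchLFS = true to fetchgit. The following is a minimal sketch that has not been run here. The url and rev are the ones from the example earlier in the thread and may no longer be reachable, and lib.fakeSha256 is a stand-in that has to be replaced with the hash Nix reports on the first build.

```nix
# Sketch only: assumes a nixpkgs recent enough to include the fetchgit LFS PR.
{ pkgs ? import <nixpkgs> { } }:

pkgs.fetchgit {
  url = "https://gitlab.com/veloren/veloren";
  rev = "e7eb51cecf3f251071ddecc026118e45c03adfb6";

  # Ask fetchgit to download the LFS objects and replace the pointer files.
  fetchLFS = true;

  # Placeholder: the first build fails and prints the real hash to use here.
  sha256 = pkgs.lib.fakeSha256;
}
```

Because the LFS objects become part of the output, the hash differs from the one obtained for the same rev without fetchLFS.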