How to fetch LFS enabled repo with fetchFromGitHub?

I’d like to fetch an LFS-enabled repo from GitHub using fetchFromGitHub, but I’ve not found any examples in nixpkgs. Is it doable?

Theoretically, the only thing you need is a fixed-output derivation that calls git with the right parameters. There doesn’t seem to be one in nixpkgs right now, though.
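
A minimal, untested sketch of such a fixed-output derivation, assuming a public HTTPS remote. The name `bar-source`, the URL and the rev are placeholders, and `lib.fakeSha256` stands in for the real hash that Nix reports on the first build. The fixed output (`outputHash`) is what permits network access inside the sandbox, and `.git` is removed so that only the checked-out files, including the LFS content, end up in the hash.

```nix
{ lib, stdenvNoCC, git, git-lfs, cacert }:

# Hypothetical "fetch with LFS" derivation; untested sketch.
stdenvNoCC.mkDerivation {
  name = "bar-source";
  nativeBuildInputs = [ git git-lfs ];

  # Fixed output: the result is pinned by hash, so network access is allowed.
  outputHashMode = "recursive";
  outputHashAlgo = "sha256";
  outputHash = lib.fakeSha256; # replace with the hash Nix reports

  # CA certificates for HTTPS inside the sandbox (git and git-lfs).
  GIT_SSL_CAINFO = "${cacert}/etc/ssl/certs/ca-bundle.crt";
  SSL_CERT_FILE = "${cacert}/etc/ssl/certs/ca-bundle.crt";

  url = "https://github.com/foo/bar.git"; # placeholder
  rev = "revision here";                   # placeholder

  buildCommand = ''
    export HOME="$TMPDIR"    # git lfs install writes ~/.gitconfig
    git lfs install          # registers the LFS smudge/filter-process config
    git clone --no-checkout "$url" "$out"
    cd "$out"
    git checkout "$rev"      # the LFS filter downloads the real file contents here
    rm -rf .git              # .git is not reproducible; keep only the files
  '';
}
```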

I haven’t tried this but you can probably hack it with something like

src = (fetchFromGitHub {
  owner = "foo";
  repo = "bar";
  rev = "revision here";
  sha256 = "hash here";
  fetchSubmodules = true; # needed to use fetchgit internally
  leaveDotGit = true; # needed to preserve the .git dir
  postFetch = ''
    cd "$out"
    git lfs install --local # "git lfs init" is the old, removed name
    # the remote may need to be re-added first, since .git is trimmed for determinism
    git lfs pull # fetch and check out the LFS files
    # possibly delete .git now
  '';
}).overrideAttrs (oldAttrs: {
  nativeBuildInputs = oldAttrs.nativeBuildInputs or [] ++ [ git-lfs ];
});

It’s probably a little cleaner to skip fetchFromGitHub and call fetchgit directly.

You could also try something that wraps builtins.fetchGit, though that doesn’t take a postFetch script, so you’d have to use it as the src for a wrapper derivation and hope that it preserves the .git dir (I honestly have no idea whether it does).

And finally, you could also consider submitting a PR that adds native git-lfs support to fetchFromGitHub or fetchgit directly.

The benefit of the GitHub fetcher is that it gets a tarball. Since LFS needs to use git features directly, it may as well just use fetchgit.

Well, the GitHub fetcher also builds the URL and supports private GitHub instances, but yeah, going with fetchgit would be a cleaner approach.
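
For reference, the host used to build that URL is fetchFromGitHub’s `githubBase` argument (default `github.com`). A sketch for a self-hosted instance; the host, owner, repo, and rev are placeholders, and `lib.fakeSha256` stands in for the real hash:

```nix
{ lib, fetchFromGitHub }:

fetchFromGitHub {
  githubBase = "github.example.com"; # hypothetical GitHub Enterprise host
  owner = "foo";
  repo = "bar";
  rev = "revision here";
  sha256 = lib.fakeSha256; # replace with the hash Nix reports
}
```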

But fetchFromGitHub doesn’t support fetching via git from a private GitHub repository:
assert private -> !fetchSubmodules;
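
In Nix, `->` is logical implication (and `!` binds tighter), so the assert reads “if `private` is set, `fetchSubmodules` must be false”. A tiny expression showing the combination it rejects:

```nix
let
  private = true;
  fetchSubmodules = true;
in
  # true -> false evaluates to false, so the assert above would abort evaluation
  private -> !fetchSubmodules
```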

Oh huh, I wonder why that is.

The benefit of the github fetcher is that it will get a tarball.

This is actually a misconception, depending on what you mean [0]; I had the same misunderstanding myself for a while, because what we actually do is fairly counter-intuitive.

GitHub hosts ${ver}.tar.gz archives for every commit and release, and while we do download this via fetchFromGitHub -> fetchzip [1], what we actually end up with is the recursive hash of the directory that comes from unpacking it. We do this so that if GitHub changes its compression method or some timestamp metadata, we don’t end up with an irreproducible hash [2]; what we really should do is round-trip the archive through an unpack -> deterministic repack.
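
To make the distinction concrete, a sketch comparing the two fetchers on the same archive URL. Owner `foo`, repo `bar` and the rev are placeholders, and `lib.fakeSha256` stands in for the real hashes Nix reports on the first build.

```nix
{ lib, fetchzip, fetchurl }:

{
  # Roughly what fetchFromGitHub { owner = "foo"; repo = "bar"; ... } does:
  # download the archive, unpack it, and hash the resulting directory (recursive hash).
  unpacked = fetchzip {
    url = "https://github.com/foo/bar/archive/revision-here.tar.gz";
    sha256 = lib.fakeSha256; # hash of the unpacked directory
  };

  # The same URL through fetchurl keeps the tarball as one file with a flat hash
  # of its bytes, but breaks if GitHub regenerates the archive differently.
  tarball = fetchurl {
    url = "https://github.com/foo/bar/archive/revision-here.tar.gz";
    sha256 = lib.fakeSha256; # hash of the compressed tarball itself
  };
}
```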

Aside from being ~10x larger on disk than the compressed tarball, the unpacked directory’s recursive hash also cannot be mirrored through hashed mirrors like tarballs.nixos.org (or, hopefully someday, Software Heritage [3]).
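
The unpack -> deterministic repack round trip suggested above could look roughly like this; it is a hypothetical sketch, not something nixpkgs provides. It relies on fetchurl’s `downloadToTemp`/`postFetch` (the same mechanism fetchzip builds on) and GNU tar’s ordering and metadata flags. One caveat: the result is only as stable as the tar and gzip versions producing it.

```nix
{ lib, fetchurl }:

# Hypothetical deterministic repack of a GitHub archive into a single file
# with a flat hash; URL and name are placeholders.
fetchurl {
  name = "bar-source.tar.gz";
  url = "https://github.com/foo/bar/archive/revision-here.tar.gz";
  downloadToTemp = true; # download lands in $downloadedFile, not $out
  postFetch = ''
    mkdir "$TMPDIR/unpacked"
    tar -xzf "$downloadedFile" -C "$TMPDIR/unpacked"
    # Fixed entry order, mtime and ownership; gzip -n omits name and timestamp.
    tar --sort=name --mtime=@1 --owner=0 --group=0 --numeric-owner \
      -C "$TMPDIR/unpacked" -cf - . | gzip -9n > "$out"
  '';
  sha256 = lib.fakeSha256; # replace with the hash Nix reports
}
```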

[0] Downloading the tar.gz and unpacking it is still much faster than a git clone with full history, of course, so we benefit today on that dimension.
[1] https://github.com/NixOS/nixpkgs/blob/79969356682e7ea642a0ee934080cc769a689790/pkgs/build-support/fetchgithub/default.nix#L14
[2] https://github.com/NixOS/nixpkgs/blob/79969356682e7ea642a0ee934080cc769a689790/pkgs/build-support/fetchzip/default.nix#L1-L20
[3] https://github.com/NixOS/nixpkgs/issues/53653
