How to use RFC0133 git hashing for FODs correctly?

A friend of mine recently pointed me to https://github.com/NixOS/rfcs/pull/133, which introduced git’s tree hashing scheme as a valid hashing mode in Nix, producing a SHA-1 hash.
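For anyone unfamiliar with the scheme: git identifies a file by the SHA-1 of a `blob <size>\0` header followed by its contents, and a directory by the SHA-1 of a tree object that lists each entry’s mode, name and child id. The sketch below reproduces that computation with bash and GNU coreutils only. It assumes file names without tabs or newlines, skips `.git`, and does not handle submodules. The function names are made up, and this illustrates git’s object format, not how Nix implements its git hashing mode.

```shell
#!/usr/bin/env bash
# Sketch: git object ids (SHA-1) computed with bash + GNU coreutils only.
# Assumptions: names contain no tabs/newlines; .git is skipped; no submodules.
set -euo pipefail

# blob id = sha1("blob <size>\0" + contents)
git_blob_hash() {
  local size
  size=$(( $(wc -c <"$1") ))
  { printf 'blob %d\0' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}

# tree id = sha1("tree <size>\0" + entries); each entry is "<mode> <name>\0"
# followed by the child's 20 raw id bytes. Runs in a subshell so the shopt
# and trap changes stay local to this call.
git_tree_hash() (
  shopt -s nullglob dotglob
  dir=$1
  entries=$(mktemp); body=$(mktemp); link=$(mktemp)
  trap 'rm -f "$entries" "$body" "$link"' EXIT

  for path in "$dir"/*; do
    name=${path##*/}
    [[ $name == .git ]] && continue
    if [[ -L $path ]]; then
      # A symlink is stored as a blob containing its target path.
      printf '%s' "$(readlink "$path")" >"$link"
      mode=120000; key=$name; oid=$(git_blob_hash "$link")
    elif [[ -d $path ]]; then
      # Git orders a directory as if its name ended in "/".
      mode=40000; key="$name/"; oid=$(git_tree_hash "$path")
    elif [[ -x $path ]]; then
      mode=100755; key=$name; oid=$(git_blob_hash "$path")
    else
      mode=100644; key=$name; oid=$(git_blob_hash "$path")
    fi
    printf '%s\t%s\t%s\t%s\n' "$key" "$mode" "$name" "$oid" >>"$entries"
  done

  # Byte-wise order on the sort key, then append the binary entries.
  while IFS=$'\t' read -r key mode name oid; do
    printf '%s %s\0' "$mode" "$name" >>"$body"
    # Rewrite the 40 hex digits as \xHH escapes so bash's printf emits raw bytes.
    printf "$(sed 's/../\\x&/g' <<<"$oid")" >>"$body"
  done < <(LC_ALL=C sort -t $'\t' -k1,1 "$entries")

  size=$(( $(wc -c <"$body") ))
  { printf 'tree %d\0' "$size"; cat "$body"; } | sha1sum | cut -d' ' -f1
)
```

For example, `git_tree_hash /tmp/a` prints a 40-digit hex id. Empty subdirectories are hashed as empty trees here, whereas git’s index never records them, so a directory containing one will not match `git write-tree`.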

We discussed this in the context of Robotnix, where we need to lock thousands of git repositories as FODs.
Prefetching thousands of FODs (some of them several GiB in size) is extremely time-consuming, which massively slows down the update process. My friend’s idea was that we could cheaply obtain the tree hash of each remote head without running nix-prefetch-git.
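One way to get the tree hash of a remote head without downloading any file contents is to fetch only the tip commit object, since a commit records its tree id (this is what `%T` in `git log` prints). A minimal sketch, assuming a git version and a server with partial clone support (GitHub has it; otherwise git ignores the filter with a warning and downloads the full snapshot). `remote_tree_hash` is a hypothetical helper, `--branch` only accepts branch or tag names, and the result should only match what a fetcher produces if the checkout is byte-for-byte the git tree (no submodules, no eol/filter attributes, no `.git` directory kept).

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): print the tree id of a remote branch or tag
# while downloading only its tip commit object.
set -euo pipefail

remote_tree_hash() {
  local url=$1 ref=$2 tmp status=0
  tmp=$(mktemp -d)
  # --depth=1: only the tip commit; --filter=tree:0: no trees or blobs;
  # --bare: no checkout, so nothing has to be fetched lazily afterwards.
  git clone --quiet --bare --depth=1 --filter=tree:0 --branch "$ref" "$url" "$tmp" \
    && git -C "$tmp" log -1 --format=%T \
    || status=$?
  rm -rf "$tmp"
  return "$status"
}
```

For example, `remote_tree_hash https://github.com/NixOS/nix.git master` prints the tree id of that branch’s current head.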

We played around with the git-hashing experimental feature together yesterday and were able to create a FOD that hashed its tree à la git. It was also trivial to make fetchgit use that mode, so even fetching real sources from the internet works just fine.

What we weren’t able to do is use a known git tree hash as the FOD hash. I suspect this is because FODs aren’t actually content-addressed; their hash would instead be that of some special derivation representation that also includes, e.g., the derivation name.

Is there a way to make this idea work?

The RFC mentions fetching sources from the Software Heritage foundation as a mirror, but those must be FODs too, right? How would that work?

cc @Ericson2314


I tried running the current nixVersions.git at master.

My current impression is that it hashes correctly (the result matches what I get from %T in git log), but it hashes $NIX_BUILD_TOP/tmp instead of $out.

So my guess is that the support is not finished yet.

Please explain to me why the following commands are not the right way to test this (/tmp/a is a git repository filled with randomly generated noise):

/nix/store/azpy5a4w7whrcywg6rfks2a0mwyzwr0a-nix-2.25.0pre20241101_2e5759e3/bin/nix-build --no-out-link -E 'with import <nixpkgs> {}; runCommandCC "test2" {outputHash="00000000000000000000000000000000";outputHashAlgo="sha1";outputHashMode="git";} "mkdir tmp; mkdir -p $out; cp  ${"" + /tmp/a}/* tmp/"' --extra-experimental-features git-hashing --impure
/nix/store/azpy5a4w7whrcywg6rfks2a0mwyzwr0a-nix-2.25.0pre20241101_2e5759e3/bin/nix-build --no-out-link -E 'with import <nixpkgs> {}; runCommandCC "test2" {outputHash="00000000000000000000000000000000";outputHashAlgo="sha1";outputHashMode="git";} "mkdir tmp; mkdir -p $out; cp  ${"" + /tmp/a}/* $out/"' --extra-experimental-features git-hashing --impure
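To reproduce the commands above, /tmp/a needs to be a git repository of random files. A minimal sketch of one way to set it up; the helper name `make_noise_repo`, the file names and the sizes are made up. It prints the commit’s tree id, which Nix should accept as a 40-digit hexadecimal `outputHash` in place of the all-zero placeholder. The `cp .../*` glob in the build script skips dotfiles, so the `.git` directory is not copied into the output.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): create a git repo of random files and print
# the tree id of its single commit.
set -euo pipefail

make_noise_repo() {
  local dir=$1 i
  mkdir -p "$dir"
  git -C "$dir" init -q
  for i in 1 2 3; do
    head -c 1024 /dev/urandom >"$dir/noise$i"
  done
  git -C "$dir" add -A
  git -C "$dir" -c user.name=test -c user.email=test@example.invalid commit -q -m noise
  # Same value as %T in git log; usable as a base16 SHA-1 outputHash.
  git -C "$dir" log -1 --format=%T
}
```

For example, `make_noise_repo /tmp/a` creates the repository and prints the hash to plug into `outputHash`.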

Oh!

I had noticed it being weird about $NIX_BUILD_TOP/tmp not existing, but wanted to report that separately.

Indeed, it seems to hash that directory rather than $out: when I change $out, the hash doesn’t change, but when I change $NIX_BUILD_TOP/tmp, it does. Next I’ll test whether that corresponds to the git tree hash.

Funnily, this lets you produce a weird state: a non-deterministic FOD that doesn’t fail until the --check determinism check, because its hash “matches”.

Indeed, if you simply cp -a $out tmp at the end of the build, it hashes to the correct git tree value! It works!

Clearly this feature isn’t ready for prime time yet, but this also sounds like a rather simple bug to fix.

(Arguably, Robotnix isn’t ready for prime time yet either, sooo perhaps it’d be fine to use this feature with the workaround?)

I think that «weird states» don’t count when the feature is clearly not finished yet.

I guess your workaround could be called a «preview» or something, given that upgrading to a correctly working git hasher should be transparent (and doing a whole extra bunch of persistent writes is how everything is done with Nix anyway).


Wow, that’s a really weird bug!

Yeah, I haven’t tried to push it for prime time yet (I went ahead and finished a first version because of its potentially good interactions with the fetchers). But I am glad people are trying it out!


So that’s where the bug lived (I tried to find out where the actual hashing happens, looked at all the cross-references, decided that everything was happening somewhere else, and just black-boxed the diagnosis).

Thanks for the quick fix!


Here’s the bug fix (since @7c6f434c didn’t link it): “Fix #12295” by Ericson2314, NixOS/nix pull request #12335 on GitHub.

You’re welcome!


OK, I am going to be a little more optimistic. Please use this, and if we find no bugs, we should be ready to stabilize it.

There are changes/extensions I have thought about, like a trick analogous to the “case-insensitive file unpack NARs hack” for unpacking submodules without getting a different tree hash, but they could always be a separate feature rather than a breaking change.
