filterSource (and friends) do not compose with fetchGit (and friends)

Dear nixos community,

I want to clean the sources I get after fetching from various places (e.g. fetchGit, fetchTarball, etc.).
However, when I do

lib.cleanSource (fetchGit ./.)

or even the more basic

builtins.filterSource (_:_: true) (fetchGit ./.)

I get an error of the shape:

error: string '/nix/store/hmbz2libawj65yrgva56kl55j5sv5isa-source' cannot refer to other paths, at /nix/store/qwwgph8zj7f7y7nypv31j3gqn5xm7npg-nixpkgs-21.03pre257989.f1f9a55fb4b/nixpkgs/lib/sources.nix:94:51

(and without the "at ..." part when using builtins.filterSource directly)

I looked around and

  • I found a Stack Overflow question that explains the error and how to get a single file, but that is not my use case. I also do not understand the explanation (why exactly would it be bad to copy and amend a directory from the store?), and I cannot figure out how to get it working in my case…
  • I cannot find anything referring to this in the documentation
  • lib/sources.nix seems to indicate that I indeed cannot clean such sources (it checks lib.hasPrefix storeDir), but this is not documented (nor is builtins.hasContext), and no workaround is suggested.

In the end I am lost: I understand neither the error nor how to perform the cleaning. Could someone enlighten me?

Best,
Cyril

Hey Cyril!

My best guess as to why the restriction exists is that if filterSource accepted an arbitrary store path, it would first have to make sure the path really exists by building it. Nix mostly tries to avoid these “back edges” from building to evaluation, and they are not allowed on Hydra, for example. I don’t know if it would be simple or feasible to make an exception for store paths, such as those produced by fetchGit, that are guaranteed to exist at evaluation time. One alternative Eelco once mentioned is to add a filter argument directly to fetchGit/fetchTree, which would also save one store copy.

Not sure what’s a good workaround without changing Nix, though. Could you talk a bit more about your use case? I would assume that lib.cleanSource wouldn’t do much on a fetchGit result, since such a result usually shouldn’t contain the intermediate files that cleanSource filters out anyway.

Hi Kha, thanks for this explanation. I guess I do not know enough about the internals of Nix to understand… Derivations, sources and builds are all three in the store. Are you saying that derivations are distinguished (and known by Nix to be available at evaluation time), while the other two are not? Moreover, why is the source necessary at evaluation time? Isn’t it enough to compute it at build time, or does the hash need to match the contents of the directory rather than the derivation producing them?

My use case was to filter (using a custom lib.cleanSourceWith) not just git metadata but also other files that are not used for the build (e.g. .nix, .md or .txt files that are not involved in building the projects I want to build), so that a commit touching only those files, e.g. fixing a typo in README.md or CHANGELOG.md, would not change the derivation and could save dozens of minutes of compilation (or more, for a Nix-based CI infrastructure). I guess I’ll just extend the git attributes for some files (such as .github, etc.) and give up on the files that I really want to be part of the general-purpose tarballs…

If the remote’s hash changes because any file changes, then even the filtered hash will change, as it is derived from its inputs, not from the resulting content after filtering. So even if all the changed files are filtered out, the input hash of your package changes and a rebuild will happen.

Hi NobbZ, thanks for this explanation. I see now that treating sources like derivation builds would not get me anywhere near my aim. The only way for the filtering to work as I intend would be for the resulting sources to be stored by the hash of their contents…

I’m pretty sure you could build something without using a filter, by instead writing a fixed-output derivation that copies the wanted sources. That, however, comes with the extra effort of maintaining the hash of the FOD.

I tried making a derivation to copy and filter sources, but could not use it as a src for another derivation. What is an FOD? Can I make it so that the hash of the output depends not on the derivation but on the contents of the output?

FOD is just short for fixed output derivation.

It’s in the manual, though I don’t remember which of them. Search each of them for “hashalgo”; it should find you the correct spot.
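The suggestion above might look roughly like the following minimal sketch, assuming nixpkgs is importable as `<nixpkgs>` and the file lives in a git checkout; the derivation name and the deleted patterns are hypothetical examples. The `outputHashMode`/`outputHashAlgo`/`outputHash` attributes make it a fixed-output derivation, so its output path is determined by the declared hash rather than by the (changing) fetchGit input, and dependents should only rebuild when the kept files actually change. The catch is the maintenance cost mentioned above: `outputHash` has to be updated by hand whenever the kept content changes.

```nix
{ pkgs ? import <nixpkgs> { } }:

let
  # Its store path changes whenever any tracked file changes.
  fetched = builtins.fetchGit ./.;
in
pkgs.runCommand "filtered-source" {
  # Fixed-output attributes: the output path is derived from this hash
  # (and the name), not from the derivation's inputs.
  outputHashMode = "recursive";
  outputHashAlgo = "sha256";
  # Placeholder: the first build fails with a hash mismatch and reports
  # the real hash, which you then paste here.
  outputHash = pkgs.lib.fakeSha256;
} ''
  cp -r ${fetched} $out
  # Store copies are read-only; make them writable before deleting.
  chmod -R u+w $out
  # Hypothetical patterns: files that do not influence the build.
  find $out \( -name '*.md' -o -name '*.txt' -o -name '*.nix' \) -type f -delete
  rm -rf $out/.github
''
```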

And using a derivation as a source should be totally possible: fetchFromGitHub and friends are not really different from any other derivation… They even wrap exactly the concept of a fixed-output derivation.
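To illustrate that any derivation output can serve as `src`, here is a minimal sketch, assuming nixpkgs is importable as `<nixpkgs>`; the package name, file name and the trivial source-producing derivation are hypothetical. stdenv’s default unpack phase copies a directory `src` into the build directory and enters it, so the build phase sees `hello.c` directly.

```nix
{ pkgs ? import <nixpkgs> { } }:

let
  # Hypothetical source-producing derivation: any derivation whose
  # output is a directory can be used as a source tree.
  generatedSrc = pkgs.runCommand "hello-src" { } ''
    mkdir -p $out
    cat > $out/hello.c <<'EOF'
    #include <stdio.h>
    int main(void) { puts("hello"); return 0; }
    EOF
  '';
in
pkgs.stdenv.mkDerivation {
  pname = "hello-from-drv-src";
  version = "0.1";
  # A derivation used as src, exactly like a fetchFromGitHub result.
  src = generatedSrc;
  # No Makefile in the source, so build and install by hand.
  buildPhase = ''
    $CC hello.c -o hello
  '';
  installPhase = ''
    mkdir -p $out/bin
    cp hello $out/bin/
  '';
}
```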

I see. I use filterSource variants to the same effect in Lean 4: https://github.com/leanprover/lean4/blob/f251f9b6fa038825734edf9a34623936fd06284b/nix/bootstrap.nix#L85. Note that I’m using a local path ../. here (which works even when fetchGit’d) to circumvent the filterSource restrictions. But this doesn’t work if you want to keep the Nix files in a separate repo from the sources you want to filter.
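For the use case described earlier, the local-path approach could look roughly like this minimal sketch, assuming the Nix file sits in a subdirectory of the repository (hence `../.`) and nixpkgs is importable as `<nixpkgs>`; the suffix list and the `.github` rule are hypothetical examples. Because the filtered tree is added to the store by its content, editing only an ignored file should leave the resulting store path, and therefore the dependent derivations, unchanged.

```nix
{ lib ? (import <nixpkgs> { }).lib }:

let
  # Hypothetical suffixes of files that never affect the build.
  ignoredSuffixes = [ ".md" ".txt" ".nix" ];
in
lib.cleanSourceWith {
  # A local path rather than a fetchGit result, so filtering is allowed.
  # lib.cleanSource already drops .git, result symlinks, editor backups, etc.
  src = lib.cleanSource ../.;
  # `path` is an absolute path string; `type` is e.g. "regular" or "directory".
  filter = path: type:
    let base = baseNameOf path; in
    !(type == "directory" && base == ".github")
    && !(type == "regular" && lib.any (s: lib.hasSuffix s base) ignoredSuffixes);
}
```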