Fodwatch: Automation for detecting reproducibility issues in fixed-output derivations

The goal of this project is to detect broken fixed-output derivations (for example: upstream tarballs or entire domains bitrotting away, git tags getting force-pushed, fetchurl used where fetchpatch should have been, etc.). It does this by forcibly rebuilding all FODs in nixpkgs with allowSubstitutes = false and with tarballs.n.o disabled.

Broken FODs currently go undetected because sources are also cached at cache.n.o.
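As a rough sketch of the rebuild trick (hello is just a placeholder package; I'm guessing at the mechanics, not describing fodwatch's actual implementation), one way to force a single source FOD to be re-fetched from upstream rather than served from the cache is to strip its allowSubstitutes flag:

```nix
# fod.nix: re-fetch hello's source tarball from upstream.
# overrideAttrs works here because fetchurl's result is an ordinary derivation.
(import <nixpkgs> {}).hello.src.overrideAttrs (old: {
  allowSubstitutes = false;
})
```

Built with something like `nix-build fod.nix --option substituters "" --check`, a fetch failure or fixed-output hash mismatch would then flag the source as broken.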

Currently, 845 out of 91025 FODs (0.93%) are broken for x86_64-linux (though keep in mind that I first ran this in May, which was a bad month for GitHub uptime). I intend to re-run it somewhat regularly.

https://tapesoftware.net/fodwatch/

22 Likes

Nice website! If it isn’t too hard, it would be cool to be able to search by maintainer too.

3 Likes

That’s possible now.

2 Likes

Tangential curiosity: how hard do you think it would be from ~here to trawl through past revisions to collect and deduplicate all of the FODs before checking?

(If the cache leaves S3, I’ve wondered if there’s meaningful egress savings in seeding the new cache with all of the sources that are still reachable and unchanged.)

If I understand correctly, you want the set of outPaths of all FODs that have ever existed (and then possibly build them). The limiting factor there is how fast you can evaluate nixpkgs: each evaluation takes roughly 5-10 minutes, and you potentially need one per system that gets cached (there are Darwin- and Linux-specific packages, for example; even now this project is missing some derivations because of that).
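The deduplication itself is cheap next to the evaluations. A minimal Python sketch (all names and store paths here are made up), assuming you already have, per (revision, system) pair, the list of FOD outPaths from one evaluation:

```python
def dedupe_fod_outpaths(eval_results):
    """Union FOD outPaths across many (revision, system) evaluations.

    eval_results: iterable of (revision, system, [outPath, ...]) tuples,
    one entry per nixpkgs evaluation. Returns the set of distinct
    outPaths that would actually need to be fetched/built.
    """
    seen = set()
    for revision, system, outpaths in eval_results:
        seen.update(outpaths)
    return seen

# Example with fake store paths: the same source often appears
# across revisions and systems, so the union is much smaller
# than the sum of the per-evaluation lists.
results = [
    ("rev1", "x86_64-linux", ["/nix/store/aaa-src", "/nix/store/bbb-src"]),
    ("rev1", "aarch64-darwin", ["/nix/store/aaa-src", "/nix/store/ccc-src"]),
    ("rev2", "x86_64-linux", ["/nix/store/bbb-src", "/nix/store/ddd-src"]),
]
print(len(dedupe_fod_outpaths(results)))  # → 4 distinct paths, not 6
```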

1 Like

Yes, roughly (and you’re right that there will be a tail of system-specific FODs, especially in the packages that just download existing binaries).

I could imagine reducing the eval load by writing a script that iterates over revisions and ~accumulates only the FODs that earlier revisions didn’t already cover…
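That accumulation could look something like this (a sketch only; `eval_fods` stands in for the hypothetical expensive step of evaluating nixpkgs at one revision):

```python
def incremental_fod_sets(revisions, eval_fods):
    """Walk revisions oldest-first, yielding for each one only the FOD
    outPaths not seen in any earlier revision, so each source is
    checked at most once.

    eval_fods(rev) -> set of outPaths; this stands in for a full
    nixpkgs evaluation, the ~5-10 minute step mentioned above.
    """
    seen = set()
    for rev in revisions:
        fods = eval_fods(rev)
        new = fods - seen   # only what this revision adds
        seen |= fods
        yield rev, new

# Example with fake data: r2 shares one path with r1,
# so only its genuinely new path is yielded for checking.
fake = {
    "r1": {"/nix/store/a-src", "/nix/store/b-src"},
    "r2": {"/nix/store/b-src", "/nix/store/c-src"},
}
for rev, new in incremental_fod_sets(["r1", "r2"], fake.__getitem__):
    print(rev, sorted(new))  # r2 yields only /nix/store/c-src
```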

That said, as I think about the logistics: how many machine-months it could take to eval even 1/20th of the commits, how much time the git operations alone might take, and the fact that the cache may also hold sources from Hydra builds in other repos, I’m hoping the problem is more tractable from the cache side :slight_smile: