How do people bisect Nixpkgs?

I’m tracking down a bug in nixpkgs and trying to bisect to find where it came from.

Naturally, given the size of the nixpkgs repository, I used to have a shallow clone. Since I have to bisect (and access commit history), I switched to a treeless clone. However, git bisect commands are incredibly, incredibly slow (as in running for tens of minutes with no progress). I switched to a blobless clone to see if it would improve things, but it was the same as well.

How do people even bisect nixpkgs? Any tips on how I should configure the repository?
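For readers unfamiliar with the clone types mentioned above, here is a minimal sketch of how shallow, treeless, and blobless clones are requested. It uses a throwaway local repository so that it runs offline; the `work`, `src`, and destination directory names are placeholders, and for nixpkgs the URL would be https://github.com/NixOS/nixpkgs.git instead.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)

# Throwaway "server" repository standing in for nixpkgs (two commits).
git init -q -b main "$work/src"
(
    cd "$work/src"
    git config user.email demo@example.invalid
    git config user.name demo
    echo 1 > f; git add f; git commit -q -m "first"
    echo 2 > f; git commit -q -am "second"
    # A local repository only honours --filter if it is told to.
    git config uploadpack.allowFilter true
)
url="file://$work/src"

# Shallow: only the newest commit, so there is no history to bisect over.
git clone -q --depth 1 "$url" "$work/shallow"
# Treeless: all commits; trees and blobs are fetched on demand.
git clone -q --no-checkout --filter=tree:0 "$url" "$work/treeless"
# Blobless: all commits and trees; file contents are fetched on demand.
git clone -q --no-checkout --filter=blob:none "$url" "$work/blobless"

for c in shallow treeless blobless; do
    printf '%s: %s commit(s), filter=%s\n' "$c" \
        "$(git -C "$work/$c" rev-list --count HEAD)" \
        "$(git -C "$work/$c" config --get remote.origin.partialclonefilter || echo none)"
done
```

The on-demand fetching is the likely reason a treeless clone feels slow for history-heavy operations: every tree git needs has to be requested from the remote first.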

I can’t say which type of local clone is preferable, as I have a full clone (currently 2.7G). When bisecting you really want to use git bisect start --first-parent, as that prevents you from ending up in a staging cycle while bisecting and having to rebuild the world.

Oh, it’s only 2.7G? That’s considerably smaller than I imagined. Perhaps I’ll try a full clone.

What do you mean by ending up in a staging cycle?

By bisecting you end up on “random” commits, and you might end up on some commit that was merged into master via the staging branch, which is intended for changes that have lots of dependents (e.g. changes to stdenv would rebuild basically everything). Read more about staging in the CONTRIBUTING.md of nixpkgs.
When bisecting you really do not want to end up on such a commit, since you will have to rebuild a lot of things. With --first-parent, bisect follows only the first parent of each merge, so it does not go into pull requests, but instead tells you which merge commit broke whatever you’re bisecting.
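To make the effect of --first-parent concrete, here is a small sketch on a toy repository (not nixpkgs; the file names, branch names, and the known-good tag are made up for illustration). A side branch introduces a regression in the middle of its commits and is then merged; with --first-parent, bisect should report the merge commit rather than the commit inside the branch. It assumes git 2.29 or newer, which is where git bisect start --first-parent was added.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q -b main .
git config user.email demo@example.invalid
git config user.name demo

echo working > status
git add status
git commit -q -m "base"
git tag known-good

# A toy "pull request" branch; its second commit introduces the regression.
git checkout -q -b pr
echo step1 > a; git add a; git commit -q -m "pr: step 1"
echo regressed > status; git commit -q -am "pr: break status"
echo step3 > b; git add b; git commit -q -m "pr: step 3"

# Back on main: an unrelated commit, the merge, and a later commit.
git checkout -q main
echo other > c; git add c; git commit -q -m "unrelated change on main"
git merge -q --no-ff pr -m "Merge pull request (toy)"
echo later > d; git add d; git commit -q -m "later change on main"

# Bisect along the first-parent chain only.
git bisect start --first-parent >/dev/null
git bisect bad HEAD >/dev/null
git bisect good known-good >/dev/null
# The test command exits 0 for good and non-zero for bad.
git bisect run sh -c 'grep -qx working status' >/dev/null

first_bad=$(git rev-parse refs/bisect/bad)
git log -1 --format='first bad commit (first-parent): %s' "$first_bad"
git bisect reset >/dev/null 2>&1
```

In nixpkgs the test command would be a build or evaluation of the affected attribute instead of the grep used here.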

I see, that makes a lot of sense. Thanks for your great tips!

Alas with my full clone, bisecting seems a little faster, but still fairly slow… (minutes for each bisect)

By the way, while I’m bisecting, git frequently complains:
“error: The following untracked working tree files would be overwritten by checkout:”
It names some seemingly random files in nixpkgs even though git status shows a clean working tree. The error persists even after git reset --hard and git clean -f, and git only lets me keep bisecting once I rm the files directly. Would you happen to understand what’s going on?

And sometimes this commit is precisely a merge from staging… In this case, what’s the best way to find the precise commit? (Currently, either I bisect and rebuild a lot of things, or I inspect the history manually and play the guessing game.)

How many commits are between your known good/bad commits? You can find that via git rev-list --first-parent --count <good rev>..<bad rev>. If that number is too big, that might be the problem. On my machine, with ~100k commits the first bisect step takes ~11s; testing over the whole history (so ~775k commits without --first-parent), the first bisection step has been running for more than 20 minutes now without a result. It seems nixpkgs is too big for bisect to run in reasonable time in extreme cases.
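As a quick illustration of that count, here is a sketch on a toy repository (the branch and commit names are invented). The range is written older..newer, i.e. commits reachable from the bad revision but not from the good one; written the other way round it counts nothing when the good commit is an ancestor of the bad one.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q -b main .
git config user.email demo@example.invalid
git config user.name demo

git commit -q --allow-empty -m "base"
git tag good-rev

# A two-commit topic branch merged with an explicit merge commit.
git checkout -q -b topic
git commit -q --allow-empty -m "topic 1"
git commit -q --allow-empty -m "topic 2"
git checkout -q main
git merge -q --no-ff topic -m "merge topic"
git tag bad-rev

# Commits bisect has to consider along the first-parent chain only.
first_parent=$(git rev-list --first-parent --count good-rev..bad-rev)
# Commits bisect has to consider when it may descend into merged branches.
all_commits=$(git rev-list --count good-rev..bad-rev)
# Reversed range: nothing is reachable from good-rev but not from bad-rev.
reversed=$(git rev-list --count bad-rev..good-rev)

printf 'first-parent: %s\nall: %s\nreversed: %s\n' \
    "$first_parent" "$all_commits" "$reversed"
```

Here the first-parent count covers only the merge commit, while the full count also includes the two commits on the topic branch.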

Sorry, I haven’t encountered that, and wouldn’t know how to handle it other than deleting the files.

I don’t think there’s any better way. Maybe you could look through Hydra and find some commits at which your package did/did not build successfully and make a few bisection steps manually, but I don’t really know my way around Hydra, so I don’t have experience with this.

Currently I’m at around 20k commits between good/bad (but each bisect step takes many minutes, as opposed to your seconds for many more commits). That the time taken is related to the number of commits is a useful hint; I’ll keep that in mind.

Thanks for your help!

A simple git checkout <commit> can take a few seconds by itself on my machine for nixpkgs. I always expect a bisect to take a lot more than mere minutes.

Are you talking about a whole git bisect run or a single git bisect good/git bisect bad here? The latter should be fairly quick?

My last message was indeed about a full git bisect run. If I remember correctly, marking a commit good/bad involves a checkout of a new commit. On my machine this can take several seconds (I’d say less than 20 though). If this is the step that runs for minutes, there is a problem.

Yeah, each git bisect good/bad takes many minutes (tens?) for me…

Recently we’ve been trying to put at least stdenv into cache.nixos.org for (almost?) each commit that appears on the staging branch (say, --first-parent on staging), via Hydra (nixpkgs:staging).

That partially helps with each full rebuild that you do. People with powerful machines in practice can/do bisect also over staging (or anywhere), including myself sometimes.

That is not normal. Does a regular checkout of a distant commit in the past take the same amount of time? If yes, I’d try to debug / diagnose that. If no, no idea what’s going on.

That doesn’t sound normal, unless you have a processor older than the average user here, or your disk is almost failing… though as you realized already, you should be using blobless or full clones only.

git gc --aggressive will bring the clone size down as well, btw (though it takes many minutes). My current full clone is 1.8G.
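To see what gc does to a repository, git count-objects can be run before and after. The sketch below uses a tiny throwaway repository (file and commit names are invented), so the numbers say nothing about nixpkgs itself; in a real clone you would just run the count-objects and gc commands in place.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q -b main .
git config user.email demo@example.invalid
git config user.name demo

# A few commits, each leaving loose (unpacked) objects behind.
for i in 1 2 3; do
    echo "$i" > f
    git add f
    git commit -q -m "commit $i"
done

# "count" is the number of loose objects; "size-pack" is the packed size.
loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')

# Repack everything; --aggressive spends extra time searching for better deltas.
git gc --aggressive --quiet

loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
printf 'loose objects before: %s, after: %s\n' "$loose_before" "$loose_after"

# Human-readable summary, including the on-disk pack size.
git count-objects -vH
```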

Just so I can match the scenario, what’s your git bisect log so far?

Okay, git bisect is finally getting to a reasonable time (several seconds) now that the window is narrowing (around 2k commits). Indeed, git checkout to distant commits takes a long time (about a minute).

My git bisect log looks like this.

git bisect start '--first-parent'
# status: waiting for both good and bad commits
# good: [380be19fbd2d9079f677978361792cb25e8a3635] Merge pull request #235159 from prusnak/bitcoin-22.05
git bisect good 380be19fbd2d9079f677978361792cb25e8a3635
# status: waiting for bad commit, 1 good commit known
# bad: [7f6d0b3986a142b788de597a5d93d1cba7ad265e] alistral: 0.5.2 -> 0.5.5 (#393540)
git bisect bad 7f6d0b3986a142b788de597a5d93d1cba7ad265e
# good: [5dc2630125007bc3d08381aebbf09ea99ff4e747] scope-lite: init at 0.2.0
git bisect good 5dc2630125007bc3d08381aebbf09ea99ff4e747
# good: [1890caf5b32c5281878bd1578e177ff034126147] Merge pull request #289250 from r-ryantm/auto-update/python311Packages.pinecone-client
git bisect good 1890caf5b32c5281878bd1578e177ff034126147
# bad: [1b4dd6f8984c45196297cca6b4a1fbf3bd87cf59] httm: 0.42.0 -> 0.42.4 (#336788)
git bisect bad 1b4dd6f8984c45196297cca6b4a1fbf3bd87cf59
# bad: [6285d6aa78940272d40bbcec3d6f7d897ab4c639] Merge pull request #314648 from r-ryantm/auto-update/mpvScripts.mpvacious
git bisect bad 6285d6aa78940272d40bbcec3d6f7d897ab4c639
# good: [1b44615623603e10a10dfc18b7671392ddcabeb4] Merge pull request #301802 from andresilva/ferdium-6.7.2
git bisect good 1b44615623603e10a10dfc18b7671392ddcabeb4
# bad: [79f4b5b1050caed092629308c698b9e3ee8b32dc] python3Packages.wikitextparser: 0.55.5 -> 0.55.13 (#308267)
git bisect bad 79f4b5b1050caed092629308c698b9e3ee8b32dc
# good: [66520524046923ba6d60dc9bd3870f48fc7cbdb6] Merge pull request #305224 from r-ryantm/auto-update/python312Packages.scikit-hep-testdata
git bisect good 66520524046923ba6d60dc9bd3870f48fc7cbdb6

I added my notes on bisecting to the wiki earlier.