How to debug nix cache misses?

Occasionally, my environment starts building llvm or some other low-level package. How can I debug or troubleshoot which part of my flake pulled in that dependency, and why it wasn’t in the binary cache? Any other recommendations for minimizing cache misses? I’m running nix on an M1 Mac.

Re llvm: swift 5.8 recently stopped building on darwin because of changes in nixpkgs, so it was not in the cache (although I did get a build at some point). It was then marked as broken.

So the cause might not be in your flake but in changes to the flakes you import.

Thanks for the hint @mark. Any workarounds you can recommend in the meantime? Is there an issue to follow?

In my case I only had swift through another flake, so I stopped overriding its nixpkgs.

In general, revert to an older flake.lock, especially for nixpkgs-unstable and similar inputs (or use nixpkgs-24.05).

For news, look at the nix-darwin Matrix channel and these nixpkgs issues on GitHub: "Build failure: swift-5.8" (Issue #320900, NixOS/nixpkgs) and "Update request: swift 5.8 → 5.10" (Issue #297655, NixOS/nixpkgs).

See Frequently Asked Questions — documentation

If you don’t have any “custom” packages and stick to the official channels, a cache miss of a non‐trivial build almost always means that the package is broken and the build will fail. (Of course those are large caveats!)

That really does seem like a bit of a paper cut, doesn’t it? If only failure caching were more reliable.

Is there any way to identify builds that should not be happening locally? I guess it’s tricky because some files are config-specific… But I do often find myself wondering why a project name zips by between builds. It would be nice to optimize away accidental modifications.

There’s broken, I guess. But unstable doesn’t benefit as much from the Zero Hydra Failures process that gets that set.

Ah, I guess you’re talking about overrides and overlays causing unexpected rebuilds? I suppose my only real suggestion there is “try not to use them”.

Yes, I read that while searching around the internet. Unfortunately that answer starts with a derivation hash to check and I don’t have it. I just see that some package, e.g. llvm-16, is being built instead of fetched and that it stays in the output for a very long time. I guess when a normal user needs to build llvm locally, something is broken somewhere. My question was how the user can tell whether he messed something up himself or whether it’s just the nix infra having a bad day.

Hm, I guess overlays at higher-level packages should not cause llvm to build locally. But what I have in my config is a mix of inputs from nixos-23.11, nixpkgs-23.11-darwin and nixpkgs-unstable.

It would really be helpful to have a tool/script that would print a bill of materials given my config and flake.lock, indicating which derivations are not available in the cache.
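A rough approximation of such a bill of materials is the list that nix prints before it starts building. The sketch below is only a filter over that text: `list_to_build` is a hypothetical helper name, and it assumes the "will be built" / "will be fetched" layout shown elsewhere in this thread, so it may need adjusting if a nix version words the list differently.

```shell
#!/bin/sh
# list_to_build (hypothetical helper): read the build plan that nix prints
# on stdin and print only the derivations listed under "will be built".
list_to_build() {
  awk '
    /will be built:/ { in_build = 1; next }  # header that starts the build list
    /^[^ \t]/        { in_build = 0 }        # any unindented line ends the list
    in_build && NF   { print $1 }            # indented, non-empty line: a .drv path
  '
}

# Usage (assumption: the plan is written to stderr, hence 2>&1):
#   nix build --dry-run ".#darwinConfigurations.$(hostname).system" 2>&1 | list_to_build
```

Anything that shows up in this list is something the configured caches did not offer, so a surprising entry such as llvm can be spotted before the long build starts.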

My suggestion is to first use nix-output-monitor (nom) to see a graph of the build dependencies while your flake is building.
You can then go to https://hydra.nixos.org/jobset/nixpkgs/trunk, select the latest evaluation with no pending packages (the grey box in the table), and then search for your offending package. In this case, swiftPackages.swift.aarch64-darwin has failed to build a few times, but you can see the last successful build was this one, and there in the input changes, you can see exactly which commit of nixpkgs (8de5bd2a) this version was built from.

Now you’ll probably want to lock the input of your flake to that revision. For that you need the full hash, which you can get by clicking the link in the input changes and copying the top hash from there (8de5bd2ac7c9a1c77a38e8951daa889b6052697f).
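As a minimal sketch of that pinning step: the helper below only assembles the flake reference from the full commit hash. `pinned_nixpkgs_ref` is a made-up name, and the usage line assumes the flake input is called `nixpkgs`, which may not match your flake.nix.

```shell
#!/bin/sh
# pinned_nixpkgs_ref (hypothetical helper): print a flake reference for
# NixOS/nixpkgs at the given full commit hash.
pinned_nixpkgs_ref() {
  printf 'github:NixOS/nixpkgs/%s\n' "$1"
}

# Usage, run in the flake's directory (assumes the input is named "nixpkgs"):
#   nix flake lock --override-input nixpkgs \
#     "$(pinned_nixpkgs_ref 8de5bd2ac7c9a1c77a38e8951daa889b6052697f)"
```

The same reference can instead be written into flake.nix as the input's url, which keeps the pin from being undone by the next nix flake update.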

Thanks @iFreilicht, nom is a good hint, although I’m still stuck.

I ran nom build .#darwinConfigurations.$(hostname).system after nix flake update on my macOS system and what I see is this:

coreutils> building '/nix/store/d4541rydxkxl0fz77hdy3klsr2zfh7qa-coreutils-9.5.drv'
coreutils> ...
┏━ Dependency Graph:
┃ ⏵ coreutils-9.5 (configurePhase) ⏱ 1m33s
┣━━━ Builds
┗━ ∑ ⏵ 1 │ ✔ 0 │ ⏸ 0 │ ⏱ 1m36s

Then I go to the last non-pending Hydra job and see that coreutils for my platform was successfully built. So should nix pick the last successful (non-pending) job from hydra when I do nix flake update and have unstable nixpkgs input?

The relevant part of that output comes before the section you quoted.
The output lists which derivations are going to be built.

E.g. my recent one:

these 14 derivations will be built:
  /nix/store/20qk3yhxqb9f6nljslaksjjhp4ksz6zf-bashrc.drv
  /nix/store/48q473av35yjb4xnr1z3nxf4qq4fdd15-home-manager-applications.drv
  /nix/store/5x71nhqdz0ppz4fimq4b107lagvd8p7q-hm_.configzsh.zshrc.drv
  /nix/store/5q2naizr1wldxs169z95lmay3j8l7xml-ledger-3.3.2-fish-completions.drv
  /nix/store/6xqh9cjvgcxgd1fylgj4sqvx3lsawss1-mark-fish-completions.drv
  /nix/store/p7xql9ydn3amqyf7fmgmxvb618slfnsx-home-manager-fonts.drv
  /nix/store/9z6bx4l68x5f7s8a458xm672g9nbgfvm-hm_LibraryFonts.homemanagerfontsversion.drv
  /nix/store/mdd50phhjl6ps0f0if62s9bh1mf6szm8-man-paths.drv
  /nix/store/47lif3gszalff0zf5sl5mjy0x8rgx0y2-man-cache.drv
  /nix/store/sjhvz6qzh2a4f766slws8cn7n0l14rsp-hm_.manpath.drv
  /nix/store/23pbrwv43v7rqyb61364ibpvd18jc2m1-home-manager-files.drv
  /nix/store/jsmyc5sbibahqr99d8vp5d8dfgqbgl5s-home-manager-path.drv
  /nix/store/5671agklgzzmdj44sca1gnk2qxdmpbd2-activation-script.drv
  /nix/store/calx9fnckp9ipyrsbs644h3jxazw3pb9-home-manager-generation.drv
this path will be fetched (0.73 MiB download, 3.17 MiB unpacked):
  /nix/store/rkvmx0jnkkn3px15pxszl2mdv5gsxymn-ledger-3.3.2

I see that /nix/store/d4541rydxkxl0fz77hdy3klsr2zfh7qa-coreutils-9.5.drv is being built in my output.

Following the Cachix FAQ, I’ve checked that curl https://cache.nixos.org/d4541rydxkxl0fz77hdy3klsr2zfh7qa.narinfo returns 404. Now how do I trace that particular coreutils derivation to a failed Hydra job and the nixpkgs commit hash?
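One thing worth double-checking here: as far as I know, the .narinfo files on cache.nixos.org are keyed by the hash of the output store path, not by the hash of the .drv file, so a lookup with the .drv hash would be expected to return 404 even for a cached package. Below is a small sketch of the check against output paths. `store_hash`, `narinfo_url` and `cache_status` are hypothetical helper names, and the usage assumes the .drv still exists in the local store.

```shell
#!/bin/sh
# store_hash (hypothetical helper): print the hash part of a /nix/store path.
store_hash() {
  base=${1##*/}                 # drop the directory part
  printf '%s\n' "${base%%-*}"   # keep everything before the first "-"
}

# narinfo_url: URL under which cache.nixos.org would serve this path's narinfo.
narinfo_url() {
  printf 'https://cache.nixos.org/%s.narinfo\n' "$(store_hash "$1")"
}

# cache_status: print the HTTP status code for the narinfo
# (200 would mean cached, 404 a miss).
cache_status() {
  curl -s -o /dev/null -w '%{http_code}\n' "$(narinfo_url "$1")"
}

# Usage: map the .drv to its output paths first, then query each output.
#   for out in $(nix-store --query --outputs \
#       /nix/store/d4541rydxkxl0fz77hdy3klsr2zfh7qa-coreutils-9.5.drv); do
#     echo "$out $(cache_status "$out")"
#   done
```

If the outputs also come back as 404, the derivation really is missing from the cache, and the Hydra search described earlier in the thread is the next place to look.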