KiCAD suddenly getting rebuilt from source

As of the last 2 builds on nixos-unstable (e.g. 942eb9a3), my nixos-rebuild switch --upgrade has tried to build kicad-unstable from source. I killed it the first time around, for lack of patience and in the hope that it might be some odd transient cache-miss issue. This time, well, it’s still going, but I have more patience 🙂

I’m not sure what changed; I’ve been getting cached builds of this all along until now. I don’t see anything suspicious in recent commits, only another version bump, though this one is somewhat different because it’s the first rc for 6.0:

https://github.com/NixOS/nixpkgs/commit/decac5a0d20b05d7b95b9e0e2ad324e4fe7df7ba

Is there something weird in the version-parsing logic that would cause this to not get cached somehow? 2021 → 6 being seen as a downgrade?

https://search.nixos.org/packages?channel=unstable&from=0&size=50&sort=relevance&type=packages&query=kicad

ok… i found it!

the PR seems to be ok?

https://nixpk.gs/pr-tracker.html?pr=142261

looks like it’s just been built.

https://hydra.nixos.org/build/159887280

you can find this good stuff out with hydra-check

I’m still trying to find a good definition of a ‘channel’ vs a commit onto master/unstable, so there might be ‘somewhere else’ to check for ‘builds on a channel’.

hydra-check seems to report quite old builds… but that might be a configuration problem at my end.

Hm. Well, it’s still trying to build from source, so there’s still something else going on.

roughly, as I understand it:

  • commits and PRs go onto master (directly, or more commonly via staging-next and staging). There’s a description and diagram for this in the manual.
  • a Hydra job kicks off, and picks a revision (I assume the current HEAD)
  • builds and tests run, sometimes they fail, sometimes they pass
  • assuming the tests are good, that rev gets merged to nixos-unstable and becomes the new HEAD of that branch
  • some time passes, usually a few hours. (I’m not entirely clear what the delay is here, it might just be for mirrors to have a chance to sync?)
  • the channel metadata that nix-channel will fetch gets updated to the new rev

By this understanding, the builds should all have well and truly finished by the time the channel is updated.

thanks for the explanation, a little clearer.

maybe it’s something to do with the imminent release of 21.11, but that is a complete guess without any evidence.

Whatever it was, it’s now been resolved with the build against rev 8a308775 that just appeared.

hydra and channels work in mysterious ways. maybe it’s to do with loading the finished artefacts into S3 buckets, or the mirror that fastly does? or maybe it’s nothing at all.

this just popped into my head, i wonder if it’s to do with negative caching on your nix client.

there is a local client-side cache that remembers negative fetches for paths that don’t exist on cache.nixos.org, presumably to keep requests and load down on the remote cache servers/infrastructure.

just an idea…
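A minimal sketch of how such a negative cache behaves, to show why a path that has since been uploaded can still look "missing" for a while. This is a toy model, not Nix's implementation: the class and method names are made up, and the one-hour default only mirrors what I believe is the default of Nix's `narinfo-cache-negative-ttl` setting.

```python
import time


class NegativeCache:
    """Toy model: remember failed lookups until a TTL expires (hypothetical names)."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        # ttl_seconds=3600 is an assumption modelled on narinfo-cache-negative-ttl.
        self.ttl = ttl_seconds
        self.clock = clock
        self._misses = {}  # key -> time at which the miss was recorded

    def record_miss(self, key):
        """Note that `key` was not found on the remote cache just now."""
        self._misses[key] = self.clock()

    def is_known_missing(self, key):
        """True while a recorded miss is younger than the TTL."""
        recorded = self._misses.get(key)
        if recorded is None:
            return False
        if self.clock() - recorded >= self.ttl:
            # The miss is stale: forget it so the next lookup asks the remote again.
            del self._misses[key]
            return False
        return True
```

While `is_known_missing` returns True the client would skip the remote cache entirely and fall back to building locally, even if the binary has been uploaded in the meantime.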

My understanding was that kicad-unstable isn’t built on Hydra, but it depends on kicad-unstable-small, which is: “kicad-unstable-small: init to build kicad-unstable's base on hydra” by evils (NixOS/nixpkgs pull request #82634)

the plot thickens, i wonder why?

the full kicad package(s) include about 6gb of 3D models
i don’t want to cache those on hydra as they don’t really involve any build effort
and i want to offer a package without the 3D models (as they’re 6gb and before i cleaned up the package, the default package was without them)

And now nixos-unstable points to commit 6daa4a5c045d, for which a cached build of kicad-unstable-small again does not exist on cache.nixos.org. However, Hydra - nixos:trunk-combined:tested shows two more recent builds for that job (commits: a78dd785b29 — failed, c71f061c68b — succeeded, but the channel update is still waiting for something), and a cached kicad-unstable-small exists for both of them.
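One way to check by hand whether a path is on the binary cache is to ask for its `.narinfo` file, which is named after the hash part of the store path. The helper below only builds that URL; the function name is made up, and it assumes you pass the output path (for example as printed by `nix-store --query --outputs` on the .drv), not the .drv path itself.

```python
def narinfo_url(store_path, cache="https://cache.nixos.org"):
    """Return the .narinfo URL for an output store path (hypothetical helper)."""
    prefix = "/nix/store/"
    if not store_path.startswith(prefix):
        raise ValueError(f"not a store path: {store_path!r}")
    # A store path looks like /nix/store/<32-char hash>-<name>.
    digest, sep, _name = store_path[len(prefix):].partition("-")
    if not sep or len(digest) != 32:
        raise ValueError(f"unexpected store path layout: {store_path!r}")
    return f"{cache}/{digest}.narinfo"
```

Fetching that URL (with curl, say) should give a 200 if the path is cached and a 404 if it is not.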

Looking at Hydra - nixos:trunk-combined:nixpkgs.kicad-unstable-small.x86_64-linux, I see that build 159887280 (which corresponds to nixpkgs commit 8a308775) succeeded, then build 159999294 (nixpkgs commit 6daa4a5c045d) failed (but then that commit happened to be published in the nixos-unstable channel), then two subsequent builds (for a78dd785b29 and c71f061c68b) succeeded. There are some more failed builds there, all for the most recent kicad-unstable-small-9fb05440b3; build logs show segfaults and aborts due to memory corruption detected by malloc().

The previous snapshot of kicad-unstable-small apparently did not have such random build errors. So it looks like that 6.0.0-rc1 version is not really as stable as someone may hope.

Yeah. Part of this for me was learning that there’s less coupling between the channel updates and the more general cache. In digging around, it seems the channel update only builds a subset of things that are invoked in the tests, and other builds are scheduled independently. It makes perfect sense, but before really thinking about it, I had kind of assumed that the channel was used to build the entire tree - even if the tests didn’t cover everything or consider all package failures as critical.

So my original question was somewhat based on a false premise, this was just the first case I hit of a non-essential package that failed to build in a published channel head, where I happened to use both that package and that version.

The 6daa4a5c version is a repeatable failure related to paths and needing an update to nixos packaging. But the others, in particular the previous one from the original post, are a test problem that’s not repeatable - my local builds and tests succeeded, and it seems to be failing in some runs and succeeding in the next.

I suspect this is a resource contention problem in the build and test system - maybe extra workload or memory shortage because there are more release channels being built right now, or something.

In general, kicad has been working up to this release all year and has seen very few problems as lots of small bugs get squashed.

Apparently there is still a dependency in the sense that rebuilds for all packages must finish, but any of those rebuilds may fail — and unless some of those packages which failed to build were required by some tests, the channel update will still be published.
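The rule as described here can be written down as a small predicate. This is only a toy model of the behaviour as understood in this thread, not Hydra's actual logic; the job names and status strings are made up for illustration.

```python
def channel_can_advance(builds, required):
    """Decide whether a channel update would be published (toy model).

    builds:   dict mapping job name -> "succeeded", "failed" or "pending"
    required: job names that the channel's tests depend on
    """
    # Every rebuild has to have finished, one way or the other.
    if any(status == "pending" for status in builds.values()):
        return False
    # A failure only blocks the channel if the tests needed that job.
    return all(builds.get(job) == "succeeded" for job in required)
```

Under this model a failed kicad-unstable-small does not hold the channel back as long as nothing in the required set depends on it, which matches the channel being published at a commit where that build had failed.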

Does not look like that to me: Hydra - Build 159999294 of job nixos:trunk-combined:nixpkgs.kicad-unstable-small.x86_64-linux points to this build log for kicad-base, and the error there is:

8/8 Test #7: qa_pcbnew ........................***Exception: SegFault 17.30 sec
   … some fallout due to intentionally broken $HOME …
malloc(): unaligned tcache chunk detected

Other logs have SIGABRT and the same malloc(): unaligned tcache chunk detected error; this really looks like some memory corruption.

And I also had rebuilt the same 76pwy4ap1c7c0a581bpgyzzs6q28g9rm-kicad-base-9fb05440b3.drv derivation locally (when attempting to install kicad-unstable-small from the nixos-unstable channel), and did not see the build error, so there is definitely some reproducibility issue there.

And on a 5th run of nix build -v --rebuild /nix/store/76pwy4ap1c7c0a581bpgyzzs6q28g9rm-kicad-base-9fb05440b3.drv I finally got:

checking outputs of '/nix/store/76pwy4ap1c7c0a581bpgyzzs6q28g9rm-kicad-base-9fb05440b3.drv'...
error: builder for '/nix/store/76pwy4ap1c7c0a581bpgyzzs6q28g9rm-kicad-base-9fb05440b3.drv' failed with exit code 2;
       last 10 log lines:
       >
       >
       > 88% tests passed, 1 tests failed out of 8
       >
       > Total Test time (real) =  13.21 sec
       >
       > The following tests FAILED:
       > 	  7 - qa_pcbnew (Subprocess aborted)
       > Errors while running CTest
       > make: *** [Makefile:105: test] Error 8
       For full logs, run 'nix log /nix/store/76pwy4ap1c7c0a581bpgyzzs6q28g9rm-kicad-base-9fb05440b3.drv'.

Full log at https://pastebin.com/SwsDCuCp (apparently it contains terminal escape sequences, and the default view simply drops ESC characters). Looks like another case of SIGABRT (similar to other build errors on Hydra, but different from the “Segmentation Fault” error for the exact same derivation there).
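Since the failure only shows up on some runs, repeating the rebuild and counting failures gives a rough idea of how flaky it is. A minimal sketch, assuming the command simply exits non-zero on a failed build; the function name and the injectable `run` parameter are made up for illustration.

```python
import subprocess


def count_failures(command, runs, run=subprocess.run):
    """Run `command` `runs` times and count the non-zero exit codes."""
    failures = 0
    for _ in range(runs):
        result = run(command)
        if result.returncode != 0:
            failures += 1
    return failures


# Example call, using the command from this post:
# count_failures(
#     ["nix", "build", "-v", "--rebuild",
#      "/nix/store/76pwy4ap1c7c0a581bpgyzzs6q28g9rm-kicad-base-9fb05440b3.drv"],
#     runs=5,
# )
```

The `run` parameter exists only so the loop can be exercised without actually invoking nix.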

i seem to recall the last test (qa_pcbnew) hanging for me once
i haven’t had an outright build failure though

i’ve got the impression that test started to take longer at some point
and am now bisecting the KiCad source to investigate that

this is heading towards opening an issue on KiCad
feel free to do that yourself, if i come up with any findings i’ll add them to that
otherwise i’ll probably do so when i’ve found out if the test actually slowed down recently

ok, got around to filing that issue
