For the last two updates of nixos-unstable (e.g. 942eb9a3), my nixos-rebuild switch --upgrade has tried to build kicad-unstable from source. I killed it the first time around, for lack of patience and in the hope that it might be some odd transient cache-miss issue. This time, well, it’s still going, but I have more patience.
I’m not sure what changed; I’ve been getting cached builds of this all along until now. I don’t see anything suspicious in recent commits, only another version bump, though this one is somewhat different in that it’s the first RC for 6.0.
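For reference, a dry run shows whether Nix actually plans to build rather than substitute, without kicking anything off (use kicad-unstable-small instead if that’s the variant in question):

nix-build '<nixpkgs>' -A kicad-unstable --dry-run
# prints "these paths will be fetched" on a cache hit, or "these derivations will be built" on a miss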
I’m still trying to find a good definition of a ‘channel’ versus a commit on master/unstable, so there might be ‘somewhere else’ to check for ‘builds on a channel’.
hydra-check seems to report quite old builds… but that might be a configuration problem at my end.
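For reference, the sort of query I mean (the flags may differ between hydra-check versions, so treat this as a sketch):

hydra-check kicad-unstable --channel unstable --arch x86_64-linux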
hydra and channels work in mysterious ways. maybe it’s to do with loading the finished artefacts into S3 buckets, or the mirroring that fastly does? or maybe it’s nothing at all.
this just popped into my head, i wonder if it’s to do with negative caching on your nix client.
there is a local client-side cache that remembers negative lookups for paths that don’t exist on cache.nixos.org, presumably to keep the request load down on the remote cache servers/infrastructure.
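if that’s what’s happening, shortening the negative TTL (or just deleting the on-disk cache) forces a re-check; roughly like this, though the sqlite filename varies between nix versions:

# in nix.conf: re-check "missing" paths immediately instead of trusting the cached negative result
narinfo-cache-negative-ttl = 0

# or drop the local lookup cache outright
rm ~/.cache/nix/binary-cache-v*.sqlite*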
the full kicad package(s) include about 6gb of 3D models
i don’t want to cache those on hydra as they don’t really involve any build effort
and i want to offer a package without the 3D models (they’re 6gb, and before i cleaned up the packaging the default package already shipped without them)
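so the tree ends up with a lean attribute as well; building it looks like this (a sketch, assuming the unstable channel):

nix-build '<nixpkgs>' -A kicad-unstable-small   # kicad-unstable without the bundled 3D models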
And now nixos-unstable points to commit 6daa4a5c045d, for which a cached build of kicad-unstable-small again does not exist on cache.nixos.org. However, Hydra - nixos:trunk-combined:tested shows two more recent builds for that job (commits: a78dd785b29 — failed, c71f061c68b — succeeded, but the channel update is still waiting for something), and a cached kicad-unstable-small exists for both of them.
Looking at Hydra - nixos:trunk-combined:nixpkgs.kicad-unstable-small.x86_64-linux, I see that build 159887280 (which corresponds to nixpkgs commit 8a308775) succeeded, then build 159999294 (nixpkgs commit 6daa4a5c045d) failed (but then that commit happened to be published in the nixos-unstable channel), then two subsequent builds (for a78dd785b29 and c71f061c68b) succeeded. There are some more failed builds there, all for the most recent kicad-unstable-small-9fb05440b3; build logs show segfaults and aborts due to memory corruption detected by malloc().
The previous snapshot of kicad-unstable-small apparently did not have such random build errors. So it looks like the 6.0.0-rc1 version is not really as stable as one might hope.
Yeah. Part of this for me was learning that there’s less coupling between the channel updates and the more general cache. In digging around, it seems the channel update only builds the subset of things that are invoked in the tests, and other builds are scheduled independently. It makes perfect sense, but before really thinking about it, I had kind of assumed that the channel was used to build the entire tree, even if the tests didn’t cover everything or consider all package failures critical.
So my original question was somewhat based on a false premise, this was just the first case I hit of a non-essential package that failed to build in a published channel head, where I happened to use both that package and that version.
The 6daa4a5c failure is repeatable and related to paths, and needs an update to the nixos packaging. But the others, in particular the one from the original post, are a test problem that’s not repeatable: my local builds and tests succeeded, and the same test seems to fail in some runs and succeed in others.
I suspect this is a resource contention problem in the build and test system - maybe extra workload or memory shortage because there are more release channels being built right now, or something.
In general, kicad has been working up to this release all year and has seen very few problems as lots of small bugs get squashed.
Apparently there is still a dependency in the sense that rebuilds for all packages must finish, but any of those rebuilds may fail — and unless some of those packages which failed to build were required by some tests, the channel update will still be published.
8/8 Test #7: qa_pcbnew ........................***Exception: SegFault 17.30 sec
… some fallout due to intentionally broken $HOME …
malloc(): unaligned tcache chunk detected
Other logs have SIGABRT and the same malloc(): unaligned tcache chunk detected error; this really looks like some memory corruption.
And I had also rebuilt the same 76pwy4ap1c7c0a581bpgyzzs6q28g9rm-kicad-base-9fb05440b3.drv derivation locally (when attempting to install kicad-unstable-small from the nixos-unstable channel) and did not see the build error, so there is definitely some reproducibility issue there.
And on a 5th run of nix build -v --rebuild /nix/store/76pwy4ap1c7c0a581bpgyzzs6q28g9rm-kicad-base-9fb05440b3.drv I finally got:
checking outputs of '/nix/store/76pwy4ap1c7c0a581bpgyzzs6q28g9rm-kicad-base-9fb05440b3.drv'...
error: builder for '/nix/store/76pwy4ap1c7c0a581bpgyzzs6q28g9rm-kicad-base-9fb05440b3.drv' failed with exit code 2;
last 10 log lines:
>
>
> 88% tests passed, 1 tests failed out of 8
>
> Total Test time (real) = 13.21 sec
>
> The following tests FAILED:
> 7 - qa_pcbnew (Subprocess aborted)
> Errors while running CTest
> make: *** [Makefile:105: test] Error 8
For full logs, run 'nix log /nix/store/76pwy4ap1c7c0a581bpgyzzs6q28g9rm-kicad-base-9fb05440b3.drv'.
Full log at https://pastebin.com/SwsDCuCp (apparently it contains terminal escape sequences, and the default view simply drops ESC characters). Looks like another case of SIGABRT (similar to other build errors on Hydra, but different from the “Segmentation Fault” error for the exact same derivation there).
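If anyone wants to poke at the failing test, rerunning with --keep-failed leaves the build tree around for inspection:

nix build -v --rebuild --keep-failed /nix/store/76pwy4ap1c7c0a581bpgyzzs6q28g9rm-kicad-base-9fb05440b3.drv
# on failure Nix keeps the build directory and prints its location (usually somewhere under /tmp)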
i seem to recall the last test (qa_pcbnew) hanging for me once
i haven’t had an outright build failure though
i’ve got the impression that test started to take longer at some point
and am now bisecting the KiCad source to investigate that
this is heading towards opening an issue on KiCad
feel free to do that yourself; if i come up with any findings i’ll add them there
otherwise i’ll probably do so when i’ve found out if the test actually slowed down recently
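for the curious, the bisect is basically just timing qa_pcbnew at each step; a rough sketch (the 60-second threshold and the build dir name are placeholders):

git bisect start HEAD <known-fast-commit>
git bisect run sh -c '
  cmake --build build -j"$(nproc)" || exit 125   # 125 tells bisect to skip commits that fail to build
  start=$(date +%s)
  (cd build && ctest -R qa_pcbnew) || exit 125   # also skip if the test errors outright
  [ $(( $(date +%s) - start )) -lt 60 ]          # "good" while the run stays under a minute
'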