Kubernetes broken on master, a recurring event

Hi everyone, I’m a bit frustrated with the current state of Kubernetes
on master. I tried it today and the tests don’t complete
successfully. While I thank you all for your effort in bringing
Kubernetes to NixOS, and while I understand the need to update the
packages and the modules to improve its support, I’m wondering why
this state of things keeps recurring.

So I wanted to ask you:

  1. Am I the only one with non-working tests? If you try them right
    now, do they work for you?

  2. Why don’t Hydra builds fail on them? (I must admit I have some
    trouble understanding what’s happening there.)

  3. Can we coordinate the effort using branches so that the master
    branch is kept functional? I see that some people (like me) submit
    PRs while others commit directly. Is it unreasonable to ask them to
    run the tests before pushing?

  4. If we need it (and I think we do), can we agree on some common
    goal, either by using this forum and creating a Kubernetes
    subsection in it, or by using some other kind of tool?

Thanks for reading and again thank you all for your effort.


Correct me if I’m wrong, but I think this is because we are only running the nixos-unstable-small tests on Hydra?

I don’t see a tests jobset for the unstable channel, only for the unstable-small channel.


And the unstable-small channel only seems to run a very small subset of the nixos tests Hydra - Jobset nixos:unstable-small

I think (but am not sure) this is because running all the tests for every commit on master is just too expensive.
This is why we fork off twice a year (in March and September) and run all the tests to create a release.

I would suggest staying on the stable channels for this kind of stuff, honestly. That’s what they are there for. But preferably I’d see less broken things on master too.

I don’t think they run for every commit, not even for every cumulative push to master.
But when a PR is opened, the bot runs the tests on it… or not?

There’s a new version of Kubernetes every four months, and while it isn’t always necessary to use the latest release, sometimes it is. I think the purpose of having tests is mostly to avoid commits that break things. If the tests are too expensive to run centrally, let’s agree on a policy where the committer runs the relevant tests (only the Kubernetes tests) before pushing.
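As a concrete sketch of what “run only the Kubernetes tests” could look like from a nixpkgs checkout (the attribute names below are from memory and may differ between nixpkgs revisions):

```shell
# Run only the NixOS Kubernetes tests from the root of a nixpkgs
# checkout. Attribute names may differ between nixpkgs revisions.
nix-build -A nixosTests.kubernetes.dns
nix-build -A nixosTests.kubernetes.rbac

# On older revisions the tests may instead be reachable through the
# release expression, e.g.:
nix-build nixos/release.nix -A tests.kubernetes.dns.x86_64-linux
```

That’s a few minutes of machine time per push, rather than blocking the whole channel on them.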

Feature branches and PRs help greatly in keeping master clean of breakages and in giving visibility to one’s work.

nixos-unstable is https://hydra.nixos.org/job/nixos/trunk-combined/tested, see https://howoldis.herokuapp.com/ and also this papercut:


What Hydra does is build these tests periodically on master:


and when all of them succeed, it advances the nixos-unstable channel to the commit the jobset was run on.

A similar thing is done for the other channels, with a different set of tests:

The nixos-xx.yy channels run on the release-xx.yy branch instead of master.

@GrahamcOfBorg currently does not run any NixOS tests unless asked in a comment. See also https://github.com/NixOS/ofborg/issues/368

The Kubernetes tests are not part of any of the jobsets, so channels advance even when those tests are broken. We do not add non-critical software to the jobsets in order not to block progress, especially when the software does not have proper maintenance in nixpkgs (as evidenced by the commonly broken tests).

People are expected to run the tests before pushing a commit or opening a pull request, but sometimes an unrelated change breaks them and the breakage is not discovered for a while, especially for less-used software.

TL;DR: various parts of nixpkgs have different levels of support, and we do not want to block everything on less-supported software that most people do not care about.

Is this an agreed-upon policy? It seems like vaporware… Can’t we do better with little effort (I’m talking to everyone who pushes changes to the Kubernetes stuff)? The upstream project (Kubernetes) alone is big enough that it takes real effort to get to know it to any depth; if, on top of that, we have to worry about a frequently broken configuration codebase, it becomes very frustrating. It’s sad, in a way…

Thanks @jtojnar for looking this up and making it clear.

See NixOS - Nixpkgs 21.05 manual. It is also mentioned in the PR template.

Looking at the last few Kubernetes changes, the tests were run:

Maybe it is big upstream, but not many nixpkgs users seem to care about it, so it receives attention accordingly. @johanot seems to be the only one who cares about it, and they seem to be doing a good job.

One possible improvement would be adding passthru.tests (example) so that the @r-ryantm bot, which often updates the package, can run the tests automatically and skip the update when the tests fail.
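As a hedged sketch of what that could look like in the package expression (the attribute paths here are illustrative and may not match the current nixpkgs tree exactly):

```nix
# Illustrative sketch of wiring the NixOS tests into the package via
# passthru.tests; the real attribute paths in nixpkgs may differ.
stdenv.mkDerivation {
  pname = "kubernetes";
  # ...

  passthru.tests = {
    # Expose the NixOS Kubernetes tests so that update tooling
    # (e.g. the r-ryantm bot) can build them before proposing an update.
    inherit (nixosTests) kubernetes;
  };
}
```

Tooling can then build `kubernetes.passthru.tests` alongside the package itself and abort the update if any test fails.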

Another possible improvement could be fixing the e2e test, to increase confidence that an update did not break anything:


Yeah, channels need to find a compromise between frequent advances with some software occasionally broken, and everything working but with long stabilization periods and very infrequent releases. I feel like we have found the optimum for popular software, as maintainers usually fix bugs very fast; but for less popular software, which has fewer maintainers, the mean time to repair will unfortunately be longer.

As I wrote in my first post, have you tried running them yourself? It’s possible that it’s an issue with my system…

I did not try to run it, since Kubernetes is huge and I have some networking issues at the moment.

Hi, I’m new here. I’m wondering what the bottleneck is to running more tests, more often. How long would it take to build and test every package in nixpkgs? If that is “too long,” how long would it take to build all the packages that cross some popularity threshold? Certainly a set of the most popular packages could be built and tested upon every commit, or at least daily.

After some basic research, it seems that these builds generally run on a collection of around 75 machines at TU Delft. I’m unfamiliar with the commit frequency of nixpkgs, but nothing about this strikes me as untenable, especially with cached build artifacts and/or an incremental build/test strategy.

Perhaps some parts of what I’m imagining are underdeveloped; if so, what are they, and could I perhaps help?