Kicked off by this thread on “Quality control for packages”, I would like to present my long-term vision/wishes for automation of package upgrades and QA.
I would especially like to hear what’s already planned from our great people working on build automation, but also from those who spend a lot of time reviewing/testing/merging PRs, and in general whether others share this vision.
Starting with a quote from the other thread:
> `r-ryantm` are already very good steps into the right direction. I think we should all-in into even more automation.
This is how I think we can master the pull request flood in the long term:
- `nix-review` build for all `r-ryantm` bot upgrades, so that all dependent packages will be automatically rebuilt. Also for mass rebuilds. This should be against `master`, even if the PR is against `staging`, because `staging` is often broken, and the `master` build will tell us most of what we need.
- Fully automatic NixOS tests for them as well, with the results posted on the PR. Right now we run NixOS tests on user demand only.
- Automatic merging of `r-ryantm` bot point releases, e.g. 1.2.3 -> 1.2.4, after the above checks have passed.
- Add NixOS tests for a lot more apps (both CLI and desktop).
- Establish an evidence system for NixOS tests that says: “If this test passes / is accepted by a human, then the following packages X, Y, Z are probably working fine”. For example, we could say that if a test screenshotting Firefox showing various image file types and fonts looks OK (a human would make that statement on the PR), then `libjpeg`, and all the font libraries Firefox depends on, are probably OK. Thus, the workflow would be that if a bot proposes an update to `libpng`, further automation would say “hey humans, I have built the dependent Firefox and Evince against this, here are the test outputs, if you say the screenshots of one of those look OK, I will auto-merge this PR”.
I believe this will allow us to remove most of the labour involved in upgrades, giving us more time to focus on the difficult stuff.
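The auto-merge rule for point releases mentioned above could be gated roughly like this. This is only a sketch under my own assumptions: it treats a "point release" as a bump of the last component of a purely numeric dotted version with at least three components, and the function names are made up for illustration.

```python
def is_point_release(old, new):
    """True if only the last component of a dotted numeric version increased."""
    try:
        old_parts = [int(part) for part in old.split(".")]
        new_parts = [int(part) for part in new.split(".")]
    except ValueError:
        # Non-numeric versions (e.g. "1.2.4-rc1") are left for a human to judge.
        return False
    # Assumption: require the same shape and at least major.minor.patch.
    if len(old_parts) != len(new_parts) or len(old_parts) < 3:
        return False
    return old_parts[:-1] == new_parts[:-1] and new_parts[-1] > old_parts[-1]


def bump_is_mergeable(old, new, builds_ok, tests_ok):
    """Auto-merge only point releases whose rebuilds and NixOS tests passed."""
    return is_point_release(old, new) and builds_ok and tests_ok
```

With this definition, 1.2.3 -> 1.2.4 qualifies, while 1.2.3 -> 1.3.0 or any version with a non-numeric suffix falls back to human review.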
For security and performance, we should:
- Add an explicit label `nonmalicious-checked` that committers can apply to tell automatic infrastructure that the code involved is safe to be run. This is similar to the current `@grahamcofborg build`, but would be more explicit, not conflating this assertion with the actual build command as we do now, and would not be only for ofborg. Infrastructure should only rely on the commit that was labelled this way; force-pushes or new commits have to be re-labelled. This label would not put a judgment on whether the change is a good one to merge, just assert that it does not obviously contain code to abuse infrastructure.
- Forbid PRs to be merged without this label. (This just makes the current workflow explicit.)
- Move to building all pre-merge stuff in throwaway VMs, so that we no longer need to care about malicious code for the purpose of testing PRs.
- As an optimisation: The outputs of the throwaway VMs should be kept for a while. As soon as the corresponding change is labelled `nonmalicious-checked`, those contents shall be unlocked to be shared with e.g. Hydra, so that we don’t have to build twice.
- Allow people to more easily contribute build power based on throwaway VMs. Everybody should be able to run a daemon on their machines that hooks into an ofborg-like system, accepts jobs from it, and then posts results to the PRs (edit for clarity: to provide evidence whether builds or tests break; untrusted community builder outputs would not be put on the official binary cache, but instead be re-built by trusted Hydra after merge; it’s the testing-while-iterating that’s expensive and needs community compute power, not the final Hydra build).
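The two rules above that are easiest to get subtly wrong are "the label is pinned to one commit" and "VM outputs are only unlocked for a labelled commit". A minimal Python sketch of that logic follows; the `PullRequest` class, its fields, and its method names are all hypothetical and do not correspond to any GitHub or ofborg API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PullRequest:
    head_sha: str                      # current tip of the PR branch
    checked_sha: Optional[str] = None  # commit that received nonmalicious-checked
    # Outputs built in throwaway VMs, keyed by the commit they were built from.
    vm_outputs: Dict[str, str] = field(default_factory=dict)

    def label_checked(self):
        """A committer asserts that the current head is safe to run."""
        self.checked_sha = self.head_sha

    def push(self, new_sha):
        """A new commit or force-push moves the head; the label stays behind."""
        self.head_sha = new_sha

    @property
    def is_trusted(self):
        """Trusted only if the label was applied to exactly the current head."""
        return self.checked_sha is not None and self.checked_sha == self.head_sha

    def shareable_output(self):
        """Return the VM output for the head only once that commit is labelled."""
        if not self.is_trusted:
            return None
        return self.vm_outputs.get(self.head_sha)
```

Under these assumptions, an output built before labelling stays locked, becomes shareable once the same commit is labelled, and is locked again as soon as someone pushes, until a committer re-labels the new head.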
I suspect we could easily increase the currently available Hydra+ofborg resources tenfold this way, making it easy enough to test even every mass-rebuild PR against `master` (as opposed to batching in `staging`). I also expect it’ll make it easier for companies to contribute spare server power to this task, as it is often easier to do this than to get a company to approve donations. Taking this thought to its conclusion, we could make it even easier for companies to chime in by permitting mini-plugs in the automated build result posts, for example the bot post could read “All NixOS tests passing. These build results were contributed by: AprilTree – all you need for devops”. (This approach allows companies to support orgs from the marketing budget, instead of a “charitable causes” budget which most companies don’t have.)
Finally, community-process-wise we should:
- Use sharded-out sprints much more. They work so well. I was very impressed by how fast the migration of all Perl tests to Python was done, with ~265 tests being ported in record time by many contributors. We could do such things for changes like Qt wrapping as well, so that the vast majority of packages could be moved to a new approach swiftly, and the new approach gets broad checks.
- Make a sprint to give each unmaintained package a maintainer.