I maintain a number of packages in Nixpkgs, perhaps most notably the CUDA and JAX ecosystems. As much as I love Nix and Nixpkgs, I do not think that I can afford to continue maintaining packages within the current development model. Something needs to change.
The goal of this post is not to place blame on any individual or commit. Rather, I hope we can engage in a frank, honest discussion of shortcomings with our overall development workflow thus far.
In Nixpkgs, anyone at any time can break any of your packages. Recently, the latest `staging`, `staging-next`, and `python-updates` branches were merged into `master`, resulting in thousands of new commits breaking about half of all packages that I require in my day-to-day job. And I have no recourse other than pleading with commit authors to fix the breakages they have caused. These changes are made without any substantive form of CI (nixpkgs-review, etc.) and oftentimes without even a peer-reviewed PR. By the time these breaking changes reach `master`, they are bundled with hundreds of other commits (and breakages!), making bisections nearly impossible. How is a package maintainer expected to know when someone else breaks their build? And once they are made aware, how are they expected to sort through these hundreds of commits to debug?
The current system is simply not sustainable for downstream package maintainers, especially considering that these breaking changes come with no prior notice, no migration plan, and no alerting of failures. How is a package maintainer expected to reliably function in this environment?
I also sympathize with maintainers of widely-used packages, and those who work on massive, tree-wide changes. It’s incredibly stressful to understand all of the implications of your PRs. How are those maintainers expected to make changes with confidence?
We need a better system. IMHO a bare minimum is that we need a way to guarantee an “always green” status for at least some set of packages. I hope that this thread may kickstart a discussion.
I don’t claim to have any magical one-off solutions, but a few thoughts have come to mind:
- Merge trains. Every large project beyond a certain size uses them. They are fundamental infra for Google, Facebook, the Rust project, and just about everyone else. See this video for an ok explainer. The current staging branch situation is basically like a poor man’s merge train anyhow: Everyone adds commits to a branch and then some human-run, error-prone process takes place to merge them all into `master`. Merge trains just automate this process and guarantee that `master` always stays green. What’s stopping us from running an opt-in merge train pilot program?
- Is Nixpkgs too large? We have thousands upon thousands of packages, many of them of unclear maintenance status. As more and more packages are added and interconnected, the system becomes increasingly brittle and will eventually break down. Perhaps we need a federated model of some sort? Are flakes the answer?
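For anyone who hasn't worked with a merge train, here is a rough Python sketch of the core idea as I understand it. It is a toy model and not any real tool's implementation: `Change`, `run_merge_train`, and `toy_ci` are names I made up for illustration, and the "CI" is a stand-in for building some agreed-upon set of packages. Real merge trains usually test several candidates speculatively in parallel; this version goes one at a time to keep the logic visible.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical model: a "tree" is the ordered tuple of change names merged so far.
Tree = Tuple[str, ...]


@dataclass(frozen=True)
class Change:
    name: str


def run_merge_train(
    master: Tree,
    queue: List[Change],
    ci_passes: Callable[[Tree], bool],
) -> Tuple[Tree, List[str]]:
    """Process queued changes in order and return (new_master, rejected_names).

    Each change is tested on top of the current master, which already
    includes every earlier change that passed. A failing change is
    dropped from the train, so master only ever advances to a tree
    that CI has seen pass.
    """
    rejected: List[str] = []
    for change in queue:
        candidate = master + (change.name,)
        if ci_passes(candidate):
            master = candidate            # master advances, still green
        else:
            rejected.append(change.name)  # ejected; its author gets notified
    return master, rejected


def toy_ci(tree: Tree) -> bool:
    # Stand-in for building a chosen "always green" package set.
    return "break-cuda" not in tree


master, rejected = run_merge_train(
    ("base",),
    [Change("bump-glibc"), Change("break-cuda"), Change("update-jax")],
    toy_ci,
)
print(master)    # ('base', 'bump-glibc', 'update-jax')
print(rejected)  # ['break-cuda']
```

The point of the sketch is the contrast with a bulk branch merge: the breaking change is identified and rejected on its own, before it reaches `master`, so nobody has to bisect through hundreds of unrelated commits afterwards.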
Curious to hear others’ thoughts. I’m exhausted.