People have posted some very interesting thoughts and arguments here; I want to highlight a few to help form my own position.
Absolutely! Everyone, should you decide to leave, please pick someone to do your exit-interview with and post it here.
There is some evidence this is generally true of public projects. There is this classic analysis of Wikipedia author activity by Aaron Swartz (2006), but I remember having seen multiple studies supporting the notion.
What would you suggest to keep things going smoothly?
You totally nailed what I was subconsciously noticing for some time. Next to conceptual superiority this might be part of the secret sauce, and why I boldly predict that Nix and its ecosystem will continue to grow to the level of relevance of today’s Linux - despite all the historic cruft and the train wreck of a user interface. I still have hope for Nix not to become consultantware, and highly appreciate the work that goes into making it more accessible (such as options/package search, documentation, refactoring, wrappers, alternative user interfaces).
Strongly agree. I wouldn’t say that changes will get merged quickly, but everything considered it goes reasonably well.
I’ve read the idea multiple times now. The question is how this would empower maintainers, lower barriers to entry or motivate people to engage more.
I also think that having more tests in place would relieve maintainers, since contributors could see more immediately whether what they did actually works, without performing any extra steps. Tangential: can anyone explain to me why nixpkgs-review is not part of the default checks on GitHub?
Thank you for the hint. Would be great to see progress on this issue.
Funding: By now I’m convinced that bounties and especially recurrent funding would help with core tasks that are too laborious or unrewarding for the majority of volunteers. See above for community structure, see below for prior funding efforts.
Testing: We could clearly communicate that adding tests will improve turnaround on your pull requests. Finding the right place for that message would be the first step. Contributor guidelines? I doubt it would help to introduce a standard or formal process if there is not even an ad-hoc process.
I’m all for nudging, but I have concerns about requirements (although I’m not against them), because they raise entry barriers. Given the above assumption about the volume of one-time or drive-by contributions and their significant value to the community (both in terms of content and the potential for newcomers to become regulars), we might want to keep friction as low as possible.
Where was the attitude expressed that we don’t need tests? In the context of this topic, the question stands of how adding more tests, or adding nudges, requirements, or incentives to add tests, would impact contributor retention. From the comments it sounds like more automation is expected to reduce the workload on maintainers and to tighten the loop for contributors, which are good things. There is a possible downside to putting up additional up-front requirements: either the review process becomes longer, or the psychological threshold for even starting becomes so high that it precludes potential contributions. Reviewing new tests also costs time.
How can we even measure how much agreement we have about testing?
What do you think we can do about this? I’m convinced this aspect of change management is highly important. How could we strike a balance between trying to make people write better commit messages and scaring them off with too much (what they may perceive as) overhead?
What do you think we can do to reduce the amount of (accumulated) breakage?
Yes!
Same here.
I would absolutely not want to see prompts for donations on every occasion. If there is something people want to happen, they can do it themselves or pay money. It should just be made trivial to find out that the possibility exists and how to do either.
My provisional conclusion is that more paid grunt work is desirable, and it should somehow be directed by the community.
I get it, it’s a fine line between effective advertisement and spam.
It can be done in a way that recognizes people’s freedoms and respects their choice.
I would suggest that everyone interested read about Nudge theory - Wikipedia
and Libertarian paternalism - Wikipedia
People often need to see the options that are in their best interests advertised; otherwise we tend to overlook those options.
Nix/OS seems to have a pretty good testing framework for packages. May I suggest the Nix Summer of Tests!
Or even a physical, in-person one… have you ever heard of a ‘test’ sprint? Now that things seem to be opening up a little.
A lot of ‘work’ round here seems to be drudge work; Nix needs janitors… and not everyone is prepared to do that. The amazing developers I’ve worked alongside always want to do the most interesting work, and the best ones don’t really like and/or have real trouble writing excellent documentation that mere mortals can understand. Writing something and then ‘explaining something’ are very, very different disciplines. Seldom do they know how to write interstellar unit tests. Full integration testing is usually left to someone else. It’s a reason why most large software projects have a dedicated ‘testing’ team.
Having spent quite a bit of time ‘testing’ software, I’ve got a pretty good handle on this stuff, and not only the technical parts, but how the soft bits work too (humans!).
Because Nix/OS changes the fundamental way Unix works (in fact I’d call it ‘Unix compatible’, with a high degree of patching, non-FHS, etc.), I think testing is probably more crucial here than in other ‘distros’, where upstream usually have their own tests! But remember, if you don’t test anything, nothing ever breaks, so you don’t have to fix anything! :-)… LOL
People have reasons for hanging around this project… contributions can bolster one’s CV, so that people can get a better-paid job not doing drudge work. They avoid being a code janitor so they do not have to deal with having a low-paid ‘real’ janitor job, and instead get paid the big bucks to do something they actually love and like to do.
Personally I like janitor-type work… because it’s what keeps everything running, a mix between Hong Kong Phooey and Inspector Gadget’s dog. It’s not glamorous, but someone’s gotta do it.
There seem to be quite a lot of people who are donating to NixOS through OpenCollective. Although there is a public ledger and I am sure that the NixOS Foundation spends the money well, it is not clear at the moment of donating how the money will be used.
For example, I can understand that macOS users want to see improved macOS support. The NixOS Foundation spent a bunch of money to buy Mac Minis for Hydra for M1 support. But personally, I do not see why I should spend time and money on improving macOS support. Apple is one of the richest companies in the world, why should I help them? I don’t even use a Mac anymore. [1]
I think asking for donations would be much more effective if there were ‘rallies’ around specific goals and funding targets. It makes it much more concrete what your money is used for and makes the results more tangible. Examples of goals could be “Purchase a 64-core AMD Threadripper machine with 128GB RAM for running nixpkgs-review on PRs”, “Purchase 4 Apple M2 Mac Minis for Hydra”, or “Employ someone for N hours per week to triage CVEs”. Obviously, there shouldn’t be too many goals, or donations get spread too thinly. But I would find such goals much more exciting, and I would probably donate more if I knew my money would go to a Threadripper for building PRs or to someone who triages CVEs, than by generically donating to the NixOS Foundation and waiting to see what happens next.
[1] Note, my point is not ‘Apple sucks, we shouldn’t improve Nix on macOS’, it’s that I don’t want to spend my time or money on it. I fully understand if others do.
I’m not sure these are a given? Auto-merge for straightforward updates would be nice in some sense, but the real value added is getting to shift human effort from “make sure it works” to other concerns that get less attention. Even if we still required a human approval for merge, I’d see it as a win if we could be shifting the focus of what that review should look at (skim the actual package changes for new or removed features/dependencies, suspicious commits/additions, etc.)
One thing that gives me pause with auto-merge is how efficiently it’d streamline supply-chain attacks. However, I also think it would be fairly trivial for the existing review process to miss a supply-chain attack. In both cases, it might be better to address that concern with some sort of ~rate-limiting focused on making sure a release has been out in the wild for a certain amount of time before it’s eligible for an auto-merge.
I think this is an important insight: work that you somehow feel “must happen” but that you don’t care about personally is perhaps the most stressful/unrewarding?
Of course there are many aspects, but minimizing this segment seems useful.
It wouldn’t, but it could be a building block in reducing the “must-happen-but-don’t-care” work for those with commit rights. If for some packages we can decide reviewing only needs to validate we’re not bringing in code from ‘malicious’ sources, and not that it actually works, that might reduce the burden on the group with commit rights somewhat. As another example we could have nixos-unstable advance even when there’s failures in ‘lower-tier’ packages, so someone who cares about nixos-unstable advancing isn’t forced to care about those packages building correctly and holding up the channel.
How that’d work exactly would need to be fleshed out further, of course, so perhaps it’s not so useful to discuss this further in abstract, and someone who cares about this (ha! see what I did there?) should come up with a specific proposal to discuss further?
Expression format is similar to other contributions
Adheres to guidelines for a given language / ecosystem
There should be an attempt to run tests
Highly impure tests can be disabled
installCheckPhase should be used to verify that a trivial scenario works (e.g. <program> --version); see the sketch after this list
(low priority) passthru.tests should be used to create more meaningful user tests / scenarios
(lower priority) some form of nixosTest if several packages / services interact with each other (e.g. nextcloud + postgres)
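To make that list a bit more concrete, here is a minimal sketch of what those pieces could look like in a package expression. The package name, URL, and test attribute are invented for illustration; `doInstallCheck`/`installCheckPhase`, `passthru.tests`, and the `nixosTests` set are the standard nixpkgs mechanisms.

```nix
# Invented package, only to illustrate the checklist above.
{ lib, stdenv, fetchurl, nixosTests }:

stdenv.mkDerivation rec {
  pname = "somepkg";   # invented name
  version = "1.2.3";

  src = fetchurl {
    url = "https://example.org/somepkg-${version}.tar.gz";
    hash = lib.fakeHash; # placeholder
  };

  # Trivial smoke test: the installed binary runs and prints its version.
  doInstallCheck = true;
  installCheckPhase = ''
    runHook preInstallCheck
    $out/bin/somepkg --version
    runHook postInstallCheck
  '';

  # More meaningful scenarios, e.g. a NixOS VM test when several
  # packages/services interact with each other.
  passthru.tests = {
    inherit (nixosTests) somepkg; # hypothetical test attribute
  };
}
```

Even something this small gives a reviewer a machine-checkable baseline before they start reading the diff itself.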
At least with Python packages, we have done a pretty good job of emphasizing running the test suites and using pythonImportsCheck to validate that the Python package builds and is usable in at least the trivial case. That goes a long way toward “the diff looks alright, so as long as nixpkgs-review passes, it’s mergeable”.
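As a rough sketch of that convention (package name invented; `buildPythonPackage`, `pytestCheckHook`, and `pythonImportsCheck` are the real nixpkgs tools being referred to):

```nix
# Invented package, only to illustrate the Python testing convention.
{ lib, buildPythonPackage, fetchPypi, pytestCheckHook }:

buildPythonPackage rec {
  pname = "somepylib";  # invented name
  version = "0.1.0";

  src = fetchPypi {
    inherit pname version;
    hash = lib.fakeHash; # placeholder
  };

  # Run the upstream test suite during the check phase.
  checkInputs = [ pytestCheckHook ];

  # Cheap sanity check: the module at least imports from the built package.
  pythonImportsCheck = [ "somepylib" ];
}
```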
There was an RFC which wanted to standardize the PR workflow, but I think it was just too “rigid”, and its effects covered a wide range of human-oriented processes (e.g. reviewing, status updates, etc.). We could probably get an RFC passed by setting the bar lower and just applying standardization around what a “nixpkgs package” should contain.
This may not actually disagree, but: I do think it’s still worth being careful about, depending on the scale and terms? If there’s enough funding to move the mountain on a given task, I think it’s fine. If the funding only pays one person to move 1/10th of the mountain, I imagine there’s a risk of demotivating anyone else who was picking at the mountain.
I’m not sure. This sounds very transactional. Transactional is fine if (once?) it actually works transactionally (i.e., sufficient coverage to unlock automatic merges). But if people invest time expecting it to help and don’t see any obvious impact, it may be easy to learn the wrong lesson.
Shorter PR cycles for contributors are worth working towards, but I imagine a big value of testing is that it may give individuals a concrete, actionable way to address their own perpetual anxieties around updates in a way that we all benefit from?
Mmm. I think you might be over-interpreting how rigidly I mean standard and process, here. Perhaps I could be using better terms. I’ll unpack what I mean a bit in case it helps:
I don’t mean to suggest someone in the ivory tower writes the rules before there’s real practice to inform standards.
But, I do mean to suggest that it would be easy to waste time and sow confusion if we added such an attribute without providing what/how guidance and then let people dole out the auto-merge attribute based on their own idiomatic preferences for a year before coming to terms with the patterns.
By “a standard for deciding when packages are well-tested enough to grant this attribute”, I mean: there’s a clear, documented, discoverable, living explanation of the goal (with examples), and everyone is roughly on the same page about what it is.
Having something documented makes it clearer that changes to the standard are the appropriate venue for value conflicts over it.
There will still be misunderstandings and rogue actors, but most people writing and evaluating tests in good faith can use a clear standard to find agreement on individual PRs instead of devolving into corrosive conflicts over idiomatic values.
By “a process for deciding whether a package is sufficiently tested to meet the standard”, I mean we have a statement (again: clear, documented, discoverable, living) that helps answer obvious questions. A good place to start: Is it acceptable for an individual committer to decide by themselves whether or not to grant the attribute and do so via close/merge without explaining their decision?
I have mixed thoughts, here:
I didn’t go into much detail, but I don’t see “additional levers” as near-term things–more like steps to take after the low-hanging fruit is picked (common packages done, examples are abundant, the concept has demonstrated its value, the goals and process are dialed in, everyone knows the ropes, it’s simple enough for first-time contributors, and there’s spare capacity).
I agree that it’s bad to make people crawl through glass to contribute. In this context I’ll say friction is resistance that doesn’t add value. Competent but busy people will have better things to do.
But, I think it can be good to add reasonable requirements that both add value to the ecosystem and help filter the contributor pool towards the kind of person you want as a regular contributor.
I have a background in literary writing/editing. Literary publications that read unsolicited submissions with no reading fee often have a list of fairly-precise submission guidelines. Many of these guidelines are ~for helping the staff run an efficient editorial pipeline.
Some publications may politely ignore violations of their guidelines when they like a specific submission (especially if they get few submissions). Others aggressively toss out any that violate the guidelines.
Writers who can’t follow the submission guidelines might submit brilliant work, but they are also sending a weak signal that, on balance, they are more likely to waste scarce time and resources than those who follow them.
A more relatable version may be putting nitpicky response requirements in Craigslist ads and ignoring messages that don’t follow them (even if they pass your Turing smell-test).
Okay, how to bootstrap it then? I literally thought a well-placed line in the contributing guide would be a good start, and then see from there.
Thanks for the clarification.
I implied exactly that, and your publication example illustrates the trade-off well. Now, jonringer has already put his idea of requirements clearly enough. Why don’t we just copy-paste that into an RFC? We should ask the people involved directly more often.
Okay, how to bootstrap it then? I literally thought a well-placed line in the contributing guide would be a good start, and then see from there.
Hm. Maybe allowing maintainers to set meta.autoMergeOnTests or something, and then defining at least the version bumps as safe-from-malice, could be useful for those who opt in even before we get a lot of coverage? (Rough sketch at the end of this post.)
(Some cooldown for is-this-package-in-the-news might still apply, of course)
I guess we would need to drop the test code quality expectations as low as possible as long as they pass…
This probably doesn’t help much with packages that change enough that tests need to be updated all the time, as there the test changes need to be reviewed to make sure they maintain the intent (unless the change is by the only maintainer or is OK’d by another maintainer, but still).
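Purely as a hypothetical sketch of that opt-in: neither attribute below exists in nixpkgs today, and the cooldown number is just one way the “release has been out in the wild for a while” idea from earlier could be expressed as data that tooling could act on.

```nix
# Hypothetical meta attributes, not real nixpkgs options.
{
  meta = {
    autoMergeOnTests = true;    # maintainer opt-in: version bumps may auto-merge when passthru.tests pass
    autoMergeCooldownDays = 14; # hypothetical cooldown: upstream release must be at least this old
  };
}
```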
With the caveat that I’m unsure how you mean bootstrap here, I think it makes sense to start with some seasoned committers/maintainers and a few of the packages that have been rolling over the fastest recently (past year?).
Here's my swing at filtering something out of the commit log:
This might answer ~MVP questions like whether auto-merge is actually feasible, how much real-world value there is here, and what kind of tests deliver that value, within a few months?
At this point in the discussion, let’s assume I might be able to free a varying number of hours, between 2 and 5 per week, to dedicate towards resolving some of the issues we publicly ponder here, and all I need is somebody to tell me what to do and to legitimize and authorize my work (so I can have sufficient confidence the hours will be effectively spent and not in vain).
Who in this thread, taking into account ± my skillset, can tell me what to do and defend it in front of the community, so that I could focus calmly and confidently on doing it?
Please go ahead; if it’s moderately meaningful and would earn me some , I might consider engaging.
You must be able to somehow ensure it’s meaningful and effective, though.
This is an offer & an experiment if the community is able to deliver those baseline guarantees of engagement.
Before we retain contributors, we need to capture them first
I think the documentation needs two types of work:
More attention to what’s in the existing structure. IMO, this is going very well, but slowly, and it’s always in danger of falling behind as the ecosystem grows (e.g. with flakes), so the more work it gets the better. (And FWIW I think it could also be fun work.)
Ultimately, it seems to me that we will need someone employed at least half time to go through the whole thing, including everything that’s in all the manuals and the wiki and various people’s blogs and gists and nix pills and so on, and make a coherent structure. (I think the manuals are great, and coherent, but very incomplete, and it’s not at all obvious how to make them complete.) I predict that this paid job will materialise one day, either by this community finding the money or by some big company forking NixOS. Now, personally (although I am not an expert), I think that this big organising job should be done all at once rather than piecemeal.
If the counterparty to my proposed contract can furthermore commit to ensuring positive and self-enhancing social and environmental feedback loops, we are not bound to this at all: the possibilities are endless.
I guess all that’s missing now is that counterparty. Gentlethem, …
I think the PR does a good job of addressing “how you’d write a test from scratch.”
It occurred to me while reading through it that the other half, “what would be meaningful”, is unactionably vague. I’m not really sure how to bundle up what I think that includes, so instead I’ve just compiled a list of things/questions I’d be happy to learn from documentation on “what would be meaningful” (numbered for reference but in no particular order):
Which packages are generally agreed to be a paragon for how to test similar packages?
When/where do these tests run and what happens when they pass/fail?
What do we hope the time invested in them enables us to accomplish?
What’s missing between here and there? (i.e., is it just about incrementally filling in new tests? Do we need to review and raise the quality of existing tests? Will tests under-deliver on these goals until some tool/abstraction/infra gaps get filled in?)
Where should we focus energy to this end? (Help with or wait for toolchain improvements? Expand minimal testing to more packages? Improve the depth/coverage where packages already have tests? Review the quality of existing tests?)
With respect to logistics/norms, how should people approach the task? (Is it okay to PR a single new test for a single package? Add conceptually-similar tests to multiple packages with one PR? Only add/touch tests when you’re otherwise modifying a package?)
What do we want to be testing?
Are there (kinds of?) packages we shouldn’t bother writing this kind of test for? (how should we recognize them?)
Under what circumstances do we want tests to fail?
What kinds of functionality do we want to test (and how should we prioritize?)
What kinds of functionality should we avoid testing?
Where is the line between too-trivial and a minimum-meaningful-test?
What kinds of test, if any, should we see as essential (or at least high RoI) for all packages of a given type? (prioritized if possible)
How should we evaluate whether a package’s tests (as a whole, or individually) take too long to run?
How should we see the value of adding tests for package functionality that is already being exercised by other packages? (ex: adding a test to ensure a library is usable when it’s already used by dozens of active packages)
Are there common anti-patterns in use that I should be cautious to avoid picking up, look for in reviews, and fix if I encounter them?
How should we weight the value of testing only the local package vs multi-package synthesis?
What ~code-quality standards should apply? (mostly thinking about comment/naming standards)
From a drive-by contributor’s point of view, the contribution guidelines are already very lengthy and hard to digest. They jump right into a fairly dense level of technical detail, and so does the document on GitHub. Your list is technically reasonable, but scares me off just looking at it, if I imagine I have to consider all of that to start out.
That is to say, most of the requirements are self-evident if you care about software development, and I do try to be considerate. The question here is about the effectiveness and efficiency of communication. This is one of the dials to calibrate the contributor pool.
As a first-time reader I’d like to know what the high-level requirements or norms of judgment are, and only maintainers can answer that. I’d be happy to draft an introduction to the guidelines once there is enough meaningful information for me to write down.