Expression format is similar to other contributions
Adheres to guidelines for a given language / ecosystem
There should be an attempt to run tests
Highly impure tests can be disabled
installCheckPhase should be used to verify that a trivial scenario works (e.g. `<program> --version`)
(low priority) passthru.tests should be used to create more meaningful user tests / scenarios
(lower priority) some form of nixosTest if several packages / services interact with each other (e.g. nextcloud + postgres)
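To make the checklist above concrete, here is a minimal sketch of how those pieces fit together in a nixpkgs expression. The package name, URL, and hash are placeholders, and `nixosTests.example` stands in for whatever VM test would cover the service-interaction case:

```nix
{ lib, stdenv, fetchurl, nixosTests }:

stdenv.mkDerivation rec {
  pname = "example";        # hypothetical package
  version = "1.2.3";

  src = fetchurl {
    url = "https://example.org/example-${version}.tar.gz";
    hash = lib.fakeHash;    # placeholder
  };

  # attempt to run upstream's test suite
  doCheck = true;

  # trivial smoke test against the installed output
  doInstallCheck = true;
  installCheckPhase = ''
    runHook preInstallCheck
    $out/bin/example --version
    runHook postInstallCheck
  '';

  # more meaningful user scenarios, e.g. a NixOS VM test for service interaction
  passthru.tests = {
    inherit (nixosTests) example;   # hypothetical nixosTest
  };

  meta = with lib; {
    description = "Example package";
    license = licenses.mit;
    maintainers = [ ];
  };
}
```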
At least with python packages, we have done a pretty good job of emphasizing running the test suites and using pythonImportsCheck to validate that the python package builds and is usable in at least the trivial case. This goes a long way toward “the diff looks alright, so as long as nixpkgs-review passes, it’s mergeable”.
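Concretely, a sketch of that pattern (the package name, hash, and disabled test name are placeholders; `pytestCheckHook`, `disabledTests`, and `pythonImportsCheck` are the existing nixpkgs mechanisms):

```nix
{ lib, buildPythonPackage, fetchPypi, pytestCheckHook }:

buildPythonPackage rec {
  pname = "somepkg";          # hypothetical package
  version = "1.0.0";
  format = "setuptools";      # assumes a setup.py-based project

  src = fetchPypi {
    inherit pname version;
    hash = lib.fakeHash;      # placeholder
  };

  # run the upstream test suite with pytest
  nativeCheckInputs = [ pytestCheckHook ];

  # highly impure tests can be skipped individually
  disabledTests = [ "test_requires_network" ];  # hypothetical test name

  # verifies the package is importable after the build, i.e. usable in the trivial case
  pythonImportsCheck = [ "somepkg" ];
}
```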
There was an RFC which wanted to standardize the PR workflow, but I think it was too “rigid” and its scope covered a wide range of human-oriented processes (e.g. reviewing, status updates, etc.). We could probably get an RFC passed by setting the bar lower and just applying standardization around what a “nixpkgs package” should contain.
This may not actually disagree, but: I do think it’s still worth being careful about, depending on the scale and terms? If there’s enough funding to move the mountain on a given task, I think it’s fine. If the funding only pays one person to move 1/10th of the mountain, I imagine there’s a risk of demotivating anyone else who was picking at the mountain.
I’m not sure. This sounds very transactional. Transactional is fine if (once?) it actually works transactionally (i.e., sufficient coverage to unlock automatic merges). But if people invest time expecting it to help and don’t see any obvious impact, it may be easy to learn the wrong lesson.
Shorter PR cycles for contributors are worth working towards, but I imagine a big value of testing is that it may give individuals a concrete, actionable way to address their own perpetual anxieties around updates in a way that we all benefit from?
Mmm. I think you might be over-interpreting how rigidly I mean standard and process, here. Perhaps I could be using better terms. I’ll unpack what I mean a bit in case it helps:
I don’t mean to suggest someone in the ivory tower writes the rules before there’s real practice to inform standards.
But, I do mean to suggest that it would be easy to waste time and sow confusion if we added such an attribute without providing what/how guidance and then let people dole out the auto-merge attribute based on their own idiomatic preferences for a year before coming to terms with the patterns.
By “a standard for deciding when packages are well-tested enough to grant this attribute”, I mean: there’s a clear, documented, discoverable, living explanation of the goal (with examples), and everyone is roughly on the same page about what it is.
Having something documented makes it clearer that changes to the standard are the appropriate venue for value conflicts over it.
There will still be misunderstandings and rogue actors, but most people writing and evaluating tests in good faith can use a clear standard to find agreement on individual PRs instead of devolving into corrosive conflicts over idiomatic values.
By “a process for deciding whether a package is sufficiently tested to meet the standard”, I mean we have a statement (again: clear, documented, discoverable, living) that helps answer obvious questions. A good place to start: Is it acceptable for an individual committer to decide by themselves whether or not to grant the attribute and do so via close/merge without explaining their decision?
I have mixed thoughts, here:
I didn’t go into much detail, but I don’t see “additional levers” as near-term things–more like steps to take after the low-hanging fruit is picked (common packages done, examples are abundant, the concept has demonstrated its value, the goals and process are dialed in, everyone knows the ropes, it’s simple enough for first-time contributors, and there’s spare capacity).
I agree that it’s bad to make people crawl through glass to contribute. In this context I’ll say friction is resistance that doesn’t add value. Competent but busy people will have better things to do.
But, I think it can be good to add reasonable requirements that both add value to the ecosystem and help filter the contributor pool towards the kind of person you want as a regular contributor.
I have a background in literary writing/editing. Literary publications that read unsolicited submissions with no reading fee often have a list of fairly-precise submission guidelines. Many of these guidelines are ~for helping the staff run an efficient editorial pipeline.
Some publications may politely ignore violations of their guidelines when they like a specific submission (especially if they get few submissions). Others aggressively toss out any that violate the guidelines.
Writers who can’t follow the submission guidelines might submit brilliant work, but they are also sending a weak signal that, on balance, they are more likely to waste scarce time and resources than those who follow them.
A more relatable version may be putting nitpicky response requirements in Craigslist ads and ignoring messages that don’t follow them (even if they pass your Turing smell-test).
Okay, how do we bootstrap it then? I was literally thinking a well-placed line in the contributing guide would be a good start, and then we see from there.
Thanks for the clarification.
I implied exactly that, and your publication example illustrates the trade-off well. Now, jonringer has already laid out his idea of the requirements clearly enough. Why don’t we just copy-paste that into an RFC? We should more often ask the people involved directly.
> Okay, how do we bootstrap it then? I was literally thinking a well-placed line in the contributing guide would be a good start, and then we see from there.
Hm. Maybe allowing maintainers to set meta.autoMergeOnTests or something (sketched below), and then defining at least version bumps as safe-from-malice, could be useful for those who opt in even before we get a lot of coverage?
(Some cooldown for is-this-package-in-the-news might still apply, of course)
I guess we would need to drop the test code quality expectations as low as possible as long as they pass…
This probably doesn’t help much with packages that change enough that tests need to be updated all the time, since there the test changes need to be reviewed to make sure they preserve the intent (unless the change is by the only maintainer or OK’d by another maintainer, but still).
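For illustration only, here is roughly what such an opt-in could look like; to be clear, `meta.autoMergeOnTests` is the hypothetical attribute floated above, not something nixpkgs or any bot currently understands:

```nix
stdenv.mkDerivation {
  # ... the usual package definition ...

  meta = {
    # Hypothetical opt-in: allow automatic merging of plain version bumps
    # once the package's own tests and nixpkgs-review pass.
    autoMergeOnTests = true;
  };
}
```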
With the caveat that I’m unsure how you mean bootstrap here, I think it makes sense to start with some seasoned committers/maintainers and a few of the packages that have been rolling over the fastest recently (past year?)
here's my swing at filtering something out of the commit log:
This might answer ~MVP questions like whether auto-merge is actually feasible, how much real-world value there is here, and what kind of tests deliver that value, within a few months?
At this point in the discussion, let’s assume I might be able to free between 2 and 5 hours per week to dedicate towards resolving some of the issues we publicly ponder here, and all I need is somebody to tell me what to do and to legitimize and authorize my work (so I can have sufficient confidence the hours will be effectively spent and not in vain).
Who in this thread, taking into account (±) my skillset, can tell me what to do and defend it in front of the community, so that I could focus calmly and confidently on doing it?
Please go ahead, if it’s moderately meaningful and would earn me some , I might consider engaging.
You must be able to somehow ensure it’s meaningful and effective, though.
This is an offer & an experiment if the community is able to deliver those baseline guarantees of engagement.
Before we retain contributors, we need to capture them first
I think the documentation needs two types of work:
More attention to what’s in the existing structure. IMO, this is going very well, but slowly, and it’s always in danger of falling behind as the ecosystem grows (e.g. with flakes), so the more work it gets the better. (And FWIW I think it could also be fun work.)
Ultimately, it seems to me that we will need someone employed at least half time to go through the whole thing, including everything that’s in all the manuals and the wiki and various people’s blogs and gists and nix pills and so on, and make a coherent structure. (I think the manuals are great, and coherent, but very incomplete, and it’s not at all obvious how to make them complete.) I predict that this paid job will materialise one day, either by this community finding the money or by some big company forking NixOS. Now, personally (although I am not an expert), I think that this big organising job should be done all at once rather than piecemeal.
If the counterparty to my proposed contract can furthermore commit to ensuring positive and self-enhancing social and environmental feedback loops, we are not bound to this at all: the possibilities are endless.
I guess all that’s missing now is that counterparty. Gentlethem, …
I think the PR does a good job of addressing “how you’d write a test from scratch.”
It occurred to me while reading through it that the other half, “what would be meaningful”, is unactionably vague. I’m not really sure how to bundle up what I think that includes, so instead I’ve just compiled a list of things/questions I’d be happy to learn from documentation on “what would be meaningful” (numbered for reference but in no particular order):
Which packages are generally agreed to be a paragon for how to test similar packages?
When/where do these tests run and what happens when they pass/fail?
What do we hope the time invested in them enables us to accomplish?
What’s missing between here and there? (i.e., is it just about incrementally filling in new tests? Do we need to review and raise the quality of existing tests? Will tests under-deliver on these goals until some tool/abstraction/infra gaps get filled in?)
Where should we focus energy to this end? (Help with or wait for toolchain improvements? Expand minimal testing to more packages? Improve the depth/coverage where packages already have tests? Review the quality of existing tests?)
With respect to logistics/norms, how should people approach the task? (Is it okay to PR a single new test for a single package? Add conceptually-similar tests to multiple packages with one PR? Only add/touch tests when you’re otherwise modifying a package?)
What do we want to be testing?
Are there (kinds of?) packages we shouldn’t bother writing this kind of test for? (how should we recognize them?)
Under what circumstances do we want tests to fail?
What kinds of functionality do we want to test (and how should we prioritize?)
What kinds of functionality should we avoid testing?
Where is the line between too-trivial and a minimum-meaningful-test?
What kinds of test, if any, should we see as essential (or at least high RoI) for all packages of a given type? (prioritized if possible)
How should we evaluate whether a package’s tests (as a whole, or individually) take too long to run?
How should we see the value of adding tests for package functionality that is already being exercised by other packages? (ex: adding a test to ensure a library is usable when it’s already used by dozens of active packages)
Are there common anti-patterns in use that I should be cautious to avoid picking up, look for in reviews, and fix if I encounter them?
How should we weight the value of testing only the local package vs multi-package synthesis?
What ~code-quality standards should apply? (mostly thinking about comment/naming standards)
From a drive-by contributor’s point of view, the contribution guidelines are already very lengthy and hard to digest. They jump right into a fairly dense level of technical detail, and so does the document on GitHub. Your list is technically reasonable, but scares me off just looking at it, if I imagine I have to consider all of that to start out.
That is to say, most of the requirements are self-evident if you care about software-development, and I do try to be considerate. The question here is about effectiveness and efficiency of communication. This is one of the dials to calibrate the contributor pool.
As a first-time reader I’d like to know what the high-level requirements or norms of judgment are, and only maintainers can answer that. I’d be happy to draft an introduction to the guidelines once there is enough meaningful information for me to write down.
- with a decent and well-organized set of commands (not more than X)
- and also some templates per language / framework
- so that I can do something along the lines of `nix-tmpl goModule > folder/default.nix` (see the sketch further below)
- and of course other shift-left amenities such as nixpkgs-review
Regulars within every package subsystem would be encouraged to keep their templates up to date according to latest best practices, so they can also save on chore tasks / reviews.
“Lead by example” is currently difficult, because the codebase is not homogeneous enough. With a template, setting precedents becomes deliberate, manageable & intuitive.
Such templates would also have a positive impact on projects like divnix/devos (which I just happen to be familiar with), which very heavily encourages upstreaming. Hence, an independent NixOS/pkgs-template might be worth establishing.
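To illustrate, here is a sketch of what a `nix-tmpl goModule`-style template might emit (the tool itself is the idea being proposed, not an existing command; every `CHANGEME` is a placeholder):

```nix
{ lib, buildGoModule, fetchFromGitHub }:

buildGoModule rec {
  pname = "CHANGEME";
  version = "0.0.0";

  src = fetchFromGitHub {
    owner = "CHANGEME";
    repo = pname;
    rev = "v${version}";
    hash = lib.fakeHash;      # CHANGEME
  };

  vendorHash = lib.fakeHash;  # CHANGEME

  # keep the upstream Go test suite enabled by default
  doCheck = true;

  meta = with lib; {
    description = "CHANGEME";
    homepage = "https://github.com/CHANGEME/CHANGEME";
    license = licenses.mit;   # CHANGEME
    maintainers = [ ];
  };
}
```

A regular in the Go subsystem could then keep this single skeleton in sync with current best practices, instead of re-reviewing the same boilerplate in every PR.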
Having more automation around contributions is always good, and you’re free (and if anyone cares, I strongly encourage you) to build whatever you think makes sense and show it around. Nonetheless, the guidelines are central to get across the expected mindset. You cannot template and automate everything, especially things like communicating reasons for doing X and not Y.
As I said here (and elaborated a little in the PR thread), I think the PR takes a big bite out of documenting “how you’d write a test from scratch.” Side-by-side, “what would be meaningful” is not a question to expect drive-by contributors to have a handle on, and “how you’d write a test from scratch” might be.
Getting a handle on “what would be meaningful” seems like a good first step toward clarifying whether there are straightforward+meaningful tests we should prescribe to new contributors in the contributing section, or if writing meaningful tests just entails more perspective than we can expect.
```
$ 🔨 Welcome to nixpkgs

[folder tree checks]
  evalnix - Check Nix parsing
  fmt     - Check Nix formatting

[general commands]
  menu    - prints this menu

[nixpkgs]$
```
Since there has been no feedback so far, I’ll reiterate that to move forward consensually, for example to include more conveniences or to implement templates, I’ll need (at least some) support from the regulars around here. So please step up (or ping them, if you know somebody who could help out).
Please, @all, ensure that this doesn’t die off into indecisioning*.
* an ad hoc coinage for the practice of stalling things due to a lack of authority or decision making
How this is related: to retain contributors, we must first efficiently capture and develop them into retainables.
First of all, who is capable of moving the contributing section to Markdown? I don’t think it is a reasonable expectation to improve it (significantly) while it’s still in XML. For one, the target audience (new contributors) will tend to run away from XML as far as they possibly can.
It does not mean that they stopped using NixOS on their laptops and in production (I can speak for myself here and for some of the repositories I’ve seen). I see the problem as this: following the NixOS release cycle is too expensive in terms of supporting existing deployments. Too many things are declared deprecated or refactored for no purpose. Contributing to master is fun when one has no legacy of one’s own. Those who could be “success stories” of NixOS adoption fade into the shadows.