Decision Making 2.0! (1.0.1?)

What I see: I see PRs stalling. I see discussions circling. Overall, I see very inefficient decision making processes. I see an overall situation that chews quite heavily on all our resources and prevents us from unlocking a (still) hidden wealth of progress, impulse and momentum.

Where I stand: This situation frustrates (not only) me. As a guy who likes fixing things at the root, it keeps me awake thinking about how to fix this (at the root). It would feel really incredible if we became a kick-ass efficient rock-n-roll community that has an easy time completely taking over the domain of system administration with skill and wit, in style.

What I want: I want you to join me in a grassroots way and try to improve the situation around decision making a little bit through your own interactions in the day-to-day work. Please do whatever you deem pertinent; however, I’d suggest applying and teaching the two most fundamental conceptual guardrails of decision making: pareto-improvement vs trade-off.

I’d love to see PRs more clearly labeled as to which decision category they belong to, and to help others tell them apart.

A pareto-improvement is a new situation where some agents will gain and no agent will lose. The initial situation is called “pareto-dominated”.

A trade-off-improvement is a new situation where some agents will gain and some agents will lose, and the net gain is (deemed) positive. The initial situation is called “trade-off-dominated”.
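
To make the two definitions concrete, here is a minimal sketch in Python. It assumes we could somehow assign each affected agent a numeric gain (negative meaning a loss), which is exactly the hard part in practice. The names `Category`, `classify` and the example agents are hypothetical and only serve as illustration.

```python
from enum import Enum


class Category(Enum):
    PARETO_IMPROVEMENT = "pareto-improvement"
    TRADE_OFF_IMPROVEMENT = "trade-off-improvement"
    NO_IMPROVEMENT = "no improvement"


def classify(gains: dict[str, float]) -> Category:
    """Classify a change from assumed per-agent gains (negative = loss)."""
    someone_gains = any(g > 0 for g in gains.values())
    someone_loses = any(g < 0 for g in gains.values())
    if someone_gains and not someone_loses:
        # Some agents gain, no agent loses.
        return Category.PARETO_IMPROVEMENT
    if someone_gains and someone_loses and sum(gains.values()) > 0:
        # Some gain, some lose, and the net gain is (deemed) positive.
        return Category.TRADE_OFF_IMPROVEMENT
    return Category.NO_IMPROVEMENT


# Hypothetical numbers, purely for illustration.
pure_fix = {"submitter": 1.0, "users": 2.0, "reviewers": 0.0}
breaking_change = {"submitter": 1.0, "most_users": 2.0, "some_users": -1.0}

print(classify(pure_fix).value)         # pareto-improvement
print(classify(breaking_change).value)  # trade-off-improvement
```

The sketch shows that the distinction rests on a single question: does anybody lose? The numbers themselves remain a matter of judgment.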

Why it’s important: A certain meta understanding of a particular argument helps all parties involved to develop that argument more productively. I also hope that well-identified pareto-improvements will get a better chance at merging. A better chance at merging, attributed to a clearly observable criterion (labeled “pareto-improvement”), is an incentive to author pareto-dominated patches. So my assumption goes.


Note: this is a practical and immediately applicable spin-off motion carved out of the community manifesto draft. It is a normative meta post for all intents and purposes that seeks to inspire mind-share.

8 Likes

I have to admit that I have trouble following this post, what are you concretely asking?

The solution to avoid stale PRs is to go out and start reviewing and encourage other people to do so. :wink: If you feel certain labels are missing, open a PR in the ofborg repo (when applicable) or write a label bot and showcase it on your nixpkgs fork.

Anyone reading this who is not an active nixpkgs reviewer: please consider reviewing PRs. Your work will be very much appreciated by the community. If you have reviewed a PR, but cannot merge it and it’s stuck, use the PRs already reviewed topic. Also feel free to ping me (@danieldk) in the PR, as long as it is not anything cryptocurrency-related.

2 Likes

@blaggacao asked privately to clarify my post. So, here it goes:

I just have trouble understanding the post and what the goal is. Maybe it helps if I formulate in one sentence what I think the post proposes:

You propose that two additional labels are added for PRs, namely: pareto-dominated and trade-off-dominated.

Could you confirm if that is correct?

If it is, then:

  • Who adds these labels? The PR submitter? Reviewers? Random passer-by?

  • pareto-dominated and trade-off-dominated may be clear to people who are familiar with management literature. But to those who aren’t, they don’t mean much. I think it would benefit the discussion if there were a concrete proposal of labels that can also be understood by ‘outsiders’.

  • It is unclear to me how you would decide the correct label. Let’s take the simple example of a version bump. At face value you’d think that is what you call pareto-dominated. But maybe the version update makes incompatible configuration changes and there is a user who has to spend an hour to update their configuration (e.g. how my old tmux configuration files do not work with current versions anymore). Or the new version introduces a yet-undiscovered bug that negatively affects users but not the PR submitter. Perhaps the only cases that are pareto-dominated are pure bug fixes, where no-one is dependent on buggy behavior.

Assuming that I understood your proposal correctly, I agree that in theory classifying whether a PR is a pure win or a net win would be great. However, I don’t see how you would operationalize or measure this.

6 Likes

For what it is worth, most PRs get merged pretty efficiently. Over the last several months we’ve had a fairly low amount of growth of open PRs: Grafana

I agree there is not always efficient decision making, but I think we’re doing fairly okay. The RFC process exists and works to this end.

18 Likes

It may be nice to have standardized language for this. But I’m not convinced it would address any root issues. The ability to correctly make this assessment is essentially a PR review. There are few cases that are trade off dominated, and a PR may languish - but at that point the “label” is clear and the blocker is something else.

In other words, I don’t see a difference in practice.

3 Likes

I guess the aim is to have a standard language to discuss whether it is time to stop the nitpicking and merge whatever there is now, since its being an improvement on the current state is not objectionable.

2 Likes

I’d want to add the following:

Please note that I boldly title “2.0” only to retract immediately to a “1.0.1”.

This is an idea of common language, basically a suggestion to add (on a voluntary basis) a new glossary entry to our shared vocabulary (@danieldk I intend this to answer your question. Please PM me if not.)

This is small enough of a “change” to only count as “patch release”.

But hey, common glossaries are a prerequisite for group cohesion and efficient communication. So this might only be a “1.0.1”, but it might also be the beginning of a productive series of patches to our (non-RFC) decision making processes.

Cheers :sparkles:


We probably are. Or are we? This is a matter of measurement. I think we can do ourselves a favor by keeping stretch goals handy and plotting ourselves against an imaginary ideal solution.

There are ex-servicemen among the participants (me included), and maybe also some people with more decision making theory background in their professions.

I have this question for all: What would an imaginary ideal world (“2.0”) look like that we could plot our current state of decision making affairs against? (dreaming encouraged, sharing those dreams even more so, if you wish)

As an attempt at a constructive proposal: sometimes it is hard to know whether or not a pull request has the potential to break something that depends on old behavior. It would be helpful to know the relative usage of a package, download metrics from the binary cache, and other indicators of use. This would allow us to be more aggressive in refactoring, iterating, and making cleanup changes to relatively unused parts of Nixpkgs that are in need of them.

Then to address nitpicks breaking down otherwise good PRs; we can have a principle of “do no harm” or “PR does no harm”. This would work if the nits persist somewhere, but it would be frustrating for the reviewer to constantly repeat themselves each time.

I thought of asking reviewers to mark comments with “mandatory for merge” vs “nice to have”. But that probably won’t work as it’s more work. The real culprit here is that GitHub is fairly heavyweight in incorporating a slew of small changes.

5 Likes

I appreciate your entire input.

This, in particular, leads me to another thought or idea: instead of any formulaic approach, that quickly gets annoying (see PR templates), reviewers could be coached [on a voluntary basis] on effective expectation setting and management:

When one is the first reviewer, one can use the summary comment to set clear expectations as to when a PR is mergeable.

Setting those expectations not only sets the stage for consecutive reviewers, but also gives practical guidance for the final merger.

The key is that whoever happens to be this reviewer must be put into a position of entitlement in order to make that considerate call.

To support adoption, we can formulate this as a formulaic “first reviewer privilege” (FRP): the first to review is also expected to set the stage [and tone] that shall lead to a merge.

The solution to avoid stale PRs is to go out and start reviewing and encourage other people to do so.

While I totally agree with this, I have to admit, as someone without commit access, it can be kind of frustrating to review PRs only to see that they still have yet to be merged, and watch the conflicts stack up and the authors lose patience/interest.

Reviewing is great, but there is still the problem of actually getting a hold of someone with the time and interest to merge the damn thing. Which is why I thought BORS might be able to assist us in that respect (and I still think that), but I’ve seen enough push back on the idea that I doubt it’s going to happen.

For now, I will still attempt to look for PRs that I can leave meaningful reviews on, but we could definitely do with some improvement in the merging process, at least from my perspective.

4 Likes

Let’s put that forth for a second, since there seems to be a lot of confusion about this:

A BORS implementation, as far as I can tell, is strictly pareto-dominated and I haven’t yet seen one argument that I could remember that demonstrates otherwise (= being trade-off-dominated). Though, a lot of people who opposed BORS until recently don’t seem to have been able to identify this fact.

My heart fills with joy realizing this glossary is already being useful :heart: .

1 Like

Well, a committer needs to buy into the sufficiency (although for stylistic things it might eventually be possible to set that expectation), and if a committer thinks the first review (from someone who is not a maintainer of the packages) demands too much from a code style point of view, I would like them to ignore it and merge…

Nixpkgs targets enough environments that it is hard to write a strictly correct BORS policy, and yet harder to convince everyone it does not have unexpected consequences. (Silly example: a change to some package definition plus to all-packages.nix might cause zero rebuilds on the three native platforms and fix one cross-compilation case, but also do something very wrong in many other cross-compilation cases for many other packages; label-wise things look better than they are). I have seen objections substantially similar to that.

(I am not sure I can formulate a policy that is clearly convincing to myself and does not look like silly overhead; depends on what you want to catch, of course)

What do you want instead? More rigorous closing of PRs that aren’t ready? That will just make people angry. Merging them instead, even if they are not ready? Probably a bad idea.

Also, RFCs exist, but not many people seem to be interested in using them.

Less than one in 10 PRs is «ready» when it’s merged. Half of PRs that take more than one update before merging are «ready» before the last update. I can imagine reasonable-ish definitions of «ready» making either claim true.

Changing the notion of «ready» — or slightly reducing the variance — can be useful.

I’ve found the marvin-mk2 bot to be useful for this. It basically helps keep the process going by automatically assigning reviewers/mergers, and reminding them at specified intervals.

I’m not sure it’s still active, as I haven’t been assigned any reviews by it recently (it’s opt-in), but it’s helped get some PRs over the finish line where there was quite a bit of back and forth needed (e.g. this one).

/cc @timokau

2 Likes

Some PRs stall when there are so many reviewers that it becomes unclear which reviews are blockers, which reviewers are “drive-by” and which are in a position to commit. Some of this is politeness, we defer to the first response, but also understand others may have a better understanding.

Or someone tries to be helpful and adds a review, but cannot merge themselves. This can confuse the original submitter.

The thoughts we’d like to express are:

  1. “Here is my review. I am ready to merge if this is fixed.”
  2. “Here is my review to help the PR, I have no further commitment or responsibility.”
  3. “As a committer, I don’t have enough domain expertise and would need further approvals from experts.”
  4. “As a committer I trust this specific submitter’s expertise, but want additional input from community.”
  5. “As an expert, here is an opinion.”
  6. another?

I think some confusion occurs when people don’t know in which capacity a review/comment was made, or assume which it is.
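
As a rough illustration of how such capacities could be made explicit, here is a small Python sketch. Everything in it is an assumption for the sake of the example: the names `ReviewMode`, `Review` and `merge_commitments` are hypothetical, and neither GitHub nor any existing bot is claimed to offer this.

```python
from dataclasses import dataclass
from enum import Enum


class ReviewMode(Enum):
    # One member per "thought" in the list above.
    WILL_MERGE_WHEN_FIXED = 1
    HELPING_NO_COMMITMENT = 2
    COMMITTER_NEEDS_EXPERTS = 3
    COMMITTER_WANTS_COMMUNITY_INPUT = 4
    EXPERT_OPINION = 5


@dataclass
class Review:
    reviewer: str      # hypothetical handle
    mode: ReviewMode   # the capacity the review was made in


def merge_commitments(reviews: list[Review]) -> list[str]:
    """Return the reviewers who stated they will merge once issues are fixed."""
    return [r.reviewer for r in reviews
            if r.mode is ReviewMode.WILL_MERGE_WHEN_FIXED]


reviews = [
    Review("alice", ReviewMode.WILL_MERGE_WHEN_FIXED),
    Review("bob", ReviewMode.HELPING_NO_COMMITMENT),
    Review("carol", ReviewMode.EXPERT_OPINION),
]
print(merge_commitments(reviews))  # ['alice']
```

With the capacity stated up front, a submitter could see at a glance whose feedback is tied to a merge commitment and whose is advisory.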

12 Likes

@tomberek I think these “modes of review” are an important enough complementary field of thought that we probably should dedicate its own thread to it. I’m happy that our interaction here led you to post this.

Therefore, I’d like to ask a moderator to split this off and maybe @tomberek can reformulate this to make it its own introduction to a new thread. [“Review 2.0! (1.0.1?)”]

I also think on pareto and tradeoff, everything that can be said at this point is already said, and we should shift the discussion towards the “modes of review”, when looking for ways to further address the underlying issues.

Thank you all! :sparkles:

1 Like

I think this topic is worth reviving because the issues stated are still present and have been triggered again by Announcing nixpkgs-merge-bot. For the sake of keeping things separated (that discussion there and the meta about decision making here): the community needs a better decision making process, IMO. Not just regarding PRs, but in general it is very unclear:

  • what the decision making process even is
  • who/which group has the last say
  • who can contribute
  • where decisions are made

If a mod thinks this should be a new thread, do fork it :slight_smile:

The “problem”

It feels like committers have a large say in many things and can (somewhat) unilaterally make decisions that impact the community. A PR can be self-merged by a committer, RFCs that kind of stall can just be merged to “get things moving”, a new domain can be declared an official source of documentation, repositories can just be created in the official namespace with no awareness of the rest of the community, etc.

It’s also difficult to track what’s going on, not only in terms of development but in the internal Nix ecosystem too. There seem to be personal or group efforts at project management, but definitely no standard, community-approved, or official way of initiating a project, getting feedback on it, including members, etc.

Collecting feedback

One big issue IMO is how feedback is collected. Taking Announcing nixpkgs-merge-bot as the most recent example, it turned into a large, disparate and difficult discussion to track because:

  1. discourse doesn’t have a proper thread view
  2. someone made the decision to discuss it on matrix as well, which shuts out anybody who is async / not “live” → now to get “caught up” you need to follow the discussion in 2 places. If an RFC were created the discussion would be on 3 different platforms.

Most glaringly, there doesn’t seem to be a way to vote in a manner that counts, nor a method to count votes. Somebody may make a suggestion regarding an item, e.g. “we should enable this in a verbose and dry-run fashion”; some may agree with it, some not, but then the discussion moves on and decision makers may not even see the general consensus on the issue.


In conclusion, IMO there has to be a main clear, documented decision making process that is easy to find, inclusive, and in one central location. “main” meaning a master decision making process from which others are made.

This doesn’t mean it has to be cumbersome, slow, overbearing, and so on.

I know it’s early to propose solutions without feedback, so food for thought:
Examples of software that attempt solving similar issues:

Maybe RFCs aren’t being used properly, maybe there’s even a discourse plugin for a similar issue or a forum with threaded views could be used… in any case, the process could be improved.

1 Like

I think the list needs the main question: when it is even worth making a decision as a full-community decision.

Which is more often than not a good thing if it is either a low-rev-deps change, or a change with moderate rev-dep count with a low amount of people actually submitting changes. For large-impact things, it’s another story.

Uhm, most glaringly we lack an answer for what is a constituency for which decisions, and this is actually discussed rather regularly (in different words) in case of actually contentious decisions.

2 Likes

Interestingly if you subscribe to the forum via mail you get a proper threaded view in your mail client. Looks like something that could be (ab)used by having an account that would forward mail to e.g. mailman where a read-only threaded view could be available via its interface.