Decision Making 2.0! (1.0.1?)

I appreciate your entire input.

This, in particular, leads me to another thought or idea: instead of a formulaic approach, which quickly gets annoying (see PR templates), reviewers could be coached [on a voluntary basis] on effective expectation setting and management:

When one is the first reviewer, one can use the summary comment to set clear expectations as to when a PR is mergeable.

Setting those expectations not only sets the stage for subsequent reviewers, but also gives practical guidance for the final merger.

The key is that whoever happens to be this reviewer must be put into a position of entitlement in order to make that considerate call.

To support adoption, we can formulate this as a “first reviewer privilege” (FRP): the first to review is also expected to set the stage [and tone] that shall lead to a merge.

The solution to avoid stale PRs is to go out and start reviewing and encourage other people to do so.

While I totally agree with this, I have to admit, as someone without commit access, it can be kind of frustrating to review PRs only to see that they still have yet to be merged, and watch the conflicts stack up and the authors lose patience/interest.

Reviewing is great, but there is still the problem of actually getting a hold of someone with the time and interest to merge the damn thing. Which is why I thought BORS might be able to assist us in that respect (and I still think that), but I’ve seen enough push back on the idea that I doubt it’s going to happen.

For now, I will still attempt to look for PRs that I can leave meaningful reviews on, but we could definitely do with some improvement in the merging process, at least from my perspective.

Let’s put that forth for a second, since there seems to be a lot of confusion about this:

A BORS implementation, as far as I can tell, is strictly pareto-dominated, and I haven’t yet seen one argument that I can remember that demonstrates otherwise (= being trade-off-dominated). Still, a lot of people who opposed BORS until recently don’t seem to have been able to identify this fact.

My heart fills with joy realizing this glossary is already being useful :heart: .

Well, a committer needs to buy the sufficiency (although for stylistic things it might eventually be possible to make that the expectation), and if a committer thinks the first review (from someone who is not a maintainer of the packages) demands too much from a code style point of view, I would like them to ignore it and merge…

Nixpkgs targets enough environments that it is hard to write strictly correct BORS policy, and yet harder to convince everyone it does not have unexpected consequences. (Silly example: a change to some package definition plus to all-packages.nix might cause zero rebuilds on the three native platforms and fix one cross-compilation case, but also do something very wrong in many other cross-compilation cases for many other packages; label-wise things look better than they are). I have seen objections substantially similar to that.

(I am not sure I can formulate a policy that clearly convinces me and does not look like silly overhead; it depends on what you want to catch, of course)

What do you want instead? More rigorous closing of PRs that aren’t ready? That will just make people angry. Merging them instead, even if they are not ready? Probably a bad idea.

Also, RFCs exist, but not many people seem to be interested in using them.

Less than one in 10 PRs is «ready» when it’s merged. Half of PRs that take more than one update before merging are «ready» before the last update. I can imagine reasonable-ish definitions of «ready» making either claim true.

Changing the notion of «ready» — or slightly reducing the variance — can be useful.

I’ve found the marvin-mk2 bot to be useful for this. It basically helps keep the process going by automatically assigning reviewers/mergers, and reminding them at specified intervals.

I’m not sure it’s still active, as I haven’t been assigned any reviews by it recently (it’s opt-in), but it’s helped get some PRs over the finish line where there was quite a bit of back and forth needed (e.g. this one).

/cc @timokau

Some PRs stall when there are so many reviewers that it becomes unclear which reviews are blockers, which reviewers are “drive-by” and which are in a position to commit. Some of this is politeness: we defer to the first response, but also understand that others may have a better understanding.

Or someone tries to be helpful and adds a review, but cannot merge themselves. This can confuse the original submitter.

The thoughts we’d like to express are:

  1. “Here is my review. I am ready to merge if this is fixed.”
  2. “Here is my review to help the PR, I have no further commitment or responsibility.”
  3. “As a committer, I don’t have enough domain expertise and would need further approvals from experts.”
  4. “As a committer I trust this specific submitter’s expertise, but want additional input from community.”
  5. “As an expert, here is an opinion.”
  6. another?

I think some confusion occurs when people don’t know in which capacity a review/comment was made, or assume which one it is.
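
To make the idea of an explicit review capacity concrete, here is a minimal sketch of how the modes listed above could be attached to a review. The names `ReviewCapacity`, `Review` and `merge_commitments` are hypothetical and only for illustration; nothing like this exists in GitHub or in any bot mentioned in this thread.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class ReviewCapacity(Enum):
    """Hypothetical labels for the review modes listed above."""
    WILL_MERGE_IF_FIXED = "I am ready to merge if this is fixed."
    HELPING_ONLY = "I have no further commitment or responsibility."
    COMMITTER_NEEDS_EXPERTS = "I would need further approvals from experts."
    COMMITTER_WANTS_INPUT = "I trust the submitter, but want additional community input."
    EXPERT_OPINION = "As an expert, here is an opinion."


@dataclass
class Review:
    reviewer: str
    capacity: ReviewCapacity
    body: str

    def render(self) -> str:
        # Put the capacity first so the submitter sees it before the details.
        return f"[{self.capacity.name}] {self.capacity.value}\n\n{self.body}"


def merge_commitments(reviews: List[Review]) -> List[str]:
    """Return the reviewers who have committed to merging once their review is addressed."""
    return [r.reviewer for r in reviews
            if r.capacity is ReviewCapacity.WILL_MERGE_IF_FIXED]
```

With something like this, a submitter could see at a glance whether anyone has actually committed to merging, which is the question that tends to get lost when many reviews pile up.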

@tomberek I think these “modes of review” are an important enough complementary field of thought that we should probably dedicate a thread of its own to it. I’m happy that our interaction here led you to post this.

Therefore, I’d like to ask a moderator to split this off and maybe @tomberek can reformulate this to make it its own introduction to a new thread. [“Review 2.0! (1.0.1?)”]

I also think on pareto and tradeoff, everything that can be said at this point is already said, and we should shift the discussion towards the “modes of review”, when looking for ways to further address the underlying issues.

Thank you all! :sparkles:

I think this topic is worth reviving because the issues stated are still present and have been triggered again by Announcing nixpkgs-merge-bot. For the sake of keeping things separated (that discussion there and the meta about decision making here): the community needs a better decision making process, IMO. Not just regarding PRs, but in general it is very unclear:

  • what the decision making process even is
  • who/which group has the last say
  • who can contribute
  • where decisions are made

If a mod thinks this should be a new thread, do fork it :slight_smile:

The “problem”

It feels like committers have a large say in many things and can (somewhat) unilaterally make decisions that impact the community. A PR can be self-merged by a committer, RFCs that kind of stall can just be merged to “get things moving”, a new domain can be declared an official source of documentation, repositories can just be created in the official namespace with no awareness of the rest of the community, etc.

It’s also difficult to track what’s going on in terms of development, including in the internal nix ecosystem. There seem to be personal or group efforts at project management, but definitely no standard, or community-approved, or official way of initiating a project, getting feedback on it, including members, etc.

Collecting feedback

One big issue IMO is how feedback is collected. Taking Announcing nixpkgs-merge-bot as the most recent example, it turned into a large, disparate and difficult discussion to track because:

  1. discourse doesn’t have a proper thread view
  2. someone made the decision to discuss it on matrix as well, which shuts out anybody who is async / not “live” → now to get “caught up” you need to follow the discussion in 2 places. If an RFC were created the discussion would be on 3 different platforms.

Most glaringly, there doesn’t seem to be a way to vote in a manner that counts, nor a method to count votes. Somebody may make a suggestion regarding an item, e.g. “we should enable this in a verbose and dry-run fashion”; some may agree with it, some not, but then the discussion moves on and decision makers may not even see the general consensus on the issue.


In conclusion, IMO there has to be a main, clear, documented decision making process that is easy to find, inclusive, and in one central location. “Main” meaning a master decision making process from which others are derived.

This doesn’t mean it has to be cumbersome, slow, overbearing, and so on.

I know it’s early to propose solutions without feedback, so food for thought:
Examples of software that attempt solving similar issues:

Maybe RFCs aren’t being used properly, maybe there’s even a discourse plugin for a similar issue or a forum with threaded views could be used… in any case, the process could be improved.

I think the list needs the main question: when it is even worth making a decision as a full-community decision.

Which is more often than not a good thing if it is either a low-rev-deps change, or a change with moderate rev-dep count with a low amount of people actually submitting changes. For large-impact things, it’s another story.

Uhm, most glaringly we lack an answer for what is a constituency for which decisions, and this is actually discussed rather regularly (in different words) in case of actually contentious decisions.

Interestingly if you subscribe to the forum via mail you get a proper threaded view in your mail client. Looks like something that could be (ab)used by having an account that would forward mail to e.g. mailman where a read-only threaded view could be available via its interface.

@UefiPls I agree with the general sentiment you express, as well as with a few particular points such as multiple communication channels and various accessibility issues. I think we’re in an untenable situation in that regard, because it disproportionally privileges people with lots of time on their hands, and tends to burn out even those.

I’m personally working towards establishing more structured decision making processes, and bootstrapping that essentially means getting “the right people” to play along – which is a very unstructured decision making process. It’s tricky and delicate.

I was involved in some of the things you mentioned in a critical tone, and I’m aware of a couple of mistakes that happened. In particular, in my opinion the nix-book repo turned out to be a bad implementation of a good idea and we should garbage-collect that without breaking too many links.

What I’ve experienced people being most successful with so far is this: visibly propose small changes and implement them immediately once there is consensus. If there is headwind, the change is too large in scope. If you can’t implement it within an afternoon, the change is too large in volume. Anything that’s not merged (in the broadest sense) doesn’t matter anyway, therefore optimise for finishing things.

That doesn’t answer how to make far-reaching decisions more efficient, of course. I think we can get there with the same kind of small steps. We’ve established multiple new teams in the past two years, and each one of them is building organisational knowledge and culture as they go. We can already see how this is slowly leading to clarifying responsibilities, establishing predictable routines, and increasing visibility of decisions made and work done, and how those approach ever more difficult problems together.

One of the next challenges in that area will be to make all that easier to participate in, by finding a healthier balance of in-person and written asynchronous communication, as well as its amount and pacing, in order to fight the curse of availability. I’m convinced this primarily requires more care and discipline by those privileged with availability, and especially those getting paid, myself included.

But primarily, again in my opinion based on what I’ve seen work well or fail, we have to double down on establishing firm ownership and responsibilities, combined with transparent communication and predictable processes. I think it almost doesn’t matter how decisions are made as long as they can be introspected by those affected and leave enough time to raise concerns.

Do you have concrete ideas how to start fixing the issues you mentioned in small steps?

Maybe a more realistic aim is to figure out how to cut the losses cheaply when the change turns out too far-reaching. Unfortunately, this might require holding the line long enough that Flakes / CLI / purity — clearly a change too large to go smoothly as a single piece — are sliced into pieces small enough to polish.

(Documenting whatever is figured out could also end up useful)

Maybe we need better procedures for public and clear single-issue trust delegation. E.g. «getting the core ideas right», «fleshing out the entire plan», «processing the discussion to determine the changes most needed for acceptance»: our RFC traditions strongly push for all of these to be done by the same person, while these are different kinds of work with different timelines etc.

To chime in with Rust experience: this seems to be part of the “essential complexity” here (or at least something which is very hard to solve with tooling). In Rust, discussions are always sprawling; official places to discuss designs are:

  • the RFC PR
  • the tracking issue
  • the ACP, MCP issues some teams are using
  • the official Zulip
  • the official discourse forum
  • the official discord

And there are a host of unofficial channels as well, like reddit, community discord and what not. Controversial things tend to get discussed to death across all of these venues (especially big ones also earn a string of stand-alone blog posts by community members).

I would say historically the Rust project tried to organize “one true venue”, but that didn’t pan out; the current state is more a product of accidental historical factors than of intelligent design (some may recall that Rust moved off IRC to Discord, but then somehow ended up using Zulip. Today, I also feel that the internals discourse isn’t really “the place that matters” any more, but there wasn’t an official deprecation).

The best way the Rust project found to fight this is with summaries: someone has to plough through all the branching discussion, but if they compile the (mostly redundant) findings into a single, concise document, everyone else can refer to it.

There are many places in Rust process which make use of summaries:

  • For a long discussion thread, someone often writes a summary in the middle (example).
  • Before voting on a decision, a concise summary is usually provided of what exactly is being voted on (example).
  • After the base decision has been made, a tracking issue is created which includes a live summary of what’s actually there (example).

The people who go and write summaries could be:

  • people pushing a particular feature (summarizing is one of the most impactful actions to get things moving)
  • members of the decision-making teams (in some sense, it’s their job to ensure that summaries are available for their respective areas)
  • just general, “random” community facilitators, who happened to read thousands of messages of discussion of an obscure topic instead of doing something more productive, and who can procrastinate even more by summarizing their learning.

That’s so interesting, this matches my experience exactly, though I didn’t really realize it until now.

There were two highly upvoted feature requests on GitHub (#5567 and #5110) that I was interested in implementing. They were both over two years old, and while many people seemed to want these features and expressed their opinions, nothing really happened. In both cases, I wrote summaries because it just felt like I had to, to get a starting point for any potential work.

That got the ball rolling, some more discussion followed, the Nix team put it on their agenda, and for both issues we now have a decision. (PR is ready and will be merged in one case, postpone until after flakes are stabilized in the other case)

Maybe I should do a lot more of this.

Part of the issue is that GitHub Issues is a really bad platform for long-term task tracking and decision making discussions. There’s no way to pin comments, maintainers can hide comments but they still take up a bunch of space, GitHub won’t load all the comments at once so you can’t Ctrl-F to look for keywords… it all has a super low signal-to-noise ratio.

Thank you for your summaries, they do help mitigate this issue!

This doesn’t match my experience with Rust, which tracks its running tasks fine. The key technical enabler here is that you can edit the issue description after the fact, so the issue description is kept as a live summary.

The key here (comparing Rust with some other projects I’ve seen) is actually to have a well-defined, crisp flow for tracking work. In Rust, that would be the tracking issues pattern. Some characteristics:

  • The process is a distinct thing with a name; it is separate from your usual goop of GitHub issues, people think in terms of “tracking issues”, there’s an associated GitHub tag, etc.

  • It is consistently applied for every piece of work in progress where the decision has been made, but the implementation isn’t there. As a result, if anyone wants to know the status of something, they can easily find the tracking issue.

  • Tracking issues clearly fit into the overall feature lifecycle of

    Idea → RFC → Decision → Tracking Issue → Implementation → Stabilization Report → Decision → Stabilization → Release

  • A tracking issue is a GitHub issue whose primary purpose is to track work elsewhere. It:

    • is clearly named: the name says that it is a tracking issue and which single feature it tracks
    • contains a link which explains, in detail, what is being tracked (typically the originating RFC)
    • optionally contains a brief summary of the current state in prose
    • links to all open, merged, or closed implementation PRs
    • contains a list of unresolved (open) questions
    • when a question is resolved, gets a link to the resolution added (“resolution” is usually someone leaving a comment saying “let’s do X rather than Y because Z”)
  • Tracking issues naturally accrue a lot of comments over time, but everything consequential is added to the issue description (typically, as an unresolved question), so there’s little need to organize that better.
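
As a rough illustration of the characteristics above, here is a minimal sketch of a tracking issue as a data structure that renders its own description. The class and field names (`TrackingIssue`, `OpenQuestion`, `render`) and the markdown layout are assumptions made for this example, not the template Rust actually uses.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# The feature lifecycle quoted above; "Decision" deliberately appears twice.
LIFECYCLE = [
    "Idea", "RFC", "Decision", "Tracking Issue", "Implementation",
    "Stabilization Report", "Decision", "Stabilization", "Release",
]


@dataclass
class OpenQuestion:
    text: str
    resolution_url: Optional[str] = None  # set once someone links a resolution


@dataclass
class TrackingIssue:
    feature: str
    rfc_url: str                      # detailed explanation of what is tracked
    summary: str = ""                 # optional prose summary of current state
    pull_requests: List[str] = field(default_factory=list)
    questions: List[OpenQuestion] = field(default_factory=list)

    def title(self) -> str:
        # The name says it is a tracking issue and which single feature it tracks.
        return f"Tracking issue for {self.feature}"

    def render(self) -> str:
        """Render the issue description, which is kept up to date as a live summary."""
        lines = [f"Details: {self.rfc_url}"]
        if self.summary:
            lines += ["", self.summary]
        lines += ["", "Implementation PRs:"]
        lines += [f"- {pr}" for pr in self.pull_requests]
        lines += ["", "Unresolved questions:"]
        for q in self.questions:
            if q.resolution_url is None:
                lines.append(f"- [ ] {q.text}")
            else:
                lines.append(f"- [x] {q.text} (resolved: {q.resolution_url})")
        return "\n".join(lines)
```

The point of `render` is that the description, not the comment stream, carries everything consequential, so a newcomer only needs to read one place.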

Here’s an example of a manageable tracking issue for a very discussed feature:
https://github.com/rust-lang/rust/issues/74465

Although it took years, the tracking issue enabled someone different from the author of the original RFC to complete the work.

and decision making discussions.

Yup, here I think GitHub lacks. To decompose this, there is:

  1. Information gathering & discussion to figure out what exactly the proposal to decide on is; this is collaborative RFC writing
  2. Decision process per-se — given the RFC, is it accepted or rejected?
  3. After the decision has been made, tracking of the implementation work

These three are completely different processes.

Rust uses GitHub repo with RFCs for 1, not because GitHub is good, but because GitHub is central. If someone doesn’t want to miss an RFC, they can watch the single repository.

For 2, as each decision is made by a separate team of a handful of people, GitHub isn’t really needed. Actual decisions typically happen in async team meetings on Zulip or sync video meetings. There’s also a bot to manage 2PC-like “final comment period” process and formal voting.
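
To illustrate what a 2PC-like “final comment period” flow might look like, here is a minimal sketch. The unanimity rule, the 10-day period, and all names (`Proposal`, `approve`, `status`) are assumptions chosen for the example; they are not a description of how the Rust bot actually works.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Optional, Set


@dataclass
class Proposal:
    team: Set[str]
    approvals: Set[str] = field(default_factory=set)
    concerns: Set[str] = field(default_factory=set)  # unresolved concerns
    fcp_started: Optional[date] = None

    def approve(self, member: str, today: date) -> None:
        if member not in self.team:
            raise ValueError(f"{member} is not on the deciding team")
        self.approvals.add(member)
        self._maybe_start_fcp(today)

    def raise_concern(self, concern: str) -> None:
        self.concerns.add(concern)

    def resolve_concern(self, concern: str, today: date) -> None:
        self.concerns.discard(concern)
        self._maybe_start_fcp(today)

    def _maybe_start_fcp(self, today: date) -> None:
        # Phase 1 ("prepare"): everyone has signed off and nothing is blocking.
        if (self.fcp_started is None
                and self.approvals == self.team
                and not self.concerns):
            self.fcp_started = today

    def status(self, today: date, fcp_days: int = 10) -> str:
        if self.fcp_started is None:
            return "voting"
        # Phase 2 ("commit"): the decision only takes effect after a waiting
        # period in which the wider community can still object.
        if today < self.fcp_started + timedelta(days=fcp_days):
            return "final-comment-period"
        return "accepted"
```

The waiting period is what makes the process friendly to people who are not “live”: the decision is visible for a fixed time before it becomes final.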

The 3 is the tracking issue process described above. It comes into play once the decision has been made.

I proposed to do this very thing in [RFC 0138] Developing RFCs in repositories (NixOS/rfcs pull request #138), but there was not enough interest… I do often advocate using this approach regardless, and can definitely recommend it.

Also for RFC 140, I’m adding all the implementation work to a GitHub Milestone, acting as a tracking issue to a degree. This is working well, a dedicated tracking issue does sound even better though.

Sorry, I was ambiguous: what I meant is that there’s a single repository for all RFCs (https://github.com/rust-lang/rfcs), where an RFC is submitted for discussion “for real” and is often adjusted (in a minor or major way) before being voted on.

This is different from the process which gives you an RFC text to begin with: some people have a “single-repo-RFC” where they work in the open, others just wake up from a feverish dream with the RFC text inscribed in golden letters in their mind’s eye.

The difference between pre-RFC and RFC phases is that the “pre” phase is only for people who actively seek out the RFC, while the RFC phase is to notify everyone. In terms of safety-vs-liveness, the goal of RFC PR against RFC repo is safety — we want to make sure that anyone who could have input has a chance to provide it.

How we get liveness, the RFC in the first place, is unspecified. At some point Rust tried to do “each RFC is a GitHub repo”, but that didn’t work out. Luckily, there needn’t be a single process here, each RFC can be different.

Also for RFC 140, I’m adding all the implementation work to a GitHub Milestone, acting as a tracking issue to a degree. This is working well, a dedicated tracking issue does sound even better though.

Yeah, I think milestone is significantly worse, for two reasons:

  • there’s no place where you can, in prose, contextualize the work, clearly delimit what’s blocking and what’s nice to have.
  • I (as a GitHub passer by) can’t add a comment to the milestone. One role of a tracking issue is that it is the center — tracking issue itself is probably a bad place for any discussion, but every discussion elsewhere can be started by a comment on a tracking issue.

And yeah, a big thing here is also being consistent across all different features. Tracking issues are “commons”, in the sense that, if everyone uses tracking issues, there’s an ecosystem-wide improvement in coordination, as opposed to a situation where each specific feature is tracked meticulously, but with a separate mechanism.
