Decision Making 2.0! (1.0.1?)

fricklerhandwerk · October 21, 2023, 10:08pm

@UefiPls I agree with the general sentiment you express, as well as with a few particular points such as multiple communication channels and various accessibility issues. I think we’re in an untenable situation in that regard, because it disproportionally privileges people with lots of time on their hands, and tends to burn out even those.

I’m personally working towards establishing more structured decision making processes, and bootstrapping that essentially means getting “the right people” to play along – which is a very unstructured decision making process. It’s tricky and delicate.

I was involved in some of the things you mentioned in a critical tone, and I’m aware of a couple of mistakes that happened. In particular, in my opinion the nix-book repo turned out to be a bad implementation of a good idea and we should garbage-collect that without breaking too many links.

What I’ve experienced people being most successful with so far is this: visibly propose small changes and implement them immediately once there is consensus. If there is headwind, the change is too large in scope. If you can’t implement it within an afternoon, the change is too large in volume. Anything that’s not merged (in the broadest sense) doesn’t matter anyway, therefore optimise for finishing things.

That doesn’t answer how to make far-reaching decisions more efficient, of course. I think we can get there with the same kind of small steps. We’ve established multiple new teams in the past two years, and each one of them is building organisational knowledge and culture as they go. We can already see how this is slowly leading to clarifying responsibilities, establishing predictable routines, and increasing visibility of decisions made and work done, and how those approach ever more difficult problems together.

One of the next challenges in that area will be to make all that easier to participate in, by finding a healthier balance of in-person and written asynchronous communication, as well as its amount and pacing, in order to fight the curse of availability. I’m convinced this primarily requires more care and discipline by those privileged with availability, and especially those getting paid, myself included.

But primarily, again in my opinion based on what I’ve seen work well or fail, we have to double down on establishing firm ownership and responsibilities, combined with transparent communication and predictable processes. I think it almost doesn’t matter how decisions are made as long as they can be introspected by those affected and leave enough time to raise concerns.

Do you have concrete ideas how to start fixing the issues you mentioned in small steps?

7c6f434c · October 22, 2023, 11:30am

Maybe a more realistic aim is to figure out how to cut the losses cheaply when the change turns out too far-reaching. Unfortunately, this might require holding the line long enough that Flakes / CLI / purity — clearly a change too large to go smoothly as a single piece — are sliced into pieces small enough to polish.

(Documenting whatever is figured out could also end up useful)

Maybe we need better procedures for public and clear single-issue trust delegation. E.g. «getting the core ideas right», «fleshing out the entire plan», «processing the discussion to determine the changes most needed for acceptance»: our RFC traditions strongly push for all of these to be done by the same person, while they are different kinds of work with different timelines etc.

matklad · October 23, 2023, 10:20am

To chime in with Rust experience — this seems to be part of the “essential complexity” here (or at least something which is very hard to solve with tooling). In Rust, discussions are always sprawling, official places to discuss designs are:

- the RFC PR
- the tracking issue
- the ACP, MCP issues some teams are using
- the official Zulip
- the official discourse forum
- the official discord

And there are a host of unofficial channels as well, like reddit, community discord and what not. Controversial things tend to get discussed to death across all of these venues (especially big ones also earn a string of stand-alone blog posts by community members).

I would say historically the Rust project tried to organize “one true venue”, but that didn’t pan out. The current state is more a product of accidental historical factors than of intelligent design (some may recall that Rust moved off IRC to Discord, but then somehow ended up using Zulip. Today, I also feel that the internals discourse isn’t really “the place that matters” any more, but there wasn’t an official deprecation).

The best way the Rust project found to fight this is with summaries: someone has to plough through all the branching discussion, but if they compile the (mostly redundant) findings into a single, concise document, everyone else can refer to it.

There are many places in Rust process which make use of summaries:

- For a long discussion thread, someone often writes a summary in the middle (example).
- Before voting on a decision, a concise summary is usually provided of what exactly is being voted on (example).
- After the base decision has been made, a tracking issue is created which includes a live summary of what’s actually there (example).

The people who go and write summaries could be:

- people pushing a particular feature (summarizing is one of the most impactful actions to get things moving)
- members of the decision-making teams (in some sense, it’s their job to ensure that summaries are available for their respective areas)
- just general, “random” community facilitators, who happened to read thousands of messages of discussion of an obscure topic instead of doing something more productive, and who can procrastinate even more by summarizing their learning.

iFreilicht · October 29, 2023, 3:55pm

That’s so interesting, this matches my experience exactly, though I didn’t really realize it until now.

There were two highly upvoted feature requests on GitHub (#5567 and #5110) that I was interested in implementing. They were both over two years old, and while many people seemed to want these features and expressed their opinions, nothing really happened. In both cases, I wrote summaries because it just felt like I had to in order to get a starting point for any potential work.

That got the ball rolling, some more discussion followed, the Nix team put it on their agenda, and for both issues we now have a decision. (PR is ready and will be merged in one case, postponed until after flakes are stabilized in the other case)

Maybe I should do a lot more of this.

9999years · October 29, 2023, 10:35pm

Part of the issue is that GitHub Issues is a really bad platform for long-term task tracking and decision making discussions. There’s no way to pin comments, maintainers can hide comments but they still take up a bunch of space, GitHub won’t load all the comments at once so you can’t Ctrl-F to look for keywords… it all has a super low signal-to-noise ratio.

Thank you for your summaries, they do help mitigate this issue!

matklad · October 30, 2023, 9:36am

This doesn’t match my experience with Rust, which tracks its running tasks fine. The key technical enabler here is that you can edit the issue description after the fact, so the issue description is kept as a live summary.

The key here (comparing Rust with some other projects I’ve seen) is actually to have a well-defined, crisp flow for tracking work. In Rust, that would be the tracking issues pattern. Some characteristics:

- The process is a distinct thing with a name. It is separate from your usual goop of GitHub issues, people think in terms of “tracking issues”, there’s an associated GitHub tag, etc.
- It is consistently applied to every piece of work in progress where the decision has been made but the implementation isn’t there. As a result, if anyone wants to know the status of something, they can easily find the tracking issue.
- Tracking issues clearly fit into the overall feature lifecycle:

  Idea → RFC → Decision → Tracking Issue → Implementation → Stabilization Report → Decision → Stabilization → Release

- A tracking issue is a GitHub issue whose primary purpose is to track work elsewhere. It:
  - is clearly named: the name says that it is a tracking issue, and which single feature is tracked by it
  - contains a link which explains, in detail, what is being tracked (typically the originating RFC)
  - optionally contains a brief summary of the current state in prose
  - links to all open, merged, or closed implementation PRs
  - contains a list of unresolved (open) questions
  - gains a link to the resolution whenever a question is resolved (“resolution” is usually someone leaving a comment saying “let’s do X rather than Y because Z”)
- Tracking issues naturally accrue a lot of comments over time, but everything consequential is added to the issue description (typically, as an unresolved question), so there’s little need to organize that better.
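To make the shape of such an issue concrete, here is a minimal sketch in Rust of the fields listed above and of how a description could be rendered from them. Everything in it is an assumption for illustration: the `TrackingIssue` and `Question` types, the `some_feature` name and the `example.org` links are hypothetical and are not part of any Rust project tooling.

```rust
use std::fmt::Write;

/// One open question kept in the issue description.
struct Question {
    text: String,
    /// Link to the comment or PR that settled the question, if any.
    resolution: Option<String>,
}

/// Hypothetical model of a tracking issue description.
struct TrackingIssue {
    feature: String,
    /// Link explaining what is tracked (typically the originating RFC).
    spec_link: String,
    /// Optional prose summary of the current state.
    summary: Option<String>,
    /// Open, merged, or closed implementation PRs.
    prs: Vec<String>,
    questions: Vec<Question>,
}

impl TrackingIssue {
    /// The title names both the pattern and the single tracked feature.
    fn title(&self) -> String {
        format!("Tracking Issue for `{}`", self.feature)
    }

    /// Number of questions that still lack a linked resolution.
    fn unresolved(&self) -> usize {
        self.questions.iter().filter(|q| q.resolution.is_none()).count()
    }

    /// Render the live summary that would be kept in the issue description.
    fn render(&self) -> String {
        let mut out = String::new();
        writeln!(out, "This is a tracking issue for {}.", self.spec_link).unwrap();
        if let Some(summary) = &self.summary {
            writeln!(out, "\n{summary}").unwrap();
        }
        writeln!(out, "\n### Implementation history").unwrap();
        for pr in &self.prs {
            writeln!(out, "- {pr}").unwrap();
        }
        writeln!(out, "\n### Unresolved Questions").unwrap();
        for q in &self.questions {
            match &q.resolution {
                Some(link) => writeln!(out, "- [x] {} (resolved in {link})", q.text).unwrap(),
                None => writeln!(out, "- [ ] {}", q.text).unwrap(),
            }
        }
        out
    }
}

/// Made-up example data; none of these links point at real discussions.
fn example() -> TrackingIssue {
    TrackingIssue {
        feature: "some_feature".to_string(),
        spec_link: "https://example.org/rfcs/0000".to_string(),
        summary: Some("Implemented behind a feature gate.".to_string()),
        prs: vec!["https://example.org/pull/1 (merged)".to_string()],
        questions: vec![
            Question {
                text: "Naming".to_string(),
                resolution: Some("https://example.org/comment/42".to_string()),
            },
            Question {
                text: "Should `get` block?".to_string(),
                resolution: None,
            },
        ],
    }
}

fn main() {
    let issue = example();
    println!("{}\n\n{}", issue.title(), issue.render());
    println!("{} question(s) still open", issue.unresolved());
}
```

The point of the sketch is that the description is derived from a small amount of structured state, so whoever maintains the issue only has to update that state (add a PR, attach a resolution link) to keep the summary current.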

Here’s an example of a manageable tracking issue for a very discussed feature:

github.com/rust-lang/rust

Tracking Issue for `once_cell`

opened 12:07AM - 18 Jul 20 UTC

closed 03:30PM - 30 Mar 23 UTC

KodrAus

A-concurrency T-libs-api B-unstable C-tracking-issue Libs-Tracked

This is a tracking issue for the RFC "standard lazy types" (rust-lang/rfcs#2788)…. The feature gate for the issue is `#![feature(once_cell)]`.

### Unstable API

```rust
// core::lazy

pub struct OnceCell<T> { .. }

impl<T> OnceCell<T> {
    pub const fn new() -> OnceCell<T>;
    pub fn get(&self) -> Option<&T>;
    pub fn get_mut(&mut self) -> Option<&mut T>;
    pub fn set(&self, value: T) -> Result<(), T>;
    pub fn get_or_init<F>(&self, f: F) -> &T where F: FnOnce() -> T;
    pub fn get_or_try_init<F, E>(&self, f: F) -> Result<&T, E> where F: FnOnce() -> Result<T, E>;
    pub fn into_inner(self) -> Option<T>;
    pub fn take(&mut self) -> Option<T>;
}

impl<T> From<T> for OnceCell<T>;
impl<T> Default for OnceCell<T>;
impl<T: Clone> Clone for OnceCell<T>;
impl<T: PartialEq> PartialEq for OnceCell<T>;
impl<T: Eq> Eq for OnceCell<T>;
impl<T: fmt::Debug> fmt::Debug for OnceCell<T>;

pub struct Lazy<T, F = fn() -> T> { .. }

impl<T, F> Lazy<T, F> {
    pub const fn new(init: F) -> Lazy<T, F>;
}

impl<T, F: FnOnce() -> T> Lazy<T, F> {
    pub fn force(this: &Lazy<T, F>) -> &T;
}

impl<T: Default> Default for Lazy<T>;
impl<T, F: FnOnce() -> T> Deref for Lazy<T, F>;
impl<T: fmt::Debug, F> fmt::Debug for Lazy<T, F>;

// std::lazy

pub struct SyncOnceCell<T> { .. }

impl<T> SyncOnceCell<T> {
    pub const fn new() -> SyncOnceCell<T>;
    pub fn get(&self) -> Option<&T>;
    pub fn get_mut(&mut self) -> Option<&mut T>;
    pub fn set(&self, value: T) -> Result<(), T>;
    pub fn get_or_init<F>(&self, f: F) -> &T where F: FnOnce() -> T;
    pub fn get_or_try_init<F, E>(&self, f: F) -> Result<&T, E> where F: FnOnce() -> Result<T, E>;
    pub fn into_inner(mut self) -> Option<T>;
    pub fn take(&mut self) -> Option<T>;
    fn is_initialized(&self) -> bool;
    fn initialize<F, E>(&self, f: F) -> Result<(), E> where F: FnOnce() -> Result<T, E>;
    unsafe fn get_unchecked(&self) -> &T;
    unsafe fn get_unchecked_mut(&mut self) -> &mut T;
}

impl<T> From<T> for SyncOnceCell<T>;
impl<T> Default for SyncOnceCell<T>;
impl<T: RefUnwindSafe + UnwindSafe> RefUnwindSafe for SyncOnceCell<T>;
impl<T: UnwindSafe> UnwindSafe for SyncOnceCell<T>;
impl<T: Clone> Clone for SyncOnceCell<T>;
impl<T: PartialEq> PartialEq for SyncOnceCell<T>;
impl<T: Eq> Eq for SyncOnceCell<T>;
unsafe impl<T: Sync + Send> Sync for SyncOnceCell<T>;
unsafe impl<T: Send> Send for SyncOnceCell<T>;
impl<T: fmt::Debug> fmt::Debug for SyncOnceCell<T>;

pub struct SyncLazy<T, F = fn() -> T>;

impl<T, F> SyncLazy<T, F> {
    pub const fn new(f: F) -> SyncLazy<T, F>;
}

impl<T, F: FnOnce() -> T> SyncLazy<T, F> {
    pub fn force(this: &SyncLazy<T, F>) -> &T;
}

impl<T, F: FnOnce() -> T> Deref for SyncLazy<T, F>;
impl<T: Default> Default for SyncLazy<T>;
impl<T, F: UnwindSafe> RefUnwindSafe for SyncLazy<T, F> where SyncOnceCell<T>: RefUnwindSafe;
impl<T, F: UnwindSafe> UnwindSafe for SyncLazy<T, F> where SyncOnceCell<T>: UnwindSafe;
unsafe impl<T, F: Send> Sync for SyncLazy<T, F> where SyncOnceCell<T>: Sync;
impl<T: fmt::Debug, F> fmt::Debug for SyncLazy<T, F>;
```

### Steps

- [X] Complete the RFC process over at https://github.com/rust-lang/rfcs/pull/2788
- [X] FCP https://github.com/rust-lang/rust/pull/105587#issuecomment-1367890678
- [X] Stabilization PR: https://github.com/rust-lang/rust/pull/105587

### Unresolved Questions

Inlined from #72414:

- [X] Naming. I'm ok to just roll with the `Sync` prefix like `SyncLazy` for now, but [have a personal preference for `Atomic`](https://github.com/rust-lang/rfcs/pull/2788#issuecomment-574466983) like `AtomicLazy`. Resolved in: https://github.com/rust-lang/rust/issues/74465#issuecomment-1098359963. Surprisingly, after more than a year of deliberation we actually found a better name.
- [x] [Poisoning](https://github.com/rust-lang/rfcs/pull/2788#discussion_r366725768). It seems like there's [some regret around poisoning in other `std::sync` types that we might want to just avoid upfront for `std::lazy`, especially if that would align with a future `std::mutex` that doesn't poison](https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/parking_lot.3A.3AMutex.20in.20std/near/190331199). Personally, if we're adding these types to `std::lazy` instead of `std::sync`, I'd be on-board with not worrying about poisoning in `std::lazy`, and potentially deprecating `std::sync::Once` and `lazy_static` in favour of `std::lazy` down the track if it's possible, rather than attempting to replicate their behavior. cc @Amanieu @sfackler.
- [x] [Consider making`SyncOnceCell::get` blocking](https://github.com/matklad/once_cell/pull/92). There doesn't seem to be consensus in the linked PR on whether or not that's strictly better than the non-blocking variant. (resolved in https://github.com/rust-lang/rust/issues/74465#issuecomment-663414310).
- [X] [Atomic Ordering](https://github.com/rust-lang/rfcs/pull/2788#issuecomment-570555592). the implementation currently use `Release/Acquire`, but it could also use the elusive [Consume](https://preshing.com/20140709/the-purpose-of-memory_order_consume-in-cpp11/) ordering. Should we spec that we guarantee `Release/Acquire`? (resolved as yes: consume ordering is not defined enough to merit inclusion into std)
- [x] [Sync no_std subset](https://github.com/rust-lang/rfcs/pull/2788#issuecomment-569845023). It seems plausible that we might provide some subset of `SyncOnceCell` in no_std. I think there's consensus that we don't want to include "blocking" parts of API, but it's unclear if non-blocking subset (get+set) would be useful. (resolved in https://github.com/rust-lang/rust/issues/74465#issuecomment-725360596).
- [x] [Method naming](https://github.com/rust-lang/rust/issues/74465#issuecomment-726020743) is `get_or[_try]_init` the best name? (resolved as yes in https://github.com/rust-lang/rust/pull/107184)

### Implementation history

- #68198 (closed in favor of #72414)
- #72414 initial imlementation
- #74814 fixed `UnwindSafe` bounds

Although it took years, the tracking issue enabled someone different from the author of the original RFC to complete the work.

> and decision making discussions.

Yup, here I think GitHub is lacking. To decompose this, there is:

1. Information gathering & discussion to figure out what exactly the proposal to make a decision on is; this is collaborative RFC writing
2. Decision process per se: given the RFC, is it accepted or rejected?
3. After the decision has been made, tracking of the implementation work

These three are completely different processes.

Rust uses a GitHub repo with RFCs for 1, not because GitHub is good, but because GitHub is central. If someone doesn’t want to miss an RFC, they can watch the single repository.

For 2, as each decision is made by a separate team of a handful of people, GitHub isn’t really needed. Actual decisions typically happen in async team meetings on Zulip or sync video meetings. There’s also a bot to manage the 2PC-like “final comment period” process and formal voting.

3 is the tracking issue process described above. It comes into play once the decision has been made.
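As a rough sketch, the feature lifecycle from the tracking-issue description and the three processes above can be put into one small Rust model. The `Stage` and `Process` names are made up here, and the assignment of each stage to a process is only my reading of the thread, not something the Rust project defines.

```rust
/// Stages of the lifecycle:
/// Idea → RFC → Decision → Tracking Issue → Implementation →
/// Stabilization Report → Decision → Stabilization → Release
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Stage {
    Idea,
    Rfc,
    Decision,
    TrackingIssue,
    Implementation,
    StabilizationReport,
    StabilizationDecision,
    Stabilization,
    Release,
}

/// The three separate processes (numbered 1, 2, 3 in the post).
#[derive(Debug, PartialEq, Eq)]
enum Process {
    /// 1: collaborative RFC writing and discussion.
    Discussion,
    /// 2: a team accepts or rejects the proposal.
    Deciding,
    /// 3: implementation work is followed on a tracking issue.
    Tracking,
}

impl Stage {
    /// The stage that follows this one, or `None` after release.
    fn next(self) -> Option<Stage> {
        match self {
            Self::Idea => Some(Self::Rfc),
            Self::Rfc => Some(Self::Decision),
            Self::Decision => Some(Self::TrackingIssue),
            Self::TrackingIssue => Some(Self::Implementation),
            Self::Implementation => Some(Self::StabilizationReport),
            Self::StabilizationReport => Some(Self::StabilizationDecision),
            Self::StabilizationDecision => Some(Self::Stabilization),
            Self::Stabilization => Some(Self::Release),
            Self::Release => None,
        }
    }

    /// Which process governs this stage (an assumed mapping).
    fn process(self) -> Option<Process> {
        match self {
            Self::Idea | Self::Rfc => Some(Process::Discussion),
            Self::Decision | Self::StabilizationDecision => Some(Process::Deciding),
            Self::TrackingIssue
            | Self::Implementation
            | Self::StabilizationReport
            | Self::Stabilization => Some(Process::Tracking),
            Self::Release => None,
        }
    }
}

fn main() {
    // Walk the whole lifecycle and show which process applies at each stage.
    let mut stage = Some(Stage::Idea);
    while let Some(s) = stage {
        println!("{:?} -> {:?}", s, s.process());
        stage = s.next();
    }
}
```

Seen this way, a feature passes through a decision twice, and each process can live in a different venue without the stages becoming ambiguous.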

Infinisil · October 30, 2023, 9:55am

I proposed to do this very thing in [RFC 0138] Developing RFCs in repositories (NixOS/rfcs Pull Request #138), but there was not enough interest… I do often advocate using this approach regardless, and can definitely recommend it.

Also for RFC 140, I’m adding all the implementation work to a GitHub Milestone, acting as a tracking issue to a degree. This is working well, a dedicated tracking issue does sound even better though.

matklad · October 30, 2023, 10:15am

Sorry, I was ambiguous: what I meant is that there’s a single repository for all RFCs (rust-lang/rfcs: RFCs for changes to Rust), where an RFC is submitted for discussion “for real” and is often adjusted (in a minor or major way) before being voted on.

This is different from the process which gives you an RFC text to begin with: some people have a “single-repo-RFC” where they work in the open, others just wake up from a feverish dream with the RFC text inscribed in golden letters in their mind’s eye.

The difference between pre-RFC and RFC phases is that the “pre” phase is only for people who actively seek out the RFC, while the RFC phase is to notify everyone. In terms of safety-vs-liveness, the goal of the RFC PR against the RFC repo is safety: we want to make sure that anyone who could have input has a chance to provide it.

How we get liveness, the RFC in the first place, is unspecified. At some point Rust tried to do “each RFC is a GitHub repo”, but that didn’t work out. Luckily, there needn’t be a single process here, each RFC can be different.

> Also for RFC 140, I’m adding all the implementation work to a GitHub Milestone, acting as a tracking issue to a degree. This is working well, a dedicated tracking issue does sound even better though.

Yeah, I think a milestone is significantly worse, for two reasons:

- there’s no place where you can, in prose, contextualize the work and clearly delimit what’s blocking and what’s nice to have.
- I (as a GitHub passer-by) can’t add a comment to the milestone. One role of a tracking issue is that it is the center: the tracking issue itself is probably a bad place for any discussion, but every discussion elsewhere can be started by a comment on a tracking issue.

And yeah, a big thing here is also being consistent across all different features. Tracking issues are a “commons”, in the sense that, if everyone uses tracking issues, there’s an ecosystem-wide improvement in coordination, as opposed to a situation where each specific feature is tracked meticulously, but with a separate mechanism.