As moderation in the current atmosphere has taken an emotional toll and comes with bias, I'm looking to experiment with using AI to enforce the code of conduct on Reddit posts.
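Roughly, the shape of the experiment would be something like this sketch (assuming PRAW and the OpenAI Python client; the credentials, model name, prompt, and CoC text are placeholders, not a finished tool):

```python
# Sketch only: ask an LLM whether a Reddit post violates the CoC.
import praw
from openai import OpenAI

reddit = praw.Reddit(
    client_id="...",            # placeholder app credentials
    client_secret="...",
    user_agent="coc-experiment/0.1",
)
llm = OpenAI()                  # reads OPENAI_API_KEY from the environment

CODE_OF_CONDUCT = "..."         # the actual CoC text would go here

def check_post(title: str, body: str) -> str:
    """Return the model's verdict, e.g. 'OK' or 'VIOLATION: <rule>'."""
    response = llm.chat.completions.create(
        model="gpt-4o-mini",    # placeholder model name
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a moderation assistant. Code of conduct:\n"
                    f"{CODE_OF_CONDUCT}\n"
                    "Reply 'OK' or 'VIOLATION: <rule>' for the post below."
                ),
            },
            {"role": "user", "content": f"{title}\n\n{body}"},
        ],
    )
    return response.choices[0].message.content

# Dry run over recent posts; prints verdicts, takes no mod actions.
for submission in reddit.subreddit("NixOS").new(limit=10):
    print(submission.title, "->", check_post(submission.title, submission.selftext))
```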
That is a highly interesting request, particularly insofar as it might make moderation criteria more transparent, if done right and in the open. I am not sure a ready-made solution for this use case is out there, tbh.
But I will look out for one.
The bias issue @Atemu mentioned, however, is a real one. And that's not cynicism; I just think it is an interesting thing to try nonetheless.
A language model is, by design, not transparent in how it makes its inferences, and it is a known problem that it inherits biases from its training data. That alone makes it fundamentally unsuitable for moderation.
More concretely, here is a recent study on using a language model for moderation on Reddit, which finds that it doesn't work:
> We find that while LLM-Mod has a good true-negative rate (92.3%), it has a bad true-positive rate (43.1%), performing poorly when flagging rule-violating posts. LLM-Mod is likely to flag keyword-matching-based rule violations, but cannot reason about posts with higher complexity.
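To spell out what those rates mean in practice, assume (hypothetically) that 10% of posts actually violate the rules; the flagged queue then ends up mostly false alarms:

```python
# What those rates imply, assuming (hypothetically) that 10% of posts
# actually violate the rules.
posts = 1000
violations = 100                        # the assumed 10%
fine = posts - violations               # 900

true_positives = 0.431 * violations     # ~43 violations correctly flagged
false_positives = (1 - 0.923) * fine    # ~69 fine posts wrongly flagged

precision = true_positives / (true_positives + false_positives)
print(f"flagged posts that are real violations: {precision:.0%}")  # ~38%
print(f"real violations that slip through: {1 - 0.431:.0%}")       # ~57%
```

In other words, under that assumption roughly 6 out of 10 flags would be wrong, while more than half of the real violations would slip through unnoticed.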
I don't have experience moderating NixOS spaces, though I have some experience elsewhere.
I could see this working well for spam, but not so much for ‘actual’ ‘community’ moderation. For the latter, these are rarely per-post decisions: the meaningful decisions are the result of longer-standing patterns of behavior and (public and/or private) interaction with people whose posts (or even accounts) might eventually get moderated.
These latter, difficult cases are the ones that take the emotional toll on the moderators, and I don’t see how AIs would be of much help there.
As long as there’s always a human in the loop when actions are taken, I guess it might be an interesting experiment, but I’m not terribly optimistic it’ll help…
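If it is tried, something like the following keeps the human in the loop: the bot only files reports, it never acts (a sketch, assuming a logged-in PRAW instance and reusing the hypothetical check_post() from the sketch earlier in the thread):

```python
# Sketch of keeping a human in the loop: the bot only files reports into
# the mod queue and never removes anything itself.
for submission in reddit.subreddit("NixOS").new(limit=25):
    verdict = check_post(submission.title, submission.selftext)
    if verdict.startswith("VIOLATION"):
        # report() only adds the post to the human moderators' queue;
        # Reddit caps report reasons at 100 characters, hence the slice.
        submission.report(f"LLM flag, needs human review: {verdict[:60]}")
```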
> I could see this working well for spam, but not so much for ‘actual’ ‘community’ moderation. For the latter, these are rarely per-post decisions: the meaningful decisions are the result of longer-standing patterns of behavior and (public and/or private) interaction with people whose posts (or even accounts) might eventually get moderated.
I'm not sure moderation is the way to resolve those wounds. That's not what I'm trying to moderate; rather, CoC violations, as mentioned in my post.
> These latter, difficult cases are the ones that take the emotional toll on the moderators, and I don't see how AIs would be of much help there.
Resolving those things is something to handle outside of moderation, but that's a topic for another thread, please.
> As long as there's always a human in the loop when actions are taken, I guess it might be an interesting experiment, but I'm not terribly optimistic it'll help…
That does sound like a reasonable approach, even though it would most likely mean building some open-source tool of our own, I suppose. Do you know what technology that service uses under the hood?
I guess this might be partly due to my unfamiliarity with (NixOS) Reddit, but I'm not sure I understand what you mean by “moderating CoC” then. CoCs are typically nuanced enough that violations are hopefully unintentional, so a (possible) violation would be a reason for a conversation, but that quickly gets into territory where I don't see AIs helping much, especially for the emotionally stressful cases.
Yeah, ‘you do you’, obviously. I'm just curious what kinds of cases you expect it to help with.
FWIW, I am personally a fan of the light-touch-but-not-zero moderation currently happening on /r/nixos. Hiding posts that lead to obvious brigading is useful and can't really be self-moderated. But aside from that, there's a good amount of self-moderation happening: actively toxic posts get downvoted and addressed, and it's good to have a place where people can air grievances and feel heard.
For anyone determined to be toxic, I'm sure it will take all of 3 seconds to figure out how to get any point across without triggering this filter. If nothing else, they'll create a new kind of jargon with implied meaning and a highly in-group style of sarcasm.
I think letting an AI censor /r/nixos, whether directly or indirectly, will have a chilling effect on bona fide members of the community.
It's Reddit; I think people go there precisely because it's different from Discourse. Anyone who wants the Discourse flavor is already here. Isn't it nice to have different environments? After all, we're all a little crazy, just in different ways.
Sadly, domen cannot onboard new mods (he lacks the permissions), and the current admins seem to be missing. Multiple people have already tried to help out by applying for moderation.