Automating Moderation on Reddit

paperdigits · July 4, 2024, 9:15pm

Seems like if @domenkozar wants to be able to onboard some new mods, it can be requested in r/redditrequest.

I tried to request, but since he’s taken moderation recently, the request was closed. That seems like a better idea then turning it over to the machines.

piegames · July 4, 2024, 11:00pm

At least copying over the bans from the Nix community would by my minimal expectation for moderation on the subreddit, I find it weird that experiments like this take priority over that.

nyanbinary · July 5, 2024, 3:29am

Just what this community needs, more AI slop/robotic interaction instead of actual human connection /s. But yeah, this is a bad idea IMO, get more human mods :P.

crertel · July 5, 2024, 4:17am

Success criteria might be:

accuracy in identifying posts (though for moderation, it’s patterns of behavior that are arguably trickier to handle)
time to resolution of posts (how long a bad post stays up)
amount of complaining after action

Any others you’re thinking of?

APCodes · July 5, 2024, 8:37am

At least copying over the bans from the Nix community would by my minimal expectation for moderation on the subreddit, I find it weird that experiments like this take priority over that.

I strongly feel that independent spaces should not copy-paste decisions from the moderation team here, but remain independent in their moderation approach.

And speaking of this, you might be aware of how much controversy this proposal has already raised in the recent past so as to understand that it is maybe not the right time to raise it yet again. That is not necessary until the assembly has done it’s job and these things can be seen in a new light.

I find Domen’s approach very interesting precisely because it represents an experiment with moderation on an independent space. It allows us all to evaluate a completely different approach to moderation. Copy-pasting existing decisions would just deprive us of that possibility.

Edit: I should hope that Domen thinks of a way to enable us here to evaluate the results of his experiment?

TurboUnicorn · July 5, 2024, 7:59pm

The problem with human mods is bias, as we’ve seen on Discourse, and if by chance a partisan mod is appointed that would result in furthering the existing division. There is a significant crisis of trust that has only been growing and unless said potential moderator (assuming it would be possible to appoint one in the first place) should be someone trusted and acceptable by both sides, IMO.

paperdigits · July 5, 2024, 8:36pm

Yes this is also a problem with AI. Not sure why you’d think its limited to humans only. Training AI on human data leads to bias as well.

TurboUnicorn · July 5, 2024, 8:47pm

I should’ve been more precise in my initial statement, I suppose. I’m highly sceptical about AI moderation, but at least there the bias leads to mostly false negatives, whereas with humans it’s more likely to be false positives. The first one can by verification by Domen (who has so far proven to be very reasonable), whereas the second is much less likely to have recourse and can lead to pitchforks.

paperdigits · July 5, 2024, 9:45pm

How do you figure?

Its all down to training data.

Atemu · July 6, 2024, 7:39am

Go ask Microsoft’s Tay what it “thinks” of benign comments authored by members of marginalised groups.

APCodes · July 6, 2024, 10:02am

Okay but Microsoft’s Tay is from several years ago, even before the Attention Is All You Need paper came out in 2017 which paved the way for the current transformer-based architecture of LLMs. Bringing that up in the current context is just creating unnecessary fear.

On how many false negatives or false positives you get, that can only be evaluated by an actual test run. Because in the end, Domen will have to keep tabs about how many things he thinks the models falsely classifies as violations and how many things it falsely classifies as non-violations. And then we know how it performs. Obviously that only makes sense if he very diligently looks at all the posts on the subreddit for the time of the test run in order to get all the false negatives.

However, if anyone doesn’t trust Domen’s judgement to begin with, well then we are back to square one. But that is an issue that can’t be solved sadly.

But I’d really just like to see the experiment and it’s outcome tbh. if we get the necessary data transparently so we can evaluate it for ourselves after a few dozen true positives and a lot more of true negatives.

piegames · July 7, 2024, 6:27pm

Regardless whether or not this is a good idea to do, are you even allowed to feed other people’s comments into some ChatGPT service? People often just assume that they can do anything with any public text (and especially the LLM training people have been pushing hard for that), but what does the actual legalese say on this question? Both in terms of actual laws like for example the GDPR, and also in terms of EULAs and licensing

APCodes · July 7, 2024, 8:14pm

That is a really good question. But that will somewhat be determined by Reddit’s terms of service I would assume.

lassulus · July 7, 2024, 9:23pm

there seems to be some functionality already included in reddit, not sure if it’s helpful or already activated for the nixos subreddit. But maybe it’s interesting to investigate:
https://www.reddit.com/r/modnews/comments/1bd3b82/a_new_harassment_filter_and_user_reporting_type/

domenkozar · July 13, 2024, 10:53am

https://watchdog.chat/ has been turned on for /r/NixOS, it’s going to post a comment to violating comments and also send modmail.

Keep an eye out on false positives!

domenkozar · July 15, 2024, 10:38am

The good: /r/NixOS Reddit comments that get a warning from https://watchdog.chat are now frequently edited to pay respect to CoC guidelines. Success!

The bad: There are still some false positives, mostly related to irony or swearing into nature.

L-as · July 15, 2024, 1:07pm

Do you know what model it’s using underneath?

I do think this is a great idea FWIW. It’s easy to hate on reflectively,
but it’s better to see it as an automated tool that tells you when you might not be following
the CoC. Most will try to align with the CoC when that happens.

domenkozar · July 15, 2024, 2:22pm

I don’t know the model, maybe Ben can share.

Preliminary data shows 50% of posts that get a warning get edited, which I’d call a success. I’m sure there’s plenty of room for improvement though!

delroth · July 18, 2024, 6:12am

It’s easy to call anything a success when you don’t define success metrics ahead of time.

domenkozar · July 18, 2024, 1:46pm

We’re experimenting with analytics, tracking things like how many posts got the warning over time,
how many were edited after the fact, removed, etc. That way we can see if the trend of moderation is going in the right way.

Another thing that’s high on the priority list is the moderation log, so we can be transparent about all CoC violations.

The last thing I have in mind is to be transparent about the escalation policy when the CoC is violated continuously and banning of people (although that wasn’t necessary so far, so it’s low priority).