MSC3845: Draft: Expanding policy rooms to reputation (#3845)
Yoric wants to merge 4 commits into matrix-org:main from
Conversation
> - a mechanism to store, publish and share actual *actions* against an entity (such as kicking or muting).
>
> The current proposal builds upon policies introduced in MSC2313 to serve as the former, letting
> communities share their opinion of an entity as a number in [0, 100). Further tools may be developed
I'm a little concerned about the user friendliness behind this number. Would this number be directly exposed to people trying to build opinion lists on entities? Or would it be hidden by the client in some way (e.g. by never displaying the number itself, but asking if an entity seems friendly, is nearly infringing the CoC, or has actually infringed the CoC)
---
I assumed that users would have some kind of slider from "toxic" to "pillar of the community" or such.
More advanced moderation tools would probably define some well-known values.
> @@ -0,0 +1,132 @@
> # MSCXXXX: Expanding Policy Rooms towards Distributed Reputation
>
> The Matrix network and protocol are open. However, some users, rooms or servers (let's call them
I would personally use the word "party", since it's various parties that have opinions about each other in the case of reputation. Parties can be individual persons or groups/communities. "Persons" in this context means not just natural persons, but persons inclusive of legal persons, such as companies.
It's just my preferred language for this, but "entities" could also be a choice. In my MSC3784 I like to use the word "stakeholders", but it doesn't apply here in the same way, since we are now talking about interactions between various parties rather than how this affects the stakeholders.
---
That makes sense.
In this case, we're building upon MSC2313, which uses "entity", so I'm reusing it because it's the simplest path.
> - a mechanism to store, publish and share actual *actions* against an entity (such as kicking or muting).
>
> The current proposal builds upon policies introduced in MSC2313 to serve as the former, letting
> communities share their opinion of an entity as a number in [-100, 100]. Further tools may be developed
So I personally don't have a good experience with numbers to rate stuff. People tend to pick 1-3 numbers and apply them to everyone and different people use numbers completely differently. I.e. on polls some people like me never give a score of 10/10, but always do 9/10 5/10 or 1/10.
What we currently do instead is make different severity lists. I.e. we have a ToS and a CoC list, where ToS is unambiguous bans, while CoC violations are often treated by communities differently. That way you can agree with a different communities CoC or ToS on a case by case basis instead of trying to agree on a global network to rate stuff using the same scale.
TL;DR I don't think numbers are a good way to rate things.
---
I think the idea is that clients would present some kind of label mapping to these ratings, but it probably is something that should be standardized
---
I understand the point. The problem we're trying to solve is that ToS and CoC are sometimes (not always!) too binary.
Here, a few typical cases are:
- "first strike" (lose 15 points on your CoC `m.opinion`), "second strike" (lose 15 more points on your CoC `m.opinion`), etc., and program Mjölnir to put users who are below -50 `m.opinion` on a CoC `m.ban`;
- program Mjölnir to combine the opinion of community A and community B: if they both believe that Marvin is a bad user (for instance, they both have an opinion of -50 or worse on Marvin), then ban Marvin.

For these uses, `m.ban` is the (almost) end result, but the `m.opinion` is an intermediate step to reach that decision.
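The strike flow above could be sketched roughly as follows (in TypeScript, since Mjölnir is written in it). The names `applyStrike` and `shouldBan`, the thresholds, and the clamping are illustrative assumptions, not part of this MSC:

```typescript
// Hypothetical sketch of the strike flow described above.
// m.opinion values are assumed to live in [-100, 100], as in the MSC.

const BAN_THRESHOLD = -50;
const STRIKE_PENALTY = 15;

// Apply a strike: lower the entity's opinion, clamped to the bottom of the range.
function applyStrike(current: number): number {
  return Math.max(-100, current - STRIKE_PENALTY);
}

// Ban only if *both* communities rate the entity at or below the threshold.
function shouldBan(opinionA: number, opinionB: number): boolean {
  return opinionA <= BAN_THRESHOLD && opinionB <= BAN_THRESHOLD;
}
```

With these numbers, four strikes take a neutral user from 0 to -60, past the hypothetical -50 ban line.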
---
I do think being able to share your opinion on a user in a more nuanced manner is a reasonable goal. What I am struggling with is how to put a scale on those numbers.
If you want mjolnir to automatically ban when the opinion reaches -50, and you want to automatically ban when the opinion reaches -50 in 2 other communities, you can already reach something very similar to the latter by banning a user only if they are banned in A and B. However, if a different community or application uses a different scale for bans and strikes (i.e. -1 per strike and ban on -3), how do you combine 2 of those lists? Since this MSC doesn't put any concrete moderation actions on a scale, you now have a lot of room for ambiguity and to add incompatibilities between implementations. And it also doesn't solve how to distinguish a ban for "just being annoying" from a ban for something serious like sending CSAM. In the latter case the opinion might be -100, but there isn't really a way to tell in this MSC.
It also leaves a lot of room for new issues. If someone sent CSAM in a room, got -100, but you combine the score by averaging 3 policy lists, they wouldn't cross the -50 threshold. On the other hand, if you naively add the 3 rooms, they might get banned far too early, since a single strike applied by several communities might still add up to a ban. There is also no decay, so any account might be banned eventually, since the -30 score you racked up 5 years ago is still there and a later misunderstanding suddenly puts you over the threshold.
Basically, what I want to say is: while numbers have more than two states, without attaching concrete units to them that have a proper meaning at certain intervals, they will need a lot of undocumented knowledge to be useful. Meanwhile, this MSC doesn't really address aging an opinion. Usually an opinion isn't binary, but it also isn't one-dimensional. I think providing an explicit strike system or temporary bans might be more useful than an overly generic mechanism that clients will use in incompatible ways.
---
You make entirely good points. I'll try to address them all.
> If you want mjolnir to automatically ban when the opinion reaches -50 and you want to automatically ban when the opinion reaches -50 in 2 other communities, you can already reach something very similar to the latter by banning a user only if they are banned in A and B.
That is true. Now what about more nuanced policies such as "you can only start posting links and images if we have reached a state in which we believe that we can trust you"?
> However if a different community or application uses a different scale for bans and strikes (i.e. -1 per strike and ban on -3), how do you combine 2 of those lists? Since this MSC doesn't put any concrete moderation actions on a scale, you now have a lot of room for ambiguity and to add incompatibilities between implementations.
>
> [...]
>
> It also leaves a lot of room for new issues. If someone sent CSAM in a room, got -100, but you combine the score by averaging 3 policy lists, they wouldn't cross the -50 threshold. On the other hand, if you naively add the 3 rooms, they might get banned far too early, since a single strike applied by several communities might still add up to a ban.
I agree that, in many cases, neither adding nor averaging are good operators. My hope is to experiment with operators during prototyping. I have (just) started writing a few experimental patches for Mjölnir to implement opinions, and I expect that this will inform further evolutions of this MSC.
I suspect that, fairly soon, we'll end up needing a full range of operators, and that will require an MSC on how to combine policy lists, something that is so far pretty much unspecified.
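For illustration, a few candidate combination operators might look like the sketch below. None of these names or semantics are specified anywhere; they are just obvious starting points, each with the failure mode discussed above:

```typescript
// Illustrative combination operators for opinions from several policy rooms.
// Nothing here is standardized; a future MSC would have to pick semantics.

type Combiner = (opinions: number[]) => number;

const combiners: Record<string, Combiner> = {
  // Sum: strikes accumulate across communities (may ban far too early).
  sum: (xs) => Math.max(-100, Math.min(100, xs.reduce((a, b) => a + b, 0))),
  // Average: dilutes a single -100 (e.g. for CSAM) across neutral lists.
  average: (xs) => xs.reduce((a, b) => a + b, 0) / xs.length,
  // Worst-case: a single -100 dominates the combined result.
  min: (xs) => Math.min(...xs),
};
```

For example, averaging `[-100, 0, 0]` yields about -33, which never crosses a -50 ban threshold, exactly the dilution problem raised above.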
> Basically what I want to say is, while numbers have more states than 2, without attaching concrete units to them, that have a proper meaning at certain intervals, they will need a lot of undocumented knowledge to be useful.
That makes complete sense. What units would you use?
> Meanwhile this MSC doesn't really address aging an opinion. Usually an opinion isn't binary, but it also isn't one-dimensional.
I agree that aging is something that needs to be solved, probably both for opinions and for bans. I hope that this can be done further down the line by a subsequent MSC, because I'd like to limit the scope of this MSC.
That being said, you get me thinking. One way to implement aging could be:
- add a `duration` to policies (including `m.ban` policies);
- instead of `opinion` being an absolute number, make it an operation (for the time being `+x` or `-x`, possibly other operations down the line).

Would this make sense to you?
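A rough sketch of that aging idea, assuming a hypothetical `durationMs` field and relative adjustments (none of these field names come from the MSC):

```typescript
// Sketch of the aging proposal above: each opinion policy carries a relative
// adjustment ("+x" / "-x") and an optional expiry, instead of an absolute score.
// All field names are hypothetical.

type OpinionPolicy = {
  entity: string;
  adjustment: number;   // e.g. -15 for a strike
  issuedAtMs: number;
  durationMs?: number;  // the policy stops counting after this long
};

// Current opinion = sum of adjustments from policies that have not expired,
// clamped to the MSC's [-100, 100] range.
function currentOpinion(policies: OpinionPolicy[], nowMs: number): number {
  const total = policies
    .filter((p) => p.durationMs === undefined || nowMs < p.issuedAtMs + p.durationMs)
    .reduce((acc, p) => acc + p.adjustment, 0);
  return Math.max(-100, Math.min(100, total));
}
```

Under this model, the -30 racked up years ago simply expires, which addresses the decay concern raised earlier in the thread.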
> And it also doesn't solve how to distinguish a ban for "just being annoying" from a ban for something serious like sending CSAM. In the latter case the opinion might be -100, but there isn't really a way to tell in this MSC.
>
> It also leaves a lot of room for new issues. If someone sent CSAM in a room, got -100, but you combine the score by averaging 3 policy lists, they wouldn't cross the -50 threshold. On the other hand, if you naively add the 3 rooms, they might get banned far too early, since a single strike applied by several communities might still add up to a ban. There is also no decay, so any account might be banned eventually, since the -30 score you racked up 5 years ago is still there and a later misunderstanding suddenly puts you over the threshold.
Your example on CSAM makes complete sense. I believe that the best way to do things would be to have both opinions and absolute bans (which could be issued from opinions in some cases, but not all). In this specific example, CSAM would deserve an immediate absolute ban in addition to a low opinion.
> I think providing an explicit strike system or temporary bans might be more useful than an overly generic mechanism, that clients will use in incompatible ways.
I would really like to have temporary bans and I have a few ideas for them, but they're mostly orthogonal to this work.
---
I think I will just have to experiment with that for a bit. My current idea would be to instead allow others to label content posted in or by an entity, and then allow my moderation bot to take that data and either put it to a vote by cross-checking multiple lists, or assign a score based on those labels, and then decide to ban based on that instead of some number. I.e. to allow describing a malicious entity in a manner that allows me to form an opinion, instead of relying on a numeric opinion. I currently just don't see pure numbers working as well, but that might change once I have some implementation experience with a few different approaches to this.
Thank you for reading through my messy feedback. I think you have some good ideas to address it, but I really just need to try it, I guess. So far the ban process in my communities was "what was posted" -> "cross check if others can confirm or deny that" -> "ban or no ban", instead of giving opinions that can be put into numbers, so I have a hard time picturing that in practice.
> This MSC does not specify how to combine opinions from two trusted groups. If group A assigns
> an opinion of -20 to Marvin and group B assigns an opinion of -10 to Marvin, does this mean
> that Marvin should have a total opinion of -30? -15?
It's worth considering the potential drawbacks of a single, all-deciding reputation score. Instead, it may be more useful to allow clients to interpret reputation in a way that is most relevant to their needs.
For example, instead of a single score, you could consider categorizing users as "controversial" or "bad" based on the balance of positive and negative scores they receive. This approach might be particularly useful for mitigating direct message spam, as it would allow clients to identify users who are considered "generally bad" or "mostly good with a few haters" or "bad because one person has rated them so far" and take appropriate action if necessary.
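One purely illustrative reading of that suggestion: classify an entity from the distribution of individual scores rather than from a single aggregate. The labels and thresholds below are invented for the sketch:

```typescript
// Hypothetical classification of an entity from the balance of individual
// scores, as suggested above. Labels and thresholds are illustrative only.

type Verdict = "mostly good" | "controversial" | "generally bad" | "too few ratings";

function categorize(scores: number[]): Verdict {
  // "bad because one person has rated them so far": don't conclude anything.
  if (scores.length < 3) return "too few ratings";
  const negative = scores.filter((s) => s < 0).length;
  const ratio = negative / scores.length;
  if (ratio >= 0.8) return "generally bad";
  if (ratio >= 0.3) return "controversial"; // "mostly good with a few haters"
  return "mostly good";
}
```

A DM-spam filter could then act only on "generally bad", while leaving "controversial" entities to the user's own judgment.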
> The community of Alicites may publish a positive opinion on users that they appreciate. If,
> however, the activities of Alicites are illegal in some regions, authorities may decide to
> use the opinions published by the Alicites to try and locate users friendly to Alicites.
> Similarly, a malicious group of Marvinites may use the opinions published by the Alicites
Although this might not solve the issue of bullying, it's important to consider the potential for malicious actors to manipulate a reputation system. It's important to discourage users from taking into account the reputation scores assigned by "bad" users, as this could destabilize the reputation system.
Ignoring the reputation scores assigned by negatively scored users can help to prevent situations where a malicious user, such as Marvin in your example, can unjustifiably hurt the reputation of a positively scored user, such as Alice, by introducing a large number of bots that highly recommend her and then misbehave.
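A minimal sketch of that mitigation, assuming we keep our own opinion of each rater; all names and the zero cut-off are illustrative:

```typescript
// Sketch of the mitigation described above: before aggregating, drop scores
// issued by raters whose own opinion (from our point of view) is negative.
// All names and the >= 0 trust cut-off are illustrative.

type Rating = { rater: string; score: number };

function trustedAverage(
  ratings: Rating[],
  raterOpinion: Map<string, number>, // our opinion of each rater; default 0
): number | null {
  const trusted = ratings.filter((r) => (raterOpinion.get(r.rater) ?? 0) >= 0);
  if (trusted.length === 0) return null; // nothing trustworthy to go on
  return trusted.reduce((acc, r) => acc + r.score, 0) / trusted.length;
}
```

In the Marvinite-bot scenario above, the bots' inflated +100 ratings are discarded once the bots themselves are rated negatively, so they can no longer launder or poison Alice's reputation.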
> or perhaps be muted for 15 minutes, or lose the ability to post links and images, etc.
>
> To achieve this, we need two mechanisms:
> - a mechanism to store, publish and share the *opinion* of a community (or a single user) on an entity;
It's important to consider how users will be able to determine the source of reputation scores and whether they reflect the views of a single user or a broader community. For example, Alice might have difficulty determining whether a reputation score is coming from Bob or a wider consensus, especially if Bob moderates many of the rooms that Alice is a part of.