MSC4334: Add m.room.language state event. #4334
dragonfly1033 wants to merge 9 commits into matrix-org:main
This MSC proposes the addition of a new `m.room.language` state event to allow users to set language preferences for each room, enhancing features like message search and accessibility.
Update the title and content of the MSC to introduce per-room language support with an example event.
Corrected a typo in the accessibility features section.
Implementation requirements:
- Client that can set the language
- Client that uses the language
## Alternatives

Rather than adding a new state event, this could be a client setting. However, the drawbacks of
this are that each member of a room would have to ensure they have the right language setting.
Could messages just specify what language they're in? Rooms could be multilingual
I would suggest that the overhead would be wasted most of the time as most rooms will have a primary language. Instead we could have a room setting with optional per-message override.
On the other hand if the room language is changed, it becomes more complicated to determine the language of an older message. Arguably the overhead is not very significant for this information.
However, it would be user-friendly to have the client know which language is preferred in a room instead of having to always choose when joining a new one, so the state event would be useful for that.
I tend to agree that both are useful to be honest. A language state event would be useful as part of a room directory that shows you what language a room is in, and may even go as far as warning you before you join if your language set does not match.
Per-message language keys are useful for services that might want to hook in and translate messages inside your client.
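To make the directory/join warning idea concrete, here is a minimal sketch. It assumes the room's state content carries a single `language` tag (as in the proposal's example event) and that the client has some list of languages the user reads; `should_warn_before_join` and `primary_subtag` are hypothetical names, not part of any client API, and comparing only the primary subtag is a simplification.

```python
from typing import Optional, Sequence


def primary_subtag(tag: str) -> str:
    """Return the lowercased primary subtag, e.g. 'en' for 'en-US'."""
    return tag.split("-")[0].lower()


def should_warn_before_join(
    room_language: Optional[str], user_languages: Sequence[str]
) -> bool:
    """Hypothetical helper: warn if none of the user's languages match the room's.

    Only the primary subtag is compared, so 'en-GB' is treated as matching 'en-US'.
    """
    # No room language set, or no user preference known: nothing to warn about.
    if not room_language or not user_languages:
        return False
    room_primary = primary_subtag(room_language)
    return all(primary_subtag(lang) != room_primary for lang in user_languages)
```

A client could call this with the `language` value from the room's state and the user's configured languages before showing the join button.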
According to unstable [MSC4334](matrix-org/matrix-spec-proposals#4334).
```json
{
  "content": {
    "language": "en-US"
```
In the topic of multilingual rooms, shouldn't this be a list of languages?
As mentioned above, my suggestion is for a room to have a primary language with per-message override.
The problem with having a list of languages for a room is that anyone could send a message in a language that isn't in the list 🙂
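A rough sketch of the "primary language with per-message override" lookup, for illustration only. The room-level `language` key comes from the example event in this MSC; the per-message `language` key is purely an assumption here, since the proposal does not define one, and `effective_language` is a hypothetical helper name.

```python
from typing import Optional


def effective_language(
    message_content: dict, room_language_content: Optional[dict]
) -> Optional[str]:
    """Resolve the language to assume for a single message.

    Order: per-message override (hypothetical key), then the room's
    m.room.language content, then None if neither is available.
    """
    # Assumption: a per-message override would live under a "language" key.
    override = message_content.get("language")
    if isinstance(override, str) and override:
        return override

    # Fall back to the room-level setting from the state event content.
    if room_language_content:
        room_language = room_language_content.get("language")
        if isinstance(room_language, str) and room_language:
            return room_language

    # Nothing known: the caller may fall back to language detection.
    return None
```

Note that this uses whatever room state is passed in; handling older messages after the room language has changed would need the state as of each message.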
I concur it would probably be better to have a list of expected languages; you could even end up with a perfectly reasonable ["en-GB", "en-US"] for some rooms :)
I don't think this state needs to be anything more than informational, to be honest. Users could easily send messages in any language and there is nothing to stop them from doing so; I don't think we really need to moderate that behaviour. The state event is mostly useful for giving joining users a reasonable idea of what language they might expect, so they know whether it is worthwhile participating.
Well, the problem would be that, say you are implementing search indexing, you can't tokenize a French message in a Mandarin room (unless the French message has its own overriding state).
Another issue may occur when trying to find the language for an un-annotated message. This would imply that it is one of the languages in the list but if your list is ["fr", "zh"] then how do you know which tokenizer to use?
Perhaps I'm missing something, what would be the advantage of having multiple "primary" languages for a room?
> Perhaps I'm missing something, what would be the advantage of having multiple "primary" languages for a room?
A couple examples that come to mind:
- A room dedicated for language learning/practice, which might want to list both the language of instruction and the language being taught.
- A room for interaction within a multilingual community which is small enough that they didn't feel any need to split the community between multiple single-language rooms.
For clarity: I'm in support of a list (maybe even an ordered list for preference? Hm. That's really just an idle thought.) of expected languages for the room.
> Well, the problem would be that, say you are implementing search indexing, you can't tokenize a French message in a Mandarin room (unless the French message has its own overriding state).
Suppose you had no information to go off at all (pre-this MSC), you can use language detection classifiers to help you decide. I've used https://lib.rs/crates/lingua in the past for a search project.
The problem is that classifiers are prone to mistakes, especially on short messages (which chat messages usually are).
If you have a list of languages for the room, you can either constrain the language detector to that list, or at least use it as a pretty big hint to bias the probabilities.
In your case of French and Mandarin, a language detector can do pretty well, since the alphabets are entirely different.
But in the case of, say, Mandarin and Japanese, they both use some of the same characters (Japanese took and extended the Chinese writing system, more or less), so it's relatively harder to distinguish those two. Yet, if you know your room only contains (either absolutely, or 'likely') French and Mandarin, that resolves one ambiguity.
(Hope that makes sense)
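As a rough illustration of the "big hint" approach: the sketch below takes per-language confidence scores from some detector (the scores are assumed input, not the output of any specific library call), multiplies the scores of the room's listed languages by a boost factor, and renormalizes. `bias_scores`, `pick_language`, and the `boost` value are all hypothetical.

```python
from typing import Dict, Optional, Sequence


def bias_scores(
    scores: Dict[str, float], room_languages: Sequence[str], boost: float = 3.0
) -> Dict[str, float]:
    """Boost detector scores for languages listed on the room, then renormalize."""
    # Compare on the primary subtag so "zh-CN" on the room matches a "zh" score.
    room_primary = {lang.split("-")[0].lower() for lang in room_languages}
    weighted = {
        lang: score * (boost if lang.split("-")[0].lower() in room_primary else 1.0)
        for lang, score in scores.items()
    }
    total = sum(weighted.values())
    if total == 0:
        return dict(scores)
    return {lang: weight / total for lang, weight in weighted.items()}


def pick_language(
    scores: Dict[str, float], room_languages: Sequence[str], boost: float = 3.0
) -> Optional[str]:
    """Return the most likely language after applying the room bias."""
    biased = bias_scores(scores, room_languages, boost)
    return max(biased, key=biased.get) if biased else None
```

For example, a short message scored as `{"ja": 0.55, "zh": 0.45}` in a room listing `["fr", "zh"]` would be resolved to `zh` with the default boost, whereas without a room list it would stay `ja`. Constraining the detector outright corresponds to dropping every score not in the list instead of boosting.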
Accessibility features such as screen readers would also benefit from knowing what language they
are reading.
Other possible use case: clients can change the language of the spell checker when users are composing messages between rooms.
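A minimal sketch of how a client might pick a spell-checker dictionary from the room's language. It assumes the client knows which dictionaries are installed; `pick_dictionary` is a hypothetical helper, and the fallback is a simplified truncation of the tag (`en-US` then `en`), not a full language-tag matching implementation.

```python
from typing import Sequence


def pick_dictionary(room_language: str, available: Sequence[str], default: str) -> str:
    """Pick the closest installed dictionary for a room's language tag.

    Tries the full tag first, then progressively shorter prefixes
    (e.g. "en-US" then "en"), and finally the client's default.
    """
    # Match case-insensitively but return the dictionary name as installed.
    by_lower = {name.lower(): name for name in available}
    parts = room_language.lower().split("-")
    while parts:
        candidate = "-".join(parts)
        if candidate in by_lower:
            return by_lower[candidate]
        parts.pop()
    return default
```

A client could call this each time the user switches rooms, passing the `language` value from that room's state.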
Rendered
This proposal adds a `m.room.language` state event.