MSC4334: Add m.room.language state event. #4334

Open
dragonfly1033 wants to merge 9 commits into matrix-org:main from dragonfly1033:main

dragonfly1033 commented Aug 29, 2025

Rendered

This proposal adds an `m.room.language` state event.

This MSC proposes the addition of a new `m.room.language` state event to allow users to set language preferences for each room, enhancing features like message search and accessibility.
Update the title and content of the MSC to introduce per-room language support with an example event.
dragonfly1033 marked this pull request as draft August 29, 2025 11:10
Corrected a typo in the accessibility features section.
dragonfly1033 changed the title from "MSCxxxx: Add m.room.language state event." to "MSC4334: Add m.room.language state event." Aug 29, 2025
dragonfly1033 marked this pull request as ready for review August 29, 2025 11:20
tulir added the following labels on Aug 29, 2025:

  • proposal: A matrix spec change proposal
  • client-server: Client-Server API
  • kind:feature: MSC for not-core and not-maintenance stuff
  • needs-implementation: This MSC does not have a qualifying implementation for the SCT to review. The MSC cannot enter FCP.

Member

Implementation requirements:

  • Client that can set the language
  • Client that uses the language

## Alternatives

Rather than adding a new state event, this could be a client setting. However, the drawbacks of
this are that each member of a room would have to ensure they have the right language setting.

Member

Could messages just specify what language they're in? Rooms could be multilingual.

Author

I would suggest that the overhead would be wasted most of the time, as most rooms will have a primary language. Instead, we could have a room setting with an optional per-message override.
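
A minimal sketch of how a client might resolve the language under that scheme, assuming the room event content carries a `language` key as in the proposal's example. The per-message key name is hypothetical; nothing in this thread defines one.

```python
from typing import Optional

# Hypothetical per-message key; this thread does not specify its name.
MESSAGE_LANGUAGE_KEY = "language"


def effective_language(
    message_content: dict,
    room_language_content: Optional[dict],
) -> Optional[str]:
    """Return the per-message override if present, else the room language."""
    override = message_content.get(MESSAGE_LANGUAGE_KEY)
    if isinstance(override, str) and override:
        return override
    if room_language_content:
        room_language = room_language_content.get("language")
        if isinstance(room_language, str) and room_language:
            return room_language
    # Neither the message nor the room declares a language.
    return None
```

With this shape, only messages that deviate from the room's primary language pay the cost of an extra field.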

On the other hand, if the room language is changed, it becomes more complicated to determine the language of an older message. Arguably the overhead is not very significant for this information.

However, it would be user-friendly to have the client know which language is preferred in a room instead of having to always choose when joining a new one, so the state event would be useful for that.

Contributor

I tend to agree that both are useful to be honest. A language state event would be useful as part of a room directory that shows you what language a room is in, and may even go as far as warning you before you join if your language set does not match.

Per-message language keys are useful for services that might want to hook in and translate messages inside your client.
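
As a rough illustration of the directory warning, a client could compare the primary subtags of the room's languages with those of the user's configured languages. The function names and the choice to compare only primary subtags (so `en-GB` matches `en-US`) are assumptions made for this sketch, not part of the proposal.

```python
from typing import Iterable


def primary_subtag(tag: str) -> str:
    """Return the lowercase primary language subtag, e.g. 'en' for 'en-US'."""
    return tag.replace("_", "-").split("-", 1)[0].lower()


def should_warn_before_join(
    room_languages: Iterable[str],
    user_languages: Iterable[str],
) -> bool:
    """Warn only when the room declares languages and none overlap the user's."""
    room = {primary_subtag(tag) for tag in room_languages}
    if not room:
        # The room declares nothing, so there is nothing to compare against.
        return False
    user = {primary_subtag(tag) for tag in user_languages}
    return room.isdisjoint(user)
```

The check is purely informational: it does not stop anyone from joining or from posting in another language.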

dragonfly1033 pushed a commit to dragonfly1033/ruma that referenced this pull request Aug 29, 2025
```json
{
  "content": {
    "language": "en-US"
```

On the topic of multilingual rooms, shouldn't this be a list of languages?

Author

As mentioned above, my suggestion is for a room to have a primary language with per-message override.

The problem with having a list of languages for a room is that anyone could send a message in a language that isn't in the list 🙂

Contributor

I concur that it would probably be better to have a list of expected languages; you could even end up with a perfectly reasonable ["en-GB", "en-US"] for some rooms :)

I don't think this state needs to be anything more than informational, to be honest. Users could easily send messages in any language and there is nothing to stop them from doing so; I don't think we really need to moderate that behaviour. The state event is mostly useful for giving joining users a reasonable idea of what language they might expect, so they know whether it is worthwhile participating.

Author

Well, the problem would be that, say you are implementing search indexing, you can't tokenize a French message in a Mandarin room (unless the French message has its own overriding state).

Another issue arises when trying to determine the language of an unannotated message. The list implies that the message is in one of the listed languages, but if your list is ["fr", "zh"], how do you know which tokenizer to use?

Perhaps I'm missing something, what would be the advantage of having multiple "primary" languages for a room?
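
To make the tokenizer concern concrete, here is a toy sketch. The `tokenize` function is hypothetical, and the per-character split for `zh` is only a crude stand-in for a real Chinese word segmenter; the point is that the same text produces very different index terms depending on which language the indexer assumes.

```python
import re
from typing import List


def tokenize(text: str, language: str) -> List[str]:
    """Toy tokenizer that picks a strategy from the primary language subtag."""
    primary = language.split("-", 1)[0].lower()
    if primary == "zh":
        # Stand-in for a real segmenter: one token per non-space character.
        return [ch for ch in text if not ch.isspace()]
    # Whitespace/punctuation-delimited words for languages such as French.
    return re.findall(r"\w+", text.lower())
```

Tokenizing a French sentence with the `zh` strategy yields single letters, and tokenizing a Chinese sentence with the `fr` strategy yields one unsearchable blob, which is why an unannotated message in a ["fr", "zh"] room is ambiguous for an indexer.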

kepstin Sep 8, 2025

> Perhaps I'm missing something, what would be the advantage of having multiple "primary" languages for a room?

A couple of examples that come to mind:

  • A room dedicated to language learning/practice, which might want to list both the language of instruction and the language being taught.
  • A room for interaction within a multilingual community which is small enough that they didn't feel any need to split the community between multiple single-language rooms.

Contributor

For clarity: I'm in support of a list (maybe even an ordered list for preference? Hm. That's really just an idle thought.) of expected languages for the room.

> Well, the problem would be that, say you are implementing search indexing, you can't tokenize a French message in a Mandarin room (unless the French message has its own overriding state).

Suppose you had no information to go on at all (as is the case before this MSC): you could use language detection classifiers to help you decide. I've used https://lib.rs/crates/lingua in the past for a search project.

The problem is that classifiers are prone to mistakes, especially on short messages (which chat messages tend to be).

If you have a list of languages for the room, you can either constrain the language detector to that list, or at least use it as a pretty big hint to bias the probabilities.

In your case of French and Mandarin, a language detector can do pretty well, since the writing systems are entirely different.
But in the case of, say, Mandarin and Japanese, they both use some of the same characters (Japanese took and extended the Chinese writing system, more or less), so it's relatively harder to distinguish those two. Yet, if you know your room only contains (either absolutely, or 'likely') French and Mandarin, that resolves one ambiguity.
(Hope that makes sense)
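
A rough sketch of the "bias the probabilities" idea, independent of any particular detector. It assumes the classifier returns a score per language tag and that the room list uses the same tag form; `apply_room_hint` and `hint_weight` are names invented for this illustration. Setting `hint_weight=1.0` corresponds to constraining the detector to the list outright.

```python
from typing import Dict, Iterable


def apply_room_hint(
    scores: Dict[str, float],
    room_languages: Iterable[str],
    hint_weight: float = 0.9,
) -> Dict[str, float]:
    """Reweight detector scores using the room's language list as a prior."""
    hinted = {lang for lang in room_languages if lang in scores}
    others = set(scores) - hinted
    if not hinted or not others:
        # The hint does not separate any candidates, so leave scores as-is.
        return dict(scores)
    in_prior = hint_weight / len(hinted)
    out_prior = (1.0 - hint_weight) / len(others)
    weighted = {
        lang: score * (in_prior if lang in hinted else out_prior)
        for lang, score in scores.items()
    }
    total = sum(weighted.values())
    if total == 0:
        return dict(scores)
    # Normalise so the adjusted scores sum to 1 again.
    return {lang: value / total for lang, value in weighted.items()}
```

For example, if a detector slightly prefers Japanese over Mandarin for a short message (`{"zh": 0.45, "ja": 0.55}`) but the room lists `["fr", "zh"]`, the reweighted scores favour `zh`.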

Comment thread proposals/4334-room-language.md Outdated

Accessibility features such as screen readers would also benefit from knowing what language they
are reading.

Contributor

Another possible use case: clients can change the language of the spell checker as users switch between rooms while composing messages.

Comment thread proposals/4334-room-language.md Outdated
zecakeh pushed a commit to ruma/ruma that referenced this pull request Sep 2, 2025

9 participants