fix(matrix): only use 2-member DM fallback when dm refresh fails by private-peter · Pull Request #54890 · openclaw/openclaw

private-peter · 2026-03-26T04:32:09Z

Issue #54772 reports that two separate 2-person Matrix rooms with the same participants can both get routed as DMs. The regression comes from treating strict 2-member membership as a general fallback even when the DM cache is available and says the room is not a DM.

Move the catch from refreshDmCache() into isDirectMessage(). When the refresh succeeds, keep the existing client.dms.isDm(roomId) && isStrictDirectMembership(...) gate so non-DM 2-person rooms stay grouped. Only when the refresh fails do we fall back to the exact 2-member heuristic and keep the warning log.

Summary

Describe the problem and fix in 2–5 bullets:

Problem:
Why it matters:
What changed:
What did NOT change (scope boundary):

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

Closes [Bug]: bug(matrix): 2-person rooms incorrectly classified as DMs, causing message routing confusion #54772
Related fix(matrix): remove unconditional 2-member DM fallback classification #54787
This PR fixes a bug or regression

Root Cause / Regression History (if applicable)

For bug fixes or regressions, explain why this happened, not just what changed. Otherwise write N/A. If the cause is unclear, write Unknown.

Root cause: 94693f7#r180612893 added back blind checks for members.length == 2 so it assumes that any room with only 2 members is a DM. All DMs get collapsed into a single session. Openclaw can't keep context separate or figure out which room to reply to.
Missing detection / guardrail:
Prior context (git blame, prior PR, issue, or refactor if known): issue was originally fixed here https://github.com/openclaw/openclaw/pull/31023/files#r2991791467
Why this regressed now: 94693f7#r180612893
If unknown, what was ruled out:

Regression Test Plan (if applicable)

For bug fixes or regressions, name the smallest reliable test coverage that should have caught this. Otherwise write N/A.

Coverage level that should have caught this:
- Unit test
- Seam / integration test
- End-to-end test
- Existing coverage already sufficient
Target test or file:
Scenario the test should lock in:
Why this is the smallest reliable guardrail:
Existing test that already covers this (if any):
If no new test is added, why not:

User-visible / Behavior Changes

List user-visible changes (including defaults/config).
If none, write None.

Users with multiple matrix rooms will no longer have their conversations duplicated or mis-routed.

Security Impact (required)

New permissions/capabilities? (Yes/No) No
Secrets/tokens handling changed? (Yes/No) No
New/changed network calls? (Yes/No) No
Command/tool execution surface changed? (Yes/No) No
Data access scope changed? (Yes/No) No
If any Yes, explain risk + mitigation:

Repro + Verification

Environment

OS:
Runtime/container:
Model/provider:
Integration/channel (if any):
Relevant config (redacted):

Steps to reproduce

Create two Matrix rooms each with exactly 2 members (bot + same user)
Mark only one as a DM in your Matrix client
Send a message in the non-DM room → OpenClaw routes as if it were a DM

Expected behavior

A 2-person room should only be treated as a DM if it is explicitly marked as one (via m.direct or is_direct member state). Otherwise it should be handled as a regular group room.

Actual behavior

This causes replies to be routed to both rooms simultaneously, producing duplicate or out-of-context messages.

Human Verification (required)

What you personally verified (not just CI), and how:

Verified scenarios: Verified that with this change that rooms (not DM) get their own session)
Edge cases checked:
What you did not verify: State where you have no 1:1 / DM matrix channels

Review Conversations

I replied to or resolved every bot review conversation I addressed in this PR.
I left unresolved only the conversations that still need reviewer or maintainer judgment.

If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.

Compatibility / Migration

Backward compatible? (Yes/No)
Config/env changes? (Yes/No)
Migration needed? (Yes/No)
If yes, exact upgrade steps:

Failure Recovery (if this breaks)

How to disable/revert this change quickly:
Files/config to restore:
Known bad symptoms reviewers should watch for:

Risks and Mitigations

List only real risks for this PR. Add/remove entries as needed. If none, write None.

Risk:
- Mitigation:

greptile-apps · 2026-03-26T04:34:40Z

Greptile Summary

This PR fixes a regression (linked to issue #54772) where any 2-member Matrix room was being classified as a DM, causing separate rooms with the same two participants to collapse into a single session. The fix moves the try/catch from refreshDmCache() into isDirectMessage() so the strict 2-member heuristic is only used as a last resort when the DM cache refresh actually fails — not as a blanket fallback on every call.

Key changes:

refreshDmCache() now propagates its error rather than silently catching it; isDirectMessage() catches it and sets a refreshFailed flag.
When the refresh succeeds, the code gates on client.dms.isDm(roomId) && strictDirectMembership, keeping non-DM 2-person rooms from being misrouted.
When the refresh fails, the fallback to the exact 2-member heuristic is preserved with its warning log.
Tests are updated to reflect the success/failure distinction and now assert on getJoinedRoomMembers calls.

One issue found:

lastDmUpdateMs is only written when client.dms.update() succeeds (it was moved to after the await). In the original code it was set before the try/catch, so a failed refresh still suppressed retries for 30 s. Now, every isDirectMessage call in a persistent-failure scenario will attempt a fresh network call to dms.update(), bypassing the TTL rate-limiter entirely. This could cause a request flood against a failing backend. A one-line fix (set lastDmUpdateMs = now before the await) restores the original behaviour while keeping the error propagation the PR needs.

Confidence Score: 3/5

The core bug fix is correct and well-tested, but there is one targeted fix needed before merge: restoring TTL rate-limiting for failed DM cache refreshes to avoid a request flood under persistent failure conditions.

The logic change correctly solves the reported issue. The only concrete problem is the unintended removal of the 30-second retry gate when dms.update() fails, which could cause noisy logs and repeated failed network calls in an already-degraded state. This is a real reliability concern in a high-message-volume room but is a one-line fix away from being clean.

extensions/matrix/src/matrix/monitor/direct.ts — specifically refreshDmCache() where lastDmUpdateMs must be set before the await to preserve failure-path rate-limiting.

Important Files Changed

Filename	Overview
extensions/matrix/src/matrix/monitor/direct.ts	Core logic change correctly restricts the 2-member DM fallback to failure cases only, fixing the routing bug. One P1 issue: `lastDmUpdateMs` is no longer updated on refresh failure, silently removing the 30-second retry rate-limit for the error path.
extensions/matrix/src/matrix/monitor/direct.test.ts	Tests are well-restructured to cover the new success/failure distinction. The TTL test does not exercise `dms.update()` failure, so the missing rate-limiting behaviour noted in direct.ts would not be caught by these tests.

Comments Outside Diff (1)

extensions/matrix/src/matrix/monitor/direct.ts, line 45-52 (link)

TTL rate-limiting lost on refresh failure

lastDmUpdateMs is now only updated when client.dms.update() succeeds. When the refresh fails persistently, lastDmUpdateMs stays at 0 (or its previous stale value), which means now - lastDmUpdateMs always exceeds DM_CACHE_TTL_MS. Every subsequent call to isDirectMessage will attempt a new client.dms.update() call rather than being gated by the 30-second TTL.

The original code set lastDmUpdateMs = now before the try/catch so that even a failed refresh would suppress retries for 30 s. That rate-limiting is now gone for the failure path.

If dms.update() is failing persistently — e.g. a network issue — a high-volume room will hammer the backend on every single message received.

This preserves the TTL rate-limiting regardless of outcome while still propagating the error to isDirectMessage for the fallback path.

Prompt To Fix With AI

This is a comment left during a code review.
Path: extensions/matrix/src/matrix/monitor/direct.ts
Line: 45-52

Comment:
**TTL rate-limiting lost on refresh failure**

`lastDmUpdateMs` is now only updated when `client.dms.update()` succeeds. When the refresh fails persistently, `lastDmUpdateMs` stays at `0` (or its previous stale value), which means `now - lastDmUpdateMs` always exceeds `DM_CACHE_TTL_MS`. Every subsequent call to `isDirectMessage` will attempt a new `client.dms.update()` call rather than being gated by the 30-second TTL.

The original code set `lastDmUpdateMs = now` **before** the try/catch so that even a failed refresh would suppress retries for 30 s. That rate-limiting is now gone for the failure path.

If `dms.update()` is failing persistently — e.g. a network issue — a high-volume room will hammer the backend on every single message received.



This preserves the TTL rate-limiting regardless of outcome while still propagating the error to `isDirectMessage` for the fallback path.

How can I resolve this? If you propose a fix, please make it concise.

Prompt To Fix All With AI

This is a comment left during a code review.
Path: extensions/matrix/src/matrix/monitor/direct.ts
Line: 45-52

Comment:
**TTL rate-limiting lost on refresh failure**

`lastDmUpdateMs` is now only updated when `client.dms.update()` succeeds. When the refresh fails persistently, `lastDmUpdateMs` stays at `0` (or its previous stale value), which means `now - lastDmUpdateMs` always exceeds `DM_CACHE_TTL_MS`. Every subsequent call to `isDirectMessage` will attempt a new `client.dms.update()` call rather than being gated by the 30-second TTL.

The original code set `lastDmUpdateMs = now` **before** the try/catch so that even a failed refresh would suppress retries for 30 s. That rate-limiting is now gone for the failure path.

If `dms.update()` is failing persistently — e.g. a network issue — a high-volume room will hammer the backend on every single message received.

```suggestion
  const refreshDmCache = async (): Promise<void> => {
    const now = Date.now();
    if (now - lastDmUpdateMs < DM_CACHE_TTL_MS) {
      return;
    }
    lastDmUpdateMs = now;
    await client.dms.update();
  };
```

This preserves the TTL rate-limiting regardless of outcome while still propagating the error to `isDirectMessage` for the fallback path.

How can I resolve this? If you propose a fix, please make it concise.

_{Reviews (1): Last reviewed commit: "fix(matrix): only use 2-member DM fallba..." | Re-trigger Greptile}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 297852c713

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

extensions/matrix/src/matrix/monitor/direct.ts

private-peter · 2026-03-26T04:42:07Z

lastDmUpdateMs is only written when client.dms.update() succeeds (it was moved to after the await). In the original code it was set before the try/catch, so a failed refresh still suppressed retries for 30 s. Now, every isDirectMessage call in a persistent-failure scenario will attempt a fresh network call to dms.update(), bypassing the TTL rate-limiter entirely. This could cause a request flood against a failing backend. A one-line fix (set lastDmUpdateMs = now before the await) restores the original behaviour while keeping the error propagation the PR needs.

Fixed in 297852c. It will now set the variable so duplicate requests will only cause 1 cache update.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c5e220a014

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

extensions/matrix/src/matrix/monitor/direct.ts

Issue openclaw#54772 reports that two separate 2-person Matrix rooms with the same participants can both get routed as DMs. The regression comes from treating strict 2-member membership as a general fallback even when the DM cache is available and says the room is not a DM. Move the catch from refreshDmCache() into isDirectMessage(). When the refresh succeeds, keep the existing client.dms.isDm(roomId) && isStrictDirectMembership(...) gate so non-DM 2-person rooms stay grouped. Only when the refresh fails do we fall back to the exact 2-member heuristic and keep the warning log.

gumadeiras · 2026-03-27T23:08:44Z

Merged via squash.

Prepared head SHA: e32d220ef0cbfdca09353d2e9b121a355851a1a0
Merge commit: 0558f2470dd5501f2d64547b804eb70320922218

Thanks @private-peter!

@gumadeiras

…nclaw#54890) Merged via squash. Prepared head SHA: e32d220 Co-authored-by: private-peter <[email protected]> Co-authored-by: gumadeiras <[email protected]> Reviewed-by: @gumadeiras

@gumadeiras

…nclaw#54890) Merged via squash. Prepared head SHA: e32d220 Co-authored-by: private-peter <[email protected]> Co-authored-by: gumadeiras <[email protected]> Reviewed-by: @gumadeiras

@gumadeiras

…nclaw#54890) Merged via squash. Prepared head SHA: e32d220 Co-authored-by: private-peter <[email protected]> Co-authored-by: gumadeiras <[email protected]> Reviewed-by: @gumadeiras

@gumadeiras

…nclaw#54890) Merged via squash. Prepared head SHA: e32d220 Co-authored-by: private-peter <[email protected]> Co-authored-by: gumadeiras <[email protected]> Reviewed-by: @gumadeiras

openclaw-barnacle bot added channel: matrix Channel integration: matrix size: S labels Mar 26, 2026

private-peter force-pushed the matrix-dm-bug branch from f2eaab1 to 297852c Compare March 26, 2026 04:39

chatgpt-codex-connector bot reviewed Mar 26, 2026

View reviewed changes

extensions/matrix/src/matrix/monitor/direct.ts Outdated Show resolved Hide resolved

gumadeiras self-assigned this Mar 27, 2026

gumadeiras force-pushed the matrix-dm-bug branch from 297852c to c5e220a Compare March 27, 2026 22:53

openclaw-barnacle bot added size: M and removed size: S labels Mar 27, 2026

chatgpt-codex-connector bot reviewed Mar 27, 2026

View reviewed changes

extensions/matrix/src/matrix/monitor/direct.ts Outdated Show resolved Hide resolved

extensions/matrix/src/matrix/monitor/direct.ts Outdated Show resolved Hide resolved

gumadeiras force-pushed the matrix-dm-bug branch from c5e220a to cab689f Compare March 27, 2026 23:03

private-peter and others added 2 commits March 27, 2026 19:07

Matrix: restore explicit DM detection

e32d220

gumadeiras force-pushed the matrix-dm-bug branch from cab689f to e32d220 Compare March 27, 2026 23:08

gumadeiras merged commit 0558f24 into openclaw:main Mar 27, 2026
7 checks passed

This was referenced Mar 27, 2026

fix(matrix): remove unconditional 2-member DM fallback classification #54787

Closed

fix(matrix): align outbound direct-room selection #56076

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(matrix): only use 2-member DM fallback when dm refresh fails#54890

fix(matrix): only use 2-member DM fallback when dm refresh fails#54890
gumadeiras merged 2 commits intoopenclaw:mainfrom
private-peter:matrix-dm-bug

private-peter commented Mar 26, 2026 •

edited

Loading

Uh oh!

greptile-apps bot commented Mar 26, 2026 •

edited

Loading

Important Files Changed

Comments Outside Diff (1)

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

Uh oh!

private-peter commented Mar 26, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

gumadeiras commented Mar 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

private-peter commented Mar 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

Root Cause / Regression History (if applicable)

Regression Test Plan (if applicable)

User-visible / Behavior Changes

Security Impact (required)

Repro + Verification

Environment

Steps to reproduce

Expected behavior

Actual behavior

Human Verification (required)

Review Conversations

Compatibility / Migration

Failure Recovery (if this breaks)

Risks and Mitigations

Uh oh!

greptile-apps bot commented Mar 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 3/5

Important Files Changed

Comments Outside Diff (1)

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

private-peter commented Mar 26, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

gumadeiras commented Mar 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

private-peter commented Mar 26, 2026 •

edited

Loading

greptile-apps bot commented Mar 26, 2026 •

edited

Loading