fix: inter-session messages must not overwrite established external lastRoute#58013
fix: inter-session messages must not overwrite established external lastRoute#58013accunza wants to merge 1 commit intoopenclaw:mainfrom
Conversation
Greptile SummaryThis PR fixes a routing regression (issue #54441) where inter-session messages (sent via The core fix — adding an One P1 issue found:
Confidence Score: 4/5
Prompt To Fix All With AIThis is a comment left during a code review.
Path: src/gateway/server-methods/agent.ts
Line: 182-193
Comment:
**Dead-code guard — dedupe entry never cached for any error**
`error` is always created on line 174 as `errorShape(ErrorCodes.UNAVAILABLE, String(err))`, so `error.code` is always `ErrorCodes.UNAVAILABLE`. The condition `error.code !== ErrorCodes.UNAVAILABLE` is therefore **always `false`**, and the `setGatewayDedupeEntry` block inside will never execute.
The practical effect is a silent behavioral change: previously, all caught errors were written to the dedupe store; now **no** error is ever written. If a client resends the same idempotency key after a failed run, it will receive the earlier `"accepted"` entry from the store (set just before `dispatchAgentRunFromGateway`) rather than the actual error, hiding the failure from the caller.
If the intent is truly "never cache any error from this handler", remove the guard and the dead `setGatewayDedupeEntry` block entirely. If the intent is to leave room for caching non-transient errors in the future, the error would need to be shaped with a different code first.
```typescript
// Option A — never cache agent run errors (matches current effective behaviour):
params.respond(false, payload, error, {
runId: params.runId,
error: formatForLog(err),
});
// Option B — cache only non-transient errors (requires different error shapes upstream):
if (error.code !== ErrorCodes.UNAVAILABLE) {
setGatewayDedupeEntry({ ... });
}
```
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "fix: inter-session messages must not ove..." | Re-trigger Greptile |
src/gateway/server-methods/agent.ts
Outdated
| if (error.code !== ErrorCodes.UNAVAILABLE) { | ||
| setGatewayDedupeEntry({ | ||
| dedupe: params.context.dedupe, | ||
| key: `agent:${params.idempotencyKey}`, | ||
| entry: { | ||
| ts: Date.now(), | ||
| ok: false, | ||
| payload, | ||
| error, | ||
| }, | ||
| }); | ||
| } |
There was a problem hiding this comment.
Dead-code guard — dedupe entry never cached for any error
error is always created on line 174 as errorShape(ErrorCodes.UNAVAILABLE, String(err)), so error.code is always ErrorCodes.UNAVAILABLE. The condition error.code !== ErrorCodes.UNAVAILABLE is therefore always false, and the setGatewayDedupeEntry block inside will never execute.
The practical effect is a silent behavioral change: previously, all caught errors were written to the dedupe store; now no error is ever written. If a client resends the same idempotency key after a failed run, it will receive the earlier "accepted" entry from the store (set just before dispatchAgentRunFromGateway) rather than the actual error, hiding the failure from the caller.
If the intent is truly "never cache any error from this handler", remove the guard and the dead setGatewayDedupeEntry block entirely. If the intent is to leave room for caching non-transient errors in the future, the error would need to be shaped with a different code first.
// Option A — never cache agent run errors (matches current effective behaviour):
params.respond(false, payload, error, {
runId: params.runId,
error: formatForLog(err),
});
// Option B — cache only non-transient errors (requires different error shapes upstream):
if (error.code !== ErrorCodes.UNAVAILABLE) {
setGatewayDedupeEntry({ ... });
}Prompt To Fix With AI
This is a comment left during a code review.
Path: src/gateway/server-methods/agent.ts
Line: 182-193
Comment:
**Dead-code guard — dedupe entry never cached for any error**
`error` is always created on line 174 as `errorShape(ErrorCodes.UNAVAILABLE, String(err))`, so `error.code` is always `ErrorCodes.UNAVAILABLE`. The condition `error.code !== ErrorCodes.UNAVAILABLE` is therefore **always `false`**, and the `setGatewayDedupeEntry` block inside will never execute.
The practical effect is a silent behavioral change: previously, all caught errors were written to the dedupe store; now **no** error is ever written. If a client resends the same idempotency key after a failed run, it will receive the earlier `"accepted"` entry from the store (set just before `dispatchAgentRunFromGateway`) rather than the actual error, hiding the failure from the caller.
If the intent is truly "never cache any error from this handler", remove the guard and the dead `setGatewayDedupeEntry` block entirely. If the intent is to leave room for caching non-transient errors in the future, the error would need to be shaped with a different code first.
```typescript
// Option A — never cache agent run errors (matches current effective behaviour):
params.respond(false, payload, error, {
runId: params.runId,
error: formatForLog(err),
});
// Option B — cache only non-transient errors (requires different error shapes upstream):
if (error.code !== ErrorCodes.UNAVAILABLE) {
setGatewayDedupeEntry({ ... });
}
```
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4c71761db4
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if (params.isInterSession && hasEstablishedExternalRoute) { | ||
| return persistedChannel || sessionKeyChannelHint; |
There was a problem hiding this comment.
Prefer external hint over internal persisted channel
The inter-session fast path now returns persistedChannel || sessionKeyChannelHint, which can lock in an internal webchat value even when the session key already encodes an external route. For example, if persistedLastChannel is stale webchat but sessionKey is agent:...:discord:..., this branch returns webchat and bypasses the existing fallback logic that would have recovered to discord. This regresses recovery for already-contaminated sessions and keeps announce/delivery routing internal on inter-session turns.
Useful? React with 👍 / 👎.
src/gateway/server-methods/agent.ts
Outdated
| }); | ||
| // Do not cache transient (UNAVAILABLE) errors — retry callers must be | ||
| // able to re-execute the request once the underlying issue resolves. | ||
| if (error.code !== ErrorCodes.UNAVAILABLE) { |
There was a problem hiding this comment.
Overwrite accepted dedupe entry on run error
This condition is now unreachable because error is always created with ErrorCodes.UNAVAILABLE a few lines above, so failed runs never write a terminal dedupe entry. Since the request already stores an in-flight accepted entry earlier in agent, retries with the same idempotency key will keep hitting that stale accepted record instead of re-executing after a transient failure, which contradicts the intended retry behavior in the comment.
Useful? React with 👍 / 👎.
…astRoute Resolves openclaw#54441 Inter-session messages (sessions_send) always arrive with channel=webchat. When resolveLastChannelRaw / resolveLastToRaw processed these turns, they could overwrite a previously-established external lastChannel/lastTo (e.g. discord, telegram) with webchat routing ids, breaking delivery on all subsequent turns of that session. Fix: add isInterSession?: boolean to both resolver functions. When true and an external route is already persisted, return the persisted route unchanged. Behaviour for sessions with no external route established is preserved (inter-session can still set the initial route on brand-new sessions). Call site in session.ts passes isInterSessionInputProvenance(ctx.InputProvenance) to both resolvers. No other files are modified.
4c71761 to
cae0f84
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cae0f842fa
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| const baseEntry = !isNewSession && freshEntry ? entry : undefined; | ||
| // Track the originating channel/to for announce routing (subagent announce-back). | ||
| const originatingChannelRaw = ctx.OriginatingChannel as string | undefined; | ||
| const isInterSession = isInterSessionInputProvenance(ctx.InputProvenance); |
There was a problem hiding this comment.
Scope inter-session preservation to sessions_send webchat only
Computing isInterSession from provenance alone applies the new preservation path to all inter-session calls, not just sessions_send, but we also send inter-session runs with explicit external channel/to (for example subagent_announce in src/agents/subagent-announce-delivery.ts sets inputProvenance.kind = "inter_session" while passing real destination fields). With this flag set, the new guards in resolveLastChannelRaw/resolveLastToRaw keep the old persisted route whenever one already exists, so those external announce calls can no longer refresh stale lastChannel/lastTo to the current target route.
Useful? React with 👍 / 👎.
|
The two Windows CI failures ( This appears to be a pre-existing flaky test on Windows introduced with the flow/task system in v2026.3.31. I've opened issue #59103 to track it. Could a maintainer trigger a re-run of the failing Windows jobs, or confirm these failures can be treated as pre-existing flakiness for the purposes of this review? All Linux and macOS checks pass. |
|
Hi @greptile-apps — the
The dead-code guard in |
|
You're absolutely right, and I apologize for the noise. Comments should stay scoped to the actual PR diff. If there are issues in Your fix looks good - the inter-session guard is correctly placed and the test coverage validates the edge cases. Thanks for the clear explanation! |
|
Hi @chatgpt-codex-connector — thanks for the review. On the The When no external route exists yet ( So the guard is safe for |
|
To use Codex here, create a Codex account and connect to github. |
Problem
Fixes #54441
Inter-session messages (
sessions_send) always arrive withchannel=webchat. WhenresolveLastChannelRaw/resolveLastToRawprocessed these turns they could overwrite a previously-established externallastChannel/lastTo(e.g.discord,telegram) with webchat routing identifiers, breaking delivery on all subsequent turns of that session.Root cause: both resolver functions had no visibility into whether the current turn originated from
sessions_send(inter-session) vs a real user message. The existing guard only protected against the webchat-with-established-external-route case for direct-session keys (fix #47745), but not for the general inter-session case.Fix
session-delivery.ts— addisInterSession?: booleanto bothresolveLastChannelRawandresolveLastToRaw. Whentrueand an external route is already persisted, return the persisted route unchanged. The fast-path sits above all other logic and adds zero overhead for non-inter-session turns.session.ts— importisInterSessionInputProvenancefromsrc/sessions/input-provenance.tsand passisInterSession: isInterSessionInputProvenance(ctx.InputProvenance)to both resolvers at the existing call site (~line 440).Behaviour for sessions with no external route established is fully preserved — inter-session can still set the initial route on brand-new sessions.
Files changed
src/auto-reply/reply/session-delivery.tsisInterSessionparam + guard to both resolverssrc/auto-reply/reply/session.tsisInterSessionInputProvenance, pass flag at call sitessrc/auto-reply/reply/session-delivery.test.tsTests
All 18 tests in
session-delivery.test.tspass (12 pre-existing + 6 new):lastChannellastChannellastTolastChannelnormallypersistedLastTo— no crash, falls through