
fix: inter-session messages must not overwrite established external lastRoute #58013

Open
accunza wants to merge 1 commit into openclaw:main from accunza:fix/inter-session-lastroute


@accunza accunza commented Mar 31, 2026

Problem

Fixes #54441

Inter-session messages (sessions_send) always arrive with channel=webchat. When resolveLastChannelRaw / resolveLastToRaw processed these turns they could overwrite a previously-established external lastChannel/lastTo (e.g. discord, telegram) with webchat routing identifiers, breaking delivery on all subsequent turns of that session.

Root cause: both resolver functions had no visibility into whether the current turn originated from sessions_send (inter-session) vs a real user message. The existing guard only protected against the webchat-with-established-external-route case for direct-session keys (fix #47745), but not for the general inter-session case.
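A minimal TypeScript sketch of the failure mode follows. It assumes a deliberately simplified resolver in which the current turn's channel always wins when one is present. `naiveResolveLastChannel` and the `RouteSketch` shape are hypothetical and leave out the existing direct-session-key guard from #47745. They only show why a resolver that cannot tell sessions_send turns apart from user turns ends up persisting webchat.

```typescript
// Hypothetical persisted routing state for a session (not the real type).
type RouteSketch = { lastChannel?: string; lastTo?: string };

// Simplified "before the fix" resolver. It cannot tell whether the turn came
// from sessions_send, so any channel on the turn replaces the persisted one.
function naiveResolveLastChannel(
  persisted: RouteSketch,
  turnChannel: string | undefined,
): string | undefined {
  return turnChannel ?? persisted.lastChannel;
}

// A session whose route was established from Discord ("channel:123" is made up)...
const persistedRoute: RouteSketch = { lastChannel: "discord", lastTo: "channel:123" };

// ...then receives an inter-session turn, which always arrives as webchat.
const nextChannel = naiveResolveLastChannel(persistedRoute, "webchat");
console.log(nextChannel); // webchat: later replies would no longer reach Discord
```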

Fix

  • session-delivery.ts — add isInterSession?: boolean to both resolveLastChannelRaw and resolveLastToRaw. When true and an external route is already persisted, return the persisted route unchanged. The fast-path sits above all other logic and adds zero overhead for non-inter-session turns.
  • session.ts — import isInterSessionInputProvenance from src/sessions/input-provenance.ts and pass isInterSession: isInterSessionInputProvenance(ctx.InputProvenance) to both resolvers at the existing call site (~line 440).

Behaviour for sessions with no external route established is fully preserved — inter-session can still set the initial route on brand-new sessions.
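The guard can be sketched as follows. This is a simplified model, not the code in session-delivery.ts. The `ResolveParamsSketch` shape, the set of internal channels (assumed here to be only `webchat`) and the fall-through logic are all assumptions. Only the `isInterSession` flag and the rule "return the persisted route when an external one exists" come from the PR description.

```typescript
// Channels treated as internal (assumption: only webchat, for illustration).
const INTERNAL_CHANNELS = new Set(["webchat"]);

// Hypothetical, flattened inputs; the real resolvers take different parameters.
interface ResolveParamsSketch {
  isInterSession?: boolean;
  persistedLastChannel?: string;
  persistedLastTo?: string;
  turnChannel?: string;
  turnTo?: string;
}

function hasEstablishedExternalRoute(p: ResolveParamsSketch): boolean {
  const ch = p.persistedLastChannel;
  return ch !== undefined && ch !== "" && !INTERNAL_CHANNELS.has(ch);
}

function resolveLastChannelSketch(p: ResolveParamsSketch): string | undefined {
  // Fast path: an inter-session turn on a session that already routes externally.
  if (p.isInterSession && hasEstablishedExternalRoute(p)) {
    return p.persistedLastChannel;
  }
  // Otherwise fall through to (much simplified) normal resolution.
  return p.turnChannel ?? p.persistedLastChannel;
}

function resolveLastToSketch(p: ResolveParamsSketch): string | undefined {
  // Same guard, applied only when there is a persisted lastTo to keep.
  if (p.isInterSession && hasEstablishedExternalRoute(p) && p.persistedLastTo) {
    return p.persistedLastTo;
  }
  return p.turnTo ?? p.persistedLastTo;
}

// An inter-session (webchat) turn on a Discord session keeps Discord...
const discordTurn = resolveLastChannelSketch({
  isInterSession: true,
  persistedLastChannel: "discord",
  turnChannel: "webchat",
}); // "discord"
// ...while on a brand-new session it can still set the initial route.
const freshTurn = resolveLastChannelSketch({ isInterSession: true, turnChannel: "webchat" }); // "webchat"
```

Because the check is a single boolean test ahead of the existing logic, turns without the flag take the same path they did before.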

Files changed

| File | Change |
| --- | --- |
| src/auto-reply/reply/session-delivery.ts | Add isInterSession param + guard to both resolvers |
| src/auto-reply/reply/session.ts | Import isInterSessionInputProvenance, pass flag at call sites |
| src/auto-reply/reply/session-delivery.test.ts | 6 new test cases covering the fix and edge cases |

Tests

All 18 tests in session-delivery.test.ts pass (12 pre-existing + 6 new):

  • inter-session message does NOT overwrite established Discord lastChannel
  • inter-session message does NOT overwrite established Telegram lastChannel
  • inter-session message does NOT overwrite established external lastTo
  • regular Discord user message DOES update lastChannel normally
  • inter-session on a NEW session (no persisted external route) — falls through to normal resolution
  • inter-session with no persistedLastTo — no crash, falls through

@openclaw-barnacle openclaw-barnacle bot added the gateway (Gateway runtime), agents (Agent runtime and tooling) and size: M labels Mar 31, 2026

greptile-apps bot commented Mar 31, 2026

Greptile Summary

This PR fixes a routing regression (issue #54441) where inter-session messages (sent via sessions_send) always arriving with channel=webchat could silently overwrite an external lastChannel/lastTo (e.g. Discord, Telegram) persisted in session state, causing all subsequent deliveries on that session to be broken.

The core fix — adding an isInterSession fast-path guard to both resolveLastChannelRaw and resolveLastToRaw — is clean, well-targeted, and fully covered by six new tests. The supporting changes (distinguish "channel unavailable" from "unknown channel", retry-friendly error patterns, spawnedBy parent fallback for announce targets, channel-specific registry lookup) are all coherent and sensible.

One P1 issue found:

  • In src/gateway/server-methods/agent.ts, the new guard if (error.code !== ErrorCodes.UNAVAILABLE) wrapping setGatewayDedupeEntry is dead code — error is unconditionally created as errorShape(ErrorCodes.UNAVAILABLE, …) on the immediately preceding line, so the condition is always false. The net effect is that no agent-run error is ever written to the dedupe store after this change. Clients resending the same idempotency key after a failed run will receive the earlier "accepted" ack from the store instead of the actual error, hiding the failure. If the intent is to never cache errors (treat all as transient), the block should simply be removed; if non-transient errors should still be cached, the upstream error shaping needs to use a different code first.

Confidence Score: 4/5

  • Safe to merge after addressing the dead-code guard in agent.ts that silently prevents any agent-run error from being cached in the dedupe store.
  • The primary fix (inter-session routing preservation) is correct, well-tested, and clearly scoped. The supporting changes are coherent. One P1 logic bug exists in gateway/server-methods/agent.ts where the if (error.code !== ErrorCodes.UNAVAILABLE) condition is always false, making setGatewayDedupeEntry dead code and silently changing dedupe semantics for failed agent runs.
  • src/gateway/server-methods/agent.ts — the error-code guard at lines 182-193
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/gateway/server-methods/agent.ts
Line: 182-193

Comment:
**Dead-code guard — dedupe entry never cached for any error**

`error` is always created on line 174 as `errorShape(ErrorCodes.UNAVAILABLE, String(err))`, so `error.code` is always `ErrorCodes.UNAVAILABLE`. The condition `error.code !== ErrorCodes.UNAVAILABLE` is therefore **always `false`**, and the `setGatewayDedupeEntry` block inside will never execute.

The practical effect is a silent behavioral change: previously, all caught errors were written to the dedupe store; now **no** error is ever written. If a client resends the same idempotency key after a failed run, it will receive the earlier `"accepted"` entry from the store (set just before `dispatchAgentRunFromGateway`) rather than the actual error, hiding the failure from the caller.

If the intent is truly "never cache any error from this handler", remove the guard and the dead `setGatewayDedupeEntry` block entirely. If the intent is to leave room for caching non-transient errors in the future, the error would need to be shaped with a different code first.

```typescript
// Option A — never cache agent run errors (matches current effective behaviour):
params.respond(false, payload, error, {
  runId: params.runId,
  error: formatForLog(err),
});

// Option B — cache only non-transient errors (requires different error shapes upstream):
if (error.code !== ErrorCodes.UNAVAILABLE) {
  setGatewayDedupeEntry({ ... });
}
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix: inter-session messages must not ove..."

Comment on lines +182 to +193
```typescript
if (error.code !== ErrorCodes.UNAVAILABLE) {
  setGatewayDedupeEntry({
    dedupe: params.context.dedupe,
    key: `agent:${params.idempotencyKey}`,
    entry: {
      ts: Date.now(),
      ok: false,
      payload,
      error,
    },
  });
}
```

P1 Dead-code guard — dedupe entry never cached for any error

error is always created on line 174 as errorShape(ErrorCodes.UNAVAILABLE, String(err)), so error.code is always ErrorCodes.UNAVAILABLE. The condition error.code !== ErrorCodes.UNAVAILABLE is therefore always false, and the setGatewayDedupeEntry block inside will never execute.

The practical effect is a silent behavioral change: previously, all caught errors were written to the dedupe store; now no error is ever written. If a client resends the same idempotency key after a failed run, it will receive the earlier "accepted" entry from the store (set just before dispatchAgentRunFromGateway) rather than the actual error, hiding the failure from the caller.

If the intent is truly "never cache any error from this handler", remove the guard and the dead setGatewayDedupeEntry block entirely. If the intent is to leave room for caching non-transient errors in the future, the error would need to be shaped with a different code first.

```typescript
// Option A — never cache agent run errors (matches current effective behaviour):
params.respond(false, payload, error, {
  runId: params.runId,
  error: formatForLog(err),
});

// Option B — cache only non-transient errors (requires different error shapes upstream):
if (error.code !== ErrorCodes.UNAVAILABLE) {
  setGatewayDedupeEntry({ ... });
}
```
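The minimal TypeScript model below shows why the guard is unreachable and what Option B would require. `ErrorCodes`, `errorShape` and the `INVALID_REQUEST` code are simplified stand-ins, not the gateway's real definitions. `isTransient` is a hypothetical classifier. The model only illustrates that the error code has to be chosen before the guard runs.

```typescript
// Simplified stand-ins for the gateway's error helpers (assumptions).
const ErrorCodes = {
  UNAVAILABLE: "UNAVAILABLE",
  INVALID_REQUEST: "INVALID_REQUEST", // hypothetical non-transient code
} as const;
type ErrorCode = (typeof ErrorCodes)[keyof typeof ErrorCodes];

interface ErrorShapeSketch {
  code: ErrorCode;
  message: string;
}

function errorShape(code: ErrorCode, message: string): ErrorShapeSketch {
  return { code, message };
}

// Mirrors the current handler: every caught error becomes UNAVAILABLE,
// so `error.code !== ErrorCodes.UNAVAILABLE` can never be true afterwards.
function shapeAlwaysUnavailable(err: unknown): ErrorShapeSketch {
  return errorShape(ErrorCodes.UNAVAILABLE, String(err));
}

// Option B: classify first, so the cache guard has something to distinguish.
function shapeClassified(err: unknown, isTransient: (e: unknown) => boolean): ErrorShapeSketch {
  const code = isTransient(err) ? ErrorCodes.UNAVAILABLE : ErrorCodes.INVALID_REQUEST;
  return errorShape(code, String(err));
}

function shouldCacheError(error: ErrorShapeSketch): boolean {
  return error.code !== ErrorCodes.UNAVAILABLE;
}
```

With the current shaping, `shouldCacheError(shapeAlwaysUnavailable(err))` returns `false` for every input. With `shapeClassified`, only errors that the classifier marks as transient skip the cache.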


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4c71761db4

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +109 to +110
```typescript
if (params.isInterSession && hasEstablishedExternalRoute) {
  return persistedChannel || sessionKeyChannelHint;
```

P2 Prefer external hint over internal persisted channel

The inter-session fast path now returns persistedChannel || sessionKeyChannelHint, which can lock in an internal webchat value even when the session key already encodes an external route. For example, if persistedLastChannel is stale webchat but sessionKey is agent:...:discord:..., this branch returns webchat and bypasses the existing fallback logic that would have recovered to discord. This regresses recovery for already-contaminated sessions and keeps announce/delivery routing internal on inter-session turns.
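Here is a minimal sketch of the reviewer's suggestion. It assumes `webchat` is the only internal channel and that the session-key hint has already been parsed into a channel name. The session-key format is elided in the comment and is not reproduced here. `interSessionFastPathSketch` is hypothetical. It returns the persisted channel only when that channel is external, otherwise falls back to an external hint, and returns `undefined` so the existing fallback logic can run.

```typescript
const INTERNAL_CHANNEL_SET = new Set(["webchat"]); // assumption

function isExternalChannel(channel: string | undefined): channel is string {
  return channel !== undefined && channel !== "" && !INTERNAL_CHANNEL_SET.has(channel);
}

function interSessionFastPathSketch(
  persistedChannel: string | undefined,
  sessionKeyChannelHint: string | undefined,
): string | undefined {
  // A persisted external route is still preferred.
  if (isExternalChannel(persistedChannel)) return persistedChannel;
  // A stale internal value (e.g. "webchat") should not win over an external hint.
  if (isExternalChannel(sessionKeyChannelHint)) return sessionKeyChannelHint;
  // No external route at all: fall through to normal resolution.
  return undefined;
}
```

For a contaminated session, `interSessionFastPathSketch("webchat", "discord")` yields `"discord"` instead of locking in webchat.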


```typescript
});
// Do not cache transient (UNAVAILABLE) errors — retry callers must be
// able to re-execute the request once the underlying issue resolves.
if (error.code !== ErrorCodes.UNAVAILABLE) {
```

P1 Overwrite accepted dedupe entry on run error

This condition is now unreachable because error is always created with ErrorCodes.UNAVAILABLE a few lines above, so failed runs never write a terminal dedupe entry. Since the request already stores an in-flight accepted entry earlier in agent, retries with the same idempotency key will keep hitting that stale accepted record instead of re-executing after a transient failure, which contradicts the intended retry behavior in the comment.
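One way to get the retry behaviour the code comment describes is to replace the in-flight `accepted` marker when a run fails. For transient errors the marker is dropped so a retry re-executes. For other errors it is overwritten with a terminal error. The sketch below is a hypothetical, synchronous, in-memory model. `DedupeStoreSketch`, its entry shape and the `isTransient` callback are assumptions, not the gateway's `setGatewayDedupeEntry` API.

```typescript
// Hypothetical entry shape, loosely modelled on the { ts, ok, ... } entries in the diff.
interface DedupeEntrySketch {
  ts: number;
  ok: boolean;
  status: "accepted" | "done" | "error";
  error?: string;
}

class DedupeStoreSketch {
  private readonly entries = new Map<string, DedupeEntrySketch>();

  get(key: string): DedupeEntrySketch | undefined {
    return this.entries.get(key);
  }

  run(key: string, task: () => void, isTransient: (e: unknown) => boolean): DedupeEntrySketch {
    // Duplicate request with the same idempotency key: replay what is stored.
    const existing = this.entries.get(key);
    if (existing) return existing;

    // In-flight marker, like the "accepted" entry written before dispatch.
    this.entries.set(key, { ts: Date.now(), ok: true, status: "accepted" });
    try {
      task();
      const done: DedupeEntrySketch = { ts: Date.now(), ok: true, status: "done" };
      this.entries.set(key, done);
      return done;
    } catch (err) {
      const failed: DedupeEntrySketch = { ts: Date.now(), ok: false, status: "error", error: String(err) };
      if (isTransient(err)) {
        // Transient: remove the stale "accepted" marker so a retry re-executes.
        this.entries.delete(key);
      } else {
        // Non-transient: overwrite "accepted" with the terminal error.
        this.entries.set(key, failed);
      }
      return failed;
    }
  }
}

// Usage: a transient failure does not leave a stale "accepted" entry behind.
const dedupeStore = new DedupeStoreSketch();
let attempts = 0;
const flaky = (): void => {
  attempts++;
  throw new Error("upstream unavailable");
};
dedupeStore.run("agent:req-1", flaky, () => true);
dedupeStore.run("agent:req-1", flaky, () => true); // re-executes, so attempts === 2
```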


…astRoute

Resolves openclaw#54441

Inter-session messages (sessions_send) always arrive with channel=webchat.
When resolveLastChannelRaw / resolveLastToRaw processed these turns, they
could overwrite a previously-established external lastChannel/lastTo (e.g.
discord, telegram) with webchat routing ids, breaking delivery on all
subsequent turns of that session.

Fix: add isInterSession?: boolean to both resolver functions. When true and
an external route is already persisted, return the persisted route unchanged.
Behaviour for sessions with no external route established is preserved
(inter-session can still set the initial route on brand-new sessions).

Call site in session.ts passes isInterSessionInputProvenance(ctx.InputProvenance)
to both resolvers. No other files are modified.
@accunza accunza force-pushed the fix/inter-session-lastroute branch from 4c71761 to cae0f84 March 31, 2026 00:39
@openclaw-barnacle openclaw-barnacle bot added the size: S label and removed the gateway (Gateway runtime), agents (Agent runtime and tooling) and size: M labels Mar 31, 2026

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cae0f842fa


```typescript
const baseEntry = !isNewSession && freshEntry ? entry : undefined;
// Track the originating channel/to for announce routing (subagent announce-back).
const originatingChannelRaw = ctx.OriginatingChannel as string | undefined;
const isInterSession = isInterSessionInputProvenance(ctx.InputProvenance);
```

P2 Scope inter-session preservation to sessions_send webchat only

Computing isInterSession from provenance alone applies the new preservation path to every inter-session call, not just sessions_send. Some inter-session runs are sent with an explicit external channel/to: for example, subagent_announce in src/agents/subagent-announce-delivery.ts sets inputProvenance.kind = "inter_session" while passing real destination fields. With this flag set, the new guards in resolveLastChannelRaw/resolveLastToRaw keep the old persisted route whenever one already exists, so those external announce calls can no longer refresh a stale lastChannel/lastTo to the current target route.
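Below is a sketch of the narrower condition the review asks for. It assumes that an inter-session turn with an internal (webchat) channel, or with no channel at all, corresponds to the sessions_send case. A turn with an explicit external channel, such as a subagent announce, is still allowed to refresh the route. Only `kind: "inter_session"` comes from the comment. `InputProvenanceSketch` and both helper functions are hypothetical.

```typescript
// Hypothetical provenance shape; only kind === "inter_session" is from the review.
interface InputProvenanceSketch {
  kind: string;
}

function isInterSessionSketch(provenance: InputProvenanceSketch | undefined): boolean {
  return provenance?.kind === "inter_session";
}

// Preserve the persisted route only for inter-session turns without an
// external destination; explicit external channels may still update it.
function shouldPreservePersistedRoute(
  provenance: InputProvenanceSketch | undefined,
  turnChannel: string | undefined,
): boolean {
  if (!isInterSessionSketch(provenance)) return false;
  return turnChannel === undefined || turnChannel === "webchat";
}
```

Keying on the turn's channel rather than on provenance alone is one possible design. An explicit flag set by the sessions_send caller would be an alternative.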



accunza commented Apr 1, 2026

The two Windows CI failures (checks-windows-node-test-3 and checks-windows-node-test-6) are timeouts in src/tasks/flow-registry.store.test.ts, which is unrelated to this PR (the PR only touches session-delivery.ts, session-delivery.test.ts and session.ts).

This appears to be a pre-existing flaky test on Windows introduced with the flow/task system in v2026.3.31. I've opened issue #59103 to track it.

Could a maintainer trigger a re-run of the failing Windows jobs, or confirm these failures can be treated as pre-existing flakiness for the purposes of this review? All Linux and macOS checks pass.


accunza commented Apr 2, 2026

Hi @greptile-apps — the agent.ts file flagged in this review is not part of this PR's diff. The three files changed are:

  • src/auto-reply/reply/session-delivery.ts
  • src/auto-reply/reply/session-delivery.test.ts
  • src/auto-reply/reply/session.ts

The dead-code guard in agent.ts (lines 182–193) is pre-existing code that was not introduced or modified by this PR. Any issue there is out of scope for this fix and would deserve its own separate issue/PR.


greptile-apps bot commented Apr 2, 2026

You're absolutely right, and I apologize for the noise. Comments should stay scoped to the actual PR diff. If there are issues in agent.ts or other files outside this change set, those belong in separate issues.

Your fix looks good - the inter-session guard is correctly placed and the test coverage validates the edge cases. Thanks for the clear explanation!


accunza commented Apr 2, 2026

Hi @chatgpt-codex-connector — thanks for the review. On the subagent_announce concern:

The isInterSession guard in resolveLastChannelRaw / resolveLastToRaw only fires when hasEstablishedExternalRoute is true. In the subagent_announce path (subagent-announce-delivery.ts), when the requester already has an established external route (e.g. Discord), the announce call reads origin?.channel from that same persisted route — so the guard short-circuits to the same value that would have been written anyway. No route is dropped.

When no external route exists yet (hasEstablishedExternalRoute = false), the guard does not activate and the announce's explicit channel flows through normally.

So the guard is safe for subagent_announce in both cases: it either confirms the already-established route, or stays out of the way entirely.
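The cases in this reply can be checked against a minimal model of the guard as the PR describes it, where a persisted external route wins on inter-session turns. `resolveWithGuardSketch` is hypothetical and treats only `webchat` as internal. The first two calls correspond to the two cases above. The third models the scenario from the Codex comment, in which an announce targets a different external route than the persisted one. In that case the model keeps the persisted value.

```typescript
function resolveWithGuardSketch(
  isInterSession: boolean,
  persisted: string | undefined,
  incoming: string | undefined,
): string | undefined {
  const established = persisted !== undefined && persisted !== "webchat";
  if (isInterSession && established) return persisted;
  return incoming ?? persisted;
}

// Case 1: the announce reads its channel from the same persisted route.
const sameRoute = resolveWithGuardSketch(true, "discord", "discord"); // "discord"
// Case 2: no external route yet, so the announce channel flows through.
const noRoute = resolveWithGuardSketch(true, undefined, "telegram"); // "telegram"
// Case 3: the announce targets a different external channel than the persisted one.
const differentRoute = resolveWithGuardSketch(true, "discord", "telegram"); // "discord"
```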

@chatgpt-codex-connector

To use Codex here, create a Codex account and connect to GitHub.
