Skip to content

fix(llm): defer Anthropic stream start event until after message_start#90697

Merged
clawsweeper[bot] merged 2 commits into
openclaw:mainfrom
openperf:fix/90667-thinking-recovery-start-before-error
Jun 6, 2026
Merged

fix(llm): defer Anthropic stream start event until after message_start#90697
clawsweeper[bot] merged 2 commits into
openclaw:mainfrom
openperf:fix/90667-thinking-recovery-start-before-error

Conversation

@openperf
Copy link
Copy Markdown
Member

@openperf openperf commented Jun 5, 2026

Summary

  • Problem: Extended thinking sessions become permanently broken after a gateway restart or Anthropic prompt-cache TTL expiry (5 min). Every turn after the first failure also fails; once broken, the session never recovers on its own.
  • Root Cause: Both Anthropic stream implementations pushed {type:'start'} before entering the SSE event loop. In pumpStreamWithRecovery (thinking.ts), yieldedOutput is set true for every non-error event — once true, the thinking-block recovery retry is permanently disabled for that request. Anthropic sends thinking-signature validation errors as SSE event: error before message_start (pre-generation request validation). With the old placement, the start event set yieldedOutput=true before the SSE error arrived; recovery was silently skipped; the stale signatures stayed in the transcript; every subsequent turn also failed.
  • Fix: Defer stream.push({type:'start'}) into the message_start handler in both stream implementations so a pre-stream SSE error always arrives while yieldedOutput=false, letting pumpStreamWithRecovery strip all thinking blocks and retry cleanly.
  • What changed:
    • src/llm/providers/anthropic.ts — move stream.push({type:'start'}) from before the SSE loop into the message_start event handler body (provider stream path).
    • src/agents/anthropic-transport-stream.ts — same deferral for the embedded-agent default path (resolveEmbeddedAgentStreamFncreateBoundaryAwareStreamFnForModelcreateAnthropicMessagesTransportStreamFn).
    • src/llm/providers/anthropic.test.ts — two new tests: verify start is deferred until after message_start in normal flow; verify a pre-stream SSE error arrives with no preceding start.
    • src/agents/anthropic-transport-stream.test.ts — same two ordering tests for the transport path.
  • What did NOT change (scope boundary):
    • pumpStreamWithRecovery, wrapAnthropicStreamWithRecovery, stripAllThinkingBlocks — unchanged; the fix only corrects the provider-level event ordering.
    • Config surface unchanged. Plugin surface unchanged. Gateway protocol unchanged.

Reproduction

  1. Enable extended thinking on an Anthropic model and run a multi-turn session.
  2. Wait 5+ minutes (Anthropic prompt-cache TTL) or restart the gateway so the cached thinking block signatures expire.
  3. Send the next message.
  4. Before this PR: Anthropic sends SSE event: error "Invalid signature in thinking block" before message_start; {type:'start'} was already emitted before the SSE loop; yieldedOutput=true; pumpStreamWithRecovery skips retry; the error propagates; all subsequent turns also fail permanently.
  5. After this PR: {type:'start'} is deferred until inside the message_start handler in both stream implementations; yieldedOutput=false when the pre-stream SSE error arrives; pumpStreamWithRecovery detects the thinking-signature pattern, strips all thinking blocks, and retries; session continues normally.

Real behavior proof

Behavior addressed (#90667): extended thinking sessions permanently broken after gateway restart or Anthropic prompt-cache TTL expiry — every turn after the first signature error fails because yieldedOutput=true silently blocked the recovery retry in both the provider and transport stream paths.

Real environment tested (Linux, Node 22 — Vitest against the production streamAnthropic SSE event loop, pumpStreamWithRecovery recovery harness, and createAnthropicMessagesTransportStreamFn transport stream): src/llm/providers/anthropic.test.ts exercises the deferral ordering contract directly against streamAnthropic; src/agents/embedded-agent-runner/thinking.test.ts exercises pumpStreamWithRecovery with a stream that emits {type:'error'} before any {type:'start'} — the exact scenario this fix enables; src/agents/anthropic-transport-stream.test.ts covers the embedded-agent default path.

Exact steps or command run after this patch: pnpm test src/llm/providers/anthropic.test.ts src/agents/embedded-agent-runner/thinking.test.ts src/agents/anthropic-transport-stream.test.ts; node scripts/run-oxlint.mjs and pnpm format:fix on all four changed files.

Evidence after fix (Vitest output for touched test files):

 anthropic.test.ts              Tests  8 passed (8)
 thinking.test.ts               Tests  42 passed (42)
 anthropic-transport-stream.test.ts  Tests  53 passed (53)

The four new tests cover the deferral ordering contract: start is emitted only after message_start processing in both stream implementations; a pre-stream SSE error arrives with no preceding start event.

Observed result after fix: the "retries pre-content terminal stream-error events" case in thinking.test.ts exercises the exact recovery path — stream emits {type:'error'} with no preceding {type:'start'} → retry fires (callCount=2). The new anthropic.test.ts and anthropic-transport-stream.test.ts tests confirm both implementations now uphold the yieldedOutput=false precondition that recovery requires.

What was not tested: live Anthropic API call with expired cache and thinking blocks (requires waiting 5+ min for cache TTL expiry in a real session).

Repro confirmation: the pre-existing "retries pre-content terminal stream-error events" test was already gated on yieldedOutput=false — it tested the recovery machinery in isolation but could not catch the provider-level ordering bug. Before this fix, both stream implementations set yieldedOutput=true before any SSE error arrived, so that test passed while the real provider path failed silently. The new tests close that gap by driving both stream implementations directly.

Risk / Mitigation

  • Risk: start event now carries populated usage (filled from message_start) instead of a zero-usage snapshot. Mitigation: start.partial is always treated as a partial/in-progress snapshot by all consumers; no caller depends on usage being zero; the change is strictly more informative.

Change Type (select all)

  • Bug fix

Scope (select all touched areas)

  • Core

Linked Issue/PR

Fixes #90667

Extended thinking sessions become permanently broken after gateway restart or Anthropic
prompt-cache TTL expiry because thinking-block signature validation errors were silently
swallowing the recovery retry.

In pumpStreamWithRecovery (thinking.ts), the yieldedOutput flag prevents retry once any
non-error event has been emitted. streamAnthropic previously pushed {type:'start'} before
the SSE event loop, so when a pre-stream SSE error arrived (Anthropic sends signature
validation errors before message_start), yieldedOutput was already true and the retry
branch was skipped.

Deferring stream.push({type:'start'}) into the message_start handler keeps yieldedOutput=false
when a pre-stream error arrives, letting pumpStreamWithRecovery strip all thinking blocks
from context and retry the request cleanly.

Fixes openclaw#90667
@openclaw-barnacle openclaw-barnacle Bot added size: S r: too-many-prs Auto-close: author has more than twenty active PRs. labels Jun 5, 2026
@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented Jun 5, 2026

Codex review: passed. Reviewed June 5, 2026, 10:17 PM ET / 02:17 UTC.

Summary
The branch moves Anthropic start emission into message_start handling for the provider and transport stream paths and adds focused ordering/error tests.

PR surface: Source +5, Tests +149. Total +154 across 4 files.

Reproducibility: Do we have a high-confidence way to reproduce the issue? Yes from source: current main emits start before Anthropic can surface a pre-message_start SSE error, and recovery intentionally refuses to retry after any non-error output; no live expired-cache run was performed.

Review metrics: none identified.

Merge readiness
Overall: 🦞 diamond lobster
Proof: 🌊 off-meta tidepool
Patch quality: 🦞 diamond lobster
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Risk before merge

  • [P1] The exact live Anthropic expired-cache or gateway-restart path was not exercised; confidence is based on source ordering, the reported provider error shape, and focused stream/recovery tests.

Maintainer options:

  1. Decide the mitigation before merge
    Land the narrow provider and transport ordering fix after normal maintainer/automerge gates, while preserving recovery's no-retry-after-output guard.
  2. Pause or close
    Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge

  • [P2] No repair lane is needed; the branch already contains the narrow source change and regression tests, so the next action is maintainer/automerge validation.

Security
Cleared: The diff only changes Anthropic stream ordering and colocated tests; it does not touch secrets, dependencies, workflows, permissions, or package execution surfaces.

Review details

Best possible solution:

Land the narrow provider and transport ordering fix after normal maintainer/automerge gates, while preserving recovery's no-retry-after-output guard.

Do we have a high-confidence way to reproduce the issue?

Do we have a high-confidence way to reproduce the issue? Yes from source: current main emits start before Anthropic can surface a pre-message_start SSE error, and recovery intentionally refuses to retry after any non-error output; no live expired-cache run was performed.

Is this the best way to solve the issue?

Is this the best way to solve the issue? Yes; fixing the provider and transport event ordering preserves the recovery safety invariant instead of weakening retry behavior after visible output has already streamed.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 3a2f54e6a866.

Label changes

Label changes:

  • add rating: 🦞 diamond lobster: Overall readiness is 🦞 diamond lobster; proof is 🌊 off-meta tidepool and patch quality is 🦞 diamond lobster.
  • add status: 🚀 automerge armed: This PR is in ClawSweeper's automerge lane. Not applicable: The author is a repository member, so the external-contributor real behavior proof gate does not apply; the PR body records focused stream and recovery test output plus the missing live Anthropic TTL run.
  • remove rating: 🐚 platinum hermit: Current PR rating is rating: 🦞 diamond lobster, so this older rating label is no longer current.
  • remove status: 👀 ready for maintainer look: Current PR status label is status: 🚀 automerge armed.

Label justifications:

  • P1: The linked bug breaks Anthropic extended-thinking agent sessions after restart or cache expiry, and this PR is the core recovery fix path.
  • rating: 🦞 diamond lobster: Overall readiness is 🦞 diamond lobster; proof is 🌊 off-meta tidepool and patch quality is 🦞 diamond lobster.
  • status: 🚀 automerge armed: This PR is in ClawSweeper's automerge lane. Not applicable: The author is a repository member, so the external-contributor real behavior proof gate does not apply; the PR body records focused stream and recovery test output plus the missing live Anthropic TTL run.
Evidence reviewed

PR surface:

Source +5, Tests +149. Total +154 across 4 files.

View PR surface stats
Area Files Added Removed Net
Source 2 10 5 +5
Tests 2 149 0 +149
Docs 0 0 0 0
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 4 159 5 +154

What I checked:

Likely related people:

  • vincentkoc: Git history shows Vincent Koc introduced the Anthropic transport runtime and shared transport helpers, and also authored earlier Anthropic thinking recovery work in the old runner path. (role: introduced behavior and adjacent feature owner; confidence: high; commits: ea4265a82063, d8458a1481e9, 9aa2ef273602; files: src/agents/anthropic-transport-stream.ts, src/agents/provider-transport-stream.ts, src/agents/transport-stream-shared.ts)
  • steipete: Peter Steinberger recently reworked the Anthropic transport dependency boundary and appears frequently in the Anthropic/provider history sampled for these files. (role: recent adjacent owner; confidence: medium; commits: 67ebc433f953, 628b454eff6e; files: src/agents/anthropic-transport-stream.ts, package.json, pnpm-lock.yaml)
  • obviyus: Current-main blame for the affected stream/recovery files points at Ayaan Zaidi's snapshot commit, though the commit subject is broad and not clearly the original feature introduction. (role: recent current-main contributor; confidence: low; commits: 61d121f1caa2; files: src/llm/providers/anthropic.ts, src/agents/anthropic-transport-stream.ts, src/agents/embedded-agent-runner/thinking.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@openclaw-barnacle openclaw-barnacle Bot removed the r: too-many-prs Auto-close: author has more than twenty active PRs. label Jun 5, 2026
@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. P1 High-priority user-facing bug, regression, or broken workflow. labels Jun 5, 2026
… message_start

Applies the same start-event deferral fix from src/llm/providers/anthropic.ts to the
embedded-agent default path. resolveEmbeddedAgentStreamFn routes anthropic-messages
through createBoundaryAwareStreamFnForModel → createAnthropicMessagesTransportStreamFn,
so the thinking-block recovery bug (pumpStreamWithRecovery yieldedOutput gate) affects
the production embedded path via this file, not just the provider stream.

Moves stream.push({type:'start'}) from before the SDK event loop into the message_start
handler, keeping yieldedOutput=false in pumpStreamWithRecovery when an SSE event: error
arrives before message_start (as Anthropic sends for invalid thinking signatures).
@openclaw-barnacle openclaw-barnacle Bot added the agents Agent runtime and tooling label Jun 5, 2026
@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. labels Jun 5, 2026
@Takhoffman
Copy link
Copy Markdown
Contributor

@clawsweeper automerge

@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented Jun 6, 2026

🦞✅
ClawSweeper merged this PR after the passing review.

Source: clawsweeper[bot]
Feedback: structured ClawSweeper verdict: pass (sha=399a243c6437aa49e8dc2e7dbc661ff6caa60d10)
Merge status: merged by ClawSweeper automerge
Merged at: 2026-06-06T02:17:55Z
Merge commit: aa8070a76f92

What merged:

  • The branch moves Anthropic start emission into message_start handling for the provider and transport stream paths and adds focused ordering/error tests.
  • PR surface: Source +5, Tests +149. Total +154 across 4 files.
  • Reproducibility: Do we have a high-confidence way to reproduce the issue? Yes from source: current main emit ... ecovery intentionally refuses to retry after any non-error output; no live expired-cache run was performed.

Automerge notes:

  • PR branch already contained follow-up commit before automerge: fix(agents): defer Anthropic transport stream start event until after…

The automerge loop is complete.

Automerge progress:

  • 2026-06-06 02:07:48 UTC review queued 399a243c6437 (queued)
  • 2026-06-06 02:17:44 UTC review passed 399a243c6437 (structured ClawSweeper verdict: pass (sha=399a243c6437aa49e8dc2e7dbc661ff6caa60...)
  • 2026-06-06 02:17:57 UTC merged 399a243c6437 (merged by ClawSweeper automerge)

@clawsweeper clawsweeper Bot added clawsweeper:automerge Maintainer opted this PR into bounded ClawSweeper-reviewed automerge rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. status: 🚀 automerge armed This PR is in ClawSweeper's automerge lane. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels Jun 6, 2026
@clawsweeper clawsweeper Bot merged commit aa8070a into openclaw:main Jun 6, 2026
216 of 226 checks passed
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request Jun 6, 2026
openclaw#90697)

Summary:
- The branch moves Anthropic `start` emission into `message_start` handling for the provider and transport stream paths and adds focused ordering/error tests.
- PR surface: Source +5, Tests +149. Total +154 across 4 files.
- Reproducibility: Do we have a high-confidence way to reproduce the issue? Yes from source: current main emit ... ecovery intentionally refuses to retry after any non-error output; no live expired-cache run was performed.

Automerge notes:
- PR branch already contained follow-up commit before automerge: fix(agents): defer Anthropic transport stream start event until after…

Validation:
- ClawSweeper review passed for head 399a243.
- Required merge gates passed before the squash merge.

Prepared head SHA: 399a243
Review: openclaw#90697 (comment)

Co-authored-by: openperf <[email protected]>
Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
Approved-by: takhoffman
Co-authored-by: takhoffman <[email protected]>
849261680 pushed a commit to 849261680/openclaw that referenced this pull request Jun 7, 2026
openclaw#90697)

Summary:
- The branch moves Anthropic `start` emission into `message_start` handling for the provider and transport stream paths and adds focused ordering/error tests.
- PR surface: Source +5, Tests +149. Total +154 across 4 files.
- Reproducibility: Do we have a high-confidence way to reproduce the issue? Yes from source: current main emit ... ecovery intentionally refuses to retry after any non-error output; no live expired-cache run was performed.

Automerge notes:
- PR branch already contained follow-up commit before automerge: fix(agents): defer Anthropic transport stream start event until after…

Validation:
- ClawSweeper review passed for head 399a243.
- Required merge gates passed before the squash merge.

Prepared head SHA: 399a243
Review: openclaw#90697 (comment)

Co-authored-by: openperf <[email protected]>
Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
Approved-by: takhoffman
Co-authored-by: takhoffman <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling clawsweeper:automerge Maintainer opted this PR into bounded ClawSweeper-reviewed automerge P1 High-priority user-facing bug, regression, or broken workflow. rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. size: S status: 🚀 automerge armed This PR is in ClawSweeper's automerge lane.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Extended thinking sessions permanently broken after gateway restart / cache miss — no recovery for research agents

2 participants