fix(llm): defer Anthropic stream start event until after message_start by openperf · Pull Request #90697 · openclaw/openclaw

openperf · 2026-06-05T14:55:50Z

Summary

Problem: Extended thinking sessions become permanently broken after a gateway restart or Anthropic prompt-cache TTL expiry (5 min). Every turn after the first failure also fails; once broken, the session never recovers on its own.
Root Cause: Both Anthropic stream implementations pushed {type:'start'} before entering the SSE event loop. In pumpStreamWithRecovery (thinking.ts), yieldedOutput is set true for every non-error event — once true, the thinking-block recovery retry is permanently disabled for that request. Anthropic sends thinking-signature validation errors as SSE event: error before message_start (pre-generation request validation). With the old placement, the start event set yieldedOutput=true before the SSE error arrived; recovery was silently skipped; the stale signatures stayed in the transcript; every subsequent turn also failed.
Fix: Defer stream.push({type:'start'}) into the message_start handler in both stream implementations so a pre-stream SSE error always arrives while yieldedOutput=false, letting pumpStreamWithRecovery strip all thinking blocks and retry cleanly.
What changed:
- src/llm/providers/anthropic.ts — move stream.push({type:'start'}) from before the SSE loop into the message_start event handler body (provider stream path).
- src/agents/anthropic-transport-stream.ts — same deferral for the embedded-agent default path (resolveEmbeddedAgentStreamFn → createBoundaryAwareStreamFnForModel → createAnthropicMessagesTransportStreamFn).
- src/llm/providers/anthropic.test.ts — two new tests: verify start is deferred until after message_start in normal flow; verify a pre-stream SSE error arrives with no preceding start.
- src/agents/anthropic-transport-stream.test.ts — same two ordering tests for the transport path.
What did NOT change (scope boundary):
- pumpStreamWithRecovery, wrapAnthropicStreamWithRecovery, stripAllThinkingBlocks — unchanged; the fix only corrects the provider-level event ordering.
- Config surface unchanged. Plugin surface unchanged. Gateway protocol unchanged.

Reproduction

Enable extended thinking on an Anthropic model and run a multi-turn session.
Wait 5+ minutes (Anthropic prompt-cache TTL) or restart the gateway so the cached thinking block signatures expire.
Send the next message.
Before this PR: Anthropic sends SSE event: error "Invalid signature in thinking block" before message_start; {type:'start'} was already emitted before the SSE loop; yieldedOutput=true; pumpStreamWithRecovery skips retry; the error propagates; all subsequent turns also fail permanently.
After this PR: {type:'start'} is deferred until inside the message_start handler in both stream implementations; yieldedOutput=false when the pre-stream SSE error arrives; pumpStreamWithRecovery detects the thinking-signature pattern, strips all thinking blocks, and retries; session continues normally.

Real behavior proof

Behavior addressed (#90667): extended thinking sessions permanently broken after gateway restart or Anthropic prompt-cache TTL expiry — every turn after the first signature error fails because yieldedOutput=true silently blocked the recovery retry in both the provider and transport stream paths.

Real environment tested (Linux, Node 22 — Vitest against the production streamAnthropic SSE event loop, pumpStreamWithRecovery recovery harness, and createAnthropicMessagesTransportStreamFn transport stream): src/llm/providers/anthropic.test.ts exercises the deferral ordering contract directly against streamAnthropic; src/agents/embedded-agent-runner/thinking.test.ts exercises pumpStreamWithRecovery with a stream that emits {type:'error'} before any {type:'start'} — the exact scenario this fix enables; src/agents/anthropic-transport-stream.test.ts covers the embedded-agent default path.

Exact steps or command run after this patch: pnpm test src/llm/providers/anthropic.test.ts src/agents/embedded-agent-runner/thinking.test.ts src/agents/anthropic-transport-stream.test.ts; node scripts/run-oxlint.mjs and pnpm format:fix on all four changed files.

Evidence after fix (Vitest output for touched test files):

 anthropic.test.ts              Tests  8 passed (8)
 thinking.test.ts               Tests  42 passed (42)
 anthropic-transport-stream.test.ts  Tests  53 passed (53)

The four new tests cover the deferral ordering contract: start is emitted only after message_start processing in both stream implementations; a pre-stream SSE error arrives with no preceding start event.

Observed result after fix: the "retries pre-content terminal stream-error events" case in thinking.test.ts exercises the exact recovery path — stream emits {type:'error'} with no preceding {type:'start'} → retry fires (callCount=2). The new anthropic.test.ts and anthropic-transport-stream.test.ts tests confirm both implementations now uphold the yieldedOutput=false precondition that recovery requires.

What was not tested: live Anthropic API call with expired cache and thinking blocks (requires waiting 5+ min for cache TTL expiry in a real session).

Repro confirmation: the pre-existing "retries pre-content terminal stream-error events" test was already gated on yieldedOutput=false — it tested the recovery machinery in isolation but could not catch the provider-level ordering bug. Before this fix, both stream implementations set yieldedOutput=true before any SSE error arrived, so that test passed while the real provider path failed silently. The new tests close that gap by driving both stream implementations directly.

Risk / Mitigation

Risk: start event now carries populated usage (filled from message_start) instead of a zero-usage snapshot. Mitigation: start.partial is always treated as a partial/in-progress snapshot by all consumers; no caller depends on usage being zero; the change is strictly more informative.

Change Type (select all)

Bug fix

Scope (select all touched areas)

Core

Linked Issue/PR

Fixes #90667

Extended thinking sessions become permanently broken after gateway restart or Anthropic prompt-cache TTL expiry because thinking-block signature validation errors were silently swallowing the recovery retry. In pumpStreamWithRecovery (thinking.ts), the yieldedOutput flag prevents retry once any non-error event has been emitted. streamAnthropic previously pushed {type:'start'} before the SSE event loop, so when a pre-stream SSE error arrived (Anthropic sends signature validation errors before message_start), yieldedOutput was already true and the retry branch was skipped. Deferring stream.push({type:'start'}) into the message_start handler keeps yieldedOutput=false when a pre-stream error arrives, letting pumpStreamWithRecovery strip all thinking blocks from context and retry the request cleanly. Fixes openclaw#90667

clawsweeper · 2026-06-05T14:57:51Z

Codex review: passed. Reviewed June 5, 2026, 10:17 PM ET / 02:17 UTC.

Summary
The branch moves Anthropic start emission into message_start handling for the provider and transport stream paths and adds focused ordering/error tests.

PR surface: Source +5, Tests +149. Total +154 across 4 files.

Reproducibility: Do we have a high-confidence way to reproduce the issue? Yes from source: current main emits start before Anthropic can surface a pre-message_start SSE error, and recovery intentionally refuses to retry after any non-error output; no live expired-cache run was performed.

Review metrics: none identified.

Merge readiness
Overall: 🦞 diamond lobster
Proof: 🌊 off-meta tidepool
Patch quality: 🦞 diamond lobster
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Risk before merge

[P1] The exact live Anthropic expired-cache or gateway-restart path was not exercised; confidence is based on source ordering, the reported provider error shape, and focused stream/recovery tests.

Maintainer options:

Decide the mitigation before merge
Land the narrow provider and transport ordering fix after normal maintainer/automerge gates, while preserving recovery's no-retry-after-output guard.
Pause or close
Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge

[P2] No repair lane is needed; the branch already contains the narrow source change and regression tests, so the next action is maintainer/automerge validation.

Security
Cleared: The diff only changes Anthropic stream ordering and colocated tests; it does not touch secrets, dependencies, workflows, permissions, or package execution surfaces.

Review details

Best possible solution:

Land the narrow provider and transport ordering fix after normal maintainer/automerge gates, while preserving recovery's no-retry-after-output guard.

Do we have a high-confidence way to reproduce the issue?

Do we have a high-confidence way to reproduce the issue? Yes from source: current main emits start before Anthropic can surface a pre-message_start SSE error, and recovery intentionally refuses to retry after any non-error output; no live expired-cache run was performed.

Is this the best way to solve the issue?

Is this the best way to solve the issue? Yes; fixing the provider and transport event ordering preserves the recovery safety invariant instead of weakening retry behavior after visible output has already streamed.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 3a2f54e6a866.

Label changes

Label changes:

add rating: 🦞 diamond lobster: Overall readiness is 🦞 diamond lobster; proof is 🌊 off-meta tidepool and patch quality is 🦞 diamond lobster.
add status: 🚀 automerge armed: This PR is in ClawSweeper's automerge lane. Not applicable: The author is a repository member, so the external-contributor real behavior proof gate does not apply; the PR body records focused stream and recovery test output plus the missing live Anthropic TTL run.
remove rating: 🐚 platinum hermit: Current PR rating is rating: 🦞 diamond lobster, so this older rating label is no longer current.
remove status: 👀 ready for maintainer look: Current PR status label is status: 🚀 automerge armed.

Label justifications:

P1: The linked bug breaks Anthropic extended-thinking agent sessions after restart or cache expiry, and this PR is the core recovery fix path.
rating: 🦞 diamond lobster: Overall readiness is 🦞 diamond lobster; proof is 🌊 off-meta tidepool and patch quality is 🦞 diamond lobster.
status: 🚀 automerge armed: This PR is in ClawSweeper's automerge lane. Not applicable: The author is a repository member, so the external-contributor real behavior proof gate does not apply; the PR body records focused stream and recovery test output plus the missing live Anthropic TTL run.

Evidence reviewed

PR surface:

Source +5, Tests +149. Total +154 across 4 files.

View PR surface stats

Area	Files	Added	Removed	Net
Source	2	10	5	+5
Tests	2	149	0	+149
Docs	0	0	0	0
Config	0	0	0	0
Generated	0	0	0	0
Other	0	0	0	0
Total	4	159	5	+154

What I checked:

Current main root cause: Current main emits start before entering the Anthropic SSE loop, so a later pre-message_start provider error is seen by recovery after a non-error event. (src/llm/providers/anthropic.ts:518, 3a2f54e6a866)
Provider fix: The PR moves provider start emission into the message_start branch after response id and usage initialization. (src/llm/providers/anthropic.ts:537, 399a243c6437)
Transport fix: The PR applies the same start deferral to the Anthropic transport stream path used by managed embedded-agent transport. (src/agents/anthropic-transport-stream.ts:1132, 399a243c6437)
Recovery invariant: pumpStreamWithRecovery only retries thinking-signature errors before any non-error stream event, so moving start is the right boundary fix. (src/agents/embedded-agent-runner/thinking.ts:614, 3a2f54e6a866)
Consumer contract: The agent loop treats start as the assistant message-start signal and still emits a final message start for terminal errors that arrive before start. (packages/agent-core/src/agent-loop.ts:386, 3a2f54e6a866)
Regression coverage: The PR adds provider and transport tests covering no start before pre-stream SSE errors, alongside existing recovery tests that retry when an error arrives before output. (src/llm/providers/anthropic.test.ts:302, 399a243c6437)

Likely related people:

vincentkoc: Git history shows Vincent Koc introduced the Anthropic transport runtime and shared transport helpers, and also authored earlier Anthropic thinking recovery work in the old runner path. (role: introduced behavior and adjacent feature owner; confidence: high; commits: ea4265a82063, d8458a1481e9, 9aa2ef273602; files: src/agents/anthropic-transport-stream.ts, src/agents/provider-transport-stream.ts, src/agents/transport-stream-shared.ts)
steipete: Peter Steinberger recently reworked the Anthropic transport dependency boundary and appears frequently in the Anthropic/provider history sampled for these files. (role: recent adjacent owner; confidence: medium; commits: 67ebc433f953, 628b454eff6e; files: src/agents/anthropic-transport-stream.ts, package.json, pnpm-lock.yaml)
obviyus: Current-main blame for the affected stream/recovery files points at Ayaan Zaidi's snapshot commit, though the commit subject is broad and not clearly the original feature introduction. (role: recent current-main contributor; confidence: low; commits: 61d121f1caa2; files: src/llm/providers/anthropic.ts, src/agents/anthropic-transport-stream.ts, src/agents/embedded-agent-runner/thinking.ts)

What the crustacean ranks mean

🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works

ClawSweeper keeps one durable marker-backed review comment per issue or PR.
Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
Maintainers can also comment @clawsweeper review to request a fresh review only.
Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

… message_start Applies the same start-event deferral fix from src/llm/providers/anthropic.ts to the embedded-agent default path. resolveEmbeddedAgentStreamFn routes anthropic-messages through createBoundaryAwareStreamFnForModel → createAnthropicMessagesTransportStreamFn, so the thinking-block recovery bug (pumpStreamWithRecovery yieldedOutput gate) affects the production embedded path via this file, not just the provider stream. Moves stream.push({type:'start'}) from before the SDK event loop into the message_start handler, keeping yieldedOutput=false in pumpStreamWithRecovery when an SSE event: error arrives before message_start (as Anthropic sends for invalid thinking signatures).

Takhoffman · 2026-06-06T02:07:02Z

@clawsweeper automerge

clawsweeper · 2026-06-06T02:07:04Z

🦞✅
ClawSweeper merged this PR after the passing review.

Source: clawsweeper[bot]
Feedback: structured ClawSweeper verdict: pass (sha=399a243c6437aa49e8dc2e7dbc661ff6caa60d10)
Merge status: merged by ClawSweeper automerge
Merged at: 2026-06-06T02:17:55Z
Merge commit: aa8070a76f92

What merged:

The branch moves Anthropic start emission into message_start handling for the provider and transport stream paths and adds focused ordering/error tests.
PR surface: Source +5, Tests +149. Total +154 across 4 files.
Reproducibility: Do we have a high-confidence way to reproduce the issue? Yes from source: current main emit ... ecovery intentionally refuses to retry after any non-error output; no live expired-cache run was performed.

Automerge notes:

PR branch already contained follow-up commit before automerge: fix(agents): defer Anthropic transport stream start event until after…

The automerge loop is complete.

Automerge progress:

2026-06-06 02:07:48 UTC review queued 399a243c6437 (queued)

2026-06-06 02:17:44 UTC review passed 399a243c6437 (structured ClawSweeper verdict: pass (sha=399a243c6437aa49e8dc2e7dbc661ff6caa60...)

2026-06-06 02:17:57 UTC merged 399a243c6437 (merged by ClawSweeper automerge)

openclaw#90697) Summary: - The branch moves Anthropic `start` emission into `message_start` handling for the provider and transport stream paths and adds focused ordering/error tests. - PR surface: Source +5, Tests +149. Total +154 across 4 files. - Reproducibility: Do we have a high-confidence way to reproduce the issue? Yes from source: current main emit ... ecovery intentionally refuses to retry after any non-error output; no live expired-cache run was performed. Automerge notes: - PR branch already contained follow-up commit before automerge: fix(agents): defer Anthropic transport stream start event until after… Validation: - ClawSweeper review passed for head 399a243. - Required merge gates passed before the squash merge. Prepared head SHA: 399a243 Review: openclaw#90697 (comment) Co-authored-by: openperf <[email protected]> Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com> Approved-by: takhoffman Co-authored-by: takhoffman <[email protected]>

openclaw-barnacle Bot added size: S r: too-many-prs Auto-close: author has more than twenty active PRs. labels Jun 5, 2026

openclaw-barnacle Bot removed the r: too-many-prs Auto-close: author has more than twenty active PRs. label Jun 5, 2026

openclaw-barnacle Bot added the agents Agent runtime and tooling label Jun 5, 2026

clawsweeper Bot merged commit aa8070a into openclaw:main Jun 6, 2026
216 of 226 checks passed

Haderach-Ram mentioned this pull request Jun 6, 2026

Ecosystem Digest — 2026-06-06 Haderach-Ram/openclaw-radar#30

Open

github-actions Bot mentioned this pull request Jun 6, 2026

📡 Upstream Digest — 2026-06-06 02:27 UTC curtismercier/openclaw-mods#1022

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(llm): defer Anthropic stream start event until after message_start#90697

fix(llm): defer Anthropic stream start event until after message_start#90697
clawsweeper[bot] merged 2 commits into
openclaw:mainfrom
openperf:fix/90667-thinking-recovery-start-before-error

openperf commented Jun 5, 2026 •

edited

Loading

Uh oh!

clawsweeper Bot commented Jun 5, 2026 •

edited

Loading

Uh oh!

Takhoffman commented Jun 6, 2026

Uh oh!

clawsweeper Bot commented Jun 6, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

openperf commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Reproduction

Real behavior proof

Risk / Mitigation

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

Uh oh!

clawsweeper Bot commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Takhoffman commented Jun 6, 2026

Uh oh!

clawsweeper Bot commented Jun 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

openperf commented Jun 5, 2026 •

edited

Loading

clawsweeper Bot commented Jun 5, 2026 •

edited

Loading

clawsweeper Bot commented Jun 6, 2026 •

edited

Loading