fix: preserve thinking/redacted_thinking blocks during context pruning#49657
fix: preserve thinking/redacted_thinking blocks during context pruning#49657ayushozha wants to merge 2 commits intoopenclaw:mainfrom
Conversation
When context pruning modifies assistant messages containing thinking blocks, the Anthropic API rejects the request. This fix preserves those blocks intact during pruning and prevents raw API errors from being sent to chat channels. Fixes openclaw#49573 Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Greptile SummaryThis PR fixes two related issues with Anthropic extended-thinking support: (1) the context-pruning size estimator now counts
Confidence Score: 4/5
Prompt To Fix All With AIThis is a comment left during a code review.
Path: src/auto-reply/reply/agent-runner-execution.ts
Line: 624-626
Comment:
**No automatic session recovery for thinking-block errors**
Unlike `isRoleOrderingError` and `isCompactionFailure`, the `isThinkingBlockError` path does not attempt an automatic session reset — it only advises the user to run `/new` manually. The user-facing copy calls the data "stale reasoning data", which implicitly suggests a fixable state, but nothing is cleaned up automatically.
This is consistent with a "safe first pass" approach, but if thinking-block errors are reliably caused by persistent session state (i.e. they will always fail until the session is reset), then auto-recovery via `params.resetSessionAfterCompactionFailure` or a dedicated `resetSessionAfterThinkingBlockError` would give users a better experience — matching the treatment of role-ordering and compaction errors. Worth noting for a follow-up.
How can I resolve this? If you propose a fix, please make it concise.Last reviewed commit: "fix: preserve thinki..." |
| const isThinkingBlockError = /thinking.*(?:block|content|signature)|redacted_thinking/i.test( | ||
| message, | ||
| ); |
There was a problem hiding this comment.
No automatic session recovery for thinking-block errors
Unlike isRoleOrderingError and isCompactionFailure, the isThinkingBlockError path does not attempt an automatic session reset — it only advises the user to run /new manually. The user-facing copy calls the data "stale reasoning data", which implicitly suggests a fixable state, but nothing is cleaned up automatically.
This is consistent with a "safe first pass" approach, but if thinking-block errors are reliably caused by persistent session state (i.e. they will always fail until the session is reset), then auto-recovery via params.resetSessionAfterCompactionFailure or a dedicated resetSessionAfterThinkingBlockError would give users a better experience — matching the treatment of role-ordering and compaction errors. Worth noting for a follow-up.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/auto-reply/reply/agent-runner-execution.ts
Line: 624-626
Comment:
**No automatic session recovery for thinking-block errors**
Unlike `isRoleOrderingError` and `isCompactionFailure`, the `isThinkingBlockError` path does not attempt an automatic session reset — it only advises the user to run `/new` manually. The user-facing copy calls the data "stale reasoning data", which implicitly suggests a fixable state, but nothing is cleaned up automatically.
This is consistent with a "safe first pass" approach, but if thinking-block errors are reliably caused by persistent session state (i.e. they will always fail until the session is reset), then auto-recovery via `params.resetSessionAfterCompactionFailure` or a dedicated `resetSessionAfterThinkingBlockError` would give users a better experience — matching the treatment of role-ordering and compaction errors. Worth noting for a follow-up.
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Chenglin97
left a comment
There was a problem hiding this comment.
Review: fix: preserve thinking/redacted_thinking blocks during context pruning
This PR does three distinct things. Reviewing each separately:
1. pruner.ts — include thinkingSignature in size estimation
The fix is correct: thinkingSignature is an opaque base64-ish payload that can be tens of kilobytes long, so excluding it from estimateMessageChars would cause the pruner to underestimate context size and fail to activate soft-trim when needed. The type cast (b as { thinkingSignature?: string }) is pragmatic — ideally the thinking block type would include this field, but the cast is safe since we guard with typeof sig === "string".
One thing to verify: does thinkingSignature appear on non-redacted thinking blocks too? If so, the PR's comment "opaque payload for redacted thinking blocks" may be misleading, though the logic would still be correct.
2. agent-runner-execution.ts — thinking block error detection + broader sanitization
The isThinkingBlockError regex (/thinking.*(?:block|content|signature)|redacted_thinking/i) is reasonable, but it could over-match. An error message like "The model is thinking about block chain transactions" would trigger this branch and show "stale reasoning data" to the user, which would be confusing. Consider tightening the pattern to match more specific API error strings (e.g., thinking_block, thinking_signature, redacted_thinking).
More importantly: the change to always call sanitizeUserFacingText (previously only for isTransientHttp) is a meaningful behavior change that goes beyond the PR title. Raw error messages are now sanitized for all error types, including billing errors, context overflow, etc. This is arguably correct, but it's a broader change than "preserve thinking blocks" implies. Worth calling out explicitly in the PR description so reviewers know what they're approving.
3. Tests in pruner.test.ts
The first test ("preserves assistant messages with thinking blocks unmodified during pruning") checks reference equality (toBe) to confirm the pruner doesn't clone or mutate assistant message objects. This is a valid regression guard, but it's also testing existing behavior — nothing in pruner.ts was changed to preserve references. If this test was already passing before the PR, it's good documentation; if it was failing, that's not visible from the diff.
The second test ("accounts for thinkingSignature in size estimates") directly tests the new code path and is well-constructed — uses a large signature to force the size threshold and verifies soft-trim was triggered. Good coverage for the actual fix.
Missing test coverage: the agent-runner-execution.ts changes have no test coverage in this PR. The thinking-block error detection regex and the sanitization behavior change are both untested.
Summary: the pruner fix is correct and well-tested. The agent-runner changes are sensible in intent but would benefit from tighter regex and test coverage, and the sanitization scope change deserves explicit callout in the PR description.
|
Thanks for the thorough review @Chenglin97! Addressing your points: 1. 2. Regex over-matching — Good catch on "thinking about block chain". I'll tighten the regex to match Anthropic's specific error patterns more precisely ( 3. Broader sanitization scope — Agreed this deserves explicit callout. The rationale: raw API errors should never reach end users regardless of error type. I'll update the PR description to make this clear. 4. Missing test coverage — Fair point. The agent-runner changes are in a code path that's harder to unit test (requires mocking the full agent execution pipeline), but I agree the regex at minimum should have test cases. Will push fixes for items 2 and 3 shortly. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 78942be7fc
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const sig = (b as { thinkingSignature?: string }).thinkingSignature; | ||
| if (typeof sig === "string") { | ||
| chars += sig.length; |
There was a problem hiding this comment.
Account thinking signatures only when they are sent
estimateMessageChars now always adds thinkingSignature bytes, but for Claude-family runs we strip thinking blocks from outbound requests (src/agents/transcript-policy.ts sets dropThinkingBlocks, and src/agents/pi-embedded-runner/run/attempt.ts removes them before streaming). In cache-ttl mode this overestimates context size, so pruning can soft-trim or hard-clear tool results even when the real request would fit, degrading answer quality by dropping useful tool output unnecessarily.
Useful? React with 👍 / 👎.
|
Closing as duplicate; this was superseded by #58916. |
Summary
Test plan
Fixes #49573
🤖 Generated with Claude Code