fix(anthropic): skip cache_control for thinking/redacted_thinking blocks #27768

Closed
scz2011 wants to merge 2 commits into openclaw:main from scz2011:fix/anthropic-thinking-cache-control-27752

@scz2011 scz2011 commented Feb 26, 2026

Fixes #27752

When thinkingDefault is enabled and sessions are long (~100+ turns), Anthropic's API rejects requests with a 400 error because thinking or redacted_thinking blocks in assistant messages appear to be modified during context rebuild by the prompt caching cache_control injection.

Root Cause

Anthropic requires thinking/redacted_thinking blocks to remain byte-for-byte unmodified from the original response (see Anthropic docs). The OpenRouter Anthropic cache_control injection logic was adding cache_control: { type: "ephemeral" } to the last content block of messages, including thinking/redacted_thinking blocks, which violates this requirement.

Error observed:
LLM request rejected: messages.135.content.13: thinking or redacted_thinking blocks in the latest assistant message cannot be modified. These blocks must remain as they were in the original response.
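
To confirm that a rejected payload really contains the condition described above, a small diagnostic can scan the outgoing messages and report each thinking or redacted_thinking block that carries a cache_control field, using the same path format as the error. This is a minimal sketch: the `ContentBlock` and `Message` types and the `findMarkedThinkingBlocks` helper are hypothetical and not part of this PR, and the check only catches added cache_control fields, not other kinds of modification.

```typescript
// Hypothetical payload types; the real types in the codebase may differ.
type ContentBlock = {
  type: string;
  cache_control?: { type: string };
  [key: string]: unknown;
};
type Message = { role: string; content: string | ContentBlock[] };

// Block types that must be sent back exactly as they were received.
const IMMUTABLE_TYPES = new Set(["thinking", "redacted_thinking"]);

// Returns paths such as "messages.135.content.13" for every thinking or
// redacted_thinking block that carries a cache_control field.
function findMarkedThinkingBlocks(messages: Message[]): string[] {
  const paths: string[] = [];
  messages.forEach((message, messageIndex) => {
    // Plain string content has no blocks to inspect.
    if (!Array.isArray(message.content)) return;
    message.content.forEach((block, blockIndex) => {
      if (IMMUTABLE_TYPES.has(block.type) && block.cache_control !== undefined) {
        paths.push(`messages.${messageIndex}.content.${blockIndex}`);
      }
    });
  });
  return paths;
}
```

An empty result means no thinking block in the payload has been given a cache_control field; it does not rule out other modifications.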

Changes

  • Modified OpenRouter Anthropic cache_control injection to skip thinking/redacted_thinking blocks in system/developer messages
  • Added logic to remove cache_control from thinking/redacted_thinking blocks in assistant messages if present (defensive measure)
  • Added comprehensive tests to verify thinking blocks are not modified:
    • Skips cache_control injection for thinking blocks in system messages
    • Removes cache_control from thinking blocks in assistant messages
    • Removes cache_control from redacted_thinking blocks
    • Handles mixed content with thinking blocks correctly
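
The two behaviours listed above can be sketched roughly as follows. This is not the code from this PR: the `applyCacheControl` function and the types are hypothetical, and the sketch assumes that "skip" means leaving a system/developer message unmarked when its last block is a thinking block (the actual change may instead pick an earlier block).

```typescript
// Hypothetical payload types; the real types in the codebase may differ.
type CacheControl = { type: "ephemeral" };
type ContentBlock = {
  type: string;
  cache_control?: CacheControl;
  [key: string]: unknown;
};
type Message = { role: string; content: string | ContentBlock[] };

const isThinkingBlock = (block: ContentBlock): boolean =>
  block.type === "thinking" || block.type === "redacted_thinking";

// Returns a new message list; the input messages and blocks are not mutated.
function applyCacheControl(messages: Message[]): Message[] {
  return messages.map((message): Message => {
    if (!Array.isArray(message.content)) return message;

    if (message.role === "system" || message.role === "developer") {
      // Mark the last block for caching, unless it is a thinking block
      // (assumption: in that case the message is simply left unmarked).
      const lastIndex = message.content.length - 1;
      const content = message.content.map((block, index): ContentBlock =>
        index === lastIndex && !isThinkingBlock(block)
          ? { ...block, cache_control: { type: "ephemeral" } }
          : block,
      );
      return { ...message, content };
    }

    if (message.role === "assistant") {
      // Defensive measure: drop cache_control from thinking blocks so they
      // match the original response again.
      const content = message.content.map((block): ContentBlock => {
        if (!isThinkingBlock(block) || block.cache_control === undefined) {
          return block;
        }
        const copy: ContentBlock = { ...block };
        delete copy.cache_control;
        return copy;
      });
      return { ...message, content };
    }

    return message;
  });
}
```

Returning copies rather than mutating in place keeps the stored conversation history untouched, so a later context rebuild starts from the blocks as they were originally received.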

Impact

  • Long sessions with thinkingDefault enabled no longer break after ~100+ turns
  • Users can continue conversations without needing to issue /new to start fresh
  • Thinking blocks remain compliant with Anthropic's requirements
  • No impact on non-thinking content blocks or prompt caching functionality

Testing

All tests pass, including 4 new tests specifically for thinking block handling.

Fixes openclaw#27752

When thinkingDefault is enabled and sessions are long (~100+ turns),
Anthropic's API rejects requests with a 400 error because thinking or
redacted_thinking blocks in assistant messages appear to be modified
during context rebuild by the prompt caching cache_control injection.

Anthropic requires thinking/redacted_thinking blocks to remain
byte-for-byte unmodified from the original response. Adding
cache_control to these blocks violates this requirement.

Changes:
- Modified OpenRouter Anthropic cache_control injection to skip
  thinking/redacted_thinking blocks in system/developer messages
- Added logic to remove cache_control from thinking/redacted_thinking
  blocks in assistant messages if present
- Added comprehensive tests to verify thinking blocks are not modified

Impact:
- Long sessions with thinkingDefault enabled no longer break
- Users can continue conversations beyond 100+ turns without /new
- Thinking blocks remain compliant with Anthropic's requirements
- No impact on non-thinking content blocks

openclaw-barnacle bot added the agents (Agent runtime and tooling) and size: S labels Feb 26, 2026

greptile-apps bot commented Feb 26, 2026

Greptile Summary

Fixes a 400 error from Anthropic's API in long sessions (~100+ turns) with extended thinking enabled, caused by cache_control being injected into thinking/redacted_thinking blocks during prompt caching. Anthropic requires these blocks to remain byte-for-byte unmodified.

  • Skips cache_control injection for thinking/redacted_thinking blocks when processing system/developer messages in the OpenRouter Anthropic cache wrapper
  • Defensively strips cache_control from thinking/redacted_thinking blocks in assistant messages
  • Adds 4 new test cases covering thinking block handling (though they are placed outside the existing describe block)

Confidence Score: 4/5

  • This PR is safe to merge — it's a targeted, well-scoped fix for a documented Anthropic API constraint.
  • The fix correctly addresses the root cause (cache_control on thinking blocks violating Anthropic's immutability requirement). The logic is straightforward, well-commented, and handles both the injection-prevention and defensive-removal cases. Tests cover the main scenarios. Minor style issue with test placement outside the describe block, but no functional concerns.
  • No files require special attention.

Last reviewed commit: eb58462


@greptile-apps greptile-apps bot left a comment


2 files reviewed, 1 comment


});
});

it("skips cache_control injection for thinking blocks in system messages", () => {

Tests placed outside describe block

The 4 new tests (lines 95–185) are placed outside the describe("extra-params: OpenRouter Anthropic cache_control", ...) block which closes at line 93. They should be nested inside the describe block to keep them grouped with the related cache_control tests. As-is, they'll still run and pass, but they won't appear under the same test suite grouping in test output.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Path: src/agents/pi-embedded-runner/extra-params.openrouter-cache-control.test.ts
Line: 95

Address Greptile review feedback: the 4 new tests for thinking block
handling were placed outside the describe block. Moved them inside
to keep them grouped with related cache_control tests for better
test suite organization.
@openclaw-barnacle

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

openclaw-barnacle bot added the stale (Marked as stale due to inactivity) label Mar 4, 2026
openclaw-barnacle bot removed the stale (Marked as stale due to inactivity) label Mar 26, 2026
@openclaw-barnacle

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

openclaw-barnacle bot added the stale (Marked as stale due to inactivity) label Apr 1, 2026

obviyus commented Apr 1, 2026

Closing as duplicate; this was superseded by #58916.

@obviyus obviyus closed this Apr 1, 2026

Development

Successfully merging this pull request may close these issues.

[Bug]: Anthropic rejects request when thinking/redacted_thinking blocks are modified during context rebuild (long sessions with thinkingDefault enabled)

2 participants