-
Notifications
You must be signed in to change notification settings - Fork 14.1k
[BUG] Prompt Caching Works on Claude 3.7 But Not On Claude 4 Family (AWS Bedrock) #1347
Description
Environment
- Platform (select one):
- Anthropic API
- [XX] AWS Bedrock
- Google Vertex AI
- Other:
- Claude CLI version: 1.0.3
- Operating System: macOS 24.4.0 (Darwin) on ARM64 architectur
- Terminal: VS Code
Bug Description
Prompt caching works properly on Claude 3.7 Sonnet but does not work on newer Claude models (Claude Sonnet 4 and Claude Opus 4).
Steps to Reproduce
- Start a Claude Code session with default model (Claude 3.7 Sonnet)
- Note that cache writes are properly recorded
- Switch to Claude Sonnet 4 or Claude Opus 4 using model command
- Observe that cache read/write counts remain at 0
Expected Behavior
Claude 4 models should properly utilize prompt caching to reduce token usage and costs.
Actual Behavior
Only Claude 3.7 Sonnet shows cache activity (both reads and writes). Claude Sonnet 4 and Claude Opus 4 show 0 cache read/write regardless of
Additional Context
Edit: Separately, the ANTHROPIC_SMALL_FAST_MODEL does not use prompt caching regardless of what model is set. This means that Claude 3.7 does not use prompt caching when ANTHROPIC_SMALL_FAST_MODEL is set to Claude 3.7. This is clearly a bug considering that Claude 3.7 prompt caching does work when it is set as the main model (ANTHROPIC_MODEL).
Verified with a clean install and completely clean configs.
Anthropic guys, I am begging you to fix this. It makes the Claude 4 family completely unusable for coding - the costs are astronomical compared to using Claude 3.7 with prompt caching - and worse - the latency is terrible once you have a bit of context, especially with Claude 4 Opus!
Thanks in advance - love the tool - life changing!