Add Anthropic automatic prompt caching support #4840
@DenysMoskalenko Thanks for working on this! Please have a look at the review comments; let me know if any are not relevant.
@DouweM Looks like we're good now.
@DenysMoskalenko Thanks for all your patience and hard work here! I'm taking over this PR now and will get it out in the next few days, so we don't keep going in a never-ending review loop!
No problem at all. I see the number of open PRs and issues, and I understand the workload involved. I am happy to wait as long as needed and update the code however you would like. For me, it is more important to keep the project aligned with your vision than to get this PR merged quickly. |
If the last block already has `cache_control` (e.g. from an explicit `CachePoint`),
it is left unchanged to preserve the user's chosen TTL.
"""
cache_setting = model_settings.get('anthropic_cache') or model_settings.get('anthropic_cache_messages')
`_apply_per_block_caching_fallback` independently re-reads and re-interprets the cache settings (`anthropic_cache` / `anthropic_cache_messages`) via a different code path than `_build_automatic_cache_control`. The latter handles the conflict check, the deprecation warning, and the `True` → `'5m'` normalization — none of which are replicated here. That works today because `_build_automatic_cache_control` is always called first, but it's fragile.
Consider resolving the effective cache setting once — either in `_build_automatic_cache_control` (returning both the top-level param and the resolved TTL/boolean), or in a small shared helper — and passing the resolved value into `_apply_per_block_caching_fallback` instead of having it re-derive it from raw settings. This would also let you drop the `or` chain here and the duplicate `'5m' if ... is True else ...` on line 1262.
@DouweM — judgment call on a refactor.
The bot's concern: _apply_per_block_caching_fallback re-reads anthropic_cache / anthropic_cache_messages and re-derives the TTL, duplicating the resolution logic from _build_automatic_cache_control. Works today because _build_automatic_cache_control is always called first (so the deprecation warning fires and the conflict raises), but it's fragile to reorder.
Concrete shape of the refactor: have _build_automatic_cache_control return a (top_level_param_or_None, resolved_ttl_or_None) tuple or a small dataclass, pass it into _apply_per_block_caching_fallback instead of having it re-derive from model_settings. Each callsite already calls them back-to-back so plumbing is straightforward.
It's a real improvement, ~15-20 lines of change. Want me to do it?
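A minimal sketch of what this refactor could look like. The function names mirror the PR, but the bodies are simplified stand-ins operating on a plain dict instead of the real `AnthropicModelSettings`, and the returned `cache_control` payload shape is an assumption for illustration, not the library's actual code:

```python
# Hypothetical sketch: resolve anthropic_cache / anthropic_cache_messages ONCE,
# returning both the top-level param and the resolved TTL, so the per-block
# fallback never re-derives anything from raw settings.
from typing import Literal, Optional

ResolvedTtl = Optional[Literal['5m', '1h']]


def build_automatic_cache_control(
    model_settings: dict, supports_top_level: bool
) -> tuple[Optional[dict], ResolvedTtl]:
    """Return (top_level_cache_control_or_None, resolved_ttl_or_None)."""
    cache = model_settings.get('anthropic_cache')
    legacy = model_settings.get('anthropic_cache_messages')
    if cache and legacy:
        # Conflict check lives in exactly one place.
        raise ValueError('anthropic_cache conflicts with anthropic_cache_messages')
    setting = cache or legacy
    if not setting:
        return None, None
    # True -> '5m' normalization also lives in exactly one place.
    ttl: ResolvedTtl = '5m' if setting is True else setting
    if supports_top_level:
        return {'type': 'ephemeral', 'ttl': ttl}, ttl
    # Bedrock/Vertex: no top-level param; caller applies the per-block fallback.
    return None, ttl


def apply_per_block_caching_fallback(last_block: dict, resolved_ttl: ResolvedTtl) -> None:
    """Attach cache_control to the last block unless one is already present
    (e.g. from an explicit CachePoint, whose TTL must be preserved)."""
    if resolved_ttl is not None and 'cache_control' not in last_block:
        last_block['cache_control'] = {'type': 'ephemeral', 'ttl': resolved_ttl}
```

Each callsite already calls the two back-to-back, so threading `resolved_ttl` through is the only plumbing change.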
model = AnthropicModel('claude-haiku-4-5', provider=AnthropicProvider(anthropic_client=mock_client))

settings = AnthropicModelSettings(anthropic_cache=True)
assert model._build_automatic_cache_control(settings) is None  # pyright: ignore[reportPrivateUsage]
Several of the new tests (test_automatic_cache_control_none_on_unsupported_clients, test_anthropic_cache_per_block_fallback_on_unsupported_clients, test_deprecated_cache_messages_per_block_fallback_on_unsupported_clients, test_per_block_fallback_preserves_existing_cache_control) invoke private methods (_build_automatic_cache_control, _apply_per_block_caching_fallback) directly.
The test guidelines prefer testing through public APIs. Since the Bedrock/Vertex behavior is already covered end-to-end by the VCR test test_anthropic_cache_bedrock_real_api and the integration tests use mock clients + agent.run, these private-method tests are largely redundant. Consider consolidating them into integration tests that exercise the same behavior through agent.run with mock Bedrock/Vertex clients — similar to how test_anthropic_cache_messages_deprecated already works.
@DouweM — judgment call on test consolidation.
The bot's point is fair: test_automatic_cache_control_none_on_unsupported_clients, test_anthropic_cache_per_block_fallback_on_unsupported_clients, test_deprecated_cache_messages_per_block_fallback_on_unsupported_clients, and test_per_block_fallback_preserves_existing_cache_control all poke at private methods directly. tests/CLAUDE.md says to test through public APIs.
There's some end-to-end coverage already: test_anthropic_cache_bedrock_real_api (VCR, Bedrock multi-turn with cache) and test_anthropic_cache_messages_deprecated (mock client, end-to-end via agent.run).
But the private-method tests also cover edge cases the end-to-end tests don't:

- Vertex (not just Bedrock) — no Vertex cassette exists
- TTL passthrough on fallback for `'1h'` — currently only `True` is end-to-end tested
- `CachePoint` preserved by fallback — no end-to-end test

Options:

1. Leave as-is (private-method tests exist because end-to-end coverage is incomplete).
2. Add mock-client Vertex/TTL/CachePoint tests that exercise `agent.run`, then delete the private-method tests.
3. Delete the private-method tests now and accept the coverage gap.
My lean is (2), but it's ~100 lines of churn on a PR that's already large. Want me to do it, or leave it for a follow-up?
Add `anthropic_automatic_caching` setting to `AnthropicModelSettings` that passes a top-level `cache_control` parameter to Anthropic's API, enabling server-managed automatic cache breakpoints.

- Supports `True` (5m TTL), `'5m'`, or `'1h'` TTL values
- Reduces explicit cache point budget from 4 to 3 when enabled
- Silently ignored for `AsyncAnthropicBedrock` clients (Bedrock does not support automatic caching)
- Bumps minimum anthropic SDK to `>=0.83.0`

Made-with: Cursor
…r-block fallback

- Rename setting from `anthropic_automatic_caching` to `anthropic_cache`
- Deprecate `anthropic_cache_messages` in favor of `anthropic_cache`
- On Bedrock, `anthropic_cache` falls back to per-block `cache_control` on the last user message (since top-level automatic caching is not supported)
- Remove TTL stripping for Bedrock in `_build_cache_control` (Bedrock accepts TTL)
- Add VCR-recorded integration test for Bedrock per-block caching with TTL
- Update docs and docstrings to reflect Bedrock fallback behavior

Made-with: Cursor
…larity

- Remove Foundry from fallback list (it supports automatic caching per Anthropic docs); only Bedrock and Vertex need per-block fallback
- Replace double backticks with single backticks in docstrings
- Clarify `anthropic_cache_messages` deprecation: now behaves the same as `anthropic_cache` (automatic on API/Foundry, per-block on Bedrock/Vertex)
- Remove redundant per-block caching in `_map_message` for `cache_messages` (now handled by `_build_automatic_cache_control` + `_apply_per_block_caching_fallback`)
- Add link to Anthropic automatic caching docs in docs page
- Move Bedrock/Vertex note to Automatic Caching section where `anthropic_cache` is introduced
- Reframe cache point budget: `anthropic_cache` counts as 1 cache point like other settings; clarify we auto-trim excess
- Update Bedrock test to multi-turn (`message_history`) and re-record cassette

Made-with: Cursor
An explicit opt-out via `anthropic_cache=False` alongside `anthropic_cache_messages=True` should not raise; the user is clearly opting into one and out of the other. Only treat truthy settings on both keys as a conflict.
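A minimal illustration of the relaxed check (a standalone sketch over a plain dict, not the PR's actual code): the conflict error fires only when both keys are truthy, so an explicit `False` opt-out on one key never raises.

```python
# Hypothetical sketch of the relaxed conflict check: raise only when the user
# enables BOTH anthropic_cache and the deprecated anthropic_cache_messages.
# An explicit False on either key is a clear opt-out, not a conflict.
def check_cache_conflict(settings: dict) -> None:
    if settings.get('anthropic_cache') and settings.get('anthropic_cache_messages'):
        raise ValueError(
            'anthropic_cache conflicts with deprecated anthropic_cache_messages; '
            'set only one of them'
        )
```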
…ethod tests

- Have `_build_automatic_cache_control` return a `(top_level_param, resolved_ttl)` tuple so `_apply_per_block_caching_fallback` no longer re-derives cache settings from raw `model_settings`
- Replace 4 private-method tests with integration tests through `agent.run` using mock Bedrock/Vertex clients
- Switch compaction tests from deprecated `anthropic_cache_messages` to `anthropic_cache`

Made-with: Cursor
…assertion

- Parametrize `base_url` alongside the client class so the Vertex case uses a real Vertex URL instead of a Bedrock one (cosmetic only, since `base_url` isn't read on this code path).
- Replace `not isinstance(cache_control, dict)` with an explicit `cache_control is anthropic.omit` check — says exactly what we mean.
@DenysMoskalenko Thanks Denys!
Adds support for Anthropic's automatic caching — a top-level `cache_control` parameter on `messages.create()` that lets the server automatically place a cache breakpoint on the last cacheable block and move it forward as conversations grow. This is simpler than manually placing `CachePoint` markers or using the per-section `anthropic_cache_*` settings, and is Anthropic's recommended approach for multi-turn conversations.

What changed
New setting `anthropic_automatic_caching: bool | Literal['5m', '1h']` on `AnthropicModelSettings`, following the same type pattern as the existing `anthropic_cache_instructions` / `anthropic_cache_tool_definitions` / `anthropic_cache_messages` settings. `True` defaults to a 5-minute TTL; `'1h'` opts into the extended cache duration.

The top-level `cache_control` parameter is passed through to both `messages.create()` and `count_tokens()`. When enabled, `_limit_cache_points` reduces the explicit breakpoint budget from 4 to 3, since the server-applied breakpoint occupies one slot.

SDK version bump: minimum `anthropic` from `>=0.80.0` to `>=0.83.0` (the top-level `cache_control` parameter was added in v0.83.0).

Pre-Review Checklist

`make format` and `make typecheck`.

Pre-Merge Checklist
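The 4-to-3 budget trim mentioned in the description can be sketched as follows. This is a simplified stand-in over plain dicts, not the real `_limit_cache_points` (which operates on Anthropic message params); the walk-from-the-end strategy keeping the most recent breakpoints is an assumption about the intent, matching the "auto-trim excess" wording above:

```python
# Hypothetical sketch of the cache-point budget trim: Anthropic allows 4
# cache breakpoints total, and when automatic caching is on the server-applied
# breakpoint occupies one slot, leaving 3 for explicit breakpoints.
def limit_cache_points(blocks: list[dict], automatic_caching: bool) -> list[dict]:
    budget = 3 if automatic_caching else 4
    seen = 0
    # Walk from the end so the most recent breakpoints survive the trim.
    for block in reversed(blocks):
        if 'cache_control' in block:
            seen += 1
            if seen > budget:
                del block['cache_control']  # strip the excess, earliest first
    return blocks
```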