Add Anthropic automatic prompt caching support #4840

Merged
DouweM merged 7 commits into pydantic:main from DenysMoskalenko:feature/anthropic-automatic-caching on Apr 15, 2026

Conversation

@DenysMoskalenko
Contributor

Adds support for Anthropic's automatic caching — a top-level cache_control parameter on messages.create() that lets the server automatically place a cache breakpoint on the last cacheable block and move it forward as conversations grow. This is simpler than manually placing CachePoint markers or using the per-section anthropic_cache_* settings, and is Anthropic's recommended approach for multi-turn conversations.

What changed

New setting anthropic_automatic_caching: bool | Literal['5m', '1h'] on AnthropicModelSettings, following the same type pattern as the existing anthropic_cache_instructions / anthropic_cache_tool_definitions / anthropic_cache_messages settings. True defaults to 5-minute TTL; '1h' opts into the extended cache duration.
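As a sketch of that normalization (hypothetical helper name; the library's internal code may differ):

```python
from typing import Literal, Optional, Union

CacheTtl = Literal['5m', '1h']

def resolve_cache_ttl(setting: Union[bool, CacheTtl, None]) -> Optional[CacheTtl]:
    """Normalize the setting: True opts into the default 5-minute TTL,
    '5m'/'1h' pass through unchanged, and False/None disable caching."""
    if setting is True:
        return '5m'
    if setting in ('5m', '1h'):
        return setting
    return None
```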

The top-level cache_control parameter is passed through to both messages.create() and count_tokens(). When enabled, _limit_cache_points reduces the explicit breakpoint budget from 4 to 3, since the server-applied breakpoint occupies one slot.
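A simplified sketch of that budgeting (hypothetical code, not the actual `_limit_cache_points` implementation): keep only the most recent explicit breakpoints within the budget and strip the marker from earlier blocks.

```python
MAX_CACHE_POINTS = 4  # Anthropic's limit on explicit cache breakpoints

def limit_cache_points(blocks: list[dict], automatic_caching: bool) -> list[dict]:
    """Keep the last `budget` blocks carrying an explicit cache_control and
    strip the marker from earlier ones. With automatic caching enabled, the
    server-applied breakpoint occupies one of the four slots, so the budget
    for explicit breakpoints drops to three."""
    budget = MAX_CACHE_POINTS - 1 if automatic_caching else MAX_CACHE_POINTS
    kept = 0
    out: list[dict] = []
    for block in reversed(blocks):
        if 'cache_control' in block and kept >= budget:
            block = {k: v for k, v in block.items() if k != 'cache_control'}
        elif 'cache_control' in block:
            kept += 1
        out.append(block)
    out.reverse()
    return out
```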

SDK version bump: minimum anthropic from >=0.80.0 to >=0.83.0 (the top-level cache_control parameter was added in v0.83.0).

Pre-Review Checklist

  • Any AI generated code has been reviewed line-by-line by the human PR author, who stands by it.
  • No breaking changes in accordance with the version policy.
  • Linting and type checking pass per make format and make typecheck.
  • PR title is fit for the release changelog.

Pre-Merge Checklist

  • New tests for any fix or new behavior, maintaining 100% coverage.
  • Updated documentation for new features and behaviors, including docstrings for API docs.

@github-actions bot added labels size: M (Medium PR, 101-500 weighted lines) and feature (new feature request, or PR implementing a feature) on Mar 25, 2026

@DenysMoskalenko force-pushed the feature/anthropic-automatic-caching branch from 0f73868 to 2dc1add on March 25, 2026 12:21

@DouweM
Collaborator

DouweM commented Mar 25, 2026

@DenysMoskalenko Thanks for working on this! Please have a look at the review comments; let me know if any are not relevant.


@DenysMoskalenko force-pushed the feature/anthropic-automatic-caching branch 2 times, most recently from 8e920a2 to 1eed77f on March 25, 2026 16:30

@DenysMoskalenko force-pushed the feature/anthropic-automatic-caching branch 2 times, most recently from 8d4d357 to 4496ea5 on March 25, 2026 16:56

@DenysMoskalenko force-pushed the feature/anthropic-automatic-caching branch 2 times, most recently from a1f7cb1 to b5371c9 on March 25, 2026 20:11
@DenysMoskalenko
Contributor Author

@DouweM Looks like we're good now.


@DenysMoskalenko force-pushed the feature/anthropic-automatic-caching branch from b5371c9 to d7a5912 on March 26, 2026 13:38

Comment thread docs/models/anthropic.md Outdated
Comment thread docs/models/anthropic.md Outdated
Comment thread docs/models/anthropic.md Outdated
@DenysMoskalenko force-pushed the feature/anthropic-automatic-caching branch from d7a5912 to 85e0e94 on March 30, 2026 09:49

@DenysMoskalenko force-pushed the feature/anthropic-automatic-caching branch from 85e0e94 to dbada65 on March 30, 2026 11:16
@DouweM
Collaborator

DouweM commented Apr 13, 2026

@DenysMoskalenko Thanks for all your patience and hard work here! I'm taking over this PR now and will get it out in the next few days, so we don't keep going in a never-ending review loop!


@DenysMoskalenko
Contributor Author

@DenysMoskalenko Thanks for all your patience and hard work here! I'm taking over this PR now and will get it out in the next few days, so we don't keep going in a never-ending review loop!

No problem at all. I see the number of open PRs and issues, and I understand the workload involved. I am happy to wait as long as needed and update the code however you would like. For me, it is more important to keep the project aligned with your vision than to get this PR merged quickly.


If the last block already has `cache_control` (e.g. from an explicit `CachePoint`),
it is left unchanged to preserve the user's chosen TTL.
"""
cache_setting = model_settings.get('anthropic_cache') or model_settings.get('anthropic_cache_messages')
Contributor

_apply_per_block_caching_fallback independently re-reads and re-interprets the cache settings (anthropic_cache / anthropic_cache_messages) via a different code path than _build_automatic_cache_control. The latter handles the conflict check, the deprecation warning, and the True → '5m' normalization — none of which are replicated here. That works today because _build_automatic_cache_control is always called first, but it's fragile.

Consider resolving the effective cache setting once — either in _build_automatic_cache_control (returning both the top-level param and the resolved TTL/boolean), or in a small shared helper — and passing the resolved value into _apply_per_block_caching_fallback instead of having it re-derive it from raw settings. This would also let you drop the or chain here and the duplicate '5m' if ... is True else ... on line 1262.

Collaborator

@DouweM — judgment call on a refactor.

The bot's concern: _apply_per_block_caching_fallback re-reads anthropic_cache / anthropic_cache_messages and re-derives the TTL, duplicating the resolution logic from _build_automatic_cache_control. Works today because _build_automatic_cache_control is always called first (so the deprecation warning fires and the conflict raises), but it's fragile to reorder.

Concrete shape of the refactor: have _build_automatic_cache_control return a (top_level_param_or_None, resolved_ttl_or_None) tuple or a small dataclass, pass it into _apply_per_block_caching_fallback instead of having it re-derive from model_settings. Each callsite already calls them back-to-back so plumbing is straightforward.

It's a real improvement, ~15-20 lines of change. Want me to do it?
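A rough sketch of the proposed shape (all names hypothetical, mirroring the ones discussed above; the real methods live on the model class and take richer types):

```python
from dataclasses import dataclass
from typing import Literal, Optional

@dataclass
class ResolvedCache:
    """Hypothetical carrier for the once-resolved cache setting."""
    top_level_param: Optional[dict]  # passed to messages.create() when supported
    ttl: Optional[Literal['5m', '1h']]  # reused by the per-block fallback

def build_automatic_cache_control(settings: dict, supports_top_level: bool) -> ResolvedCache:
    # Resolve anthropic_cache (with anthropic_cache_messages as deprecated alias) once.
    setting = settings.get('anthropic_cache')
    if setting is None:
        setting = settings.get('anthropic_cache_messages')  # deprecated alias
    ttl = '5m' if setting is True else setting if setting in ('5m', '1h') else None
    if ttl is None:
        return ResolvedCache(None, None)
    param = {'type': 'ephemeral', 'ttl': ttl} if supports_top_level else None
    return ResolvedCache(param, ttl)

def apply_per_block_caching_fallback(blocks: list[dict], resolved: ResolvedCache) -> None:
    # Consumes the resolved TTL instead of re-reading raw settings; setdefault
    # preserves an existing cache_control (e.g. from an explicit CachePoint).
    if resolved.ttl is not None and resolved.top_level_param is None and blocks:
        blocks[-1].setdefault('cache_control', {'type': 'ephemeral', 'ttl': resolved.ttl})
```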

Comment thread tests/models/test_anthropic.py Outdated
model = AnthropicModel('claude-haiku-4-5', provider=AnthropicProvider(anthropic_client=mock_client))

settings = AnthropicModelSettings(anthropic_cache=True)
assert model._build_automatic_cache_control(settings) is None # pyright: ignore[reportPrivateUsage]
Contributor

Several of the new tests (test_automatic_cache_control_none_on_unsupported_clients, test_anthropic_cache_per_block_fallback_on_unsupported_clients, test_deprecated_cache_messages_per_block_fallback_on_unsupported_clients, test_per_block_fallback_preserves_existing_cache_control) invoke private methods (_build_automatic_cache_control, _apply_per_block_caching_fallback) directly.

The test guidelines prefer testing through public APIs. Since the Bedrock/Vertex behavior is already covered end-to-end by the VCR test test_anthropic_cache_bedrock_real_api and the integration tests use mock clients + agent.run, these private-method tests are largely redundant. Consider consolidating them into integration tests that exercise the same behavior through agent.run with mock Bedrock/Vertex clients — similar to how test_anthropic_cache_messages_deprecated already works.

Collaborator

@DouweM — judgment call on test consolidation.

The bot's point is fair: test_automatic_cache_control_none_on_unsupported_clients, test_anthropic_cache_per_block_fallback_on_unsupported_clients, test_deprecated_cache_messages_per_block_fallback_on_unsupported_clients, and test_per_block_fallback_preserves_existing_cache_control all poke at private methods directly. tests/CLAUDE.md says to test through public APIs.

There's some end-to-end coverage already: test_anthropic_cache_bedrock_real_api (VCR, Bedrock multi-turn with cache) and test_anthropic_cache_messages_deprecated (mock client, end-to-end via agent.run).

But the private-method tests also cover edge cases the end-to-end tests don't:

  • Vertex (not just Bedrock) — no Vertex cassette exists
  • TTL passthrough on fallback for '1h' — currently only True is end-to-end tested
  • CachePoint-preserved-by-fallback — no end-to-end test

Options:

  1. Leave as-is (private-method tests exist because end-to-end coverage is incomplete).
  2. Add mock-client Vertex/TTL/CachePoint tests that exercise agent.run, then delete the private-method tests.
  3. Delete the private-method tests now, accept the coverage gap.

My lean is (2), but it's ~100 lines of churn on a PR that's already large. Want me to do it, or leave it for a follow-up?


@DenysMoskalenko force-pushed the feature/anthropic-automatic-caching branch 2 times, most recently from 959ef52 to 6c5f2ca on April 15, 2026 20:21
DenysMoskalenko and others added 5 commits April 15, 2026 22:23
Add `anthropic_automatic_caching` setting to `AnthropicModelSettings`
that passes a top-level `cache_control` parameter to Anthropic's API,
enabling server-managed automatic cache breakpoints.

- Supports `True` (5m TTL), `'5m'`, or `'1h'` TTL values
- Reduces explicit cache point budget from 4 to 3 when enabled
- Silently ignored for AsyncAnthropicBedrock clients
  (Bedrock does not support automatic caching)
- Bumps minimum anthropic SDK to >=0.83.0

Made-with: Cursor
…r-block fallback

- Rename setting from anthropic_automatic_caching to anthropic_cache
- Deprecate anthropic_cache_messages in favor of anthropic_cache
- On Bedrock, anthropic_cache falls back to per-block cache_control on the
  last user message (since top-level automatic caching is not supported)
- Remove TTL stripping for Bedrock in _build_cache_control (Bedrock accepts TTL)
- Add VCR-recorded integration test for Bedrock per-block caching with TTL
- Update docs and docstrings to reflect Bedrock fallback behavior

Made-with: Cursor
…larity

- Remove Foundry from fallback list (it supports automatic caching per
  Anthropic docs); only Bedrock and Vertex need per-block fallback
- Replace double backticks with single backticks in docstrings
- Clarify anthropic_cache_messages deprecation: now behaves same as
  anthropic_cache (automatic on API/Foundry, per-block on Bedrock/Vertex)
- Remove redundant per-block caching in _map_message for cache_messages
  (now handled by _build_automatic_cache_control + _apply_per_block_caching_fallback)
- Add link to Anthropic automatic caching docs in docs page
- Move Bedrock/Vertex note to Automatic Caching section where anthropic_cache
  is introduced
- Reframe cache point budget: anthropic_cache counts as 1 cache point like
  other settings; clarify we auto-trim excess
- Update Bedrock test to multi-turn (message_history) and re-record cassette

Made-with: Cursor
An explicit opt-out via `anthropic_cache=False` alongside
`anthropic_cache_messages=True` should not raise; the user is
clearly opting into one and out of the other. Only treat truthy
settings on both keys as a conflict.
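The truthiness-based conflict check described in this commit can be sketched as (hypothetical, simplified):

```python
def check_cache_setting_conflict(anthropic_cache, anthropic_cache_messages) -> None:
    """Raise only when both settings are truthy. An explicit opt-out such as
    anthropic_cache=False alongside anthropic_cache_messages=True is allowed:
    the user is clearly opting into one and out of the other."""
    if anthropic_cache and anthropic_cache_messages:
        raise ValueError(
            'anthropic_cache and the deprecated anthropic_cache_messages are both set; '
            'use anthropic_cache only'
        )
```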
@DenysMoskalenko force-pushed the feature/anthropic-automatic-caching branch from 6c5f2ca to cba1934 on April 15, 2026 20:24

…ethod tests

- Have _build_automatic_cache_control return (top_level_param, resolved_ttl)
  tuple so _apply_per_block_caching_fallback no longer re-derives cache
  settings from raw model_settings
- Replace 4 private-method tests with integration tests through agent.run
  using mock Bedrock/Vertex clients
- Switch compaction tests from deprecated anthropic_cache_messages to
  anthropic_cache

Made-with: Cursor
@DenysMoskalenko force-pushed the feature/anthropic-automatic-caching branch from cba1934 to c8a4dba on April 15, 2026 20:35
…assertion

- Parametrize base_url alongside the client class so the Vertex case
  uses a real Vertex URL instead of a Bedrock one (cosmetic only, since
  base_url isn't read on this code path).
- Replace `not isinstance(cache_control, dict)` with an explicit
  `cache_control is anthropic.omit` check — says exactly what we mean.
@DouweM merged commit b0eca08 into pydantic:main on Apr 15, 2026
43 checks passed
@DouweM
Collaborator

DouweM commented Apr 15, 2026

@DenysMoskalenko Thanks Denys!

Development

Successfully merging this pull request may close these issues.

Allow to use Anthropic's new automatic prompt caching

2 participants