fix: restore per-block `cache_control` for `anthropic_cache_messages` #5227
Conversation
@DouweM @DenysMoskalenko Sorry for the ping. Since the previous PR introduced a breaking change, I'd really appreciate it if we could prioritize this PR. Thanks!
## Summary

- Restore `anthropic_cache_messages` as a per-block cache control option for final message content.
- Document usage for Anthropic-compatible gateways and providers.
- Update Anthropic cache tests for message cache behavior and cache-point limits.

Tests: `uv run pytest tests/models/test_anthropic.py -k 'cache_messages or anthropic_cache_fallback_on_unsupported_clients or limit_cache_points'`

Assisted-by: YAAI <[email protected]>
```diff
 anthropic_cache_messages: bool | Literal['5m', '1h']
-"""Deprecated: use `anthropic_cache` instead.
-
-Behaves the same as `anthropic_cache`: uses automatic caching where supported,
-falls back to per-block caching on Bedrock and Vertex. Emits a deprecation warning.
+"""Whether to add `cache_control` to the last message content block.
+
+When enabled, this adds per-block `cache_control` to the last content block in the
+final message. This is useful for Anthropic-compatible providers and gateways that
+support explicit per-block caching but don't support Anthropic's top-level automatic
+caching parameter.
 
 If `True`, uses TTL='5m'. You can also specify '5m' or '1h' directly.
+Cannot be combined with `anthropic_cache`.
 """
```
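As a hedged sketch of the wire-level effect this docstring describes, using plain-dict Anthropic-style messages (`apply_message_cache_control` here is a hypothetical stand-in for the internal helper, not the library's API):

```python
from typing import Any


def apply_message_cache_control(messages: list[dict[str, Any]], ttl: str) -> None:
    # Hypothetical sketch: mark only the last content block of the final message.
    messages[-1]['content'][-1]['cache_control'] = {'type': 'ephemeral', 'ttl': ttl}


messages = [
    {
        'role': 'user',
        'content': [
            {'type': 'text', 'text': 'Long shared context'},
            {'type': 'text', 'text': 'Question'},
        ],
    },
]
apply_message_cache_control(messages, '5m')

# Earlier blocks are untouched; only the final block is marked.
assert 'cache_control' not in messages[-1]['content'][0]
assert messages[-1]['content'][-1]['cache_control'] == {'type': 'ephemeral', 'ttl': '5m'}
```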
🚩 Behavioral change for existing anthropic_cache_messages users (V1 backward compatibility)
This PR changes the semantics of anthropic_cache_messages from a deprecated alias of anthropic_cache (top-level automatic caching) to a distinct per-block caching feature. Previously, anthropic_cache_messages=True on a standard Anthropic client would emit a deprecation warning and set the top-level cache_control parameter for server-side automatic caching. Now it adds per-block cache_control only to the last content block of the last message — a meaningfully different caching behavior.
The version policy at docs/version-policy.md:7 states: "Functionality marked as deprecated will not be removed until V2." While the field isn't removed, its behavior is fundamentally changed. Users who had anthropic_cache_messages=True and hadn't yet migrated to anthropic_cache per the deprecation warning will silently get different caching behavior. On Bedrock/Vertex the behavior happens to be the same (both old and new do per-block fallback), but on the standard Anthropic API the difference is significant.
Given we're at v1.87.0 and the current date is April 2026 (the earliest V2 release date), this may be intentional preparation for V2. Worth confirming with maintainers whether this is acceptable in V1 or should wait for a V2 release.
I think the two should coexist. They are not substitutes for each other, although in some APIs one can replace the other, because their semantics are different.
The V1 backward-compat lens here is what motivated the PR: #4840 itself was the regression, changing anthropic_cache_messages from its original per-block behavior to a deprecation alias of the new top-level anthropic_cache. That broke users on Anthropic-compatible gateways and proxies that depend on the per-block cache_control format (Bedrock, Vertex partner surface, MiniMax, OpenRouter, LiteLLM). Restoring the original per-block semantics fixes that regression — the alias window was short, and anyone who had migrated into the alias saw a deprecation warning steering them to anthropic_cache already.
Agreed with @Wh1isper that the two should coexist with distinct semantics: top-level automatic caching vs. per-block message caching. Keeping the conflict-check between them keeps misuse loud.
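A minimal sketch of what keeping that conflict check loud can look like, on a plain settings dict (hypothetical `check_cache_settings` name; the real check lives in the Anthropic model code and raises pydantic-ai's `UserError` rather than `ValueError`):

```python
from typing import Any


def check_cache_settings(settings: dict[str, Any]) -> None:
    # The two strategies are mutually exclusive: top-level automatic caching
    # (anthropic_cache) vs. per-block caching of the final message content
    # block (anthropic_cache_messages).
    if settings.get('anthropic_cache') and settings.get('anthropic_cache_messages'):
        raise ValueError(
            '`anthropic_cache` and `anthropic_cache_messages` cannot both be enabled'
        )


check_cache_settings({'anthropic_cache': True})  # one strategy at a time: fine

raised = False
try:
    check_cache_settings({'anthropic_cache': True, 'anthropic_cache_messages': '1h'})
except ValueError as exc:
    raised = True
    assert 'cannot both be enabled' in str(exc)
assert raised
```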
Force-pushed from 9d786ed to 23dd692
@Wh1isper Sorry for the breaking change, thanks for the report & fix! I'll take it over the line.
…essages-per-block

# Conflicts:
#	pydantic_ai_slim/pydantic_ai/models/anthropic.py
- Extract `_apply_message_cache_control` shared between `_map_message` and `_apply_per_block_caching_fallback`
- Frame `anthropic_cache_messages` docs as the gateway-compatible alternative to `anthropic_cache`, with explicit mutual exclusion
```python
if cache_messages := model_settings.get('anthropic_cache_messages'):
    self._apply_message_cache_control(
        anthropic_messages, '5m' if cache_messages is True else cache_messages
    )
```
The anthropic_cache_messages per-block caching is applied here inside _map_message, while the analogous per-block fallback for anthropic_cache (Bedrock/Vertex) is applied via _apply_per_block_caching_fallback as a separate step after _map_message returns in _messages_create / _messages_count_tokens. Both end up calling _apply_message_cache_control, but the split makes the caching pipeline harder to follow.
Consider moving this out of _map_message and into _messages_create / _messages_count_tokens as a separate step alongside _apply_per_block_caching_fallback, so all message-level caching is visible and sequenced together in the orchestration method. (Yes, this means it needs to appear in both callsites, but _apply_per_block_caching_fallback already does, so the pattern exists.)
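The suggested sequencing can be sketched on plain dicts (hypothetical `prepare_request`, `map_messages`, and `top_level_cache_supported` names; a simplification of the real `_messages_create` flow):

```python
from typing import Any

Message = dict[str, Any]


def map_messages(messages: list[Message]) -> list[Message]:
    # Stand-in for _map_message: here messages are already Anthropic-shaped dicts,
    # copied so caching mutations don't leak back into the caller's history.
    return [dict(m, content=[dict(b) for b in m['content']]) for m in messages]


def apply_message_cache_control(messages: list[Message], ttl: str) -> None:
    # Shared helper: mark the last content block of the final message.
    messages[-1]['content'][-1]['cache_control'] = {'type': 'ephemeral', 'ttl': ttl}


def prepare_request(
    messages: list[Message], settings: dict[str, Any], top_level_cache_supported: bool
) -> list[Message]:
    anthropic_messages = map_messages(messages)
    # All message-level caching sequenced together, after mapping:
    if cache_messages := settings.get('anthropic_cache_messages'):
        apply_message_cache_control(
            anthropic_messages, '5m' if cache_messages is True else cache_messages
        )
    elif settings.get('anthropic_cache') and not top_level_cache_supported:
        # Per-block fallback for clients without top-level automatic caching.
        apply_message_cache_control(anthropic_messages, '5m')
    return anthropic_messages


msgs = [{'role': 'user', 'content': [{'type': 'text', 'text': 'hi'}]}]
out = prepare_request(msgs, {'anthropic_cache_messages': '1h'}, top_level_cache_supported=True)
assert out[-1]['content'][-1]['cache_control'] == {'type': 'ephemeral', 'ttl': '1h'}
```

With both caching branches in one orchestration function, a reader sees the full message-level caching pipeline top to bottom instead of chasing it into `_map_message`.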
```diff
@@ -1305,84 +1307,39 @@ async def test_anthropic_cache_messages_deprecated_custom_ttl(allow_model_reques
         m,
         system_prompt='System instructions.',
         model_settings=AnthropicModelSettings(
             anthropic_cache_messages='1h',
         ),
     )
 
-    with pytest.warns(DeprecationWarning, match='`anthropic_cache_messages` is deprecated'):
-        await agent.run('User message')
+    await agent.run('User message')
 
     completion_kwargs = get_mock_chat_completion_kwargs(mock_client)[0]
-    assert completion_kwargs['cache_control'] == snapshot({'type': 'ephemeral', 'ttl': '1h'})
+    assert completion_kwargs['cache_control'] is OMIT
 
 
 async def test_anthropic_cache_and_cache_messages_conflict(allow_model_requests: None):
     """Test that enabling both anthropic_cache and anthropic_cache_messages raises UserError."""
     c = completion_message(
         [BetaTextBlock(text='Response', type='text')],
         usage=BetaUsage(input_tokens=10, output_tokens=5),
     )
     mock_client = MockAnthropic.create_mock(c)
     m = AnthropicModel('claude-haiku-4-5', provider=AnthropicProvider(anthropic_client=mock_client))
     agent = Agent(
         m,
         system_prompt='System instructions.',
         model_settings=AnthropicModelSettings(
             anthropic_cache=True,
             anthropic_cache_messages=True,
         ),
     )
 
     with pytest.raises(UserError, match='cannot both be enabled'):
         await agent.run('User message')
 
 
-async def test_limit_cache_points_with_deprecated_cache_messages(allow_model_requests: None):
-    """Test that deprecated anthropic_cache_messages maps to anthropic_cache for cache point limiting."""
+async def test_limit_cache_points_with_cache_messages(allow_model_requests: None):
+    """Test that anthropic_cache_messages cache points count toward the cache-point budget."""
     c = completion_message(
         [BetaTextBlock(text='Response', type='text')],
         usage=BetaUsage(input_tokens=10, output_tokens=5),
     )
     mock_client = MockAnthropic.create_mock(c)
     m = AnthropicModel('claude-haiku-4-5', provider=AnthropicProvider(anthropic_client=mock_client))
     agent = Agent(
         m,
         system_prompt='System instructions.',
         model_settings=AnthropicModelSettings(
             anthropic_cache_messages=True,
         ),
     )
 
-    # anthropic_cache_messages now maps to anthropic_cache (top-level cache_control),
-    # which reduces the explicit cache point budget from 4 to 3.
-    # With 4 CachePoint markers, the oldest should be removed to fit budget of 3.
-    with pytest.warns(DeprecationWarning, match='`anthropic_cache_messages` is deprecated'):
-        await agent.run(
-            [
-                'Context 1',
-                CachePoint(),  # Oldest, should be removed
-                'Context 2',
-                CachePoint(),  # Should be kept
-                'Context 3',
-                CachePoint(),  # Should be kept
-                'Context 4',
-                CachePoint(),  # Should be kept
-                'Question',
-            ]
-        )
+    await agent.run(
+        [
+            'Context 1',
+            CachePoint(),
+            'Context 2',
+            CachePoint(),
+            'Context 3',
+            CachePoint(),
+            'Question',
+        ]
+    )
 
     completion_kwargs = get_mock_chat_completion_kwargs(mock_client)[0]
     messages = completion_kwargs['messages']
-    assert completion_kwargs['cache_control'] == {'type': 'ephemeral', 'ttl': '5m'}
+    assert completion_kwargs['cache_control'] is OMIT
 
-    cache_count = 0
-    for msg in messages:
-        for block in msg['content']:
-            if 'cache_control' in block:
-                cache_count += 1
-
-    # Budget is 3 (reduced from 4 by automatic caching). 4 CachePoint markers means 1 removed.
-    assert cache_count == 3
+    assert messages == snapshot(
+        [
+            {
+                'role': 'user',
+                'content': [
+                    {'text': 'Context 1', 'type': 'text', 'cache_control': {'type': 'ephemeral', 'ttl': '5m'}},
+                    {'text': 'Context 2', 'type': 'text', 'cache_control': {'type': 'ephemeral', 'ttl': '5m'}},
+                    {'text': 'Context 3', 'type': 'text', 'cache_control': {'type': 'ephemeral', 'ttl': '5m'}},
+                    {'text': 'Question', 'type': 'text', 'cache_control': {'type': 'ephemeral', 'ttl': '5m'}},
+                ],
+            }
+        ]
+    )
```
This test has exactly 4 cache points (3 CachePoint + 1 from anthropic_cache_messages), which matches the budget of 4 exactly — so nothing is actually trimmed. The test name says "limit" but it's really only verifying that anthropic_cache_messages cache points are counted in the budget.
To actually test the limiting/trimming behavior, add enough CachePoint markers that the total (explicit + anthropic_cache_messages) exceeds 4, and verify the oldest explicit CachePoint is removed. That's what the old test_limit_cache_points_with_deprecated_cache_messages was testing (5 total, trimmed to 3 because automatic caching reduced the budget).
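The budget arithmetic being discussed can be sketched on a flat list of content blocks (hypothetical `trim_cache_points` helper; the real budget handling lives in the Anthropic model code): with a budget of 4 and 5 marked blocks, the oldest mark is dropped.

```python
from typing import Any


def trim_cache_points(blocks: list[dict[str, Any]], budget: int) -> None:
    # Drop cache_control from the oldest blocks so at most `budget` marks remain.
    cached = [b for b in blocks if 'cache_control' in b]
    for block in cached[: max(0, len(cached) - budget)]:
        del block['cache_control']


cc = {'type': 'ephemeral', 'ttl': '5m'}
blocks = [
    {'text': 'Context 1', 'cache_control': dict(cc)},  # oldest mark: trimmed
    {'text': 'Context 2', 'cache_control': dict(cc)},
    {'text': 'Context 3', 'cache_control': dict(cc)},
    {'text': 'Context 4', 'cache_control': dict(cc)},
    {'text': 'Question', 'cache_control': dict(cc)},
]
trim_cache_points(blocks, budget=4)

assert sum('cache_control' in b for b in blocks) == 4
assert 'cache_control' not in blocks[0]  # oldest was removed first
```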
Both call sites (`_map_message` and `_apply_per_block_caching_fallback`) pass the just-built request message list, which is never empty in practice. Document the precondition in the docstring instead.
- Move `anthropic_cache_messages` per-block injection out of `_map_message` and into the request orchestration sites alongside `_apply_per_block_caching_fallback`, so all message-level caching is visible together
- Extend `test_limit_cache_points_with_cache_messages` to actually exceed the 4-point budget so the trimming behavior is exercised
fix: restore per-block cache_control for anthropic_cache_messages (pydantic#5227)

Co-authored-by: Douwe Maan <[email protected]>
Summary

This PR restores `anthropic_cache_messages` as a per-block Anthropic `cache_control` setting for the final message content block.

PR #4840 introduced Anthropic automatic caching via the top-level `cache_control` parameter and changed `anthropic_cache_messages` to map to that new behavior with a deprecation warning. That made sense for the official Anthropic API, where automatic caching is the recommended simple path for multi-turn conversations, but it changed the existing behavior of an established Pydantic AI setting.

The previous behavior still matters because several Anthropic-compatible providers and proxy layers continue to use the explicit per-block Anthropic cache format. Restoring it keeps the original setting useful for those integrations while preserving `anthropic_cache` for Anthropic's top-level automatic caching.

Problem

`anthropic_cache_messages` previously added explicit `cache_control` metadata to message content blocks. After #4840, enabling it produced a top-level `cache_control` request parameter instead.

That created a breaking behavior change for users whose provider expects explicit per-block cache control in the Anthropic message body:
```json
{
  "role": "user",
  "content": [
    {
      "type": "text",
      "text": "...",
      "cache_control": {"type": "ephemeral", "ttl": "5m"}
    }
  ]
}
```

Common affected scenarios include Amazon Bedrock, Google Vertex AI's Claude partner surface, and Anthropic-compatible gateways and proxies such as MiniMax, OpenRouter, and LiteLLM.
Evidence from provider documentation

The explicit per-block format is still used across current provider and gateway documentation:

- MiniMax's Anthropic-compatible API documents per-block `cache_control` settings: https://platform.minimax.io/docs/api-reference/anthropic-api-compatible-cache
- Amazon Bedrock documents `InvokeModel` requests using `cache_control` on content blocks inside `messages`, with supported fields including `system`, `messages`, and `tools`: https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html
- Google Vertex AI's Claude partner documentation describes `cache_control` as part of the matching cache key: https://docs.cloud.google.com/vertex-ai/generative-ai/docs/partner-models/claude/prompt-caching

Change
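To make the two request shapes concrete, here is a hedged sketch as plain dicts (the top-level parameter shape follows this PR's test assertions on `completion_kwargs['cache_control']`, simplified for illustration):

```python
# Per-block caching: cache_control attached to a content block inside messages.
# This is the format Anthropic-compatible gateways and proxies parse.
per_block_request = {
    'model': 'claude-haiku-4-5',
    'messages': [
        {
            'role': 'user',
            'content': [
                {'type': 'text', 'text': '...', 'cache_control': {'type': 'ephemeral', 'ttl': '5m'}},
            ],
        }
    ],
}

# Top-level automatic caching: a single request-level cache_control parameter,
# with no marks inside the message body.
top_level_request = {
    'model': 'claude-haiku-4-5',
    'messages': [{'role': 'user', 'content': [{'type': 'text', 'text': '...'}]}],
    'cache_control': {'type': 'ephemeral', 'ttl': '5m'},
}

# A gateway that only reads per-block marks sees no caching in the second shape.
assert 'cache_control' in per_block_request['messages'][0]['content'][0]
assert 'cache_control' not in top_level_request['messages'][0]['content'][0]
```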
This PR makes the two settings distinct:

- `anthropic_cache` keeps the top-level automatic caching behavior introduced by #4840 (Add Anthropic automatic prompt caching support).
- `anthropic_cache_messages` again adds per-block `cache_control` to the final message content block.

It also keeps the conflict check between the two settings, since a request should use one message caching strategy at a time.
Tests

Added and updated Anthropic model tests for:

- `anthropic_cache_messages=True` and custom TTL values adding per-block `cache_control`.
- Interaction with explicit `CachePoint` cache control and the cache-point limit.
- `UserError` when `anthropic_cache` and `anthropic_cache_messages` are enabled together.

Command run:

```
uv run pytest tests/models/test_anthropic.py -k 'cache_messages or anthropic_cache_fallback_on_unsupported_clients or limit_cache_points'
```

Result:

Checklist