Add Anthropic task budget support by dsfaccini · Pull Request #5140 · pydantic/pydantic-ai

dsfaccini · 2026-04-17T20:20:13Z

Follow-up to #5118.

Summary

- add typed anthropic_task_budget support for Anthropic models
- map it to output_config.task_budget and auto-enable task-budgets-2026-03-13
- validate malformed configs and reject unsupported models
- reject task_budget.remaining + AnthropicCompaction (mutually exclusive per Anthropic API) with a clear UserError instead of a server 400
- document task budgets on the Anthropic provider page (where the feature lives), keeping docs/thinking.md to a one-line cross-ref since thinking tokens are just one of several token sources that count against the budget
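
To make the mapping in the summary concrete, here is a minimal sketch in plain Python. It is not the code from this PR and does not call pydantic_ai or the Anthropic SDK: `build_task_budget_params` and the flat `settings` dict are hypothetical stand-ins. Only the setting name `anthropic_task_budget`, the target `output_config.task_budget`, the `total`/`remaining` fields, and the `task-budgets-2026-03-13` beta come from this page.

```python
from typing import Any

TASK_BUDGET_BETA = 'task-budgets-2026-03-13'  # beta name taken from the PR summary


def build_task_budget_params(settings: dict[str, Any]) -> tuple[dict[str, Any], set[str]]:
    """Hypothetical helper: return (output_config, betas) for one request."""
    output_config: dict[str, Any] = {}
    betas: set[str] = set()
    task_budget = settings.get('anthropic_task_budget')
    if task_budget is not None:
        # Pass the budget through unchanged. A later commit on this PR strips
        # runtime field validation, so this sketch does none either.
        output_config['task_budget'] = dict(task_budget)
        # Setting a budget is what switches the beta on.
        betas.add(TASK_BUDGET_BETA)
    return output_config, betas


output_config, betas = build_task_budget_params({'anthropic_task_budget': {'total': 20_000}})
```

Without the setting, the helper returns an empty config and no betas, so requests that do not use task budgets are unaffected.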

Out of scope

task_budget.remaining carryover for client-side compaction patterns (where the user summarizes earlier turns themselves between requests). Pydantic AI does not yet expose a hook to track loop-wide token spend across user-driven compaction; users who want this can compute and pass remaining themselves.

Checklist

Any AI generated code has been reviewed line-by-line by the human PR author, who stands by it.
No breaking changes in accordance with the version policy.
PR title is fit for the release changelog.

Verification

uv run pytest tests/models/test_anthropic.py tests/test_thinking.py -k "task_budget or explicit_effort_xhigh_unsupported_model_errors or drops_sampling_settings or keeps_non_sampling_extra_body or unified_thinking_opus_47_xhigh or AnthropicThinkingTranslation or supports_task_budgets or compaction_capability"
uv run pyright pydantic_ai_slim/pydantic_ai/models/anthropic.py tests/models/test_anthropic.py tests/test_thinking.py

# Conflicts:
#	pydantic_ai_slim/pydantic_ai/models/anthropic.py
#	tests/models/test_anthropic.py

- Allow remaining=0 in task budget validation (matches Anthropic API spec minimum: 0)
- Replace PR-internal "This PR" language in docs with neutral phrasing
- Update test to use remaining=-1 for the invalid-remaining case

Replace MockAnthropic with real API recordings for the two tests that verify output_config.task_budget is sent correctly. Bumps total from 2_000 to 20_000 to satisfy the API minimum for opus-4-7.

…as import

Strip runtime field validation from _get_task_budget to match how other Anthropic settings (e.g. anthropic_effort) are handled: rely on types at type-check time and API rejection at runtime. Move TypeAlias import from typing to typing_extensions for codebase consistency.

…gnore

- Revert task budget changes merged in from another feature; they belong in pydantic#5140
- Restrict fast-mode support to Claude Opus 4.6 only (per Anthropic docs)
- Silently ignore `anthropic_speed` on unsupported models (no UserWarning), per DouweM's review and models/AGENTS.md convention
- OMIT `speed` on unsupported models regardless of value (not just for 'fast')
- Document prompt-cache invalidation when switching speeds, and that fast mode is not available on Bedrock/Vertex
- Fix docs example to use claude-opus-4-6
- Remove stale untracked cassettes from failed real-API recording attempts

…ropic.md

Anthropic's API rejects requests that combine `output_config.task_budget.remaining` with a `compact_20260112` context-management edit (server-side compaction tracks the budget itself). Surface this as a `UserError` before sending so users see a clear message instead of an opaque 400.

Move the task-budget feature section from `docs/thinking.md` to `docs/models/anthropic.md` since it's a provider feature spanning the whole agentic loop, not a thinking-specific one. Leave a one-line cross-ref in `thinking.md` noting that thinking tokens count against the budget.

dsfaccini · 2026-04-22T21:17:25Z

Claude here: posting a follow-up summary of three things we missed when this PR was first opened — surfaced in a pair-review with @DouweM today and now addressed in c9aa78a.

1. Docs were on the wrong page. Task budgets had landed under docs/thinking.md because Anthropic groups them with thinking in their own docs, but the feature spans the entire agentic loop (thinking + tool calls + tool results + output) and isn't thinking-specific. Moved the main section to docs/models/anthropic.md next to other provider-level features (caching, compaction); thinking.md now just has a one-line cross-ref noting that thinking tokens are one of the things that drain the budget.

2. Interaction with AnthropicCompaction was unaddressed. Anthropic's docs have a "Carrying a budget across compaction with remaining" section, and we already ship AnthropicCompaction in this repo, so the obvious move was to auto-populate task_budget.remaining from accumulated RunUsage.output_tokens when both features were configured together. Tried that — the API hard-rejects it:

remaining and compact_20260112 context management edits cannot both be provided. Use remaining for client-side usage tracking or compact_20260112 for server-side compaction, not both.

So remaining is exclusively for client-side compaction patterns (where the user is rewriting context themselves between requests). Server-side compaction tracks the budget itself. The PR now raises UserError before sending if a user combines task_budget.remaining with the compact_20260112 edit, so they get a clear message instead of an opaque 400. Carryover support for genuine client-side compaction is out of scope for this PR — users can compute and pass remaining themselves.
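
For the client-side case, where the caller computes `remaining` and passes it in, the arithmetic might look like the sketch below. `remaining_budget` is a hypothetical helper, not part of Pydantic AI, and which token counts to sum into `spent` is left to the caller as an assumption, since this page does not specify it. Clamping at 0 follows the commit above that accepts `remaining=0` as the API minimum.

```python
def remaining_budget(total: int, spent: int) -> int:
    """Hypothetical helper: tokens left in a task budget, never below 0."""
    if total < 0 or spent < 0:
        raise ValueError('total and spent must be non-negative')
    return max(total - spent, 0)


# Assumed bookkeeping: the caller adds up the tokens it has decided should
# count against the budget across the requests made so far.
spent_so_far = 4_500 + 7_250
task_budget = {'total': 20_000, 'remaining': remaining_budget(20_000, spent_so_far)}
```

The resulting dict would be passed as `anthropic_task_budget` on the next request, without `AnthropicCompaction` configured alongside it.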

3. Cassettes only assert request shape, not behavior. The two happy-path VCR tests (test_anthropic_task_budget_adds_output_config_and_beta, test_anthropic_task_budget_coexists_with_effort) send "What is 2+2?" and only inspect request_body['output_config']. They prove we send the right wire format but don't demonstrate the budget actually shaping a multi-turn agentic loop. Not blocking for this PR — flagged for a follow-up that records a multi-turn cassette under budget pressure with tools.

Worth flagging as a meta point too: the wire-up attempt for #2 was the kind of thing we'd have caught earlier if we'd dug into the docs end-to-end before opening the PR. Keeping that lesson in mind for the next round.

dsfaccini · 2026-04-22T21:24:27Z

+        if 'remaining' in task_budget:
+            cm = model_settings.get('anthropic_context_management')
+            if isinstance(cm, dict):
+                edits = cast(list[dict[str, Any]], cm.get('edits', []))
+                if any(isinstance(e, dict) and e.get('type') == 'compact_20260112' for e in edits):
+                    raise UserError(
+                        '`anthropic_task_budget.remaining` cannot be combined with the '
+                        '`compact_20260112` context-management edit (used by `AnthropicCompaction`). '
+                        'Use `remaining` for client-side budget tracking, or `AnthropicCompaction` '
+                        'for server-side compaction — not both.'
+                    )


ugly but acceptable IMO

Auto-generated compact_20260112 config (from CompactionPart in history) used to bypass the task_budget.remaining check since _get_task_budget ran before _add_compaction_params. Move the validation into a dedicated step that runs after context_management is fully resolved. Also link max_tokens in docs/models/anthropic.md per docs link rules.
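
The ordering problem described in this commit can be illustrated with a small self-contained sketch. All function names here are hypothetical and the `UserError` class is a local stand-in; the real code lives in `_add_compaction_params` and the dedicated validation step. The point is only that the check has to see the fully resolved context-management config, including an edit generated from history rather than from user settings.

```python
from typing import Any, Optional

COMPACT_EDIT_TYPE = 'compact_20260112'  # edit type named in this PR


class UserError(Exception):
    """Stand-in for pydantic_ai's UserError so the sketch runs on its own."""


def _has_compact_edit(edits: list[Any]) -> bool:
    return any(isinstance(e, dict) and e.get('type') == COMPACT_EDIT_TYPE for e in edits)


def resolve_context_management(
    settings: dict[str, Any], history_has_compaction: bool
) -> Optional[dict[str, Any]]:
    """Hypothetical step: combine user-supplied edits with an auto-generated one."""
    cm = settings.get('anthropic_context_management')
    edits = list(cm.get('edits', [])) if isinstance(cm, dict) else []
    if history_has_compaction and not _has_compact_edit(edits):
        edits.append({'type': COMPACT_EDIT_TYPE})
    return {'edits': edits} if edits else None


def validate_task_budget(settings: dict[str, Any], cm: Optional[dict[str, Any]]) -> None:
    """Reject `remaining` when the resolved config contains the compaction edit."""
    task_budget = settings.get('anthropic_task_budget') or {}
    if 'remaining' not in task_budget or cm is None:
        return
    if _has_compact_edit(cm.get('edits', [])):
        raise UserError('`anthropic_task_budget.remaining` cannot be combined with server-side compaction')


def prepare(settings: dict[str, Any], history_has_compaction: bool) -> Optional[dict[str, Any]]:
    cm = resolve_context_management(settings, history_has_compaction)  # resolve first
    validate_task_budget(settings, cm)  # then validate against the resolved config
    return cm
```

Running the validation before `resolve_context_management` would only inspect the user's settings, which is how the auto-generated edit slipped through.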

…ish tests

- Merge sampling-param filter into a single pass after super().prepare_request; inline the one-use _validate_thinking_settings check
- Drop cast(list[dict[str, Any]], ...) in _validate_task_budget_vs_context_management
- Replace the local _as_str_object_dict TypeAdapter wrapper with is_str_dict from pydantic_ai._utils
- Extract _ANTHROPIC_COMPACT_EDIT_TYPE constant; use in _add_compaction_params, _validate_task_budget_vs_context_management, and AnthropicCompaction
- Generify warning/error text from hardcoded "Claude Opus 4.7" to {self.model_name!r}
- Type effort_map against BetaOutputConfigParam.effort so the # type: ignore[typeddict-item] can be removed
- Collapse AnthropicModelName = LatestAnthropicModelNames into one alias
- Retype vcr: Any -> vcr: Cassette on task-budget/opus-47 tests; centralize the VCR stub-gap ignores in a _single_request_body helper
- Consolidate assert 'x' in betas and multi-line request-body asserts into set(...) >= {...} and snapshot({...}) forms
- Parametrize settings_source as Literal['agent', 'model']
- Cross-reference AnthropicTaskBudget from docs/models/anthropic.md

…port guards

Replace inline `from pydantic_ai.models.X import ...` inside test bodies and `pytest.importorskip` autouse fixtures with module-level `try_import()` blocks and class-level `@pytest.mark.skipif(not X_imports(), ...)` decorators, matching the pattern used in tests/models/test_anthropic.py and tests/test_capabilities.py.

Provider-agnostic profile imports (AnthropicModelProfile, GoogleModelProfile, anthropic_model_profile, google_model_profile, openai_model_profile, groq_model_profile, cohere_model_profile, mistral_model_profile) are pure-Python and move to unguarded top-level imports.

`from anthropic import omit` / `from openai import omit` / `from groq import NOT_GIVEN` are aliased at import time as `anthropic_omit`, `openai_omit`, and `groq_NOT_GIVEN` to avoid shadowing between providers.
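
For readers unfamiliar with the `try_import()` guard, the sketch below is a minimal re-implementation of the idea using only the standard library. It is an assumption about how such a helper can work, not a copy of the one in tests/conftest.py, and the missing module name is made up for the demonstration.

```python
from collections.abc import Callable, Iterator
from contextlib import contextmanager


@contextmanager
def try_import() -> Iterator[Callable[[], bool]]:
    """Yield a callable that reports whether the guarded imports succeeded."""
    import_success = False

    def check_import() -> bool:
        return import_success

    try:
        yield check_import
    except ImportError:
        pass  # swallow the failure; check_import() keeps returning False
    else:
        import_success = True


with try_import() as json_imports:
    import json  # stdlib module, always importable

with try_import() as missing_imports:
    import a_module_that_is_not_installed_xyz  # hypothetical name, expected to fail

# In a test module the result would gate a whole class, for example:
#   @pytest.mark.skipif(not missing_imports(), reason='optional dependency not installed')
```

Because the guard is evaluated once at module import, every test in the decorated class is skipped together, without a per-test `importorskip` call.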

devin-ai-integration

Devin Review found 1 new potential issue.


…side compaction

… and extra_body

When a sampling param like `temperature` is set both at the top level of `AnthropicModelSettings` and inside `extra_body`, the dropped-params warning listed it twice. Switch the accumulator to a set, then re-order via the declaration order of `_ANTHROPIC_SAMPLING_PARAMS` so the warning stays deterministic.
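
The dedupe-then-reorder approach from this commit can be sketched as follows. The contents of `SAMPLING_PARAMS` and the helper name are assumptions for illustration; the actual `_ANTHROPIC_SAMPLING_PARAMS` tuple is not shown on this page.

```python
from typing import Any

# Assumed contents and order; stands in for _ANTHROPIC_SAMPLING_PARAMS.
SAMPLING_PARAMS = ('temperature', 'top_p', 'top_k')


def dropped_sampling_params(settings: dict[str, Any], extra_body: dict[str, Any]) -> list[str]:
    """Hypothetical helper: list each dropped sampling param once, in declaration order."""
    dropped: set[str] = set()
    for source in (settings, extra_body):
        # A set collapses a param that appears in both places into one entry.
        dropped.update(key for key in source if key in SAMPLING_PARAMS)
    # Sets are unordered, so walk the declaration tuple to get a stable order.
    return [param for param in SAMPLING_PARAMS if param in dropped]


names = dropped_sampling_params(
    {'top_p': 0.9, 'temperature': 0.2, 'max_tokens': 1024},
    {'temperature': 0.5},
)
```

Here `temperature` appears in both sources but is reported once, and the result follows the tuple order rather than the order in which the keys were encountered.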

…c-ai into anthropic-task-budgets

github-actions · 2026-05-06T18:44:26Z

+from pydantic_ai import Agent
+from pydantic_ai.models.anthropic import AnthropicModel, AnthropicModelSettings
+
+model = AnthropicModel('claude-opus-4-7')


This code block has {title="anthropic_task_budget.py"} but is missing test="skip". Since AnthropicModel('claude-opus-4-7') requires API credentials, the doc test runner will fail on this example. All other provider-specific examples on this page include test="skip" — this one needs it too.

(The guidelines say to avoid test="skip" when possible, but here it's unavoidable since the constructor needs a real Anthropic API key.)

github-actions · 2026-05-06T18:44:40Z

 from ._inline_snapshot import snapshot
+from .conftest import try_import
+
+with try_import() as anthropic_imports:


The bulk of the test_thinking.py diff (moving inline pytest.importorskip + imports to module-level try_import() blocks and @pytest.mark.skipif decorators) is an unrelated refactoring that accounts for the majority of this file's churn. Per guidelines, PRs should stay focused on their stated purpose — could this cleanup be split into a separate PR? It makes it harder to review the actual task-budget test additions amid 200+ lines of import restructuring.

@DouweM — is this refactoring something you'd prefer in a separate PR, or is it fine to include here?

github-actions · 2026-05-06T18:44:48Z

-"""Anthropic model names from the installed SDK."""
-
-AnthropicModelName = LatestAnthropicModelNames
+AnthropicModelName = ModelParam


The removal of LatestAnthropicModelNames is unrelated to task budgets. While it wasn't exported from __init__.py, it was a documented module-level name that users could import from pydantic_ai.models.anthropic. Removing it is technically a breaking change (albeit minor since AnthropicModelName still exists and resolves to the same type). Consider keeping it as a deprecated alias or splitting this into a separate cleanup PR.

dsfaccini added 3 commits April 16, 2026 10:29

Add Claude Opus 4.7 Anthropic support

a20e7d9

Merge remote-tracking branch 'upstream/main' into anthropic-task-budgets

c156c1f

# Conflicts:
#	pydantic_ai_slim/pydantic_ai/models/anthropic.py
#	tests/models/test_anthropic.py

Add Anthropic task budget support

34892c4

github-actions bot added the "size: M" (Medium PR, 101-500 weighted lines) and "feature" (New feature request, or PR implementing a feature) labels on Apr 17, 2026