Add Anthropic task budget support #5140
Conversation
# Conflicts:
#	pydantic_ai_slim/pydantic_ai/models/anthropic.py
#	tests/models/test_anthropic.py
- Allow remaining=0 in task budget validation (matches Anthropic API spec minimum: 0)
- Replace PR-internal "This PR" language in docs with neutral phrasing
- Update test to use remaining=-1 for the invalid-remaining case
Replace MockAnthropic with real API recordings for the two tests that verify output_config.task_budget is sent correctly. Bumps total from 2_000 to 20_000 to satisfy the API minimum for opus-4-7.
…as import

Strip runtime field validation from `_get_task_budget` to match how other Anthropic settings (e.g. `anthropic_effort`) are handled — rely on types at type-check time and API rejection at runtime. Move the `TypeAlias` import from `typing` to `typing_extensions` for codebase consistency.
…gnore

- Revert task budget changes merged in from another feature; they belong in pydantic#5140
- Restrict fast-mode support to Claude Opus 4.6 only (per Anthropic docs)
- Silently ignore `anthropic_speed` on unsupported models (no UserWarning), per DouweM's review and models/AGENTS.md convention
- OMIT `speed` on unsupported models regardless of value (not just for 'fast')
- Document prompt-cache invalidation when switching speeds, and that fast mode is not available on Bedrock/Vertex
- Fix docs example to use claude-opus-4-6
- Remove stale untracked cassettes from failed real-API recording attempts
…ropic.md

Anthropic's API rejects requests that combine `output_config.task_budget.remaining` with a `compact_20260112` context-management edit (server-side compaction tracks the budget itself). Surface this as a `UserError` before sending so users see a clear message instead of an opaque 400.

Move the task-budget feature section from `docs/thinking.md` to `docs/models/anthropic.md` since it's a provider feature spanning the whole agentic loop, not a thinking-specific one. Leave a one-line cross-ref in `thinking.md` noting that thinking tokens count against the budget.
Claude here: posting a follow-up summary of three things we missed when this PR was first opened, surfaced in a pair-review with @DouweM today and now addressed in c9aa78a.

1. Docs were on the wrong page. Task budgets had landed under `docs/thinking.md`, but the feature spans the whole agentic loop, so it now lives in `docs/models/anthropic.md`.
2. Interaction with server-side compaction was unhandled: the API rejects `task_budget.remaining` combined with the `compact_20260112` edit, and we now raise a `UserError` before sending instead of surfacing an opaque 400.
3. Cassettes only assert request shape, not behavior. The two happy-path VCR tests verify that `output_config.task_budget` is sent correctly, not how the budget behaves end-to-end.

Worth flagging as a meta point too: the wire-up attempt for #2 was the kind of thing we'd have caught earlier if we'd dug into the docs end-to-end before opening the PR. Keeping that lesson in mind for the next round.
```python
if 'remaining' in task_budget:
    cm = model_settings.get('anthropic_context_management')
    if isinstance(cm, dict):
        edits = cast(list[dict[str, Any]], cm.get('edits', []))
        if any(isinstance(e, dict) and e.get('type') == 'compact_20260112' for e in edits):
            raise UserError(
                '`anthropic_task_budget.remaining` cannot be combined with the '
                '`compact_20260112` context-management edit (used by `AnthropicCompaction`). '
                'Use `remaining` for client-side budget tracking, or `AnthropicCompaction` '
                'for server-side compaction — not both.'
            )
```
ugly but acceptable IMO
Auto-generated compact_20260112 config (from CompactionPart in history) used to bypass the task_budget.remaining check since _get_task_budget ran before _add_compaction_params. Move the validation into a dedicated step that runs after context_management is fully resolved. Also link max_tokens in docs/models/anthropic.md per docs link rules.
…ish tests
- Merge sampling-param filter into a single pass after `super().prepare_request()`; inline the one-use `_validate_thinking_settings` check
- Drop `cast(list[dict[str, Any]], ...)` in `_validate_task_budget_vs_context_management`
- Replace the local `_as_str_object_dict` TypeAdapter wrapper with `is_str_dict` from `pydantic_ai._utils`
- Extract `_ANTHROPIC_COMPACT_EDIT_TYPE` constant; use in `_add_compaction_params`, `_validate_task_budget_vs_context_management`, and `AnthropicCompaction`
- Generify warning/error text from hardcoded "Claude Opus 4.7" to `{self.model_name!r}`
- Type `effort_map` against `BetaOutputConfigParam.effort` so the `# type: ignore[typeddict-item]` can be removed
- Collapse `AnthropicModelName = LatestAnthropicModelNames` into one alias
- Retype `vcr: Any` -> `vcr: Cassette` on task-budget/opus-47 tests; centralize the VCR stub-gap ignores in a `_single_request_body` helper
- Consolidate `assert 'x' in betas` and multi-line request-body asserts into `set(...) >= {...}` and `snapshot({...})` forms
- Parametrize `settings_source` as `Literal['agent', 'model']`
- Cross-reference `AnthropicTaskBudget` from `docs/models/anthropic.md`
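The consolidated assertion style from the bullets above can be sketched with hypothetical values (`betas` and the request body here are made up for illustration; the real tests read them from VCR cassettes and use inline_snapshot's `snapshot({...})` for the dict comparison):

```python
# Hypothetical request data standing in for what a cassette captures.
betas = ['task-budgets-2026-03-13', 'output-128k-2025-02-19']
request_body = {
    'model': 'claude-opus-4-7',
    'output_config': {'task_budget': {'total': 20_000}},
}

# One superset check replaces several separate `assert 'x' in betas` lines.
assert set(betas) >= {'task-budgets-2026-03-13'}

# One equality against the full expected dict replaces field-by-field asserts.
assert request_body['output_config'] == {'task_budget': {'total': 20_000}}
```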
…port guards

Replace inline `from pydantic_ai.models.X import ...` inside test bodies and `pytest.importorskip` autouse fixtures with module-level `try_import()` blocks and class-level `@pytest.mark.skipif(not X_imports(), ...)` decorators, matching the pattern used in tests/models/test_anthropic.py and tests/test_capabilities.py.

Provider-agnostic profile imports (AnthropicModelProfile, GoogleModelProfile, anthropic_model_profile, google_model_profile, openai_model_profile, groq_model_profile, cohere_model_profile, mistral_model_profile) are pure-Python and move to unguarded top-level imports.

`from anthropic import omit` / `from openai import omit` / `from groq import NOT_GIVEN` are aliased at import time as `anthropic_omit`, `openai_omit`, and `groq_NOT_GIVEN` to avoid shadowing between providers.
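A minimal sketch of the `try_import()` pattern described above. This is an assumed shape, not the actual `tests/conftest.py` implementation: a context manager that swallows `ImportError` from its body and yields a zero-arg callable reporting whether the imports succeeded, which `@pytest.mark.skipif(not anthropic_imports(), ...)` can then consult at collection time.

```python
from collections.abc import Callable, Iterator
from contextlib import contextmanager


@contextmanager
def try_import() -> Iterator[Callable[[], bool]]:
    """Yield a callable that returns True iff the with-body imported cleanly."""
    success = False

    def check() -> bool:
        return success

    try:
        yield check
    except ImportError:
        pass  # optional dependency missing: check() stays False
    else:
        success = True


# Module-level usage; `json` stands in for an optional provider SDK import.
with try_import() as anthropic_imports:
    import json  # noqa: F401

assert anthropic_imports()
```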
… and extra_body

When a sampling param like `temperature` is set both at the top level of `AnthropicModelSettings` and inside `extra_body`, the dropped-params warning listed it twice. Switch the accumulator to a set, then re-order via the declaration order of `_ANTHROPIC_SAMPLING_PARAMS` so the warning stays deterministic.
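The fix can be sketched as follows (the function name and the contents of `_ANTHROPIC_SAMPLING_PARAMS` here are illustrative assumptions, not the real module's values):

```python
# Assumed declaration order of the sampling params (illustrative tuple).
_ANTHROPIC_SAMPLING_PARAMS = ('temperature', 'top_p', 'top_k')


def dropped_sampling_params(settings: dict, extra_body: dict) -> list[str]:
    # A set de-duplicates a param that appears in both places...
    dropped = {
        name
        for name in _ANTHROPIC_SAMPLING_PARAMS
        if name in settings or name in extra_body
    }
    # ...then declaration order keeps the warning text deterministic.
    return [name for name in _ANTHROPIC_SAMPLING_PARAMS if name in dropped]


# `temperature` is set in both places but listed only once, in declaration order.
assert dropped_sampling_params(
    {'temperature': 1.0, 'top_k': 5}, {'temperature': 0.2}
) == ['temperature', 'top_k']
```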
…c-ai into anthropic-task-budgets
```python
from pydantic_ai import Agent
from pydantic_ai.models.anthropic import AnthropicModel, AnthropicModelSettings

model = AnthropicModel('claude-opus-4-7')
```
This code block has {title="anthropic_task_budget.py"} but is missing test="skip". Since AnthropicModel('claude-opus-4-7') requires API credentials, the doc test runner will fail on this example. All other provider-specific examples on this page include test="skip" — this one needs it too.
(The guidelines say to avoid test="skip" when possible, but here it's unavoidable since the constructor needs a real Anthropic API key.)
```python
from ._inline_snapshot import snapshot
from .conftest import try_import

with try_import() as anthropic_imports:
```
The bulk of the test_thinking.py diff (moving inline pytest.importorskip + imports to module-level try_import() blocks and @pytest.mark.skipif decorators) is an unrelated refactoring that accounts for the majority of this file's churn. Per guidelines, PRs should stay focused on their stated purpose — could this cleanup be split into a separate PR? It makes it harder to review the actual task-budget test additions amid 200+ lines of import restructuring.
@DouweM — is this refactoring something you'd prefer in a separate PR, or is it fine to include here?
| """Anthropic model names from the installed SDK.""" | ||
|
|
||
| AnthropicModelName = LatestAnthropicModelNames | ||
| AnthropicModelName = ModelParam |
The removal of LatestAnthropicModelNames is unrelated to task budgets. While it wasn't exported from __init__.py, it was a documented module-level name that users could import from pydantic_ai.models.anthropic. Removing it is technically a breaking change (albeit minor since AnthropicModelName still exists and resolves to the same type). Consider keeping it as a deprecated alias or splitting this into a separate cleanup PR.
Follow-up to #5118.
Summary
- Add `anthropic_task_budget` support for Anthropic models: maps to `output_config.task_budget` and auto-enables the `task-budgets-2026-03-13` beta
- Reject `task_budget.remaining` + `AnthropicCompaction` (mutually exclusive per the Anthropic API) with a clear `UserError` instead of a server 400
- Reduce `docs/thinking.md` coverage to a one-line cross-ref, since thinking tokens are just one of several token sources that count against the budget

Out of scope

- `task_budget.remaining` carryover for client-side compaction patterns (where the user summarizes earlier turns themselves between requests). Pydantic AI does not yet expose a hook to track loop-wide token spend across user-driven compaction; users who want this can compute and pass `remaining` themselves.

Checklist
Verification
- `uv run pytest tests/models/test_anthropic.py tests/test_thinking.py -k "task_budget or explicit_effort_xhigh_unsupported_model_errors or drops_sampling_settings or keeps_non_sampling_extra_body or unified_thinking_opus_47_xhigh or AnthropicThinkingTranslation or supports_task_budgets or compaction_capability"`
- `uv run pyright pydantic_ai_slim/pydantic_ai/models/anthropic.py tests/models/test_anthropic.py tests/test_thinking.py`
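The out-of-scope client-side pattern mentioned above can be sketched under assumptions: the `total`/`remaining` dict shape follows the examples elsewhere in this PR, and the token accounting below is illustrative, not an API from the library.

```python
# Hypothetical client-side budget tracking across user-driven compaction:
# the caller sums token spend from each response's usage and recomputes
# `remaining` before the next request.
TOTAL_BUDGET = 20_000


def task_budget_for_next_request(tokens_spent_so_far: int) -> dict[str, int]:
    # The Anthropic API spec allows remaining=0 as its minimum, so clamp there.
    remaining = max(TOTAL_BUDGET - tokens_spent_so_far, 0)
    return {'total': TOTAL_BUDGET, 'remaining': remaining}


spent = 3_500  # e.g. accumulated from previous responses
assert task_budget_for_next_request(spent) == {'total': 20_000, 'remaining': 16_500}
assert task_budget_for_next_request(25_000) == {'total': 20_000, 'remaining': 0}
```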