Add Anthropic task budget support #5140

Open
dsfaccini wants to merge 19 commits into pydantic:main from dsfaccini:anthropic-task-budgets

Collaborator

@dsfaccini dsfaccini commented Apr 17, 2026

Follow-up to #5118.

Summary

  • add typed anthropic_task_budget support for Anthropic models
  • map it to output_config.task_budget and auto-enable task-budgets-2026-03-13
  • validate malformed configs and reject unsupported models
  • reject task_budget.remaining + AnthropicCompaction (mutually exclusive per Anthropic API) with a clear UserError instead of a server 400
  • document task budgets on the Anthropic provider page (where the feature lives), keeping docs/thinking.md to a one-line cross-ref since thinking tokens are just one of several token sources that count against the budget
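
As a rough illustration of the mapping described in the summary, the sketch below shows how an `anthropic_task_budget` setting could be turned into `output_config.task_budget` plus the `task-budgets-2026-03-13` beta flag. It is plain Python, not the PR's implementation: `build_request_extras`, the `supports_task_budgets` argument, and the use of `ValueError` in place of `UserError` are assumptions made for the example, and the real budget dict may carry more fields than `total`.

```python
from typing import Any

# Beta flag named in the PR summary.
TASK_BUDGET_BETA = 'task-budgets-2026-03-13'


def build_request_extras(
    settings: dict[str, Any], supports_task_budgets: bool
) -> tuple[dict[str, Any], set[str]]:
    """Return (output_config, betas) for one request (hypothetical helper)."""
    output_config: dict[str, Any] = {}
    betas: set[str] = set()

    task_budget = settings.get('anthropic_task_budget')
    if task_budget is not None:
        if not supports_task_budgets:
            # Stand-in for the UserError the PR raises on unsupported models.
            raise ValueError('this model does not support task budgets')
        # Copy so later mutation of the settings does not leak into the request.
        output_config['task_budget'] = dict(task_budget)
        betas.add(TASK_BUDGET_BETA)

    return output_config, betas


output_config, betas = build_request_extras(
    {'anthropic_task_budget': {'total': 20_000}}, supports_task_budgets=True
)
```

With no `anthropic_task_budget` in the settings, both return values stay empty, so requests that do not use the feature are unchanged.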

Out of scope

  • task_budget.remaining carryover for client-side compaction patterns (where the user summarizes earlier turns themselves between requests). Pydantic AI does not yet expose a hook to track loop-wide token spend across user-driven compaction; users who want this can compute and pass remaining themselves.
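
For the client-side pattern left out of scope here, computing `remaining` by hand could look something like the following. This is only a sketch: `remaining_task_budget` and the `tokens_spent` figure are hypothetical, and deciding which tokens count as spent is left to the caller. The value is clamped at 0, which a later commit in this PR notes is the minimum in the Anthropic API spec.

```python
def remaining_task_budget(total: int, tokens_spent: int) -> dict[str, int]:
    """Build a task-budget dict with `remaining` clamped to the range [0, total]."""
    remaining = min(max(total - tokens_spent, 0), total)
    return {'total': total, 'remaining': remaining}


# Example: 7,500 tokens already spent out of a 20,000-token budget.
budget = remaining_task_budget(total=20_000, tokens_spent=7_500)
```

The resulting dict would then be passed as the `anthropic_task_budget` setting on the next request.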

Checklist

  • Any AI generated code has been reviewed line-by-line by the human PR author, who stands by it.
  • No breaking changes in accordance with the version policy.
  • PR title is fit for the release changelog.

Verification

  • uv run pytest tests/models/test_anthropic.py tests/test_thinking.py -k "task_budget or explicit_effort_xhigh_unsupported_model_errors or drops_sampling_settings or keeps_non_sampling_extra_body or unified_thinking_opus_47_xhigh or AnthropicThinkingTranslation or supports_task_budgets or compaction_capability"
  • uv run pyright pydantic_ai_slim/pydantic_ai/models/anthropic.py tests/models/test_anthropic.py tests/test_thinking.py

github-actions bot added the `size: M` (Medium PR, 101-500 weighted lines) and `feature` (New feature request, or PR implementing a feature (enhancement)) labels Apr 17, 2026

- Allow remaining=0 in task budget validation (matches Anthropic API spec minimum: 0)
- Replace PR-internal "This PR" language in docs with neutral phrasing
- Update test to use remaining=-1 for the invalid-remaining case
Replace MockAnthropic with real API recordings for the two tests that
verify output_config.task_budget is sent correctly. Bumps total from
2_000 to 20_000 to satisfy the API minimum for opus-4-7.
@dsfaccini dsfaccini self-assigned this Apr 17, 2026

…as import

Strip runtime field validation from _get_task_budget to match how other
Anthropic settings (e.g. anthropic_effort) are handled — rely on types at
type-check time and API rejection at runtime. Move TypeAlias import from
typing to typing_extensions for codebase consistency.
Comment thread tests/test_thinking.py Outdated
Comment thread docs/thinking.md Outdated
dsfaccini added a commit to bohdanhr/pydantic-ai that referenced this pull request Apr 22, 2026
…gnore

- Revert task budget changes merged in from another feature; they belong in pydantic#5140
- Restrict fast-mode support to Claude Opus 4.6 only (per Anthropic docs)
- Silently ignore `anthropic_speed` on unsupported models (no UserWarning),
  per DouweM's review and models/AGENTS.md convention
- Omit `speed` on unsupported models regardless of value (not just for 'fast')
- Document prompt-cache invalidation when switching speeds, and that fast mode
  is not available on Bedrock/Vertex
- Fix docs example to use claude-opus-4-6
- Remove stale untracked cassettes from failed real-API recording attempts
…ropic.md

Anthropic's API rejects requests that combine `output_config.task_budget.remaining`
with a `compact_20260112` context-management edit (server-side compaction tracks
the budget itself). Surface this as a `UserError` before sending so users see a
clear message instead of an opaque 400.

Move the task-budget feature section from `docs/thinking.md` to
`docs/models/anthropic.md` since it's a provider feature spanning the whole
agentic loop, not a thinking-specific one. Leave a one-line cross-ref in
`thinking.md` noting that thinking tokens count against the budget.
@dsfaccini
Collaborator Author

Claude here: posting a follow-up summary of three things we missed when this PR was first opened — surfaced in a pair-review with @DouweM today and now addressed in c9aa78a.

1. Docs were on the wrong page. Task budgets had landed under docs/thinking.md because Anthropic groups them with thinking in their own docs, but the feature spans the entire agentic loop (thinking + tool calls + tool results + output) and isn't thinking-specific. Moved the main section to docs/models/anthropic.md next to other provider-level features (caching, compaction); thinking.md now just has a one-line cross-ref noting that thinking tokens are one of the things that drains the budget.

2. Interaction with AnthropicCompaction was unaddressed. Anthropic's docs have a "Carrying a budget across compaction with remaining" section, and we already ship AnthropicCompaction in this repo, so the obvious move was to auto-populate task_budget.remaining from accumulated RunUsage.output_tokens when both features were configured together. Tried that — the API hard-rejects it:

remaining and compact_20260112 context management edits cannot both be provided. Use remaining for client-side usage tracking or compact_20260112 for server-side compaction, not both.

So remaining is exclusively for client-side compaction patterns (where the user is rewriting context themselves between requests). Server-side compaction tracks the budget itself. The PR now raises UserError before sending if a user combines task_budget.remaining with the compact_20260112 edit, so they get a clear message instead of an opaque 400. Carryover support for genuine client-side compaction is out of scope for this PR — users can compute and pass remaining themselves.

3. Cassettes only assert request shape, not behavior. The two happy-path VCR tests (test_anthropic_task_budget_adds_output_config_and_beta, test_anthropic_task_budget_coexists_with_effort) send "What is 2+2?" and only inspect request_body['output_config']. They prove we send the right wire format but don't demonstrate the budget actually shaping a multi-turn agentic loop. Not blocking for this PR — flagged for a follow-up that records a multi-turn cassette under budget pressure with tools.

Worth flagging as a meta point too: the wire-up attempt described in point 2 is the kind of thing we'd have caught earlier if we'd read the Anthropic docs end-to-end before opening the PR. Keeping that lesson in mind for the next round.

Comment on lines +1564 to +1574
if 'remaining' in task_budget:
    cm = model_settings.get('anthropic_context_management')
    if isinstance(cm, dict):
        edits = cast(list[dict[str, Any]], cm.get('edits', []))
        if any(isinstance(e, dict) and e.get('type') == 'compact_20260112' for e in edits):
            raise UserError(
                '`anthropic_task_budget.remaining` cannot be combined with the '
                '`compact_20260112` context-management edit (used by `AnthropicCompaction`). '
                'Use `remaining` for client-side budget tracking, or `AnthropicCompaction` '
                'for server-side compaction — not both.'
            )
Collaborator Author

ugly but acceptable IMO

Auto-generated compact_20260112 config (from CompactionPart in history)
used to bypass the task_budget.remaining check since _get_task_budget ran
before _add_compaction_params. Move the validation into a dedicated step
that runs after context_management is fully resolved.

Also link max_tokens in docs/models/anthropic.md per docs link rules.
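
The ordering issue described in this commit can be sketched roughly as follows: resolve the final context-management config first (including any compaction edit implied by message history), and only then check it against `task_budget.remaining`. All function names below are hypothetical, `history_has_compaction` stands in for detecting a `CompactionPart` in history, and `ValueError` stands in for `UserError`.

```python
from typing import Any, Optional

COMPACT_EDIT_TYPE = 'compact_20260112'


def resolve_context_management(
    settings: dict[str, Any], history_has_compaction: bool
) -> dict[str, Any]:
    """Merge user-supplied edits with the compaction edit implied by history."""
    cm = settings.get('anthropic_context_management')
    edits = list(cm.get('edits', [])) if isinstance(cm, dict) else []
    has_compact = any(
        isinstance(e, dict) and e.get('type') == COMPACT_EDIT_TYPE for e in edits
    )
    if history_has_compaction and not has_compact:
        edits.append({'type': COMPACT_EDIT_TYPE})
    return {'edits': edits}


def validate_task_budget(
    task_budget: Optional[dict[str, Any]], context_management: dict[str, Any]
) -> None:
    """Reject `remaining` when a compaction edit is present in the resolved config."""
    if task_budget is None or 'remaining' not in task_budget:
        return
    for edit in context_management['edits']:
        if isinstance(edit, dict) and edit.get('type') == COMPACT_EDIT_TYPE:
            raise ValueError('`remaining` cannot be combined with the compaction edit')


def prepare(settings: dict[str, Any], history_has_compaction: bool) -> dict[str, Any]:
    # Resolve first, validate second, so auto-generated edits are also checked.
    context_management = resolve_context_management(settings, history_has_compaction)
    validate_task_budget(settings.get('anthropic_task_budget'), context_management)
    return context_management
```

Running the validation before the resolution step would miss the auto-generated edit, which is the bypass this commit closes.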

…ish tests

- Merge sampling-param filter into a single pass after super().prepare_request;
  inline the one-use _validate_thinking_settings check
- Drop cast(list[dict[str, Any]], ...) in _validate_task_budget_vs_context_management
- Replace the local _as_str_object_dict TypeAdapter wrapper with is_str_dict
  from pydantic_ai._utils
- Extract _ANTHROPIC_COMPACT_EDIT_TYPE constant; use in _add_compaction_params,
  _validate_task_budget_vs_context_management, and AnthropicCompaction
- Generify warning/error text from hardcoded "Claude Opus 4.7" to
  {self.model_name!r}
- Type effort_map against BetaOutputConfigParam.effort so the
  # type: ignore[typeddict-item] can be removed
- Collapse AnthropicModelName = LatestAnthropicModelNames into one alias
- Retype vcr: Any -> vcr: Cassette on task-budget/opus-47 tests; centralize the
  VCR stub-gap ignores in a _single_request_body helper
- Consolidate assert 'x' in betas and multi-line request-body asserts into
  set(...) >= {...} and snapshot({...}) forms
- Parametrize settings_source as Literal['agent', 'model']
- Cross-reference AnthropicTaskBudget from docs/models/anthropic.md
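
The `set(...) >= {...}` form mentioned in the list above is a superset check. A minimal illustration, assuming a hypothetical `betas` list captured from a request (the second flag name is made up):

```python
# Hypothetical list of beta flags captured from a recorded request.
betas = ['task-budgets-2026-03-13', 'some-other-beta']

# Flags the test cares about; extra flags in `betas` are allowed.
required = {'task-budgets-2026-03-13'}

# One superset comparison replaces a separate `assert 'x' in betas` per flag.
all_present = set(betas) >= required

# On failure, the set difference shows exactly which flags were missing.
missing = required - set(betas)
```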
…port guards

Replace inline `from pydantic_ai.models.X import ...` inside test bodies and
`pytest.importorskip` autouse fixtures with module-level `try_import()` blocks
and class-level `@pytest.mark.skipif(not X_imports(), ...)` decorators, matching
the pattern used in tests/models/test_anthropic.py and tests/test_capabilities.py.

Provider-agnostic profile imports (AnthropicModelProfile, GoogleModelProfile,
anthropic_model_profile, google_model_profile, openai_model_profile,
groq_model_profile, cohere_model_profile, mistral_model_profile) are pure-Python
and move to unguarded top-level imports.

`from anthropic import omit` / `from openai import omit` / `from groq import
NOT_GIVEN` are aliased at import time as `anthropic_omit`, `openai_omit`, and
`groq_NOT_GIVEN` to avoid shadowing between providers.
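
For readers unfamiliar with the pattern, a `try_import()` guard of the kind described in this commit might look roughly like the sketch below. This is an assumption about its shape, not a copy of the helper in `tests/conftest.py`, and the failing module name is made up.

```python
from collections.abc import Callable, Iterator
from contextlib import contextmanager


@contextmanager
def try_import() -> Iterator[Callable[[], bool]]:
    """Swallow ImportError in the block; the yielded callable reports success."""
    succeeded = False

    def imports_ok() -> bool:
        return succeeded

    try:
        yield imports_ok
    except ImportError:
        pass  # leave `succeeded` as False
    else:
        succeeded = True


with try_import() as json_imports:
    import json  # stdlib, always importable

with try_import() as missing_imports:
    import a_module_that_does_not_exist_xyz  # hypothetical name, expected to fail
```

A test class can then be decorated with `@pytest.mark.skipif(not json_imports(), reason='...')`, so an optional dependency is checked once at module level rather than inside each test body.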
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 1 new potential issue.

Comment thread pydantic_ai_slim/pydantic_ai/models/anthropic.py Outdated
dsfaccini added 4 commits May 5, 2026 16:38
… and extra_body

When a sampling param like `temperature` is set both at the top level of
`AnthropicModelSettings` and inside `extra_body`, the dropped-params warning
listed it twice. Switch the accumulator to a set, then re-order via the
declaration order of `_ANTHROPIC_SAMPLING_PARAMS` so the warning stays
deterministic.
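
The fix described in this commit can be sketched as follows: collect dropped names into a set so duplicates collapse, then rebuild the list in declaration order so the warning text is stable. The tuple contents and the function name are assumptions for illustration; only `_ANTHROPIC_SAMPLING_PARAMS` is named in the commit message.

```python
from typing import Any

# Assumed contents; declaration order defines the order used in the warning.
SAMPLING_PARAMS = ('temperature', 'top_p', 'top_k')


def dropped_sampling_params(
    settings: dict[str, Any], extra_body: dict[str, Any]
) -> list[str]:
    """Return each dropped sampling param once, in declaration order."""
    dropped: set[str] = set()
    for source in (settings, extra_body):
        for name in source:
            if name in SAMPLING_PARAMS:
                dropped.add(name)  # the set collapses duplicates across sources
    # Set iteration order is arbitrary, so re-order via the declaration tuple.
    return [name for name in SAMPLING_PARAMS if name in dropped]


names = dropped_sampling_params(
    {'top_p': 0.9, 'temperature': 0.2, 'max_tokens': 1024},
    {'temperature': 0.5},
)
```

Here `temperature` appears in both sources but is reported once, and `max_tokens` is ignored because it is not a sampling param.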
Comment thread docs/models/anthropic.md
from pydantic_ai import Agent
from pydantic_ai.models.anthropic import AnthropicModel, AnthropicModelSettings

model = AnthropicModel('claude-opus-4-7')
Contributor

This code block has {title="anthropic_task_budget.py"} but is missing test="skip". Since AnthropicModel('claude-opus-4-7') requires API credentials, the doc test runner will fail on this example. All other provider-specific examples on this page include test="skip" — this one needs it too.

(The guidelines say to avoid test="skip" when possible, but here it's unavoidable since the constructor needs a real Anthropic API key.)

Comment thread tests/test_thinking.py
from ._inline_snapshot import snapshot
from .conftest import try_import

with try_import() as anthropic_imports:
Contributor

The bulk of the test_thinking.py diff (moving inline pytest.importorskip + imports to module-level try_import() blocks and @pytest.mark.skipif decorators) is an unrelated refactoring that accounts for the majority of this file's churn. Per guidelines, PRs should stay focused on their stated purpose — could this cleanup be split into a separate PR? It makes it harder to review the actual task-budget test additions amid 200+ lines of import restructuring.

@DouweM — is this refactoring something you'd prefer in a separate PR, or is it fine to include here?

"""Anthropic model names from the installed SDK."""

AnthropicModelName = LatestAnthropicModelNames
AnthropicModelName = ModelParam
Contributor

The removal of LatestAnthropicModelNames is unrelated to task budgets. While it wasn't exported from __init__.py, it was a documented module-level name that users could import from pydantic_ai.models.anthropic. Removing it is technically a breaking change (albeit minor since AnthropicModelName still exists and resolves to the same type). Consider keeping it as a deprecated alias or splitting this into a separate cleanup PR.

Labels

anthropic; feature (New feature request, or PR implementing a feature (enhancement)); size: M (Medium PR, 101-500 weighted lines)

2 participants