Adding per tool usage limit #3691
Conversation
… if it should be somewhere else altogether to cover granular limit exceeded as well?
Force-pushed from 5f4c955 to 0f63561
… it does seem to be coming together
…tic#3691 Analysis addresses Douwe's request about feature interaction.

Context: PR pydantic#3691 by @adtyavrdhn (1 month of work) introduces ToolPolicy for tool usage limits with configurable behaviors. Our ToolFailed work addresses tool failure handling from issue pydantic#2586.

Key findings:
- ToolPolicy and ToolFailed address different but related concerns
- Both can coexist: ToolPolicy = declarative, ToolFailed = explicit
- ToolFailed can adapt to fit ToolPolicy's mode-based design
- PR pydantic#3691 has priority - we defer to team on integration approach

Integration options:
1. ToolFailed maps to ToolPolicy modes (recommended if both proceed)
2. Wait for pydantic#3691, implement via its on_error modes
3. Parallel features, unify later based on usage

Suggests discussion points for team sync respecting pydantic#3691's timeline and existing design work.
Add section noting @adtyavrdhn's ToolPolicy work and how features relate. Defer to team sync on integration approach - respecting their timeline and existing design work.
Quick reference doc outlining: - What we built (ToolFailed implementation) - How it relates to PR pydantic#3691 - Options for moving forward - Recommendation to defer to team sync Not submitting PR yet - waiting for coordination clarity.
🟡 MCP tool prefix handling changes ctx.tool_name to unprefixed, breaking ctx.tool_use/tools_use_counts consistency for prefixed MCP tools
When MCPServer.tool_prefix is set, MCPServer.call_tool() strips the prefix and replaces ctx.tool_name with the unprefixed name.
Actual behavior: the framework’s successful-use counters (RunContext.tools_use_counts) are keyed by the tool name used in the ToolCallPart (the prefixed name), but the RunContext seen by process_tool_call is mutated to unprefixed tool_name. As a result, ctx.tool_use (and direct lookups in ctx.tools_use_counts) will read the wrong key and typically show 0 even after prior successful calls.
Expected behavior: either the tool-use counters should use the same key as ctx.tool_name presented to callbacks, or ctx.tool_name should remain consistent with the tool name used for accounting.
Impact: any custom process_tool_call logic (or future MCP tooling relying on ctx.tool_use) will behave incorrectly when a prefix is configured—e.g. it may believe a tool has never been used and repeatedly call it.
`MCPServer.call_tool()` strips the prefix and rewrites `ctx.tool_name` (`pydantic_ai_slim/pydantic_ai/mcp.py:570-572`):

```python
if self.tool_prefix:
    name = name.removeprefix(f'{self.tool_prefix}_')
    ctx = replace(ctx, tool_name=name)
```

- Successful-use counting is keyed off `ToolCallPart.tool_name` (the prefixed name) elsewhere (see the `ToolManager._call_tool()` increment at `pydantic_ai_slim/pydantic_ai/_tool_manager.py:199-201`).

(Refers to lines 570-577)
Recommendation: Keep ctx.tool_name consistent with the key used for tools_use_counts (likely the prefixed tool name). If you need to pass the unprefixed name to the MCP server, do so via a separate local variable (e.g. server_tool_name) without rewriting ctx.tool_name, or update counting to use the unprefixed name everywhere for MCP tools.
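The key mismatch can be shown with a self-contained sketch; `Ctx` and the names below are stand-ins for illustration, not the actual pydantic-ai classes:

```python
from dataclasses import dataclass, field, replace


@dataclass
class Ctx:
    """Stand-in for RunContext: a tool name plus per-name success counters."""
    tool_name: str
    tools_use_counts: dict = field(default_factory=dict)


prefix = 'mcp'
# Counters are keyed by the prefixed name used in the ToolCallPart.
ctx = Ctx(tool_name='mcp_search', tools_use_counts={'mcp_search': 2})

# What the reviewed code does: rewrite ctx.tool_name to the unprefixed name.
bad_ctx = replace(ctx, tool_name=ctx.tool_name.removeprefix(f'{prefix}_'))
# Looking up the counter under the rewritten name misses the real key.
assert bad_ctx.tools_use_counts.get(bad_ctx.tool_name, 0) == 0

# Recommended shape: keep ctx untouched and use a local variable for the
# name sent to the MCP server, so accounting and callbacks stay consistent.
server_tool_name = ctx.tool_name.removeprefix(f'{prefix}_')
assert ctx.tools_use_counts.get(ctx.tool_name, 0) == 2
```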
```diff
     call.args or {}, allow_partial=pyd_allow_partial, context=ctx.validation_context
 )

-return await self.toolset.call_tool(name, args_dict, ctx, tool)
+result = await self.toolset.call_tool(name, args_dict, ctx, tool)
+self.ctx.tools_use_counts[name] = self.ctx.tools_use_counts.get(name, 0) + 1
+return result
```
🟡 Output tools incorrectly increment RunContext.tools_use_counts despite comment that output tools are not counted
ToolManager.handle_call() explicitly treats output tools as “not traced and not counted”, but _call_tool() increments self.ctx.tools_use_counts[name] for all tool kinds, including output tools.
Actual behavior: successful output-tool executions increase RunContext.tools_use_counts, so ctx.tools_use_counts / ctx.tool_use can include output-tool invocations even though RunUsage.tool_calls does not.
Expected behavior: output tools should not affect the new “successful tool use” counters (or the comment/semantics should be updated consistently).
Impact: users relying on RunContext.tools_use_counts (and examples/documentation encourage this) may see inflated totals and may write incorrect logic (e.g. gating behavior or auditing tool usage) because output-tool calls are mixed into the same counter.
- Output tools are routed through `_call_tool()` via `handle_call()`: `pydantic_ai_slim/pydantic_ai/_tool_manager.py:130-139`
- `_call_tool()` increments `tools_use_counts` unconditionally: `pydantic_ai_slim/pydantic_ai/_tool_manager.py:199-201`
```python
# handle_call
if (tool := self.tools.get(call.tool_name)) and tool.tool_def.kind == 'output':
    # Output tool calls are not traced and not counted
    return await self._call_tool(...)

# _call_tool
result = await self.toolset.call_tool(...)
self.ctx.tools_use_counts[name] = self.ctx.tools_use_counts.get(name, 0) + 1
```

(Refers to lines 130-201)
Recommendation: Skip incrementing tools_use_counts for output tools (e.g. guard in handle_call() or inside _call_tool() based on tool.tool_def.kind). Alternatively, if output tools should be counted, update comments/docs and ensure all related counters (RunUsage.tool_calls, usage limits) use the same definition.
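A minimal sketch of the suggested guard; the function and parameter names below are hypothetical, mirroring the review's vocabulary rather than the actual pydantic-ai internals:

```python
def record_use(tools_use_counts: dict, name: str, kind: str) -> None:
    """Count a successful tool call, skipping output tools so the counter
    matches the 'not traced and not counted' comment in handle_call()."""
    if kind == 'output':
        return  # output-tool invocations stay out of tools_use_counts
    tools_use_counts[name] = tools_use_counts.get(name, 0) + 1


counts: dict[str, int] = {}
record_use(counts, 'final_result', 'output')   # ignored
record_use(counts, 'search', 'function')
record_use(counts, 'search', 'function')
assert counts == {'search': 2}  # output tool never appears in the counter
```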
Will read more on this but great find if correct.
Introduces ToolFailed exception to allow tools to fail without terminating the agent run. This is especially useful for parallel tool execution where partial failures should not stop the entire batch.

Key features:
- Errors are traced in telemetry (unlike ModelRetry)
- Agent continues execution (unlike arbitrary exceptions)

Three exception modes:
- ModelRetry: Expected retry behavior, not an error
- ToolFailed(disable=False): System error that should be logged/monitored
- ToolFailed(disable=True): Permanent failure, disable tool

This can coexist with ToolPolicy from pydantic#3691. The main difference is that ToolFailed is explicit inline in the user's tools (in Python) whereas ToolPolicy is declarative. I'll need to do more thinking on how to have `ToolFailed` map to `ToolPolicy` modes.
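A hedged sketch of the proposed semantics; `ToolFailed` and `run_tool` below are illustrative stand-ins, not the PR's actual API:

```python
class ToolFailed(Exception):
    """Stand-in for the proposed exception: the run survives the failure,
    and disable=True removes the tool from further use."""

    def __init__(self, message: str, *, disable: bool = False):
        super().__init__(message)
        self.disable = disable


disabled: set[str] = set()


def run_tool(name, fn, *args):
    """Run one tool call; on ToolFailed, record the error and keep going."""
    if name in disabled:
        return {'error': f'{name} is disabled'}
    try:
        return fn(*args)
    except ToolFailed as e:
        if e.disable:
            disabled.add(name)  # permanent failure: drop tool from rotation
        return {'error': str(e)}  # agent continues instead of terminating


def flaky():
    raise ToolFailed('backend unavailable', disable=True)


assert run_tool('flaky', flaky) == {'error': 'backend unavailable'}
assert 'flaky' in disabled  # subsequent calls short-circuit
```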
/gemini summary
Summary of Changes

This pull request introduces a flexible system for managing tool usage within the agent framework, moving beyond strict error-raising limits to a more adaptive approach. By providing granular control over individual tools and aggregate agent-level policies, it empowers developers to guide model behavior when resource constraints are met, rather than abruptly halting execution. This enhancement allows for more sophisticated and resilient agent designs, particularly for managing expensive or rate-limited external services.

Highlights
Changelog
Activity
…ctx.max_retries` on tool path

Preparatory refactor of the output-retry machinery with three independently-motivated but tightly-coupled changes:

- Rename confusing private/internal retry fields so the mental model is obvious from code: `Agent._max_result_retries` -> `_max_output_retries`; `GraphAgentDeps.max_result_retries` -> `max_output_retries`; `GraphAgentState.retries` -> `output_retries_used`; `GraphAgentState.increment_retries` -> `consume_output_retry`; `OutputToolset._output_max_retries` removed. Error message `Exceeded maximum retries (N) for output validation` -> `Exceeded maximum output retries (N)`.
- Add `output_retries` kwarg to `run` / `run_sync` / `run_stream` / `run_stream_sync` / `run_stream_events` / `iter` with precedence `run arg > spec > agent default`. Plumbs through WrapperAgent and all three durable_exec wrappers (dbos/prefect/temporal). Runtime override clones the shared output toolset before mutating `max_retries` so concurrent runs don't race.
- Fix the Devin review comment on pydantic#4956: `ctx.max_retries` in an output validator on the tool path now reflects `tool.max_retries` (the per-tool enforcement limit that will actually stop the validator) instead of the agent-level global default. Text path is unchanged.

Also documents the two-path enforcement model in `docs/agent.md` with a new "How output retries are enforced" subsection.

Intentionally out of scope: `tool_retries` parameter (blocked by ToolUsePolicy design in pydantic#3691), `Agent.override()` extension (reachable via `spec=` today), and deprecating `retries`.
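The described precedence (`run arg > spec > agent default`) can be sketched as a small resolver; the function name and defaults here are illustrative, not the actual pydantic-ai signatures:

```python
def resolve_output_retries(run_arg=None, spec=None, agent_default=1):
    """Return the first explicitly-set value, falling back to the agent
    default: run arg > spec > agent default."""
    for candidate in (run_arg, spec):
        if candidate is not None:
            return candidate
    return agent_default


assert resolve_output_retries(run_arg=5, spec=3) == 5  # run arg wins
assert resolve_output_retries(spec=3) == 3             # then spec
assert resolve_output_retries() == 1                   # then agent default
```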
Closes #3352
Soft Tool Usage Limits
Introduces soft tool usage limits via `ToolPolicy` and `ToolsPolicy`. Unlike the hard `UsageLimits.tool_calls_limit` (which raises `UsageLimitExceeded`), soft limits return a message to the model so it can adapt.

APIs

`ToolPolicy` - Per-Tool Limits

Set on individual tools via `@agent.tool(usage_policy=...)`.

`ToolsPolicy` - Agent-Wide Limits

Set on the Agent via `tools_policy` or per-run.

Parameters

- `max_uses`
- `max_uses_per_step`
- `partial_acceptance`
- `per_tool`

Behavior

- When `max_uses` is reached, the tool is removed from available tools
- `partial_acceptance=True` (default): if the model requests 5 calls but only 3 are allowed, 3 are accepted and 2 rejected
- `partial_acceptance=False`: all-or-nothing; the entire batch is rejected if it would exceed limits

Comparison to UsageLimits

- `UsageLimits.tool_calls_limit`: hard limit, raises `UsageLimitExceeded`
- `ToolPolicy.max_uses` / `ToolsPolicy.max_uses`: soft limits, the model is informed and can adapt

Not Included

Toolset-level handling (e.g., `MCPServer(usage_policy=...)` applying limits across all MCP tools as a group) is not part of this PR. Can take it up later? I am unsure of the value it adds beyond what I've already added.
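The core soft-limit behavior can be illustrated with a stand-alone sketch; `ToolPolicy` and `available_tools` here are stand-ins for illustration, not the PR's actual implementation:

```python
from dataclasses import dataclass


@dataclass
class ToolPolicy:
    """Stand-in for the per-tool soft limit: once max_uses successful calls
    have happened, the tool is withdrawn instead of an exception being raised."""
    max_uses: int


def available_tools(tools: dict, policies: dict, counts: dict) -> dict:
    """Filter out tools whose soft limit has been reached; the model would
    then be told the tool is gone, rather than the run erroring out."""
    no_limit = ToolPolicy(max_uses=10**9)  # effectively unlimited default
    return {
        name: fn
        for name, fn in tools.items()
        if counts.get(name, 0) < policies.get(name, no_limit).max_uses
    }


tools = {'search': lambda q: f'results for {q}'}
policies = {'search': ToolPolicy(max_uses=2)}

# Below the limit, the tool is still offered to the model.
assert 'search' in available_tools(tools, policies, {'search': 1})
# At the limit, the tool silently disappears from the available set.
assert available_tools(tools, policies, {'search': 2}) == {}
```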