feat: add HandleDeferredToolCalls capability and handle_deferred_tool_calls hook (#5142)
…ool_calls` hook

Add a capability hook for inline resolution of deferred tool calls (approval-required and externally-executed) during agent runs.

- New `handle_deferred_tool_calls` hook on `AbstractCapability` with accumulation dispatch in `CombinedCapability` (each capability resolves what it can; the remainder is passed to the next)
- New `HandleDeferredToolCalls` capability wrapping a user handler function
- ToolManager owns all deferred resolution: `resolve_deferred_tool_calls()`, `execute_deferred_tool_results()`, and `handle_call(resolve_deferred=True)` default for automatic inline resolution
- `process_tool_calls` uses ToolManager methods with a loop for re-raised deferrals
- Helper methods: `DeferredToolRequests.build_results()`, `.remaining()`, `DeferredToolResults.merge()`
- Drop the construction-time `requires_approval` check (the runtime check catches it)

Closes #3959

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
|     continue
| except (CallDeferred, ApprovalRequired):
|     # Tool re-raised deferral — it goes back to unresolved
|     continue
Bug: when an approved tool re-raises CallDeferred or ApprovalRequired during execution, the call is silently lost. The continue here skips adding it to executed, but remaining = requests.remaining(results) at line 757 also excludes it because the call's ID is in results.approvals. So the call ends up in neither executed nor remaining.
The comment says "it goes back to unresolved" but the code doesn't do that. The resolution loop in _agent_graph.py won't see the call again because it iterates over deferred_tool_requests (the remaining from this method).
To fix, track re-raised call IDs and exclude them from resolved_ids when computing remaining, or build remaining by subtracting only the IDs that were actually executed or denied (not just those the handler claimed to handle).
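The suggested fix could be sketched like this, using simplified stand-in dataclasses (hypothetical names, not the actual pydantic-ai types): compute `remaining` from the IDs that actually completed, rather than from the IDs the handler claimed to resolve, so re-deferred calls flow back into the pending set.

```python
from dataclasses import dataclass, field


@dataclass
class Call:
    tool_call_id: str


@dataclass
class Requests:
    approvals: list[Call] = field(default_factory=list)
    calls: list[Call] = field(default_factory=list)


def compute_remaining(
    requests: Requests, handled_ids: set[str], re_deferred_ids: set[str]
) -> Requests:
    # Only IDs that actually completed leave the pending set; calls that
    # re-raised a deferral during execution go back into `remaining`
    # instead of being silently dropped.
    completed_ids = handled_ids - re_deferred_ids
    return Requests(
        approvals=[c for c in requests.approvals if c.tool_call_id not in completed_ids],
        calls=[c for c in requests.calls if c.tool_call_id not in completed_ids],
    )
```

With this shape, a call whose ID is in `results.approvals` but which re-raised during execution still shows up in `remaining`, so the resolution loop in `_agent_graph.py` sees it again.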
|     metadata: dict[str, dict[str, Any]] | None = None,
| ) -> DeferredToolResults:
|     """Create a [`DeferredToolResults`][pydantic_ai.tools.DeferredToolResults] for these requests."""
|     return DeferredToolResults(approvals=approvals or {}, calls=calls or {}, metadata=metadata or {})
build_results doesn't reference self at all — it doesn't validate that the provided IDs match this request's calls/approvals, nor does it pre-populate from the request. As-is it's just DeferredToolResults(approvals=approvals or {}, ...) and could be a @staticmethod or omitted entirely.
If the intent is a convenience builder, it would be more useful if it actually used self — e.g. to auto-approve all calls: requests.build_results(approve_all=True). Otherwise this adds surface area to the public API without adding value over constructing DeferredToolResults directly.
|     approvals=[c for c in self.approvals if c.tool_call_id not in resolved_ids],
|     metadata={k: v for k, v in self.metadata.items() if k not in resolved_ids},
| )
| return remaining if remaining.calls or remaining.approvals else None  # pyright: ignore[reportReturnType]
The pyright: ignore[reportReturnType] here is because the return type is Self | None but the method constructs a plain DeferredToolRequests(...), which isn't Self if this class is subclassed. Since DeferredToolRequests isn't designed for subclassing (it's a kw_only=True dataclass), the simpler fix is to use DeferredToolRequests | None as the return type instead of Self | None, avoiding the type suppression.
| # Loop to handle cases where re-execution raises new deferrals that the
| # same handler chain can resolve (e.g. a capability that registers
| # approval-required tools and also handles them).
| max_resolution_rounds = 5
5 is a magic number — extract it to a module-level constant (e.g. _MAX_DEFERRED_RESOLUTION_ROUNDS = 5).
Also, when the limit is hit (the for/else path at line 1575), the remaining deferred calls are silently treated as unresolved. This could mask infinite-loop bugs during development. Consider logging a warning when the max is reached so users can diagnose unexpected deferred-tool-as-output behavior.
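A minimal sketch of the suggested change, assuming a module-level constant and a `warnings.warn` when the rounds are exhausted (names and the `resolve_once` callback are illustrative, not the actual implementation):

```python
import warnings

# Named constant instead of a magic number in the loop body.
_MAX_DEFERRED_RESOLUTION_ROUNDS = 5


def resolve_with_rounds(resolve_once) -> bool:
    """Run `resolve_once()` until it reports completion or rounds run out.

    `resolve_once` returns True when no deferred calls remain.
    """
    for _ in range(_MAX_DEFERRED_RESOLUTION_ROUNDS):
        if resolve_once():
            return True
    # Surface the exhaustion instead of silently treating the leftovers
    # as unresolved, so infinite-loop bugs are diagnosable.
    warnings.warn(
        f'Deferred tool resolution did not settle after '
        f'{_MAX_DEFERRED_RESOLUTION_ROUNDS} rounds; remaining calls will '
        f'surface as DeferredToolRequests output.'
    )
    return False
```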
| """Metadata for deferred tool calls, keyed by `tool_call_id`. Each value will be available in the tool's RunContext as `tool_call_metadata`.""" | ||
|
|
||
| def merge(self, other: DeferredToolResults) -> None: | ||
| """Merge another `DeferredToolResults` into this one in-place.""" |
There was a problem hiding this comment.
Naming nit: merge as an in-place mutating method is surprising on a dataclass. The standard library convention is update for in-place dict-like mutation (à la dict.update()). If this stays in-place, renaming to update would be more Pythonic.
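The `dict.update()`-style rename could look like this on a simplified stand-in dataclass (the real `DeferredToolResults` has more fields):

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Results:
    approvals: dict[str, bool] = field(default_factory=dict)
    calls: dict[str, Any] = field(default_factory=dict)

    def update(self, other: 'Results') -> None:
        """Merge `other` into this instance in place, a la dict.update()."""
        self.approvals.update(other.approvals)
        self.calls.update(other.calls)
```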
- Fix re-deferred calls being silently dropped: track re-deferred IDs and add them back to remaining requests
- Rename `DeferredToolResults.merge()` to `.update()` for Pythonic naming
- Fix `DeferredToolRequests.remaining()` return type: concrete type
- Extract `_MAX_DEFERRED_RESOLUTION_ROUNDS` constant with warning
- Fix empty DeferredToolResults causing unnecessary loop iterations
- Extract `_execute_approval_result` helper to reduce complexity

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
The loop was offering the same handler chain repeated chances to resolve calls it already declined or that re-raised during execution. A single resolution pass is simpler and more predictable: unresolved and re-deferred calls go to remaining in one shot.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
| # Let capability handlers resolve deferred calls inline (one shot).
| # If re-execution raises new deferrals, they go into remaining.
| handler_results = await tool_manager.resolve_deferred_tool_calls(deferred_tool_requests)
| if handler_results is not None:
|     executed, deferred_tool_requests = await tool_manager.execute_deferred_tool_results(
|         deferred_tool_requests, handler_results
|     )
|     for _call_part, result_part in executed:
|         output_parts.append(result_part)
|         yield _messages.FunctionToolResultEvent(result_part)
📝 Info: Handler-resolved batch path skips usage limit checks for tool execution
In _agent_graph.py:1597-1606, handler-resolved calls go through _call_tools for execution without a preceding usage-limit check (the check at line 1488 only covers calls_to_run). Additionally, _call_tool calls execute_tool_call without passing usage, so executed tools don't increment usage.tool_calls. However, this is consistent with the existing deferred-resume path (via UserPromptNode._handle_deferred_tool_results), which also goes through _call_tools → _call_tool without usage counting. The handler-resolved path is not a regression — it follows the same pattern as pre-existing deferred tool execution.
…erred metadata

- Fix external call ToolReturn.metadata being dropped in execute_deferred_tool_results
- Fix ToolReturn.content (user prompt content) being dropped in _execute_approval_result — now returns user content alongside the result part, and _agent_graph appends a UserPromptPart
- Fix re-deferred calls losing their new exception metadata — now captures and carries metadata from the re-raised exception

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
| if handler_results is not None:
|     executed, deferred_tool_requests = await tool_manager.execute_deferred_tool_results(
|         deferred_tool_requests, handler_results
|     )
|     for _call_part, result_part, user_content in executed:
|         output_parts.append(result_part)
|         if user_content:
|             output_parts.append(_messages.UserPromptPart(content=user_content))
|         yield _messages.FunctionToolResultEvent(result_part, content=user_content)
🚩 Missing FunctionToolCallEvent for non-ToolApproved handler-resolved calls in batch path
In the batch handler path (_agent_graph.py:1573-1606), FunctionToolCallEvents are only emitted for ToolApproved calls that reach the validation step (line 1590). For non-ToolApproved handler results (ToolDenied, ModelRetry, RetryPromptPart, ToolReturn), no FunctionToolCallEvent is emitted before the FunctionToolResultEvent produced by _call_tools. This contrasts with the UserPromptNode resume path (line 1484-1486) where non-ToolApproved deferred results get FunctionToolCallEvent(call) without args_valid. This is a minor inconsistency in event emission that could affect streaming consumers that expect paired call/result events, but is unlikely to cause practical issues since these calls were already deferred (and got their initial call event during the first processing pass).
- External call (CallDeferred) with ToolReturn metadata
- handle_call(resolve_deferred=True) via nested tool manager (CodeMode pattern)
- handle_call re-raise when no handler available
- build_results() helper
- WrapperCapability delegation of handle_deferred_tool_calls

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…a, batch deny

- External call with plain value (not ToolReturn) — covers non-ToolReturn branch
- Re-deferred tool with metadata — covers re_deferred_ids + metadata tracking
- Batch denial — covers ToolDenied branch in execute_deferred_tool_results

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
When an approved tool re-raises as CallDeferred (instead of ApprovalRequired), place it in remaining.calls, not remaining.approvals. The re-raised exception type determines the category.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…lse, per-call happy path

- Test get_serialization_name() returns None
- Test handle_call(resolve_deferred=True) per-call resolution via nested ToolManager (exercises _resolve_single_deferred happy path and _execute_approval_result execution)
- Test handle_call(resolve_deferred=False) propagates exceptions
- Fix re-deferred categorization by new exception type

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- Add explicit assertion that _resolve_single_deferred returns the correct tool result content
- Verify tool return visible in agent messages
- pragma: no cover on unreachable defensive raise in _resolve_single_deferred

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Address real coverage gaps (not an xdist issue as I initially thought):

- ToolReturn with metadata + user content through approval path
- Approved tool raising ModelRetry (RetryPromptPart handling)
- ToolApproved(override_args=...) override path
- handle_call path with ModelRetry (ToolRetryError propagation)
- handle_call path when handler resolves wrong ID (remaining non-empty)
- Re-deferred tool without metadata
- Re-deferred as CallDeferred (not ApprovalRequired) — categorization
- Mixed unresolved + re-deferred (remaining already populated)

Also add pragma: no cover on the defensive `return None` branch that only fires when root_capability/ctx is None (impossible via the public API).

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- Simplify handler_b in accumulation test (no redundant tool_name check; handler_a already resolved tool_a so handler_b only sees tool_b)
- Remove unused handler in resolve_deferred=False test
- Simplify ModelRetry test — tool only called once, always raises retry

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…ferral

When an approved tool re-raises with a different exception type or metadata than the original (e.g. ApprovalRequired → CallDeferred), the caller needs to see the new exception, not the stale original. Reconstruct the exception from remaining.calls/approvals and metadata.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…arts

Fix semantic inconsistency where `handle_call(resolve_deferred=True)` was unwrapping `ToolReturn` wrappers (losing metadata and user content), while the non-deferred `handle_call` path returned the raw tool result verbatim.

- New `DeferredCallOutcome` discriminated union: `DeferredCallSuccess`, `DeferredCallDenied`, `DeferredCallRetry`
- `execute_deferred_tool_results` now returns `[(ToolCallPart, outcome)]` carrying the raw tool result / denial / retry part
- New `build_tool_return_parts_from_outcomes` helper converts outcomes to `ToolReturnPart`s + user content for message history
- `_resolve_single_deferred` returns the raw tool result, preserving the `ToolReturn` wrapper for CodeMode-style callers
- `process_tool_calls` in `_agent_graph.py` composes the two methods

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
| The handler receives the [`RunContext`][pydantic_ai.tools.RunContext] and the
| [`DeferredToolRequests`][pydantic_ai.tools.DeferredToolRequests], and must return
| [`DeferredToolResults`][pydantic_ai.tools.DeferredToolResults] with results for
| some or all pending tool calls. Unresolved calls are passed to the next capability
Let's allow the handler to return None as well if it doesn't want to handle at all.
| Called by [`ToolManager`][pydantic_ai.tool_manager.ToolManager] after tool calls raise
| [`ApprovalRequired`][pydantic_ai.exceptions.ApprovalRequired] or
| [`CallDeferred`][pydantic_ai.exceptions.CallDeferred].
Also when the model calls a tool that was specifically registered as unapproved/external. We should link to the deferred tools / approval / external tools docs here.
|     metadata: dict[str, dict[str, Any]] | None = None,
| ) -> DeferredToolResults:
|     """Create a [`DeferredToolResults`][pydantic_ai.tools.DeferredToolResults] for these requests."""
|     return DeferredToolResults(approvals=approvals or {}, calls=calls or {}, metadata=metadata or {})
This is an instance method because in the future I expect to add a context: object field to Request that is meant to be reproduced on Results verbatim, but I agree with an earlier auto-review comment that this method doesn't serve much purpose right now.
So let's make it serve some purpose and raise an error if a result is provided for a tool call ID that wasn't in the appropriate "requests list". We may have a check already somewhere that you can't provide ToolApproved for an external call that wants a result. I don't know if we similarly block (or should block) providing a tool result directly for a tool that was looking for approval. But either way some validation here would be useful.
An optional approve_all=True is also interesting. If that's provided along with approvals, it would apply to the rest?
| DeferredCallOutcome: TypeAlias = DeferredCallSuccess | DeferredCallDenied | DeferredCallRetry
| """Outcome of resolving a single deferred tool call."""
I see too much overlap here with these from tools.py:
DeferredToolApprovalResult: TypeAlias = Annotated[ToolApproved | ToolDenied, Discriminator('kind')]
"""Result for a tool call that required human-in-the-loop approval."""
DeferredToolCallResult: TypeAlias = Annotated[
Annotated[ToolReturn, Tag('tool-return')]
| Annotated[ModelRetry, Tag('model-retry')]
| Annotated[RetryPromptPart, Tag('retry-prompt')],
Discriminator(_deferred_tool_call_result_discriminator),
]
"""Result for a tool call that required external execution."""
DeferredToolResult = DeferredToolApprovalResult | DeferredToolCallResult
"""Result for a tool call that required approval or external execution."""
I don't think we need both? We should be able to use those?
| capability handler. When `True` (default), if the tool raises
| [`ApprovalRequired`][pydantic_ai.exceptions.ApprovalRequired] or
| [`CallDeferred`][pydantic_ai.exceptions.CallDeferred], the handler is
| called to resolve it. When `False`, the exceptions propagate to the caller.
I suppose the case where we set this to False is if we want to call the handler in a batch? Do we do that currently in the agent graph?
|     raise ToolRetryError(outcome.retry)
| if isinstance(outcome, DeferredCallDenied):
|     # Consistent with the denial message surfacing as the tool return in message history
|     return outcome.denied.message
Do we lose outcome='denied' on the ToolReturnPart here, somehow? Should we raise the exception instead?
|     deferred_tool_requests, handler_results
| )
| executed_parts = tool_manager.build_tool_return_parts_from_outcomes(executed_outcomes)
| for _call_part, result_part, user_content in executed_parts:
I don't love the complicated list of tuples this method returns...
| executed_outcomes, deferred_tool_requests = await tool_manager.execute_deferred_tool_results(
|     deferred_tool_requests, handler_results
| )
| executed_parts = tool_manager.build_tool_return_parts_from_outcomes(executed_outcomes)
Do these 2 method calls need to be separate? Would a single helper method make sense?
| raise exceptions.UserError(
|     'To use tools that require approval, add `DeferredToolRequests` to the list of output types for this agent.'
| )
| super().add_tool(tool)
Could we stop overriding this method now?
| agent = Agent(
|     FunctionModel(llm),
|     capabilities=[HandleDeferredToolCalls(handler=handle_deferred)],
We need docs where we currently introduce deferred/external/approved tools and how to work with DeferredToolRequests! This new approach is way simpler than what we had before. The old "stop the world/agent run" method still has its uses though, like with UI adapters where you need to return the pending tool call to the frontend, so that it can then start a new run that sends back the result/approval.
… DeferredCallOutcome

- Add DeferredToolResults.to_tool_call_results() and reuse it in UserPromptNode and the new inline resolution path instead of hand-rolling the approvals/calls → DeferredToolResult conversion in ToolManager
- Drop DeferredCallOutcome/Success/Denied/Retry, execute_deferred_tool_results, build_tool_return_parts_from_outcomes, and _execute_approval_result — the inline batch path now re-dispatches handler results through the existing _call_tools pipeline, so approvals/denials/retries/ToolReturn unwrapping all behave identically to the UserPromptNode resume flow
- Drop the resolve_deferred flag on handle_call (no production caller used False)
- Drop the no-op _AgentFunctionToolset.add_tool override
- Allow the HandleDeferredToolCalls handler to return None to decline handling
- build_results validates that tool call IDs match the pending request lists, and adds approve_all=True to auto-approve anything not explicitly specified
- Expand the handle_deferred_tool_calls docstring to cover tools registered as unapproved/external, with links to the deferred-tools docs
- Add a "Resolving deferred calls inline" section to docs/deferred-tools.md and list HandleDeferredToolCalls in the built-in capabilities table
# Conflicts:
#   pydantic_ai_slim/pydantic_ai/_agent_graph.py
…inline validation

Coverage gaps in CI post-merge:

- tool_manager._resolve_single_deferred: approval via `bool False`, `ToolApproved(override_args=...)`, and external-call results for plain values / ModelRetry / RetryPromptPart
- _agent_graph inline-handler path: UnexpectedModelBehavior on approved-call validation

Added five via_handle_call tests covering the tool_manager branches; pragma'd the _agent_graph defensive branch (it mirrors the non-deferred validation path above — naturally triggered there, but here it requires exhausted retries plus a bad handler override_args, which isn't realistically reachable from tests without artificial state).
| Note that handling deferred tool calls requires `DeferredToolRequests` to be in the `Agent`'s [`output_type`](output.md#structured-output) so that the possible types of the agent run output are correctly inferred. If your agent can also be used in a context where no deferred tools are available and you don't want to deal with that type everywhere you use the agent, you can instead pass the `output_type` argument when you run the agent using [`agent.run()`][pydantic_ai.agent.AbstractAgent.run], [`agent.run_sync()`][pydantic_ai.agent.AbstractAgent.run_sync], [`agent.run_stream()`][pydantic_ai.agent.AbstractAgent.run_stream], or [`agent.iter()`][pydantic_ai.agent.Agent.iter]. Note that the run-time `output_type` overrides the one specified at construction time (for type inference reasons), so you'll need to include the original output type explicitly.

| As an alternative to returning control to the caller, a [`HandleDeferredToolCalls`][pydantic_ai.capabilities.HandleDeferredToolCalls] capability can resolve deferred calls inline with a handler function, so the agent run continues in one step without needing to end and restart. See [Resolving deferred calls inline](#resolving-deferred-calls-inline).
I'd prefer to present this as the primary method of dealing with deferred tools, with both methods getting their own section but the handler method being introduced first.
| elif approval is False:
|     approval = ToolDenied()
| if isinstance(approval, ToolDenied):
|     return approval.message
It feels like we're missing setting ToolReturnPart.outcome='denied' here?
| if approval is True:
|     approval = ToolApproved()
| elif approval is False:
|     approval = ToolDenied()
There is some apparent duplication between here and to_tool_call_results. Inevitable?
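One way to collapse the duplication would be a small shared normalizer for the bool shorthand; a sketch with simplified stand-in classes (the real `ToolApproved`/`ToolDenied` live in pydantic_ai.tools and carry more fields):

```python
from dataclasses import dataclass


@dataclass
class ToolApproved:
    pass


@dataclass
class ToolDenied:
    message: str = 'The tool call was denied.'


def normalize_approval(approval):
    """Map the True/False shorthand onto the explicit result objects.

    Any non-bool value (an explicit ToolApproved/ToolDenied) passes
    through unchanged, so both call sites can share this helper.
    """
    if approval is True:
        return ToolApproved()
    if approval is False:
        return ToolDenied()
    return approval
```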
…handler
- Introduce `ToolDeniedError` exception (wraps `ToolDenied`). `_resolve_single_deferred`
now raises it on denial instead of returning the denial message verbatim, so callers
(e.g. CodeMode in the harness) can distinguish a denied call from a successful tool
return and record a `ToolReturnPart(outcome='denied')` in message history.
- `_resolve_single_deferred` reuses `DeferredToolResults.to_tool_call_results()` for the
bool/`ToolReturn` normalization it was duplicating, and adds a cross-reference comment
to `_call_tool` since both paths must accept the same `DeferredToolResult` surface.
- Flip docs to lead with the handler approach as the recommended primary method;
stop-the-world remains described as the UI-adapter pattern.
- Fill test spectrum gaps flagged in review:
* Batch path denial tests now assert `outcome='denied'` on the resulting ToolReturnPart
* Added batch tests for `approvals[id] = False`, default `ToolDenied()`, default
`ToolApproved()`, handler-supplied external `ToolReturn(metadata,content)`,
handler-supplied external `ModelRetry`, handler-supplied external `RetryPromptPart`
* Added per-call test for handler-supplied external `ToolReturn` verbatim-preservation
# Conflicts:
#   docs/capabilities.md
#   pydantic_ai_slim/pydantic_ai/capabilities/__init__.py
…er denials

When the `HandleDeferredToolCalls` handler denies a tool call, `handle_call` now raises `ToolDeniedError` (on pydantic-ai-slim once released) instead of returning the denial message as a plain string. CodeMode catches it, records a `ToolReturnPart(outcome='denied')` in `nested_returns` so message history reflects the denial correctly, and re-raises so the sandbox surfaces the denial as an exception rather than as what would look like a successful tool return.

The `ToolDeniedError` import is gated behind a compat shim so this module still loads against the currently released pydantic-ai-slim (which lacks the exception); the shim resolves to a placeholder class that never matches a real exception, leaving the except clause inert until a release ships `ToolDeniedError`.

Depends on pydantic/pydantic-ai#5142.
| 1. Never reached here — the handler denies this call, so the model sees the denial message instead.
| 2. The handler supplies the result for this external call, so the tool body just signals the deferral.

| If the handler declines to resolve some or all of the calls (by omitting them from the returned [`DeferredToolResults`][pydantic_ai.tools.DeferredToolResults] or returning `None`), those calls bubble up as a [`DeferredToolRequests`][pydantic_ai.output.DeferredToolRequests] output, and `DeferredToolRequests` must be in the agent's output type — so you can combine inline handling with the stop-the-world flow when it makes sense.
Above it says somewhere that the handler should provide a result for "each pending call", but that's too strict, because we do allow partial resolution, and having multiple capabilities each handle a subset of deferred tool calls.
By the way, if we haven't yet done that, we should also add docs for the new handle_... "hook" method on capabilities.md. And likely also hooks.md, making sure that it can be used by the Hooks capability, like most of the other methods on AbstractCapability are.
So "bubble up as output" is not completely accurate -- because they can also be handled y anotehr handler.
It's also worth explicitly specifying that there is a hook method on the capability class (by linking to the capabilities doc for example), if you want to use HITL/external function calls in a custom capability.
| return '\n'.join(lines)

| class ToolDeniedError(Exception):
Having this as a public class makes me want to also handle it in places like we handle ModelRetry, as a way for a tool call function to explicitly raise/wrap an error as "this means you weren't allowed to do this" (in surfacing to a user or o11y backend), even if our native tool approval wasn't used. That's similar to the ToolFailed exception being introduced in #2586 which is worth reading. But I don't want to implement the full scope of that right now (unless you think it's really easy as we're working on this code anyway -- to allow tool functions anywhere to raise ToolFailed('...', reason='denied') which is registered as outcome='denied'? But I'd prefer not to bring that in scope, as that'd imply documenting it etc...).
Does this need to be public? Is there another convention we could find for the way a method indicates that it was a tool-denial, e.g. returning ToolDenied | Any and telling people to isinstance check it? We already pass A | B | Any around in a few places. Discuss with me please.
…ks pages

- `deferred-tools.md`: relax "each pending call" wording; clarify that unresolved calls can still be picked up by another handler in the chain before bubbling up; explicitly point custom-capability authors at the `handle_deferred_tool_calls` hook on `AbstractCapability`.
- `capabilities.md`: document `handle_deferred_tool_calls` alongside the other lifecycle hooks, with a link back to the dedicated capability for application code.
- `hooks.md`: add the new hook to the `Hooks` page with a runnable example.
- `Hooks`: add `handle_deferred_tool_calls` registration (decorator + constructor kwarg) so it can be used like every other capability hook.
…ndle_call`

Following review discussion: a denial isn't an exceptional condition, it's an outcome of the deferred-tool handler. Returning the existing public `ToolDenied` value (instead of raising a new public exception) reuses the class that already participates in `DeferredToolApprovalResult`, keeps the public surface smaller, and avoids implicitly inviting `ToolFailed`-style "raise to mark a denial" patterns from arbitrary tool functions (the broader scope being tracked separately in #2586).

Callers of `ToolManager.handle_call` must now `isinstance`-check the return value to distinguish a denial from a successful tool result — documented in the docstring. The harness `dispatch_tool_call` will raise a sandbox-side exception when it sees `ToolDenied`, rather than surfacing the marker into the user's script.

- Remove `ToolDeniedError` from `exceptions.py` and the top-level export
- `_resolve_single_deferred` returns `ToolDenied` instead of raising
- `handle_call` return annotation is now `Any | ToolDenied`; docstring spells out the `isinstance`-check requirement
- `result.py` casts the output-tool path (output tools can't be deferred)
- Tests updated to assert on the returned value, not a caught exception
|     approved=approved,
|     metadata=metadata,
| )
| return await self.execute_tool_call(validated)
| try:
|     return await self.execute_tool_call(validated)
| except (CallDeferred, ApprovalRequired) as exc:
|     return await self._resolve_single_deferred(call, exc)
🚩 handle_call return type change is a public API contract change
The return type of ToolManager.handle_call changed from Any to Any | ToolDenied. While ToolManager is not a heavily used public API (primarily internal), it IS exposed on RunContext.tool_manager and the PR's own tests show users calling ctx.tool_manager.handle_call() from within tool functions. Existing callers that don't check for ToolDenied will silently treat the denial message string as a successful tool result. The docstring at tool_manager.py:634-639 documents this requirement clearly, and the only internal caller (result.py:209-216) is correctly handled with a cast and comment explaining output tools can't be deferred. This is a valid design trade-off (returning ToolDenied vs raising an exception), but worth noting in changelog/migration docs.
(Refers to lines 612-657)
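The caller-side pattern the docstring requires might look like this; `ToolDenied` below is a simplified stand-in for the pydantic-ai class, and `dispatch` is a hypothetical caller such as a sandbox harness:

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ToolDenied:
    message: str = 'The tool call was denied.'


def dispatch(result: Any) -> Any:
    """Handle a handle_call()-style return value of type `Any | ToolDenied`.

    Callers that skip this isinstance check would silently treat the
    denial message as a successful tool result.
    """
    if isinstance(result, ToolDenied):
        # Surface the denial as an error, not a normal return value.
        raise RuntimeError(f'Tool call denied: {result.message}')
    return result
```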
| hooks = Hooks()

| @hooks.on.handle_deferred_tool_calls
Would this read nicer as on.deferred_tool_calls?
|     approved: bool = False,
|     metadata: Any = None,
| ) -> Any:
| ) -> Any | ToolDenied:
Would it make sense to list ToolReturn[Any] in these types of cases as well? And I prefer to end with Any and start specific, here and elsewhere.
…Any]` first
- Drop the `handle_` prefix from the `Hooks` registration to match the
convention used elsewhere (e.g. `wrap_run` → `on.run`, `wrap_tool_execute`
→ `on.tool_execute`). The internal registry key (and the underlying
`AbstractCapability.handle_deferred_tool_calls` method name) stay the
same — only the user-facing hook name changes:
`@hooks.on.deferred_tool_calls` / `Hooks(deferred_tool_calls=...)`
- Spell out the `handle_call` / `_resolve_single_deferred` return types
as `ToolDenied | ToolReturn[Any] | Any` — specific variants first, `Any`
last — so the documented `ToolReturn` wrapper case is visible in the
signature too.
…andleDeferredToolCalls` (#220)

* feat(code_mode): resolve deferred tool calls via HandleDeferredToolCalls

  Tools with `kind='external'` or `'unapproved'` (and tools that raise ApprovalRequired/CallDeferred at runtime) are no longer excluded from the sandbox and promoted back to native tools. They now take the normal sandboxed path, and a HandleDeferredToolCalls capability on the agent can resolve them inline — so the model sees the resolved return value instead of having the deferral bounce out as a separate native tool call.

  - Remove the td.defer filter in _partition_callable_tools (no more native fallback for deferred tools).
  - Drop the native_fallbacks return value and the corresponding deferred-tool warning.
  - Update the sandbox UserError message when no handler is configured to point users at HandleDeferredToolCalls.
  - Update the deferred_execution test to assert sandbox inclusion and the approval-retry test to match the new error message.

  Depends on pydantic/pydantic-ai#5142 landing and being released; once it does, bump the pydantic-ai-slim lower bound.

* code_mode: record outcome='denied' on nested ToolReturnPart for handler denials

  When the `HandleDeferredToolCalls` handler denies a tool call, `handle_call` now raises `ToolDeniedError` (on pydantic-ai-slim once released) instead of returning the denial message as a plain string. CodeMode catches it, records a `ToolReturnPart(outcome='denied')` in `nested_returns` so message history reflects the denial correctly, and re-raises so the sandbox surfaces the denial as an exception rather than as what would look like a successful tool return.

  The `ToolDeniedError` import is gated behind a compat shim so this module still loads against the currently released pydantic-ai-slim (which lacks the exception); the shim resolves to a placeholder class that never matches a real exception, leaving the except clause inert until a release ships `ToolDeniedError`.

  Depends on pydantic/pydantic-ai#5142.
* code_mode: handle denials via `ToolDenied` return value, not exception

  `ToolManager.handle_call` no longer raises a (now-removed) `ToolDeniedError` on handler denial; it returns the `ToolDenied` value the handler produced. Drop the compat shim, import `ToolDenied` directly, and switch the dispatch to inspect the return value: record the denial as `outcome='denied'` on the nested `ToolReturnPart` and raise a `RuntimeError` inside the sandbox so the script can't mistake the denial message for a regular string return.

* Bump pydantic-ai-slim lock + cover denial path

  Now that the slim PR has merged to main, refresh the lockfile to pick up the `HandleDeferredToolCalls` capability and `handle_call`'s `ToolDenied` return value. Add a denial test that asserts the denied-call flow surfaces as `ModelRetry` with the original denial message preserved in the trace.

  Notes on the test:
  - The handler returns `ToolDenied('nope')`; the harness records `outcome='denied'` on the nested `ToolReturnPart` and raises `RuntimeError` inside the sandbox.
  - The script doesn't catch the RuntimeError, so Monty surfaces it as `MontyRuntimeError`, which the harness converts back to `ModelRetry`. The retry message preserves the denial message so the model knows what went wrong.

* ci: test against floor pydantic-ai-slim (1.80.0) in addition to main

  The default `test` matrix uses the `[tool.uv.sources]` override pinning slim to its main branch, so it never exercises the published-PyPI install path. Add a `test-floor` job that overrides slim to the lowest version declared in `pyproject.toml` (>=1.80.0) and runs the test suite, so we catch any accidental dependency on unreleased slim features in code paths that should be backward-compatible.

  Gate the new HandleDeferredToolCalls denial test with `pytest.skip` when the capability isn't importable; it is currently the only test that requires a post-1.80.0 slim, but the pattern can be reused if more land later.
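The return-value dispatch from the first commit above can be sketched as follows. `ToolDenied` here is a local stand-in for the pydantic-ai class, and the dispatch function and its nested-returns bookkeeping are hypothetical:

```python
class ToolDenied:
    """Local stand-in for pydantic-ai's ToolDenied return value."""
    def __init__(self, message: str):
        self.message = message

def dispatch_sandbox_result(result, nested_returns: list) -> object:
    # Inspect the return value instead of catching an exception: a denial
    # is recorded as outcome='denied' for message history, then raised
    # inside the sandbox so the script can't treat it as a normal string.
    if isinstance(result, ToolDenied):
        nested_returns.append({'outcome': 'denied', 'content': result.message})
        raise RuntimeError(f'tool call denied: {result.message}')
    return result
```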
* fix coverage: pragma the floor-only skip in the denial test

  The `except ImportError → pytest.skip` branch only fires when running against the slim floor (1.80.0), where `HandleDeferredToolCalls` doesn't exist yet. The default test matrix runs against slim main, so coverage counted those two lines as uncovered. Mark the branch `# pragma: no cover` since it's an explicit skip path that the floor-slim CI job exercises but isn't included in the coverage report (the floor job doesn't gate on coverage by design).
Awesome, curious to know the timeframe of the next release so we can migrate the Agent Runtimes tool approvals implementation datalayer/agent-runtimes#81
…ool_calls` hook (pydantic#5142)

Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>
…e_single_deferred`; drop unused `TestDeps`

Auto-review caught a pre-existing latent bug from #5142: `_resolve_single_deferred` re-validates and re-executes the approved tool without forwarding `wrap_validation_errors`, so a sandboxed caller using `handle_call(wrap_validation_errors=False)` on a tool that requires approval would see post-approval errors wrapped as `ToolRetryError` and consume the retry budget, contradicting the caller's intent. Fix by threading the flag through.

Handler-constructed retry signals (`ModelRetry` / `RetryPromptPart` returned by the handler) still surface as `ToolRetryError` regardless; those are handler outputs, not exceptions raised by validation or the tool body.

Also addresses the nit on the new test using an unused `TestDeps` dataclass; it now matches the rest of the file, which uses `None` as deps.
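The fix pattern is simply to forward the caller's flag rather than fall back to the default. A minimal sketch with hypothetical names and simplified signatures (the real pydantic-ai functions take more arguments):

```python
class ToolRetryError(Exception):
    """Stand-in for the retry-budget-consuming wrapper error."""

def _resolve_single_deferred(call, *, wrap_validation_errors: bool):
    try:
        return call()
    except ValueError as exc:
        if wrap_validation_errors:
            # Default path: wrap so the error consumes the retry budget.
            raise ToolRetryError(str(exc)) from exc
        raise  # caller opted out: surface the raw error instead

def handle_call(call, *, wrap_validation_errors: bool = True):
    # Before the fix, the flag was dropped here and the post-approval
    # path always used the wrapping default.
    return _resolve_single_deferred(call, wrap_validation_errors=wrap_validation_errors)
```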
Summary
- `handle_deferred_tool_calls` hook on `AbstractCapability` for inline resolution of deferred tool calls (approval-required and externally-executed) during agent runs
- `HandleDeferredToolCalls` capability that wraps a user handler function
- ToolManager owns all deferred resolution: `resolve_deferred_tool_calls()`, `execute_deferred_tool_results()`, and a `handle_call(resolve_deferred=True)` default
- Accumulation dispatch in `CombinedCapability`: each capability resolves what it can, remaining passed to next
- Helper methods: `DeferredToolRequests.build_results()`, `.remaining()`, `DeferredToolResults.merge()`
- Drop construction-time `requires_approval` check (runtime check catches it)

Closes #3959
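The accumulation dispatch in `CombinedCapability` can be modeled in a few lines. The types below are illustrative stand-ins (each capability is reduced to a function from pending call ids to the subset it resolved), not the released API:

```python
from typing import Callable

# A "capability" here is a function mapping pending call_ids to the subset
# it resolved (call_id -> approved?). The real PR classes are richer.
Capability = Callable[[set[str]], dict[str, bool]]

def dispatch(capabilities: list[Capability], pending: set[str]) -> tuple[dict[str, bool], set[str]]:
    """Each capability resolves what it can; the remainder passes to the next."""
    resolved: dict[str, bool] = {}
    remaining = set(pending)
    for cap in capabilities:
        results = cap(set(remaining))
        resolved.update(results)
        remaining -= results.keys()
        if not remaining:
            break  # everything resolved, skip later capabilities
    return resolved, remaining
```

Anything left in `remaining` after the loop is what the agent run would still surface as an unresolved deferral.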
Test plan
- `HandleDeferredToolCalls`: approve, deny, no-output-type-needed, fallback, partial resolution, sync handler, accumulation dispatch, unresolved error
- `test_tool_requires_approval_error` → `test_tool_requires_approval_no_output_type` (construction-time check removed)

🤖 Generated with Claude Code