feat: add HandleDeferredToolCalls capability and handle_deferred_tool_calls hook#5142

Merged
DouweM merged 22 commits into main from deferred-tool-handler-2
Apr 24, 2026

Collaborator

@DouweM DouweM commented Apr 17, 2026

Summary

  • Adds handle_deferred_tool_calls hook on AbstractCapability for inline resolution of deferred tool calls (approval-required and externally-executed) during agent runs
  • Adds HandleDeferredToolCalls capability that wraps a user handler function
  • ToolManager owns all deferred resolution logic: resolve_deferred_tool_calls(), execute_deferred_tool_results(), and handle_call(resolve_deferred=True) default
  • Accumulation dispatch in CombinedCapability: each capability resolves what it can, remaining passed to next
  • Helper methods: DeferredToolRequests.build_results(), .remaining(), DeferredToolResults.merge()
  • Drops construction-time requires_approval check (runtime check catches it)
  • Replaces #4981 (Add DeferredToolHandler capability and DeferredToolRequestsPending exception) with a simpler approach that doesn't require pydantic-graph changes and naturally extends to nested scenarios (CodeMode, subagents)
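
The accumulation dispatch described in the summary can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `Requests`, `Results`, `dispatch`, and both handlers are hypothetical stand-ins that track only tool call IDs.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Requests:
    """Hypothetical stand-in for DeferredToolRequests: pending tool call IDs."""

    call_ids: list[str] = field(default_factory=list)


@dataclass
class Results:
    """Hypothetical stand-in for DeferredToolResults: decisions keyed by tool call ID."""

    resolved: dict[str, object] = field(default_factory=dict)


Handler = Callable[[Requests], Results]


def dispatch(handlers: list[Handler], requests: Requests) -> tuple[Results, Optional[Requests]]:
    """Offer pending calls to each handler in order; each sees only what is still unresolved."""
    combined = Results()
    pending = requests
    for handler in handlers:
        partial = handler(pending)
        combined.resolved.update(partial.resolved)
        left = [call_id for call_id in pending.call_ids if call_id not in combined.resolved]
        if not left:
            return combined, None
        pending = Requests(call_ids=left)
    return combined, pending


def approve_reads(requests: Requests) -> Results:
    # Resolves only the calls it recognizes and ignores the rest.
    return Results({i: True for i in requests.call_ids if i.startswith('read')})


def deny_deletes(requests: Requests) -> Results:
    return Results({i: False for i in requests.call_ids if i.startswith('delete')})


results, remaining = dispatch(
    [approve_reads, deny_deletes], Requests(['read-1', 'delete-1', 'send-1'])
)
```

In this sketch `send-1` is resolved by neither handler, so it is what would be left over for the caller.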

Closes #3959

Test plan

  • 8 tests for HandleDeferredToolCalls: approve, deny, no-output-type-needed, fallback, partial resolution, sync handler, accumulation dispatch, unresolved error
  • Updated test_tool_requires_approval_error → test_tool_requires_approval_no_output_type (construction-time check removed)
  • All 765 existing tests pass

🤖 Generated with Claude Code

…ool_calls` hook

Add a capability hook for inline resolution of deferred tool calls
(approval-required and externally-executed) during agent runs.

- New `handle_deferred_tool_calls` hook on `AbstractCapability` with
  accumulation dispatch in `CombinedCapability` (each capability resolves
  what it can, remaining passed to next)
- New `HandleDeferredToolCalls` capability wrapping a user handler function
- ToolManager owns all deferred resolution: `resolve_deferred_tool_calls()`,
  `execute_deferred_tool_results()`, and `handle_call(resolve_deferred=True)`
  default for automatic inline resolution
- `process_tool_calls` uses ToolManager methods with loop for re-raised deferrals
- Helper methods: `DeferredToolRequests.build_results()`, `.remaining()`,
  `DeferredToolResults.merge()`
- Drop construction-time `requires_approval` check (runtime check catches it)

Closes #3959

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
        continue
    except (CallDeferred, ApprovalRequired):
        # Tool re-raised deferral: it goes back to unresolved
        continue
Contributor

Bug: when an approved tool re-raises CallDeferred or ApprovalRequired during execution, the call is silently lost. The continue here skips adding it to executed, but remaining = requests.remaining(results) at line 757 also excludes it because the call's ID is in results.approvals. So the call ends up in neither executed nor remaining.

The comment says "it goes back to unresolved" but the code doesn't do that. The resolution loop in _agent_graph.py won't see the call again because it iterates over deferred_tool_requests (the remaining from this method).

To fix, track re-raised call IDs and exclude them from resolved_ids when computing remaining, or build remaining by subtracting only the IDs that were actually executed or denied (not just those the handler claimed to handle).
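
A minimal sketch of the second option, assuming nothing about the real `ToolManager` internals: `Deferred`, `execute_approved`, and `run_tool` are hypothetical names, and the point is only that `remaining` is derived from what actually finished rather than from what the handler claimed to resolve.

```python
class Deferred(Exception):
    """Hypothetical stand-in for CallDeferred / ApprovalRequired."""


def execute_approved(pending_ids, approved_ids, run_tool):
    """Run approved calls and return (executed results, IDs still unresolved)."""
    executed = {}
    for call_id in pending_ids:
        if call_id not in approved_ids:
            continue
        try:
            executed[call_id] = run_tool(call_id)
        except Deferred:
            # Re-raised deferral: deliberately not recorded as executed.
            continue
    # Subtract only the calls that actually finished, so a re-deferred call
    # cannot end up in neither list.
    remaining = [call_id for call_id in pending_ids if call_id not in executed]
    return executed, remaining


def run_tool(call_id):
    if call_id == 'b':
        raise Deferred()
    return f'result-{call_id}'


executed, remaining = execute_approved(['a', 'b', 'c'], {'a', 'b'}, run_tool)
```

Here `b` was approved but deferred again during execution, and `c` was never approved; both stay in `remaining`.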

Comment thread pydantic_ai_slim/pydantic_ai/tools.py Outdated
    metadata: dict[str, dict[str, Any]] | None = None,
) -> DeferredToolResults:
    """Create a [`DeferredToolResults`][pydantic_ai.tools.DeferredToolResults] for these requests."""
    return DeferredToolResults(approvals=approvals or {}, calls=calls or {}, metadata=metadata or {})
Contributor

build_results doesn't reference self at all — it doesn't validate that the provided IDs match this request's calls/approvals, nor does it pre-populate from the request. As-is it's just DeferredToolResults(approvals=approvals or {}, ...) and could be a @staticmethod or omitted entirely.

If the intent is a convenience builder, it would be more useful if it actually used self — e.g. to auto-approve all calls: requests.build_results(approve_all=True). Otherwise this adds surface area to the public API without adding value over constructing DeferredToolResults directly.

Comment thread pydantic_ai_slim/pydantic_ai/tools.py Outdated
    approvals=[c for c in self.approvals if c.tool_call_id not in resolved_ids],
    metadata={k: v for k, v in self.metadata.items() if k not in resolved_ids},
)
return remaining if remaining.calls or remaining.approvals else None  # pyright: ignore[reportReturnType]
Contributor

The pyright: ignore[reportReturnType] here is because the return type is Self | None but the method constructs a plain DeferredToolRequests(...) which isn't Self if this class is subclassed. Since DeferredToolRequests isn't designed for subclassing (it's kw_only=True dataclass), the simpler fix is to use DeferredToolRequests | None as the return type instead of Self | None, avoiding the type suppression.

# Loop to handle cases where re-execution raises new deferrals that the
# same handler chain can resolve (e.g. a capability that registers
# approval-required tools and also handles them).
max_resolution_rounds = 5
Contributor

5 is a magic number — extract it to a module-level constant (e.g. _MAX_DEFERRED_RESOLUTION_ROUNDS = 5).

Also, when the limit is hit (the for/else path at line 1575), the remaining deferred calls are silently treated as unresolved. This could mask infinite-loop bugs during development. Consider logging a warning when the max is reached so users can diagnose unexpected deferred-tool-as-output behavior.
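
A rough sketch of both suggestions together, using only the standard library. `_MAX_DEFERRED_RESOLUTION_ROUNDS` is the name proposed above; `resolve_in_rounds` and its `resolve_once` callback are hypothetical and do not mirror the actual loop in `_agent_graph.py`.

```python
import warnings

_MAX_DEFERRED_RESOLUTION_ROUNDS = 5  # module-level constant instead of an inline 5


def resolve_in_rounds(pending, resolve_once):
    """Call resolve_once until nothing is pending or the round limit is reached."""
    for _ in range(_MAX_DEFERRED_RESOLUTION_ROUNDS):
        if not pending:
            break
        pending = resolve_once(pending)
    else:
        # for/else: this branch runs only when the loop used up every round.
        if pending:
            warnings.warn(
                f'{len(pending)} deferred tool call(s) still unresolved after '
                f'{_MAX_DEFERRED_RESOLUTION_ROUNDS} resolution rounds',
                stacklevel=2,
            )
    return pending


# Each round resolves one call, so this finishes well within the limit.
done = resolve_in_rounds(['x', 'y'], lambda pending: pending[1:])
```

A handler that never makes progress would exhaust all rounds and trigger the warning instead of failing silently.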

Comment thread pydantic_ai_slim/pydantic_ai/tools.py Outdated
"""Metadata for deferred tool calls, keyed by `tool_call_id`. Each value will be available in the tool's RunContext as `tool_call_metadata`."""

def merge(self, other: DeferredToolResults) -> None:
    """Merge another `DeferredToolResults` into this one in-place."""
Contributor

Naming nit: merge as an in-place mutating method is surprising on a dataclass. The standard library convention is update for in-place dict-like mutation (à la dict.update()). If this stays in-place, renaming to update would be more Pythonic.
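
For illustration, an in-place `update` that follows the `dict.update()` convention might look like the sketch below. `Results` is a hypothetical stand-in with two of the fields mentioned in this thread, not the real `DeferredToolResults`.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Results:
    """Hypothetical stand-in for DeferredToolResults."""

    approvals: dict[str, bool] = field(default_factory=dict)
    calls: dict[str, Any] = field(default_factory=dict)

    def update(self, other: 'Results') -> None:
        """Fold other into self in place; like dict.update, later values win and None is returned."""
        self.approvals.update(other.approvals)
        self.calls.update(other.calls)


first = Results(approvals={'a': True})
returned = first.update(Results(approvals={'a': False, 'b': True}, calls={'c': 42}))
```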

@github-actions github-actions Bot added the size: L (Large PR, 501-1500 weighted lines) and feature (New feature request, or PR implementing a feature) labels and removed the auto-review label on Apr 17, 2026

DouweM and others added 2 commits April 17, 2026 22:24
- Fix re-deferred calls being silently dropped: track re-deferred IDs
  and add them back to remaining requests
- Rename `DeferredToolResults.merge()` to `.update()` for Pythonic naming
- Fix `DeferredToolRequests.remaining()` return type: concrete type
- Extract `_MAX_DEFERRED_RESOLUTION_ROUNDS` constant with warning
- Fix empty DeferredToolResults causing unnecessary loop iterations
- Extract `_execute_approval_result` helper to reduce complexity

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
The loop was offering the same handler chain repeated chances to resolve
calls it already declined or that re-raised during execution. A single
resolution pass is simpler and more predictable: unresolved and
re-deferred calls go to remaining in one shot.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 5 new potential issues.

Comment on lines +1545 to +1554
# Let capability handlers resolve deferred calls inline (one shot).
# If re-execution raises new deferrals, they go into remaining.
handler_results = await tool_manager.resolve_deferred_tool_calls(deferred_tool_requests)
if handler_results is not None:
    executed, deferred_tool_requests = await tool_manager.execute_deferred_tool_results(
        deferred_tool_requests, handler_results
    )
    for _call_part, result_part in executed:
        output_parts.append(result_part)
        yield _messages.FunctionToolResultEvent(result_part)
Contributor

@devin-ai-integration devin-ai-integration Bot Apr 17, 2026

📝 Info: Handler-resolved batch path skips usage limit checks for tool execution

In _agent_graph.py:1597-1606, handler-resolved calls go through _call_tools for execution without a preceding usage-limit check (the check at line 1488 only covers calls_to_run). Additionally, _call_tool calls execute_tool_call without passing usage, so executed tools don't increment usage.tool_calls. However, this is consistent with the existing deferred-resume path (via UserPromptNode._handle_deferred_tool_results), which also goes through _call_tools → _call_tool without usage counting. The handler-resolved path is not a regression; it follows the same pattern as pre-existing deferred tool execution.

…erred metadata

- Fix external call ToolReturn.metadata being dropped in
  execute_deferred_tool_results
- Fix ToolReturn.content (user prompt content) being dropped in
  _execute_approval_result — now returns user content alongside result
  part, and _agent_graph appends UserPromptPart
- Fix re-deferred calls losing their new exception metadata — now
  captures and carries metadata from the re-raised exception

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 2 new potential issues.

Comment on lines +1548 to +1556
if handler_results is not None:
    executed, deferred_tool_requests = await tool_manager.execute_deferred_tool_results(
        deferred_tool_requests, handler_results
    )
    for _call_part, result_part, user_content in executed:
        output_parts.append(result_part)
        if user_content:
            output_parts.append(_messages.UserPromptPart(content=user_content))
        yield _messages.FunctionToolResultEvent(result_part, content=user_content)
Contributor

@devin-ai-integration devin-ai-integration Bot Apr 17, 2026

🚩 Missing FunctionToolCallEvent for non-ToolApproved handler-resolved calls in batch path

In the batch handler path (_agent_graph.py:1573-1606), FunctionToolCallEvents are only emitted for ToolApproved calls that reach the validation step (line 1590). For non-ToolApproved handler results (ToolDenied, ModelRetry, RetryPromptPart, ToolReturn), no FunctionToolCallEvent is emitted before the FunctionToolResultEvent produced by _call_tools. This contrasts with the UserPromptNode resume path (line 1484-1486) where non-ToolApproved deferred results get FunctionToolCallEvent(call) without args_valid. This is a minor inconsistency in event emission that could affect streaming consumers that expect paired call/result events, but is unlikely to cause practical issues since these calls were already deferred (and got their initial call event during the first processing pass).


DouweM and others added 3 commits April 17, 2026 23:36
- External call (CallDeferred) with ToolReturn metadata
- handle_call(resolve_deferred=True) via nested tool manager (CodeMode pattern)
- handle_call re-raise when no handler available
- build_results() helper
- WrapperCapability delegation of handle_deferred_tool_calls

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…a, batch deny

- External call with plain value (not ToolReturn) — covers non-ToolReturn branch
- Re-deferred tool with metadata — covers re_deferred_ids + metadata tracking
- Batch denial — covers ToolDenied branch in execute_deferred_tool_results

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
When an approved tool re-raises as CallDeferred (instead of
ApprovalRequired), place it in remaining.calls, not remaining.approvals.
The re-raised exception type determines the category.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
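
The categorization rule in the commit message above can be sketched as follows. The two exception classes are local stand-ins that merely share names with the pydantic-ai exceptions, and `run_and_categorize` is a hypothetical helper, not the code from this PR.

```python
class CallDeferred(Exception):
    """Local stand-in: the tool wants its result supplied externally."""


class ApprovalRequired(Exception):
    """Local stand-in: the tool wants (another) approval."""


def run_and_categorize(approved_tools):
    """Execute approved tools; route re-raised deferrals by the exception type raised now."""
    executed, remaining_calls, remaining_approvals = {}, [], []
    for call_id, tool in approved_tools.items():
        try:
            executed[call_id] = tool()
        except CallDeferred:
            remaining_calls.append(call_id)  # now an external call
        except ApprovalRequired:
            remaining_approvals.append(call_id)  # still needs approval
    return executed, remaining_calls, remaining_approvals


def ok():
    return 'ok'


def needs_external():
    raise CallDeferred()


def needs_approval_again():
    raise ApprovalRequired()


executed, calls, approvals = run_and_categorize(
    {'t1': ok, 't2': needs_external, 't3': needs_approval_again}
)
```

All three calls started out as approvals in this sketch, yet `t2` lands in `calls` because of what it raised during execution.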
DouweM and others added 2 commits April 18, 2026 00:07
…lse, per-call happy path

- Test get_serialization_name() returns None
- Test handle_call(resolve_deferred=True) per-call resolution via nested
  ToolManager (exercises _resolve_single_deferred happy path and
  _execute_approval_result execution)
- Test handle_call(resolve_deferred=False) propagates exceptions
- Fix re-deferred categorization by new exception type

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- Add explicit assertion that _resolve_single_deferred returns the
  correct tool result content
- Verify tool return visible in agent messages
- pragma: no cover on unreachable defensive raise in _resolve_single_deferred

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Address real coverage gaps (not an xdist issue as I initially thought):

- ToolReturn with metadata + user content through approval path
- Approved tool raising ModelRetry (RetryPromptPart handling)
- ToolApproved(override_args=...) override path
- handle_call path with ModelRetry (ToolRetryError propagation)
- handle_call path when handler resolves wrong ID (remaining non-empty)
- Re-deferred tool without metadata
- Re-deferred as CallDeferred (not ApprovalRequired) — categorization
- Mixed unresolved + re-deferred (remaining already populated)

Also add pragma: no cover on defensive `return None` branch that only fires
when root_capability/ctx is None (impossible via public API).

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

DouweM and others added 2 commits April 22, 2026 22:29
- Simplify handler_b in accumulation test (no redundant tool_name check;
  handler_a already resolved tool_a so handler_b only sees tool_b)
- Remove unused handler in resolve_deferred=False test
- Simplify ModelRetry test — tool only called once, always raises retry

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…ferral

When an approved tool re-raises with a different exception type or
metadata than the original (e.g. ApprovalRequired → CallDeferred),
the caller needs to see the new exception, not the stale original.
Reconstruct the exception from remaining.calls/approvals and metadata.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

…arts

Fix semantic inconsistency where `handle_call(resolve_deferred=True)` was
unwrapping `ToolReturn` wrappers (losing metadata and user content), while
the non-deferred `handle_call` path returned the raw tool result verbatim.

- New `DeferredCallOutcome` discriminated union: `DeferredCallSuccess`,
  `DeferredCallDenied`, `DeferredCallRetry`
- `execute_deferred_tool_results` now returns `[(ToolCallPart, outcome)]`
  carrying the raw tool result / denial / retry part
- New `build_tool_return_parts_from_outcomes` helper converts outcomes
  to `ToolReturnPart`s + user content for message history
- `_resolve_single_deferred` returns raw tool result, preserving
  `ToolReturn` wrapper for CodeMode-style callers
- `process_tool_calls` in `_agent_graph.py` composes the two methods

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
The handler receives the [`RunContext`][pydantic_ai.tools.RunContext] and the
[`DeferredToolRequests`][pydantic_ai.tools.DeferredToolRequests], and must return
[`DeferredToolResults`][pydantic_ai.tools.DeferredToolResults] with results for
some or all pending tool calls. Unresolved calls are passed to the next capability
Collaborator Author

Let's allow the handler to return None as well if it doesn't want to handle at all.


Called by [`ToolManager`][pydantic_ai.tool_manager.ToolManager] after tool calls raise
[`ApprovalRequired`][pydantic_ai.exceptions.ApprovalRequired] or
[`CallDeferred`][pydantic_ai.exceptions.CallDeferred].
Collaborator Author

Also when the model calls a tool that was specifically registered as unapproved/external. We should link to the deferred tools / approval / external tools docs here.

Comment thread pydantic_ai_slim/pydantic_ai/tools.py Outdated
    metadata: dict[str, dict[str, Any]] | None = None,
) -> DeferredToolResults:
    """Create a [`DeferredToolResults`][pydantic_ai.tools.DeferredToolResults] for these requests."""
    return DeferredToolResults(approvals=approvals or {}, calls=calls or {}, metadata=metadata or {})
Collaborator Author

This is an instance method because in the future I expect to add a context: object field to Request that is meant to be reproduced on Results verbatim, but I agree with an earlier auto-review comment that this method doesn't serve much purpose right now.

So let's make it serve some purpose and raise an error if a result is provided for a tool call ID that wasn't in the appropriate "requests list". We may have a check already somewhere that you can't provide ToolApproved for an external call that wants a result. I don't know if we similarly block (or should block) providing a tool result directly for a tool that was looking for approval. But either way some validation here would be useful.

Collaborator Author

An optional approve_all=True is also interesting. If that's provided along with approvals, it would apply to the rest?
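
One way the validation and `approve_all` ideas could fit together is sketched below. Everything here is an assumption for illustration: `Requests` tracks only IDs, the result is a plain dict, and `ValueError` stands in for whatever error type the library would actually raise.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Requests:
    """Hypothetical stand-in for DeferredToolRequests, holding only tool call IDs."""

    approvals: list[str] = field(default_factory=list)  # calls waiting for approval
    calls: list[str] = field(default_factory=list)  # calls waiting for an external result

    def build_results(
        self,
        approvals: Optional[dict[str, bool]] = None,
        calls: Optional[dict[str, Any]] = None,
        approve_all: bool = False,
    ) -> dict[str, dict[str, Any]]:
        approvals = dict(approvals or {})
        calls = dict(calls or {})
        # Reject results for IDs that are not pending in the matching list.
        for name, given, pending in (
            ('approvals', approvals, self.approvals),
            ('calls', calls, self.calls),
        ):
            unknown = sorted(set(given) - set(pending))
            if unknown:
                raise ValueError(f'{name} given for tool call IDs that are not pending: {unknown}')
        if approve_all:
            # Explicit decisions win; everything else awaiting approval is approved.
            for call_id in self.approvals:
                approvals.setdefault(call_id, True)
        return {'approvals': approvals, 'calls': calls}


requests = Requests(approvals=['a1', 'a2'], calls=['c1'])
results = requests.build_results(approvals={'a1': False}, approve_all=True)
```

In this sketch, passing an approval for `c1` (an external call) raises, which matches the "appropriate requests list" check asked for above.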



DeferredCallOutcome: TypeAlias = DeferredCallSuccess | DeferredCallDenied | DeferredCallRetry
"""Outcome of resolving a single deferred tool call."""
Collaborator Author

I see too much overlap here with these from tools.py:

DeferredToolApprovalResult: TypeAlias = Annotated[ToolApproved | ToolDenied, Discriminator('kind')]
"""Result for a tool call that required human-in-the-loop approval."""
DeferredToolCallResult: TypeAlias = Annotated[
    Annotated[ToolReturn, Tag('tool-return')]
    | Annotated[ModelRetry, Tag('model-retry')]
    | Annotated[RetryPromptPart, Tag('retry-prompt')],
    Discriminator(_deferred_tool_call_result_discriminator),
]
"""Result for a tool call that required external execution."""
DeferredToolResult = DeferredToolApprovalResult | DeferredToolCallResult
"""Result for a tool call that required approval or external execution."""

I don't think we need both? We should be able to use those?

capability handler. When `True` (default), if the tool raises
[`ApprovalRequired`][pydantic_ai.exceptions.ApprovalRequired] or
[`CallDeferred`][pydantic_ai.exceptions.CallDeferred], the handler is
called to resolve it. When `False`, the exceptions propagate to the caller.
Collaborator Author

I suppose the case where we set this to False is if we want to call the handler in a batch? Do we do that currently in the agent graph?

    raise ToolRetryError(outcome.retry)
if isinstance(outcome, DeferredCallDenied):
    # Consistent with the denial message surfacing as the tool return in message history
    return outcome.denied.message
Collaborator Author

Do we lose outcome=denied on the ToolReturnPart here, somehow? should we raise the exception?...

    deferred_tool_requests, handler_results
)
executed_parts = tool_manager.build_tool_return_parts_from_outcomes(executed_outcomes)
for _call_part, result_part, user_content in executed_parts:
Collaborator Author

I don't love the complicated list of tuples this method returns...

executed_outcomes, deferred_tool_requests = await tool_manager.execute_deferred_tool_results(
    deferred_tool_requests, handler_results
)
executed_parts = tool_manager.build_tool_return_parts_from_outcomes(executed_outcomes)
Collaborator Author

Do these 2 method calls need to be separate? Would a single helper method make sense?

    raise exceptions.UserError(
        'To use tools that require approval, add `DeferredToolRequests` to the list of output types for this agent.'
    )
super().add_tool(tool)
Collaborator Author

Could we stop overriding this method now?


agent = Agent(
    FunctionModel(llm),
    capabilities=[HandleDeferredToolCalls(handler=handle_deferred)],
Collaborator Author

We need docs where we currently introduce deferred/external/approved tools and how to work with DeferredToolRequests! This new approach is way simpler than what we had before. The old "stop the world/agent run" method still has its uses though, like with UI adapters where you need to return the pending tool call to the frontend, so that it can then start a new run that sends back the result/approval.

… DeferredCallOutcome

- Add DeferredToolResults.to_tool_call_results() and reuse it in UserPromptNode
  and the new inline resolution path instead of hand-rolling the approvals/calls
  → DeferredToolResult conversion in ToolManager
- Drop DeferredCallOutcome/Success/Denied/Retry, execute_deferred_tool_results,
  build_tool_return_parts_from_outcomes, and _execute_approval_result — the
  inline batch path now re-dispatches handler results through the existing
  _call_tools pipeline, so approvals/denials/retries/ToolReturn unwrapping all
  behave identically to the UserPromptNode resume flow
- Drop the resolve_deferred flag on handle_call (no production caller used False)
- Drop the no-op _AgentFunctionToolset.add_tool override
- Allow HandleDeferredToolCalls handler to return None to decline handling
- build_results validates tool call IDs match pending request lists and adds
  approve_all=True to auto-approve anything not explicitly specified
- Expand handle_deferred_tool_calls docstring to cover tools registered as
  unapproved/external, with links to the deferred-tools docs
- Add a "Resolving deferred calls inline" section to docs/deferred-tools.md and
  list HandleDeferredToolCalls in the built-in capabilities table

DouweM added 2 commits April 24, 2026 14:20
# Conflicts:
#	pydantic_ai_slim/pydantic_ai/_agent_graph.py
…inline validation

Coverage gaps in CI post-merge:
- tool_manager._resolve_single_deferred: approval via `bool False`, `ToolApproved(override_args=...)`,
  and external-call results for plain values / ModelRetry / RetryPromptPart
- _agent_graph inline-handler path: UnexpectedModelBehavior on approved-call validation

Added five via_handle_call tests covering the tool_manager branches; pragma'd the
_agent_graph defensive branch (mirrors the non-deferred validation path above — naturally
triggered there, requires exhausted retries + bad handler override_args here, which isn't
realistically reachable from tests without artificial state).
Comment thread docs/deferred-tools.md Outdated

Note that handling deferred tool calls requires `DeferredToolRequests` to be in the `Agent`'s [`output_type`](output.md#structured-output) so that the possible types of the agent run output are correctly inferred. If your agent can also be used in a context where no deferred tools are available and you don't want to deal with that type everywhere you use the agent, you can instead pass the `output_type` argument when you run the agent using [`agent.run()`][pydantic_ai.agent.AbstractAgent.run], [`agent.run_sync()`][pydantic_ai.agent.AbstractAgent.run_sync], [`agent.run_stream()`][pydantic_ai.agent.AbstractAgent.run_stream], or [`agent.iter()`][pydantic_ai.agent.Agent.iter]. Note that the run-time `output_type` overrides the one specified at construction time (for type inference reasons), so you'll need to include the original output type explicitly.

As an alternative to returning control to the caller, a [`HandleDeferredToolCalls`][pydantic_ai.capabilities.HandleDeferredToolCalls] capability can resolve deferred calls inline with a handler function, so the agent run continues in one step without needing to end and restart. See [Resolving deferred calls inline](#resolving-deferred-calls-inline).
Collaborator Author

I'd prefer to present this as the primary method of dealing with deferred tools, with both methods getting their own section but the handler method being introduced first.

elif approval is False:
    approval = ToolDenied()
if isinstance(approval, ToolDenied):
    return approval.message
Collaborator Author

It feels like we're missing setting ToolReturnPart.outcome='denied' here?

if approval is True:
    approval = ToolApproved()
elif approval is False:
    approval = ToolDenied()
Collaborator Author

There is some apparent duplication between here and to_tool_call_results. Inevitable?
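
If the duplication is avoidable, one option is a single normalizer that both paths call. The sketch below is hypothetical: `ToolApproved` and `ToolDenied` are local stand-ins rather than the classes from `tools.py`, and `normalize_approval` is an invented name.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class ToolApproved:
    """Local stand-in for the approval result."""


@dataclass
class ToolDenied:
    """Local stand-in for the denial result; the default message is an assumption."""

    message: str = 'The tool call was denied.'


Approval = Union[bool, ToolApproved, ToolDenied]


def normalize_approval(approval: Approval) -> Union[ToolApproved, ToolDenied]:
    """Turn the bool shorthand into the explicit result objects, in one place."""
    if approval is True:
        return ToolApproved()
    if approval is False:
        return ToolDenied()
    return approval


normalized = [normalize_approval(a) for a in (True, False, ToolDenied('not allowed'))]
```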

DouweM added 2 commits April 24, 2026 15:31
…handler

- Introduce `ToolDeniedError` exception (wraps `ToolDenied`). `_resolve_single_deferred`
  now raises it on denial instead of returning the denial message verbatim, so callers
  (e.g. CodeMode in the harness) can distinguish a denied call from a successful tool
  return and record a `ToolReturnPart(outcome='denied')` in message history.
- `_resolve_single_deferred` reuses `DeferredToolResults.to_tool_call_results()` for the
  bool/`ToolReturn` normalization it was duplicating, and adds a cross-reference comment
  to `_call_tool` since both paths must accept the same `DeferredToolResult` surface.
- Flip docs to lead with the handler approach as the recommended primary method;
  stop-the-world remains described as the UI-adapter pattern.
- Fill test spectrum gaps flagged in review:
  * Batch path denial tests now assert `outcome='denied'` on the resulting ToolReturnPart
  * Added batch tests for `approvals[id] = False`, default `ToolDenied()`, default
    `ToolApproved()`, handler-supplied external `ToolReturn(metadata,content)`,
    handler-supplied external `ModelRetry`, handler-supplied external `RetryPromptPart`
  * Added per-call test for handler-supplied external `ToolReturn` verbatim-preservation
# Conflicts:
#	docs/capabilities.md
#	pydantic_ai_slim/pydantic_ai/capabilities/__init__.py
DouweM added a commit to pydantic/pydantic-ai-harness that referenced this pull request Apr 24, 2026
…er denials

When the `HandleDeferredToolCalls` handler denies a tool call, `handle_call` now raises
`ToolDeniedError` (on pydantic-ai-slim once released) instead of returning the denial
message as a plain string. CodeMode catches it, records a
`ToolReturnPart(outcome='denied')` in `nested_returns` so message history reflects the
denial correctly, and re-raises so the sandbox surfaces the denial as an exception
rather than as what would look like a successful tool return.

The `ToolDeniedError` import is gated behind a compat shim so this module still loads
against the currently released pydantic-ai-slim (which lacks the exception); the shim
resolves to a placeholder class that never matches a real exception, leaving the except
clause inert until a release ships `ToolDeniedError`.

Depends on pydantic/pydantic-ai#5142.
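
The compat shim and the catch, record, re-raise flow described in this commit message could look roughly like the sketch below. The import path is deliberately fictional (the thread does not say which module exports `ToolDeniedError`), and `call_nested` with its plain-dict records is a hypothetical simplification of what CodeMode does.

```python
try:
    # Fictional module name, used only to show the shape of the shim.
    from some_future_release.exceptions import ToolDeniedError
except ImportError:

    class ToolDeniedError(Exception):
        """Placeholder: nothing else raises it, so `except ToolDeniedError` stays inert."""


def call_nested(tool, nested_returns):
    """Run a nested tool; record a denial in history, then let it surface as an exception."""
    try:
        return tool()
    except ToolDeniedError as exc:
        nested_returns.append({'outcome': 'denied', 'content': str(exc)})
        raise


def failing_tool():
    raise ValueError('unrelated failure')


history = []
try:
    call_nested(failing_tool, history)
except ValueError:
    pass  # unrelated errors pass through untouched and are not recorded as denials
```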

Comment thread docs/deferred-tools.md Outdated
1. Never reached here — the handler denies this call, so the model sees the denial message instead.
2. The handler supplies the result for this external call, so the tool body just signals the deferral.

If the handler declines to resolve some or all of the calls (by omitting them from the returned [`DeferredToolResults`][pydantic_ai.tools.DeferredToolResults] or returning `None`), those calls bubble up as a [`DeferredToolRequests`][pydantic_ai.output.DeferredToolRequests] output, and `DeferredToolRequests` must be in the agent's output type — so you can combine inline handling with the stop-the-world flow when it makes sense.
Collaborator Author

Above it says somewhere that the handler should provide a result for "each pending call", but that's too strict, because we do allow partial resolution, and having multiple capabilities each handle a subset of deferred tool calls.

By the way, if we haven't yet done that, we should also add docs for the new handle_... "hook" method on capabilities.md. And likely also hooks.md, making sure that it can be used by the Hooks capability, like most of the other methods on AbstractCapability are.

Collaborator Author

So "bubble up as output" is not completely accurate, because they can also be handled by another handler.

Collaborator Author

It's also worth explicitly specifying that there is a hook method on the capability class (by linking to the capabilities doc for example), if you want to use HITL/external function calls in a custom capability.
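
The partial-resolution behaviour discussed in this thread can be sketched roughly as follows. This is a simplified illustration, not the `CombinedCapability` implementation: pending calls and results are plain dicts keyed by tool call id rather than `DeferredToolRequests` / `DeferredToolResults`, and `dispatch`, `approve_deletes`, and `run_external` are hypothetical names.

```python
from typing import Callable, Optional

# Hypothetical stand-ins: plain dicts keyed by tool call id, not the
# library's DeferredToolRequests / DeferredToolResults types.
Pending = dict[str, str]     # tool_call_id -> tool name
Results = dict[str, object]  # tool_call_id -> resolved value
Handler = Callable[[Pending], Optional[Results]]


def dispatch(handlers: list[Handler], pending: Pending) -> tuple[Results, Pending]:
    """Offer the still-unresolved calls to each handler in turn."""
    resolved: Results = {}
    remaining = dict(pending)
    for handler in handlers:
        if not remaining:
            break
        partial = handler(remaining) or {}  # None means "resolved nothing"
        # Ignore ids the handler was not asked about.
        partial = {k: v for k, v in partial.items() if k in remaining}
        resolved.update(partial)
        remaining = {k: v for k, v in remaining.items() if k not in partial}
    return resolved, remaining


def approve_deletes(pending: Pending) -> Optional[Results]:
    # Resolves only the approval-style calls it knows about.
    return {cid: True for cid, name in pending.items() if name == 'delete_file'}


def run_external(pending: Pending) -> Optional[Results]:
    # Resolves only the externally executed calls; None if there are none.
    return {cid: 'done' for cid, name in pending.items() if name == 'send_email'} or None


calls = {'c1': 'delete_file', 'c2': 'send_email', 'c3': 'charge_card'}
resolved, remaining = dispatch([approve_deletes, run_external], calls)
```

Here `c3` is left in `remaining` because no handler claims it; in the PR's design that is the point at which unresolved calls would surface as a `DeferredToolRequests` output.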

return '\n'.join(lines)


class ToolDeniedError(Exception):
Collaborator Author

Having this as a public class makes me want to also handle it in places like we handle ModelRetry, as a way for a tool call function to explicitly raise/wrap an error as "this means you weren't allowed to do this" (in surfacing to a user or o11y backend), even if our native tool approval wasn't used. That's similar to the ToolFailed exception being introduced in #2586 which is worth reading. But I don't want to implement the full scope of that right now (unless you think it's really easy as we're working on this code anyway -- to allow tool functions anywhere to raise ToolFailed('...', reason='denied') which is registered as outcome='denied'? But I'd prefer not to bring that in scope, as that'd imply documenting it etc...).

Does this need to be public? Is there another convention we could find for the way a method indicates that it was a tool-denial, e.g. returning ToolDenied | Any and telling people to isinstance check it? We already pass A | B | Any around in a few places. Discuss with me please.

DouweM added 2 commits April 24, 2026 16:34
…ks pages

- `deferred-tools.md`: relax "each pending call" wording; clarify that
  unresolved calls can still be picked up by another handler in the chain
  before bubbling up; explicitly point custom-capability authors at the
  `handle_deferred_tool_calls` hook on `AbstractCapability`.
- `capabilities.md`: document `handle_deferred_tool_calls` alongside the
  other lifecycle hooks, with a link back to the dedicated capability for
  application code.
- `hooks.md`: add the new hook to the `Hooks` page with a runnable example.
- `Hooks`: add `handle_deferred_tool_calls` registration (decorator +
  constructor kwarg) so it can be used like every other capability hook.
…ndle_call`

Following review discussion: a denial isn't an exceptional condition, it's
an outcome of the deferred-tool handler. Returning the existing public
`ToolDenied` value (instead of raising a new public exception) reuses the
class that already participates in `DeferredToolApprovalResult`, keeps the
public surface smaller, and avoids implicitly inviting `ToolFailed`-style
"raise to mark a denial" patterns from arbitrary tool functions (the
broader scope being tracked separately in #2586).

Callers of `ToolManager.handle_call` must now `isinstance`-check the
return value to distinguish a denial from a successful tool result —
documented in the docstring. The harness `dispatch_tool_call` will raise
a sandbox-side exception when it sees `ToolDenied`, rather than surfacing
the marker into the user's script.

- Remove `ToolDeniedError` from `exceptions.py` and the top-level export
- `_resolve_single_deferred` returns `ToolDenied` instead of raising
- `handle_call` return annotation is now `Any | ToolDenied`; docstring
  spells out the `isinstance`-check requirement
- `result.py` casts the output-tool path (output tools can't be deferred)
- Tests updated to assert on the returned value, not a caught exception
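
A minimal sketch of the calling convention this commit describes, assuming stand-in types throughout: `ToolDenied` and `ApprovalRequired` below are local placeholders rather than the library classes, and `delete_file`, `handler`, `handle_call`, and `describe` are hypothetical. The point is only that a denial comes back as a value, so the caller has to `isinstance`-check it.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass
from typing import Any


@dataclass
class ToolDenied:
    """Stand-in for the library's `ToolDenied`; only `message` is modelled."""

    message: str = 'The tool call was denied.'


class ApprovalRequired(Exception):
    """Stand-in for the exception a tool raises to request approval."""


async def delete_file(path: str, approved: bool = False) -> str:
    if not approved:
        raise ApprovalRequired(path)
    return f'deleted {path}'


async def handler(path: str) -> bool | ToolDenied:
    # Hypothetical policy: only paths under /tmp/ may be deleted.
    if path.startswith('/tmp/'):
        return True
    return ToolDenied(f'refusing to delete {path}')


async def handle_call(path: str) -> ToolDenied | Any:
    try:
        return await delete_file(path)
    except ApprovalRequired:
        decision = await handler(path)
        if isinstance(decision, ToolDenied):
            return decision  # a denial is returned, not raised
        return await delete_file(path, approved=True)


async def describe(path: str) -> str:
    result = await handle_call(path)
    if isinstance(result, ToolDenied):  # the check callers must now perform
        return f'denied: {result.message}'
    return f'ok: {result}'


async def main() -> list[str]:
    return [await describe('/tmp/a.txt'), await describe('/etc/passwd')]


outcomes = asyncio.run(main())
```

A caller that skips the `isinstance` check would treat the `ToolDenied` object as an ordinary tool result, which is the risk the docstring requirement is meant to address.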
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 1 new potential issue.


Comment on lines 651 to +657
             approved=approved,
             metadata=metadata,
         )
-        return await self.execute_tool_call(validated)
+        try:
+            return await self.execute_tool_call(validated)
+        except (CallDeferred, ApprovalRequired) as exc:
+            return await self._resolve_single_deferred(call, exc)
Contributor

🚩 handle_call return type change is a public API contract change

The return type of ToolManager.handle_call changed from Any to Any | ToolDenied. While ToolManager is not a heavily used public API (primarily internal), it IS exposed on RunContext.tool_manager and the PR's own tests show users calling ctx.tool_manager.handle_call() from within tool functions. Existing callers that don't check for ToolDenied will silently treat the denial message string as a successful tool result. The docstring at tool_manager.py:634-639 documents this requirement clearly, and the only internal caller (result.py:209-216) is correctly handled with a cast and comment explaining output tools can't be deferred. This is a valid design trade-off (returning ToolDenied vs raising an exception), but worth noting in changelog/migration docs.

(Refers to lines 612-657)


Comment thread docs/hooks.md Outdated
hooks = Hooks()


@hooks.on.handle_deferred_tool_calls
Collaborator Author

Would this read nicer as on.deferred_tool_calls?

         approved: bool = False,
         metadata: Any = None,
-    ) -> Any:
+    ) -> Any | ToolDenied:
Collaborator Author

Would it make sense to list ToolReturn[Any] in these types of cases as well? And I prefer to end with Any and start specific, here and elsewhere.

…Any]` first

- Drop the `handle_` prefix from the `Hooks` registration to match the
  convention used elsewhere (e.g. `wrap_run` → `on.run`, `wrap_tool_execute`
  → `on.tool_execute`). The internal registry key (and the underlying
  `AbstractCapability.handle_deferred_tool_calls` method name) stay the
  same — only the user-facing hook name changes:
    `@hooks.on.deferred_tool_calls`  /  `Hooks(deferred_tool_calls=...)`
- Spell out the `handle_call` / `_resolve_single_deferred` return types
  as `ToolDenied | ToolReturn[Any] | Any` — specific variants first, `Any`
  last — so the documented `ToolReturn` wrapper case is visible in the
  signature too.
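
The naming convention above (a short user-facing hook name mapped onto an unchanged internal method name) can be illustrated with a toy registry. This is not how `Hooks` is implemented: `ToyHooks`, `_On`, and `PUBLIC_TO_INTERNAL` are hypothetical, and only the three name pairs are taken from the commit message.

```python
from typing import Any, Callable

# Public hook name -> internal method name. The three pairs come from the
# commit message; the registry around them is a hypothetical toy.
PUBLIC_TO_INTERNAL = {
    'run': 'wrap_run',
    'tool_execute': 'wrap_tool_execute',
    'deferred_tool_calls': 'handle_deferred_tool_calls',
}


class _On:
    """Turns attribute access (`on.<public_name>`) into a registering decorator."""

    def __init__(self, hooks: 'ToyHooks') -> None:
        self._hooks = hooks

    def __getattr__(self, public_name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
            return self._hooks._register(public_name, func)

        return decorator


class ToyHooks:
    def __init__(self, **funcs: Callable[..., Any]) -> None:
        # Keys of the registry are the internal names, which do not change.
        self.registry: dict[str, Callable[..., Any]] = {}
        self.on = _On(self)
        for public_name, func in funcs.items():
            self._register(public_name, func)

    def _register(self, public_name: str, func: Callable[..., Any]) -> Callable[..., Any]:
        if public_name not in PUBLIC_TO_INTERNAL:
            raise AttributeError(f'unknown hook: {public_name!r}')
        self.registry[PUBLIC_TO_INTERNAL[public_name]] = func
        return func


hooks = ToyHooks()


@hooks.on.deferred_tool_calls  # decorator form
def resolve(requests: Any) -> None:
    return None


via_kwarg = ToyHooks(deferred_tool_calls=resolve)  # constructor-kwarg form
```

Both registration forms end up under the same internal key, so renaming the public hook does not require touching whatever consumes the registry.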
@DouweM DouweM merged commit 2b0aa93 into main Apr 24, 2026
48 checks passed
@DouweM DouweM deleted the deferred-tool-handler-2 branch April 24, 2026 23:52
@DouweM DouweM mentioned this pull request Apr 25, 2026
6 tasks
DouweM added a commit to pydantic/pydantic-ai-harness that referenced this pull request Apr 25, 2026
…andleDeferredToolCalls` (#220)

* feat(code_mode): resolve deferred tool calls via HandleDeferredToolCalls

Tools with `kind='external'` or `'unapproved'` (and tools that raise
ApprovalRequired/CallDeferred at runtime) are no longer excluded from the
sandbox and promoted back to native tools. They now take the normal sandboxed
path, and a HandleDeferredToolCalls capability on the agent can resolve them
inline — so the model sees the resolved return value instead of having the
deferral bounce out as a separate native tool call.

- Remove the td.defer filter in _partition_callable_tools (no more native
  fallback for deferred tools).
- Drop the native_fallbacks return value and the corresponding deferred-tool
  warning.
- Update the sandbox UserError message when no handler is configured to point
  users at HandleDeferredToolCalls.
- Update the deferred_execution test to assert sandbox inclusion and the
  approval-retry test to match the new error message.

Depends on pydantic/pydantic-ai#5142 landing and being released; once it does,
bump the pydantic-ai-slim lower bound.

* code_mode: record outcome='denied' on nested ToolReturnPart for handler denials

When the `HandleDeferredToolCalls` handler denies a tool call, `handle_call` now raises
`ToolDeniedError` (on pydantic-ai-slim once released) instead of returning the denial
message as a plain string. CodeMode catches it, records a
`ToolReturnPart(outcome='denied')` in `nested_returns` so message history reflects the
denial correctly, and re-raises so the sandbox surfaces the denial as an exception
rather than as what would look like a successful tool return.

The `ToolDeniedError` import is gated behind a compat shim so this module still loads
against the currently released pydantic-ai-slim (which lacks the exception); the shim
resolves to a placeholder class that never matches a real exception, leaving the except
clause inert until a release ships `ToolDeniedError`.

Depends on pydantic/pydantic-ai#5142.

* code_mode: handle denials via `ToolDenied` return value, not exception

`ToolManager.handle_call` no longer raises a (now-removed) `ToolDeniedError`
on handler denial — it returns the `ToolDenied` value the handler produced.

Drop the compat shim, import `ToolDenied` directly, and switch the dispatch
to inspect the return value: record the denial as `outcome='denied'` on the
nested `ToolReturnPart` and raise a `RuntimeError` inside the sandbox so the
script can't mistake the denial message for a regular string return.
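
As a rough sketch of the dispatch behaviour described in this commit message: the harness inspects the returned value, records the denial, and raises inside the sandbox. The types below are stand-ins, not the harness code. `NestedReturn` is a hypothetical placeholder for a nested `ToolReturnPart`, the `'success'` default outcome is an assumption, and `Dispatcher` is an invented container.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolDenied:
    """Stand-in for the library's `ToolDenied`; only `message` is modelled."""

    message: str = 'The tool call was denied.'


@dataclass
class NestedReturn:
    """Hypothetical placeholder for a nested `ToolReturnPart`."""

    tool_name: str
    content: Any
    outcome: str = 'success'  # assumed default, not taken from the source


@dataclass
class Dispatcher:
    nested_returns: list[NestedReturn] = field(default_factory=list)

    def dispatch_tool_call(self, tool_name: str, result: Any) -> Any:
        """`result` is whatever `handle_call` returned for this call."""
        if isinstance(result, ToolDenied):
            # Record the denial so message history reflects it ...
            self.nested_returns.append(
                NestedReturn(tool_name, result.message, outcome='denied')
            )
            # ... and raise so the script cannot mistake it for a string return.
            raise RuntimeError(result.message)
        self.nested_returns.append(NestedReturn(tool_name, result))
        return result


dispatcher = Dispatcher()
ok = dispatcher.dispatch_tool_call('add', 3)
try:
    dispatcher.dispatch_tool_call('delete_file', ToolDenied('nope'))
    error = None
except RuntimeError as exc:
    error = str(exc)
```

Recording before raising matters here: the denial stays visible in `nested_returns` even though the script only ever sees an exception.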

* Bump pydantic-ai-slim lock + cover denial path

Now that the slim PR has merged to main, refresh the lockfile to pick up
the `HandleDeferredToolCalls` capability and `handle_call`'s `ToolDenied`
return value. Add a denial test that asserts the denied-call flow
surfaces as `ModelRetry` with the original denial message preserved in
the trace.

Notes on the test:
- The handler returns `ToolDenied('nope')`; the harness records
  `outcome='denied'` on the nested `ToolReturnPart` and raises
  `RuntimeError` inside the sandbox.
- The script doesn't catch the RuntimeError, so Monty surfaces it as
  `MontyRuntimeError`, which the harness converts back to `ModelRetry`.
  The retry message preserves the denial message so the model knows
  what went wrong.

* ci: test against floor pydantic-ai-slim (1.80.0) in addition to main

The default `test` matrix uses the `[tool.uv.sources]` override pinning
slim to its main branch, so it never exercises the published-PyPI install
path. Add a `test-floor` job that overrides slim to the lowest version
declared in `pyproject.toml` (>=1.80.0) and runs the test suite, so we
catch any accidental dependency on unreleased slim features in code paths
that should be backward-compatible.

Gate the new HandleDeferredToolCalls denial test with `pytest.skip` when
the capability isn't importable — currently the only test that requires a
post-1.80.0 slim, but the pattern can be reused if more land later.

* fix coverage: pragma the floor-only skip in the denial test

The `except ImportError → pytest.skip` branch only fires when running
against the slim floor (1.80.0) where `HandleDeferredToolCalls` doesn't
exist yet. The default test matrix runs against slim main, so coverage
counted those two lines as uncovered.

Mark the branch `# pragma: no cover` since it's an explicit skip path
that the floor-slim CI job exercises but isn't included in the coverage
report (the floor job doesn't gate on coverage by design).
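
The floor-version gating described in the last two entries can be sketched along these lines. The commits use `pytest.skip` inside an `except ImportError` branch; to stay within the standard library this sketch substitutes `unittest.SkipTest`, which pytest generally also reports as a skip. `require_attr` is a hypothetical helper, and `json` stands in for the real dependency.

```python
import importlib
import unittest
from typing import Any


def require_attr(module_name: str, attr: str) -> Any:
    """Return `module.attr`, or skip the calling test if it is unavailable."""
    try:
        module = importlib.import_module(module_name)
        return getattr(module, attr)
    except (ImportError, AttributeError):  # pragma: no cover
        # Only reached on an install that predates the feature, so the
        # default coverage run would otherwise count these lines as missed.
        raise unittest.SkipTest(f'{module_name}.{attr} is not available in this install')


# A feature that exists is returned as-is.
dumps = require_attr('json', 'dumps')

# A feature that is missing turns into a skip rather than a failure.
try:
    require_attr('json', 'no_such_feature')
    skipped = None
except unittest.SkipTest as exc:
    skipped = str(exc)
```

Scoping the `# pragma: no cover` comment to the skip branch alone keeps the rest of the test under coverage measurement.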
@echarles
Contributor

echarles commented Apr 25, 2026

Awesome, curious to know the timeframe of the next release so we can migrate the Agent Runtimes tool approvals implementation datalayer/agent-runtimes#81

Alex-Resch pushed a commit to Alex-Resch/pydantic-ai that referenced this pull request Apr 29, 2026
…ool_calls` hook (pydantic#5142)

Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>
DouweM added a commit that referenced this pull request May 1, 2026
…e_single_deferred`; drop unused `TestDeps`

Auto-review caught a pre-existing latent bug from #5142: `_resolve_single_deferred`
re-validates and re-executes the approved tool without forwarding
`wrap_validation_errors`, so a sandboxed caller using
`handle_call(wrap_validation_errors=False)` on a tool that requires approval would
see post-approval errors wrap as `ToolRetryError` and consume the retry budget —
inconsistent with the caller's intent. Fix by threading the flag through.

Handler-constructed retry signals (`ModelRetry` / `RetryPromptPart` returned by
the handler) still surface as `ToolRetryError` regardless — those are handler
outputs, not exceptions raised by validation or the tool body.

Also addresses the nit on the new test using an unused `TestDeps` dataclass —
matches the rest of the file using `None` as deps.
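
The bug and its fix can be illustrated with a small stand-in, assuming nothing about the real `ToolManager` internals. `ValidationError` and `ToolRetryError` are local placeholder exceptions, `validate`, `resolve_single_deferred`, and `handle_call` are simplified hypothetical functions, and `forward_flag` exists only to toggle between the pre-fix and post-fix behaviour.

```python
from typing import Any


class ValidationError(Exception):
    """Stand-in for a raw argument-validation failure."""


class ToolRetryError(Exception):
    """Stand-in for a validation failure wrapped as a retry signal."""


def validate(args: dict[str, Any], wrap_validation_errors: bool = True) -> dict[str, Any]:
    try:
        if not isinstance(args.get('path'), str):
            raise ValidationError('path must be a string')
        return args
    except ValidationError as exc:
        if wrap_validation_errors:
            raise ToolRetryError(str(exc)) from exc
        raise


def resolve_single_deferred(args: dict[str, Any], *, wrap_validation_errors: bool = True) -> dict[str, Any]:
    # After approval the call is re-validated, so the flag has to reach here.
    return validate(args, wrap_validation_errors=wrap_validation_errors)


def handle_call(
    args: dict[str, Any], *, wrap_validation_errors: bool = True, forward_flag: bool = True
) -> dict[str, Any]:
    # Pretend the first attempt was deferred and then approved by a handler.
    if forward_flag:
        return resolve_single_deferred(args, wrap_validation_errors=wrap_validation_errors)
    # Pre-fix behaviour: the flag is dropped and the default (True) applies.
    return resolve_single_deferred(args)


def error_type(forward_flag: bool) -> str:
    try:
        handle_call({'path': 42}, wrap_validation_errors=False, forward_flag=forward_flag)
    except Exception as exc:
        return type(exc).__name__
    return 'no error'


before_fix = error_type(forward_flag=False)
after_fix = error_type(forward_flag=True)
```

With the flag dropped, the caller who asked for unwrapped errors still gets the wrapped `ToolRetryError`; threading it through restores the raw `ValidationError`.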

Labels

feature New feature request, or PR implementing a feature (enhancement) size: L Large PR (501-1500 weighted lines)

Development

Successfully merging this pull request may close these issues.

deferred_handler for Inline Resolution of Deferred Tools

2 participants