Conversation
Add {before,wrap,after,on_error}_{validate,execute}_output hooks to
AbstractCapability, enabling pre-parse normalization, error recovery,
and post-processing of model output across all output types.
Output hooks fire for text, structured text, and tool-based output.
For tool output, they fire inside the tool execution pipeline (tool
hooks are the outer layer, output hooks the inner layer).
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
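The tool-outer / output-inner layering described above can be sketched with plain async wrappers. This is a minimal illustration of the nesting order, not pydantic-ai's actual implementation; all names here (`layer`, `execute`, `pipeline`) are hypothetical:

```python
import asyncio
from typing import Awaitable, Callable

Handler = Callable[[str], Awaitable[str]]
calls: list[str] = []

def layer(name: str, inner: Handler) -> Handler:
    # Wrap `inner` so its work happens between this layer's before/after markers.
    async def wrapped(payload: str) -> str:
        calls.append(f'{name}:before')
        result = await inner(payload)
        calls.append(f'{name}:after')
        return result
    return wrapped

async def execute(payload: str) -> str:
    calls.append('execute')
    return payload.upper()

# Tool hooks are the outer layer, output hooks the inner layer.
pipeline = layer('tool_hooks', layer('output_hooks', execute))
assert asyncio.run(pipeline('ok')) == 'OK'
assert calls == ['tool_hooks:before', 'output_hooks:before', 'execute', 'output_hooks:after', 'tool_hooks:after']
```

The ordering assertion shows why "inner" matters: the output hooks fire closest to the raw execution, and the tool hooks see the output hooks' result.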
@dataclass
class OutputContext:
`OutputContext` is defined in `pydantic_ai._output` (a private module) but is used as a parameter type in the public `AbstractCapability` hook signatures. Users who want to subclass `AbstractCapability` and implement output hooks will need to import `OutputContext`, but they'd have to reach into `pydantic_ai._output` to do so.
It should be re-exported from `pydantic_ai.capabilities` (added to `__init__.py` and `__all__`) alongside the other public types like `RawOutput`, `WrapOutputValidateHandler`, etc.
Yeah this belongs in the public pydantic_ai.output module
self,
ctx: RunContext[AgentDepsT],
*,
input: RawOutput,
`input` shadows the Python builtin `input()`. The tool hooks use `args` for the input side and `result` for the output side — consider a name consistent with those, e.g. `validated_output`, or keep the parameter name `output` and call the result `result` (matching `after_tool_execute`'s `args`/`result` pattern).
return tool_result

async def _raw_execute_output_tool(
This method manually orchestrates the before/wrap/after/on_error hook pattern for both validate and execute phases (~70 lines), duplicating the logic in `_run_output_validate_hooks` and `_run_output_execute_hooks` in `_output.py`. The two paths can diverge silently (e.g. the error handling in `_run_output_validate_hooks` has `allow_partial` and `wrap_validation_errors` logic that this path skips).
Consider reusing those helpers here, or extracting a shared orchestration function that both call sites use. The tool-output-specific setup (identity `do_validate`, the `do_execute` that wraps `processor.call` + output validators) can be built up front and passed in.
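One possible shape for such a shared orchestration function — a sketch under assumptions, not the actual helpers (`run_phase`, `identity`, `parse`, and `recover` are all hypothetical names for illustration; a `wrap` hook would compose around `do` before calling in):

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar('T')
R = TypeVar('R')

async def run_phase(
    value: T,
    *,
    do: Callable[[T], Awaitable[R]],
    before: Callable[[T], Awaitable[T]],
    after: Callable[[R], Awaitable[R]],
    on_error: Callable[[Exception], Awaitable[R]],
) -> R:
    # Shared before -> do -> after orchestration; on_error may recover
    # with a fallback value instead of letting the exception propagate.
    value = await before(value)
    try:
        result = await do(value)
    except Exception as exc:
        return await on_error(exc)
    return await after(result)

async def identity(v):
    return v

async def parse(text: str) -> int:
    return int(text)

async def recover(exc: Exception) -> int:
    return -1  # fallback value instead of re-raising

assert asyncio.run(run_phase('42', do=parse, before=identity, after=identity, on_error=recover)) == 42
assert asyncio.run(run_phase('nope', do=parse, before=identity, after=identity, on_error=recover)) == -1
```

Both the text path and the tool path could then pass only their phase-specific `do` callables, so the error-handling branches live in one place.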
processor = toolset.processors[name]
output_context = processor.get_output_context('tool')
output_context.tool_call = validated.call
output_context.tool_def = validated.tool.tool_def
Mutating `OutputContext` after construction is a bit fragile — it means the dataclass fields have misleading defaults (`None`) that are only correct for text output, and tool-output callers must remember to fill them in. Consider making `tool_call` and `tool_def` constructor parameters (they're already known at this point), or adding a factory method like `OutputContext.for_tool_output(...)` that requires them.
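The factory-method option could look roughly like this — stand-in types only (`ToolCallPart` here is a stub, and the field list is simplified; not the real class):

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class ToolCallPart:  # stub standing in for the real tool-call type
    tool_name: str

@dataclass
class OutputContext:  # simplified: only the fields relevant to the suggestion
    mode: str
    tool_call: Optional[ToolCallPart] = None
    tool_def: Any = None

    @classmethod
    def for_tool_output(cls, tool_call: ToolCallPart, tool_def: Any) -> 'OutputContext':
        # Tool fields are required up front, so callers can't forget them
        # and text-output contexts never carry misleading defaults.
        return cls(mode='tool', tool_call=tool_call, tool_def=tool_def)

ctx = OutputContext.for_tool_output(ToolCallPart('final_result'), tool_def={'name': 'final_result'})
assert ctx.mode == 'tool' and ctx.tool_call.tool_name == 'final_result'
```

With the factory, the post-construction mutation at the call site above collapses into a single constructor call.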
return output

else:
    # UnionOutputProcessor and others: full process() in validate, identity execute.
The else branch handles "UnionOutputProcessor and others" but there's no exhaustive type check — a new `BaseOutputProcessor` subclass would silently fall into this branch. Consider using `assert_never`, or at least an explicit `isinstance(processor, UnionOutputProcessor)` check with an else that raises, to match the project's convention of ending exhaustive chains with `assert_never`.
Like tool processing, output processing has two phases: **validation** (parsing the model's raw output against the output schema) and **execution** (extracting the value and calling any output function). Each phase has its own hooks.

All output hooks receive an `output_context` parameter with [`OutputContext`][pydantic_ai._output.OutputContext] (mode, output type, schema info, and tool call details for tool output).
The docs link to [OutputContext][pydantic_ai._output.OutputContext] which points into a private module. If this type is re-exported from pydantic_ai.capabilities (per the other comment), update this link accordingly so it renders correctly in mkdocs and points to the public API reference.
self,
ctx: RunContext[AgentDepsT],
*,
input: str | dict[str, Any],
Same `input` builtin-shadowing issue as in `abstract.py`.
self,
ctx: RunContext[AgentDepsT],
*,
input: str | dict[str, Any],
Same `input` builtin-shadowing issue as in `abstract.py`.
@DouweM Minor note: the PR description says
- Use `processor.process()` instead of accessing private `_str_argument_name` and `_function_schema` in `TextFunctionOutputProcessor` handler
- Add 8 new tests covering error paths: `on_output_validate_error` with retry, `on_output_execute_error` recovery, composed error hook chains, `Hooks` decorator API for error hooks, `WrapperCapability` delegation

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
result_data = await _output.run_output_with_hooks(
    text_processor,
    text,
    run_context=run_context,
    capability=ctx.deps.root_capability,
    output_mode=ctx.deps.output_schema.mode,
)
📝 Info: `output_mode` in `OutputContext` reflects the schema mode, not the actual output format
In `_handle_text_response` at `pydantic_ai_slim/pydantic_ai/_agent_graph.py:1171`, `output_mode=ctx.deps.output_schema.mode` is passed. For a `ToolOutputSchema` that has a `text_processor` (hybrid mode), if the model returns text instead of a tool call, hooks receive `OutputContext(mode='tool', ...)` even though the actual output is text. This is a design choice — the mode represents the configured output schema, not the format of this particular response. Hook implementers should be aware that `mode='tool'` doesn't guarantee the output arrived via a tool call; they can check `output_context.tool_call is None` to distinguish.
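A hook that needs the actual arrival format could branch on `tool_call`, roughly as follows (stub `OutputContext` with only the two relevant fields; `arrival_format` is a hypothetical helper):

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class OutputContext:  # simplified stub of the real context
    mode: str
    tool_call: Any = None

def arrival_format(output_context: OutputContext) -> str:
    # mode reflects the configured schema, not this response's format:
    # a 'tool' schema can still receive plain text in hybrid mode.
    if output_context.tool_call is None:
        return 'text'
    return 'tool'

assert arrival_format(OutputContext(mode='tool')) == 'text'
assert arrival_format(OutputContext(mode='tool', tool_call=object())) == 'tool'
```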
Maybe `output_context` also needs an `allow_text_output` field, like we have on `ModelRequestParameters`? There's also image output to think about.
- Re-export `OutputContext` from `pydantic_ai.capabilities`
- Rename `input` param to `validated_output` in `after_output_execute` to avoid shadowing builtin
- Reuse `_run_output_validate/execute_hooks` in `_tool_manager.py` instead of duplicating orchestration logic
- Use `RawOutput` alias consistently across combined/wrapper/hooks
- Pass `tool_call`/`tool_def` as constructor args to `OutputContext`
- Add explicit `isinstance` check for `UnionOutputProcessor` with `assert_never` fallback
- Update docs links to point to public re-export

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- Remove unnecessary `# pyright: ignore[reportPrivateUsage]` comment
- Add tests for default error hook behavior (no override)
- Add tests for edge cases: dict transform, streaming, no-capability fast path, `Hooks` decorator `wrap_output_execute`

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Rename `_run_output_validate_hooks` and `_run_output_execute_hooks` to drop the underscore prefix since they're intentionally used from `_tool_manager.py`.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- Add tests exercising default error hooks (bare `AbstractCapability`), `WrapperCapability` error hook delegation, and `Hooks` class error chaining
- Add `pragma: no cover` on `ObjectOutputProcessor.process()` error handling and `OutputToolset.call_tool()`, which are now bypassed when capabilities are present (handled by `run_output_with_hooks` and `_raw_execute_output_tool`)

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
`RawOutput` type alias was not imported in test file.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Mark streaming partial validation error wrapping (`allow_partial` branch) and rare `ModelRetry`/`wrap_validation_errors=False` paths with appropriate pragma comments.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
This path is only reachable during streaming partial validation which is exercised through stream_output, not the direct test path. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- Simplify `bad_function` test helper to always raise (remove unreachable return)
- Replace unreachable error hook body with simpler test that verifies hooks fire without triggering errors

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…ng path `execute_output_tool_call` was unconditionally wrapping `ModelRetry` from `processor.hook_execute` as `ToolRetryError`. With `wrap_validation_errors=False` (streaming, see `result.validate_response_output`), `ToolRetryError` isn't caught by the streaming handler, so the error propagated as an unhandled exception. Let `run_output_process_hooks` make the wrap/no-wrap decision based on the flag, matching how validator `ModelRetry` was already handled in the same closure.
…ols()` time Wraps the function and output toolsets in `Agent._get_toolset` with `PreparedToolset`s that dispatch the capability hook chain. The filtered/modified tool defs now flow into `ToolManager.tools` (so hallucinated calls to filtered tools fail with unknown-tool errors) and into the model's `ModelRequestParameters` simultaneously, instead of only the latter as before. Previously the dispatch happened in `_prepare_request_parameters` after `for_run_step` had already built `tool_manager.tools`, so the hook only changed what the model saw — it didn't affect tool execution lookup. Reported by Devin on PR #4859. The output-side dispatch closure overrides `ctx.max_retries` to the agent's `max_result_retries` to preserve the contract that `prepare_output_tools` sees the output retry budget (#4745). Adds a regression test asserting that a tool removed via `prepare_tools` is unreachable even when the model hallucinates a call to it.
…ute` `UnionOutputProcessor.hook_execute` checked `isinstance(semantic, inner.output_type)` to verify the validated value matched the resolved kind. For multi-arg output functions like `def f(x: int, y: str) -> Foo`, `output_type` is the first arg's type but `semantic` is the validated dict (no unwrap key), so the check always failed — the hook fell through to `_resolve_inner_for_value` which also failed, and the function was silently bypassed. Add a shape-aware match: when the inner is a multi-arg function, accept any dict (trust the validation kind); otherwise keep the isinstance check. `_resolve_inner_for_value` skips multi-arg inners on the type-mismatch fallthrough path, since their dict shape can't be picked out unambiguously from `output_type` alone. Reported by Devin on PR #4859.
raise
raise error

# --- Output execute lifecycle hooks ---
The section comment says "Output execute lifecycle hooks" but the methods in this section are all `*_output_process` (`before_output_process`, `after_output_process`, `wrap_output_process`, `on_output_process_error`). Should be `# --- Output process lifecycle hooks ---` to match the method naming, consistent with the validate section above it (`# --- Output validate lifecycle hooks ---`).
Same issue in `wrapper.py` at the corresponding section.
…rage gaps

`process()` was the legacy validate-then-call entry point on `BaseOutputProcessor` and its subclasses, replaced everywhere in production by `hook_validate`/`hook_execute` when output hooks landed. The two tests still calling it — covering only one branch each of `wrap_validation_errors=False` with invalid data — left the wrap-True and success branches uncovered. Remove the methods, update those tests to exercise `hook_validate` directly (the path that's actually hit in production).

Also adds tests covering:

- `PreparedToolset` with a synchronous prepare function (the no-await branch in `get_tools`).
- The `@hooks.on.prepare_output_tools` decorator path (registration + dispatch).
- `prepare_output_tools` filtering with a real `ToolOutput` so the hook actually fires (the prior tests used `output_type=str`, which has no output tools).
…h `*_output_process` method names Caught by github-actions[bot] review on PR #4859 — the methods in this section are all `before_output_process`/`after_output_process`/`wrap_output_process`/`on_output_process_error`, so the section should be "Output process lifecycle hooks", consistent with "Output validate lifecycle hooks" above. Already correct in `abstract.py`; fixing `combined.py` and `wrapper.py` to match.
prepare_tools/prepare_output_tools (BREAKING)
…s additive

Reverts the breaking-change part of the prepare-tools split so existing capabilities that override `prepare_tools` (released for ~2 weeks) keep getting the full tool set — function + output — as documented and shipped. `prepare_output_tools` is now a purely additive hook for output-tool-specific filtering with `ctx.max_retries` reflecting the output retry budget (the original motivation, see #4745).

Implementation:

- `PrepareTools` capability is back to overriding `get_wrapper_toolset` (matches main), so the agent-level `prepare_tools=` arg sees function tools only — unchanged from release. The `prepare_tools` capability hook itself now dispatches via a `PreparedToolset` wrap on the **combined** toolset, so it sees both function and output tool defs and the result still flows into `ToolManager.tools`.
- `PrepareOutputTools` keeps its `prepare_output_tools` hook implementation; the hook is dispatched via a `PreparedToolset` wrap on the output toolset specifically (with `ctx.max_retries` overridden to the output budget). `Agent(prepare_output_tools=...)` injects a `PrepareOutputTools` capability mirroring `prepare_tools=`.
- Order: `prepare_output_tools` runs first (innermost — only sees output tools), then `prepare_tools` sees the merged list (outermost — fires after output prep).

Also updates the `prepare_tools` / `prepare_output_tools` docstrings and the `docs/capabilities.md` / `docs/hooks.md` sections to reflect the additive shape, and flips the `TestPrepareToolsHook` snapshot test to assert that the hook sees both function and output tool kinds.
Per `tests/AGENTS.md` rule, agent/model/stream tests should snapshot the message history alongside the final-output assertion. Adds `assert result.all_messages() == snapshot(...)` to the output validate/process error-recovery and retry-flow tests in `TestOutputHookErrorPaths` and `TestDefaultOutputErrorHooks` so future regressions in retry-prompt content, tool-call IDs, message ordering, etc. get caught instead of being masked by tests that only check `result.output`.
# `output_toolset.max_retries` is set to `max_result_retries` at agent construction.
output_cap = run_capability
output_max_retries = self._max_result_retries
It seems significant that we changed the order of `agent.prepare_tools` <> `capability.get_wrapper_toolset`? Should we change that back?
'loc': (),
'msg': 'Invalid JSON: expected ident at line 1 column 2',
'input': 'not json',
'ctx': {'error': 'expected ident at line 1 column 2'},
This suggests to me that maybe we should've been using `include_context=False` on `ValidationError.errors`, because this is verbose + duplicative.
…via PreparedToolset

`prepare_tools` capability hook now receives function tools only and filters at the registry level (via `PreparedToolset`), matching what the agent-level `prepare_tools=` kwarg has always done. Output tools route exclusively to `prepare_output_tools`. The kwarg now flows through the same hook path the capability uses (`PrepareTools.prepare_tools`), unifying both entry points.

This is technically a behavior change — the released capability hook saw all tools (function + output) and was visibility-only — but framing it as a bug fix: the kwarg/hook split was inconsistent on two axes (which tools, registry vs visibility), and we already split tool execute hooks to skip output tools post-release. Closes #5241 (no v2 follow-up needed).

Also:

- Order: `prepare_tools` wraps **inside** other capability `get_wrapper_toolset` results (e.g. `ToolSearch`, `CodeMode`), preserving main's kwarg ordering — toolset transformations layer on top of prepared defs.
- Pass `include_context=False` to `ValidationError.errors()` for output retry prompts, matching tool retries — drops the duplicative `ctx` field from `RetryPromptPart` content.
…ests Output retry prompts now use `include_context=False`, matching tool retries — drop the now-stale `ctx` entries from existing provider snapshots.
…Tools`/`PrepareOutputTools` `PreparedToolset.get_tools` already rejects added/renamed tools and normalizes `None` to an empty list. The capability hooks just need to normalize sync/async `prepare_func` calls — collapse the helper to that and let the toolset wrap handle validation.
After the output hooks PR (#4859), output tools no longer go through `wrap_tool_execute`, so the outer `execute_tool` span that wrapped output tool calls was lost. Restore it via `Instrumentation.wrap_output_process`, which fires for every output processing pass; emit only when `output_context.tool_call` is set so non-tool output (text/native/prompted/image) is unchanged. Output-function spans stay inline in `execute_output_function` — they nest inside the new `wrap_output_process` span when both are present, matching the historical two-level structure (outer tool, inner function).
…ols` to function tools, add `prepare_output_tools` (pydantic#4859) Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>
…antic#4859 + tracker pydantic#5238 DouweM said 'OK by me' on Devin's r3076333465 (2026-04-15) but then merged the opposite call in pydantic#4859 (2026-04-28): output validators see the GLOBAL output-retry budget, not the per-tool `ToolOutput(max_retries=N)`. The reasoning is documented in the merged code at `tool_manager.py::ToolManager.execute_output_tool_call`: the same validator stays consistent across the text path and across multiple `ToolOutput`s. The per-tool case where `ToolOutput(max_retries=N)` exceeds the global budget is tracked in pydantic#5238 and intentionally out of scope. This revert restores `OutputToolset.call_tool` to use `self.max_retries` (the global budget) for the validator context, drops the two regression tests that asserted per-tool semantics (`test_validator_sees_per_tool_max_retries`, `test_validator_max_retries_text_path_unchanged`), and reverts the snapshots in `test_output_validator_retry_consistency_across_paths` and `test_output_validator_retry_counter_with_tool_switch` to their pre-fix values. The rest of PR pydantic#5075 (rename, run-method `output_retries`, override() support, tool-retries docs, _agent_graph.py:1021 missed rename) is unchanged.
Summary
Output hooks (new)
`before_output_validate` / `wrap_output_validate` / `after_output_validate` / `on_output_validate_error` and `before_output_process` / `wrap_output_process` / `after_output_process` / `on_output_process_error` hooks on `AbstractCapability` (composed in `CombinedCapability`, delegated in `WrapperCapability`, available via `@hooks.on.*` decorators on `Hooks`). The validate pair fires on the raw input shape (text or tool args); the process pair fires on the semantic value (`MyModel(...)`, `42`, the dict the multi-arg output function will receive, etc.). `output_validator` callbacks now run inside the process pair, so one capability can wrap built-in + user-defined validation in one place. `OutputContext` exposes `mode`, `output_type`, `object_def`, `has_function`, `tool_call`, `tool_def` so hooks know what kind of output they're handling.

prepare_tools/prepare_output_tools (breaking change, framed as bug fix)

The released `prepare_tools` had a split-personality bug: the `Agent(prepare_tools=...)` kwarg saw function tools only and filtered at the registry level, while the `AbstractCapability.prepare_tools` hook saw all tools (function + output) and only filtered visibility. This PR normalizes both onto one path:

- `prepare_tools` capability hook → now receives function tools only, filters at the registry level (via `PreparedToolset`). Mirrors what the kwarg has always done.
- `prepare_output_tools` capability hook → receives only [output tools][pydantic_ai.output.ToolOutput], with `ctx.retry`/`ctx.max_retries` reflecting the output retry budget (`max_result_retries`) — fixing the missing context info that was the original motivation for "fix: propagate `Agent(retries=...)` to user-provided toolsets" (#4745).
- `Agent(prepare_tools=...)` and the new `Agent(prepare_output_tools=...)` kwargs inject `PrepareTools`/`PrepareOutputTools` capabilities that flow through the same hook path user-defined capabilities use.
- `prepare_tools` wraps inside other capability `get_wrapper_toolset` results (e.g. `ToolSearch`, `CodeMode`), preserving main's kwarg ordering — toolset transformations layer on top of prepared defs.

This subsumes #5241 (was filed as a v2 follow-up).

Other behavior fixes (observable through public API)

- A hallucinated call to a `prepare_tools`-filtered tool now hits "unknown tool" instead of silently executing it (filtering is registry-level, not just visibility).
- Multi-arg output functions in unions (e.g. `PromptedOutput([combine_func, OtherType])`) now actually run; previously the union dispatch isinstance-checked the validated dict against the function's first-arg type and silently bypassed the function.
- `ModelRetry` raised by an output function during `run_stream()` now surfaces as `UnexpectedModelBehavior` caused by `ModelRetry` (caught by the streaming validator), instead of escaping unhandled wrapped in `ToolRetryError`.
- Output retry prompts pass `include_context=False` to `ValidationError.errors()` (matching tool retries), removing the duplicative `ctx` field from `RetryPromptPart` content.

Cleanup

- Removed the legacy `process()` method on `BaseOutputProcessor` and its subclasses; production paths use `hook_validate` + `hook_execute` exclusively.

Docs

- `docs/capabilities.md` and `docs/hooks.md` updated with output hook docs and the unified `prepare_tools`/`prepare_output_tools` model.

Test plan

Tests cover `on_output_validate_error`, pre-parse transformation via `before_output_validate`, the hook decorator API, the new `prepare_output_tools` hook + constructor arg, hallucinated-call blocking, multi-arg union dispatch, and the streaming-`ModelRetry` regression. Retry-flow tests now snapshot `result.all_messages()` per `tests/AGENTS.md` guidance.

Closes #5111
Closes #5241
🤖 Generated with Claude Code