fix: restore `wrap_validation_errors` on `ToolManager` function-tool methods by DouweM · Pull Request #5275 · pydantic/pydantic-ai

DouweM · 2026-05-01T18:19:27Z

Closes #

Summary

Restores wrap_validation_errors (kw-only, default True) on ToolManager.validate_tool_call, execute_tool_call, and handle_call, and properly threads it through _resolve_single_deferred so the flag is honored across the full function-tool surface — validation, capability hooks, the tool body, and post-approval re-execution after deferred-tool resolution.
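Here is a toy model of the three entry points. Each takes a keyword-only flag, and handle_call forwards it to every stage. ToyToolManager, its dict-based "validation", and the stand-in ModelRetry / ToolRetryError classes are hypothetical simplifications, not pydantic-ai's real classes or signatures. ValueError stands in for a validation error.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Set, Tuple


class ModelRetry(Exception):
    """Toy stand-in: a tool asks the model to try again."""


class ToolRetryError(Exception):
    """Toy stand-in: the wrapped form that feeds the agent's retry loop."""


@dataclass
class ToyToolManager:
    """Hypothetical, heavily simplified model of the function-tool entry points."""

    tools: Dict[str, Callable[..., Any]]
    failed_tools: Set[str] = field(default_factory=set)

    def validate_tool_call(self, name: str, args: Any, *, wrap_validation_errors: bool = True) -> Tuple[str, dict]:
        try:
            if not isinstance(args, dict):
                # ValueError stands in for a validation error on bad args.
                raise ValueError(f"invalid args for {name!r}: {args!r}")
            return name, args
        except ValueError as e:
            if not wrap_validation_errors:
                raise  # raw mode: the caller sees the original error
            self.failed_tools.add(name)
            raise ToolRetryError(str(e)) from e

    def execute_tool_call(self, validated: Tuple[str, dict], *, wrap_validation_errors: bool = True) -> Any:
        name, args = validated
        try:
            return self.tools[name](**args)
        except ModelRetry as e:
            if not wrap_validation_errors:
                raise
            self.failed_tools.add(name)
            raise ToolRetryError(str(e)) from e

    def handle_call(self, name: str, args: Any, *, wrap_validation_errors: bool = True) -> Any:
        # The flag is forwarded to every stage, so no stage silently falls back to its default.
        validated = self.validate_tool_call(name, args, wrap_validation_errors=wrap_validation_errors)
        return self.execute_tool_call(validated, wrap_validation_errors=wrap_validation_errors)


mgr = ToyToolManager(tools={"add": lambda a, b: a + b})
mgr.handle_call("add", {"a": 1, "b": 2})  # returns 3
```

The flag sits after `*`, so passing it positionally, as in `handle_call("add", args, False)`, raises TypeError. This keeps call sites explicit.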

Origin: regression from two PRs that compounded

This issue is the interaction of two recent merges that each got part of the way:

#5142 (HandleDeferredToolCalls capability and handle_deferred_tool_calls hook) introduced _resolve_single_deferred, which is called from inside handle_call when an approved tool needs re-validation and re-execution. At the time, handle_call did accept wrap_validation_errors, but the new helper didn't receive it. The flag was therefore already silently ignored on the deferred-with-handler path. Latent bug.
#4859 (output validate/process hooks; scope prepare_tools to function tools, add prepare_output_tools) then refactored validate_tool_call to always return ValidatedToolCall. It also dropped the wrap_validation_errors keyword entirely from validate_tool_call, execute_tool_call, and handle_call, while keeping it on the output-tool counterparts (validate_output_tool_call / execute_output_tool_call / handle_output_tool_call). Because the flag disappeared completely, the #5142 gap was hidden as well.

Together they broke pydantic-ai-harness's code-mode dispatch, which has been calling tool_manager.handle_call(call_part, wrap_validation_errors=False) to surface raw validation/ModelRetry errors at the sandbox await site.
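The #5142 half of the regression is an ordinary forwarding bug. An outer function accepts a keyword, but it calls its helper without that keyword, so the helper silently uses its own default. In this toy illustration, rerun_approved and the two handle_call_* functions are hypothetical stand-ins for _resolve_single_deferred and handle_call.

```python
class ModelRetry(Exception):
    """Toy stand-in for a tool-body retry request."""


class ToolRetryError(Exception):
    """Toy stand-in for the wrapped retry error."""


def rerun_approved(tool_body, *, wrap_validation_errors=True):
    """Hypothetical stand-in for re-executing a tool after approval."""
    try:
        return tool_body()
    except ModelRetry as e:
        if not wrap_validation_errors:
            raise
        raise ToolRetryError(str(e)) from e


def handle_call_buggy(tool_body, *, wrap_validation_errors=True):
    # Bug: the flag is accepted here but not passed on, so the helper uses its default (True).
    return rerun_approved(tool_body)


def handle_call_fixed(tool_body, *, wrap_validation_errors=True):
    # Fix: thread the caller's choice through explicitly.
    return rerun_approved(tool_body, wrap_validation_errors=wrap_validation_errors)
```

Consider a tool body that raises ModelRetry. `handle_call_buggy(body, wrap_validation_errors=False)` still raises ToolRetryError, while `handle_call_fixed` lets the raw ModelRetry through.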

Behavior

When wrap_validation_errors=False:

raw ValidationError / ModelRetry propagates from validation, capability hooks (before_tool_execute / wrap_tool_execute / after_tool_execute), the tool body itself, and post-approval re-execution after deferred-tool resolution
_check_max_retries and failed_tools.add are skipped, so out-of-band callers (e.g. nested sandboxed dispatch) don't consume the agent's retry budget

When wrap_validation_errors=True (default), behavior is unchanged from current main.
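Below is a toy sketch of the retry-budget rule, assuming a simple per-tool counter. ToyRetryBudget, RetryBudgetExceeded, and the counter logic are hypothetical. They only mirror the intent of _check_max_retries and failed_tools.add described above. In raw mode nothing is recorded, so an out-of-band caller can fail repeatedly without consuming the agent's budget.

```python
from typing import Callable, Dict, Set


class ModelRetry(Exception):
    """Toy stand-in for a tool-body retry request."""


class ToolRetryError(Exception):
    """Toy stand-in for the wrapped retry error."""


class RetryBudgetExceeded(Exception):
    """Hypothetical error raised once a tool has used up its retries."""


class ToyRetryBudget:
    def __init__(self, max_retries: int) -> None:
        self.max_retries = max_retries
        self.retries: Dict[str, int] = {}
        self.failed_tools: Set[str] = set()

    def _check_max_retries(self, name: str) -> None:
        count = self.retries.get(name, 0) + 1
        if count > self.max_retries:
            raise RetryBudgetExceeded(f"{name!r} exceeded {self.max_retries} retries")
        self.retries[name] = count

    def run(self, name: str, fn: Callable[[], object], *, wrap_validation_errors: bool = True) -> object:
        try:
            return fn()
        except ModelRetry as e:
            if not wrap_validation_errors:
                # Raw mode: propagate untouched and leave all budget state alone.
                raise
            self._check_max_retries(name)
            self.failed_tools.add(name)
            raise ToolRetryError(str(e)) from e
```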

Capabilities themselves see raw exceptions in both modes — the wrap happens at the outer edge of _run_execute_hooks, so the boundary visible to hooks is unchanged.
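This sketch shows why hooks see the same exception in both modes. Wrapping happens once, outside the composed hook chain. Every hook therefore observes the raw ModelRetry, and only the final caller sees the wrapped form. run_execute_hooks and logging_hook are hypothetical simplifications of _run_execute_hooks and a wrap_tool_execute-style hook, not the real capability API.

```python
from typing import Callable, List


class ModelRetry(Exception):
    """Toy stand-in for a tool-body retry request."""


class ToolRetryError(Exception):
    """Toy stand-in for the wrapped retry error."""


Call = Callable[[], object]
seen_by_hook: List[str] = []


def logging_hook(call_next: Call) -> Call:
    """A toy wrap-style hook: records the exception type it observes, then re-raises."""
    def wrapped() -> object:
        try:
            return call_next()
        except Exception as e:
            seen_by_hook.append(type(e).__name__)
            raise
    return wrapped


def run_execute_hooks(tool_body: Call, hooks: List[Callable[[Call], Call]], *, wrap_validation_errors: bool = True) -> object:
    call = tool_body
    for hook in reversed(hooks):  # the first hook in the list ends up outermost
        call = hook(call)
    try:
        return call()
    except ModelRetry as e:
        # The only place wrapping happens: outside every hook.
        if not wrap_validation_errors:
            raise
        raise ToolRetryError(str(e)) from e
```

Run this once with each flag value and logging_hook records "ModelRetry" both times. Only the exception that escapes run_execute_hooks differs.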

Handler-constructed retry signals (ModelRetry / RetryPromptPart returned by a HandleDeferredToolCalls handler) still surface as ToolRetryError regardless of the flag — those are handler outputs, not exceptions raised by validation or the tool body. Documented in _resolve_single_deferred.
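Here is a toy version of that distinction. It assumes the handler returns either a ModelRetry instance (a retry decision) or any other value (approval). The real handler protocol, including RetryPromptPart, is richer than this, and resolve_single_deferred here is a hypothetical stand-in.

```python
from typing import Callable


class ModelRetry(Exception):
    """Toy stand-in for a retry request."""


class ToolRetryError(Exception):
    """Toy stand-in for the wrapped retry error."""


def resolve_single_deferred(
    handler: Callable[[], object],
    tool_body: Callable[[], object],
    *,
    wrap_validation_errors: bool = True,
) -> object:
    decision = handler()
    if isinstance(decision, ModelRetry):
        # A retry *returned* by the handler is a decision, not a raised error:
        # it is always surfaced as ToolRetryError, whatever the flag says.
        raise ToolRetryError(str(decision)) from decision
    # Approved: re-execute the tool body and honor the caller's flag.
    try:
        return tool_body()
    except ModelRetry as e:
        if not wrap_validation_errors:
            raise
        raise ToolRetryError(str(e)) from e
```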

Slight semantic change vs. pre-#4859 (deliberate)

The pre-#4859 flag only governed validation-time errors; tool-body ModelRetry always wrapped as ToolRetryError regardless. This restoration makes the flag govern the full validate→hooks→execute pipeline, matching how wrap_validation_errors already works on the output-tool methods. The function-tool and output-tool paths now offer the same opt-out semantics.

This is technically a behavior change for callers passing wrap_validation_errors=False pre-#4859, but it's a strict improvement: anyone passing False was clearly trying to get raw exceptions, and the broader semantics avoid double-wrapping (e.g. for the harness, ModelRetry → ToolRetryError → MontyRuntimeError → ModelRetry collapses to ModelRetry → MontyRuntimeError → ModelRetry).
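You can check the double-wrapping argument by walking __cause__ chains. In this sketch, dispatch stands in for handle_call and sandbox_await for the harness's sandbox boundary. Both functions and the toy MontyRuntimeError are hypothetical, and the real harness may chain exceptions differently.

```python
from typing import List, Optional


class ModelRetry(Exception):
    """Toy stand-in for a retry request."""


class ToolRetryError(Exception):
    """Toy stand-in for the wrapped retry error."""


class MontyRuntimeError(Exception):
    """Toy stand-in for the sandbox runtime error."""


def cause_chain(exc: Optional[BaseException]) -> List[str]:
    """Exception type names from the innermost __cause__ out to `exc`."""
    names = []
    while exc is not None:
        names.append(type(exc).__name__)
        exc = exc.__cause__
    return list(reversed(names))


def tool_body() -> None:
    raise ModelRetry("use a positive number")


def dispatch(*, wrap_validation_errors: bool) -> None:
    try:
        tool_body()
    except ModelRetry as e:
        if not wrap_validation_errors:
            raise
        raise ToolRetryError(str(e)) from e


def sandbox_await(*, wrap_validation_errors: bool) -> None:
    # Inside the sandbox, any dispatch failure becomes a MontyRuntimeError...
    try:
        try:
            dispatch(wrap_validation_errors=wrap_validation_errors)
        except Exception as e:
            raise MontyRuntimeError(str(e)) from e
    # ...which the outer layer turns back into a ModelRetry for the model.
    except MontyRuntimeError as e:
        raise ModelRetry(str(e)) from e


def chain_for(wrap: bool) -> List[str]:
    try:
        sandbox_await(wrap_validation_errors=wrap)
    except ModelRetry as e:
        return cause_chain(e)
    return []
```

`chain_for(True)` returns `['ModelRetry', 'ToolRetryError', 'MontyRuntimeError', 'ModelRetry']`. `chain_for(False)` returns `['ModelRetry', 'MontyRuntimeError', 'ModelRetry']`, which is one link shorter.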

Test plan

test_handle_call_wrap_validation_errors_false (tests/test_toolsets.py) covers raw-error propagation for both ValidationError (bad args) and tool-body ModelRetry, plus the failed_tools invariant in raw vs default mode.
test_deferred_tool_handler_via_handle_call_wrap_validation_errors_false (tests/test_capabilities.py) covers the deferred-with-handler path: an approved tool re-raising ModelRetry propagates raw through _resolve_single_deferred when wrap_validation_errors=False is forwarded.
All existing toolsets/tools/capabilities tests pass.
All existing streaming/agent tests pass (covering the output-tool wrap_validation_errors path that was unaffected).

Checklist

Any AI generated code has been reviewed line-by-line by the human PR author, who stands by it.
No breaking changes in accordance with the version policy.
PR title is fit for the release changelog.

…`validate_tool_call` / `execute_tool_call` #4859 unintentionally dropped the `wrap_validation_errors` keyword from the function-tool path on `ToolManager`, while keeping it on the output-tool methods (`validate_output_tool_call`, `execute_output_tool_call`, `handle_output_tool_call`). Restore it (kw-only, default `True`). When `False`, raw `ValidationError` / `ModelRetry` propagates at every stage — validation, capability hooks, and the tool body — and retry-budget state (`_check_max_retries`, `failed_tools`) is left untouched so out-of-band callers don't consume the agent's retry budget. Hooks themselves see raw exceptions in both modes; the wrap happens at the outer edge of `_run_execute_hooks`, so the boundary visible to capabilities is unchanged.

github-actions · 2026-05-01T18:26:18Z

Docs Preview

commit:	`ce62f64`
Preview URL:	https://09561a2f-pydantic-ai-previews.pydantic.workers.dev

…e_single_deferred`; drop unused `TestDeps`

Auto-review caught a pre-existing latent bug from #5142: `_resolve_single_deferred` re-validates and re-executes the approved tool without forwarding `wrap_validation_errors`. A sandboxed caller using `handle_call(wrap_validation_errors=False)` on a tool that requires approval would therefore see post-approval errors wrap as `ToolRetryError` and consume the retry budget, inconsistent with the caller's intent. Fix by threading the flag through. Handler-constructed retry signals (`ModelRetry` / `RetryPromptPart` returned by the handler) still surface as `ToolRetryError` regardless; those are handler outputs, not exceptions raised by validation or the tool body. Also addresses the nit on the new test using an unused `TestDeps` dataclass, matching the rest of the file, which uses `None` as deps.

…ver happy path; ruff format

- `_execute_tool_call_impl` no longer unwraps a pre-wrapped `validation_error` to its cause when `wrap_validation_errors=False`. The branch was unreachable via the public API: `handle_call(wrap=False)` propagates to `validate_tool_call(wrap=False)`, which raises raw before `execute_tool_call` is ever called. It only existed to "fix" mismatched flags between separate validate/execute calls, and no real caller does that. Per-method flag semantics are cleaner: each method's `wrap_validation_errors` governs the errors it produces, not pre-existing wrapped errors carried in the input.
- Add a happy-path call to the new `wrap_validation_errors=False` test so the tool body actually executes (was missed by coverage; the test only invoked the tool with bad args).
- Apply ruff format to the new deferred-resolution test (CI lint was wanting the multi-line generator collapsed).
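Here is a sketch of the per-method rule from the commit message above. `execute_tool_call` wraps only errors it produces itself (here, a `ModelRetry` from the tool body). A validation error carried in its input is re-raised unchanged, without unwrapping it to its `__cause__`. `ValidatedCall` and this `execute_tool_call` are hypothetical and are not pydantic-ai's `ValidatedToolCall` or `_execute_tool_call_impl`.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class ModelRetry(Exception):
    """Toy stand-in for a tool-body retry request."""


class ToolRetryError(Exception):
    """Toy stand-in for the wrapped retry error."""


@dataclass
class ValidatedCall:
    """Hypothetical validated-call record; may carry an error produced earlier."""

    name: str
    body: Callable[[], object]
    validation_error: Optional[Exception] = None


def execute_tool_call(call: ValidatedCall, *, wrap_validation_errors: bool = True) -> object:
    if call.validation_error is not None:
        # Not produced here, so the flag does not apply: re-raise as-is,
        # without unwrapping to call.validation_error.__cause__.
        raise call.validation_error
    try:
        return call.body()
    except ModelRetry as e:
        # Produced here, so the flag decides.
        if not wrap_validation_errors:
            raise
        raise ToolRetryError(str(e)) from e
```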


…ce; cover approved-tool ApprovalRequired re-raise (#223)

* test(code_mode): cover approved-tool re-raising `ApprovalRequired`

  pydantic/pydantic-ai#5275 documents on `_resolve_single_deferred` that a re-raised `CallDeferred` / `ApprovalRequired` from the post-approval tool body bubbles up without re-invoking the handler. The harness's existing `except (CallDeferred, ApprovalRequired)` clause then converts it to a `UserError` → `MontyRuntimeError` → `ModelRetry`. Lock that contract in with a regression test so we notice if the upstream behavior shifts.

  Refresh `uv.lock` to pick up #5275 from pydantic-ai main, which restored `wrap_validation_errors` on `ToolManager.handle_call`. No harness code change is needed: the existing call site already uses the kwarg.

* ci: add `compat-test.yml` reusable workflow for pydantic-ai → harness compat checks

  Called from pydantic-ai's CI (via `workflow_call`) to verify that an arbitrary pydantic-ai ref doesn't break the harness's lint / typecheck / test suite. The workflow:
  - checks out harness@main and pydantic-ai @ the input ref (with optional `pydantic-ai-repo` for fork PRs)
  - rewrites `[tool.uv.sources]` to point `pydantic-ai-slim` at the local checkout, then `uv lock --upgrade-package` to refresh the lock
  - runs `ruff format --check`, `ruff check`, `pyright`, `pytest`

  Runs without secrets and with `contents: read` only, so it is safe to invoke from fork-PR contexts (the calling pydantic-ai workflow gates fork PRs behind an approval label before invoking, as defense-in-depth).

* chore: bump `pydantic-ai-slim` floor to `>=1.89.1`

  1.89.1 ships pydantic/pydantic-ai#5275 (restored `wrap_validation_errors` on `ToolManager.handle_call`). Pinning the floor there so harness users on the broken 1.86–1.89.0 window get a clean resolver error instead of a runtime `TypeError: ToolManager.handle_call() got an unexpected keyword argument 'wrap_validation_errors'`.

  `uv.lock` is unchanged: `tool.uv.sources` already pulls pydantic-ai-slim from `main`, which is past 1.89.1.

* ci: bump `test-floor` pin to match new `>=1.89.1` floor

  Companion to ae9428e: that commit bumped the floor in `pyproject.toml` but left this job pinning 1.80.0. That pin (a) no longer matches the documented floor and (b) would have started failing, because 1.80.0 predates #5275 and the harness's existing `handle_call(wrap_validation_errors=...)` call site would TypeError against it.

* ci+chore: drop `[tool.uv.sources]` for pydantic-ai-slim; auto-floor `test-floor`; simplify compat-test

  - Drop `[tool.uv.sources]` git override for pydantic-ai-slim so the lock resolves from PyPI like users would. The harness's `test` matrix and install path now mirror what users actually get; bleeding-edge compat with pydantic-ai's main is verified by pydantic-ai's `harness compat` job (called from its CI on PRs, main pushes, and tags). `uv.lock` refreshed.
  - Refactor `test-floor` to read the floor version from `pyproject.toml` at runtime instead of hardcoding it. Bumping the floor in pyproject is now the single source of truth; there is no separate CI pin to keep in sync.
  - Simplify `compat-test.yml`: with no `[tool.uv.sources]` to fight, the workflow just `uv sync`s the lock then `uv pip install --no-deps -e`'s the local pydantic-ai checkout. Same pattern test-floor uses, just pointing at a path instead of a PyPI version.

* ci: switch `test-floor` to `uv sync --resolution lowest-direct`; install `pydantic-graph` from checkout in compat-test

  - Replace the bespoke "read floor from pyproject + pin slim only" Python script in `test-floor` with `uv sync --resolution lowest-direct`, matching pydantic-ai's `test-lowest-versions` job. Resolves *all* direct deps to their floors so the job exercises the full claimed compatibility envelope, not just slim's floor.
  - Add explicit floors for the previously-unfloored dev deps so `lowest-direct` doesn't drag in pre-Python-3 versions: `pytest>=9.0.0`, `anyio[trio]>=4.11.0` (where the pytest plugin started registering the `anyio_mode` ini option), `coverage>=7.10.7`. Match pydantic-ai's discipline. `pytest-anyio` has only 0.0.0 on PyPI, so no floor.
  - `compat-test.yml` now also `uv pip install -e`'s `pydantic-graph` from the local pydantic-ai checkout alongside slim. The two are sibling packages that version-track together; testing slim from source against a PyPI `pydantic-graph` would mask cross-package issues (e.g. a slim PR depending on an unreleased graph change).
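The harness handles the 1.86–1.89.0 window by raising its dependency floor, so the resolver rejects broken versions. A different technique, runtime feature detection, is sketched below for comparison only; it is not what the harness does. `accepts_keyword` is a hypothetical helper built on the standard library's `inspect.signature`. The two `handle_call_*` functions are fake signatures for illustration, not pydantic-ai's.

```python
import inspect
from typing import Callable


def accepts_keyword(func: Callable[..., object], name: str) -> bool:
    """Return True if `func` can be called with `name=...` as a keyword argument."""
    params = inspect.signature(func).parameters
    param = params.get(name)
    if param is not None and param.kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    ):
        return True
    # A **kwargs catch-all accepts any keyword as well.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Fake signatures for illustration only.
def handle_call_dropped_flag(call_part):
    ...


def handle_call_restored_flag(call_part, *, wrap_validation_errors=True):
    ...
```

In this setting a floor bump is arguably the better fit. It fails at install time with a resolver message, whereas feature detection keeps a compatibility branch alive in the calling code.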

github-actions Bot added the size: S Small PR (≤100 weighted lines) label May 1, 2026

DouweM added the auto-review label May 1, 2026

github-actions Bot added the bug Report that something isn't working, or PR implementing a fix label May 1, 2026

github-actions Bot reviewed May 1, 2026


Comment thread pydantic_ai_slim/pydantic_ai/tool_manager.py

github-actions Bot reviewed May 1, 2026


Comment thread tests/test_toolsets.py Outdated

github-actions Bot removed the auto-review label May 1, 2026

github-actions Bot added size: M Medium PR (101-500 weighted lines) and removed size: S Small PR (≤100 weighted lines) labels May 1, 2026

DouweM merged commit f7ff835 into main May 1, 2026
47 checks passed

DouweM deleted the tool-man-wrap-val-error branch May 1, 2026 18:50

DouweM mentioned this pull request May 1, 2026

ci: harness-compat reusable workflow + drop pydantic-ai-slim git source; cover approved-tool ApprovalRequired re-raise pydantic/pydantic-ai-harness#223

Merged

5 tasks


DouweM merged 3 commits into main from tool-man-wrap-val-error
