fix: restore `wrap_validation_errors` on `ToolManager` function-tool methods #5275
Merged
…`validate_tool_call` / `execute_tool_call`

#4859 unintentionally dropped the `wrap_validation_errors` keyword from the function-tool path on `ToolManager`, while keeping it on the output-tool methods (`validate_output_tool_call`, `execute_output_tool_call`, `handle_output_tool_call`). Restore it (kw-only, default `True`).

When `False`, raw `ValidationError` / `ModelRetry` propagates at every stage (validation, capability hooks, and the tool body), and retry-budget state (`_check_max_retries`, `failed_tools`) is left untouched so out-of-band callers don't consume the agent's retry budget. Hooks themselves see raw exceptions in both modes; the wrap happens at the outer edge of `_run_execute_hooks`, so the boundary visible to capabilities is unchanged.
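The flag semantics described in this commit message can be illustrated with a small, self-contained toy. This is only a sketch under stated assumptions: `ToyToolManager`, the local `ModelRetry` / `ToolRetryError` classes, the integer-only validation rule, and the synchronous `handle_call` are hypothetical stand-ins, not pydantic-ai's actual implementation. Capability hooks and `_check_max_retries` are left out for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable


class ModelRetry(Exception):
    """Toy stand-in for the retry signal a tool body raises."""


class ToolRetryError(Exception):
    """Toy stand-in for the wrapped error that the agent loop consumes."""


@dataclass
class ToyToolManager:
    """Hypothetical, simplified manager; not the real ToolManager."""

    tools: dict[str, Callable[..., Any]]
    failed_tools: set[str] = field(default_factory=set)

    def handle_call(
        self, name: str, args: dict[str, Any], *, wrap_validation_errors: bool = True
    ) -> Any:
        try:
            # Toy validation rule (an assumption): every argument must be an int.
            if not all(isinstance(value, int) for value in args.values()):
                raise ValueError(f'{name}: arguments must be integers')
            # Validation and the tool body share one try block,
            # so the flag governs errors from both stages.
            return self.tools[name](**args)
        except (ValueError, ModelRetry) as exc:
            if not wrap_validation_errors:
                # Raw mode: re-raise as-is and leave retry bookkeeping untouched.
                raise
            self.failed_tools.add(name)
            raise ToolRetryError(str(exc)) from exc


def halve(n: int) -> int:
    """Example tool that asks for a retry on odd input."""
    if n % 2:
        raise ModelRetry('n must be even')
    return n // 2
```

With `wrap_validation_errors=False`, a caller such as a nested sandboxed dispatcher would see `ModelRetry` or the validation error directly, and `failed_tools` stays empty; with the default, the same failure arrives as `ToolRetryError` and the tool name is recorded.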
…e_single_deferred`; drop unused `TestDeps`

Auto-review caught a pre-existing latent bug from #5142: `_resolve_single_deferred` re-validates and re-executes the approved tool without forwarding `wrap_validation_errors`, so a sandboxed caller using `handle_call(wrap_validation_errors=False)` on a tool that requires approval would see post-approval errors wrapped as `ToolRetryError` and would consume the retry budget, which is inconsistent with the caller's intent. Fix by threading the flag through.

Handler-constructed retry signals (`ModelRetry` / `RetryPromptPart` returned by the handler) still surface as `ToolRetryError` regardless: those are handler outputs, not exceptions raised by validation or the tool body.

Also addresses the nit on the new test using an unused `TestDeps` dataclass; the test now matches the rest of the file by using `None` as deps.
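The threading fix can be pictured with a toy helper that forwards the flag to the post-approval execution step. This is a hedged sketch, not the real code: `execute`, `resolve_single_deferred`, the `handler_result` parameter, and the local exception classes are hypothetical simplifications of the behavior described above.

```python
from __future__ import annotations

from typing import Any, Callable, Union


class ModelRetry(Exception):
    """Toy stand-in for a retry signal."""


class ToolRetryError(Exception):
    """Toy stand-in for the wrapped error that counts against the retry budget."""


def execute(
    tool: Callable[..., Any], args: dict[str, Any], *, wrap_validation_errors: bool = True
) -> Any:
    """Run the tool body, wrapping ModelRetry unless the caller opted out."""
    try:
        return tool(**args)
    except ModelRetry as exc:
        if not wrap_validation_errors:
            raise
        raise ToolRetryError(str(exc)) from exc


def resolve_single_deferred(
    tool: Callable[..., Any],
    args: dict[str, Any],
    handler_result: Union[bool, ModelRetry],
    *,
    wrap_validation_errors: bool = True,
) -> Any:
    """Hypothetical helper: re-execute an approved tool after deferred resolution."""
    if isinstance(handler_result, ModelRetry):
        # A retry signal *returned* by the handler is a handler output,
        # so it is wrapped regardless of the flag.
        raise ToolRetryError(str(handler_result))
    # Forwarding the flag is the fix; omitting the keyword here would silently
    # fall back to the default (True) and wrap post-approval errors.
    return execute(tool, args, wrap_validation_errors=wrap_validation_errors)


def flaky_tool(x: int) -> int:
    """Example approved tool whose body asks for a retry."""
    raise ModelRetry('try a different x')
```

In this toy, an approved `flaky_tool` call raises raw `ModelRetry` when the caller passes `wrap_validation_errors=False`, while a `ModelRetry` instance returned by the handler still surfaces as `ToolRetryError`.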
…ver happy path; ruff format

- `_execute_tool_call_impl` no longer unwraps a pre-wrapped `validation_error` to its cause when `wrap_validation_errors=False`. The branch was unreachable via the public API (`handle_call(wrap=False)` propagates to `validate_tool_call(wrap=False)` which raises raw before `execute_tool_call` is ever called) and only existed to "fix" mismatched flags between separate validate/execute calls, which no real caller does. Per-method flag semantics are cleaner: each method's `wrap_validation_errors` governs the errors it produces, not pre-existing wrapped errors carried in the input.
- Add a happy-path call to the new `wrap_validation_errors=False` test so the tool body actually executes (was missed by coverage; the test only invoked the tool with bad args).
- Apply ruff format to the new deferred-resolution test (CI lint wanted the multi-line generator collapsed).
DouweM added a commit to pydantic/pydantic-ai-harness that referenced this pull request on May 1, 2026
1.89.1 ships pydantic/pydantic-ai#5275 (restored `wrap_validation_errors` on `ToolManager.handle_call`). Pinning the floor there so harness users on the broken 1.86–1.89.0 window get a clean resolver error instead of a runtime `TypeError: ToolManager.handle_call() got an unexpected keyword argument 'wrap_validation_errors'`. `uv.lock` is unchanged: `tool.uv.sources` already pulls pydantic-ai-slim from `main`, which is past 1.89.1.
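The effect of the `>=1.89.1` floor can be sketched as a simple version comparison. This is a minimal illustration, assuming plain dotted numeric release versions; `parse` and `meets_floor` are hypothetical helper names, and real resolvers apply full PEP 440 rules rather than this tuple comparison.

```python
from __future__ import annotations

# The floor named in the commit message above.
FLOOR = '1.89.1'


def parse(version: str) -> tuple[int, ...]:
    """Split a plain dotted release version such as '1.89.1' into integers."""
    return tuple(int(part) for part in version.split('.'))


def meets_floor(installed: str, floor: str = FLOOR) -> bool:
    """Return True when the installed version is at or above the floor."""
    return parse(installed) >= parse(floor)
```

Under this check, a version inside the broken window, such as 1.89.0, is rejected up front, which mirrors the resolver error the commit message prefers over a runtime `TypeError`.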
DouweM added a commit to pydantic/pydantic-ai-harness that referenced this pull request on May 5, 2026
…ce; cover approved-tool ApprovalRequired re-raise (#223)

* test(code_mode): cover approved-tool re-raising `ApprovalRequired`

pydantic/pydantic-ai#5275 documents on `_resolve_single_deferred` that a re-raised `CallDeferred` / `ApprovalRequired` from the post-approval tool body bubbles up without re-invoking the handler. The harness's existing `except (CallDeferred, ApprovalRequired)` clause then converts it to a `UserError` → `MontyRuntimeError` → `ModelRetry`. Lock that contract in with a regression test so we notice if the upstream behavior shifts.

Refresh `uv.lock` to pick up #5275 from pydantic-ai main, which restored `wrap_validation_errors` on `ToolManager.handle_call`. No harness code change is needed: the existing call site already uses the kwarg.

* ci: add `compat-test.yml` reusable workflow for pydantic-ai → harness compat checks

Called from pydantic-ai's CI (via `workflow_call`) to verify that an arbitrary pydantic-ai ref doesn't break the harness's lint / typecheck / test suite. The workflow:

- checks out harness@main and pydantic-ai @ the input ref (with optional `pydantic-ai-repo` for fork PRs)
- rewrites `[tool.uv.sources]` to point `pydantic-ai-slim` at the local checkout, then `uv lock --upgrade-package` to refresh the lock
- runs `ruff format --check`, `ruff check`, `pyright`, `pytest`

Runs without secrets and with `contents: read` only, so it is safe to invoke from fork-PR contexts (the calling pydantic-ai workflow gates fork PRs behind an approval label before invoking, as defense-in-depth).

* chore: bump `pydantic-ai-slim` floor to `>=1.89.1`

1.89.1 ships pydantic/pydantic-ai#5275 (restored `wrap_validation_errors` on `ToolManager.handle_call`). Pinning the floor there so harness users on the broken 1.86–1.89.0 window get a clean resolver error instead of a runtime `TypeError: ToolManager.handle_call() got an unexpected keyword argument 'wrap_validation_errors'`.

`uv.lock` is unchanged: `tool.uv.sources` already pulls pydantic-ai-slim from `main`, which is past 1.89.1.

* ci: bump `test-floor` pin to match new `>=1.89.1` floor

Companion to ae9428e: that commit bumped the floor in `pyproject.toml` but left this job pinning 1.80.0, which (a) no longer matches the documented floor and (b) would have started failing because 1.80.0 predates #5275 and the harness's existing `handle_call(wrap_validation_errors=...)` call site would TypeError against it.

* ci+chore: drop `[tool.uv.sources]` for pydantic-ai-slim; auto-floor `test-floor`; simplify compat-test

- Drop `[tool.uv.sources]` git override for pydantic-ai-slim so the lock resolves from PyPI like users would. The harness's `test` matrix and install path now mirror what users actually get; bleeding-edge compat with pydantic-ai's main is verified by pydantic-ai's `harness compat` job (called from its CI on PRs, main pushes, and tags). `uv.lock` refreshed.
- Refactor `test-floor` to read the floor version from `pyproject.toml` at runtime instead of hardcoding it. Bumping the floor in pyproject is now the single source of truth, with no separate CI pin to keep in sync.
- Simplify `compat-test.yml`: with no `[tool.uv.sources]` to fight, the workflow just `uv sync`s the lock then `uv pip install --no-deps -e`'s the local pydantic-ai checkout. Same pattern test-floor uses, just pointing at a path instead of a PyPI version.

* ci: switch `test-floor` to `uv sync --resolution lowest-direct`; install `pydantic-graph` from checkout in compat-test

- Replace the bespoke "read floor from pyproject + pin slim only" Python script in `test-floor` with `uv sync --resolution lowest-direct`, matching pydantic-ai's `test-lowest-versions` job. Resolves *all* direct deps to their floors so the job exercises the full claimed compatibility envelope, not just slim's floor.
- Add explicit floors for the previously-unfloored dev deps so `lowest-direct` doesn't drag pre-Python-3 versions: `pytest>=9.0.0`, `anyio[trio]>=4.11.0` (where the pytest plugin started registering the `anyio_mode` ini option), `coverage>=7.10.7`. Match pydantic-ai's discipline. `pytest-anyio` has only 0.0.0 on PyPI so no floor.
- `compat-test.yml` now also `uv pip install -e`'s `pydantic-graph` from the local pydantic-ai checkout alongside slim. The two are sibling packages that version-track together; testing slim from-source against a PyPI `pydantic-graph` would mask cross-package issues (e.g. a slim PR depending on an unreleased graph change).
Summary
Restores `wrap_validation_errors` (kw-only, default `True`) on `ToolManager.validate_tool_call`, `execute_tool_call`, and `handle_call`, and properly threads it through `_resolve_single_deferred` so the flag is honored across the full function-tool surface: validation, capability hooks, the tool body, and post-approval re-execution after deferred-tool resolution.

Origin: regression from two PRs that compounded
This issue is the interaction of two recent merges that each got part of the way:

- #5142 (`HandleDeferredToolCalls`) introduced `_resolve_single_deferred`, called from inside `handle_call` when an approved tool needs re-validation + re-execution. At the time, `handle_call` did accept `wrap_validation_errors`, but the new helper didn't receive it, so the flag was already silently failing on the deferred-with-handler path. Latent bug.
- #4859 (output hooks) then refactored `validate_tool_call` to always return `ValidatedToolCall` and dropped the `wrap_validation_errors` keyword from `validate_tool_call`, `execute_tool_call`, and `handle_call` entirely (it was kept on the output-tool counterparts: `validate_output_tool_call` / `execute_output_tool_call` / `handle_output_tool_call`). The flag's complete disappearance also hid the #5142 gap.

Together they broke `pydantic-ai-harness`'s code-mode dispatch, which has been calling `tool_manager.handle_call(call_part, wrap_validation_errors=False)` to surface raw validation/`ModelRetry` errors at the sandbox `await` site.

Behavior
When `wrap_validation_errors=False`:

- Raw `ValidationError` / `ModelRetry` propagates from validation, capability hooks (`before_tool_execute` / `wrap_tool_execute` / `after_tool_execute`), the tool body itself, and post-approval re-execution after deferred-tool resolution
- `_check_max_retries` and `failed_tools.add` are skipped, so out-of-band callers (e.g. nested sandboxed dispatch) don't consume the agent's retry budget

When `wrap_validation_errors=True` (default), behavior is unchanged from current `main`.

Capabilities themselves see raw exceptions in both modes: the wrap happens at the outer edge of `_run_execute_hooks`, so the boundary visible to hooks is unchanged.

Handler-constructed retry signals (`ModelRetry` / `RetryPromptPart` returned by a `HandleDeferredToolCalls` handler) still surface as `ToolRetryError` regardless of the flag: those are handler outputs, not exceptions raised by validation or the tool body. Documented in `_resolve_single_deferred`.

Slight semantic change vs. pre-#4859 (deliberate)
The pre-#4859 flag only governed validation-time errors; tool-body `ModelRetry` always wrapped as `ToolRetryError` regardless. This restoration makes the flag govern the full validate→hooks→execute pipeline, matching how `wrap_validation_errors` already works on the output-tool methods. The function-tool and output-tool paths now offer the same opt-out semantics.

This is technically a behavior change for callers passing `wrap_validation_errors=False` pre-#4859, but it's a strict improvement: anyone passing `False` was clearly trying to get raw exceptions, and the broader semantics avoid double-wrapping (e.g. for the harness, `ModelRetry → ToolRetryError → MontyRuntimeError → ModelRetry` collapses to `ModelRetry → MontyRuntimeError → ModelRetry`).

Test plan
- `test_handle_call_wrap_validation_errors_false` (`tests/test_toolsets.py`) covers raw-error propagation for both `ValidationError` (bad args) and tool-body `ModelRetry`, plus the `failed_tools` invariant in raw vs default mode.
- `test_deferred_tool_handler_via_handle_call_wrap_validation_errors_false` (`tests/test_capabilities.py`) covers the deferred-with-handler path: an approved tool re-raising `ModelRetry` propagates raw through `_resolve_single_deferred` when `wrap_validation_errors=False` is forwarded.
- … `wrap_validation_errors` path that was unaffected).

Checklist