
fix: restore wrap_validation_errors on ToolManager function-tool methods #5275

Merged
DouweM merged 3 commits into main from tool-man-wrap-val-error on May 1, 2026

DouweM (Collaborator) commented on May 1, 2026

  • Closes #

Summary

Restores wrap_validation_errors (kw-only, default True) on ToolManager.validate_tool_call, execute_tool_call, and handle_call, and properly threads it through _resolve_single_deferred so the flag is honored across the full function-tool surface — validation, capability hooks, the tool body, and post-approval re-execution after deferred-tool resolution.

Origin: regression from two PRs that compounded

This regression comes from the interaction of two recent merges, each of which got part of the way:

Together they broke pydantic-ai-harness's code-mode dispatch, which has been calling tool_manager.handle_call(call_part, wrap_validation_errors=False) to surface raw validation/ModelRetry errors at the sandbox await site.

Behavior

When wrap_validation_errors=False:

  • raw ValidationError / ModelRetry propagates from validation, capability hooks (before_tool_execute / wrap_tool_execute / after_tool_execute), the tool body itself, and post-approval re-execution after deferred-tool resolution
  • _check_max_retries and failed_tools.add are skipped, so out-of-band callers (e.g. nested sandboxed dispatch) don't consume the agent's retry budget

When wrap_validation_errors=True (default), behavior is unchanged from current main.
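
The two modes can be illustrated with a toy manager. This is a minimal sketch under stated assumptions, not pydantic-ai's implementation: `ToyToolManager`, its `retries` counter and `max_retries` limit, its toy `_check_max_retries`, and the local `ModelRetry` / `ToolRetryError` classes are all hypothetical stand-ins. Only a tool-body `ModelRetry` is modeled; a validation error would take the same path.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


class ModelRetry(Exception):
    """Hypothetical stand-in for a tool asking the model to retry."""


class ToolRetryError(Exception):
    """Hypothetical stand-in for the wrapped retry signal."""


@dataclass
class ToyToolManager:
    tools: dict[str, Callable[..., Any]]
    max_retries: int = 1
    retries: dict[str, int] = field(default_factory=dict)
    failed_tools: set[str] = field(default_factory=set)

    def handle_call(
        self, name: str, args: dict[str, Any], *, wrap_validation_errors: bool = True
    ) -> Any:
        try:
            return self.tools[name](**args)
        except ModelRetry as exc:
            if not wrap_validation_errors:
                # Raw mode: re-raise untouched and leave retry bookkeeping alone,
                # so an out-of-band caller doesn't spend the agent's budget.
                raise
            # Default mode: charge the retry budget, record the failure, then wrap.
            self._check_max_retries(name)
            self.failed_tools.add(name)
            raise ToolRetryError(str(exc)) from exc

    def _check_max_retries(self, name: str) -> None:
        self.retries[name] = self.retries.get(name, 0) + 1
        if self.retries[name] > self.max_retries:
            raise RuntimeError(f"tool {name!r} exceeded {self.max_retries} retries")


def flaky(x: int) -> int:
    raise ModelRetry(f"x={x} is not acceptable, try again")


tm = ToyToolManager(tools={"flaky": flaky})
try:
    tm.handle_call("flaky", {"x": 1}, wrap_validation_errors=False)
except ModelRetry:
    pass  # the raw exception reached the caller
print(tm.failed_tools, tm.retries)  # set() {}
```

Calling `tm.handle_call("flaky", {"x": 1})` with the default flag would instead raise `ToolRetryError` and leave `failed_tools == {"flaky"}`.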

Capabilities themselves see raw exceptions in both modes — the wrap happens at the outer edge of _run_execute_hooks, so the boundary visible to hooks is unchanged.
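
The boundary can be pictured as a wrap applied outside the whole hook chain. In this hypothetical sketch, `run_execute_hooks` loosely mirrors the role of `_run_execute_hooks` but not its real signature, and each hook is assumed to be a wrap-style callable that receives the next step. The observing hook records `ModelRetry` in both modes because wrapping only happens after the chain has unwound.

```python
import functools
from typing import Any, Callable


class ModelRetry(Exception):
    """Hypothetical stand-in for a tool-body retry signal."""


class ToolRetryError(Exception):
    """Hypothetical stand-in for the wrapped retry signal."""


Next = Callable[[], Any]
Hook = Callable[[Next], Any]


def run_execute_hooks(hooks: list[Hook], body: Next) -> Any:
    """Nest the hooks around the body; the first hook in the list is outermost."""
    call = body
    for hook in reversed(hooks):
        call = functools.partial(hook, call)
    return call()


def execute(hooks: list[Hook], body: Next, *, wrap_validation_errors: bool = True) -> Any:
    try:
        return run_execute_hooks(hooks, body)
    except ModelRetry as exc:
        # The wrap sits outside the hook chain, so no hook ever sees ToolRetryError.
        if wrap_validation_errors:
            raise ToolRetryError(str(exc)) from exc
        raise


seen: list[str] = []


def observing_hook(next_step: Next) -> Any:
    try:
        return next_step()
    except Exception as exc:
        seen.append(type(exc).__name__)
        raise


def body() -> Any:
    raise ModelRetry("bad input")


for flag in (True, False):
    try:
        execute([observing_hook], body, wrap_validation_errors=flag)
    except (ModelRetry, ToolRetryError):
        pass
print(seen)  # ['ModelRetry', 'ModelRetry']
```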

Handler-constructed retry signals (ModelRetry / RetryPromptPart returned by a HandleDeferredToolCalls handler) still surface as ToolRetryError regardless of the flag — those are handler outputs, not exceptions raised by validation or the tool body. Documented in _resolve_single_deferred.
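
A hypothetical sketch of that distinction. The names `resolve_single_deferred`, `handler`, and `run_tool` are illustrative and do not reflect the real `_resolve_single_deferred` signature, and the handler is assumed to return either `"approved"` or a retry signal it built itself. A retry the handler *returns* is always wrapped, while an exception *raised* by the re-executed tool honors the forwarded flag.

```python
from typing import Any, Callable, Union


class ModelRetry(Exception):
    """Hypothetical stand-in for a retry signal."""


class ToolRetryError(Exception):
    """Hypothetical stand-in for the wrapped retry signal."""


# Hypothetical handler result: "approved", or a retry signal the handler constructed.
HandlerResult = Union[str, ModelRetry]


def resolve_single_deferred(
    handler: Callable[[], HandlerResult],
    run_tool: Callable[[], Any],
    *,
    wrap_validation_errors: bool = True,
) -> Any:
    outcome = handler()
    if isinstance(outcome, ModelRetry):
        # Handler output, not an exception from validation or the tool body:
        # always surfaced as ToolRetryError, whatever the flag says.
        raise ToolRetryError(str(outcome)) from outcome
    try:
        # Post-approval re-execution honors the caller's forwarded flag.
        return run_tool()
    except ModelRetry as exc:
        if wrap_validation_errors:
            raise ToolRetryError(str(exc)) from exc
        raise


def failing_tool() -> Any:
    raise ModelRetry("tool body wants a retry")


def outcome_type(handler: Callable[[], HandlerResult]) -> str:
    try:
        resolve_single_deferred(handler, failing_tool, wrap_validation_errors=False)
    except Exception as exc:
        return type(exc).__name__
    return "no error"


print(outcome_type(lambda: ModelRetry("denied, explain why")))  # ToolRetryError
print(outcome_type(lambda: "approved"))  # ModelRetry
```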

Slight semantic change vs. pre-#4859 (deliberate)

The pre-#4859 flag only governed validation-time errors; tool-body ModelRetry always wrapped as ToolRetryError regardless. This restoration makes the flag govern the full validate→hooks→execute pipeline, matching how wrap_validation_errors already works on the output-tool methods. The function-tool and output-tool paths now offer the same opt-out semantics.

This is technically a behavior change for callers passing wrap_validation_errors=False pre-#4859, but it's a strict improvement: anyone passing False was clearly trying to get raw exceptions, and the broader semantics avoid double-wrapping (e.g. for the harness, ModelRetry → ToolRetryError → MontyRuntimeError → ModelRetry collapses to ModelRetry → MontyRuntimeError → ModelRetry).
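
The double-wrapping difference shows up clearly when you walk the `__cause__` chain. In this sketch, `MontyRuntimeError`, `sandbox_await`, and `harness_retry` are hypothetical stand-ins for the harness's sandbox error and dispatch layers. Each layer is assumed to chain errors with `raise ... from`. The printed lists read innermost first, matching the arrows above.

```python
from typing import Any, Callable, Optional


class ModelRetry(Exception):
    """Hypothetical stand-in for a retry signal."""


class ToolRetryError(Exception):
    """Hypothetical stand-in for the wrapped retry signal."""


class MontyRuntimeError(Exception):
    """Hypothetical stand-in for the sandbox's runtime error."""


def sandbox_await(tool: Callable[[], Any], *, wrap: bool) -> Any:
    try:
        try:
            return tool()
        except ModelRetry as exc:
            if wrap:
                raise ToolRetryError(str(exc)) from exc
            raise
    except Exception as exc:
        # The sandbox reports whatever reached the await site as its own error.
        raise MontyRuntimeError(str(exc)) from exc


def harness_retry(tool: Callable[[], Any], *, wrap: bool) -> Optional[ModelRetry]:
    try:
        sandbox_await(tool, wrap=wrap)
    except MontyRuntimeError as exc:
        retry = ModelRetry(str(exc))
        retry.__cause__ = exc  # the same link `raise ... from exc` would set
        return retry
    return None


def chain(exc: Optional[BaseException]) -> list[str]:
    """Exception type names along the __cause__ chain, innermost first."""
    names: list[str] = []
    while exc is not None:
        names.append(type(exc).__name__)
        exc = exc.__cause__
    return names[::-1]


def tool() -> Any:
    raise ModelRetry("fix the arguments")


print(chain(harness_retry(tool, wrap=True)))
# ['ModelRetry', 'ToolRetryError', 'MontyRuntimeError', 'ModelRetry']
print(chain(harness_retry(tool, wrap=False)))
# ['ModelRetry', 'MontyRuntimeError', 'ModelRetry']
```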

Test plan

  • test_handle_call_wrap_validation_errors_false (tests/test_toolsets.py) covers raw-error propagation for both ValidationError (bad args) and tool-body ModelRetry, plus the failed_tools invariant in raw vs default mode.
  • test_deferred_tool_handler_via_handle_call_wrap_validation_errors_false (tests/test_capabilities.py) covers the deferred-with-handler path: an approved tool re-raising ModelRetry propagates raw through _resolve_single_deferred when wrap_validation_errors=False is forwarded.
  • All existing toolsets/tools/capabilities tests pass.
  • All existing streaming/agent tests pass (covering the output-tool wrap_validation_errors path that was unaffected).

Checklist

  • Any AI generated code has been reviewed line-by-line by the human PR author, who stands by it.
  • No breaking changes in accordance with the version policy.
  • PR title is fit for the release changelog.

…`validate_tool_call` / `execute_tool_call`

#4859 unintentionally dropped the `wrap_validation_errors` keyword from the
function-tool path on `ToolManager`, while keeping it on the output-tool
methods (`validate_output_tool_call`, `execute_output_tool_call`,
`handle_output_tool_call`). Restore it (kw-only, default `True`).

When `False`, raw `ValidationError` / `ModelRetry` propagates at every stage
— validation, capability hooks, and the tool body — and retry-budget state
(`_check_max_retries`, `failed_tools`) is left untouched so out-of-band
callers don't consume the agent's retry budget.

Hooks themselves see raw exceptions in both modes; the wrap happens at the
outer edge of `_run_execute_hooks`, so the boundary visible to capabilities
is unchanged.
github-actions added the size: S label (Small PR, ≤100 weighted lines) on May 1, 2026
github-actions added the bug label (Report that something isn't working, or PR implementing a fix) on May 1, 2026
github-actions (Contributor) commented on May 1, 2026

…e_single_deferred`; drop unused `TestDeps`

Auto-review caught a pre-existing latent bug from #5142: `_resolve_single_deferred`
re-validates and re-executes the approved tool without forwarding
`wrap_validation_errors`, so a sandboxed caller using
`handle_call(wrap_validation_errors=False)` on a tool that requires approval would
see post-approval errors wrap as `ToolRetryError` and consume the retry budget —
inconsistent with the caller's intent. Fix by threading the flag through.

Handler-constructed retry signals (`ModelRetry` / `RetryPromptPart` returned by
the handler) still surface as `ToolRetryError` regardless — those are handler
outputs, not exceptions raised by validation or the tool body.

Also addresses the nit on the new test using an unused `TestDeps` dataclass —
matches the rest of the file using `None` as deps.
github-actions added the size: M label (Medium PR, 101-500 weighted lines) and removed the size: S label (Small PR, ≤100 weighted lines) on May 1, 2026
…ver happy path; ruff format

- `_execute_tool_call_impl` no longer unwraps a pre-wrapped `validation_error`
  to its cause when `wrap_validation_errors=False`. The branch was unreachable
  via the public API (`handle_call(wrap=False)` propagates to
  `validate_tool_call(wrap=False)` which raises raw before `execute_tool_call`
  is ever called) and only existed to "fix" mismatched flags between separate
  validate/execute calls — no real caller does that. Per-method flag semantics
  are cleaner: each method's `wrap_validation_errors` governs the errors it
  produces, not pre-existing wrapped errors carried in the input.

- Add a happy-path call to the new `wrap_validation_errors=False` test so the
  tool body actually executes (was missed by coverage; the test only invoked
  the tool with bad args).

- Apply ruff format to the new deferred-resolution test (CI lint was wanting
  the multi-line generator collapsed).
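
A hypothetical sketch of per-method flag semantics. The `Validated` container and the `validate` / `execute` functions are illustrative and are not `_execute_tool_call_impl`. `execute`'s flag governs only the errors it raises itself. A validation error that was already wrapped is therefore re-raised as-is, even when `execute` is called with `wrap_validation_errors=False`.

```python
from dataclasses import dataclass
from typing import Any, Optional


class ModelRetry(Exception):
    """Hypothetical stand-in for a retry signal."""


class ToolRetryError(Exception):
    """Hypothetical stand-in for the wrapped retry signal."""


@dataclass
class Validated:
    """Hypothetical validate() result: parsed args, or a captured (wrapped) error."""
    args: Optional[dict[str, Any]] = None
    validation_error: Optional[Exception] = None


def validate(raw: dict[str, Any], *, wrap_validation_errors: bool = True) -> Validated:
    if not isinstance(raw.get("x"), int):
        err = ModelRetry("x must be an int")
        if not wrap_validation_errors:
            raise err  # raw mode raises before execute() is ever reached
        wrapped = ToolRetryError(str(err))
        wrapped.__cause__ = err
        return Validated(validation_error=wrapped)
    return Validated(args=raw)


def execute(validated: Validated, *, wrap_validation_errors: bool = True) -> Any:
    if validated.validation_error is not None:
        # Carried-in error: re-raised unchanged, never unwrapped to its cause.
        raise validated.validation_error
    x = validated.args["x"]
    try:
        if x < 0:
            raise ModelRetry("x must be non-negative")  # stands in for the tool body
        return x * 2
    except ModelRetry as exc:
        # Errors produced here follow this method's own flag.
        if wrap_validation_errors:
            raise ToolRetryError(str(exc)) from exc
        raise


print(execute(validate({"x": 3})))  # 6
```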
DouweM merged commit f7ff835 into main on May 1, 2026
47 checks passed
DouweM deleted the tool-man-wrap-val-error branch on May 1, 2026 at 18:50
DouweM added a commit to pydantic/pydantic-ai-harness that referenced this pull request on May 1, 2026
1.89.1 ships pydantic/pydantic-ai#5275 (restored `wrap_validation_errors`
on `ToolManager.handle_call`). Pinning the floor there so harness users
on the broken 1.86–1.89.0 window get a clean resolver error instead of
a runtime `TypeError: ToolManager.handle_call() got an unexpected
keyword argument 'wrap_validation_errors'`.

`uv.lock` is unchanged: `tool.uv.sources` already pulls pydantic-ai-slim
from `main`, which is past 1.89.1.
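
Without the floor, the mismatch only surfaces when the call actually runs. The hypothetical sketch below shows this, along with a signature check a caller could run up front. `OldToolManager` and `NewToolManager` are illustrative stand-ins for the 1.86 to 1.89.0 signature and the restored keyword-only signature; they are not the real classes.

```python
import inspect
from typing import Any


class OldToolManager:
    """Hypothetical stand-in for the broken window: the flag was dropped."""

    def handle_call(self, call: Any) -> Any:
        return call


class NewToolManager:
    """Hypothetical stand-in for the restored signature (keyword-only, default True)."""

    def handle_call(self, call: Any, *, wrap_validation_errors: bool = True) -> Any:
        return call


def supports_flag(cls: type) -> bool:
    """True if cls.handle_call accepts wrap_validation_errors as a keyword-only parameter."""
    param = inspect.signature(cls.handle_call).parameters.get("wrap_validation_errors")
    return param is not None and param.kind is inspect.Parameter.KEYWORD_ONLY


try:
    OldToolManager().handle_call("call", wrap_validation_errors=False)
except TypeError as exc:
    # e.g. "OldToolManager.handle_call() got an unexpected keyword argument ..."
    print(type(exc).__name__)  # TypeError

print(supports_flag(OldToolManager), supports_flag(NewToolManager))  # False True
```

Because the restored flag is keyword-only, passing it positionally (`handle_call(call, False)`) is also a `TypeError`.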
DouweM added a commit to pydantic/pydantic-ai-harness that referenced this pull request on May 5, 2026
…ce; cover approved-tool ApprovalRequired re-raise (#223)

* test(code_mode): cover approved-tool re-raising `ApprovalRequired`

pydantic/pydantic-ai#5275 documents on `_resolve_single_deferred` that a
re-raised `CallDeferred` / `ApprovalRequired` from the post-approval tool
body bubbles up without re-invoking the handler. The harness's existing
`except (CallDeferred, ApprovalRequired)` clause then converts it to a
`UserError` → `MontyRuntimeError` → `ModelRetry`. Lock that contract in
with a regression test so we notice if the upstream behavior shifts.

Refresh `uv.lock` to pick up #5275 from pydantic-ai main, which restored
`wrap_validation_errors` on `ToolManager.handle_call`. No harness code
change is needed — the existing call site already uses the kwarg.
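
A hypothetical sketch of the contract that regression test locks in. Every name here is an illustrative stand-in, not harness or pydantic-ai code: `approve_all`, `resolve_deferred`, `call_tool`, `run_sandbox`, `agent_turn`, and the local exception classes. In the sketch, the approved tool re-raises `ApprovalRequired` and the handler is not consulted a second time. The error then reaches the model as `ModelRetry` via `UserError` and `MontyRuntimeError`.

```python
from typing import Any, Callable


# Hypothetical stand-ins for the real exception types.
class ApprovalRequired(Exception): ...
class CallDeferred(Exception): ...
class UserError(Exception): ...
class MontyRuntimeError(Exception): ...
class ModelRetry(Exception): ...


handler_calls = 0


def approve_all() -> str:
    global handler_calls
    handler_calls += 1
    return "approved"


def resolve_deferred(run_tool: Callable[[], Any]) -> Any:
    approve_all()  # the handler runs once, for the original deferred call
    return run_tool()  # a re-raised ApprovalRequired bubbles up; no second handler call


def call_tool(run_tool: Callable[[], Any]) -> Any:
    try:
        return resolve_deferred(run_tool)
    except (CallDeferred, ApprovalRequired) as exc:
        # Mirrors the shape of the harness's `except (CallDeferred, ApprovalRequired)` clause.
        raise UserError(f"tool could not complete inside code mode: {exc}") from exc


def run_sandbox(code: Callable[[], Any]) -> Any:
    try:
        return code()
    except Exception as exc:
        raise MontyRuntimeError(str(exc)) from exc


def agent_turn(code: Callable[[], Any]) -> Any:
    try:
        return run_sandbox(code)
    except MontyRuntimeError as exc:
        raise ModelRetry(str(exc)) from exc


def approved_tool() -> Any:
    raise ApprovalRequired("still needs approval")


try:
    agent_turn(lambda: call_tool(approved_tool))
except ModelRetry as exc:
    causes = []
    cur = exc.__cause__
    while cur is not None:
        causes.append(type(cur).__name__)
        cur = cur.__cause__
    print(causes, handler_calls)
    # ['MontyRuntimeError', 'UserError', 'ApprovalRequired'] 1
```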

* ci: add `compat-test.yml` reusable workflow for pydantic-ai → harness compat checks

Called from pydantic-ai's CI (via `workflow_call`) to verify that an arbitrary
pydantic-ai ref doesn't break the harness's lint / typecheck / test suite. The
workflow:

- checks out harness@main and pydantic-ai @ the input ref (with optional
  `pydantic-ai-repo` for fork PRs)
- rewrites `[tool.uv.sources]` to point `pydantic-ai-slim` at the local
  checkout, then `uv lock --upgrade-package` to refresh the lock
- runs `ruff format --check`, `ruff check`, `pyright`, `pytest`

Runs without secrets and with `contents: read` only — safe to invoke from
fork-PR contexts (the calling pydantic-ai workflow gates fork PRs behind an
approval label before invoking, as defense-in-depth).

* chore: bump `pydantic-ai-slim` floor to `>=1.89.1`

1.89.1 ships pydantic/pydantic-ai#5275 (restored `wrap_validation_errors`
on `ToolManager.handle_call`). Pinning the floor there so harness users
on the broken 1.86–1.89.0 window get a clean resolver error instead of
a runtime `TypeError: ToolManager.handle_call() got an unexpected
keyword argument 'wrap_validation_errors'`.

`uv.lock` is unchanged: `tool.uv.sources` already pulls pydantic-ai-slim
from `main`, which is past 1.89.1.

* ci: bump `test-floor` pin to match new `>=1.89.1` floor

Companion to ae9428e: that commit bumped the floor in `pyproject.toml` but
left this job pinning 1.80.0, which (a) no longer matches the documented
floor and (b) would have started failing because 1.80.0 predates #5275 and
the harness's existing `handle_call(wrap_validation_errors=...)` call site
would TypeError against it.

* ci+chore: drop `[tool.uv.sources]` for pydantic-ai-slim; auto-floor `test-floor`; simplify compat-test

- Drop `[tool.uv.sources]` git override for pydantic-ai-slim so the lock
  resolves from PyPI like users would. The harness's `test` matrix and
  install path now mirror what users actually get; bleeding-edge compat
  with pydantic-ai's main is verified by pydantic-ai's `harness compat`
  job (called from its CI on PRs, main pushes, and tags). `uv.lock`
  refreshed.

- Refactor `test-floor` to read the floor version from `pyproject.toml`
  at runtime instead of hardcoding it. Bumping the floor in pyproject is
  now the single source of truth — no separate CI pin to keep in sync.

- Simplify `compat-test.yml`: with no `[tool.uv.sources]` to fight, the
  workflow just `uv sync`s the lock then `uv pip install --no-deps -e`'s
  the local pydantic-ai checkout. Same pattern test-floor uses, just
  pointing at a path instead of a PyPI version.

* ci: switch `test-floor` to `uv sync --resolution lowest-direct`; install `pydantic-graph` from checkout in compat-test

- Replace the bespoke "read floor from pyproject + pin slim only" Python
  script in `test-floor` with `uv sync --resolution lowest-direct`,
  matching pydantic-ai's `test-lowest-versions` job. Resolves *all* direct
  deps to their floors so the job exercises the full claimed compatibility
  envelope, not just slim's floor.

- Add explicit floors for the previously-unfloored dev deps so
  `lowest-direct` doesn't drag in pre-Python-3 versions: `pytest>=9.0.0`,
  `anyio[trio]>=4.11.0` (where the pytest plugin started registering the
  `anyio_mode` ini option), `coverage>=7.10.7`. Match pydantic-ai's
  discipline. `pytest-anyio` has only 0.0.0 on PyPI so no floor.

- `compat-test.yml` now also `uv pip install -e`'s `pydantic-graph` from
  the local pydantic-ai checkout alongside slim. The two are sibling
  packages that version-track together; testing slim from-source against a
  PyPI `pydantic-graph` would mask cross-package issues (e.g. a slim PR
  depending on an unreleased graph change).
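
The floor discipline from the second bullet can be checked mechanically. This is a minimal sketch that assumes simple PEP 508-style strings (no URLs or complex markers). `unfloored` and its `allow` set are hypothetical helpers. `pytest-anyio` is allowed because, per the commit, it has only 0.0.0 on PyPI.

```python
import re


def unfloored(requirements: list[str], allow: frozenset[str] = frozenset()) -> list[str]:
    """Names of requirements with no lower bound, which `lowest-direct` may resolve to very old releases."""
    missing = []
    for requirement in requirements:
        spec = requirement.split(";", 1)[0]  # ignore environment markers
        name_match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", spec)
        if name_match is None:
            raise ValueError(f"unparseable requirement: {requirement!r}")
        name = name_match.group(0)
        # `>=`, `>`, `==` and `~=` all bound the version from below.
        if name not in allow and not re.search(r">|==|~=", spec):
            missing.append(name)
    return missing


dev_deps = ["pytest>=9.0.0", "anyio[trio]>=4.11.0", "coverage>=7.10.7", "pytest-anyio"]
print(unfloored(dev_deps))  # ['pytest-anyio']
print(unfloored(dev_deps, allow=frozenset({"pytest-anyio"})))  # []
```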

Labels

  • bug (Report that something isn't working, or PR implementing a fix)
  • size: M (Medium PR, 101-500 weighted lines)
