
fix: output validators see global retry counter on tool output path #4956

Merged
DouweM merged 7 commits into pydantic:main from dsfaccini:dsfaccini/fix-output-validator-retry-consistency
Apr 10, 2026

Conversation

@dsfaccini
Collaborator

Closes #4385

Summary

  • @agent.output_validator received inconsistent ctx.retry and ctx.max_retries depending on whether the model returned text (global counter) or called an output tool (per-tool counter)
  • Propagate global retry info through for_run_step to OutputToolset, which stores it and builds a separate validator context with global values
  • Output functions still receive per-tool context since they are tools with per-tool retry tracking — only validators see global values
  • Keep ctx.state.retries in sync when output tools fail validation or raise ModelRetry
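The two-tier model described above can be sketched in plain Python. The class and function names below are illustrative, not pydantic-ai's actual internals: one global counter on the run state feeds the validator context, while each output tool keeps its own counter.

```python
from dataclasses import dataclass

# Illustrative sketch of the two retry tiers; hypothetical names,
# not pydantic-ai's real classes.

@dataclass
class RunState:
    retries: int = 0       # global output-retry counter (what validators see)
    max_retries: int = 2   # output_retries

@dataclass
class ToolState:
    retries: int = 0       # per-tool counter (what the output tool sees)
    max_retries: int = 5   # e.g. ToolOutput(max_retries=5)

def validator_context(run: RunState) -> dict:
    # Validators get the global values regardless of output path.
    return {"retry": run.retries, "max_retries": run.max_retries}

def tool_context(tool: ToolState) -> dict:
    # Output functions keep their per-tool values.
    return {"retry": tool.retries, "max_retries": tool.max_retries}

run, tool = RunState(), ToolState()
run.retries += 1    # a failed validation bumps both counters in sync
tool.retries += 1
print(validator_context(run))  # {'retry': 1, 'max_retries': 2}
print(tool_context(tool))      # {'retry': 1, 'max_retries': 5}
```

This is the behavior the test plan below checks: with `ToolOutput(max_retries=5)` and `output_retries=2`, a validator should report `max_retries=2`, not 5.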

Split from #4325 per maintainer request. Also supersedes #4527.

Test plan

  • test_output_validator_retry_consistency_across_paths — uses ToolOutput(max_retries=5) with output_retries=2 to verify validators see max_retries=2 (global) not 5 (per-tool)
  • All 221 tests in test_agent.py pass

🤖 Generated with Claude Code

Closes pydantic#4385

Output validators received inconsistent ctx.retry/max_retries depending
on whether the model returned text or called an output tool. Text path
used the global output retry counter; tool path used per-tool counters.

Propagate global retry info through for_run_step to OutputToolset, which
builds a separate validator context with global values. Output functions
still receive per-tool context since they are tools with per-tool retry
tracking.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@github-actions (bot) added labels size: S (Small PR, ≤100 weighted lines) and bug (Report that something isn't working, or PR implementing a fix) on Apr 2, 2026
- Replace bare `retries += 1` with `increment_retries()` to enforce
  output_retries limit on tool output validation path
- Add test verifying UnexpectedModelBehavior when output_retries exceeded

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@dsfaccini
Collaborator Author

Ralph Loop: Questions Need Answers

The Ralph Loop is blocked on the following questions. Please answer by editing questions.json or replying to this comment.

push-commit. MANUAL ACTION NEEDED: Run 'git push' to push commit f342b0b. Answer 'done' when pushed.

Free text answer expected.

  • push-commit.escalate: Reply with this to escalate — I'll post the question with full context on the PR for maintainer input.

resolve-PRRT_kwDOMMa-Ps54k634. Resolve thread PRRT_kwDOMMa-Ps54k634? (Replace bare retries+=1 with increment_retries)

  • resolve-PRRT_kwDOMMa-Ps54k634.yes: Resolve
  • resolve-PRRT_kwDOMMa-Ps54k634.no: Skip
  • resolve-PRRT_kwDOMMa-Ps54k634.escalate: Reply with this to escalate — I'll post the question with full context on the PR for maintainer input.

resolve-PRRT_kwDOMMa-Ps54k640. Resolve thread PRRT_kwDOMMa-Ps54k640? (Add negative test case for output_retries limit)

  • resolve-PRRT_kwDOMMa-Ps54k640.yes: Resolve
  • resolve-PRRT_kwDOMMa-Ps54k640.no: Skip
  • resolve-PRRT_kwDOMMa-Ps54k640.escalate: Reply with this to escalate — I'll post the question with full context on the PR for maintainer input.

Posted by Ralph Loop (headless mode) at 2026-04-03T04:21:19Z

@dsfaccini
Collaborator Author

Ralph Loop: Questions Need Answers

The Ralph Loop is blocked on the following questions. Please answer by editing questions.json or replying to this comment.

push-commit. Push commit f342b0b to remote?

  • push-commit.yes: Push
  • push-commit.no: Skip
  • push-commit.escalate: Reply with this to escalate — I'll post the question with full context on the PR for maintainer input.

resolve-PRRT_kwDOMMa-Ps54k634. Resolve thread PRRT_kwDOMMa-Ps54k634? (Replace bare retries+=1 with increment_retries())

  • resolve-PRRT_kwDOMMa-Ps54k634.yes: Resolve
  • resolve-PRRT_kwDOMMa-Ps54k634.no: Skip
  • resolve-PRRT_kwDOMMa-Ps54k634.escalate: Reply with this to escalate — I'll post the question with full context on the PR for maintainer input.

resolve-PRRT_kwDOMMa-Ps54k640. Resolve thread PRRT_kwDOMMa-Ps54k640? (Add negative test case for output_retries limit)

  • resolve-PRRT_kwDOMMa-Ps54k640.yes: Resolve
  • resolve-PRRT_kwDOMMa-Ps54k640.no: Skip
  • resolve-PRRT_kwDOMMa-Ps54k640.escalate: Reply with this to escalate — I'll post the question with full context on the PR for maintainer input.

Posted by Ralph Loop (headless mode) at 2026-04-03T17:58:17Z


@dsfaccini
Collaborator Author

Claude here: pushed a fix for the CI failures. Two issues were resolved:

1. OutputToolset.for_run_step: replace() incompatible with @dataclass(init=False)

OutputToolset uses @dataclass(init=False) with a custom __init__ where field names differ from init param names (e.g. field _tool_defs → param tool_defs). dataclasses.replace() passes field names as kwargs to __init__, causing TypeError. Fixed by using copy.copy() which bypasses __init__ and copies __dict__ directly — same pattern used in usage.py.
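The failure mode is easy to reproduce in isolation. The class below is a hypothetical minimal stand-in for OutputToolset, not the real one; it only mirrors the relevant shape (a dataclass field `_tool_defs` whose custom `__init__` takes `tool_defs`):

```python
import copy
import dataclasses
from dataclasses import dataclass

@dataclass(init=False)
class Toolset:
    _tool_defs: list

    def __init__(self, tool_defs: list):
        self._tool_defs = tool_defs

ts = Toolset(["final_result"])

# dataclasses.replace() passes *field* names as kwargs to __init__:
# Toolset(_tool_defs=[...]) -> TypeError: unexpected keyword '_tool_defs'
try:
    dataclasses.replace(ts)
except TypeError as e:
    print("replace() failed:", e)

# copy.copy() bypasses __init__ and duplicates the instance directly,
# so the clone can be mutated without touching the original.
clone = copy.copy(ts)
clone._tool_defs = []
print(ts._tool_defs)  # ['final_result'] -- original untouched
```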

2. increment_retries in process_tool_calls — merge conflict with upstream #4687

Our commit f342b0beb added ctx.state.increment_retries(ctx.deps.max_result_retries) to enforce output_retries as a global cap. Meanwhile, upstream PR #4687 ("respect ToolOutput.max_retries parameter") removed those same increment_retries calls in favor of per-tool enforcement via ToolManager._check_max_retries. The merge kept both, causing the global max_result_retries check (from agent retries=2) to fire before the per-tool limit (ToolOutput(max_retries=5)), breaking test_tool_output_max_retries_overrides_agent_retries.

Fixed by replacing increment_retries() with bare ctx.state.retries += 1 — this still increments the global counter (needed for output validator context, which is the whole point of this PR), but defers limit enforcement to ToolManager._check_max_retries (the upstream approach). Updated the negative test accordingly: removed the per-tool max_retries=5 override so output_retries=2 flows through as the ToolsetTool.max_retries default.
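The resulting division of labor can be sketched as follows. This is an illustrative model with hypothetical names, not the actual _agent_graph/ToolManager code: the global counter only tracks (for validator context), while the per-tool check enforces the limit.

```python
# Hedged sketch of "track globally, enforce per-tool" (illustrative names).

class RetryLimitExceeded(Exception):
    pass

def on_output_tool_failure(state: dict, tool: dict) -> None:
    state["retries"] += 1                      # bare increment: track only
    tool["retries"] += 1
    if tool["retries"] > tool["max_retries"]:  # enforcement lives per-tool
        raise RetryLimitExceeded(tool["name"])

state = {"retries": 0}
tool = {"name": "final_result", "retries": 0, "max_retries": 1}

on_output_tool_failure(state, tool)      # first failure: retry allowed
print(state["retries"])                  # 1
try:
    on_output_tool_failure(state, tool)  # second failure exceeds per-tool limit
except RetryLimitExceeded:
    print("per-tool limit enforced")
print(state["retries"])                  # 2 -- global counter still advanced
```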

…se tests

- OutputToolset.for_run_step: use copy() instead of replace() since
  @dataclass(init=False) with custom __init__ makes replace() fail
  (field names don't match init param names)
- process_tool_calls: replace increment_retries() with bare retries += 1
  to align with upstream pydantic#4687 which removed global limit checks in
  favor of per-tool enforcement via ToolManager._check_max_retries
- Update negative test to not use per-tool override that would bypass
  the output_retries default
- Add edge case tests: concurrent run isolation, retry counter across
  tool switches, validation vs execution failure retry counting

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@github-actions (bot) added label size: M (Medium PR, 101-500 weighted lines) and removed size: S (Small PR, ≤100 weighted lines) on Apr 10, 2026
@dsfaccini
Collaborator Author

Claude here: pushed 5043746cf on top of the merge commit — no force push needed.

Changes in this commit:

Fixes:

  • OutputToolset.for_run_step: use copy() instead of replace(): @dataclass(init=False) with a custom __init__ where field names differ from param names (_tool_defs vs tool_defs) makes replace() pass unrecognized kwargs
  • process_tool_calls: replace increment_retries(ctx.deps.max_result_retries) with bare ctx.state.retries += 1 at both retry paths. This aligns with upstream #4687 ("fix(output): respect ToolOutput.max_retries parameter"), which removed global limit checks in favor of per-tool enforcement via ToolManager._check_max_retries. We still need the bare increment to track the global counter for output validator context (the purpose of this PR)
  • Negative test updated: removed ToolOutput(Foo, max_retries=5) override so output_retries=2 flows through as the ToolsetTool.max_retries default

New edge case tests (3):

  1. test_concurrent_runs_output_retry_isolation — two asyncio.gather runs on the same agent verify _output_retry doesn't leak across concurrent runs (the copy() fix motivation)
  2. test_output_validator_retry_counter_with_tool_switch — model switches from tool B to tool A mid-run; verifies global retry counter increments across tool switches while per-tool counters stay independent
  3. test_output_tool_validation_vs_execution_retry_counting — bad args (validation failure) followed by ModelRetry (execution failure) both increment the global counter correctly

All 262 tests pass, lint clean.


…t-validator-retry-consistency

# Conflicts:
#	tests/test_agent.py
Contributor

@devin-ai-integration (bot) left a comment


Devin Review found 1 new potential issue.

View 5 additional findings in Devin Review.


Comment thread tests/test_agent.py
Comment on lines +9323 to +9325
assert validator_retries == [0, 1, 2]
# max_retries reflects the agent-level default (0) since output_retries not set
assert validator_max_retries == [0, 0, 0]

@devin-ai-integration (bot) Apr 10, 2026


🚩 last_attempt property becomes misleading for output validators when per-tool max_retries differs from output_retries

The RunContext.last_attempt property (_run_context.py:108-110) is self.retry == self.max_retries. With this PR, output validators see max_retries = ctx.deps.max_result_retries (the global output_retries value), while actual retry limits are per-tool. In test_output_validator_retry_counter_with_tool_switch, retries=0 and per-tool limits are 3/1. The validator sees max_retries=0 but runs 3 times, so last_attempt is True on the first call (0==0) and False on subsequent calls — the exact opposite of reality. This is a pre-existing semantic issue with last_attempt not designed for the two-tier retry system, but the PR makes it more visible by explicitly setting max_retries to the global value.


@DouweM enabled auto-merge (squash) April 10, 2026 22:56
@DouweM merged commit 4dcfb01 into pydantic:main Apr 10, 2026
43 of 44 checks passed
dsfaccini added a commit to dsfaccini/pydantic-ai that referenced this pull request Apr 11, 2026
… output-retry fix

The `ctx = replace(ctx, max_retries=self.default_max_retries)` line in
`ToolManager.for_run_step` was overriding the `max_result_retries` value
set by `_agent_graph.py` before each `for_run_step` call. This caused
output validators to see `max_retries=1` (the tool default) instead of the
correct `output_retries` value when `output_retries != retries`.

Upstream PR pydantic#4956 already propagates `ctx.max_retries=max_result_retries`
at all three `for_run_step` call sites in `_agent_graph.py`, so the
override in `ToolManager` is now both redundant and harmful on the output
path.

Also update `build_run_context` in tests to accept a `max_retries` param
so unit tests can simulate realistic agent contexts (where `_agent_graph.py`
sets `ctx.max_retries` before calling `for_run_step`).

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
dsfaccini added a commit to dsfaccini/pydantic-ai that referenced this pull request Apr 13, 2026
…ctx.max_retries` on tool path

Preparatory refactor of the output-retry machinery with three independently-motivated but tightly-coupled changes:

- Rename confusing private/internal retry fields so the mental model is obvious from code:
  `Agent._max_result_retries` -> `_max_output_retries`;
  `GraphAgentDeps.max_result_retries` -> `max_output_retries`;
  `GraphAgentState.retries` -> `output_retries_used`;
  `GraphAgentState.increment_retries` -> `consume_output_retry`;
  `OutputToolset._output_max_retries` removed. Error message
  `Exceeded maximum retries (N) for output validation` -> `Exceeded maximum output retries (N)`.
- Add `output_retries` kwarg to `run` / `run_sync` / `run_stream` / `run_stream_sync` / `run_stream_events` / `iter`
  with precedence `run arg > spec > agent default`. Plumbs through WrapperAgent and all three
  durable_exec wrappers (dbos/prefect/temporal). Runtime override clones the shared output toolset
  before mutating `max_retries` so concurrent runs don't race.
- Fix the Devin review comment on pydantic#4956: `ctx.max_retries` in an output validator on the tool path
  now reflects `tool.max_retries` (the per-tool enforcement limit that will actually stop the validator)
  instead of the agent-level global default. Text path is unchanged.

Also documents the two-path enforcement model in `docs/agent.md` with a new
"How output retries are enforced" subsection.

Intentionally out of scope: `tool_retries` parameter (blocked by ToolUsePolicy design in pydantic#3691),
`Agent.override()` extension (reachable via `spec=` today), and deprecating `retries`.

Labels

bug (Report that something isn't working, or PR implementing a fix), size: M (Medium PR, 101-500 weighted lines)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

@agent.output_validator receives inconsistent ctx.retry depending on text vs tool output path

2 participants