fix: output validators see global retry counter on tool output path #4956
Conversation
Closes pydantic#4385

Output validators received inconsistent `ctx.retry`/`max_retries` depending on whether the model returned text or called an output tool: the text path used the global output retry counter, while the tool path used per-tool counters. This change propagates the global retry info through `for_run_step` to `OutputToolset`, which builds a separate validator context with global values. Output functions still receive the per-tool context, since they are tools with per-tool retry tracking.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
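The two counters described above can be sketched in isolation. This is a hypothetical minimal model: `RunState`, `record_text_failure`, and `record_tool_failure` are illustrative names, not pydantic-ai internals.

```python
from collections import defaultdict

class RunState:
    """Stand-in for the run state holding both kinds of retry counters."""

    def __init__(self) -> None:
        self.output_retries = 0               # global output retry counter (text path)
        self.tool_retries = defaultdict(int)  # per-tool counters (tool path)

    def record_text_failure(self) -> int:
        """Text output failed validation: bump the global counter."""
        self.output_retries += 1
        return self.output_retries

    def record_tool_failure(self, tool_name: str) -> int:
        """Output tool call failed validation: bump that tool's counter."""
        self.tool_retries[tool_name] += 1
        return self.tool_retries[tool_name]

state = RunState()
state.record_text_failure()
state.record_tool_failure('final_result')
state.record_tool_failure('final_result')
# Before the fix, a validator saw whichever counter matched the path taken:
print(state.output_retries, state.tool_retries['final_result'])  # → 1 2
```

The fix amounts to always handing the validator the global pair, regardless of which counter was bumped.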
- Replace bare `retries += 1` with `increment_retries()` to enforce the `output_retries` limit on the tool output validation path
- Add a test verifying `UnexpectedModelBehavior` is raised when `output_retries` is exceeded

Co-Authored-By: Claude Opus 4.5 <[email protected]>
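A minimal sketch of the difference, assuming a simplified `State` stand-in (the real logic lives in pydantic-ai's graph state; the error message below is the one quoted later in this thread):

```python
class UnexpectedModelBehavior(Exception):
    pass

class State:
    def __init__(self, max_result_retries: int) -> None:
        self.retries = 0
        self.max_result_retries = max_result_retries

    def increment_retries(self) -> None:
        # Unlike a bare `retries += 1`, this enforces the limit.
        self.retries += 1
        if self.retries > self.max_result_retries:
            raise UnexpectedModelBehavior(
                f'Exceeded maximum retries ({self.max_result_retries}) for output validation'
            )

st = State(max_result_retries=1)
st.increment_retries()  # first retry: within the limit
try:
    st.increment_retries()  # second retry: exceeds output_retries=1
except UnexpectedModelBehavior as exc:
    print(exc)  # → Exceeded maximum retries (1) for output validation
```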
Ralph Loop: Questions Need Answers
The Ralph Loop is blocked on the following questions. Please answer by editing push-commit. MANUAL ACTION NEEDED: Run `git push` to push commit f342b0b. Answer 'done' when pushed. Free text answer expected.
- resolve-PRRT_kwDOMMa-Ps54k634. Resolve thread PRRT_kwDOMMa-Ps54k634? (Replace bare `retries += 1` with `increment_retries()`)
- resolve-PRRT_kwDOMMa-Ps54k640. Resolve thread PRRT_kwDOMMa-Ps54k640? (Add negative test case for `output_retries` limit)

Posted by Ralph Loop (headless mode) at 2026-04-03T04:21:19Z
Ralph Loop: Questions Need Answers
The Ralph Loop is blocked on the following questions. Please answer by editing push-commit. Push commit f342b0b to remote?
- resolve-PRRT_kwDOMMa-Ps54k634. Resolve thread PRRT_kwDOMMa-Ps54k634? (Replace bare `retries += 1` with `increment_retries()`)
- resolve-PRRT_kwDOMMa-Ps54k640. Resolve thread PRRT_kwDOMMa-Ps54k640? (Add negative test case for `output_retries` limit)

Posted by Ralph Loop (headless mode) at 2026-04-03T17:58:17Z
Claude here: pushed a fix for the CI failures. Two issues were resolved:
1. …
2. …

Fixed by replacing …
…se tests
- `OutputToolset.for_run_step`: use `copy()` instead of `replace()`, since `@dataclass(init=False)` with a custom `__init__` makes `replace()` fail (field names don't match init param names)
- `process_tool_calls`: replace `increment_retries()` with bare `retries += 1` to align with upstream pydantic#4687, which removed global limit checks in favor of per-tool enforcement via `ToolManager._check_max_retries`
- Update the negative test to not use a per-tool override that would bypass the `output_retries` default
- Add edge case tests: concurrent run isolation, retry counter across tool switches, validation vs execution failure retry counting

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
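The `copy()`-vs-`replace()` point can be reproduced standalone. This is a hypothetical minimal example (the `Toolset` class and its field names are simplified stand-ins for `OutputToolset`): `dataclasses.replace()` re-invokes `__init__` with field names as keyword arguments, which fails when the custom `__init__` takes differently named parameters, whereas `copy.copy()` bypasses `__init__` entirely.

```python
import copy
import dataclasses
from dataclasses import dataclass

@dataclass(init=False)
class Toolset:
    max_retries: int

    def __init__(self, retries: int = 1):  # param name != field name
        self.max_retries = retries

ts = Toolset(retries=3)

# replace() calls Toolset(max_retries=...), which this __init__ rejects.
try:
    dataclasses.replace(ts, max_retries=5)
    replace_failed = False
except TypeError:
    replace_failed = True

# copy() makes a shallow copy without calling __init__, so copy-then-mutate works.
clone = copy.copy(ts)
clone.max_retries = 5
print(replace_failed, ts.max_retries, clone.max_retries)  # → True 3 5
```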
Claude here: pushed. Changes in this commit:

Fixes: …

New edge case tests (3): …

All 262 tests pass, lint clean.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…t-validator-retry-consistency

# Conflicts:
#	tests/test_agent.py
assert validator_retries == [0, 1, 2]
# max_retries reflects the agent-level default (0) since output_retries not set
assert validator_max_retries == [0, 0, 0]
🚩 last_attempt property becomes misleading for output validators when per-tool max_retries differs from output_retries
The `RunContext.last_attempt` property (`_run_context.py:108-110`) returns `self.retry == self.max_retries`. With this PR, output validators see `max_retries = ctx.deps.max_result_retries` (the global `output_retries` value), while the actual retry limits are per-tool. In `test_output_validator_retry_counter_with_tool_switch`, `retries=0` and the per-tool limits are 3/1. The validator sees `max_retries=0` but runs 3 times, so `last_attempt` is `True` on the first call (0 == 0) and `False` on subsequent calls, the exact opposite of reality. This is a pre-existing semantic issue (`last_attempt` was not designed for the two-tier retry system), but the PR makes it more visible by explicitly setting `max_retries` to the global value.
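The inversion described above can be sketched with a stand-in context class (`Ctx` is a hypothetical stand-in for `RunContext` with only the two relevant fields):

```python
from dataclasses import dataclass

@dataclass
class Ctx:
    retry: int
    max_retries: int

    @property
    def last_attempt(self) -> bool:
        return self.retry == self.max_retries

global_max = 0      # output_retries value the validator sees as max_retries
per_tool_limit = 3  # limit that actually stops the retries

flags = [Ctx(retry=r, max_retries=global_max).last_attempt
         for r in range(per_tool_limit)]
print(flags)  # → [True, False, False]: True on the first call, the opposite of reality
```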
… output-retry fix

The `ctx = replace(ctx, max_retries=self.default_max_retries)` line in `ToolManager.for_run_step` was overriding the `max_result_retries` value set by `_agent_graph.py` before each `for_run_step` call. This caused output validators to see `max_retries=1` (the tool default) instead of the correct `output_retries` value when `output_retries != retries`.

Upstream PR pydantic#4956 already propagates `ctx.max_retries = max_result_retries` at all three `for_run_step` call sites in `_agent_graph.py`, so the override in `ToolManager` is now both redundant and harmful on the output path.

Also update `build_run_context` in tests to accept a `max_retries` param so unit tests can simulate realistic agent contexts (where `_agent_graph.py` sets `ctx.max_retries` before calling `for_run_step`).

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
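A hypothetical sketch of the override bug: an unconditional `replace(ctx, max_retries=...)` in `for_run_step` discards whatever the caller set. `RunContext` here is a simplified stand-in with only the relevant field.

```python
from dataclasses import dataclass, replace

@dataclass
class RunContext:
    max_retries: int

DEFAULT_MAX_RETRIES = 1  # the tool default

def for_run_step_overriding(ctx: RunContext) -> RunContext:
    # The harmful line: clobbers the caller's output_retries value.
    return replace(ctx, max_retries=DEFAULT_MAX_RETRIES)

def for_run_step_fixed(ctx: RunContext) -> RunContext:
    # Keep whatever max_retries the caller (e.g. _agent_graph.py) set.
    return ctx

ctx = RunContext(max_retries=4)  # output_retries propagated by the caller
print(for_run_step_overriding(ctx).max_retries,
      for_run_step_fixed(ctx).max_retries)  # → 1 4
```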
…ctx.max_retries` on tool path

Preparatory refactor of the output-retry machinery with three independently motivated but tightly coupled changes:

- Rename confusing private/internal retry fields so the mental model is obvious from the code: `Agent._max_result_retries` -> `_max_output_retries`; `GraphAgentDeps.max_result_retries` -> `max_output_retries`; `GraphAgentState.retries` -> `output_retries_used`; `GraphAgentState.increment_retries` -> `consume_output_retry`; `OutputToolset._output_max_retries` removed. Error message `Exceeded maximum retries (N) for output validation` -> `Exceeded maximum output retries (N)`.
- Add an `output_retries` kwarg to `run` / `run_sync` / `run_stream` / `run_stream_sync` / `run_stream_events` / `iter` with precedence `run arg > spec > agent default`. Plumbs through `WrapperAgent` and all three `durable_exec` wrappers (dbos/prefect/temporal). The runtime override clones the shared output toolset before mutating `max_retries` so concurrent runs don't race.
- Fix the Devin review comment on pydantic#4956: `ctx.max_retries` in an output validator on the tool path now reflects `tool.max_retries` (the per-tool enforcement limit that will actually stop the validator) instead of the agent-level global default. The text path is unchanged.

Also documents the two-path enforcement model in `docs/agent.md` with a new "How output retries are enforced" subsection.

Intentionally out of scope: a `tool_retries` parameter (blocked by `ToolUsePolicy` design in pydantic#3691), extending `Agent.override()` (reachable via `spec=` today), and deprecating `retries`.
Closes #4385
Summary
- `@agent.output_validator` received inconsistent `ctx.retry` and `ctx.max_retries` depending on whether the model returned text (global counter) or called an output tool (per-tool counter)
- Propagate global retry info through `for_run_step` to `OutputToolset`, which stores it and builds a separate validator context with global values
- Keep `ctx.state.retries` in sync when output tools fail validation or raise `ModelRetry`
- Split from #4325 per maintainer request. Also supersedes #4527.
Test plan
- `test_output_validator_retry_consistency_across_paths`: uses `ToolOutput(max_retries=5)` with `output_retries=2` to verify validators see `max_retries=2` (global), not `5` (per-tool)
- All tests in `test_agent.py` pass

🤖 Generated with Claude Code