
fix(output): respect ToolOutput.max_retries parameter #4687

Merged
DouweM merged 9 commits into pydantic:main from JasonCZMeng:fix/issue-4678 on Mar 31, 2026

Conversation


@JasonCZMeng commented Mar 16, 2026

Summary

ToolOutput(Foo, max_retries=3) accepts a max_retries parameter, but the value was silently ignored:

  1. OutputToolset.build() never extracted it from ToolOutput instances
  2. The Agent unconditionally overwrote OutputToolset.max_retries with its own retries/output_retries
  3. The graph-level increment_retries check used the agent's retry limit instead of the per-tool limit, blocking ToolOutput(max_retries=N) from working when N > Agent(retries)

Changes

  • Extract max_retries from ToolOutput in OutputToolset.build(), taking max() across multiple ToolOutputs
  • Make OutputToolset.max_retries an int | None (default None) so the Agent only applies its default when no ToolOutput specified a value
  • Pass per-tool max_retries to increment_retries for output tool retries, matching how Tool(retries=N) works for function tools

ToolOutput(max_retries=N) now genuinely allows N retries even when Agent(retries) is lower, treating Agent(retries)/Agent(output_retries) as a default for output tools and non-tool output.
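A minimal sketch of the fixed behavior, modeled on this PR's own weather test (the output function, the 'test' model string, and the retry counts are illustrative, not the exact test code):

```python
from pydantic_ai import Agent, ModelRetry
from pydantic_ai.output import ToolOutput

attempts = {'n': 0}

def get_weather(city: str) -> str:
    # Fail four times, then succeed, to exercise the per-tool retry budget.
    attempts['n'] += 1
    if attempts['n'] <= 4:
        raise ModelRetry('try again')
    return f'Weather in {city}'

# max_retries=5 on the output tool now wins over Agent(retries=2);
# before this fix the run would have stopped at the agent-level limit.
agent = Agent('test', output_type=ToolOutput(get_weather, max_retries=5), retries=2)
result = agent.run_sync('Hello')
```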

Fixes #4678

Test plan

  • test_tool_output_max_retries_respected — tests OutputToolset.build() extraction: explicit value, default (None), and multiple ToolOutputs (max wins)
  • test_tool_output_max_retries_overrides_agent_retries — behavioral test: ToolOutput(max_retries=5) with Agent(retries=2) allows all 5 retries and ctx.max_retries reflects the ToolOutput value
  • All existing retry, streaming, toolset, and UI tests pass

🤖 Generated with Claude Code

@github-actions Bot added the size: S Small PR (≤100 weighted lines) label Mar 16, 2026
@github-actions Bot closed this Mar 16, 2026
@github-actions

Thanks for your interest in this issue! However, there are already open PRs addressing issue #4678: #4685.

To avoid duplicate efforts, this PR has been closed. If you'd like to contribute, you can review the existing PRs or share your thoughts on issue #4678.

If you believe the existing PRs are inactive, please comment on the issue and a maintainer can reassess.

@github-actions Bot added the bug Report that something isn't working, or PR implementing a fix label Mar 16, 2026
devin-ai-integration[bot]

This comment was marked as resolved.

Comment thread tests/test_agent.py Outdated
assert max_retries_log == [target_retries] * (target_retries + 1)


def test_tool_output_max_retries_respected():

This test should be removed. It imports from the private pydantic_ai._output module and directly inspects toolset.max_retries — both of which violate the testing guidelines around testing through public APIs and not accessing private internals.

The second test (test_tool_output_max_retries_overrides_agent_retries) already covers the same behavior through the public API by verifying that ctx.max_retries reflects the ToolOutput value at runtime, which is the meaningful assertion.

Comment thread tests/test_agent.py Outdated

def test_tool_output_max_retries_overrides_agent_retries():
"""ToolOutput.max_retries takes priority over Agent retries. Regression test for #4678."""
from pydantic_ai.output import ToolOutput

ToolOutput is already imported at line 70 — remove this inline import. Imports should be at the top of the file.

Comment thread tests/test_agent.py Outdated

result = agent.run_sync('Hello')
assert result.output == 'Weather in Mexico City'
assert all(m == target_retries for m in max_retries_seen)

Following the pattern of the existing retry tests above (e.g. test_tool_output_function_retries), this test should also track and assert retries_log (the ctx.retry values) alongside max_retries_seen, and should also assert on result.all_messages() using snapshot() to validate the complete execution trace.
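A sketch of the suggested logging pattern, using ctx.retry and ctx.max_retries as referenced elsewhere in this thread (the fixture wiring and retry condition are illustrative):

```python
from pydantic_ai import ModelRetry, RunContext

retries_log: list[int] = []
max_retries_log: list[int] = []

def get_weather(ctx: RunContext[None], city: str) -> str:
    # Record both counters on every attempt so the test can assert
    # retries_log == [0, 1, ..., N] and max_retries_log == [N] * (N + 1).
    retries_log.append(ctx.retry)
    max_retries_log.append(ctx.max_retries)
    if ctx.retry < ctx.max_retries:
        raise ModelRetry('please retry')
    return f'Weather in {city}'
```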

devin-ai-integration[bot]

This comment was marked as resolved.


self._output_toolset = self._output_schema.toolset
if self._output_toolset:
if self._output_toolset and self._output_toolset.max_retries is None:

@DouweM This change makes ToolOutput.max_retries control the per-tool retry limit (via ToolsetTool.max_retries), but the global output retry counter in _agent_graph.py still uses GraphAgentDeps.max_result_retries (always from the agent's _max_result_retries). This means:

  • ToolOutput(Foo, max_retries=7) with Agent(retries=1) → per-tool limit is 7, but the global increment_retries check fires at retry 1, so the effective limit is still 1.
  • ToolOutput(Foo, max_retries=2) with Agent(retries=10) → per-tool limit fires at 2, global at 10, so effective limit is 2.

Before this PR both values were always the same (agent set both). Now they can diverge, but only the "lower wins" direction actually works. A user who sets ToolOutput(Foo, max_retries=5) without also bumping the agent's retries/output_retries to at least 5 will still get the agent default of 1.

Is this the intended semantic, or should ToolOutput.max_retries also flow into (or override) GraphAgentDeps.max_result_retries?


# Use the per-tool max_retries so ToolOutput(max_retries=N) is respected,
# falling back to the agent-level max_result_retries for unknown tools.
max_retries = validated.tool.max_retries if validated.tool else ctx.deps.max_result_retries

This pattern was applied here and in the ToolRetryError handler below (line 1027), but the two UnexpectedModelBehavior handlers in this same function (lines 982 and 1019) still use ctx.deps.max_result_retries directly. The one at line 1019 has validated in scope and should use the same pattern for consistency. The one at line 982 fires before validated exists, but could look up the tool via tool_manager.get_tool_def(call.tool_name).

This means the fix only works for validation errors and ModelRetry/ToolRetryError, but not for UnexpectedModelBehavior errors on output tools — those will still be capped by the agent-level retries.

That said, as flagged in the other comment, the deeper question is whether this piecemeal approach is the right fix, or whether ToolOutput.max_retries should also flow into GraphAgentDeps.max_result_retries. @DouweM

Comment thread pydantic_ai_slim/pydantic_ai/_output.py Outdated
toolset=self,
tool_def=tool_def,
max_retries=self.max_retries,
max_retries=self.max_retries if self.max_retries is not None else 1,

The hardcoded fallback else 1 means that even when OutputToolset.max_retries is None (no ToolOutput.max_retries was set), the ToolsetTool gets max_retries=1 before the agent has had a chance to set OutputToolset.max_retries to its default. This works in practice because get_tools() is called after the agent sets OutputToolset.max_retries in __init__, but it's fragile — the else 1 is a dead code path, since the agent always replaces the None before this method runs. Consider removing the fallback and making the type int (not int | None) by the time get_tools() is called, to make the invariant explicit.

Comment thread pydantic_ai_slim/pydantic_ai/_output.py Outdated
description = output.description
strict = output.strict
if output.max_retries is not None:
max_retries = max(max_retries or 0, output.max_retries)

max(max_retries or 0, output.max_retries) treats max_retries=0 (if a previous ToolOutput set it explicitly) the same as None. This is probably fine since max_retries=0 doesn't make practical sense, but the idiomatic way to handle this for int | None is: max_retries = max(max_retries, output.max_retries) if max_retries is not None else output.max_retries — which is actually what the commenter on the issue suggested.
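Spelled out as a fenced snippet, a drop-in for the excerpt above (same names as the build() loop):

```python
# Merge per-ToolOutput overrides without conflating an explicit 0 with None.
if output.max_retries is not None:
    max_retries = (
        max(max_retries, output.max_retries)
        if max_retries is not None
        else output.max_retries
    )
```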

devin-ai-integration[bot]

This comment was marked as resolved.

@github-actions Bot added size: M Medium PR (101-500 weighted lines) and removed auto-review size: S Small PR (≤100 weighted lines) labels Mar 17, 2026
devin-ai-integration[bot]

This comment was marked as resolved.

Comment thread tests/test_agent.py Outdated
assert max_retries_log == [target_retries] * (target_retries + 1)


def test_tool_output_max_retries_respected():

This test should be removed or rewritten. It imports from pydantic_ai._output (a private module) and directly inspects OutputToolset.build() internals like toolset.max_retries — both of which violate the testing guideline of testing through public APIs, not private methods or helpers.

The behavioral test below (test_tool_output_max_retries_overrides_agent_retries) already covers the important case: that ToolOutput(max_retries=N) actually results in N retries at runtime. That's what matters. The extraction logic in build() is an implementation detail.

If you want to keep coverage of the "max across multiple ToolOutputs" case, add a behavioral test with output_type=[ToolOutput(Foo, max_retries=2), ToolOutput(Bar, max_retries=5)] and verify that 5 retries are actually allowed at runtime.
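A hedged sketch of that suggested behavioral test (Foo, Bar, and the agent wiring are illustrative, not code from this PR):

```python
from pydantic import BaseModel

from pydantic_ai import Agent
from pydantic_ai.output import ToolOutput

class Foo(BaseModel):
    a: int

class Bar(BaseModel):
    b: int

# Agent(retries=1) is deliberately lower than both per-tool limits; the
# behavioral assertion would be that the Bar output tool is allowed 5
# retries at runtime while Foo is capped at 2.
agent = Agent(
    'test',
    output_type=[ToolOutput(Foo, max_retries=2), ToolOutput(Bar, max_retries=5)],
    retries=1,
)
```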

Comment thread pydantic_ai_slim/pydantic_ai/_output.py Outdated
description = output.description
strict = output.strict
if output.max_retries is not None:
max_retries = max(max_retries or 0, output.max_retries)

max(max_retries or 0, output.max_retries) conflates max_retries=0 with None. While max_retries=0 isn't practically useful, the idiomatic int | None pattern is:

max_retries = max(max_retries, output.max_retries) if max_retries is not None else output.max_retries

(This was also suggested in the issue comment.)

Comment thread pydantic_ai_slim/pydantic_ai/_output.py Outdated
toolset=self,
tool_def=tool_def,
max_retries=self.max_retries,
max_retries=self.max_retries if self.max_retries is not None else 1,

The hardcoded else 1 fallback means that if OutputToolset.max_retries is still None when get_tools() is called, each ToolsetTool gets max_retries=1 regardless of what the agent's retries/output_retries was set to.

In practice, the agent sets output_toolset.max_retries = self._max_result_retries before get_tools() is called, so this fallback shouldn't be hit in normal usage. But if it ever is, 1 is the wrong default — it should match whatever the agent would have set.

Consider making the fallback more robust by asserting that max_retries is not None at this point (since the agent is supposed to have filled it in), or at least documenting why 1 is acceptable here.

ctx.deps.max_result_retries, error=e, model_settings=ctx.deps.model_settings
)
tool = tool_manager.tools.get(call.tool_name) if tool_manager.tools else None
max_retries = tool.max_retries if tool else ctx.deps.max_result_retries

The pattern max_retries = (...).tool.max_retries if (...).tool else ctx.deps.max_result_retries is repeated four times in this function (lines 983, 998, 1019, 1026). Per the repo's code style guidelines, duplicated logic should be consolidated into a helper. Consider extracting a small helper like:

def _get_max_retries(tool: ToolsetTool[DepsT] | None, ctx: GraphRunContext[...]) -> int:
    return tool.max_retries if tool else ctx.deps.max_result_retries

Or at minimum, since two of the four sites have validated.tool and two have a tool looked up from tool_manager.tools, unify the lookup approach — the first handler (line 982) fetches the tool via tool_manager.tools.get(call.tool_name) while the other three use validated.tool. Using validated.tool consistently (where validated is available) would be cleaner.

Comment thread tests/test_agent.py Outdated

result = agent.run_sync('Hello')
assert result.output == 'Weather in Mexico City'
assert all(m == target_retries for m in max_retries_seen)

As flagged in an earlier review, this test should also track and assert ctx.retry values (a retries_log list), matching the pattern of the existing test_tool_output_function_retries test right above. This verifies that the retry counter is incrementing correctly on each attempt, not just that max_retries is visible.

Output tools now rely solely on the per-tool _check_max_retries in
ToolManager (same as function tools), removing the redundant global
increment_retries calls. This eliminates the shared counter issue
where failures on one output tool could interfere with another's
retry budget.

The increment_retries mechanism is preserved for non-tool output
retries: empty responses, text output validation, and unknown tools.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
yield event
continue
ctx.state.increment_retries(ctx.deps.max_result_retries, error=e)
raise # pragma: lax no cover

@DouweM This is the key change in this PR that still needs your input: all four increment_retries calls have been removed from the output tool retry paths in process_tool_calls. The retry limiting for output tools is now handled entirely by ToolManager._check_max_retries, which uses per-tool retry tracking via RunContext.retries[tool_name] — the same mechanism that function tools already use.

This is a sound approach and makes output tools consistent with function tools, but it's a significant behavioral change beyond the original scope of the bug (which was just that ToolOutput.max_retries was silently ignored):

  1. The global GraphAgentState.retries counter is no longer incremented for output tool retries — it now only tracks text output retries (via _build_retry_node) and unknown tool calls.
  2. The error message changes from "Exceeded maximum retries (N) for output validation" to "Tool 'final_result' exceeded max retries count of N".
  3. For the UnexpectedModelBehavior cases (lines 1294 and 1323), the old code would call increment_retries before re-raising, which could replace the original error with a max-retries error if the counter was already at the limit. Now the original UnexpectedModelBehavior always propagates directly.

The earlier auto-review comments on this topic (from the previous code revision) are now outdated, so I'm re-raising the question here on the current code.

Comment thread tests/test_agent.py
result = agent.run_sync('Hello')
assert result.output == 'Weather in Mexico City'
assert retries_log == [0, 1, 2, 3, 4, 5]
assert max_retries_log == [target_retries] * (target_retries + 1)

This test covers the single-output-tool case well — good use of retries_log, max_retries_log, and the all_messages() snapshot.

However, as flagged in earlier reviews, there should also be a test with multiple output tools that have different max_retries values, e.g. output_type=[ToolOutput(A, max_retries=5), ToolOutput(B, max_retries=2)]. This would:

  1. Verify that each tool's limit is respected independently (the whole point of the _max_retries_overrides dict).
  2. Confirm that a failure on tool A doesn't exhaust tool B's retry budget (since the per-tool tracking in ToolManager is per tool name).
  3. Exercise the code path in get_tools() where _max_retries_overrides.get(tool_def.name, self.max_retries) returns different values for different tools.

max_retries: int
max_retries: int | None
"""Default max retries for output tools, set by the Agent. Per-tool overrides from `ToolOutput.max_retries` take priority."""
_max_retries_overrides: dict[str, int]

The _max_retries_overrides dict creates a parallel data structure that maps tool names to their max_retries overrides, separate from the existing processors dict that also maps tool names to their processing info. Consider whether this could be simplified by storing the max_retries override on the ObjectOutputProcessor or alongside the ToolDefinition instead, to avoid needing to keep two name-keyed dicts in sync.

That said, this is a minor concern — the current approach works and the sync between the two dicts is straightforward since they're both built in the same loop in build().

DouweM and others added 2 commits March 30, 2026 18:54
The UnexpectedModelBehavior cause-unwrapping in increment_retries is
no longer reachable since output tool retries no longer go through
this path.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Verifies that multiple ToolOutputs with different max_retries values
are tracked independently — output_a(max_retries=5) succeeds after
5 retries while output_b(max_retries=2) has a different limit, and
Agent(retries=1) doesn't interfere.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
yield event
continue
ctx.state.increment_retries(ctx.deps.max_result_retries, error=e)
raise # pragma: lax no cover

@DouweM This PR removes all four increment_retries calls from the output tool retry paths in process_tool_calls, making output tool retries entirely governed by the per-tool mechanism in ToolManager._check_max_retries (which uses ToolsetTool.max_retries and RunContext.retries).

This is a significant behavioral change beyond just "respecting ToolOutput.max_retries":

  1. The global retry counter (GraphAgentState.retries) is no longer incremented for output tool failures. This means the "Exceeded maximum retries (N) for output validation" error from increment_retries will never be raised for output tool retries — the error is now always "Tool 'X' exceeded max retries count of N" from ToolManager._check_max_retries. The test changes in test_model_test.py confirm this change in error messaging.

  2. GraphAgentDeps.max_result_retries no longer acts as a cap on output tool retries in process_tool_calls — only the per-tool ToolsetTool.max_retries matters now.

  3. Text output retries still use the global counter (via _build_retry_node at line 883 and _handle_text_response at line 1146), creating an inconsistency in how retries are tracked between text output and tool output.

The original issue (#4678) was that ToolOutput.max_retries was silently ignored. A more conservative fix would have been to pass the per-tool max_retries to increment_retries (as was done in earlier iterations of this PR) rather than removing the calls entirely. Could you weigh in on which approach is correct?

if (
isinstance(error, exceptions.UnexpectedModelBehavior) and error.__cause__ is not None
): # pragma: no cover
error = error.__cause__ # pragma: no cover

Adding # pragma: no cover to work around reduced coverage violates the repo's testing guidelines — the preferred approach is to write tests that exercise this branch, or to remove the branch if it's truly unreachable. If removing the increment_retries calls from the output tool paths makes this branch unreachable, that's worth understanding and documenting rather than suppressing with pragmas.

Comment thread pydantic_ai_slim/pydantic_ai/_output.py Outdated

async def get_tools(self, ctx: RunContext[AgentDepsT]) -> dict[str, ToolsetTool[AgentDepsT]]:
# max_retries should always be set by the Agent before get_tools() is called
assert self.max_retries is not None, 'OutputToolset.max_retries must be set before get_tools() is called'

This assert will crash at runtime if max_retries hasn't been set by the Agent. Since max_retries starts as None and is only set in Agent.__init__, this creates a fragile coupling where OutputToolset can't be used independently (e.g. in tests or by advanced users building custom graphs). Per the repo guidelines, prefer a default over an assertion for this kind of invariant — for example, falling back to self.max_retries or 1 in the get() call on line 978 would be more resilient.
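A minimal sketch of that fallback, assuming the get() lookup referenced above looks like the one in the diff excerpts (field names follow this thread; the final default of 1 mirrors the hardcoded fallback discussed earlier):

```python
# Fall back to the agent-supplied default, then to 1 only if the agent
# never set max_retries (e.g. OutputToolset used standalone in tests).
tool_max_retries = self._max_retries_overrides.get(
    tool_def.name,
    self.max_retries if self.max_retries is not None else 1,
)
```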

Comment thread tests/test_agent.py
assert result.output == 'A: hello'
assert retries_log == [0, 1, 2, 3, 4, 5]
assert max_retries_log == [5] * 6


This test is a good addition for the multi-tool case, but it should include an assert result.all_messages() == snapshot(...) assertion to match the pattern established by test_tool_output_max_retries_overrides_agent_retries above and the existing test_tool_output_function_retries test. The all_messages() snapshot validates the complete execution trace (tool calls, retry prompts, etc.), not just the final output — this is important for catching regressions in the retry flow.

max_retries: int
max_retries: int | None
"""Default max retries for output tools, set by the Agent. Per-tool overrides from `ToolOutput.max_retries` take priority."""
_max_retries_overrides: dict[str, int]

The _max_retries_overrides dict duplicates the key space of processors — both are keyed by tool name and built in the same loop. Consider storing the max_retries override on the ObjectOutputProcessor (or alongside it as a tuple) rather than maintaining a parallel dict. This would keep per-tool data co-located and avoid needing to keep two name-keyed structures in sync.

That said, this is a minor concern compared to the higher-level design question about removing increment_retries from the output tool paths.

Comment thread tests/test_agent.py Outdated
raise ModelRetry(f'Retry A {ctx.retry}')
return f'A: {value}'

def output_b(ctx: RunContext[None], value: str) -> str: # pragma: no cover

# pragma: no cover on output_b — if this function is truly never called (the model always picks output_a), this test doesn't actually exercise the per-tool isolation. A stronger test would have the model call output_b at some point and verify it hits its retry limit (2) independently of output_a's limit (5). As written, this test only demonstrates that output_a gets its override; it doesn't verify that output_b gets a different override.

- Remove dead UnexpectedModelBehavior cause-unwrapping branch from
  increment_retries (no longer reachable after removing output tool
  increment_retries calls)
- Replace assert with fallback in OutputToolset.get_tools()
- Add multi-output test verifying per-tool max_retries isolation
  with snapshot assertion
- Fix stale snapshots for exhaustive strategy tests

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

@devin-ai-integration Bot left a comment


Devin Review found 2 new potential issues.

View 7 additional findings in Devin Review.


Comment thread pydantic_ai_slim/pydantic_ai/_output.py Outdated
Comment on lines 1315 to 1329
# Validation passed - execute the tool
try:
result_data = await tool_manager.execute_tool_call(validated)
except exceptions.UnexpectedModelBehavior as e:
except exceptions.UnexpectedModelBehavior:
if final_result:
for event in _emit_skipped_output_tool(
call, 'Output tool not used - output function execution failed.', output_parts, args_valid=True
):
yield event
continue
ctx.state.increment_retries(ctx.deps.max_result_retries, error=e)
raise # pragma: lax no cover
except ToolRetryError as e:
# If we already have a valid final result, don't increment retries for invalid output tools
# This allows the run to succeed if at least one output tool returned a valid result
if not final_result:
ctx.state.increment_retries(ctx.deps.max_result_retries, error=e)
yield _messages.FunctionToolCallEvent(call, args_valid=True)
output_parts.append(e.tool_retry)
yield _messages.FunctionToolResultEvent(e.tool_retry)

@devin-ai-integration Bot Mar 30, 2026


📝 Info: Behavioral change: output tool retries no longer affect the global retry counter

This PR removes all ctx.state.increment_retries() calls for output tools in process_tool_calls (_agent_graph.py:1289-1333). Previously, output tool failures incremented BOTH the per-tool counter (via ToolManager._check_max_retries) AND the global counter (GraphAgentState.retries). Now only per-tool tracking applies.

This means:

  • Old: 3 output tools each failing once → global counter reaches 3, could trigger max_result_retries
  • New: each tool's failures tracked independently; global counter unaffected

The global counter (ctx.state.retries) is still incremented for empty responses (line 985), text response retries (line 1057), model-level retries (line 876), and unknown tools (line 1359). This is an intentional semantic change confirmed by the updated tests, but worth noting for reviewers that the max_result_retries limit no longer constrains output tool retries — only ToolOutput.max_retries (or the agent-level default) does.



DouweM and others added 3 commits March 30, 2026 20:53
- Fix `self.max_retries or 1` treating 0 as None (use explicit None
  check)
- Add tests for exhaustive strategy skip paths when output tool
  exceeds per-tool retry limit while another tool succeeded
- Fix stale snapshots for exhaustive strategy tests

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Avoid if/elif loop that creates an untaken branch — use dict
comprehension and next() lookup instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Wrap UnexpectedModelBehavior from _check_max_retries with the original
"Exceeded maximum retries (N) for output validation" message so output
tool retry errors are distinguishable from function tool errors.

Add docstring to ToolOutput.max_retries explaining its relationship
to Agent retries/output_retries.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@DouweM changed the title fix(output): respect ToolOutput.max_retries parameter Mar 30, 2026
devin-ai-integration[bot]

This comment was marked as resolved.

Extract check_incomplete_tool_call() from increment_retries and call
it at both output tool re-raise points, so truncated model responses
(finish_reason='length') still produce the actionable IncompleteToolCall
error suggesting users increase max_tokens.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@DouweM merged commit f28abf4 into pydantic:main Mar 31, 2026
39 checks passed
antznette1 pushed a commit to antznette1/pydantic-ai that referenced this pull request Apr 1, 2026
Co-authored-by: Douwe Maan <[email protected]>
Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>
dsfaccini added a commit to dsfaccini/pydantic-ai that referenced this pull request Apr 10, 2026
…se tests

- OutputToolset.for_run_step: use copy() instead of replace() since
  @DataClass(init=False) with custom __init__ makes replace() fail
  (field names don't match init param names)
- process_tool_calls: replace increment_retries() with bare retries += 1
  to align with upstream pydantic#4687 which removed global limit checks in
  favor of per-tool enforcement via ToolManager._check_max_retries
- Update negative test to not use per-tool override that would bypass
  the output_retries default
- Add edge case tests: concurrent run isolation, retry counter across
  tool switches, validation vs execution failure retry counting

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
dsfaccini pushed a commit to dsfaccini/pydantic-ai that referenced this pull request Apr 21, 2026
Co-authored-by: Douwe Maan <[email protected]>
Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>

Labels

bug Report that something isn't working, or PR implementing a fix
size: M Medium PR (101-500 weighted lines)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ToolOutput max_retries is unused.

2 participants