Skip to content

fix: detect tool_use/tool_result mismatch and suggest /new instead of looping (closes #44473)#46365

Closed
Br1an67 wants to merge 1 commit intoopenclaw:mainfrom
Br1an67:fix/44473
Closed

fix: detect tool_use/tool_result mismatch and suggest /new instead of looping (closes #44473)#46365
Br1an67 wants to merge 1 commit intoopenclaw:mainfrom
Br1an67:fix/44473

Conversation

@Br1an67
Copy link
Copy Markdown
Contributor

@Br1an67 Br1an67 commented Mar 14, 2026

Summary

  • Problem: tool_use_id mismatch causes infinite error loop with no recovery
  • What changed: Mismatch errors now detected and user prompted to /new
  • What did NOT change: Other error handling, transcript repair logic

Change Type

  • Bug fix

Linked Issue/PR

Security Impact

All No

Evidence

  • pnpm build + pnpm check + pnpm test all passing

Compatibility / Migration

  • Backward compatible? Yes

Failure Recovery

  • How to revert: revert this commit

This PR was AI-assisted (fully tested with pnpm build/check/test).

@openclaw-barnacle openclaw-barnacle bot added agents Agent runtime and tooling size: XS labels Mar 14, 2026
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 14, 2026

Greptile Summary

This PR fixes an infinite-error-loop bug where a tool_use_id/tool_result mismatch from the Anthropic API had no recovery path; the agent would keep retrying and looping on the same error. The fix detects these mismatch errors in both formatAssistantErrorText and sanitizeUserFacingText and surfaces a user-friendly "Conversation history corruption detected — please use /new" message instead of letting the loop continue. The change is minimal and well-scoped.

Key observations:

  • The overall approach is sound and correctly guarded: the new check in sanitizeUserFacingText only activates under errorContext: true, preventing false rewrites of legitimate assistant text.
  • The bare tool_use_id alternative in TOOL_USE_RESULT_MISMATCH_RE is overly broad — it matches any error mentioning tool_use_id, including unrelated API field-validation errors (e.g. "tool_use_id is required" or "tool_use_id must be a string"), which would mislead the user into starting a /new session when the real cause is a programming/format error rather than history corruption. The other two alternatives (tool_result.*corresponding.*tool_use, unexpected.*tool.*block) are more targeted.
  • Tests cover the two expected Anthropic API error message patterns and confirm the user-facing copy.

Confidence Score: 3/5

  • Safe to merge after tightening the regex; the current tool_use_id bare alternative can misclassify unrelated validation errors.
  • The feature intent and wiring are correct, and the errorContext guard prevents false positives on normal assistant text. However, the bare tool_use_id alternative in TOOL_USE_RESULT_MISMATCH_RE could incorrectly classify API field-validation errors as conversation history corruption, giving users misleading guidance. This is a logic issue in the regex definition that is worth fixing before merge.
  • src/agents/pi-embedded-helpers/errors.ts — specifically the TOOL_USE_RESULT_MISMATCH_RE constant at line 838.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/agents/pi-embedded-helpers/errors.ts
Line: 838-839

Comment:
**`tool_use_id` alternative is too broad**

The first alternative `tool_use_id` in the regex matches **any** error message that contains the substring `tool_use_id`, regardless of context. This would misclassify unrelated API validation errors as conversation-history corruption and direct the user to `/new` when that isn't the right fix.

For example, error messages like:
- `"tool_use_id is required"` (client sent a `tool_result` with no ID — a programming error)
- `"tool_use_id must be a string"` (type-validation failure)
- `"messages.0.content.1.tool_result.tool_use_id: field cannot be empty"` (empty-field validation)

…would all match and produce a misleading "Conversation history corruption detected" response.

The two other alternatives (`tool_result.*corresponding.*tool_use` and `unexpected.*tool.*block`) are already broad enough to capture the "mismatch" error variants shown in the tests. A tighter first alternative that targets the actual Anthropic API wording would be safer:

```suggestion
const TOOL_USE_RESULT_MISMATCH_RE =
  /tool_result.*\btool_use_id\b|tool_result.*corresponding.*tool_use|unexpected.*tool_result.*block/i;
```

This keeps `tool_use_id` anchored to `tool_result` context (the two confirmed API messages in the tests both start with "tool_result block with tool_use_id …"), avoiding false matches on validation errors about the field itself.

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: c271704

Comment on lines +838 to +839
const TOOL_USE_RESULT_MISMATCH_RE =
/tool_use_id|tool_result.*corresponding.*tool_use|unexpected.*tool.*block/i;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tool_use_id alternative is too broad

The first alternative tool_use_id in the regex matches any error message that contains the substring tool_use_id, regardless of context. This would misclassify unrelated API validation errors as conversation-history corruption and direct the user to /new when that isn't the right fix.

For example, error messages like:

  • "tool_use_id is required" (client sent a tool_result with no ID — a programming error)
  • "tool_use_id must be a string" (type-validation failure)
  • "messages.0.content.1.tool_result.tool_use_id: field cannot be empty" (empty-field validation)

…would all match and produce a misleading "Conversation history corruption detected" response.

The two other alternatives (tool_result.*corresponding.*tool_use and unexpected.*tool.*block) are already broad enough to capture the "mismatch" error variants shown in the tests. A tighter first alternative that targets the actual Anthropic API wording would be safer:

Suggested change
const TOOL_USE_RESULT_MISMATCH_RE =
/tool_use_id|tool_result.*corresponding.*tool_use|unexpected.*tool.*block/i;
const TOOL_USE_RESULT_MISMATCH_RE =
/tool_result.*\btool_use_id\b|tool_result.*corresponding.*tool_use|unexpected.*tool_result.*block/i;

This keeps tool_use_id anchored to tool_result context (the two confirmed API messages in the tests both start with "tool_result block with tool_use_id …"), avoiding false matches on validation errors about the field itself.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/agents/pi-embedded-helpers/errors.ts
Line: 838-839

Comment:
**`tool_use_id` alternative is too broad**

The first alternative `tool_use_id` in the regex matches **any** error message that contains the substring `tool_use_id`, regardless of context. This would misclassify unrelated API validation errors as conversation-history corruption and direct the user to `/new` when that isn't the right fix.

For example, error messages like:
- `"tool_use_id is required"` (client sent a `tool_result` with no ID — a programming error)
- `"tool_use_id must be a string"` (type-validation failure)
- `"messages.0.content.1.tool_result.tool_use_id: field cannot be empty"` (empty-field validation)

…would all match and produce a misleading "Conversation history corruption detected" response.

The two other alternatives (`tool_result.*corresponding.*tool_use` and `unexpected.*tool.*block`) are already broad enough to capture the "mismatch" error variants shown in the tests. A tighter first alternative that targets the actual Anthropic API wording would be safer:

```suggestion
const TOOL_USE_RESULT_MISMATCH_RE =
  /tool_result.*\btool_use_id\b|tool_result.*corresponding.*tool_use|unexpected.*tool_result.*block/i;
```

This keeps `tool_use_id` anchored to `tool_result` context (the two confirmed API messages in the tests both start with "tool_result block with tool_use_id …"), avoiding false matches on validation errors about the field itself.

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c2717047ab

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

/messages\.\d+\.content\.\d+\.tool_(?:use|call)\.(?:input|arguments)/i;

const TOOL_USE_RESULT_MISMATCH_RE =
/tool_use_id|tool_result.*corresponding.*tool_use|unexpected.*tool.*block/i;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Restrict mismatch regex to mismatch-specific tool_result errors

The new TOOL_USE_RESULT_MISMATCH_RE treats any occurrence of tool_use_id as a turn-pairing corruption, and both formatAssistantErrorText and sanitizeUserFacingText check this before the normal invalid-request/raw-error formatting paths. In practice, non-mismatch errors such as schema validation (...tool_result.tool_use_id: Field required) will now be rewritten to “Conversation history corruption... /new”, which hides the real failure cause and points users to an ineffective recovery step. This should only match explicit mismatch wording (for example not found, no corresponding tool_use, immediately preceding assistant message) instead of the bare token.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 84c887630c

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

/messages\.\d+\.content\.\d+\.tool_(?:use|call)\.(?:input|arguments)/i;

const TOOL_USE_RESULT_MISMATCH_RE =
/tool_use_id\b.*\b(?:not found|does not match|no (?:matching|corresponding)|mismatch)|tool_result.*corresponding.*tool_use|unexpected.*tool.*block/i;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Exclude schema-validation errors from mismatch detection

The mismatch regex now treats tool_use_id ... does not match ... as conversation-history corruption, but that phrase also appears in ordinary schema-validation failures (for example an invalid tool_result.tool_use_id format). Because formatAssistantErrorText/sanitizeUserFacingText run this check before the invalid_request_error formatting path, these requests will be rewritten to the /new recovery hint instead of surfacing the actionable validation message, which can send users into ineffective reset loops.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fe8bed0a0d

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +838 to +839
const TOOL_USE_RESULT_MISMATCH_RE =
/tool_use_id\b.*\b(?:not found|does not match|no (?:matching|corresponding)|mismatch)|tool_result.*corresponding.*tool_use|unexpected.*tool.*block/i;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Restrict mismatch matcher to pairing-specific error text

The new matcher in isToolUseResultMismatchError is broad enough to catch ordinary validation failures, because any error containing tool_use_id ... does not match now gets rewritten to the /new corruption hint before the existing invalid_request_error formatting path runs. In formatAssistantErrorText and sanitizeUserFacingText, this means schema-style messages (for example, invalid tool_result.tool_use_id format/pattern errors) lose their actionable detail and users are sent into ineffective session resets instead of seeing what input is invalid.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 878bd610b5

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

/messages\.\d+\.content\.\d+\.tool_(?:use|call)\.(?:input|arguments)/i;

const TOOL_USE_RESULT_MISMATCH_RE =
/tool_use_id\s+"[^"]+"\s*(?:not found|does not match)|tool_result.*corresponding.*tool_use|unexpected\s+tool_(?:result|use)\s+block/i;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Restrict mismatch matcher to true pairing errors

The new TOOL_USE_RESULT_MISMATCH_RE now treats any tool_use_id "..." does not match text as transcript corruption, but that wording is also used by ordinary schema/format validation failures (for example malformed tool_result.tool_use_id pattern errors). Because this check runs before the existing invalid-request/raw formatting paths, those validation errors are rewritten to the /new reset hint and lose the actionable cause, which can send users into ineffective reset loops instead of fixing the request payload.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 854109f97c

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +71 to +74
id === defaultSlotIdForKey("contextEngine") &&
normalizedOwner !== CORE_CONTEXT_ENGINE_OWNER
) {
return { ok: false, existingOwner: CORE_CONTEXT_ENGINE_OWNER };
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P0 Badge Allow built-in default context engine registration

This guard blocks any non-core owner from registering the default context engine id, but the built-in legacy engine still registers via registerContextEngine(...), which hardcodes owner public-sdk in this commit. As a result, registerLegacyContextEngine() now silently fails and the default resolveContextEngine() path throws "legacy" is not registered for normal startup/config flows unless users manually select a different engine.

Useful? React with 👍 / 👎.

@Br1an67
Copy link
Copy Markdown
Contributor Author

Br1an67 commented Mar 17, 2026

Closing to manage active PR count. Will reopen when slot is available.

@Br1an67 Br1an67 closed this Mar 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling size: S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix: tool_use_id mismatch error causes infinite loop instead of recovery

1 participant