fix: auto-repair and retry on orphan tool_result errors #3362
samhotchkiss wants to merge 1 commit into openclaw:main
When the Anthropic API rejects a request with 'unexpected tool_use_id found in tool_result blocks', this indicates corrupted session history where a tool_result references a tool_use that doesn't exist in the previous message.

This can happen due to:
- Race conditions in message processing
- Partial saves during crashes
- History truncation that removes tool_use but keeps tool_result

Changes:
- Add isOrphanToolResultError() to detect this specific error pattern
- Add retry logic in runEmbeddedAttempt: when this error is caught, repair the transcript using repairToolUseResultPairing() and retry once
- Log repair details (orphans dropped, duplicates dropped, synthetic results added)

This allows sessions to self-recover from corrupted history without requiring a manual session reset.

Closes #XXXX
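To make the failure mode concrete, here is a minimal sketch of how an orphan tool_result could be detected in a transcript. The `Message` and `ContentBlock` shapes and the `findOrphanToolResultIds` helper are hypothetical simplifications for illustration; they are not the session types or the repair utility used by this PR.

```typescript
// Hypothetical, simplified message shapes (assumption: the real session types differ).
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string }
  | { type: "tool_result"; tool_use_id: string; content: string };

interface Message {
  role: "user" | "assistant";
  content: ContentBlock[];
}

// Returns the tool_use_id of every tool_result block that has no matching
// tool_use block in the immediately preceding assistant message.
function findOrphanToolResultIds(messages: Message[]): string[] {
  const orphans: string[] = [];
  messages.forEach((message, index) => {
    if (message.role !== "user") return;
    const previous = index > 0 ? messages[index - 1] : undefined;
    const knownIds = new Set<string>();
    if (previous && previous.role === "assistant") {
      for (const block of previous.content) {
        if (block.type === "tool_use") knownIds.add(block.id);
      }
    }
    for (const block of message.content) {
      if (block.type === "tool_result" && !knownIds.has(block.tool_use_id)) {
        orphans.push(block.tool_use_id);
      }
    }
  });
  return orphans;
}
```

A transcript whose history was truncated so that the assistant's tool_use message is gone would report the leftover tool_result here, while a transcript with intact pairs would report nothing.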
Great PR! This should help address multiple open issues: Issues this may fix/mitigate:
Related PRs working on similar problems:
Would be great to coordinate with these other PRs to ensure comprehensive coverage of the orphan tool pairing problem.
export function isOrphanToolResultError(raw: string): boolean {
  if (!raw) return false;
  const lower = raw.toLowerCase();
  return (
    lower.includes("unexpected tool_use_id") ||
    (lower.includes("tool_use_id") && lower.includes("tool_result")) ||
    /tool_result.*does not have.*corresponding.*tool_use/i.test(raw)
  );
}
[P1] isOrphanToolResultError is likely too broad and can trigger “repair + retry” on unrelated errors.
The (lower.includes("tool_use_id") && lower.includes("tool_result")) condition will match a lot of generic/tooling-related failures (including ones not caused by missing prior tool_use blocks), which means runEmbeddedAttempt will mutate the transcript and retry even when it won’t help. This can hide the real error and introduce hard-to-debug history changes. Consider tightening the match to the known Anthropic wording (e.g., unexpected tool_use_id found in tool_result blocks / Each tool_result block must have a corresponding tool_use block) and/or only treating it as orphan when the error explicitly mentions “corresponding tool_use block in the previous message”.
Path: src/agents/pi-embedded-helpers/errors.ts
Line: 502:510
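One way the tightening suggested in this comment might look is sketched below. The two phrases are the ones quoted in the review; whether they match the API's wording exactly is an assumption to verify against real error responses, and `isOrphanToolResultErrorStrict` is a hypothetical name, not a function in the PR.

```typescript
// Phrases quoted in the review comment (assumption: they match the real API wording).
const ORPHAN_TOOL_RESULT_PATTERNS: RegExp[] = [
  /unexpected tool_use_id found in tool_result blocks/i,
  /tool_result block must have a corresponding tool_use block/i,
];

// Hypothetical stricter detector: matches only the known full phrases,
// not any message that merely mentions both tool_use_id and tool_result.
function isOrphanToolResultErrorStrict(raw: string): boolean {
  if (!raw) return false;
  return ORPHAN_TOOL_RESULT_PATTERNS.some((pattern) => pattern.test(raw));
}
```

With this shape, an unrelated error that happens to mention both `tool_use_id` and `tool_result` no longer triggers the repair-and-retry path, at the cost of needing an update if the provider changes its wording.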
Additional Comments (3)
Path: src/agents/pi-embedded-runner/run/attempt.ts
Line: 781:817
Comment:
[P0] `promptError` can be silently dropped after the retry path, causing the attempt to look “successful” despite a prompt failure.
In the orphan-tool-result branch, if `runPrompt()` throws and the subsequent retry also throws, you set `promptError = retryErr`. But if `runPrompt()` throws, transcript repair runs, and the retry *doesn’t* throw but still results in an error assistant message (e.g., `stopReason="error"` with `errorMessage`), `promptError` stays null and downstream code treats this as success (`success: !aborted && !promptError`, and the return object has `promptError: null`). If `activeSession.prompt()` reports failures via an assistant message rather than by throwing in some cases, this retry logic can mask errors.
If that reporting mode exists for any provider/model, consider explicitly checking `lastAssistant?.stopReason === "error"` (or similar) after `runPrompt()` and setting `promptError` accordingly, both for the initial run and the retry.
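The check proposed in this comment could be factored into a small helper that is called after both the initial run and the retry. This is a sketch under assumptions: `AssistantSummary` and `resolvePromptError` are hypothetical names, and the `stopReason`/`errorMessage` fields are taken from the comment's wording rather than from the actual session types.

```typescript
// Hypothetical summary of the last assistant message (field names assumed from the review).
interface AssistantSummary {
  stopReason?: string;
  errorMessage?: string;
}

// Returns the error that should be recorded as promptError:
// a thrown error wins; otherwise an error-as-message outcome is surfaced;
// otherwise null means the prompt genuinely succeeded.
function resolvePromptError(thrown: unknown, lastAssistant?: AssistantSummary): unknown {
  if (thrown != null) return thrown;
  if (lastAssistant?.stopReason === "error") {
    return new Error(lastAssistant?.errorMessage ?? "assistant reported an error");
  }
  return null;
}
```

Routing both the first attempt and the retry through one helper keeps the success condition (`!aborted && !promptError`) honest regardless of how the failure was reported.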
Path: src/agents/pi-embedded-runner/run/attempt.ts
Line: 801:810
Comment:
[P2] Transcript repair logging is conditioned on referential inequality, which may hide “no-op but relevant” repairs.
`if (repaired.messages !== currentMessages)` assumes the repair function returns the same array instance when no changes are needed. If it always returns a new array (even identical) you’ll always log + replace; if it sometimes mutates in place you’ll skip replace/log even though changes occurred. Using a clearer signal from `repairToolUseResultPairing` (e.g., counts/added length) would make this more robust.
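A count-based change signal, as suggested here, might look like the sketch below. The `RepairReport` shape and its field names are hypothetical; they echo the three counts the PR says it logs, but the real return type of `repairToolUseResultPairing` is not shown on this page.

```typescript
// Hypothetical report shape (assumption: field names are illustrative only).
interface RepairReport<T> {
  messages: T[];
  droppedOrphanCount: number;
  droppedDuplicateCount: number;
  addedSyntheticCount: number;
}

// Decides whether the repair changed anything by looking at the counts,
// not at whether the returned array is the same instance as the input.
function repairChangedTranscript<T>(report: RepairReport<T>): boolean {
  return (
    report.droppedOrphanCount +
      report.droppedDuplicateCount +
      report.addedSyntheticCount >
    0
  );
}
```

This stays correct whether the repair function returns a fresh array every time or mutates in place, which are the two cases the referential check handles poorly.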
Path: src/agents/pi-embedded-helpers.isorphantoolresulterror.test.ts
Line: 28:32
Comment:
[P3] The “null/undefined input” test uses forced casts that don’t reflect the function’s declared signature.
`isOrphanToolResultError` takes `raw: string`, so `null as unknown as string` compiles but doesn’t match real call sites and can mask type-level regressions. If you want to support nullish input, consider changing the signature to `raw?: string | null` and testing that directly; otherwise, this test case can be dropped.
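If nullish input is meant to be supported, the first option in this comment could be sketched as follows. `isOrphanToolResultErrorNullable` is a hypothetical name, and the body keeps only one of the original match conditions for brevity.

```typescript
// Hypothetical variant whose signature admits null/undefined directly,
// so tests need no `as unknown as string` casts.
function isOrphanToolResultErrorNullable(raw?: string | null): boolean {
  if (!raw) return false;
  return raw.toLowerCase().includes("unexpected tool_use_id");
}
```

Tests can then call the function with `null` or `undefined` as written, and the compiler will flag any call site that later starts passing an unsupported type.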
bfc1ccb to f92900f
Problem
When the Anthropic API rejects a request with:
    unexpected tool_use_id found in tool_result blocks
This indicates corrupted session history where a `tool_result` references a `tool_use` that doesn't exist in the previous message.
Root causes
- Race conditions in message processing
- Partial saves during crashes
- History truncation that removes `tool_use` but keeps `tool_result`
Solution
- `isOrphanToolResultError()` - Detects this specific error pattern
- `runEmbeddedAttempt` - When error is caught:
  - Repairs the transcript using `repairToolUseResultPairing()`
  - Retries once
  - Logs repair details (orphans dropped, duplicates dropped, synthetic results added)
This allows sessions to self-recover from corrupted history without requiring a manual session reset.
Changes
- `src/agents/pi-embedded-helpers/errors.ts` - Added `isOrphanToolResultError()`
- `src/agents/pi-embedded-helpers.ts` - Exported new function
- `src/agents/pi-embedded-runner/run/attempt.ts` - Added retry logic
- `src/agents/pi-embedded-helpers.isorphantoolresulterror.test.ts` - Test coverage
Greptile Overview
Greptile Summary
This PR adds detection and self-healing for a specific Anthropic invalid-request failure where a `tool_result` references a `tool_use_id` not present in the previous message (corrupted transcript). It introduces `isOrphanToolResultError()` in `src/agents/pi-embedded-helpers/errors.ts`, re-exports it via `src/agents/pi-embedded-helpers.ts`, and updates `runEmbeddedAttempt` (`src/agents/pi-embedded-runner/run/attempt.ts`) to repair the active session messages via `repairToolUseResultPairing()` and retry the prompt once when this error is detected. A dedicated vitest file adds pattern-based coverage for the new detector.
Overall this fits the existing embedded runner flow by keeping the repair localized to the prompt execution path and relying on the existing transcript repair utility instead of introducing new transcript mutation logic.
Confidence Score: 3/5
- … `activeSession.prompt()` reports errors.
- `isOrphanToolResultError` has a very broad match and the retry wrapper only captures thrown errors, not error-as-message outcomes. These edge cases could lead to unnecessary transcript mutation/retries or falsely successful attempts.