fix(core): defer self-reflection to prevent orphaned tool_use blocks #2205
Merged
…2197)

Permanent tool errors (e.g. HTTP 403 from web-scrape) caused OpenAI HTTP 400 "An assistant message with 'tool_calls' must be followed by tool messages responding to each 'tool_call_id'" because `attempt_self_reflection` was called inside the per-tool result loop. This inserted `User{reflection_prompt}` + `Assistant{response}` messages between the `Assistant{ToolUse}` block and the `User{ToolResults}` block, breaking the strict message ordering required by the OpenAI and Claude APIs.

Fix: defer `attempt_self_reflection` until after all ToolResult parts are assembled and the `User{ToolResults}` message is pushed to history. A single `pending_reflection: Option<String>` holds the first eligible error; reflection fires after `push_message(user_msg)`. Whether reflection succeeds, declines, or errors, `Ok(())` is returned so the tool loop continues normally.

Secondary fix: `record_anomaly_outcome` errors are now silently ignored (`let _`), so a channel send failure cannot abandon mid-batch ToolResult assembly.

Regression tests: added R-NTP-13 (single permanent `Err`) and R-NTP-14 (parallel permanent `Err`); updated R-NTP-10/11/12 to reflect the new `Ok(())` return contract when reflection fails.
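The deferral can be sketched as follows. This is a synchronous, simplified model written for illustration: `Msg`, `SendError`, `handle_tool_results`, and the stub bodies of `record_anomaly_outcome` and `attempt_self_reflection` are hypothetical stand-ins, not the crate's real types or signatures (the real code is async), and "first eligible error" is reduced to "first error". It shows the two guarantees described above: the `User{ToolResults}` message is pushed before any reflection messages, and a failing anomaly send or reflection call cannot abort the batch.

```rust
// Hypothetical, simplified stand-ins; not the project's real types or signatures.
#[derive(Debug, Clone, PartialEq)]
enum Msg {
    AssistantToolUse(Vec<String>),          // ids of the issued tool calls
    UserToolResults(Vec<(String, String)>), // (tool_call_id, output)
    UserReflectionPrompt(String),
    AssistantReflection(String),
}

#[derive(Debug)]
struct SendError;

// Hypothetical stub for record_anomaly_outcome: always fails, as if the channel were closed.
fn record_anomaly_outcome(_success: bool) -> Result<(), SendError> {
    Err(SendError)
}

// Hypothetical stub for attempt_self_reflection: pushes a prompt and a reply, then reports an error.
fn attempt_self_reflection(history: &mut Vec<Msg>, error_output: &str) -> Result<bool, String> {
    history.push(Msg::UserReflectionPrompt(format!("The tool failed with: {error_output}")));
    history.push(Msg::AssistantReflection("I will try another approach.".to_string()));
    Err("reflection request failed".to_string())
}

fn handle_tool_results(
    history: &mut Vec<Msg>,
    outcomes: Vec<(String, Result<String, String>)>,
) -> Result<(), String> {
    let mut result_parts = Vec::new();
    // Holds only the first error seen; reflection is NOT triggered inside the loop.
    let mut pending_reflection: Option<String> = None;

    for (tool_call_id, outcome) in outcomes {
        let output = match outcome {
            Ok(out) => {
                // `let _` instead of `?`: a failed send must not skip the push below.
                let _ = record_anomaly_outcome(true);
                out
            }
            Err(err) => {
                let _ = record_anomaly_outcome(false);
                if pending_reflection.is_none() {
                    pending_reflection = Some(err.clone());
                }
                format!("[error] {err}")
            }
        };
        result_parts.push((tool_call_id, output));
    }

    // Commit every ToolResult first, directly after the assistant's ToolUse message.
    history.push(Msg::UserToolResults(result_parts));

    // Only now is it safe to append reflection messages.
    if let Some(err) = pending_reflection {
        match attempt_self_reflection(history, &err) {
            // Success, decline, or error: the ToolResults are already in history.
            Ok(_) | Err(_) => {}
        }
    }
    Ok(())
}
```

With one failing and one succeeding tool, the history ends as ToolUse, ToolResults, reflection prompt, reflection reply, and the call still returns `Ok(())` even though the stubbed reflection fails.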
Fixes #2197.
Root cause
`attempt_self_reflection` was called inside the per-tool result loop in `handle_native_tool_calls`. When any tool returned an error (including permanent errors like HTTP 403), the old code called `attempt_self_reflection` mid-loop, which pushed `User{reflection_prompt}` + `Assistant{response}` into history before the `User{ToolResults}` message was assembled. The resulting message sequence sent to the API was:

- [N] `Assistant{ToolUse}`
- [N+1] `User{reflection_prompt}`
- [N+2] `Assistant{response}`
- [N+3] `User{ToolResults}`

OpenAI requires every `tool_call_id` in an assistant message to be answered by `role:"tool"` messages that immediately follow it, so a regular `role:"user"` message at [N+1] triggers HTTP 400. Claude's context assembly instead downgrades the orphaned ToolUse to text, silently losing the tool results.
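To make the ordering rule concrete, here is a minimal checker over a hypothetical, simplified message model (`Role`, `Part`, `Message`, and `first_orphaned_tool_use` are illustrative names, not the project's API). Following the PR's `User{ToolResults}` notation, tool results are modeled as parts of a user message rather than as separate `role:"tool"` messages. It checks only one thing: every ToolUse id must be answered by the message immediately after it.

```rust
// Hypothetical, simplified message model; the real crate's types differ.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Role {
    User,
    Assistant,
}

#[derive(Debug, Clone)]
enum Part {
    Text(String),
    ToolUse { id: String },
    ToolResult { tool_use_id: String },
}

#[derive(Debug, Clone)]
struct Message {
    role: Role,
    parts: Vec<Part>,
}

/// Returns the index of the first assistant message whose ToolUse ids are not
/// all answered by ToolResult parts in the immediately following user message.
fn first_orphaned_tool_use(history: &[Message]) -> Option<usize> {
    for (i, msg) in history.iter().enumerate() {
        if msg.role != Role::Assistant {
            continue;
        }
        // Collect the tool-call ids this assistant message issued.
        let ids: Vec<&str> = msg
            .parts
            .iter()
            .filter_map(|p| match p {
                Part::ToolUse { id } => Some(id.as_str()),
                _ => None,
            })
            .collect();
        if ids.is_empty() {
            continue;
        }
        // The very next message must be a user message answering every id.
        let answered = match history.get(i + 1) {
            Some(next) if next.role == Role::User => ids.iter().all(|id| {
                next.parts.iter().any(|p| {
                    matches!(p, Part::ToolResult { tool_use_id } if tool_use_id.as_str() == *id)
                })
            }),
            _ => false,
        };
        if !answered {
            return Some(i);
        }
    }
    None
}
```

Fed the pre-fix ordering (ToolUse, reflection prompt, reflection reply, ToolResults), it flags the ToolUse message at index 0; with reflection deferred until after `User{ToolResults}`, it returns `None`.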
Secondary issue: `record_anomaly_outcome().await?` propagated channel send errors via `?`, abandoning mid-batch ToolResult assembly before `result_parts.push()`.

Fix
- Removed `attempt_self_reflection` from inside the per-tool loop. A single `pending_reflection: Option<String>` captures the first eligible error output. After the loop assembles all ToolResult parts and `user_msg` is pushed to history, reflection fires in the correct position, after `User{ToolResults}`.
- Reflection outcome ignored (`Ok(_) | Err(_) => {}`): whether reflection succeeds, declines, or errors, `Ok(())` is returned. The ToolResults are already committed, so the tool loop continues normally.
- `record_anomaly_outcome` errors ignored: changed `.await?` to `let _ = ...await` so channel failures cannot abandon ToolResult assembly.

Tests
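- R-NTP-13 / R-NTP-14 (single and parallel permanent errors): a tool returning `ToolError::Execution(io::Error::other("HTTP 403"))` → ToolResult present in history.
- R-NTP-10/11/12: updated to assert `is_ok()` (new contract: reflection errors no longer propagate) and to search all messages for ToolResult parts.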
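Because reflection messages may now follow the `User{ToolResults}` message, a test cannot assume the result sits at a fixed position in history. A hypothetical helper in that spirit (simplified types and the name `find_tool_result` are illustrative, not the project's test utilities) scans every message for a matching ToolResult part:

```rust
// Hypothetical, simplified message model for test illustration only.
#[derive(Debug)]
enum Part {
    Text(String),
    ToolResult { tool_use_id: String, is_error: bool },
}

#[derive(Debug)]
struct Message {
    parts: Vec<Part>,
}

/// Searches all messages, rather than a fixed index, for the ToolResult
/// answering `tool_use_id`.
fn find_tool_result<'a>(history: &'a [Message], tool_use_id: &str) -> Option<&'a Part> {
    history
        .iter()
        .flat_map(|m| m.parts.iter())
        .find(|p| matches!(p, Part::ToolResult { tool_use_id: id, .. } if id.as_str() == tool_use_id))
}
```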
Checklist
- `cargo +nightly fmt --check`: clean
- `cargo clippy --workspace -- -D warnings`: clean
- `cargo nextest run --workspace --lib --bins`: 6103/6103 passed
- … (`[Unreleased]` section)