fix(core): defer self-reflection to prevent orphaned tool_use blocks #2205

Merged
bug-ops merged 1 commit into main from 2197-parallel-tool-result-drop
Mar 27, 2026

bug-ops (Owner) commented Mar 27, 2026

Fixes #2197.

Root cause

attempt_self_reflection was called inside the per-tool result loop in
handle_native_tool_calls. When any tool returned an error (including permanent
errors like HTTP 403), the old code called attempt_self_reflection mid-loop,
which pushed User{reflection_prompt} + Assistant{response} into history
before the User{ToolResults} message was assembled. The resulting message
sequence sent to the API was:

[N]   Assistant{ToolUse(id-1, id-2, ...)}
[N+1] User{"You attempted to help..."}      ← reflection prompt (wrong position)
[N+2] Assistant{reflection response}
[N+3] User{ToolResult(id-1), ToolResult(id-2)} ← too late

OpenAI requires every tool_call_id in an assistant message to be answered by
role:"tool" messages that immediately follow it. A regular role:"user" message
at [N+1] violates this, and the request fails with HTTP 400. Claude's context
assembly instead downgrades the orphaned ToolUse to text, silently losing the
tool results.
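
The ordering rule described above can be sketched as a small history check. This is an illustration only: `Msg`, `ids`, `first_orphaned_tool_use`, `broken_history`, and `fixed_history` are hypothetical names, and the enum is a simplified stand-in for the real zeph-core message types, which this PR does not show.

```rust
/// Simplified message model (assumption: not the real zeph-core types).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Msg {
    AssistantToolUse(Vec<String>), // ids of the requested tool calls
    UserToolResults(Vec<String>),  // ids answered by this message
    UserText(String),
    AssistantText(String),
}

/// Small helper to build owned id lists from string literals.
fn ids(xs: &[&str]) -> Vec<String> {
    xs.iter().map(|s| s.to_string()).collect()
}

/// Returns the index of the first Assistant{ToolUse} message that is not
/// immediately followed by a User{ToolResults} message answering every id.
fn first_orphaned_tool_use(history: &[Msg]) -> Option<usize> {
    for (i, msg) in history.iter().enumerate() {
        if let Msg::AssistantToolUse(call_ids) = msg {
            let answered = match history.get(i + 1) {
                Some(Msg::UserToolResults(results)) => {
                    call_ids.iter().all(|id| results.contains(id))
                }
                _ => false,
            };
            if !answered {
                return Some(i);
            }
        }
    }
    None
}

/// The sequence produced by the old code: reflection lands before the results.
fn broken_history() -> Vec<Msg> {
    vec![
        Msg::AssistantToolUse(ids(&["id-1", "id-2"])),
        Msg::UserText("You attempted to help...".to_string()),
        Msg::AssistantText("reflection response".to_string()),
        Msg::UserToolResults(ids(&["id-1", "id-2"])),
    ]
}

/// The sequence after the fix: results first, reflection afterwards.
fn fixed_history() -> Vec<Msg> {
    vec![
        Msg::AssistantToolUse(ids(&["id-1", "id-2"])),
        Msg::UserToolResults(ids(&["id-1", "id-2"])),
        Msg::UserText("You attempted to help...".to_string()),
        Msg::AssistantText("reflection response".to_string()),
    ]
}
```

With this model, the old sequence is flagged at index 0 (the ToolUse message), while the reordered sequence passes the check.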

Secondary issue: record_anomaly_outcome().await? propagated channel send errors
via ?, abandoning mid-batch ToolResult assembly before result_parts.push().
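
A minimal sketch of why `?` is harmful at this point, written synchronously with `std::sync::mpsc` so it runs without an async runtime. The real code is async, and `record_outcome`, `assemble_propagating`, `assemble_ignoring`, and `closed_sender` are hypothetical stand-ins rather than the actual zeph-core functions.

```rust
use std::sync::mpsc::{channel, SendError, Sender};

/// Hypothetical stand-in for record_anomaly_outcome: reports over a channel.
fn record_outcome(tx: &Sender<String>, outcome: &str) -> Result<(), SendError<String>> {
    tx.send(outcome.to_string())
}

/// Old shape: `?` returns early on a send failure, so the result for the
/// current tool (and every later tool in the batch) is never pushed.
fn assemble_propagating(
    tx: &Sender<String>,
    outputs: &[&str],
) -> Result<Vec<String>, SendError<String>> {
    let mut result_parts = Vec::new();
    for &output in outputs {
        record_outcome(tx, output)?;
        result_parts.push(output.to_string());
    }
    Ok(result_parts)
}

/// New shape: the send result is discarded, so assembly always completes.
fn assemble_ignoring(tx: &Sender<String>, outputs: &[&str]) -> Vec<String> {
    let mut result_parts = Vec::new();
    for &output in outputs {
        let _ = record_outcome(tx, output);
        result_parts.push(output.to_string());
    }
    result_parts
}

/// Builds a sender whose receiver is already dropped, so every send fails.
fn closed_sender() -> Sender<String> {
    let (tx, rx) = channel();
    drop(rx);
    tx
}
```

With a closed channel, the propagating variant returns an error before any result is collected, while the ignoring variant still yields one entry per tool output.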

Fix

  • Deferred self-reflection: removed attempt_self_reflection from inside the
    per-tool loop. A single pending_reflection: Option<String> captures the first
    eligible error output. After the loop assembles all ToolResult parts and
    user_msg is pushed to history, reflection fires — in the correct position
    after User{ToolResults}.
  • Reflection errors swallowed: Ok(_) | Err(_) => {} — whether reflection
    succeeds, declines, or errors, Ok(()) is returned. ToolResults are already
    committed; the tool loop continues normally.
  • record_anomaly_outcome errors ignored: changed await? to let _ = ...await so channel failures cannot abandon ToolResult assembly.
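
The deferred pattern in the list above might look roughly like the following synchronous sketch. It is not the actual `handle_native_tool_calls` code: `Entry`, `reflect`, and `handle_tool_outputs` are hypothetical, and errors are modeled as plain strings.

```rust
/// Simplified history entries (assumption: not the real zeph-core types).
#[derive(Debug, Clone, PartialEq)]
enum Entry {
    ToolResults(Vec<String>),
    Reflection(String),
}

/// Hypothetical stand-in for attempt_self_reflection.
fn reflect(history: &mut Vec<Entry>, error_output: &str) -> Result<(), String> {
    if error_output.is_empty() {
        return Err("nothing to reflect on".to_string());
    }
    history.push(Entry::Reflection(format!("reflecting on: {error_output}")));
    Ok(())
}

/// Assembles every tool result first, commits them, then reflects once.
fn handle_tool_outputs(
    history: &mut Vec<Entry>,
    outputs: Vec<Result<String, String>>,
) -> Result<(), String> {
    let mut result_parts = Vec::with_capacity(outputs.len());
    let mut pending_reflection: Option<String> = None;

    for output in outputs {
        match output {
            Ok(text) => result_parts.push(text),
            Err(err) => {
                // Capture only the first eligible error; never reflect mid-loop.
                if pending_reflection.is_none() {
                    pending_reflection = Some(err.clone());
                }
                result_parts.push(format!("error: {err}"));
            }
        }
    }

    // Commit the results before anything else touches the history.
    history.push(Entry::ToolResults(result_parts));

    if let Some(err) = pending_reflection {
        // Reflection outcome is deliberately ignored: results are committed.
        match reflect(history, &err) {
            Ok(_) | Err(_) => {}
        }
    }
    Ok(())
}
```

Because the results entry is pushed before `reflect` runs, a reflection failure can no longer leave a tool call without its result.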

Tests

  • R-NTP-13 (new): single ToolError::Execution(io::Error::other("HTTP 403"))
    → ToolResult present in history
  • R-NTP-14 (new): two parallel permanent errors → both ToolResult parts present
  • R-NTP-10/11/12 updated: now assert is_ok() (new contract — reflection
    errors no longer propagate) and search all messages for ToolResult parts

Checklist

  • cargo +nightly fmt --check — clean
  • cargo clippy --workspace -- -D warnings — clean
  • cargo nextest run --workspace --lib --bins — 6103/6103 passed
  • CHANGELOG.md updated ([Unreleased] section)
  • Live API session test required (LLM serialization gate: touches message history assembly path)

…2197)

Permanent tool errors (e.g. HTTP 403 from web-scrape) caused OpenAI HTTP 400
"An assistant message with 'tool_calls' must be followed by tool messages
responding to each 'tool_call_id'" because attempt_self_reflection was called
inside the per-tool result loop. This inserted User{reflection_prompt} +
Assistant{response} messages between the Assistant{ToolUse} block and the
User{ToolResults} block, breaking the strict message ordering required by
OpenAI and Claude APIs.

Fix: defer attempt_self_reflection until after all ToolResult parts are
assembled and the User{ToolResults} message is pushed to history. A single
`pending_reflection: Option<String>` captures the first eligible error;
reflection fires after push_message(user_msg). Whether reflection succeeds,
declines, or errors, Ok(()) is returned so the tool loop continues normally.

Secondary fix: record_anomaly_outcome errors are now silently ignored (let _)
so a channel send failure cannot abandon mid-batch ToolResult assembly.

Regression tests added: R-NTP-13 (single permanent Err), R-NTP-14 (parallel
permanent Err), and R-NTP-10/11/12 updated to reflect the new Ok(()) return
contract when reflection fails.

github-actions bot added the documentation, rust, core, bug, and size/XL labels Mar 27, 2026
bug-ops changed the title from "fix(tools): always emit tool_result for every tool_call_id on permanent error" to "fix(core): defer self-reflection to prevent orphaned tool_use blocks" Mar 27, 2026
bug-ops enabled auto-merge (squash) March 27, 2026 03:00
bug-ops merged commit 0b23daa into main Mar 27, 2026
28 checks passed
bug-ops deleted the 2197-parallel-tool-result-drop branch March 27, 2026 03:06

Development

Successfully merging this pull request may close these issues.

bug(tools): parallel tool call with permanent error drops tool_result, causes 400 Bad Request on next LLM turn