bug(tools): parallel tool call with permanent error drops tool_result, causes 400 Bad Request on next LLM turn #2197
Description
Summary
When the LLM makes parallel tool calls (multiple tool_calls in one assistant message) and one of them fails with a permanent error (e.g., HTTP 403), the tool result for the failed call is not injected into the next LLM request. The OpenAI API then rejects that request with HTTP 400: "An assistant message with 'tool_calls' must be followed by tool messages responding to each 'tool_call_id'".
Reproduction
printf 'Какая погода в Москве? Также, кто такой Путин?\n' | cargo run --features full -- --config .local/config/testing.toml
The Russian prompt ("What is the weather in Moscow? Also, who is Putin?") causes the LLM to issue two parallel fetch calls:
fetch("https://api.open-meteo.com/v1/forecast?...") → 200 OK
fetch("https://ru.wikipedia.org/api/rest_v1/page/summary/путин") → 403 Forbidden (permanent error)
The 403 is recorded in the debug dump as "kind": "permanent", but the tool_result for call_tKaDkhHGLPeg8gBmu8vuBZM1 is missing from the next LLM request.
Debug Dump Analysis
Request 0002-request.json has 4 messages:
[0] system: ...
[1] user: "Какая погода в Москве? Также, кто такой Путин?"
[2] assistant: tool_calls: ['call_QzOp1AlcssaIq18co1gngSe8', 'call_tKaDkhHGLPeg8gBmu8vuBZM1']
[3] user: "You attempted to help..." ← missing tool_result for both call IDs
OpenAI requires a tool_result message for every tool_call_id in the assistant message before the next user turn.
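The pairing rule above can be checked mechanically before a request is sent. The following is a minimal sketch, assuming a simplified `Message` enum and a helper named `unanswered_tool_calls`; both are hypothetical and are not the types used in zeph-core.

```rust
use std::collections::HashSet;

/// Simplified chat message. Hypothetical type for illustration only.
#[derive(Debug, Clone)]
enum Message {
    System(String),
    User(String),
    Assistant { tool_call_ids: Vec<String> },
    Tool { tool_call_id: String, content: String },
}

/// Returns every tool_call_id that is not answered by a tool message
/// placed directly after the assistant message that issued it.
fn unanswered_tool_calls(messages: &[Message]) -> Vec<String> {
    let mut missing = Vec::new();
    let mut i = 0;
    while i < messages.len() {
        if let Message::Assistant { tool_call_ids } = &messages[i] {
            // Collect the ids answered by the run of tool messages that follows.
            let mut answered = HashSet::new();
            let mut j = i + 1;
            while let Some(Message::Tool { tool_call_id, .. }) = messages.get(j) {
                answered.insert(tool_call_id.as_str());
                j += 1;
            }
            for id in tool_call_ids {
                if !answered.contains(id.as_str()) {
                    missing.push(id.clone());
                }
            }
            // Continue scanning after the tool messages already consumed.
            i = j;
        } else {
            i += 1;
        }
    }
    missing
}
```

Applied to the four messages in 0002-request.json, such a check would report both call IDs as unanswered, which matches the 400 response from the API.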
Root Cause
In crates/zeph-core/src/agent/tool_execution/mod.rs (or context/assembly.rs): when a tool call returns a permanent error, the error response may not be assembled into a tool_result block with the correct tool_call_id. Instead, the error is surfaced differently, breaking the required message interleaving.
Expected Behavior
For every tool_call_id in an assistant message, there must be a corresponding tool_result message — even if the tool failed. A failed tool call should produce tool_result with the error text, not silence.
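One way to get this behavior is to map every tool outcome, success or failure, to a result message that carries the originating tool_call_id. The sketch below only illustrates the idea: `ToolOutcome`, `ToolResultMessage`, and `assemble_tool_results` are hypothetical names, and the actual fix in zeph-core may look different.

```rust
/// Hypothetical outcome of a single tool invocation.
#[derive(Debug)]
enum ToolOutcome {
    Ok(String),
    PermanentError(String),
}

/// Hypothetical tool_result message sent back to the LLM.
#[derive(Debug, PartialEq)]
struct ToolResultMessage {
    tool_call_id: String,
    content: String,
    is_error: bool,
}

/// Produces exactly one result message per tool call, in call order.
/// A failed call is reported as error text instead of being dropped.
fn assemble_tool_results(calls: Vec<(String, ToolOutcome)>) -> Vec<ToolResultMessage> {
    calls
        .into_iter()
        .map(|(tool_call_id, outcome)| match outcome {
            ToolOutcome::Ok(body) => ToolResultMessage {
                tool_call_id,
                content: body,
                is_error: false,
            },
            ToolOutcome::PermanentError(reason) => ToolResultMessage {
                tool_call_id,
                content: format!("tool call failed (permanent): {reason}"),
                is_error: true,
            },
        })
        .collect()
}
```

Because the mapping is total over `ToolOutcome`, the compiler forces a decision for each outcome variant, so an error path cannot silently skip emitting a result.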
Severity
HIGH — parallel tool calls with any permanent failure cause a complete conversation breakdown. All subsequent turns return "no providers available". Common trigger: multi-part questions in non-English languages that cause parallel fetch calls.
Session
CI-189, debug dump: .local/testing/debug/1774577837/