fix(context): MemoryFirst drain creates orphaned tool results causing OpenAI 400 errors #2366
Description
Summary
The MemoryFirst context strategy (assembly.rs:772-784) drains conversation history with keep_tail = 2, which can orphan a role=tool message when the last 2 messages before the new user turn are a tool result and an assistant text response.
Root cause
// assembly.rs:772-784
let keep_tail = 2usize;
let history_start = 1usize;
let len = self.msg.messages.len();
if len > history_start + keep_tail {
    self.msg.messages.drain(history_start..len - keep_tail);
}
After a turn with tool calls, the in-memory messages end with: ..., assistant+tool_calls, tool_result, assistant_text_response. When MemoryFirst fires, it keeps only the last 2: [tool_result, assistant_text_response]. The preceding assistant+tool_calls message is drained. OpenAI rejects any request in which a role=tool message is not preceded by a role=assistant message containing tool_calls (HTTP 400).
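The drain behaviour described above can be reproduced in isolation. The following is a minimal sketch, not the code in assembly.rs: `Role`, `Msg`, `naive_drain`, and `first_orphaned_tool` are hypothetical names introduced here for illustration, and the message type is reduced to the two fields that matter for this bug.

```rust
// Hypothetical, simplified stand-ins for the real message types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Role {
    System,
    User,
    Assistant,
    Tool,
}

#[derive(Debug, Clone)]
struct Msg {
    role: Role,
    // True only for assistant messages that carry tool_calls.
    has_tool_calls: bool,
}

// Same arithmetic as the excerpt: keep index 0 and the last two messages.
fn naive_drain(messages: &mut Vec<Msg>) {
    let keep_tail = 2usize;
    let history_start = 1usize;
    let len = messages.len();
    if len > history_start + keep_tail {
        messages.drain(history_start..len - keep_tail);
    }
}

// Returns the index of the first tool message that is not preceded
// (possibly via other tool messages) by an assistant message with tool_calls.
fn first_orphaned_tool(messages: &[Msg]) -> Option<usize> {
    for (i, m) in messages.iter().enumerate() {
        if m.role != Role::Tool {
            continue;
        }
        // Walk back over a contiguous run of tool results.
        let mut j = i;
        while j > 0 && messages[j - 1].role == Role::Tool {
            j -= 1;
        }
        let paired = j > 0
            && messages[j - 1].role == Role::Assistant
            && messages[j - 1].has_tool_calls;
        if !paired {
            return Some(i);
        }
    }
    None
}
```

Applied to a history of `[system, user, assistant+tool_calls, tool, assistant]`, `naive_drain` leaves `[system, tool, assistant]`, and `first_orphaned_tool` reports index 1.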
Reproduction
testing.toml has crossover_turn_threshold = 5. With 4 memory_save turns, the 5th LLM call triggers MemoryFirst and produces the orphan:
[0] system
[1] system (memory recall)
[2] role=tool (ORPHANED — no preceding assistant+tool_calls)
[3] role=assistant (text response)
[4] role=user (new query)
Agent outputs: Error: no providers available (router exhausts all providers after repeated 400s).
Fix
In the MemoryFirst drain, after computing the slice boundary (len - keep_tail), check that the first retained non-system message is not a role=tool message. If it is, walk the boundary backward, effectively extending keep_tail, until it falls on the assistant+tool_calls message that produced the tool result, or on a user message.
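One possible shape for the boundary adjustment is sketched below. It is an illustration under assumptions, not the actual patch: `Role`, `Msg`, `safe_drain_end`, and `memory_first_drain` are hypothetical names, and the sketch assumes a well-formed history in which every run of tool results directly follows the assistant message that issued the tool calls, so stepping back over tool messages lands on that assistant message.

```rust
// Hypothetical, simplified stand-ins for the real message types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Role {
    System,
    User,
    Assistant,
    Tool,
}

#[derive(Debug, Clone)]
struct Msg {
    role: Role,
    has_tool_calls: bool,
}

// Computes the exclusive end of the drained range. Starts at len - keep_tail
// and moves backward while the first retained message would be a tool result.
fn safe_drain_end(messages: &[Msg], history_start: usize, keep_tail: usize) -> usize {
    let len = messages.len();
    let mut end = len.saturating_sub(keep_tail);
    while end > history_start && end < len && messages[end].role == Role::Tool {
        end -= 1;
    }
    end
}

// Drains old history but never separates a tool result from the
// assistant message that precedes it.
fn memory_first_drain(messages: &mut Vec<Msg>) {
    let keep_tail = 2usize;
    let history_start = 1usize;
    let end = safe_drain_end(messages, history_start, keep_tail);
    if end > history_start {
        messages.drain(history_start..end);
    }
}
```

With the history `[system, user, assistant+tool_calls, tool, assistant]`, the boundary moves from index 3 back to index 2, so only the user message is drained and the assistant+tool_calls message stays paired with its result. When no tool result sits at the boundary, the drained range is the same as in the original code.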
Affected area
crates/zeph-core/src/agent/context/assembly.rs:772-784
Severity
Any conversation that crosses crossover_turn_threshold AND ends a turn with a tool call hits this. With the testing config default of 5 turns, this is reproducible in every 5-turn session with tool use.