Description
When using the hosted `ShellTool` (container-based code execution), the Responses API occasionally returns a `response.completed` event containing a `shell_call` output item but no corresponding `shell_call_output`. In a multi-turn streaming flow, this orphan `shell_call` is carried into the next LLM turn's input, causing the model to emit:

> No tool output found for shell call call_XXXXX

The SDK has `drop_orphan_function_calls()`, which correctly handles this scenario, but it is only called in `normalize_resumed_input()` (for session resumption). It is not called in the normal streaming multi-turn path.
Reproduction
- SDK version: 0.12.0 (also reproducible on 0.12.1)
- Transport: WebSocket (`OpenAIResponsesModel` with `websocket_connections`)
- Tool: `ShellTool` with `container_id=None` (auto-provisioned)
- Model: gpt-5.4
Steps
1. Start a streamed multi-turn agent run with a hosted `ShellTool`
2. The model issues a `shell_call`; the API executes it server-side
3. The API returns `response.completed` with `shell_call` in the output items but no `shell_call_output`
4. The SDK processes the `shell_call` item in `turn_resolution.py:1339-1355`, logs "Skipping local shell execution for hosted shell tool", and continues
5. On the next turn, `run_loop.py:1166-1175` calls `normalize_input_items_for_api()` but not `drop_orphan_function_calls()`
6. The orphan `shell_call` (without output) is sent as input to the next LLM call
7. The model responds with an error message instead of continuing the task
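The carry-over in steps 5-6 can be simulated without the SDK. The item dicts below are simplified stand-ins for Responses API output items, and `build_next_turn_input` is a hypothetical model of the SDK's input assembly, not its actual code:

```python
# Toy simulation of the orphan carry-over described in the steps above.
# Item shapes and the helper below are illustrative stand-ins, not SDK types.

def build_next_turn_input(prev_input, response_output):
    # Mimics next-turn input assembly: prior items plus all output items
    # from the last response, with no orphan filtering.
    return prev_input + response_output

# response.completed contained a shell_call but no shell_call_output
response_output = [
    {"type": "shell_call", "call_id": "call_XXXXX", "command": "python plot.py"},
]

next_input = build_next_turn_input(
    [{"type": "message", "content": "plot the data"}], response_output
)

# Find shell_calls with no matching shell_call_output in the assembled input
output_ids = {
    i.get("call_id") for i in next_input if i.get("type") == "shell_call_output"
}
orphans = [
    i for i in next_input
    if i.get("type") == "shell_call" and i["call_id"] not in output_ids
]
print(len(orphans), orphans[0]["call_id"])  # → 1 call_XXXXX
```

The orphan `shell_call` survives into `next_input`, which is exactly what the model then rejects.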
Timeline from production logs
```
12:27:48.963  1st LLM call (conversation start)
12:28:49.808  Tool execution: get_interest_lists
12:29:29.840  shell_call detected (running_code activity)
12:30:13.883  Tool execution: visualize_line_chart_data_tool
12:30:13.995  SDK processes shell_call item (turn_resolution.py)
12:30:13.996  "Skipping local shell execution for hosted shell tool"
12:30:14.001  Next LLM call — orphan shell_call in input
12:30:16.368  ERROR: "No tool output found for shell call call_R94BZzfgS28s3sgJdD4rjKd5"
```
Root Cause Analysis
Two contributing factors
1. API-side: The Responses API returns `response.completed` with a `shell_call` in the output but omits the `shell_call_output`. This may be a race condition in container execution or a serialization issue.
2. SDK-side: `drop_orphan_function_calls()` in `agents/run_internal/items.py` correctly filters out tool calls without matching outputs, but it is only invoked via `normalize_resumed_input()`, which is called for resumed sessions only, not during normal streaming multi-turn execution.
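For context, the filtering that `drop_orphan_function_calls()` performs amounts to the following. This is a sketch of the semantics as we understand them, not the SDK's actual implementation; the item shapes and type names are assumptions:

```python
# Sketch of the orphan-filtering semantics (assumed, not the SDK source).
# A call item is kept only if a matching *_output item with the same
# call_id exists somewhere in the list.

CALL_TYPES = {"function_call", "shell_call"}
OUTPUT_TYPES = {"function_call_output", "shell_call_output"}

def drop_orphan_function_calls(items):
    output_ids = {
        item.get("call_id") for item in items if item.get("type") in OUTPUT_TYPES
    }
    return [
        item for item in items
        if item.get("type") not in CALL_TYPES or item.get("call_id") in output_ids
    ]

items = [
    {"type": "message", "content": "ok"},
    {"type": "shell_call", "call_id": "a"},            # orphan -> dropped
    {"type": "function_call", "call_id": "b"},
    {"type": "function_call_output", "call_id": "b"},  # matched -> kept
]
filtered = drop_orphan_function_calls(items)
print([(i["type"], i.get("call_id")) for i in filtered])
```

Under these semantics, a matched call/output pair passes through untouched; only the unmatched `shell_call` is removed.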
Relevant code paths
`run_loop.py` — streamed turn input (line ~1166-1175):

```python
input_items = normalize_input_items_for_api(input_items)
# ← drop_orphan_function_calls() is NOT called here
```

`run_loop.py` — non-streamed turn input (line ~1531):

```python
input_items = normalize_input_items_for_api(input_items)
# ← same gap
```

`items.py` — only usage of `drop_orphan_function_calls` (line ~148-155):

```python
def normalize_resumed_input(raw_input):
    if isinstance(raw_input, list):
        normalized = normalize_input_items_for_api(raw_input)
        return drop_orphan_function_calls(normalized)  # ← only here
    return raw_input
```

Suggested Fix
Call `drop_orphan_function_calls()` after `normalize_input_items_for_api()` in both `run_single_turn_streamed` and `run_single_turn` in `run_loop.py`:

```python
input_items = normalize_input_items_for_api(input_items)
input_items = drop_orphan_function_calls(input_items)  # add this line
```

This is a safe, defensive change: `drop_orphan_function_calls()` is a no-op when all tool calls have matching outputs.
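The no-op claim can be checked in isolation with stand-in implementations (these are not the SDK's functions; the normalizer is modeled as an identity pass and the filter follows the assumed semantics above):

```python
# Stand-ins to check the "no-op when all calls have outputs" property.

def normalize_input_items_for_api(items):
    return list(items)  # identity stand-in for the real normalizer

def drop_orphan_function_calls(items):
    output_ids = {i["call_id"] for i in items if i["type"] == "shell_call_output"}
    return [i for i in items
            if i["type"] != "shell_call" or i["call_id"] in output_ids]

matched = [
    {"type": "shell_call", "call_id": "c1"},
    {"type": "shell_call_output", "call_id": "c1"},
]
orphaned = [{"type": "shell_call", "call_id": "c2"}]

# With a matching output, the extra filtering step changes nothing.
assert drop_orphan_function_calls(normalize_input_items_for_api(matched)) == matched
# Without one, the orphan is removed before the next LLM call.
assert drop_orphan_function_calls(normalize_input_items_for_api(orphaned)) == []
print("ok")
```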
Workaround
We are currently using a monkey-patch that wraps `normalize_input_items_for_api` to also call `drop_orphan_function_calls`:

```python
from agents.run_internal import items as _items_module
from agents.run_internal import run_loop as _run_loop_module

_original_normalize = _items_module.normalize_input_items_for_api

def _normalize_and_drop_orphans(items):
    normalized = _original_normalize(items)
    return _items_module.drop_orphan_function_calls(normalized)

_run_loop_module.normalize_input_items_for_api = _normalize_and_drop_orphans
_items_module.normalize_input_items_for_api = _normalize_and_drop_orphans
```

Impact
- Affects any agent using a hosted `ShellTool` in multi-turn conversations
- The conversation effectively breaks: the model sees a tool call without output and emits an error instead of continuing
- No user-facing recovery is possible without restarting the conversation