
Hosted shell_call without shell_call_output causes "No tool output found" error in multi-turn streaming flow #2664

@kvasa

Description

When using hosted ShellTool (container-based code execution), the Responses API occasionally returns a response.completed event containing a shell_call output item but without a corresponding shell_call_output. In a multi-turn streaming flow, this orphan shell_call is carried into the next LLM turn's input, causing the model to emit:

No tool output found for shell call call_XXXXX

The SDK already has a helper, drop_orphan_function_calls(), that handles exactly this scenario, but it is only invoked from normalize_resumed_input() (i.e. on session resumption). The normal streaming multi-turn path never calls it.
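For context, the orphan-dropping behavior described above can be sketched as a standalone function. This is an illustrative reimplementation, not the SDK's actual code; the dict item shapes (`type`, `call_id`) are assumptions based on the Responses API item format:

```python
def drop_orphan_tool_calls(items: list[dict]) -> list[dict]:
    """Illustrative sketch: drop tool-call items that have no matching
    *_call_output item. Item shapes are assumptions, not the SDK's types."""
    # Collect call_ids that do have a corresponding *_call_output item.
    answered = {
        item.get("call_id")
        for item in items
        if item.get("type", "").endswith("_call_output")
    }
    kept = []
    for item in items:
        is_call = item.get("type", "").endswith("_call")
        if is_call and item.get("call_id") not in answered:
            continue  # orphan: a call without output would trigger the API error
        kept.append(item)
    return kept
```

When every call has a matching output, the function keeps all items, which is why applying it unconditionally is safe.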

Reproduction

  • SDK version: 0.12.0 (also reproducible on 0.12.1)
  • Transport: WebSocket (OpenAIResponsesModel with websocket_connections)
  • Tool: ShellTool with container_id=None (auto-provisioned)
  • Model: gpt-5.4

Steps

  1. Start a streamed multi-turn agent run with a hosted ShellTool
  2. The model issues a shell_call — the API executes it server-side
  3. The API returns response.completed with shell_call in output items but no shell_call_output
  4. The SDK processes the shell_call item in turn_resolution.py:1339-1355, logs "Skipping local shell execution for hosted shell tool", and continues
  5. On the next turn, run_loop.py:1166-1175 calls normalize_input_items_for_api() but not drop_orphan_function_calls()
  6. The orphan shell_call (without output) is sent as input to the next LLM call
  7. The model responds with an error message instead of continuing the task
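Step 6 can be detected before the next LLM call with a simple pre-flight scan over the input items. This is an illustrative sketch, not SDK code; the item shapes are assumptions:

```python
def find_orphan_call_ids(input_items: list[dict]) -> set[str]:
    """Return call_ids of *_call items (e.g. shell_call) that have no
    matching *_call_output item in the same input list."""
    calls = {
        item["call_id"]
        for item in input_items
        if item.get("type", "").endswith("_call") and "call_id" in item
    }
    outputs = {
        item["call_id"]
        for item in input_items
        if item.get("type", "").endswith("_call_output") and "call_id" in item
    }
    return calls - outputs  # non-empty means the next turn will fail
```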

Timeline from production logs

12:27:48.963  1st LLM call (conversation start)
12:28:49.808  Tool execution: get_interest_lists
12:29:29.840  shell_call detected (running_code activity)
12:30:13.883  Tool execution: visualize_line_chart_data_tool
12:30:13.995  SDK processes shell_call item (turn_resolution.py)
12:30:13.996  "Skipping local shell execution for hosted shell tool"
12:30:14.001  Next LLM call — orphan shell_call in input
12:30:16.368  ERROR: "No tool output found for shell call call_R94BZzfgS28s3sgJdD4rjKd5"

Root Cause Analysis

Two contributing factors

1. API-side: The Responses API returns response.completed with a shell_call in the output but omits the shell_call_output. This may be a race condition in container execution or a serialization issue.

2. SDK-side: drop_orphan_function_calls() in agents/run_internal/items.py correctly filters out tool calls without matching outputs, but it's only invoked via normalize_resumed_input() — which is called for resumed sessions only, not during normal streaming multi-turn execution.

Relevant code paths

run_loop.py — streamed turn input (line ~1166-1175):

```python
input_items = normalize_input_items_for_api(input_items)
# ← drop_orphan_function_calls() is NOT called here
```

run_loop.py — non-streamed turn input (line ~1531):

```python
input_items = normalize_input_items_for_api(input_items)
# ← same gap
```

items.py — only usage of drop_orphan_function_calls (line ~148-155):

```python
def normalize_resumed_input(raw_input):
    if isinstance(raw_input, list):
        normalized = normalize_input_items_for_api(raw_input)
        return drop_orphan_function_calls(normalized)  # ← only here
    return raw_input
```
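Rather than duplicating the two-step normalize-then-drop sequence at each call site, both turn paths could share one helper. The name `prepare_turn_input` and the logging are hypothetical, not the SDK's API; the warning keeps the silent container-side failure visible:

```python
import logging

logger = logging.getLogger("agents.run_internal")

def prepare_turn_input(input_items, normalize, drop_orphans):
    """Hypothetical shared helper: normalize input items for the API,
    then drop orphan tool calls, logging when anything was dropped."""
    normalized = normalize(input_items)
    filtered = drop_orphans(normalized)
    if len(filtered) != len(normalized):
        logger.warning(
            "Dropped %d orphan tool call item(s) before next LLM turn",
            len(normalized) - len(filtered),
        )
    return filtered
```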

Suggested Fix

Call drop_orphan_function_calls() after normalize_input_items_for_api() in both run_single_turn_streamed and run_single_turn in run_loop.py:

```python
input_items = normalize_input_items_for_api(input_items)
input_items = drop_orphan_function_calls(input_items)  # add this line
```

This is a safe, defensive change — drop_orphan_function_calls() is a no-op when all tool calls have matching outputs.

Workaround

We are currently using a monkey-patch that wraps normalize_input_items_for_api to also call drop_orphan_function_calls:

```python
from agents.run_internal import items as _items_module
from agents.run_internal import run_loop as _run_loop_module

_original_normalize = _items_module.normalize_input_items_for_api

def _normalize_and_drop_orphans(items):
    normalized = _original_normalize(items)
    return _items_module.drop_orphan_function_calls(normalized)

_run_loop_module.normalize_input_items_for_api = _normalize_and_drop_orphans
_items_module.normalize_input_items_for_api = _normalize_and_drop_orphans
```
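One hazard with this style of patch is applying it twice (e.g. on module reload), which chains the wrapper onto itself. A sentinel attribute makes it idempotent; the pattern is shown here against a stand-in module object rather than the real agents internals, so the item shapes and stand-in functions are illustrative assumptions:

```python
from types import SimpleNamespace

# Stand-in for agents.run_internal.items, so the pattern is runnable here.
items_module = SimpleNamespace(
    normalize_input_items_for_api=lambda items: list(items),
    drop_orphan_function_calls=lambda items: [
        i for i in items
        if i.get("type") != "shell_call"
        or any(o.get("type") == "shell_call_output"
               and o.get("call_id") == i.get("call_id") for o in items)
    ],
)

def patch_normalize(module) -> None:
    """Wrap normalize_input_items_for_api so it also drops orphans.
    The sentinel attribute prevents double-wrapping on repeated calls."""
    if getattr(module.normalize_input_items_for_api, "_drops_orphans", False):
        return  # already patched
    original = module.normalize_input_items_for_api

    def normalize_and_drop(items):
        return module.drop_orphan_function_calls(original(items))

    normalize_and_drop._drops_orphans = True
    module.normalize_input_items_for_api = normalize_and_drop

patch_normalize(items_module)
patch_normalize(items_module)  # second call is a no-op thanks to the guard
```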

Impact

  • Affects any agent using hosted ShellTool in multi-turn conversations
  • The conversation effectively breaks — the model sees a tool call without output and emits an error instead of continuing
  • No user-facing recovery possible without restarting the conversation
