
Hosted shell_call without shell_call_output causes "No tool output found" error in multi-turn streaming flow #2664

@kvasa

Description

When using hosted ShellTool (container-based code execution), the Responses API occasionally returns a response.completed event containing a shell_call output item but without a corresponding shell_call_output. In a multi-turn streaming flow, this orphan shell_call is carried into the next LLM turn's input, causing the model to emit:

No tool output found for shell call call_XXXXX

The SDK already has a helper, drop_orphan_function_calls(), that handles exactly this scenario, but it is only invoked from normalize_resumed_input() (i.e. on session resumption). The normal streaming multi-turn path never calls it.
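For context, the orphan-dropping behavior described above can be sketched as a standalone function. This is an illustrative reimplementation, not the SDK's actual code; the dict item shapes (`type`, `call_id`) are assumptions based on the Responses API item format:

```python
def drop_orphan_tool_calls(items: list[dict]) -> list[dict]:
    """Illustrative sketch: drop tool-call items that have no matching
    *_call_output item. Item shapes are assumptions, not the SDK's types."""
    # Collect call_ids that do have a corresponding *_call_output item.
    answered = {
        item.get("call_id")
        for item in items
        if item.get("type", "").endswith("_call_output")
    }
    kept = []
    for item in items:
        is_call = item.get("type", "").endswith("_call")
        if is_call and item.get("call_id") not in answered:
            continue  # orphan: a call without output would trigger the API error
        kept.append(item)
    return kept
```

When every call has a matching output, the function keeps all items, which is why applying it unconditionally is safe.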

Reproduction

  • SDK version: 0.12.0 (also reproducible on 0.12.1)
  • Transport: WebSocket (OpenAIResponsesModel with websocket_connections)
  • Tool: ShellTool with container_id=None (auto-provisioned)
  • Model: gpt-5.4

Steps

  1. Start a streamed multi-turn agent run with a hosted ShellTool
  2. The model issues a shell_call — the API executes it server-side
  3. The API returns response.completed with shell_call in output items but no shell_call_output
  4. The SDK processes the shell_call item in turn_resolution.py:1339-1355, logs "Skipping local shell execution for hosted shell tool", and continues
  5. On the next turn, run_loop.py:1166-1175 calls normalize_input_items_for_api() but not drop_orphan_function_calls()
  6. The orphan shell_call (without output) is sent as input to the next LLM call
  7. The model responds with an error message instead of continuing the task
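Step 6 can be detected before the next LLM call with a simple pre-flight scan over the input items. This is an illustrative sketch, not SDK code; the item shapes are assumptions:

```python
def find_orphan_call_ids(input_items: list[dict]) -> set[str]:
    """Return call_ids of *_call items (e.g. shell_call) that have no
    matching *_call_output item in the same input list."""
    calls = {
        item["call_id"]
        for item in input_items
        if item.get("type", "").endswith("_call") and "call_id" in item
    }
    outputs = {
        item["call_id"]
        for item in input_items
        if item.get("type", "").endswith("_call_output") and "call_id" in item
    }
    return calls - outputs  # non-empty means the next turn will fail
```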

Timeline from production logs

12:27:48.963  1st LLM call (conversation start)
12:28:49.808  Tool execution: get_interest_lists
12:29:29.840  shell_call detected (running_code activity)
12:30:13.883  Tool execution: visualize_line_chart_data_tool
12:30:13.995  SDK processes shell_call item (turn_resolution.py)
12:30:13.996  "Skipping local shell execution for hosted shell tool"
12:30:14.001  Next LLM call — orphan shell_call in input
12:30:16.368  ERROR: "No tool output found for shell call call_R94BZzfgS28s3sgJdD4rjKd5"

Root Cause Analysis

Two contributing factors

1. API-side: The Responses API returns response.completed with a shell_call in the output but omits the shell_call_output. This may be a race condition in container execution or a serialization issue.

2. SDK-side: drop_orphan_function_calls() in agents/run_internal/items.py correctly filters out tool calls without matching outputs, but it's only invoked via normalize_resumed_input() — which is called for resumed sessions only, not during normal streaming multi-turn execution.

Relevant code paths

run_loop.py — streamed turn input (line ~1166-1175):

```python
input_items = normalize_input_items_for_api(input_items)
# ← drop_orphan_function_calls() is NOT called here
```

run_loop.py — non-streamed turn input (line ~1531):

```python
input_items = normalize_input_items_for_api(input_items)
# ← same gap
```

items.py — only usage of drop_orphan_function_calls (line ~148-155):

```python
def normalize_resumed_input(raw_input):
    if isinstance(raw_input, list):
        normalized = normalize_input_items_for_api(raw_input)
        return drop_orphan_function_calls(normalized)  # ← only here
    return raw_input
```
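Rather than duplicating the two-step normalize-then-drop sequence at each call site, both turn paths could share one helper. The name `prepare_turn_input` and the logging are hypothetical, not the SDK's API; the warning keeps the silent container-side failure visible:

```python
import logging

logger = logging.getLogger("agents.run_internal")

def prepare_turn_input(input_items, normalize, drop_orphans):
    """Hypothetical shared helper: normalize input items for the API,
    then drop orphan tool calls, logging when anything was dropped."""
    normalized = normalize(input_items)
    filtered = drop_orphans(normalized)
    if len(filtered) != len(normalized):
        logger.warning(
            "Dropped %d orphan tool call item(s) before next LLM turn",
            len(normalized) - len(filtered),
        )
    return filtered
```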

Suggested Fix

Call drop_orphan_function_calls() after normalize_input_items_for_api() in both run_single_turn_streamed and run_single_turn in run_loop.py:

```python
input_items = normalize_input_items_for_api(input_items)
input_items = drop_orphan_function_calls(input_items)  # add this line
```

This is a safe, defensive change — drop_orphan_function_calls() is a no-op when all tool calls have matching outputs.

Workaround

We are currently using a monkey-patch that wraps normalize_input_items_for_api to also call drop_orphan_function_calls:

```python
from agents.run_internal import items as _items_module
from agents.run_internal import run_loop as _run_loop_module

_original_normalize = _items_module.normalize_input_items_for_api

def _normalize_and_drop_orphans(items):
    normalized = _original_normalize(items)
    return _items_module.drop_orphan_function_calls(normalized)

_run_loop_module.normalize_input_items_for_api = _normalize_and_drop_orphans
_items_module.normalize_input_items_for_api = _normalize_and_drop_orphans
```
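One hazard with this style of patch is applying it twice (e.g. on module reload), which chains the wrapper onto itself. A sentinel attribute makes it idempotent; the pattern is shown here against a stand-in module object rather than the real agents internals, so the item shapes and stand-in functions are illustrative assumptions:

```python
from types import SimpleNamespace

# Stand-in for agents.run_internal.items, so the pattern is runnable here.
items_module = SimpleNamespace(
    normalize_input_items_for_api=lambda items: list(items),
    drop_orphan_function_calls=lambda items: [
        i for i in items
        if i.get("type") != "shell_call"
        or any(o.get("type") == "shell_call_output"
               and o.get("call_id") == i.get("call_id") for o in items)
    ],
)

def patch_normalize(module) -> None:
    """Wrap normalize_input_items_for_api so it also drops orphans.
    The sentinel attribute prevents double-wrapping on repeated calls."""
    if getattr(module.normalize_input_items_for_api, "_drops_orphans", False):
        return  # already patched
    original = module.normalize_input_items_for_api

    def normalize_and_drop(items):
        return module.drop_orphan_function_calls(original(items))

    normalize_and_drop._drops_orphans = True
    module.normalize_input_items_for_api = normalize_and_drop

patch_normalize(items_module)
patch_normalize(items_module)  # second call is a no-op thanks to the guard
```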

Impact

  • Affects any agent using hosted ShellTool in multi-turn conversations
  • The conversation effectively breaks — the model sees a tool call without output and emits an error instead of continuing
  • No user-facing recovery possible without restarting the conversation
