
feat(openai): round-trip Responses API reasoning_items in chat completions#24690

Merged
yuneng-berri merged 1 commit into BerriAI:main from Sameerlite:litellm_litellm_openai-reasoning-items-chat-completions
Mar 27, 2026
Conversation

@Sameerlite
Collaborator

Summary

When litellm.completion() routes to OpenAI via the openai/responses/ prefix, this change exposes OpenAI Responses API reasoning items on the Chat Completions-shaped response and accepts them back on assistant messages for the next turn.

Behavior

  • Response: message.reasoning_items (and streaming delta.reasoning_items on the final chunk) carry id, type, encrypted_content, and summary so clients can persist opaque reasoning state alongside content. reasoning_content remains the concatenated summary text when summaries are present.
  • Request: Assistant messages may include reasoning_items; LiteLLM maps them to the correct Responses API input items. summary is always sent on that input (including []), which matches the API when using string reasoning_effort (e.g. "low") where the model returns an empty summary but still returns encrypted_content.
  • Streaming: streaming_handler treats deltas with reasoning_items as non-empty so the terminal chunk is not dropped.
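The round trip described above can be sketched with plain message dicts — a mock, network-free illustration in which all ids and payload values (`rs_123`, `gAAAA...`, the prompts) are made-up placeholders, with field names taken from this PR:

```python
# Mock sketch of the round-trip described above: no network calls,
# all ids and payloads below are made-up placeholder values.

# 1) The Chat Completions-shaped response surfaces the reasoning item:
assistant_message = {
    "role": "assistant",
    "content": "The answer is 4.",
    "reasoning_content": "Add the two numbers.",
    "reasoning_items": [
        {
            "id": "rs_123",                   # opaque id issued by the API
            "type": "reasoning",
            "encrypted_content": "gAAAA...",  # opaque reasoning state to persist
            "summary": [{"type": "summary_text", "text": "Add the two numbers."}],
        }
    ],
}

# 2) The client stores the assistant message verbatim and replays it on the
#    next turn; LiteLLM maps reasoning_items back to Responses API input items.
next_turn_messages = [
    {"role": "user", "content": "What is 2 + 2?"},
    assistant_message,
    {"role": "user", "content": "Now double it."},
]

# reasoning_content is the concatenated summary text when summaries are present:
joined = " ".join(
    s["text"]
    for s in assistant_message["reasoning_items"][0]["summary"]
    if s.get("text")
)
print(joined)  # Add the two numbers.
```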


@codspeed-hq
Contributor

codspeed-hq bot commented Mar 27, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing Sameerlite:litellm_litellm_openai-reasoning-items-chat-completions (00a810e) with main (88ed4f9)


@greptile-apps
Contributor

greptile-apps bot commented Mar 27, 2026

Greptile Summary

This PR enables round-tripping of OpenAI Responses API reasoning items through the openai/responses/ Chat Completions bridge. On the response side, ResponseReasoningItem objects are captured and surfaced as message.reasoning_items (non-streaming) or delta.reasoning_items on the terminal chunk (streaming). On the request side, an assistant message carrying reasoning_items is converted back to the Responses API reasoning input format via _reasoning_item_to_response_input, ensuring the encrypted reasoning state is sent on subsequent turns. The streaming handler is updated so the terminal chunk carrying only reasoning_items is not dropped as empty.

Key observations:

  • The new ChatCompletionReasoningItem / ChatCompletionReasoningSummaryTextBlock TypedDicts and Message.reasoning_items / Delta.reasoning_items fields follow the existing thinking_blocks / annotations patterns correctly.
  • Two new mock-only tests validate both paths; no real network calls are made, satisfying the CI rule.
  • Non-streaming vs streaming inconsistency: the non-streaming loop overwrites pending_reasoning_item on each ResponseReasoningItem (keeping only the last), while the streaming path accumulates all items into a list. Should the API return multiple reasoning items, the non-streaming path would silently drop all but the last.
  • Silent drop in message history: an assistant message with reasoning_items but content is None and no tool_calls is not handled — the reasoning items are dropped when converting to Responses API input.
  • Non-deterministic fallback ID: _reasoning_item_to_response_input uses id(r_item) (Python object address) as a fallback when r_item["id"] is absent. This ID is non-reproducible after serialisation, and the Responses API requires the original ID for encrypted_content to be valid.
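The fallback-ID concern in the last bullet is easy to demonstrate in isolation — a standalone sketch with a hypothetical `r_item` dict (no LiteLLM imports), showing that an `id()`-derived fallback does not survive a JSON round trip:

```python
import json

# Hypothetical reasoning item that is missing its "id" key, e.g. after a
# client stripped it or a serializer dropped falsy fields.
r_item = {"type": "reasoning", "encrypted_content": "gAAAA..."}

# Fallback of the form used in the PR: derived from the object's address.
fallback_before = f"rs_{id(r_item)}"

# Simulate persisting the message to a DB / REST hop and reading it back:
restored = json.loads(json.dumps(r_item))
fallback_after = f"rs_{id(restored)}"

# The restored dict is an equal but distinct object, so the derived id differs
# and can never match the id the Responses API originally issued.
print(restored == r_item)                    # True  (same content)
print(fallback_before == fallback_after)     # False (different address-based id)
```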

Confidence Score: 5/5

Safe to merge; all findings are P2 style/edge-case concerns that do not affect the common single-reasoning-item path.

The happy path (single reasoning item, non-streaming and streaming) is correct and well-tested with mock tests. All three issues raised are P2: the multi-item inconsistency doesn't manifest with the current API, the silent-drop requires a manually crafted edge-case message, and the fallback ID is a minor defensive-coding gap. No security, data-integrity, or backward-compatibility regressions were found.

litellm/completion_extras/litellm_responses_transformation/transformation.py — the non-streaming accumulation loop and the message-history conversion branch are the two spots worth a second look.

Important Files Changed

| Filename | Overview |
| --- | --- |
| `litellm/completion_extras/litellm_responses_transformation/transformation.py` | Core transformation logic: adds reasoning_items extraction (non-streaming drops all but the last item if multiple exist), round-trip input conversion, and streaming terminal-chunk emission — three P2 issues found. |
| `litellm/types/utils.py` | Adds `reasoning_items` field to `Message` and `Delta`, following the same optional-delete pattern as `thinking_blocks` and `annotations`. |
| `litellm/types/llms/openai.py` | Introduces `ChatCompletionReasoningSummaryTextBlock` and `ChatCompletionReasoningItem` TypedDicts to type the new round-trip payload. |
| `litellm/litellm_core_utils/streaming_handler.py` | Adds a `reasoning_items is not None` guard so the terminal streaming chunk carrying reasoning_items is not dropped as empty. |
| `tests/test_litellm/completion_extras/litellm_responses_transformation/test_completion_extras_litellm_responses_transformation_transformation.py` | Two new mock-only tests cover the non-streaming round-trip and streaming terminal-chunk emission; no real network calls are made. |
| `docs/my-website/docs/providers/openai.md` | Documents multi-turn reasoning_items usage with non-streaming and streaming code examples. |
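As a rough illustration of the `streaming_handler` change listed above, the guard might take the following simplified shape — `delta_is_empty` is a hypothetical helper, not the actual LiteLLM function, and the delta dicts are assumed shapes:

```python
# Simplified sketch (assumed shapes, hypothetical helper name): a delta
# carrying only reasoning_items must still count as non-empty, otherwise
# the terminal streaming chunk would be dropped.
def delta_is_empty(delta: dict) -> bool:
    return (
        not delta.get("content")
        and not delta.get("tool_calls")
        and delta.get("reasoning_items") is None  # the guard added by this PR
    )

# Terminal chunk with only reasoning_items: kept.
print(delta_is_empty({"content": None, "reasoning_items": [{"id": "rs_1"}]}))  # False
# Truly empty delta: still dropped.
print(delta_is_empty({"content": None}))  # True
```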

Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant LiteLLM
    participant OpenAI Responses API

    Client->>LiteLLM: completion(messages, include=["reasoning.encrypted_content"])
    LiteLLM->>OpenAI Responses API: POST /responses (with reasoning config)
    OpenAI Responses API-->>LiteLLM: ResponseReasoningItem(id, encrypted_content, summary) + ResponseOutputMessage

    Note over LiteLLM: Non-streaming: _build_reasoning_item()<br/>pending_reasoning_item → Message.reasoning_items
    Note over LiteLLM: Streaming: _build_reasoning_item() on response.completed<br/>Delta.reasoning_items on final chunk

    LiteLLM-->>Client: ModelResponse (message.reasoning_items, message.reasoning_content)

    Client->>LiteLLM: completion(messages=[..., {role:assistant, reasoning_items:[...]}])
    Note over LiteLLM: convert_chat_completion_messages_to_responses_api()<br/>_reasoning_item_to_response_input() → {type:reasoning, id, encrypted_content, summary}
    LiteLLM->>OpenAI Responses API: POST /responses (input contains reasoning item before assistant message)
    OpenAI Responses API-->>LiteLLM: Next response (reasoning state restored)
    LiteLLM-->>Client: ModelResponse
```

Comments Outside Diff (1)

  1. litellm/completion_extras/litellm_responses_transformation/transformation.py, line 265-275 (link)

    P2 reasoning_items silently dropped when content is None with no tool_calls

    The branch structure here only processes reasoning_items in two cases:

    1. role == "assistant" and tool_calls is a non-empty list (line 248)
    2. content is not None (line 265)

    An assistant message that has reasoning_items but content is None and no tool_calls (e.g., a manually constructed history entry, or a future response type) falls through both branches without emitting any reasoning input item. The items are silently discarded.

    Consider adding an explicit guard before the main elif content is not None branch:

```python
elif role == "assistant" and not tool_calls and content is None:
    # reasoning-only assistant turn (no text, no tool calls)
    for r_item in msg.get("reasoning_items") or []:
        input_items.append(_reasoning_item_to_response_input(r_item))
```


Comment on lines 469 to +480
```python
for item in output_items:
    if isinstance(item, ResponseReasoningItem):
        for summary_item in item.summary:
            response_text = getattr(summary_item, "text", "")
            reasoning_content = response_text if response_text else ""
        pending_reasoning_item = _build_reasoning_item(
            item_id=item.id,
            encrypted_content=getattr(item, "encrypted_content", None),
            summary_raw=item.summary,
        )
        reasoning_content = " ".join(
            s["text"]
            for s in pending_reasoning_item["summary"]
            if s.get("text")
        )
```

P2 Non-streaming drops all but the last reasoning item

pending_reasoning_item is reassigned (not appended) on every ResponseReasoningItem encountered in output_items. If the Responses API ever returns more than one reasoning item in a single response, only the last one will appear on message.reasoning_items. The streaming path (completed_reasoning_items) correctly accumulates all items into a list, so the two paths are already inconsistent.

Consider accumulating in the same style as the streaming path:

```python
pending_reasoning_items: List[Dict[str, Any]] = []
...
    if isinstance(item, ResponseReasoningItem):
        pending_reasoning_items.append(_build_reasoning_item(...))
        reasoning_content = " ".join(...)
```

Then when building the Message:

```python
reasoning_items=cast(
    Optional[List[ChatCompletionReasoningItem]],
    pending_reasoning_items if pending_reasoning_items else None,
),
```

Comment on lines +88 to +95

```python
def _reasoning_item_to_response_input(r_item: Dict[str, Any]) -> Dict[str, Any]:
    """Convert a stored ChatCompletionReasoningItem back to a Responses API input item."""
    r_input: Dict[str, Any] = {
        "type": "reasoning",
        "id": r_item.get("id") or f"rs_{id(r_item)}",
        # summary is always required by the Responses API, even when empty
        "summary": r_item.get("summary") or [],
```

P2 Fallback ID is non-deterministic and non-round-trippable

```python
"id": r_item.get("id") or f"rs_{id(r_item)}",
```

id(r_item) returns the CPython object memory address, which changes on every process start or after JSON serialisation/deserialisation. If a client serialises the assistant message to JSON (e.g., in a database or across a REST hop) and then passes it back, r_item.get("id") will still be falsy but id(r_item) will be a different number than the one the OpenAI Responses API originally issued. The API then receives an unknown id, which may cause it to reject the request or silently fail to restore reasoning state.

Consider raising a clear error or warning instead:

```python
item_id = r_item.get("id")
if not item_id:
    import warnings
    warnings.warn(
        "reasoning_item is missing 'id'; the Responses API requires the "
        "original id for encrypted_content to be valid.",
        stacklevel=2,
    )
    item_id = ""
r_input: Dict[str, Any] = {
    "type": "reasoning",
    "id": item_id,
    ...
}
```

yuneng-berri enabled auto-merge March 27, 2026 16:53
yuneng-berri merged commit 3e7ee3e into BerriAI:main Mar 27, 2026
40 of 41 checks passed