
feat(openai): round-trip Responses API reasoning_items in chat completions#24690

Merged
yuneng-berri merged 1 commit into BerriAI:main from Sameerlite:litellm_litellm_openai-reasoning-items-chat-completions
Mar 27, 2026
Conversation

@Sameerlite
Collaborator

Summary

When litellm.completion() routes to OpenAI via the openai/responses/ prefix, this change exposes OpenAI Responses API reasoning items on the Chat Completions-shaped response and accepts them back on assistant messages for the next turn.

Behavior

  • Response: message.reasoning_items (and streaming delta.reasoning_items on the final chunk) carry id, type, encrypted_content, and summary so clients can persist opaque reasoning state alongside content. reasoning_content remains the concatenated summary text when summaries are present.
  • Request: Assistant messages may include reasoning_items; LiteLLM maps them to the correct Responses API input items. summary is always sent on that input (including []), which matches the API when using string reasoning_effort (e.g. "low") where the model returns an empty summary but still returns encrypted_content.
  • Streaming: streaming_handler treats deltas with reasoning_items as non-empty so the terminal chunk is not dropped.
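The round trip described above can be sketched with plain message dicts — a mock, network-free illustration in which all ids and payload values (`rs_123`, `gAAAA...`, the prompts) are made-up placeholders, with field names taken from this PR:

```python
# Mock sketch of the round-trip described above: no network calls,
# all ids and payloads below are made-up placeholder values.

# 1) The Chat Completions-shaped response surfaces the reasoning item:
assistant_message = {
    "role": "assistant",
    "content": "The answer is 4.",
    "reasoning_content": "Add the two numbers.",
    "reasoning_items": [
        {
            "id": "rs_123",                   # opaque id issued by the API
            "type": "reasoning",
            "encrypted_content": "gAAAA...",  # opaque reasoning state to persist
            "summary": [{"type": "summary_text", "text": "Add the two numbers."}],
        }
    ],
}

# 2) The client stores the assistant message verbatim and replays it on the
#    next turn; LiteLLM maps reasoning_items back to Responses API input items.
next_turn_messages = [
    {"role": "user", "content": "What is 2 + 2?"},
    assistant_message,
    {"role": "user", "content": "Now double it."},
]

# reasoning_content is the concatenated summary text when summaries are present:
joined = " ".join(
    s["text"]
    for s in assistant_message["reasoning_items"][0]["summary"]
    if s.get("text")
)
print(joined)  # Add the two numbers.
```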


@codspeed-hq
Contributor

codspeed-hq bot commented Mar 27, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing Sameerlite:litellm_litellm_openai-reasoning-items-chat-completions (00a810e) with main (88ed4f9)


@greptile-apps
Contributor

greptile-apps bot commented Mar 27, 2026

Greptile Summary

This PR enables round-tripping of OpenAI Responses API reasoning items through the openai/responses/ Chat Completions bridge. On the response side, ResponseReasoningItem objects are captured and surfaced as message.reasoning_items (non-streaming) or delta.reasoning_items on the terminal chunk (streaming). On the request side, an assistant message carrying reasoning_items is converted back to the Responses API reasoning input format via _reasoning_item_to_response_input, ensuring the encrypted reasoning state is sent on subsequent turns. The streaming handler is updated so the terminal chunk carrying only reasoning_items is not dropped as empty.

Key observations:

  • The new ChatCompletionReasoningItem / ChatCompletionReasoningSummaryTextBlock TypedDicts and Message.reasoning_items / Delta.reasoning_items fields follow the existing thinking_blocks / annotations patterns correctly.
  • Two new mock-only tests validate both paths; no real network calls are made, satisfying the CI rule.
  • Non-streaming vs streaming inconsistency: the non-streaming loop overwrites pending_reasoning_item on each ResponseReasoningItem (keeping only the last), while the streaming path accumulates all items into a list. Should the API return multiple reasoning items, the non-streaming path would silently drop all but the last.
  • Silent drop in message history: an assistant message with reasoning_items but content is None and no tool_calls is not handled — the reasoning items are dropped when converting to Responses API input.
  • Non-deterministic fallback ID: _reasoning_item_to_response_input uses id(r_item) (Python object address) as a fallback when r_item["id"] is absent. This ID is non-reproducible after serialisation, and the Responses API requires the original ID for encrypted_content to be valid.
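The fallback-ID concern in the last bullet is easy to demonstrate in isolation — a standalone sketch with a hypothetical `r_item` dict (no LiteLLM imports), showing that an `id()`-derived fallback does not survive a JSON round trip:

```python
import json

# Hypothetical reasoning item that is missing its "id" key, e.g. after a
# client stripped it or a serializer dropped falsy fields.
r_item = {"type": "reasoning", "encrypted_content": "gAAAA..."}

# Fallback of the form used in the PR: derived from the object's address.
fallback_before = f"rs_{id(r_item)}"

# Simulate persisting the message to a DB / REST hop and reading it back:
restored = json.loads(json.dumps(r_item))
fallback_after = f"rs_{id(restored)}"

# The restored dict is an equal but distinct object, so the derived id differs
# and can never match the id the Responses API originally issued.
print(restored == r_item)                    # True  (same content)
print(fallback_before == fallback_after)     # False (different address-based id)
```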

Confidence Score: 5/5

Safe to merge; all findings are P2 style/edge-case concerns that do not affect the common single-reasoning-item path.

The happy path (single reasoning item, non-streaming and streaming) is correct and well-tested with mock tests. All three issues raised are P2: the multi-item inconsistency doesn't manifest with the current API, the silent-drop requires a manually crafted edge-case message, and the fallback ID is a minor defensive-coding gap. No security, data-integrity, or backward-compatibility regressions were found.

litellm/completion_extras/litellm_responses_transformation/transformation.py — the non-streaming accumulation loop and the message-history conversion branch are the two spots worth a second look.

Important Files Changed

| Filename | Overview |
| --- | --- |
| `litellm/completion_extras/litellm_responses_transformation/transformation.py` | Core transformation logic: adds reasoning_items extraction (non-streaming drops all but the last item if multiple exist), round-trip input conversion, and streaming terminal-chunk emission — three P2 issues found. |
| `litellm/types/utils.py` | Adds `reasoning_items` field to `Message` and `Delta`, following the same optional-delete pattern as `thinking_blocks` and `annotations`. |
| `litellm/types/llms/openai.py` | Introduces `ChatCompletionReasoningSummaryTextBlock` and `ChatCompletionReasoningItem` TypedDicts to type the new round-trip payload. |
| `litellm/litellm_core_utils/streaming_handler.py` | Adds a `reasoning_items is not None` guard so the terminal streaming chunk carrying reasoning_items is not dropped as empty. |
| `tests/test_litellm/completion_extras/litellm_responses_transformation/test_completion_extras_litellm_responses_transformation_transformation.py` | Two new mock-only tests cover the non-streaming round-trip and streaming terminal-chunk emission; no real network calls are made. |
| `docs/my-website/docs/providers/openai.md` | Documents multi-turn reasoning_items usage with non-streaming and streaming code examples. |
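As a rough illustration of the `streaming_handler` change listed above, the guard might take the following simplified shape — `delta_is_empty` is a hypothetical helper, not the actual LiteLLM function, and the delta dicts are assumed shapes:

```python
# Simplified sketch (assumed shapes, hypothetical helper name): a delta
# carrying only reasoning_items must still count as non-empty, otherwise
# the terminal streaming chunk would be dropped.
def delta_is_empty(delta: dict) -> bool:
    return (
        not delta.get("content")
        and not delta.get("tool_calls")
        and delta.get("reasoning_items") is None  # the guard added by this PR
    )

# Terminal chunk with only reasoning_items: kept.
print(delta_is_empty({"content": None, "reasoning_items": [{"id": "rs_1"}]}))  # False
# Truly empty delta: still dropped.
print(delta_is_empty({"content": None}))  # True
```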

Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant LiteLLM
    participant OpenAI Responses API

    Client->>LiteLLM: completion(messages, include=["reasoning.encrypted_content"])
    LiteLLM->>OpenAI Responses API: POST /responses (with reasoning config)
    OpenAI Responses API-->>LiteLLM: ResponseReasoningItem(id, encrypted_content, summary) + ResponseOutputMessage

    Note over LiteLLM: Non-streaming: _build_reasoning_item()<br/>pending_reasoning_item → Message.reasoning_items
    Note over LiteLLM: Streaming: _build_reasoning_item() on response.completed<br/>Delta.reasoning_items on final chunk

    LiteLLM-->>Client: ModelResponse (message.reasoning_items, message.reasoning_content)

    Client->>LiteLLM: completion(messages=[..., {role:assistant, reasoning_items:[...]}])
    Note over LiteLLM: convert_chat_completion_messages_to_responses_api()<br/>_reasoning_item_to_response_input() → {type:reasoning, id, encrypted_content, summary}
    LiteLLM->>OpenAI Responses API: POST /responses (input contains reasoning item before assistant message)
    OpenAI Responses API-->>LiteLLM: Next response (reasoning state restored)
    LiteLLM-->>Client: ModelResponse
```

Comments Outside Diff (1)

  1. litellm/completion_extras/litellm_responses_transformation/transformation.py, line 265-275 (link)

    P2 reasoning_items silently dropped when content is None with no tool_calls

    The branch structure here only processes reasoning_items in two cases:

    1. role == "assistant" and tool_calls is a non-empty list (line 248)
    2. content is not None (line 265)

    An assistant message that has reasoning_items but content is None and no tool_calls (e.g., a manually constructed history entry, or a future response type) falls through both branches without emitting any reasoning input item. The items are silently discarded.

    Consider adding an explicit guard before the main elif content is not None branch:

```python
elif role == "assistant" and not tool_calls and content is None:
    # reasoning-only assistant turn (no text, no tool calls)
    for r_item in msg.get("reasoning_items") or []:
        input_items.append(_reasoning_item_to_response_input(r_item))
```


Comment on lines 469 to +480
```python
for item in output_items:
    if isinstance(item, ResponseReasoningItem):
        for summary_item in item.summary:
            response_text = getattr(summary_item, "text", "")
            reasoning_content = response_text if response_text else ""
        pending_reasoning_item = _build_reasoning_item(
            item_id=item.id,
            encrypted_content=getattr(item, "encrypted_content", None),
            summary_raw=item.summary,
        )
        reasoning_content = " ".join(
            s["text"]
            for s in pending_reasoning_item["summary"]
            if s.get("text")
        )
```

P2 Non-streaming drops all but the last reasoning item

pending_reasoning_item is reassigned (not appended) on every ResponseReasoningItem encountered in output_items. If the Responses API ever returns more than one reasoning item in a single response, only the last one will appear on message.reasoning_items. The streaming path (completed_reasoning_items) correctly accumulates all items into a list, so the two paths are already inconsistent.

Consider accumulating in the same style as the streaming path:

```python
pending_reasoning_items: List[Dict[str, Any]] = []
...
    if isinstance(item, ResponseReasoningItem):
        pending_reasoning_items.append(_build_reasoning_item(...))
        reasoning_content = " ".join(...)
```

Then when building the Message:

```python
reasoning_items=cast(
    Optional[List[ChatCompletionReasoningItem]],
    pending_reasoning_items if pending_reasoning_items else None,
),
```

Comment on lines +88 to +95

```python
def _reasoning_item_to_response_input(r_item: Dict[str, Any]) -> Dict[str, Any]:
    """Convert a stored ChatCompletionReasoningItem back to a Responses API input item."""
    r_input: Dict[str, Any] = {
        "type": "reasoning",
        "id": r_item.get("id") or f"rs_{id(r_item)}",
        # summary is always required by the Responses API, even when empty
        "summary": r_item.get("summary") or [],
```

P2 Fallback ID is non-deterministic and non-round-trippable

```python
"id": r_item.get("id") or f"rs_{id(r_item)}",
```

id(r_item) returns the CPython object memory address, which changes on every process start or after JSON serialisation/deserialisation. If a client serialises the assistant message to JSON (e.g., in a database or across a REST hop) and then passes it back, r_item.get("id") will still be falsy but id(r_item) will be a different number than the one the OpenAI Responses API originally issued. The API then receives an unknown id, which may cause it to reject the request or silently fail to restore reasoning state.

Consider raising a clear error or warning instead:

```python
item_id = r_item.get("id")
if not item_id:
    import warnings
    warnings.warn(
        "reasoning_item is missing 'id'; the Responses API requires the "
        "original id for encrypted_content to be valid.",
        stacklevel=2,
    )
    item_id = ""
r_input: Dict[str, Any] = {
    "type": "reasoning",
    "id": item_id,
    ...
}
```

yuneng-berri enabled auto-merge March 27, 2026 16:53
yuneng-berri merged commit 3e7ee3e into BerriAI:main Mar 27, 2026
40 of 41 checks passed