feat(openai): round-trip Responses API reasoning_items in chat completions (#24690)
Greptile Summary

This PR enables round-tripping of OpenAI Responses API reasoning items through the Chat Completions interface.
Confidence Score: 5/5

Safe to merge; all findings are P2 style/edge-case concerns that do not affect the common single-reasoning-item path. The happy path (single reasoning item, non-streaming and streaming) is correct and well tested with mock tests. All three issues raised are P2: the multi-item inconsistency doesn't manifest with the current API, the silent drop requires a manually crafted edge-case message, and the fallback ID is a minor defensive-coding gap. No security, data-integrity, or backward-compatibility regressions were found. In litellm/completion_extras/litellm_responses_transformation/transformation.py, the non-streaming accumulation loop and the message-history conversion branch are the two spots worth a second look.
| Filename | Overview |
|---|---|
| litellm/completion_extras/litellm_responses_transformation/transformation.py | Core transformation logic: adds reasoning_items extraction (non-streaming drops all but last item if multiple exist), round-trip input conversion, and streaming terminal-chunk emission — three P2 issues found. |
| litellm/types/utils.py | Adds reasoning_items field to Message and Delta, following the same optional-delete pattern as thinking_blocks and annotations. |
| litellm/types/llms/openai.py | Introduces ChatCompletionReasoningSummaryTextBlock and ChatCompletionReasoningItem TypedDicts to type the new round-trip payload. |
| litellm/litellm_core_utils/streaming_handler.py | Adds reasoning_items is not None guard so the terminal streaming chunk carrying reasoning_items is not dropped as empty. |
| tests/test_litellm/completion_extras/litellm_responses_transformation/test_completion_extras_litellm_responses_transformation_transformation.py | Two new mock-only tests cover non-streaming round-trip and streaming terminal-chunk emission; no real network calls are made. |
| docs/my-website/docs/providers/openai.md | Documents multi-turn reasoning_items usage with non-streaming and streaming code examples. |
Sequence Diagram
```mermaid
sequenceDiagram
    participant Client
    participant LiteLLM
    participant OpenAI Responses API
    Client->>LiteLLM: completion(messages, include=["reasoning.encrypted_content"])
    LiteLLM->>OpenAI Responses API: POST /responses (with reasoning config)
    OpenAI Responses API-->>LiteLLM: ResponseReasoningItem(id, encrypted_content, summary) + ResponseOutputMessage
    Note over LiteLLM: Non-streaming: _build_reasoning_item()<br/>pending_reasoning_item → Message.reasoning_items
    Note over LiteLLM: Streaming: _build_reasoning_item() on response.completed<br/>Delta.reasoning_items on final chunk
    LiteLLM-->>Client: ModelResponse (message.reasoning_items, message.reasoning_content)
    Client->>LiteLLM: completion(messages=[..., {role:assistant, reasoning_items:[...]}])
    Note over LiteLLM: convert_chat_completion_messages_to_responses_api()<br/>_reasoning_item_to_response_input() → {type:reasoning, id, encrypted_content, summary}
    LiteLLM->>OpenAI Responses API: POST /responses (input contains reasoning item before assistant message)
    OpenAI Responses API-->>LiteLLM: Next response (reasoning state restored)
    LiteLLM-->>Client: ModelResponse
```
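The two turns in the diagram can be sketched purely in terms of message shapes, without any network calls. The field names follow the diagram; the `id`, `encrypted_content`, and summary values are made up for illustration:

```python
# Turn 1 response: the assistant message now carries reasoning_items
assistant_msg = {
    "role": "assistant",
    "content": "The answer is 42.",
    "reasoning_items": [
        {
            "id": "rs_abc123",                      # issued by the Responses API
            "type": "reasoning",
            "encrypted_content": "gAAAA...opaque",  # opaque state to round-trip
            "summary": [{"type": "summary_text", "text": "Worked out the answer."}],
        }
    ],
}

# Turn 2 request: append the assistant message verbatim, then the next user turn.
# LiteLLM converts reasoning_items back into Responses API input items.
messages = [
    {"role": "user", "content": "What is the answer?"},
    assistant_msg,
    {"role": "user", "content": "Why?"},
]
assert messages[1]["reasoning_items"][0]["id"] == "rs_abc123"
```

The key design point is that the client treats `reasoning_items` as opaque: it is persisted and echoed back unchanged, and LiteLLM handles the mapping in both directions.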
Comments Outside Diff (1)
litellm/completion_extras/litellm_responses_transformation/transformation.py, lines 265-275

`reasoning_items` silently dropped when `content is None` with no `tool_calls`

The branch structure here only processes `reasoning_items` in two cases:

- `role == "assistant"` and `tool_calls` is a non-empty list (line 248)
- `content is not None` (line 265)

An assistant message that has `reasoning_items` but `content is None` and no `tool_calls` (e.g., a manually constructed history entry, or a future response type) falls through both branches without emitting any reasoning input item. The items are silently discarded.

Consider adding an explicit guard before the main `elif content is not None` branch:

```python
elif role == "assistant" and not tool_calls and content is None:
    # reasoning-only assistant turn (no text, no tool calls)
    for r_item in msg.get("reasoning_items") or []:
        input_items.append(_reasoning_item_to_response_input(r_item))
```
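A minimal reproduction of the fall-through makes the drop concrete. This is a simplified stand-in for the branch structure described above, not the actual LiteLLM code:

```python
from typing import Any, Dict, List

def convert(msg: Dict[str, Any]) -> List[Dict[str, Any]]:
    # Simplified mirror of the two branches: tool-call assistant turns
    # and content-bearing turns each emit reasoning items; nothing else does.
    input_items: List[Dict[str, Any]] = []
    role = msg.get("role")
    content = msg.get("content")
    tool_calls = msg.get("tool_calls")
    if role == "assistant" and tool_calls:
        for r in msg.get("reasoning_items") or []:
            input_items.append({"type": "reasoning", **r})
    elif content is not None:
        for r in msg.get("reasoning_items") or []:
            input_items.append({"type": "reasoning", **r})
        input_items.append({"type": "message", "role": role, "content": content})
    # reasoning-only assistant turn: neither branch fires, items are dropped
    return input_items

reasoning_only = {
    "role": "assistant",
    "content": None,
    "reasoning_items": [{"id": "rs_1", "encrypted_content": "blob"}],
}
print(convert(reasoning_only))  # → [] — the reasoning item is silently lost
```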
Reviews (1): Last reviewed commit: "feat(openai): round-trip Responses API r..."
```python
for item in output_items:
    if isinstance(item, ResponseReasoningItem):
        for summary_item in item.summary:
            response_text = getattr(summary_item, "text", "")
            reasoning_content = response_text if response_text else ""
        pending_reasoning_item = _build_reasoning_item(
            item_id=item.id,
            encrypted_content=getattr(item, "encrypted_content", None),
            summary_raw=item.summary,
        )
        reasoning_content = " ".join(
            s["text"]
            for s in pending_reasoning_item["summary"]
            if s.get("text")
        )
```
Non-streaming drops all but the last reasoning item
pending_reasoning_item is reassigned (not appended) on every ResponseReasoningItem encountered in output_items. If the Responses API ever returns more than one reasoning item in a single response, only the last one will appear on message.reasoning_items. The streaming path (completed_reasoning_items) correctly accumulates all items into a list, so the two paths are already inconsistent.
Consider accumulating in the same style as the streaming path:

```python
pending_reasoning_items: List[Dict[str, Any]] = []
...
if isinstance(item, ResponseReasoningItem):
    pending_reasoning_items.append(_build_reasoning_item(...))
    reasoning_content = " ".join(...)
```

Then when building the Message:

```python
reasoning_items=cast(
    Optional[List[ChatCompletionReasoningItem]],
    pending_reasoning_items if pending_reasoning_items else None,
),
```
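Filled in, the suggested accumulation behaves like this. The sketch uses a stand-in for `_build_reasoning_item` with just enough shape to show that both items survive:

```python
from typing import Any, Dict, List

def build_reasoning_item(item_id: str, text: str) -> Dict[str, Any]:
    # Stand-in for _build_reasoning_item: id plus a one-entry summary.
    return {
        "id": item_id,
        "type": "reasoning",
        "summary": [{"type": "summary_text", "text": text}],
    }

# Hypothetical response with two reasoning items (the current API returns one)
output_items = [("rs_1", "step one"), ("rs_2", "step two")]

pending_reasoning_items: List[Dict[str, Any]] = []
for item_id, text in output_items:
    pending_reasoning_items.append(build_reasoning_item(item_id, text))

reasoning_content = " ".join(
    s["text"]
    for it in pending_reasoning_items
    for s in it["summary"]
    if s.get("text")
)

assert [it["id"] for it in pending_reasoning_items] == ["rs_1", "rs_2"]  # both kept
print(reasoning_content)  # → step one step two
```

With reassignment instead of appending, only `rs_2` would remain, which is exactly the inconsistency with the streaming path flagged above.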
```python
def _reasoning_item_to_response_input(r_item: Dict[str, Any]) -> Dict[str, Any]:
    """Convert a stored ChatCompletionReasoningItem back to a Responses API input item."""
    r_input: Dict[str, Any] = {
        "type": "reasoning",
        "id": r_item.get("id") or f"rs_{id(r_item)}",
        # summary is always required by the Responses API, even when empty
        "summary": r_item.get("summary") or [],
```
Fallback ID is non-deterministic and non-round-trippable
"id": r_item.get("id") or f"rs_{id(r_item)}",id(r_item) returns the CPython object memory address, which changes on every process start or after JSON serialisation/deserialisation. If a client serialises the assistant message to JSON (e.g., in a database or across a REST hop) and then passes it back, r_item.get("id") will still be falsy but id(r_item) will be a different number than the one the OpenAI Responses API originally issued. The API then receives an unknown id, which may cause it to reject the request or silently fail to restore reasoning state.
Consider raising a clear error or warning instead:
```python
item_id = r_item.get("id")
if not item_id:
    import warnings
    warnings.warn(
        "reasoning_item is missing 'id'; the Responses API requires the "
        "original id for encrypted_content to be valid.",
        stacklevel=2,
    )
    item_id = ""
r_input: Dict[str, Any] = {
    "type": "reasoning",
    "id": item_id,
    ...
}
```
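The non-round-trippability is easy to demonstrate: a JSON round-trip produces a new object, so an `id()`-based fallback yields a different value than the one originally sent. This is a standalone sketch, not LiteLLM code:

```python
import json

# A reasoning item that was stored without an "id"
original = {"type": "reasoning", "encrypted_content": "blob", "summary": []}

fallback_before = f"rs_{id(original)}"

# Client persists the message (e.g. to a database) and reads it back
restored = json.loads(json.dumps(original))
fallback_after = f"rs_{id(restored)}"

# Two distinct live objects always have distinct CPython ids, so the
# regenerated fallback can never match what the API saw the first time.
assert fallback_before != fallback_after
```

This is why the review suggests warning and sending an empty id rather than fabricating one: a fabricated id looks valid but is guaranteed not to match the API's records.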
Summary
When `litellm.completion()` routes to OpenAI via the `openai/responses/` prefix, this change exposes OpenAI Responses API reasoning items on the Chat Completions-shaped response and accepts them back on assistant messages for the next turn.

Behavior

- `message.reasoning_items` (and streaming `delta.reasoning_items` on the final chunk) carry `id`, `type`, `encrypted_content`, and `summary` so clients can persist opaque reasoning state alongside `content`.
- `reasoning_content` remains the concatenated summary text when summaries are present.
- Assistant messages in the history may carry `reasoning_items`; LiteLLM maps them to the correct Responses API input items.
- `summary` is always sent on that input (including `[]`), which matches the API when using string `reasoning_effort` (e.g. `"low"`), where the model returns an empty summary but still returns `encrypted_content`.
- `streaming_handler` treats deltas with `reasoning_items` as non-empty so the terminal chunk is not dropped.
message.reasoning_items(and streamingdelta.reasoning_itemson the final chunk) carryid,type,encrypted_content, andsummaryso clients can persist opaque reasoning state alongsidecontent.reasoning_contentremains the concatenated summary text when summaries are present.reasoning_items; LiteLLM maps them to the correct Responses API input items.summaryis always sent on that input (including[]), which matches the API when using stringreasoning_effort(e.g."low") where the model returns an empty summary but still returnsencrypted_content.streaming_handlertreats deltas withreasoning_itemsas non-empty so the terminal chunk is not dropped.