
Litellm oss staging 03 05 2026 #22844

Merged
yuneng-jiang merged 28 commits into main from litellm_oss_staging_03_05_2026
Mar 21, 2026
Conversation


@ghost ghost commented Mar 5, 2026

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/test_litellm/ directory. Adding at least 1 test is a hard requirement (see details).
  • My PR passes all unit tests with `make test-unit`.
  • My PR's scope is as isolated as possible; it solves only 1 specific problem.
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review.

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable, but needs attention.
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test

Changes

Chesars and others added 8 commits February 17, 2026 22:52
…onses API

Read summary from the original thinking dict instead of hardcoding "detailed"
in _route_openai_thinking_to_responses_api_if_needed(). This preserves the
user's chosen summary value (e.g. "concise", "auto") for non-Claude models
routed through the Anthropic Messages adapter to OpenAI's Responses API.

Fixes #20998
Remove hardcoded summary="detailed" injection — summary is opt-in per
OpenAI spec and increases costs. Users opt-in per-request via LiteLLM
extension: thinking={"type": "enabled", "budget_tokens": N, "summary": "concise"}.

Also preserve summary in translate_thinking_for_model() which previously
dropped it when converting thinking → reasoning_effort for non-Claude models.

Fixes #20998
…ry in translate_anthropic_to_openai

- Remove redundant isinstance(thinking, dict) check in handler.py since
  early return on line 64 guarantees thinking is a dict at that point
- Preserve summary in translate_anthropic_to_openai() for consistency
  across all code paths (adapter, guardrail, main.py)
- Fix translate_thinking_to_reasoning in responses_adapters/transformation.py
  to make summary opt-in (was hardcoded to "detailed")
- Update e2e test to mock litellm.responses (new OpenAI routing path)
- Add tests for Responses API adapter summary preservation
- Resolve merge conflict in test file
…mmary

fix(anthropic): preserve thinking.summary when routing to OpenAI Responses API
…t docs

Document the `summary` optional field in the `thinking` object for the
Anthropic `/v1/messages` adapter, and add a section on summary preservation
when routing to non-Anthropic providers via the adapter.
docs: add thinking.summary field to /v1/messages and reasoning docs

* fix(gemini): ensure image token accumulation in usage metadata

Fixed an issue where image tokens were being overwritten instead of accumulated in Gemini responses. Added support for both camelCase and snake_case token count keys. Fixes #22082.

* test: add regression test for image token accumulation and cleanup files

* fix(gemini): ensure consistent accumulation for responseTokensDetails

* fix(gemini): harden token count parsing and add vertex accumulation test

Parse tokenCount/token_count as int-safe values to satisfy mypy and avoid None/object arithmetic. Add regression test for duplicate modality accumulation in Vertex _calculate_usage.
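
The accumulation fix described in these commits can be sketched as follows; the function name and overall shape are illustrative assumptions, not LiteLLM's actual internals:

```python
from collections import defaultdict
from typing import Dict, List

def accumulate_modality_tokens(details: List[Dict]) -> Dict[str, int]:
    """Sum token counts per modality instead of overwriting them.

    Accepts both camelCase (tokenCount) and snake_case (token_count) keys
    and normalizes modality casing, mirroring the fixes described above.
    """
    totals: Dict[str, int] = defaultdict(int)
    for entry in details:
        modality = str(entry.get("modality", "")).upper()  # safe .upper() normalization
        raw = entry.get("tokenCount", entry.get("token_count", 0))
        try:
            count = int(raw)  # int-safe parsing; avoids None/object arithmetic
        except (TypeError, ValueError):
            count = 0
        totals[modality] += count  # accumulate across duplicate modality entries
    return dict(totals)
```

With duplicate IMAGE entries in mixed casing and key styles, the counts now sum instead of the last entry winning.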
@vercel

vercel bot commented Mar 5, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
litellm | Ready | Preview, Comment | Mar 21, 2026 10:04pm


@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

@greptile-apps
Contributor

greptile-apps bot commented Mar 5, 2026

Greptile Summary

This PR introduces a reasoning_auto_summary opt-in flag that controls whether "summary": "detailed" is automatically injected when translating Anthropic thinking parameters to OpenAI reasoning_effort for non-Claude model routing. It also fixes Gemini/Vertex AI usage token counting to accumulate across multiple modality entries instead of overwriting, adds case-insensitive modality normalization, and refactors translate_anthropic_to_openai into focused helper methods.

Key changes:

  • New litellm/llms/anthropic/experimental_pass_through/utils.py with is_reasoning_auto_summary_enabled() helper reading from litellm.reasoning_auto_summary (defaults to False) or LITELLM_REASONING_AUTO_SUMMARY env var
  • translate_thinking_for_model, _translate_thinking_to_openai, translate_thinking_to_reasoning, and _route_openai_thinking_to_responses_api_if_needed all updated consistently to use the new flag
  • Gemini promptTokensDetails, responseTokensDetails, candidatesTokensDetails, and cacheTokensDetails parsing now accumulates += instead of overwriting =, fixing double-entry modality responses
  • translate_anthropic_to_openai decomposed into _translate_metadata_to_openai, _translate_tool_choice_to_openai, _translate_tools_to_openai, _translate_thinking_to_openai, _translate_output_format_to_openai, and _copy_untranslated_anthropic_params
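
A minimal sketch of such an opt-in helper, assuming the flag and env-var names quoted in the summary above (the real implementation in utils.py may differ):

```python
import os

# Stand-in for the module-level litellm.reasoning_auto_summary setting.
reasoning_auto_summary: bool = False

def is_reasoning_auto_summary_enabled() -> bool:
    # The explicit module-level flag wins; otherwise fall back to the
    # LITELLM_REASONING_AUTO_SUMMARY environment variable (truthy strings).
    if reasoning_auto_summary:
        return True
    return os.getenv("LITELLM_REASONING_AUTO_SUMMARY", "").strip().lower() in ("1", "true", "yes")
```

Defaulting to False here is exactly the backwards-compatibility concern the review raises below.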

Concerns:

  • The change from always-on summary: detailed injection to opt-in False default is backwards-incompatible for users routing Anthropic thinking requests through non-Claude models: they will silently stop receiving reasoning content in responses until they set reasoning_auto_summary: true. Per the project policy, the safer default would be True with an opt-out path.
  • "litellm_metadata" is excluded from translatable_anthropic_params() but is removed via .pop() inside _translate_metadata_to_openai. This implicit ordering dependency between private helpers is fragile if the methods are ever called independently.

Confidence Score: 3/5

  • Merging as-is will silently break existing users routing Anthropic thinking requests to non-Claude models, who will stop receiving reasoning content by default.
  • The Gemini token accumulation fix and code refactoring are solid and well-tested. However, flipping the reasoning_auto_summary default from effectively True to False is a backwards-incompatible behavioral change — existing users relying on automatic summary: detailed injection will silently lose reasoning content in their responses without any migration guidance in the PR description or changelog entry. Additionally, the implicit ordering dependency created by the litellm_metadata pop in _translate_metadata_to_openai introduces fragility in the new helper-method structure.
  • litellm/llms/anthropic/experimental_pass_through/utils.py and litellm/llms/anthropic/experimental_pass_through/adapters/transformation.py need attention due to the backwards-incompatible default change and the litellm_metadata ordering dependency.

Important Files Changed

Filename Overview
litellm/llms/anthropic/experimental_pass_through/utils.py New utility introducing is_reasoning_auto_summary_enabled() flag; defaults to False, making the previous always-on summary: detailed injection a backwards-incompatible opt-in change.
litellm/llms/anthropic/experimental_pass_through/adapters/transformation.py Major refactor of translate_anthropic_to_openai into smaller helpers; adds reasoning_auto_summary opt-in logic; litellm_metadata not in translatable_anthropic_params creates fragile ordering dependency.
litellm/llms/anthropic/experimental_pass_through/adapters/handler.py Correctly gates summary: detailed injection behind is_reasoning_auto_summary_enabled() and respects user-provided summary field in thinking dict.
litellm/llms/anthropic/experimental_pass_through/responses_adapters/transformation.py Switches translate_thinking_to_reasoning from always-injecting summary: detailed to opt-in via is_reasoning_auto_summary_enabled(); consistent with other paths.
litellm/llms/vertex_ai/gemini/vertex_and_google_ai_studio_gemini.py Fixes token count accumulation for multi-entry modality details (TEXT/AUDIO/IMAGE/VIDEO), adds safe .upper() normalization, adds token_count key fallback — correct and well-tested.
litellm/llms/gemini/image_generation/transformation.py Applies same accumulation and normalization fixes for image generation usage metadata; consistent with Gemini chat completion changes.
litellm/proxy/proxy_server.py Purely formatting/style changes (line-wrapping long expressions to fit linting rules); no logic changes.
litellm/proxy/common_request_processing.py Purely formatting changes; no logic changes.
tests/test_litellm/llms/anthropic/experimental_pass_through/messages/test_anthropic_experimental_pass_through_messages_handler.py Extensive new mock-based tests for reasoning_auto_summary flag behavior; existing assertions were updated to match new defaults, reducing coverage of the previously guaranteed summary injection path.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["Anthropic /v1/messages with thinking"] --> B{Is Claude model?}
    B -- Yes --> C["Pass thinking as-is to litellm.completion"]
    B -- No --> D["translate_anthropic_thinking_to_reasoning_effort\nbudget_tokens maps to effort string"]
    D --> E{User-provided summary in thinking?}
    E -- Yes --> F["reasoning_effort = effort plus summary"]
    E -- No --> G{is_reasoning_auto_summary_enabled}
    G -- True --> H["reasoning_effort = effort plus summary:detailed"]
    G -- False --> I["reasoning_effort = plain effort string\nNo summary - OpenAI skips reasoning text"]
    F --> J["litellm.completion or litellm.responses"]
    H --> J
    I --> J
    J --> K["OpenAI or non-Claude model"]
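
The decision logic in the flowchart can be sketched in plain Python; the budget thresholds and function names below are illustrative assumptions, not LiteLLM's actual mapping:

```python
from typing import Union

# Stand-in for is_reasoning_auto_summary_enabled(); defaults to off per the PR.
REASONING_AUTO_SUMMARY = False

def budget_to_effort(budget_tokens: int) -> str:
    # Hypothetical thresholds for the budget_tokens -> effort mapping.
    if budget_tokens < 1024:
        return "minimal"
    if budget_tokens < 4096:
        return "low"
    if budget_tokens < 16384:
        return "medium"
    return "high"

def translate_thinking(thinking: dict) -> Union[str, dict]:
    effort = budget_to_effort(int(thinking.get("budget_tokens", 0)))
    summary = thinking.get("summary")  # user-provided summary always wins
    if summary is None and REASONING_AUTO_SUMMARY:
        summary = "detailed"  # auto-injected only when the flag is on
    if summary is not None:
        return {"effort": effort, "summary": summary}
    return effort  # plain effort string: OpenAI skips reasoning text
```

Note the return type changes shape (str vs dict) depending on whether a summary is present, which is the string-to-dict upgrade concern discussed in the review.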

Comments Outside Diff (3)

  1. litellm/llms/anthropic/experimental_pass_through/utils.py, line 7-11 (link)

    Backwards-incompatible default behavior change

    Previously, "summary": "detailed" was unconditionally injected when translating Anthropic thinking into OpenAI reasoning_effort for non-Claude models. This PR flips the default to False (reasoning_auto_summary defaults to False in litellm/__init__.py). Any existing deployment that relied on receiving reasoning-summary text from non-Claude models via the /v1/messages adapter will silently stop receiving summaries after this upgrade, because the OpenAI Responses API only returns reasoning content when summary is explicitly set.

    A user-controlled flag (reasoning_auto_summary) is provided to restore the old behavior, but users must explicitly opt in. Per the project policy on backwards-incompatible changes, the safer approach would be to keep the old default (reasoning_auto_summary: bool = True) and let users opt out if the old behavior causes problems, rather than requiring all existing users to update their configuration to preserve current behavior.

    Rule Used: What: avoid backwards-incompatible changes without... (source)

  2. litellm/llms/anthropic/experimental_pass_through/adapters/transformation.py, line 308-320 (link)

    litellm_metadata not in translatable_anthropic_params creates fragile ordering dependency

    _translate_metadata_to_openai uses .pop("litellm_metadata", ...) to remove litellm_metadata from the request dict before _copy_untranslated_anthropic_params iterates over the remaining keys. However, "litellm_metadata" is not listed in translatable_anthropic_params().

    This means:

    • If _translate_metadata_to_openai is NOT called first (e.g. if someone calls _copy_untranslated_anthropic_params in isolation), litellm_metadata will be forwarded as-is to the downstream LLM call — potentially causing unexpected parameter errors.
    • The .pop() side-effect creates an implicit ordering requirement between these two private methods that is not enforced by the code and not documented.

    Consider adding "litellm_metadata" to translatable_anthropic_params() to make the contract explicit:

    return [
        "messages",
        "metadata",
        "litellm_metadata",
        "system",
        "tool_choice",
        "tools",
        "thinking",
        "output_format",
    ]
  3. tests/test_litellm/llms/anthropic/experimental_pass_through/messages/test_anthropic_experimental_pass_through_messages_handler.py, line 231-234 (link)

    Test assertion weakened to match new default behavior

    This assertion was previously {"reasoning_effort": {"effort": "minimal", "summary": "detailed"}} (when summary was always injected). It was changed to {"reasoning_effort": "minimal"} to match the new opt-in default. While this correctly reflects the new intended behavior, it is worth noting that this is a coverage reduction for the prior behavior path — any regression that accidentally re-enables unconditional injection would not be caught here.

    Consider keeping a complementary assertion that the summary key is absent in the default case (which is already done on line 234 with assert "summary" not in str(result["reasoning_effort"])), but also explicitly asserting the type is str, not dict:
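
    A sketch of that complementary assertion, with a hypothetical result value standing in for the test's actual handler output:

    ```python
    # Hypothetical default-path output; the real test builds this via the handler.
    result = {"reasoning_effort": "minimal"}

    # Assert the shape explicitly: a plain string, not a dict carrying a summary.
    assert isinstance(result["reasoning_effort"], str)
    assert "summary" not in str(result["reasoning_effort"])
    ```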

    Rule Used: What: Flag any modifications to existing tests and... (source)

Last reviewed commit: "fix: apply Black for..."

Chesars and others added 3 commits March 5, 2026 11:58
Add `litellm.disable_default_reasoning_summary` flag (default False) and
env var `LITELLM_DISABLE_DEFAULT_REASONING_SUMMARY` to allow users to
opt out of the automatic `summary="detailed"` injection when routing
Anthropic thinking requests to OpenAI's Responses API.

Default behavior is preserved (summary="detailed" is always added),
but users who don't want to pay for summary tokens can now disable it.

https://claude.ai/code/session_01VJU9EwVvgvmeCe3Yu1aULa
…issing env var test

- Extract duplicated summary_disabled evaluation from handler.py and
  transformation.py into a shared is_default_reasoning_summary_disabled()
  helper in utils.py to prevent future divergence.
- Add test_summary_excluded_when_env_var_set to handler test class to
  close env-var test coverage gap flagged by Greptile.
…t-M4Yic

feat(anthropic): add opt-out flag for default reasoning summary
Chesars and others added 2 commits March 5, 2026 12:22
…ry injection + add docs

- Update translate_thinking_for_model (3rd code path) to inject
  summary="detailed" by default, consistent with the other two paths
- Add disable_default_reasoning_summary flag check via shared helper
- Add tests for flag enabled/disabled and user-provided summary
- Document disable_default_reasoning_summary in reasoning_content.md
# Conflicts:
#	tests/test_litellm/llms/vertex_ai/gemini/test_vertex_and_google_ai_studio_gemini.py
Remove 8 development scripts from scripts/ that were accidentally
committed. Remove unused `import litellm` from
responses_adapters/transformation.py.
@codspeed-hq
Contributor

codspeed-hq bot commented Mar 19, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing litellm_oss_staging_03_05_2026 (e3b62c0) with main (2b889f1)


…w test exceptions

Address Greptile review feedback:
1. Replace opt-out `disable_default_reasoning_summary` with existing opt-in
   `reasoning_auto_summary` flag — avoids backwards-incompatible change where
   all users routing thinking-enabled requests would silently get a changed
   reasoning_effort shape (string -> dict) on upgrade.
2. Add default summary injection to `_translate_thinking_to_openai` — this path
   was the only one missing it, causing inconsistent behavior for
   litellm.completion() callers using the Anthropic adapter.
3. Narrow `except Exception` to `except (ValueError, TypeError, AttributeError)`
   in tests to avoid masking genuine failures.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Addresses Greptile feedback that test assertions were weakened when
removing summary: "detailed" expectations — now every default-behavior
test explicitly asserts that "summary" is absent from the result.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@yuneng-jiang yuneng-jiang merged commit 7b31ea4 into main Mar 21, 2026
38 of 57 checks passed
@ishaan-berri ishaan-berri deleted the litellm_oss_staging_03_05_2026 branch March 26, 2026 22:29
4 participants