Fix thinking blocks dropped when thinking field is null #24070

71 commits merged into BerriAI:litellm_oss_staging_03_18_2026
Conversation
… failures

Pass-through endpoint failures fired both async_failure_handler and async_post_call_failure_hook, causing duplicate logs in callback integrations. Add pass-through guards to the failure path, matching the existing success path behavior.
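Conceptually, the guard works like this. This is a minimal sketch: the class and flag names are illustrative, not the actual proxy internals.

```python
class PassThroughFailureLogging:
    """Illustrative model of the fix: a pass-through failure should be
    logged by exactly one of the two failure paths, not both."""

    def __init__(self) -> None:
        self.callback_logs: list = []

    def async_failure_handler(self, request_id: str, is_passthrough: bool) -> None:
        if is_passthrough:
            # guard added by the fix: pass-through failures are handled
            # by async_post_call_failure_hook, so skip logging here
            return
        self.callback_logs.append(request_id)

    def async_post_call_failure_hook(self, request_id: str) -> None:
        self.callback_logs.append(request_id)


logger = PassThroughFailureLogging()
# pass-through failure fires both paths, but only the hook logs it
logger.async_failure_handler("req-1", is_passthrough=True)
logger.async_post_call_failure_hook("req-1")
print(logger.callback_logs)  # ['req-1'] (logged once, not twice)
```

Without the guard, both methods would append, producing the duplicate callback log described above.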
…st_call

Model-level guardrails (litellm_params.guardrails on a deployment) were only merged into request metadata in the streaming post_call path (async_post_call_streaming_hook) but not in the non-streaming path (post_call_success_hook). This caused should_run_guardrail to skip the guardrail because the guardrail name was never added to metadata.guardrails. Add the same _check_and_merge_model_level_guardrails call to post_call_success_hook before the guardrail callback loop, mirroring the streaming path. Fixes model-level guardrails silently not firing for non-streaming post_call requests.
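A minimal sketch of the merge step, assuming simplified dict-based metadata. The real `_check_and_merge_model_level_guardrails` operates on LiteLLM's request objects; the function bodies below are illustrative.

```python
def merge_model_level_guardrails(litellm_params: dict, metadata: dict) -> dict:
    """Illustrative version of the merge step: copy deployment-level
    guardrail names into request metadata so the post_call hook can
    see them."""
    model_guardrails = litellm_params.get("guardrails") or []
    merged = dict(metadata)  # avoid mutating the caller's dict
    existing = list(merged.get("guardrails") or [])
    for name in model_guardrails:
        if name not in existing:
            existing.append(name)
    merged["guardrails"] = existing
    return merged


def should_run_guardrail(guardrail_name: str, metadata: dict) -> bool:
    # simplified: a guardrail runs only if its name appears in metadata
    return guardrail_name in (metadata.get("guardrails") or [])


deployment_params = {"guardrails": ["pii-mask"]}  # litellm_params on a deployment
metadata = merge_model_level_guardrails(deployment_params, {"guardrails": []})
print(should_run_guardrail("pii-mask", metadata))  # True
```

Without the merge, `metadata["guardrails"]` stays empty on the non-streaming path and `should_run_guardrail` returns False, which is exactly the silent skip this commit fixes.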
Address Greptile review — resolve sys.path relative to the test file location instead of the process working directory.
- Added "Web Search Integration" to the integrations sidebar for better navigation.
- Updated authors in multiple blog posts to use shorthand references for consistency.
- Corrected links in various documentation files to ensure proper navigation.
- Improved clarity in load test documentation and related settings.

These changes aim to streamline user experience and maintain consistency across the documentation.
….json

- Fix provider count header: 4 -> 5 new providers
- Fix nebius/zai: add bridged endpoint support (messages, responses, a2a, interactions)
- Add missing sagemaker_nova to provider_endpoints_support.json
- Fix duplicate docs: RBAC and MCP Troubleshooting cross-links in secondary positions
- Add observability_index for Integrations
- Update guides, integrations, learn, tutorials index pages

Made-with: Cursor
docs: sidebar QA fixes and index updates
Add Vitest + RTL tests for HelpLink, DebugWarningBanner, ExportFormatSelector, ExportTypeSelector, ExportSummary, MetricCard, PolicySelect, ComplexityRouterConfig, RateLimitTypeFormItem, and AgentCardGrid (67 tests total). Co-Authored-By: Claude Opus 4.6 <[email protected]>
[Test] UI: Add unit tests for 10 untested components
…tracking order

- Fix Letta Resources links: proxy, SDK (#litellm-python-sdk), observability, correct Letta docs URL
- Add Google GenAI SDK to Agent SDKs, remove from AI Tools
- Move Track Usage for Coding Tools to end of AI Tools section
- Remove Letta from Agent SDKs sidebar
- Guides, Learn, Tutorials index updates

Made-with: Cursor
… and indexes

Add org admin support to /v2/team/list so org admins can list teams within their organizations instead of getting 401. Also enrich the response with members_count and add missing indexes.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- Fix org admin own-query regression: always check org admin status before the standard route check so own-queries see all org teams
- Clear user_id when org admin is detected so org scope replaces user-membership scope
- Remove dead isinstance(organization_id, list) branch
- Remove unused datetime import
- Remove orphaned _convert_teams_to_response helper

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- _get_org_admin_org_ids: catch only ValueError (user not found) instead of bare Exception — DB errors now propagate as 500s instead of silently demoting org admins to regular users
- _build_team_list_where_conditions: return None (not a sentinel string) when user has no team memberships; list_team_v2 short-circuits to empty response without hitting the DB
- Org admin + team_id + user_id: use exact team_id match with org scope instead of OR expansion that effectively ignored the team_id filter
- Org admin + user_id (no team_id): OR(org teams, direct memberships) now matches legacy _authorize_and_filter_teams behaviour

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
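The filtering rules above can be sketched as a Prisma-style where builder. This is a simplified stand-in, assuming plain dicts for the where clause; the real `_build_team_list_where_conditions` has a different signature.

```python
from typing import Optional


def build_team_list_where_conditions(
    org_admin_org_ids: Optional[list],
    team_id: Optional[str],
    user_id: Optional[str],
    member_team_ids: list,
) -> Optional[dict]:
    """Illustrative where-clause builder mirroring the rules above."""
    if org_admin_org_ids:
        if team_id:
            # exact team_id match, still scoped to the admin's organizations
            return {
                "team_id": team_id,
                "organization_id": {"in": org_admin_org_ids},
            }
        if user_id:
            # org teams OR the caller's direct memberships (legacy behaviour)
            return {
                "OR": [
                    {"organization_id": {"in": org_admin_org_ids}},
                    {"team_id": {"in": member_team_ids}},
                ]
            }
        return {"organization_id": {"in": org_admin_org_ids}}
    if not member_team_ids:
        # None (not a sentinel string): the caller short-circuits to an
        # empty response without hitting the DB
        return None
    return {"team_id": {"in": member_team_ids}}
```

Returning `None` instead of a sentinel makes the "no memberships" case explicit at the call site, and the exact `team_id` branch avoids the OR expansion that previously ignored the filter.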
…rails-non-streaming-postcall fix(proxy): model-level guardrails not executing for non-streaming post_call
…icate-failure-logs fix(proxy): prevent duplicate callback logs for pass-through endpoint failures
list_team_v2 had 51 statements (limit 50). Extract the team-to-response-model conversion loop into a helper function to satisfy ruff PLR0915. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
docs: Revamp documentation site with new navigation, landing pages, and styling
…ropagation Fix langfuse otel traceparent propagation
* feat(xai): add grok-4.20 beta 2 models with pricing (#23900)

  Add three grok-4.20 beta 2 model variants from xAI:
  - grok-4.20-multi-agent-beta-0309 (reasoning + multi-agent)
  - grok-4.20-beta-0309-reasoning (reasoning)
  - grok-4.20-beta-0309-non-reasoning

  Pricing (from https://docs.x.ai/docs/models):
  - Input: $2.00/1M tokens ($0.20/1M cached)
  - Output: $6.00/1M tokens
  - Context: 2M tokens

  All variants support vision, function calling, tool choice, and web search. Closes LIT-2171

* docs: add Quick Install section for litellm --setup wizard (#23905)

* docs: add Quick Install section for litellm --setup wizard

* docs: clarify setup wizard is for local/beginner use

* feat(setup): interactive setup wizard + install.sh (#23644)

* feat(setup): add interactive setup wizard + install.sh

  Adds `litellm --setup` — a Claude Code-style TUI onboarding wizard that guides users through provider selection, API key entry, and proxy config generation, then optionally starts the proxy immediately.

  - litellm/setup_wizard.py: wizard with ASCII art, numbered provider menu (OpenAI, Anthropic, Azure, Gemini, Bedrock, Ollama), API key prompts, port/master-key config, and litellm_config.yaml generation
  - litellm/proxy/proxy_cli.py: adds --setup flag that invokes the wizard
  - scripts/install.sh: curl-installable script (detect OS/Python, pip install litellm[proxy], launch wizard)

  Usage:

  ```sh
  curl -fsSL https://raw.githubusercontent.com/BerriAI/litellm/main/scripts/install.sh | sh
  litellm --setup
  ```

* fix(install.sh): remove orange color, add LITELLM_BRANCH env var for branch installs
* fix(install.sh): install from git branch so --setup is available for QA
* fix(install.sh): remove stale LITELLM_BRANCH reference that caused unbound variable error
* fix(install.sh): force-reinstall from git to bypass cached PyPI version
* fix(install.sh): show pip progress bar during install
* fix(install.sh): always launch wizard via $PYTHON_BIN -m litellm, not PATH binary
* fix(install.sh): use litellm.proxy.proxy_cli module (no __main__.py exists)
* fix(install.sh): suppress RuntimeWarning from module invocation
* fix(install.sh): use Python bin-dir litellm binary to avoid CWD sys.path shadowing
* fix(install.sh): use sysconfig.get_path('scripts') to find pip-installed litellm binary
* fix(install.sh): redirect stdin from /dev/tty on exec so wizard gets terminal, not exhausted pipe
* fix(install.sh): warn about git clone duration, drop --no-cache-dir so re-runs are faster
* feat(setup_wizard): arrow-key selector, updated model names
* fix(setup_wizard): use sysconfig binary to start proxy, not python -m litellm
* feat(setup_wizard): credential validation after key entry + clear next-steps after proxy start
* style(install.sh): show git clone warning in blue
* refactor(setup_wizard): class with static methods, use check_valid_key from litellm.utils

* address greptile review: fix yaml escaping, port validation, display name collisions, tests

  - setup_wizard.py: add _yaml_escape() for safe YAML embedding of API keys
  - setup_wizard.py: add _styled_input() with readline ANSI ignore markers
  - setup_wizard.py: change DIVIDER to _divider() fn to avoid import-time color capture
  - setup_wizard.py: validate port range 1-65535, initialize before loop
  - setup_wizard.py: qualify azure display names (azure-gpt-4o) to avoid collision with openai
  - setup_wizard.py: work on env_copy in _build_config to avoid mutating caller's dict
  - setup_wizard.py: skip model_list entries for providers with no credentials
  - setup_wizard.py: prompt for azure deployment name
  - setup_wizard.py: wrap os.execlp in try/except with friendly fallback
  - setup_wizard.py: wrap config write in try/except OSError
  - setup_wizard.py: fix _validate_and_report to use two print lines (no \r overwrite)
  - setup_wizard.py: add .gitignore tip next to key storage notice
  - setup_wizard.py: fix run_setup_wizard() return type annotation to None
  - scripts/install.sh: drop pipefail (not supported by dash on Ubuntu when invoked as sh)
  - scripts/install.sh: use litellm[proxy] from PyPI (not hardcoded dev branch)
  - scripts/install.sh: guard /dev/tty read with -r check for Docker/CI compat
  - scripts/install.sh: remove --force-reinstall to avoid downgrading dependencies
  - tests/test_litellm/test_setup_wizard.py: 13 unit tests for _build_config and _yaml_escape

* style: black format setup_wizard.py

* fix: address remaining greptile issues - Windows compat, YAML quoting, credential flow

  - guard termios/tty imports with try/except ImportError for Windows compat
  - quote master_key as YAML double-quoted scalar (same as env vars)
  - remove unused port param from _build_config signature
  - _validate_and_report now returns the final key so re-entered creds are stored
  - add test for master_key YAML quoting

* fix: add --port to suggested command, guard /dev/tty exec in install.sh

* fix: quote api_base in YAML, skip azure if no deployment, only redraw on state change

* fix: address greptile review comments

  - _yaml_escape: add control character escaping (\n, \r, \t)
  - test: fix tautological assertion in test_build_config_azure_no_deployment_skipped
  - test: add tests for control character escaping in _yaml_escape

* feat(ui): remove Chat UI page link and banner from sidebar and playground (#23908)

* feat(guardrails): MCPJWTSigner - built-in guardrail for zero trust MCP auth (#23897)

* Allow pre_mcp_call guardrail hooks to mutate outbound MCP headers

* Enhance MCPServerManager to support hook-modified arguments and extra headers. Update tests to validate argument mutation and header injection behavior, including warnings for OpenAPI-backed servers when headers are present.

* Refactor MCPServerManager to raise HTTPException for extra headers in OpenAPI-backed servers. Update tests to reflect this change, ensuring proper exception handling instead of logging warnings.

* feat(guardrails): add MCPJWTSigner built-in guardrail for zero trust MCP auth

  Signs outbound MCP tool calls with a LiteLLM-issued RS256 JWT so MCP servers can trust a single signing authority instead of every upstream IdP.

  Enable in config.yaml:

  ```yaml
  guardrails:
    - guardrail_name: mcp-jwt-signer
      litellm_params:
        guardrail: mcp_jwt_signer
        mode: pre_mcp_call
        default_on: true
  ```

  JWT carries sub (user_id), act.sub (team_id, RFC 8693), tool-level scope, iss, aud, iat/exp/nbf. RSA-2048 keypair auto-generated at startup unless MCP_JWT_SIGNING_KEY env var is set.

  Adds /.well-known/jwks.json endpoint and jwks_uri to /.well-known/openid-configuration so MCP servers can verify LiteLLM-issued tokens via OIDC discovery.

* Update MCPServerManager to raise HTTPException with status code 400 for extra headers in OpenAPI-backed servers. Adjust tests to verify the correct status code and exception message.

* fix: address P1 issues in MCPJWTSigner

  - OpenAPI servers: warn + skip header injection instead of 500
  - JWKS Cache-Control: 5min for auto-generated keys, 1h for persistent
  - sub claim: fallback to apikey:{token_hash} for anonymous callers
  - ttl_seconds: validate > 0 at init time

* docs: add MCP zero trust auth guide with architecture diagram

* docs: add FastMCP JWT verification guide to zero trust doc

* fix: address remaining Greptile review issues (round 2)

  - mcp_server_manager: warn when hook Authorization overwrites existing header
  - __init__: remove _mcp_jwt_signer_instance from __all__ (private internal)
  - discoverable_endpoints: copy dict instead of mutating in-place on OIDC augmentation
  - test docstring: reflect warn-and-continue behavior for OpenAPI servers
  - test: update scope assertions for least-privilege (no mcp:tools/list on tool-call JWTs)

* fix: address Greptile round 3 feedback

  - initialize_guardrail: validate mode='pre_mcp_call' at init time — misconfigured mode silently bypasses JWT injection, which is a zero-trust bypass
  - _build_claims: remove duplicate inline 'import re' (module-level import already present)
  - _types.py: add TODO comment explaining jwt_claims is forward-compat plumbing for a follow-up PR that will forward upstream IdP claims into outbound MCP JWTs

* feat(mcp_jwt_signer): add verify+re-sign, claim ops, two-token model, configurable scopes

  Addresses all missing pieces from the scoping doc review:

  - FR-5 (Verify + re-sign): MCPJWTSigner now accepts access_token_discovery_uri and token_introspection_endpoint. When set, the incoming Bearer token is extracted from raw_headers (threaded through pre_call_tool_check), verified against the IdP's JWKS (JWT) or introspected (opaque), and only re-signed if valid. Falls back to user_api_key_dict.jwt_claims for LiteLLM JWT-auth mode.
  - FR-12 (Configurable end-user identity mapping): end_user_claim_sources ordered list drives sub resolution — sources: token:<claim>, litellm:user_id, litellm:email, litellm:end_user_id, litellm:team_id.
  - FR-13 (Claim operations): add_claims (insert-if-absent), set_claims (always override), remove_claims (delete) applied in that order.
  - FR-14 (Two-token model): channel_token_audience + channel_token_ttl issue a second JWT injected as x-mcp-channel-token: Bearer <token>.
  - FR-15 (Incoming claim validation): required_claims raises HTTP 403 when any listed claim is absent; optional_claims passes listed claims from verified token into the outbound JWT.
  - FR-9 (Debug headers): debug_headers: true emits x-litellm-mcp-debug with kid, sub, iss, exp, scope.
  - FR-10 (Configurable scopes): allowed_scopes replaces auto-generation.

  Also fixed: tool-call JWTs no longer grant mcp:tools/list (overpermission).

  P1 fixes:
  - proxy/utils.py: _convert_mcp_hook_response_to_kwargs merges rather than replaces extra_headers, preserving headers from prior guardrails.
  - mcp_server_manager.py: warns when hook injects Authorization alongside a server-configured authentication_token (previously silent).
  - mcp_server_manager.py: pre_call_tool_check now accepts raw_headers and extracts incoming_bearer_token so FR-5 verification has the raw token.
  - proxy/utils.py: remove stray inline import inspect inside loop (pre-existing lint error, now cleaned up).

  Tests: 43 passing (28 new tests covering all FR flags + P1 fixes).
* feat(mcp_jwt_signer): add verify+re-sign, claim ops, two-token model, configurable scopes (core)

  Remaining files from the FR implementation:

  mcp_jwt_signer.py — full rewrite with all new params:
  - FR-5: access_token_discovery_uri, token_introspection_endpoint, verify_issuer, verify_audience + _verify_incoming_jwt(), _introspect_opaque_token()
  - FR-12: end_user_claim_sources ordered resolution chain
  - FR-13: add_claims, set_claims, remove_claims
  - FR-14: channel_token_audience, channel_token_ttl → x-mcp-channel-token
  - FR-15: required_claims (raises 403), optional_claims (passthrough)
  - FR-9: debug_headers → x-litellm-mcp-debug
  - FR-10: allowed_scopes; tool-call JWTs no longer over-grant tools/list

  mcp_server_manager.py:
  - pre_call_tool_check gains raw_headers param to extract incoming_bearer_token
  - Silent Authorization override warning fixed: now fires when server has authentication_token AND hook injects Authorization

  tests/test_mcp_jwt_signer.py: 28 new tests covering all FR flags + P1 fixes (43 total, all passing)

* fix(mcp_jwt_signer): address pre-landing review issues

  - Remove stale TODO comment on UserAPIKeyAuth.jwt_claims — the field is already populated and consumed by MCPJWTSigner in the same PR
  - Fix _get_oidc_discovery to only cache the OIDC discovery doc when jwks_uri is present; a malformed/empty doc now retries on the next request instead of being permanently cached until proxy restart
  - Add FR-5 test coverage for _fetch_jwks (cache hit/miss), _get_oidc_discovery (cache/no-cache on bad doc), _verify_incoming_jwt (valid token, expired token), _introspect_opaque_token (active, inactive, no endpoint), and the end-to-end 401 hook path — 53 tests total, all passing

* docs(mcp_zero_trust): rewrite as use-case guide covering all new JWT signer features

  Add scenario-driven sections for each new config area:
  - Verify+re-sign with Okta/Azure AD (access_token_discovery_uri, end_user_claim_sources, token_introspection_endpoint)
  - Enforcing caller attributes with required_claims / optional_claims
  - Adding metadata via add_claims / set_claims / remove_claims
  - Two-token model for AWS Bedrock AgentCore Gateway (channel_token_audience / channel_token_ttl)
  - Controlling scopes with allowed_scopes
  - Debugging JWT rejections with debug_headers

  Update JWT claims table to reflect configurable sub (end_user_claim_sources)

* fix(mcp_jwt_signer): wire all config.yaml params through initialize_guardrail

  The factory was only passing issuer/audience/ttl_seconds to MCPJWTSigner. All FR-5/9/10/12/13/14/15 params (access_token_discovery_uri, end_user_claim_sources, add/set/remove_claims, channel_token_audience, required/optional_claims, debug_headers, allowed_scopes, etc.) were silently dropped, making every advertised advanced feature non-functional when loaded from config.yaml. Add regression test that asserts every param is wired through correctly.

* docs(mcp_zero_trust): add hero image

* docs(mcp_zero_trust): apply Linear-style edits

  - Lead with the problem (unsigned direct calls bypass access controls)
  - Shorter statement section headers instead of question-form headers
  - Move diagram/OIDC discovery block after the reader is bought in
  - Add 'read further only if you need to' callout after basic setup
  - Two-token section now opens from the user problem not product jargon
  - Add concrete 403 error response example in required_claims section
  - Debug section opens from the symptom (MCP server returning 401)
  - Lowercase claims reference header for consistency

* fix(mcp_jwt_signer): fix algorithm confusion attack + add OIDC discovery 24h TTL

  - Remove alg from unverified JWT header; use signing_jwk.algorithm_name from JWKS key instead. Reading alg from attacker-controlled headers enables alg:none / HS256 confusion attacks.
  - Add _oidc_discovery_fetched_at timestamp and _OIDC_DISCOVERY_TTL = 86400 (24h). Without a TTL the cached discovery doc never refreshes, so IdP key rotation is invisible.
---------

Co-authored-by: Noah Nistler <[email protected]>

* fix(ci): stabilize CI - formatting, type errors, test polling, security CVEs, router bug, batch resolution

  - Fix 1: Run Black formatter on 35 files
  - Fix 2: Fix MyPy type errors:
    - setup_wizard.py: add type annotation for 'selected' set variable
    - user_api_key_auth.py: remove redundant type annotation on jwt_claims reassignment
  - Fix 3: Fix spend accuracy test burst 2 polling to wait for expected total spend instead of just 'any increase' from burst 2
  - Fix 4: Bump Next.js 16.1.6 -> 16.1.7 to fix CVE-2026-27978, CVE-2026-27979, CVE-2026-27980, CVE-2026-29057
  - Fix 5: Fix router _pre_call_checks model variable being overwritten inside loop, causing wrong model lookups on subsequent deployments. Use local _deployment_model variable instead.
  - Fix 6: Add missing resolve_output_file_ids_to_unified call in batch retrieve non-terminal-to-terminal path (matching the terminal path behavior)

* chore: regenerate poetry.lock to sync with pyproject.toml

* fix: format merged files from main and regenerate poetry.lock

* fix(mypy): annotate jwt_claims as Optional[dict] to fix type incompatibility

* fix(ci): update router region test to use gpt-4.1-mini (fix flaky model lookup)

  Replace deprecated gpt-3.5-turbo-1106 with gpt-4.1-mini + mock_response in test_router_region_pre_call_check, following the same pattern used in commit 717d37c for test_router_context_window_check_pre_call_check_out_group.

* ci: retry flaky logging_testing (async event loop race condition)

* fix(ci): aggregate all mock calls in langfuse e2e test to fix race condition

  The _verify_langfuse_call helper only inspected the last mock call (mock_post.call_args), but the Langfuse SDK may split trace-create and generation-create events across separate HTTP flush cycles. This caused an IndexError when the last call's batch contained only one event type.

  Fix: iterate over mock_post.call_args_list to collect batch items from ALL calls. Also add a safety assertion after filtering by trace_id and mark all langfuse e2e tests with @pytest.mark.flaky(retries=3) as an extra safety net for any residual timing issues.

* fix(ci): black formatting + update OpenAPI compliance tests for spec changes

  - Apply Black 26.x formatting to litellm_logging.py (parenthesized style)
  - Update test_input_types_match_spec to follow $ref to InteractionsInput schema (Google updated their OpenAPI spec to use $ref instead of inline oneOf)
  - Update test_content_schema_uses_discriminator to handle discriminator without explicit mapping (Google removed the mapping key from Content discriminator)

* revert: undo incorrect Black 26.x formatting on litellm_logging.py

  The file was correctly formatted for Black 23.12.1 (the version pinned in pyproject.toml). The previous commit applied Black 26.x formatting which was incompatible with the CI's Black version.

* fix(ci): deduplicate and sort langfuse batch events after aggregation

  The Langfuse SDK may send the same event (e.g., trace-create) in multiple flush cycles, causing duplicates when we aggregate from all mock calls. After filtering by trace_id, deduplicate by keeping only the first event of each type, then sort to ensure trace-create is at index 0 and generation-create at index 1.

---------

Co-authored-by: Noah Nistler <[email protected]>
Co-authored-by: Cursor Agent <[email protected]>
Co-authored-by: Ishaan Jaff <[email protected]>
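The aggregate-then-dedupe-then-sort logic from the two langfuse test fixes above can be sketched as follows. The helper name and payload shape are illustrative, standing in for the batch bodies the Langfuse SDK posts via the mocked HTTP client.

```python
def aggregate_batch_events(payloads: list, trace_id: str) -> list:
    """Illustrative: collect batch items across ALL mock HTTP calls
    (not just the last one), filter by trace_id, dedupe by event type
    keeping the first occurrence, and order trace-create before
    generation-create."""
    items = []
    for payload in payloads:
        items.extend(payload.get("batch", []))
    # keep only events for the trace under test
    items = [e for e in items if e["body"].get("traceId") == trace_id]
    # dedupe: the SDK may re-send the same event type across flushes
    seen, deduped = set(), []
    for event in items:
        if event["type"] not in seen:
            seen.add(event["type"])
            deduped.append(event)
    # deterministic order for index-based assertions
    order = {"trace-create": 0, "generation-create": 1}
    return sorted(deduped, key=lambda e: order.get(e["type"], 2))


payloads = [
    {"batch": [{"type": "trace-create", "body": {"traceId": "t1"}}]},
    {"batch": [
        {"type": "trace-create", "body": {"traceId": "t1"}},  # duplicate flush
        {"type": "generation-create", "body": {"traceId": "t1"}},
        {"type": "trace-create", "body": {"traceId": "other"}},
    ]},
]
events = aggregate_batch_events(payloads, "t1")
print([e["type"] for e in events])  # ['trace-create', 'generation-create']
```

In the real test the payloads would come from `mock_post.call_args_list`; the point is that no single call is assumed to contain both event types.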
[Infra] Merge daily branch with main
The check `content.get("thinking", None) is not None` incorrectly
drops thinking blocks when the `thinking` key is explicitly null or
absent. Changed to `content.get("type") == "thinking"` to match
the fix already applied in the experimental pass-through path (PR #15501).
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
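The corrected detection can be sketched as follows. `extract_thinking_blocks` is an illustrative stand-in for the parsing loop in `transformation.py`, not the actual function name.

```python
def extract_thinking_blocks(content_blocks: list) -> list:
    """Collect thinking blocks from Anthropic response content.

    Old (buggy) check: block.get("thinking", None) is not None
    -> drops blocks whose `thinking` field is null or absent.
    New check: match on the block's `type` instead.
    """
    return [b for b in content_blocks if b.get("type") == "thinking"]


blocks = [
    {"type": "thinking", "thinking": None, "signature": "sig-a"},  # field null
    {"type": "thinking", "signature": "sig-b"},                    # field absent
    {"type": "text", "text": "hello"},
]
new_result = extract_thinking_blocks(blocks)
old_result = [b for b in blocks if b.get("thinking", None) is not None]
print(len(new_result), len(old_result))  # 2 0
```

The old predicate returns 0 blocks for both edge cases, which is exactly the silent drop this PR fixes; the type-based check keeps them.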
Greptile Summary

This PR fixes a silent data-loss bug where Anthropic extended-thinking blocks were dropped during response parsing whenever the `thinking` field was null or absent.

Key changes:
One minor style inconsistency: the neighbouring …

Confidence Score: 5/5
| Filename | Overview |
|---|---|
| litellm/llms/anthropic/chat/transformation.py | Single-line fix: changes detection of thinking blocks from checking the presence/value of the thinking key to checking content.get("type") == "thinking", correctly capturing blocks where the thinking field is null or absent. |
| tests/test_litellm/llms/anthropic/chat/test_anthropic_chat_transformation.py | Adds a regression test covering the three cases for the thinking block fix: thinking=null, thinking key absent, and thinking with real content. All tests use mocked data with no real network calls. |
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Anthropic response content block] --> B{content type?}
    B -- "text" --> C[Append to text_content]
    B -- "tool_use / server_tool_use" --> D[Convert to OpenAI tool call]
    B -- "*_tool_result" --> E[Handle tool results / web_search / web_fetch]
    B -- "thinking" --> F["Add to thinking_blocks list\n(FIXED: detect by type, not by thinking key value)"]
    B -- "redacted_thinking" --> G[Add to thinking_blocks list]
    B -- "compaction" --> H[Add to compaction_blocks]
    F --> I{thinking_blocks != None?}
    G --> I
    I -- "Yes" --> J["Aggregate reasoning_content\n(skips null thinking values safely)"]
    I -- "No" --> K[reasoning_content stays None]
    J --> L[Return 8-tuple]
    K --> L
```
Last reviewed commit: "Fixed thinking block..."

Merged commit 8b4ed36 into BerriAI:litellm_oss_staging_03_18_2026
Summary
- Thinking blocks were dropped when the `thinking` key is `null` or absent from the content block.
- Changed `content.get("thinking", None) is not None` to `content.get("type") == "thinking"` in `litellm/llms/anthropic/chat/transformation.py` (line 1525), matching the fix already applied in the experimental pass-through path by PR "Add support for thinking blocks and redacted thinking blocks in Anthropic v1/messages API" #15501.
- Added a regression test covering three cases: `thinking=null`, `thinking` key absent, and `thinking` with actual content.

Test plan

- `test_extract_response_content_thinking_block_null_thinking` covering three cases:
  - `thinking` key explicitly set to `null`
  - `thinking` key entirely absent
  - `thinking` key with actual text content

Fixes #24026
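The three cases can be restated as a minimal check. The predicate below is an illustrative stand-in for the corrected detection in `transformation.py`, not the actual test body.

```python
def is_thinking_block(content: dict) -> bool:
    # stand-in for the corrected type-based check in transformation.py
    return content.get("type") == "thinking"


cases = [
    ({"type": "thinking", "thinking": None}, True),      # explicitly null
    ({"type": "thinking"}, True),                        # key absent
    ({"type": "thinking", "thinking": "step 1"}, True),  # actual content
    ({"type": "text", "text": "hi"}, False),             # control case
]
for block, expected in cases:
    assert is_thinking_block(block) is expected
print("all cases pass")
```

Under the old value-based check, the first two cases would evaluate False and the blocks would be silently dropped.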
🤖 Generated with Claude Code