Skip to content

Litellm oss staging 03 14 2026#23686

Merged
Sameerlite merged 2 commits intomainfrom
litellm_oss_staging_03_14_2026
Mar 16, 2026
Merged

Litellm oss staging 03 14 2026#23686
Sameerlite merged 2 commits intomainfrom
litellm_oss_staging_03_14_2026

Conversation

@RheagalFire
Copy link
Copy Markdown
Contributor

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have Added testing in the tests/test_litellm/ directory, Adding at least 1 test is a hard requirement - see details
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable but needs attention
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test

Changes

@vercel
Copy link
Copy Markdown

vercel bot commented Mar 15, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
litellm Ready Ready Preview, Comment Mar 15, 2026 8:52am

Request Review

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 15, 2026

Greptile Summary

This PR is a staging merge that ships two independent improvements: (1) stream_chunk_builder in litellm/main.py now merges annotations from all streaming chunks instead of discarding every chunk after the first, and (2) get_customer_user_header_from_mapping in litellm/proxy/auth/auth_utils.py now returns a list of all customer-role headers (in config order, lowercased) rather than stopping at the first match, enabling priority-ordered fallback across multiple configured customer headers.

Key observations:

  • Both changes are logically correct and well-exercised by new mock-only tests placed in tests/test_litellm/, satisfying the no-network-calls CI requirement.
  • The return-type widening of get_customer_user_header_from_mapping (str → list) is backwards-compatible within this codebase because the sole caller (get_end_user_id_from_request_body) was updated in the same commit with an isinstance(…, list) / elif isinstance(…, str) dispatch.
  • The tests/local_testing/test_auth_utils.py test now has a near-duplicate in tests/test_litellm/proxy/auth/test_auth_utils.py; the overlap is harmless but worth noting.
  • The PR description pre-submission checklist items are all unchecked and there is no linked issue with evidence of resolution per project contribution guidelines.

Confidence Score: 4/5

  • Safe to merge — both changes are correct, backwards-compatible, and covered by mock-only unit tests.
  • The annotation-merging fix in main.py is a straightforward and correct bug fix. The multi-header support in auth_utils.py changes the return type of an internal-only function and updates the single caller atomically, so there is no risk of breakage to callers outside this module. All new tests are properly mocked and will run in CI. Minor deductions for: unchecked pre-submission checklist, no linked issue, and minor trailing whitespace in the production code.
  • No files require special attention.

Important Files Changed

Filename Overview
litellm/main.py Bug fix: stream_chunk_builder now merges annotations from all streaming chunks instead of taking only the first annotation chunk. Logic is correct — the list comprehension already filters out None annotation values, and extend() correctly aggregates them.
litellm/proxy/auth/auth_utils.py Enhancement: get_customer_user_header_from_mapping now returns Optional[list] of all customer-role header names (lowercased) instead of just the first match. get_end_user_id_from_request_body handles both list and str cases via isinstance checks. The single caller in this file is correctly updated. Trailing whitespace remains on lines 683 and 757.
tests/test_litellm/proxy/auth/test_auth_utils.py New mock-only unit tests covering the multi-customer-header feature: null cases, single-match, multi-match ordering, fallback to deprecated user_header_name, and end-to-end get_end_user_id_from_request_body paths. No real network calls.
tests/local_testing/test_auth_utils.py Single assertion updated to match new return type (list with lowercase header name). Existing test coverage for the same function also exists in tests/test_litellm, leading to some duplication.
tests/test_litellm/test_stream_chunk_builder_annotations.py New mock-only tests for annotation merging: multi-chunk merge, single-chunk regression guard, and no-annotations base case. Correctly uses only local litellm types with no network calls.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[get_end_user_id_from_request_body] --> B{Standard headers present?}
    B -- Yes --> C[_get_customer_id_from_standard_headers]
    C -- Found --> Z[return customer_id]
    C -- Not found --> D{user_header_mappings in general_settings?}
    B -- No --> D
    D -- Yes --> E[get_customer_user_header_from_mapping\nreturns list of lowercased header names]
    D -- No --> F{user_header_name in general_settings?}
    E --> G{isinstance list?}
    G -- Yes --> H[Build headers_lower dict\nIterate expected_headers in order\nReturn first non-empty match]
    H --> Z
    H -- No match --> I[Check request body fields]
    F -- Yes --> J{isinstance str?}
    J -- Yes --> K[Case-insensitive scan\nReturn if non-empty]
    K --> Z
    K -- No match --> I
    I --> L{user field?}
    L -- Yes --> Z
    L -- No --> M{litellm_metadata.user?}
    M -- Yes --> Z
    M -- No --> N{metadata.user_id?}
    N -- Yes --> Z
    N -- No --> O{safety_identifier?}
    O -- Yes --> Z
    O -- No --> P[return None]
Loading

Last reviewed commit: dd1ea3d

Comment on lines +97 to +112
if is_valid_deployment_tag(deployment_tags, request_tags, match_any):
matched_value = next(
(t for t in deployment_tags if t in set(request_tags)),
deployment_tags[0],
)
return {"matched_via": "tags", "matched_value": matched_value}

# 2. Regex match against request headers.
# When match_any=False and the deployment has both plain tags and tag_regex,
# the strict tag check has already failed (step 1 returned None). Allow
# the regex to fire only when the deployment has NO plain tags, so we never
# use regex as a backdoor around the operator's strict-tag policy.
strict_tag_check_failed = (
not match_any and bool(deployment_tags) and bool(request_tags)
)
if deployment_tag_regex and header_strings and not strict_tag_check_failed:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regex can bypass strict-tag policy when request_tags is empty

The inline comment says "Allow the regex to fire only when the deployment has NO plain tags," but the actual guard condition checks not match_any AND bool(deployment_tags) AND bool(request_tags). When a request carries no tags at all (request_tags is None or []), bool(request_tags) is False, so strict_tag_check_failed is always False — even when match_any=False and the deployment has both tags and tag_regex.

This means a deployment that an operator deliberately locks down with strict tag requirements (match_any=False + non-empty tags) can still be reached by any request that has a matching User-Agent, even if that request provides zero tags. The doc comment and the runtime behaviour disagree.

Suggested fix – make the guard honour the "no plain tags" intent from the comment:

# Allow regex only when the deployment has no plain tags
# OR when the strict-tag check did NOT already run and fail.
strict_tag_check_failed = (
    not match_any and bool(deployment_tags) and bool(request_tags)
)
# Only fire regex when the deployment has no plain tags, or the tag
# check hasn't blocked it.
regex_allowed = not deployment_tags or not strict_tag_check_failed
if deployment_tag_regex and header_strings and regex_allowed:

Or, more simply, skip the regex entirely when the deployment has both tags and tag_regex and strict mode is on:

strict_tag_check_failed = not match_any and bool(deployment_tags)

(removing and bool(request_tags) so strict mode applies even when the request sends no tags).

Comment on lines +18 to 31
# Always default TIKTOKEN_CACHE_DIR to the bundled tokenizers directory
# unless the user explicitly overrides it via CUSTOM_TIKTOKEN_CACHE_DIR.
# This keeps tiktoken fully offline-capable by default (see #1071).
custom_cache_dir = os.getenv("CUSTOM_TIKTOKEN_CACHE_DIR")
if custom_cache_dir:
# If the user opts into a custom cache dir, ensure it exists.
os.makedirs(custom_cache_dir, exist_ok=True)
cache_dir = custom_cache_dir
else:
cache_dir = filename

os.environ["TIKTOKEN_CACHE_DIR"] = cache_dir # use local copy of tiktoken b/c of - https://github.com/BerriAI/litellm/issues/1071

import tiktoken
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LITELLM_NON_ROOT env var silently removed – backwards-incompatible

The previous code honoured LITELLM_NON_ROOT=true to redirect the tiktoken cache to /tmp/tiktoken_cache for read-only, non-root Docker environments. This PR deletes that branch entirely, so any deployment that sets LITELLM_NON_ROOT=true (e.g., distroless or hardened containers) will silently receive no benefit and may hit permission errors if tiktoken tries to write to the read-only package directory.

Per the project's backwards-compatibility rule, behaviour that users rely on should not be removed without a migration path or feature flag.

Suggested approach – keep backward compat while cleaning up the logic:

Suggested change
# Always default TIKTOKEN_CACHE_DIR to the bundled tokenizers directory
# unless the user explicitly overrides it via CUSTOM_TIKTOKEN_CACHE_DIR.
# This keeps tiktoken fully offline-capable by default (see #1071).
custom_cache_dir = os.getenv("CUSTOM_TIKTOKEN_CACHE_DIR")
if custom_cache_dir:
# If the user opts into a custom cache dir, ensure it exists.
os.makedirs(custom_cache_dir, exist_ok=True)
cache_dir = custom_cache_dir
else:
cache_dir = filename
os.environ["TIKTOKEN_CACHE_DIR"] = cache_dir # use local copy of tiktoken b/c of - https://github.com/BerriAI/litellm/issues/1071
import tiktoken
custom_cache_dir = os.getenv("CUSTOM_TIKTOKEN_CACHE_DIR")
if custom_cache_dir:
os.makedirs(custom_cache_dir, exist_ok=True)
cache_dir = custom_cache_dir
else:
# Legacy: LITELLM_NON_ROOT=true redirects to /tmp when the bundled dir is read-only.
is_non_root = os.getenv("LITELLM_NON_ROOT", "").lower() == "true"
if is_non_root and not os.access(filename, os.W_OK):
cache_dir = "/tmp/tiktoken_cache"
os.makedirs(cache_dir, exist_ok=True)
else:
cache_dir = filename

Rule Used: What: avoid backwards-incompatible changes without... (source)

Comment on lines 14 to 26
monkeypatch.delenv("CUSTOM_TIKTOKEN_CACHE_DIR", raising=False)
for key, value in env_overrides.items():
monkeypatch.setenv(key, value)
importlib.reload(default_encoding)

# Mock os.access to return False (not writable)
# and mock os.makedirs to avoid actually creating /tmp/tiktoken_cache on local machine
with patch("os.access", return_value=False), patch("os.makedirs"):
# We need to reload or re-run the logic in default_encoding.py
# But since it's already executed, we'll just test the logic directly
# mirroring what we wrote in the file.

filename = (
"/usr/lib/python3.13/site-packages/litellm/litellm_core_utils/tokenizers"
)
is_non_root = os.getenv("LITELLM_NON_ROOT", "").lower() == "true"

if not os.access(filename, os.W_OK) and is_non_root:
filename = "/tmp/tiktoken_cache"
# mock_makedirs(filename, exist_ok=True)
def test_default_encoding_uses_bundled_tokenizers_by_default(monkeypatch):
"""
TIKTOKEN_CACHE_DIR should point at the bundled tokenizers directory
when no CUSTOM_TIKTOKEN_CACHE_DIR is set, even in non-root environments.
"""
_reload_default_encoding(monkeypatch, LITELLM_NON_ROOT="true")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mock updated to match new (potentially broken) behaviour

The original test_tiktoken_cache_fallback test specifically verified that LITELLM_NON_ROOT=true causes a fallback to /tmp/tiktoken_cache in read-only environments — a regression guard for non-root Docker deployments. The replacement test (test_default_encoding_uses_bundled_tokenizers_by_default) now asserts the opposite behaviour: that LITELLM_NON_ROOT=true has no effect.

If the intent of this PR is to intentionally drop support for LITELLM_NON_ROOT, the test change is consistent — but per the mock-integrity rule, changing a test to match code that introduces a potential regression (breaking existing deployments) is the pattern to avoid. The old mock was a valid regression detector and should be kept (or the feature should be preserved with a deprecation notice) rather than silently deleted.

Rule Used: # Code Review Rule: Mock Test Integrity

What:... (source)

bbarwik and others added 2 commits March 15, 2026 14:20
Previously, stream_chunk_builder only took annotations from the first
chunk that contained them, losing any annotations from later chunks.

This is a problem because providers like Gemini/Vertex AI send grounding
metadata (converted to annotations) in the final streaming chunk, while
other providers may spread annotations across multiple chunks.

Changes:
- Collect and merge annotations from ALL annotation-bearing chunks
  instead of only using the first one
* added the header mapping feature

* added tests

* final cleanup

* final cleanup

* added missing test and logic

* fixed header sending bug

* Update litellm/proxy/auth/auth_utils.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* added back init file in responses + fixed test_auth_utils.py  int local_testing

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
@codspeed-hq
Copy link
Copy Markdown
Contributor

codspeed-hq bot commented Mar 15, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing litellm_oss_staging_03_14_2026 (dd1ea3d) with main (548e7eb)

Open in CodSpeed

@RheagalFire RheagalFire requested a review from Sameerlite March 16, 2026 07:15
@Sameerlite Sameerlite merged commit 3dccdde into main Mar 16, 2026
94 of 99 checks passed
@ishaan-berri ishaan-berri deleted the litellm_oss_staging_03_14_2026 branch March 26, 2026 22:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants