Conversation
Greptile Summary

This PR is a staging merge that ships two independent improvements: (1) merging annotations from all streaming chunks in stream_chunk_builder, and (2) resolving all matching customer-role headers in get_customer_user_header_from_mapping.

Key observations:
Confidence Score: 4/5
| Filename | Overview |
|---|---|
| litellm/main.py | Bug fix: stream_chunk_builder now merges annotations from all streaming chunks instead of taking only the first annotation chunk. Logic is correct — the list comprehension already filters out None annotation values, and extend() correctly aggregates them. |
| litellm/proxy/auth/auth_utils.py | Enhancement: get_customer_user_header_from_mapping now returns Optional[list] of all customer-role header names (lowercased) instead of just the first match. get_end_user_id_from_request_body handles both list and str cases via isinstance checks. The single caller in this file is correctly updated. Trailing whitespace remains on lines 683 and 757. |
| tests/test_litellm/proxy/auth/test_auth_utils.py | New mock-only unit tests covering the multi-customer-header feature: null cases, single-match, multi-match ordering, fallback to deprecated user_header_name, and end-to-end get_end_user_id_from_request_body paths. No real network calls. |
| tests/local_testing/test_auth_utils.py | Single assertion updated to match new return type (list with lowercase header name). Existing test coverage for the same function also exists in tests/test_litellm, leading to some duplication. |
| tests/test_litellm/test_stream_chunk_builder_annotations.py | New mock-only tests for annotation merging: multi-chunk merge, single-chunk regression guard, and no-annotations base case. Correctly uses only local litellm types with no network calls. |
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[get_end_user_id_from_request_body] --> B{Standard headers present?}
    B -- Yes --> C[_get_customer_id_from_standard_headers]
    C -- Found --> Z[return customer_id]
    C -- Not found --> D{user_header_mappings in general_settings?}
    B -- No --> D
    D -- Yes --> E[get_customer_user_header_from_mapping\nreturns list of lowercased header names]
    D -- No --> F{user_header_name in general_settings?}
    E --> G{isinstance list?}
    G -- Yes --> H[Build headers_lower dict\nIterate expected_headers in order\nReturn first non-empty match]
    H --> Z
    H -- No match --> I[Check request body fields]
    F -- Yes --> J{isinstance str?}
    J -- Yes --> K[Case-insensitive scan\nReturn if non-empty]
    K --> Z
    K -- No match --> I
    I --> L{user field?}
    L -- Yes --> Z
    L -- No --> M{litellm_metadata.user?}
    M -- Yes --> Z
    M -- No --> N{metadata.user_id?}
    N -- Yes --> Z
    N -- No --> O{safety_identifier?}
    O -- Yes --> Z
    O -- No --> P[return None]
```
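The header-resolution step in the flowchart can be sketched as follows. This is a minimal, self-contained illustration (the function name `resolve_end_user_id` and the dict-based signature are hypothetical; the real logic lives in `litellm/proxy/auth/auth_utils.py` and also falls through to request-body fields):

```python
from typing import Optional, Union

def resolve_end_user_id(
    headers: dict, expected_headers: Union[list, str, None]
) -> Optional[str]:
    # Sketch of the case-insensitive header scan described above.
    # expected_headers may be a list (new multi-header behaviour) or a
    # str (deprecated user_header_name); both are handled via isinstance.
    headers_lower = {k.lower(): v for k, v in headers.items()}
    if isinstance(expected_headers, list):
        # Iterate in mapping order; return the first non-empty match.
        for name in expected_headers:
            value = headers_lower.get(name.lower())
            if value:
                return value
    elif isinstance(expected_headers, str):
        value = headers_lower.get(expected_headers.lower())
        if value:
            return value
    return None  # caller falls through to request-body fields

print(resolve_end_user_id({"X-Customer-Id": "cust-1"}, ["x-team-id", "x-customer-id"]))
# → cust-1
```

Returning the first non-empty match preserves the mapping's declared priority order even when several customer-role headers are present on one request.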
Last reviewed commit: dd1ea3d
```python
if is_valid_deployment_tag(deployment_tags, request_tags, match_any):
    matched_value = next(
        (t for t in deployment_tags if t in set(request_tags)),
        deployment_tags[0],
    )
    return {"matched_via": "tags", "matched_value": matched_value}

# 2. Regex match against request headers.
# When match_any=False and the deployment has both plain tags and tag_regex,
# the strict tag check has already failed (step 1 returned None). Allow
# the regex to fire only when the deployment has NO plain tags, so we never
# use regex as a backdoor around the operator's strict-tag policy.
strict_tag_check_failed = (
    not match_any and bool(deployment_tags) and bool(request_tags)
)
if deployment_tag_regex and header_strings and not strict_tag_check_failed:
```
Regex can bypass strict-tag policy when request_tags is empty
The inline comment says "Allow the regex to fire only when the deployment has NO plain tags," but the actual guard condition checks not match_any AND bool(deployment_tags) AND bool(request_tags). When a request carries no tags at all (request_tags is None or []), bool(request_tags) is False, so strict_tag_check_failed is always False — even when match_any=False and the deployment has both tags and tag_regex.
This means a deployment that an operator deliberately locks down with strict tag requirements (match_any=False + non-empty tags) can still be reached by any request that has a matching User-Agent, even if that request provides zero tags. The doc comment and the runtime behaviour disagree.
Suggested fix – make the guard honour the "no plain tags" intent from the comment:

```python
# Allow regex only when the deployment has no plain tags
# OR when the strict-tag check did NOT already run and fail.
strict_tag_check_failed = (
    not match_any and bool(deployment_tags) and bool(request_tags)
)
# Only fire regex when the deployment has no plain tags, or the tag
# check hasn't blocked it.
regex_allowed = not deployment_tags or not strict_tag_check_failed
if deployment_tag_regex and header_strings and regex_allowed:
```

Or, more simply, skip the regex entirely when the deployment has both tags and tag_regex and strict mode is on:

```python
strict_tag_check_failed = not match_any and bool(deployment_tags)
```

(removing `and bool(request_tags)` so strict mode applies even when the request sends no tags).
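The bypass is easy to reproduce in isolation. The snippet below restates the quoted guard and the proposed fix as standalone functions (names and tag values are illustrative) to show how an empty `request_tags` defeats the current check:

```python
def strict_guard(match_any: bool, deployment_tags: list, request_tags: list) -> bool:
    # Guard as it currently appears in the diff above.
    return not match_any and bool(deployment_tags) and bool(request_tags)

def strict_guard_fixed(match_any: bool, deployment_tags: list, request_tags: list) -> bool:
    # Suggested fix: strict mode applies whenever the deployment has plain
    # tags, regardless of whether the request supplied any tags.
    return not match_any and bool(deployment_tags)

# Strict mode, deployment locked down with tags, request sends no tags:
print(strict_guard(match_any=False, deployment_tags=["team-a"], request_tags=[]))
# → False  (bool([]) is False, so the regex branch still runs)

print(strict_guard_fixed(match_any=False, deployment_tags=["team-a"], request_tags=[]))
# → True  (regex is blocked, matching the operator's intent)
```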
```python
# Always default TIKTOKEN_CACHE_DIR to the bundled tokenizers directory
# unless the user explicitly overrides it via CUSTOM_TIKTOKEN_CACHE_DIR.
# This keeps tiktoken fully offline-capable by default (see #1071).
custom_cache_dir = os.getenv("CUSTOM_TIKTOKEN_CACHE_DIR")
if custom_cache_dir:
    # If the user opts into a custom cache dir, ensure it exists.
    os.makedirs(custom_cache_dir, exist_ok=True)
    cache_dir = custom_cache_dir
else:
    cache_dir = filename

os.environ["TIKTOKEN_CACHE_DIR"] = cache_dir  # use local copy of tiktoken b/c of - https://github.com/BerriAI/litellm/issues/1071

import tiktoken
```
LITELLM_NON_ROOT env var silently removed – backwards-incompatible
The previous code honoured LITELLM_NON_ROOT=true to redirect the tiktoken cache to /tmp/tiktoken_cache for read-only, non-root Docker environments. This PR deletes that branch entirely, so any deployment that sets LITELLM_NON_ROOT=true (e.g., distroless or hardened containers) will silently receive no benefit and may hit permission errors if tiktoken tries to write to the read-only package directory.
Per the project's backwards-compatibility rule, behaviour that users rely on should not be removed without a migration path or feature flag.
Suggested approach – keep backward compat while cleaning up the logic:

```diff
-# Always default TIKTOKEN_CACHE_DIR to the bundled tokenizers directory
-# unless the user explicitly overrides it via CUSTOM_TIKTOKEN_CACHE_DIR.
-# This keeps tiktoken fully offline-capable by default (see #1071).
-custom_cache_dir = os.getenv("CUSTOM_TIKTOKEN_CACHE_DIR")
-if custom_cache_dir:
-    # If the user opts into a custom cache dir, ensure it exists.
-    os.makedirs(custom_cache_dir, exist_ok=True)
-    cache_dir = custom_cache_dir
-else:
-    cache_dir = filename
-os.environ["TIKTOKEN_CACHE_DIR"] = cache_dir  # use local copy of tiktoken b/c of - https://github.com/BerriAI/litellm/issues/1071
-import tiktoken
+custom_cache_dir = os.getenv("CUSTOM_TIKTOKEN_CACHE_DIR")
+if custom_cache_dir:
+    os.makedirs(custom_cache_dir, exist_ok=True)
+    cache_dir = custom_cache_dir
+else:
+    # Legacy: LITELLM_NON_ROOT=true redirects to /tmp when the bundled dir is read-only.
+    is_non_root = os.getenv("LITELLM_NON_ROOT", "").lower() == "true"
+    if is_non_root and not os.access(filename, os.W_OK):
+        cache_dir = "/tmp/tiktoken_cache"
+        os.makedirs(cache_dir, exist_ok=True)
+    else:
+        cache_dir = filename
```
Rule Used: What: avoid backwards-incompatible changes without... (source)
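The selection order proposed above can be condensed into one helper. This is a sketch, not the module's actual code (the function name `pick_tiktoken_cache_dir` is hypothetical; the env var names `CUSTOM_TIKTOKEN_CACHE_DIR` and `LITELLM_NON_ROOT` and the `/tmp/tiktoken_cache` fallback come from the diff):

```python
import os
import tempfile

def pick_tiktoken_cache_dir(bundled_dir: str) -> str:
    # 1. An explicit custom dir always wins; create it if needed.
    custom = os.getenv("CUSTOM_TIKTOKEN_CACHE_DIR")
    if custom:
        os.makedirs(custom, exist_ok=True)
        return custom
    # 2. Legacy escape hatch: non-root containers with a read-only
    #    package directory fall back to a writable /tmp location.
    non_root = os.getenv("LITELLM_NON_ROOT", "").lower() == "true"
    if non_root and not os.access(bundled_dir, os.W_OK):
        fallback = "/tmp/tiktoken_cache"
        os.makedirs(fallback, exist_ok=True)
        return fallback
    # 3. Default: the bundled tokenizers directory (offline-capable).
    return bundled_dir
```

Keeping the three branches in this priority order means existing `LITELLM_NON_ROOT=true` deployments keep working while new users get the offline-by-default behaviour.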
```python
monkeypatch.delenv("CUSTOM_TIKTOKEN_CACHE_DIR", raising=False)
for key, value in env_overrides.items():
    monkeypatch.setenv(key, value)
importlib.reload(default_encoding)

# Mock os.access to return False (not writable)
# and mock os.makedirs to avoid actually creating /tmp/tiktoken_cache on local machine
with patch("os.access", return_value=False), patch("os.makedirs"):
    # We need to reload or re-run the logic in default_encoding.py
    # But since it's already executed, we'll just test the logic directly
    # mirroring what we wrote in the file.

    filename = (
        "/usr/lib/python3.13/site-packages/litellm/litellm_core_utils/tokenizers"
    )
    is_non_root = os.getenv("LITELLM_NON_ROOT", "").lower() == "true"

    if not os.access(filename, os.W_OK) and is_non_root:
        filename = "/tmp/tiktoken_cache"
        # mock_makedirs(filename, exist_ok=True)


def test_default_encoding_uses_bundled_tokenizers_by_default(monkeypatch):
    """
    TIKTOKEN_CACHE_DIR should point at the bundled tokenizers directory
    when no CUSTOM_TIKTOKEN_CACHE_DIR is set, even in non-root environments.
    """
    _reload_default_encoding(monkeypatch, LITELLM_NON_ROOT="true")
```
Mock updated to match new (potentially broken) behaviour
The original test_tiktoken_cache_fallback test specifically verified that LITELLM_NON_ROOT=true causes a fallback to /tmp/tiktoken_cache in read-only environments — a regression guard for non-root Docker deployments. The replacement test (test_default_encoding_uses_bundled_tokenizers_by_default) now asserts the opposite behaviour: that LITELLM_NON_ROOT=true has no effect.
If the intent of this PR is to intentionally drop support for LITELLM_NON_ROOT, the test change is consistent — but per the mock-integrity rule, changing a test to match code that introduces a potential regression (breaking existing deployments) is the pattern to avoid. The old mock was a valid regression detector and should be kept (or the feature should be preserved with a deprecation notice) rather than silently deleted.
Rule Used: Code Review Rule: Mock Test Integrity. What:... (source)
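If the `LITELLM_NON_ROOT` behaviour is kept, a minimal regression guard could look like the sketch below. It tests a stand-in function rather than the real module (the module-level logic in `default_encoding.py` runs at import time, which is why the real tests use `importlib.reload`); the path string is illustrative:

```python
import os
from unittest.mock import patch

def legacy_cache_dir(bundled_dir: str) -> str:
    # Stand-in for the deleted branch: LITELLM_NON_ROOT=true plus a
    # non-writable bundled dir must redirect the cache to /tmp.
    if os.getenv("LITELLM_NON_ROOT", "").lower() == "true" and not os.access(
        bundled_dir, os.W_OK
    ):
        return "/tmp/tiktoken_cache"
    return bundled_dir

def test_non_root_falls_back_to_tmp():
    # Simulate a read-only package dir in a hardened container.
    with patch.dict(os.environ, {"LITELLM_NON_ROOT": "true"}), patch(
        "os.access", return_value=False
    ):
        assert legacy_cache_dir("/usr/lib/site-packages/tokenizers") == "/tmp/tiktoken_cache"

test_non_root_falls_back_to_tmp()
print("regression guard passed")
```

Deleting the old `test_tiktoken_cache_fallback` without a replacement like this leaves the non-root deployment path with no automated coverage at all.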
Previously, stream_chunk_builder only took annotations from the first chunk that contained them, losing any annotations from later chunks. This is a problem because providers like Gemini/Vertex AI send grounding metadata (converted to annotations) in the final streaming chunk, while other providers may spread annotations across multiple chunks.

Changes:
- Collect and merge annotations from ALL annotation-bearing chunks instead of only using the first one
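The merge described above can be sketched as follows, with the chunk shape simplified to plain dicts (real chunks are streaming response objects, and the function name `merge_annotations` is illustrative):

```python
def merge_annotations(chunks: list) -> list:
    # Collect annotations from every chunk that carries them, preserving
    # stream order, instead of stopping at the first annotated chunk.
    merged = []
    for chunk in chunks:
        annotations = chunk.get("annotations")
        if annotations is not None:
            merged.extend(annotations)
    return merged

chunks = [
    {"content": "Hel"},
    {"content": "lo", "annotations": [{"type": "url_citation", "url": "a"}]},
    # Gemini/Vertex-style: grounding metadata arrives in the final chunk.
    {"content": "!", "annotations": [{"type": "url_citation", "url": "b"}]},
]
print(merge_annotations(chunks))
# → [{'type': 'url_citation', 'url': 'a'}, {'type': 'url_citation', 'url': 'b'}]
```

The `None` check matters: a chunk with `annotations: []` contributes nothing but does not raise, and chunks without the key are skipped entirely.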
* added the header mapping feature
* added tests
* final cleanup
* final cleanup
* added missing test and logic
* fixed header sending bug
* Update litellm/proxy/auth/auth_utils.py (Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>)
* added back init file in responses + fixed test_auth_utils.py int local_testing

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Force-pushed 8e5c3ed to dd1ea3d
Relevant issues
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
- I have added testing in the tests/test_litellm/ directory (adding at least 1 test is a hard requirement - see details)
- My PR passes all unit tests on make test-unit
- My PR has been reviewed by @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?
If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).
CI (LiteLLM team)
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Type
🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test
Changes