Conversation
Greptile Summary

This PR is a staging merge that ships two independent improvements: (1) merging annotations from all streaming chunks in stream_chunk_builder, and (2) resolving all matching customer-role headers in get_customer_user_header_from_mapping.

Key observations:
Confidence Score: 4/5
| Filename | Overview |
|---|---|
| litellm/main.py | Bug fix: stream_chunk_builder now merges annotations from all streaming chunks instead of taking only the first annotation chunk. Logic is correct — the list comprehension already filters out None annotation values, and extend() correctly aggregates them. |
| litellm/proxy/auth/auth_utils.py | Enhancement: get_customer_user_header_from_mapping now returns Optional[list] of all customer-role header names (lowercased) instead of just the first match. get_end_user_id_from_request_body handles both list and str cases via isinstance checks. The single caller in this file is correctly updated. Trailing whitespace remains on lines 683 and 757. |
| tests/test_litellm/proxy/auth/test_auth_utils.py | New mock-only unit tests covering the multi-customer-header feature: null cases, single-match, multi-match ordering, fallback to deprecated user_header_name, and end-to-end get_end_user_id_from_request_body paths. No real network calls. |
| tests/local_testing/test_auth_utils.py | Single assertion updated to match new return type (list with lowercase header name). Existing test coverage for the same function also exists in tests/test_litellm, leading to some duplication. |
| tests/test_litellm/test_stream_chunk_builder_annotations.py | New mock-only tests for annotation merging: multi-chunk merge, single-chunk regression guard, and no-annotations base case. Correctly uses only local litellm types with no network calls. |
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[get_end_user_id_from_request_body] --> B{Standard headers present?}
    B -- Yes --> C[_get_customer_id_from_standard_headers]
    C -- Found --> Z[return customer_id]
    C -- Not found --> D{user_header_mappings in general_settings?}
    B -- No --> D
    D -- Yes --> E[get_customer_user_header_from_mapping\nreturns list of lowercased header names]
    D -- No --> F{user_header_name in general_settings?}
    E --> G{isinstance list?}
    G -- Yes --> H[Build headers_lower dict\nIterate expected_headers in order\nReturn first non-empty match]
    H --> Z
    H -- No match --> I[Check request body fields]
    F -- Yes --> J{isinstance str?}
    J -- Yes --> K[Case-insensitive scan\nReturn if non-empty]
    K --> Z
    K -- No match --> I
    I --> L{user field?}
    L -- Yes --> Z
    L -- No --> M{litellm_metadata.user?}
    M -- Yes --> Z
    M -- No --> N{metadata.user_id?}
    N -- Yes --> Z
    N -- No --> O{safety_identifier?}
    O -- Yes --> Z
    O -- No --> P[return None]
```
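The header-resolution step in the flowchart can be sketched as follows. This is a minimal, self-contained illustration (the function name `resolve_end_user_id` and the dict-based signature are hypothetical; the real logic lives in `litellm/proxy/auth/auth_utils.py` and also falls through to request-body fields):

```python
from typing import Optional, Union

def resolve_end_user_id(
    headers: dict, expected_headers: Union[list, str, None]
) -> Optional[str]:
    # Sketch of the case-insensitive header scan described above.
    # expected_headers may be a list (new multi-header behaviour) or a
    # str (deprecated user_header_name); both are handled via isinstance.
    headers_lower = {k.lower(): v for k, v in headers.items()}
    if isinstance(expected_headers, list):
        # Iterate in mapping order; return the first non-empty match.
        for name in expected_headers:
            value = headers_lower.get(name.lower())
            if value:
                return value
    elif isinstance(expected_headers, str):
        value = headers_lower.get(expected_headers.lower())
        if value:
            return value
    return None  # caller falls through to request-body fields

print(resolve_end_user_id({"X-Customer-Id": "cust-1"}, ["x-team-id", "x-customer-id"]))
# → cust-1
```

Returning the first non-empty match preserves the mapping's declared priority order even when several customer-role headers are present on one request.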
Last reviewed commit: dd1ea3d
```python
if is_valid_deployment_tag(deployment_tags, request_tags, match_any):
    matched_value = next(
        (t for t in deployment_tags if t in set(request_tags)),
        deployment_tags[0],
    )
    return {"matched_via": "tags", "matched_value": matched_value}

# 2. Regex match against request headers.
# When match_any=False and the deployment has both plain tags and tag_regex,
# the strict tag check has already failed (step 1 returned None). Allow
# the regex to fire only when the deployment has NO plain tags, so we never
# use regex as a backdoor around the operator's strict-tag policy.
strict_tag_check_failed = (
    not match_any and bool(deployment_tags) and bool(request_tags)
)
if deployment_tag_regex and header_strings and not strict_tag_check_failed:
```
Regex can bypass strict-tag policy when request_tags is empty
The inline comment says "Allow the regex to fire only when the deployment has NO plain tags," but the actual guard condition checks not match_any AND bool(deployment_tags) AND bool(request_tags). When a request carries no tags at all (request_tags is None or []), bool(request_tags) is False, so strict_tag_check_failed is always False — even when match_any=False and the deployment has both tags and tag_regex.
This means a deployment that an operator deliberately locks down with strict tag requirements (match_any=False + non-empty tags) can still be reached by any request that has a matching User-Agent, even if that request provides zero tags. The doc comment and the runtime behaviour disagree.
Suggested fix – make the guard honour the "no plain tags" intent from the comment:

```python
# Allow regex only when the deployment has no plain tags
# OR when the strict-tag check did NOT already run and fail.
strict_tag_check_failed = (
    not match_any and bool(deployment_tags) and bool(request_tags)
)
# Only fire regex when the deployment has no plain tags, or the tag
# check hasn't blocked it.
regex_allowed = not deployment_tags or not strict_tag_check_failed
if deployment_tag_regex and header_strings and regex_allowed:
```

Or, more simply, skip the regex entirely when the deployment has both tags and tag_regex and strict mode is on:

```python
strict_tag_check_failed = not match_any and bool(deployment_tags)
```

(removing `and bool(request_tags)` so strict mode applies even when the request sends no tags).
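The bypass is easy to reproduce in isolation. The snippet below restates the quoted guard and the proposed fix as standalone functions (names and tag values are illustrative) to show how an empty `request_tags` defeats the current check:

```python
def strict_guard(match_any: bool, deployment_tags: list, request_tags: list) -> bool:
    # Guard as it currently appears in the diff above.
    return not match_any and bool(deployment_tags) and bool(request_tags)

def strict_guard_fixed(match_any: bool, deployment_tags: list, request_tags: list) -> bool:
    # Suggested fix: strict mode applies whenever the deployment has plain
    # tags, regardless of whether the request supplied any tags.
    return not match_any and bool(deployment_tags)

# Strict mode, deployment locked down with tags, request sends no tags:
print(strict_guard(match_any=False, deployment_tags=["team-a"], request_tags=[]))
# → False  (bool([]) is False, so the regex branch still runs)

print(strict_guard_fixed(match_any=False, deployment_tags=["team-a"], request_tags=[]))
# → True  (regex is blocked, matching the operator's intent)
```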
```python
# Always default TIKTOKEN_CACHE_DIR to the bundled tokenizers directory
# unless the user explicitly overrides it via CUSTOM_TIKTOKEN_CACHE_DIR.
# This keeps tiktoken fully offline-capable by default (see #1071).
custom_cache_dir = os.getenv("CUSTOM_TIKTOKEN_CACHE_DIR")
if custom_cache_dir:
    # If the user opts into a custom cache dir, ensure it exists.
    os.makedirs(custom_cache_dir, exist_ok=True)
    cache_dir = custom_cache_dir
else:
    cache_dir = filename

os.environ["TIKTOKEN_CACHE_DIR"] = cache_dir  # use local copy of tiktoken b/c of - https://github.com/BerriAI/litellm/issues/1071

import tiktoken
```
LITELLM_NON_ROOT env var silently removed – backwards-incompatible
The previous code honoured LITELLM_NON_ROOT=true to redirect the tiktoken cache to /tmp/tiktoken_cache for read-only, non-root Docker environments. This PR deletes that branch entirely, so any deployment that sets LITELLM_NON_ROOT=true (e.g., distroless or hardened containers) will silently receive no benefit and may hit permission errors if tiktoken tries to write to the read-only package directory.
Per the project's backwards-compatibility rule, behaviour that users rely on should not be removed without a migration path or feature flag.
Suggested approach – keep backward compat while cleaning up the logic:

```diff
-# Always default TIKTOKEN_CACHE_DIR to the bundled tokenizers directory
-# unless the user explicitly overrides it via CUSTOM_TIKTOKEN_CACHE_DIR.
-# This keeps tiktoken fully offline-capable by default (see #1071).
-custom_cache_dir = os.getenv("CUSTOM_TIKTOKEN_CACHE_DIR")
-if custom_cache_dir:
-    # If the user opts into a custom cache dir, ensure it exists.
-    os.makedirs(custom_cache_dir, exist_ok=True)
-    cache_dir = custom_cache_dir
-else:
-    cache_dir = filename
-os.environ["TIKTOKEN_CACHE_DIR"] = cache_dir  # use local copy of tiktoken b/c of - https://github.com/BerriAI/litellm/issues/1071
-import tiktoken
+custom_cache_dir = os.getenv("CUSTOM_TIKTOKEN_CACHE_DIR")
+if custom_cache_dir:
+    os.makedirs(custom_cache_dir, exist_ok=True)
+    cache_dir = custom_cache_dir
+else:
+    # Legacy: LITELLM_NON_ROOT=true redirects to /tmp when the bundled dir is read-only.
+    is_non_root = os.getenv("LITELLM_NON_ROOT", "").lower() == "true"
+    if is_non_root and not os.access(filename, os.W_OK):
+        cache_dir = "/tmp/tiktoken_cache"
+        os.makedirs(cache_dir, exist_ok=True)
+    else:
+        cache_dir = filename
```
Rule Used: What: avoid backwards-incompatible changes without... (source)
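The selection order proposed above can be condensed into one helper. This is a sketch, not the module's actual code (the function name `pick_tiktoken_cache_dir` is hypothetical; the env var names `CUSTOM_TIKTOKEN_CACHE_DIR` and `LITELLM_NON_ROOT` and the `/tmp/tiktoken_cache` fallback come from the diff):

```python
import os
import tempfile

def pick_tiktoken_cache_dir(bundled_dir: str) -> str:
    # 1. An explicit custom dir always wins; create it if needed.
    custom = os.getenv("CUSTOM_TIKTOKEN_CACHE_DIR")
    if custom:
        os.makedirs(custom, exist_ok=True)
        return custom
    # 2. Legacy escape hatch: non-root containers with a read-only
    #    package directory fall back to a writable /tmp location.
    non_root = os.getenv("LITELLM_NON_ROOT", "").lower() == "true"
    if non_root and not os.access(bundled_dir, os.W_OK):
        fallback = "/tmp/tiktoken_cache"
        os.makedirs(fallback, exist_ok=True)
        return fallback
    # 3. Default: the bundled tokenizers directory (offline-capable).
    return bundled_dir
```

Keeping the three branches in this priority order means existing `LITELLM_NON_ROOT=true` deployments keep working while new users get the offline-by-default behaviour.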
```python
monkeypatch.delenv("CUSTOM_TIKTOKEN_CACHE_DIR", raising=False)
for key, value in env_overrides.items():
    monkeypatch.setenv(key, value)
importlib.reload(default_encoding)

# Mock os.access to return False (not writable)
# and mock os.makedirs to avoid actually creating /tmp/tiktoken_cache on local machine
with patch("os.access", return_value=False), patch("os.makedirs"):
    # We need to reload or re-run the logic in default_encoding.py
    # But since it's already executed, we'll just test the logic directly
    # mirroring what we wrote in the file.

    filename = (
        "/usr/lib/python3.13/site-packages/litellm/litellm_core_utils/tokenizers"
    )
    is_non_root = os.getenv("LITELLM_NON_ROOT", "").lower() == "true"

    if not os.access(filename, os.W_OK) and is_non_root:
        filename = "/tmp/tiktoken_cache"
        # mock_makedirs(filename, exist_ok=True)


def test_default_encoding_uses_bundled_tokenizers_by_default(monkeypatch):
    """
    TIKTOKEN_CACHE_DIR should point at the bundled tokenizers directory
    when no CUSTOM_TIKTOKEN_CACHE_DIR is set, even in non-root environments.
    """
    _reload_default_encoding(monkeypatch, LITELLM_NON_ROOT="true")
```
Mock updated to match new (potentially broken) behaviour
The original test_tiktoken_cache_fallback test specifically verified that LITELLM_NON_ROOT=true causes a fallback to /tmp/tiktoken_cache in read-only environments — a regression guard for non-root Docker deployments. The replacement test (test_default_encoding_uses_bundled_tokenizers_by_default) now asserts the opposite behaviour: that LITELLM_NON_ROOT=true has no effect.
If the intent of this PR is to intentionally drop support for LITELLM_NON_ROOT, the test change is consistent — but per the mock-integrity rule, changing a test to match code that introduces a potential regression (breaking existing deployments) is the pattern to avoid. The old mock was a valid regression detector and should be kept (or the feature should be preserved with a deprecation notice) rather than silently deleted.
Rule Used: Code Review Rule: Mock Test Integrity. What:... (source)
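If the `LITELLM_NON_ROOT` behaviour is kept, a minimal regression guard could look like the sketch below. It tests a stand-in function rather than the real module (the module-level logic in `default_encoding.py` runs at import time, which is why the real tests use `importlib.reload`); the path string is illustrative:

```python
import os
from unittest.mock import patch

def legacy_cache_dir(bundled_dir: str) -> str:
    # Stand-in for the deleted branch: LITELLM_NON_ROOT=true plus a
    # non-writable bundled dir must redirect the cache to /tmp.
    if os.getenv("LITELLM_NON_ROOT", "").lower() == "true" and not os.access(
        bundled_dir, os.W_OK
    ):
        return "/tmp/tiktoken_cache"
    return bundled_dir

def test_non_root_falls_back_to_tmp():
    # Simulate a read-only package dir in a hardened container.
    with patch.dict(os.environ, {"LITELLM_NON_ROOT": "true"}), patch(
        "os.access", return_value=False
    ):
        assert legacy_cache_dir("/usr/lib/site-packages/tokenizers") == "/tmp/tiktoken_cache"

test_non_root_falls_back_to_tmp()
print("regression guard passed")
```

Deleting the old `test_tiktoken_cache_fallback` without a replacement like this leaves the non-root deployment path with no automated coverage at all.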
Previously, stream_chunk_builder only took annotations from the first chunk that contained them, losing any annotations from later chunks. This is a problem because providers like Gemini/Vertex AI send grounding metadata (converted to annotations) in the final streaming chunk, while other providers may spread annotations across multiple chunks.

Changes:
- Collect and merge annotations from ALL annotation-bearing chunks instead of only using the first one
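The merge described above can be sketched as follows, with the chunk shape simplified to plain dicts (real chunks are streaming response objects, and the function name `merge_annotations` is illustrative):

```python
def merge_annotations(chunks: list) -> list:
    # Collect annotations from every chunk that carries them, preserving
    # stream order, instead of stopping at the first annotated chunk.
    merged = []
    for chunk in chunks:
        annotations = chunk.get("annotations")
        if annotations is not None:
            merged.extend(annotations)
    return merged

chunks = [
    {"content": "Hel"},
    {"content": "lo", "annotations": [{"type": "url_citation", "url": "a"}]},
    # Gemini/Vertex-style: grounding metadata arrives in the final chunk.
    {"content": "!", "annotations": [{"type": "url_citation", "url": "b"}]},
]
print(merge_annotations(chunks))
# → [{'type': 'url_citation', 'url': 'a'}, {'type': 'url_citation', 'url': 'b'}]
```

The `None` check matters: a chunk with `annotations: []` contributes nothing but does not raise, and chunks without the key are skipped entirely.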
* added the header mapping feature
* added tests
* final cleanup
* final cleanup
* added missing test and logic
* fixed header sending bug
* Update litellm/proxy/auth/auth_utils.py (Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>)
* added back init file in responses + fixed test_auth_utils.py int local_testing

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Force-pushed 8e5c3ed to dd1ea3d
Relevant issues
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
- I have added testing in the tests/test_litellm/ directory (adding at least 1 test is a hard requirement - see details)
- My PR passes all unit tests on make test-unit
- My PR has been reviewed by @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?
If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).
CI (LiteLLM team)
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Type
🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test
Changes