fix(proxy): enforce budget limits across multi-pod deployments via Redis-backed spend counters #24682
Conversation
Greptile Summary: This PR introduces Redis-backed, cross-pod spend counters to fix multi-pod budget under-enforcement (the N×budget problem), where each pod independently tracked spend against the full budget. Key changes and observations:
Confidence Score: 4/5

Safe to merge for key/team/team-member budget enforcement; user-level multi-pod budgets remain unaddressed and the Redis TTL behavior warrants documentation or a long-lived TTL override. All four P0/P1 concerns from prior review rounds are addressed. Remaining findings are P2: the user budget gap is architecturally incomplete but explicitly out of scope, the TTL concern is a soft reliability edge case, and the None-guard inconsistency has a safe fallback.

Important files: litellm/proxy/proxy_server.py (Redis TTL for spend counters), litellm/proxy/auth/auth_checks.py (user budget gap, None-guard consistency)
| Filename | Overview |
|---|---|
| litellm/proxy/proxy_server.py | Adds spend_counter_cache global, get_current_spend(), increment_spend_counters(), and _init_and_increment_spend_counter() helpers; wires redis_cache in _init_cache(). Race condition on cold-start seeding addressed with increment-vs-set strategy. Redis key TTL defaults to the general cache TTL. |
| litellm/proxy/auth/auth_checks.py | Updated _virtual_key_max_budget_check, _team_max_budget_check, and _check_team_member_budget to read spend via get_current_spend(); user-level budget check in common_checks still reads user_object.spend directly (out of PR scope). Missing None-guard for valid_token.user_id in _check_team_member_budget counter key construction. |
| litellm/proxy/auth/user_api_key_auth.py | Early team-member budget check updated to use get_current_spend() with correct None-guard for user_id and team_id before constructing counter key. |
| litellm/proxy/common_utils/reset_budget_job.py | Budget reset now explicitly resets Redis counters with warning logging on failure; team-member counters reset by fetching affected memberships. Direct Redis reset bypasses DualCache to prevent silent stale-counter issue. |
| litellm/proxy/hooks/proxy_track_cost_callback.py | increment_spend_counters is now awaited before the fire-and-forget update_cache task, ensuring Redis counters are updated before the next auth check. |
| tests/test_litellm/proxy/auth/test_auth_checks.py | Four new async tests for key/team/team-member budget checks using mocked get_current_spend; no real network calls. |
| tests/test_litellm/proxy/test_proxy_server.py | Four new tests for get_current_spend Redis-first behavior, in-memory fallback, counter init+increment, and team/member tracking. All use mocked or in-memory backends. |
Reviews (5): Last reviewed commit: "fix(proxy): enforce budget limits across..."
```python
current = await spend_counter_cache.async_get_cache(key=counter_key)
if current is None:
    source = await user_api_key_cache.async_get_cache(key=source_cache_key)
    base_spend = 0.0
    if source is not None:
        if isinstance(source, dict):
            base_spend = source.get("spend", 0.0) or 0.0
        else:
            base_spend = getattr(source, "spend", 0.0) or 0.0
    await spend_counter_cache.async_set_cache(
        key=counter_key, value=base_spend
    )

await spend_counter_cache.async_increment_cache(
    key=counter_key, value=increment
)
```
Race condition on counter initialization under-counts spend
There is a check-then-set race between concurrent pods at cold start. Both pods may see current is None simultaneously, each read the same base_spend from user_api_key_cache, then one pod's async_set_cache will overwrite the Redis key after the other pod has already incremented it — dropping that increment.
Example sequence (two pods, both cold on the same token, base_spend=5.0):

- Pod A: `async_get_cache` → None
- Pod B: `async_get_cache` → None
- Pod A: `set_cache(5.0)` → Redis = 5.0
- Pod A: `increment_cache(+0.30)` → Redis = 5.30
- Pod B: `set_cache(5.0)` → Redis = 5.0 (overwrites 5.30)
- Pod B: `increment_cache(+0.40)` → Redis = 5.40
Final: 5.40 instead of 5.70 — 0.30 is silently lost. During a burst of concurrent cold-start requests this can under-count spend by multiple increments, defeating the budget enforcement goal.
The safest Redis-native fix is to skip the separate set_cache entirely and rely on a Lua SET key base NX + INCRBYFLOAT combination (SET only if not exists). A pragmatic alternative is to omit the initialization step and use INCRBYFLOAT starting from 0, then reconcile against the DB spend value in get_current_spend rather than seeding the counter from the DB object.
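The set-if-not-exists fix can be sketched with a small in-memory model; `AtomicCounterStore` and `init_and_increment` are illustrative stand-ins (not LiteLLM or redis-py code) that mimic `SET ... NX` + `INCRBYFLOAT` semantics to show why the lost increment disappears:

```python
class AtomicCounterStore:
    """In-memory stand-in for Redis, illustrating SET-NX + INCRBYFLOAT semantics."""

    def __init__(self):
        self._data = {}

    def set_nx(self, key, value):
        # SET key value NX: write only if the key does not exist yet.
        if key not in self._data:
            self._data[key] = float(value)
            return True
        return False

    def incr_by_float(self, key, amount):
        self._data[key] = self._data.get(key, 0.0) + float(amount)
        return self._data[key]


def init_and_increment(store, key, base_spend, increment):
    store.set_nx(key, base_spend)  # seeds only once, never overwrites
    return store.incr_by_float(key, increment)


store = AtomicCounterStore()
# Replay the cold-start sequence from the comment: both pods seed, then increment.
init_and_increment(store, "spend:key:abc", 5.0, 0.30)          # Pod A
final = init_and_increment(store, "spend:key:abc", 5.0, 0.40)  # Pod B
# final ≈ 5.70: Pod B's SET NX is a no-op, so Pod A's increment survives.
```

With the check-then-set version, Pod B's seed would overwrite Pod A's increment; with NX seeding both increments land on the same counter.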
```python
# Reset the cross-pod spend counter
from litellm.proxy.proxy_server import spend_counter_cache

if item_type == "key" and hasattr(item, "token") and item.token is not None:
    await spend_counter_cache.async_set_cache(
        key=f"spend:key:{item.token}", value=0.0
    )
elif item_type == "team" and hasattr(item, "team_id") and item.team_id is not None:
    await spend_counter_cache.async_set_cache(
        key=f"spend:team:{item.team_id}", value=0.0
    )
```
Team member spend counters are never reset
_reset_budget_common is called with item_type in {"key", "team", "user"}. The code resets Redis counters for keys and teams, but there is no reset for team member counters under the spend:team_member:{user_id}:{team_id} key pattern.
This means that after a budget period resets (e.g., monthly), the Redis counter for a team member retains its accumulated value. On the next request that team member will be treated as having already spent their previous period's amount, and they will be permanently rate-limited until the Redis counter naturally expires or is manually cleared.
The reset job does not have a _reset_budget_for_team_member path at the moment, so team member counters have no reset path at all. This is a material gap given the PR's stated goal of reliable multi-pod budget enforcement.
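A minimal sketch of what the missing team-member reset path could look like, assuming the spend:team_member:{user_id}:{team_id} key pattern named above; the stub cache and the function name `reset_team_member_counter` are hypothetical, not code from the PR:

```python
import asyncio


class _StubCache:
    # Minimal stand-in for spend_counter_cache; stores values in a dict.
    def __init__(self):
        self.data = {}

    async def async_set_cache(self, key, value):
        self.data[key] = value


async def reset_team_member_counter(cache, user_id, team_id):
    # Mirrors the key pattern from the comment: spend:team_member:{user_id}:{team_id}
    await cache.async_set_cache(
        key=f"spend:team_member:{user_id}:{team_id}", value=0.0
    )


cache = _StubCache()
asyncio.run(reset_team_member_counter(cache, "u1", "t1"))
# cache.data now maps "spend:team_member:u1:t1" to 0.0
```

The real fix would also need to fetch the affected memberships from the DB so every (user_id, team_id) pair in the budget period gets reset.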
litellm/proxy/proxy_server.py
```python
# 1. Try Redis first (cross-pod authoritative)
if spend_counter_cache.redis_cache is not None:
    try:
        val = await spend_counter_cache.redis_cache.async_get_cache(
            key=counter_key
        )
        if val is not None:
            return float(val)
    except Exception:
        pass
```
Silent Redis exception swallowing makes failures invisible
The bare except Exception: pass swallows all Redis read errors with no log output. When Redis is temporarily unavailable this silently degrades to in-memory (per-pod) spend enforcement without any indication, making it very hard to diagnose why budgets are not being enforced after a Redis outage. Consider logging at debug level so that operators can trace the fallback path.
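A sketch of the logged-fallback shape this suggests, with a stub client simulating an outage; the function and class names here are illustrative, not the PR's actual code:

```python
import asyncio
import logging

logger = logging.getLogger("proxy_spend")


class _FailingRedis:
    # Stub that simulates a Redis outage.
    async def async_get_cache(self, key):
        raise ConnectionError("redis unavailable")


async def get_spend_redis_first(redis_cache, counter_key, in_memory_value):
    # Try Redis first; on failure, log the fallback instead of a bare `pass`.
    try:
        val = await redis_cache.async_get_cache(key=counter_key)
        if val is not None:
            return float(val)
    except Exception as e:
        logger.debug(
            "Redis read failed for %s, falling back to in-memory: %s",
            counter_key, e,
        )
    return in_memory_value


result = asyncio.run(get_spend_redis_first(_FailingRedis(), "spend:key:abc", 2.5))
# result == 2.5: the in-memory fallback was used, and the failure left a trace.
```

The behavior is identical to the silent version; the only change is that operators can now correlate "budgets stopped enforcing" with "Redis reads started failing".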
Security Issues
- Sensitive information exposure (proxy API key) in Redis/cache keys
The new spend counter logic conditionally hashes the token only when it starts with "sk-". If a raw proxy API key does not use this prefix, it will be written into Redis/in-memory cache keys as plaintext (e.g., spend:key:<raw_token>). This violates the project's requirement that proxy API keys must be hashed before any storage/use outside of the creation flow and risks secret leakage through cache backends, metrics, or operational tooling that enumerates keys.
Recommendations
- Always use the hashed token for cache key construction. Avoid heuristic prefix checks. Ensure the caller always provides a hashed token, or add a robust, explicit indicator to detect already-hashed tokens. Never place raw proxy API keys into cache keys, values, logs, or error messages.
```python
):
    """
    Atomically increment spend counters for budget enforcement.
```
The spend counter uses a heuristic to decide whether to hash the proxy API key based on the prefix "sk-". If a raw proxy API key does not start with "sk-", its raw value will be embedded directly into Redis/in-memory cache keys (e.g., spend:key:&lt;raw_token&gt;), exposing a secret in the cache namespace. This violates the project's proxy key handling rules and creates a realistic secret leakage path via cache backends or any tooling that lists keys.
Vulnerable code:
```python
if token is not None:
    hashed_token = (
        hash_token(token=token)
        if isinstance(token, str) and token.startswith("sk-")
        else token
    )
    await _init_and_increment_spend_counter(
        counter_key=f"spend:key:{hashed_token}",
        source_cache_key=hashed_token,
        increment=response_cost,
    )
```

Impact:
- Raw proxy API keys may be stored and retrievable from Redis or logs/metrics that capture key names, enabling unauthorized use of the proxy by anyone with access to the cache backend or operational logs.
Remediation:
- Always use the hashed token for cache key construction. Do not rely on string-prefix heuristics. Ensure
tokenis consistently a hashed value at this point in the flow, or perform a robust check/normalization that guarantees hashing of any raw proxy key before use. Also ensure any other paths that set or read spend keys use the hashed form consistently.
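A sketch of prefix-free normalization; `normalize_token` is a hypothetical stand-in (litellm's real hash_token is SHA-256-based, but this block only illustrates the idea that every token is hashed before it touches a cache key):

```python
import hashlib


def normalize_token(token: str) -> str:
    # Hash unconditionally: no prefix heuristics, so a raw key can never
    # leak into the cache namespace even if it lacks the "sk-" prefix.
    return hashlib.sha256(token.encode()).hexdigest()


key_a = f"spend:key:{normalize_token('sk-abc123')}"
key_b = f"spend:key:{normalize_token('raw-secret-key')}"
# Both keys embed only a 64-char hex digest, never the raw secret.
```

If callers can already guarantee a hashed token at this point in the flow, the normalization becomes a no-op assertion rather than a second hash.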
For more details, see the finding in Corridor.
Force-pushed 83fc320 to f0b8774
```python
if item_type == "key" and hasattr(item, "token") and item.token is not None:
    await spend_counter_cache.async_set_cache(
        key=f"spend:key:{item.token}", value=0.0
    )
elif item_type == "team" and hasattr(item, "team_id") and item.team_id is not None:
    await spend_counter_cache.async_set_cache(
        key=f"spend:team:{item.team_id}", value=0.0
    )
```
Silent Redis counter reset failure causes permanent over-enforcement
spend_counter_cache.async_set_cache(...) internally swallows Redis write errors — its own try/except catches exceptions and logs without re-raising. This means when Redis has a transient error during a budget reset, the call returns without error, but the Redis counter still holds the old spend value.
The critical problem: get_current_spend always reads Redis first (bypassing DualCache's in-memory fallback), so after a silently-failed reset:
- In-memory counter: correctly reset to `0.0`
- Redis counter: still holds the old period's spend
- Next `get_current_spend` call returns the stale Redis value → budget check blocks the user permanently
This failure mode is newly introduced by this PR because the direct Redis read in get_current_spend creates an asymmetry with DualCache's error-swallowing write path. Before this PR, there were no Redis-backed counters, so there was no persistent state to get stuck.
The same risk applies to the team branch here and to the reset_budget_for_litellm_team_members function.
A safe fix is to call spend_counter_cache.redis_cache.async_set_cache(...) directly when redis_cache is not None, wrapped in its own try/except that emits a WARNING-level log (not just debug), so operators can detect stuck counters. The in-memory reset can still proceed unconditionally as a local fallback.
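The suggested reset path could look roughly like this; the stub classes and the `reset_counter` name are illustrative, with the stub simulating a transient Redis failure:

```python
import asyncio
import logging

logger = logging.getLogger("reset_job")


class _InMemory:
    def __init__(self):
        self.data = {}

    async def async_set_cache(self, key, value):
        self.data[key] = value


class _BrokenRedis:
    async def async_set_cache(self, key, value):
        raise TimeoutError("transient redis error")


async def reset_counter(redis_cache, in_memory_cache, counter_key):
    # Write Redis directly so failures surface at WARNING level instead of
    # being swallowed inside DualCache's own error handling.
    if redis_cache is not None:
        try:
            await redis_cache.async_set_cache(key=counter_key, value=0.0)
        except Exception as e:
            logger.warning(
                "Failed to reset Redis spend counter %s: %s", counter_key, e
            )
    # In-memory reset proceeds unconditionally as a local fallback.
    await in_memory_cache.async_set_cache(key=counter_key, value=0.0)


mem = _InMemory()
asyncio.run(reset_counter(_BrokenRedis(), mem, "spend:team:t1"))
# mem.data holds the reset value; the Redis failure was logged, not raised.
```

A WARNING here gives operators a concrete signal to re-run the reset or clear the counter manually before users get stuck against last period's spend.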
Force-pushed 08360c3 to 9d64dfb
Force-pushed 9d64dfb to 019d204
fix(proxy): enforce budget limits across multi-pod deployments via Redis-backed spend counters

Budget checks on API keys, teams, and team members were not enforced in multi-pod deployments because user_api_key_cache is intentionally in-memory-only. Each pod tracked spend independently, so with N pods the effective budget was N × max_budget.

Introduces a separate spend_counter_cache (DualCache wired to redis_usage_cache) with atomic increment/read helpers:

- increment_spend_counters(): awaited in cost callback (not create_task) to update both in-memory and Redis before the next auth check
- get_current_spend(): reads Redis first (cross-pod authoritative), falls back to in-memory, then to cached object .spend from DB

Budget check functions (_virtual_key_max_budget_check, _team_max_budget_check, _check_team_member_budget) now read spend via get_current_spend() instead of cached object .spend fields. When Redis is not configured, falls back to in-memory-only counters (same as current single-instance behavior).

Fixes BerriAI#23714
Force-pushed 019d204 to d533b43
Budget checks on API keys, teams, and team members were not enforced in multi-pod deployments because user_api_key_cache is intentionally in-memory-only. Each pod tracked spend independently, so with N pods the effective budget was N × max_budget.
Introduces a separate spend_counter_cache (DualCache wired to redis_usage_cache) with atomic increment/read helpers:

- increment_spend_counters(): awaited in the cost callback (not create_task) to update both in-memory and Redis before the next auth check
- get_current_spend(): reads Redis first (cross-pod authoritative), falls back to in-memory, then to cached object .spend from DB
Budget check functions (_virtual_key_max_budget_check, _team_max_budget_check, _check_team_member_budget) now read spend via get_current_spend() instead of cached object .spend fields.
When Redis is not configured, falls back to in-memory-only counters (same as current single-instance behavior).
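The fallback order described above can be sketched with a minimal stand-in; the `_Dual` class and its method names are illustrative, not the PR's real DualCache API:

```python
import asyncio


class _Dual:
    # Minimal stand-in for the PR's spend_counter_cache; attribute and
    # method names are assumptions made for this sketch.
    def __init__(self, redis_value=None, memory_value=None):
        self.redis_value = redis_value
        self.memory_value = memory_value

    async def redis_get(self, key):
        return self.redis_value

    async def memory_get(self, key):
        return self.memory_value


async def get_current_spend(cache, counter_key, fallback_spend):
    # Priority order: Redis (cross-pod authoritative) → in-memory counter
    # → cached object .spend loaded from the DB.
    val = await cache.redis_get(counter_key)
    if val is not None:
        return float(val)
    val = await cache.memory_get(counter_key)
    if val is not None:
        return float(val)
    return fallback_spend


a = asyncio.run(get_current_spend(_Dual(redis_value=7.5), "k", 1.0))   # Redis wins
b = asyncio.run(get_current_spend(_Dual(memory_value=3.0), "k", 1.0))  # in-memory
c = asyncio.run(get_current_spend(_Dual(), "k", 1.0))                  # DB fallback
```

Reading Redis before the in-memory layer is the whole point: DualCache's default in-memory-first read would return this pod's stale counter instead of the cross-pod total.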
Fixes #23714
Relevant issues
Fixes #23714
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
- I have added testing in the `tests/test_litellm/` directory (adding at least 1 test is a hard requirement - see details)
- My PR passes all unit tests on `make test-unit`
- My PR was reviewed by `@greptileai` and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?
If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).
CI (LiteLLM team)
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Type
🐛 Bug Fix
Changes
New: `spend_counter_cache` global + helpers (`proxy_server.py`)

- `spend_counter_cache`: separate `DualCache` wired to `redis_usage_cache` in `_init_cache()` — keeps spend counters isolated from `user_api_key_cache`
- `get_current_spend(counter_key, fallback_spend)`: reads Redis first (cross-pod authoritative), falls back to in-memory counter, then to cached object `.spend` from DB. This bypasses `DualCache.async_get_cache`'s in-memory-first behavior, which would return stale per-pod values.
- `increment_spend_counters(token, team_id, user_id, response_cost)`: atomically increments counters in both in-memory and Redis via `DualCache.async_increment_cache`
- `_init_and_increment_spend_counter()`: lazy-initializes the counter from the cached object's DB-loaded spend on first access, then increments

Modified: cost callback (`proxy_track_cost_callback.py`)

- `await increment_spend_counters(...)` before the existing `create_task(update_cache(...))` — awaited (not fire-and-forget) so the counter is current before the next request's auth check

Modified: budget check functions (`auth_checks.py`)

- `_virtual_key_max_budget_check`: reads spend via `get_current_spend()` instead of `valid_token.spend`
- `_team_max_budget_check`: reads spend via `get_current_spend()` instead of `team_object.spend`
- `_check_team_member_budget`: reads spend via `get_current_spend()` instead of `team_membership.spend`
- `get_current_spend` is imported inline (same pattern as `common_checks`)

Modified: early team member check (`user_api_key_auth.py`)

- uses `get_current_spend()` instead of `valid_token.team_member_spend`

Modified: budget reset (`reset_budget_job.py`)

- resets counters via `spend_counter_cache.async_set_cache()` when budgets are reset

Tests (8 new)

- `test_auth_checks.py`: key/team/team-member budget checks read from spend counter; fallback when no counter
- `test_proxy_server.py`: `get_current_spend` Redis-first reads; in-memory fallback; counter init + increment; team + member counters