fix(proxy): enforce budget limits across multi-pod deployments via Redis-backed spend counters#24682

Merged
yuneng-berri merged 1 commit into BerriAI:main from michelligabriele:fix/budget-spend-counters
Mar 27, 2026

Conversation

@michelligabriele
Contributor

Budget checks on API keys, teams, and team members were not enforced in multi-pod deployments because user_api_key_cache is intentionally in-memory-only. Each pod tracked spend independently, so with N pods the effective budget was N × max_budget.

Introduces a separate spend_counter_cache (DualCache wired to redis_usage_cache) with atomic increment/read helpers:

  • increment_spend_counters(): awaited in cost callback (not create_task) to update both in-memory and Redis before the next auth check
  • get_current_spend(): reads Redis first (cross-pod authoritative), falls back to in-memory, then to cached object .spend from DB

Budget check functions (_virtual_key_max_budget_check, _team_max_budget_check, _check_team_member_budget) now read spend via get_current_spend() instead of cached object .spend fields.

When Redis is not configured, falls back to in-memory-only counters (same as current single-instance behavior).


Relevant issues

Fixes #23714

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/test_litellm/ directory; adding at least 1 test is a hard requirement (see details)
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible; it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues
  • 45-49 passing tests: acceptable but needs attention
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🐛 Bug Fix

Changes

New: spend_counter_cache global + helpers (proxy_server.py)

  • spend_counter_cache: separate DualCache wired to redis_usage_cache in _init_cache() — keeps spend counters isolated from user_api_key_cache
  • get_current_spend(counter_key, fallback_spend): reads Redis first (cross-pod authoritative), falls back to in-memory counter, then to cached object .spend from DB. This bypasses DualCache.async_get_cache's in-memory-first behavior which would return stale per-pod values.
  • increment_spend_counters(token, team_id, user_id, response_cost): atomically increments counters in both in-memory and Redis via DualCache.async_increment_cache
  • _init_and_increment_spend_counter(): lazy-initializes counter from cached object's DB-loaded spend on first access, then increments

Modified: cost callback (proxy_track_cost_callback.py)

  • await increment_spend_counters(...) before the existing create_task(update_cache(...)) — awaited (not fire-and-forget) so the counter is current before the next request's auth check

Modified: budget check functions (auth_checks.py)

  • _virtual_key_max_budget_check: reads spend via get_current_spend() instead of valid_token.spend
  • _team_max_budget_check: reads spend via get_current_spend() instead of team_object.spend
  • _check_team_member_budget: reads spend via get_current_spend() instead of team_membership.spend
  • No function signature changes — get_current_spend is imported inline (same pattern as common_checks)

Modified: early team member check (user_api_key_auth.py)

  • Reads team member spend via get_current_spend() instead of valid_token.team_member_spend

Modified: budget reset (reset_budget_job.py)

  • Resets spend counter keys to 0.0 via spend_counter_cache.async_set_cache() when budgets are reset

Tests (8 new)

  • test_auth_checks.py: key/team/team-member budget checks read from spend counter; fallback when no counter
  • test_proxy_server.py: get_current_spend Redis-first reads; in-memory fallback; counter init + increment; team + member counters

@vercel

vercel bot commented Mar 27, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: litellm | Deployment: Ready | Actions: Preview, Comment | Updated (UTC): Mar 27, 2026 7:41pm


@codspeed-hq
Contributor

codspeed-hq bot commented Mar 27, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing michelligabriele:fix/budget-spend-counters (d533b43) with main (d949085)

Open in CodSpeed

@greptile-apps
Contributor

greptile-apps bot commented Mar 27, 2026

Greptile Summary

This PR introduces Redis-backed, cross-pod spend counters to fix multi-pod budget under-enforcement (the N×budget problem), where each pod independently tracked spend against the full budget. A new spend_counter_cache (DualCache wired to redis_usage_cache) is added alongside two helpers, get_current_spend (Redis-first read) and increment_spend_counters (awaited atomic increment), and all three budget-check entry points for virtual keys, teams, and team members are updated to read from these counters instead of stale cached-object .spend fields.

Key changes and observations:

  • Budget reset paths now explicitly reset Redis counters with warning-level logging on failure, addressing the stale-counter-after-reset risk flagged in earlier review rounds.
  • Cold-start seeding uses async_increment_cache (not async_set_cache), making simultaneous cold-starts produce over-counting rather than under-counting — a conservative trade-off documented in code comments.
  • User-level budget enforcement (common_checks, lines 511–521) is not covered: user_object.spend is still read directly, so per-user budget limits in multi-pod deployments remain subject to the original N×budget problem. This is explicitly out of scope per the PR description but should be tracked as a follow-up.
  • The spend counter Redis TTL inherits the general-purpose cache default (60 s unless litellm.default_redis_ttl is configured). After TTL-driven expiry, counters re-seed from the DB-cached object value, which may lag behind in-flight update_cache tasks during short inactivity windows.
  • _check_team_member_budget in auth_checks.py builds the counter key from valid_token.user_id without a None guard, unlike the equivalent code added in user_api_key_auth.py. The fallback is safe but the inconsistency is worth tidying.
  • Eight new unit tests cover all modified budget-check paths using mocked or in-memory backends with no live network calls.

Confidence Score: 4/5

Safe to merge for key/team/team-member budget enforcement; user-level multi-pod budgets remain unaddressed and the Redis TTL behavior warrants documentation or a long-lived TTL override.

All four P0/P1 concerns from prior review rounds are addressed. Remaining findings are P2: the user budget gap is architecturally incomplete but explicitly out of scope, the TTL concern is a soft reliability edge case, and the None-guard inconsistency has a safe fallback.

litellm/proxy/proxy_server.py (Redis TTL for spend counters), litellm/proxy/auth/auth_checks.py (user budget gap, None-guard consistency)

Important Files Changed

Filename | Overview
litellm/proxy/proxy_server.py | Adds spend_counter_cache global, get_current_spend(), increment_spend_counters(), and _init_and_increment_spend_counter() helpers; wires redis_cache in _init_cache(). Race condition on cold-start seeding addressed with increment-vs-set strategy. Redis key TTL defaults to the general cache TTL.
litellm/proxy/auth/auth_checks.py | Updated _virtual_key_max_budget_check, _team_max_budget_check, and _check_team_member_budget to read spend via get_current_spend(); user-level budget check in common_checks still reads user_object.spend directly (out of PR scope). Missing None-guard for valid_token.user_id in _check_team_member_budget counter key construction.
litellm/proxy/auth/user_api_key_auth.py | Early team-member budget check updated to use get_current_spend() with correct None-guard for user_id and team_id before constructing counter key.
litellm/proxy/common_utils/reset_budget_job.py | Budget reset now explicitly resets Redis counters with warning logging on failure; team-member counters reset by fetching affected memberships. Direct Redis reset bypasses DualCache to prevent silent stale-counter issues.
litellm/proxy/hooks/proxy_track_cost_callback.py | increment_spend_counters is now awaited before the fire-and-forget update_cache task, ensuring Redis counters are updated before the next auth check.
tests/test_litellm/proxy/auth/test_auth_checks.py | Four new async tests for key/team/team-member budget checks using mocked get_current_spend; no real network calls.
tests/test_litellm/proxy/test_proxy_server.py | Four new tests for get_current_spend Redis-first behavior, in-memory fallback, counter init+increment, and team/member tracking. All use mocked or in-memory backends.

Reviews (5): Last reviewed commit: "fix(proxy): enforce budget limits across..."

Comment on lines +1795 to +1810
current = await spend_counter_cache.async_get_cache(key=counter_key)
if current is None:
    source = await user_api_key_cache.async_get_cache(key=source_cache_key)
    base_spend = 0.0
    if source is not None:
        if isinstance(source, dict):
            base_spend = source.get("spend", 0.0) or 0.0
        else:
            base_spend = getattr(source, "spend", 0.0) or 0.0
    await spend_counter_cache.async_set_cache(
        key=counter_key, value=base_spend
    )

await spend_counter_cache.async_increment_cache(
    key=counter_key, value=increment
)
Contributor

P1 Race condition on counter initialization under-counts spend

There is a check-then-set race between concurrent pods at cold start. Both pods may see current is None simultaneously, each read the same base_spend from user_api_key_cache, then one pod's async_set_cache will overwrite the Redis key after the other pod has already incremented it — dropping that increment.

Example sequence (two pods, both cold on the same token, base_spend=5.0):

  1. Pod A: async_get_cache → None
  2. Pod B: async_get_cache → None
  3. Pod A: set_cache(5.0) → Redis = 5.0
  4. Pod A: increment_cache(+0.30) → Redis = 5.30
  5. Pod B: set_cache(5.0) → Redis = 5.0 (overwrites 5.30)
  6. Pod B: increment_cache(+0.40) → Redis = 5.40

Final: 5.40 instead of 5.70 — 0.30 is silently lost. During a burst of concurrent cold-start requests this can under-count spend by multiple increments, defeating the budget enforcement goal.

The safest Redis-native fix is to skip the separate set_cache entirely and rely on a Lua SET key base NX + INCRBYFLOAT combination (SET only if not exists). A pragmatic alternative is to omit the initialization step and use INCRBYFLOAT starting from 0, then reconcile against the DB spend value in get_current_spend rather than seeding the counter from the DB object.

Comment on lines +535 to +545
# Reset the cross-pod spend counter
from litellm.proxy.proxy_server import spend_counter_cache

if item_type == "key" and hasattr(item, "token") and item.token is not None:
    await spend_counter_cache.async_set_cache(
        key=f"spend:key:{item.token}", value=0.0
    )
elif item_type == "team" and hasattr(item, "team_id") and item.team_id is not None:
    await spend_counter_cache.async_set_cache(
        key=f"spend:team:{item.team_id}", value=0.0
    )
Contributor

P1 Team member spend counters are never reset

_reset_budget_common is called with item_type in {"key", "team", "user"}. The code resets Redis counters for keys and teams, but there is no reset for team member counters under the spend:team_member:{user_id}:{team_id} key pattern.

This means that after a budget period resets (e.g., monthly), the Redis counter for a team member retains its accumulated value. On the next request that team member will be treated as having already spent their previous period's amount, and they will be permanently rate-limited until the Redis counter naturally expires or is manually cleared.

The reset job does not have a _reset_budget_for_team_member path at the moment, so team member counters have no reset path at all. This is a material gap given the PR's stated goal of reliable multi-pod budget enforcement.

Comment on lines +1714 to +1723
# 1. Try Redis first (cross-pod authoritative)
if spend_counter_cache.redis_cache is not None:
    try:
        val = await spend_counter_cache.redis_cache.async_get_cache(
            key=counter_key
        )
        if val is not None:
            return float(val)
    except Exception:
        pass
Contributor

P2 Silent Redis exception swallowing makes failures invisible

The bare except Exception: pass swallows all Redis read errors with no log output. When Redis is temporarily unavailable this silently degrades to in-memory (per-pod) spend enforcement without any indication, making it very hard to diagnose why budgets are not being enforced after a Redis outage. Consider logging at debug level so that operators can trace the fallback path.


@corridor-security corridor-security bot left a comment


Security Issues

  • Sensitive information exposure (proxy API key) in Redis/cache keys
    The new spend counter logic conditionally hashes the token only when it starts with "sk-". If a raw proxy API key does not use this prefix, it will be written into Redis/in-memory cache keys as plaintext (e.g., spend:key:<raw_token>). This violates the project's requirement that proxy API keys must be hashed before any storage/use outside of the creation flow and risks secret leakage through cache backends, metrics, or operational tooling that enumerates keys.

Recommendations

  • Always use the hashed token for cache key construction. Avoid heuristic prefix checks. Ensure the caller always provides a hashed token, or add a robust, explicit indicator to detect already-hashed tokens. Never place raw proxy API keys into cache keys, values, logs, or error messages.

):
    """
    Atomically increment spend counters for budget enforcement.


The spend counter uses a heuristic to decide whether to hash the proxy API key based on the prefix \"sk-\". If a raw proxy API key does not start with \"sk-\", its raw value will be embedded directly into Redis/in-memory cache keys (e.g., spend:key:<raw_token>), exposing a secret in the cache namespace. This violates the project's proxy key handling rules and creates a realistic secret leakage path via cache backends or any tooling that lists keys.

Vulnerable code:

if token is not None:
    hashed_token = (
        hash_token(token=token)
        if isinstance(token, str) and token.startswith("sk-")
        else token
    )
    await _init_and_increment_spend_counter(
        counter_key=f"spend:key:{hashed_token}",
        source_cache_key=hashed_token,
        increment=response_cost,
    )

Impact:

  • Raw proxy API keys may be stored and retrievable from Redis or logs/metrics that capture key names, enabling unauthorized use of the proxy by anyone with access to the cache backend or operational logs.

Remediation:

  • Always use the hashed token for cache key construction. Do not rely on string-prefix heuristics. Ensure token is consistently a hashed value at this point in the flow, or perform a robust check/normalization that guarantees hashing of any raw proxy key before use. Also ensure any other paths that set or read spend keys use the hashed form consistently.

For more details, see the finding in Corridor.


Comment on lines +555 to +562
if item_type == "key" and hasattr(item, "token") and item.token is not None:
    await spend_counter_cache.async_set_cache(
        key=f"spend:key:{item.token}", value=0.0
    )
elif item_type == "team" and hasattr(item, "team_id") and item.team_id is not None:
    await spend_counter_cache.async_set_cache(
        key=f"spend:team:{item.team_id}", value=0.0
    )
Contributor

P1 Silent Redis counter reset failure causes permanent over-enforcement

spend_counter_cache.async_set_cache(...) internally swallows Redis write errors — its own try/except catches exceptions and logs without re-raising. This means when Redis has a transient error during a budget reset, the call returns without error, but the Redis counter still holds the old spend value.

The critical problem: get_current_spend always reads Redis first (bypassing DualCache's in-memory fallback), so after a silently-failed reset:

  • In-memory counter: correctly reset to 0.0
  • Redis counter: still holds the old period's spend
  • Next get_current_spend call returns the stale Redis value → budget check blocks the user permanently

This failure mode is newly introduced by this PR because the direct Redis read in get_current_spend creates an asymmetry with DualCache's error-swallowing write path. Before this PR, there were no Redis-backed counters, so there was no persistent state to get stuck.

The same risk applies to the team branch here and to the reset_budget_for_litellm_team_members function.

A safe fix is to call spend_counter_cache.redis_cache.async_set_cache(...) directly when redis_cache is not None, wrapped in its own try/except that emits a WARNING-level log (not just debug), so operators can detect stuck counters. The in-memory reset can still proceed unconditionally as a local fallback.

…dis-backed spend counters

Fixes BerriAI#23714
# Reset Redis directly so a transient failure doesn't leave stale
# counters that get_current_spend would read as authoritative.
try:
    from litellm.proxy.proxy_server import spend_counter_cache
Check notice

Code scanning / CodeQL: Cyclic import (Note)

Import of module litellm.proxy.proxy_server begins an import cycle. Import of module proxy_server begins an import cycle.
@yuneng-berri yuneng-berri merged commit 846e4b4 into BerriAI:main Mar 27, 2026
40 of 41 checks passed


Development

Successfully merging this pull request may close these issues.

[Bug]: Set an API key budget, but users can still make requests after exceeding the max_budget.

3 participants