fix(proxy): enforce budget limits across multi-pod deployments via Redis-backed spend counters#24682

Merged
yuneng-berri merged 1 commit into BerriAI:main from michelligabriele:fix/budget-spend-counters
Mar 27, 2026

Conversation

@michelligabriele
Contributor

Budget checks on API keys, teams, and team members were not enforced in multi-pod deployments because user_api_key_cache is intentionally in-memory-only. Each pod tracked spend independently, so with N pods the effective budget was N × max_budget.

Introduces a separate spend_counter_cache (DualCache wired to redis_usage_cache) with atomic increment/read helpers:

  • increment_spend_counters(): awaited in cost callback (not create_task) to update both in-memory and Redis before the next auth check
  • get_current_spend(): reads Redis first (cross-pod authoritative), falls back to in-memory, then to cached object .spend from DB

Budget check functions (_virtual_key_max_budget_check, _team_max_budget_check, _check_team_member_budget) now read spend via get_current_spend() instead of cached object .spend fields.

When Redis is not configured, falls back to in-memory-only counters (same as current single-instance behavior).


Relevant issues

Fixes #23714

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/test_litellm/ directory; adding at least 1 test is a hard requirement (see details)
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible; it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues
  • 45-49 passing tests: acceptable but needs attention
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🐛 Bug Fix

Changes

New: spend_counter_cache global + helpers (proxy_server.py)

  • spend_counter_cache: separate DualCache wired to redis_usage_cache in _init_cache() — keeps spend counters isolated from user_api_key_cache
  • get_current_spend(counter_key, fallback_spend): reads Redis first (cross-pod authoritative), falls back to in-memory counter, then to cached object .spend from DB. This bypasses DualCache.async_get_cache's in-memory-first behavior which would return stale per-pod values.
  • increment_spend_counters(token, team_id, user_id, response_cost): atomically increments counters in both in-memory and Redis via DualCache.async_increment_cache
  • _init_and_increment_spend_counter(): lazy-initializes counter from cached object's DB-loaded spend on first access, then increments

Modified: cost callback (proxy_track_cost_callback.py)

  • await increment_spend_counters(...) before the existing create_task(update_cache(...)) — awaited (not fire-and-forget) so the counter is current before the next request's auth check

Modified: budget check functions (auth_checks.py)

  • _virtual_key_max_budget_check: reads spend via get_current_spend() instead of valid_token.spend
  • _team_max_budget_check: reads spend via get_current_spend() instead of team_object.spend
  • _check_team_member_budget: reads spend via get_current_spend() instead of team_membership.spend
  • No function signature changes — get_current_spend is imported inline (same pattern as common_checks)

Modified: early team member check (user_api_key_auth.py)

  • Reads team member spend via get_current_spend() instead of valid_token.team_member_spend

Modified: budget reset (reset_budget_job.py)

  • Resets spend counter keys to 0.0 via spend_counter_cache.async_set_cache() when budgets are reset

Tests (8 new)

  • test_auth_checks.py: key/team/team-member budget checks read from spend counter; fallback when no counter
  • test_proxy_server.py: get_current_spend Redis-first reads; in-memory fallback; counter init + increment; team + member counters

@vercel

vercel bot commented Mar 27, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: litellm | Deployment: Ready | Actions: Preview, Comment | Updated (UTC): Mar 27, 2026 7:41pm


@codspeed-hq
Contributor

codspeed-hq bot commented Mar 27, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing michelligabriele:fix/budget-spend-counters (d533b43) with main (d949085)

Open in CodSpeed

@greptile-apps
Contributor

greptile-apps bot commented Mar 27, 2026

Greptile Summary

This PR introduces Redis-backed, cross-pod spend counters to fix multi-pod budget under-enforcement (the N×budget problem), where each pod independently tracked spend against the full budget. A new spend_counter_cache (DualCache wired to redis_usage_cache) is added alongside two helpers, get_current_spend (Redis-first read) and increment_spend_counters (awaited atomic increment), and all three budget-check entry points for virtual keys, teams, and team members are updated to read from these counters instead of stale cached-object .spend fields.

Key changes and observations:

  • Budget reset paths now explicitly reset Redis counters with warning-level logging on failure, addressing the stale-counter-after-reset risk flagged in earlier review rounds.
  • Cold-start seeding uses async_increment_cache (not async_set_cache), making simultaneous cold-starts produce over-counting rather than under-counting — a conservative trade-off documented in code comments.
  • User-level budget enforcement (common_checks, lines 511–521) is not covered: user_object.spend is still read directly, so per-user budget limits in multi-pod deployments remain subject to the original N×budget problem. This is explicitly out of scope per the PR description but should be tracked as a follow-up.
  • The spend counter Redis TTL inherits the general-purpose cache default (60 s unless litellm.default_redis_ttl is configured). After TTL-driven expiry, counters re-seed from the DB-cached object value, which may lag behind in-flight update_cache tasks during short inactivity windows.
  • _check_team_member_budget in auth_checks.py builds the counter key from valid_token.user_id without a None guard, unlike the equivalent code added in user_api_key_auth.py. The fallback is safe but the inconsistency is worth tidying.
  • Eight new unit tests cover all modified budget-check paths using mocked or in-memory backends with no live network calls.

Confidence Score: 4/5

Safe to merge for key/team/team-member budget enforcement; user-level multi-pod budgets remain unaddressed and the Redis TTL behavior warrants documentation or a long-lived TTL override.

All four P0/P1 concerns from prior review rounds are addressed. Remaining findings are P2: the user budget gap is architecturally incomplete but explicitly out of scope, the TTL concern is a soft reliability edge case, and the None-guard inconsistency has a safe fallback.

litellm/proxy/proxy_server.py (Redis TTL for spend counters), litellm/proxy/auth/auth_checks.py (user budget gap, None-guard consistency)

Important Files Changed

Filename | Overview
litellm/proxy/proxy_server.py | Adds spend_counter_cache global, get_current_spend(), increment_spend_counters(), and _init_and_increment_spend_counter() helpers; wires redis_cache in _init_cache(). Race condition on cold-start seeding addressed with increment-vs-set strategy. Redis key TTL defaults to the general cache TTL.
litellm/proxy/auth/auth_checks.py | Updated _virtual_key_max_budget_check, _team_max_budget_check, and _check_team_member_budget to read spend via get_current_spend(); user-level budget check in common_checks still reads user_object.spend directly (out of PR scope). Missing None-guard for valid_token.user_id in _check_team_member_budget counter key construction.
litellm/proxy/auth/user_api_key_auth.py | Early team-member budget check updated to use get_current_spend() with correct None-guard for user_id and team_id before constructing counter key.
litellm/proxy/common_utils/reset_budget_job.py | Budget reset now explicitly resets Redis counters with warning logging on failure; team-member counters reset by fetching affected memberships. Direct Redis reset bypasses DualCache to prevent silent stale-counter issues.
litellm/proxy/hooks/proxy_track_cost_callback.py | increment_spend_counters is now awaited before the fire-and-forget update_cache task, ensuring Redis counters are updated before the next auth check.
tests/test_litellm/proxy/auth/test_auth_checks.py | Four new async tests for key/team/team-member budget checks using mocked get_current_spend; no real network calls.
tests/test_litellm/proxy/test_proxy_server.py | Four new tests for get_current_spend Redis-first behavior, in-memory fallback, counter init+increment, and team/member tracking. All use mocked or in-memory backends.

Reviews (5): Last reviewed commit: "fix(proxy): enforce budget limits across..."

Comment on lines +1795 to +1810
current = await spend_counter_cache.async_get_cache(key=counter_key)
if current is None:
    source = await user_api_key_cache.async_get_cache(key=source_cache_key)
    base_spend = 0.0
    if source is not None:
        if isinstance(source, dict):
            base_spend = source.get("spend", 0.0) or 0.0
        else:
            base_spend = getattr(source, "spend", 0.0) or 0.0
    await spend_counter_cache.async_set_cache(
        key=counter_key, value=base_spend
    )

await spend_counter_cache.async_increment_cache(
    key=counter_key, value=increment
)
Contributor

P1 Race condition on counter initialization under-counts spend

There is a check-then-set race between concurrent pods at cold start. Both pods may see current is None simultaneously, each read the same base_spend from user_api_key_cache, then one pod's async_set_cache will overwrite the Redis key after the other pod has already incremented it — dropping that increment.

Example sequence (two pods, both cold on the same token, base_spend=5.0):

  1. Pod A: async_get_cache → None
  2. Pod B: async_get_cache → None
  3. Pod A: set_cache(5.0) → Redis = 5.0
  4. Pod A: increment_cache(+0.30) → Redis = 5.30
  5. Pod B: set_cache(5.0) → Redis = 5.0 (overwrites 5.30)
  6. Pod B: increment_cache(+0.40) → Redis = 5.40

Final: 5.40 instead of 5.70 — 0.30 is silently lost. During a burst of concurrent cold-start requests this can under-count spend by multiple increments, defeating the budget enforcement goal.

The safest Redis-native fix is to skip the separate set_cache entirely and rely on a Lua SET key base NX + INCRBYFLOAT combination (SET only if not exists). A pragmatic alternative is to omit the initialization step and use INCRBYFLOAT starting from 0, then reconcile against the DB spend value in get_current_spend rather than seeding the counter from the DB object.

Comment on lines +535 to +545
# Reset the cross-pod spend counter
from litellm.proxy.proxy_server import spend_counter_cache

if item_type == "key" and hasattr(item, "token") and item.token is not None:
    await spend_counter_cache.async_set_cache(
        key=f"spend:key:{item.token}", value=0.0
    )
elif item_type == "team" and hasattr(item, "team_id") and item.team_id is not None:
    await spend_counter_cache.async_set_cache(
        key=f"spend:team:{item.team_id}", value=0.0
    )
Contributor

P1 Team member spend counters are never reset

_reset_budget_common is called with item_type in {"key", "team", "user"}. The code resets Redis counters for keys and teams, but there is no reset for team member counters under the spend:team_member:{user_id}:{team_id} key pattern.

This means that after a budget period resets (e.g., monthly), the Redis counter for a team member retains its accumulated value. On the next request that team member will be treated as having already spent their previous period's amount, and they will be permanently rate-limited until the Redis counter naturally expires or is manually cleared.

The reset job does not have a _reset_budget_for_team_member path at the moment, so team member counters have no reset path at all. This is a material gap given the PR's stated goal of reliable multi-pod budget enforcement.

Comment on lines +1714 to +1723
# 1. Try Redis first (cross-pod authoritative)
if spend_counter_cache.redis_cache is not None:
    try:
        val = await spend_counter_cache.redis_cache.async_get_cache(
            key=counter_key
        )
        if val is not None:
            return float(val)
    except Exception:
        pass
Contributor

P2 Silent Redis exception swallowing makes failures invisible

The bare except Exception: pass swallows all Redis read errors with no log output. When Redis is temporarily unavailable this silently degrades to in-memory (per-pod) spend enforcement without any indication, making it very hard to diagnose why budgets are not being enforced after a Redis outage. Consider logging at debug level so that operators can trace the fallback path.


@corridor-security corridor-security bot left a comment


Security Issues

  • Sensitive information exposure (proxy API key) in Redis/cache keys
    The new spend counter logic conditionally hashes the token only when it starts with "sk-". If a raw proxy API key does not use this prefix, it will be written into Redis/in-memory cache keys as plaintext (e.g., spend:key:<raw_token>). This violates the project's requirement that proxy API keys must be hashed before any storage/use outside of the creation flow and risks secret leakage through cache backends, metrics, or operational tooling that enumerates keys.

Recommendations

  • Always use the hashed token for cache key construction. Avoid heuristic prefix checks. Ensure the caller always provides a hashed token, or add a robust, explicit indicator to detect already-hashed tokens. Never place raw proxy API keys into cache keys, values, logs, or error messages.

):
    """
    Atomically increment spend counters for budget enforcement.


The spend counter uses a heuristic to decide whether to hash the proxy API key based on the prefix \"sk-\". If a raw proxy API key does not start with \"sk-\", its raw value will be embedded directly into Redis/in-memory cache keys (e.g., spend:key:<raw_token>), exposing a secret in the cache namespace. This violates the project's proxy key handling rules and creates a realistic secret leakage path via cache backends or any tooling that lists keys.

Vulnerable code:

if token is not None:
    hashed_token = (
        hash_token(token=token)
        if isinstance(token, str) and token.startswith("sk-")
        else token
    )
    await _init_and_increment_spend_counter(
        counter_key=f"spend:key:{hashed_token}",
        source_cache_key=hashed_token,
        increment=response_cost,
    )

Impact:

  • Raw proxy API keys may be stored and retrievable from Redis or logs/metrics that capture key names, enabling unauthorized use of the proxy by anyone with access to the cache backend or operational logs.

Remediation:

  • Always use the hashed token for cache key construction. Do not rely on string-prefix heuristics. Ensure token is consistently a hashed value at this point in the flow, or perform a robust check/normalization that guarantees hashing of any raw proxy key before use. Also ensure any other paths that set or read spend keys use the hashed form consistently.

For more details, see the finding in Corridor.


Comment on lines +555 to +562
if item_type == "key" and hasattr(item, "token") and item.token is not None:
    await spend_counter_cache.async_set_cache(
        key=f"spend:key:{item.token}", value=0.0
    )
elif item_type == "team" and hasattr(item, "team_id") and item.team_id is not None:
    await spend_counter_cache.async_set_cache(
        key=f"spend:team:{item.team_id}", value=0.0
    )
Contributor

P1 Silent Redis counter reset failure causes permanent over-enforcement

spend_counter_cache.async_set_cache(...) internally swallows Redis write errors — its own try/except catches exceptions and logs without re-raising. This means when Redis has a transient error during a budget reset, the call returns without error, but the Redis counter still holds the old spend value.

The critical problem: get_current_spend always reads Redis first (bypassing DualCache's in-memory fallback), so after a silently-failed reset:

  • In-memory counter: correctly reset to 0.0
  • Redis counter: still holds the old period's spend
  • Next get_current_spend call returns the stale Redis value → budget check blocks the user permanently

This failure mode is newly introduced by this PR because the direct Redis read in get_current_spend creates an asymmetry with DualCache's error-swallowing write path. Before this PR, there were no Redis-backed counters, so there was no persistent state to get stuck.

The same risk applies to the team branch here and to the reset_budget_for_litellm_team_members function.

A safe fix is to call spend_counter_cache.redis_cache.async_set_cache(...) directly when redis_cache is not None, wrapped in its own try/except that emits a WARNING-level log (not just debug), so operators can detect stuck counters. The in-memory reset can still proceed unconditionally as a local fallback.

…dis-backed spend counters

Fixes BerriAI#23714
# Reset Redis directly so a transient failure doesn't leave stale
# counters that get_current_spend would read as authoritative.
try:
    from litellm.proxy.proxy_server import spend_counter_cache
Check notice

Code scanning / CodeQL: Cyclic import (Note)

Import of module litellm.proxy.proxy_server begins an import cycle. Import of module proxy_server begins an import cycle.
@yuneng-berri yuneng-berri merged commit 846e4b4 into BerriAI:main Mar 27, 2026
40 of 41 checks passed


Development

Successfully merging this pull request may close these issues.

[Bug]: Set an API key budget, but users can still make requests after exceeding the max_budget.

3 participants