
fix(sso): direct PKCE token exchange + Redis wiring for multi-instance SSO #22923

Merged

ishaan-jaff merged 62 commits into main from litellm_pkce_sso_fix
Mar 12, 2026
Conversation

@ishaan-jaff
Contributor

ishaan-jaff commented Mar 5, 2026

Relevant issues

Fixes: PKCE (Proof Key for Code Exchange) token exchange failing with Okta and other PKCE-required providers because fastapi-sso drops the code_verifier parameter.

Pre-Submission checklist

  • I have added testing in the tests/test_litellm/ directory. Adding at least 1 test is a hard requirement - see details
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible; it only solves 1 specific problem
  • I have requested a Greptile review by commenting `@greptileai` and received a Confidence Score of at least 4/5 before requesting a maintainer review

CI (LiteLLM team)

  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🐛 Bug Fix

Changes

Two fixes for PKCE-based SSO (Okta and other providers):

1. Direct token exchange when PKCE is enabled

fastapi-sso silently drops code_verifier from the token exchange request, causing PKCE-required providers to reject the auth with "invalid_grant". When GENERIC_CLIENT_USE_PKCE=true and a code_verifier is found in cache, the callback now calls the token endpoint directly via httpx — bypassing fastapi-sso.verify_and_process — with the full PKCE parameters.
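The shape of that direct exchange can be sketched as follows. This is an illustrative sketch only: the function name `build_pkce_token_request` and the exact parameter handling are stand-ins, not the LiteLLM implementation.

```python
# Sketch of assembling a direct PKCE token-exchange request, bypassing
# fastapi-sso's verify_and_process so code_verifier is not dropped.
# All names here are illustrative.
from typing import Optional


def build_pkce_token_request(
    code: str,
    code_verifier: str,
    redirect_uri: str,
    client_id: str,
    client_secret: Optional[str] = None,
) -> dict:
    """Assemble the form body for the OAuth token endpoint.

    Per RFC 7636, the code_verifier must accompany the authorization
    code whenever the original auth request carried a code_challenge.
    """
    data = {
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
        "client_id": client_id,
        "code_verifier": code_verifier,  # the field fastapi-sso drops
    }
    if client_secret:  # public clients (no secret) omit this field
        data["client_secret"] = client_secret
    return data


# The actual call would then be something like:
#   async with httpx.AsyncClient() as client:
#       resp = await client.post(token_endpoint, data=build_pkce_token_request(...))
```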

2. Redis-backed verifier storage for multi-pod deployments

Previously, code_verifier was stored in user_api_key_cache (in-memory). In multi-pod deployments, the login and callback can land on different pods, causing a cache miss. Now redis_usage_cache is used when available (falling back to in-memory for single-instance deploys).
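The storage pattern can be sketched like this, with a plain dict standing in for the Redis client. The class name `VerifierStore` and key format are illustrative assumptions, not LiteLLM identifiers; the dict-wrapped value and legacy plain-string fallback mirror the behaviour described in this PR.

```python
# Sketch: prefer a shared Redis-backed cache so the /sso/callback pod can
# read a verifier written by the login pod; fall back to process-local
# memory for single-instance deploys. Names are illustrative.
import secrets
from typing import Optional


class VerifierStore:
    def __init__(self, redis_cache: Optional[dict] = None):
        # redis_cache stands in for a real Redis client; None means
        # single-instance deploy, so fall back to in-process memory.
        self._backend = redis_cache if redis_cache is not None else {}

    def save(self, state: str, ttl_seconds: int = 600) -> str:
        verifier = secrets.token_urlsafe(64)
        # Stored as a dict so the value JSON-serializes cleanly in Redis.
        # ttl_seconds would be forwarded to the real cache's set call;
        # this plain-dict stand-in has no expiry.
        self._backend[f"pkce:{state}"] = {"code_verifier": verifier}
        return verifier

    def load(self, state: str) -> Optional[str]:
        entry = self._backend.get(f"pkce:{state}")
        if isinstance(entry, dict):
            return entry.get("code_verifier")
        if isinstance(entry, str):  # legacy plain-string format
            return entry
        return None
```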

Additional improvements:

  • PKCE_STRICT_CACHE_MISS=true opt-in to fail with HTTP 401 on cross-pod cache misses instead of silently proceeding
  • Startup warning when GENERIC_CLIENT_USE_PKCE=true but Redis is not configured
  • Bearer credentials stripped from error messages to prevent credential leakage in 403 responses
  • 22 new tests covering all PKCE paths including cache miss scenarios, public clients, id_token fallback, null JSON responses, and credential isolation
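The credential-stripping idea from the bullets above can be sketched as a simple regex scrub. The function name and pattern are illustrative assumptions, not the PR's actual implementation.

```python
# Sketch: scrub bearer tokens out of an error string before it is
# surfaced in a 403 response body. Illustrative only.
import re

# Matches "Bearer <token>" where the token uses typical base64url/JWT chars.
_BEARER_RE = re.compile(r"Bearer\s+[A-Za-z0-9._~+/=-]+")


def scrub_bearer(message: str) -> str:
    """Replace any bearer token in an error message with a redaction marker."""
    return _BEARER_RE.sub("Bearer [REDACTED]", message)
```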

…ulti-instance SSO

When PKCE is enabled, bypass fastapi-sso and perform direct token exchange so
code_verifier is correctly included. Store PKCE verifiers as dict in cache
for proper JSON serialization in Redis. Wire user_api_key_cache to Redis when
available so PKCE verifiers are shared across ECS tasks/pods.

Also adds clearer error messages when PKCE is required but not configured.
@vercel

vercel bot commented Mar 5, 2026

The latest updates on your projects:

Project: litellm | Deployment: Ready | Actions: Preview, Comment | Updated (UTC): Mar 7, 2026 1:43am

…ler methods

- Move import httpx/jwt to module level (top of file, not inside function)
- Extract inline PKCE token exchange + userinfo logic into two static methods:
  _pkce_token_exchange() and _get_pkce_userinfo()
- get_generic_sso_response PKCE path is now a single method call
- Fix double-logging in except block for non-PKCE errors
- Use %-style log formatting (no f-strings in log calls)
@greptile-apps
Contributor

greptile-apps bot commented Mar 5, 2026

Greptile Summary

This PR fixes PKCE token exchange for Okta and other PKCE-required providers by implementing a direct httpx-based token exchange that includes the code_verifier parameter (which fastapi-sso silently drops), and adds Redis-backed storage for PKCE verifiers in multi-pod deployments.

Key implementation:

  • New _pkce_token_exchange() and _get_pkce_userinfo() methods perform direct OAuth token exchange with proper code_verifier inclusion, HTTP 200 error-body detection, and id_token JWT fallback
  • PKCE verifiers stored in redis_usage_cache (with in-memory fallback) for cross-instance access, with explicit three-way merge logic to prevent bearer credential confusion
  • Bearer credentials stripped from error messages to prevent exposure in 403 responses
  • Explicit type guards for token_response dict validation (guards against JSON null responses)
  • 22 new tests covering PKCE paths including Basic Auth, credentials-in-body, HTTP-200-with-error-body, id_token fallback, cache miss scenarios, and credential isolation

Code quality observations:

  • Exception handling is well-structured with proper guards for non-dict token responses (line 2855)
  • Authorization code guard correctly placed inside the PKCE conditional block to preserve non-PKCE error callback behavior
  • Bearer credential precedence correctly enforced via three-way merge logic in _pkce_token_exchange (lines 2924-2933)
  • Try/except scoping properly isolates the POST call from context manager teardown exceptions
  • Verifier deletion intentionally deferred to after downstream processing for defensive record-keeping

Confidence Score: 4/5

  • No critical issues identified. Code implements PKCE token exchange correctly with proper guard clauses and error handling.
  • The implementation is sound: authorization flow correctly gates PKCE-specific logic within the if code_verifier: block, bearer credentials are properly protected through three-way merge logic that ensures token endpoint values take precedence, and defensive type guards prevent common edge cases like JSON null responses. Exception handling and try/except scoping are correct. The 22-test suite provides good coverage of PKCE paths.
  • No files require special attention

Last reviewed commit: 22e103f

@ishaan-jaff
Contributor Author

@greptile review

- Fix access_token missing in PKCE path: read from combined_response directly
  instead of generic_sso.access_token (which is only set by verify_and_process)
- Fix PKCE error hint firing when PKCE is already enabled: only show
  'set GENERIC_CLIENT_USE_PKCE=true' advice when code_verifier was absent
- Fix unguarded KeyError on access_token: check for error field in HTTP 200
  responses before accessing token_response['access_token']
- Fix silent empty userinfo: raise ProxyException when both userinfo endpoint
  and id_token fallback produce no user data
- Fix backward-incompatible Redis wiring: only attach Redis to user_api_key_cache
  when GENERIC_CLIENT_USE_PKCE=true, preserving existing in-memory behaviour
@ishaan-jaff
Contributor Author

@greptile review

- Fix PKCE error hint: check env var directly (not code_verifier presence) to
  distinguish 'PKCE not configured' from 'PKCE enabled but cache miss'
- Fix misleading Redis TTL comment in proxy_server.py
@ishaan-jaff
Contributor Author

@greptile review

- Fix CRITICAL log firing on every non-PKCE callback: only log when PKCE is enabled
- Remove unused pkce_env_value intermediate variable
- Prefer reusing redis_usage_cache over creating separate RedisCache instance
  (avoids losing advanced connection options like SSL, timeouts, db)
@ishaan-jaff
Contributor Author

@greptile review

- Strip OAuth token credentials from response_convertor input to prevent
  access_token/id_token appearing in restricted-group error messages
- Reuse single httpx.AsyncClient for both token exchange and userinfo requests
  to avoid a second TCP/TLS handshake per SSO callback
- Revert Redis wiring to user_api_key_cache: PKCE code already uses
  redis_usage_cache directly; wiring would route all API-key lookups through
  Redis unnecessarily. Add startup warning instead when PKCE+Redis mismatch.
- Move _OAUTH_TOKEN_FIELDS to module level
@ishaan-jaff
Contributor Author

@greptile review

@ishaan-jaff
Contributor Author

@greptile review

@ishaan-jaff
Contributor Author

@greptile review

Comment on lines +2820 to +2827
if response is None:
# Should never happen in practice — construction failure is unexpected.
raise ProxyException(
message="Token endpoint request did not return a response",
type=ProxyErrorTypes.auth_error,
param="token_exchange",
code=status.HTTP_401_UNAUTHORIZED,
)
Contributor

response is initialized to None at line 2804 before the async with block. If httpx.AsyncClient() construction fails (line 2805), the exception propagates immediately without entering the block body, leaving response = None. However, that exception would be caught by the outer try/except at line 788 in get_generic_sso_response, preventing execution from ever reaching the guard at line 2820.

The guard at lines 2820–2827 cannot be reached in practice because:

  1. If AsyncClient() construction succeeds, the async with body is entered
  2. Inside the body, either response is assigned (line 2807) or a ProxyException is raised (line 2818)
  3. If AsyncClient() construction fails, the exception propagates before the guard is reached

The comment misleadingly suggests this guard handles the construction-failure case, but that scenario would cause the exception to propagate to the outer handler, not continue to line 2820. The guard is dead code and the accompanying comment creates a false expectation for future maintainers.

Suggested fix: Remove the unreachable guard, or restructure exception handling to actually reach this point (e.g., catch construction failures at a higher level before delegating to _pkce_token_exchange).

@ishaan-jaff
Contributor Author

@greptile review

Comment on lines +2571 to +2609
if cached_data:
# Extract code_verifier from dict (stored as dict for JSON serialization)
if isinstance(cached_data, dict) and "code_verifier" in cached_data:
code_verifier = cached_data["code_verifier"]
if not code_verifier:
# Dict format is correct but value is empty or null. This is
# a distinct case from an unrecognized format — the entry exists
# but was stored with an empty/null verifier (data integrity issue).
_empty_value_in_dict = True
verbose_proxy_logger.warning(
"PKCE verifier dict for state '%s' has an empty/null code_verifier "
"value — may indicate a storage bug. Treating as a cache miss.",
state,
)
else:
verbose_proxy_logger.debug("PKCE code_verifier retrieved from cache")
elif isinstance(cached_data, str):
# Handle legacy format (plain string) for backward compatibility
code_verifier = cached_data
verbose_proxy_logger.warning(
"Retrieved code_verifier in legacy plain-string format. "
"Future storage will use dict format."
)
else:
# Defer the detailed ERROR log to the strict-mode branch below
# (which includes state and a diagnostic message). Log at DEBUG
# here to avoid duplicate ERROR entries in the same request.
verbose_proxy_logger.debug(
"Unexpected PKCE verifier cache format (type=%s); skipping.",
type(cached_data).__name__,
)

if code_verifier:
# Add code_verifier to token exchange parameters (Redis returns decoded string)
token_params["code_verifier"] = (
code_verifier
if isinstance(code_verifier, str)
else str(code_verifier)
)
verbose_proxy_logger.debug(
"PKCE code_verifier retrieved and will be included in token exchange"
)
# Return the cache key so the caller can delete it *after* a
# successful token exchange (avoids losing the verifier on retry
# if the exchange fails partway through).
token_params["_pkce_cache_key"] = cache_key
Contributor

Stale empty-valued cache entry is never cleaned up in non-strict mode.

When a cached dict contains an empty/null code_verifier value, the code logs a warning indicating a potential storage bug, but the stale cache entry is never deleted. The issue: when code_verifier is falsy, the cache key is not returned to the caller, so _delete_pkce_verifier is never called.

The warning message already documents this as "may indicate a storage bug," but provides no cleanup action. Unlike the strict-mode path which raises an error, the non-strict path allows the request to continue while leaving the corrupted cache entry in memory until its 600-second TTL expires.

Consider implementing best-effort cleanup of the stale entry in the non-strict path after logging the warning, so corrupted or partially-stored verifiers are actively removed rather than left to natural TTL expiry.

@ishaan-jaff
Contributor Author

@greptile review

Comment on lines +3918 to +3944
mock_cache = MagicMock()
mock_cache.async_get_cache = AsyncMock(return_value=12345)

mock_request = MagicMock(spec=Request)
mock_request.query_params = {"state": "bad_format_non_strict"}

# Non-strict mode: should log a warning and continue, not raise.
# Use patch.dict with PKCE_STRICT_CACHE_MISS="false" to avoid permanently
# mutating the test process environment with os.environ.pop().
with caplog.at_level(logging.WARNING), patch(
"litellm.proxy.proxy_server.redis_usage_cache", None
), patch(
"litellm.proxy.proxy_server.user_api_key_cache", mock_cache
), patch.dict(
os.environ,
{"GENERIC_CLIENT_USE_PKCE": "true", "PKCE_STRICT_CACHE_MISS": "false"},
clear=False,
):
result = await SSOAuthenticationHandler.prepare_token_exchange_parameters(
request=mock_request, generic_include_client_id=False
)

# No raise in non-strict mode; verifier simply absent from params
assert "code_verifier" not in result
assert "_pkce_cache_key" not in result
# Cache was queried (the unexpected format was retrieved and logged at WARNING)
mock_cache.async_get_cache.assert_called_once()
Contributor

Missing async mock setup causes silent cleanup failure in test

This test creates a MagicMock for the cache without setting up async_delete_cache as an AsyncMock. When the test runs with unexpected format data (integer 12345), the production code calls _delete_pkce_verifier() which attempts to await the delete method. Since the mock doesn't provide an awaitable, a TypeError is raised and silently caught by the exception handler — meaning the cleanup path never actually executes.

A regression that breaks the cleanup code would go completely undetected by this test.

Recommendation: Configure the mock to properly handle the async cleanup call:

  1. Add mock_cache.async_delete_cache = AsyncMock() after line 3919
  2. Add an assertion after the test to verify cleanup was called with the correct cache key
  3. This matches the pattern already correctly used in test_prepare_token_exchange_parameters around line 3142

@ishaan-jaff
Contributor Author

@greptile review

Comment on lines 3030 to 3041
# all API-key lookups through Redis.
use_pkce = os.getenv("GENERIC_CLIENT_USE_PKCE", "false").lower() == "true"
if use_pkce and redis_usage_cache is None:
verbose_proxy_logger.warning(
"GENERIC_CLIENT_USE_PKCE=true but Redis is not configured for LiteLLM caching. "
"PKCE verifiers will not be shared across instances — callbacks may land on a "
"different pod than the login request and fail silently. "
"Configure Redis via the 'cache' section in your proxy config, "
"or enable sticky sessions for single-instance deployments. "
"Set PKCE_STRICT_CACHE_MISS=true to fail fast with a 401 on cache misses "
"instead of continuing without a code_verifier."
)
Contributor

PKCE warning fires on every config hot-reload, not just startup

load_config() is called on startup and on every config-file hot-reload. This means the warning at lines 3033-3040 will emit repeatedly in long-running deployments that use dynamic config reloads, cluttering logs with repeated "startup" advisories.

Consider guarding with a module-level sentinel so the warning fires only once per process:

_pkce_redis_warning_emitted = False

def load_config(...):
    ...
    global _pkce_redis_warning_emitted
    use_pkce = os.getenv("GENERIC_CLIENT_USE_PKCE", "false").lower() == "true"
    if use_pkce and redis_usage_cache is None and not _pkce_redis_warning_emitted:
        _pkce_redis_warning_emitted = True
        verbose_proxy_logger.warning(...)

@ishaan-jaff
Contributor Author

@greptile review

Comment on lines +785 to +786
# Initialized here so it's visible in the except block for error-hint logic
code_verifier: Optional[str] = None
Contributor

The comment at line 785 states code_verifier is initialized "so it's visible in the except block for error-hint logic," but the except block (lines 879-933) never references this variable. The error-hint logic uses pkce_configured = os.getenv(...) and error_message = str(e) instead. The initialization is harmless but the comment creates a false expectation for future maintainers.

Suggested change
# Initialized here so it's visible in the except block for error-hint logic
code_verifier: Optional[str] = None
# Initialize before try block to ensure it's available for use in the PKCE token
# exchange path (line 795 onwards).
code_verifier: Optional[str] = None

Comment on lines +2906 to +2917
# Merge: userinfo takes precedence for identity claims (sub, email, name, …) per
# the OpenID Connect spec (userinfo is the authoritative source for identity).
# But bearer credentials (access_token, id_token, refresh_token) must always come
# from the token endpoint, not from userinfo (non-standard providers occasionally
# include these fields in userinfo, which would otherwise shadow the real bearer token).
#
# Three-way merge semantics for each bearer-credential field:
# 1. token_response has a non-null value → use it (token endpoint is authoritative)
# 2. token_response explicitly sent null → remove the key so callers get a clean
# absence signal; the null from the token endpoint overrides userinfo too
# 3. field absent from token_response → leave whatever userinfo provided as-is
# (e.g. userinfo-provided id_token from a non-standard provider)
Contributor

The comment at line 2908 states "bearer credentials (access_token, id_token, refresh_token) must always come from the token endpoint, not from userinfo," but Case 3 in the merge loop (lines 2916-2926) explicitly contradicts this by accepting any of these fields from userinfo when absent from token_response.

For example, many providers omit refresh_token from the token response (it's optional). If a non-standard provider includes refresh_token in userinfo, Case 3 would silently accept it into the merged response. A refresh_token field in userinfo is not a standard OIDC field and could allow userinfo providers to inject long-lived credentials.

Update the comment to either:

  1. Explicitly document Case 3 as an intentional fallback for non-standard providers that embed credentials in userinfo (and note the security implications), or
  2. Modify the loop to guard refresh_token by accepting it only from token_response, never from userinfo, regardless of key absence
Suggested change
# Merge: userinfo takes precedence for identity claims (sub, email, name, …) per
# the OpenID Connect spec (userinfo is the authoritative source for identity).
# But bearer credentials (access_token, id_token, refresh_token) must always come
# from the token endpoint, not from userinfo (non-standard providers occasionally
# include these fields in userinfo, which would otherwise shadow the real bearer token).
#
# Three-way merge semantics for each bearer-credential field:
# 1. token_response has a non-null value → use it (token endpoint is authoritative)
# 2. token_response explicitly sent null → remove the key so callers get a clean
# absence signal; the null from the token endpoint overrides userinfo too
# 3. field absent from token_response → leave whatever userinfo provided as-is
# (e.g. userinfo-provided id_token from a non-standard provider)
# Merge: userinfo takes precedence for identity claims (sub, email, name, …) per
# the OpenID Connect spec (userinfo is the authoritative source for identity).
# For bearer credentials, apply a three-way precedence:
# 1. token_response has a non-null value → use it (authoritative)
# 2. token_response explicitly sends null → remove the key
# 3. field absent from token_response → use token_response value only;
# do not accept bearer credentials from userinfo (non-standard providers
# must not be trusted to provide access_token, id_token, or refresh_token)
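The three-way merge semantics as originally described can be sketched as below (case 3 leaves userinfo-provided values in place; the suggested tightening above would drop bearer fields absent from the token response instead). `_OAUTH_TOKEN_FIELDS` and the function name are illustrative, not the exact LiteLLM identifiers.

```python
# Sketch of the three-way merge for bearer-credential fields.
_OAUTH_TOKEN_FIELDS = ("access_token", "id_token", "refresh_token")


def merge_userinfo_and_tokens(token_response: dict, userinfo: dict) -> dict:
    # Identity claims (sub, email, name, ...) come from userinfo.
    merged = dict(userinfo)
    for field in _OAUTH_TOKEN_FIELDS:
        if field in token_response:
            if token_response[field] is not None:
                # Case 1: non-null token-endpoint value is authoritative.
                merged[field] = token_response[field]
            else:
                # Case 2: explicit null from the token endpoint removes
                # the key, overriding any userinfo-provided value.
                merged.pop(field, None)
        # Case 3: field absent from token_response -> whatever userinfo
        # provided (if anything) is left as-is.
    return merged
```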

@ishaan-jaff
Contributor Author

@greptile review

Comment on lines +2617 to +2648
if strict_cache_miss:
# Distinguish empty-value dicts, corrupt-format entries, and genuine
# cache misses so operators can investigate the correct root cause.
if _empty_value_in_dict:
# Dict format was correct but code_verifier was empty/null.
raise ProxyException(
message=(
f"PKCE verifier for state '{state}' was found in cache but "
f"has an empty or null code_verifier value — possible storage bug."
),
type=ProxyErrorTypes.auth_error,
param="PKCE_CACHE_MISS",
code=status.HTTP_401_UNAUTHORIZED,
)
elif cached_data is not None:
# Cache had data but in an unrecognised format (e.g. corrupt Redis value).
verbose_proxy_logger.error(
"PKCE verifier for state '%s' has an unrecognized format (type=%s); "
"treating as a cache miss. Investigate the cached value — it may be "
"a corrupt or stale entry.",
state,
type(cached_data).__name__,
)
raise ProxyException(
message=(
f"PKCE verifier for state '{state}' has an unrecognized format "
f"(type={type(cached_data).__name__}). The cached entry may be corrupt."
),
type=ProxyErrorTypes.auth_error,
param="PKCE_CACHE_MISS",
code=status.HTTP_401_UNAUTHORIZED,
)
Contributor

Corrupt verifier not cleaned up in strict mode

In non-strict mode (line 2682), when cached_data is not None but unusable, _delete_pkce_verifier is called to evict the corrupt entry so it doesn't linger until TTL expiry. The strict-mode branches at lines 2620–2648 raise ProxyException for the same conditions (_empty_value_in_dict and unrecognized format) but skip that cleanup call.

The corrupt entry stays in Redis for up to 600 s. In strict mode operators are already aware something is wrong and may re-examine the cache for diagnosis — a dangling entry makes that harder. The fix is symmetric with the non-strict cleanup:

if _empty_value_in_dict:
    await SSOAuthenticationHandler._delete_pkce_verifier(cache_key)  # best-effort
    raise ProxyException(
        message=...,
        ...
    )
elif cached_data is not None:
    await SSOAuthenticationHandler._delete_pkce_verifier(cache_key)  # best-effort
    verbose_proxy_logger.error(...)
    raise ProxyException(...)

Since _delete_pkce_verifier already swallows its own exceptions, this does not change the raise semantics.

Comment on lines +2800 to +2830
# The try/except is INSIDE the async with so that TLS teardown exceptions
# from __aexit__ propagate as-is and are NOT mis-labelled as "Token endpoint
# request failed". httpx buffers the full response body before __aexit__,
# so status_code / text / json() remain valid after the context exits.
async with httpx.AsyncClient() as http_client:
try:
response = await http_client.post(token_endpoint, **post_kwargs)
except Exception as exc:
# Catch network-level errors (SSL, DNS, TCP, timeout, etc.) and
# wrap them as a clean ProxyException rather than leaking raw
# httpx or OS exceptions to callers.
verbose_proxy_logger.error("PKCE token endpoint unreachable: %s", exc)
raise ProxyException(
message=f"Token endpoint request failed: {exc}",
type=ProxyErrorTypes.auth_error,
param="token_exchange",
code=status.HTTP_401_UNAUTHORIZED,
) from exc

# Response processing outside the async with — httpx buffers the full
# response body so status_code / text / json() remain valid after __aexit__.
if response.status_code != 200:
verbose_proxy_logger.error(
"PKCE token exchange failed. status=%s body=%s",
response.status_code,
response.text[:500],
)
raise ProxyException(
message=f"Token exchange failed: {response.status_code} - {response.text[:500]}",
type=ProxyErrorTypes.auth_error,
param="token_exchange",
Contributor

response consumed outside async with block — comment contradicts actual scope

response is assigned inside the async with httpx.AsyncClient() as http_client: block (line ~2807) but consumed (.status_code, .text, .json()) after the block closes. The inline comment correctly notes that httpx buffers the full response body before __aexit__, so reading these attributes after the client is closed is safe today.

The risk is forward-maintenance: a future refactor to streaming (stream=True) would silently break these post-block accesses. Consider moving the non-200 check, JSON parse, and access_token validation inside the async with block to make the lifetime dependency explicit, or at minimum add a short explanatory comment near the consumption site reinforcing the buffering guarantee:

# response.status_code / .text / .json() are safe here: httpx buffers the full
# body before __aexit__, so these attributes remain valid after the client closes.
if response.status_code != 200:

Comment on lines +3919 to +3940
mock_cache = MagicMock()
mock_cache.async_get_cache = AsyncMock(return_value=12345)
mock_cache.async_delete_cache = AsyncMock()

mock_request = MagicMock(spec=Request)
mock_request.query_params = {"state": "bad_format_non_strict"}

# Non-strict mode: should log a warning and continue, not raise.
# Use patch.dict with PKCE_STRICT_CACHE_MISS="false" to avoid permanently
# mutating the test process environment with os.environ.pop().
with caplog.at_level(logging.WARNING), patch(
"litellm.proxy.proxy_server.redis_usage_cache", None
), patch(
"litellm.proxy.proxy_server.user_api_key_cache", mock_cache
), patch.dict(
os.environ,
{"GENERIC_CLIENT_USE_PKCE": "true", "PKCE_STRICT_CACHE_MISS": "false"},
clear=False,
):
result = await SSOAuthenticationHandler.prepare_token_exchange_parameters(
request=mock_request, generic_include_client_id=False
)
Contributor

async_delete_cache not set up as AsyncMock — cleanup call silently swallowed

mock_cache is a plain MagicMock(). When the non-strict path calls await SSOAuthenticationHandler._delete_pkce_verifier(cache_key), it internally awaits active_cache.async_delete_cache(...). Because mock_cache.async_delete_cache is a regular MagicMock (not an AsyncMock), the await raises a TypeError that is silently swallowed by _delete_pkce_verifier's own except Exception.

The cleanup branch is therefore never actually exercised. A regression that breaks the cleanup dispatch in this path would pass the test silently.

Fix: configure async_delete_cache as an AsyncMock and assert it is called:

mock_cache.async_delete_cache = AsyncMock()
# … run the test …
mock_cache.async_delete_cache.assert_called_once()

This matches the pattern already used correctly in test_prepare_token_exchange_parameters (around line 3142).

@ishaan-jaff
Contributor Author

@greptile review

Comment on lines +3420 to +3430
async def fake_post(*args, **kwargs):
# Verify Basic Auth is set
assert "auth" in kwargs
assert isinstance(kwargs["auth"], httpx.BasicAuth)
# Verify code_verifier is in the POST body (essential PKCE field)
assert kwargs.get("data", {}).get("code_verifier") == "verifier_abc"
# Verify credentials are NOT double-sent in the POST body when using Basic Auth
post_data = kwargs.get("data", {})
assert "client_secret" not in post_data, "client_secret must not appear in POST body when using Basic Auth"
assert "client_id" not in post_data, "client_id must not appear in POST body when using Basic Auth (include_client_id=False)"
return mock_response
Contributor

redirect_uri not asserted in POST body

fake_post thoroughly validates credential placement (auth, code_verifier, absent client_id/client_secret) but does not assert that redirect_uri was forwarded when redirect_url is non-None. Since the production code includes redirect_uri in token_data only via the if redirect_url: guard at line 2780, accidentally removing that guard would silently omit redirect_uri from the POST body — and many OAuth providers reject requests with a redirect_uri_mismatch error in that case.

Add the assertion inside fake_post:

Suggested change
async def fake_post(*args, **kwargs):
# Verify Basic Auth is set
assert "auth" in kwargs
assert isinstance(kwargs["auth"], httpx.BasicAuth)
# Verify code_verifier is in the POST body (essential PKCE field)
assert kwargs.get("data", {}).get("code_verifier") == "verifier_abc"
# Verify credentials are NOT double-sent in the POST body when using Basic Auth
post_data = kwargs.get("data", {})
assert "client_secret" not in post_data, "client_secret must not appear in POST body when using Basic Auth"
assert "client_id" not in post_data, "client_id must not appear in POST body when using Basic Auth (include_client_id=False)"
return mock_response
async def fake_post(*args, **kwargs):
# Verify Basic Auth is set
assert "auth" in kwargs
assert isinstance(kwargs["auth"], httpx.BasicAuth)
# Verify code_verifier is in the POST body (essential PKCE field)
assert kwargs.get("data", {}).get("code_verifier") == "verifier_abc"
# Verify redirect_uri is forwarded to the token endpoint
assert kwargs.get("data", {}).get("redirect_uri") == "https://proxy.example.com/callback", "redirect_uri must be forwarded to the token endpoint"
# Verify credentials are NOT double-sent in the POST body when using Basic Auth
post_data = kwargs.get("data", {})
assert "client_secret" not in post_data, "client_secret must not appear in POST body when using Basic Auth"
assert "client_id" not in post_data, "client_id must not appear in POST body when using Basic Auth (include_client_id=False)"
return mock_response

The same gap exists in test_pkce_token_exchange_credentials_in_body at line 3481 and in test_pkce_token_exchange_public_client_no_secret.

Comment on lines +3674 to +3675
assert "unavailable" in exc_info.value.message.lower() or "no userinfo" in exc_info.value.message.lower() or "userinfo" in exc_info.value.message.lower()
assert str(exc_info.value.code) == "401"
Contributor

Over-permissive message assertion

The message check "unavailable" in ... or "no userinfo" in ... or "userinfo" in exc_info.value.message.lower() will pass for literally any ProxyException whose message contains the word "userinfo", including unrelated errors. The third clause is so broad that it adds no signal.

The actual error raised by _get_pkce_userinfo when userinfo_endpoint is set but the endpoint returns empty/null and no id_token is present is:

"SSO user info unavailable: userinfo endpoint failed and no id_token was present in the token response."

Tighten the assertion to match the actual message that should be raised:

Suggested change
assert "unavailable" in exc_info.value.message.lower() or "no userinfo" in exc_info.value.message.lower() or "userinfo" in exc_info.value.message.lower()
assert str(exc_info.value.code) == "401"
assert "userinfo endpoint failed" in exc_info.value.message.lower() or "unavailable" in exc_info.value.message.lower()
assert str(exc_info.value.code) == "401"

Comment on lines +2982 to +2985
verbose_proxy_logger.warning(
"Userinfo endpoint returned %s, falling back to id_token",
resp.status_code,
)
Contributor

Non-200 userinfo response body omitted from diagnostic log

The warning logs only the HTTP status code when the userinfo endpoint returns a non-200 response. Without the response body it is impossible to distinguish "401 - token expired", "403 - insufficient scope", "429 - rate limited", or "503 - maintenance page" from a log entry alone.

Suggested change
verbose_proxy_logger.warning(
"Userinfo endpoint returned %s, falling back to id_token",
resp.status_code,
)
verbose_proxy_logger.warning(
"Userinfo endpoint returned %s, falling back to id_token. Body: %s",
resp.status_code,
resp.text[:300],
)

user_custom_auth = None
user_custom_key_generate = None
# Sentinel: prevents PKCE-no-Redis advisory from re-logging on config hot-reload
_pkce_no_redis_warning_emitted: bool = False
Contributor

Module-level sentinel persists state across tests in the same process

_pkce_no_redis_warning_emitted is a module-level boolean. Once set to True during any test that exercises load_config() with PKCE enabled and no Redis, it stays True for the remainder of the pytest session. Any later test that asserts the advisory warning fires (e.g. a future regression test for the startup warning) will silently fail to observe the log entry because the sentinel prevents it from being emitted.

Consider resetting the sentinel at the top of test setup, or scope it to the load_config call via a local parameter so tests are fully isolated:

_pkce_no_redis_warning_emitted: bool = False

The simplest fix for test authors is to reset the sentinel in test teardown:

import litellm.proxy.proxy_server as ps
ps._pkce_no_redis_warning_emitted = False

If a test for this warning is added, it should include that teardown step.

@ishaan-jaff
Contributor Author

@greptile review

@ishaan-jaff ishaan-jaff merged commit 7697b1c into main Mar 12, 2026
75 of 102 checks passed