fix(sso): direct PKCE token exchange + Redis wiring for multi-instance SSO #22923
ishaan-jaff merged 62 commits into main from
Conversation
…ulti-instance SSO

When PKCE is enabled, bypass fastapi-sso and perform a direct token exchange so code_verifier is correctly included. Store PKCE verifiers as a dict in the cache for proper JSON serialization in Redis. Wire user_api_key_cache to Redis when available so PKCE verifiers are shared across ECS tasks/pods. Also adds clearer error messages when PKCE is required but not configured.
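The dict-based verifier storage described in the commit message can be sketched as follows. This is a minimal illustration with a hypothetical key layout and helper names (the real implementation goes through LiteLLM's cache abstraction, not a plain dict), showing why a dict survives a JSON round-trip through Redis while staying backward compatible with the legacy plain-string format:

```python
import json

def store_pkce_verifier(cache: dict, state: str, code_verifier: str) -> None:
    # Keyed by the OAuth "state"; a JSON-encoded dict round-trips cleanly
    # through Redis and leaves room for extra metadata later.
    cache[f"pkce_verifier:{state}"] = json.dumps({"code_verifier": code_verifier})

def load_pkce_verifier(cache: dict, state: str):
    raw = cache.get(f"pkce_verifier:{state}")
    if raw is None:
        return None
    data = json.loads(raw)
    if isinstance(data, dict):
        return data.get("code_verifier")
    return data  # legacy plain-string format

cache = {}
store_pkce_verifier(cache, "abc", "ver123")
print(load_pkce_verifier(cache, "abc"))  # ver123
```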
…ler methods

- Move import httpx/jwt to module level (top of file, not inside a function)
- Extract inline PKCE token exchange + userinfo logic into two static methods: _pkce_token_exchange() and _get_pkce_userinfo()
- get_generic_sso_response PKCE path is now a single method call
- Fix double-logging in except block for non-PKCE errors
- Use %-style log formatting (no f-strings in log calls)
Greptile Summary

This PR fixes PKCE token exchange for Okta and other PKCE-required providers by implementing a direct token exchange.

Key implementation:
Code quality observations:
Confidence Score: 4/5
Last reviewed commit: 22e103f
@greptile review
- Fix access_token missing in PKCE path: read from combined_response directly instead of generic_sso.access_token (which is only set by verify_and_process)
- Fix PKCE error hint firing when PKCE is already enabled: only show 'set GENERIC_CLIENT_USE_PKCE=true' advice when code_verifier was absent
- Fix unguarded KeyError on access_token: check for an error field in HTTP 200 responses before accessing token_response['access_token']
- Fix silent empty userinfo: raise ProxyException when both the userinfo endpoint and the id_token fallback produce no user data
- Fix backward-incompatible Redis wiring: only attach Redis to user_api_key_cache when GENERIC_CLIENT_USE_PKCE=true, preserving existing in-memory behaviour
@greptile review
- Fix PKCE error hint: check the env var directly (not code_verifier presence) to distinguish 'PKCE not configured' from 'PKCE enabled but cache miss'
- Fix misleading Redis TTL comment in proxy_server.py
@greptile review
- Fix CRITICAL log firing on every non-PKCE callback: only log when PKCE is enabled
- Remove unused pkce_env_value intermediate variable
- Prefer reusing redis_usage_cache over creating a separate RedisCache instance (avoids losing advanced connection options like SSL, timeouts, db)
@greptile review
- Strip OAuth token credentials from response_convertor input to prevent access_token/id_token appearing in restricted-group error messages
- Reuse a single httpx.AsyncClient for both token exchange and userinfo requests to avoid a second TCP/TLS handshake per SSO callback
- Revert Redis wiring to user_api_key_cache: the PKCE code already uses redis_usage_cache directly; wiring would route all API-key lookups through Redis unnecessarily. Add a startup warning instead when PKCE+Redis mismatch.
- Move _OAUTH_TOKEN_FIELDS to module level
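The credential-stripping step in the first bullet can be sketched as follows. The helper name is hypothetical; the field tuple mirrors the _OAUTH_TOKEN_FIELDS constant the commit mentions:

```python
# Fields that must never reach response_convertor or error messages.
_OAUTH_TOKEN_FIELDS = ("access_token", "id_token", "refresh_token")

def strip_token_fields(combined_response: dict) -> dict:
    # Drop bearer credentials so they cannot leak into error text.
    return {k: v for k, v in combined_response.items() if k not in _OAUTH_TOKEN_FIELDS}

print(strip_token_fields({"sub": "u1", "email": "a@b.c", "access_token": "secret"}))
# {'sub': 'u1', 'email': 'a@b.c'}
```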
@greptile review

@greptile review

@greptile review
```python
if response is None:
    # Should never happen in practice — construction failure is unexpected.
    raise ProxyException(
        message="Token endpoint request did not return a response",
        type=ProxyErrorTypes.auth_error,
        param="token_exchange",
        code=status.HTTP_401_UNAUTHORIZED,
    )
```
response is initialized to None at line 2804 before the async with block. If httpx.AsyncClient() construction fails (line 2805), the exception propagates immediately without entering the block body, leaving response = None. However, that exception would be caught by the outer try/except at line 788 in get_generic_sso_response, preventing execution from ever reaching the guard at line 2820.
The guard at lines 2820–2827 cannot be reached in practice because:
- If AsyncClient() construction succeeds, the async with body is entered
- Inside the body, either response is assigned (line 2807) or a ProxyException is raised (line 2818)
- If AsyncClient() construction fails, the exception propagates before the guard is reached
The comment misleadingly suggests this guard handles the construction-failure case, but that scenario would cause the exception to propagate to the outer handler, not continue to line 2820. The guard is dead code and the accompanying comment creates a false expectation for future maintainers.
Suggested fix: Remove the unreachable guard, or restructure exception handling to actually reach this point (e.g., catch construction failures at a higher level before delegating to _pkce_token_exchange).
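The unreachability argument can be demonstrated with a self-contained sketch. FailingClient is a hypothetical stand-in for an async client whose constructor raises; the local try/except models the outer handler in get_generic_sso_response:

```python
import asyncio

class FailingClient:
    # Hypothetical stand-in for an AsyncClient whose constructor raises.
    def __init__(self):
        raise RuntimeError("construction failed")
    async def __aenter__(self):
        return self
    async def __aexit__(self, *exc):
        return False

async def exchange():
    response = None
    try:
        async with FailingClient():  # constructor raises before __aenter__
            response = "ok"
    except RuntimeError:
        return "outer handler"  # construction failures land here...
    if response is None:  # ...so this guard is never reached
        return "guard"
    return response

print(asyncio.run(exchange()))  # outer handler
```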
@greptile review
```python
if cached_data:
    # Extract code_verifier from dict (stored as dict for JSON serialization)
    if isinstance(cached_data, dict) and "code_verifier" in cached_data:
        code_verifier = cached_data["code_verifier"]
        if not code_verifier:
            # Dict format is correct but value is empty or null. This is
            # a distinct case from an unrecognized format — the entry exists
            # but was stored with an empty/null verifier (data integrity issue).
            _empty_value_in_dict = True
            verbose_proxy_logger.warning(
                "PKCE verifier dict for state '%s' has an empty/null code_verifier "
                "value — may indicate a storage bug. Treating as a cache miss.",
                state,
            )
        else:
            verbose_proxy_logger.debug("PKCE code_verifier retrieved from cache")
    elif isinstance(cached_data, str):
        # Handle legacy format (plain string) for backward compatibility
        code_verifier = cached_data
        verbose_proxy_logger.warning(
            "Retrieved code_verifier in legacy plain-string format. "
            "Future storage will use dict format."
        )
    else:
        # Defer the detailed ERROR log to the strict-mode branch below
        # (which includes state and a diagnostic message). Log at DEBUG
        # here to avoid duplicate ERROR entries in the same request.
        verbose_proxy_logger.debug(
            "Unexpected PKCE verifier cache format (type=%s); skipping.",
            type(cached_data).__name__,
        )
```
```python
if code_verifier:
    # Add code_verifier to token exchange parameters.
    token_params["code_verifier"] = code_verifier
    # Return the cache key so the caller can delete it *after* a
    # successful token exchange (avoids losing the verifier on retry
    # if the exchange fails partway through).
    token_params["_pkce_cache_key"] = cache_key
```
Stale empty-valued cache entry is never cleaned up in non-strict mode.
When a cached dict contains an empty/null code_verifier value, the code logs a warning indicating a potential storage bug, but the stale cache entry is never deleted. The issue: when code_verifier is falsy, the cache key is not returned to the caller, so _delete_pkce_verifier is never called.
The warning message already documents this as "may indicate a storage bug," but provides no cleanup action. Unlike the strict-mode path which raises an error, the non-strict path allows the request to continue while leaving the corrupted cache entry in memory until its 600-second TTL expires.
Consider implementing best-effort cleanup of the stale entry in the non-strict path after logging the warning, so corrupted or partially-stored verifiers are actively removed rather than left to natural TTL expiry.
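The best-effort cleanup suggested above could look like the following sketch. The helper and cache class names are hypothetical; the point is that the eviction swallows its own failures so the SSO callback still proceeds in non-strict mode:

```python
import asyncio

async def evict_stale_verifier(cache, cache_key: str) -> None:
    # Best-effort: a delete failure must never fail the SSO callback itself.
    try:
        await cache.async_delete_cache(cache_key)
    except Exception:
        pass

class FakeCache:
    # Minimal stand-in for the proxy's async cache interface.
    def __init__(self):
        self.deleted = []
    async def async_delete_cache(self, key):
        self.deleted.append(key)

cache = FakeCache()
asyncio.run(evict_stale_verifier(cache, "pkce_verifier:abc"))
print(cache.deleted)  # ['pkce_verifier:abc']
```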
@greptile review
```python
mock_cache = MagicMock()
mock_cache.async_get_cache = AsyncMock(return_value=12345)

mock_request = MagicMock(spec=Request)
mock_request.query_params = {"state": "bad_format_non_strict"}

# Non-strict mode: should log a warning and continue, not raise.
# Use patch.dict with PKCE_STRICT_CACHE_MISS="false" to avoid permanently
# mutating the test process environment with os.environ.pop().
with caplog.at_level(logging.WARNING), patch(
    "litellm.proxy.proxy_server.redis_usage_cache", None
), patch(
    "litellm.proxy.proxy_server.user_api_key_cache", mock_cache
), patch.dict(
    os.environ,
    {"GENERIC_CLIENT_USE_PKCE": "true", "PKCE_STRICT_CACHE_MISS": "false"},
    clear=False,
):
    result = await SSOAuthenticationHandler.prepare_token_exchange_parameters(
        request=mock_request, generic_include_client_id=False
    )

# No raise in non-strict mode; verifier simply absent from params
assert "code_verifier" not in result
assert "_pkce_cache_key" not in result
# Cache was queried (the unexpected format was retrieved and logged at WARNING)
mock_cache.async_get_cache.assert_called_once()
```
Missing async mock setup causes silent cleanup failure in test
This test creates a MagicMock for the cache without setting up async_delete_cache as an AsyncMock. When the test runs with unexpected format data (integer 12345), the production code calls _delete_pkce_verifier() which attempts to await the delete method. Since the mock doesn't provide an awaitable, a TypeError is raised and silently caught by the exception handler — meaning the cleanup path never actually executes.
A regression that breaks the cleanup code would go completely undetected by this test.
Recommendation: Configure the mock to properly handle the async cleanup call:
- Add mock_cache.async_delete_cache = AsyncMock() after line 3919
- Add an assertion after the test to verify cleanup was called with the correct cache key
- This matches the pattern already correctly used in test_prepare_token_exchange_parameters around line 3142
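The failure mode the reviewer describes is easy to reproduce in isolation: awaiting a plain MagicMock attribute raises TypeError, while an AsyncMock is awaitable and assertable. A self-contained sketch (the cleanup function is a stand-in for _delete_pkce_verifier's internal await):

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock

async def cleanup(cache):
    await cache.async_delete_cache("key")

# A plain MagicMock attribute returns a MagicMock, which is not awaitable.
bad = MagicMock()
try:
    asyncio.run(cleanup(bad))
    outcome = "ok"
except TypeError:
    outcome = "TypeError"
print(outcome)  # TypeError

# AsyncMock makes the call awaitable and lets the test assert it happened.
good = MagicMock()
good.async_delete_cache = AsyncMock()
asyncio.run(cleanup(good))
good.async_delete_cache.assert_called_once_with("key")
```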
@greptile review
litellm/proxy/proxy_server.py (Outdated)
```python
# all API-key lookups through Redis.
use_pkce = os.getenv("GENERIC_CLIENT_USE_PKCE", "false").lower() == "true"
if use_pkce and redis_usage_cache is None:
    verbose_proxy_logger.warning(
        "GENERIC_CLIENT_USE_PKCE=true but Redis is not configured for LiteLLM caching. "
        "PKCE verifiers will not be shared across instances — callbacks may land on a "
        "different pod than the login request and fail silently. "
        "Configure Redis via the 'cache' section in your proxy config, "
        "or enable sticky sessions for single-instance deployments. "
        "Set PKCE_STRICT_CACHE_MISS=true to fail fast with a 401 on cache misses "
        "instead of continuing without a code_verifier."
    )
```
PKCE warning fires on every config hot-reload, not just startup
load_config() is called on startup and on every config-file hot-reload. This means the warning at lines 3033-3040 will emit repeatedly in long-running deployments that use dynamic config reloads, cluttering logs with repeated "startup" advisories.
Consider guarding with a module-level sentinel so the warning fires only once per process:

```python
_pkce_redis_warning_emitted = False

def load_config(...):
    ...
    global _pkce_redis_warning_emitted
    use_pkce = os.getenv("GENERIC_CLIENT_USE_PKCE", "false").lower() == "true"
    if use_pkce and redis_usage_cache is None and not _pkce_redis_warning_emitted:
        _pkce_redis_warning_emitted = True
        verbose_proxy_logger.warning(...)
```
@greptile review
```python
# Initialized here so it's visible in the except block for error-hint logic
code_verifier: Optional[str] = None
```
The comment at line 785 states code_verifier is initialized "so it's visible in the except block for error-hint logic," but the except block (lines 879-933) never references this variable. The error-hint logic uses pkce_configured = os.getenv(...) and error_message = str(e) instead. The initialization is harmless but the comment creates a false expectation for future maintainers.
Suggested change:

```python
# Initialize before try block to ensure it's available for use in the PKCE token
# exchange path (line 795 onwards).
code_verifier: Optional[str] = None
```
```python
# Merge: userinfo takes precedence for identity claims (sub, email, name, …) per
# the OpenID Connect spec (userinfo is the authoritative source for identity).
# But bearer credentials (access_token, id_token, refresh_token) must always come
# from the token endpoint, not from userinfo (non-standard providers occasionally
# include these fields in userinfo, which would otherwise shadow the real bearer token).
#
# Three-way merge semantics for each bearer-credential field:
# 1. token_response has a non-null value → use it (token endpoint is authoritative)
# 2. token_response explicitly sent null → remove the key so callers get a clean
#    absence signal; the null from the token endpoint overrides userinfo too
# 3. field absent from token_response → leave whatever userinfo provided as-is
#    (e.g. userinfo-provided id_token from a non-standard provider)
```
The comment at line 2908 states "bearer credentials (access_token, id_token, refresh_token) must always come from the token endpoint, not from userinfo," but Case 3 in the merge loop (lines 2916-2926) explicitly contradicts this by accepting any of these fields from userinfo when absent from token_response.
For example, many providers omit refresh_token from the token response (it's optional). If a non-standard provider includes refresh_token in userinfo, Case 3 would silently accept it into the merged response. A refresh_token field in userinfo is not a standard OIDC field and could allow userinfo providers to inject long-lived credentials.
Update the comment to either:
- Explicitly document Case 3 as an intentional fallback for non-standard providers that embed credentials in userinfo (and note the security implications), or
- Modify the loop to guard refresh_token by accepting it only from token_response, never from userinfo, regardless of key absence
Suggested change:

```python
# Merge: userinfo takes precedence for identity claims (sub, email, name, …) per
# the OpenID Connect spec (userinfo is the authoritative source for identity).
# For bearer credentials, apply a three-way precedence:
# 1. token_response has a non-null value → use it (authoritative)
# 2. token_response explicitly sends null → remove the key
# 3. field absent from token_response → use token_response value only;
#    do not accept bearer credentials from userinfo (non-standard providers
#    must not be trusted to provide access_token, id_token, or refresh_token)
```
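The stricter variant (the second option above) could be sketched as follows. The merge_sso_responses name is hypothetical, not the PR's actual helper; it shows the merge never accepting bearer credentials from userinfo:

```python
_OAUTH_TOKEN_FIELDS = ("access_token", "id_token", "refresh_token")

def merge_sso_responses(token_response: dict, userinfo: dict) -> dict:
    # Identity claims: userinfo wins.
    merged = {**token_response, **userinfo}
    # Bearer credentials: only the token endpoint is trusted.
    for field in _OAUTH_TOKEN_FIELDS:
        if field in token_response and token_response[field] is not None:
            merged[field] = token_response[field]  # case 1: authoritative value
        else:
            merged.pop(field, None)  # cases 2 and 3: explicit null or absent
    return merged

print(merge_sso_responses(
    {"access_token": "tok", "refresh_token": None},
    {"sub": "u1", "refresh_token": "injected", "id_token": "injected"},
))
# {'access_token': 'tok', 'sub': 'u1'}
```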
@greptile review
```python
if strict_cache_miss:
    # Distinguish empty-value dicts, corrupt-format entries, and genuine
    # cache misses so operators can investigate the correct root cause.
    if _empty_value_in_dict:
        # Dict format was correct but code_verifier was empty/null.
        raise ProxyException(
            message=(
                f"PKCE verifier for state '{state}' was found in cache but "
                f"has an empty or null code_verifier value — possible storage bug."
            ),
            type=ProxyErrorTypes.auth_error,
            param="PKCE_CACHE_MISS",
            code=status.HTTP_401_UNAUTHORIZED,
        )
    elif cached_data is not None:
        # Cache had data but in an unrecognised format (e.g. corrupt Redis value).
        verbose_proxy_logger.error(
            "PKCE verifier for state '%s' has an unrecognized format (type=%s); "
            "treating as a cache miss. Investigate the cached value — it may be "
            "a corrupt or stale entry.",
            state,
            type(cached_data).__name__,
        )
        raise ProxyException(
            message=(
                f"PKCE verifier for state '{state}' has an unrecognized format "
                f"(type={type(cached_data).__name__}). The cached entry may be corrupt."
            ),
            type=ProxyErrorTypes.auth_error,
            param="PKCE_CACHE_MISS",
            code=status.HTTP_401_UNAUTHORIZED,
        )
```
Corrupt verifier not cleaned up in strict mode
In non-strict mode (line 2682), when cached_data is not None but unusable, _delete_pkce_verifier is called to evict the corrupt entry so it doesn't linger until TTL expiry. The strict-mode branches at lines 2620–2648 raise ProxyException for the same conditions (_empty_value_in_dict and unrecognized format) but skip that cleanup call.
The corrupt entry stays in Redis for up to 600 s. In strict mode operators are already aware something is wrong and may re-examine the cache for diagnosis — a dangling entry makes that harder. The fix is symmetric with the non-strict cleanup:
```python
if _empty_value_in_dict:
    await SSOAuthenticationHandler._delete_pkce_verifier(cache_key)  # best-effort
    raise ProxyException(
        message=...,
        ...
    )
elif cached_data is not None:
    await SSOAuthenticationHandler._delete_pkce_verifier(cache_key)  # best-effort
    verbose_proxy_logger.error(...)
    raise ProxyException(...)
```

Since _delete_pkce_verifier already swallows its own exceptions, this does not change the raise semantics.
```python
# The try/except is INSIDE the async with so that TLS teardown exceptions
# from __aexit__ propagate as-is and are NOT mis-labelled as "Token endpoint
# request failed". httpx buffers the full response body before __aexit__,
# so status_code / text / json() remain valid after the context exits.
async with httpx.AsyncClient() as http_client:
    try:
        response = await http_client.post(token_endpoint, **post_kwargs)
    except Exception as exc:
        # Catch network-level errors (SSL, DNS, TCP, timeout, etc.) and
        # wrap them as a clean ProxyException rather than leaking raw
        # httpx or OS exceptions to callers.
        verbose_proxy_logger.error("PKCE token endpoint unreachable: %s", exc)
        raise ProxyException(
            message=f"Token endpoint request failed: {exc}",
            type=ProxyErrorTypes.auth_error,
            param="token_exchange",
            code=status.HTTP_401_UNAUTHORIZED,
        ) from exc

# Response processing outside the async with — httpx buffers the full
# response body so status_code / text / json() remain valid after __aexit__.
if response.status_code != 200:
    verbose_proxy_logger.error(
        "PKCE token exchange failed. status=%s body=%s",
        response.status_code,
        response.text[:500],
    )
    raise ProxyException(
        message=f"Token exchange failed: {response.status_code} - {response.text[:500]}",
        type=ProxyErrorTypes.auth_error,
        param="token_exchange",
```
response consumed outside async with block — comment contradicts actual scope
response is assigned inside the async with httpx.AsyncClient() as http_client: block (line ~2807) but consumed (.status_code, .text, .json()) after the block closes. The inline comment correctly notes that httpx buffers the full response body before __aexit__, so reading these attributes after the client is closed is safe today.
The risk is forward-maintenance: a future refactor to streaming (stream=True) would silently break these post-block accesses. Consider moving the non-200 check, JSON parse, and access_token validation inside the async with block to make the lifetime dependency explicit, or at minimum add a short explanatory comment near the consumption site reinforcing the buffering guarantee:
```python
# response.status_code / .text / .json() are safe here: httpx buffers the full
# body before __aexit__, so these attributes remain valid after the client closes.
if response.status_code != 200:
```

```python
mock_cache = MagicMock()
mock_cache.async_get_cache = AsyncMock(return_value=12345)
mock_cache.async_delete_cache = AsyncMock()

mock_request = MagicMock(spec=Request)
mock_request.query_params = {"state": "bad_format_non_strict"}

# Non-strict mode: should log a warning and continue, not raise.
# Use patch.dict with PKCE_STRICT_CACHE_MISS="false" to avoid permanently
# mutating the test process environment with os.environ.pop().
with caplog.at_level(logging.WARNING), patch(
    "litellm.proxy.proxy_server.redis_usage_cache", None
), patch(
    "litellm.proxy.proxy_server.user_api_key_cache", mock_cache
), patch.dict(
    os.environ,
    {"GENERIC_CLIENT_USE_PKCE": "true", "PKCE_STRICT_CACHE_MISS": "false"},
    clear=False,
):
    result = await SSOAuthenticationHandler.prepare_token_exchange_parameters(
        request=mock_request, generic_include_client_id=False
    )
```
async_delete_cache not set up as AsyncMock — cleanup call silently swallowed
mock_cache is a plain MagicMock(). When the non-strict path calls await SSOAuthenticationHandler._delete_pkce_verifier(cache_key), it internally awaits active_cache.async_delete_cache(...). Because mock_cache.async_delete_cache is a regular MagicMock (not an AsyncMock), the await raises a TypeError that is silently swallowed by _delete_pkce_verifier's own except Exception.
The cleanup branch is therefore never actually exercised. A regression that breaks the cleanup dispatch in this path would pass the test silently.
Fix: configure async_delete_cache as an AsyncMock and assert it is called:

```python
mock_cache.async_delete_cache = AsyncMock()
# … run the test …
mock_cache.async_delete_cache.assert_called_once()
```

This matches the pattern already used correctly in test_prepare_token_exchange_parameters (around line 3142).
@greptile review
```python
async def fake_post(*args, **kwargs):
    # Verify Basic Auth is set
    assert "auth" in kwargs
    assert isinstance(kwargs["auth"], httpx.BasicAuth)
    # Verify code_verifier is in the POST body (essential PKCE field)
    assert kwargs.get("data", {}).get("code_verifier") == "verifier_abc"
    # Verify credentials are NOT double-sent in the POST body when using Basic Auth
    post_data = kwargs.get("data", {})
    assert "client_secret" not in post_data, "client_secret must not appear in POST body when using Basic Auth"
    assert "client_id" not in post_data, "client_id must not appear in POST body when using Basic Auth (include_client_id=False)"
    return mock_response
```
redirect_uri not asserted in POST body
fake_post thoroughly validates credential placement (auth, code_verifier, absent client_id/client_secret) but does not assert that redirect_uri was forwarded when redirect_url is non-None. Since the production code includes redirect_uri in token_data only via the if redirect_url: guard at line 2780, accidentally removing that guard would silently omit redirect_uri from the POST body — and many OAuth providers reject requests with a redirect_uri_mismatch error in that case.
Add the assertion inside fake_post:

```python
async def fake_post(*args, **kwargs):
    # Verify Basic Auth is set
    assert "auth" in kwargs
    assert isinstance(kwargs["auth"], httpx.BasicAuth)
    # Verify code_verifier is in the POST body (essential PKCE field)
    assert kwargs.get("data", {}).get("code_verifier") == "verifier_abc"
    # Verify redirect_uri is forwarded to the token endpoint
    assert kwargs.get("data", {}).get("redirect_uri") == "https://proxy.example.com/callback", "redirect_uri must be forwarded to the token endpoint"
    # Verify credentials are NOT double-sent in the POST body when using Basic Auth
    post_data = kwargs.get("data", {})
    assert "client_secret" not in post_data, "client_secret must not appear in POST body when using Basic Auth"
    assert "client_id" not in post_data, "client_id must not appear in POST body when using Basic Auth (include_client_id=False)"
    return mock_response
```
The same gap exists in test_pkce_token_exchange_credentials_in_body at line 3481 and in test_pkce_token_exchange_public_client_no_secret.
```python
assert "unavailable" in exc_info.value.message.lower() or "no userinfo" in exc_info.value.message.lower() or "userinfo" in exc_info.value.message.lower()
assert str(exc_info.value.code) == "401"
```
Over-permissive message assertion
The message check "unavailable" in ... or "no userinfo" in ... or "userinfo" in exc_info.value.message.lower() will pass for literally any ProxyException whose message contains the word "userinfo", including unrelated errors. The third clause is so broad that it adds no signal.
The actual error raised by _get_pkce_userinfo when userinfo_endpoint is set but the endpoint returns empty/null and no id_token is present is:
"SSO user info unavailable: userinfo endpoint failed and no id_token was present in the token response."
Tighten the assertion to match the actual message that should be raised:

```python
assert "userinfo endpoint failed" in exc_info.value.message.lower() or "unavailable" in exc_info.value.message.lower()
assert str(exc_info.value.code) == "401"
```
```python
verbose_proxy_logger.warning(
    "Userinfo endpoint returned %s, falling back to id_token",
    resp.status_code,
)
```
Non-200 userinfo response body omitted from diagnostic log
The warning logs only the HTTP status code when the userinfo endpoint returns a non-200 response. Without the response body it is impossible to distinguish "401 - token expired", "403 - insufficient scope", "429 - rate limited", or "503 - maintenance page" from a log entry alone.
Suggested change:

```python
verbose_proxy_logger.warning(
    "Userinfo endpoint returned %s, falling back to id_token. Body: %s",
    resp.status_code,
    resp.text[:300],
)
```
```python
user_custom_auth = None
user_custom_key_generate = None
# Sentinel: prevents PKCE-no-Redis advisory from re-logging on config hot-reload
_pkce_no_redis_warning_emitted: bool = False
```
Module-level sentinel persists state across tests in the same process
_pkce_no_redis_warning_emitted is a module-level boolean. Once set to True during any test that exercises load_config() with PKCE enabled and no Redis, it stays True for the remainder of the pytest session. Any later test that asserts the advisory warning fires (e.g. a future regression test for the startup warning) will silently fail to observe the log entry because the sentinel prevents it from being emitted.
Consider resetting the sentinel at the top of test setup, or scope it to the load_config call via a local parameter so tests are fully isolated. The simplest fix for test authors is to reset the sentinel in test teardown:

```python
import litellm.proxy.proxy_server as ps
ps._pkce_no_redis_warning_emitted = False
```

If a test for this warning is added, it should include that teardown step.
@greptile review
Relevant issues
Fixes: PKCE (Proof Key for Code Exchange) token exchange failing with Okta and other PKCE-required providers because fastapi-sso drops the code_verifier parameter.
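For context, the PKCE S256 pair from RFC 7636 can be generated with the standard library alone. This is a generic illustration (not LiteLLM's code): the login request sends the code_challenge, and the callback must send the matching code_verifier, which is the parameter being dropped:

```python
import base64
import hashlib
import secrets

def make_pkce_pair():
    # code_verifier: 43-128 chars of URL-safe base64 (no padding), per RFC 7636.
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    # S256 challenge: base64url(sha256(verifier)) without padding.
    challenge = base64.urlsafe_b64encode(
        hashlib.sha256(verifier.encode()).digest()
    ).rstrip(b"=").decode()
    return verifier, challenge

verifier, challenge = make_pkce_pair()
print(len(verifier), len(challenge))  # 43 43
```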
Pre-Submission checklist
- Tests added in the tests/test_litellm/ directory (adding at least 1 test is a hard requirement - see details)
- make test-unit passes
- CI (LiteLLM team)
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Type
🐛 Bug Fix
Changes
Two fixes for PKCE-based SSO (Okta and other providers):
1. Direct token exchange when PKCE is enabled
fastapi-sso silently drops code_verifier from the token exchange request, causing PKCE-required providers to reject the auth with "invalid_grant". When GENERIC_CLIENT_USE_PKCE=true and a code_verifier is found in cache, the callback now calls the token endpoint directly via httpx (bypassing fastapi-sso.verify_and_process) with the full PKCE parameters.

2. Redis-backed verifier storage for multi-pod deployments
Previously, code_verifier was stored in user_api_key_cache (in-memory). In multi-pod deployments, the login and callback can land on different pods, causing a cache miss. Now redis_usage_cache is used when available (falling back to in-memory for single-instance deploys).

Additional improvements:
- PKCE_STRICT_CACHE_MISS=true opt-in to fail with HTTP 401 on cross-pod cache misses instead of silently proceeding
- Startup warning when GENERIC_CLIENT_USE_PKCE=true but Redis is not configured
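The direct token exchange described in item 1 boils down to a form-encoded POST that carries code_verifier explicitly. A stdlib-only sketch of building that request (names and credential placement are illustrative; the PR's actual implementation sends it via httpx with httpx.BasicAuth):

```python
import base64

def build_token_request(code, code_verifier, redirect_uri, client_id, client_secret):
    # Form body for the authorization_code grant, including the PKCE verifier.
    data = {
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
        "code_verifier": code_verifier,  # the parameter fastapi-sso drops
    }
    # Client credentials go in the Authorization header (Basic auth), not the body.
    creds = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    headers = {
        "Authorization": f"Basic {creds}",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return data, headers

data, headers = build_token_request("c1", "v1", "https://proxy/cb", "cid", "sec")
print("code_verifier" in data, "client_secret" in data)  # True False
```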