fix(security): fix prompt injection detection — async heuristics + inheritance chain #24346
xr843 wants to merge 3 commits into BerriAI:main
Conversation
…heritance chain

Fixes BerriAI#19499

Two critical bugs in prompt injection detection:

1. Heuristics check blocks event loop: `check_user_input_similarity()` is a CPU-bound O(n*m) `SequenceMatcher` operation called synchronously inside `async_pre_call_hook()`. This blocks the FastAPI event loop for 60-90s on large inputs, causing K8s health probes to fail and pods to restart. Fix: offload to `asyncio.to_thread()`.

2. LLM API check never executes: `_OPTIONAL_PromptInjectionDetection` extends `CustomLogger`, but the proxy dispatcher (`during_call_hook` in `utils.py`) only invokes `async_moderation_hook()` for `CustomGuardrail` instances. The `llm_api_check` code path was therefore unreachable. Fix: change base class to `CustomGuardrail` and call `super().__init__()` with `guardrail_name` and `default_on=True`.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
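The offload described in fix (1) can be sketched as follows. This is a minimal illustration of the pattern, not litellm's actual code: `check_user_input_similarity` here is a simplified stand-in with an assumed signature and threshold.

```python
import asyncio
from difflib import SequenceMatcher

# Simplified stand-in for the CPU-bound O(n*m) similarity check; the real
# litellm helper differs, this only illustrates the offload pattern.
def check_user_input_similarity(user_input: str, known_attacks: list) -> bool:
    return any(
        SequenceMatcher(None, user_input, attack).ratio() > 0.7
        for attack in known_attacks
    )

async def async_pre_call_hook_sketch(user_input: str, known_attacks: list) -> bool:
    # Run the blocking comparison in the default thread pool so the
    # FastAPI event loop stays responsive to health probes.
    return await asyncio.to_thread(
        check_user_input_similarity, user_input, known_attacks
    )
```

`asyncio.to_thread` (Python 3.9+) wraps `loop.run_in_executor` with the default `ThreadPoolExecutor`, so the hook needs no executor management of its own.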
Greptile Summary

This PR fixes two bugs in the prompt injection detection hook. Key observations:
Confidence Score: 2/5
| Filename | Overview |
|---|---|
| litellm/proxy/hooks/prompt_injection_detection.py | Base class changed from CustomLogger to CustomGuardrail with default_on=True, and check_user_input_similarity wrapped in asyncio.to_thread; fixes the event loop blocking and dispatch bugs, but introduces unscoped event_hook and backward-compat concerns already flagged in prior review threads. |
| litellm/proxy/management_helpers/audit_logs.py | Purely cosmetic formatting changes (multi-line expression wrapping); no semantic or behavioral changes. |
| tests/test_litellm/proxy/hooks/test_prompt_injection_detection.py | Three new tests added covering inheritance, dispatch reachability, and asyncio.to_thread usage; all use mocks with no network calls; minor style issue with inlined imports inside one test function. |
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant ProxyUtils as proxy/utils.py<br/>during_call_hook()
    participant Guardrail as _OPTIONAL_PromptInjectionDetection
    participant Thread as Thread Pool<br/>(asyncio.to_thread)
    participant LLMRouter as LLM Router<br/>(llm_api_check)
    Note over ProxyUtils,Guardrail: BEFORE: CustomLogger base — isinstance check fails,<br/>async_moderation_hook never called
    Client->>ProxyUtils: request
    ProxyUtils->>ProxyUtils: isinstance(cb, CustomGuardrail)? ❌ (was CustomLogger)
    Note over ProxyUtils: LLM API check silently skipped
    Note over ProxyUtils,Guardrail: AFTER: CustomGuardrail base — dispatch now works
    Client->>ProxyUtils: request
    ProxyUtils->>Guardrail: isinstance check ✅
    ProxyUtils->>Guardrail: should_run_guardrail(during_call)?
    Guardrail-->>ProxyUtils: True (default_on=True, event_hook=None)
    ProxyUtils->>Guardrail: async_moderation_hook()
    alt llm_api_check=True
        Guardrail->>LLMRouter: acompletion(prompt_injection_system_prompt + user_input)
        LLMRouter-->>Guardrail: response
        alt response contains fail_call_string
            Guardrail-->>Client: HTTP 400 — prompt injection detected
        else
            Guardrail-->>Client: continue
        end
    else llm_api_check=False or params=None
        Guardrail-->>ProxyUtils: None (no-op)
    end
    Note over Guardrail,Thread: async_pre_call_hook — heuristics fix
    Guardrail->>Thread: asyncio.to_thread(check_user_input_similarity)
    Thread-->>Guardrail: is_prompt_attack (bool)
    alt is_prompt_attack
        Guardrail-->>Client: HTTP 400 — prompt injection detected
    end
```
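The dispatch gap the diagram shows comes down to a single `isinstance` check. A self-contained sketch with toy classes (these are stand-ins, not litellm's real `CustomLogger`/`CustomGuardrail`):

```python
# Toy stand-ins for the real litellm classes, just to show the dispatch guard.
class CustomLogger:
    pass

class CustomGuardrail(CustomLogger):
    def should_run_guardrail(self, data, event_type):
        return True

class OldDetector(CustomLogger):    # pre-fix base class
    pass

class NewDetector(CustomGuardrail): # post-fix base class
    pass

def during_call_dispatch(callback) -> bool:
    # Mirrors the dispatcher's guard: only CustomGuardrail instances have
    # their async_moderation_hook invoked.
    return isinstance(callback, CustomGuardrail)
```

With the old base class the dispatcher's guard is never satisfied, so the moderation hook is dead code; changing the base class flips the check.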
Reviews (2): Last reviewed commit: "style: format audit_logs.py with black"
```python
super().__init__(
    guardrail_name="prompt_injection_detection",
    default_on=True,
)
```
disable_global_guardrail flag now silently bypasses prompt injection detection
Changing the base class to CustomGuardrail reroutes the pre-call dispatch through _process_guardrail_callback (lines 1369–1390 in proxy/utils.py), which calls callback.should_run_guardrail(data=data, event_type=pre_call) before invoking async_pre_call_hook.
With default_on=True, should_run_guardrail returns False when the request has disable_global_guardrail=True in its metadata. Before this PR, _OPTIONAL_PromptInjectionDetection was a CustomLogger and its async_pre_call_hook was invoked directly (via the elif branch at line 1392), meaning that flag had no effect on prompt injection checks.
This means any request that sets disable_global_guardrail=True will now completely skip the async_pre_call_hook for this guardrail — a silent security gap. This was not the case before. If this bypass is intentional it should be explicitly documented; if not, the should_run_guardrail result should be overridden (or default_on=False used with explicit guardrail registration).
Rule Used: avoid backwards-incompatible changes without...
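One way to close the bypass described above, sketched under the assumption that the subclass can override `should_run_guardrail`. Event types are plain strings here for illustration; litellm uses the `GuardrailEventHooks` enum, and this class is a stand-in, not the real detector.

```python
# Hedged sketch: an override that refuses to honor the per-request
# disable_global_guardrail flag for this security-critical guardrail.
class PromptInjectionGuardrailSketch:
    SCOPED_EVENTS = ("pre_call", "during_call")

    def should_run_guardrail(self, data: dict, event_type: str) -> bool:
        # Deliberately ignore metadata["disable_global_guardrail"]: prompt
        # injection detection should not be bypassable per-request. Still
        # scope to the event types where the check is meaningful.
        return event_type in self.SCOPED_EVENTS
```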
```python
super().__init__(
    guardrail_name="prompt_injection_detection",
    default_on=True,
)
```
event_hook scope unrestricted — guardrail runs for all future event types
event_hook is not passed to super().__init__(), so it defaults to None. Combined with default_on=True, _event_hook_is_event_type in the base class returns True for every GuardrailEventHooks value. This means should_run_guardrail will return True for any new event type added in the future (e.g. post_call, pre_mcp_call, during_mcp_call, etc.) — running async_moderation_hook in contexts where prompt-injection detection is not meaningful.
Explicitly setting event_hook would constrain the scope and make intent clear:
```diff
 super().__init__(
     guardrail_name="prompt_injection_detection",
     default_on=True,
+    event_hook=[GuardrailEventHooks.pre_call, GuardrailEventHooks.during_call],
 )
```
```python
class _OPTIONAL_PromptInjectionDetection(CustomGuardrail):
    # Class variables or attributes
    def __init__(
        self,
        prompt_injection_params: Optional[LiteLLMPromptInjectionParams] = None,
    ):
        super().__init__(
            guardrail_name="prompt_injection_detection",
            default_on=True,
        )
```
llm_api_check=True users will now incur unexpected LLM costs — no opt-in flag
Before this PR async_moderation_hook was unreachable because during_call_hook in proxy/utils.py only processes CustomGuardrail instances (not plain CustomLogger). Switching the base class directly enables the LLM check for all users who have llm_api_check=True configured, with no feature flag to opt in.
Per this repository's backward-compatibility rule, behavior-changing fixes that can introduce unexpected costs or latency for existing users should be gated behind a feature flag (e.g. litellm.enable_prompt_injection_llm_moderation_hook = False by default, or a proxy config toggle). As written, any user with llm_api_check=True will start generating additional LLM API calls and costs immediately upon upgrading, with no way to stage the rollout.
Rule Used: avoid backwards-incompatible changes without...
```python
import asyncio
from unittest.mock import patch
```
Inline imports should be at module level
asyncio and patch are imported inside the test function body, inconsistent with the module-level imports used everywhere else in the file. Moving them to the top of the file improves readability and follows standard Python convention.
```diff
-    import asyncio
-    from unittest.mock import patch
     detector = _OPTIONAL_PromptInjectionDetection(
```
Should instead be placed at the top of the file alongside the other imports, and the two lines above removed from inside the function.
```python
def test_dispatch_reachable_via_should_run_guardrail():
    """
    Fix for https://github.com/BerriAI/litellm/issues/19499

    The during_call_hook() dispatcher in proxy/utils.py guards its loop
    with `isinstance(callback, CustomGuardrail)` and then calls
    `callback.should_run_guardrail(data, event_type=during_call)`.

    Verify that should_run_guardrail returns True for during_call so
    async_moderation_hook is actually reachable from the dispatcher.
    """
    detector = _OPTIONAL_PromptInjectionDetection()
    # default_on=True + event_hook=None means it should run for all event types
    assert detector.should_run_guardrail(
        data={},
        event_type=GuardrailEventHooks.during_call,
    ) is True
```
async_moderation_hook (during_call) remains untested after being unblocked
This PR's core motivation is that async_moderation_hook was previously unreachable because during_call_hook() in proxy/utils.py only dispatches to CustomGuardrail instances. Now that the class is a CustomGuardrail, async_moderation_hook is reachable for the first time — but there is no test that actually exercises it.
test_dispatch_reachable_via_should_run_guardrail only checks that should_run_guardrail returns True for during_call; it never invokes async_moderation_hook. A test that calls async_moderation_hook with llm_api_check=True (mocking self.llm_router.acompletion) would verify that the fix actually reaches the LLM API check logic end-to-end and that the response parsing (llm_api_fail_call_string comparison) behaves correctly.
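A hedged sketch of the missing end-to-end test, using a stub in place of litellm's real router. The attribute names (`llm_router.acompletion`, `llm_api_fail_call_string`) follow the review discussion, but `DetectorSketch` is a stand-in, not the real `_OPTIONAL_PromptInjectionDetection`.

```python
import asyncio
from unittest.mock import AsyncMock

class DetectorSketch:
    llm_api_fail_call_string = "prompt injection detected"

    def __init__(self, llm_router):
        self.llm_router = llm_router

    async def async_moderation_hook(self, user_input: str) -> bool:
        # Ask the moderation model, then mirror the fail_call_string
        # comparison described in the sequence diagram above.
        response_text = await self.llm_router.acompletion(user_input)
        return self.llm_api_fail_call_string in response_text

# AsyncMock stands in for the router, so no network calls are made.
router = AsyncMock()
router.acompletion.return_value = "prompt injection detected"
flagged = asyncio.run(
    DetectorSketch(router).async_moderation_hook("ignore all prior rules")
)
```

The same pattern with `router.acompletion.return_value` set to a benign string exercises the safe-prompt path.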
@themavik Thank you for the thorough review! Addressing your two points:

1. Good catch. In practice, prompt injection detection runs at most once per request during the …
2. Race condition on …: no race condition here.

Thanks again for the careful review!
…hooks, add async_moderation_hook tests

- Override should_run_guardrail() so prompt injection detection cannot be bypassed via disable_global_guardrail (security fix)
- Scope event_hook to [pre_call, during_call] instead of running on all event types (post_call, logging_only are not relevant)
- Add end-to-end tests for async_moderation_hook (injection detected and safe prompt paths)
- Add tests for bypass resistance and event hook scoping
- Move inline imports (asyncio, unittest.mock) to module top level

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Too many files changed for review.
All review feedback has been addressed in commit 02d9a08.

The remaining CI failures (Vercel deployment, CircleCI auto-create-config) are infrastructure issues common to fork PRs, not code-related. Ready for re-review. 🙏
Hi @yuneng-berri, all previous review feedback has been addressed in the latest commits — backward-compat guard, scoped event hooks, additional test coverage, and import style fixes. Could you take another look when you get a chance? Thanks!
Closing this PR as I'm refocusing my contributions on fewer projects for deeper engagement. Thank you for your time reviewing. Feel free to cherry-pick the changes if they're still useful.
Supersedes #24284 (rebased onto latest main to resolve 267-commit drift and merge conflicts).
Summary
Fixes #19499
Two critical bugs in the built-in prompt injection detection feature:
1. **Heuristics check blocks the event loop:** `check_user_input_similarity()` performs CPU-bound `SequenceMatcher` operations (triple nested loop, O(n*m)) synchronously inside `async_pre_call_hook()`. With large inputs this blocks the FastAPI event loop for 60-90s, causing K8s health probes to fail and pods to restart. Fix: offload to `asyncio.to_thread()`.

2. **LLM API check never executes:** `_OPTIONAL_PromptInjectionDetection` extends `CustomLogger`, but the proxy dispatcher (`during_call_hook` in `utils.py`) only invokes `async_moderation_hook()` for `CustomGuardrail` instances — making the `llm_api_check` code path unreachable. Fix: change base class to `CustomGuardrail` and call `super().__init__()` with `guardrail_name="prompt_injection_detection"` and `default_on=True`.

Breaking change note

Switching the base class to `CustomGuardrail` with `default_on=True` means the proxy's `during_call_hook()` dispatcher will now invoke `async_moderation_hook()` for every request where this guardrail is registered. For most users this is a no-op (`async_moderation_hook` returns `None` immediately when `prompt_injection_params is None` or `llm_api_check` is not enabled). However, users who had `llm_api_check=True` configured will now start seeing actual LLM API calls during the moderation phase — this was the intended behavior all along, but the previous bug silently skipped it. If you relied on the broken behavior (no LLM moderation despite `llm_api_check=True`), be aware that enabling this fix will introduce additional latency and LLM API costs per request.

Test plan

- `test_inherits_from_custom_guardrail` — verifies the class is now a `CustomGuardrail` instance
- `test_dispatch_reachable_via_should_run_guardrail` — verifies `should_run_guardrail` returns `True` for `pre_call` and `during_call` events (not just `isinstance`)
- `test_heuristics_check_does_not_block_event_loop` — parametrized test verifying both code paths (`heuristics_check=True` branch and `else` branch) offload to `asyncio.to_thread`
- Existing tests (`test_acompletion_call_type_rejects_prompt_injection`, `test_acompletion_call_type_allows_safe_prompt`) continue to pass
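The `asyncio.to_thread` assertion in the test plan can be sketched like this; `run_heuristics` is a stand-in for the hook under test, not litellm code.

```python
import asyncio
from unittest.mock import AsyncMock, patch

async def run_heuristics(check, user_input: str) -> bool:
    # The fixed hook awaits asyncio.to_thread instead of calling check() inline.
    return await asyncio.to_thread(check, user_input)

# Patch asyncio.to_thread so the test can assert the offload actually happened.
with patch("asyncio.to_thread", new=AsyncMock(return_value=True)) as to_thread_mock:
    result = asyncio.run(run_heuristics(lambda s: True, "some long input"))
```

Because the hook looks up `asyncio.to_thread` at call time, patching the module attribute is enough; `AsyncMock` makes the patched replacement awaitable.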