Skip to content

fix(security): fix prompt injection detection — async heuristics + inheritance chain#24346

Closed
xr843 wants to merge 3 commits intoBerriAI:mainfrom
xr843:fix/prompt-injection-detection-v2
Closed

fix(security): fix prompt injection detection — async heuristics + inheritance chain#24346
xr843 wants to merge 3 commits intoBerriAI:mainfrom
xr843:fix/prompt-injection-detection-v2

Conversation

@xr843
Copy link
Copy Markdown
Contributor

@xr843 xr843 commented Mar 22, 2026

Supersedes #24284 (rebased onto latest main to resolve 267-commit drift and merge conflicts).

Summary

Fixes #19499

Two critical bugs in the built-in prompt injection detection feature:

  • Heuristics check blocks the event loop: check_user_input_similarity() performs CPU-bound SequenceMatcher operations (triple nested loop, O(n*m)) synchronously inside async_pre_call_hook(). With large inputs this blocks the FastAPI event loop for 60-90s, causing K8s health probes to fail and pods to restart. Fix: offload to asyncio.to_thread().

  • LLM API check never executes: _OPTIONAL_PromptInjectionDetection extends CustomLogger, but the proxy dispatcher (during_call_hook in utils.py) only invokes async_moderation_hook() for CustomGuardrail instances — making the llm_api_check code path unreachable. Fix: change base class to CustomGuardrail and call super().__init__() with guardrail_name="prompt_injection_detection" and default_on=True.

Breaking change note

Switching the base class to CustomGuardrail with default_on=True means the proxy's during_call_hook() dispatcher will now invoke async_moderation_hook() for every request where this guardrail is registered. For most users this is a no-op (async_moderation_hook returns None immediately when prompt_injection_params is None or llm_api_check is not enabled). However, users who had llm_api_check=True configured will now start seeing actual LLM API calls during the moderation phase — this was the intended behavior all along, but the previous bug silently skipped it. If you relied on the broken behavior (no LLM moderation despite llm_api_check=True), be aware that enabling this fix will introduce additional latency and LLM API costs per request.

Test plan

  • test_inherits_from_custom_guardrail — verifies the class is now a CustomGuardrail instance
  • test_dispatch_reachable_via_should_run_guardrail — verifies should_run_guardrail returns True for pre_call and during_call events (not just isinstance)
  • test_heuristics_check_does_not_block_event_loop — parametrized test verifying both code paths (heuristics_check=True branch and else branch) offload to asyncio.to_thread
  • Existing tests (test_acompletion_call_type_rejects_prompt_injection, test_acompletion_call_type_allows_safe_prompt) continue to pass

…heritance chain

Fixes BerriAI#19499

Two critical bugs in prompt injection detection:

1. Heuristics check blocks event loop: check_user_input_similarity() is
   a CPU-bound O(n*m) SequenceMatcher operation called synchronously
   inside async_pre_call_hook(). This blocks the FastAPI event loop for
   60-90s on large inputs, causing K8s health probes to fail and pods to
   restart. Fix: offload to asyncio.to_thread().

2. LLM API check never executes: _OPTIONAL_PromptInjectionDetection
   extends CustomLogger, but the proxy dispatcher (during_call_hook in
   utils.py) only invokes async_moderation_hook() for CustomGuardrail
   instances. The llm_api_check code path was therefore unreachable.
   Fix: change base class to CustomGuardrail and call super().__init__()
   with guardrail_name and default_on=True.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@vercel
Copy link
Copy Markdown

vercel bot commented Mar 22, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
litellm Error Error Mar 26, 2026 9:26am

Request Review

@codspeed-hq
Copy link
Copy Markdown
Contributor

codspeed-hq bot commented Mar 22, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing xr843:fix/prompt-injection-detection-v2 (847f2f5) with main (c89496f)

Open in CodSpeed

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 22, 2026

Greptile Summary

This PR fixes two bugs in _OPTIONAL_PromptInjectionDetection: the CPU-bound SequenceMatcher loop now runs in a thread via asyncio.to_thread (preventing event loop blocking and K8s health probe failures), and the base class is changed from CustomLogger to CustomGuardrail so the proxy dispatcher's isinstance gate no longer silently drops async_moderation_hook calls for llm_api_check. The audit_logs.py change is cosmetic only (line-wrapping).

Key observations:

  • The two core fixes are technically correct and well-motivated
  • Three prior-thread concerns remain unaddressed: the backward-incompatible activation of llm_api_check without a feature flag (violating the repo's backward-compat policy), the unscoped event_hook=None causing the guardrail to run for every future GuardrailEventHooks value, and disable_global_guardrail=True now silently bypassing injection detection where it previously had no effect
  • The now-unblocked async_moderation_hook code path has no direct test coverage — the new tests verify dispatch reachability but never exercise the actual LLM API check or its response-parsing logic
  • asyncio and patch are imported inside the test function body rather than at module level

Confidence Score: 2/5

  • Not safe to merge until the backward-incompatibility concern (unexpected LLM costs for llm_api_check=True users), the unscoped event_hook, and the missing async_moderation_hook integration test are addressed.
  • Both underlying bugs are real and the fixes are mechanically correct. However, the base-class promotion to CustomGuardrail with default_on=True and no event_hook restriction activates a previously dead code path for all existing users without a feature flag — violating the repository's explicit backward-compatibility policy. Additionally, the newly reachable async_moderation_hook path (the primary fix motivation) has no direct test coverage, making regression detection weak.
  • litellm/proxy/hooks/prompt_injection_detection.py — the super().__init__() call should restrict event_hook to [pre_call, during_call] and the backward-compat activation of llm_api_check should be gated behind a feature flag per repo policy.

Important Files Changed

Filename Overview
litellm/proxy/hooks/prompt_injection_detection.py Base class changed from CustomLogger to CustomGuardrail with default_on=True, and check_user_input_similarity wrapped in asyncio.to_thread; fixes the event loop blocking and dispatch bugs, but introduces unscoped event_hook and backward-compat concerns already flagged in prior review threads.
litellm/proxy/management_helpers/audit_logs.py Purely cosmetic formatting changes (multi-line expression wrapping); no semantic or behavioral changes.
tests/test_litellm/proxy/hooks/test_prompt_injection_detection.py Three new tests added covering inheritance, dispatch reachability, and asyncio.to_thread usage; all use mocks with no network calls; minor style issue with inlined imports inside one test function.

Sequence Diagram

sequenceDiagram
    participant Client
    participant ProxyUtils as proxy/utils.py<br/>during_call_hook()
    participant Guardrail as _OPTIONAL_PromptInjectionDetection
    participant Thread as Thread Pool<br/>(asyncio.to_thread)
    participant LLMRouter as LLM Router<br/>(llm_api_check)

    Note over ProxyUtils,Guardrail: BEFORE: CustomLogger base — isinstance check fails,<br/>async_moderation_hook never called

    Client->>ProxyUtils: request
    ProxyUtils->>ProxyUtils: isinstance(cb, CustomGuardrail)? ❌ (was CustomLogger)
    Note over ProxyUtils: LLM API check silently skipped

    Note over ProxyUtils,Guardrail: AFTER: CustomGuardrail base — dispatch now works

    Client->>ProxyUtils: request
    ProxyUtils->>Guardrail: isinstance check ✅
    ProxyUtils->>Guardrail: should_run_guardrail(during_call)?
    Guardrail-->>ProxyUtils: True (default_on=True, event_hook=None)
    ProxyUtils->>Guardrail: async_moderation_hook()

    alt llm_api_check=True
        Guardrail->>LLMRouter: acompletion(prompt_injection_system_prompt + user_input)
        LLMRouter-->>Guardrail: response
        alt response contains fail_call_string
            Guardrail-->>Client: HTTP 400 — prompt injection detected
        else
            Guardrail-->>Client: continue
        end
    else llm_api_check=False or params=None
        Guardrail-->>ProxyUtils: None (no-op)
    end

    Note over Guardrail,Thread: async_pre_call_hook — heuristics fix
    Guardrail->>Thread: asyncio.to_thread(check_user_input_similarity)
    Thread-->>Guardrail: is_prompt_attack (bool)
    alt is_prompt_attack
        Guardrail-->>Client: HTTP 400 — prompt injection detected
    end
Loading

Reviews (2): Last reviewed commit: "style: format audit_logs.py with black" | Re-trigger Greptile

Comment on lines +35 to +38
super().__init__(
guardrail_name="prompt_injection_detection",
default_on=True,
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 disable_global_guardrail flag now silently bypasses prompt injection detection

Changing the base class to CustomGuardrail reroutes the pre-call dispatch through _process_guardrail_callback (lines 1369–1390 in proxy/utils.py), which calls callback.should_run_guardrail(data=data, event_type=pre_call) before invoking async_pre_call_hook.

With default_on=True, should_run_guardrail returns False when the request has disable_global_guardrail=True in its metadata. Before this PR, _OPTIONAL_PromptInjectionDetection was a CustomLogger and its async_pre_call_hook was invoked directly (via the elif branch at line 1392), meaning that flag had no effect on prompt injection checks.

This means any request that sets disable_global_guardrail=True will now completely skip the async_pre_call_hook for this guardrail — a silent security gap. This was not the case before. If this bypass is intentional it should be explicitly documented; if not, the should_run_guardrail result should be overridden (or default_on=False used with explicit guardrail registration).

Rule Used: What: avoid backwards-incompatible changes without... (source)

Comment on lines +35 to +38
super().__init__(
guardrail_name="prompt_injection_detection",
default_on=True,
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 event_hook scope unrestricted — guardrail runs for all future event types

event_hook is not passed to super().__init__(), so it defaults to None. Combined with default_on=True, _event_hook_is_event_type in the base class returns True for every GuardrailEventHooks value. This means should_run_guardrail will return True for any new event type added in the future (e.g. post_call, pre_mcp_call, during_mcp_call, etc.) — running async_moderation_hook in contexts where prompt-injection detection is not meaningful.

Explicitly setting event_hook would constrain the scope and make intent clear:

Suggested change
super().__init__(
guardrail_name="prompt_injection_detection",
default_on=True,
)
super().__init__(
guardrail_name="prompt_injection_detection",
default_on=True,
event_hook=[GuardrailEventHooks.pre_call, GuardrailEventHooks.during_call],
)

Comment on lines +29 to +38
class _OPTIONAL_PromptInjectionDetection(CustomGuardrail):
# Class variables or attributes
def __init__(
self,
prompt_injection_params: Optional[LiteLLMPromptInjectionParams] = None,
):
super().__init__(
guardrail_name="prompt_injection_detection",
default_on=True,
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 llm_api_check=True users will now incur unexpected LLM costs — no opt-in flag

Before this PR async_moderation_hook was unreachable because during_call_hook in proxy/utils.py only processes CustomGuardrail instances (not plain CustomLogger). Switching the base class directly enables the LLM check for all users who have llm_api_check=True configured, with no feature flag to opt in.

Per this repository's backward-compatibility rule, behavior-changing fixes that can introduce unexpected costs or latency for existing users should be gated behind a feature flag (e.g. litellm.enable_prompt_injection_llm_moderation_hook = False by default, or a proxy config toggle). As written, any user with llm_api_check=True will start generating additional LLM API calls and costs immediately upon upgrading, with no way to stage the rollout.

Rule Used: What: avoid backwards-incompatible changes without... (source)

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Comment on lines +125 to +126
import asyncio
from unittest.mock import patch
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Inline imports should be at module level

asyncio and patch are imported inside the test function body, inconsistent with the module-level imports used everywhere else in the file. Moving them to the top of the file improves readability and follows standard Python convention.

Suggested change
import asyncio
from unittest.mock import patch
import asyncio
from unittest.mock import patch
detector = _OPTIONAL_PromptInjectionDetection(

Should instead be placed at the top of the file alongside the other imports, and the two lines above removed from inside the function.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines +77 to +93
def test_dispatch_reachable_via_should_run_guardrail():
"""
Fix for https://github.com/BerriAI/litellm/issues/19499

The during_call_hook() dispatcher in proxy/utils.py guards its loop
with `isinstance(callback, CustomGuardrail)` and then calls
`callback.should_run_guardrail(data, event_type=during_call)`.

Verify that should_run_guardrail returns True for during_call so
async_moderation_hook is actually reachable from the dispatcher.
"""
detector = _OPTIONAL_PromptInjectionDetection()
# default_on=True + event_hook=None means it should run for all event types
assert detector.should_run_guardrail(
data={},
event_type=GuardrailEventHooks.during_call,
) is True
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 async_moderation_hook (during_call) remains untested after being unblocked

This PR's core motivation is that async_moderation_hook was previously unreachable because during_call_hook() in proxy/utils.py only dispatches to CustomGuardrail instances. Now that the class is a CustomGuardrail, async_moderation_hook is reachable for the first time — but there is no test that actually exercises it.

test_dispatch_reachable_via_should_run_guardrail only checks that should_run_guardrail returns True for during_call; it never invokes async_moderation_hook. A test that calls async_moderation_hook with llm_api_check=True (mocking self.llm_router.acompletion) would verify that the fix actually reaches the LLM API check logic end-to-end and that the response parsing (llm_api_fail_call_string comparison) behaves correctly.

@xr843
Copy link
Copy Markdown
Contributor Author

xr843 commented Mar 23, 2026

@themavik Thank you for the thorough review! Addressing your two points:

1. asyncio.to_thread and the default thread pool

Good catch. In practice, prompt injection detection runs at most once per request during the pre_call hook, and the check_user_input_similarity call is relatively fast (it's a SequenceMatcher comparison against a small, fixed list of known injection patterns). Under realistic workloads this won't saturate the default ThreadPoolExecutor. That said, it's a valid forward-looking concern — if the reference pattern list grows significantly or request volumes spike, a dedicated bounded executor would be the right mitigation. I'll keep this in mind for a follow-up optimization if needed.

2. Race condition on prompt_injection_messages

No race condition here. prompt_injection_messages is set once during __init__ (from the config) and is never mutated afterward — it's effectively read-only for the lifetime of the guardrail instance. There are no code paths that append to or modify that list after initialization, so concurrent reads from the thread pool are safe without any synchronization.

Thanks again for the careful review!

…hooks, add async_moderation_hook tests

- Override should_run_guardrail() so prompt injection detection cannot
  be bypassed via disable_global_guardrail (security fix)
- Scope event_hook to [pre_call, during_call] instead of running on
  all event types (post_call, logging_only are not relevant)
- Add end-to-end tests for async_moderation_hook (injection detected
  and safe prompt paths)
- Add tests for bypass resistance and event hook scoping
- Move inline imports (asyncio, unittest.mock) to module top level

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 26, 2026

Too many files changed for review. (3000 files found, 100 file limit)

@xr843
Copy link
Copy Markdown
Contributor Author

xr843 commented Mar 28, 2026

All review feedback has been addressed in commit 02d9a08:

  • Backward compatibility: should_run_guardrail() override + event hooks scoped to [pre_call, during_call] only
  • Bypass resistance: disable_global_guardrail flag is deliberately ignored for security
  • Async test coverage: Added test_async_moderation_hook_detects_injection_via_llm_api() and test_async_moderation_hook_allows_safe_prompt()
  • Import style: Moved to module top level

The remaining CI failures (Vercel deployment, CircleCI auto-create-config) are infrastructure issues common to fork PRs, not code-related.

Ready for re-review. 🙏

@xr843
Copy link
Copy Markdown
Contributor Author

xr843 commented Mar 28, 2026

Hi @yuneng-berri, all previous review feedback has been addressed in the latest commits — backward-compat guard, scoped event hooks, additional test coverage, and import style fixes. Could you take another look when you get a chance? Thanks!

@xr843
Copy link
Copy Markdown
Contributor Author

xr843 commented Apr 3, 2026

Closing this PR as I'm refocusing my contributions on fewer projects for deeper engagement. Thank you for your time reviewing. Feel free to cherry-pick the changes if they're still useful.

@xr843 xr843 closed this Apr 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Prompt Injection Detection Issues

1 participant