fix(security): fix prompt injection detection — async heuristics + inheritance chain by xr843 · Pull Request #24346 · BerriAI/litellm

xr843 · 2026-03-22T07:59:13Z

Supersedes #24284 (rebased onto latest main to resolve 267-commit drift and merge conflicts).

Summary

Fixes #19499

Two critical bugs in the built-in prompt injection detection feature:

Heuristics check blocks the event loop: check_user_input_similarity() performs CPU-bound SequenceMatcher operations (triple nested loop, O(n*m)) synchronously inside async_pre_call_hook(). With large inputs this blocks the FastAPI event loop for 60-90s, causing K8s health probes to fail and pods to restart. Fix: offload to asyncio.to_thread().
LLM API check never executes: _OPTIONAL_PromptInjectionDetection extends CustomLogger, but the proxy dispatcher (during_call_hook in utils.py) only invokes async_moderation_hook() for CustomGuardrail instances — making the llm_api_check code path unreachable. Fix: change base class to CustomGuardrail and call super().__init__() with guardrail_name="prompt_injection_detection" and default_on=True.

Breaking change note

Switching the base class to CustomGuardrail with default_on=True means the proxy's during_call_hook() dispatcher will now invoke async_moderation_hook() for every request where this guardrail is registered. For most users this is a no-op (async_moderation_hook returns None immediately when prompt_injection_params is None or llm_api_check is not enabled). However, users who had llm_api_check=True configured will now start seeing actual LLM API calls during the moderation phase — this was the intended behavior all along, but the previous bug silently skipped it. If you relied on the broken behavior (no LLM moderation despite llm_api_check=True), be aware that enabling this fix will introduce additional latency and LLM API costs per request.

Test plan

test_inherits_from_custom_guardrail — verifies the class is now a CustomGuardrail instance
test_dispatch_reachable_via_should_run_guardrail — verifies should_run_guardrail returns True for pre_call and during_call events (not just isinstance)
test_heuristics_check_does_not_block_event_loop — parametrized test verifying both code paths (heuristics_check=True branch and else branch) offload to asyncio.to_thread
Existing tests (test_acompletion_call_type_rejects_prompt_injection, test_acompletion_call_type_allows_safe_prompt) continue to pass

…heritance chain Fixes BerriAI#19499 Two critical bugs in prompt injection detection: 1. Heuristics check blocks event loop: check_user_input_similarity() is a CPU-bound O(n*m) SequenceMatcher operation called synchronously inside async_pre_call_hook(). This blocks the FastAPI event loop for 60-90s on large inputs, causing K8s health probes to fail and pods to restart. Fix: offload to asyncio.to_thread(). 2. LLM API check never executes: _OPTIONAL_PromptInjectionDetection extends CustomLogger, but the proxy dispatcher (during_call_hook in utils.py) only invokes async_moderation_hook() for CustomGuardrail instances. The llm_api_check code path was therefore unreachable. Fix: change base class to CustomGuardrail and call super().__init__() with guardrail_name and default_on=True. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

vercel · 2026-03-22T07:59:19Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
litellm	Error		Mar 26, 2026 9:26am

codspeed-hq · 2026-03-22T08:01:19Z

Merging this PR will not alter performance

✅ 16 untouched benchmarks

_{Comparing xr843:fix/prompt-injection-detection-v2 (847f2f5) with main (c89496f)}

greptile-apps · 2026-03-22T08:04:05Z

Greptile Summary

This PR fixes two bugs in _OPTIONAL_PromptInjectionDetection: the CPU-bound SequenceMatcher loop now runs in a thread via asyncio.to_thread (preventing event loop blocking and K8s health probe failures), and the base class is changed from CustomLogger to CustomGuardrail so the proxy dispatcher's isinstance gate no longer silently drops async_moderation_hook calls for llm_api_check. The audit_logs.py change is cosmetic only (line-wrapping).

Key observations:

The two core fixes are technically correct and well-motivated
Three prior-thread concerns remain unaddressed: the backward-incompatible activation of llm_api_check without a feature flag (violating the repo's backward-compat policy), the unscoped event_hook=None causing the guardrail to run for every future GuardrailEventHooks value, and disable_global_guardrail=True now silently bypassing injection detection where it previously had no effect
The now-unblocked async_moderation_hook code path has no direct test coverage — the new tests verify dispatch reachability but never exercise the actual LLM API check or its response-parsing logic
asyncio and patch are imported inside the test function body rather than at module level

Confidence Score: 2/5

Not safe to merge until the backward-incompatibility concern (unexpected LLM costs for llm_api_check=True users), the unscoped event_hook, and the missing async_moderation_hook integration test are addressed.
Both underlying bugs are real and the fixes are mechanically correct. However, the base-class promotion to CustomGuardrail with default_on=True and no event_hook restriction activates a previously dead code path for all existing users without a feature flag — violating the repository's explicit backward-compatibility policy. Additionally, the newly reachable async_moderation_hook path (the primary fix motivation) has no direct test coverage, making regression detection weak.
litellm/proxy/hooks/prompt_injection_detection.py — the super().__init__() call should restrict event_hook to [pre_call, during_call] and the backward-compat activation of llm_api_check should be gated behind a feature flag per repo policy.

Important Files Changed

Filename	Overview
litellm/proxy/hooks/prompt_injection_detection.py	Base class changed from CustomLogger to CustomGuardrail with default_on=True, and check_user_input_similarity wrapped in asyncio.to_thread; fixes the event loop blocking and dispatch bugs, but introduces unscoped event_hook and backward-compat concerns already flagged in prior review threads.
litellm/proxy/management_helpers/audit_logs.py	Purely cosmetic formatting changes (multi-line expression wrapping); no semantic or behavioral changes.
tests/test_litellm/proxy/hooks/test_prompt_injection_detection.py	Three new tests added covering inheritance, dispatch reachability, and asyncio.to_thread usage; all use mocks with no network calls; minor style issue with inlined imports inside one test function.

Sequence Diagram

sequenceDiagram
    participant Client
    participant ProxyUtils as proxy/utils.py<br/>during_call_hook()
    participant Guardrail as _OPTIONAL_PromptInjectionDetection
    participant Thread as Thread Pool<br/>(asyncio.to_thread)
    participant LLMRouter as LLM Router<br/>(llm_api_check)

    Note over ProxyUtils,Guardrail: BEFORE: CustomLogger base — isinstance check fails,<br/>async_moderation_hook never called

    Client->>ProxyUtils: request
    ProxyUtils->>ProxyUtils: isinstance(cb, CustomGuardrail)? ❌ (was CustomLogger)
    Note over ProxyUtils: LLM API check silently skipped

    Note over ProxyUtils,Guardrail: AFTER: CustomGuardrail base — dispatch now works

    Client->>ProxyUtils: request
    ProxyUtils->>Guardrail: isinstance check ✅
    ProxyUtils->>Guardrail: should_run_guardrail(during_call)?
    Guardrail-->>ProxyUtils: True (default_on=True, event_hook=None)
    ProxyUtils->>Guardrail: async_moderation_hook()

    alt llm_api_check=True
        Guardrail->>LLMRouter: acompletion(prompt_injection_system_prompt + user_input)
        LLMRouter-->>Guardrail: response
        alt response contains fail_call_string
            Guardrail-->>Client: HTTP 400 — prompt injection detected
        else
            Guardrail-->>Client: continue
        end
    else llm_api_check=False or params=None
        Guardrail-->>ProxyUtils: None (no-op)
    end

    Note over Guardrail,Thread: async_pre_call_hook — heuristics fix
    Guardrail->>Thread: asyncio.to_thread(check_user_input_similarity)
    Thread-->>Guardrail: is_prompt_attack (bool)
    alt is_prompt_attack
        Guardrail-->>Client: HTTP 400 — prompt injection detected
    end

_{Reviews (2): Last reviewed commit: "style: format audit_logs.py with black" | Re-trigger Greptile}

greptile-apps · 2026-03-22T08:04:08Z

+        super().__init__(
+            guardrail_name="prompt_injection_detection",
+            default_on=True,
+        )


disable_global_guardrail flag now silently bypasses prompt injection detection

Changing the base class to CustomGuardrail reroutes the pre-call dispatch through _process_guardrail_callback (lines 1369–1390 in proxy/utils.py), which calls callback.should_run_guardrail(data=data, event_type=pre_call) before invoking async_pre_call_hook.

With default_on=True, should_run_guardrail returns False when the request has disable_global_guardrail=True in its metadata. Before this PR, _OPTIONAL_PromptInjectionDetection was a CustomLogger and its async_pre_call_hook was invoked directly (via the elif branch at line 1392), meaning that flag had no effect on prompt injection checks.

This means any request that sets disable_global_guardrail=True will now completely skip the async_pre_call_hook for this guardrail — a silent security gap. This was not the case before. If this bypass is intentional it should be explicitly documented; if not, the should_run_guardrail result should be overridden (or default_on=False used with explicit guardrail registration).

Rule Used: What: avoid backwards-incompatible changes without... (source)

greptile-apps · 2026-03-22T08:04:09Z

+        super().__init__(
+            guardrail_name="prompt_injection_detection",
+            default_on=True,
+        )


event_hook scope unrestricted — guardrail runs for all future event types

event_hook is not passed to super().__init__(), so it defaults to None. Combined with default_on=True, _event_hook_is_event_type in the base class returns True for every GuardrailEventHooks value. This means should_run_guardrail will return True for any new event type added in the future (e.g. post_call, pre_mcp_call, during_mcp_call, etc.) — running async_moderation_hook in contexts where prompt-injection detection is not meaningful.

Explicitly setting event_hook would constrain the scope and make intent clear:

Suggested change

super().__init__(

guardrail_name="prompt_injection_detection",

default_on=True,

)

super().__init__(

guardrail_name="prompt_injection_detection",

default_on=True,

event_hook=[GuardrailEventHooks.pre_call, GuardrailEventHooks.during_call],

)

greptile-apps · 2026-03-22T08:04:10Z

+class _OPTIONAL_PromptInjectionDetection(CustomGuardrail):
    # Class variables or attributes
    def __init__(
        self,
        prompt_injection_params: Optional[LiteLLMPromptInjectionParams] = None,
    ):
+        super().__init__(
+            guardrail_name="prompt_injection_detection",
+            default_on=True,
+        )


llm_api_check=True users will now incur unexpected LLM costs — no opt-in flag

Before this PR async_moderation_hook was unreachable because during_call_hook in proxy/utils.py only processes CustomGuardrail instances (not plain CustomLogger). Switching the base class directly enables the LLM check for all users who have llm_api_check=True configured, with no feature flag to opt in.

Per this repository's backward-compatibility rule, behavior-changing fixes that can introduce unexpected costs or latency for existing users should be gated behind a feature flag (e.g. litellm.enable_prompt_injection_llm_moderation_hook = False by default, or a proxy config toggle). As written, any user with llm_api_check=True will start generating additional LLM API calls and costs immediately upon upgrading, with no way to stage the rollout.

Rule Used: What: avoid backwards-incompatible changes without... (source)

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

greptile-apps · 2026-03-23T00:23:26Z

+    import asyncio
+    from unittest.mock import patch


Inline imports should be at module level

asyncio and patch are imported inside the test function body, inconsistent with the module-level imports used everywhere else in the file. Moving them to the top of the file improves readability and follows standard Python convention.

Suggested change

import asyncio

from unittest.mock import patch

import asyncio

from unittest.mock import patch

detector = _OPTIONAL_PromptInjectionDetection(

Should instead be placed at the top of the file alongside the other imports, and the two lines above removed from inside the function.

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

greptile-apps · 2026-03-23T00:23:27Z

+def test_dispatch_reachable_via_should_run_guardrail():
+    """
+    Fix for https://github.com/BerriAI/litellm/issues/19499
+
+    The during_call_hook() dispatcher in proxy/utils.py guards its loop
+    with `isinstance(callback, CustomGuardrail)` and then calls
+    `callback.should_run_guardrail(data, event_type=during_call)`.
+
+    Verify that should_run_guardrail returns True for during_call so
+    async_moderation_hook is actually reachable from the dispatcher.
+    """
+    detector = _OPTIONAL_PromptInjectionDetection()
+    # default_on=True + event_hook=None means it should run for all event types
+    assert detector.should_run_guardrail(
+        data={},
+        event_type=GuardrailEventHooks.during_call,
+    ) is True


async_moderation_hook (during_call) remains untested after being unblocked

This PR's core motivation is that async_moderation_hook was previously unreachable because during_call_hook() in proxy/utils.py only dispatches to CustomGuardrail instances. Now that the class is a CustomGuardrail, async_moderation_hook is reachable for the first time — but there is no test that actually exercises it.

test_dispatch_reachable_via_should_run_guardrail only checks that should_run_guardrail returns True for during_call; it never invokes async_moderation_hook. A test that calls async_moderation_hook with llm_api_check=True (mocking self.llm_router.acompletion) would verify that the fix actually reaches the LLM API check logic end-to-end and that the response parsing (llm_api_fail_call_string comparison) behaves correctly.

xr843 · 2026-03-23T23:40:27Z

@themavik Thank you for the thorough review! Addressing your two points:

1. asyncio.to_thread and the default thread pool

Good catch. In practice, prompt injection detection runs at most once per request during the pre_call hook, and the check_user_input_similarity call is relatively fast (it's a SequenceMatcher comparison against a small, fixed list of known injection patterns). Under realistic workloads this won't saturate the default ThreadPoolExecutor. That said, it's a valid forward-looking concern — if the reference pattern list grows significantly or request volumes spike, a dedicated bounded executor would be the right mitigation. I'll keep this in mind for a follow-up optimization if needed.

2. Race condition on prompt_injection_messages

No race condition here. prompt_injection_messages is set once during __init__ (from the config) and is never mutated afterward — it's effectively read-only for the lifetime of the guardrail instance. There are no code paths that append to or modify that list after initialization, so concurrent reads from the thread pool are safe without any synchronization.

Thanks again for the careful review!

…hooks, add async_moderation_hook tests - Override should_run_guardrail() so prompt injection detection cannot be bypassed via disable_global_guardrail (security fix) - Scope event_hook to [pre_call, during_call] instead of running on all event types (post_call, logging_only are not relevant) - Add end-to-end tests for async_moderation_hook (injection detected and safe prompt paths) - Add tests for bypass resistance and event hook scoping - Move inline imports (asyncio, unittest.mock) to module top level Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

greptile-apps · 2026-03-26T09:26:15Z

Too many files changed for review. (3000 files found, 100 file limit)

xr843 · 2026-03-28T06:45:50Z

All review feedback has been addressed in commit 02d9a08:

✅ Backward compatibility: should_run_guardrail() override + event hooks scoped to [pre_call, during_call] only
✅ Bypass resistance: disable_global_guardrail flag is deliberately ignored for security
✅ Async test coverage: Added test_async_moderation_hook_detects_injection_via_llm_api() and test_async_moderation_hook_allows_safe_prompt()
✅ Import style: Moved to module top level

The remaining CI failures (Vercel deployment, CircleCI auto-create-config) are infrastructure issues common to fork PRs, not code-related.

Ready for re-review. 🙏

xr843 · 2026-03-28T23:30:24Z

Hi @yuneng-berri, all previous review feedback has been addressed in the latest commits — backward-compat guard, scoped event hooks, additional test coverage, and import style fixes. Could you take another look when you get a chance? Thanks!

xr843 · 2026-04-03T00:35:01Z

Closing this PR as I'm refocusing my contributions on fewer projects for deeper engagement. Thank you for your time reviewing. Feel free to cherry-pick the changes if they're still useful.

xr843 mentioned this pull request Mar 22, 2026

fix(security): fix prompt injection detection — async heuristics + inheritance chain #24284

Closed

5 tasks

vercel bot deployed to Preview March 22, 2026 08:01 View deployment

greptile-apps bot reviewed Mar 22, 2026

View reviewed changes

style: format audit_logs.py with black

847f2f5

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

vercel bot deployed to Preview March 23, 2026 00:21 View deployment

greptile-apps bot reviewed Mar 23, 2026

View reviewed changes

vercel bot had a problem deploying to Preview March 26, 2026 09:26 Failure

xr843 closed this Apr 3, 2026

Uh oh!

Conversation

xr843 commented Mar 22, 2026

Summary

Breaking change note

Test plan

Uh oh!

vercel bot commented Mar 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codspeed-hq bot commented Mar 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merging this PR will not alter performance

Uh oh!

greptile-apps bot commented Mar 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 2/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps bot Mar 22, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Mar 22, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Mar 22, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Mar 23, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Mar 23, 2026

Choose a reason for hiding this comment

Uh oh!

xr843 commented Mar 23, 2026

Uh oh!

greptile-apps bot commented Mar 26, 2026

Uh oh!

xr843 commented Mar 28, 2026

Uh oh!

xr843 commented Mar 28, 2026

Uh oh!

xr843 commented Apr 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

vercel bot commented Mar 22, 2026 •

edited

Loading

codspeed-hq bot commented Mar 22, 2026 •

edited

Loading

greptile-apps bot commented Mar 22, 2026 •

edited

Loading