bug(security): sanitizer classifier 401 on HuggingFace download — regex fallback blocks benign queries #2292
Closed
Labels
P1 (High ROI, low complexity; do next sprint), bug (Something isn't working)
Description
Summary
During A2A daemon live testing (CI-238), the injection sanitizer fails to load the classifier model with a 401 error, then falls back to a regex-based classifier that produces false positives blocking benign user messages.
Log evidence
WARN zeph_sanitizer: classifier inference error, falling back to regex
error=model loading failed: failed to download config.json from
protectai/deberta-v3-small-prompt-injection-v2: request error: http status: 401
Steps to reproduce
- Run ./target/debug/zeph --config .local/config/testing.toml --daemon
- Send POST to http://localhost:8080/a2a with:
  {"jsonrpc":"2.0","id":1,"method":"message/send","params":{"message":{"role":"user","parts":[{"kind":"text","text":"hello, who are you?"}]}}}
- Also blocked: "what is 2 + 2?"
Expected
- Classifier downloads successfully (HuggingFace token resolved from vault), or
- Regex fallback only blocks genuine injection patterns, not benign greetings/arithmetic
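The second expectation can be pinned down as a regression check. The sketch below is only an illustration: the real fallback patterns are not shown in this issue, so `INJECTION_PHRASES` and `looks_like_injection` are hypothetical stand-ins, and plain case-insensitive substring matching is used in place of actual regexes to keep the example dependency-free.

```rust
/// Hypothetical stand-in for the regex fallback patterns (not the real list).
const INJECTION_PHRASES: &[&str] = &[
    "ignore previous instructions",
    "ignore all previous instructions",
    "disregard the above",
    "reveal your system prompt",
];

/// Returns true when the input contains one of the known injection phrasings.
/// Matching is case-insensitive and substring-based (a simplification).
fn looks_like_injection(input: &str) -> bool {
    let lowered = input.to_lowercase();
    INJECTION_PHRASES.iter().any(|phrase| lowered.contains(phrase))
}

/// Returns every benign sample that the fallback would wrongly block.
fn false_positives<'a>(benign_samples: &[&'a str]) -> Vec<&'a str> {
    benign_samples
        .iter()
        .copied()
        .filter(|sample| looks_like_injection(sample))
        .collect()
}
```

Feeding the two queries from this report ("hello, who are you?" and "what is 2 + 2?") through `false_positives` should yield an empty list; a non-empty result would flag the miscalibration before it reaches the daemon.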
Actual
Both "hello, who are you?" and "what is 2 + 2?" are blocked with:
{"kind":"text","text":"[security] Input blocked: injection detected by classifier."}
Root causes to investigate
- protectai/deberta-v3-small-prompt-injection-v2 requires a HuggingFace API token: check if ZEPH_HF_TOKEN or equivalent is in the vault and passed to the downloader
- Regex fallback patterns are too aggressive: any regex that matches basic greetings or arithmetic is a miscalibration
- Behavior is non-deterministic: the same "what is 2+2?" query was passed once (no artifacts) and blocked another time — suggests a race condition in classifier initialization
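If the non-determinism does come from racing initialization, one way to rule it out is to decide the backend exactly once and have every request read that decision. The sketch below uses `std::sync::OnceLock` (Rust 1.70+) for this. `Backend`, `load_model`, `backend`, and `LOAD_ATTEMPTS` are hypothetical names, and `load_model` is hard-coded to fail so that it mirrors the 401 from the log; none of this is taken from the zeph codebase.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Which detector ended up serving requests (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Model,
    RegexFallback,
}

/// Process-wide decision, made once and then shared by all requests.
static BACKEND: OnceLock<Backend> = OnceLock::new();

/// Counts how often the loader actually ran (for demonstration only).
static LOAD_ATTEMPTS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical loader; always fails here to imitate the 401 in the log.
fn load_model() -> Result<(), String> {
    LOAD_ATTEMPTS.fetch_add(1, Ordering::SeqCst);
    Err("failed to download config.json: http status: 401".to_string())
}

/// Returns the backend, initializing it on first use. Concurrent callers
/// block until the single initializer finishes, so all see the same result.
fn backend() -> Backend {
    *BACKEND.get_or_init(|| match load_model() {
        Ok(()) => Backend::Model,
        Err(err) => {
            eprintln!("classifier unavailable, using regex fallback: {err}");
            Backend::RegexFallback
        }
    })
}
```

With this shape, the same query cannot be judged by different backends depending on timing: the loader runs once, and the fallback decision is logged once rather than per request.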
Config used
.local/config/testing.toml — [sanitizer] section with default settings (no explicit HuggingFace token config)
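Since the config carries no explicit token, it may help to make token resolution a small, testable step. The sketch below is an assumption-heavy illustration: `ZEPH_HF_TOKEN` is the name floated in this issue, `HF_TOKEN` is assumed as a secondary source, the vault lookup is replaced by environment variables, and `pick_token`, `resolve_hf_token`, and `auth_header` are hypothetical helpers. The only external fact relied on is that the Hugging Face Hub accepts a user token as an `Authorization: Bearer <token>` header.

```rust
use std::env;

/// Candidate variable names, checked in order. ZEPH_HF_TOKEN comes from the
/// issue text; HF_TOKEN is an assumed secondary source.
const TOKEN_VARS: &[&str] = &["ZEPH_HF_TOKEN", "HF_TOKEN"];

/// Picks the first non-empty token using the supplied lookup function.
/// Taking the lookup as a parameter keeps the logic testable without
/// touching the real process environment.
fn pick_token<F>(lookup: F) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    TOKEN_VARS
        .iter()
        .copied()
        .filter_map(|name| lookup(name))
        .map(|value| value.trim().to_string())
        .find(|value| !value.is_empty())
}

/// Resolves the token from the process environment (stand-in for the vault).
fn resolve_hf_token() -> Option<String> {
    pick_token(|name| env::var(name).ok())
}

/// Builds the header pair to attach to the model download request, if any.
fn auth_header(token: Option<&str>) -> Option<(String, String)> {
    token.map(|t| ("Authorization".to_string(), format!("Bearer {t}")))
}
```

A downloader could call `auth_header(resolve_hf_token().as_deref())` and log, without printing the token itself, whether a header was attached. That would show in the daemon log whether the 401 occurred with or without credentials.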