fix(sanitizer): injection false positives from legitimate user queries in memory retrieval #2025
Description
Summary
During continuous improvement session CI-15 (2026-03-20, v0.16.0), the ContentSanitizer flagged 2 injection patterns (flags=2) when retrieving memory content that contained legitimate user queries.
Root Cause
assembly.rs::sanitize_memory_message() runs all retrieved memory messages through ContentSanitizer::detect_injections(). When a user query like:
"Use the memory_save tool to save this fact: ... Confirm when done."
is stored in memory and later retrieved via memory_search, the sanitizer flags it because it contains imperative language similar to injection patterns (e.g., instruction-like directives).
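The false positive can be reproduced with a minimal sketch, assuming a detector that does case-insensitive substring matching. The pattern list and the function name `count_injection_flags` are hypothetical stand-ins; the real patterns behind ContentSanitizer::detect_injections() are not shown in this issue.

```rust
/// Hypothetical stand-ins for injection patterns. The actual pattern list
/// used by ContentSanitizer is an assumption here, not taken from the code.
const HYPOTHETICAL_PATTERNS: &[&str] = &["use the", "confirm when done"];

/// Counts how many of the hypothetical patterns occur in `text`,
/// ignoring ASCII case. Each matching pattern contributes one flag.
fn count_injection_flags(text: &str) -> usize {
    let lowered = text.to_ascii_lowercase();
    HYPOTHETICAL_PATTERNS
        .iter()
        .filter(|pattern| lowered.contains(**pattern))
        .count()
}
```

With patterns like these, an ordinary imperative user request is indistinguishable from an injected directive, because the detector sees only the text and not where it came from.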
Observed Warning
WARN zeph_core::agent::context::assembly: injection patterns detected in memory retrieval flags=2
Impact
- Severity: Low — sanitizer is advisory only (doc comment: "not a security boundary")
- Functional impact: None — retrieval proceeds normally
- Operational impact: Log noise; inflates the sanitizer_injection_flags metric
Expected Behavior
User messages stored in memory (prior conversation turns) should not trigger injection warnings when retrieved. The sanitizer should distinguish between:
- Actual injection: untrusted external content (web scrapes, MCP tool output, documents)
- False positive: prior user conversation turns retrieved from SQLite
Potential Fix
- Reduce ContentSource specificity for memory retrieval paths: use a lower sensitivity mode
- Add context label to retrieved messages (e.g., role=user) to skip injection scanning for known-safe sources
- Filter patterns that commonly trigger on instruction-like user queries (e.g., patterns matching common verb phrases)
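The context-label option could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's implementation: `MemoryOrigin`, `RetrievedMessage`, `count_flags`, `flags_for_retrieved`, and the pattern list are all hypothetical names introduced here, and the real ContentSource type may be shaped differently.

```rust
/// Hypothetical origin label attached to a retrieved memory message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MemoryOrigin {
    /// A prior conversation turn written by the user (role=user).
    UserTurn,
    /// Untrusted external content: web scrape, tool output, document.
    External,
}

/// Hypothetical retrieved message carrying its origin alongside the text.
struct RetrievedMessage<'a> {
    origin: MemoryOrigin,
    content: &'a str,
}

/// Hypothetical patterns; the real sanitizer's list is not shown in this issue.
const HYPOTHETICAL_PATTERNS: &[&str] = &["ignore previous instructions", "use the"];

/// Counts matching patterns in `text`, ignoring ASCII case.
fn count_flags(text: &str) -> usize {
    let lowered = text.to_ascii_lowercase();
    HYPOTHETICAL_PATTERNS
        .iter()
        .filter(|pattern| lowered.contains(**pattern))
        .count()
}

/// Returns the number of flags to report for a retrieved message.
/// User turns are treated as known-safe and are not scanned at all;
/// external content is still scanned in full.
fn flags_for_retrieved(msg: &RetrievedMessage<'_>) -> usize {
    match msg.origin {
        MemoryOrigin::UserTurn => 0,
        MemoryOrigin::External => count_flags(msg.content),
    }
}
```

Branching on origin keeps the pattern list unchanged for untrusted content, so the warning and the metric stay meaningful for the cases the sanitizer is meant to surface. This relies on the origin label being recorded reliably when the message is stored.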
Evidence
- Session: .local/testing/sessions/2026-03-20-session-ci15.md
- Log: .local/testing/memory-ops-2026-03-20.log
- Code: crates/zeph-core/src/agent/context/assembly.rs::sanitize_memory_message()