feat(security): untrusted content isolation — ContentSanitizer pipeline (Phase 1, #1195) #1221
Merged
…1196-1199)

Implements Phase 1 of epic #1195 (indirect prompt injection defense).

New sanitizer module in zeph-core provides a four-step pipeline:
1. Truncate content to configurable max_content_size (64 KiB default)
2. Strip null bytes and non-printable control characters
3. Detect 17 injection patterns compiled from OWASP cheat sheet
4. Wrap in spotlighting XML delimiters (LocalUntrusted: <tool-output>, ExternalUntrusted: <external-data>) with inline WARNING for flagged content

Critical fixes addressed:
- CRIT-01: sanitizer applied to both Ok(output) and ConfirmationRequired branches of handle_tool_result (no bypass path)
- CRIT-02: memory retrieval messages (recall, cross-session, corrections, document RAG, summaries) sanitized in prepare_context before insertion
- CRIT-03: delimiter tag names escaped from content before wrapping to prevent spotlighting wrapper escape attacks

Additional improvements:
- IMP-03: 6 additional injection patterns (forget_everything, disregard, override_directives, act_as_if, html_image_exfil, markdown alt-text fix)
- IMP-04: system prompt security note in BASE_PROMPT_TAIL
- IMP-05: sanitizer_runs, sanitizer_injection_flags, sanitizer_truncations metrics in MetricsSnapshot
- SUG-02: ContentIsolationConfig derives PartialEq

SecurityConfig: Copy removed (now Clone only) due to nested ContentIsolationConfig. Config snapshot updated. runner.rs: config.security.clone() added.
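The four steps can be sketched roughly as follows. This is a minimal, std-only illustration and not the zeph-core implementation: the `Trust` enum, the function names, the warning text, and the two literal phrases standing in for the 17 compiled patterns are all assumptions, and delimiter-tag escaping (CRIT-03) is left out for brevity.

```rust
// Illustrative sketch only; every name here is hypothetical, not the zeph-core API.

#[derive(Clone, Copy)]
enum Trust {
    LocalUntrusted,
    ExternalUntrusted,
}

/// Step 1: cut at `max` bytes, backing up to a UTF-8 character boundary.
fn truncate(s: &str, max: usize) -> &str {
    if s.len() <= max {
        return s;
    }
    let mut end = max;
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}

/// Step 2: drop NUL and other control characters, keeping common whitespace.
fn strip_control(s: &str) -> String {
    s.chars()
        .filter(|c| !c.is_control() || matches!(c, '\n' | '\t' | '\r'))
        .collect()
}

/// Step 3: stand-in detector with two literal phrases (the PR uses compiled patterns).
fn looks_like_injection(s: &str) -> bool {
    let lower = s.to_lowercase();
    ["ignore previous instructions", "forget everything"]
        .iter()
        .any(|p| lower.contains(p))
}

/// Step 4: wrap in the delimiter for the trust level, adding a warning if flagged.
fn spotlight(body: &str, trust: Trust, flagged: bool) -> String {
    let tag = match trust {
        Trust::LocalUntrusted => "tool-output",
        Trust::ExternalUntrusted => "external-data",
    };
    // The warning wording is an assumption.
    let warning = if flagged { "WARNING: possible injection attempt\n" } else { "" };
    format!("<{tag}>\n{warning}{body}\n</{tag}>")
}

fn sanitize(raw: &str, trust: Trust, max: usize) -> String {
    let clean = strip_control(truncate(raw, max));
    let flagged = looks_like_injection(&clean);
    spotlight(&clean, trust, flagged)
}
```

Truncating first bounds the cost of the later steps, and running detection after control-character stripping means patterns cannot be hidden by interleaved NUL bytes.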
FIX-01: Add xml_attr_escape() helper and apply to kind_str/id_str in apply_spotlight() to prevent XML attribute injection via crafted tool names or URLs (e.g. shell" trust="trusted would override trust marker).

FIX-02: Add update_metrics() calls (sanitizer_runs, sanitizer_injection_flags, sanitizer_truncations) in sanitize_memory_message() — previously memory-path sanitizer runs were invisible in the metrics dashboard.

FIX-03: Make escape_delimiter_tags() case-insensitive using regex replace_all with (?i) flag, so mixed-case variants like <Tool-Output> and <EXTERNAL-DATA> are also neutralized (previously only exact lowercase matched).

FIX-04: Fix xml_tag_injection pattern regex typo: `s*` → `\s*` (whitespace zero-or-more). Fixes detection of space-padded tags like "< system>"; also removes false positive on non-tag like <sssystem>.

15 new tests added covering all 4 fixes.
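For illustration, FIX-01 and FIX-03 might look something like the sketch below. The function names come from the commit message, but the bodies are assumptions: in particular, the PR uses regex `replace_all` with `(?i)`, whereas this std-only stand-in matches on an ASCII-lowercased copy, and the choice of `&lt;` as the replacement is hypothetical.

```rust
/// Escape characters that are significant inside a double-quoted XML attribute.
fn xml_attr_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '"' => out.push_str("&quot;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            _ => out.push(c),
        }
    }
    out
}

/// Neutralize delimiter tags in content regardless of letter case.
/// Std-only stand-in for the regex-based version described in FIX-03.
fn escape_delimiter_tags(s: &str) -> String {
    // ASCII lowercasing preserves byte offsets, so indices into `lower` are valid in `s`.
    let lower = s.to_ascii_lowercase();
    let needles = ["<tool-output", "</tool-output", "<external-data", "</external-data"];
    let mut out = String::with_capacity(s.len());
    let mut i = 0;
    while i < s.len() {
        if needles.iter().any(|n| lower[i..].starts_with(n)) {
            // Replace only the opening '<' so the text stays readable but inert.
            out.push_str("&lt;");
            i += 1;
        } else {
            let ch = s[i..].chars().next().unwrap();
            out.push(ch);
            i += ch.len_utf8();
        }
    }
    out
}
```

With the attribute escape in place, the crafted name from FIX-01 (`shell" trust="trusted`) stays inside its own attribute value instead of closing the quote and introducing a second `trust` attribute.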
Add reference chapter covering ContentSanitizer pipeline, trust levels, spotlighting format, injection pattern detection, coverage table, config reference, metrics, and known limitations. Update SUMMARY.md, root README, and zeph-core README.
Mar 5, 2026
…ng alerts in context.rs
Closes #1196, #1197, #1198, #1199
Part of epic #1195
Summary
Implements Phase 1 of the Untrusted Content Isolation epic. All content entering the LLM context from external sources (tool results, web scrape, MCP, A2A, memory retrieval) now passes through a four-step sanitization pipeline before reaching the model — defending against indirect prompt injection.
- `ContentSanitizer`: truncate → strip control chars → detect 17 injection patterns → spotlight wrap
- `<tool-output>` for local tools, `<external-data>` with strong warning header for external sources
- `<tool-output>` and `<external-data>` tags are case-insensitively escaped from content before wrapping (CRIT-03)
- `[security.content_isolation]` TOML config section with `enabled`, `max_content_size` (64 KiB), `flag_injection_patterns`, `spotlight_untrusted`
- `sanitizer_runs`, `sanitizer_injection_flags`, `sanitizer_truncations` in `MetricsSnapshot`

Integration points:
- `handle_tool_result`: both `Ok(Some(output))` and `ConfirmationRequired` branches
- `prepare_context`: all 5 memory retrieval message paths (recall, cross-session, corrections, document RAG, summaries)

Test plan
- `flag_injection_patterns: false` and `spotlight_untrusted: false` config paths tested

Checklist
- `cargo +nightly fmt --check` passes
- `cargo clippy --workspace -- -D warnings` passes
- `cargo nextest run --workspace --lib --bins` passes (3782 passed)
- `[Unreleased]` section)
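
As a rough picture of the `[security.content_isolation]` section, the config could be mirrored by a struct along these lines. The field names and the 64 KiB limit come from the summary above; the field types and the `true` defaults are assumptions, not confirmed by the PR.

```rust
/// Hypothetical mirror of the `[security.content_isolation]` TOML section.
#[derive(Debug, Clone, PartialEq)]
struct ContentIsolationConfig {
    enabled: bool,
    max_content_size: usize,
    flag_injection_patterns: bool,
    spotlight_untrusted: bool,
}

impl Default for ContentIsolationConfig {
    fn default() -> Self {
        Self {
            enabled: true,                 // assumed default
            max_content_size: 64 * 1024,   // 64 KiB, as stated in the PR
            flag_injection_patterns: true, // assumed default
            spotlight_untrusted: true,     // assumed default
        }
    }
}

/// Example of one of the tested config paths: spotlighting switched off.
fn without_spotlight() -> ContentIsolationConfig {
    ContentIsolationConfig {
        spotlight_untrusted: false,
        ..ContentIsolationConfig::default()
    }
}
```

Deriving `PartialEq` (SUG-02) makes it straightforward to assert on configs like `without_spotlight()` in tests.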