feat(security): untrusted content isolation — ContentSanitizer pipeline (Phase 1, #1195) by bug-ops · Pull Request #1221 · bug-ops/zeph

bug-ops · 2026-03-05T13:33:16Z

Closes #1196, #1197, #1198, #1199
Part of epic #1195

Summary

Implements Phase 1 of the Untrusted Content Isolation epic. All content entering the LLM context from external sources (tool results, web scrape, MCP, A2A, memory retrieval) now passes through a four-step sanitization pipeline before reaching the model — defending against indirect prompt injection.

ContentSanitizer: truncate → strip control chars → detect 17 injection patterns → spotlight wrap
Two-tier spotlighting: <tool-output> for local tools, <external-data> with strong warning header for external sources
17 compiled injection patterns (OWASP cheat sheet + base64 + delimiter escape variants)
Flag-only approach: patterns are annotated, not removed, to preserve information
Delimiter escape prevention: <tool-output> and <external-data> tags are case-insensitively escaped from content before wrapping (CRIT-03)
XML attribute escaping on source identifiers in spotlight wrapper (SEC-01)
[security.content_isolation] TOML config section with enabled, max_content_size (64 KiB), flag_injection_patterns, spotlight_untrusted
Sanitizer metrics: sanitizer_runs, sanitizer_injection_flags, sanitizer_truncations in MetricsSnapshot
System prompt reinforcement note instructing LLM to treat spotlighted content as data

Integration points:

handle_tool_result: both Ok(Some(output)) and ConfirmationRequired branches
prepare_context: all 5 memory retrieval message paths (recall, cross-session, corrections, document RAG, summaries)

Test plan

3782 tests pass (was 3728 pre-epic, +54 new tests)
All 17 injection patterns individually covered
Delimiter escape prevention tested for both lowercase and mixed-case tags
XML attribute injection prevention tested
Base64-encoded injection detection tested
Size truncation boundary tests (at limit, over limit)
Spotlighting wrapper format tested for both trust levels
Regression: legitimate security documentation not flagged
flag_injection_patterns: false and spotlight_untrusted: false config paths tested
Memory path metrics coverage tested

Checklist

cargo +nightly fmt --check passes
cargo clippy --workspace -- -D warnings passes
cargo nextest run --workspace --lib --bins passes (3782 passed)
CHANGELOG.md updated ([Unreleased] section)
docs/src/reference/security/untrusted-content-isolation.md added
SUMMARY.md updated
root README.md updated
crates/zeph-core/README.md updated

…1196-1199) Implements Phase 1 of epic #1195 (indirect prompt injection defense). New sanitizer module in zeph-core provides a four-step pipeline: 1. Truncate content to configurable max_content_size (64 KiB default) 2. Strip null bytes and non-printable control characters 3. Detect 17 injection patterns compiled from OWASP cheat sheet 4. Wrap in spotlighting XML delimiters (LocalUntrusted: <tool-output>, ExternalUntrusted: <external-data>) with inline WARNING for flagged content Critical fixes addressed: - CRIT-01: sanitizer applied to both Ok(output) and ConfirmationRequired branches of handle_tool_result (no bypass path) - CRIT-02: memory retrieval messages (recall, cross-session, corrections, document RAG, summaries) sanitized in prepare_context before insertion - CRIT-03: delimiter tag names escaped from content before wrapping to prevent spotlighting wrapper escape attacks Additional improvements: - IMP-03: 6 additional injection patterns (forget_everything, disregard, override_directives, act_as_if, html_image_exfil, markdown alt-text fix) - IMP-04: system prompt security note in BASE_PROMPT_TAIL - IMP-05: sanitizer_runs, sanitizer_injection_flags, sanitizer_truncations metrics in MetricsSnapshot - SUG-02: ContentIsolationConfig derives PartialEq SecurityConfig: Copy removed (now Clone only) due to nested ContentIsolationConfig. Config snapshot updated. runner.rs: config.security.clone() added.

FIX-01: Add xml_attr_escape() helper and apply to kind_str/id_str in apply_spotlight() to prevent XML attribute injection via crafted tool names or URLs (e.g. shell" trust="trusted would override trust marker). FIX-02: Add update_metrics() calls (sanitizer_runs, sanitizer_injection_flags, sanitizer_truncations) in sanitize_memory_message() — previously memory-path sanitizer runs were invisible in the metrics dashboard. FIX-03: Make escape_delimiter_tags() case-insensitive using regex replace_all with (?i) flag, so mixed-case variants like <Tool-Output> and <EXTERNAL-DATA> are also neutralized (previously only exact lowercase matched). FIX-04: Fix xml_tag_injection pattern regex typo: `s*` → `\s*` (whitespace zero-or-more). Fixes detection of space-padded tags like "< system>"; also removes false positive on non-tag like <sssystem>. 15 new tests added covering all 4 fixes.

Add reference chapter covering ContentSanitizer pipeline, trust levels, spotlighting format, injection pattern detection, coverage table, config reference, metrics, and known limitations. Update SUMMARY.md, root README, and zeph-core README.

…y removal

…egration tests

crates/zeph-core/src/agent/context.rs

…ng alerts in context.rs

crates/zeph-core/src/agent/context.rs

bug-ops added 3 commits March 5, 2026 14:14

github-actions bot added enhancement New feature or request documentation Improvements or additions to documentation rust Rust code changes core zeph-core crate size/XL Extra large PR (500+ lines) and removed enhancement New feature or request labels Mar 5, 2026

This was linked to issues Mar 5, 2026

[SEC-1.2] ContentSanitizer with injection pattern detection #1197

Closed

[SEC-1.3] Content isolation config section #1198

Closed

[SEC-1.4] ContextBuilder sanitizer integration #1199

Closed

fix(security): clone SecurityConfig in acp.rs and daemon.rs after Cop…

15f1158

…y removal

github-actions bot added the enhancement New feature or request label Mar 5, 2026

bug-ops enabled auto-merge (squash) March 5, 2026 13:39

fix(security): add content_isolation default to SecurityConfig in int…

c8715f8

…egration tests

github-actions bot added the tests Test-related changes label Mar 5, 2026

github-advanced-security bot found potential problems Mar 5, 2026

View reviewed changes

fix(security): use lgtm suppression format for CodeQL cleartext-loggi…

da49e93

…ng alerts in context.rs

github-advanced-security bot found potential problems Mar 5, 2026

View reviewed changes

bug-ops merged commit 37b5a12 into main Mar 5, 2026
28 checks passed

bug-ops deleted the untrusted-content-isolation branch March 5, 2026 14:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(security): untrusted content isolation — ContentSanitizer pipeline (Phase 1, #1195)#1221

feat(security): untrusted content isolation — ContentSanitizer pipeline (Phase 1, #1195)#1221
bug-ops merged 6 commits intomainfrom
untrusted-content-isolation

bug-ops commented Mar 5, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

bug-ops commented Mar 5, 2026

Summary

Test plan

Checklist

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants