Skip to content

feat(security): untrusted content isolation — ContentSanitizer pipeline (Phase 1, #1195)#1221

Merged
bug-ops merged 6 commits intomainfrom
untrusted-content-isolation
Mar 5, 2026
Merged

feat(security): untrusted content isolation — ContentSanitizer pipeline (Phase 1, #1195)#1221
bug-ops merged 6 commits intomainfrom
untrusted-content-isolation

Conversation

@bug-ops
Copy link
Copy Markdown
Owner

@bug-ops bug-ops commented Mar 5, 2026

Closes #1196, #1197, #1198, #1199
Part of epic #1195

Summary

Implements Phase 1 of the Untrusted Content Isolation epic. All content entering the LLM context from external sources (tool results, web scrape, MCP, A2A, memory retrieval) now passes through a four-step sanitization pipeline before reaching the model — defending against indirect prompt injection.

  • ContentSanitizer: truncate → strip control chars → detect 17 injection patterns → spotlight wrap
  • Two-tier spotlighting: <tool-output> for local tools, <external-data> with strong warning header for external sources
  • 17 compiled injection patterns (OWASP cheat sheet + base64 + delimiter escape variants)
  • Flag-only approach: patterns are annotated, not removed, to preserve information
  • Delimiter escape prevention: <tool-output> and <external-data> tags are case-insensitively escaped from content before wrapping (CRIT-03)
  • XML attribute escaping on source identifiers in spotlight wrapper (SEC-01)
  • [security.content_isolation] TOML config section with enabled, max_content_size (64 KiB), flag_injection_patterns, spotlight_untrusted
  • Sanitizer metrics: sanitizer_runs, sanitizer_injection_flags, sanitizer_truncations in MetricsSnapshot
  • System prompt reinforcement note instructing LLM to treat spotlighted content as data

Integration points:

  • handle_tool_result: both Ok(Some(output)) and ConfirmationRequired branches
  • prepare_context: all 5 memory retrieval message paths (recall, cross-session, corrections, document RAG, summaries)

Test plan

  • 3782 tests pass (was 3728 pre-epic, +54 new tests)
  • All 17 injection patterns individually covered
  • Delimiter escape prevention tested for both lowercase and mixed-case tags
  • XML attribute injection prevention tested
  • Base64-encoded injection detection tested
  • Size truncation boundary tests (at limit, over limit)
  • Spotlighting wrapper format tested for both trust levels
  • Regression: legitimate security documentation not flagged
  • flag_injection_patterns: false and spotlight_untrusted: false config paths tested
  • Memory path metrics coverage tested

Checklist

  • cargo +nightly fmt --check passes
  • cargo clippy --workspace -- -D warnings passes
  • cargo nextest run --workspace --lib --bins passes (3782 passed)
  • CHANGELOG.md updated ([Unreleased] section)
  • docs/src/reference/security/untrusted-content-isolation.md added
  • SUMMARY.md updated
  • root README.md updated
  • crates/zeph-core/README.md updated

bug-ops added 3 commits March 5, 2026 14:14
…1196-1199)

Implements Phase 1 of epic #1195 (indirect prompt injection defense).

New sanitizer module in zeph-core provides a four-step pipeline:
1. Truncate content to configurable max_content_size (64 KiB default)
2. Strip null bytes and non-printable control characters
3. Detect 17 injection patterns compiled from OWASP cheat sheet
4. Wrap in spotlighting XML delimiters (LocalUntrusted: <tool-output>,
   ExternalUntrusted: <external-data>) with inline WARNING for flagged content

Critical fixes addressed:
- CRIT-01: sanitizer applied to both Ok(output) and ConfirmationRequired
  branches of handle_tool_result (no bypass path)
- CRIT-02: memory retrieval messages (recall, cross-session, corrections,
  document RAG, summaries) sanitized in prepare_context before insertion
- CRIT-03: delimiter tag names escaped from content before wrapping to
  prevent spotlighting wrapper escape attacks

Additional improvements:
- IMP-03: 6 additional injection patterns (forget_everything, disregard,
  override_directives, act_as_if, html_image_exfil, markdown alt-text fix)
- IMP-04: system prompt security note in BASE_PROMPT_TAIL
- IMP-05: sanitizer_runs, sanitizer_injection_flags, sanitizer_truncations
  metrics in MetricsSnapshot
- SUG-02: ContentIsolationConfig derives PartialEq

SecurityConfig: Copy removed (now Clone only) due to nested ContentIsolationConfig.
Config snapshot updated. runner.rs: config.security.clone() added.
FIX-01: Add xml_attr_escape() helper and apply to kind_str/id_str in
apply_spotlight() to prevent XML attribute injection via crafted tool
names or URLs (e.g. shell" trust="trusted would override trust marker).

FIX-02: Add update_metrics() calls (sanitizer_runs, sanitizer_injection_flags,
sanitizer_truncations) in sanitize_memory_message() — previously memory-path
sanitizer runs were invisible in the metrics dashboard.

FIX-03: Make escape_delimiter_tags() case-insensitive using regex replace_all
with (?i) flag, so mixed-case variants like <Tool-Output> and <EXTERNAL-DATA>
are also neutralized (previously only exact lowercase matched).

FIX-04: Fix xml_tag_injection pattern regex typo: `s*` → `\s*` (whitespace
zero-or-more). Fixes detection of space-padded tags like "< system>"; also
removes false positive on non-tag like <sssystem>.

15 new tests added covering all 4 fixes.
Add reference chapter covering ContentSanitizer pipeline, trust levels,
spotlighting format, injection pattern detection, coverage table, config
reference, metrics, and known limitations. Update SUMMARY.md, root README,
and zeph-core README.
@github-actions github-actions bot added enhancement New feature or request documentation Improvements or additions to documentation rust Rust code changes core zeph-core crate size/XL Extra large PR (500+ lines) and removed enhancement New feature or request labels Mar 5, 2026
@github-actions github-actions bot added the enhancement New feature or request label Mar 5, 2026
@bug-ops bug-ops enabled auto-merge (squash) March 5, 2026 13:39
@github-actions github-actions bot added the tests Test-related changes label Mar 5, 2026
@bug-ops bug-ops merged commit 37b5a12 into main Mar 5, 2026
28 checks passed
@bug-ops bug-ops deleted the untrusted-content-isolation branch March 5, 2026 14:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core zeph-core crate documentation Improvements or additions to documentation enhancement New feature or request rust Rust code changes size/XL Extra large PR (500+ lines) tests Test-related changes

Projects

None yet

2 participants