feat(security): exfiltration guards (Phase 4, #1195)#1246

Merged
bug-ops merged 2 commits into main from feat/m32/exfiltration-guards on Mar 5, 2026

bug-ops (Owner) commented Mar 5, 2026

Summary

Phase 4 of the Untrusted Content Isolation epic (#1195). Adds ExfiltrationGuard to zeph-core with three independently toggleable guards:

  • Markdown image scanner: strips inline (![](url)) and reference-style (![alt][ref]) image injection from LLM output at 5 integration points (streaming, non-streaming, native text, ToolUse text, cache hits)
  • Tool URL validator: cross-references URLs in tool call arguments against flagged untrusted content; flag-only approach (no blocking) consistent with Phase 1 philosophy
  • Memory write guard: skips Qdrant embedding for injection-flagged content while preserving SQLite conversation continuity
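
The markdown image scanner described above can be illustrated with a minimal, std-only sketch. The function names `strip_markdown_images` and `image_len` are hypothetical and are not taken from exfiltration.rs; the real ExfiltrationGuard may use a different matching strategy. This version only removes the two forms named in the summary and does not touch reference definition lines such as `[ref]: url`.

```rust
/// Removes markdown image syntax from `text` and returns the cleaned text
/// together with the number of images removed.
///
/// Handles the inline form `![alt](url)` and the reference form `![alt][ref]`.
/// Hypothetical helper for illustration only.
pub fn strip_markdown_images(text: &str) -> (String, usize) {
    let mut out = String::with_capacity(text.len());
    let mut removed = 0;
    let mut rest = text;

    while let Some(start) = rest.find("![") {
        // Copy everything before the candidate image marker.
        out.push_str(&rest[..start]);
        let candidate = &rest[start..];
        match image_len(candidate) {
            Some(len) => {
                // Complete image found: drop it from the output.
                removed += 1;
                rest = &candidate[len..];
            }
            None => {
                // Not a complete image: keep the "![" literally and continue.
                out.push_str("![");
                rest = &candidate[2..];
            }
        }
    }
    out.push_str(rest);
    (out, removed)
}

/// If `s` (which starts with "![") is a complete image, returns its byte length.
fn image_len(s: &str) -> Option<usize> {
    // Index of the ']' that closes the alt text.
    let alt_end = s.find(']')?;
    let after_alt = &s[alt_end + 1..];
    // '(' opens an inline target, '[' opens a reference label.
    let close = match after_alt.chars().next()? {
        '(' => ')',
        '[' => ']',
        _ => return None,
    };
    let target_end = after_alt.find(close)?;
    Some(alt_end + 1 + target_end + 1)
}
```

Calling `strip_markdown_images` on an accumulated response returns both the cleaned text and a count, which is the kind of value a metrics counter could be incremented by.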

Config

[security.exfiltration_guard]
block_markdown_images = true   # default
validate_tool_urls = true      # default
guard_memory_writes = true     # default
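
As a rough sketch of how these keys might map onto a Rust type: the struct name ExfiltrationGuardConfig comes from this PR, but the field layout and the hand-written `Default` impl below are assumptions made for illustration (the real type in config/types.rs likely also carries deserialization attributes).

```rust
/// Hypothetical mirror of the `[security.exfiltration_guard]` table.
/// Field names follow the config keys; everything else is assumed.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ExfiltrationGuardConfig {
    /// Strip markdown images from LLM output.
    pub block_markdown_images: bool,
    /// Cross-check tool call URLs against flagged untrusted content.
    pub validate_tool_urls: bool,
    /// Skip embedding for injection-flagged content.
    pub guard_memory_writes: bool,
}

impl Default for ExfiltrationGuardConfig {
    /// All three guards are on by default, matching the values shown above.
    fn default() -> Self {
        Self {
            block_markdown_images: true,
            validate_tool_urls: true,
            guard_memory_writes: true,
        }
    }
}
```

Because each guard is an independent boolean, a single guard can be disabled with struct update syntax, for example `ExfiltrationGuardConfig { validate_tool_urls: false, ..Default::default() }`.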

Changes

  • crates/zeph-core/src/sanitizer/exfiltration.rs (new): ExfiltrationGuard, ExfiltrationGuardConfig, ExfiltrationEvent, 20 unit tests
  • crates/zeph-core/src/agent/tool_execution.rs: 5 scan integration points, tool URL validation, flagged URL collection
  • crates/zeph-core/src/agent/persistence.rs: has_injection_flags parameter on persist_message
  • crates/zeph-core/src/config/types.rs: ExfiltrationGuardConfig under SecurityConfig
  • crates/zeph-core/src/metrics.rs: 3 new counters
  • Docs, READMEs, CHANGELOG updated
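
The has_injection_flags parameter on persist_message suggests a control flow along the following lines. This is a sketch under stated assumptions: the `Stores` type, its in-memory vectors standing in for SQLite and Qdrant, and the `skipped_embeddings` counter are all hypothetical; only the names persist_message and has_injection_flags appear in this PR.

```rust
/// Hypothetical in-memory stand-ins for the two storage backends.
#[derive(Debug, Default)]
pub struct Stores {
    /// Stand-in for the SQLite conversation log.
    pub sqlite_rows: Vec<String>,
    /// Stand-in for the Qdrant embedding store.
    pub embedded: Vec<String>,
    /// Stand-in for a metrics counter of skipped embeddings.
    pub skipped_embeddings: u64,
}

impl Stores {
    /// Always records the message for conversation continuity, but skips
    /// the embedding step when the guard is enabled and the content was
    /// flagged as a possible injection.
    pub fn persist_message(
        &mut self,
        content: &str,
        has_injection_flags: bool,
        guard_memory_writes: bool,
    ) {
        // The conversation log is written unconditionally.
        self.sqlite_rows.push(content.to_string());

        if guard_memory_writes && has_injection_flags {
            // Flagged content never reaches semantic memory.
            self.skipped_embeddings += 1;
            return;
        }
        self.embedded.push(content.to_string());
    }
}
```

The point of the split is that flagged text stays visible in the current conversation but cannot be recalled later through semantic search.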

Review process

Architecture critic review (4 significant + 4 minor gaps): all addressed in the implementation. Code review (2 critical + 3 important issues): all fixed, re-review approved.

Known limitations (Phase 5)

  • Percent-encoded scheme bypass (%68ttps://)
  • HTML <img> tag injection
  • Unicode zero-width joiner bypass
  • Streaming chunks visible before accumulated scan
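
To make the first limitation concrete, the sketch below shows why a literal scheme check misses `%68ttps://` and how percent-decoding before the check would catch it. Both `has_http_scheme` and `percent_decode` are hypothetical helpers written for this illustration; this is not the planned Phase 5 fix, only one possible direction.

```rust
/// Naive scheme check of the kind a percent-encoded scheme can evade.
/// Hypothetical helper for illustration.
pub fn has_http_scheme(url: &str) -> bool {
    let lower = url.to_ascii_lowercase();
    lower.starts_with("http://") || lower.starts_with("https://")
}

/// Decodes `%XX` escapes; malformed escapes are kept as-is.
/// Hypothetical helper for illustration.
pub fn percent_decode(input: &str) -> String {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        // A valid escape needs two hex digits after the '%'.
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            let hi = (bytes[i + 1] as char).to_digit(16);
            let lo = (bytes[i + 2] as char).to_digit(16);
            if let (Some(hi), Some(lo)) = (hi, lo) {
                out.push((hi * 16 + lo) as u8);
                i += 3;
                continue;
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    // Invalid UTF-8 produced by decoding is replaced, not rejected.
    String::from_utf8_lossy(&out).into_owned()
}
```

With these helpers, `has_http_scheme("%68ttps://evil.test/x")` is false, while the same check applied to `percent_decode("%68ttps://evil.test/x")` is true, since `%68` decodes to `h`.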

Test plan

  • 20 unit tests for ExfiltrationGuard (scan_output, validate_tool_call, memory guard)
  • cargo nextest run --features full: 4029/4029 pass
  • cargo clippy --features full: 0 warnings
  • cargo +nightly fmt --check: clean

Closes Phase 4 of #1195.

… writes (Phase 4, #1195)

Add ExfiltrationGuard to zeph-core with three independently toggleable
guards under [security.exfiltration_guard] config section:

- Output scanner: strips markdown image injection (inline + reference-style)
  from LLM responses at 5 integration points (streaming, non-streaming,
  native text, ToolUse text, cache hits)
- Tool URL validator: cross-references URLs in tool call arguments against
  flagged untrusted content using HashSet for O(1) lookups
- Memory write guard: skips Qdrant embedding for injection-flagged content
  while preserving SQLite conversation continuity
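
The HashSet-based cross-reference mentioned for the tool URL validator could look roughly like the sketch below. The function names `extract_urls`, `collect_flagged_urls`, and `flagged_urls_in_args` are hypothetical, and the whitespace-and-delimiter tokenizer is a deliberate simplification; the actual validate_tool_call logic may extract and normalize URLs differently.

```rust
use std::collections::HashSet;

/// Extracts http(s) URLs by splitting on whitespace and common delimiters.
/// Simplified tokenizer for illustration only.
pub fn extract_urls(text: &str) -> Vec<String> {
    text.split(|c: char| {
        c.is_whitespace() || matches!(c, '"' | '\'' | '<' | '>' | '(' | ')' | ',')
    })
    .filter(|tok| tok.starts_with("http://") || tok.starts_with("https://"))
    .map(|tok| tok.to_string())
    .collect()
}

/// Collects URLs seen in flagged untrusted content into a set for O(1) lookups.
pub fn collect_flagged_urls(untrusted: &str) -> HashSet<String> {
    extract_urls(untrusted).into_iter().collect()
}

/// Returns the URLs in tool call arguments that also appeared in flagged
/// content. Flag-only: the caller reports the hits but still runs the tool.
pub fn flagged_urls_in_args(args: &str, flagged: &HashSet<String>) -> Vec<String> {
    extract_urls(args)
        .into_iter()
        .filter(|url| flagged.contains(url))
        .collect()
}
```

A non-empty result from `flagged_urls_in_args` would be the trigger for emitting an event or bumping a counter, with no change to tool execution itself.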

Addresses all architecture critic findings (S1-S4, M1-M4) and code review
issues (SEC-EX-01 through SEC-EX-06). Adds 20 unit tests, 3 metrics
counters, and updates docs and READMEs.
github-actions bot added the labels documentation, rust, core, enhancement, size/XL on Mar 5, 2026
bug-ops merged commit 61c4627 into main on Mar 5, 2026
28 checks passed
bug-ops deleted the feat/m32/exfiltration-guards branch on March 5, 2026 20:31
Labels

  • core: zeph-core crate
  • documentation: Improvements or additions to documentation
  • enhancement: New feature or request
  • rust: Rust code changes
  • size/XL: Extra large PR (500+ lines)

Development

Successfully merging this pull request may close these issues.

  • [SEC-4.3] Memory write poisoning guard
  • [SEC-4.2] Tool call argument validation guard
  • [SEC-4.1] Markdown image exfiltration guard