research(security): VIGIL verify-before-commit for tool output streams — 22% attack reduction, intent-anchored sanitization (arXiv:2601.05755) #2306
Open
Labels: P3 (Research: medium-high complexity), research (Research-driven improvement), security (Security-related issue)
Description
Paper
arXiv:2601.05755 — VIGIL: Defending LLM Agents Against Tool Stream Injection via Verify-Before-Commit
Key Finding
A verify-before-commit protocol sanitizes tool output streams against constraints anchored to the user's intent. It reduces attack success rate by over 22% while more than doubling agent utility under attack compared to static baselines.
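To make "intent-anchored" concrete, here is a minimal Python sketch of a verify-before-commit check. It is not the paper's algorithm and not Zeph code: `IntentConstraints`, `ACTION_PHRASES`, `verify`, and `commit_if_verified` are hypothetical names, and the substring matching is a toy stand-in for whatever verifier a real implementation would use. The point it illustrates is that the same tool output can pass or fail depending on what the user originally asked for.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class IntentConstraints:
    """Hypothetical: the set of actions the user's original request permits."""
    allowed_actions: frozenset


@dataclass
class Verdict:
    ok: bool
    reasons: list = field(default_factory=list)


# Toy mapping from phrases in tool output to the action they would trigger.
# An assumption for illustration only; a real verifier would be far richer.
ACTION_PHRASES = {
    "send email": "send_email",
    "delete": "delete_file",
    "transfer": "transfer_funds",
}


def verify(tool_output: str, intent: IntentConstraints) -> Verdict:
    """Flag every action the output asks for that the user did not request."""
    text = tool_output.lower()
    reasons = [
        f"requests '{action}', which is outside the user's intent"
        for phrase, action in ACTION_PHRASES.items()
        if phrase in text and action not in intent.allowed_actions
    ]
    return Verdict(ok=not reasons, reasons=reasons)


def commit_if_verified(context: list, tool_output: str,
                       intent: IntentConstraints) -> Verdict:
    """Verify the staged output first; append it to context only on success."""
    verdict = verify(tool_output, intent)
    if verdict.ok:
        context.append(tool_output)
    return verdict
```

With `IntentConstraints(frozenset({"read_file"}))`, an output containing "send email to ..." is rejected and never reaches the context, while the same output is accepted when the user's intent includes `send_email`.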
Applicability to Zeph
- Tool output sanitization: Zeph's `ContentSanitizer` currently checks inputs (user messages). VIGIL targets outputs: tool results before they enter the LLM context. This is a gap: a malicious MCP tool response could inject instructions into the next LLM turn.
- Intent anchoring: The key insight is that sanitization should be relative to the original user intent, not absolute. A tool result that says "ignore previous instructions" is obviously suspicious; VIGIL formalizes this check.
- Integration point: In `agent/tool_execution/`, after calling the executor but before pushing `tool_result` into context. Check tool output against a cached representation of the user's original intent.
- Relation to #2297: PR #2297, "fix(security): sanitize MCP tool descriptions before interpolating into pruning prompt", addresses a narrower case. VIGIL's approach is more general, covering all tool outputs at runtime.
- Priority: P3 — complements existing injection defense but requires implementing intent extraction/anchoring, which is non-trivial.
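
The integration point described above could be sketched roughly as follows. This is an illustration in Python under stated assumptions, not Zeph's actual API: `run_tool_with_verification`, `toy_verify`, `WITHHELD`, and the message shape are all hypothetical, and `toy_verify` is a deliberately naive stand-in that ignores the intent argument a real intent-anchored check would use.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]

# Placeholder committed in place of output that fails verification (hypothetical).
WITHHELD = "[tool output withheld: failed intent verification]"


def run_tool_with_verification(
    execute: Callable[[], str],
    verify: Callable[[str, str], bool],
    user_intent: str,
    context: List[Message],
) -> bool:
    """Run the tool, check its output against the cached intent, then commit.

    Returns True if the raw output was committed to context, False if a
    placeholder was committed instead.
    """
    output = execute()
    if verify(output, user_intent):
        context.append({"role": "tool", "content": output})
        return True
    # Commit a placeholder so the tool call still has a matching result.
    context.append({"role": "tool", "content": WITHHELD})
    return False


def toy_verify(output: str, user_intent: str) -> bool:
    """Naive stand-in verifier: rejects one well-known injection phrase."""
    return "ignore previous instructions" not in output.lower()
```

Committing a placeholder rather than dropping the result is a design choice in this sketch: chat-style APIs generally expect each tool call to be followed by a result, and the model can then tell the user that the tool output was blocked.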