research(security): VIGIL verify-before-commit for tool output streams — 22% attack reduction, intent-anchored sanitization (arXiv:2601.05755) #2306

@bug-ops

Description

Paper

arXiv:2601.05755, VIGIL: Defending LLM Agents Against Tool Stream Injection via Verify-Before-Commit

Key Finding

A verify-before-commit protocol sanitizes tool output streams against constraints anchored to the user's intent. The paper reports that this reduces attack success rate by over 22% while more than doubling agent utility under attack, compared to static baselines.

Applicability to Zeph

  • Tool output sanitization: Zeph's ContentSanitizer currently checks inputs (user messages). VIGIL targets outputs: tool results before they enter the LLM context. This is a gap: a malicious MCP tool response could inject instructions into the next LLM turn.
  • Intent anchoring: The key insight is that sanitization should be relative to the original user intent, not absolute. A tool result that says "ignore previous instructions" is obviously suspicious; VIGIL formalizes this check by comparing tool output against what the user actually asked for.
  • Integration point: agent/tool_execution/, after calling the executor but before pushing tool_result into context. At that point, check the tool output against a cached representation of the user's original intent.
  • Relation to #2297: PR #2297 (fix(security): sanitize MCP tool descriptions before interpolating into pruning prompt) addresses a narrower case. VIGIL's approach is more general, covering all tool outputs at runtime.
  • Priority: P3 — complements existing injection defense but requires implementing intent extraction/anchoring, which is non-trivial.
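
To make the integration point concrete, here is a minimal, language-agnostic sketch (written in Python for brevity) of a verify-before-commit gate: the tool output is checked against a cached intent anchor, and only a passing result is pushed into context. Everything here is an assumption for illustration: `IntentAnchor`, `verify_tool_output`, `commit_tool_result`, the regex patterns, and the `KNOWN_ACTIONS` vocabulary are hypothetical names, and the keyword heuristic is a simple stand-in, not the verifier described in the paper and not Zeph's existing API.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical patterns for instruction-like text in tool output.
# A stand-in heuristic, not the verifier from the VIGIL paper.
INSTRUCTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"\byou must now\b", re.IGNORECASE),
    re.compile(r"\bsystem prompt\b", re.IGNORECASE),
]

# Hypothetical vocabulary of actions the sketch can spot in tool output.
KNOWN_ACTIONS = {"send_email", "delete_file", "read_file", "http_post"}


@dataclass(frozen=True)
class IntentAnchor:
    """Cached representation of the user's original request (assumed shape)."""
    request: str
    allowed_actions: frozenset[str]


@dataclass
class Verdict:
    commit: bool
    reasons: list[str] = field(default_factory=list)


def verify_tool_output(anchor: IntentAnchor, output: str) -> Verdict:
    """Check a staged tool output before it is committed to the LLM context."""
    reasons: list[str] = []

    # Absolute check: text that reads like an instruction to the agent.
    for pattern in INSTRUCTION_PATTERNS:
        if pattern.search(output):
            reasons.append(f"instruction-like text: {pattern.pattern}")

    # Intent-relative check: an action mentioned in the output is flagged
    # only if the user's request did not call for it.
    lowered = output.lower()
    for action in sorted(KNOWN_ACTIONS):
        if action in lowered and action not in anchor.allowed_actions:
            reasons.append(f"action outside user intent: {action}")

    return Verdict(commit=not reasons, reasons=reasons)


def commit_tool_result(
    context: list[dict], anchor: IntentAnchor, tool_name: str, output: str
) -> Verdict:
    """Push the tool result into context only if verification passes."""
    verdict = verify_tool_output(anchor, output)
    content = output if verdict.commit else "[tool output withheld: failed intent verification]"
    context.append({"role": "tool", "name": tool_name, "content": content})
    return verdict
```

The second check is what makes the gate intent-relative: a tool output mentioning `send_email` is withheld when the user only asked to read a file, but passes when sending email was part of the request. A real implementation would need a far stronger intent representation than a set of action names, which is the non-trivial part noted under Priority.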

Metadata

Labels

  • P3: Research, medium-high complexity
  • research: Research-driven improvement
  • security: Security-related issue
