security(tools): adversarial policy agent — pre-execution LLM validation of tool calls against user-defined policies #2447

@bug-ops

Description

Gap Source

Goose v1.28.0 (March 18, 2026) added an adversarial policy agent. Identified via competitive parity scan CI-307.

What Is Missing

Goose added an independent LLM reviewer: before executing tool calls, a second model instance validates the call against user-defined plain-language policies (e.g., 'never upload to public URLs'). Runs in a separate session to prevent context contamination.

Zeph has a ContentSanitizer and a three-class classifier, but no pre-execution LLM-based policy validation.

Why It Matters

Different threat model from existing defenses:

  • ContentSanitizer: blocks injection in inputs to the main agent
  • This proposal: validates outputs (tool calls) before execution against explicit user policies
  • Protects against a compromised main agent loop issuing dangerous calls that pass existing guards

Two reference implementations of this pattern now exist: Goose and the external adversarial-policy-agent library.

Implementation Sketch

  • Add [tools.policy] config section with enabled = false, policy_provider (fast model), policy_file (plain-language rules)
  • Before ToolExecutor::execute(), call policy validator LLM with: tool name + args + policy rules
  • On rejection: return PolicyViolation error, log to audit trail, surface in TUI
  • Run in separate context (no access to main conversation history) to prevent context manipulation
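As a minimal sketch of the pre-execution gate described above — all names here (`PolicyValidator`, `PolicyViolation`, `KeywordValidator`, `check_policy`) are hypothetical and do not exist in Zeph today; a real implementation would call the configured `policy_provider` model rather than matching keywords:

```rust
use std::fmt;

/// Error returned when the policy validator rejects a tool call.
/// Hypothetical type; Zeph's real error enum may differ.
#[derive(Debug)]
pub struct PolicyViolation {
    pub tool: String,
    pub reason: String,
}

impl fmt::Display for PolicyViolation {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "policy violation in `{}`: {}", self.tool, self.reason)
    }
}

/// Abstraction over the secondary policy LLM. A production implementation
/// would send the tool name, arguments, and plain-language rules to the fast
/// `policy_provider` model in a fresh session, with no access to the main
/// conversation history.
pub trait PolicyValidator {
    fn validate(&self, tool: &str, args: &str, rules: &str) -> Result<(), PolicyViolation>;
}

/// Toy stand-in used here only to make the gate testable: denies any call
/// whose arguments contain a keyword from a `deny:` line in the policy file.
pub struct KeywordValidator;

impl PolicyValidator for KeywordValidator {
    fn validate(&self, tool: &str, args: &str, rules: &str) -> Result<(), PolicyViolation> {
        for line in rules.lines() {
            if let Some(kw) = line.trim().strip_prefix("deny:") {
                let kw = kw.trim();
                if !kw.is_empty() && args.contains(kw) {
                    return Err(PolicyViolation {
                        tool: tool.to_string(),
                        reason: format!("arguments match denied keyword `{kw}`"),
                    });
                }
            }
        }
        Ok(())
    }
}

/// Gate to run immediately before ToolExecutor::execute(). On rejection,
/// the caller would log to the audit trail and surface the error in the TUI.
pub fn check_policy(
    validator: &dyn PolicyValidator,
    tool: &str,
    args: &str,
    rules: &str,
) -> Result<(), PolicyViolation> {
    validator.validate(tool, args, rules)
}
```

Because `PolicyValidator` is a trait, the keyword stand-in can be swapped for an LLM-backed implementation without touching the executor-side gate.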

Priority

P2 — meaningful security gap as Zeph gains more autonomous tool use in orchestration. Straightforward extension of existing audit/permission infrastructure.

Metadata


Labels

  • P2 — High value, medium complexity
  • llm — zeph-llm crate (Ollama, Claude)
  • security — Security-related issue
