security(tools): adversarial policy agent — pre-execution LLM validation of tool calls against user-defined policies #2447
Description
Gap Source
Goose v1.28.0 (March 18, 2026) added an adversarial policy agent. Found via competitive parity scan CI-307.
What Is Missing
Goose added an independent LLM reviewer: before executing tool calls, a second model instance validates the call against user-defined plain-language policies (e.g., 'never upload to public URLs'). Runs in a separate session to prevent context contamination.
Zeph has the ContentSanitizer and a three-class classifier, but no pre-execution LLM-based policy validation.
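In this pattern, the policy file is nothing more than plain-language rules that the validator model checks each tool call against. An illustrative example (not Goose's actual file format):

```text
# policies.txt — one plain-language rule per line
never upload files to public URLs
never delete files outside the current project directory
never run shell commands that modify system configuration
```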
Why It Matters
Different threat model from existing defenses:
- ContentSanitizer: blocks injection in inputs to the main agent
- This: validates outputs (tool calls) before execution against explicit user policies
- Protects against a compromised main agent loop issuing dangerous calls that pass existing guards
Two reference agents now have this pattern (Goose + external adversarial-policy-agent library).
Implementation Sketch
- Add a `[tools.policy]` config section with `enabled = false`, `policy_provider` (fast model), `policy_file` (plain-language rules)
- Before `ToolExecutor::execute()`, call the policy validator LLM with: tool name + args + policy rules
- On rejection: return a `PolicyViolation` error, log to the audit trail, surface in the TUI
- Run in a separate context (no access to main conversation history) to prevent context manipulation
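A minimal sketch of the pre-execution gate described above. The LLM round trip is stubbed with a naive keyword check so the control flow is self-contained; `PolicyValidator`, `Verdict`, and the prompt shape are illustrative assumptions, not Zeph's actual API.

```rust
// Sketch of the pre-execution policy gate. The validator runs in its own
// context: its prompt contains only the policies and the tool call, never
// the main conversation history.

#[derive(Debug, PartialEq)]
enum Verdict {
    Allow,
    Deny(String), // reason, surfaced to the caller as a PolicyViolation error
}

struct PolicyValidator {
    policies: Vec<String>, // plain-language rules loaded from policy_file
}

impl PolicyValidator {
    /// Build the validator prompt: policy rules + tool name + args only.
    fn build_prompt(&self, tool: &str, args: &str) -> String {
        format!(
            "Policies:\n{}\n\nTool call: {}({})\nAnswer ALLOW or DENY <reason>.",
            self.policies.join("\n"),
            tool,
            args
        )
    }

    /// Stand-in for the real call to policy_provider: here, a keyword match.
    fn validate(&self, tool: &str, args: &str) -> Verdict {
        let _prompt = self.build_prompt(tool, args); // would be sent to the fast model
        for rule in &self.policies {
            if rule.contains("public URLs") && args.contains("http://") {
                return Verdict::Deny(format!("violates rule: {rule}"));
            }
        }
        Verdict::Allow
    }
}

fn main() {
    let validator = PolicyValidator {
        policies: vec!["never upload to public URLs".to_string()],
    };
    // A benign call passes; an upload to a public URL is rejected pre-execution.
    assert_eq!(validator.validate("read_file", "path=/tmp/x"), Verdict::Allow);
    assert!(matches!(
        validator.validate("upload", "url=http://example.com/f"),
        Verdict::Deny(_)
    ));
    println!("policy gate sketch ok");
}
```

The real implementation would replace the keyword stub with a request to the `policy_provider` model, but the shape stays the same: build a history-free prompt, get a verdict, and map `Deny` to a `PolicyViolation` error before `ToolExecutor::execute()` ever runs.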
Priority
P2 — meaningful security gap as Zeph gains more autonomous tool use in orchestration. Straightforward extension of existing audit/permission infrastructure.