security(tools): adversarial policy agent — pre-execution LLM validation of tool calls against user-defined policies #2447

@bug-ops

Description

Gap Source

Goose v1.28.0 (March 18, 2026) added an adversarial policy agent. Identified via competitive parity scan CI-307.

What Is Missing

Goose added an independent LLM reviewer: before executing tool calls, a second model instance validates the call against user-defined plain-language policies (e.g., 'never upload to public URLs'). Runs in a separate session to prevent context contamination.

Zeph has a ContentSanitizer and a three-class classifier, but no pre-execution LLM-based policy validation.

Why It Matters

Different threat model from existing defenses:

  • ContentSanitizer: blocks injection in inputs to the main agent
  • This proposal: validates outputs (tool calls) before execution against explicit user policies
  • Protects against a compromised main agent loop issuing dangerous calls that pass existing guards

Two reference implementations of this pattern now exist: Goose and the external adversarial-policy-agent library.

Implementation Sketch

  • Add [tools.policy] config section with enabled = false, policy_provider (fast model), policy_file (plain-language rules)
  • Before ToolExecutor::execute(), call policy validator LLM with: tool name + args + policy rules
  • On rejection: return PolicyViolation error, log to audit trail, surface in TUI
  • Run in separate context (no access to main conversation history) to prevent context manipulation
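As a minimal sketch of the pre-execution gate described above — all names here (`PolicyValidator`, `PolicyViolation`, `KeywordValidator`, `check_policy`) are hypothetical and do not exist in Zeph today; a real implementation would call the configured `policy_provider` model rather than matching keywords:

```rust
use std::fmt;

/// Error returned when the policy validator rejects a tool call.
/// Hypothetical type; Zeph's real error enum may differ.
#[derive(Debug)]
pub struct PolicyViolation {
    pub tool: String,
    pub reason: String,
}

impl fmt::Display for PolicyViolation {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "policy violation in `{}`: {}", self.tool, self.reason)
    }
}

/// Abstraction over the secondary policy LLM. A production implementation
/// would send the tool name, arguments, and plain-language rules to the fast
/// `policy_provider` model in a fresh session, with no access to the main
/// conversation history.
pub trait PolicyValidator {
    fn validate(&self, tool: &str, args: &str, rules: &str) -> Result<(), PolicyViolation>;
}

/// Toy stand-in used here only to make the gate testable: denies any call
/// whose arguments contain a keyword from a `deny:` line in the policy file.
pub struct KeywordValidator;

impl PolicyValidator for KeywordValidator {
    fn validate(&self, tool: &str, args: &str, rules: &str) -> Result<(), PolicyViolation> {
        for line in rules.lines() {
            if let Some(kw) = line.trim().strip_prefix("deny:") {
                let kw = kw.trim();
                if !kw.is_empty() && args.contains(kw) {
                    return Err(PolicyViolation {
                        tool: tool.to_string(),
                        reason: format!("arguments match denied keyword `{kw}`"),
                    });
                }
            }
        }
        Ok(())
    }
}

/// Gate to run immediately before ToolExecutor::execute(). On rejection,
/// the caller would log to the audit trail and surface the error in the TUI.
pub fn check_policy(
    validator: &dyn PolicyValidator,
    tool: &str,
    args: &str,
    rules: &str,
) -> Result<(), PolicyViolation> {
    validator.validate(tool, args, rules)
}
```

Because `PolicyValidator` is a trait, the keyword stand-in can be swapped for an LLM-backed implementation without touching the executor-side gate.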

Priority

P2 — meaningful security gap as Zeph gains more autonomous tool use in orchestration. Straightforward extension of existing audit/permission infrastructure.

Metadata


Labels

  • P2 — High value, medium complexity
  • llm — zeph-llm crate (Ollama, Claude)
  • security — Security-related issue
