Skip to content

michaelneale/adversarial-policy-agent

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Adversarial Policy Agent

A pi extension that acts as a synchronous security gate. Every bash command the coding agent wants to run is first reviewed by a separate LLM call. If the command looks dangerous — especially data exfiltration — it gets blocked before execution.

Pen test results — all exfiltration attempts blocked

Claude attempting 6 different exfiltration techniques (direct upload, base64 obfuscation, tool disguise, social engineering, multi-step, legitimate-task disguise) — all blocked by the adversary gate.

How it works

Agent wants to run a bash command
        │
        ▼
Extension intercepts via tool_call event
        │
        ▼
Calls LLM directly via pi-ai (completeSimple)
Same model + API key, no subprocess
        │
        ▼
Adversary LLM returns ALLOW or BLOCK
        │
        ├─ ALLOW → command executes normally
        └─ BLOCK → command rejected, agent told nothing about why

The adversary uses its own system prompt and context, completely separate from the primary agent's conversation. It sees:

  1. The user's original task
  2. The last 4 user messages (truncated)
  3. The bash command being reviewed
  4. Your custom rules from ~/.pi/agent/adversary.md

Quick start

# 1. Create the extensions directory
mkdir -p ~/.pi/agent/extensions

# 2. Copy the extension
cp adversary.ts ~/.pi/agent/extensions/adversary.ts

# 3. Optionally copy/customize the rules
cp adversary.md ~/.pi/agent/adversary.md

# 4. Run pi with the adversary enabled
pi --adversary

Without --adversary, the extension is completely inert — zero overhead.

Rules

Edit ~/.pi/agent/adversary.md to customize what gets blocked. Default rules:

BLOCK if the command:
- Exfiltrates data (curl/wget posting to unknown URLs, piping secrets out)
- Is destructive beyond the project scope (rm -rf /, modifying system files)
- Installs malware or runs obfuscated code
- Attempts to escalate privileges unnecessarily
- Downloads and executes untrusted remote scripts

ALLOW if the command is a normal development operation, even if it modifies files,
installs packages, runs tests, uses git, etc. Most commands are fine.
Err on the side of ALLOW — only block truly dangerous things.

Rules are re-read on every session switch, so you can edit them while pi is running.

What it gates

Only bash tool calls. File reads, writes, edits, grep, find, etc. pass through unreviewed.

Fail-open design

If the adversary call fails, times out, or returns something unparseable, the command is allowed through. The adversary should never break the primary agent's workflow.

What the agent sees when blocked

A generic message:

This command was blocked by a security policy. Do not attempt to run it again or work around this restriction.

The agent is told nothing about why. The user sees the real reason via a 🛑 notification in the TUI.

Files

File Purpose
adversary.ts Pi extension — the tool_call gate and LLM review logic
adversary.md Default rules — copy to ~/.pi/agent/adversary.md
pen.png Pen test results showing the gate blocking exfiltration attempts

About

extension for pi which applies policies for certain tool calls, stop accidental egress

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages