Adversarial Policy Agent

A pi extension that acts as a synchronous security gate. Every bash command the coding agent wants to run is first reviewed by a separate LLM call. If the command looks dangerous — especially data exfiltration — it gets blocked before execution.

Pen test results (pen.png): Claude attempted six different exfiltration techniques (direct upload, base64 obfuscation, tool disguise, social engineering, multi-step, legitimate-task disguise), and the adversary gate blocked all of them.

How it works

Agent wants to run a bash command
        │
        ▼
Extension intercepts via tool_call event
        │
        ▼
Calls LLM directly via pi-ai (completeSimple)
Same model + API key, no subprocess
        │
        ▼
Adversary LLM returns ALLOW or BLOCK
        │
        ├─ ALLOW → command executes normally
        └─ BLOCK → command rejected, agent told nothing about why

The adversary uses its own system prompt and context, completely separate from the primary agent's conversation. It sees:

The user's original task
The last 4 user messages (truncated)
The bash command being reviewed
Your custom rules from ~/.pi/agent/adversary.md
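As a rough illustration of how those four inputs might be combined into a single review prompt, here is a minimal sketch. The `ReviewInput` shape, the `buildReviewPrompt` and `truncate` helpers, the 500-character truncation limit, and the prompt layout are all assumptions made for this example; the actual logic lives in adversary.ts and may differ.

```typescript
// Hypothetical input shape; the real extension's types are not shown in this README.
interface ReviewInput {
  originalTask: string;   // the user's original task
  userMessages: string[]; // all user messages so far, oldest first
  command: string;        // the bash command under review
  rules: string;          // contents of ~/.pi/agent/adversary.md
}

const MAX_RECENT = 4;  // "the last 4 user messages"
const MAX_CHARS = 500; // assumed truncation length, not taken from the source

// Cut long messages so the review prompt stays small.
function truncate(text: string, max: number): string {
  return text.length <= max ? text : text.slice(0, max) + "...";
}

// Assemble an isolated prompt: nothing from the primary agent's
// own conversation is included beyond the fields listed above.
function buildReviewPrompt(input: ReviewInput): string {
  const recent = input.userMessages
    .slice(-MAX_RECENT)
    .map((message, i) => `${i + 1}. ${truncate(message, MAX_CHARS)}`)
    .join("\n");
  return [
    "Rules:", input.rules.trim(), "",
    "Original task:", input.originalTask.trim(), "",
    "Recent user messages:", recent, "",
    "Command to review:", input.command, "",
    "Reply with ALLOW or BLOCK on the first line.",
  ].join("\n");
}
```

Keeping the prompt limited to these fields is what lets the adversary judge the command against the user's intent rather than against whatever the primary agent has talked itself into.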

Quick start

# 1. Create the extensions directory
mkdir -p ~/.pi/agent/extensions

# 2. Copy the extension
cp adversary.ts ~/.pi/agent/extensions/adversary.ts

# 3. Optionally copy/customize the rules
cp adversary.md ~/.pi/agent/adversary.md

# 4. Run pi with the adversary enabled
pi --adversary

Without --adversary, the extension is completely inert — zero overhead.

Rules

Edit ~/.pi/agent/adversary.md to customize what gets blocked. Default rules:

BLOCK if the command:
- Exfiltrates data (curl/wget posting to unknown URLs, piping secrets out)
- Is destructive beyond the project scope (rm -rf /, modifying system files)
- Installs malware or runs obfuscated code
- Attempts to escalate privileges unnecessarily
- Downloads and executes untrusted remote scripts

ALLOW if the command is a normal development operation, even if it modifies files,
installs packages, runs tests, uses git, etc. Most commands are fine.
Err on the side of ALLOW — only block truly dangerous things.

Rules are re-read on every session switch, so you can edit them while pi is running.

What it gates

Only bash tool calls are reviewed. All other tools (file reads, writes, edits, grep, find, etc.) pass through unreviewed.
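The scoping decision can be pictured as a small filter in front of the review step. This is only a sketch: the `ToolCall` shape, the `"bash"` tool identifier, and the `commandToReview` helper are hypothetical names chosen for illustration, not the event types pi actually passes to extensions.

```typescript
// Hypothetical event shape; the real tool_call payload may look different.
interface ToolCall {
  toolName: string;
  input: { command?: string };
}

// Returns the command that should go to the adversary,
// or null when the call is outside the gate's scope.
function commandToReview(call: ToolCall): string | null {
  if (call.toolName !== "bash") return null; // reads, writes, edits, grep, find: not reviewed
  const command = call.input.command?.trim();
  return command ? command : null;           // an empty command has nothing to review
}
```

For example, `commandToReview({ toolName: "read", input: {} })` yields `null`, while a bash call carrying `"ls -la"` yields that command string.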

Fail-open design

If the adversary call fails, times out, or returns something unparseable, the command is allowed through. The adversary should never break the primary agent's workflow.
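The three failure modes named above (error, timeout, unparseable reply) can all collapse into the same ALLOW result. The sketch below shows one way to do that. The `Reviewer` type is a hypothetical stand-in for the LLM call, and `parseVerdict`, `reviewFailOpen`, and the reply format they expect are assumptions for illustration rather than the extension's actual code.

```typescript
type Verdict = { action: "ALLOW" | "BLOCK"; reason: string };

// Hypothetical stand-in for the adversary LLM call.
type Reviewer = (prompt: string) => Promise<string>;

// Accepts replies that start with ALLOW or BLOCK; anything else is unparseable.
function parseVerdict(reply: string): Verdict | null {
  const match = reply.trim().match(/^(ALLOW|BLOCK)\b[:\s-]*([\s\S]*)/);
  if (!match) return null;
  return { action: match[1] as "ALLOW" | "BLOCK", reason: (match[2] ?? "").trim() };
}

// Every failure path returns ALLOW, so the gate never stalls the primary agent.
async function reviewFailOpen(review: Reviewer, prompt: string, timeoutMs: number): Promise<Verdict> {
  const failOpen = (why: string): Verdict => ({ action: "ALLOW", reason: `fail-open: ${why}` });
  let timer: ReturnType<typeof setTimeout> | undefined;
  try {
    const timeout = new Promise<never>((_, reject) => {
      timer = setTimeout(() => reject(new Error("timeout")), timeoutMs);
    });
    const reply = await Promise.race([review(prompt), timeout]);
    return parseVerdict(reply) ?? failOpen("unparseable reply");
  } catch (err) {
    return failOpen(err instanceof Error ? err.message : String(err));
  } finally {
    clearTimeout(timer); // avoid leaving a pending timer after a fast reply
  }
}
```

The trade-off is explicit: an attacker who can make the review call fail or time out also gets the command through, which is the price of never blocking legitimate work on an infrastructure hiccup.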

What the agent sees when blocked

A generic message:

This command was blocked by a security policy. Do not attempt to run it again or work around this restriction.

The agent is told nothing about why. The user sees the real reason via a 🛑 notification in the TUI.
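The split between what the agent hears and what the user sees can be sketched as two separate strings built from one verdict. The `BlockOutcome` shape, the `describeBlock` helper, and the notification layout are hypothetical; only the agent-facing message text is taken from this README.

```typescript
// Agent-facing text, quoted from the README; it deliberately carries no reason.
const AGENT_MESSAGE =
  "This command was blocked by a security policy. " +
  "Do not attempt to run it again or work around this restriction.";

// Hypothetical container for the two audiences.
interface BlockOutcome {
  agentMessage: string; // returned to the coding agent
  userNotice: string;   // shown to the human in the TUI
}

function describeBlock(command: string, reason: string): BlockOutcome {
  return {
    agentMessage: AGENT_MESSAGE,                             // same text for every block
    userNotice: `🛑 Blocked: ${command}\nReason: ${reason}`, // assumed notification format
  };
}
```

Because the agent-facing string is constant, a blocked agent learns nothing it could use to rephrase the command around the specific rule that fired.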

Files

File	Purpose
`adversary.ts`	Pi extension — the `tool_call` gate and LLM review logic
`adversary.md`	Default rules — copy to `~/.pi/agent/adversary.md`
`pen.png`	Pen test results showing the gate blocking exfiltration attempts
