A pi extension that acts as a synchronous security gate. Every bash command the coding agent wants to run is first reviewed by a separate LLM call. If the command looks dangerous — especially data exfiltration — it gets blocked before execution.
Claude attempting 6 different exfiltration techniques (direct upload, base64 obfuscation, tool disguise, social engineering, multi-step, legitimate-task disguise) — all blocked by the adversary gate.
Agent wants to run a bash command
│
▼
Extension intercepts via tool_call event
│
▼
Calls LLM directly via pi-ai (completeSimple)
Same model + API key, no subprocess
│
▼
Adversary LLM returns ALLOW or BLOCK
│
├─ ALLOW → command executes normally
└─ BLOCK → command rejected, agent told nothing about why
The adversary uses its own system prompt and context, completely separate from the primary agent's conversation. It sees:
- The user's original task
- The last 4 user messages (truncated)
- The bash command being reviewed
- Your custom rules from
~/.pi/agent/adversary.md
# 1. Create the extensions directory
mkdir -p ~/.pi/agent/extensions
# 2. Copy the extension
cp adversary.ts ~/.pi/agent/extensions/adversary.ts
# 3. Optionally copy/customize the rules
cp adversary.md ~/.pi/agent/adversary.md
# 4. Run pi with the adversary enabled
pi --adversaryWithout --adversary, the extension is completely inert — zero overhead.
Edit ~/.pi/agent/adversary.md to customize what gets blocked. Default rules:
BLOCK if the command:
- Exfiltrates data (curl/wget posting to unknown URLs, piping secrets out)
- Is destructive beyond the project scope (rm -rf /, modifying system files)
- Installs malware or runs obfuscated code
- Attempts to escalate privileges unnecessarily
- Downloads and executes untrusted remote scripts
ALLOW if the command is a normal development operation, even if it modifies files,
installs packages, runs tests, uses git, etc. Most commands are fine.
Err on the side of ALLOW — only block truly dangerous things.
Rules are re-read on every session switch, so you can edit them while pi is running.
Only bash tool calls. File reads, writes, edits, grep, find, etc. pass through unreviewed.
If the adversary call fails, times out, or returns something unparseable, the command is allowed through. The adversary should never break the primary agent's workflow.
A generic message:
This command was blocked by a security policy. Do not attempt to run it again or work around this restriction.
The agent is told nothing about why. The user sees the real reason via a 🛑 notification in the TUI.
| File | Purpose |
|---|---|
adversary.ts |
Pi extension — the tool_call gate and LLM review logic |
adversary.md |
Default rules — copy to ~/.pi/agent/adversary.md |
pen.png |
Pen test results showing the gate blocking exfiltration attempts |
