📦 This plugin is part of the Vainplex OpenClaw Suite — a collection of production plugins that turn OpenClaw into a self-governing, learning system. See the monorepo for the full picture.
In February 2026, UC Berkeley's Center for Long-Term Cybersecurity published a 67-page framework for governing autonomous AI agents. The same month, Microsoft's Cyber Pulse report revealed that 80% of Fortune 500 companies now run active AI agents — and 29% of employees use unsanctioned ones. Microsoft followed up with a threat analysis specific to OpenClaw, outlining identity, isolation, and runtime risks for self-hosted agents.
The gap is clear: agents are everywhere, governance is nowhere. The Berkeley framework defines what's needed. The existing tools — scanners, input/output filters, output validators — cover fragments. None of them do contextual, learning, runtime governance across agents.
This plugin does. It implements 9 of Berkeley's 12 core requirements today, with the remaining 3 designed and scheduled.
Zero runtime dependencies. Hundreds of tests. Production since February 2026.
UC Berkeley's Agentic AI Risk-Management Standards Profile and Microsoft's governance requirements define what responsible agent infrastructure looks like. Here's where this plugin stands:
| Requirement | Our Implementation | Status |
|---|---|---|
| Agent Registry | Trust config with per-agent scores, all 9 agents registered | ✅ Implemented |
| Access Control / Least Privilege | Per-agent tool blocking, trust tier-based permissions | ✅ Implemented |
| Real-time Monitoring | Every tool call evaluated against policies before execution | ✅ Implemented |
| Activity Logging / Audit Trail | Append-only JSONL, ISO 27001 / SOC 2 / NIS2 control mapping | ✅ Implemented |
| Emergency Controls | Night Mode (time-based blocking), Rate Limiter (frequency cap) | ✅ Implemented |
| Cascading Agent Policies | Cross-agent governance — parent policies propagate to sub-agents | ✅ Implemented |
| Autonomy Levels | Trust tiers (0–100, five levels) — functionally equivalent to Berkeley's L0–L5 | ✅ Implemented |
| Credential Protection | 3-layer redaction with SHA-256 vault, 17 built-in patterns, fail-closed | ✅ Implemented |
| Human-in-the-Loop | Approval 2FA — TOTP-based approval for agent tool calls. Session approval mode: one code unlocks 10 minutes of auto-approved execution. | ✅ Implemented |
| Semantic Intent Analysis | LLM-powered intent classification before tool execution | 📋 Planned |
| Multi-Agent Interaction Monitoring | Agent-to-agent message governance | 📋 Planned |
| Tamper-evident Audit | Hash-chain audit trail for compliance verification | 📋 Planned |
9 implemented. 3 planned. Production since 2026-02-18.
Most tools in this space solve a piece of the problem. None of them solve the whole thing.
| Tool | What It Does | What's Missing |
|---|---|---|
| Invariant Labs → Snyk | Runtime guardrails, MCP scanning, trace analysis | Acquired by Snyk — enterprise-only. No trust scores. No cross-agent governance. No compliance audit trail. |
| NVIDIA NeMo Guardrails | Input/output filtering, topical control | Filters messages, not tool calls. No agent context. No trust awareness. No multi-agent policies. |
| GuardrailsAI | Output validation, schema enforcement | Validates what comes out. No idea who called what, when, or whether they should have. Python-only. |
| SecureClaw | 56 audit checks, 5 hardening modules, OWASP-aligned | Scanner, not runtime. Tells you what's wrong — doesn't prevent it. No policies, no trust. |
| OpenClaw built-in | Tool allowlists, realpath containment, plugin sandboxing | Static config. No trust scoring. No time-awareness. No learning. No compliance mapping. |
The difference: those tools operate on inputs and outputs. This plugin operates on decisions — which tool, which agent, what time, what trust level, what frequency, what context. Then it decides, logs, and learns.
As Peter Steinberger noted, this is what a trust model for AI agents should look like.
```
Agent calls exec("git push origin main")
  → Governance evaluates: tool + time + trust + frequency + context
  → Verdict: DENY — "Forge cannot push to main (trust: restricted, score: 32)"
  → Audit record written (JSONL, compliance-mapped)
  → Agent gets a clear rejection reason
```
- Contextual Policies — Not just "which tool" but "which tool, when, by whom, at what risk level"
- Learning Trust — Score 0–100, five tiers, decay on inactivity. Sub-agents can never exceed parent's trust.
- Cross-Agent Governance — Parent policies cascade to sub-agents. Deny on main = deny on forge.
- Compliance Audit Trail — Append-only JSONL with ISO 27001/SOC 2/NIS2 control mapping.
Trust is not a config value. It's earned per conversation.
- Two-Tier Trust Model — Persistent agent trust (configured baseline) + ephemeral session trust (earned in real-time). A fresh session starts at 70% of agent trust and climbs with successful tool calls.
- Session Signals — Success (+1), policy block (−2), credential violation (−10). Clean streak bonus after 10 consecutive good calls.
- Ceiling & Floor — Sessions can earn up to 120% of agent trust, but can always drop to zero.
- Adaptive Display — `[Governance] Agent: main (60/trusted) | Session: 42/standard | Policies: 4`
No existing governance tool implements session-level trust. Static per-agent allowlists don't capture that the same agent performs differently across sessions.
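The session trust rules above can be sketched as a small pure function. This is an illustration under stated assumptions, not the plugin's code: the names `SessionTrust`, `startSession`, and `applySignal` are hypothetical, and the size of the clean-streak bonus (+1 here) is assumed, since the text only says that a bonus exists after 10 consecutive good calls.

```typescript
// Hypothetical sketch of the two-tier session trust rules described above.
type SessionSignal = "success" | "policy_block" | "credential_violation";

interface SessionTrust {
  agentTrust: number;  // persistent, configured baseline (0-100)
  score: number;       // ephemeral session score
  cleanStreak: number; // consecutive successful calls
}

// Signal weights as stated in the text.
const DELTAS: Record<SessionSignal, number> = {
  success: 1,
  policy_block: -2,
  credential_violation: -10,
};

const STREAK_LENGTH = 10;
const STREAK_BONUS = 1; // assumed value, not given in the text

// A fresh session starts at 70% of the agent's trust.
function startSession(agentTrust: number): SessionTrust {
  return { agentTrust, score: (agentTrust * 70) / 100, cleanStreak: 0 };
}

function applySignal(s: SessionTrust, signal: SessionSignal): SessionTrust {
  const ceiling = (s.agentTrust * 120) / 100; // sessions can earn up to 120%
  let cleanStreak = signal === "success" ? s.cleanStreak + 1 : 0;
  let score = s.score + DELTAS[signal];
  if (cleanStreak === STREAK_LENGTH) {
    score += STREAK_BONUS;
    cleanStreak = 0;
  }
  // Clamp between the floor (zero) and the ceiling.
  score = Math.min(ceiling, Math.max(0, score));
  return { ...s, score, cleanStreak };
}
```

With an agent trust of 60, `startSession(60)` yields a session score of 42, which matches the status line shown above; no number of successful calls can push that session past 72.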
- Output Validation (RFC-006) — Detects unverified numeric claims, contradictions, and hallucinated system states. Configurable LLM gate for external communications.
- Redaction Layer (RFC-007) — 3-layer defense-in-depth for credentials, PII, and financial data. SHA-256 vault, fail-closed mode, 17 built-in patterns.
- Fact Registry — Register known facts (from live systems or static files). Claims are checked against facts with fuzzy numeric matching.
```bash
npm install @vainplex/openclaw-governance
```

```json
{
  "plugins": {
    "entries": {
      "openclaw-governance": { "enabled": true }
    }
  }
}
```

```json
{
  "enabled": true,
  "timezone": "Europe/Berlin",
  "failMode": "open",
  "trust": {
    "defaults": {
      "main": 60,
      "forge": 45,
      "*": 10
    }
  },
  "builtinPolicies": {
    "nightMode": { "start": "23:00", "end": "06:00" },
    "credentialGuard": true,
    "productionSafeguard": true,
    "rateLimiter": { "maxPerMinute": 15 }
  },
  "outputValidation": {
    "enabled": true,
    "unverifiedClaimPolicy": "flag"
  },
  "redaction": {
    "enabled": true,
    "categories": ["credential", "pii", "financial"],
    "failMode": "closed"
  }
}
```

Real-time security intelligence for AI agents. Integrates ShieldAPI and ERC-8004 on-chain reputation into the governance layer.
- URL Threat Detection — Checks outbound URLs for phishing, malware, brand impersonation
- Prompt Injection Detection — Scans tool parameters for adversarial inputs (208 patterns)
- Domain Reputation — DNS, blacklist, SSL, SPF/DMARC checks on extracted domains
- On-Chain Reputation — ERC-8004 agent identity + reputation from Base blockchain
- Trust Enrichment — Security events automatically adjust agent trust scores
- x402 Auto-Pay — Automatic USDC micropayments when free tier exhausted
Minimum config — add to your governance config:
```json
{
  "agentFirewall": {
    "enabled": true
  }
}
```

That's it. Defaults: flag mode (warn, don't block), ShieldAPI at shield.vainplex.dev, 5s timeout, fail-open.
Vainplex Governance is 100% compatible with NVIDIA NemoClaw and OpenShell out of the box.
While NemoClaw provides OS-level sandboxing (Landlock, seccomp), Vainplex acts as the Policy Decision Point inside the sandbox, providing Human-in-the-Loop 2FA and verifiable Merkle-Tree audit trails.
Since NemoClaw strictly isolates network namespaces, you must allowlist the following endpoints in your nemoclaw-blueprint.yaml for Vainplex to function correctly:
```yaml
network_policies:
  allowlist:
    - domain: "shield.vainplex.dev" # For Agent Firewall / URL Threat Detection
      port: 443
    - domain: "your-nats-cluster.internal" # For EventStore Merkle-Tree Auditing
      port: 4222
```

```json
{
  "agentFirewall": {
    "enabled": true,
    "mode": "flag",
    "baseUrl": "https://shield.vainplex.dev",
    "timeoutMs": 5000,
    "maxUrlsPerMessage": 10,
    "domainAllowlist": ["mycompany.com", "*.internal.corp"],
    "fallbackOnError": "allow",
    "promptCheck": {
      "enabled": true,
      "tools": ["exec", "write", "edit", "sessions_spawn"],
      "minConfidence": 0.7
    },
    "cache": {
      "ttlSeconds": 3600,
      "maxEntries": 256
    },
    "trustEnrichment": {
      "enabled": true
    },
    "walletKey": "${SHIELDAPI_WALLET_KEY}",
    "erc8004": {
      "enabled": true,
      "chain": "base",
      "agentMapping": {
        "myagent": 16700
      }
    }
  }
}
```

| Key | Type | Default | Description |
|---|---|---|---|
| `enabled` | boolean | `false` | Enable Agent Firewall |
| `mode` | `"flag" \| "block"` | `"flag"` | Flag = warn only, Block = deny on threat |
| `baseUrl` | string | `https://shield.vainplex.dev` | ShieldAPI endpoint |
| `timeoutMs` | number | `5000` | Request timeout (ms) |
| `maxUrlsPerMessage` | number | `10` | Max URLs to check per message |
| `domainAllowlist` | string[] | `[]` | Additional domains to skip (supports `*.` globs) |
| `fallbackOnError` | `"allow" \| "block"` | `"allow"` | Behavior when ShieldAPI is unreachable |
| `walletKey` | string | — | Wallet key for x402 auto-pay |
| `promptCheck.enabled` | boolean | `true` | Enable prompt injection checking |
| `promptCheck.tools` | string[] | `["exec","write","edit","sessions_spawn"]` | Tools to check |
| `promptCheck.minConfidence` | number | `0.7` | Min confidence to trigger |
| `cache.ttlSeconds` | number | `3600` | Cache TTL |
| `cache.maxEntries` | number | `256` | Max cache entries per check type |
| `trustEnrichment.enabled` | boolean | `true` | Feed security events into trust scores |
| `erc8004.enabled` | boolean | `false` | Enable on-chain reputation lookup |
| `erc8004.chain` | string | `"base"` | Blockchain: base, ethereum, polygon |
| `erc8004.agentMapping` | object | `{}` | Map agent IDs to on-chain IDs |
| Variable | Description |
|---|---|
| `AGENT_FIREWALL_WALLET_KEY` | Wallet key for x402 payments |
| `SHIELDAPI_WALLET_KEY` | Alternative wallet key env var |
Type /firewall to see:
- Current mode (flag/block)
- Cache stats (size, hits, misses)
- x402 wallet status
- ERC-8004 status
| Mode | URL Threat | Prompt Injection | Domain Risk |
|---|---|---|---|
| flag | Logs warning, message goes through | Logs warning, tool call proceeds | Logs warning |
| block | Message cancelled | Tool call blocked | Message cancelled |
These domains are never checked (plus any you add via domainAllowlist):
github.com, *.github.com, npmjs.com, api.openai.com, api.anthropic.com, *.vainplex.dev, *.vainplex.de
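A minimal sketch of how such an allowlist could be matched against a hostname. The function names are hypothetical, and one assumption is made explicit: a `*.example.com` entry is taken to match subdomains only, not the apex domain, which is consistent with the list above naming both `github.com` and `*.github.com`.

```typescript
// Built-in skip list, copied from the text above.
const BUILTIN_ALLOWLIST: string[] = [
  "github.com", "*.github.com", "npmjs.com", "api.openai.com",
  "api.anthropic.com", "*.vainplex.dev", "*.vainplex.de",
];

// Assumption: "*.example.com" matches any subdomain but not "example.com" itself.
function matchesPattern(hostname: string, pattern: string): boolean {
  const host = hostname.toLowerCase();
  const pat = pattern.toLowerCase();
  if (pat.slice(0, 2) === "*.") {
    const suffix = pat.slice(1); // ".example.com"
    return host.length > suffix.length && host.slice(-suffix.length) === suffix;
  }
  return host === pat;
}

// `extra` stands in for the user's domainAllowlist config value.
function isAllowlisted(hostname: string, extra: string[] = []): boolean {
  return BUILTIN_ALLOWLIST.concat(extra).some((p) => matchesPattern(hostname, p));
}
```

Comparing against the suffix with its leading dot keeps look-alike hosts such as `evilgithub.com` from matching `*.github.com`.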
ShieldAPI offers 3 free calls per endpoint per day. After that:
- With `walletKey`: Automatic x402 USDC micropayment ($0.001-$0.01)
- Without `walletKey`: Falls back to "unknown" risk (fail-open)
TOTP-based Human-in-the-Loop for agent tool calls. When a security-critical agent (e.g., your pentesting agent) tries to run exec, the system:
- Blocks the tool call via the `before_tool_call` hook
- Batches multiple commands within a 3-second window
- Sends a notification to a dedicated Matrix room
- Waits for a 6-digit TOTP code from an authorized approver
- Approves the batch — and starts a Session Approval window
One TOTP code doesn't just approve one command — it unlocks all exec calls from that agent for a configurable duration (default: 10 minutes). No more entering codes for every nmap step.
```
🔒 APPROVAL REQUIRED (1 command)
Agent: vera
1. exec: nmap -sV -T4 127.0.0.1 --top-ports 20
Enter TOTP code (5min timeout)
✨ One code approves ALL commands for 10 minutes
```
- No dependency on OpenClaw's exec-approval system — works independently via plugin hooks
- Dedicated Matrix bot (`@governance:yourserver`) sends notifications
- Independent Matrix poller (2s interval) — reads TOTP codes directly from the governance room, no reliance on OpenClaw's Matrix sync
- TOTP replay protection — same code can't be used twice within the same period
- Periodic cleanup — expired sessions and cooldowns cleaned every 5 minutes
```json
{
  "approval2fa": {
    "enabled": true,
    "totpSecret": "YOUR_BASE32_SECRET",
    "totpIssuer": "Vainplex Governance",
    "totpLabel": "Agent Approval",
    "timeoutSeconds": 300,
    "maxAttempts": 3,
    "cooldownSeconds": 900,
    "batchWindowMs": 3000,
    "sessionDurationMinutes": 10,
    "approvers": ["@admin:yourserver.dev"],
    "notifyChannel": "room:!yourRoomId:yourserver.dev"
  }
}
```

Create a dedicated Matrix bot account and a secrets file:
~/.openclaw/plugins/openclaw-governance/matrix-notify.json:
```json
{
  "homeserverUrl": "https://matrix.yourserver.dev",
  "accessToken": "syt_your_bot_token",
  "userId": "@governance:yourserver.dev"
}
```

The bot needs to be invited to the notification room. The plugin's built-in Matrix poller reads responses directly, so there is no need to configure the bot as an OpenClaw agent.
Create a policy that triggers 2FA for specific agents/tools:
```json
{
  "id": "agent-2fa",
  "priority": 200,
  "scope": {
    "hooks": ["before_tool_call"],
    "agents": ["vera"]
  },
  "rules": [{
    "id": "exec-requires-2fa",
    "conditions": [
      { "type": "tool", "name": "exec" }
    ],
    "effect": {
      "action": "2fa"
    }
  }]
}
```

| Key | Type | Default | Description |
|---|---|---|---|
| `enabled` | boolean | `false` | Enable Approval 2FA |
| `totpSecret` | string | — | Base32-encoded TOTP secret (shared with authenticator app) |
| `totpIssuer` | string | — | TOTP issuer name (shown in authenticator) |
| `totpLabel` | string | — | TOTP label (shown in authenticator) |
| `timeoutSeconds` | number | `300` | Seconds before auto-deny |
| `maxAttempts` | number | `3` | Max wrong codes before cooldown |
| `cooldownSeconds` | number | `900` | Cooldown after max attempts exceeded |
| `batchWindowMs` | number | `3000` | Debounce window for batching commands |
| `sessionDurationMinutes` | number | `10` | Auto-approve duration after valid TOTP |
| `approvers` | string[] | — | Matrix user IDs authorized to approve |
| `notifyChannel` | string | — | Matrix room for notifications (`room:!id:server`) |
- Fail-closed — if 2FA check errors, the tool call is blocked
- Approver-only — only configured Matrix users can enter codes
- Replay-protected — same TOTP code rejected within the same 30s period
- Rate-limited — 3 wrong codes → 15 minute cooldown
- Session-scoped — approval is per-agent, not global
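The properties above (fail-closed, replay protection, attempt limiting, per-agent session window) can be sketched as a small state machine. This is a hypothetical illustration, not the plugin's code: all type and function names are invented here, TOTP verification is delegated to a caller-supplied `isValidTotp` callback, the approver check is omitted, and it is assumed that a replayed code does not count as a failed attempt.

```typescript
// Hypothetical sketch; field names in Approval2faLimits mirror the config keys above.
interface Approval2faLimits {
  maxAttempts: number;
  cooldownSeconds: number;
  sessionDurationMinutes: number;
}

// One state object per agent, so approval stays session-scoped to that agent.
interface AgentApprovalState {
  failedAttempts: number;
  cooldownUntilMs: number;
  sessionUntilMs: number;
  lastCode: string;
  lastPeriod: number;
}

type ApprovalOutcome = "approved" | "rejected" | "cooldown" | "replay";

const TOTP_PERIOD_MS = 30000; // 30s TOTP period, as stated above

function newApprovalState(): AgentApprovalState {
  return { failedAttempts: 0, cooldownUntilMs: 0, sessionUntilMs: 0, lastCode: "", lastPeriod: -1 };
}

// True while no session approval window is open for this agent.
function needsApproval(state: AgentApprovalState, nowMs: number): boolean {
  return nowMs >= state.sessionUntilMs;
}

function submitCode(
  state: AgentApprovalState,
  code: string,
  nowMs: number,
  limits: Approval2faLimits,
  isValidTotp: (code: string, nowMs: number) => boolean
): ApprovalOutcome {
  if (nowMs < state.cooldownUntilMs) return "cooldown";
  const period = Math.floor(nowMs / TOTP_PERIOD_MS);
  // Replay protection: the same code is rejected within the same period.
  if (code === state.lastCode && period === state.lastPeriod) return "replay";

  let valid = false;
  try {
    valid = isValidTotp(code, nowMs);
  } catch (err) {
    valid = false; // fail-closed: a verifier error never approves
  }
  if (!valid) {
    state.failedAttempts += 1;
    if (state.failedAttempts >= limits.maxAttempts) {
      state.failedAttempts = 0;
      state.cooldownUntilMs = nowMs + limits.cooldownSeconds * 1000;
    }
    return "rejected";
  }
  state.failedAttempts = 0;
  state.lastCode = code;
  state.lastPeriod = period;
  state.sessionUntilMs = nowMs + limits.sessionDurationMinutes * 60000;
  return "approved";
}
```

Passing the clock in as `nowMs` rather than reading it inside the function keeps the cooldown and session windows easy to test.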
3-layer defense-in-depth against credential, PII, and financial data leakage.
| Layer | Hook | When | Can Modify? |
|---|---|---|---|
| Layer 1 | `tool_result_persist` | Before tool output is written to transcript | ✅ Yes (sync) |
| Layer 2 | `message_sending` | Before outbound messages to channels | ✅ Yes (modifying) |
| Layer 2b | `before_message_write` | Before message persistence | ✅ Yes (sync) |
| Category | Patterns |
|---|---|
| Credential | OpenAI API key, Anthropic key, Google API key, GitHub PAT/server token, GitLab PAT, Private key headers, Bearer tokens, Key-value credentials, AWS access key, Generic API key (sk-*), Basic Auth |
| PII | Email addresses, Phone numbers (international) |
| Financial | Credit card numbers (Luhn-valid), IBAN, US SSN |
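The "Luhn-valid" qualifier in the Financial row means a digit run only counts as a credit card number if its checksum passes, which cuts false positives on arbitrary long numbers. A minimal sketch of that check follows; the function name and the accepted length range (13 to 19 digits) are assumptions, not taken from the plugin.

```typescript
// Hypothetical helper: returns true if the candidate passes the Luhn checksum.
function passesLuhn(candidate: string): boolean {
  // Drop common separators (spaces and dashes) before validating.
  const digits = candidate.replace(/[\s-]/g, "");
  if (!/^\d{13,19}$/.test(digits)) return false; // assumed card length range

  let sum = 0;
  let doubleIt = false; // every second digit from the right is doubled
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // "0" is char code 48
    if (doubleIt) {
      d *= 2;
      if (d > 9) d -= 9; // same as summing the two digits of the product
    }
    sum += d;
    doubleIt = !doubleIt;
  }
  return sum % 10 === 0;
}
```

A redaction pattern would typically run a loose regex first and then keep only the matches for which `passesLuhn` returns true.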
```
Tool returns: "Found key sk_test_51Ss4R2..."
  → Layer 1: Pattern match → Replace with [REDACTED:api_key:a3f2]
  → SHA-256 hash stored in vault (1h TTL)
  → Transcript gets redacted version
  → If agent needs the real value later: vault resolves placeholder in before_tool_call
```
```json
{
  "redaction": {
    "enabled": true,
    "categories": ["credential", "pii", "financial"],
    "vaultExpirySeconds": 3600,
    "failMode": "closed",
    "customPatterns": [
      {
        "name": "internal-token",
        "regex": "MYAPP_[A-Z0-9]{32}",
        "category": "credential"
      }
    ],
    "allowlist": {
      "piiAllowedChannels": [],
      "financialAllowedChannels": [],
      "exemptTools": ["web_search"],
      "exemptAgents": []
    },
    "performanceBudgetMs": 5
  }
}
```

- Credentials can NEVER be allowlisted — even exempt tools get credential-only scanning
- fail-closed — on redaction errors, output is suppressed entirely
- SHA-256 vault — no plaintext storage, hash collision handling, TTL-based expiry
- No secrets in logs — audit entries log categories and counts, never values
Be honest about what this does and doesn't protect.
| ✅ Protected | ❌ Not Protected |
|---|---|
| Tool outputs written to transcript | Live-streamed tool output (before persist) |
| Outbound messages to channels | Inbound user messages |
| Audit log entries | LLM context window (keys sent by user) |
| Persisted conversation history | Third-party tool-internal logging |
Why? OpenClaw streams tool output to the LLM in real-time for responsiveness. The tool_result_persist hook fires after streaming but before writing to the transcript. This means:
- If a tool returns a secret, the LLM sees it during the current turn (streaming)
- But the transcript and audit logs get the redacted version
- The LLM's response goes through Layer 2 (`message_sending`), so secrets won't appear in outbound messages
For maximum protection: Don't store secrets in files that agents can cat. Use a vault (Vaultwarden, 1Password CLI) and let agents fetch secrets via dedicated tools that you exempt from redaction.
Detects and flags potentially hallucinated or unverified claims in agent output.
| Detector | What It Catches |
|---|---|
| `system_state` | "The server is running" without live verification |
| `entity_name` | Incorrect names for known entities |
| `existence` | "Feature X exists" claims without evidence |
| `operational_status` | "Service Y is healthy" without live check |
Register known facts for claim verification:
```json
{
  "outputValidation": {
    "enabled": true,
    "factRegistries": [{
      "id": "system-live",
      "facts": [
        { "subject": "governance-tests", "predicate": "count", "value": "771", "source": "vitest" },
        { "subject": "nats-events", "predicate": "count", "value": "255908", "source": "nats stream ls" }
      ]
    }],
    "unverifiedClaimPolicy": "flag"
  }
}
```

| Policy | Effect |
|---|---|
| `ignore` | No action on unverified claims |
| `flag` | Add [UNVERIFIED] annotation |
| `warn` | Log warning |
| `block` | Block the message entirely |
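To make the interplay between the fact registry and `unverifiedClaimPolicy` concrete, here is a rough sketch. It is not the plugin's implementation: the function names are hypothetical, "fuzzy numeric matching" is assumed to mean a relative tolerance (5% here), and the `[UNVERIFIED]` annotation is assumed to be appended to the text.

```typescript
type UnverifiedClaimPolicy = "ignore" | "flag" | "warn" | "block";

// Shape taken from the factRegistries example above.
interface Fact {
  subject: string;
  predicate: string;
  value: string;
  source: string;
}

// Assumption: a numeric claim matches if it is within a relative tolerance of the fact.
function numericClaimMatches(claimed: number, fact: Fact, tolerance: number = 0.05): boolean {
  const expected = Number(fact.value);
  if (!isFinite(expected)) return false;
  if (expected === 0) return claimed === 0;
  return Math.abs(claimed - expected) / Math.abs(expected) <= tolerance;
}

interface ValidationResult {
  deliver: boolean;   // false means the message is blocked
  text: string;
  warnings: string[]; // what would be written to the log
}

function applyPolicy(text: string, verified: boolean, policy: UnverifiedClaimPolicy): ValidationResult {
  if (verified || policy === "ignore") return { deliver: true, text, warnings: [] };
  if (policy === "flag") return { deliver: true, text: text + " [UNVERIFIED]", warnings: [] };
  if (policy === "warn") return { deliver: true, text, warnings: ["unverified claim: " + text] };
  return { deliver: false, text: "", warnings: ["blocked unverified claim"] }; // "block"
}
```

For example, a claim of 770 tests would pass against the registered `governance-tests` count of 771, while a claim of 500 would not and would then be handled according to the configured policy.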
For external communications (email, message tool, sessions_send), an optional LLM validator can verify claims against the fact registry before sending:
```json
{
  "outputValidation": {
    "llmValidator": {
      "enabled": true,
      "model": "gemini/gemini-3-flash-preview",
      "failMode": "open",
      "maxRetries": 2,
      "cacheSeconds": 300
    }
  }
}
```

```json
{
  "id": "night-guard",
  "rules": [{
    "id": "deny-exec-at-night",
    "conditions": [
      { "type": "tool", "name": ["exec", "gateway", "cron"] },
      { "type": "time", "after": "23:00", "before": "07:00" }
    ],
    "effect": { "action": "deny", "reason": "High-risk tools blocked during night hours" }
  }]
}
```

```json
{
  "id": "spawn-control",
  "rules": [{
    "id": "require-trust",
    "conditions": [
      { "type": "tool", "name": "sessions_spawn" },
      { "type": "agent", "maxScore": 39 }
    ],
    "effect": { "action": "deny", "reason": "Agents below score 40 cannot spawn sub-agents" }
  }]
}
```

| Type | What it checks |
|---|---|
| `tool` | Tool name, parameters (exact, glob, regex) |
| `time` | Hour, day-of-week, named windows |
| `agent` | Agent ID, trust tier, score range |
| `context` | Conversation, message content, channel |
| `risk` | Computed risk level |
| `frequency` | Actions per time window |
| `any` | OR — at least one sub-condition |
| `not` | Negation |
All conditions in a rule are AND-combined. Use any for OR logic.
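The AND/OR/NOT semantics can be illustrated with a small evaluator over a subset of the condition types. This is a sketch under assumptions rather than the plugin's engine: the `tool`, `time`, and `agent` fields follow the policy examples in this README, while the `conditions` and `condition` field names for `any` and `not` are hypothetical, as is the choice to treat `after` as inclusive and `before` as exclusive.

```typescript
type Condition =
  | { type: "tool"; name: string | string[] }
  | { type: "time"; after: string; before: string }
  | { type: "agent"; maxScore: number }
  | { type: "any"; conditions: Condition[] } // field name assumed
  | { type: "not"; condition: Condition };   // field name assumed

interface EvalContext {
  tool: string;
  minutesOfDay: number; // local time, minutes since midnight
  agentScore: number;
}

function toMinutes(hhmm: string): number {
  const parts = hhmm.split(":");
  return Number(parts[0]) * 60 + Number(parts[1]);
}

function matches(c: Condition, ctx: EvalContext): boolean {
  if (c.type === "tool") {
    const names = typeof c.name === "string" ? [c.name] : c.name;
    return names.indexOf(ctx.tool) !== -1;
  }
  if (c.type === "time") {
    const start = toMinutes(c.after);
    const end = toMinutes(c.before);
    const now = ctx.minutesOfDay;
    // A window such as 23:00 to 07:00 wraps around midnight.
    return start <= end ? now >= start && now < end : now >= start || now < end;
  }
  if (c.type === "agent") return ctx.agentScore <= c.maxScore;
  if (c.type === "any") return c.conditions.some((sub) => matches(sub, ctx));
  return !matches(c.condition, ctx); // "not"
}

// All conditions in a rule are AND-combined.
function ruleMatches(conditions: Condition[], ctx: EvalContext): boolean {
  return conditions.every((c) => matches(c, ctx));
}
```

Applied to the `night-guard` conditions, an `exec` call at 23:30 matches the rule, while the same call at noon, or a call to an unlisted tool at 02:00, does not.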
| Tier | Score | Capability |
|---|---|---|
| `untrusted` | 0–19 | Read-only, no external actions |
| `restricted` | 20–39 | Basic operations, no production |
| `standard` | 40–59 | Normal operation |
| `trusted` | 60–79 | Extended permissions, can spawn agents |
| `privileged` | 80–100 | Full autonomy |
Trust modifiers: +0.1/success, -2/violation, +0.5/day age, +0.3/day clean streak. Decay: ×0.95 after 30 days inactive. Sub-agents inherit parent's trust ceiling.
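The tier boundaries and the parent ceiling translate directly into code. The sketch below uses hypothetical function names, and the decay helper assumes the ×0.95 factor is applied once when an agent has been inactive for 30 days or more, which the text does not spell out.

```typescript
type Tier = "untrusted" | "restricted" | "standard" | "trusted" | "privileged";

// Maps a 0-100 score onto the five tiers from the table above.
function tierFor(score: number): Tier {
  if (score < 20) return "untrusted";
  if (score < 40) return "restricted";
  if (score < 60) return "standard";
  if (score < 80) return "trusted";
  return "privileged";
}

// Sub-agents inherit the parent's trust ceiling: they can never exceed it.
function effectiveSubAgentScore(ownScore: number, parentScore: number): number {
  return Math.min(ownScore, parentScore);
}

// Assumption: decay is a single x0.95 step once inactivity reaches 30 days.
function applyInactivityDecay(score: number, daysInactive: number): number {
  return daysInactive >= 30 ? score * 0.95 : score;
}
```

These boundaries agree with the examples elsewhere in this README: a score of 32 is `restricted`, 42 is `standard`, and 60 is `trusted`.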
| Policy | What it does |
|---|---|
| `nightMode` | Blocks risky tools during off-hours |
| `credentialGuard` | Blocks access to secrets, .env, passwords |
| `productionSafeguard` | Blocks systemctl, docker rm, destructive ops |
| `rateLimiter` | Throttles tool calls per minute |
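As an illustration of how a per-minute cap such as `rateLimiter.maxPerMinute` can be enforced with a fixed-size ring buffer of timestamps, consider the following sketch. The function name is hypothetical, and it assumes that denied calls are not recorded; the plugin's actual tracking may differ.

```typescript
// Hypothetical sliding-window limiter backed by a ring buffer of timestamps.
function createRateLimiter(maxPerMinute: number, windowMs: number = 60000) {
  const stamps: number[] = []; // holds at most maxPerMinute entries
  let oldest = 0;              // index of the oldest recorded call

  // Returns true if a call at nowMs is allowed, and records it if so.
  return function tryAcquire(nowMs: number): boolean {
    if (stamps.length < maxPerMinute) {
      stamps.push(nowMs);
      return true;
    }
    // Buffer is full: allow only if the oldest call has left the window.
    if (nowMs - stamps[oldest] >= windowMs) {
      stamps[oldest] = nowMs;
      oldest = (oldest + 1) % maxPerMinute;
      return true;
    }
    return false;
  };
}
```

Because the buffer never grows beyond `maxPerMinute` entries, each check is constant time and memory stays bounded per agent.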
Every decision → ~/.openclaw/plugins/openclaw-governance/governance/audit/YYYY-MM-DD.jsonl:
- One file per day, auto-cleaned after `retentionDays`
- Sensitive data redacted before write
- Each record maps to compliance controls (ISO 27001, SOC 2, NIS2)
- Policy evaluation: <5ms for 10+ regex policies
- Redaction scan: <5ms for typical tool output
- Zero runtime dependencies (Node.js builtins only)
- Pre-compiled regex cache, ring buffer frequency tracking
- Node.js ≥ 22.0.0
- OpenClaw gateway
| Plugin | Description |
|---|---|
| @vainplex/nats-eventstore | NATS JetStream event persistence + audit trail |
| @vainplex/openclaw-cortex | Conversation intelligence — threads, decisions, boot context, trace analysis |
| @vainplex/openclaw-governance | Policy engine — trust scores, credential redaction, production safeguards |
| @vainplex/openclaw-knowledge-engine | Entity and relationship extraction from conversations |
| @vainplex/openclaw-sitrep | Situation reports — health, goals, timers aggregated |
| @vainplex/openclaw-leuko | Cognitive immune system — health checks, anomaly detection |
| @vainplex/openclaw-membrane | Episodic memory bridge via gRPC |
Full suite: alberthild/vainplex-openclaw
MIT © Albert Hild