Execution-Layer Security for AI Agents.

Policy-enforced execution gateway that intercepts file, network, and process activity at runtime—no matter what the prompt, tool output, or user says.

Models are probabilistic. Execution must be deterministic. agentsh enforces policy at the syscall level.


agentsh session

❯ agentsh exec $SID -- curl https://api.example.com

net_connect → api.example.com:443 allow

file_write → /tmp/response.json allow

❯ agentsh exec $SID -- rm -rf /workspace/cache

delete → /workspace/cache approve?

↳ "Confirm delete? [y/N]"

❯ agentsh exec $SID -- cat ~/.ssh/id_rsa

file_read → ~/.ssh/id_rsa deny

❯

Why agentsh

The agent proposes.
The policy decides.

Prompt injection, jailbreaks, and plain old mistakes all look the same at the execution layer. agentsh evaluates every action at the moment it happens—allow, deny, approve, or steer—independent of prompt compliance.

Prompt-proof enforcement

Prompts can drift. Policies don’t. Enforcement happens at the system call level—where files open, sockets connect, and processes spawn.

See everything

Every file, network, and process operation is captured—including subprocess trees—so you can understand what really happened.

Approval gates

Risky operations pause for explicit confirmation. Agents can request, but humans (or CI policy) decide.

Why “just tell the agent not to” fails

Prompt injection — Content in files, web pages, or tool output can hijack instructions mid-run.

Jailbreaks — Clever phrasing can persuade the model to “make an exception” for something unsafe.

Reasoning errors — Agents misunderstand tasks and take destructive actions “by accident.”

Subprocess blind spots — pip installs, npm scripts, makefiles: tools spawn work the agent can't fully "see."

Secret exposure — Environment variables, .env files, mounted credentials—all accessible to the agent, even in containers.

Architecture

Drop-in execution-layer gateway

Place agentsh under your agent or harness. It intercepts syscalls, applies your policy, and emits structured events you can route anywhere.

Agent / Harness (Claude, GPT, Cursor…) → request → agentsh (policy engine: allow, deny, approve) → if allowed → System (files, network, processes)

agentsh → events → Audit Log (structured events)
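
To make "structured events" concrete, here is a minimal Python sketch of what one audit record could look like when written as a JSON line. The field names (`ts`, `session`, `op`, `target`, `decision`) are assumptions chosen for illustration, not agentsh's actual event schema.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Event:
    """Hypothetical audit record; field names are illustrative only."""
    ts: float       # Unix timestamp of the operation
    session: str    # session ID the operation ran under
    op: str         # e.g. "file_read", "net_connect"
    target: str     # path or host:port the operation touched
    decision: str   # e.g. "allow", "deny", "approve"


def to_json_line(event: Event) -> str:
    """Serialize one event as a single JSON line, ready for a log sink."""
    return json.dumps(asdict(event), sort_keys=True)


line = to_json_line(Event(0.0, "demo", "file_read", "~/.ssh/id_rsa", "deny"))
```

One record per line keeps the log easy to stream into whatever sink you route events to.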

allow: Operation proceeds normally

deny: Operation blocked with message

approve: Human confirmation required

redirect: Swap to safe alternative

audit: Allow + detailed logging

soft_delete: Quarantine with restore
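
The six decisions can be read as a small state table. The Python sketch below is one hypothetical way to model them; the `Decision` enum and the `proceeds` helper are names invented here, and the `approver` callback stands in for a human prompt or CI policy.

```python
from enum import Enum
from typing import Callable


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    APPROVE = "approve"
    REDIRECT = "redirect"
    AUDIT = "audit"
    SOFT_DELETE = "soft_delete"


def proceeds(decision: Decision, approver: Callable[[], bool]) -> bool:
    """Return True if the original operation should run unchanged.

    `approver` is only consulted for APPROVE decisions.
    """
    if decision in (Decision.ALLOW, Decision.AUDIT):
        return True            # audit behaves like allow, plus detailed logging
    if decision is Decision.APPROVE:
        return approver()      # pause until someone (or something) confirms
    # DENY blocks; REDIRECT and SOFT_DELETE run a substitute action instead.
    return False
```

Keeping the approver as a callback means the same gate can be wired to an interactive prompt locally and to a fixed policy in CI.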

Quick Start

Start in under a minute

1. Install agentsh

Debian/Ubuntu

# Download from GitHub releases
sudo dpkg -i agentsh_<VERSION>_linux_amd64.deb

# Or build from source
make build
sudo install -m 0755 bin/agentsh /usr/local/bin
            

2. Run through it

Terminal

# Create a session
SID=$(agentsh session create --workspace . | jq -r .id)

# Run commands through agentsh
agentsh exec "$SID" -- ls -la

# Structured output for agents
agentsh exec --output json "$SID" -- curl https://example.com
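
If your harness is written in Python, the same structured-output call can be wrapped in a small helper. This is a sketch under assumptions: it uses only the `agentsh exec --output json` form shown above, assumes `agentsh` is on PATH, and assumes stdout is a single JSON document whose shape is not specified here. The helper names are hypothetical.

```python
import json
import subprocess


def build_exec_argv(sid: str, command: list[str]) -> list[str]:
    """Build the argv for: agentsh exec --output json <sid> -- <command>."""
    return ["agentsh", "exec", "--output", "json", sid, "--", *command]


def run_json(sid: str, command: list[str]) -> object:
    """Run a command through agentsh and parse its stdout as JSON.

    Assumes stdout holds exactly one JSON document; adjust if it does not.
    """
    result = subprocess.run(
        build_exec_argv(sid, command),
        capture_output=True, text=True, check=False,
    )
    return json.loads(result.stdout)


# Building the argv has no side effects, so it is safe to inspect or log.
argv = build_exec_argv("my-session", ["curl", "https://example.com"])
```

Passing the command as a list avoids shell quoting issues that the `"$SID"` quoting handles in the terminal version.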
            

3. Run your agent under agentsh

The best way to use agentsh in development is to run your coding agent as a child process. All shell commands the agent spawns will be intercepted and policy-checked automatically.

Terminal
Run agent under agentsh

# Claude Code
agentsh exec --root . -- claude

# OpenAI Codex CLI
agentsh exec --root . -- codex

# Cursor
agentsh exec --root . -- cursor

# Any agent that spawns shell commands
agentsh exec --root . -- your-agent-command
          

Alternative: Instruct via AGENTS.md (when you can't run under agentsh)

Advisory, not enforced. This approach relies on the agent following instructions—it can be bypassed. Use only when you can't run the agent under agentsh or use the shell shim.

AGENTS.md / CLAUDE.md
Add to your repo

## Shell access

- Run commands via agentsh (not directly in bash/zsh).
- Get a session ID first: SID=$(agentsh session create --workspace . | jq -r .id)
- Use: agentsh exec "$SID" -- <your-command>
- For structured output: agentsh exec --output json "$SID" -- <cmd>
          

Container Deployment

Containers isolate.
agentsh governs.

Containers limit where an agent can cause damage—but inside the container, it's still a free-for-all. The agent can read any file, access your env vars and secrets, hit any endpoint, and delete your workspace. agentsh adds the missing layer: control over what the agent can actually do.

Install a lightweight shell shim in your container. The agent thinks it's calling /bin/bash—but every command routes through agentsh and gets policy-checked.

No changes to agent code or prompts
Works with any framework (Claude Code, Cursor, custom)
Captures subprocess trees spawned by scripts
Same allow/deny/approve/steer decisions

              
Dockerfile
FROM debian:bookworm-slim

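# Install agentsh (assumes the .deb from GitHub releases is in the build context)
COPY agentsh_*_linux_amd64.deb /tmp/
RUN dpkg -i /tmp/agentsh_*_linux_amd64.deb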

# Install the shell shim — this is the magic
RUN agentsh shim install-shell \
  --root / \
  --shim /usr/bin/agentsh-shell-shim \
  --bash \
  --i-understand-this-modifies-the-host

# Point to agentsh server (sidecar or host)
ENV AGENTSH_SERVER=http://127.0.0.1:18080

# Now any /bin/bash or /bin/sh call goes through agentsh
# Agents never know the difference

What happens: The shim swaps /bin/bash and /bin/sh. When an agent calls subprocess.run(["bash", "-c", "…"]), it actually hits agentsh—which applies policy and logs the outcome.
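
From the agent's side nothing changes. The Python snippet below is ordinary harness code of the kind described above; it contains no reference to agentsh. In a container with the shim installed, this same call would be the one routed through policy. The `echo hello` command is a placeholder.

```python
import subprocess

# Ordinary harness code: nothing here references agentsh.
result = subprocess.run(
    ["bash", "-c", "echo hello"],
    capture_output=True, text=True, check=True,
)
output = result.stdout.strip()
```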

Full Harness Protection

Wrap the harness, not just the shell

Harnesses like Claude Code or Cursor include built-in tools that bypass the shell: file edits, code execution, and network requests. Run the harness itself under agentsh to govern everything end-to-end.

The gotcha: When a harness writes a file via an internal tool (e.g., str_replace), no shell is spawned. If you only shimmed bash, you’d miss it.

Catches built-in file/network tools, not just shell commands
Policy applies to the harness and everything it spawns
One complete audit trail for the entire agent run
Works with Claude Code, Cursor, Aider, or custom harnesses

Wrap the entire harness

# Create a session for the entire agent run
SID=$(agentsh session create \
  --workspace /project \
  --policy agent-sandbox | jq -r .id)

# Run your agent harness UNDER agentsh
agentsh exec "$SID" -- claude

# Or with Cursor, Aider, custom harness...
agentsh exec "$SID" -- cursor --folder /project
agentsh exec "$SID" -- python my_agent.py
            

What gets captured

# Even built-in harness tools are governed:
├─ file_write → /project/src/main.py      ✓ allow
├─ file_read  → /etc/passwd               ✗ deny
├─ net_connect → api.openai.com:443       ✓ allow
├─ net_connect → evil.com:443             ✗ deny
├─ exec       → npm install               ✓ allow
│  ├─ file_write → /project/node_modules  ✓ allow
│  └─ net_connect → registry.npmjs.org    ✓ allow
├─ exec       → curl http://attacker.com  ✗ deny
└─ file_delete → /project/.env            ? approve
            

Policy Engine

Policies as code

Define what's allowed, what needs approval, what gets steered, and what's blocked, using simple YAML you can version, review, and ship.

policies/dev-safe.yaml

file_rules:
  - name: allow-workspace
    paths: ["/workspace/**"]
    operations: [read, write, create]
    decision: allow

  - name: approve-delete
    paths: ["/workspace/**"]
    operations: [delete]
    decision: approve
    message: "Delete {{.Path}}?"

  - name: deny-secrets
    paths: ["**/.env", "**/.env.*", "**/credentials*"]
    decision: deny

  - name: deny-ssh
    paths: ["~/.ssh/**"]
    decision: deny

network_rules:
  - name: allow-api
    domains: ["api.example.com"]
    ports: [443]
    decision: allow

command_rules:
  - name: block-env-dump
    commands: [env, printenv]
    decision: deny

  - name: block-dangerous
    commands: [rm, shutdown, reboot]
    decision: deny

env_rules:
  - name: npm-registry-only
    commands: [npm, yarn]
    allow: [NPM_TOKEN, NODE_ENV]

  - name: db-migrate-only
    commands: [prisma, drizzle]
    allow: [DATABASE_URL]
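
To show how rules like the file_rules above could be evaluated, here is a minimal Python sketch. It assumes first-match-wins ordering and a default of deny, neither of which is confirmed for agentsh, and it uses `fnmatch` as a simplified stand-in for a real glob engine. Under the first-match assumption rule order matters, so the sketch lists the deny rules first.

```python
import os
from fnmatch import fnmatchcase

# Rules mirror the file_rules shape; a rule without "operations" applies to
# every operation. Ordering and default are assumptions of this sketch.
RULES = [
    {"name": "deny-secrets", "paths": ["**/.env", "**/.env.*"], "decision": "deny"},
    {"name": "deny-ssh", "paths": ["~/.ssh/**"], "decision": "deny"},
    {"name": "approve-delete", "paths": ["/workspace/**"],
     "operations": ["delete"], "decision": "approve"},
    {"name": "allow-workspace", "paths": ["/workspace/**"],
     "operations": ["read", "write", "create"], "decision": "allow"},
]


def evaluate(path: str, operation: str, rules=RULES, default: str = "deny") -> str:
    """Return the decision of the first rule matching path and operation."""
    path = os.path.expanduser(path)
    for rule in rules:
        ops = rule.get("operations")
        if ops is not None and operation not in ops:
            continue
        # fnmatchcase lets "*" cross "/" boundaries, so "**" behaves like "*"
        # here; a real glob engine would treat the two differently.
        if any(fnmatchcase(path, os.path.expanduser(p)) for p in rule["paths"]):
            return rule["decision"]
    return default
```

For example, `evaluate("/workspace/.env", "read")` hits the deny-secrets rule before the workspace allow rule is ever considered.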
          

Starter policy packs

dev-safe.yaml

workspace local

Great default for local work. Workspace access, env/secrets/SSH denied, and a tight network allowlist.

ci-strict.yaml

CI/CD strict

Designed for CI runners. Deny outside workspace and restrict network to registries and required endpoints.

agent-sandbox.yaml

sandbox paranoid

For running unknown code. Default deny, explicit allowlists, approvals, and soft-delete quarantine.

Escape the Retry Loop

Steer, don't just block

A deny often triggers “try harder.” Different flags, different paths, Base64 encoding, creative workarounds. You’ve seen it: dozens of retries, wasted tokens, and the task still doesn’t complete.

The deny spiral:

✗ curl https://example.com → denied

✗ wget https://example.com → denied

✗ python -c "import urllib..." → denied

✗ nc -v example.com 443 → denied

... 10 more attempts, still denied

With steering:

↪ curl https://example.com → steered to agentsh-fetch

✓ Operation succeeds, agent continues

✓ You control what actually happens

✓ Full audit trail preserved

Steering keeps work moving. The agent sees success and proceeds. You decide where writes land, which endpoints get hit, and what command is truly executed—without breaking the flow.
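
Command steering amounts to rewriting the argv before it runs. The Python sketch below illustrates the idea with a hypothetical rule shape; the `steer` helper is invented here, and appending the original arguments after the replacement's fixed args is an assumption, not documented agentsh behavior.

```python
import os

# Hypothetical rule shape for command steering; not agentsh's actual schema.
STEER_RULES = [
    {"commands": ["curl", "wget"],
     "redirect_to": {"command": "agentsh-fetch", "args": ["--audit"]}},
]


def steer(argv: list[str], rules=STEER_RULES) -> list[str]:
    """Return the argv to actually execute.

    If the program name matches a rule, swap in the replacement command and
    its fixed args, then append the original arguments (an assumption).
    """
    program = os.path.basename(argv[0])
    for rule in rules:
        if program in rule["commands"]:
            target = rule["redirect_to"]
            return [target["command"], *target["args"], *argv[1:]]
    return argv  # no rule matched: run as requested
```

Because the agent still gets a successful result, it has no reason to start trying workarounds.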

steering policy examples

network_rules:
  # Route npm to internal registry
  - name: steer-npm
    destinations: ["registry.npmjs.org"]
    decision: redirect
    redirect_to: "npm.internal.corp"
    message: "Steered to internal registry"

command_rules:
  # Route downloads through audited proxy
  - name: steer-curl
    commands: [curl, wget]
    decision: redirect
    message: "Downloads routed through audit"
    redirect_to:
      command: agentsh-fetch
      args: ["--audit"]

# Low-level network steering (DNS/TCP)
dns_redirects:
  - name: api-to-proxy
    match: ".*\\.anthropic\\.com"
    resolve_to: "10.0.0.50"

connect_redirects:
  - name: route-api
    match: "api\\.anthropic\\.com:443"
    redirect_to: "vertex-proxy:443"
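
The dns_redirects entry above can be pictured as a hostname lookup with a regex filter in front of it. This Python sketch assumes the pattern must match the whole hostname (`re.fullmatch`); whether the real matcher anchors patterns this way is an assumption, and `resolve_override` is a hypothetical name.

```python
import re

# Mirrors the dns_redirects entry: hostname regex -> address to return.
DNS_REDIRECTS = [
    {"name": "api-to-proxy", "match": r".*\.anthropic\.com", "resolve_to": "10.0.0.50"},
]


def resolve_override(hostname: str, redirects=DNS_REDIRECTS):
    """Return the override address for hostname, or None to resolve normally."""
    for rule in redirects:
        # fullmatch: the pattern has to cover the entire hostname.
        if re.fullmatch(rule["match"], hostname):
            return rule["resolve_to"]
    return None
```

Anchoring the whole hostname matters: with an unanchored search, a name like `api.anthropic.com.evil.net` would also match the pattern.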
            

Great for: sandboxing untrusted code, mocking external services, enforcing workspace boundaries, auditing network access, and transparently routing API calls through internal proxies—without derailing the agent.
