
Security Model

Marty McEnroe edited this page Feb 15, 2026 · 3 revisions


Threat Model

Who Is the Adversary?

The primary threat unleashed guards against is accidental destruction by a hallucinating LLM, not a human attacker with network access. Claude Code can:

  • Delete files outside the project directory
  • Overwrite system configuration
  • Exfiltrate secrets via Bash commands (e.g., curl with env vars)
  • Run destructive git operations (push --force, reset --hard)

These failures are probabilistic (LLM hallucination), not deterministic (a targeted attack). The safety system is designed for this threat model; it does not try to stop a sophisticated attacker who already controls the terminal.

What Are We Protecting?

| Asset | Value | Threat |
|---|---|---|
| Project source code | High | Accidental deletion, destructive git operations |
| System files | Critical | Writes to C:\Windows, Program Files, AppData |
| User data outside projects | High | OneDrive, Dropbox, personal documents |
| API keys and secrets | Critical | Exfiltration via curl, env var logging |
| Git history | High | Force push, hard reset |

What Are We NOT Protecting Against?

  • Targeted attack on unleashed itself — an attacker with shell access can just kill the process
  • Malicious Claude Code updates — we trust Anthropic's distribution
  • Side-channel attacks — timing, power analysis, etc.
  • Social engineering — the user is the only operator

Trust Boundaries

```mermaid
flowchart TD
    subgraph TRUSTED ["Trusted (User's Machine)"]
        direction TB
        USER["User"]
        UNLEASHED["Unleashed Process"]
        CC["Claude Code Process"]
        FS["File System<br/>(within safe_paths)"]
    end

    subgraph SEMI ["Semi-Trusted (Boundaries)"]
        direction TB
        FSOUT["File System<br/>(outside safe_paths)"]
        GIT["Git Remote<br/>(push operations)"]
    end

    subgraph UNTRUSTED ["Untrusted (External)"]
        direction TB
        API["Anthropic API<br/>(sentinel Haiku calls)"]
        WEB["Internet<br/>(curl, wget, WebFetch)"]
        LLM["LLM Output<br/>(Claude's responses)"]
    end

    USER --> UNLEASHED
    UNLEASHED --> CC
    CC --> FS
    CC --> FSOUT
    CC --> GIT
    CC --> WEB
    UNLEASHED -->|"sentinel check"| API
    CC -.->|"generates"| LLM
    LLM -.->|"could contain<br/>prompt injection"| CC

    style UNTRUSTED fill:#ef4444,stroke:#991b1b,color:#fff
    style SEMI fill:#fbbf24,stroke:#92400e,color:#000
    style TRUSTED fill:#4ade80,stroke:#166534,color:#000
```

Data That Leaves the Machine

| Destination | Data Sent | When | Privacy Impact |
|---|---|---|---|
| Anthropic API (Haiku) | Tool type + command text (first 500 chars) + CWD | Every sentinel UNCERTAIN verdict | Low — command text only, no file contents |
| Git remote | Committed code | git push (auto-approved) | Project-scoped |
| Web (curl, wget, etc.) | Whatever the command specifies | When Claude runs web-fetching commands | Depends on command — sentinel evaluates |

Critical: Sentinel API calls send command text to Anthropic. If the command contains inline secrets (e.g., curl -H "Authorization: Bearer sk-...") those secrets reach the Haiku API. This is inherent to the evaluation model — sentinel must see the command to evaluate it. See "Privacy Considerations" below.
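One possible mitigation (not something unleashed currently ships) is to redact key-shaped values from the command text before it leaves the machine; sentinel can still see the command's shape without the secret. A minimal sketch, with purely illustrative patterns:

```python
import re

# Hypothetical pre-filter: redact likely inline secrets before command text
# is sent to the sentinel API. Patterns are illustrative, not exhaustive.
SECRET_PATTERNS = [
    (re.compile(r"(?i)(authorization:\s*bearer\s+)[^\s\"']+"), r"\1[REDACTED]"),
    (re.compile(r"(?i)(api[_-]?key=)[^\s\"']+"), r"\1[REDACTED]"),
    (re.compile(r"sk-[A-Za-z0-9_\-]{8,}"), "[REDACTED]"),
]

def redact(command: str) -> str:
    """Replace likely secret values with a placeholder before API evaluation."""
    for pattern, repl in SECRET_PATTERNS:
        command = pattern.sub(repl, command)
    return command
```

Regex-based redaction is best-effort: secrets in unusual shapes still pass through, which is why the "inherent to the evaluation model" caveat above stands.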

Known Vulnerabilities

| Issue | Severity | Status | Description |
|---|---|---|---|
| #38 | P0 Critical | Open | API key in sentinel exception messages — str(e) may contain the API key from HTTP headers |
| #41 | P0 Critical | Open | in_approval stuck flag — no timeout, could freeze all approvals |
| #39 | P2 | Open | Permission pattern spoofing — LLM output could trigger a false approval |
| #42 | P1 | Open | Bare except: break in stdin reader — silent thread death |
| #43 | P1 | Open | Context buffer too small — sentinel may miss the tool type |
| #3 | P2 | Open | API key isolation causes silent hang in standalone sentinel |

Mitigations in Place

Sentinel Three-Tier Defense

See Sentinel Safety Gate for full details.

  • Local rules: 19 hard block patterns, 12 safe patterns, path-based rules
  • API evaluation: Haiku LLM evaluates ambiguous commands
  • Fail-open: API errors don't freeze the session
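The three tiers above compose into a simple verdict flow. This is a minimal sketch, assuming hypothetical patterns and an injected api_check callable; the real rule set lives in the sentinel code:

```python
import re

# Illustrative stand-ins for the 19 hard-block and 12 safe patterns.
HARD_BLOCKS = [re.compile(r"\bgit\s+push\s+--force\b"),
               re.compile(r"\brm\s+-rf\s+/\s*$")]
SAFE = [re.compile(r"^\s*(ls|pwd|git\s+status)\b")]

def local_rules(command: str) -> str:
    """Tier 1: cheap local pattern matching, no network."""
    if any(p.search(command) for p in HARD_BLOCKS):
        return "BLOCK"
    if any(p.search(command) for p in SAFE):
        return "ALLOW"
    return "UNCERTAIN"

def evaluate(command: str, api_check) -> str:
    verdict = local_rules(command)
    if verdict != "UNCERTAIN":
        return verdict
    try:
        return api_check(command)  # tier 2: Haiku evaluates the ambiguous case
    except Exception:
        return "ALLOW"             # tier 3: fail-open, API errors never freeze the session
```

Note the ordering: hard blocks win before safe patterns, and only the UNCERTAIN残 middle band ever reaches the API.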

Path-Based Access Control

Operations are allowed or blocked based on directory:

  • Safe paths: C:\Users\mcwiz\Projects\, /c/Users/mcwiz/Projects/
  • Excluded paths: OneDrive, AppData, .cache, .local, Dropbox, Google Drive, Windows, Program Files
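A sketch of how this gate can be expressed, using the lists above; substring matching on excluded segments is a simplification of whatever path normalization unleashed actually does:

```python
# Illustrative directory gate built from the safe/excluded lists on this page.
SAFE_PATHS = [r"C:\Users\mcwiz\Projects", "/c/Users/mcwiz/Projects"]
EXCLUDED = ["OneDrive", "AppData", ".cache", ".local", "Dropbox",
            "Google Drive", "Windows", "Program Files"]

def path_allowed(path: str) -> bool:
    """True only if path sits under a safe root and touches no excluded segment."""
    norm = path.replace("\\", "/").lower()
    if any(seg.lower() in norm for seg in EXCLUDED):
        return False
    return any(norm.startswith(root.replace("\\", "/").lower())
               for root in SAFE_PATHS)
```

Exclusions are checked first, so an excluded folder nested under a safe root (e.g. a OneDrive symlink inside Projects) is still blocked.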

Hard Block Commands

19 regex patterns for always-dangerous commands: dd, mkfs, shred, format, git push --force, git reset --hard, rm -rf /, etc.

Overlap Buffer

256-byte overlap between PTY read chunks prevents missing permission patterns at read boundaries.
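The mechanism can be sketched as carrying the tail of each scanned window into the next one. The prompt pattern below is a stand-in, not Claude Code's actual prompt text:

```python
import re

OVERLAP = 256  # bytes carried between PTY read chunks
PROMPT = re.compile(rb"Do you want to proceed\?")

def scan(chunks):
    """Yield one bool per chunk: did the overlap window contain the prompt?"""
    tail = b""
    for chunk in chunks:
        window = tail + chunk
        yield PROMPT.search(window) is not None
        tail = window[-OVERLAP:]  # keep up to 256 bytes for the next window
```

Because the carried tail can contain an already-seen match, real code also has to deduplicate hits; the sketch only shows the boundary-safety property.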

Privacy Considerations

  1. Sentinel API calls: Command text (up to 500 chars) is sent to Anthropic's Haiku API for evaluation. This includes Bash commands, file paths, and command arguments. No file contents are sent.

  2. Mirror transcripts: Stored locally in logs/. Contain a filtered view of the session. Not transmitted anywhere.

  3. Shadow logs: Stored locally in logs/. Contain tool type and arguments for every detected permission prompt.

  4. Friction logs: Stored locally as JSONL. Contain permission prompt timing and verdict data.

  5. No telemetry: Unleashed sends no analytics, usage data, or crash reports.

API Key Management

| Key | Storage | Used By |
|---|---|---|
| ANTHROPIC_API_KEY | Environment variable (via ~/.agentos_secrets) | Claude Code |
| AGENTOS_SENTINEL_KEY | Environment variable (via ~/.agentos_secrets) | Sentinel gate |

Both keys are loaded from environment variables exported by ~/.agentos_secrets, which .bash_profile sources. The secrets file is not version-controlled.

Known leak path (#38): When the Anthropic SDK raises an exception, str(e) may include HTTP request headers containing the API key. The current _api_check() returns str(e) as the error reason, which propagates to stderr and log files.
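A fix would scrub known keys (and anything key-shaped) from the exception text before returning it as the error reason. This is a sketch of one possible approach, not the shipped code:

```python
import os
import re

def safe_error_reason(exc: Exception) -> str:
    """Hypothetical fix for #38: redact API keys from exception text
    before it reaches stderr or log files."""
    msg = str(exc)
    for name in ("ANTHROPIC_API_KEY", "AGENTOS_SENTINEL_KEY"):
        key = os.environ.get(name)
        if key:
            msg = msg.replace(key, "[REDACTED]")
    # Belt and braces: mask anything that still looks like an sk- key.
    return re.sub(r"sk-[A-Za-z0-9_\-]{8,}", "[REDACTED]", msg)
```

Replacing the exact environment values catches the common case (the SDK echoing request headers); the trailing regex pass covers keys that arrive by another route.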

Recommendations for Security Reviewers

See For Security Reviewers for a structured review guide.
