
Security Model

Marty McEnroe edited this page Feb 15, 2026 · 3 revisions


Threat Model

Who Is the Adversary?

The primary threat unleashed guards against is accidental destruction by a hallucinating LLM, not a human attacker with network access. Claude Code can:

  • Delete files outside the project directory
  • Overwrite system configuration
  • Exfiltrate secrets via Bash commands (e.g., curl with env vars)
  • Run destructive git operations (push --force, reset --hard)

These failures are probabilistic (LLM hallucination), not deterministic (a targeted attack). The safety system is designed for this threat model; it does not try to stop a sophisticated attacker who already controls the terminal.

What Are We Protecting?

| Asset | Value | Threat |
|---|---|---|
| Project source code | High | Accidental deletion, destructive git operations |
| System files | Critical | Writes to C:\Windows, Program Files, AppData |
| User data outside projects | High | OneDrive, Dropbox, personal documents |
| API keys and secrets | Critical | Exfiltration via curl, env var logging |
| Git history | High | Force push, hard reset |

What Are We NOT Protecting Against?

  • Targeted attack on unleashed itself — an attacker with shell access can just kill the process
  • Malicious Claude Code updates — we trust Anthropic's distribution
  • Side-channel attacks — timing, power analysis, etc.
  • Social engineering — the user is the only operator

Trust Boundaries

```mermaid
flowchart TD
    subgraph TRUSTED ["Trusted (User's Machine)"]
        direction TB
        USER["User"]
        UNLEASHED["Unleashed Process"]
        CC["Claude Code Process"]
        FS["File System<br/>(within safe_paths)"]
    end

    subgraph SEMI ["Semi-Trusted (Boundaries)"]
        direction TB
        FSOUT["File System<br/>(outside safe_paths)"]
        GIT["Git Remote<br/>(push operations)"]
    end

    subgraph UNTRUSTED ["Untrusted (External)"]
        direction TB
        API["Anthropic API<br/>(sentinel Haiku calls)"]
        WEB["Internet<br/>(curl, wget, WebFetch)"]
        LLM["LLM Output<br/>(Claude's responses)"]
    end

    USER --> UNLEASHED
    UNLEASHED --> CC
    CC --> FS
    CC --> FSOUT
    CC --> GIT
    CC --> WEB
    UNLEASHED -->|"sentinel check"| API
    CC -.->|"generates"| LLM
    LLM -.->|"could contain<br/>prompt injection"| CC

    style UNTRUSTED fill:#ef4444,stroke:#991b1b,color:#fff
    style SEMI fill:#fbbf24,stroke:#92400e,color:#000
    style TRUSTED fill:#4ade80,stroke:#166534,color:#000
```

Data That Leaves the Machine

| Destination | Data Sent | When | Privacy Impact |
|---|---|---|---|
| Anthropic API (Haiku) | Tool type + command text (first 500 chars) + CWD | Every sentinel UNCERTAIN verdict | Low — command text only, no file contents |
| Git remote | Committed code | git push (auto-approved) | Project-scoped |
| Web (curl, wget, etc.) | Whatever the command specifies | When Claude runs web-fetching commands | Depends on command — sentinel evaluates |

Critical: Sentinel API calls send command text to Anthropic. If the command contains inline secrets (e.g., curl -H "Authorization: Bearer sk-...") those secrets reach the Haiku API. This is inherent to the evaluation model — sentinel must see the command to evaluate it. See "Privacy Considerations" below.
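One possible mitigation (not something unleashed currently ships) is to redact key-shaped values from the command text before it leaves the machine; sentinel can still see the command's shape without the secret. A minimal sketch, with purely illustrative patterns:

```python
import re

# Hypothetical pre-filter: redact likely inline secrets before command text
# is sent to the sentinel API. Patterns are illustrative, not exhaustive.
SECRET_PATTERNS = [
    (re.compile(r"(?i)(authorization:\s*bearer\s+)[^\s\"']+"), r"\1[REDACTED]"),
    (re.compile(r"(?i)(api[_-]?key=)[^\s\"']+"), r"\1[REDACTED]"),
    (re.compile(r"sk-[A-Za-z0-9_\-]{8,}"), "[REDACTED]"),
]

def redact(command: str) -> str:
    """Replace likely secret values with a placeholder before API evaluation."""
    for pattern, repl in SECRET_PATTERNS:
        command = pattern.sub(repl, command)
    return command
```

Regex-based redaction is best-effort: secrets in unusual shapes still pass through, which is why the "inherent to the evaluation model" caveat above stands.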

Known Vulnerabilities

| Issue | Severity | Status | Description |
|---|---|---|---|
| #38 | P0 Critical | Open | API key in sentinel exception messages — str(e) may contain the API key from HTTP headers |
| #41 | P0 Critical | Open | in_approval stuck flag — no timeout, could freeze all approvals |
| #39 | P2 | Open | Permission pattern spoofing — LLM output could trigger a false approval |
| #42 | P1 | Open | Bare except: break in stdin reader — silent thread death |
| #43 | P1 | Open | Context buffer too small — sentinel may miss the tool type |
| #3 | P2 | Open | API key isolation causes silent hang in standalone sentinel |

Mitigations in Place

Sentinel Three-Tier Defense

See Sentinel Safety Gate for full details.

  • Local rules: 19 hard block patterns, 12 safe patterns, path-based rules
  • API evaluation: Haiku LLM evaluates ambiguous commands
  • Fail-open: API errors don't freeze the session
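The three tiers above compose into a simple verdict flow. This is a minimal sketch, assuming hypothetical patterns and an injected api_check callable; the real rule set lives in the sentinel code:

```python
import re

# Illustrative stand-ins for the 19 hard-block and 12 safe patterns.
HARD_BLOCKS = [re.compile(r"\bgit\s+push\s+--force\b"),
               re.compile(r"\brm\s+-rf\s+/\s*$")]
SAFE = [re.compile(r"^\s*(ls|pwd|git\s+status)\b")]

def local_rules(command: str) -> str:
    """Tier 1: cheap local pattern matching, no network."""
    if any(p.search(command) for p in HARD_BLOCKS):
        return "BLOCK"
    if any(p.search(command) for p in SAFE):
        return "ALLOW"
    return "UNCERTAIN"

def evaluate(command: str, api_check) -> str:
    verdict = local_rules(command)
    if verdict != "UNCERTAIN":
        return verdict
    try:
        return api_check(command)  # tier 2: Haiku evaluates the ambiguous case
    except Exception:
        return "ALLOW"             # tier 3: fail-open, API errors never freeze the session
```

Note the ordering: hard blocks win before safe patterns, and only the UNCERTAIN残 middle band ever reaches the API.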

Path-Based Access Control

Operations are allowed or blocked based on directory:

  • Safe paths: C:\Users\mcwiz\Projects\, /c/Users/mcwiz/Projects/
  • Excluded paths: OneDrive, AppData, .cache, .local, Dropbox, Google Drive, Windows, Program Files
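A sketch of how this gate can be expressed, using the lists above; substring matching on excluded segments is a simplification of whatever path normalization unleashed actually does:

```python
# Illustrative directory gate built from the safe/excluded lists on this page.
SAFE_PATHS = [r"C:\Users\mcwiz\Projects", "/c/Users/mcwiz/Projects"]
EXCLUDED = ["OneDrive", "AppData", ".cache", ".local", "Dropbox",
            "Google Drive", "Windows", "Program Files"]

def path_allowed(path: str) -> bool:
    """True only if path sits under a safe root and touches no excluded segment."""
    norm = path.replace("\\", "/").lower()
    if any(seg.lower() in norm for seg in EXCLUDED):
        return False
    return any(norm.startswith(root.replace("\\", "/").lower())
               for root in SAFE_PATHS)
```

Exclusions are checked first, so an excluded folder nested under a safe root (e.g. a OneDrive symlink inside Projects) is still blocked.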

Hard Block Commands

19 regex patterns for always-dangerous commands: dd, mkfs, shred, format, git push --force, git reset --hard, rm -rf /, etc.

Overlap Buffer

256-byte overlap between PTY read chunks prevents missing permission patterns at read boundaries.
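The mechanism can be sketched as carrying the tail of each scanned window into the next one. The prompt pattern below is a stand-in, not Claude Code's actual prompt text:

```python
import re

OVERLAP = 256  # bytes carried between PTY read chunks
PROMPT = re.compile(rb"Do you want to proceed\?")

def scan(chunks):
    """Yield one bool per chunk: did the overlap window contain the prompt?"""
    tail = b""
    for chunk in chunks:
        window = tail + chunk
        yield PROMPT.search(window) is not None
        tail = window[-OVERLAP:]  # keep up to 256 bytes for the next window
```

Because the carried tail can contain an already-seen match, real code also has to deduplicate hits; the sketch only shows the boundary-safety property.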

Privacy Considerations

  1. Sentinel API calls: Command text (up to 500 chars) is sent to Anthropic's Haiku API for evaluation. This includes Bash commands, file paths, and command arguments. No file contents are sent.

  2. Mirror transcripts: Stored locally in logs/. Contain a filtered view of the session. Not transmitted anywhere.

  3. Shadow logs: Stored locally in logs/. Contain tool type and arguments for every detected permission prompt.

  4. Friction logs: Stored locally as JSONL. Contain permission prompt timing and verdict data.

  5. No telemetry: Unleashed sends no analytics, usage data, or crash reports.

API Key Management

| Key | Storage | Used By |
|---|---|---|
| ANTHROPIC_API_KEY | Environment variable (via ~/.agentos_secrets) | Claude Code |
| AGENTOS_SENTINEL_KEY | Environment variable (via ~/.agentos_secrets) | Sentinel gate |

Both keys are loaded from environment variables exported by ~/.agentos_secrets, which .bash_profile sources. The secrets file is not version-controlled.

Known leak path (#38): When the Anthropic SDK raises an exception, str(e) may include HTTP request headers containing the API key. The current _api_check() returns str(e) as the error reason, which propagates to stderr and log files.
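A fix would scrub known keys (and anything key-shaped) from the exception text before returning it as the error reason. This is a sketch of one possible approach, not the shipped code:

```python
import os
import re

def safe_error_reason(exc: Exception) -> str:
    """Hypothetical fix for #38: redact API keys from exception text
    before it reaches stderr or log files."""
    msg = str(exc)
    for name in ("ANTHROPIC_API_KEY", "AGENTOS_SENTINEL_KEY"):
        key = os.environ.get(name)
        if key:
            msg = msg.replace(key, "[REDACTED]")
    # Belt and braces: mask anything that still looks like an sk- key.
    return re.sub(r"sk-[A-Za-z0-9_\-]{8,}", "[REDACTED]", msg)
```

Replacing the exact environment values catches the common case (the SDK echoing request headers); the trailing regex pass covers keys that arrive by another route.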

Recommendations for Security Reviewers

See For Security Reviewers for a structured review guide.
