Sentinel Safety Gate

Marty McEnroe edited this page Feb 14, 2026 · 1 revision

Sentinel is unleashed's integrated safety system. It evaluates commands before auto-approval, catching dangerous operations that a hallucinating LLM might attempt. The design prioritizes availability over strictness — sentinel should help when it can and stay out of the way when it can't.

Three-Tier Architecture

flowchart TD
    CMD["Command Arrives<br/>(from permission detection)"]
    LOCAL{"Tier 1: Local Rules<br/>(regex, less than 1ms)"}
    API{"Tier 2: Haiku API<br/>(LLM, 1-3s)"}
    FAILOPEN["Tier 3: Fail-Open<br/>(approve with warning)"]

    CMD --> LOCAL
    LOCAL -->|"ALLOW<br/>(safe pattern match)"| APPROVE["Auto-Approve"]
    LOCAL -->|"BLOCK<br/>(hard block match)"| BLOCK["Withhold Approval<br/>(user decides)"]
    LOCAL -->|"UNCERTAIN<br/>(no match)"| API

    API -->|"ALLOW"| APPROVE
    API -->|"BLOCK: reason"| BLOCK
    API -->|"ERROR<br/>(timeout, network, etc.)"| FAILOPEN

    FAILOPEN --> APPROVE

    style APPROVE fill:#4ade80,stroke:#333,color:#000
    style BLOCK fill:#ef4444,stroke:#333,color:#fff
    style FAILOPEN fill:#fbbf24,stroke:#333,color:#000
    style LOCAL fill:#60a5fa,stroke:#333,color:#000
    style API fill:#a78bfa,stroke:#333,color:#000

See ADR-004 for the full decision record.
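
The routing in the diagram can be sketched as one small function. This is a minimal illustration, not the actual implementation: the name `route`, the string verdicts, and the `api_call` callable are all hypothetical. Only the tier order and the outcomes are taken from the flowchart.

```python
from typing import Callable, Tuple


def route(local_verdict: str, api_call: Callable[[], str]) -> Tuple[str, str]:
    """Return (action, decided_by) for one command.

    local_verdict is the Tier 1 result: "ALLOW", "BLOCK" or "UNCERTAIN".
    api_call is only invoked when Tier 1 is uncertain and is assumed to
    return "ALLOW", "BLOCK" or "ERROR" (hypothetical convention).
    """
    if local_verdict == "ALLOW":
        return ("approve", "local")
    if local_verdict == "BLOCK":
        return ("withhold", "local")

    # UNCERTAIN: only now pay for the slower API tier.
    api_verdict = api_call()
    if api_verdict == "ALLOW":
        return ("approve", "api")
    if api_verdict == "BLOCK":
        return ("withhold", "api")

    # Any API error lands in Tier 3: approve, with a warning elsewhere.
    return ("approve", "fail-open")
```

Passing the API tier as a callable keeps the slow call lazy: commands that Tier 1 resolves never trigger it.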

Tier 1: Local Rules (sentinel_rules.py)

Fast regex-based decisions using pre-existing safety data from ~/.agentos/.

Safe Patterns (ALLOW instantly)

Pattern                                      Matches
^(ls|dir|cat|head|tail|...)                  Read-only file operations
^git\s+(status|log|diff|show|branch|...)     Read-only git operations
^(pwd|echo|printf|date|whoami|...)           Environment queries
^(grep|rg|find|fd|ag)                        Search tools
^poetry\s+(run|install|add|show|lock)        Python package management
^pytest                                      Test runner
^gh\s+(issue|pr|repo|api)\s+(list|view|...)  GitHub CLI
^git\s+(add|commit|push(?!.*--force))        Non-destructive git writes
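
A minimal sketch of how such a table might be applied, using only the four patterns the table lists in full (rows containing "..." are left out rather than guessed at). The names `SAFE_PATTERNS` and `is_safe` are hypothetical and may not match sentinel_rules.py.

```python
import re

# Only the fully specified rows from the table above.
SAFE_PATTERNS = [
    re.compile(r"^(grep|rg|find|fd|ag)"),
    re.compile(r"^poetry\s+(run|install|add|show|lock)"),
    re.compile(r"^pytest"),
    re.compile(r"^git\s+(add|commit|push(?!.*--force))"),
]


def is_safe(command: str) -> bool:
    """True if the command matches any safe pattern (hypothetical helper)."""
    return any(pattern.search(command) for pattern in SAFE_PATTERNS)


print(is_safe("git push origin main"))          # negative lookahead passes
print(is_safe("git push --force origin main"))  # lookahead rejects --force
```

As listed, the patterns have no trailing word boundary, so in this sketch a command such as `agrep` also matches the search-tools row. Whether sentinel_rules.py guards against that is not stated here.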

Hard Blocks (BLOCK instantly)

Loaded from ~/.agentos/hard_block_commands.txt — 19 patterns:

  • Always blocked: dd, mkfs, shred, format, disk operations
  • Conditional (blocked outside safe paths): rm -rf, del /s, rmdir /s, file deletion
  • Git destructive: git reset --hard, git clean -fd, git push --force

Conditional blocks check whether the command targets a safe path (directories listed in ~/.agentos/safe_paths.txt). Destructive operations inside C:\Users\mcwiz\Projects are allowed through; the same operations targeting system directories are blocked.
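
The safe-path test for conditional blocks might look like the sketch below. It assumes the target path has already been extracted from the command and made absolute, which the text does not describe. `SAFE_PATHS`, `in_safe_path` and `is_blocked` are hypothetical names, and the single entry is the example directory mentioned above.

```python
from pathlib import PureWindowsPath

# Assumed to have been loaded from ~/.agentos/safe_paths.txt.
SAFE_PATHS = [PureWindowsPath(r"C:\Users\mcwiz\Projects")]


def in_safe_path(target: str) -> bool:
    """Component-wise containment check (is_relative_to needs Python 3.9+)."""
    path = PureWindowsPath(target)
    return any(path.is_relative_to(root) for root in SAFE_PATHS)


def is_blocked(target: str) -> bool:
    """A conditionally blocked command is blocked outside safe paths."""
    return not in_safe_path(target)


print(is_blocked(r"C:\Users\mcwiz\Projects\demo\build"))
print(is_blocked(r"C:\Windows\System32"))
```

Comparing path components rather than string prefixes keeps a sibling such as `C:\Users\mcwiz\ProjectsBackup` outside the safe set. Pure paths do not collapse `..`, so a real check would need to normalize the target first.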

Path Rules (Write/Edit tool types)

  • Safe paths (safe_paths.txt): C:\Users\mcwiz\Projects, /c/Users/mcwiz/Projects, etc.
  • Excluded paths (excluded_paths.txt): OneDrive, AppData, .cache, Dropbox, Windows, Program Files

Write/Edit operations targeting safe paths → ALLOW. Targeting excluded paths → BLOCK. Everything else → UNCERTAIN.
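
The three-way outcome for Write/Edit targets can be sketched as follows. Two things here are assumptions rather than documented behavior: excluded entries are treated as directory names matched against any path component, and the safe-path check runs before the excluded check. `classify_write` is a hypothetical name.

```python
from pathlib import PureWindowsPath

# Assumed contents of safe_paths.txt and excluded_paths.txt.
SAFE_ROOTS = [PureWindowsPath(r"C:\Users\mcwiz\Projects")]
EXCLUDED_NAMES = {
    "onedrive", "appdata", ".cache", "dropbox", "windows", "program files",
}


def classify_write(target: str) -> str:
    """Return "ALLOW", "BLOCK" or "UNCERTAIN" for a Write/Edit target."""
    path = PureWindowsPath(target)
    if any(path.is_relative_to(root) for root in SAFE_ROOTS):
        return "ALLOW"
    # Case-insensitive match on each component (assumption).
    if any(part.lower() in EXCLUDED_NAMES for part in path.parts):
        return "BLOCK"
    return "UNCERTAIN"


print(classify_write(r"C:\Users\mcwiz\Projects\app\main.py"))
print(classify_write(r"C:\Users\mcwiz\AppData\Local\x.txt"))
print(classify_write(r"D:\scratch\notes.txt"))
```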

Tier 2: Haiku API

Commands that local rules can't resolve (UNCERTAIN) go to the Anthropic Haiku API.

Model: claude-3-5-haiku-latest
Timeout: 3 seconds
System prompt:

You are Sentinel, a security gatekeeper for a Windows/Git Bash development environment. The user is a Professional Engineer running Claude Code with auto-approval (Unleashed). [...] Reply exactly 'ALLOW' if safe, or 'BLOCK: reason' if dangerous.

The API receives: tool type, working directory, and command arguments (first 500 chars). It responds with a verdict.
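
A sketch of the two ends of that exchange, with no network call: building the text sent to the model and parsing its reply. The message layout and the names `build_user_message` and `parse_verdict` are assumptions. Only the three fields, the 500-character limit, and the 'ALLOW' / 'BLOCK: reason' reply format come from the description above.

```python
from typing import Optional, Tuple

MAX_ARG_CHARS = 500  # from the text: only the first 500 chars are sent


def build_user_message(tool_type: str, cwd: str, args: str) -> str:
    """Format the three fields the API receives (layout is an assumption)."""
    return f"Tool: {tool_type}\nCWD: {cwd}\nArgs: {args[:MAX_ARG_CHARS]}"


def parse_verdict(reply: str) -> Tuple[str, Optional[str]]:
    """Parse 'ALLOW' or 'BLOCK: reason'; anything else is malformed."""
    text = reply.strip()
    if text == "ALLOW":
        return ("ALLOW", None)
    if text.startswith("BLOCK:"):
        return ("BLOCK", text[len("BLOCK:"):].strip())
    raise ValueError(f"malformed verdict: {text!r}")


print(parse_verdict("BLOCK: deletes system files"))
```

Raising on anything other than the two expected forms lets a malformed response be handled the same way as a timeout or network error.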

Tier 3: Fail-Open

If the API call fails (timeout, network error, rate limit, malformed response), the command is auto-approved with a yellow warning:

[SENTINEL] API error, fail-open: RateLimitError (HTTP 429)

See ADR-003 for the rationale.
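
Fail-open can be illustrated with a small wrapper around the Tier 2 check. This is a sketch under assumptions: `check_with_fail_open` and the `api_check` callable are hypothetical, and the warning here prints only the exception class name rather than reproducing sentinel's exact format.

```python
import sys
from typing import Callable


def check_with_fail_open(api_check: Callable[[], str]) -> str:
    """Run the Tier 2 check; on any exception, warn and approve."""
    try:
        return api_check()
    except Exception as exc:  # timeout, network, rate limit, malformed reply
        # Log the exception class only, not its message.
        print(
            f"[SENTINEL] API error, fail-open: {type(exc).__name__}",
            file=sys.stderr,
        )
        return "ALLOW"


def _always_times_out() -> str:
    raise TimeoutError("simulated slow API")


print(check_with_fail_open(_always_times_out))
```

Printing only the class name is a choice of this sketch. It sidesteps the kind of leak described in issue #38, but it is not necessarily how sentinel builds the warning line.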

Scope Configuration

Flag                    Tool Types Gated
--sentinel              Bash only (alias for --sentinel-scope bash)
--sentinel-scope bash   Bash
--sentinel-scope write  Bash, Write, Edit
--sentinel-scope all    Bash, Write, Edit, WebFetch, WebSearch, Skill, Task

Tools not in scope are auto-approved instantly — no sentinel overhead.
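
The scope table maps naturally onto a lookup of sets. In this sketch `SCOPES` and `is_gated` are hypothetical names, and the tool lists are copied from the table above.

```python
SCOPES = {
    "bash": {"Bash"},
    "write": {"Bash", "Write", "Edit"},
    "all": {"Bash", "Write", "Edit", "WebFetch", "WebSearch", "Skill", "Task"},
}


def is_gated(tool_type: str, scope: str = "bash") -> bool:
    """True if sentinel evaluates this tool type under the given scope."""
    return tool_type in SCOPES[scope]


print(is_gated("Edit", "bash"))   # out of scope, so auto-approved
print(is_gated("Edit", "write"))  # in scope, so sentinel evaluates it
```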

Shadow Mode

--sentinel-shadow logs what sentinel would evaluate without affecting approval behavior. Use this to:

  1. Validate tool type detection before enabling sentinel
  2. Measure how many commands would hit the API (vs. local resolution)
  3. Build confidence before enabling blocking mode

Shadow logs are written to logs/sentinel-shadow-{session}.log.

Stats Tracking

Sentinel tracks per-session statistics:

{
    "local_allow": 42,   # Tier 1 ALLOW
    "local_block": 0,    # Tier 1 BLOCK
    "api_allow": 5,      # Tier 2 ALLOW
    "api_block": 1,      # Tier 2 BLOCK
    "api_error": 0,      # Tier 3 ERROR (fail-open)
}

Printed to stderr at session end. In this example, 42 out of 48 commands (87.5%) were resolved locally — only 6 needed API calls.
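
The percentage follows directly from the counters. A short sketch of the arithmetic, where `local_resolution_rate` is a hypothetical helper:

```python
stats = {
    "local_allow": 42,
    "local_block": 0,
    "api_allow": 5,
    "api_block": 1,
    "api_error": 0,
}


def local_resolution_rate(stats: dict) -> float:
    """Fraction of commands decided by Tier 1 without an API call."""
    local = stats["local_allow"] + stats["local_block"]
    total = sum(stats.values())
    return local / total if total else 0.0


print(f"{local_resolution_rate(stats):.1%}")  # 42 / 48 -> 87.5%
```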

Worker Thread Architecture

Sentinel checks run in a daemon worker thread, not the PTY reader thread. This is critical — the previous integration attempt (archive/unleashed-guarded.py) called the API synchronously in the reader thread, which blocked all terminal output for 1-3 seconds per check and made the session unusable.

See ADR-002 for the full decision record.
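
The idea of keeping slow checks off the reader thread can be sketched with a daemon worker and two queues. This is one possible arrangement, not necessarily how unleashed creates its threads: the single long-lived worker, the queue names, and `slow_check` are all assumptions.

```python
import queue
import threading

pending = queue.Queue()   # commands waiting for a sentinel check
results = queue.Queue()   # (command, verdict) pairs for the reader to pick up


def slow_check(command: str) -> str:
    """Stand-in for a 1-3 s sentinel evaluation (hypothetical)."""
    return "ALLOW"


def worker() -> None:
    """Process checks one at a time, off the reader thread."""
    while True:
        command = pending.get()
        results.put((command, slow_check(command)))
        pending.task_done()


# daemon=True so the worker never keeps the process alive at exit.
threading.Thread(target=worker, daemon=True).start()

# The reader thread only enqueues and carries on streaming output.
pending.put("poetry run pytest")
pending.join()  # joined here only so the demo can print the result
print(results.get_nowait())
```

In a real reader loop the result queue would be polled without blocking, so terminal output keeps flowing while a check is in flight.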

Known Issues

Issue  Severity  Description
#38    P0        API key may leak in exception messages
#41    P0        in_approval flag has no timeout watchdog
#43    P1        2KB context buffer too small; sentinel may misidentify tool type
#40    P2        Unbounded thread creation (mitigated by in_approval serialization)
#44    P2        No visual feedback during API evaluation
#46    P2        No session summary on exit