Sentinel Safety Gate

Sentinel is unleashed's integrated safety system. It evaluates commands before auto-approval, catching dangerous operations that a hallucinating LLM might attempt. The design prioritizes availability over strictness — sentinel should help when it can and stay out of the way when it can't.

Three-Tier Architecture

```mermaid
flowchart TD
    CMD["Command Arrives<br/>(from permission detection)"]
    LOCAL{"Tier 1: Local Rules<br/>(regex, less than 1ms)"}
    API{"Tier 2: Haiku API<br/>(LLM, 1-3s)"}
    FAILOPEN["Tier 3: Fail-Open<br/>(approve with warning)"]

    CMD --> LOCAL
    LOCAL -->|"ALLOW<br/>(safe pattern match)"| APPROVE["Auto-Approve"]
    LOCAL -->|"BLOCK<br/>(hard block match)"| BLOCK["Withhold Approval<br/>(user decides)"]
    LOCAL -->|"UNCERTAIN<br/>(no match)"| API

    API -->|"ALLOW"| APPROVE
    API -->|"BLOCK: reason"| BLOCK
    API -->|"ERROR<br/>(timeout, network, etc.)"| FAILOPEN

    FAILOPEN --> APPROVE

    style APPROVE fill:#4ade80,stroke:#333,color:#000
    style BLOCK fill:#ef4444,stroke:#333,color:#fff
    style FAILOPEN fill:#fbbf24,stroke:#333,color:#000
    style LOCAL fill:#60a5fa,stroke:#333,color:#000
    style API fill:#a78bfa,stroke:#333,color:#000
```

See ADR-004 for the full decision record.
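
The flow in the diagram can be sketched in a few lines of Python. This is an illustration only, not the code in unleashed: `Verdict`, `evaluate`, and the two callables passed in are hypothetical names chosen for the example.

```python
from enum import Enum
from typing import Callable, Tuple


class Verdict(Enum):
    ALLOW = "ALLOW"
    BLOCK = "BLOCK"
    UNCERTAIN = "UNCERTAIN"


def evaluate(
    command: str,
    local_rules: Callable[[str], Verdict],
    api_check: Callable[[str], Verdict],
) -> Tuple[Verdict, str]:
    """Return (verdict, tier that decided) for one command."""
    # Tier 1: local rules answer ALLOW, BLOCK, or UNCERTAIN.
    local = local_rules(command)
    if local is not Verdict.UNCERTAIN:
        return local, "local"

    # Tier 2: only UNCERTAIN commands reach the API.
    try:
        return api_check(command), "api"
    except Exception:
        # Tier 3: any API failure approves the command (fail-open).
        return Verdict.ALLOW, "fail-open"
```

Passing the two tiers in as callables keeps the sketch testable without network access; how the real tiers are wired together is not stated on this page.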

Tier 1: Local Rules (`sentinel_rules.py`)

Fast regex-based decisions using pre-existing safety data from ~/.agentos/.

Safe Patterns (ALLOW instantly)

| Pattern | Matches |
|---------|---------|
| `^(ls\|dir\|cat\|head\|tail\|...)` | Read-only file operations |
| `^git\s+(status\|log\|diff\|show\|branch\|...)` | Read-only git operations |
| `^(pwd\|echo\|printf\|date\|whoami\|...)` | Environment queries |
| `^(grep\|rg\|find\|fd\|ag)` | Search tools |
| `^poetry\s+(run\|install\|add\|show\|lock)` | Python package management |
| `^pytest` | Test runner |
| `^gh\s+(issue\|pr\|repo\|api)\s+(list\|view\|...)` | GitHub CLI |
| `^git\s+(add\|commit\|push(?!.*--force))` | Non-destructive git writes |
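
As a rough illustration of how such patterns might be applied, the sketch below compiles only the rows that the table gives in full (the rows ending in `...` are left out rather than guessed). `SAFE_PATTERNS` and `is_safe` are hypothetical names, not taken from `sentinel_rules.py`.

```python
import re

# Only the patterns the table shows in full; elided rows are not reconstructed.
SAFE_PATTERNS = [
    re.compile(r"^(grep|rg|find|fd|ag)"),
    re.compile(r"^poetry\s+(run|install|add|show|lock)"),
    re.compile(r"^pytest"),
    re.compile(r"^git\s+(add|commit|push(?!.*--force))"),
]


def is_safe(command: str) -> bool:
    """True if the command matches at least one safe pattern."""
    cmd = command.strip()
    return any(pattern.search(cmd) for pattern in SAFE_PATTERNS)


print(is_safe("git push origin main"))          # True
print(is_safe("git push --force origin main"))  # False: the lookahead rejects --force
```

The negative lookahead `(?!.*--force)` is what keeps a forced push out of the safe set; everything after `push` is scanned for the literal `--force`.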

Hard Blocks (BLOCK instantly)

Loaded from ~/.agentos/hard_block_commands.txt — 19 patterns:

Always blocked: dd, mkfs, shred, format, disk operations
Conditional (blocked outside safe paths): rm -rf, del /s, rmdir /s, file deletion
Git destructive: git reset --hard, git clean -fd, git push --force

Conditional blocks check whether the command targets a safe path (directories listed in ~/.agentos/safe_paths.txt). Destructive operations inside C:\Users\mcwiz\Projects are allowed through; the same operations targeting system directories are blocked.
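
A minimal sketch of the safe-path test, assuming the target path has already been extracted from the command (this page does not describe how that extraction works). `SAFE_ROOTS`, `is_inside`, and `should_block` are hypothetical names. `PureWindowsPath` is used so the comparison is case-insensitive and runs on any platform.

```python
import ntpath
from pathlib import PureWindowsPath

# Taken from the example on this page; the real list lives in safe_paths.txt.
SAFE_ROOTS = [PureWindowsPath(r"C:\Users\mcwiz\Projects")]


def is_inside(target: str, root: PureWindowsPath) -> bool:
    """True if target is root itself or lies below it, after collapsing '..'."""
    resolved = PureWindowsPath(ntpath.normpath(target))
    return resolved == root or root in resolved.parents


def should_block(target: str) -> bool:
    """A conditionally blocked command is blocked unless its target is safe."""
    return not any(is_inside(target, root) for root in SAFE_ROOTS)
```

Collapsing `..` before the comparison matters: without it, a path such as `C:\Users\mcwiz\Projects\..\..\Windows` would look like it sits under the safe root.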

Path Rules (Write/Edit tool types)

Safe paths (safe_paths.txt): C:\Users\mcwiz\Projects, /c/Users/mcwiz/Projects, etc.
Excluded paths (excluded_paths.txt): OneDrive, AppData, .cache, Dropbox, Windows, Program Files

Write/Edit operations targeting safe paths → ALLOW. Targeting excluded paths → BLOCK. Everything else → UNCERTAIN.
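
The three-way outcome above could look like the following sketch. Two things are assumptions, not facts from this page: that excluded entries match as directory names anywhere in the path, and that safe paths are checked before excluded ones (the order the sentence above lists them in). `classify_write` and both constants are hypothetical names.

```python
import ntpath
from pathlib import PureWindowsPath

SAFE_ROOTS = [PureWindowsPath(r"C:\Users\mcwiz\Projects")]
# Assumption: excluded entries are matched as path components, case-insensitively.
EXCLUDED_NAMES = {"onedrive", "appdata", ".cache", "dropbox", "windows", "program files"}


def classify_write(path: str) -> str:
    """Return 'ALLOW', 'BLOCK', or 'UNCERTAIN' for a Write/Edit target."""
    target = PureWindowsPath(ntpath.normpath(path))
    if any(target == root or root in target.parents for root in SAFE_ROOTS):
        return "ALLOW"
    if any(part.lower() in EXCLUDED_NAMES for part in target.parts):
        return "BLOCK"
    return "UNCERTAIN"  # handed to Tier 2


print(classify_write(r"C:\Users\mcwiz\Projects\app\main.py"))  # ALLOW
print(classify_write(r"C:\Users\mcwiz\AppData\Local\x.json"))  # BLOCK
print(classify_write(r"D:\scratch\notes.txt"))                 # UNCERTAIN
```

Git Bash style paths such as `/c/Users/mcwiz/Projects` are not handled here and would need to be converted first.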

Tier 2: Haiku API

Commands that local rules can't resolve (UNCERTAIN) go to the Anthropic Haiku API.

Model: claude-3-5-haiku-latest
Timeout: 3 seconds
System prompt:

You are Sentinel, a security gatekeeper for a Windows/Git Bash development environment. The user is a Professional Engineer running Claude Code with auto-approval (Unleashed). [...] Reply exactly 'ALLOW' if safe, or 'BLOCK: reason' if dangerous.

The API receives: tool type, working directory, and command arguments (first 500 chars). It responds with a verdict.
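
The request and response handling can be sketched without calling the API. The layout of the user message is an assumption (this page only lists which fields are sent), and `build_user_message` and `parse_verdict` are hypothetical names. The 500-character cap and the two reply forms come from the text above.

```python
from typing import Optional, Tuple

MAX_ARG_CHARS = 500  # only the first 500 characters of the arguments are sent


def build_user_message(tool_type: str, cwd: str, arguments: str) -> str:
    """Assemble the three fields the API receives (field layout is assumed)."""
    return (
        f"Tool: {tool_type}\n"
        f"Working directory: {cwd}\n"
        f"Arguments: {arguments[:MAX_ARG_CHARS]}"
    )


def parse_verdict(reply: str) -> Tuple[str, Optional[str]]:
    """Parse 'ALLOW' or 'BLOCK: reason'; anything else is malformed."""
    text = reply.strip()
    if text == "ALLOW":
        return "ALLOW", None
    if text.startswith("BLOCK:"):
        return "BLOCK", text[len("BLOCK:"):].strip()
    # A malformed reply is treated as an API failure by the caller.
    raise ValueError(f"malformed verdict: {text!r}")
```

Raising on a malformed reply lets the caller handle it the same way as a timeout or network error.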

Tier 3: Fail-Open

If the API call fails (timeout, network error, rate limit, malformed response), the command is auto-approved with a yellow warning:

[SENTINEL] API error, fail-open: RateLimitError (HTTP 429)

See ADR-003 for the rationale.
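
A sketch of the fail-open wrapper, assuming the API call is available as a plain callable. `check_with_fail_open` is a hypothetical name, and the timeout is enforced here with `concurrent.futures`, which may differ from how unleashed does it. The warning prints only the exception class name, not its message.

```python
import sys
from concurrent.futures import ThreadPoolExecutor

API_TIMEOUT_SECONDS = 3.0  # timeout value stated for Tier 2


def check_with_fail_open(api_check, command, timeout=API_TIMEOUT_SECONDS):
    """Return the API verdict, or 'ALLOW' with a warning if the call fails."""
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        return pool.submit(api_check, command).result(timeout=timeout)
    except Exception as exc:
        # Timeout, network error, rate limit, malformed reply: all fail open.
        print(
            f"[SENTINEL] API error, fail-open: {type(exc).__name__}",
            file=sys.stderr,
        )
        return "ALLOW"
    finally:
        # Do not wait for a call that is still running after the timeout.
        pool.shutdown(wait=False)
```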

Scope Configuration

| Flag | Tool Types Gated |
|------|------------------|
| `--sentinel` | Bash only (alias for `--sentinel-scope bash`) |
| `--sentinel-scope bash` | Bash |
| `--sentinel-scope write` | Bash, Write, Edit |
| `--sentinel-scope all` | Bash, Write, Edit, WebFetch, WebSearch, Skill, Task |

Tools not in scope are auto-approved instantly — no sentinel overhead.
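
The scope table maps directly onto a lookup. In this sketch `SCOPES` and `is_gated` are hypothetical names; the tool lists are copied from the table.

```python
SCOPES = {
    "bash": {"Bash"},
    "write": {"Bash", "Write", "Edit"},
    "all": {"Bash", "Write", "Edit", "WebFetch", "WebSearch", "Skill", "Task"},
}


def is_gated(tool_type: str, scope: str = "bash") -> bool:
    """True if sentinel evaluates this tool type under the given scope."""
    return tool_type in SCOPES[scope]


print(is_gated("Write", "bash"))   # False: auto-approved without a sentinel check
print(is_gated("Write", "write"))  # True
```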

Shadow Mode

--sentinel-shadow logs what sentinel would evaluate without changing approval behavior. Use this to:

Validate tool type detection before enabling sentinel
Measure how many commands would hit the API (vs. local resolution)
Build confidence before enabling blocking mode

Shadow logs are written to logs/sentinel-shadow-{session}.log.

Stats Tracking

Sentinel tracks per-session statistics:

```python
{
    "local_allow": 42,   # Tier 1 ALLOW
    "local_block": 0,    # Tier 1 BLOCK
    "api_allow": 5,      # Tier 2 ALLOW
    "api_block": 1,      # Tier 2 BLOCK
    "api_error": 0,      # Tier 3 ERROR (fail-open)
}
```

Printed to stderr at session end. In this example, 42 out of 48 commands (87.5%) were resolved locally — only 6 needed API calls.
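
The percentage quoted above follows from the counters. `local_resolution_rate` is a hypothetical helper written for this example.

```python
stats = {
    "local_allow": 42,
    "local_block": 0,
    "api_allow": 5,
    "api_block": 1,
    "api_error": 0,
}


def local_resolution_rate(stats: dict) -> float:
    """Fraction of commands decided by Tier 1 without an API call."""
    total = sum(stats.values())
    if total == 0:
        return 0.0
    return (stats["local_allow"] + stats["local_block"]) / total


print(f"{local_resolution_rate(stats):.1%}")  # 87.5%
```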

Worker Thread Architecture

Sentinel checks run in a daemon worker thread, not the PTY reader thread. This is critical — the previous integration attempt (archive/unleashed-guarded.py) called the API synchronously in the reader thread, which blocked all terminal output for 1-3 seconds per check and made the session unusable.

See ADR-002 for the full decision record.
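
The idea of moving checks off the reader thread can be sketched with a queue and one daemon thread. This is not the unleashed implementation: `start_worker` and its callbacks are hypothetical, and the known issues below suggest the real code may create threads differently.

```python
import queue
import threading


def start_worker(check, on_result):
    """Start a daemon thread that evaluates queued commands one at a time."""
    jobs = queue.Queue()

    def run():
        while True:
            command = jobs.get()  # blocks only this worker, not the reader
            try:
                try:
                    verdict = check(command)
                except Exception:
                    verdict = "ALLOW"  # fail-open, as in Tier 3
                on_result(command, verdict)
            finally:
                jobs.task_done()

    threading.Thread(target=run, daemon=True, name="sentinel-worker").start()
    return jobs
```

The reader thread only calls `jobs.put(command)`, which returns immediately, so terminal output keeps flowing while a check is in progress.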

Known Issues

| Issue | Severity | Description |
|-------|----------|-------------|
| #38 | P0 | API key may leak in exception messages |
| #41 | P0 | `in_approval` flag has no timeout watchdog |
| #43 | P1 | 2KB context buffer too small — sentinel may misidentify tool type |
| #40 | P2 | Unbounded thread creation (mitigated by `in_approval` serialization) |
| #44 | P2 | No visual feedback during API evaluation |
| #46 | P2 | No session summary on exit |
