Sentinel Safety Gate
Sentinel is unleashed's integrated safety system. It evaluates commands before auto-approval, catching dangerous operations that a hallucinating LLM might attempt. The design prioritizes availability over strictness — sentinel should help when it can and stay out of the way when it can't.
flowchart TD
CMD["Command Arrives<br/>(from permission detection)"]
LOCAL{"Tier 1: Local Rules<br/>(regex, less than 1ms)"}
API{"Tier 2: Haiku API<br/>(LLM, 1-3s)"}
FAILOPEN["Tier 3: Fail-Open<br/>(approve with warning)"]
CMD --> LOCAL
LOCAL -->|"ALLOW<br/>(safe pattern match)"| APPROVE["Auto-Approve"]
LOCAL -->|"BLOCK<br/>(hard block match)"| BLOCK["Withhold Approval<br/>(user decides)"]
LOCAL -->|"UNCERTAIN<br/>(no match)"| API
API -->|"ALLOW"| APPROVE
API -->|"BLOCK: reason"| BLOCK
API -->|"ERROR<br/>(timeout, network, etc.)"| FAILOPEN
FAILOPEN --> APPROVE
style APPROVE fill:#4ade80,stroke:#333,color:#000
style BLOCK fill:#ef4444,stroke:#333,color:#fff
style FAILOPEN fill:#fbbf24,stroke:#333,color:#000
style LOCAL fill:#60a5fa,stroke:#333,color:#000
style API fill:#a78bfa,stroke:#333,color:#000
See ADR-004 for the full decision record.
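The flowchart above can be read as a short decision function. The following is a minimal sketch, not the project's actual code: the names `Verdict`, `evaluate`, `local_rules`, and `api_check` are hypothetical, and the two checkers are passed in as plain callables so the control flow stays visible.

```python
from enum import Enum
from typing import Callable, Tuple


class Verdict(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    UNCERTAIN = "uncertain"


def evaluate(
    command: str,
    local_rules: Callable[[str], Verdict],  # hypothetical Tier 1 checker
    api_check: Callable[[str], Verdict],    # hypothetical Tier 2 checker
) -> Tuple[Verdict, str]:
    """Return (verdict, tier) for a command, following the flowchart."""
    # Tier 1: local rules settle the command whenever a pattern matches.
    local = local_rules(command)
    if local is not Verdict.UNCERTAIN:
        return local, "local"
    # Tier 2: ask the API. Tier 3: any failure approves the command.
    try:
        return api_check(command), "api"
    except Exception:
        return Verdict.ALLOW, "fail-open"
```

Note that a Tier 1 BLOCK never reaches the API, and only an exception in Tier 2 triggers the fail-open path.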
Tier 1 makes fast regex-based decisions using pre-existing safety data from ~/.agentos/.
| Pattern | Matches |
|---|---|
| `^(ls\|dir\|cat\|head\|tail\|...)` | Read-only file operations |
| `^git\s+(status\|log\|diff\|show\|branch\|...)` | Read-only git operations |
| `^(pwd\|echo\|printf\|date\|whoami\|...)` | Environment queries |
| `^(grep\|rg\|find\|fd\|ag)` | Search tools |
| `^poetry\s+(run\|install\|add\|show\|lock)` | Python package management |
| `^pytest` | Test runner |
| `^gh\s+(issue\|pr\|repo\|api)\s+(list\|view\|...)` | GitHub CLI |
| `^git\s+(add\|commit\|push(?!.*--force))` | Non-destructive git writes |
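As an illustration of how such patterns behave, here is a small sketch using Python's `re` module. It includes only the patterns that the table gives in full (the elided ones are left out), and the names `SAFE_PATTERNS` and `is_safe` are hypothetical rather than taken from the codebase.

```python
import re

# Only the patterns the table spells out completely; elided ones are omitted.
SAFE_PATTERNS = [
    re.compile(r"^(grep|rg|find|fd|ag)"),
    re.compile(r"^poetry\s+(run|install|add|show|lock)"),
    re.compile(r"^pytest"),
    re.compile(r"^git\s+(add|commit|push(?!.*--force))"),
]


def is_safe(command: str) -> bool:
    """Return True if the command matches any safe pattern at its start."""
    stripped = command.strip()
    return any(pattern.match(stripped) for pattern in SAFE_PATTERNS)


print(is_safe("git push origin main"))          # True
print(is_safe("git push origin main --force"))  # False: lookahead rejects it
```

The negative lookahead `(?!.*--force)` rejects `--force` anywhere after `push`. As written, though, it does not reject the short form `-f`, so that case would need to be caught elsewhere, for example by the hard block list.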
Hard blocks are loaded from ~/.agentos/hard_block_commands.txt (19 patterns):
- Always blocked: `dd`, `mkfs`, `shred`, `format`, disk operations
- Conditional (blocked outside safe paths): `rm -rf`, `del /s`, `rmdir /s`, file deletion
- Git destructive: `git reset --hard`, `git clean -fd`, `git push --force`
Conditional blocks check whether the command targets a safe path (directories listed in ~/.agentos/safe_paths.txt). Destructive operations inside C:\Users\mcwiz\Projects are allowed through; the same operations targeting system directories are blocked.
- Safe paths (`safe_paths.txt`): `C:\Users\mcwiz\Projects`, `/c/Users/mcwiz/Projects`, etc.
- Excluded paths (`excluded_paths.txt`): OneDrive, AppData, `.cache`, Dropbox, Windows, Program Files
Write/Edit operations targeting safe paths → ALLOW. Targeting excluded paths → BLOCK. Everything else → UNCERTAIN.
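The path rule above could be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: `classify_target`, `SAFE_ROOTS`, and `EXCLUDED_NAMES` are hypothetical names, excluded paths are assumed to match as path components, and excluded paths are assumed to take precedence over safe paths (the text does not say which wins). Only Windows-style paths are handled; the Git Bash form `/c/Users/...` would need separate normalization.

```python
from pathlib import PureWindowsPath

# Hypothetical in-memory copies of safe_paths.txt and excluded_paths.txt.
SAFE_ROOTS = [PureWindowsPath(r"C:\Users\mcwiz\Projects")]
EXCLUDED_NAMES = {"onedrive", "appdata", ".cache", "dropbox", "windows", "program files"}


def classify_target(target: str) -> str:
    """Classify a Write/Edit target as ALLOW, BLOCK, or UNCERTAIN."""
    path = PureWindowsPath(target)
    components = {part.lower() for part in path.parts}
    # Assumption: an excluded component blocks even inside a safe root.
    if components & EXCLUDED_NAMES:
        return "BLOCK"
    if any(path.is_relative_to(root) for root in SAFE_ROOTS):
        return "ALLOW"
    return "UNCERTAIN"


print(classify_target(r"C:\Users\mcwiz\Projects\app\main.py"))  # ALLOW
print(classify_target(r"C:\Users\mcwiz\AppData\Local\tool"))    # BLOCK
print(classify_target(r"D:\scratch\notes.txt"))                 # UNCERTAIN
```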
Commands that local rules can't resolve (UNCERTAIN) go to the Anthropic Haiku API.
Model: claude-3-5-haiku-latest
Timeout: 3 seconds
System prompt:
You are Sentinel, a security gatekeeper for a Windows/Git Bash development environment. The user is a Professional Engineer running Claude Code with auto-approval (Unleashed). [...] Reply exactly 'ALLOW' if safe, or 'BLOCK: reason' if dangerous.
The API receives: tool type, working directory, and command arguments (first 500 chars). It responds with a verdict.
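The request and response handling described above might look like the sketch below. The message layout and the names `build_user_message` and `parse_verdict` are assumptions for illustration. Only the 500-character limit and the `ALLOW` / `BLOCK: reason` reply format come from the text. The actual API call is left out.

```python
from typing import Tuple

MAX_ARG_CHARS = 500  # "first 500 chars" of the command arguments


def build_user_message(tool_type: str, cwd: str, args: str) -> str:
    """Assemble the three inputs; this layout is hypothetical."""
    return f"Tool: {tool_type}\nCWD: {cwd}\nArgs: {args[:MAX_ARG_CHARS]}"


def parse_verdict(reply: str) -> Tuple[str, str]:
    """Return (verdict, reason); raise ValueError on a malformed reply."""
    text = reply.strip()
    if text == "ALLOW":
        return "ALLOW", ""
    if text.startswith("BLOCK:"):
        return "BLOCK", text[len("BLOCK:"):].strip()
    # A malformed response is treated as an error by the caller.
    raise ValueError(f"malformed sentinel reply: {text[:80]!r}")
```

Raising on anything other than the two expected forms lets the caller treat a malformed response like any other API error.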
If the API call fails (timeout, network error, rate limit, malformed response), the command is auto-approved with a yellow warning:
[SENTINEL] API error, fail-open: RateLimitError (HTTP 429)
See ADR-003 for the rationale.
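A fail-open wrapper in the spirit of the paragraph above could be sketched like this. The function names are hypothetical, and the warning here carries only the exception class name, which is a simplification of the sample line shown above (the HTTP status is omitted). Logging the class name rather than the full message is one way to avoid echoing request details into logs.

```python
import sys
from typing import Callable


def format_fail_open_warning(exc: Exception) -> str:
    """Build the warning line from the exception class name only."""
    return f"[SENTINEL] API error, fail-open: {type(exc).__name__}"


def check_with_fail_open(api_check: Callable[[str], str], command: str) -> str:
    """Run the API check; on any failure, warn and approve the command."""
    try:
        return api_check(command)
    except Exception as exc:
        print(format_fail_open_warning(exc), file=sys.stderr)
        return "ALLOW"
```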
| Flag | Tool Types Gated |
|---|---|
| `--sentinel` | Bash only (alias for `--sentinel-scope bash`) |
| `--sentinel-scope bash` | Bash |
| `--sentinel-scope write` | Bash, Write, Edit |
| `--sentinel-scope all` | Bash, Write, Edit, WebFetch, WebSearch, Skill, Task |
Tools not in scope are auto-approved instantly — no sentinel overhead.
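The scope table maps naturally onto a lookup. The sketch below is illustrative only: `SCOPES` and `is_gated` are hypothetical names, and `None` stands in for a session started without any sentinel flag.

```python
from typing import Optional

# Tool types gated by each --sentinel-scope value, per the table above.
SCOPES = {
    "bash": {"Bash"},
    "write": {"Bash", "Write", "Edit"},
    "all": {"Bash", "Write", "Edit", "WebFetch", "WebSearch", "Skill", "Task"},
}


def is_gated(tool_type: str, scope: Optional[str]) -> bool:
    """Return True if sentinel should evaluate this tool type."""
    if scope is None:  # sentinel not enabled
        return False
    return tool_type in SCOPES[scope]


print(is_gated("Write", "bash"))   # False: auto-approved without sentinel
print(is_gated("Write", "write"))  # True
```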
--sentinel-shadow logs what sentinel would evaluate without actually doing anything. Use this to:
- Validate tool type detection before enabling sentinel
- Measure how many commands would hit the API (vs. local resolution)
- Build confidence before enabling blocking mode
Shadow logs are written to logs/sentinel-shadow-{session}.log.
Sentinel tracks per-session statistics:
{
"local_allow": 42, # Tier 1 ALLOW
"local_block": 0, # Tier 1 BLOCK
"api_allow": 5, # Tier 2 ALLOW
"api_block": 1, # Tier 2 BLOCK
"api_error": 0, # Tier 3 ERROR (fail-open)
}
Printed to stderr at session end. In this example, 42 out of 48 commands (87.5%) were resolved locally; only 6 needed API calls.
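The arithmetic behind that summary can be reproduced with a few lines. The `summarize` helper is hypothetical. It treats every non-local counter, including `api_error`, as an API call.

```python
from typing import Dict, Tuple

stats = {
    "local_allow": 42,
    "local_block": 0,
    "api_allow": 5,
    "api_block": 1,
    "api_error": 0,
}


def summarize(counters: Dict[str, int]) -> Tuple[int, int, int, float]:
    """Return (total, resolved locally, sent to API, percent local)."""
    total = sum(counters.values())
    local = counters["local_allow"] + counters["local_block"]
    api = total - local
    percent_local = 100 * local / total if total else 0.0
    return total, local, api, percent_local


print(summarize(stats))  # (48, 42, 6, 87.5)
```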
Sentinel checks run in a daemon worker thread, not the PTY reader thread. This is critical — the previous integration attempt (archive/unleashed-guarded.py) called the API synchronously in the reader thread, which blocked all terminal output for 1-3 seconds per check and made the session unusable.
See ADR-002 for the full decision record.
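To illustrate why moving checks off the reader thread matters, here is a minimal sketch of the pattern using a single long-lived daemon worker and a queue. This is an assumed design for illustration: the names `start_sentinel_worker`, `check`, and `on_verdict` are hypothetical, and the actual implementation may create threads differently (issue #40 below suggests it does).

```python
import queue
import threading
from typing import Callable


def start_sentinel_worker(
    check: Callable[[str], str],
    on_verdict: Callable[[str, str], None],
) -> "queue.Queue[str]":
    """Start a daemon thread that evaluates queued commands."""
    jobs: "queue.Queue[str]" = queue.Queue()

    def worker() -> None:
        while True:
            command = jobs.get()
            try:
                verdict = check(command)  # may take 1-3 s for an API call
            except Exception:
                verdict = "ALLOW"  # fail-open, as in Tier 3
            on_verdict(command, verdict)
            jobs.task_done()

    threading.Thread(target=worker, daemon=True, name="sentinel-worker").start()
    return jobs
```

The reader thread only calls `jobs.put(command)`, which returns immediately, so terminal output keeps flowing while the check runs.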
| Issue | Severity | Description |
|---|---|---|
| #38 | P0 | API key may leak in exception messages |
| #41 | P0 | in_approval flag has no timeout watchdog |
| #43 | P1 | 2KB context buffer too small — sentinel may misidentify tool type |
| #40 | P2 | Unbounded thread creation (mitigated by in_approval serialization) |
| #44 | P2 | No visual feedback during API evaluation |
| #46 | P2 | No session summary on exit |