b9nn/posture-probe

posture-probe

A browser agent that audits a GitHub account against a CIS-style security checklist by actually opening the settings pages, reading what's rendered, and asking Claude whether each setting matches the expected posture.

It's a narrow browser agent on purpose: one application (GitHub), one benchmark (a hand-rolled CIS-style subset), one output format (findings JSON). The point is to show the agent loop end-to-end (plan, navigate, observe, classify, report) on a real public site, not to be a general scanner.

What it does

For each check in a YAML benchmark, the agent:

  1. Launches a Chromium browser via Playwright with a saved login session.
  2. Navigates to the GitHub settings page that holds the relevant setting.
  3. Snapshots the page's accessibility tree (semantic, less brittle than raw DOM).
  4. Asks Claude: "given this page content and this benchmark check, is the setting compliant? Quote the evidence."
  5. Records a structured finding (PASS / FAIL / UNKNOWN, with a quoted evidence string and a screenshot path).
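The five steps can be sketched as one function per check. This is a minimal illustration, not the project's agent.py: the names Check, Verdict and run_check are hypothetical. The browser and model calls are passed in as callables so the loop runs without a real browser. Step 1 (launching Chromium with the saved session) is assumed to happen once in the caller. The optional image classifier stands in for the screenshot fallback described under "Why this shape".

```python
from dataclasses import dataclass
from typing import Callable, Optional


# Hypothetical types for illustration; not the actual agent.py API.
@dataclass
class Check:
    id: str
    title: str
    url: str  # settings page that holds the setting


@dataclass
class Verdict:
    status: str    # "PASS" | "FAIL" | "UNKNOWN"
    evidence: str  # text quoted from the page


def run_check(
    check: Check,
    navigate: Callable[[str], None],
    snapshot_text: Callable[[], str],
    take_screenshot: Callable[[], str],
    classify_text: Callable[[Check, str], Verdict],
    classify_image: Optional[Callable[[Check, str], Verdict]] = None,
) -> dict:
    """Navigate, observe, classify and record a single benchmark check."""
    navigate(check.url)               # step 2: open the settings page
    tree = snapshot_text()            # step 3: accessibility-tree text
    shot = take_screenshot()          # screenshot path, kept as evidence
    # Step 4: ask the model, unless the tree is empty and there is nothing to show it.
    verdict = classify_text(check, tree) if tree.strip() else Verdict("UNKNOWN", "")
    # Screenshot fallback when the tree was empty or the text pass was inconclusive.
    if verdict.status == "UNKNOWN" and classify_image is not None:
        verdict = classify_image(check, shot)
    # Step 5: structured finding.
    return {"id": check.id, "title": check.title, "status": verdict.status,
            "evidence": verdict.evidence, "screenshot": shot}
```

Injecting the side-effecting steps keeps the control flow (including when the fallback fires) testable with plain fakes.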

Output is a JSON file along these lines (abbreviated):

{
  "target": "b9nn",
  "generated_at": "2026-05-17T22:14:03Z",
  "benchmark": "github_user_cis_subset_v1",
  "findings": [
    {"id": "GH-USER-001", "title": "Two-factor authentication enabled",
     "status": "PASS", "evidence": "Two-factor authentication is enabled."},
    {"id": "GH-USER-004", "title": "No active classic personal access tokens",
     "status": "FAIL", "evidence": "3 personal access tokens (classic) are active."}
  ]
}
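A small schema object makes that shape explicit and rejects out-of-vocabulary statuses before they reach the file. This is a sketch under assumptions: Finding and findings_json are hypothetical names, not the actual findings.py, and the optional screenshot field reflects step 5's "screenshot path" rather than a documented key name.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import List, Optional

VALID_STATUSES = {"PASS", "FAIL", "UNKNOWN"}


@dataclass
class Finding:
    id: str
    title: str
    status: str
    evidence: str
    screenshot: Optional[str] = None  # hypothetical key: path under out/screenshots/

    def __post_init__(self) -> None:
        # Reject anything outside the three-state vocabulary early.
        if self.status not in VALID_STATUSES:
            raise ValueError(f"invalid status: {self.status!r}")


def findings_json(target: str, benchmark: str, findings: List[Finding],
                  now: Optional[datetime] = None) -> str:
    """Serialize findings into the document shape shown above."""
    # UTC timestamp with a trailing Z; naive datetimes are treated as local time.
    ts = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    doc = {
        "target": target,
        "generated_at": ts.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "benchmark": benchmark,
        # Drop the screenshot key when no screenshot was recorded.
        "findings": [{k: v for k, v in asdict(f).items() if v is not None}
                     for f in findings],
    }
    return json.dumps(doc, indent=2)
```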

Why this shape

The hard parts of a real browser agent show up here, just at small scale:

  • Auth. GitHub gates everything behind a session cookie. The agent uses Playwright's storage_state to persist a logged-in session you create once interactively, so no password automation or TOTP juggling.
  • Selector drift. GitHub redesigns its settings UI. The agent reads the accessibility tree rather than relying on brittle CSS selectors, then delegates the "did I find the setting?" judgment to the model.
  • Non-determinism. Claude's classification of a rendered page isn't deterministic, so each check requires a structured output schema, an explicit UNKNOWN state, and an evidence quote for the human to verify.
  • Vision fallback. If the accessibility tree comes back empty or the model says UNKNOWN, the agent is meant to retry with a screenshot sent to Claude's vision capability. Per the Status section, this path lands in v2.
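The non-determinism point can be made concrete by parsing the model's reply defensively and failing closed to UNKNOWN. The sketch below assumes the model is asked to reply with a JSON object holding "status" and "evidence" keys; parse_verdict is a hypothetical helper, not the project's llm.py. The check that the quoted evidence actually appears in the page text is an added technique of this sketch: the README only says the quote is there for a human to verify.

```python
import json
import re
from typing import Tuple

ALLOWED = {"PASS", "FAIL", "UNKNOWN"}


def _normalize(s: str) -> str:
    # Collapse whitespace and case so line breaks in the tree don't defeat the match.
    return re.sub(r"\s+", " ", s).strip().lower()


def parse_verdict(raw: str, page_text: str) -> Tuple[str, str]:
    """Turn the model's raw reply into (status, evidence), failing closed to UNKNOWN."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return "UNKNOWN", ""
    if not isinstance(data, dict):
        return "UNKNOWN", ""
    status = data.get("status")
    evidence = data.get("evidence", "")
    if not isinstance(status, str) or status not in ALLOWED or not isinstance(evidence, str):
        return "UNKNOWN", ""
    # A PASS or FAIL must quote text that is really on the page; otherwise don't trust it.
    if status != "UNKNOWN" and (not evidence.strip()
                                or _normalize(evidence) not in _normalize(page_text)):
        return "UNKNOWN", evidence
    return status, evidence
```

Downgrading an unsupported PASS or FAIL to UNKNOWN is also what would trigger the screenshot retry, so a hallucinated quote costs a second look rather than a wrong finding.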

Install

Requires Python 3.11+ and an Anthropic API key.

git clone https://github.com/b9nn/posture-probe.git
cd posture-probe
python -m venv .venv
.venv\Scripts\activate          # Windows
# source .venv/bin/activate     # macOS / Linux
pip install -e .
playwright install chromium
cp .env.example .env            # then edit .env with your ANTHROPIC_API_KEY

One-time login

GitHub auth happens once, interactively, and the session is saved to .auth/github.json. The agent reuses that on every subsequent run.

python -m posture_probe login

A browser window opens. Sign in to GitHub, completing 2FA as usual. Once your dashboard loads, switch back to the terminal and press Enter; the session state is written and the window closes.
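The login step maps onto Playwright's storage-state feature. A minimal sketch using the sync API follows; the function name login and the starting URL are assumptions about how __main__.py might do it, while sync_playwright, chromium.launch, new_context and BrowserContext.storage_state are real Playwright calls. Playwright is imported inside the function so the module loads even where it isn't installed.

```python
from pathlib import Path

AUTH_PATH = Path(".auth/github.json")


def login(auth_path: Path = AUTH_PATH) -> None:
    """Open a headed browser, let the user sign in, then persist the session (hypothetical)."""
    # Lazy import: only needed when actually logging in.
    from playwright.sync_api import sync_playwright

    auth_path.parent.mkdir(parents=True, exist_ok=True)
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)  # visible window for manual sign-in
        context = browser.new_context()
        page = context.new_page()
        page.goto("https://github.com/login")
        input("Sign in (including 2FA), then press Enter here... ")
        # Writes the context's cookies and localStorage to a JSON file.
        context.storage_state(path=str(auth_path))
        browser.close()
```

Because the saved file holds live session cookies, it belongs in .gitignore.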

Run an audit

python -m posture_probe audit --target b9nn --benchmark benchmarks/github_user.yaml

Findings are written to out/findings-<timestamp>.json, alongside a redacted Markdown report at out/report-<timestamp>.md. Screenshots used as evidence go into out/screenshots/.
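One way to keep the JSON and Markdown outputs paired is to derive both names from a single timestamp. The sketch below is hypothetical: output_paths is an illustrative name, and the compact UTC stamp format is an assumption (the README does not specify it), chosen because it contains no colons and is therefore a valid filename on Windows.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional, Tuple


def output_paths(out_dir: Path, now: Optional[datetime] = None) -> Tuple[Path, Path, Path]:
    """Return (findings_json, report_md, screenshots_dir) sharing one timestamp."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    # Hypothetical stamp format: compact UTC, no colons.
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    screenshots = out_dir / "screenshots"
    screenshots.mkdir(parents=True, exist_ok=True)  # also creates out_dir if missing
    return (
        out_dir / f"findings-{stamp}.json",
        out_dir / f"report-{stamp}.md",
        screenshots,
    )
```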

Project layout

posture-probe/
  posture_probe/
    __main__.py        CLI entry (login, audit)
    agent.py           Per-check agent loop (navigate, observe, classify)
    browser.py         Playwright session wrapper
    llm.py             Anthropic SDK wrapper with the prompt + JSON schema
    benchmark.py       YAML loader
    findings.py        Output schema + report rendering
  benchmarks/
    github_user.yaml   The checks themselves
  examples/
    sample_findings.json
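browser.py is described only as a "Playwright session wrapper". A minimal sketch of its observe step might look like the following, assuming Playwright's sync API and a version with Locator.aria_snapshot (added in Playwright 1.49), which returns a YAML-like text dump of roles and accessible names. The function name observe is hypothetical, and launching a fresh browser per call is a simplification; a real wrapper would more likely keep one context open across checks.

```python
from pathlib import Path


def observe(url: str, screenshot_path: Path,
            auth_path: Path = Path(".auth/github.json")) -> str:
    """Open url with the saved GitHub session; return its ARIA snapshot text (hypothetical)."""
    # Lazy import so the module loads without Playwright installed.
    from playwright.sync_api import sync_playwright

    screenshot_path.parent.mkdir(parents=True, exist_ok=True)
    with sync_playwright() as p:
        browser = p.chromium.launch()
        # Reuse the session written at login time; no credentials are handled here.
        context = browser.new_context(storage_state=str(auth_path))
        page = context.new_page()
        page.goto(url, wait_until="domcontentloaded")
        # Semantic view of the page: roles and accessible names, not CSS selectors.
        tree = page.locator("body").aria_snapshot()
        page.screenshot(path=str(screenshot_path), full_page=True)
        browser.close()
    return tree
```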

Status

v1 covers GitHub user-account checks. v2 will add per-repository checks (iterating over repos and applying repo-level CIS items) and a vision-LLM fallback path for cases where the accessibility tree is too noisy to judge from text alone.

License

MIT.
