posture-probe

A browser agent that audits a GitHub account against a CIS-style security checklist by actually opening the settings pages, reading what's rendered, and asking Claude whether each setting matches the expected posture.

It's a narrow browser agent on purpose: one application (GitHub), one benchmark (a hand-rolled CIS-style subset), one output format (findings JSON). The point is to show the agent loop end-to-end (plan, navigate, observe, classify, report) on a real public site, not to be a general scanner.

What it does

For each check in a YAML benchmark, the agent:

Launches a Chromium browser via Playwright with a saved login session.
Navigates to the GitHub settings page that holds the relevant setting.
Snapshots the page's accessibility tree (semantic, less brittle than raw DOM).
Asks Claude: "given this page content and this benchmark check, is the setting compliant? Quote the evidence."
Records a structured finding (PASS / FAIL / UNKNOWN, with a quoted evidence string and a screenshot path).
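
The steps above can be sketched as one small loop. This is a minimal illustration, not the code in agent.py: the Check and Finding fields, and the observe and classify callables, are hypothetical stand-ins for the browser and model layers.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

STATUSES = {"PASS", "FAIL", "UNKNOWN"}


@dataclass
class Check:
    id: str
    title: str
    url: str       # settings page that holds the setting (hypothetical field)
    expected: str  # expected posture in plain language (hypothetical field)


@dataclass
class Finding:
    id: str
    title: str
    status: str    # one of STATUSES
    evidence: str  # text quoted from the page
    screenshot: Optional[str] = None


# observe(url) -> (page_text, screenshot_path)
Observe = Callable[[str], Tuple[str, str]]
# classify(page_text, check) -> (status, evidence)
Classify = Callable[[str, Check], Tuple[str, str]]


def run_check(check: Check, observe: Observe, classify: Classify) -> Finding:
    """Navigate and observe, classify, then record one structured finding."""
    try:
        page_text, screenshot = observe(check.url)
        status, evidence = classify(page_text, check)
    except Exception as exc:
        # A check that cannot be evaluated is reported as UNKNOWN, never PASS.
        return Finding(check.id, check.title, "UNKNOWN",
                       f"check could not be evaluated: {exc}")
    if not isinstance(status, str) or status not in STATUSES:
        status, evidence = "UNKNOWN", f"unexpected status from classifier: {status!r}"
    return Finding(check.id, check.title, status, evidence, screenshot)


def run_benchmark(checks: List[Check], observe: Observe,
                  classify: Classify) -> List[Finding]:
    """Apply run_check to every check in the benchmark, in order."""
    return [run_check(check, observe, classify) for check in checks]
```

Passing the browser and model steps in as callables keeps the loop testable with fakes, so the control flow can be exercised without a live session or an API key.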

Output is a JSON file like:

{
  "target": "b9nn",
  "generated_at": "2026-05-17T22:14:03Z",
  "benchmark": "github_user_cis_subset_v1",
  "findings": [
    {"id": "GH-USER-001", "title": "Two-factor authentication enabled",
     "status": "PASS", "evidence": "Two-factor authentication is enabled."},
    {"id": "GH-USER-004", "title": "No active classic personal access tokens",
     "status": "FAIL", "evidence": "3 personal access tokens (classic) are active."}
  ]
}
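
Because the output is plain JSON, it is straightforward to consume downstream. The following sketch assumes only the keys shown in the sample above; the summarize helper is a hypothetical name, not part of the package.

```python
import json
from collections import Counter

SAMPLE = """
{
  "target": "b9nn",
  "generated_at": "2026-05-17T22:14:03Z",
  "benchmark": "github_user_cis_subset_v1",
  "findings": [
    {"id": "GH-USER-001", "title": "Two-factor authentication enabled",
     "status": "PASS", "evidence": "Two-factor authentication is enabled."},
    {"id": "GH-USER-004", "title": "No active classic personal access tokens",
     "status": "FAIL", "evidence": "3 personal access tokens (classic) are active."}
  ]
}
"""


def summarize(findings_json: str) -> dict:
    """Count findings per status and list the IDs of failed checks."""
    report = json.loads(findings_json)
    counts = Counter(f["status"] for f in report["findings"])
    failed = [f["id"] for f in report["findings"] if f["status"] == "FAIL"]
    return {"target": report["target"], "counts": dict(counts), "failed": failed}


summary = summarize(SAMPLE)
print(summary)
```

A CI job could, for example, exit non-zero whenever the failed list is not empty.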

Why this shape

The hard parts of a real browser agent show up here, just at small scale:

Auth. GitHub gates everything behind a session cookie. The agent uses Playwright's storage_state to persist a logged-in session you create once interactively, so no password automation or TOTP juggling.
Selector drift. GitHub redesigns its settings UI from time to time. The agent reads the accessibility tree rather than relying on brittle CSS selectors, then delegates the "did I find the setting?" judgment to the model.
Non-determinism. Claude's classification of a rendered page isn't deterministic, so each check requires a structured output schema, an explicit UNKNOWN state, and an evidence quote for the human to verify.
Vision fallback. If the accessibility tree comes back empty or the model says UNKNOWN, the agent retries with a screenshot via Claude's vision capability instead.
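
One way to handle the non-determinism and fallback points is sketched below. It is an assumption-laden illustration rather than the logic in llm.py: the reply is assumed to be a JSON object with status and evidence keys, ask_text and ask_vision are hypothetical callables wrapping the model, and the rule that a quote missing from the page text downgrades the verdict to UNKNOWN is an extra safeguard added here, not something the project documents.

```python
import json
from typing import Callable, Tuple

STATUSES = {"PASS", "FAIL", "UNKNOWN"}


def _squash(text: str) -> str:
    """Collapse whitespace and case so quotes compare robustly."""
    return " ".join(text.split()).lower()


def parse_verdict(raw_reply: str, page_text: str,
                  verify_quote: bool = True) -> Tuple[str, str]:
    """Turn a raw model reply into (status, evidence), defaulting to UNKNOWN."""
    try:
        data = json.loads(raw_reply)
        status, evidence = data["status"], data["evidence"]
    except (ValueError, KeyError, TypeError):
        return "UNKNOWN", ""  # malformed reply
    if (not isinstance(status, str) or not isinstance(evidence, str)
            or status not in STATUSES):
        return "UNKNOWN", ""
    if status == "UNKNOWN":
        return status, evidence
    if not evidence.strip():
        return "UNKNOWN", ""  # a verdict without a quote is not trusted
    if verify_quote and _squash(evidence) not in _squash(page_text):
        return "UNKNOWN", evidence  # quote does not appear on the page
    return status, evidence


def classify_with_fallback(page_text: str,
                           ask_text: Callable[[str], str],
                           ask_vision: Callable[[], str]) -> Tuple[str, str, str]:
    """Try the text path first; retry with a screenshot on empty text or UNKNOWN."""
    if page_text.strip():
        status, evidence = parse_verdict(ask_text(page_text), page_text)
        if status != "UNKNOWN":
            return status, evidence, "text"
    # On the vision path there may be no text to check the quote against,
    # so only the shape of the reply is validated.
    status, evidence = parse_verdict(ask_vision(), page_text, verify_quote=False)
    return status, evidence, "vision"
```

The third element of the returned tuple records which path produced the verdict, which helps a reviewer decide how much to trust the evidence string.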

Install

Requires Python 3.11+ and an Anthropic API key.

git clone https://github.com/b9nn/posture-probe.git
cd posture-probe
python -m venv .venv
.venv\Scripts\activate          # Windows
# source .venv/bin/activate     # macOS / Linux
pip install -e .
playwright install chromium
cp .env.example .env            # then edit .env with your ANTHROPIC_API_KEY

One-time login

GitHub auth happens once, interactively, and the session is saved to .auth/github.json. The agent reuses that on every subsequent run.

python -m posture_probe login

A browser window opens. Sign in to GitHub (do 2FA as usual). When you see your dashboard, switch back to the terminal and press Enter. The session state is written and the window closes.
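
For orientation, the login flow described above could look roughly like this with Playwright's sync API. It is a sketch under stated assumptions, not the contents of browser.py: the function names are hypothetical, and the aria_snapshot() call assumes a recent Playwright release (it may not match the snapshot method the project actually uses).

```python
import json
from pathlib import Path

AUTH_STATE = Path(".auth/github.json")


def has_saved_session(state_path: Path = AUTH_STATE) -> bool:
    """Return True if a storage-state file exists and holds at least one cookie."""
    try:
        state = json.loads(state_path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return False
    return isinstance(state, dict) and bool(state.get("cookies"))


def interactive_login(state_path: Path = AUTH_STATE) -> None:
    """Open a visible browser, wait for a manual sign-in, then save the session."""
    # Imported lazily so the helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    state_path.parent.mkdir(parents=True, exist_ok=True)
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        context = browser.new_context()
        page = context.new_page()
        page.goto("https://github.com/login")
        input("Sign in in the browser window, then press Enter here... ")
        # Writes cookies and local storage for the context to disk.
        context.storage_state(path=str(state_path))
        browser.close()


def read_settings_page(url: str, state_path: Path = AUTH_STATE) -> str:
    """Reuse the saved session and return an accessibility snapshot of the page."""
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        context = browser.new_context(storage_state=str(state_path))
        page = context.new_page()
        page.goto(url)
        snapshot = page.locator("body").aria_snapshot()
        browser.close()
        return snapshot
```

The saved file contains live session cookies, so it should be treated like a credential and kept out of version control.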

Run an audit

python -m posture_probe audit --target b9nn --benchmark benchmarks/github_user.yaml

Findings are written to out/findings-<timestamp>.json, with a redacted Markdown report at out/report-<timestamp>.md. Screenshots used as evidence go into out/screenshots/.
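
As an illustration of what a redacted report could involve, the sketch below renders findings to Markdown and masks token-like strings in the evidence. The redaction rule is an assumption (the README does not say what gets redacted); it relies only on the documented ghp_ and github_pat_ prefixes of GitHub personal access tokens, and render_report is a hypothetical name.

```python
import re

# GitHub classic PATs start with "ghp_"; fine-grained PATs with "github_pat_".
TOKEN_PATTERN = re.compile(r"\b(?:ghp_|github_pat_)[A-Za-z0-9_]+")


def redact(text: str) -> str:
    """Mask anything that looks like a GitHub personal access token."""
    return TOKEN_PATTERN.sub("[REDACTED]", text)


def render_report(report: dict) -> str:
    """Render a findings dict (same keys as the findings JSON) as Markdown."""
    lines = [
        f"# Posture report: {report['target']}",
        "",
        f"Benchmark: {report['benchmark']}",
        f"Generated: {report['generated_at']}",
        "",
    ]
    for finding in report["findings"]:
        lines.append(f"- {finding['status']} {finding['id']}: {finding['title']}")
        lines.append(f"  Evidence: \"{redact(finding['evidence'])}\"")
    return "\n".join(lines) + "\n"
```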

Project layout

posture-probe/
  posture_probe/
    __main__.py        CLI entry (login, audit)
    agent.py           Per-check agent loop (navigate, observe, classify)
    browser.py         Playwright session wrapper
    llm.py             Anthropic SDK wrapper with the prompt + JSON schema
    benchmark.py       YAML loader
    findings.py        Output schema + report rendering
  benchmarks/
    github_user.yaml   The checks themselves
  examples/
    sample_findings.json

Status

v1 covers GitHub user-account checks. v2 will add per-repository checks (iterating over repos and applying repo-level CIS items) and a vision-LLM fallback path for cases where the accessibility tree is too noisy to judge from text alone.

License

MIT.
