A Claude Code plugin that turns natural language into blueprints,
blueprints into parallel build plans, and build plans into working software —
with automated iteration, validation, and dual-model adversarial review via Codex.
AI coding agents are powerful, but they fail in predictable ways:
- They lose context. Ask an agent to build a full-stack feature and it forgets what it said three steps ago.
- They skip validation. Code gets written but never verified against the original intent.
- They can't parallelize. One agent, one task, one branch — even when the work is independent.
- They don't iterate. A single pass produces a rough draft, not production code.
Blueprint addresses each of these failure modes.
Instead of prompting an agent and hoping for the best, Blueprint introduces a specification layer between your intent and the code. You describe what you want. The system decomposes it into domain blueprints with numbered requirements and testable acceptance criteria. Then it builds from those blueprints — not from memory, not from vibes — in an automated loop that validates every step.
                                                                  ┌─── Task 1 ─── Agent A ───┐
                                                                  │                          │
You ── /bp:draft ──► Blueprints ── /bp:architect ──► Build Site ──┤─── Task 2 ─── Agent B ───┤──► done
                                                                  │                          │
                                                                  └─── Task 3 ─── Agent C ───┘
The blueprints are the source of truth. Agents read them, build from them, and validate against them. When something breaks, the system traces the failure back to the blueprint — not the code.
| Without Blueprint | With Blueprint |
|---|---|
| One shot. No validation. No traceability. The agent guessed what you wanted. | Every line of code traces to a requirement. Every requirement has acceptance criteria. |
git clone https://github.com/JuliusBrussee/blueprint.git ~/.blueprint
cd ~/.blueprint && ./install.sh

This registers the Blueprint plugin with Claude Code, syncs it into your local Codex plugin marketplace, links Codex prompt files into ~/.codex/prompts/, and installs the blueprint CLI. Restart Claude Code and Codex after installing.
Requirements: Claude Code, git, macOS/Linux.
Optional: Codex (npm install -g @openai/codex) — enables adversarial review at the design, build, and command levels. Blueprint works without it, but Codex makes it significantly harder to ship flawed specs and broken code.
Blueprint follows four phases — Draft, Architect, Build, Inspect — each driven by a slash command inside Claude Code. An optional Research phase grounds the design in real evidence before blueprints are written. A standalone /bp:design command creates and maintains a DESIGN.md design system that becomes a cross-cutting constraint enforced throughout all phases.
RESEARCH          DRAFT              ARCHITECT            BUILD              INSPECT
────────          ─────              ─────────            ─────              ───────
(optional)        "What are we       Break into tasks,    Auto-parallel:     Gap analysis:
Multi-agent       building?"         map dependencies,    /bp:build          built vs.
codebase +                           organize into        groups work        intended.
web research      Produces:          tiered build site    into adaptive      Peer review.
                  blueprints         + dependency graph   subagent packets   Trace to specs.
Produces:         with R-numbered                         tier by tier
research brief    requirements       Produces:                               Produces:
in context/refs                      task graph           Codex reviews      findings report
                  Codex challenges                        every tier gate
                  the design                              (speculative +
                                                          synchronous)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
/bp:design (standalone) → DESIGN.md → design tokens referenced in blueprints + tasks
design-reviewer enforces across build + inspect
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
/bp:research "build a Verse compiler targeting WASM"
Dispatches 2–8 parallel subagents to explore the codebase and search the web for current best practices, library landscape, reference implementations, and common pitfalls. A synthesizer agent cross-validates findings and produces a research brief in context/refs/. Research is also offered inline during /bp:draft when the project involves unfamiliar technology or architectural decisions with multiple viable approaches.
/bp:design
Creates or imports a DESIGN.md design system that becomes a cross-cutting constraint layer across the entire pipeline. Once present, every blueprint references its design tokens, every task carries a Design Ref, and every build result is audited for design violations.
Four sub-commands:
- /bp:design create: generate a new DESIGN.md from scratch via guided Q&A
- /bp:design import: extract a DESIGN.md from an existing codebase
- /bp:design audit: check the current implementation against DESIGN.md and report violations
- /bp:design update: revise DESIGN.md and log the change to context/designs/design-changelog.md
When DESIGN.md exists, the design-reviewer agent validates UI changes during build and inspect, flagging DESIGN VIOLATION statuses for any task that drifts from the tokenized system. Design changes are tracked in a changelog so intent is never lost across build cycles.
/bp:draft
You describe what you're building in natural language. Blueprint decomposes it into domain blueprints — structured documents with numbered requirements (R1, R2, ...) and testable acceptance criteria. Each blueprint is stack-independent and human-readable.
When the project would benefit from it, the draft phase offers to run deep research before design Q&A — grounding clarifying questions and approach proposals in real evidence rather than LLM priors.
After the internal reviewer approves, blueprints are sent to Codex for a design challenge — an adversarial review that catches decomposition flaws, missing requirements, and ambiguous criteria before any code is written.
For existing codebases, /bp:draft --from-code reverse-engineers blueprints from your code and identifies gaps.
/bp:architect
Reads all blueprints, breaks requirements into tasks, maps dependencies, and organizes everything into a tiered build site — a dependency graph where Tier 0 has no dependencies, Tier 1 depends only on Tier 0, and so on. The build site includes a Coverage Matrix that maps every individual acceptance criterion to its task(s), ensuring nothing specified in the blueprints gets lost in translation. This is what the build loop consumes.
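The tiering rule described above can be sketched in a few lines. This is only an illustration, not the plugin's actual implementation: the function name `assign_tiers` and the dictionary format are assumptions, and the task IDs are borrowed from the build transcript elsewhere in this README.

```python
from typing import Dict, List


def assign_tiers(deps: Dict[str, List[str]]) -> Dict[str, int]:
    """Map each task to a tier: 0 if it has no dependencies,
    otherwise one more than the highest tier among its dependencies."""
    tiers: Dict[str, int] = {}
    in_progress = set()  # tasks on the current recursion path, for cycle detection

    def tier_of(task: str) -> int:
        if task in tiers:
            return tiers[task]
        if task in in_progress:
            raise ValueError(f"dependency cycle involving {task}")
        in_progress.add(task)
        tiers[task] = 1 + max((tier_of(dep) for dep in deps[task]), default=-1)
        in_progress.discard(task)
        return tiers[task]

    for task in deps:
        tier_of(task)
    return tiers


# Hypothetical build site: task ID -> list of task IDs it depends on.
build_site = {
    "T-001": [], "T-002": [], "T-003": [],
    "T-004": ["T-001", "T-002"],
    "T-005": ["T-003"],
}
print(assign_tiers(build_site))
```

Under this rule a task can only depend on tasks in strictly lower tiers, which is what lets a whole tier be built before the next one starts.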
/bp:build
Before starting, a pre-flight coverage check validates that the build site covers all blueprint acceptance criteria — gaps are flagged before any code is written. After completion, a post-flight blueprint verification cross-references what was built against the original blueprints, adding remediation tasks for any criteria that slipped through.
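A pre-flight coverage check of this kind reduces to finding acceptance criteria with no task mapped to them. The sketch below assumes a Coverage Matrix represented as a dictionary from criterion ID to task IDs; the `R1.AC1`-style IDs and the function name `coverage_gaps` are hypothetical, not Blueprint's actual format.

```python
from typing import Dict, List


def coverage_gaps(criteria: List[str], matrix: Dict[str, List[str]]) -> List[str]:
    """Return the acceptance criteria that no task claims to implement."""
    return [c for c in criteria if not matrix.get(c)]


# Hypothetical criterion IDs and task assignments, for illustration only.
criteria = ["R1.AC1", "R1.AC2", "R2.AC1"]
matrix = {"R1.AC1": ["T-001"], "R1.AC2": ["T-001", "T-004"], "R2.AC1": []}

gaps = coverage_gaps(criteria, matrix)
if gaps:
    print("Uncovered acceptance criteria:", ", ".join(gaps))
```

The same function could serve the post-flight step by passing a matrix built from completed tasks instead of planned ones.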
The Ralph Loop. Each iteration:
┌──────────────────────────────────────────────────────────┐
│ │
│ Read build site → Find next unblocked task │
│ │ │
│ ▼ │
│ Load relevant blueprint + acceptance criteria │
│ │ │
│ ▼ │
│ Implement the task │
│ │ │
│ ▼ │
│ Validate (build + tests + acceptance criteria) │
│ │ │
│ ├── PASS → commit → mark done → next task ──┐ │
│ │ │ │
│ └── FAIL → diagnose → fix → revalidate │ │
│ │ │
│ ◄─────────────────────────────────────────────────┘ │
│ │
│ Loop until: all tasks done OR iteration limit reached │
└──────────────────────────────────────────────────────────┘
At every tier boundary, Codex adversarial review gates advancement — P0/P1 findings must be fixed before the next tier starts. With speculative review enabled (default), this adds near-zero latency because the review runs in the background while the next tier builds.
/bp:inspect
Gap analysis compares what was built against what was specified. Peer review checks for bugs, security issues, and missed requirements. Every finding is traced back to a blueprint requirement.
Greenfield project:
> /bp:draft
What are you building?
> A REST API for task management. Users, projects, tasks with priorities
and due dates, assignments. PostgreSQL.
Created 4 blueprints (22 requirements, 69 acceptance criteria)
Next: /bp:architect
> /bp:architect
Generated build site: 34 tasks, 5 tiers
Next: /bp:build
> /bp:build
Loop activated — 34 tasks, 20 max iterations.
...
All tasks done. Build passes. Tests pass.
BLUEPRINT COMPLETE — 34 tasks in 18 iterations.
Existing codebase:
> /bp:draft --from-code
Exploring codebase... Next.js 14, Prisma, NextAuth.
Created 6 blueprints — 4 requirements are gaps (not yet implemented).
> /bp:architect --filter collaboration
Generated build site: 8 tasks, 3 tiers
> /bp:build
Loop activated — 8 tasks.
...
BLUEPRINT COMPLETE — 8 tasks in 8 iterations.
See example.md for full annotated conversations.
/bp:build automatically parallelizes. When multiple tasks are ready (no unmet dependencies), it groups them into a few coherent work packets based on shared files, subsystem, and task complexity, then runs those packets in parallel.
> /bp:build
═══ Wave 1 ═══
3 task(s) ready:
T-001: Database schema (tier 0, deps: none)
T-002: Auth middleware (tier 0, deps: none)
T-003: Config loader (tier 0, deps: none)
Dispatching 2 grouped subagents...
All 3 tasks complete. Merging...
═══ Wave 2 ═══
2 task(s) ready:
T-004: User endpoints (tier 1, deps: T-001, T-002)
T-005: Health check (tier 1, deps: T-003)
Dispatching 2 grouped subagents...
All done.
═══ BUILD COMPLETE ═══
Waves: 2 | Tasks: 5/5
How it works:
- Reads the build site and computes the frontier — all tasks whose dependencies are complete
- Groups the ready frontier into coherent work packets before delegating
- Uses parallel subagents where file ownership and task size make that worthwhile
- After all complete, merges results and computes the next frontier
- Repeats wave-by-wave until all tasks are done — no manual intervention between tiers
Circuit breakers prevent infinite loops: 3 test failures → task marked BLOCKED, all tasks blocked → stop and report.
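The wave-by-wave frontier computation and the circuit breakers can be sketched as follows. This is a simplified model under stated assumptions: `run_task` is a hypothetical callback that returns True when a task validates, tasks run sequentially here rather than in parallel subagents, and grouping into work packets is omitted.

```python
from typing import Callable, Dict, List

MAX_FAILURES = 3  # from the text: 3 test failures mark a task BLOCKED


def run_waves(deps: Dict[str, List[str]],
              run_task: Callable[[str], bool]) -> Dict[str, str]:
    """Run tasks wave by wave until nothing is ready; return the final status per task."""
    status = {task: "pending" for task in deps}
    failures = {task: 0 for task in deps}
    while True:
        # Frontier: pending tasks whose dependencies are all done.
        frontier = [t for t in deps
                    if status[t] == "pending"
                    and all(status[d] == "done" for d in deps[t])]
        if not frontier:
            break  # everything is done, or the remaining tasks wait on blocked ones
        for task in frontier:  # a real build would dispatch these concurrently
            if run_task(task):
                status[task] = "done"
            else:
                failures[task] += 1
                if failures[task] >= MAX_FAILURES:
                    status[task] = "blocked"
    return status
```

Because each pass either completes a ready task or moves it one step closer to BLOCKED, the loop always terminates; tasks that depend on a blocked task simply remain pending and can be reported at the end.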
Blueprint uses Codex (OpenAI's coding agent) as an adversarial reviewer: a second model with a different perspective that can catch blind spots Claude misses in its own output. This dual-model approach operates at three levels: the design challenge, the tier gate, and the command gate.
After Claude drafts blueprints and the internal reviewer approves them, the entire blueprint set is sent to Codex for a design challenge — an adversarial review focused exclusively on architecture-level concerns:
Claude drafts Blueprint Codex challenges User reviews
blueprints ──────► reviewer approves ──────► the design ──────► blueprints + findings
│
Checks: │
• Domain decomposition quality
• Missing requirements
• Ambiguous acceptance criteria
• Implicit assumptions
• Cross-domain coherence
Codex returns structured findings categorized as critical (must fix before building) or advisory (worth considering). Critical findings trigger an auto-fix loop — Claude addresses them, Codex re-challenges, up to 2 cycles. Advisory findings are presented alongside blueprints at the user review gate.
The design challenge is purpose-built to prohibit implementation feedback. No framework suggestions, no file path opinions — only design-level concerns that would cause real problems during the build phase.
During /bp:build, every completed tier triggers a Codex adversarial code review before advancing:
═══ Tier 0 Complete ═══
Codex reviews diff (T-001, T-002, T-003) ...
Review: 2 findings (1 P0, 1 P3)
Gate: BLOCKED → fix cycle 1/2
Fixing P0: nil pointer in auth middleware ...
Re-review ...
Gate: PROCEED
═══ Tier 1 starting ═══
The severity-based gate classifies findings by impact:
| Severity | Behavior |
|---|---|
| P0 (critical) | Blocks tier advancement. Fix task generated automatically. |
| P1 (high) | Blocks tier advancement. Fix task generated automatically. |
| P2 (medium) | Deferred. Logged but does not block. |
| P3 (low) | Deferred. Logged but does not block. |
Gate modes are configurable: severity (default — P0/P1 block), strict (all findings block), permissive (nothing blocks), or off.
The review-fix cycle runs up to 2 iterations per tier. After that, the build advances with a warning — the system never deadlocks.
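The gate modes and the bounded fix cycle can be modeled as a small decision function. This is a sketch, not the logic in codex-gate.sh: the function names, the `PROCEED_WITH_WARNING` label, and the treatment of `off` as "nothing blocks" are assumptions made for illustration.

```python
from typing import Callable, List

# Which severities block advancement in each gate mode.
# Treating "off" as "nothing blocks" is an assumption; the real mode may skip the review.
BLOCKING = {
    "severity": {"P0", "P1"},
    "strict": {"P0", "P1", "P2", "P3"},
    "permissive": set(),
    "off": set(),
}


def gate(findings: List[str], mode: str = "severity") -> str:
    """Return BLOCKED if any finding's severity blocks in this mode, else PROCEED."""
    return "BLOCKED" if any(sev in BLOCKING[mode] for sev in findings) else "PROCEED"


def run_tier_gate(review: Callable[[], List[str]],
                  fix: Callable[[List[str]], None],
                  mode: str = "severity",
                  max_cycles: int = 2) -> str:
    """Review, then fix and re-review up to max_cycles times before giving up."""
    findings = review()
    for _ in range(max_cycles):
        if gate(findings, mode) == "PROCEED":
            return "PROCEED"
        fix(findings)
        findings = review()
    if gate(findings, mode) == "PROCEED":
        return "PROCEED"
    # Out of fix cycles: advance anyway so the build never deadlocks.
    return "PROCEED_WITH_WARNING"
```

With findings of one P0 and one P3, `gate` returns BLOCKED in severity mode; once the P0 is fixed and only the P3 remains, it returns PROCEED, matching the transcript above.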
By default, Blueprint runs the Codex review of the previous tier in the background while Claude builds the current tier:
Tier 0 complete ───────────────────────────────► Tier 1 complete
│ │
└── Codex reviews Tier 0 (background) ──────────►│
│
Results ready ◄───────────┘
before gate runs
When the current tier finishes and the gate checks for the previous tier's review, the results are already available — cutting tier gate latency to near-zero. If the background review isn't done yet, the system waits (with a configurable timeout) and falls back to synchronous review if needed.
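One way to picture the overlap is a background future that the gate collects after the build step, with a timeout and a synchronous fallback. The sketch below uses Python's standard `concurrent.futures`; `review` and `build` are hypothetical callables standing in for the Codex review of the previous tier and the build of the current tier, and the real pipeline lives in shell scripts rather than Python.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable, List


def gate_with_speculative_review(review: Callable[[], List[str]],
                                 build: Callable[[], None],
                                 timeout_s: float = 300.0) -> List[str]:
    """Start the review in the background, build in the foreground,
    then collect the review result (falling back to a synchronous review on timeout)."""
    pool = ThreadPoolExecutor(max_workers=1)
    pending = pool.submit(review)  # background review of the previous tier
    try:
        build()  # foreground work overlaps with the review
        try:
            return pending.result(timeout=timeout_s)
        except FutureTimeout:
            return review()  # synchronous fallback
    finally:
        pool.shutdown(wait=False)  # do not block on a review that is still running
```

If the review finishes while the build is still running, `pending.result` returns immediately, which is the near-zero gate latency described above.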
A PreToolUse hook intercepts every Bash command before execution and classifies its safety:
Agent runs bash command
│
▼
Fast-path check ──► allowlist (50+ safe commands) → approve
│ └► blocklist (rm -rf, force push, DROP TABLE, ...) → block
│
▼ (ambiguous)
Codex classifies ──► safe → approve
│ └► warn → approve + log
│ └► block → prevent execution
│
▼ (cached)
Verdict cache ──► normalized pattern match → reuse verdict
The gate integrates with Claude Code's permission system — commands already allowed or blocked in settings bypass the gate entirely. Verdicts are cached by normalized command pattern within the session to avoid redundant API calls. When Codex is unavailable, the gate falls back to static rules only — it never blocks a command solely because the classifier is unreachable.
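The classification flow can be sketched as a single function. Everything here is illustrative: the allowlist and blocklist are tiny hypothetical subsets, `classifier` stands in for the Codex call, the blocklist is checked first as a conservative choice, and treating an unknown verdict as approve is an assumption. Real matching would also need to handle compound commands such as pipes and `;` chains.

```python
import re
from typing import Callable, Dict, Optional

# Illustrative subsets only; the real lists are much longer.
ALLOWLIST = ("ls", "cat", "git status", "git diff")
BLOCKLIST = ("rm -rf", "git push --force", "drop table")
ACTIONS = {"safe": "approve", "warn": "approve+log", "block": "block"}


def normalize(command: str) -> str:
    """Lowercase and collapse whitespace so equivalent commands share a cache key."""
    return re.sub(r"\s+", " ", command.strip().lower())


def gate_command(command: str,
                 classifier: Optional[Callable[[str], str]],
                 cache: Dict[str, str]) -> str:
    """Return 'approve', 'approve+log', or 'block' for a shell command."""
    key = normalize(command)
    if any(pattern in key for pattern in BLOCKLIST):
        return "block"
    if any(key == safe or key.startswith(safe + " ") for safe in ALLOWLIST):
        return "approve"
    if key in cache:
        return cache[key]  # reuse the earlier verdict for this pattern
    if classifier is None:
        return "approve"  # classifier unavailable: static rules only, never block
    try:
        action = ACTIONS.get(classifier(key), "approve")
    except Exception:
        return "approve"  # classifier failed: same fallback as unavailable
    cache[key] = action
    return action
```

Passing `classifier=None` models the case where Codex is not installed: ambiguous commands fall through to approve, so the gate never blocks solely because the classifier is unreachable.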
All Codex features are additive. When Codex is not installed:
- Design challenge is skipped — the internal blueprint reviewer still runs
- Tier gate is skipped — the build loop proceeds without review pauses
- Command gate falls back to static allowlist/blocklist only
- A one-time install nudge appears:
Tip: Install Codex for adversarial code review
Blueprint works the same as before. Codex makes it harder to ship bad blueprints and bad code.
Blueprint settings can live in two places:
- User default: ~/.blueprint/config
- Project override: .blueprint/config
Precedence is: project override > user default > built-in default.
| Setting | Values | Default | Purpose |
|---|---|---|---|
| bp_model_preset | expensive, quality, balanced, fast | quality | Resolve reasoning, execution, and exploration models for Blueprint commands |
| codex_review | auto, off | auto | Enable/disable Codex reviews |
| codex_model | model string | (Codex default) | Model for Codex calls |
| tier_gate_mode | severity, strict, permissive, off | severity | How findings gate tier advancement |
| command_gate | all, interactive, off | all | Which sessions get command gating |
| command_gate_timeout | milliseconds | 3000 | Timeout for Codex safety classification |
| speculative_review | on, off | on | Background review of previous tier |
| speculative_review_timeout | seconds | 300 | Max wait for speculative results |
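The precedence rule (project override, then user default, then built-in default) behaves like a layered lookup. The sketch below models it with Python's `collections.ChainMap`; it is not how bp-config.sh is implemented, and the `resolve` function and the in-memory dictionaries are assumptions standing in for the parsed config files.

```python
from collections import ChainMap
from typing import Dict

# A few built-in defaults, as documented for these settings.
BUILT_IN = {
    "bp_model_preset": "quality",
    "tier_gate_mode": "severity",
    "speculative_review": "on",
}


def resolve(project: Dict[str, str], user: Dict[str, str]) -> ChainMap:
    """Look up settings in the project config first, then the user config, then built-ins."""
    return ChainMap(project, user, BUILT_IN)


# Hypothetical contents of .blueprint/config and ~/.blueprint/config.
config = resolve(
    project={"tier_gate_mode": "strict"},
    user={"tier_gate_mode": "permissive", "bp_model_preset": "fast"},
)
print(config["tier_gate_mode"], config["bp_model_preset"], config["speculative_review"])
```

Here the project file wins for `tier_gate_mode`, the user file supplies `bp_model_preset`, and `speculative_review` falls through to the built-in default.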
Built-in model presets:
| Preset | Reasoning | Execution | Exploration |
|---|---|---|---|
| expensive | opus | opus | opus |
| quality | opus | opus | sonnet |
| balanced | opus | sonnet | haiku |
| fast | sonnet | sonnet | haiku |
Use /bp:config to inspect or change the active preset.
Examples:
/bp:config
/bp:config list
/bp:config preset balanced
/bp:config preset fast --global

| Command | Phase | Description |
|---|---|---|
| /bp:research | Research | Deep multi-agent research (codebase + web); produces a research brief |
| /bp:design | Design | Create, import, audit, or update DESIGN.md; establishes a tokenized design system enforced across the pipeline |
| /bp:draft | Draft | Decompose requirements into domain blueprints (offers research if warranted) |
| /bp:architect | Architect | Generate a tiered build site from blueprints |
| /bp:build | Build | Auto-parallel build: dispatches independent tasks concurrently and progresses through tiers autonomously |
| /bp:inspect | Inspect | Gap analysis + peer review against blueprints |
| /bp:config | - | Show or update the active Blueprint execution preset |
| /bp:codex-review | - | Run standalone Codex adversarial review on current diff |
| /bp:progress | - | Check build site progress |
| /bp:gap-analysis | - | Compare built vs. intended |
| /bp:revise | - | Trace manual fixes back into blueprints |
| /bp:help | - | Show usage guide |
| Command | Description |
|---|---|
| blueprint version | Print version |
context/
├── blueprints/ # Domain blueprints (persist across cycles)
│ ├── blueprint-overview.md
│ └── blueprint-{domain}.md
├── designs/ # Design system artifacts
│ ├── DESIGN.md # Tokenized design system (colors, typography, spacing, components)
│ └── design-changelog.md # Audit log of design decisions and changes
├── sites/ # Build sites (one per plan)
│ ├── build-site-*.md
│ └── archive/
├── impl/ # Implementation tracking
│ ├── impl-{domain}.md
│ ├── impl-review-findings.md # Codex review findings ledger
│ ├── impl-speculative-log.md # Speculative review timing data
│ ├── loop-log.md
│ └── archive/
└── refs/ # Reference materials (PRDs, API docs)
├── research-brief-{topic}.md # Synthesized research brief
└── research-{topic}/ # Raw findings + findings board
scripts/
├── bp-config.sh # Canonical Blueprint config + model preset resolver
├── codex-detect.sh # Codex binary and plugin detection
├── codex-config.sh # Backward-compatible wrapper for bp-config.sh
├── codex-review.sh # Adversarial code review invocation
├── codex-findings.sh # Structured finding management
├── codex-gate.sh # Severity-based tier gating + fix cycle
├── codex-design-challenge.sh # Design challenge for blueprint drafts
├── codex-speculative.sh # Background speculative review pipeline
└── command-gate.sh # PreToolUse command safety gate
Blueprint is built on a simple observation: LLMs are non-deterministic, but software engineering doesn't have to be. By applying the scientific method — hypothesize, test, observe, refine — we extract reliable outcomes from a stochastic process.
| Concept | Role |
|---|---|
| Blueprints | The hypothesis — what you expect the software to do |
| Validation gates | Controlled conditions — build, tests, acceptance criteria |
| Convergence loops | Repeated trials — iterate until stable |
| Implementation tracking | Lab notebook — what was tried, what worked, what failed |
| Revision | Update the hypothesis — trace bugs back to blueprints |
The plugin ships with 9 specialized agents (including a design-reviewer that validates UI changes against DESIGN.md), a multi-agent research system, and 15 deep-dive skills covering the full methodology. When Codex is installed, the system operates as a dual-model architecture — Claude builds and Codex reviews — catching classes of errors that single-model self-review cannot detect.
- Design System — how to create and maintain a DESIGN.md that agents enforce
- UI Craft — component patterns, animation playbook, accessibility checklist, and review checklist for UI work
- Blueprint Writing — how to write blueprints agents can consume
- Convergence Monitoring — detecting when iterations plateau
- Peer Review — six modes for cross-model review
- Validation-First Design — every requirement must be verifiable
- Context Architecture — progressive disclosure for agent context
- Revision — tracing bugs upstream to blueprints
- Brownfield Adoption — adding Blueprint to an existing codebase
- Speculative Pipeline — overlapping phases for faster builds
- Prompt Pipeline — designing the prompts that drive each phase
- Implementation Tracking — living records of build progress
- Documentation Inversion — docs for agents, not just humans
- Peer Review Loop — combining Ralph Loop with cross-model review
- Core Methodology — the full DABI lifecycle
Most AI coding tools treat the agent as a black box — you prompt, it generates, you hope. Blueprint inverts this. The specification is the product. The code is a derivative. When the spec is clear, the code follows. When the code is wrong, the spec tells you why.
This matters because AI agents are getting better every month, but the fundamental problem remains: without a specification, there's nothing to validate against. Blueprint gives every agent — current and future — a contract to build from and a standard to meet.
With Codex adversarial review, Blueprint goes further: a second model with different training and different blind spots reviews both the specification and the implementation. Two models disagreeing is a signal. Two models agreeing is confidence.
MIT