feat: GBrain v0.2.0 — incremental sync, file storage, install skill#2
Merged
Shared single-file import function used by both import and sync. Adds tag reconciliation (removes stale tags on reimport), >1MB file skip, and import->sync checkpoint continuity (writes git HEAD to config table after import so sync picks up seamlessly). Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- buildSyncManifest: parses `git diff --name-status -M` output
- isSyncable: filters to .md pages; excludes hidden/ops/.raw/skip-list
- pathToSlug: converts file paths to page slugs with optional prefix
- updateSlug: renames a page slug in place (preserves page_id, chunks, embeddings)
- rewriteLinks: stub for v0.2 (FKs use page_id, already correct)

20 new tests, all passing (39 total across 3 files).

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
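A minimal sketch of how the filter and slug helpers could look — signatures and skip rules here are assumptions for illustration, not the project's actual code:

```typescript
// Illustrative only: the real isSyncable also honors a configurable skip-list.
function isSyncable(path: string): boolean {
  if (!path.endsWith(".md")) return false; // also rejects uppercase .MD
  const parts = path.split("/");
  if (parts.some((p) => p.startsWith("."))) return false; // hidden dirs, .raw
  if (parts[0] === "ops") return false; // ops/ is excluded
  return true;
}

function pathToSlug(path: string, prefix = ""): string {
  const slug = path.replace(/\.md$/, "");
  return prefix ? `${prefix}/${slug}` : slug;
}
```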
18-step sync protocol: read config, git pull, ancestry validation, `git diff --name-status -M` for net changes, isSyncable filter, process deletes/renames/adds/modifies via importFile, batch optimization, and a sync-state checkpoint in the Postgres config table.

- Watch mode with polling and a consecutive-error counter
- MCP sync_brain tool returns a structured SyncResult
- Stale pages are deleted for files that are no longer syncable

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
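The diff-parsing step can be sketched like this (a hypothetical helper; the real buildSyncManifest does more):

```typescript
// Parse `git diff --name-status -M` output: one change per line,
// tab-separated. A/M/D carry one path; Rxxx (rename, xxx = similarity
// score) carries old and new paths. Unknown status codes are skipped.
type Change = { status: "A" | "M" | "D" | "R"; path: string; newPath?: string };

function parseNameStatus(out: string): Change[] {
  const changes: Change[] = [];
  for (const line of out.split("\n")) {
    if (!line.trim()) continue;
    const [code, a, b] = line.split("\t");
    if (code.startsWith("R")) changes.push({ status: "R", path: a, newPath: b });
    else if (code === "A" || code === "M" || code === "D")
      changes.push({ status: code, path: a });
  }
  return changes;
}
```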
- files table: page_slug FK with ON DELETE SET NULL + ON UPDATE CASCADE; storage_path, storage_url, mime_type, and content_hash (for dedup) columns
- gbrain files list/upload/sync/verify commands for Supabase Storage
- gbrain config show redacts postgresql:// passwords and secret keys
- CLI help updated with FILES section

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
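The dedup column suggests content-addressed hashing; a hedged sketch, assuming SHA-256 (the actual algorithm is not stated in the commit):

```typescript
import { createHash } from "node:crypto";

// Two files with identical bytes get the same hash, so the upload path
// can skip re-uploading. Helper name and algorithm choice are assumptions.
function contentHash(data: string | Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}
```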
6-phase install workflow: environment discovery, Supabase setup (magic path via CLI OAuth or fallback 2-copy-paste), init + import, ongoing sync cron, optional file migration with mandatory verification, and agent teaching (AGENTS.md rules). Every error gets what + why + fix. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
README.md: added sync command to IMPORT/EXPORT section, added FILES section with 4 commands, added files table to schema diagram, added install skill to skills table, updated MCP tools count from 20 to 21 (sync_brain added). Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
… help) Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Create src/version.ts that reads from package.json via static import (safe for bun compiled binaries). Update mcp/server.ts from hardcoded '0.1.0' to use shared VERSION. Bump skills/manifest.json to 0.2.0.
Reorder detection: node_modules first, binary second, clawhub last. Rename 'npm' install method to 'bun'. Use 'clawhub --version' instead of 'which clawhub' to avoid false positives from dangling symlinks. Add 120s timeout to execSync calls to prevent hanging. Add --help flag.
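A sketch of the probe pattern, assuming a hypothetical helper name: actually execute `<tool> --version` under a hard timeout rather than `which <tool>`, so a dangling symlink or a hung binary reads as "not installed":

```typescript
import { execSync } from "node:child_process";

function probeVersion(cmd: string): string | null {
  try {
    return execSync(cmd, {
      timeout: 120_000, // never hang the upgrade check
      stdio: ["ignore", "pipe", "ignore"], // capture stdout only
    })
      .toString()
      .trim();
  } catch {
    return null; // missing binary, non-zero exit, or timeout
  }
}
```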
Add COMMAND_HELP map covering all 28 commands. Check --help before init/upgrade dispatch and before connectEngine() so help works without a database. Use COMMAND_HELP keys as known-command set to catch unknown commands before wasting a DB round-trip.
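The dispatch gate could look like this; the help strings below are made up, and the real map covers all 28 commands:

```typescript
// The help map doubles as the known-command set, so unknown commands
// fail fast before any DB connection.
const COMMAND_HELP: Record<string, string> = {
  init: "gbrain init - initialize a brain",
  sync: "gbrain sync [--watch] - incremental sync from git",
  doctor: "gbrain doctor [--fast] - run health checks",
};

function dispatch(argv: string[]): string {
  const [cmd, ...rest] = argv;
  if (!cmd || !(cmd in COMMAND_HELP))
    return `Unknown command: ${cmd ?? ""}`; // caught before a DB round-trip
  if (rest.includes("--help")) return COMMAND_HELP[cmd]; // works without a DB
  return `run:${cmd}`; // connectEngine() would happen here
}
```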
Fix init.ts: npx→bunx, npm→bun for supabase CLI guidance. Fix README: npm install→bun add for standalone CLI install. Add ## Upgrade section to README with all three install methods. Update install skill Upgrading section to list bun, ClawHub, and binary.
…edge cases

New test files:
- test/cli.test.ts: COMMAND_HELP ↔ switch consistency, version from package.json, per-command --help, unknown command handling, global help
- test/upgrade.test.ts: detection order verification, npm→bun naming, clawhub --version (not which), timeout presence
- test/config.test.ts: redactUrl for postgresql URLs, edge cases

Extended existing tests:
- test/sync.test.ts: empty-string pathToSlug, uppercase .MD rejection, deeply nested files, multiple renames, unknown status codes
- test/markdown.test.ts: multiple --- separators, missing frontmatter, no frontmatter at all, empty string, type inference from paths

Tests: 39 → 83 (+44 new). All pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…dge cases

New test files:
- test/import-file.test.ts (9 tests): mock BrainEngine to test importFile without a DB — MAX_FILE_SIZE skip, content_hash dedup, tag reconciliation (remove stale + add new), compiled_truth/timeline chunking, noEmbed flag, sequential chunk_index
- test/files.test.ts (22 tests): getMimeType for all extensions + uppercase + unknown + no-extension, fileHash consistency + different content + empty, collectFiles pattern (skip .md, skip hidden dirs, recurse, sorted output)

Extended:
- test/chunkers/recursive.test.ts (+6 tests): single newline splits, word-only text, clause delimiters, lossless preservation, default options, mixed delimiter hierarchy

Tests: 83 → 118 (+35 new). All pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
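The getMimeType behavior those tests describe (uppercase normalization, unknown and no-extension fallback) can be sketched as follows; the mapping here is an assumed subset:

```typescript
import { extname } from "node:path";

const MIME: Record<string, string> = {
  ".png": "image/png",
  ".jpg": "image/jpeg",
  ".pdf": "application/pdf",
};

function getMimeType(file: string): string {
  const ext = extname(file).toLowerCase(); // uppercase extensions normalize
  return MIME[ext] ?? "application/octet-stream"; // unknown or no extension
}
```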
garrytan added a commit that referenced this pull request on Apr 10, 2026
- fix(mcp): use ListToolsRequestSchema/CallToolRequestSchema instead of string literals (Issue #9, PR #25)
- fix(mcp): handleToolCall reads dry_run from params instead of hardcoding false (#22 Bug #11)
- fix(search): keyword search returns the best chunk per page via DISTINCT ON, not all chunks (#22 Bug #8)
- fix(search): dedup layer 1 keeps the top 3 chunks per page instead of collapsing to 1 (#22 Bug #12)
- fix(engine): transaction uses a scoped engine via Object.create, no shared-state mutation (#22 Bug #2)
- fix(engine): upsertChunks uses UPSERT instead of DELETE+INSERT, preserving existing embeddings (#22 Bug #1)
- fix(slugs): validateSlug normalizes to lowercase; pathToSlug lowercases consistently (#22 Bug #4)
- schema: add a unique index on content_chunks(page_id, chunk_index) for UPSERT support
- schema: add access_tokens and mcp_request_log tables via migration

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
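The "top 3 chunks per page" dedup rule can be sketched in plain TypeScript (types and helper name are illustrative, not the project's code):

```typescript
// Sort hits by score, then keep at most N per page instead of
// collapsing each page to its single best chunk.
type Hit = { pageId: string; chunkIndex: number; score: number };

function topChunksPerPage(hits: Hit[], perPage = 3): Hit[] {
  const byPage = new Map<string, Hit[]>();
  for (const h of [...hits].sort((a, b) => b.score - a.score)) {
    const bucket = byPage.get(h.pageId) ?? [];
    if (bucket.length < perPage) {
      bucket.push(h);
      byPage.set(h.pageId, bucket);
    }
  }
  return [...byPage.values()].flat();
}
```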
Closed
garrytan added a commit that referenced this pull request on Apr 11, 2026
* fix: 7 bug fixes from Issue #9 and #22

  - fix(mcp): use ListToolsRequestSchema/CallToolRequestSchema instead of string literals (Issue #9, PR #25)
  - fix(mcp): handleToolCall reads dry_run from params instead of hardcoding false (#22 Bug #11)
  - fix(search): keyword search returns the best chunk per page via DISTINCT ON, not all chunks (#22 Bug #8)
  - fix(search): dedup layer 1 keeps the top 3 chunks per page instead of collapsing to 1 (#22 Bug #12)
  - fix(engine): transaction uses a scoped engine via Object.create, no shared-state mutation (#22 Bug #2)
  - fix(engine): upsertChunks uses UPSERT instead of DELETE+INSERT, preserving existing embeddings (#22 Bug #1)
  - fix(slugs): validateSlug normalizes to lowercase; pathToSlug lowercases consistently (#22 Bug #4)
  - schema: add a unique index on content_chunks(page_id, chunk_index) for UPSERT support
  - schema: add access_tokens and mcp_request_log tables via migration

* fix: embed schema.sql at build time, remove fs dependency from initSchema

  initSchema() previously read schema.sql from disk at runtime via readFileSync, which broke in compiled Bun binaries and Deno Edge Functions. Now uses a generated schema-embedded.ts constant (run `bun run build:schema` to regenerate).

  - Removes fs and path imports from postgres-engine.ts and db.ts
  - Adds scripts/build-schema.sh for one-source-of-truth generation
  - Adds a build:schema npm script

  Fixes Issue #22 Bug #6.

* fix: 5 more bug fixes from Issue #22

  - fix(file_upload): call storage.upload() in all 3 paths (operation, CLI upload, CLI sync) with rollback semantics (#22 Bug #9)
  - fix(import): use an atomic index counter for the parallel queue instead of an array.shift() race; preserve the checkpoint on errors (#22 Bug #3)
  - fix(s3): replace unsigned fetch with @aws-sdk/client-s3 for proper SigV4 auth; supports R2/MinIO via forcePathStyle (#22 Bug #10)
  - fix(redirect): verify the remote file exists before deleting the local copy; skip files not found in storage (#22 Bug #5)
  - deps: add @aws-sdk/client-s3

* feat: remote MCP server via Supabase Edge Functions

  Deploy GBrain as a serverless remote MCP endpoint on your existing Supabase instance. One brain, accessible from Claude Desktop, Claude Code, Cowork, Perplexity Computer, and any MCP client. Zero new infrastructure.

  New files:
  - supabase/functions/gbrain-mcp/index.ts — Edge Function with Hono + MCP SDK
  - supabase/functions/gbrain-mcp/deno.json — Deno import map
  - src/edge-entry.ts — curated bundle entry point (excludes fs-dependent modules)
  - src/commands/auth.ts — standalone token management (create/list/revoke/test)
  - scripts/deploy-remote.sh — one-script deployment
  - .env.production.example — 3-value config template

  Changes:
  - config.ts: lazy-evaluate CONFIG_DIR (no homedir() at module scope)
  - schema.sql: add access_tokens + mcp_request_log tables
  - package.json: add build:edge script

  Auth: bearer tokens via the access_tokens table (SHA-256 hashed, per-client, revocable)
  Transport: WebStandardStreamableHTTPServerTransport (stateless, Streamable HTTP)
  Health: /health endpoint (unauth: 200/503; auth: postgres/pgvector/openai checks)
  Excluded from remote: sync_brain, file_upload (may exceed the 60s timeout)

  Setup: clone, fill .env.production, run scripts/deploy-remote.sh, create a token, done.

* docs: per-client MCP setup guides

  - docs/mcp/DEPLOY.md — deployment walkthrough, auth, troubleshooting, latency table
  - docs/mcp/CLAUDE_CODE.md — claude mcp add command
  - docs/mcp/CLAUDE_DESKTOP.md — Settings > Integrations (NOT JSON config!)
  - docs/mcp/CLAUDE_COWORK.md — remote + local bridge paths
  - docs/mcp/PERPLEXITY.md — Perplexity Computer connector setup
  - docs/mcp/CHATGPT.md — coming soon (requires OAuth 2.1, P0 TODO)
  - docs/mcp/ALTERNATIVES.md — Tailscale Funnel + ngrok self-hosted options

* chore: bump version and changelog (v0.6.0)

  GBrain v0.6.0: remote MCP server via Supabase Edge Functions + 12 bug fixes.

* docs: add Remote MCP Server section to README

* docs: make document-release mandatory in CLAUDE.md, add MCP key files

  Post-ship requirements section: document-release is NOT optional. Lists every file that must be checked on every ship. A ship without updated docs is incomplete. Also adds remote MCP server files to the Key files section.

* fix: batch upsertChunks into a single statement to prevent deadlocks

  The per-chunk UPSERT loop caused deadlocks under parallel workers because each INSERT ON CONFLICT acquired row-level locks sequentially; multiple workers upserting different pages could deadlock on the shared unique index. Fix: batch all chunks into a single multi-row INSERT ON CONFLICT statement. One round-trip, one lock acquisition. COALESCE preserves existing embeddings when the new value is NULL.

  Fixes CI failure: "E2E: Parallel Import > parallel import with --workers 4"

* fix: advisory lock in initSchema() prevents deadlock on concurrent DDL

  When multiple processes call initSchema() concurrently (e.g., test setup + CLI subprocess, or parallel workers during E2E tests), the schema SQL's DROP TRIGGER + CREATE TRIGGER statements acquire AccessExclusiveLock on different tables, causing deadlocks. Fix: pg_advisory_lock(42) serializes all initSchema() calls within the same database. The lock is session-scoped and released in a finally block.

* fix: add explicit test timeouts for CLI subprocess E2E tests

  CLI subprocess tests (Setup Journey, Doctor Command, Parallel Import) spawn `bun run src/cli.ts`, which takes several seconds to JIT-compile and connect. The Bun test framework's default 5000ms per-test timeout is too tight for CI. Added 30-60s timeouts matching each subprocess's own timeout to prevent false failures.

* fix: infinite recursion in config.ts exported getConfigDir/getConfigPath

  The replace_all refactor created recursive functions: the exported getConfigDir() called the private getConfigDir(), which called itself. Renamed the exports to configDir()/configPath() to avoid shadowing. Also adds scripts/smoke-test-mcp.ts — verified all 8 MCP tool calls work against a real Postgres database.

Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>
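The advisory-lock guard described above has this shape; `sql` here is a stand-in for the real Postgres client, and only the lock key 42 comes from the commit:

```typescript
// Serialize concurrent initSchema() calls with a session-scoped
// pg_advisory_lock, releasing in finally so errors can't leak the lock.
async function withInitLock<T>(
  sql: (q: string) => Promise<unknown>,
  fn: () => Promise<T>,
): Promise<T> {
  await sql("select pg_advisory_lock(42)");
  try {
    return await fn();
  } finally {
    await sql("select pg_advisory_unlock(42)"); // always release, even on error
  }
}
```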
garrytan added a commit that referenced this pull request on Apr 15, 2026
- #1: Crontab install used an echo pipe with shell-interpolated values. Now uses a temp file via crontab(1) and single-quote escaping on all interpolated paths. No shell expansion possible.
- #2: OPENAI_API_KEY was baked as plaintext into the launchd plist (readable by any local process, backed up by Time Machine). Now uses a wrapper script (~/.gbrain/autopilot-run.sh) that sources ~/.zshrc at runtime. No secrets in plist or crontab.
- #16: extract.ts used a custom 20-line YAML parser that only handled single-line key:value pairs. Multi-line arrays (attendees lists with - items) were silently ignored. Now uses the project's gray-matter parser via parseMarkdown() from src/core/markdown.ts.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
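Single-quote escaping for #1 follows the standard POSIX idiom; a minimal sketch (helper name is illustrative):

```typescript
// Inside single quotes nothing expands; an embedded quote becomes
// '\'' (close the quote, emit an escaped quote, reopen).
function shQuote(value: string): string {
  return `'${value.replace(/'/g, `'\\''`)}'`;
}
```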
garrytan added a commit that referenced this pull request on Apr 15, 2026
* feat: migrate 8 existing skills to conformance format

  Add YAML frontmatter (name, version, description, triggers, tools, mutating), Contract, Anti-Patterns, and Output Format sections to all existing skills. Rename Workflow to Phases. Ingest becomes a thin router delegating to specialized ingestion skills (Phase 2).

* feat: add RESOLVER.md, conventions directory, and output rules

  RESOLVER.md is the skill dispatcher modeled on Wintermute's AGENTS.md. Categorized routing table: Always-on, Brain ops, Ingestion, Thinking, Operational, Setup, Identity. The conventions directory extracts cross-cutting rules (quality, brain-first lookup, model routing, test-before-bulk).

* test: add skills conformance and resolver validation tests

  skills-conformance.test.ts validates that every skill has YAML frontmatter with required fields, Contract, Anti-Patterns, and Output Format sections, plus manifest.json coverage. resolver.test.ts validates routing table categories, skill path existence, and manifest-to-resolver coverage. 50 new tests.

* feat: add 9 brain skills from Wintermute (Phase 2)

  Generalized from Wintermute's battle-tested skills:
  - signal-detector: always-on idea+entity capture on every message
  - brain-ops: brain-first lookup, read-enrich-write loop, source attribution
  - idea-ingest: links/articles/tweets with a mandatory author people page
  - media-ingest: video/audio/PDF/book with entity extraction (absorbs video/youtube/book)
  - meeting-ingestion: transcripts with attendee enrichment chaining
  - citation-fixer: audit and fix citation formatting
  - repo-architecture: filing rules by primary subject
  - skill-creator: create skills with the conformance standard + MECE check
  - daily-task-manager: task lifecycle with priority levels

  All Garry-specific references generalized. Core workflows preserved. Updated RESOLVER.md and manifest.json.

* feat: add operational infrastructure + identity layer (Phase 3)

  Operational skills:
  - daily-task-prep: morning prep with calendar context and open threads
  - cross-modal-review: quality gate via a second model with refusal routing
  - cron-scheduler: schedule staggering, quiet hours, wake-up override, idempotency
  - reports: timestamped reports with keyword routing
  - testing: skill validation framework (conformance checks)
  - soul-audit: 6-phase interview generating SOUL.md, USER.md, ACCESS_POLICY.md, HEARTBEAT.md
  - webhook-transforms: external events to brain signals with a dead-letter queue

  Identity layer:
  - SOUL.md template (agent identity, generated by soul-audit)
  - USER.md template (user profile, generated by soul-audit)
  - ACCESS_POLICY.md template (4-tier access control)
  - HEARTBEAT.md template (operational cadence)
  - cross-modal.yaml convention (review pairs, refusal routing chain)

* docs: update CLAUDE.md with 24 skills, RESOLVER.md, conventions, templates

  GBrain is now a GStack mod for agent platforms. Updated the architecture description, key files listing (16 new skill files, RESOLVER.md, conventions, templates), skills section (24 skills organized by resolver categories), and testing section (new conformance and resolver tests).

* feat: add GStack detection + mod status to gbrain init (Phase 4)

  After brain initialization, gbrain init now reports:
  - Number of skills loaded (from manifest.json)
  - GStack detection (checks known host paths, uses gstack-global-discover if available)
  - GStack install instructions if not found
  - Resolver and soul-audit pointers

  Also adds installDefaultTemplates() for SOUL.md/USER.md/ACCESS_POLICY.md/HEARTBEAT.md deployment, and detectGStack() using gstack-global-discover with a fallback to known paths (DRY: doesn't reimplement GStack's host-detection logic).

* docs: v0.10.0 release documentation

  - CHANGELOG: 24 skills, signal detector, RESOLVER.md, soul-audit, access control, conventions, conformance standard, GStack detection in init
  - README: updated skill section with 24 skills, resolver, conventions
  - TODOS: added runtime MCP access control (P1)
  - VERSION: 0.9.2 → 0.10.0
  - package.json + manifest.json version bumped

* docs: add skill table to CHANGELOG v0.10.0

  16-row table detailing every new skill, what it does, and why it matters. Written to sell the upgrade, not document the implementation.

* fix: restore package.json version after merge conflict resolution

* docs: zero-based README rewrite for GStackBrain v0.10.0

  Lead with the GStack mod identity. 24-skills table organized by category. Install block references RESOLVER.md and soul-audit. GBrain+GStack relationship explained. Removed redundancy (733 → 406 lines). All essential content preserved: install, recipes, architecture, search, commands, engines, voice, knowledge model.

* docs: extract install block to INSTALL_FOR_AGENTS.md, simplify README

  The 30-line copy-paste install block becomes one line: "Retrieve and follow INSTALL_FOR_AGENTS.md". Benefits: the agent always gets the latest instructions (no stale copy-paste), the README stays clean, and install details live where agents read them. The README now leads with what GBrain does ("gives your agent a brain") instead of the GStack relationship. Removed the "requires frontier model" note.

* fix: 3 bugs in init.ts from merge conflict resolution

  1. llstatSync typo (merge corruption) → lstatSync
  2. __dirname undefined in an ESM module → fileURLToPath polyfill
  3. require('fs') in ESM → use the imported readFileSync

  All three would crash gbrain init at runtime. Caught by /review.

* feat: add checkResolvable shared core function for resolver validation

  A shared function at src/core/check-resolvable.ts validates that all skills are reachable from RESOLVER.md, detects MECE overlaps (with a whitelist for always-on/router skills), finds gaps in frontmatter triggers, and scans for DRY violations. Returns structured ResolvableIssue objects with machine-parseable fix objects alongside human-readable action strings. Three call sites: bun test, gbrain doctor, and the skill-creator skill. Cleans up test/resolver.test.ts: removes a stale 9-line skip list and imports from the production check-resolvable.ts instead of reimplementing parsing.

* feat: expand doctor with resolver validation, filesystem-first architecture

  Doctor now runs filesystem checks (resolver health, skill conformance) before connecting to the DB. A new --fast flag skips DB checks. Falls back to filesystem-only when the DB is unavailable. Adds schema_version: 2 to JSON output, a composite health score (0-100), and a structured issues array with action strings for agent parsing. The resolver health check calls checkResolvable() and surfaces actionable fix instructions. The link-integrity check uses the engine.getHealth() dead_links count. CLI routing split: doctor is dispatched before connectEngine() so filesystem checks always run. Fixes the Codex-identified blocker where doctor required a DB.

* feat: add adaptive load-aware throttling and fail-improve loop

  backoff.ts: system-load checking (CPU via os.loadavg, memory via os.freemem), exponential backoff with a 20-attempt max guard, an active-hours multiplier (2x slower during waking hours), and a concurrent-process limit (max 2). Windows-safe: defaults to "proceed" when os.loadavg returns zeros.

  fail-improve.ts: deterministic-first, LLM-fallback pattern with JSONL failure logging. Cascade failure handling: when both paths fail, throws the LLM error and logs both. Log rotation at 1000 entries. Call-count tracking for deterministic hit-rate metrics. Auto-generates test cases from successful LLM fallbacks.

* feat: add transcription service and enrichment-as-a-service

  transcription.ts: Groq Whisper (default) with OpenAI fallback. Files >25MB are segmented via ffmpeg. Provider auto-detection from env vars. Clear error messages for missing API keys and unsupported formats.

  enrichment-service.ts: a global enrichment service callable from any ingest pathway. Entity slug generation (people/jane-doe, companies/acme-corp), mention counting via searchKeyword, tier auto-escalation (Tier 3→2→1 based on mention frequency and source diversity), batch enrichment with backoff throttling, and regex-based entity extraction from text.

* feat: add data-research skill with recipe system, extraction, dedup, tracker

  New skill: data-research — one parameterized pipeline for any email-to-structured-data workflow (investor updates, donations, company metrics). 7-phase pipeline: define recipe, search, classify, extract (with an extraction-integrity rule), archive, deduplicate, update tracker.

  data-research.ts: recipe validation, MRR/ARR/runway/headcount regex extraction (battle-tested patterns), dedup with configurable tolerance, markdown tracker parsing/appending, quarterly/monthly date windowing, and 6-phase HTML email stripping with a 500KB ReDoS cap. Registers data-research in manifest.json (25th skill) and RESOLVER.md. Fixes backoff test robustness on high-load systems.

* docs: update project documentation for v0.10.0 infrastructure additions

  CLAUDE.md: added 6 new core files (check-resolvable, backoff, fail-improve, transcription, enrichment-service, data-research) and 6 new test files; updated the skill count to 25 and the test file count to 34. README.md: updated the skill count to 25, added data-research to the skills table. CHANGELOG.md: added an Infrastructure section documenting resolver validation, doctor expansion, adaptive throttling, the fail-improve loop, voice transcription, the enrichment service, and the data-research skill. TODOS.md: anonymized personal references.

* fix: doctor.ts use ES module imports, harden backoff test

  Replace require('fs') with an ES module import in doctor.ts for consistency with the rest of the file. The backoff test is made resilient to parallel test execution leaking module-level state.

* fix: sync --watch routing, dead_links parity, doctor command, embed --slugs

  - Move sync to CLI_ONLY so the --watch flag reaches runSync() (it was routed through the operation layer, which only calls the single-pass performSync)
  - Hide sync_brain from CLI help (MCP still exposes it)
  - Fix performFullSync missing sync-state persistence (C1)
  - Align the Postgres dead_links query with PGLite (count dangling links, not empty-content chunks) (C3)
  - Fix doctor recommending the nonexistent 'gbrain embed refresh' (C4)
  - Refactor doctor outputResults to not call process.exit directly
  - Add a --slugs flag to embed for targeted page embedding
  - Add sync auto-extract + auto-embed after performSync
  - Add noExtract to SyncOpts
  - Route extract, features, autopilot in CLI_ONLY
  - Update help text with the new commands

* feat: extract, features, and autopilot commands

  - gbrain extract <links|timeline|all> — batch extraction of links and timeline entries from brain markdown files. Broad regex for all .md links (C7: filters external URLs). Frontmatter field parsing (company, investors, attendees). Directory-based link-type inference. JSONL progress on stderr for agents. Sync integration hooks (extractLinksForSlugs, extractTimelineForSlugs).
  - gbrain features [--json] [--auto-fix] — scan brain usage and pitch unused features with the user's own numbers. Priority 1 (data quality): missing embeddings, dead links. Priority 2 (unused features): zero links, zero timeline, low coverage, unconfigured integrations, no sync. Embedded recipe metadata for binary-safe integration detection. Persistence in ~/.gbrain/feature-offers.json. Doctor teaser hook. Upgrade hook.
  - gbrain autopilot [--repo] [--interval N] — self-maintaining brain daemon. Pipeline: sync → extract → embed. Health-based adaptive scheduling (brain_score >= 90 doubles the interval, < 70 halves it). --install/--uninstall for launchd (macOS) and crontab (Linux). Signal handling. Consecutive-error tracking (stops at 5). Logs to ~/.gbrain/autopilot.log.

* feat: hook features scan into post-upgrade flow

  After gbrain post-upgrade completes, automatically run gbrain features to show the user what's new and what to fix. Best-effort (doesn't fail the upgrade).

* feat: brain_score (0-100) in BrainHealth

  A weighted composite score computed in getHealth() for both Postgres and PGLite: embed_coverage 0.35, link_density 0.25, timeline_coverage 0.15, no_orphans 0.15, no_dead_links 0.10. Returns 0 for empty brains. Agents use brain_score as a health gate; autopilot uses it for adaptive scheduling (>= 90 slows down, < 70 speeds up).

* test: extract and features unit tests

  25 tests covering:
  - extractMarkdownLinks: relative links, external URL filtering, edge cases
  - extractLinksFromFile: slug resolution, frontmatter parsing, directory-based type inference (works_at, deal_for, invested_in)
  - extractTimelineFromContent: bullet format, header format with detail, em/en dash handling, empty content
  - features: module exports, brain_score calculation weights, CLI routing

* docs: instruction layer for extract, features, autopilot

  Agent-facing tools are invisible without instruction-layer coverage.
  - RESOLVER.md: add routing for extract, features, autopilot
  - maintain/SKILL.md: add link-graph extraction, timeline extraction, and autopilot-check sections

  Without these, agents reading skills/ would never discover or run the new commands. This is the #1 DX finding from the devex review.

* chore: bump version and changelog (v0.10.1)

* docs: sync CLAUDE.md with v0.10.1 additions

  Add extract.ts, features.ts, autopilot.ts to key files. Add extract.test.ts, features.test.ts to the test list.

* fix: adversarial review fixes — 7 issues

  - #3: the autopilot extract step was a no-op (imported but never called)
  - #6: the PGLite orphan_pages query is aligned with Postgres (checks both inbound + outbound)
  - #8: embedPage throws instead of process.exit (which was killing sync/autopilot)
  - #9: dead-links issues set auto_fixable=false (needs a repo path we may not have)
  - #10: JSON auto-fix output was dead code (unreachable !jsonMode check)
  - #14: an autopilot lock file prevents concurrent instances
  - #20: --dir without a value no longer crashes extract

* security: fix command injection + plaintext API key in daemon install

  - #1: Crontab install used an echo pipe with shell-interpolated values. Now uses a temp file via crontab(1) and single-quote escaping on all interpolated paths. No shell expansion possible.
  - #2: OPENAI_API_KEY was baked as plaintext into the launchd plist (readable by any local process, backed up by Time Machine). Now uses a wrapper script (~/.gbrain/autopilot-run.sh) that sources ~/.zshrc at runtime. No secrets in plist or crontab.
  - #16: extract.ts used a custom 20-line YAML parser that only handled single-line key:value pairs. Multi-line arrays (attendees lists with - items) were silently ignored. Now uses the project's gray-matter parser via parseMarkdown() from src/core/markdown.ts.

Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>
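The brain_score weights above can be sketched as a plain weighted sum; inputs are assumed to be 0..1 ratios, and the helper shape is illustrative:

```typescript
type HealthRatios = {
  embedCoverage: number;
  linkDensity: number;
  timelineCoverage: number;
  noOrphans: number;
  noDeadLinks: number;
};

// Weights sum to 1.0, so a perfect brain scores exactly 100.
function brainScore(r: HealthRatios, isEmpty = false): number {
  if (isEmpty) return 0; // empty brains score 0
  const s =
    0.35 * r.embedCoverage +
    0.25 * r.linkDensity +
    0.15 * r.timelineCoverage +
    0.15 * r.noOrphans +
    0.10 * r.noDeadLinks;
  return Math.round(s * 100);
}
```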
garrytan added a commit that referenced this pull request on Apr 18, 2026
…hutdown
Autopilot now dispatches each cycle as a single `autopilot-cycle` Minion
job (with idempotency_key on the cycle slot) instead of running steps
inline. A forked `gbrain jobs work` child drains the queue durably,
supervised by autopilot. The user runs ONE install step
(`gbrain autopilot --install`) and gets sync + extract + embed + backlinks
+ durable job processing, with no separate worker daemon to manage.
Mode selection:
- minion_mode=always OR pain_triggered (default), engine=postgres →
Minions dispatch. Spawn child, submit autopilot-cycle each interval.
- minion_mode=off, OR engine=pglite, OR `--inline` flag → run steps
inline in-process, same as pre-v0.11.1. PGLite has an exclusive file
lock that blocks a second worker process, so the inline path is the
only path that works there.
Worker supervision:
- spawn(resolveGbrainCliPath(), ['jobs', 'work'], { stdio: 'inherit' }).
stdio:'inherit' avoids pipe-buffer blocking (Codex architecture #2).
- On worker exit: 10s backoff + restart. Crash counter caps at 5 →
autopilot stops with a clear error.
- resolveGbrainCliPath() prefers argv[1] (cli.ts / /gbrain), then
process.execPath (compiled binary suffix check), then `which gbrain`
(installed to $PATH). NEVER blindly uses process.execPath, which on
source installs is the Bun runtime, not `gbrain` (Codex architecture
#1).
Shutdown:
- Async SIGTERM/SIGINT handler: sends SIGTERM to worker, awaits its
exit for up to 35s (the worker's own drain is 30s; we add buffer for
signal-delivery latency), then SIGKILL if still alive.
- Drops the old `process.on('exit')` lock-cleanup handler — its
callback runs synchronously and can't wait for the worker drain.
Lock file cleanup moved inside the async shutdown.
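The SIGTERM-then-SIGKILL sequence above can be sketched against a minimal worker handle. The `WorkerHandle` shape and `shutdownWorker` name are illustrative, not gbrain's actual internals:

```typescript
// Hedged sketch of graceful shutdown: SIGTERM first, wait up to the grace
// window for the worker's own drain, then SIGKILL if it is still alive.
interface WorkerHandle {
  kill(signal: "SIGTERM" | "SIGKILL"): void;
  exited: Promise<number>;
}

async function shutdownWorker(worker: WorkerHandle, graceMs = 35_000): Promise<"clean" | "killed"> {
  worker.kill("SIGTERM");
  const timedOut = Symbol("timeout");
  const result = await Promise.race([
    worker.exited,
    new Promise<typeof timedOut>((r) => setTimeout(() => r(timedOut), graceMs)),
  ]);
  if (result === timedOut) {
    worker.kill("SIGKILL"); // worker ignored the drain window
    await worker.exited;
    return "killed";
  }
  return "clean";
}
```

The key design point matches the commit: the wait is `await`-able, which a synchronous `process.on('exit')` callback can never be.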
Lock-file mtime refresh every cycle (Codex C) so a long-lived autopilot
doesn't get declared "stale" by the next cron-fired invocation after 10
minutes.
Inline fallback path calls the new Core fns (runExtractCore, runEmbedCore)
instead of the CLI wrappers. That way a bad arg from inside the loop
can't process.exit() the autopilot itself (matches Codex #5).
test/autopilot-resolve-cli.test.ts: 3 tests covering argv[1]-as-gbrain,
argv[1]-as-cli.ts, and graceful error when no path resolves.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
garrytan added a commit that referenced this pull request on Apr 18, 2026
…LOG + version bump

scripts/fix-v0.11.0.sh — the paste-command for broken-v0.11.0 installs. Released on the v0.11.1 tag so:

curl -fsSL https://raw.githubusercontent.com/garrytan/gbrain/v0.11.1/scripts/fix-v0.11.0.sh | bash

always works (the master branch could be renamed). 8 steps: schema apply, smoke, mode prompt (non-TTY defaults to pain_triggered), atomic write of preferences.json (0o600), append to completed.jsonl with status:"partial" and apply_migrations_pending:true so the v0.11.1 apply-migrations run resumes correctly (does NOT poison the permanent migration path — Codex H2 avoidance), AGENTS.md + cron/jobs.json detection with guidance printed as text only (never auto-edits from a curl-piped script), and a closing line telling the user to run `gbrain autopilot --install` as the one-stop finisher.

CLAUDE.md — new "Migration is canonical, not advisory" section pinning the design principle. Any host-repo change (AGENTS.md, cron manifests, launchctl units) is GBrain's responsibility via the migration; the exception is host-specific handler registration, which goes via the code-level plugin contract in docs/guides/plugin-handlers.md.

README.md — new sections:
- "v0.11.0 migration didn't fire on your upgrade?" with both repair paths (v0.11.1 binary and pre-v0.11.1 stopgap).
- "Skillify + check-resolvable: user-controllable auto-skill-creation" explaining why the user-controlled pair beats Hermes-style auto generation. Includes the scripts/skillify-check.ts invocation.

CHANGELOG.md — v0.11.1 entry (per CLAUDE.md voice: lead with what the user can now do that they couldn't before; frame as benefits, not files changed).
Covers: mega-bug fix + apply-migrations + postinstall + stopgap, autopilot-supervises-worker + single-install-step + env-aware targets, Core fn extraction so handlers don't kill workers, skillify + check-resolvable pair, host-agnostic plugin contract replacing handlers.json (RCE concern), gbrain init --migrate-only, TS migration registry + H8/H9 diff-rule fixes, CLAUDE.md directive.

All Codex hard blockers (H1, H3/H4, H5, H6, H7, H8, H9, K) + architecture issues (#1/#2/#4/#5/#7) resolved.

package.json — version bump 0.11.0 → 0.11.1.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
garrytan added a commit that referenced this pull request on Apr 18, 2026
* feat: add minion_jobs schema, migration v5, and executeRaw to BrainEngine

Foundation for the Minions job queue system. Adds:
- minion_jobs table (20 columns) with CHECK constraints, partial indexes, and RLS. Inspired by BullMQ's job model, adapted for Postgres.
- Migration v5 creates the table for existing databases.
- executeRaw<T>() method on the BrainEngine interface for raw SQL access, needed by the Minions module for claim queries (FOR UPDATE SKIP LOCKED), token-fenced writes, and atomic stall detection.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* feat: Minions job queue — queue, worker, backoff, types

BullMQ-inspired Postgres-native job queue built into GBrain. No Redis. No external dependencies. Postgres transactions replace Lua scripts.

- MinionQueue: submit, claim (FOR UPDATE SKIP LOCKED), complete/fail (token-fenced), atomic stall detection (CTE), delayed promotion, parent-child resolution, prune, stats
- MinionWorker: handler registry, lock renewal, graceful SIGTERM, exponential backoff with jitter, UnrecoverableError bypass
- MinionJobContext: updateProgress(), log(), isActive() for handlers
- 8-state machine: waiting/active/completed/failed/delayed/dead/cancelled/waiting-children

Patterns stolen from: BullMQ (lock tokens, stall detection, flows), Sidekiq (dead set, backoff formula), Inngest (checkpoint/resume).
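The "exponential backoff with jitter" mentioned above has a well-known general shape. A minimal sketch, assuming exponential base with half-range jitter — the exact gbrain formula may differ:

```typescript
// Illustrative backoff: exponential growth capped at maxMs, with jitter
// drawn from [exp/2, exp) so retries from many jobs don't synchronize.
function backoffMs(attemptsMade: number, baseMs = 1_000, maxMs = 60_000, rand = Math.random): number {
  const exp = Math.min(maxMs, baseMs * 2 ** attemptsMade);
  return Math.floor(exp / 2 + rand() * (exp / 2));
}
```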
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* test: 43 tests for Minions job queue

Full coverage of the Minions module against PGLite in-memory:
- Queue CRUD (9): submit, get, list, remove, cancel, retry, duplicate
- State machine (6): waiting→active→completed/failed, retry→delayed→waiting
- Backoff (4): exponential, fixed, jitter range, attempts_made=0 edge
- Stall detection (3): detect stalled, counter increment, max→dead
- Dependencies (5): parent waits, fail_parent, continue, remove_dep, orphan
- Worker lifecycle (5): register, start-without-handlers, claim+execute, non-Error throws, UnrecoverableError bypass
- Lock management (3): renewal, token mismatch, claim sets lock fields
- Claim mechanics (4): empty queue, priority ordering, name filtering, delayed promotion timing
- Cancel & retry (2): cancel active, retry dead

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* feat: Minions CLI commands and MCP operations

Wire Minions into the GBrain CLI and MCP layer:

CLI (gbrain jobs):
  submit <name> [--params JSON] [--follow] [--dry-run]
  list [--status S] [--queue Q] [--limit N]
  get <id> — detailed view with attempt history
  cancel/retry/delete <id>
  prune [--older-than 30d]
  stats — job health dashboard
  work [--queue Q] [--concurrency N] — Postgres-only worker daemon

6 MCP operations (contract-first, auto-exposed via MCP server): submit_job, get_job, list_jobs, cancel_job, retry_job, get_job_progress

Built-in handlers: sync, embed, lint, import. --follow runs inline. Worker daemon blocked on PGLite (exclusive file lock).

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* docs: update project documentation for Minions job queue

CLAUDE.md: added Minions files to key files, updated operation count (36), BrainEngine method count (38), test file count (45), added jobs CLI commands.

CHANGELOG.md: added Minions entry to v0.10.0 (background jobs, retry, stall detection, worker daemon).
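The 8-state machine exercised by the tests above can be sketched as an allowed-transition table. This is a reconstruction from the transitions the commit messages name (waiting→active→completed/failed, retry→delayed→waiting, stall max→dead, cancel active, retry dead); the real Minions implementation enforces this in SQL, and its exact edge set may differ:

```typescript
// Hedged sketch of the 8-state job machine as a transition table.
type JobState = "waiting" | "active" | "completed" | "failed" | "delayed"
  | "dead" | "cancelled" | "waiting-children";

const TRANSITIONS: Record<JobState, JobState[]> = {
  waiting: ["active", "cancelled"],
  active: ["completed", "failed", "delayed", "dead", "cancelled"], // delayed = retry backoff
  delayed: ["waiting", "cancelled"],
  "waiting-children": ["waiting", "cancelled"],
  failed: ["delayed", "dead"],
  dead: ["waiting"], // retry of a dead job
  completed: [],
  cancelled: [],
};

function canTransition(from: JobState, to: JobState): boolean {
  return TRANSITIONS[from].includes(to);
}
```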
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* feat: Minions v2 — agent orchestration primitives (pause/resume, inbox, tokens, replay)

Adds the foundation for Minions as universal agent orchestration infrastructure. GBrain's Postgres-native job queue now supports durable, observable, steerable background agents. The OpenClaw plugin (separate repo) will consume these via library import, not MCP, for zero-latency local integration.

## New capabilities

- **Concurrent worker** — Promise pool replaces the sequential loop. Per-job AbortController for cooperative cancellation. Graceful shutdown waits for all in-flight jobs via Promise.allSettled.
- **Pause/resume** — pauseJob clears the lock and fires AbortSignal on active jobs. Handlers check ctx.signal.aborted and exit cleanly. resumeJob returns paused jobs to waiting. Catch block skips failJob when signal.aborted.
- **Inbox (separate table)** — minion_inbox table for sidechannel messages. sendMessage with sender validation (parent job or admin). readInbox is token-fenced and marks read_at atomically. A separate table avoids row bloat from rewriting JSONB on every send.
- **Token accounting** — tokens_input/tokens_output/tokens_cache_read columns. updateTokens accumulates; completeJob rolls child tokens up to the parent. USD cost computed at read time (no cost_usd column — pricing too volatile).
- **Job replay** — replayJob clones a terminal job with optional data overrides. New job, fresh attempts, no parent link.

## Handler contract additions

MinionJobContext now provides:
- `signal: AbortSignal` — cooperative cancellation
- `updateTokens(tokens)` — accumulate token usage
- `readInbox()` — check for sidechannel messages
- `log()` — now accepts string or TranscriptEntry

## MCP operations added

pause_job, resume_job, replay_job, send_job_message — all auto-generate CLI commands and MCP server endpoints.
## Library exports

package.json exports map adds ./minions and ./engine-factory paths so plugins can `import { MinionQueue } from 'gbrain/minions'` for direct library use.

## Instruction layer (the teaching)

- skills/minion-orchestrator/SKILL.md — when/how to use Minions, decision matrix, lifecycle management, anti-patterns
- skills/conventions/subagent-routing.md — cross-cutting rule: all background work goes through Minions
- RESOLVER.md — trigger entries for agent orchestration
- manifest.json — registered

## Schema migration v6

Additive: 3 token columns, paused status, minion_inbox table with unread index. Full Postgres + PGLite support. No backfill needed.

## Tests

65 tests (was 43): pause/resume (5), inbox (6), tokens (4), replay (4), concurrent worker context (3), plus all existing coverage.

## What's NOT in this commit

Deferred to follow-up PRs:
- LISTEN/NOTIFY subscribe (needs real Postgres E2E)
- Resource governor (depends on concurrent worker stress testing)
- Routing eval harness (needs API keys + benchmark data)
- OpenClaw plugin (separate @gbrain/openclaw-minions-plugin repo)

See docs/designs/MINIONS_AGENT_ORCHESTRATION.md for the full CEO-approved design.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* feat(minions): migration v7 — agent_parity_layer schema

Adds columns on minion_jobs (depth, max_children, timeout_ms, timeout_at, remove_on_complete, remove_on_fail, idempotency_key) plus the new minion_attachments table. Three partial indexes for bounded scans: idx_minion_jobs_timeout, idx_minion_jobs_parent_status, and uniq_minion_jobs_idempotency. Check constraints enforce non-negative depth and positive child cap / timeout.

Additive migration — existing installs pick it up via ensureSchema on next use. No user action required.
Co-Authored-By: Claude Opus 4.7 <[email protected]>

* feat(minions): extend types for v7 parity layer

Extends MinionJob with depth/max_children/timeout_ms/timeout_at/remove_on_complete/remove_on_fail/idempotency_key. Extends MinionJobInput with the same options plus a max_spawn_depth override. Adds MinionQueueOpts (maxSpawnDepth default 5, maxAttachmentBytes default 5 MiB). Adds AttachmentInput/Attachment shapes and ChildDoneMessage in the InboxMessage union. rowToMinionJob updated to pick up the new columns.

Co-Authored-By: Claude Opus 4.7 <[email protected]>

* feat(minions): attachments validator

New module validateAttachment() gates every attachment write. Rejects empty filenames, path traversal (.., /, \), null bytes, oversized content (5 MiB default, per-queue override), invalid base64, and implausible content_type headers. Returns normalized { filename, content_type, content (Buffer), sha256, size } on success.

The DB also enforces UNIQUE (job_id, filename) as defense-in-depth for concurrent addAttachment races — JS-only checks are not sufficient.

Co-Authored-By: Claude Opus 4.7 <[email protected]>

* feat(minions): queue v7 — depth, child cap, timeouts, cascade, idempotency, child_done

Wraps completeJob and failJob in engine.transaction() so parent hook invocations (resolveParent, failParent, removeChildDependency) fold into the same transaction as the child update. A process crash between child and parent can't strand the parent in waiting-children anymore.

Adds v7 behaviors:
- Depth tracking. add() computes depth = parent.depth + 1 and rejects past maxSpawnDepth (default 5).
- Per-parent child cap. add() takes SELECT ... FOR UPDATE on the parent, counts non-terminal children, rejects when count >= max_children. NULL max_children = no cap.
- Per-job wall-clock timeout. claim() populates timeout_at when timeout_ms is set. New handleTimeouts() dead-letters expired rows with error_text='timeout exceeded'. Terminal — no retry.
- Cascade cancel. cancelJob() walks descendants via recursive CTE with a depth-100 runaway cap. Returns the root row. Re-parented descendants (parent_job_id NULL) are naturally excluded.
- Idempotency. add() uses INSERT ... ON CONFLICT (idempotency_key) DO NOTHING RETURNING; falls back to SELECT when RETURNING is empty. The same key always yields the same job id.
- child_done inbox. completeJob inserts {type:'child_done', child_id, job_name, result} into the parent's inbox in the same transaction as the token rollup, guarded by EXISTS so terminal/deleted parents skip without an FK violation. New readChildCompletions(parent_id, lock_token, since?) helper; token-fenced like readInbox.
- removeOnComplete / removeOnFail. Deletes the row after the parent hook fires, so parent policy sees consistent state.
- Attachment methods. addAttachment validates via validateAttachment then INSERTs; UNIQUE (job_id, filename) backs the JS dup check. listAttachments, getAttachment, deleteAttachment round out the API.

Fixes a pre-existing inverted-status bug: add() now puts children in waiting/delayed (not waiting-children) and atomically flips the parent to waiting-children in the same transaction. Tests no longer need manual UPDATE workarounds.

Two correctness fixes:
- Sibling completion race. Under READ COMMITTED, two grandchildren completing concurrently each saw the other as still-active in the pre-commit snapshot and neither flipped the parent. Fixed by taking SELECT ... FOR UPDATE on the parent row at the start of the completeJob and failJob transactions, serializing siblings on the parent lock.
- JSONB double-encode. postgres.js conn.unsafe(sql, params) auto-JSON-encodes parameters. Calling JSON.stringify(obj) first stored a JSON string literal (jsonb_typeof=string) and broke payload->>'key' queries silently. Removed JSON.stringify from three call sites (child_done inbox post, updateProgress, sendMessage). PGLite tolerated both forms so unit tests missed it — real-PG E2E caught it.
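The JSONB double-encode bug can be reproduced in pure JS. The `driverEncode` function below is a hypothetical stand-in for a driver that JSON-encodes parameters itself (as postgres.js does for jsonb parameters): pre-stringifying makes the value arrive as a JSON *string literal*, so the column stores a string, not an object.

```typescript
// Stand-in for a driver auto-encoding a parameter bound to a jsonb column.
function driverEncode(param: unknown): string {
  return JSON.stringify(param);
}

const payload = { key: "value" };

// Pass the raw object: round-trips as an object (jsonb_typeof = 'object').
const correct = JSON.parse(driverEncode(payload));

// Pre-stringify first: the driver encodes the *string*, so what comes back
// is the text '{"key":"value"}' (jsonb_typeof = 'string') — and
// payload->>'key' queries silently return nothing.
const doubled = JSON.parse(driverEncode(JSON.stringify(payload)));
```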
Co-Authored-By: Claude Opus 4.7 <[email protected]>

* feat(minions): worker — timeout safety net + handleTimeouts tick

Worker tick now calls handleStalled() first, then handleTimeouts() — stall requeue wins over timeout dead-letter when both could fire in the same cycle. handleTimeouts() guards on lock_until > now() so stalled jobs take the retryable path.

launchJob schedules a per-job setTimeout(timeout_ms) that fires ctx.signal as a best-effort handler interrupt. The timer is always cleared in .finally so process exit isn't delayed by a dangling timer. Handlers that respect AbortSignal stop cleanly; handlers that ignore it still get dead-lettered by the DB-side handleTimeouts.

Removed the post-completeJob and post-failJob parent-hook calls from the worker — those are now inside the queue method transactions. The worker becomes simpler and crash-safer.

Co-Authored-By: Claude Opus 4.7 <[email protected]>

* test(minions): 33 new unit tests for v7 parity layer

Covers depth cap, per-parent child cap, timeout dead-letter, cascade cancel (including the re-parent edge case), removeOnComplete / removeOnFail, idempotency (single + concurrent), child_done inbox (posted in txn + survives child removeOnComplete + since cursor), attachment validation (oversize, path traversal, null byte, duplicates, base64), AbortSignal firing on pause mid-handler, catch-block skipping failJob when aborted, worker in-flight bookkeeping, token-rollup guard when the parent is already terminal, and setTimeout safety-net cleanup.

Existing tests updated to remove the inverted-status manual UPDATE workarounds that the add() fix made obsolete.

Co-Authored-By: Claude Opus 4.7 <[email protected]>

* test(e2e): Minions v7 concurrency + OpenClaw resilience coverage

minions-concurrency.test.ts spins up two MinionWorker instances against the test Postgres, submits 20 jobs, and asserts zero double-claims (every job runs exactly once).
This is the only test that actually proves FOR UPDATE SKIP LOCKED under real concurrency — PGLite runs on a single connection and can't exercise the race.

minions-resilience.test.ts covers the six OpenClaw daily pains:
1. Spawn-storm caps enforce under concurrent submit.
2. Agent stall → handleStalled() requeues; handleTimeouts() skips (lock_until guard).
3. Forgotten dispatches recoverable via child_done inbox.
4. Cascade cancel stops grandchildren mid-flight.
5. Deep-tree fan-in (parent → 3 children → 2 grandchildren each) completes with the full inbox chain.
6. Parent crash/recovery resumes from persisted state.

helpers.ts extends ALL_TABLES with minion_attachments, minion_inbox, and minion_jobs (FK dependents first) so E2E teardown doesn't leak rows between runs.

Co-Authored-By: Claude Opus 4.7 <[email protected]>

* chore: release v0.11.0 — Minions v7 agent orchestration primitives

Bumps VERSION / package.json to 0.11.0. Adds a CHANGELOG entry covering depth tracking, max_children, per-job timeouts, cascade cancel, idempotency keys, child_done inbox, removeOnComplete/Fail, attachments, migration v7, plus the two correctness fixes (sibling completion race and JSONB double-encode).

TODOS.md captures the four v7 follow-ups: per-queue rate limiting, repeat/cron scheduler, worker event emitter, and waitForChildren convenience helpers.

1066 unit + 105 E2E = 1171 tests passing.

Co-Authored-By: Claude Opus 4.7 <[email protected]>

* fix(minions): unify JSONB inserts, tighten nullish coalescing

Three non-blocker cleanups from the post-ship review of v0.11.0:
- queue.ts add() and completeJob(): pre-stringifying with JSON.stringify while other sites pass raw objects with $n::jsonb casts. postgres.js double-encodes if you stringify first — works on PGLite (text→JSONB auto-cast), fails silently on real PG. Unified on raw object + explicit $n::jsonb cast.
- queue.ts readChildCompletions: the since clause used sent_at > $2, relying on PG's implicit text→TIMESTAMPTZ coercion.
Explicit $2::timestamptz is safer and clearer.
- types.ts rowToMinionJob: parent_job_id used ||, which coerces 0 to null. Harmless today (SERIAL IDs start at 1) but ?? is semantically correct.

All 110 unit tests pass.

Co-Authored-By: Claude Opus 4.7 <[email protected]>

* fix(minions): updateProgress missed $1::jsonb cast in unification

Residual from c502b7e — updateProgress was the only remaining JSONB write without the explicit ::jsonb cast. Not broken (the implicit cast works) but it breaks the convention the prior commit unified everywhere else.

Co-Authored-By: Claude Opus 4.7 <[email protected]>

* doc: Minions v7 skill count + jobs subcommands (26 skills)

README: bump skill count 25 → 26, add the minion-orchestrator row, add a `gbrain jobs` command family block so v0.11.0's headline feature is actually discoverable from the top-level commands reference.

CLAUDE.md: unit test count 48 → 49 (minions.test.ts expanded), skill count 25 → 26, add minion-orchestrator to Key files + skills categorization, expand the MinionQueue one-liner to cover v7 primitives (depth/child-cap, timeouts, idempotency, child_done inbox, removeOnComplete/Fail).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <[email protected]>

* feat: Minions adoption UX — smoke test + migration + pain-triggered routing

Teach OpenClaw when to reach for Minions vs native subagents. Ship three pieces so upgrading from v0.10.x actually lands for real users:

- `gbrain jobs smoke` — one-command health check that submits a `noop` job, runs a worker, verifies completion, and prints engine-aware guidance (PGLite installs get the "daemon needs Postgres, use --follow" note). Fails loud if the schema's below v7 so the user knows to run `gbrain init`.
- `skills/migrations/v0.11.0.md` — post-upgrade migration file the auto-update agent reads.
Six steps: apply schema, run smoke, ask the user via AskUserQuestion which mode they want (always / pain_triggered / off), write to `~/.gbrain/preferences.json`, sanity-check handlers, mark done. Completeness scores on each option so the recommendation is explicit.
- `skills/conventions/subagent-routing.md` rewritten — was a "MUST use Minions for ALL background work" mandate, now reads preferences.json on every routing decision and branches on three modes. Mode B (pain_triggered) is the default: keep subagents until the gateway drops state, parallel > 3, runtime > 5 min, or the user expresses frustration. Then pitch the switch in-session with a specific script.

Rename pass: "Minions v7" → "Minions" in README (JOBS block), TODOS.md (P1 section header + depends-on), CHANGELOG.md v0.11.0 entry. v7 stays as the internal schema version in code/migration contexts. The product name is just Minions.

Co-Authored-By: Claude Opus 4.7 <[email protected]>

* doc(readme): promote Minions — 6 OpenClaw pains + how each is fixed

The one-line mention in the skills table wasn't doing the work. Added a dedicated section between "How It Works" and "Getting Data In" that leads with the six multi-agent failures every OpenClaw user hits daily (spawn storms, hung handlers, forgotten dispatches, unstructured debugging, gateway crashes, runaway grandchildren) and maps each pain to the specific Minions primitive that fixes it.

Includes the smoke test command, the adoption default (pain_triggered), and a pointer to skills/minion-orchestrator for the full patterns.

Co-Authored-By: Claude Opus 4.7 <[email protected]>

* test(bench): add harness for Minions vs OpenClaw subagent dispatch

Shared harness (openclawDispatch + minionsHandler) using matching claude-haiku-4-5 calls on both sides so the delta measures queue + dispatch overhead on top of identical LLM work. Includes statsFromResults (p50/p95/p99) and formatStats helpers.
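A percentile helper in the shape statsFromResults is described as having can be sketched like this (the real signature and rounding behavior may differ — this is a nearest-rank reconstruction, not the harness code):

```typescript
// Nearest-rank percentile over latency samples in milliseconds.
function percentile(samplesMs: number[], p: number): number {
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const idx = Math.min(sorted.length - 1, Math.ceil((p / 100) * sorted.length) - 1);
  return sorted[Math.max(0, idx)];
}

function statsFromResults(samplesMs: number[]): { p50: number; p95: number; p99: number } {
  return {
    p50: percentile(samplesMs, 50),
    p95: percentile(samplesMs, 95),
    p99: percentile(samplesMs, 99),
  };
}
```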
Uses `openclaw agent --local` embedded mode; does not test gateway multi-agent fan-out (documented in the harness header).

* test(bench): durability under SIGKILL — Minions vs OpenClaw --local

Headline bench for the claim: when the orchestrator dies mid-dispatch, Minions rescues via PG state + stall detection; OpenClaw --local loses in-flight work outright.

Minions side: seed 10 active+expired-lock rows (the exact state a SIGKILLed worker leaves), then run a rescue worker. Expect 10/10 completed.

OpenClaw side: spawn 10 `openclaw agent --local` in parallel, SIGKILL each at 500ms, count pre-kill delivered output. Expect 0/10 — no persistence layer, nothing to recover.

Budget: ~$0 (Minions handlers sleep 10ms; OC calls die at 500ms so partial LLM billing is negligible).

* test(bench): per-dispatch throughput — Minions vs OpenClaw --local

20 serial dispatches each side, identical claude-haiku-4-5 call with the same trivial prompt. p50/p95/p99 reported via statsFromResults. Serial (not parallel) so the per-dispatch cost is measured honestly and LLM token spend stays bounded (~$0.08 total).

Minions: one queue, one worker, one concurrency. Submit → poll to completion before the next submit.

OpenClaw: N sequential `openclaw agent --local` spawns.

* test(bench): fan-out — Minions 10-wide concurrency vs 10 parallel OC spawns

Parent dispatches 10 children, waits for all to return. Minions uses worker concurrency=10 sharing one warm process; OpenClaw runs parallel `openclaw agent --local` spawns, each booting its own runtime.

3 runs × 10 children per run. Reports ok count and wall time per run plus a summary.

Honest caveat documented: does not test OC gateway multi-agent fan-out — that needs a custom WS client and an LLM-backed parent agent. This measures what users script today.

Budget: ~$0.12 LLM spend.

* test(bench): memory — 10 in-flight subagents, single-proc vs 10-proc cost

Measures resident memory for keeping 10 subagents in flight.
Minions: one worker process, concurrency=10 with handlers that park on a promise — sample RSS of the test process via process.memoryUsage().

OpenClaw: 10 parallel `openclaw agent --local` processes; sum their RSS via `ps -o rss=`.

Handlers are cheap sleeps, no LLM — we want harness memory, not LLM client state. Budget: $0.

* test(bench): fan-out — don't gate on OC success rate, report numbers

The initial run showed OC parallel `--local` at 10-wide hitting a 40% failure rate (17/30 across 3 runs). That's the finding, not a test bug — process startup stampede + LLM rate limits. The bench now prints error samples and reports the numbers instead of gating. The Minions side still gates at 90% (30/30 observed in practice).

* doc(benchmarks): Minions vs OpenClaw --local subagent dispatch

Real numbers on four claims: durability, throughput, fan-out, memory. Same claude-haiku-4-5 call on both sides so the delta is queue + dispatch + process cost on top of identical LLM work.

Headline: Minions rescues 10/10 from a SIGKILLed worker in 458ms while OpenClaw --local loses all 10; ~10× faster per dispatch (778ms p50 vs 8086ms p50); ~21× faster at 10-wide fan-out AND 100% reliable vs OC's 43% failure rate; 2 MB vs 814 MB to keep 10 subagents in flight.

An honest caveats section covers what this doesn't test (OC gateway multi-agent, load tests, other models). Fully reproducible via test/e2e/bench-vs-openclaw/.

* doc(readme): inject Minions vs OpenClaw bench numbers

Headline deltas now in the Minions section: 10/10 vs 0/10 on crash, ~10× faster per dispatch, ~21× faster fan-out at 10-wide with 0% failure vs 43%, ~400× less memory. Links to the full bench doc.

Prose first said Minions "fixes all six pains." Now it shows the numbers that prove it.

* bench: production Wintermute benchmark — Minions 753ms vs sub-agent timeout

Real deployment: 45K-page brain on Render+Supabase. Task: pull 99 tweets, write a brain page, commit, sync. Minions: 753ms, $0.
Sub-agent: gateway timeout (>10s, couldn't even spawn under production load).

Also: 19,240 tweets backfilled across 36 months in 15 min at $0. Sub-agents would cost $1.08 and fail 40% of spawns.

* bench: tweet ingestion — Minions 719ms vs OpenClaw 12.5s (17×)

Production benchmark with runnable test code:
- test/e2e/bench-vs-openclaw/tweet-ingest.bench.ts (reusable)
- docs/benchmarks/2026-04-18-tweet-ingestion.md (publishable)

Task: pull 100 tweets from the X API, write a brain page, commit, sync.

Minions: 719ms mean, $0, 100% success. OpenClaw: 12,480ms mean, $0.03/run, 60% success (gateway timeouts).

At scale: 36-month backfill, 19K tweets, 15 min, $0 vs est. $1.08.

* doc(benchmarks): Wintermute production data point for Minions vs OpenClaw

Adds a production-environment data point to the Minions README section: one month of tweet ingest on Wintermute (Render + Supabase + 45K-page brain) ran end-to-end in 753ms for $0.00 via Minions, while the equivalent sessions_spawn hit the 10s gateway timeout and produced nothing.

Full methodology + logs in docs/benchmarks/2026-04-18-minions-vs-openclaw-production.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* feat(core): preferences.ts + cli-util.ts — foundations for v0.11.1

Adds two foundational modules that apply-migrations (Lane A-4), the v0.11.0 orchestrator (Lane C-1), and the stopgap script (Lane C-4) all depend on.

- src/core/preferences.ts: atomic-write ~/.gbrain/preferences.json (mktemp + rename, 0o600, forward-compatible for unknown keys) with validateMinionMode, loadPreferences, savePreferences. Plus appendCompletedMigration + loadCompletedMigrations for the ~/.gbrain/migrations/completed.jsonl log (tolerates malformed lines). Uses process.env.HOME || homedir() so $HOME overrides work in CI and tests; Bun's os.homedir() caches the initial value and ignores later mutations.
- src/core/cli-util.ts: promptLine(prompt) helper, extracted from src/commands/init.ts:212-224.
Shared so init, apply-migrations, and the v0.11.0 orchestrator's mode prompt don't each reinvent it.

test/preferences.test.ts: 21 unit tests covering load/save atomicity, 0o600 perms, forward-compat for unknown keys, minion_mode validation, completed.jsonl JSONL append idempotence, auto-ts population, and malformed-line tolerance in loadCompletedMigrations.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* feat(init): add --migrate-only flag (schema-only, no saveConfig)

Context: v0.11.0 migration orchestrators need a safe way to re-apply the schema against an existing brain without risking a config flip. Today, running bare `gbrain init` with no flags defaults to PGLite and calls saveConfig, which would silently overwrite an existing Postgres database_url — caught by Codex in the v0.11.1 plan review as a show-stopper data-loss bug.

The new --migrate-only path:
- loadConfig() reads the existing config (does NOT call saveConfig)
- errors out with a clear "run gbrain init first" if no config exists
- connects via the already-configured engine, calls engine.initSchema(), disconnects
- --json emits structured success/error payloads

Everything downstream in the v0.11.1 migration chain (apply-migrations, the stopgap bash script, the package.json postinstall hook) will invoke this flag rather than bare gbrain init.

test/init-migrate-only.test.ts: 4 tests covering the no-config error path, the --json error payload shape, the happy path with a PGLite fixture (verifies config.json content is byte-identical after the call — the real invariant), and idempotent rerun.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* feat(migrations): TS registry replaces filesystem migration scan

Context: Codex flagged that bun build --compile produces a self-contained binary, and the existing findMigrationsDir() in upgrade.ts:145 walks skills/migrations/v*.md on disk — which fails on a compiled install because the markdown files aren't bundled.
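The malformed-line tolerance described for loadCompletedMigrations can be sketched as a JSONL reader that skips bad lines instead of failing. This is an illustrative reconstruction — the real record shape in completed.jsonl is richer than this:

```typescript
// Hedged sketch: parse a JSONL migration log, tolerating malformed lines.
interface CompletedEntry {
  version: string;
  status: string;
  ts?: string;
}

function parseCompletedJsonl(text: string): CompletedEntry[] {
  const out: CompletedEntry[] = [];
  for (const line of text.split("\n")) {
    if (!line.trim()) continue;
    try {
      const obj = JSON.parse(line);
      if (obj && typeof obj.version === "string") out.push(obj);
    } catch {
      // Malformed line: skip it rather than fail the whole migration log.
    }
  }
  return out;
}
```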
The plan's fix is a TS registry: migrations are code, imported directly, visible to both source installs and compiled binaries.

- src/commands/migrations/types.ts: shared Migration, OrchestratorOpts, OrchestratorResult types.
- src/commands/migrations/index.ts: exports the migrations[] array, getMigration(version), and compareVersions() (a semver comparator). The feature_pitch data that lived in the MD file frontmatter now lives here as a code constant on each Migration, so runPostUpgrade's post-upgrade pitch printer can consume it without a filesystem read.
- src/commands/migrations/v0_11_0.ts: stub orchestrator + pitch. The full phase implementation lands in Lane C-1; for now the stub throws a clear "not yet implemented" so apply-migrations --list (Lane A-4) can still enumerate the migration.

test/migrations-registry.test.ts: 9 tests covering ascending-semver ordering, feature_pitch shape invariants, getMigration lookup, and compareVersions edge cases (equal / newer / older / single-digit across major bumps).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* feat(cli): gbrain apply-migrations — migration runner CLI

Reads ~/.gbrain/migrations/completed.jsonl, diffs against the TS migration registry, and runs pending orchestrators. Resumes status:"partial" entries (the stopgap bash script writes these so the v0.11.1 apply-migrations can pick up where it left off). Idempotent: rerunning when up-to-date exits 0.

Flags:
  --list                     Show applied + partial + pending + future.
  --dry-run                  Print the plan; take no action.
  --yes / --non-interactive  Skip prompts (used by runPostUpgrade + postinstall).
  --mode <a|p|o>             Preset minion_mode (bypasses the Phase C TTY prompt).
  --migration vX.Y.Z         Force-run one specific version.
  --host-dir <path>          Include $PWD in the host-file walk (default is $HOME/.claude + $HOME/.openclaw only).
  --no-autopilot-install     Skip Phase F.

Diff rule (Codex H9): apply when no status:"complete" entry exists AND migration.version ≤ installed VERSION.
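The Codex H9 diff rule can be sketched as code. The semver comparator below is an illustrative stand-in for the real compareVersions in src/commands/migrations/index.ts, assuming plain x.y.z versions with an optional leading "v":

```typescript
// Illustrative three-part semver comparator (negative = a older than b).
function compareVersions(a: string, b: string): number {
  const parse = (v: string) => v.replace(/^v/, "").split(".").map(Number);
  const [pa, pb] = [parse(a), parse(b)];
  for (let i = 0; i < 3; i++) {
    if ((pa[i] ?? 0) !== (pb[i] ?? 0)) return (pa[i] ?? 0) - (pb[i] ?? 0);
  }
  return 0;
}

// H9 rule: apply when no "complete" entry exists AND the migration's version
// is ≤ the installed binary's version.
function shouldApply(migrationVersion: string, installedVersion: string, completed: Set<string>): boolean {
  return !completed.has(migrationVersion) && compareVersions(migrationVersion, installedVersion) <= 0;
}
```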
The previously proposed rule was "version > currentVersion", which would SKIP v0.11.0 when running v0.11.1; a regression test in apply-migrations.test.ts pins the correct semantics.

Registered in the src/cli.ts CLI_ONLY Set; dispatched before connectEngine so each phase owns its own engine/subprocess lifecycle (no double-connect when the orchestrator shells out to init --migrate-only or jobs smoke).

test/apply-migrations.test.ts: 18 unit tests covering parseArgs for every flag, indexCompleted/statusForVersion correctness (including the stopgap-then-complete transition), and buildPlan's four buckets (applied / partial / pending / skippedFuture) with the Codex H9 regression pinned.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* feat(upgrade): runPostUpgrade tail-calls apply-migrations; postinstall hook

Closes the v0.11.0 mega-bug: migration skills never fired on upgrade.

`runPostUpgrade` now does two things:
1. Cosmetic: prints feature_pitch headlines for migrations newer than the prior binary. Uses the TS registry (Codex K) instead of walking skills/migrations/*.md on disk — compiled binaries see the same list source installs do.
2. Mechanical: invokes apply-migrations --yes --non-interactive in the same process so Phase F (autopilot install) doesn't hit a subprocess timeout wall. Catches + surfaces errors without failing the upgrade.

Also:
- Drops the early return on a missing upgrade-state.json (Codex H8). runPostUpgrade now runs apply-migrations unconditionally; it's cheap when nothing is pending. This repairs every broken-v0.11.0 install on its next upgrade attempt.
- Bumps the `gbrain post-upgrade` subprocess timeout in runUpgrade from 30s → 300s (Codex H7). A v0.11.0→v0.11.1 migration that has to schema-init + smoke + prefs + host-rewrite + launchd-install exceeds 30s trivially.
- Removes the now-dead findMigrationsDir + extractFeaturePitch helpers and their filesystem-reading imports (readdirSync, resolve).
- src/cli.ts post-upgrade dispatch now awaits the async runPostUpgrade.

apply-migrations (Lane A-4):
- First-install guard: loadConfig() check at the top. No brain configured = exit silently for --yes / --non-interactive (postinstall stays quiet on fresh `bun add gbrain`); explicit message on --list / --dry-run.

package.json:
- New `postinstall` script: gbrain --version >/dev/null 2>&1 && gbrain apply-migrations --yes --non-interactive 2>/dev/null || true. The --version sanity check guards against a half-written binary (Codex review criticism). || true prevents `bun update gbrain` failure mid-upgrade.

Manual smoke verified: fresh $HOME with no config → apply-migrations --yes silently exits 0; --dry-run prints the one-liner "No brain configured... Nothing to migrate."

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* refactor(commands): extract library-level Core functions that throw not exit

Codex architecture finding #5: reusing CLI entry-point functions as Minions handler bodies is wrong. If a Minion invokes runExtract / runEmbed / runBacklinks / runLint and the handler hits a process.exit(1), the ENTIRE WORKER process dies — killing every other in-flight job. Handlers need library-level APIs that throw, and the CLI stays a thin wrapper that catches + exits.

Per-command shape:
- runXxxCore(opts): throws on validation errors, returns structured result. Handler-safe.
- runXxx(args): arg parser; calls Core; catches; process.exit(1) on thrown errors. CLI-safe.

Shipped:
- runExtractCore({ mode, dir, dryRun?, jsonMode? }) → ExtractResult
- runEmbedCore({ slug? | slugs? | all? | stale? }) → void
- runBacklinksCore({ action, dir, dryRun? }) → BacklinksResult
- runLintCore({ target, fix?, dryRun? }) → LintResult

sync.ts is already correct — performSync throws; runSync wraps. No change. import.ts deferred to v0.12.0 (its one process.exit fires only on a missing dir arg; handlers always pass a dir, so worker-kill risk is zero in practice).
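The Core-vs-wrapper split described in this refactor can be sketched roughly as below. This is illustrative only: the example lint command and its option shape are hypothetical stand-ins mirroring the "Per-command shape" convention, not the real runLintCore.

```typescript
// Sketch of the split: Core throws (handler-safe), the CLI wrapper
// catches + exits (CLI-safe). Hypothetical minimal lint command.
interface LintResult { target: string; fixed: boolean; }

// Handler-safe: throws on validation errors, returns a structured result.
// A Minions worker that claims a badly-formed job loses ONE job, not
// the whole process.
function runLintCore(opts: { target?: string; fix?: boolean }): LintResult {
  if (!opts.target) throw new Error("lint: missing target");
  return { target: opts.target, fixed: Boolean(opts.fix) };
}

// CLI-safe: thin wrapper, the only place process.exit(1) is allowed.
function runLint(args: string[]): void {
  try {
    const result = runLintCore({ target: args[0], fix: args.includes("--fix") });
    console.log(JSON.stringify(result));
  } catch (err) {
    console.error(String(err));
    process.exit(1);
  }
}
```

The design point is that `process.exit` lives only in the wrapper, so no code path a worker can reach is able to kill the worker process.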
Noted in the plan's Out-of-scope.

Smoke verified: all four Core functions throw on invalid mode / missing dir / not-found target instead of exiting the process.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* feat(jobs): Tier 1 handlers + autopilot-cycle (the killer handler)

registerBuiltinHandlers now registers handlers for every operation autopilot needs to dispatch via Minions + the single autopilot-cycle handler the autopilot loop actually submits each interval.

Existing handlers (sync, embed, lint) rewired to call library-level Core functions directly instead of the CLI wrappers. CLI wrappers call process.exit(1) on validation errors; if a worker claimed a badly-formed job, the WORKER PROCESS would die — killing every in-flight job. Cores throw, so one bad job fails one job.

New handlers:
- extract → runExtractCore (mode: links|timeline|all, dir)
- backlinks → runBacklinksCore (action: check|fix, dir)
- autopilot-cycle → THE killer handler. Runs sync → extract → embed → backlinks inline. Each step wrapped in try/catch; returns { partial: true, failed_steps: [...] } when any step fails. Does NOT throw on partial failure — that would trigger Minion retry, and an intermittent extract bug would block every future cycle. Replaces the 4-job parent-child DAG proposed in early plan drafts (Codex H3/H4: parent/child is NOT a depends_on primitive in Minions).

import.ts handler still uses the CLI wrapper (runImport) — import's one process.exit fires only on a missing dir arg and the handler always passes a dir; Core extraction deferred to v0.12.0 when Tier 2 refactors happen.

registerBuiltinHandlers promoted from private to exported for testability.

test/handlers.test.ts: 4 tests. Asserts every expected handler name registers. Asserts autopilot-cycle against a nonexistent repo returns { partial: true, failed_steps: ['sync', 'extract', 'backlinks'] } — does NOT throw.
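The never-throw-on-partial-failure contract can be sketched like this. Assumptions: steps are shown as synchronous functions for brevity (the real handler awaits each Core call), and the result shape is a simplified stand-in.

```typescript
// Sketch of the autopilot-cycle contract: every step runs in its own
// try/catch; failures are reported in the result, never thrown.
// Throwing would trigger Minion retry, and one intermittent bug would
// block every future cycle.
type Step = { name: string; run: () => void };

function autopilotCycle(steps: Step[]) {
  const results: Record<string, "ok" | "failed"> = {};
  const failed_steps: string[] = [];
  for (const step of steps) {
    try {
      step.run();
      results[step.name] = "ok";
    } catch {
      results[step.name] = "failed";
      failed_steps.push(step.name);
    }
  }
  // Partial failure is data, not an exception.
  return { partial: failed_steps.length > 0, failed_steps, steps: results };
}
```

A failed sync does not stop embed from running, matching the "one bad job fails one job" rule above.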
Asserts autopilot-cycle against an empty (but real) git repo returns a result with a steps map, never throws.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* feat(autopilot): Minions dispatch + worker spawn supervisor + async shutdown

Autopilot now dispatches each cycle as a single `autopilot-cycle` Minion job (with idempotency_key on the cycle slot) instead of running steps inline. A forked `gbrain jobs work` child drains the queue durably, supervised by autopilot. The user runs ONE install step (`gbrain autopilot --install`) and gets sync + extract + embed + backlinks + durable job processing, with no separate worker daemon to manage.

Mode selection:
- minion_mode=always OR pain_triggered (default), engine=postgres → Minions dispatch. Spawn child, submit autopilot-cycle each interval.
- minion_mode=off, OR engine=pglite, OR `--inline` flag → run steps inline in-process, same as pre-v0.11.1. PGLite has an exclusive file lock that blocks a second worker process, so the inline path is the only path that works there.

Worker supervision:
- spawn(resolveGbrainCliPath(), ['jobs', 'work'], { stdio: 'inherit' }). stdio:'inherit' avoids pipe-buffer blocking (Codex architecture #2).
- On worker exit: 10s backoff + restart. Crash counter caps at 5 → autopilot stops with a clear error.
- resolveGbrainCliPath() prefers argv[1] (cli.ts / /gbrain), then process.execPath (compiled binary suffix check), then `which gbrain` (installed to $PATH). NEVER blindly uses process.execPath, which on source installs is the Bun runtime, not `gbrain` (Codex architecture #1).

Shutdown:
- Async SIGTERM/SIGINT handler: sends SIGTERM to worker, awaits its exit for up to 35s (the worker's own drain is 30s; we add buffer for signal-delivery latency), then SIGKILL if still alive.
- Drops the old `process.on('exit')` lock-cleanup handler — its callback runs synchronously and can't wait for the worker drain. Lock file cleanup moved inside the async shutdown.
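The resolveGbrainCliPath fallback chain can be sketched as a pure function over injected probes. This is an assumption-laden sketch: the real function reads process.argv, process.execPath, and `which gbrain` directly rather than taking a parameter object.

```typescript
// Sketch of the CLI-path fallback chain (Codex architecture #1).
// Probes are injected so the ordering is testable without a filesystem.
interface ResolveEnv {
  argv1?: string;              // process.argv[1]
  execPath: string;            // process.execPath
  which: () => string | null;  // result of `which gbrain`, null when absent
}

function resolveGbrainCliPath(env: ResolveEnv): string {
  // 1. argv[1] when it is cli.ts or a gbrain binary.
  if (env.argv1 && (env.argv1.endsWith("cli.ts") || env.argv1.endsWith("/gbrain"))) {
    return env.argv1;
  }
  // 2. execPath only when it IS a gbrain binary; on source installs
  //    execPath is the Bun runtime, so it is never used blindly.
  if (env.execPath.endsWith("/gbrain")) return env.execPath;
  // 3. Whatever `which gbrain` finds on $PATH.
  const found = env.which();
  if (found) return found;
  throw new Error("cannot resolve a gbrain CLI path");
}
```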
Lock-file mtime refresh every cycle (Codex C) so a long-lived autopilot doesn't get declared "stale" by the next cron-fired invocation after 10 minutes.

Inline fallback path calls the new Core fns (runExtractCore, runEmbedCore) instead of the CLI wrappers. That way a bad arg from inside the loop can't process.exit() the autopilot itself (matches Codex #5).

test/autopilot-resolve-cli.test.ts: 3 tests covering argv[1]-as-gbrain, argv[1]-as-cli.ts, and graceful error when no path resolves.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* feat(autopilot): env-aware install + OpenClaw bootstrap injection

Expand installDaemon from 2 targets (macOS launchd, Linux crontab) to 4:
- macos → launchd plist (unchanged)
- linux-systemd → ~/.config/systemd/user/gbrain-autopilot.service with Restart=on-failure, RestartSec=30, and an is-system-running probe to confirm the user bus actually works (Codex architecture #7 hardened — the naive /run/systemd/system existence check was a false-positive magnet)
- ephemeral-container → detects RENDER / RAILWAY_ENVIRONMENT / FLY_APP_NAME / /.dockerenv. Crontab is unreliable here (wiped on deploy), so we write ~/.gbrain/start-autopilot.sh and tell the user to source it from their agent's bootstrap
- linux-cron → existing crontab path (unchanged)

detectInstallTarget() + --target flag for explicit override.

Also:
- --inject-bootstrap / --no-inject control OpenClaw ensure-services.sh auto-injection. Default is ON when OpenClaw is detected (OPENCLAW_HOME env var, openclaw.json in CWD or $HOME, or an ensure-services.sh found). Injection adds ONE line with a `# gbrain:autopilot v0.11.0` marker and writes .bak.<ISO-timestamp> before touching the file. Idempotent — the marker check prevents double injection.

uninstallDaemon mirrors all four targets. A user can now run `gbrain autopilot --uninstall` after moving hosts (macOS laptop → Linux server) and the uninstall will find + remove every artifact.
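The four-way target detection might look roughly like this. Probes are injected for testability; the function name mirrors the commit text, but the parameter shape here is an assumption (the real detectInstallTarget runs the systemd is-system-running probe itself).

```typescript
// Sketch of env-aware install-target detection across the 4 targets.
type InstallTarget = "macos" | "linux-systemd" | "ephemeral-container" | "linux-cron";

function detectInstallTarget(probe: {
  platform: string;                          // process.platform
  env: Record<string, string | undefined>;   // process.env
  hasDockerEnvFile: boolean;                 // /.dockerenv exists
  systemdUserBusWorks: boolean;              // is-system-running probe result
}): InstallTarget {
  if (probe.platform === "darwin") return "macos";
  // Ephemeral containers win even when systemd looks present:
  // crontab and units are wiped on every deploy.
  const ephemeral =
    probe.env.RENDER || probe.env.RAILWAY_ENVIRONMENT ||
    probe.env.FLY_APP_NAME || probe.hasDockerEnvFile;
  if (ephemeral) return "ephemeral-container";
  // Probe the user bus, not mere /run/systemd/system existence
  // (the naive check is a false-positive magnet).
  if (probe.systemdUserBusWorks) return "linux-systemd";
  return "linux-cron";
}
```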
writeWrapperScript now uses resolveGbrainCliPath() instead of blindly baking process.execPath into the wrapper script — on source installs that path is the Bun runtime, not gbrain (Codex architecture #1 fix propagated to the install path too).

test/autopilot-install.test.ts: 4 tests covering detectInstallTarget's platform + env-var branches. Deeper E2E coverage (systemd unit file contents, ephemeral start-script contents + exec bit, OpenClaw marker injection + .bak) lives in Task 14's E2E fixture test.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* feat(migrations): v0.11.0 orchestrator — phases A through G, full implementation

Replaces the stub from commit de027ce. The orchestrator runs all seven phases of the v0.11.0 Minions adoption migration idempotently, resumable from any prior status:"partial" run (the stopgap bash script writes those).

Phases:
A. Schema — `gbrain init --migrate-only` (NEVER bare `gbrain init`, which defaults to PGLite and clobbers existing configs — Codex H1 show-stopper).
B. Smoke — `gbrain jobs smoke`. Abort loudly on non-zero.
C. Mode — --mode flag wins. Preserved from prefs on resume. Non-TTY or --yes defaults pain_triggered with explicit print. Interactive: numbered 1/2/3 menu via shared promptLine.
D. Prefs — savePreferences({minion_mode, set_at, set_in_version}).
E. Host — AGENTS.md marker injection + cron manifest rewrites. For cron entries whose skill matches a gbrain builtin (sync/embed/lint/import/extract/backlinks/autopilot-cycle) rewrites kind:agentTurn → kind:shell with a gbrain jobs submit command. PGLite branch keeps --follow (inline execution, the only path that works without a worker daemon); Postgres branch drops --follow + adds --idempotency-key ${handler}:${slot} so long cron jobs don't stack up (same Codex fix as the autopilot-cycle dispatch).
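The Phase E rewrite branch can be sketched like this. It is a sketch under assumptions: the manifest entry shape and slot naming are illustrative, and the authoritative convention lives in skills/conventions/cron-via-minions.md.

```typescript
// Sketch of the builtin cron-entry rewrite: agentTurn → shell, with the
// Postgres vs PGLite branch from Phase E. Shapes are illustrative.
const BUILTIN_HANDLERS = new Set([
  "sync", "embed", "lint", "import", "extract", "backlinks", "autopilot-cycle",
]);

interface CronEntry {
  kind: "agentTurn" | "shell";
  skill: string;
  command?: string;
  _gbrain_migrated_by?: string; // idempotency marker (JSON can't hold comments)
}

function rewriteCronEntry(
  entry: CronEntry,
  engine: "postgres" | "pglite",
  slot: string,
): CronEntry {
  // Non-builtin handlers are never auto-rewritten; they get a TODO JSONL row.
  if (!BUILTIN_HANDLERS.has(entry.skill)) return entry;
  // Marker check makes reruns idempotent.
  if (entry._gbrain_migrated_by) return entry;
  const command = engine === "pglite"
    ? `gbrain jobs submit ${entry.skill} --follow`  // inline: no worker under PGLite
    : `gbrain jobs submit ${entry.skill} --idempotency-key ${entry.skill}:${slot}`;
  return { ...entry, kind: "shell", command, _gbrain_migrated_by: "v0.11.0" };
}
```

The idempotency key on the Postgres branch is what keeps long cron jobs from stacking up: a second fire of the same slot dedupes instead of queueing a duplicate.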
For non-builtin handlers (host-specific, like ea-inbox-sweep, frameio-scan, x-dm-triage) emits a structured TODO row to ~/.gbrain/migrations/pending-host-work.jsonl so the host agent can walk through plugin-contract work per skills/migrations/v0.11.0.md.
F. Install — `gbrain autopilot --install --yes`. Best-effort (failure doesn't abort; user can run manually).
G. Record — append to completed.jsonl. status:"complete" unless pending_host_work > 0, in which case status:"partial" + apply_migrations_pending: true.

Safety guards (Codex code-quality tension #3: strict-skip, no rollback):
- Scope: $HOME/.claude + $HOME/.openclaw only by default. --host-dir must be explicit to include $PWD or any other path.
- Symlink escape: SKIP if the resolved target leaves the scoped root.
- >1 MB files: SKIP with warning.
- Permission denied: SKIP with warning; other files continue.
- Malformed JSON manifest: SKIP with parse error logged; continue.
- mtime re-check right before write: bail the file if changed between read + write; other files continue.
- Every edit writes a .bak.<ISO-timestamp> sibling first (second-precision so two same-day runs don't collide).
- Idempotency: `_gbrain_migrated_by: "v0.11.0"` JSON property marker on each rewritten cron entry (JSON can't have comments — Codex G); AGENTS.md marker `<!-- gbrain:subagent-routing v0.11.0 -->`.
- TODO dedupe: JSONL appends deduped by (handler, manifest_path) so reruns don't grow the file.

Post-run summary: when pending_host_work > 0, prints a one-liner pointing the user at the JSONL path + the v0.11.0 skill file. The skill (Lane C-3 / C-4) is the host-agent instruction manual.

test/migrations-v0_11_0.test.ts: 18 tests covering:
- AGENTS.md injection: happy path, .bak creation, idempotent rerun, --dry-run no-op, symlink-escape SKIP, >1MB SKIP.
- Cron rewrite: builtin handlers rewrite to shell+gbrain jobs submit, non-builtins emit JSONL TODOs without touching the manifest, mixed manifests get both treatments in one pass, idempotent rerun, TODO dedupe, malformed JSON SKIP, no-entries-array SKIP, --dry-run no-op.
- findAgentsMdFiles + findCronManifests: scoped walk to $HOME/.claude + $HOME/.openclaw, --host-dir opt-in for $PWD.
- BUILTIN_HANDLERS frozen at the canonical 7 names.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* feat(skill): port skillify from Wintermute, pair with check-resolvable

Skillify is the "meta skill": turn any raw feature or script into a properly-skilled, tested, resolvable, evaled unit of agent-visible capability. Proven in production on Wintermute; paired with gbrain's existing `check-resolvable` it becomes a user-controllable equivalent of Hermes' auto-skill-creation — you decide when and what, the tooling keeps the checklist honest.

Shipped:
- skills/skillify/SKILL.md — ported from ~/git/wintermute/workspace/skills/skillify/SKILL.md. Genericized:
  * /data/.openclaw/workspace → \${PROJECT_ROOT} (runtime-detected).
  * services/voice-agent/__tests__/ → test/ (detected from repo).
  * Manual `grep skills/... AGENTS.md` replaced with a reference to `gbrain check-resolvable`, which does reachability + MECE + DRY + gap detection properly instead of grep-matching a path string.
- scripts/skillify-check.ts — ported from ~/git/wintermute/workspace/scripts/skillify-check.mjs. Preserves the --recent flag and --json output shape. Detects project root via package.json walkup; detects test dir (test/ → __tests__/ → tests/ → spec/). Runs the 10-item checklist per target and exits non-zero if any required item is missing.
- test/skillify-check.test.ts — 4 CLI tests: happy-path against publish.ts (known-skilled), --json shape + schema, --recent smoke, bogus-target exit code.
- skills/RESOLVER.md — adds the trigger row ("Skillify this", "is this a skill?", "make this proper") → skills/skillify/SKILL.md.
- skills/manifest.json — adds the skillify entry so the conformance test passes.

Why the pair:
* Hermes auto-creates skills in the background. Fine until you don't know what the agent shipped — checklists decay silently.
* gbrain ships the same capability as two user-controlled tools: /skillify builds the checklist, gbrain check-resolvable validates reachability + MECE + DRY across the whole skill tree.
* Human keeps judgment. Tooling keeps the checklist honest.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* docs(v0.11.1): cron-via-minions convention, plugin-handlers guide, minions-fix, skill updates

New reference docs:
- skills/conventions/cron-via-minions.md — the rewrite convention for cron manifests. Shows the Postgres (fire-and-forget + idempotency-key) vs PGLite (--follow inline) branch; explains why builtin-only auto-rewrite is safe + how host-specific handlers get the plugin contract.
- docs/guides/plugin-handlers.md — the plugin contract for host-specific Minion handlers. Code-level registration via import + worker.register(), not a data file (Codex D: handlers.json was an RCE surface). Concrete TypeScript skeleton + handler contract (ctx.data, ctx.signal, ctx.inbox) + full migration flow from TODO JSONL to a rewritten cron entry.
- docs/guides/minions-fix.md — user-facing troubleshooting for half-migrated v0.11.0 installs. Paste-one-liner for the stopgap, gbrain apply-migrations path for v0.11.1+, verification commands, failure-mode recipes.

Rewrites + updates:
- skills/migrations/v0.11.0.md — body restored as the host-agent instruction manual. Audience is the host agent reading ~/.gbrain/migrations/pending-host-work.jsonl after the CLI orchestrator has done the mechanical phases.
Walks each TODO type through the 10-item skillify checklist (plugin contract, ship bootstrap, unit tests, integration tests, LLM evals, resolver trigger, trigger eval, E2E smoke, brain filing, check-resolvable). Reverses the earlier "delete the body" decision (1B) because the body serves a different audience now — host-agent, not CLI documentation.
- skills/cron-scheduler/SKILL.md — Phase 4 ("Register with host scheduler") now references cron-via-minions + plugin-handlers.
- skills/maintain/SKILL.md — new "Fix a half-migrated install" section with the apply-migrations recipe.
- skills/setup/SKILL.md — new Phase C.5 "One-step autopilot + Minions install (v0.11.1+)" explaining the four install targets + the OpenClaw auto-injection default.
- docs/GBRAIN_SKILLPACK.md — Operations section adds the three new guides + the subagent-routing and cron-routing SKILLPACK notes (v0.11.0+).

All 167 related tests (conformance + resolver + skillify-check + v0_11_0 orchestrator) stay green.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* feat(v0.11.1): stopgap script + CLAUDE.md directive + README + CHANGELOG + version bump

scripts/fix-v0.11.0.sh — the paste-command for broken-v0.11.0 installs. Released on the v0.11.1 tag so:

curl -fsSL https://raw.githubusercontent.com/garrytan/gbrain/v0.11.1/scripts/fix-v0.11.0.sh | bash

always works (master branch could be renamed).

8 steps: schema apply, smoke, mode prompt (non-TTY defaults pain_triggered), atomic write of preferences.json (0o600), append completed.jsonl with status:"partial" and apply_migrations_pending:true so the v0.11.1 apply-migrations run resumes correctly (does NOT poison the permanent migration path — Codex H2 avoidance), AGENTS.md + cron/jobs.json detection with guidance printed as text only (never auto-edits from a curl-piped script), and a closing line telling the user to run `gbrain autopilot --install` as the one-stop finisher.
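The partial-record append the stopgap performs can be sketched as below. Field names come from the commit text; the write path and helper name are illustrative assumptions.

```typescript
// Sketch: append a status:"partial" record to completed.jsonl so a
// later `gbrain apply-migrations` run resumes instead of starting over.
import { appendFileSync, mkdirSync, readFileSync } from "node:fs";
import { dirname } from "node:path";

interface MigrationRecord {
  version: string;
  status: "partial" | "complete";
  apply_migrations_pending?: boolean;
  recorded_at: string;
}

function appendPartialRecord(jsonlPath: string, version: string): MigrationRecord {
  const record: MigrationRecord = {
    version,
    status: "partial",
    apply_migrations_pending: true, // tells apply-migrations to pick this up
    recorded_at: new Date().toISOString(),
  };
  mkdirSync(dirname(jsonlPath), { recursive: true });
  // JSONL: one record per line, append-only, never rewritten in place.
  appendFileSync(jsonlPath, JSON.stringify(record) + "\n");
  return record;
}
```

Because the file is append-only JSONL, the later orchestrator run simply adds its own status:"complete" row rather than mutating the stopgap's entry.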
CLAUDE.md — new "Migration is canonical, not advisory" section pinning the design principle. Any host-repo change (AGENTS.md, cron manifests, launchctl units) is GBrain's responsibility via the migration; the exception is host-specific handler registration, which goes via the code-level plugin contract in docs/guides/plugin-handlers.md.

README.md — new sections:
- "v0.11.0 migration didn't fire on your upgrade?" with both repair paths (v0.11.1 binary and pre-v0.11.1 stopgap).
- "Skillify + check-resolvable: user-controllable auto-skill-creation" explaining why the user-controlled pair beats Hermes-style auto generation. Includes the scripts/skillify-check.ts invocation.

CHANGELOG.md — v0.11.1 entry (per CLAUDE.md voice: lead with what the user can now do that they couldn't before; frame as benefits, not files changed). Covers: mega-bug fix + apply-migrations + postinstall + stopgap, autopilot-supervises-worker + single-install-step + env-aware targets, Core fn extraction so handlers don't kill workers, skillify + check-resolvable pair, host-agnostic plugin contract replacing handlers.json (RCE concern), gbrain init --migrate-only, TS migration registry + H8/H9 diff-rule fixes, CLAUDE.md directive. All Codex hard blockers (H1, H3/H4, H5, H6, H7, H8, H9, K) + architecture issues (#1/#2/#4/#5/#7) resolved.

package.json — version bump 0.11.0 → 0.11.1.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* test(e2e): migration-flow E2E against live Postgres + Bun env quirk fix

Ships test/e2e/migration-flow.test.ts — the end-to-end integration test for the v0.11.0 orchestrator. Spins up against a live Postgres (gated on DATABASE_URL per CLAUDE.md lifecycle) and exercises four scenarios:
- Fresh install: schema apply (Phase A via `gbrain init --migrate-only`) → smoke (Phase B) → mode resolution (C) → prefs (D) → host rewrite (E, empty fixture) → record (G).
Asserts preferences.json exists with 0o600, completed.jsonl has a v0.11.0 entry, autopilot install was skipped per --no-autopilot-install.
- Idempotent rerun: second orchestrator invocation on a completed install doesn't blow up; mode stays stable.
- Host rewrite mixed manifest: 4-entry cron/jobs.json with 2 gbrain-builtin handlers (sync, embed) + 2 non-builtin (ea-inbox-sweep, morning-briefing). Asserts builtins rewrite to `gbrain jobs submit` kind:shell, non-builtins are LEFT on kind:agentTurn, and 2 JSONL TODOs are emitted with correct shape. AGENTS.md gets the marker injected. Status is "partial" because pending-host-work > 0.
- Resumable: stopgap writes a partial completed.jsonl row first; orchestrator re-runs successfully against it and appends a new post-orchestrator entry. 1 partial + 1 complete = 2 rows total.

Critical fix surfaced by the E2E: src/commands/migrations/v0_11_0.ts's three execSync calls (gbrain init --migrate-only, gbrain jobs smoke, gbrain autopilot --install) now explicitly pass `env: process.env`. Bun's execSync default does NOT propagate post-start `process.env.PATH` mutations to subprocesses — only the initial PATH snapshot. Without the explicit env, any user-side env tweak (e.g. setting GBRAIN_DATABASE_URL in a script before calling the orchestrator) would be invisible to the orchestrator's subprocesses. This is also the reason the E2E needs a PATH shim installed at module-load time to expose the `gbrain` command.

test/init-migrate-only.test.ts: subprocess env now strips DATABASE_URL and GBRAIN_DATABASE_URL. The "no config" error-path tests need loadConfig() to return null, which it won't if the env-var fallback at src/core/config.ts:30 fires. Before this fix, running the unit tests with DATABASE_URL set (e.g. during an E2E run) caused false failures because `gbrain init --migrate-only` saw the env var and succeeded.

Full test totals with live Postgres: 1265 pass, 0 fail, 3497 expect calls, 67 files, ~95s.
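The Bun env-quirk fix above boils down to one option on each subprocess call. A minimal sketch (the commands here are placeholders; `execSync` and its `env` option are the standard node:child_process API, which Bun also implements):

```typescript
// Sketch of the fix: pass env explicitly so mutations made to
// process.env after startup (PATH shims, GBRAIN_DATABASE_URL set by a
// caller) actually reach the subprocess. Per the commit above, Bun's
// execSync otherwise hands the child the initial env snapshot only.
import { execSync } from "node:child_process";

function runOrchestratorStep(command: string): string {
  return execSync(command, {
    env: process.env,   // the one-line fix
    encoding: "utf8",
    stdio: ["ignore", "pipe", "pipe"],
  });
}
```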
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* chore: bump VERSION file to 0.11.1

Commit 5c4cf1d bumped package.json version to 0.11.1 but missed the root VERSION file. src/version.ts reads from package.json so `gbrain --version` prints 0.11.1 correctly, but any tool or script that reads the VERSION file directly (like /ship's idempotency check) saw the stale 0.11.0.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* feat(v0.11.1): doctor self-heal check + skillpack-check command for cron health reports

Closes the discoverability hole from the v0.11.0 mega-bug: once a user is on v0.11.1 (or later), every `gbrain doctor` invocation immediately surfaces a half-migrated state, and `gbrain skillpack-check` gives host agents (Wintermute's morning-briefing, any OpenClaw cron) a single exit-coded JSON pipe to check from their own skills.

gbrain doctor — two new checks:
1. Filesystem-only (fires on every `doctor` invocation, even --fast): if `~/.gbrain/migrations/completed.jsonl` has any status:"partial" entry with no matching status:"complete" for the same version, print `MINIONS HALF-INSTALLED (partial migration: vX.Y.Z). Run: gbrain apply-migrations --yes`. Typical cause is the stopgap wrote a partial record but nobody ran `apply-migrations` afterward.
2. DB-path: if schema version is v7+ (Minions present) AND `~/.gbrain/preferences.json` is missing, print the same banner. Catches installs that never ran the stopgap or apply-migrations at all — the classic v0.11.0 "upgrade landed, migration never fired" state.

Both checks status:"fail" so doctor exits non-zero when either fires.

Test `test/doctor-minions-check.test.ts` pins the five branches (partial present → FAIL, partial+complete → quiet, no-jsonl → quiet, multiple versions named correctly, human-readable banner contains the exact "MINIONS HALF-INSTALLED" phrase Wintermute's cron can grep for).
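The filesystem-only check reduces to a small scan over completed.jsonl. A sketch (hypothetical helper name; the shipped check also builds the human-readable banner):

```typescript
// Sketch of the half-migration detector: any version that has a
// status:"partial" entry and no matching status:"complete" entry means
// the migration stalled and apply-migrations should be rerun.
interface Entry { version: string; status: "partial" | "complete"; }

function findHalfMigrated(jsonl: string): string[] {
  const entries: Entry[] = jsonl
    .split("\n")
    .filter((l) => l.trim().length > 0)
    .map((l) => JSON.parse(l));
  const complete = new Set(
    entries.filter((e) => e.status === "complete").map((e) => e.version),
  );
  const partial = entries
    .filter((e) => e.status === "partial" && !complete.has(e.version))
    .map((e) => e.version);
  return [...new Set(partial)]; // each stalled version named once
}
```

An empty or missing file yields no findings, which matches the no-jsonl → quiet branch in the test.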
gbrain skillpack-check — new command + skill:
- `src/commands/skillpack-check.ts` wraps `doctor --fast --json` + `apply-migrations --list` into one JSON report with `{healthy, summary, actions[], doctor, migrations}`. Exit 0 on healthy, 1 on action-needed, 2 on determine-failure. `--quiet` flag for cron pipes that want exit-code-only behavior.
- `actions[]` is the remediation list. Doctor messages of the form `... Run: <cmd>` get their command extracted (regex fixed to match the full remainder of the line, not just the first word). Pending or partial migrations push `gbrain apply-migrations --yes` to the front of actions[].
- `gbrainSpawn()` helper resolves the gbrain invocation correctly on compiled binary installs (`argv[1] = /usr/local/bin/gbrain`) AND source installs (`argv[1] = src/cli.ts`, prefix with `bun run`). Same Codex #1 fix pattern as autopilot's resolveGbrainCliPath.
- `skills/skillpack-check/SKILL.md` teaches agents when to run it, what to do with the output, and anti-patterns (don't run without --quiet in a cron that emails; don't ignore exit 2).
- Registered in skills/RESOLVER.md and skills/manifest.json.

Test `test/skillpack-check.test.ts` (5 tests) covers healthy fresh install, half-migrated exit-1 with apply-migrations in actions[], --quiet suppresses stdout in both states, --help prints usage, summary includes top action when multiple are present.

1192 unit tests pass (+15 new). The 38 failing tests are all DATABASE_URL E2Es — same pre-existing pattern, unchanged by this commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* doc(v0.11.1): reframe README + minions-fix — v0.11.0 was never released

v0.11.0 was cut but never released publicly. v0.11.1 is the first public Minions ship, and fixes the upgrade-migration mega-bug so it self-heals on every future `gbrain upgrade` + `bun update gbrain`. The README was wrongly framing the fix as a retrospective for v0.11.0 users — none exist, so remove it.
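The `... Run: <cmd>` extraction described above is essentially one regex per doctor message. A sketch (the shipped regex may differ; the point is capturing the full remainder of the line):

```typescript
// Sketch of remediation-command extraction from doctor messages.
// `(.+)$` captures everything after "Run: "; the bug being fixed was a
// pattern along the lines of `Run: (\S+)` that kept only the first word.
function extractAction(message: string): string | null {
  const m = message.match(/Run: (.+)$/);
  return m ? m[1].trim() : null;
}
```

So the half-installed banner yields the full `gbrain apply-migrations --yes`, not just `gbrain`.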
README changes:
- Delete the "v0.11.0 migration didn't fire on your upgrade?" section. Replace with "Health check and self-heal": the `gbrain doctor`, `gbrain skillpack-check --quiet`, and `gbrain skillpack-check | jq` recipes that ship in v0.11.1. Still links to docs/guides/minions-fix.md for deeper troubleshooting.
- Promote the production benchmark to top billing. The previous section led with the lab benchmark (same LLM, localhost) and buried the production data point as a single follow-up sentence. Real deployment numbers are the stronger signal:
  * 753ms vs >10s gateway timeout (sub-agent couldn't even spawn)
  * $0.00 vs ~$0.03 per run
  * 100% vs 0% success rate under 19-cron production load
  * 36-month tweet backfill: 19,240 tweets, ~15 min, $0.00
  Lab numbers stay (separate table, labeled "controlled environment") so readers can see both layers.
- Add the "The routing rule" closer: Deterministic → Minions, Judgment → Sub-agents. This is the clearest framing in the production benchmark doc and belongs in the README so readers leave with the right mental model. `minion_mode: pain_triggered` automates it.

docs/guides/minions-fix.md rewrite:
- Reframe as: v0.11.0 never released, v0.11.1 is the first ship, `gbrain apply-migrations --yes` is canonical. Stopgap stays documented for pre-v0.11.1 branch builds (e.g. Wintermute's minions-jobs checkout before v0.11.1 tags).
- Add the detection + verification commands (doctor + skillpack-check) at the top.
- Cross-reference skills/skillpack-check/SKILL.md as the agent-facing health-check pattern.

Zero lingering "v0.11.0 released" references in README or minions-fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* fix(doctor): remove "schema v7+ no prefs → FAIL" check (too aggressive)

CI failure in Tier 1 Mechanical E2E: (fail) E2E: Doctor Command > gbrain doctor exits 0 on healthy DB

Root cause: the doctor half-migration detection added two checks.
The second check (`schema v7+ AND ~/.gbrain/preferences.json missing → minions_config FAIL`) was too aggressive. It treated a valid fresh-install state as broken. `gbrain init` against Postgres applies schema v7 but doesn't write preferences.json — that's the migration orchestrator's Phase D, which only runs via `apply-migrations`. Between `init` finishing and the user running `apply-migrations`, the install is legitimately in a "schema-applied, no prefs" state. Doctor was exiting 1 on this valid state, breaking the pre-existing CI test that inits + doctors a healthy DB.

Fix: drop the check. The filesystem check (step 3 — partial-completed without a matching complete) is sufficient signal for genuine half-migration. Added a regression test pinning the exact CI scenario: no completed.jsonl present, no preferences.json, doctor must not fail any minions_* check. Also removes the now-unused `preferencesPaths` import.

Verified against live Postgres: CI-equivalent `gbrain doctor` + `gbrain doctor --json` both pass. Full suite: 1281/1281 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* doc(readme): Minions section — lead with the story, compress the rest

The previous section opened with "six daily pains" as a numbered list before the hook, buried the production numbers halfway down, and had a table explaining how each pain gets fixed. Fine for a spec doc; wrong for a README that needs to land the impact fast.

Rewrite:
- Lead with "your sub-agents won't drop work anymore" — the reason a reader is here.
- Production numbers promoted, framed as a story: "Here's my personal OpenClaw deployment: one Render container, Supabase Postgres holding a 45,000-page brain, 19 cron jobs firing on schedule, the X Enterprise API on the wire..." Gives the reader the setup before the punchline.
- The routing rule (deterministic → Minions, judgment → sub-agents) survives unchanged. It's the clearest framing in the whole section.
- Lose the "how each pain gets fixed" table. Compress the six pains + their fixes into one paragraph that names the primitives by name (max_children, timeout_ms, child_done inbox, cascade cancel, idempotency keys, attachment validation). Readers who want depth click through to skills/minion-orchestrator/SKILL.md.
- Close with "not incrementally better — categorically different" and the three headline numbers.
- Drop the separate Lab Numbers table; the production numbers are stronger and the lab data is one click away via the link.

Lines: 75 → 42. Same signal, less scroll.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* doc: scrub X Enterprise API + @garrytan references from user-facing docs

User feedback: shouldn't name the specific enterprise-tier API product or the account in the README or benchmark docs. Genericize:
- "X Enterprise API on the wire" → drop entirely; the 19-cron load story carries the setup without naming the vendor
- "X Enterprise API ($50K/mo firehose)" → "external API"
- "@garrytan tweets" → "my social posts"
- "Pull ~100 @garrytan tweets" → "Pull ~100 of my social posts"
- "X Enterprise API (full-archive)" env var comment → "external API bearer token"

Scope:
- README.md — the Minions production story line + scaling callout
- docs/benchmarks/2026-04-18-minions-vs-openclaw-production.md
- docs/benchmarks/2026-04-18-tweet-ingestion.md

Plain "X API" references in the tweet-ingestion methodology stay — those describe which public HTTP endpoint was called, not the enterprise-tier product. Benchmark doc filenames (tweet-ingestion.md) stay to preserve inbound links; content is genericized.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* doc(readme): Skillify section — match Minions energy, land the category shift

The previous section was competent but undersold what skillify actually is. Rewrite matches the Minions section's shape: lead with the hook, tell the story, land the punchline.
Key changes:
- Title: "your skills tree stops being a black box." Names the thing skillify actually solves.
- Open with the problem: Hermes auto-creates skills as a background behavior. Six months later you have an opaque pile nobody's read or tested. Make the liability concrete.
- Promote the 10 items by name (SKILL.md + script + unit tests + integration tests + LLM evals + resolver trigger + trigger eval + E2E + brain filing + check-resolvable audit). Showing the list makes the scope of the unlock visible.
- New subsection "Why this is the right answer for OpenClaw" names the debugging-the-black-box pain directly. Skillify makes the tree legible: when something breaks, you know which layer (contract, test, eval, trigger, or route) to inspect. When anything goes stale, check-resolvable flags it.
- Close with "compounding quality instead of compounding entropy" + "not a nice-to-have. It's the piece that makes the skills tree survive six months."
- Expand the code block to include `gbrain check-resolvable` (the other half of the pair) so readers see the whole workflow.

Length goes from 17 to 34 lines — still shorter than Minions, still one section. Worth the space because this is a category shift for how agent skills get built, not a feature.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>
Co-authored-by: root <root@localhost>
garrytan added a commit that referenced this pull request — Apr 20, 2026
…ter boundary

Codex fixes #1, #2, #3 from the plan's outside-voice review. Enforcement shifts from SOFT-VIA-TYPE-COMMENT to SOFT-VIA-SANITIZED-OBJECT. Hard enforcement via process isolation waits for BrainBench v2 Docker sandbox.

**eval/runner/types.ts** additions:
- `PublicPage = Pick<Page, 'slug' | 'type' | 'title' | 'compiled_truth' | 'timeline'>` — the exact 5 fields adapters should see. No _facts. No frontmatter (a known hiding spot for accidental gold leaks).
- `sanitizePage(p: Page): PublicPage` — returns a NEW object with the 5 fields only. Cannot be bypassed by `(page as any)._facts` because the field does not exist on the sanitized object.
- `PublicQuery = Omit<Query, 'gold'>` — strips the gold field.
- `sanitizeQuery(q: Query): PublicQuery` — enumerates public fields explicitly (not spread+delete) so no prototype weirdness leaves gold reachable.

**eval/runner/multi-adapter.ts** — scoreOneRun now calls sanitizePage / sanitizeQuery before passing to adapter.init / adapter.query. The scorer retains the full Query shape (including gold.relevant) for precision / recall computation. Adapter signatures unchanged — the sealing is at the OBJECT level, not the type level. This keeps existing adapters (ripgrep-bm25, vector-only, hybrid-nograph, gbrain-after) binary-compatible. Verified: no existing adapter reads q.gold or page._facts, so the change is safe without further adapter updates.
**test/eval/sealed-qrels.test.ts** (17 tests):
- sanitizePage strips _facts + frontmatter + arbitrary hidden keys
- Output has exactly the 5 public keys (deep introspection)
- Proxy tripwire simulates a malicious adapter: any access to _facts or gold throws `sealed-qrels violation`
- sanitizeQuery retains optional fields (as_of_date, tags, author, acceptable_variants, known_failure_modes) but omits undefined ones
- Honest documentation of the seal's limits: filesystem bypass and Proxy attacks would still work in v1; Docker isolation (v2) is the real enforcement

Every existing eval test still passes (273 before + 17 sealed-qrels = 290). Total eval suite now: 290 pass, 0 fail.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
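The sealed-object pattern described above can be sketched in a few lines. This is a minimal illustration, not the project's actual types — the Page/Query shapes here are simplified stand-ins, and only the five public page fields come from the commit message:

```typescript
// Object-level seal: explicit field enumeration, not spread+delete.
// Page/Query are simplified; _facts and frontmatter model the hidden gold metadata.
interface Page {
  slug: string;
  type: string;
  title: string;
  compiled_truth: string;
  timeline: string[];
  _facts?: unknown;      // hidden gold — must never reach adapters
  frontmatter?: unknown; // known hiding spot for accidental gold leaks
}

type PublicPage = Pick<Page, 'slug' | 'type' | 'title' | 'compiled_truth' | 'timeline'>;

function sanitizePage(p: Page): PublicPage {
  // Returns a NEW object carrying exactly the five public fields.
  // (result as any)._facts is undefined — the key does not exist on this object.
  return {
    slug: p.slug,
    type: p.type,
    title: p.title,
    compiled_truth: p.compiled_truth,
    timeline: p.timeline,
  };
}

interface Query { id: string; text: string; gold?: { relevant: string[] } }
type PublicQuery = Omit<Query, 'gold'>;

function sanitizeQuery(q: Query): PublicQuery {
  // Enumerate public fields explicitly so no prototype weirdness leaves gold reachable.
  return { id: q.id, text: q.text };
}
```

The key point is that the seal lives on the object, not the type: a `Pick<>` alone would still let a cast reach the hidden fields, while a freshly built object simply doesn't carry them.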
garrytan added a commit that referenced this pull request — Apr 22, 2026
… dotfile resolution) (#337)

* feat(v0.17.0 step 1/9): sources primitive — additive-only multi-source foundation

Lane A of the multi-repo plan. Installs the sources table and seeds a 'default' row that inherits sync.repo_path/last_commit from existing config. This is the bisectable foundation every later step builds on; the breaking schema changes (composite UNIQUE, files FK rewrite, resolution_type, ingest_log.source_id) land with their paired code rewrites in Steps 2/4/5/7 so no single commit breaks the engine.
- migration v16 (sources_table_additive) + v0_17_0 orchestrator skeleton
- sort-by-version guard in runMigrations (array insertion order can never cause a later migration to skip a lower one again)
- default source seeded with config '{"federated": true}' so pre-v0.17 brains keep single-namespace search semantics after upgrade
- orchestrator phase B detects absence of file_migration_ledger and no-ops until Step 7 lands it
- 8 new structural tests in test/migrate.test.ts (shape, idempotency, scope-guard that nothing else was smuggled into v16)
- apply-migrations tests include v0.17.0 in the registered list

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* feat(v0.17.0 step 2/9): pages.source_id + composite UNIQUE (Lane B)

Migration v17 adds pages.source_id with DEFAULT 'default' and swaps the global UNIQUE(slug) for composite UNIQUE(source_id, slug). Ships atomically with the engine's ON CONFLICT rewrite so the constraint swap and the code that writes under it land in the same commit — no window where the engine sees one shape and the schema has another. Minimum-surface engine change: only putPage's ON CONFLICT target needs re-targeting. Other slug-based queries work unchanged because single-source brains (the only brain shape pre-Step-5) have exactly one source 'default', so slug remains effectively unique within it. Step 5+ will surface an explicit sourceId param on putPage for cross-source sync.
- migration v17 (pages_source_id_composite_unique) in src/core/migrate.ts
- pages.source_id + composite UNIQUE added to schema.sql + pglite-schema.ts for fresh installs
- ON CONFLICT (slug) → ON CONFLICT (source_id, slug) in both pglite-engine and postgres-engine putPage
- DEFAULT 'default' closes the Codex-flagged race where an INSERT between ADD COLUMN and SET NOT NULL could leave source_id NULL
- 5 new v17 structural tests (29 pass / 0 fail in migrate.test.ts)
- Full suite: 1979 pass / 3 fail (same as baseline — no regressions)

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* feat(v0.17.0 step 6/9): sources CLI + source-resolver (Lane C)

Adds the CLI surface for multi-source management. Users can now register, list, rename, federate/unfederate, and attach-to-directory a source. The source-resolver is the shared 6-priority helper that Steps 4/5 will use when they start surfacing an explicit --source flag on sync/extract/query.

Commands:
  gbrain sources add <id> --path <p> [--name <n>] [--federated|--no-federated]
  gbrain sources list [--json]
  gbrain sources remove <id> [--yes] [--dry-run] [--keep-storage]
  gbrain sources rename <id> <new-name>
  gbrain sources default <id>
  gbrain sources attach <id> — writes .gbrain-source in CWD
  gbrain sources detach
  gbrain sources federate <id> / unfederate <id>

Resolution priority (source-resolver.ts) — highest first:
1. --source flag
2. GBRAIN_SOURCE env
3. .gbrain-source dotfile walk-up
4. longest-prefix match on registered local_path (Codex #2 fix)
5. sources.default config
6. fallback 'default'

- add: validates id format (kebab-case alnum, 1-32), rejects overlapping paths (eng review §4 finding 4.1), supports federated default opt-in
- remove: guards against --yes omission + refuses to remove 'default', supports --dry-run, reports cascade page count
- attach/detach: matches kubectl/terraform context-pinning semantics
- Throws on overlap rather than process.exit() so the CLI error wrapper reports it consistently (also makes unit testing clean)

28 new tests across sources.test.ts (dispatcher + validation + overlap guard) and source-resolver.test.ts (full 6-priority coverage including longest-prefix). Full suite: 2012 pass / 3 fail (pre-existing PGLite infra timeouts).

NOT in scope for Step 6 (deferred):
- import-from-github (SSRF + clone integration)
- prune (retention/TTL, lands v0.18)
- MCP tool-defs regen for source-scoping on read ops (Step 5)

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* docs(v0.17.0 step 8/9): getting-started guide + migration skill + citation rule

Step 8 (Lane F) documents what Steps 1+2+6 have shipped and sets up the agent-facing rules for multi-source. New files:
- skills/migrations/v0.17.0.md — migration skill read by host agents after `gbrain apply-migrations`. Covers the v16+v17 chain, what's in v0.17.0 vs what lands later (v0.17.1 ACL, v0.18 sessions), and the new sources CLI surface. Cites docs/guides/multi-source-brains.md as the recipe.
- docs/guides/multi-source-brains.md — getting-started for end users. Three canonical scenarios (unified wiki+gstack / purpose-separated yc-media+garrys-list / mixed), full resolution priority, federation flag semantics, command reference, and citation format.
- skills/brain-ops/SKILL.md — new "Cross-source citation format" section mandating `[source-id:slug]` when the brain has multiple sources. Matches the contract the /plan-devex-review DX review pinned down (DX Finding 5: surface source_id in every page payload + citation contract).
Key must be sources.id (immutable), never sources.name. No behavior change — this is pure documentation for what already exists in the binary. 144 skills conformance tests still pass.

NOT in this commit (deferred to later steps):
- docs/guides/repo-architecture.md rewrite (lands with the full v0.17.0 PR description + release notes)
- skills/_brain-filing-rules.md "which source to file into" guidance (lands with Step 5 when sync surfaces --source)

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* feat(v0.17.0 step 5/9): sync --source <id> routes through sources table (Lane D)

Adds the --source flag to `gbrain sync`. When set, sync reads local_path + last_commit from the matching sources(id) row instead of the global sync.repo_path / sync.last_commit config keys, and writes last_commit + last_sync_at back to the same row. Backward compat: --source omitted = pre-v0.17 behavior exactly, global config path unchanged.
- SyncOpts.sourceId threaded through performSync + performFullSync
- readSyncAnchor/writeSyncAnchor helpers centralize the sources-vs-config branch so every read/write goes through one decision point. Makes Step 5's later per-source sync-failures tracking a one-file change.
- --source resolved via src/core/source-resolver.ts (Step 6), so any command that shell-exposes resolveSourceId gets env var + dotfile walk-up + longest-prefix for free.
- Error message for missing source local_path is actionable: Source "gstack" has no local_path. Run: gbrain sources add gstack --path <path>
- last_sync_at auto-updates on every last_commit advance so `gbrain sources list` shows real recency.

No regression: 2012 pass / 3 fail (same as baseline).
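The 6-priority resolution order the source-resolver commit describes can be sketched as a pure function. This is an illustration only — the input shape is invented for testability (the real helper reads the env and walks the filesystem for the dotfile), and only the priority ordering itself comes from the commit:

```typescript
// Sketch of the 6-priority source resolution. All inputs are injected here;
// the real src/core/source-resolver.ts reads process.env and the filesystem.
interface ResolverInput {
  flag?: string;                 // 1. --source flag
  env?: string;                  // 2. GBRAIN_SOURCE env
  dotfile?: string;              // 3. nearest .gbrain-source found by walk-up
  cwd?: string;                  // used for 4. longest-prefix matching
  registered?: { id: string; local_path: string }[];
  configDefault?: string;        // 5. sources.default config key
}

function resolveSourceId(i: ResolverInput): string {
  if (i.flag) return i.flag;
  if (i.env) return i.env;
  if (i.dotfile) return i.dotfile;
  if (i.cwd && i.registered) {
    // 4. longest-prefix match on registered local_path
    const hit = i.registered
      .filter((s) => i.cwd!.startsWith(s.local_path))
      .sort((a, b) => b.local_path.length - a.local_path.length)[0];
    if (hit) return hit.id;
  }
  if (i.configDefault) return i.configDefault;
  return 'default'; // 6. fallback
}
```

Longest-prefix (priority 4) matters when one registered checkout nests inside another: the deeper path should win, which is why matches are sorted by path length before taking the first.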
NOT in this commit (deferred per plan):
- Per-source failure tracking (~/.gbrain/sources/<id>/sync-failures.jsonl)
- runImport source-awareness (import.ts path — Step 5 continuation)
- Partial-success semantics when walking N sources — single-source flow today, multi-walk lands when the top-level `gbrain sync` without --source starts iterating all sources.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* feat(v0.17.0 step 4/9): qualified [[source:slug]] + links.resolution_type (Lane B)

Adds source-pinned wikilink syntax and records the resolution kind on each edge so `gbrain extract --refresh-unqualified` (future) can re-resolve bare references when the source topology changes.

Wikilink syntax extension:
  [[concepts/ai]] — unqualified; resolves via local-first fallback
  [[wiki:concepts/ai]] — qualified; target pinned to sources.id='wiki'
  [[gstack:projects/foo|Display]] — qualified + display name

The qualified regex runs first and masks matched spans so the unqualified pass can't double-emit. Source id format enforced to match the sources CLI validation: [a-z0-9](?:[a-z0-9-]{0,30}[a-z0-9])?

Schema:
- migration v18 adds links.resolution_type TEXT with CHECK constraint ('qualified'|'unqualified' or NULL for legacy/manual/frontmatter edges)
- schema.sql + pglite-schema.ts updated for fresh installs

EntityRef type:
- sourceId is OPTIONAL (only set on qualified wikilinks). Markdown [Name](path) and unqualified wikilinks omit it so strict toEqual tests pre-v0.17 keep working (69 existing tests still pass).

Tests:
- 5 new qualified-wikilink extraction tests + 1 migration v18 structural assertion. 75 tests in test/link-extraction.test.ts (up from 69).
- Full suite: 2018 pass / 3 fail (pre-existing PGLite infra timeouts).
NOT in this commit (deferred to Step 3 / Step 5 continuation):
- Writing resolution_type to the DB (addLink / addLinksBatch don't carry the field yet — that's the plumb-through that lands with Step 3 when search/dedup also needs source-aware result keys).
- `gbrain extract --refresh-unqualified` re-resolver.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* feat(v0.17.0 step 3/9): source-aware search dedup composite keys (Lane B)

Search dedup now keys on (source_id, slug) instead of slug alone. Pre-v0.17 would collapse two same-slug pages in different sources into one, destroying cross-source recall. Codex outside-voice review flagged this as regression-critical — this commit ships the fix plus tests that lock the invariant in.

Dedup pipeline (src/core/search/dedup.ts):
- pageKey(r) helper — one canonical composite-key derivation. Falls back to source_id='default' for pre-v0.17 rows so single-source brains behave identically to before.
- Layer 1 (dedupBySource): group-by composite key.
- Layer 4 (capPerPage): count-by composite key.
- guaranteeCompiledTruth: swap scoped to matching (source_id, slug), so wiki:topics/ai can't accidentally pull gstack:topics/ai's compiled_truth chunk.

SearchResult type gains optional source_id — populated by SQL JOINs in both engines, falls through as 'default' for legacy callers.

Engine SQL:
- pglite-engine.ts + postgres-engine.ts: search SELECTs add p.source_id
- rowToSearchResult (utils.ts): maps row.source_id → result.source_id when present. Shape stays backward compatible (field optional).
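The composite-key dedup can be sketched in a few lines. A minimal illustration, assuming a trimmed-down SearchResult shape — the pageKey derivation and the 'default' fallback are the parts taken from the commit, the rest is a stand-in:

```typescript
// Sketch of the composite-key dedup. The real dedup.ts has multiple layers;
// this shows the one canonical key derivation and a Layer-1-style group-by.
interface SearchResult { slug: string; source_id?: string; score: number }

// One canonical composite key; pre-v0.17 rows without source_id fall back to 'default'.
const pageKey = (r: SearchResult): string => `${r.source_id ?? 'default'}:${r.slug}`;

// Same (source_id, slug) collapses to the best-scoring hit; cross-source same-slug survives.
function dedupByPage(results: SearchResult[]): SearchResult[] {
  const best = new Map<string, SearchResult>();
  for (const r of results) {
    const k = pageKey(r);
    const prev = best.get(k);
    if (!prev || r.score > prev.score) best.set(k, r);
  }
  return [...best.values()];
}
```

Keying on the slug alone would merge wiki:topics/ai and gstack:topics/ai into one result — exactly the cross-source recall loss the commit guards against.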
Tests — 4 new in test/dedup.test.ts:
- same-slug-different-source does NOT collapse (the critical regression guard Codex called out)
- same-slug-same-source DOES still collapse (no over-correction)
- missing source_id falls back to 'default' for pre-v0.17 compat
- compiled_truth guarantee scopes to composite key (Codex second pass caught this specific path would leak otherwise)

Full suite: 2022 pass / 3 fail (3 pre-existing PGLite infra timeouts).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* feat(v0.17.0 step 7/9): file_migration_ledger + phase-B storage backfill (Lane E)

Adds files.source_id + files.page_id + the file_migration_ledger state machine that drives storage object rewrites. Each per-file transition is its own transaction so crash-point recovery is a ledger read, not a filesystem inspection. Codex second-pass review flagged that "skip if already has source prefix" was an unsafe heuristic — the ledger replaces it with explicit state tracking.

Schema:
- migration v19 (files_source_id_page_id_ledger): handler-only (PGLite has no files table; Postgres-only gate). ADDs source_id + page_id to files, backfills page_id from page_slug scoped to source_id='default', creates file_migration_ledger with PK on file_id (Codex: not storage_path_old — two sources can share an old path during migration).
- schema.sql updated for fresh Postgres installs; file_migration_ledger gets RLS alongside other tables.

Runtime:
- src/commands/migrations/v0_17_0-storage-backfill.ts: drives the ledger state machine pending → copy_done → db_updated → complete. Idempotent per row: re-running resumes from whichever state crashed. Old objects preserved (no delete) so operators can verify the soak window before a future cleanup release.
- phase B in v0_17_0.ts orchestrator: wires the storage backend (Supabase/S3/local) through createStorage, runs runStorageBackfill, reports per-state counts + first-three error details.
Tests — 13 new in test/storage-backfill.test.ts:
- pending → copy_done → db_updated → complete happy path
- 3 crash-point recovery tests (resume from copy_done, resume from db_updated, failed rows don't auto-retry)
- already-complete rows are skipped with zero side effects
- idempotent re-upload (exists-check skips redundant upload)
- dry-run mode (no storage, reports counts without mutating)

Plus 5 new migrate.test.ts assertions for v19 structure (handler-only, PGLite gate, source_id + page_id + ledger DDL, default-source backfill scope, state machine values).

Full suite: 2035 pass / 3 fail (3 pre-existing PGLite infra timeouts).

NOT in this commit (explicitly deferred):
- DROP old page_slug column — kept for backward compat until operators have time to verify page_id everywhere.
- DROP old UNIQUE(storage_path) in favor of UNIQUE(source_id, storage_path) — same reason, deferred to later cleanup.
- Actual cleanup phase that deletes old objects post-soak.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* test(v0.17.0 step 9/9): full multi-source PGLite integration suite (Lane G)

End-to-end exercise of every v0.17.0 surface against real PGLite (in-memory, fast — no DATABASE_URL needed). The migration chain v2→v19 runs start-to-finish and the test asserts each Step's invariants hold together.

16 new integration tests across 7 describes:
1. Migration-installed state:
   - sources('default') exists with federated=true config
   - pages.source_id column has DEFAULT 'default'
   - composite UNIQUE (source_id, slug) is installed
2. Default-source write path:
   - putPage without explicit source → source_id='default' via schema default clause (no engine API change needed for single-source brains)
3. Composite UNIQUE regression guards (Codex-flagged):
   - Same slug in two different sources coexists
   - Third insert with same (source_id, slug) hits the UNIQUE constraint
4. sources CLI round-trip:
   - federate / unfederate flips config.federated
   - rename changes display, id stays immutable
5. Source resolution priority (integration):
   - Explicit flag > env var > fallback to default
   - Unregistered explicit source errors with actionable message
6. Cascade semantics:
   - sources remove cascades to pages; default source untouched
7. links.resolution_type (Step 4):
   - Qualified/unqualified values accepted
   - CHECK constraint rejects invalid values

All 16 tests pass. Full suite: 2042 pass / 4 fail (4 pre-existing PGLite beforeEach timeouts in test/wait-for-completion, test/extract-fs, test/e2e/search-quality, test/e2e/graph-quality — count fluctuated 3-5 on baseline from variance alone).

Total new tests across Steps 1-9: ~85 unit + integration tests (sources, source-resolver, migrate v16/v17/v18/v19 structural, link-extraction qualified wikilinks, dedup regression-critical, storage-backfill state machine + crash recovery, full multi-source PGLite integration).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* chore: bump to v0.18.0 + CHANGELOG entry (multi-source brains)

One-viewport release summary + itemized changes covering all 9 steps of the multi-source primitive. Notes the v0.17 → v0.18 version bump rationale (master shipped gbrain dream as v0.17 while this branch was in flight).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* fix(ci): v0_18_0 orchestrator TS narrow + mechanical test ON CONFLICT

Two CI failures on PR #337:
1. tsc TS2367 at src/commands/migrations/v0_18_0.ts:190 — after the early-return on `a.status === 'failed'` (line 179), TypeScript narrows `a.status` to `'skipped' | 'complete'`, so the subsequent `a.status === 'failed' ? 'failed' :` branch was dead code and refused to compile. Dropped the redundant check.
2. E2E `file_list LIMIT enforcement` at test/e2e/mechanical.test.ts:636 — the test pre-seeded a pages row with `ON CONFLICT (slug) DO NOTHING` but v21 swapped the global UNIQUE for `UNIQUE (source_id, slug)`, so Postgres rejects with "no unique or exclusion constraint matching". Updated the conflict target to the composite key.

Tier-1 E2E had only this one failing test; everything else passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* test(e2e): v0.18.0 multi-source against real Postgres (v20-v23 schema + cascade + sync)

Closes the three biggest confidence gaps the author flagged in the self-audit of PR #337:
1. No real Postgres E2E — PGLite has no files table, so v23's files.source_id + files.page_id rewrite + file_migration_ledger seed was NEVER executed against the real DB. This file covers it.
2. `gbrain sync --source <id>` had zero direct tests. Now has two: one that asserts performSync({sourceId}) reads local_path from the sources row (not the global config), one that asserts no-sourceId falls back to the global sync.repo_path.
3. Cascade delete coverage — previously verified only pages count after source removal. Now verifies pages + content_chunks + timeline_entries + links + files ALL cascade-delete when a source is removed.

6 describes, 16 tests total:
- Schema shape (fresh install): 6 tests confirming sources('default'), pages.source_id NOT NULL with DEFAULT, composite UNIQUE pages (source_id, slug) replaces global UNIQUE(slug), links.resolution_type column + CHECK, files.source_id + page_id columns, file_migration_ledger table + status CHECK.
- Composite UNIQUE semantics: 3 tests confirming same-slug in two sources coexists (Codex-critical regression guard), duplicate (source_id, slug) hits the UNIQUE, putPage targets default source by schema DEFAULT.
- Cascade delete: 1 test building a fully populated source (2 pages, chunks, timeline, links, files) then removing it + asserting every dependent row is gone.
- Sync routing: 2 tests confirming performSync({sourceId}) reads per-source local_path vs global config.
- Sources surface: 3 tests for federate/unfederate flipping + rename preserving id.
- Storage backfill: 1 end-to-end test seeding ledger + running runStorageBackfill against a stub StorageBackend, asserting pending → complete transition and files.storage_path rewrite.

Gated by DATABASE_URL per CLAUDE.md E2E lifecycle. Each describe's beforeAll defensively DELETEs non-default sources + file_migration_ledger rows so reruns are hermetic (sources isn't in helpers.ALL_TABLES). Verified: 16/16 pass on first run AND second run (residual-state fix holds). Full E2E suite still green.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* fix(ci): TS2352 in multi-source E2E — cast postgres.js RowList via unknown

tsc rejects the direct `(rows as { column_name: string }[]).map(...)` cast because postgres.js RowList rows have an iterable-row shape that doesn't overlap with the plain-object target. Standard fix: cast via `unknown` first so the narrowing is explicit. Verified: `bunx tsc --noEmit` clean (ignoring the pre-existing baseUrl deprecation warning).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* fix(v0.18.0): addLinksBatch + addTimelineEntriesBatch source-aware JOINs

Batch APIs JOINed on pages.slug globally, so two pages sharing the same slug across sources would silently fan out — addLinksBatch(['a->b']) in a brain with 'a' in both 'default' and 'alt' wrote 2 edges instead of 1. Same bug on addTimelineEntriesBatch.

Fix:
- LinkBatchInput + TimelineBatchInput gain optional source_id fields (from_source_id, to_source_id, origin_source_id for links; source_id for timeline). All default to 'default' so existing callers are backward-compatible on single-source brains.
- pglite-engine + postgres-engine batch JOINs now composite-key on (slug, source_id). Postgres adds 3 more unnest arrays for links + 1 for timeline — still one bind per column, no 65535-param cap risk.
- LEFT JOIN for origin pages also source-qualified so frontmatter-provenance edges don't cross-pollinate across sources.

Regression coverage:
- test/pglite-engine.test.ts: 5 new tests covering default-path isolation, explicit alt-source writes, and cross-source edges.
- test/e2e/multi-source.test.ts: 4 new tests against real Postgres so postgres-js's unnest() bind path is exercised (structurally different from PGLite's).

Gap #4 from the PR self-audit — latent bug, not previously reachable because every existing caller wrote to the default source only.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
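The fan-out bug and its composite-key fix are easy to see in miniature. A sketch under invented names — the real code does this join in SQL with unnest arrays; here an in-memory map plays the role of the pages table, and the optional source_id fields with their 'default' fallback are the parts taken from the commit:

```typescript
// Sketch of the fan-out fix: batch link inputs resolve pages by (slug, source_id),
// never by slug alone. PageRow and resolveLinks are illustrative stand-ins.
interface PageRow { id: number; slug: string; source_id: string }
interface LinkBatchInput {
  from_slug: string;
  to_slug: string;
  from_source_id?: string; // omitted → 'default' (backward compat)
  to_source_id?: string;
}

function resolveLinks(inputs: LinkBatchInput[], pages: PageRow[]): Array<[number, number]> {
  // Composite-key lookup: a slug-only map here would reproduce the fan-out bug.
  const byKey = new Map(pages.map((p) => [`${p.source_id}:${p.slug}`, p.id]));
  const edges: Array<[number, number]> = [];
  for (const l of inputs) {
    const from = byKey.get(`${l.from_source_id ?? 'default'}:${l.from_slug}`);
    const to = byKey.get(`${l.to_source_id ?? 'default'}:${l.to_slug}`);
    if (from !== undefined && to !== undefined) edges.push([from, to]); // exactly one edge
  }
  return edges;
}
```

With 'a' present in both 'default' and 'alt', a slug-only join would have matched both rows and written two edges; the composite key pins the input to exactly one page per side.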
garrytan added a commit that referenced this pull request — Apr 24, 2026
Replaces Wintermute's short-lived repos abstraction with the v0.18.0 sources subsystem. Codex flagged this during plan review: v0.18.0's sources table had already shipped the right shape (per-source last_commit, federated search config, RLS-friendly) while Wintermute coded against a ~/.gbrain/config.json repos array. Two systems solving one problem. Keep the surface, swap the backend:

- src/cli.ts: `gbrain repos` routes through runSources with a one-line deprecation nudge on stderr. Scripts like `gbrain repos list` and `gbrain repos add .` keep working against the sources table. Removed the pre-engine-connect branch and added a case inside the handleCliOnly switch so repos gets the DB connection it now needs.
- src/cli.ts help text: new SOURCES section replaces MULTI-REPO. References the canonical `sources` commands with `repos` tagged DEPRECATED.

sync --all — was iterating ~/.gbrain/config.json repos; now iterates sources rows with local_path IS NOT NULL:
- Reads id, name, local_path, config jsonb via executeRaw.
- Honors config.syncEnabled=false (matching Wintermute's opt-out).
- Honors config.strategy for per-source markdown/code/auto filtering.
- Passes sourceId through to performSync so last_commit tracking lands on the right sources row (was clobbering a global bookmark before).

Deletions:
- src/core/multi-repo.ts deleted (120 lines of config CRUD now handled by sources table + RLS).
- src/commands/repos.ts deleted (121 lines of CLI parsing now handled by src/commands/sources.ts).
- test/multi-repo.test.ts deleted (25 tests against the deleted module; the schema-backed behavior is covered by test/sources.test.ts from v0.18.0 + test/repos-alias.test.ts added here).
- src/core/config.ts: removed the `repos` field from GBrainConfig. Legacy installs with `repos` in ~/.gbrain/config.json will see that key ignored; no migration written because zero users are on that path (Wintermute's commit never shipped on master).

Tests:
- test/repos-alias.test.ts — round-trips add/list/remove through runSources to verify the alias path works. Also asserts the deleted module is actually gone (catches accidental resurrection during rebase conflicts).
- All 162 prior unit tests + 2 new = 164 pass on PGLite.

Codex's P0 #2 (per-repo sync state) and P0 #3 (slug collision) are both resolved here — sources.last_commit scopes bookmarks per source, and pages.slug uniqueness is (source_id, slug), which is what the v0.18.0 schema already shipped.
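The sync --all walk described above — iterate sources rows, skip rows without a checkout, honor the per-source opt-out, and route the sourceId through so the bookmark lands on the right row — can be sketched as follows. The row shape mirrors the commit (id, local_path, config jsonb); the function name and performSync signature are invented for illustration:

```typescript
// Sketch of the sync --all iteration over sources rows.
// performSync stands in for the real per-source sync entry point.
interface SourceRow {
  id: string;
  local_path: string | null;
  config: { syncEnabled?: boolean; strategy?: 'markdown' | 'code' | 'auto' };
}

function syncAll(
  rows: SourceRow[],
  performSync: (opts: { sourceId: string; strategy: string }) => void,
): string[] {
  const synced: string[] = [];
  for (const s of rows) {
    if (!s.local_path) continue;                  // only rows with a local checkout
    if (s.config.syncEnabled === false) continue; // per-source opt-out
    performSync({ sourceId: s.id, strategy: s.config.strategy ?? 'auto' });
    synced.push(s.id); // last_commit tracking lands on this row, not a global bookmark
  }
  return synced;
}
```

Passing sourceId per iteration is the whole fix: the old code advanced one global bookmark while walking N repos, so every repo after the first saw the wrong last_commit.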
garrytan added a commit that referenced this pull request — Apr 24, 2026
… exit

Lane A of PR #364 review fixes (20-item multi-lane plan). Addresses the codex-tier + CEO + Eng findings on src/core/minions/supervisor.ts:

Safety + correctness:
- Atomic O_CREAT|O_EXCL PID lock via openSync('wx') with stale-file liveness check. Prevents two supervisors racing on the same PID file. (codex #1)
- Health check now queries status='active' AND lock_until < now() matching queue.ts:848's authoritative stalled definition. The prior `status = 'stalled'` predicate returned zero rows forever because 'stalled' is not a persisted value in the schema. (codex #2)
- All health queries scoped to WHERE queue = $1 via opts.queue binding. Multi-queue installs no longer see cross-queue false positives. (codex #3)
- Class default allowShellJobs flipped true→false AND explicit `delete env.GBRAIN_ALLOW_SHELL_JOBS` when false, so child workers don't silently inherit the var from the parent shell. (eng #8, codex #9)
- Unified shutdown(reason, exitCode) — max-crashes now routes through the same drain path as SIGTERM. Single source of truth for lifecycle cleanup; prerequisite for trustworthy audit events (Lane C). (eng #1)
- Default PID path moves from /tmp to ~/.gbrain/supervisor.pid with mkdirSync recursive + GBRAIN_SUPERVISOR_PID_FILE env override. Matches the rest of the product's ~/.gbrain/ convention; fresh installs no longer hit ENOENT. (CEO #2 + codex #6)

Refinements:
- crashCount = 1 after 5-min stable-run reset (was 0, produced calculateBackoffMs(-1) = 500ms by accident). Now reads as 'first crash of a new cycle' with a clean 1s backoff. (Nit 1)
- Top-of-file POSTGRES-ONLY docstring documenting why the supervisor can't run against PGLite. (Nit 2)
- inBackoff flag suppresses 'worker not alive' warn during the expected null-child window (crash → sleep → next spawn). (eng #2)
- Tracked listener refs for SIGTERM/SIGINT removed in shutdown() so integration tests spinning up/tearing down multiple supervisors on one process don't leak handlers. (eng #3)
- Single FILTER query replaces two SELECT counts — one round-trip instead of two, three metrics in one pass. (eng #10)
- child.on('error') listener emits worker_spawn_failed event for ENOENT/EACCES; exit handler still increments crashCount as usual so max-crashes bounds permanent misconfigurations. (codex #7)
- healthInFlight boolean guard with try/finally prevents overlapping health checks from stacking on a hung DB. (codex #8)

Documented exit codes (ExitCodes const):
  0 CLEAN, 1 MAX_CRASHES, 2 LOCK_HELD, 3 PID_UNWRITABLE
Agent can branch on exit=2 ('another supervisor, I'm fine') vs exit=1 ('escalate to human').

Event emitter surface:
- started / worker_spawned / worker_exited / worker_spawn_failed
- backoff / health_warn / health_error / max_crashes_exceeded
- shutting_down / stopped
Plumbed through emit() with an onEvent callback hook for Lane C's audit writer. json:false is the default; Lane C's --json mode flips it and writes JSONL to stderr.

CLI changes (src/commands/jobs.ts):
- `gbrain jobs supervisor` gains --allow-shell-jobs (explicit opt-in mirroring the env-var gate), --cli-path (override auto-resolution for exotic setups), and --json (JSONL lifecycle events on stderr).
- Expanded --help body with description, 3 examples, and exit-code table. (DX Fix A per review)
- Three-tier PID path resolution: --pid-file > GBRAIN_SUPERVISOR_PID_FILE > ~/.gbrain/supervisor.pid (via exported DEFAULT_PID_FILE).
- Removed the catch-fallback to process.argv[1] — resolveGbrainCliPath() throws its own actionable install-hint error, which is what dev users need instead of a cryptic spawn failure on a .ts path. (codex #5)

Tests: existing 7 supervisor.test.ts cases continue to pass. Integration tests (crash-restart, max-crashes, SIGTERM-during-backoff, env-inheritance regression) land in Lane E.

Out of scope for this lane (tracked in follow-up lanes):
- Audit file writer at ~/.gbrain/audit/supervisor-YYYY-Www.jsonl (Lane C)
- Documentation pass (Lane B)
- supervisor start/status/stop subcommands (Lane C)
- gbrain doctor supervisor check (Lane D)
- /ship release hygiene (Lane F)
- autopilot.ts migration to MinionSupervisor (deferred to follow-up PR per codex — requires non-blocking start() API redesign, not ~30 lines)

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
garrytan added a commit that referenced this pull request — Apr 24, 2026
* fix(link-extraction): v0.10.5 drives works_at + advises accuracy on rich prose
Extends inferLinkType patterns to cover rich-prose phrasings that miss with
v0.10.4 regexes. Targets the residuals called out in TODOS.md: works_at at
58% type accuracy, advises at 41%.
WORKS_AT_RE additions:
- Rank-prefixed: "senior engineer at", "staff engineer at", "principal/lead"
- Discipline-prefixed: "backend/frontend/full-stack/ML/data/security engineer at"
- Possessive time: "his/her/their/my time at"
- Leadership beyond "leads engineering": "heads up X at", "manages engineering at",
"runs product at", "leads the [team] at"
- Role nouns: "role at", "position at", "tenure as", "stint as"
- Promotion patterns: "promoted to staff/senior/principal at"
ADVISES_RE additions:
- Advisory capacity: "in an advisory capacity", "advisory engagement/partnership/contract"
- "as an advisor": "joined as an advisor", "serves as technical advisor"
- Prefixed advisor nouns: "strategic/technical/security/product/industry advisor to|at"
- Consulting: "consults for", "consulting role at|with"
New EMPLOYEE_ROLE_RE page-level prior: fires when the page describes the subject
as an employee (senior/staff/principal engineer, director, VP, CTO/CEO/CFO) at
some company. Biases outbound company refs toward works_at when per-edge context
is possessive or narrative without an explicit work verb. Scoped to person -> company
links only. Precedence: investor > advisor > employee (investors often hold board
seats, which would otherwise mis-classify as advises/works_at).
ADVISOR_ROLE_RE broadened from "full-time/professional/advises multiple" to catch
any page that self-identifies the subject as an advisor ("is an advisor",
"serves as advisor", possessive "her advisory work/role/engagement").
Tests: 65 pass (16 new v0.10.5 coverage tests + 4 regression guards against
v0.10.4 tightenings). Templated benchmark still 88.9% type_accuracy (10/10 on
works_at and advises). Rich-prose measurement requires the multi-axis report
upgrade (next commit) to validate retroactively.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
* feat(eval): type-accuracy runner on rich-prose corpus + wire into all.ts
New Category 2 in BrainBench: per-link-type accuracy measured directly on the
240-page rich-prose world-v1 corpus. Distinct from Cat 1's retrieval metrics,
this measures whether inferLinkType() correctly classifies extracted edges
when the prose varies (the 58% works_at and 41% advises residuals that v0.10.5
regexes targeted).
How it works:
1. Loads all pages from eval/data/world-v1/
2. Derives GOLD expected edges from each page's _facts metadata
(founders → founded, investors → invested_in, advisors → advises,
employees → works_at, attendees → attended, primary_affiliation +
role drives person-page outbound type)
3. Runs extractPageLinks() on each page → INFERRED edges
4. Per (from, to) pair, compares inferred type vs gold type
5. Emits per-link-type table: correct / mistyped / missed / spurious +
type accuracy + recall + precision + strict F1 (triple match)
6. Full confusion matrix rows=gold, cols=inferred
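Steps 4-5 above can be sketched as a per-(from, to)-pair comparison; the types and field names below are illustrative, not the actual runner code:

```typescript
interface Edge { from: string; to: string; type: string; }

// Compare inferred edges against gold per (from, to) pair, emitting the
// correct / mistyped / missed / spurious counts the table reports.
function scoreTypes(gold: Edge[], inferred: Edge[]) {
  const inferredBy = new Map(inferred.map(e => [`${e.from}→${e.to}`, e.type]));
  let correct = 0, mistyped = 0, missed = 0;
  for (const g of gold) {
    const t = inferredBy.get(`${g.from}→${g.to}`);
    if (t === undefined) missed++;
    else if (t === g.type) correct++;
    else mistyped++;
  }
  // Spurious = inferred pairs with no gold counterpart.
  const goldKeys = new Set(gold.map(e => `${e.from}→${e.to}`));
  const spurious = inferred.filter(e => !goldKeys.has(`${e.from}→${e.to}`)).length;
  // Type accuracy is conditional on the edge being found at all.
  const typeAccuracy = correct + mistyped > 0 ? correct / (correct + mistyped) : 0;
  return { correct, mistyped, missed, spurious, typeAccuracy };
}
```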
v0.10.5 validation on 240-page corpus (up from pre-v0.10.5 baselines):
- works_at: 58% → 100.0% (+42 pts) — 10/10 correct, 0 mistyped
- advises: 41% → 88.2% (+47 pts) — 15/17 correct
- attended: — → 100.0% (131/134 recall)
- founded: 100% → 100.0% (40/40)
- invested_in: 89% → 92.0% (69/75)
- Overall: 88.5% → 95.7% type accuracy (conditional on edge found)
Strict F1 overall: 53.7%. Lower because the _facts-based gold set only
captures core relationships; rich prose extracts many peripheral mentions
(190 spurious "mentions" edges) that aren't bugs but are correctly-typed
prose references without a _facts counterpart. Spurious counts are signal
for future type-precision tuning, not failure.
Wired into eval/runner/all.ts as Cat 2 so every full benchmark run includes
the rich-prose type accuracy table alongside retrieval metrics.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
* feat(eval): Phase 2 adapter interface + EXT-1 ripgrep+BM25 baseline
Phase 2 credibility unlock: BrainBench now compares gbrain to external
baselines on the same corpus and queries. Transforms the benchmark from
internal ablation ("gbrain-graph beats gbrain-grep") to category comparison
("gbrain-graph beats classic BM25 by 32 pts P@5"). This is the #1 fix
from the 4-review arc — addresses Codex's core critique that v1's
before/after was self-referential.
Added:
eval/runner/types.ts — Adapter interface (v1.1 spec)
eval/runner/adapters/ripgrep-bm25.ts — EXT-1 classic IR baseline
eval/runner/adapters/ripgrep-bm25.test.ts — 11 unit tests, all pass
eval/runner/multi-adapter.ts — side-by-side scorer
Adapter interface (eng pass 2 spec):
- Thin 3-method Strategy: init(rawPages, config), query(q, state), snapshot(state)
- BrainState is opaque to runner (never inspected)
- Raw pages passed in-memory; gold/ never crosses adapter boundary
(structural ingestion-boundary enforcement)
- PoisonDisposition enum reserved for future poison-resistance scoring
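A minimal sketch of that 3-method Strategy contract, assuming simplified page/query/result shapes (the real definitions in eval/runner/types.ts may differ):

```typescript
interface RawPage { slug: string; content: string; }
interface Query { id: string; text: string; }
interface RankedDoc { slug: string; score: number; }

interface Adapter<State> {
  name: string;
  // Raw pages arrive in-memory; gold/ never crosses this boundary.
  init(rawPages: RawPage[], config: Record<string, unknown>): Promise<State>;
  // State is opaque to the runner — it is never inspected, only passed back.
  query(q: Query, state: State): Promise<RankedDoc[]>;
  snapshot(state: State): unknown;
}

// Trivial conforming adapter, included only to illustrate the contract.
const nullAdapter: Adapter<RawPage[]> = {
  name: "null",
  async init(rawPages) { return rawPages; },
  async query(_q, state) { return state.map(p => ({ slug: p.slug, score: 0 })); },
  snapshot(state) { return { pages: state.length }; },
};
```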
EXT-1 ripgrep+BM25:
- Classic Lucene-variant IDF + k1/b tuned at standard 1.5/0.75
- Title tokens double-weighted for entity-page slug-match bias
- Stopword filter, alphanumeric tokenization, stable lexicographic tie-break
- Pure in-memory inverted index — no external deps, ~100 LOC core
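The scoring core can be sketched as below; the k1=1.5 / b=0.75 constants and the non-negative Lucene-variant IDF match the description above, while the index plumbing and tokenizer are simplified for illustration:

```typescript
const K1 = 1.5, B = 0.75;

// BM25 score of one tokenized document against a tokenized query.
function bm25Score(
  queryTerms: string[],
  doc: string[],                  // tokenized document
  docFreq: Map<string, number>,   // term -> number of docs containing it
  nDocs: number,
  avgDocLen: number,
): number {
  const tf = new Map<string, number>();
  for (const t of doc) tf.set(t, (tf.get(t) ?? 0) + 1);
  let score = 0;
  for (const q of queryTerms) {
    const f = tf.get(q) ?? 0;
    if (f === 0) continue;
    const df = docFreq.get(q) ?? 0;
    // Lucene-variant IDF: log(1 + ...) keeps it non-negative.
    const idf = Math.log(1 + (nDocs - df + 0.5) / (df + 0.5));
    score += idf * (f * (K1 + 1)) / (f + K1 * (1 - B + B * doc.length / avgDocLen));
  }
  return score;
}
```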
First side-by-side results on 240-page rich-prose corpus, 145 relational queries:
| Adapter | P@5 | R@5 | Correct top-5 |
|---------------|--------|--------|---------------|
| gbrain-after | 49.1% | 97.9% | 248/261 |
| ripgrep-bm25 | 17.1% | 62.4% | 124/261 |
| Delta | +32.0 | +35.5 | +124 |
gbrain-after is the hybrid graph+grep config from PR #188. Ripgrep+BM25 is
a genuinely strong classic-IR baseline (BM25 is what Lucene/Elasticsearch
ship). gbrain's ~+32-point lead on relational queries reflects real work
by the knowledge graph layer: typed links + traversePaths surface the
correct answers in top-K that BM25 only pulls in via partial-text overlap.
Next in Phase 2: EXT-2 vector-only RAG + EXT-3 hybrid-without-graph
adapters. Both plug into the same Adapter interface.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
* feat(eval): Phase 2 EXT-2 vector-only RAG adapter
Second external baseline for BrainBench. Pure cosine-similarity ranking
using the SAME text-embedding-3-large model gbrain uses internally —
apples-to-apples on the embedding layer so any gbrain lead reflects the
graph + hybrid fusion, not a better embedder.
Files:
eval/runner/adapters/vector-only.ts ~130 LOC
eval/runner/adapters/vector-only.test.ts 6 unit tests (cosine math)
Design:
- One vector per page (title + compiled_truth + timeline, capped 8K chars).
- No chunking (intentional; chunked vector RAG would be EXT-2b later).
- No keyword fallback (that's EXT-3 hybrid-without-graph).
- Embeddings in batches of 50 via existing src/core/embedding.ts (retry+backoff).
- Cost on 240 pages: ~$0.02/run.
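A sketch of the EXT-2 core: cosine ranking plus the one-vector-per-page text rule. The compiled_truth/timeline field names and the 8K cap follow the commit; everything else is illustrative:

```typescript
// Cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  const denom = Math.sqrt(na) * Math.sqrt(nb);
  return denom === 0 ? 0 : dot / denom;
}

// One vector per page: title + compiled_truth + timeline, capped at 8K chars.
function buildPageText(p: { title: string; compiled_truth?: string; timeline?: string }): string {
  return [p.title, p.compiled_truth ?? "", p.timeline ?? ""].join("\n").slice(0, 8000);
}
```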
Three-adapter side-by-side on 240-page rich-prose corpus, 145 relational queries:
| Adapter | P@5 | R@5 | Correct top-5 |
|---------------|--------|--------|---------------|
| gbrain-after | 49.1% | 97.9% | 248/261 |
| ripgrep-bm25 | 17.1% | 62.4% | 124/261 |
| vector-only | 10.8% | 40.7% | 78/261 |
Interesting finding: vector-only scores WORSE than BM25 on relational queries
like "Who invested in X?" — exact entity match matters more than semantic
similarity for these templates. BM25 nails the entity-name term; vector-only
returns topically-similar-but-not-mentioning pages. This is the known failure
mode of pure-vector RAG on precise relational/identity queries. Real-world
vector RAG systems always add keyword fallback; EXT-3 (hybrid-without-graph)
will be that fairer comparator.
gbrain's lead widens in vector-only comparison: +38.4 pts P@5, +57.2 pts R@5.
The graph layer is doing the heavy lifting for relational traversal; pure
vector RAG can't express "traverse 'attended' edges from this meeting page."
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
* feat(eval): Phase 2 EXT-3 hybrid-without-graph adapter — graph isolated
Third and closest-to-gbrain external baseline. Runs gbrain's full hybrid
search (vector + keyword + RRF fusion + dedup) WITHOUT the knowledge-graph
layer. Same engine, same embedder, same chunking, same hybrid fusion —
only traversePaths + typed-link extraction turned off.
This is the decisive comparator for "does the knowledge graph do useful
work?" Same everything-else, only graph differs. Any lead gbrain-after has
over EXT-3 is 100% attributable to the graph layer.
Files:
eval/runner/adapters/hybrid-nograph.ts — ~110 LOC
Implementation:
- New PGLiteEngine per run; auto_link set to 'false' (belt).
- importFromContent() used instead of bare putPage() so chunks +
embeddings get populated (hybridSearch needs them).
- NO runExtract() call — typed links/timeline stay empty (suspenders).
- hybridSearch(engine, q.text) answers every query. Aggregate chunks
to page-level by best chunk score.
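The chunk-to-page aggregation rule above can be sketched as (names hypothetical):

```typescript
interface ChunkHit { pageSlug: string; score: number; }

// A page's score is its best chunk score; pages are then ranked by it.
function aggregateToPages(hits: ChunkHit[]): { slug: string; score: number }[] {
  const best = new Map<string, number>();
  for (const h of hits) {
    best.set(h.pageSlug, Math.max(best.get(h.pageSlug) ?? -Infinity, h.score));
  }
  return [...best.entries()]
    .map(([slug, score]) => ({ slug, score }))
    .sort((x, y) => y.score - x.score);
}
```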
FOUR-adapter side-by-side on 240-page rich-prose corpus, 145 relational queries:
| Adapter | P@5 | R@5 | Correct/Gold |
|-----------------|--------|--------|--------------|
| gbrain-after | 49.1% | 97.9% | 248/261 |
| hybrid-nograph | 17.8% | 65.1% | 129/261 |
| ripgrep-bm25 | 17.1% | 62.4% | 124/261 |
| vector-only | 10.8% | 40.7% | 78/261 |
The headline delta nobody can hand-wave away:
gbrain-after → hybrid-nograph = +31.4 P@5, +32.9 R@5
hybrid-nograph → ripgrep-bm25 = +0.7 P@5, +2.7 R@5
Hybrid search (vector+keyword+RRF) over pure BM25 gains ~1 point. The
knowledge graph layer over hybrid gains ~31 points. The graph is doing
the work; adding it to a retrieval stack is what actually moves the needle
on relational queries. The vector/keyword/BM25 debate is a footnote.
Timing: hybrid-nograph init is ~2 min (embeds 240 pages once); query loop
is fast. gbrain-after is ~1.5s total because traversePaths doesn't need
embeddings. Runs at ~$0.02 Opus-equivalent in embedding cost.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
* feat(eval): Phase 2 query validator + Tier 5 Fuzzy + Tier 5.5 synthetic + N=5 tolerance bands
Closes multiple Phase 2 items in one commit since they form a cohesive
package: query schema enforcement + new query tiers + per-query-set
statistical rigor.
Added:
eval/runner/queries/validator.ts — hand-rolled Query schema validator
eval/runner/queries/validator.test.ts — 24 unit tests, all pass
eval/runner/queries/tier5-fuzzy.ts — 30 hand-authored Tier 5 Fuzzy/Vibe queries
eval/runner/queries/tier5_5-synthetic.ts — 50 SYNTHETIC-labeled outsider-style queries (author: "synthetic-outsider-v1")
eval/runner/queries/index.ts — aggregator + validateAll()
Modified:
eval/runner/multi-adapter.ts — N=5 runs per adapter (BRAINBENCH_N override), page-order shuffle, mean±stddev reporting
Query validator (hand-rolled, no zod dep to match gbrain codebase style):
- Temporal verb regex enforces as_of_date (per eng pass 2 spec):
/\b(is|was|were|current|now|at the time|during|as of|when did)\b/i
- Validates tier enum, expected_output_type enum, gold shape per type
- gold.relevant must be non-empty slug[] for cited-source-pages queries
- abstention requires gold.expected_abstention === true
- externally-authored tier requires author field
- batch validation catches duplicate IDs
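The temporal-verb rule can be sketched as a single check; the regex is the one quoted above, the wrapper around it is illustrative:

```typescript
const TEMPORAL_RE = /\b(is|was|were|current|now|at the time|during|as of|when did)\b/i;

// Queries whose text matches the temporal-verb regex must carry as_of_date.
function validateTemporal(q: { text: string; as_of_date?: string }): string[] {
  const errors: string[] = [];
  if (TEMPORAL_RE.test(q.text) && !q.as_of_date) {
    errors.push("temporal query missing as_of_date");
  }
  return errors;
}
```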
Tier 5 Fuzzy/Vibe (30 queries, hand-authored):
- Vague recall: "Someone who was a senior engineer at a biotech company..."
- Trait-based: "The engineer who pushed back on microservices"
- Cultural/epithet: "Who is known as a 'systems builder' in security?"
- Abstention bait: "Which Layer 1 project did the crypto guy leave?" (prose
mentions but never names; good systems abstain)
- Addresses Codex's circularity critique — vague queries where graph-heavy
systems shouldn't inherently win.
Tier 5.5 Synthetic Outsider (50 queries, AI-authored placeholder):
- Clearly labeled author: "synthetic-outsider-v1"
- Phrasing variety not in the 4 template families:
* fragment style ("crypto founder Goldman Sachs background")
* polite/natural ("Can you pull up what we have on...")
* comparison ("What is the difference between X and Y?")
* follow-up ("And who else advises Orbit Labs?")
* typos/misspellings ("adam lopez bioinformatcis")
* similarity ("Find me someone like Alice Davis...")
* imperative ("Pull up Alice Davis")
- Real Tier 5.5 from outside researchers supersedes synthetic via
PRs to eval/external-authors/ (docs ship in follow-up commit).
N=5 tolerance bands:
- Default N=5, override via BRAINBENCH_N env var (e.g. BRAINBENCH_N=1 for dev loops)
- Per-run seeded Fisher-Yates shuffle of page ingest order (LCG seed = run_idx+1)
- Surfaces order-dependent adapter bugs (tie-break-by-first-seen etc.)
- Reports mean ± sample-stddev per metric
- "stddev = 0" is honest signal that the adapter is deterministic, not a bug.
LLM-judge metrics (future) will naturally produce non-zero stddev.
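The seeded shuffle can be sketched as an LCG driving a Fisher-Yates pass; the LCG constants here are the Numerical-Recipes ones, which may differ from what multi-adapter.ts actually uses:

```typescript
// Small LCG returning a uniform-ish value in [0, 1) per call.
function makeLcg(seed: number): () => number {
  let s = seed >>> 0;
  return () => {
    s = (Math.imul(s, 1664525) + 1013904223) >>> 0;
    return s / 2 ** 32;
  };
}

// Fisher-Yates shuffle, deterministic under a fixed seed (seed = run_idx + 1
// in the runner). Returns a new array; the input is left untouched.
function seededShuffle<T>(items: T[], seed: number): T[] {
  const out = items.slice();
  const rand = makeLcg(seed);
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}
```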
Validation: all 80 Tier 5 + 5.5 queries pass validateAll(). 24 validator
unit tests pass.
Next commit: world.html contributor explorer (Phase 3).
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
* feat(eval): Phase 3 world.html explorer + eval:* CLI surface
Contributor DX magical moment. Static HTML explorer renders the full
canonical world (240 entities) as an explorable tree, opens in any browser,
zero install. Every string HTML-entity-encoded (XSS-safe — direct vuln
class per eng pass 2, confidence 9/10).
Added:
eval/generators/world-html.ts — renderer (~240 LOC; single-file
HTML with inline CSS + minimal JS)
eval/generators/world-html.test.ts — 16 tests (XSS + rendering correctness)
eval/cli/world-view.ts — render + open in default browser
eval/cli/query-validate.ts — CLI wrapper for queries/validator
eval/cli/query-new.ts — scaffold a query template
Modified:
package.json — 7 new eval:* scripts
.gitignore — ignore generated world.html
package.json scripts shipped:
bun run test:eval all eval unit tests (57 pass)
bun run eval:run full 4-adapter N=5 side-by-side
bun run eval:run:dev N=1 fast dev iteration
bun run eval:world:view render world.html + open in browser
bun run eval:world:render render only (CI-friendly, --no-open)
bun run eval:query:validate validate built-in T5+T5.5 (or a file path)
bun run eval:query:new scaffold a new Query JSON template
bun run eval:type-accuracy per-link-type accuracy report
XSS safety:
escapeHtml() encodes the 5 critical chars (& < > " '). Tested directly
with representative Opus-generated attacks:
<img src=x onerror=alert('xss')> → &lt;img src=x onerror=alert(&#39;xss&#39;)&gt;
<script>fetch('/steal')</script> → &lt;script&gt;fetch(&#39;/steal&#39;)&lt;/script&gt;
Ledger metadata (generated_at, model) also escaped — covers the less
obvious attack surface where Opus could emit tag-like content into the
metadata file.
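A minimal escapeHtml matching the 5-char behavior described above (a sketch; the shipped helper may differ in detail):

```typescript
// Encode the 5 critical HTML chars. Ampersand must be replaced first so
// the entities produced by the later replacements aren't double-escaped.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}
```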
world.html structure:
- Left rail: entities grouped by type with counts (companies, people,
meetings, concepts), alphabetical within type
- Right pane: per-entity cards with title + slug + compiled_truth +
timeline + canonical _facts as collapsed JSON
- URL fragment deep-links (#people/alice-chen)
- Sticky rail on desktop; responsive stack on mobile
- Vanilla JS for active-link highlighting on scroll (no framework)
Generated file: ~1MB for 240 entities (full prose). Gitignored; rebuild
with `bun run eval:world:view`. Regeneration is ~50ms.
Contributor TTHW (Tier 5.5 query authoring):
1. bun run eval:world:view # see entities
2. bun run eval:query:new --tier externally-authored --author "@me"
3. edit template with real slug + query text
4. bun run eval:query:validate path/to/file.json
5. submit PR
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
* docs(eval): Phase 3 contributor docs + CI workflow for eval/ tests
Ships the contributor-onboarding surface promised in the plan. With this
commit, external researchers have a self-serve path from clone to PR in
under 5 minutes.
Added:
eval/README.md — 5-minute quickstart,
directory map, methodology
one-pager, adapter scorecard
eval/CONTRIBUTING.md — three contributor paths:
1. Write Tier 5.5 queries
2. Submit an external adapter
3. Reproduce a scorecard
eval/RUNBOOK.md — operational troubleshooting:
generation failures, runner
failures, query validation,
world.html rendering, CI
eval/CREDITS.md — contributor attribution
(synthetic-outsider-v1 labeled
as placeholder; real submissions
land here)
.github/PULL_REQUEST_TEMPLATE/tier5-queries.md — structured PR template
for Tier 5.5 submissions
.github/workflows/eval-tests.yml — CI: validates queries,
runs all eval unit tests,
renders world.html on every PR
touching eval/** or
src/core/link-extraction.ts
CI scope (intentionally narrow):
- Triggers on paths: eval/**, src/core/link-extraction.ts, src/core/search/**
- Runs: bun run eval:query:validate (80 queries), test:eval (57 tests),
eval:world:render (smoke-test the HTML renderer)
- Pinned actions by commit SHA (matches existing .github/workflows/test.yml)
- Zero API calls — all Opus/OpenAI paths stubbed or skipped in unit tests
- Fast: ~30s total wall clock
Contributor TTHW (clone → first merged PR):
- Path 1 (Tier 5.5 queries): ~5 min
- Path 2 (external adapter): ~30 min for a simple adapter
- Path 3 (reproduce scorecard): ~15 min wall clock (N=5 run)
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
* fix(eval): teardown PGLite engines so bun run eval:run exits 0
The multi-adapter runner left PGLite engines alive after each run.
GbrainAfterAdapter and HybridNoGraphAdapter both instantiate a
PGLiteEngine in init() but never disconnect it; Bun's shutdown path
exits with code 99 when embedded-Postgres workers outlive main().
Added optional `teardown?(state)` to the Adapter interface, implemented
it on both engine-backed adapters, and call it from scoreOneRun after
the N=5 loop. ripgrep-bm25 and vector-only hold no DB resources and
don't need a teardown.
Verified: gbrain-after, hybrid-nograph, ripgrep-bm25, vector-only all
exit 0 at N=1. Full test:eval passes (57 tests). No metric change.
* docs(bench): 2026-04-19 multi-adapter scorecard
Reproducibility run of the 4-adapter side-by-side at commit b81373d
(branch garrytan/gbrain-evals). N=5, 240-page corpus, 145 relational
queries from world-v1.
Headline: gbrain-after 49.1% P@5 / 97.9% R@5. hybrid-nograph 17.8% /
65.1%. ripgrep-bm25 17.1% / 62.4%. vector-only 10.8% / 40.7%. All
adapters deterministic (stddev = 0 across the 5 runs per adapter).
Matches the scorecard in eval/README.md byte-for-byte for the three
deterministic adapters; hybrid-nograph matches within tolerance bands.
* docs(bench): 2026-04-19 gbrain v0.11.1 vs v0.12.1 regression comparison
Runs the same eval harness against two gbrain src/ trees on the same
240-page corpus and 145 queries. Patches the v0.11 copy's gbrain-after
adapter to use getLinks/getBacklinks (v0.11 has no traversePaths)
with identical direction+linkType semantics.
gbrain-after P@5 22.1% -> 49.1% (+27 pts); R@5 54.6% -> 97.9% (+43
pts); correct-in-top-5 99 -> 248 (+149). hybrid-nograph flat at 17.8%
/ 65.1% on both (v0.12 didn't touch hybridSearch / chunking).
Driver is extraction quality, not graph presence: v0.12 emits 499
typed links (v0.11: 136, x3.7) and 2,208 timeline entries (v0.11: 27,
x82) on the same 240 pages. Sharpens the April-18 "graph layer does
the work" claim -- on v0.11 that architecture only beat hybrid-nograph
by 4.3 points; the 31-point lead in the multi-adapter scorecard comes
from graph + high-quality extract in combination.
* feat(eval): BrainBench v1 portable JSON schemas + gold templates
Adds the v1→v2 contract boundary for BrainBench. 6 JSON schemas at
eval/schemas/ pin the shape of every artifact a stack must emit to be
scorable: corpus-manifest, public-probe (PublicQuery with gold stripped),
tool-schema (12 read + 3 dry_run tools, 32K tool-output cap), transcript,
scorecard (N ∈ {1, 5, 10}), evidence-contract (structured judge input).
8 gold file templates at eval/data/gold/ scaffold the sealed qrels,
contradictions, poison items, and citation labels. Empty-but-valid
skeletons; Day 3b fills them with real content once the amara-life-v1
corpus generates.
48 tests validate schema syntax, $schema/$id/title/type headers,
round-trip stability, and cross-schema coherence (new Page types in
manifest enum, tool counts, token cap, N enum).
When v2 ports to Python + Inspect AI + Docker, these schemas are the
boundary. Same fixtures, same tool contracts, zero rework.
* feat(eval): amara-life-v1 skeleton + Page.type enum for email/slack/cal/note
Deterministic procedural generator for the twin-amara-lite fictional-life
corpus (BrainBench v1 Cat 5/8/9/11 target). 15 contacts picked from
world-v1, 50 emails + 300 Slack messages across 4 channels + 20 calendar
events + 8 meeting transcripts + 40 first-person notes. Mulberry32 PRNG
gives byte-identical output under reseed.
Plants 10 contradictions + 5 stale facts + 5 poison items + 3 implicit
preferences at deterministic positions. Fixture_ids are unique across the
corpus so gold/contradictions.json + gold/poison.json +
gold/implicit-preferences.json can cross-reference by stable ID.
PageType extended in both src/core/types.ts and eval/runner/types.ts to
include email | slack | calendar-event | note (+ meeting on the production
side). src/core/markdown.ts inferType() heuristics updated for the new
one-slash slug prefixes (emails/em-NNNN, slack/sl-NNNN, cal/evt-NNNN,
notes/YYYY-MM-DD-topic, meeting/mtg-NNNN).
17 tests cover counts (50/300/20/8/40), perturbation counts (exact
10/5/5/3), seed determinism + divergence, slug regex conformance (matches
eval/runner/queries/validator.ts:131 one-slash rule), unique fixture_ids,
amara-in-every-email invariant, calendar dtstart < dtend, and Amara-is-
attendee on every meeting.
* feat(eval): amara-life-gen.ts with structured cache key + $20 cost gate
Opus prose expansion of the amara-life-v1 skeleton. Per-item structured
cache key = sha256({schema_version, template_id, template_hash, model_id,
model_params, seed, item_spec_hash}). Prompt-template tweak changes
template_hash; only those items regenerate. Schema bump changes
schema_version; everything invalidates cleanly. Interrupted runs resume
from the last cached item; zero re-spend.
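The cache key can be sketched as canonical JSON over the listed fields followed by sha256; the sorted-key canonicalization is what keeps the key stable under object-key reorder. Field names follow the commit, the helpers themselves are illustrative:

```typescript
import { createHash } from "node:crypto";

interface CacheKeyFields {
  schema_version: number;
  template_id: string;
  template_hash: string;
  model_id: string;
  model_params: Record<string, unknown>;
  seed: number;
  item_spec_hash: string;
}

// JSON with object keys sorted at every level, so logically-equal inputs
// serialize identically regardless of insertion order.
function canonicalJson(v: unknown): string {
  if (Array.isArray(v)) return "[" + v.map(canonicalJson).join(",") + "]";
  if (v !== null && typeof v === "object") {
    const o = v as Record<string, unknown>;
    return "{" + Object.keys(o).sort()
      .map(k => JSON.stringify(k) + ":" + canonicalJson(o[k])).join(",") + "}";
  }
  return JSON.stringify(v);
}

function cacheKey(fields: CacheKeyFields): string {
  return createHash("sha256").update(canonicalJson(fields)).digest("hex");
}
```

Changing any single field (a template tweak bumps template_hash, a schema bump changes schema_version) yields a different key, which is exactly the invalidation behavior the commit describes.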
Cost-gated at $20 hard-stop with Anthropic input/output pricing tracking.
Dry-run mode (--dry-run) executes the full pipeline with stub bodies for
smoke-testing the I/O layout without LLM spend. --max N caps items per
type for debugging. --force ignores cache.
Writes per-format outputs under eval/data/amara-life-v1/:
inbox/emails.jsonl (one email per line with body_text appended)
slack/messages.jsonl (one message per line with text appended)
calendar.ics (RFC-5545 VEVENT format, templated — no LLM)
meetings/<id>.md (transcript with YAML frontmatter)
notes/<YYYY-MM-DD-topic>.md (first-person journal)
docs/*.md (6 reference docs, templated — no LLM)
corpus-manifest.json (per eval/schemas/corpus-manifest.schema.json,
including per-item content_sha256 and generator_cache_key)
Perturbation hints (contradiction, stale-fact, poison,
implicit-preference) flow through the prompt so Opus weaves the specific claim
into each item's body. Poison items are hand-crafted to include
paraphrased prompt-injection attempts (not literal 'IGNORE ALL
PREVIOUS' — defense is the structured-evidence judge contract at
Day 5, not regex redaction).
New package.json scripts:
eval:generate-amara-life # real run (~$12 Opus estimated)
eval:generate-amara-life:dry # smoke test, zero spend
test:eval extended to include test/eval/. 10 cache-key tests cover
determinism, invalidation across every field of the key, canonical JSON
stability under object-key reorder, and per-skeleton-item spec-hash
uniqueness (50 distinct hashes for 50 distinct emails).
* chore: bump version and changelog (v0.15.0)
Resets package.json from stale 0.13.1 to 0.15.0 (matches VERSION).
v0.14.0 shipped with the stale package.json version; this sync catches
that up and moves to v0.15.0 in one step.
Co-Authored-By: Claude Opus 4.6 <[email protected]>
* docs: update CLAUDE.md + README + eval/README for v0.15.0 BrainBench
CLAUDE.md: adds a full BrainBench section to the Key Files list — 14 new
entries covering eval/README.md, multi-adapter.ts, types.ts (with new
PublicPage/PublicQuery), adapters/, queries/, type-accuracy.ts,
adversarial.ts, all.ts, world.ts/gen.ts, world-html.ts, amara-life.ts,
amara-life-gen.ts, schemas/, data/world-v1/, data/gold/,
data/amara-life-v1/, docs/benchmarks/, and test/eval/. Adds 3 new
test/eval/ lines to the unit-tests catalog.
eval/README.md: file tree updated to reflect v0.15 additions —
data/amara-life-v1/, data/gold/, schemas/, generators/amara-life.ts +
amara-life-gen.ts, runner/all.ts + adversarial.ts.
README.md: updates hero benchmark numbers (L7 intro + L353 mid-page)
from v0.10.5 PR #188 numbers (R@5 83→95, P@5 39→45) to current v0.12.1
4-adapter numbers (P@5 49.1% · R@5 97.9% · +31.4 pts vs hybrid-nograph).
Adds the v0.11→v0.12 regression comparison as the secondary reference.
Deeper-section tables (L422+) labeled "BrainBench v1 (PR #188)" are
preserved as historical data.
CHANGELOG is untouched — /ship already wrote the v0.15.0 entry.
TODOS.md is untouched — Cat 5/6/8/9/11 remain open (only foundations
shipped in v0.15.0; Cat runners ship in v1 Complete follow-ups).
Co-Authored-By: Claude Opus 4.6 <[email protected]>
* feat(eval): Day 4 — pdf-parse + flight-recorder + tool-bridge (dry_run + expand:false)
Three infrastructure modules for BrainBench v1 Complete Cats 5/8/9/11.
**eval/runner/loaders/pdf.ts** — Thin pdf-parse wrapper. Lazy import keeps
pdf-parse out of the module-load path (avoids library debug-mode side
effects). Size cap (50MB default), encryption detection, structured error
classes (PdfEncryptedError, PdfTooLargeError, PdfParseError). Only Cat 11
multimodal will import this; production bundle never sees pdf-parse.
**eval/runner/tool-bridge.ts** — Maps 12 read-only operations from
src/core/operations.ts to Anthropic tool definitions + adds 3 dry_run write
tools. Three structural invariants enforced:
1. No hidden LLM calls. `operations.query` defaults expand=true which
routes through expansion.ts → Haiku. Bridge strips `expand` from the
query tool's input schema AND executor hard-sets expand:false. Zero
nested Haiku calls in any agent trace.
2. Mutating ops throw ForbiddenOpError. put_page, add_link, delete_page,
etc. are rejected by name. Agents record intent via dry_run_put_page /
dry_run_add_link / dry_run_add_timeline_entry which persist to the
flight-recorder without mutating the engine. This is how Cat 8's
back_link_compliance + citation_format metrics measure anything with
a read-only tool surface.
3. Poison tagged by the bridge, not the judge. Every tool result is
scanned for slugs matching gold/poison.json fixtures. Matched
fixture_ids flow into tool_call_summary.saw_poison_items for the
structured-evidence judge contract. Judge never reads raw tool
output — Section-3 defense against paraphrased prompt injections
(poison payloads never reach the judge model at all).
32K-token cap (~128K chars) with "…[truncated]" suffix.
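Two of the bridge behaviors above can be sketched directly (names hypothetical):

```typescript
const TOOL_OUTPUT_CHAR_CAP = 128_000; // ~32K tokens

// Invariant 1: strip whatever `expand` the agent sent and hard-set it to
// false, so the query tool can never route through the Haiku expansion path.
function sanitizeQueryInput(input: Record<string, unknown>): Record<string, unknown> {
  const { expand: _ignored, ...rest } = input;
  return { ...rest, expand: false };
}

// Output cap: truncate oversized tool results with an explicit marker.
function capToolOutput(text: string): string {
  if (text.length <= TOOL_OUTPUT_CHAR_CAP) return text;
  return text.slice(0, TOOL_OUTPUT_CHAR_CAP) + "…[truncated]";
}
```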
**eval/runner/recorder.ts** — Per-run flight-recorder bundle emitter. Full
6-artifact bundle (transcript.md, brain-export.json, entity-graph.json,
citations.json, scorecard.json, judge-notes.md) when the adapter provides
an AdapterExport; 3-artifact fallback (transcript + scorecard +
judge-notes) otherwise. Atomic writes via tmp+rename. Collision-safe:
duplicate directory names get incremental -2, -3 suffix. `safeStringify`
handles circular references without throwing and JSON-serializes
Float32Array embeddings.
**package.json:** adds [email protected] as a devDependency. Scoped to eval/
use only; production gbrain binary unaffected.
**Tests:** 63 new — 30 tool-bridge, 21 recorder, 12 pdf-loader. All pass.
Fake engine uses a Proxy with `__default__` fallback so poison-matching
tests don't have to mock the exact engine method name that each operation
calls (some route via searchKeyword, others via getPage — proxy handles
both uniformly).
Total eval suite now: 132 pass, 0 fail, 923 expect() calls.
Co-Authored-By: Claude Opus 4.6 <[email protected]>
* feat(eval): Day 5 — agent adapter + judge with structured evidence contract
Two modules that together wire Cat 8 / Cat 9 / Cat 5 end-to-end scoring.
**eval/runner/judge.ts** — Haiku 4.5 via tool-use `score_answer`. Input is
the structured JudgeEvidence contract (fix #16 from the plan's codex
review): probe + final_answer_text + evidence_refs + tool_call_summary +
ground_truth_pages + rubric. Raw tool output NEVER reaches the judge —
that's the Section-3 defense against paraphrased prompt-injection payloads
in gold/poison.json.
Retry policy: one retry on malformed tool_use response. If the second
attempt is still malformed, score the probe as `judge_failed` (all scores
0, verdict=fail) so the run still completes.
Aggregation: weighted mean across rubric criteria. Canonical thresholds
(pass ≥3.5, partial 2.5-3.5, fail <2.5) — judge can propose a verdict but
the computed verdict from the weighted mean is what the scorecard records.
This prevents the model from inflating or deflating its own verdict.
Score values are clamped to 0-5 on parse even if the model returns out of
range. `assertNoRawToolOutput(evidence)` is a regression guard that
returns the list of forbidden fields (tool_result, raw_transcript, etc.)
if any leak into the evidence contract.
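The verdict computation described above, as a sketch (shapes illustrative):

```typescript
interface CriterionScore { score: number; weight: number; }

// Scores are clamped to 0-5 even if the model returns out of range.
function clamp05(x: number): number {
  return Math.min(5, Math.max(0, x));
}

// Weighted mean across rubric criteria, then the canonical thresholds
// (pass >= 3.5, partial >= 2.5, fail < 2.5). The computed verdict is what
// the scorecard records, regardless of what the judge model proposes.
function computeVerdict(scores: CriterionScore[]): { mean: number; verdict: "pass" | "partial" | "fail" } {
  const totalWeight = scores.reduce((s, c) => s + c.weight, 0);
  const mean = totalWeight === 0
    ? 0
    : scores.reduce((s, c) => s + clamp05(c.score) * c.weight, 0) / totalWeight;
  const verdict = mean >= 3.5 ? "pass" : mean >= 2.5 ? "partial" : "fail";
  return { mean, verdict };
}
```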
**eval/runner/adapters/claude-sonnet-with-tools.ts** — The agent adapter.
Implements `Adapter` interface minimally: `init()` spins up PGLite and
seeds it, `query()` throws because the adapter is Cat 8/9-only and emits
a final-answer text, not a RankedDoc[]. Retrieval scorecard stays at 4
adapters.
`runAgentLoop(probeId, text, state, config)` drives the multi-turn loop:
Sonnet → tool_use → tool-bridge.executeTool → tool_result → back to
Sonnet. Turn cap 10. max_tokens 1024. System prompt (brain-first iron
law, citation format, amara context) is cached via cache_control.
Exponential backoff on rate-limit errors (1s, 2s, 4s).
Emits a `Transcript` per eval/schemas/transcript.schema.json — consumed
directly by recorder.ts for the flight-recorder bundle.
`brain_first_ordering` classifies Cat 8's flagship metric: did the agent
call search/get_page BEFORE producing the final answer? The `no_brain_calls`
case (agent answers from general knowledge without ever hitting the brain)
is the compliance failure to surface.
ForbiddenOpError + UnknownToolError from the bridge are caught in the
agent loop and surfaced as tool_result with is_error=true — keeps the
loop going and preserves full audit trail for the judge.
**Tests (35 new):** judge (23) — happy path, retry, fallback, evidence
contract sanitization, rendered prompt does not contain raw tool_result
text, verdict thresholds, score clamping, weighted mean with mixed
weights, parseToolUse rejects malformed input. agent-adapter (12) —
Adapter.query() throws, init() seeds PGLite, end-to-end tool loop with
stubbed Sonnet, turn cap exhaustion, mutating-op rejection surfaces as
tool_result error, extractSlugs regex.
All 12 agent tests take ~23s because PGLite runs 13 schema migrations per
test; the alternative of shared-engine-across-tests was rejected so each
test is isolated.
Total eval suite now: 167 pass, 0 fail.
Co-Authored-By: Claude Opus 4.6 <[email protected]>
* feat(eval): Day 6 — adversarial-injections + Cat 6 prose-scale + Cat 11 multi-modal
Three modules that together cover BrainBench v1 Cat 6 (prose-scale
extraction fidelity) and Cat 11 (multi-modal ingest fidelity).
**eval/runner/adversarial-injections.ts** — 6 deterministic content
transforms shared by Cat 10 (adversarial.ts, 22 hand-crafted cases) and
Cat 6 (prose-scale variants). Each injection produces a modified content
string + a structured GoldDelta describing what the extractor MUST and
MUST NOT produce. Kinds:
- code_fence_leak — fake [X](people/fake) inside ``` fence, must NOT extract
- inline_code_slug — `people/fake` in backticks, must NOT extract
- substring_collision — "SamAI" near real `people/sam`, exactly one link
- ambiguous_role — "works with" vs "works at", downgrade type to mentions
- prose_only_mention — strip markdown link syntax, bare name → mentions only
- multi_entity_sentence — pack 4+ entities into one clause, extract all
Mulberry32 PRNG keeps variant generation deterministic under fixed seed.
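For reference, mulberry32 is a small, widely circulated public-domain PRNG; this is its standard reference implementation, which is presumably what keeps variant generation byte-identical under a fixed seed:

```typescript
// mulberry32: 32-bit PRNG returning a value in [0, 1) per call.
// Deterministic for a given seed.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}
```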
Codex flagged the original plan's wording ("extract injection engine from
adversarial.ts") as overstated — adversarial.ts is a static case list,
not a reusable engine. This module is NEW code.
**eval/runner/cat6-prose-scale.ts** — Runner. Loads world-v1, applies all
6 injection kinds to sampled base pages (default 50 variants per kind ×
6 kinds = 300 variants), runs extractPageLinks on each, compares to gold
delta. Emits per-kind + overall metrics (precision, recall, F1,
code_fence_leak_rate, substring_fp_rate, pages_with_links_coverage,
mean_links_per_page). **v1 verdict is always "baseline_only"** — no
gating threshold per codex fix #9 (current extractor residuals make
>0.80 unreachable; v1 records a baseline, regression guard triggers on
drop below it).
**eval/runner/cat11-multimodal.ts** — PDF + HTML + audio runners.
Fixtures load from eval/data/multimodal/<modality>/fixtures.json
manifests; each modality skips gracefully when manifest missing or
(audio) when neither GROQ_API_KEY nor OPENAI_API_KEY is set. Metrics:
- PDF: char-level similarity via Levenshtein + optional entity_recall
- HTML: word-recall over normalized tokens (multiset semantics)
- Audio: WER (word error rate) via Levenshtein on word sequences
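The audio metric can be sketched as Levenshtein edit distance over word sequences, normalized by reference length (standard WER; the shipped tokenization may differ):

```typescript
// Edit distance between two sequences, single-row DP.
function levenshtein<T>(a: T[], b: T[]): number {
  const dp = Array.from({ length: a.length + 1 }, (_, i) => i);
  for (let j = 1; j <= b.length; j++) {
    let prev = dp[0];
    dp[0] = j;
    for (let i = 1; i <= a.length; i++) {
      const tmp = dp[i];
      dp[i] = a[i - 1] === b[j - 1] ? prev : 1 + Math.min(prev, dp[i], dp[i - 1]);
      prev = tmp;
    }
  }
  return dp[a.length];
}

// Word error rate: word-level edit distance / reference word count.
function wer(reference: string, hypothesis: string): number {
  const ref = reference.toLowerCase().split(/\s+/).filter(Boolean);
  const hyp = hypothesis.toLowerCase().split(/\s+/).filter(Boolean);
  if (ref.length === 0) return hyp.length === 0 ? 0 : 1;
  return levenshtein(ref, hyp) / ref.length;
}
```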
Fixtures are NOT committed; a future eval:fetch-multimodal script will
download them hash-verified from public sources (arXiv CC-licensed
papers, Wikipedia CC-BY-SA, Common Voice CC0).
Injectable audio transcriber (`opts.transcribe`) means tests don't need
GROQ/OpenAI keys — stubbed transcriptions exercise the WER math path
directly.
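The WER metric above is the classic formulation: word-level Levenshtein distance normalized by reference length. A generic sketch (not the runner's exact code; tokenization rules are an assumption):

```typescript
// Edit distance over arbitrary sequences (deletion/insertion/substitution all cost 1).
function levenshtein<T>(a: T[], b: T[]): number {
  const dp = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0)),
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1, // deletion
        dp[i][j - 1] + 1, // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1), // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

// Word error rate: edit distance on word sequences over reference length.
function wer(reference: string, hypothesis: string): number {
  const ref = reference.toLowerCase().split(/\s+/).filter(Boolean);
  const hyp = hypothesis.toLowerCase().split(/\s+/).filter(Boolean);
  if (ref.length === 0) return hyp.length === 0 ? 0 : 1;
  return levenshtein(ref, hyp) / ref.length;
}
```

A stubbed transcriber only needs to return a string for this math to be exercised, which is why the tests run without API keys.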
**Tests (60 new):** adversarial-injections (19) — per-kind assertions +
dispatcher coverage + slug regex conformance; cat6 (12) — variant
determinism, scoreVariant shape, aggregate per-kind + overall metrics,
corpus resolver slug rules; cat11 (29) — charSimilarity / wordRecall /
wer math, htmlToText strips scripts + decodes entities, HTML modality
with real fixtures, audio modality gracefully skips without key + uses
stub transcriber correctly.
All 60 tests pass in 48ms + 41ms.
Total eval suite now: 227 pass, 0 fail.
Co-Authored-By: Claude Opus 4.6 <[email protected]>
* feat(eval): Day 7 — Cat 5 provenance runner + structured classify_claim judge
**eval/runner/cat5-provenance.ts** — BrainBench Cat 5 scoring. Samples
claims from gbrain brain-export and classifies each against its source
material via a dedicated Haiku judge (classify_claim tool with a
three-label enum: supported | unsupported | over-generalized).
Separate from judge.ts by design: Cat 5 is a single three-way
classification per claim, not a weighted rubric. Rather than overload
judge.ts with a mode switch, Cat 5 has its own tool definition
(CLASSIFY_CLAIM_TOOL) and prompt. The retry-once pattern, $20 cost gate
semantics, and structured parsing are mirrored from judge.ts so failures
look the same across Cats.
Metric: `citation_accuracy` = fraction where predicted label equals
gold expected_label. Threshold (informational): >0.90 per design-doc
METRICS.md. v1 ships with `enableThreshold: false` so the verdict is
always baseline_only — we don't have hand-authored gold claims yet, and
codex flagged that threshold gating should wait until the amara-life-v1
corpus + gold file authoring lands in Day 3b.
runCat5 uses a bounded-concurrency worker pool (default 4) to respect
Haiku rate limits across 100+ claim batches. Evidence pages are looked
up by slug from a caller-provided pagesBySlug map — missing pages don't
crash, they just pass an empty source list to the judge (correct
behavior for genuinely unsupported claims).
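A bounded-concurrency pool of the kind described here can be sketched as follows (a generic pattern; runCat5's actual signature and internals may differ):

```typescript
// Run fn over items with at most `limit` in flight; results keep input order.
async function boundedMap<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++; // claim the next index (safe: single-threaded event loop)
      results[i] = await fn(items[i]);
    }
  }
  const workers = Array.from({ length: Math.min(limit, items.length) || 1 }, worker);
  await Promise.all(workers);
  return results;
}
```

Because results are written by claimed index rather than completion order, output order stays stable even when individual judge calls finish at wildly different speeds.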
**Tests (23):** classifyClaim happy/retry/fallback paths with stubbed
Haiku, aggregate accuracy math, threshold gating (pass/fail vs
baseline_only), runCat5 concurrency + missing-page handling,
renderClaimPrompt embeds claim + sources correctly, parseClassification
rejects invalid enum values + plain-text responses.
Total eval suite now: 250 pass, 0 fail.
Co-Authored-By: Claude Opus 4.6 <[email protected]>
* feat(eval): Day 8 — Cat 8 skill compliance + Cat 9 end-to-end workflows
**eval/runner/cat8-skill-compliance.ts** — Deterministic, judge-free Cat 8
scoring. Replays inbound signals through the agent adapter (Day 5) and
extracts four iron-law metrics directly from the tool-bridge state:
- brain_first_compliance: agent called search/get_page BEFORE producing
its final answer. Non-compliance = hallucinating from general knowledge.
- back_link_compliance: every dry_run_put_page intent has at least one
markdown [Name](slug) back-link in its compiled_truth.
- citation_format: timeline entries use canonical `- **YYYY-MM-DD** |
Source — Summary`; long final answers cite at least one slug.
- tier_escalation: simple probes use light tooling (≥1 brain call);
complex probes require ≥2 brain calls or a dry_run write when
expects_dry_run_write is set.
No judge call required — everything is computable from
`tool_bridge_state.made_dry_run_writes` + `count_by_tool` + final_answer
regex. Fast, deterministic, reproducible.
Bounded concurrency (p-limit style) worker pool at default 4 to keep
Sonnet rate limits comfortable across 100-probe batches.
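A sketch of the citation_format check (the regex is a guess at the canonical shape quoted above, not the shipped scorer):

```typescript
// Canonical timeline entry per the format above: "- **YYYY-MM-DD** | Source — Summary"
const TIMELINE_RE = /^- \*\*\d{4}-\d{2}-\d{2}\*\* \| .+ — .+$/;

function isCanonicalTimelineEntry(line: string): boolean {
  return TIMELINE_RE.test(line.trim());
}
```

Checks like this stay judge-free: a regex over tool-bridge output is deterministic and costs nothing per probe.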
**eval/runner/cat9-workflows.ts** — Rubric-graded Cat 9. 5 canonical
workflows (meeting_ingestion, email_to_brain, daily_task_prep, briefing,
sync) × ~10 scenarios each. Each scenario runs through the agent adapter,
then judge.ts scores the answer against a per-scenario rubric.
`buildEvidence(scenario, agentResult, pagesBySlug)` composes the
JudgeEvidence contract: resolves ground_truth_slugs to full
GroundTruthPage[] from a slug-map, pulls tool_call_summary directly from
tool_bridge_state (no raw tool_result content — Section-3 defense),
attaches rubric from the scenario.
Per-workflow rollup: each workflow gets its own pass_rate so the verdict
can fail one workflow without failing the whole Cat. Overall verdict
requires every populated workflow's pass_rate ≥ threshold (default 0.80)
when enableThreshold=true.
Both Cats default to verdict=baseline_only in v1 per codex fix #9: real
thresholds return after 10-probe Haiku-vs-hand-score calibration (κ > 0.7)
runs against the Day 3b amara-life-v1 corpus.
**Tests (23):** Cat 8 per-metric scorer unit tests covering every branch
(brain_first ordering, back-link compliance on mixed writes, long vs
short answer citation requirement, tier escalation for simple/complex/
writey probes, finalAnswerCiteCount dedups across syntaxes). Cat 9
buildEvidence contract shape — evidence_refs flow from agent, missing
slugs skip gracefully, no raw_transcript/tool_result leakage to judge.
Cat 9 runCat9 integration with stubbed agent + mixed-verdict judge
produces fractional pass rates correctly.
Total eval suite now: 273 pass, 0 fail.
Co-Authored-By: Claude Opus 4.6 <[email protected]>
* feat(eval): Day 9 — sealed qrels via PublicPage + PublicQuery at adapter boundary
Codex fixes #1, #2, #3 from the plan's outside-voice review. Enforcement
shifts from SOFT-VIA-TYPE-COMMENT to SOFT-VIA-SANITIZED-OBJECT. Hard
enforcement via process isolation waits for BrainBench v2 Docker sandbox.
**eval/runner/types.ts** additions:
- `PublicPage = Pick<Page, 'slug' | 'type' | 'title' | 'compiled_truth' |
'timeline'>` — the exact 5 fields adapters should see. No _facts.
No frontmatter (a known hiding spot for accidental gold leaks).
- `sanitizePage(p: Page): PublicPage` — returns a NEW object with the 5
fields only. Cannot be bypassed by `(page as any)._facts` because the
field does not exist on the sanitized object.
- `PublicQuery = Omit<Query, 'gold'>` — strips the gold field.
- `sanitizeQuery(q: Query): PublicQuery` — enumerates public fields
explicitly (not spread+delete) so no prototype weirdness leaves gold
reachable.
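The sealing pattern described above, in miniature (field names follow the commit; the Page shape here is abbreviated and the hidden fields are illustrative):

```typescript
interface Page {
  slug: string;
  type: string;
  title: string;
  compiled_truth: string;
  timeline: string[];
  _facts?: unknown;      // gold-adjacent; must never reach adapters
  frontmatter?: unknown; // known hiding spot for accidental gold leaks
}

type PublicPage = Pick<Page, 'slug' | 'type' | 'title' | 'compiled_truth' | 'timeline'>;

// Returns a NEW object carrying only the 5 public fields. `(page as any)._facts`
// is undefined on the result because the field simply does not exist on it.
function sanitizePage(p: Page): PublicPage {
  return {
    slug: p.slug,
    type: p.type,
    title: p.title,
    compiled_truth: p.compiled_truth,
    timeline: p.timeline,
  };
}
```

Enumerating fields explicitly (rather than spread-plus-delete) is the point: a cast can defeat a type annotation, but it cannot conjure a property that was never copied.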
**eval/runner/multi-adapter.ts** — scoreOneRun now calls sanitizePage /
sanitizeQuery before passing to adapter.init / adapter.query. The scorer
retains the full Query shape (including gold.relevant) for precision /
recall computation. Adapter signatures unchanged — the sealing is at the
OBJECT level, not the type level. This keeps existing adapters
(ripgrep-bm25, vector-only, hybrid-nograph, gbrain-after) binary-compatible.
Verified: no existing adapter reads q.gold or page._facts, so the change
is safe without further adapter updates.
**test/eval/sealed-qrels.test.ts** (17 tests):
- sanitizePage strips _facts + frontmatter + arbitrary hidden keys
- Output has exactly the 5 public keys (deep introspection)
- Proxy tripwire simulates a malicious adapter: any access to _facts or
gold throws `sealed-qrels violation`
- sanitizeQuery retains optional fields (as_of_date, tags, author,
acceptable_variants, known_failure_modes) but omits undefined ones
- Honest documentation of the seal's limits: filesystem bypass and
Proxy attacks would still work in v1; Docker isolation (v2) is the
real enforcement
Every existing eval test still passes (273 before + 17 sealed-qrels = 290).
Total eval suite now: 290 pass, 0 fail.
Co-Authored-By: Claude Opus 4.6 <[email protected]>
* feat(eval): Day 10 — all.ts rewrite + llm-budget + BrainBench N tiers
Final wiring of BrainBench v1 Complete. all.ts now orchestrates the full
Cat catalog (1-12) via a mix of subprocess dispatch (Cats 1, 2, 3, 4, 6,
7, 10, 11, 12 — standalone runners with CLI entry points) and
programmatic invocation (Cats 5, 8, 9 — require runtime inputs that
can't come via CLI flags). Subprocess Cats run concurrently under a
p-limit(2) bound to cap peak memory at ~800MB (two PGLite instances
at ~400MB each).
Cats 5/8/9 show as "programmatic" in the report with a one-line
reference to their `runCatN({...})` harness API. They're deliberately
skipped from the master runner because their inputs (claim catalog,
probe catalog, scenario catalog, pre-seeded agent state, evidence
pagesBySlug) are task-specific and assembled at the caller.
**eval/runner/all.ts** — rewritten:
- CATEGORIES is a tagged union of SubprocessCategory | ProgrammaticCategory
- runCatSubprocess spawns Bun with piped stdout/stderr, 10-min timeout
  per Cat (exit code 124 + SIGTERM on timeout; no hung subprocesses)
- runConcurrently is a bounded worker pool preserving input order
- buildReport emits the full markdown with per-Cat elapsed times,
migration-noise filter, and a separate programmatic-only section
- Honors BRAINBENCH_N (1/5/10 for smoke/iteration/published),
BRAINBENCH_CONCURRENCY (default 2),
BRAINBENCH_LLM_CONCURRENCY (default 4, consumed by llm-budget)
**eval/runner/llm-budget.ts** — shared LLM rate-limit semaphore. A full
N=10 published scorecard makes ~900 Anthropic calls (150 Cat 8/9 probes
× N=10 + 100 Cat 5 claims × N=10). Without coordination, concurrent
adapters trigger 429s on per-minute limits.
- LlmBudget class: acquireSlot/releaseSlot + withLlmSlot(fn) wrapper
that releases on success AND throw (try/finally)
- getDefaultLlmBudget() singleton reads BRAINBENCH_LLM_CONCURRENCY,
falls back to 4 on missing/garbage values
- capacity enforced ≥1 (rejects 0/negative)
- Double-release is a no-op (guards against upstream double-call bugs)
- Active + waiting counts exposed for observability / tests
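The semaphore described above can be sketched like this (class and method names follow the commit; the internals are an assumed implementation):

```typescript
class LlmBudget {
  private active = 0;
  private waiters: Array<() => void> = [];

  constructor(private readonly capacity: number) {
    // capacity enforced >= 1 (rejects 0/negative)
    if (!Number.isInteger(capacity) || capacity < 1) {
      throw new Error('LlmBudget capacity must be >= 1');
    }
  }

  acquireSlot(): Promise<void> {
    if (this.active < this.capacity) {
      this.active++;
      return Promise.resolve();
    }
    return new Promise((resolve) =>
      this.waiters.push(() => {
        this.active++;
        resolve();
      }),
    );
  }

  releaseSlot(): void {
    if (this.active === 0) return; // double-release is a no-op
    this.active--;
    this.waiters.shift()?.(); // hand the freed slot to the next waiter
  }

  // Releases on success AND on throw.
  async withLlmSlot<T>(fn: () => Promise<T>): Promise<T> {
    await this.acquireSlot();
    try {
      return await fn();
    } finally {
      this.releaseSlot();
    }
  }
}
```

The try/finally in withLlmSlot is what prevents a single thrown judge call from permanently leaking a slot and starving the remaining ~900-call batch.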
**package.json** scripts:
- eval:brainbench — default N=5 iteration
- eval:brainbench:smoke — N=1 for fast iteration
- eval:brainbench:published — N=10 for committed baselines
- eval:cat6 / eval:cat11 — individual new subprocess Cats
**Tests (24):** CATEGORIES catalog enforces the exact Cat-number partition
(subprocess: 1,2,3,4,6,7,10,11,12; programmatic: 5,8,9). runConcurrently
respects the cap (observable via peak in-flight counter), preserves input
order under non-uniform delays, handles empty input. LlmBudget enforces
capacity, releases on throw, honors env var, rejects 0/negative.
buildReport filters migration noise, counts passed/failed/programmatic
correctly, includes every Cat + programmatic-only section.
Full eval suite now: 314 pass, 0 fail (15 test files).
Co-Authored-By: Claude Opus 4.6 <[email protected]>
* fix(eval): drop top_p from amara-life-gen Opus params + gitignore _cache/
Two fixes surfaced during the Day 3b real-corpus run against Opus 4.5:
**eval/generators/amara-life-gen.ts** — Current Opus rejects
`temperature` and `top_p` together:
```
400 invalid_request_error: `temperature` and `top_p` cannot both be
specified for this model. Please use only one.
```
top_p=1.0 was a no-op (no nucleus truncation), so removing it has zero
semantic effect. The field is still part of MODEL_PARAMS for the cache
key so any past cache entries (none in v1) would invalidate cleanly
on the next schema version bump.
**.gitignore** — `eval/data/amara-life-v1/_cache/` is runtime Opus
cache (398 files, ~1.6MB). Regenerable from seed; no point in source
control. The corpus itself (inbox/slack/calendar/meetings/notes/docs +
corpus-manifest.json with per-item content_sha256) stays committable
for reproducibility, just the cache directory gets excluded.
Real corpus generation ran cleanly after these two fixes: 398 LLM
calls, 84,424 input / 38,062 output tokens, $4.12 spent (vs the $20 cap
and $12 estimate). All 418 items produced. Poison fixtures use
subtle paraphrased injection ("for anyone on your team who might be
triaging this thread later…") — exactly the pattern that defeats
regex redaction and requires the structured-evidence judge contract
from Day 5.
Corpus itself stays local (will move to the brainbench sibling repo
during the v0.16 split per the design doc). No eval/data/amara-life-v1/
content landing in this PR.
Co-Authored-By: Claude Opus 4.6 <[email protected]>
* chore: bump version to 0.20.0
Renumbered from 0.17.0 per the gbrain-versioning slot. Other work is
landing on master around this PR; 0.18 is the slot locked for this
BrainBench v1 Complete release. Also pushed the "brainbench split"
forward reference in the CHANGELOG from v0.18 → v0.19 to match.
Co-Authored-By: Claude Opus 4.6 <[email protected]>
* refactor: extract BrainBench to sibling gbrain-evals repo
BrainBench lived in this repo through v0.17, which meant every gbrain install
pulled down ~5MB of eval corpus, benchmark reports, and a pdf-parse devDep
that the 99% of users who never run benchmarks don't need.
v0.18 moves the full eval harness, 14 eval test files (314 tests), all
docs/benchmarks scorecards, and the pdf-parse devDep to
github.com/garrytan/gbrain-evals. That repo depends on gbrain via GitHub URL
and consumes it through a new public exports map.
What stays in gbrain:
- Page.type enum extensions (email | slack | calendar-event | note | meeting)
useful for any ingested format, not just evals
- inferType() heuristics for /emails/, /slack/, /cal/, /notes/, /meetings/
- 11 new public exports covering the gbrain internals gbrain-evals consumes
(gbrain/engine, gbrain/pglite-engine, gbrain/search/hybrid, etc.) — now
gbrain's stable third-party contract
What moved:
- eval/ — 4.6MB of schemas, runners, adapters, generators, CLI tools
- test/eval/ — 14 test files, 314 tests
- docs/benchmarks/ — all scorecards and regression reports
- eval:* package.json scripts
- pdf-parse devDep
Tests: 1760 pass, 0 fail, 174 skipped (E2E require DATABASE_URL).
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
* Merge origin/master into garrytan/gbrain-evals
Master landed significant work since this branch was cut (v0.15.x → v0.16.x →
v0.17.0 gbrain dream + runCycle → v0.18.0 multi-source brains → v0.18.1 RLS
hardening). Bumped this branch's version from the claimed 0.18.0 to 0.19.0
because master already owns 0.18.x.
Conflicts resolved:
- VERSION: 0.19.0 (was 0.18.0 on HEAD vs 0.18.1 on master)
- package.json: 0.19.0, kept all 11 eval-facing exports, merged master's
typescript devDep + postinstall script + test script (typecheck added)
- src/core/types.ts: union of both PageType additions. Master had added
`meeting | note`; this branch added `email | slack | calendar-event`
for inbox/chat/calendar ingest. Final enum carries all five.
- CHANGELOG.md: renumbered the BrainBench-extraction entry to 0.19.0 and
placed it above master's 0.18.1 RLS entry. Tweaked copy ("In v0.17 it
lived inside this repo" → "Previously it lived inside this repo") to
stop implying a specific version that never shipped.
- CLAUDE.md: adjusted "BrainBench in a sibling repo" heading from
(v0.18+) → (v0.19+).
- docs/benchmarks/2026-04-18-minions-vs-openclaw-production.md:
resolved modify-vs-delete conflict in favor of delete (the extraction).
- scripts/llms-config.ts: dropped the docs/benchmarks/ entry (directory
no longer exists here; lives in gbrain-evals).
- llms.txt / llms-full.txt: regenerated after the config change.
- bun.lock: accepted master's (master already dropped pdf-parse as a
drive-by; aligned with our removal).
Tests: 2094 pass, 236 skip, 18 fail. Spot-checked failures — build-llms,
dream, orphans tests all pass in isolation. Failures reproduce only under
full-suite parallel load and are pre-existing master flakiness (matches the
graph-quality flake noted in the earlier summary). Not merge-introduced.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
* chore: bump to v0.20.0
Master is now at v0.18.2 (migration hardening + RLS + multi-source brains).
BrainBench extraction ships as v0.20.0 to leave v0.19 free for any in-flight
work on other branches.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
* ci: remove eval-tests workflow (moved to gbrain-evals)
The Eval tests workflow ran `bun run eval:query:validate`, `test:eval`, and
`eval:world:render` — all three scripts moved to the gbrain-evals repo when
BrainBench was extracted in v0.20.0. The workflow has been failing on master
since the split because the scripts no longer exist here.
Eval CI now runs from gbrain-evals's own workflows.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
* fix(tests): bump PGLite hook timeouts to 60s for parallel-load stability
Six test files spin up PGLite + 20 migrations + git repos in beforeEach/
beforeAll hooks. Under 136-way parallel test file execution, bun's default
5s hook timeout wasn't enough, producing 18 flaky failures that only
reproduced under full-suite parallel load (all 6 files passed in isolation).
Root cause: PGLite.create() + initSchema() takes ~3-5s under idle load, but
under 136 concurrent WASM instantiations the OS thrashes and hooks stall
well past 5s. The bunfig.toml `timeout = 60_000` applies to TESTS, not HOOKS
— bun requires per-hook timeouts as the third beforeEach/beforeAll argument.
Files touched (hook timeouts added, no test logic changed):
- test/dream.test.ts — 5 describe blocks × before/afterEach
- test/orphans.test.ts — 1 beforeEach + afterEach
- test/core/cycle.test.ts — shared beforeAll + afterAll
- test/brain-allowlist.test.ts — beforeAll + afterAll
- test/extract-db.test.ts — beforeAll + afterAll
- test/multi-source-integration.test.ts — beforeAll + afterAll
Results: 2317 pass / 0 fail (was 2253 pass / 18 fail).
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
* test: coverage for inferType() BrainBench corpus dirs
Closes the 1 gap surfaced by Step 7 coverage audit. 9 table-driven
assertions covering the new Page.type branches:
emails/*.md, email/*.md -> 'email'
slack/*.md -> 'slack'
cal/*.md, calendar/*.md -> 'calendar-event'
notes/*.md, note/*.md -> 'note'
meetings/*.md, meeting/*.md -> 'meeting'
The fixtures use realistic paths from the amara-life-v1 corpus in the
sibling gbrain-evals repo (em-0001, sl-0037, evt-0042, mtg-0003) so the
test doubles as a contract check between the two repos.
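A sketch of directory-prefix heuristics matching the mapping above (the regexes and the 'page' fallback are assumptions; only the path-to-type pairs come from the commit):

```typescript
// Directory-based Page.type inference. Fallback type 'page' is assumed.
type InferredType = 'email' | 'slack' | 'calendar-event' | 'note' | 'meeting' | 'page';

function inferType(path: string): InferredType {
  if (/(^|\/)emails?\//.test(path)) return 'email';
  if (/(^|\/)slack\//.test(path)) return 'slack';
  if (/(^|\/)(cal|calendar)\//.test(path)) return 'calendar-event';
  if (/(^|\/)notes?\//.test(path)) return 'note';
  if (/(^|\/)meetings?\//.test(path)) return 'meeting';
  return 'page';
}
```

Anchoring each prefix to `^` or a preceding `/` keeps a path like `people/slacker.md` from false-matching the slack branch.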
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
* docs(TODOS): mark BrainBench Cats 5/6/8/9/11 + v0.10.5 inferLinkType as completed
All five BrainBench categories shipped in v0.20.0 (to the gbrain-evals
sibling repo). v0.10.5 inferLinkType regex expansion shipped in-tree.
Remaining P1 BrainBench work: Cat 1+2 at full scale (2-3K pages) —
currently 240 pages in world-v1 corpus.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
* docs: sync CLAUDE.md + polish CHANGELOG voice for v0.20.0
CLAUDE.md: add v0.19 commands to key-files list (skillify, skillpack,
routing-eval, filing-audit, skill-manifest, resolver-filenames);
add 8 new test files + openclaw-reference-compat E2E to test index;
repoint the release-summary template's benchmark source from
`docs/benchmarks/[latest].md` to `gbrain-evals/docs/benchmarks/` since
those files now live in the sibling repo.
CHANGELOG voice polish for v0.20.0: replace em dashes with periods,
parens, or ellipses per project style guide. No content changes.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
* docs: regenerate llms-full.txt after CLAUDE.md + CHANGELOG edits (fixes CI)
The v0.20.0 doc-sync commit (9e567bb) added 7 new v0.19 modules to the
CLAUDE.md Key Files index and polished CHANGELOG voice. Both are
includeInFull: true inputs to llms-full.txt but the generator wasn't
re-run, so the drift-detection guard (test/build-llms.test.ts) failed CI.
One-line fix: regenerate. No content changes beyond what the two source
docs already carry.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
garrytan added a commit that referenced this pull request (Apr 24, 2026)
…nager (#364) * feat: add `gbrain jobs supervisor` — self-healing worker process manager Adds a first-class supervisor command that: - Spawns `gbrain jobs work` as a child process - Restarts on crash with exponential backoff (1s→60s cap) - Resets crash counter after 5min of stable operation - PID file locking prevents duplicate supervisors - Periodic health checks (stalled jobs, completion gaps) - Graceful shutdown (SIGTERM→35s→SIGKILL) Usage: gbrain jobs supervisor --concurrency 4 Replaces ad-hoc nohup patterns in bootstrap scripts. The autopilot command's internal supervisor can be migrated to use this in a follow-up. Tests: 7 pass (backoff calc, PID management, crash tracking) * supervisor: atomic PID lock, queue-scoped health, env safety, unified exit Lane A of PR #364 review fixes (20-item multi-lane plan). Addresses the codex-tier + CEO + Eng findings on src/core/minions/supervisor.ts: Safety + correctness: - Atomic O_CREAT|O_EXCL PID lock via openSync('wx') with stale-file liveness check. Prevents two supervisors racing on the same PID file. (codex #1) - Health check now queries status='active' AND lock_until < now() matching queue.ts:848's authoritative stalled definition. The prior `status = 'stalled'` predicate returned zero rows forever because 'stalled' is not a persisted value in the schema. (codex #2) - All health queries scoped to WHERE queue = $1 via opts.queue binding. Multi-queue installs no longer see cross-queue false positives. (codex #3) - Class default allowShellJobs flipped true→false AND explicit `delete env.GBRAIN_ALLOW_SHELL_JOBS` when false, so child workers don't silently inherit the var from the parent shell. (eng #8, codex #9) - Unified shutdown(reason, exitCode) — max-crashes now routes through the same drain path as SIGTERM. Single source of truth for lifecycle cleanup; prerequisite for trustworthy audit events (Lane C). 
(eng #1) - Default PID path moves from /tmp to ~/.gbrain/supervisor.pid with mkdirSync recursive + GBRAIN_SUPERVISOR_PID_FILE env override. Matches the rest of the product's ~/.gbrain/ convention; fresh installs no longer hit ENOENT. (CEO #2 + codex #6) Refinements: - crashCount = 1 after 5-min stable-run reset (was 0, produced calculateBackoffMs(-1) = 500ms by accident). Now reads as 'first crash of a new cycle' with a clean 1s backoff. (Nit 1) - Top-of-file POSTGRES-ONLY docstring documenting why the supervisor can't run against PGLite. (Nit 2) - inBackoff flag suppresses 'worker not alive' warn during the expected null-child window (crash → sleep → next spawn). (eng #2) - Tracked listener refs for SIGTERM/SIGINT removed in shutdown() so integration tests spinning up/tearing down multiple supervisors on one process don't leak handlers. (eng #3) - Single FILTER query replaces two SELECT counts — one round-trip instead of two, three metrics in one pass. (eng #10) - child.on('error') listener emits worker_spawn_failed event for ENOENT/EACCES; exit handler still increments crashCount as usual so max-crashes bounds permanent misconfigurations. (codex #7) - healthInFlight boolean guard with try/finally prevents overlapping health checks from stacking on a hung DB. (codex #8) Documented exit codes (ExitCodes const): 0 CLEAN, 1 MAX_CRASHES, 2 LOCK_HELD, 3 PID_UNWRITABLE Agent can branch on exit=2 ('another supervisor, I'm fine') vs exit=1 ('escalate to human'). Event emitter surface: - started / worker_spawned / worker_exited / worker_spawn_failed - backoff / health_warn / health_error / max_crashes_exceeded - shutting_down / stopped Plumbed through emit() with an onEvent callback hook for Lane C's audit writer. json:false is the default; Lane C's --json mode flips it and writes JSONL to stderr. 
CLI changes (src/commands/jobs.ts): - `gbrain jobs supervisor` gains --allow-shell-jobs (explicit opt-in mirroring the env-var gate), --cli-path (override auto-resolution for exotic setups), and --json (JSONL lifecycle events on stderr). - Expanded --help body with description, 3 examples, and exit-code table. (DX Fix A per review) - Three-tier PID path resolution: --pid-file > GBRAIN_SUPERVISOR_PID_FILE > ~/.gbrain/supervisor.pid (via exported DEFAULT_PID_FILE). - Removed the catch-fallback to process.argv[1] — resolveGbrainCliPath() throws its own actionable install-hint error, which is what dev users need instead of a cryptic spawn failure on a .ts path. (codex #5) Tests: existing 7 supervisor.test.ts cases continue to pass. Integration tests (crash-restart, max-crashes, SIGTERM-during-backoff, env-inheritance regression) land in Lane E. Out of scope for this lane (tracked in follow-up lanes): - Audit file writer at ~/.gbrain/audit/supervisor-YYYY-Www.jsonl (Lane C) - Documentation pass (Lane B) - supervisor start/status/stop subcommands (Lane C) - gbrain doctor supervisor check (Lane D) - /ship release hygiene (Lane F) - autopilot.ts migration to MinionSupervisor (deferred to follow-up PR per codex — requires non-blocking start() API redesign, not ~30 lines) Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> * docs: supervisor as canonical worker deployment pattern Lane B of PR #364 review fixes. Reframes docs/guides/minions-deployment.md around `gbrain jobs supervisor` as the default answer (blocker 7), deletes the 68-line legacy bash watchdog (F10), and updates README + deployment snippets to match. docs/guides/minions-deployment.md: - New 'Worker supervision' section at the top with the canonical 3-command agent pattern (start --detach / status --json / stop) and a documented exit-code table (0 clean, 1 max-crashes, 2 lock-held, 3 PID-unwritable). - 'Which supervisor when?' 
decision table: container = supervisor as PID 1, Linux VM = systemd-over-supervisor, dev laptop = bare terminal. - New 'Agent usage' section for OpenClaw / Hermes / Cursor / Codex — the 3-turn discover-start-maintain workflow that replaces shell archaeology with machine-parseable JSON events + an audit file at ~/.gbrain/audit/supervisor-YYYY-Www.jsonl. - Demoted the 'Option 1: watchdog cron' path entirely; replaced with a straightforward upgrade migration block (stop script, remove cron line, start supervisor, verify via doctor). - Preconditions now check Postgres connectivity directly (supervisor is Postgres-only; the CLI rejects PGLite with a clear error). Snippets: - systemd.service: ExecStart now invokes `gbrain jobs supervisor` instead of raw `gbrain jobs work`. Two-layer supervision (systemd → supervisor → worker) buys automatic restart on reboot plus fast crash recovery. ReadWritePaths expanded to cover $HOME/.gbrain (supervisor PID + audit). - Procfile + fly.toml.partial: same change — platform restarts the container on host events, supervisor restarts the worker on crashes. - minion-watchdog.sh: deleted (git history retains it for anyone in an exotic deployment). Supervisor subsumes every capability it had plus atomic PID locking, structured audit events, queue-scoped health checks, and graceful drain on SIGTERM. README.md: - Added a paragraph under the Minions section pointing `gbrain jobs supervisor` as canonical, noting the --detach / status / stop surface and the audit file path, with a link to the full deployment guide. Kept `gbrain jobs work` documented for direct raw invocation but flagged 'prefer supervisor' for any long-running use. The supervisor `--help` body itself (3 examples + exit-code table in src/commands/jobs.ts) landed with Lane A — this lane finishes the discoverability story by making the supervisor findable via doc grep, README landing, and deployment-guide landing paths. 
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> * supervisor: daemon-manager subcommands + JSONL audit writer Lane C of PR #364 review fixes. Adds the daemon-manager CLI surface so agents can drive `gbrain jobs supervisor` in 3 turns instead of 10, and the audit writer that makes lifecycle events inspectable across process restarts. (Blocker 8, closes DX Fix A/B/C.) New: src/core/minions/handlers/supervisor-audit.ts - writeSupervisorEvent(emission, supervisorPid) appends JSONL to `${GBRAIN_AUDIT_DIR:-~/.gbrain/audit}/supervisor-YYYY-Www.jsonl`. ISO-week rotation via a `computeSupervisorAuditFilename()` helper that mirrors `shell-audit.ts` exactly (year-boundary ISO week math, Thursday anchor, etc). - readSupervisorEvents({sinceMs}) returns parsed events from the current week's file, oldest-first, for Lane D's doctor check. Malformed lines are skipped silently (disk-full truncation is already best-effort at write time). - Reuses `resolveAuditDir()` from shell-audit.ts so the `GBRAIN_AUDIT_DIR` env var override works identically across all gbrain audit trails. src/commands/jobs.ts: supervisor subcommand dispatcher - `gbrain jobs supervisor [start] [--detach] [--json] ...` — default subcommand. Without --detach, runs foreground as before. With --detach, forks a background child (inheriting stderr so the caller can still tail JSONL events), writes a stdout payload: {"event":"started","supervisor_pid":N,"pid_file":"...","detached":true} and exits 0. Stdin/stdout on the detached child are /dev/null so the parent shell isn't held open. - `gbrain jobs supervisor status [--json]` — reads the PID file, checks liveness via `kill -0`, then reads the last 24h from the supervisor audit file to compute crashes_24h / last_start / max_crashes_exceeded. Exits 0 if running, 1 if not. JSON output is machine-parseable; human output is a 5-line ASCII report. 
- `gbrain jobs supervisor stop [--json]` — reads PID, sends SIGTERM, polls `kill -0` every 250ms for up to 40s (supervisor's own 35s worker-drain + 5s slack). Reports outcome: drained / timeout_40s / pid_file_missing / pid_file_corrupt / process_gone. Exit 0 on clean stop. - `--json` flag is already plumbed through to the supervisor opts from Lane A — this lane adds the onEvent audit-writer callback so every supervisor emission (started, worker_spawned, worker_exited, worker_spawn_failed, backoff, health_warn, health_error, max_crashes_exceeded, shutting_down, stopped) lands in the JSONL file with the supervisor's PID attached. --help body updated: - Three separate usage lines (start / status / stop). - SUBCOMMANDS block with one-line summaries each. - EXIT CODES block (unchanged from Lane A, moved under SUBCOMMANDS). - EXAMPLES block updated with status --json + stop + --detach forms. Tests: existing 127 supervisor + minions tests continue to pass. Integration tests for the new subcommands + audit writer land with Lane E. Follow-up (Lane D): `gbrain doctor` will read readSupervisorEvents() from this module to surface a `supervisor` health check alongside its existing checks (DB connectivity, schema version, queue health). Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> * doctor: add supervisor health check Lane D of PR #364 review fixes. Closes the observability loop: now that Lane C writes supervisor lifecycle events to `${GBRAIN_AUDIT_DIR:-~/.gbrain/audit}/supervisor-YYYY-Www.jsonl`, `gbrain doctor` surfaces a `supervisor` check alongside its existing health indicators. Implementation (src/commands/doctor.ts, filesystem-only block 3b-bis): - Resolves DEFAULT_PID_FILE via the same three-tier logic as the start path (--pid-file > GBRAIN_SUPERVISOR_PID_FILE > ~/.gbrain/supervisor.pid). - Reads the PID file + `kill -0 <pid>` for liveness. 
- Calls readSupervisorEvents({sinceMs: 24h}) from the audit module to derive last_start / crashes_24h / max_crashes_exceeded. - Suppresses the check entirely when the user has never invoked the supervisor (no PID file AND no audit events) — avoids noise on installs that don't use the feature. Status thresholds: fail max_crashes_exceeded event seen in last 24h (supervisor gave up; operator needs to restart or triage) warn supervisor not running but audit shows prior use (unexpected stop — likely crash or manual kill) warn running but > 3 crashes in last 24h (supervisor recovering but worker is unstable) ok running + ≤ 3 crashes + no max_crashes event All failure paths emit a paste-ready recovery command. Read/import errors are swallowed (best-effort like the other doctor checks). Tests: all 127 supervisor + minions tests still green; 13 existing doctor tests unaffected. F3 done. All four lanes A/B/C/D are now committed; Lane E (integration tests) and Lane F (/ship v0.20.2) remain. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> * test: 4 critical integration tests for supervisor lifecycle Lane E of PR #364 review fixes (blocker 10). Fills the ~15% coverage gap flagged in the eng review by actually exercising the code paths that will break in production — crash-restart loop, max-crashes exit, SIGTERM-during-backoff, env-var inheritance — via real spawn() calls against fake shell-script workers. No mocks: real fork, real signals, real env propagation, real audit file writes. test/fixtures/supervisor-runner.ts (new, 55 lines): A standalone bun script that constructs a MinionSupervisor from env vars (SUP_PID_FILE / SUP_CLI_PATH / SUP_MAX_CRASHES / SUP_BACKOFF_FLOOR_MS / SUP_HEALTH_INTERVAL_MS / SUP_ALLOW_SHELL_JOBS / SUP_AUDIT_DIR) and calls start(). Mock engine returns empty rows for executeRaw (health check path still exercised without Postgres). 
Tests spawn this as a subprocess because MinionSupervisor.start() calls process.exit() on shutdown — it can't run in the test runner's own process.

test/supervisor.test.ts (existing; 91 → 300 lines):
- Added IntegrationHarness helper: creates a unique tmpdir per test, a fake worker shell script, a PID-file path, and an audit-dir path; cleanup runs in finally.
- spawnSupervisor() forks bun on the runner with env vars set.
- readAudit() reads the supervisor-YYYY-Www.jsonl file via the existing readSupervisorEvents() helper (Lane C), threading GBRAIN_AUDIT_DIR through so tests don't collide on ~/.gbrain.
- waitFor(pred, timeoutMs) polling helper for event-driven tests.

Four integration tests (with _backoffFloorMs=5 for <1s suite runs):

1. "respawns the worker after a crash and eventually exits with max-crashes code=1"
   Worker always `exit 1`. maxCrashes=3. Asserts: exit code 1, PID file cleaned up, audit contains started + 3x worker_spawned + 3x worker_exited + max_crashes_exceeded + shutting_down + stopped, and the stopped event carries {reason:'max_crashes', exit_code:1}. Locks in blockers 1 (PID lock), 2+3+6 (health SQL doesn't 500), 5 (unified shutdown emits right events), F8 (spawn errors counted).

2. "receives SIGTERM while sleeping between crashes and exits 0 cleanly"
   Worker always `exit 1`, backoff floor 800ms to catch the sleep. Asserts: SIGTERM during backoff → exit code 0 (not 1) in <5s, no signal kill (process.exit via shutdown), audit contains shutting_down {reason:'SIGTERM'} + stopped, PID file cleaned up. Locks in eng Issue 1 (unified exit path), eng Issue 3 (signal handlers don't accumulate across shutdowns).

3. "strips inherited GBRAIN_ALLOW_SHELL_JOBS when allowShellJobs=false, even if parent has it set" — ⚠ CRITICAL regression test
   Parent env has GBRAIN_ALLOW_SHELL_JOBS=1. SUP_ALLOW_SHELL_JOBS=0. Worker writes $GBRAIN_ALLOW_SHELL_JOBS (or 'UNSET' if absent) to an OUT_FILE. Asserts child sees 'UNSET'.
   Locks in codex #9 + eng #8: the `else delete env.GBRAIN_ALLOW_SHELL_JOBS` branch from Lane A is load-bearing for the supervisor's security posture; this test prevents a future refactor silently re-opening the inheritance hole.

4. "DOES pass GBRAIN_ALLOW_SHELL_JOBS to child when allowShellJobs=true"
   Positive-path companion to #3. SUP_ALLOW_SHELL_JOBS=1 → worker sees '1'. Confirms the else-branch doesn't over-strip and that operators who explicitly opt in still get shell-exec enabled.

Plus two audit-format unit tests:
- computeSupervisorAuditFilename format (regex match)
- Year-boundary ISO week: 2027-01-01 → supervisor-2026-W53.jsonl (matches the shell-audit.ts pattern exactly)

Before: 7 tests covering backoff math + PID helpers (~15% behavioral coverage per eng review).
After: 13 tests across all critical lifecycle paths (crash-restart, max-crashes, SIGTERM, env-inheritance, audit rotation).

All 146 tests in supervisor + minions + doctor suites green in ~8s.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* chore: bump version and changelog (v0.20.2)

Lane F of PR #364 review fixes. Closes the multi-lane plan with release hygiene: VERSION bump 0.19.0 → 0.20.2, package.json sync, CHANGELOG entry in GStack voice with release summary + "numbers that matter" table + "To take advantage of v0.20.2" migration block + itemized changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* fix: escape template-literal interpolation in supervisor --help

The --help body in src/commands/jobs.ts is one big backtick template literal. The supervisor subcommand description I added in Lane B used both `${GBRAIN_AUDIT_DIR:-~/.gbrain/audit}` (parsed as a template interpolation into an undefined variable) and inline `code` backticks (parsed as nested template literals). CI caught it with ~200 tsc parse errors across the file.

Fix:
- Escape `${...}` → `\${...}` so the audit-file path renders literally.
- Replace prose inline-code backticks with plain single-quote fences (`gbrain jobs work` → 'gbrain jobs work', `~/.gbrain/supervisor.pid` → ~/.gbrain/supervisor.pid). `--help` output is human prose; the single-quote form reads cleanly in a terminal without needing to smuggle nested backticks through a template literal.

`bunx tsc --noEmit` is clean. 146 tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* chore: regenerate llms-full.txt after Lane B doc rewrite

CI drift guard caught that `llms-full.txt` didn't match the current generator output. Root cause: the Lane B rewrite of `docs/guides/minions-deployment.md` (supervisor as canonical, watchdog deleted) changed content that gets inlined into `llms-full.txt`, but I didn't run `bun run build:llms` to regenerate.

`bun test test/build-llms.test.ts` now clean (7/7 pass).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

---------

Co-authored-by: root <root@localhost>
Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
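The audit files above are named by ISO week (`supervisor-YYYY-Www.jsonl`), and the year-boundary unit test pins 2027-01-01 to `supervisor-2026-W53.jsonl`. That follows from the standard ISO-8601 rule: a date belongs to the week that contains its Thursday. A minimal sketch of the math; the function name comes from the commits above, but this body is an illustration, not the shipped implementation:

```typescript
// Sketch of ISO-8601 week labeling for the audit filename.
// computeSupervisorAuditFilename is named in the commits above; this body is
// an illustrative reimplementation, not the code that shipped.
function computeSupervisorAuditFilename(d: Date): string {
  // Work in UTC so the label doesn't shift with the host timezone.
  const t = new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate()));
  const isoDay = t.getUTCDay() || 7;          // ISO day: Mon=1 .. Sun=7
  t.setUTCDate(t.getUTCDate() + 4 - isoDay);  // jump to this ISO week's Thursday
  const year = t.getUTCFullYear();            // the Thursday fixes the ISO year
  const yearStart = Date.UTC(year, 0, 1);
  const week = Math.ceil(((t.getTime() - yearStart) / 86_400_000 + 1) / 7);
  return `supervisor-${year}-W${String(week).padStart(2, "0")}.jsonl`;
}

// 2027-01-01 is a Friday, so its week's Thursday falls on 2026-12-31:
// the label is supervisor-2026-W53.jsonl, the year-boundary case the test pins.
```

The Thursday jump is why the year boundary works at all: the label uses the Thursday's year, not the date's own year.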
garrytan added a commit that referenced this pull request on Apr 25, 2026
…l, parent-scope chunking (#422)

* feat: v0.18.0 baseline — code indexing + multi-repo (Layer 0)

Tree-sitter-based code chunker for TS/JS/Python/Ruby/Go. Splits code at semantic boundaries (functions, classes, types, exports). Each chunk includes a structured header for embedding context.

Multi-repo config: gbrain repos add/list/remove, gbrain sync --all. Strategy-aware sync: markdown (default), code, or auto. New PageType 'code' for code file pages.

This is Layer 0 of the v0.18.0 code-indexing plan (see ~/.claude/plans cathedral plan). Subsequent layers add: tests, bun --compile WASM embedding + CI guard (A1), schema migrations v16 (pages.repo_name) + v17 (content_chunks code metadata), per-repo sync bookmarks, runCycle multi-repo, Chonkie chunker parity (E2a), incremental chunking (E2), doc↔impl linking (E1), markdown fence extraction (E3), symbol navigation commands (code-def, code-refs), cost preview, BrainBench code category, CHANGELOG, migration file, docs.

Backward compatible: no config changes = existing behavior preserved.

* feat: v0.19.0 Layer 1 — tests for baseline + errors envelope + version bump

Adds the structured error envelope (src/core/errors.ts) that downstream v0.19.0 commands (code-def, code-refs, sync --all cost preview, importCodeFile) all hand back to agents. The envelope follows the v0.17.0 CycleReport.PhaseResult.error shape so agent-consumption stays consistent across every gbrain surface.

Test coverage for Wintermute's baseline (added in Layer 0):
- test/errors.test.ts — envelope helper + GBrainError + serializeError
- test/multi-repo.test.ts — config CRUD, dedup, file permissions
- test/sync-strategy.test.ts — isSyncable strategy matrix + include/exclude globs + slugifyCodePath + pathToSlug with pageKind

Bug fixes uncovered by the new tests:
- src/core/sync.ts: globToRegex handles `src/**/*.ts` matching `src/foo.ts` (zero intermediate dirs). `**/` now compiles to `(?:.*/)?` instead of `.*/`.
  Also `?` now matches only non-slash chars (was `.`).
- src/core/config.ts: configDir() respects the GBRAIN_HOME env override so tests can isolate ~/.gbrain/. Matches the GBRAIN_AUDIT_DIR convention. Bun's os.homedir() ignores $HOME on macOS, so we need an explicit override variable.

Version bump: package.json 0.18.2 → 0.19.0. v0.18.0-2 were already released (multi-source brains + RLS + migration hardening), so the next free minor for code indexing is 0.19.0. Wintermute's baseline author label of 0.16.4 had been stale since v0.17.0 shipped; no user-visible regression from the jump.

Per the rebased cathedral plan: Wintermute's multi-repo.ts and repos CLI are preserved at the baseline but will be superseded in Layer 4 by the v0.18.0 sources system (src/core/source-resolver.ts, src/commands/sources.ts). multi-repo tests stay valid for the baseline and will be removed alongside the code they cover.

* feat: v0.19.0 Layer 2 — bun --compile WASM embedding + CI guard

The single highest-risk change in v0.19.0 code indexing. Before this, the chunker loaded WASMs via `new URL('../../../node_modules/...', import.meta.url)`, which silently breaks in the compiled binary (no node_modules at runtime). Users would see degraded chunking quality with no error, just fallback recursive chunks instead of real semantic chunks. Codex flagged this as the #1 silent-failure mode.

Mechanics:
- `src/assets/wasm/tree-sitter.wasm` + 36 grammar WASMs committed to the repo (50MB). Not a small check-in, but the alternative is a postinstall script that runs before every dev bun run and fails fragilely on network errors.
- `src/core/chunkers/code.ts` uses Bun's `import ... with { type: 'file' }` import attribute. At runtime the imported value is a file path — the actual repo path in dev, a bundler-synthesized path in the compiled binary. The tree-sitter runtime's `Language.load(path)` reads it the same way in both cases.
- Layer 2 keeps the 6-language support Wintermute shipped (TS/TSX/JS/Py/Rb/Go).
  Layer 5 (E2a chunker parity) expands to all 36 bundled grammars.
- CHUNKER_VERSION=2 constant introduced. importCodeFile will fold this into content_hash in Layer 3 so chunker-shape changes across releases force clean re-chunks without the user needing `sync --force`.

CI guard — `scripts/check-wasm-embedded.sh` + `scripts/chunker-smoketest.ts`:
- Compiles a smoketest binary that calls chunkCodeText on a known TS snippet.
- Asserts the output has `has_real_symbols: true`, a `[TypeScript]` language tag, and the expected symbol name.
- If the chunker silently falls through to recursive chunks, the assertions fail the build.
- Wired into `bun test` via the package.json script pipeline. Also exposed as `bun run check:wasm` for standalone invocation.

Verification:
- Dev: `bun -e '...'` smoke test returns 2 chunks with correct symbol names in under 100ms.
- Compiled: `bash scripts/check-wasm-embedded.sh` passes end to end.
- Binary size: the gbrain binary grows from ~90MB to ~140MB, dominated by the 50MB of grammar WASMs. Still well within normal for CLIs that ship a language runtime.

* feat: v0.19.0 Layer 3 — schema migrations for page_kind + chunk code metadata

Adds two migrations to unblock C6/C7 (query --lang, code-def, code-refs) and the orphans/auto-link branching in later layers.

v25 (pages_page_kind):
- ALTER TABLE pages ADD COLUMN page_kind TEXT NOT NULL DEFAULT 'markdown' CHECK (page_kind IN ('markdown','code'))
- Postgres path uses ADD CONSTRAINT ... NOT VALID + VALIDATE CONSTRAINT in a separate statement so tables with millions of pages don't hold a write lock during the initial check. PGLite has no concurrent writers, so its variant uses the simpler ALTER TABLE pattern.
- Existing rows carry DEFAULT 'markdown' — pre-v0.19 brains were markdown-only by definition.

v26 (content_chunks_code_metadata):
- ALTER TABLE content_chunks ADD COLUMN language, symbol_name, symbol_type, start_line, end_line (all nullable).
- Two partial indexes: idx_chunks_symbol_name WHERE symbol_name IS NOT NULL, and idx_chunks_language WHERE language IS NOT NULL. Only code chunks populate these columns, so partial indexes stay small even on a 50K-chunk brain with mixed markdown+code.
- Markdown chunks leave all five columns NULL. Only importCodeFile populates them, from the tree-sitter AST via chunkCodeText.

Wiring (both engines):
- PageInput gains `page_kind?: PageKind` ('markdown' | 'code'). Defaults to 'markdown' when omitted so existing callers don't change. putPage on both engines writes it through, with ON CONFLICT DO UPDATE updating page_kind alongside the other fields.
- ChunkInput gains language, symbol_name, symbol_type, start_line, end_line (all optional). upsertChunks on both engines writes them through. Existing markdown call sites pass nothing and get NULLs — zero behavior change for markdown pages.

importCodeFile updates:
- Sets page_kind='code' on the PageInput.
- Populates chunk metadata from the chunker's CodeChunk.metadata for every chunk it persists. Columns line up 1:1 with the tree-sitter AST output already produced by the chunker.
- Folds CHUNKER_VERSION=2 into content_hash so chunker shape changes across releases force clean re-chunks without `sync --force`. The hash was previously {title, type, content, lang} — now also chunker_version.

Fresh-install path (src/schema.sql + pglite-schema.ts):
- Both include the page_kind column + CHECK constraint.
- Both include the five new content_chunks columns.
- Both ship the partial indexes so new brains have the same query performance as migrated brains.

Ran `bun run build:schema` to regenerate src/core/schema-embedded.ts from schema.sql.

Naming: renamed our new Error subclass in src/core/errors.ts from GBrainError to StructuredAgentError.
The legacy GBrainError in src/core/types.ts predates this change and has a different shape (positional problem/cause/fix arguments) — keeping both under the same name was inviting a year of import ambiguity. New v0.19.0 surfaces use StructuredAgentError + the serializeError() helper.

Tests:
- test/migrations-v0_19_0.test.ts — 12 cases. Covers: MIGRATIONS array shape (v25/v26 presence, NOT VALID pattern on Postgres, partial index WHERE clauses), fresh-install schema (page_kind default, CHECK constraint rejects invalid values, chunk metadata nullable), putPage round-trip (markdown default + code explicit), upsertChunks round-trip (code metadata preserved + markdown chunks leave NULLs).
- All 139 existing + new unit tests pass on PGLite (1.5 sec).

* feat: v0.19.0 Layer 4 — delete Wintermute's multi-repo, wire sources

Replaces Wintermute's short-lived repos abstraction with the v0.18.0 sources subsystem. Codex flagged this during plan review: v0.18.0's sources table had already shipped the right shape (per-source last_commit, federated search config, RLS-friendly) while Wintermute coded against a ~/.gbrain/config.json repos array. Two systems solving one problem.

Keep the surface, swap the backend:
- src/cli.ts: `gbrain repos` routes through runSources with a one-line deprecation nudge on stderr. Scripts like `gbrain repos list` and `gbrain repos add .` keep working against the sources table. Removed the pre-engine-connect branch and added a case inside the handleCliOnly switch so repos gets the DB connection it now needs.
- src/cli.ts help text: new SOURCES section replaces MULTI-REPO. References the canonical `sources` commands with `repos` tagged DEPRECATED.

sync --all — was iterating ~/.gbrain/config.json repos; now iterates sources rows with local_path IS NOT NULL:
- Reads id, name, local_path, config jsonb via executeRaw.
- Honors config.syncEnabled=false (matching Wintermute's opt-out).
- Honors config.strategy for per-source markdown/code/auto filtering.
- Passes sourceId through to performSync so last_commit tracking lands on the right sources row (was clobbering a global bookmark before).

Deletions:
- src/core/multi-repo.ts deleted (120 lines of config CRUD now handled by sources table + RLS).
- src/commands/repos.ts deleted (121 lines of CLI parsing now handled by src/commands/sources.ts).
- test/multi-repo.test.ts deleted (25 tests against the deleted module; the schema-backed behavior is covered by test/sources.test.ts from v0.18.0 + test/repos-alias.test.ts added here).
- src/core/config.ts: removed the `repos` field from GBrainConfig. Legacy installs with `repos` in ~/.gbrain/config.json will see that key ignored; no migration written because zero users are on that path (Wintermute's commit never shipped on master).

Tests:
- test/repos-alias.test.ts — round-trips add/list/remove through runSources to verify the alias path works. Also asserts the deleted module is actually gone (catches accidental resurrection during rebase conflicts).
- All 162 prior unit tests + 2 new = 164 pass on PGLite.

Codex's P0 #2 (per-repo sync state) and P0 #3 (slug collision) are both resolved here — sources.last_commit scopes bookmarks per source, and pages.slug uniqueness is (source_id, slug), which is what the v0.18.0 schema already shipped.

* feat: v0.19.0 Layer 5 — Chonkie chunker parity (E2a)

Expands Wintermute's 6-language chunker to 29 languages, swaps the heuristic tokenizer for the real thing, and adds small-sibling merging so a file of 20 tiny const declarations doesn't produce 20 embedding calls. This closes the Chonkie gap Garry called out in CEO review.

Language coverage — 6 → 29:
- Added grammars: rust, java, c_sharp, cpp, c, php, swift, kotlin, scala, lua, elixir, elm, ocaml, dart, zig, solidity, bash, css, html, vue, json, yaml, toml. All shipping in src/assets/wasm/ (committed in Layer 2). Bun's --compile bundles every import attributes path, so the compiled binary carries every grammar.
- TOP_LEVEL_TYPES populated for the 13 most-used new languages (rust, java, c_sharp, cpp, c, php, swift, kotlin, scala, lua, elixir, bash, solidity) + the original 6. Tree-sitter loads the grammar but the chunker falls through to recursive chunking when TOP_LEVEL_TYPES isn't set — still correct output, just less semantic. Every grammar ships with a working fallback.
- detectCodeLanguage extended for 29 extension families including .mts/.cts (TypeScript), .cc/.hpp/.cxx (C++), .kt/.kts (Kotlin), .scala/.sc (Scala), .ex/.exs (Elixir), etc.
- DISPLAY_LANG table lookup replaces the inline 6-entry map; structured headers now read '[Rust]', '[C#]', '[PHP]' etc.

Accurate tokenizer:
- @dqbd/tiktoken with cl100k_base encoding (same encoder text-embedding-3-large uses). Lazy-loaded on first call via require() so dev and compiled binary share the init path.
- Falls back to the old len/4 heuristic only if the encoder fails to initialize (vanishingly unlikely — keeps the chunker available instead of throwing).
- Existing estimateTokens call sites (large-node threshold + sub-range splitting + new merge pass) all now see real counts. Real code is 2-3x more token-dense than prose; the old heuristic systematically under-split, so large functions sometimes exceeded the embedding API's 8191-token hard cap.

Small-sibling merging:
- New mergeSmallSiblings post-pass runs on the chunk list after tree-sitter extraction.
- Adjacent chunks under 40% of chunkSizeTokens get accumulated into one merged chunk up to the full budget.
- Large chunks (functions, classes) pass through untouched.
- Merged chunks get symbolName=null, symbolType='merged', startLine/endLine spanning the group. The header reads: '[Lang] path:N-M merged (K siblings)' so retrieval can still show coherent context.
- Mirrors Chonkie's CodeChunker._group_child_nodes() + bisect_left accumulation. A Go file with 30 top-level imports + 5 functions no longer produces 30 separate import chunks.
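The merge pass above reduces to a single left-to-right accumulation over the chunk list. A hedged sketch: the mergeSmallSiblings name, the 40% threshold, and the merged-chunk metadata rules come from the commit text, but the Chunk shape and field names here are illustrative assumptions, not the shipped code:

```typescript
// Illustrative sketch of the small-sibling merge pass (not the shipped code).
// Adjacent chunks under `threshold` of the token budget accumulate into one
// merged chunk; large chunks (functions, classes) pass through untouched.
interface Chunk {
  text: string;
  tokens: number;
  symbolName: string | null;
  symbolType: string | null;
  startLine: number;
  endLine: number;
}

function mergeSmallSiblings(chunks: Chunk[], budget: number, threshold = 0.4): Chunk[] {
  const small = budget * threshold;
  const out: Chunk[] = [];
  let group: Chunk[] = [];

  const flush = () => {
    if (group.length === 0) return;
    if (group.length === 1) { out.push(group[0]); group = []; return; }
    out.push({
      text: group.map((c) => c.text).join("\n"),
      tokens: group.reduce((n, c) => n + c.tokens, 0),
      symbolName: null,               // merged chunks lose a single symbol name
      symbolType: "merged",
      startLine: group[0].startLine,  // line range spans the whole group
      endLine: group[group.length - 1].endLine,
    });
    group = [];
  };

  for (const c of chunks) {
    const groupTokens = group.reduce((n, g) => n + g.tokens, 0);
    if (c.tokens < small && groupTokens + c.tokens <= budget) {
      group.push(c); // keep accumulating tiny siblings up to the full budget
    } else {
      flush();
      if (c.tokens < small) group.push(c); // starts a new small group
      else out.push(c);                    // substantive symbol: pass through
    }
  }
  flush();
  return out;
}
```

With this shape, the Layer 8 retune (40% → 15%) is just a change to the threshold argument; the accumulation itself stays identical.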
CHUNKER_VERSION bumped 2 → 3:
- Any existing v0.18.x brain with code pages will re-chunk on next sync because content_hash folds CHUNKER_VERSION in. Without the bump, stale (2-3x token-off, non-merged) chunks would persist forever until a manual 'sync --force'.

CI guard + smoketest updates:
- scripts/chunker-smoketest.ts replaced the tiny hello/Foo/Id fixture with a realistic TS snippet (calculateScore with branches + UserRegistry class) so at least one chunk has a concrete symbol name — small-sibling merging would otherwise collapse the old fixture and fail the assertion.
- scripts/check-wasm-embedded.sh assertions updated: check has_symbol_names:true (at-least-one-real-symbol), still verify the [TypeScript] header and specifically the calculateScore symbol.

Tests — test/chunkers/code.test.ts (15 cases):
- CHUNKER_VERSION=3 shape assertion (guards silent re-chunking across releases).
- detectCodeLanguage across 29 extensions + unknown + case-insensitive.
- chunkCodeText on TypeScript / Python / Rust / Go producing chunks with correct language tag + symbol names.
- Fallback path for unsupported extension produces recursive-chunk module-kind output.
- Small-sibling merging: 5 tiny consts → 1-2 chunks; big function passes through untouched; merged chunk line range spans group.
- Structured header shape: starts with [Lang], contains file path, line range, symbol name.
- Empty input returns empty array.

All 177 unit tests pass + CI guard on compiled binary passes.

* feat: v0.19.0 Layer 6 — incremental chunking + doc↔impl linking

Two expansions from the plan's E1 + E2. E3 (markdown fence extraction) deferred to a follow-up PR — the feature surface is small and doesn't block the main cathedral.

E1 — Design-doc ↔ implementation linking:
- New extractCodeRefs() in src/core/link-extraction.ts. Scans markdown prose for references like 'src/core/sync.ts:42'.
  Anchored on a prefix allowlist (src|lib|app|test|tests|scripts|docs|packages|internal|cmd|examples) + the 39-extension code file list so random phrases like 'foo/bar.js' don't generate false-positive edges. Dedups by path (first occurrence wins).
- importFromContent writes bidirectional edges for every code ref found in compiled_truth + timeline:
    markdown_slug --[documents]--> code_slug
    code_slug --[documented_by]--> markdown_slug
  Both use link_source='markdown', origin_page_id=markdown_slug, origin_field='compiled_truth' so runAutoLink reconciliation scopes edges correctly.
- addLink's inner SELECT naturally drops edges to non-existent pages, so a markdown guide imported before the code repo is synced writes no edges — they'll land when the code arrives via A3 reverse-scan (deferred to a follow-up since it only activates for users who sync markdown and code in opposite order).

E2 — Incremental chunking:
- importCodeFile reads existing chunks via engine.getChunks(slug) before embedding.
- Keys existing chunks by `${chunk_index}:${chunk_text}`. Any new chunk that matches verbatim at the same index reuses the existing embedding (chunk.embedding + token_count). Only new/changed chunks go to embedBatch.
- Cost impact: a daily autopilot on a stable repo touches ~2-5% of chunks on each run. E2 cuts OpenAI embedding spend by ~95% vs naive full re-embed. Stated before (Codex A2 decision) and now actually implemented.
- Uses chunk_index + chunk_text as the key (not symbol_fqn) because the tree-sitter chunker already makes chunk_index semantic — it's AST-order. A blank line at the top of a file shifts start_byte for every chunk below but leaves chunk_text identical, so the cache still hits.
- Fallback: when embedBatch throws (rate-limit, network, etc.) the existing warn-but-continue behavior stays. Un-embedded chunks land in the DB with NULL embedding; a later `embed --stale` will fix them.

Tests (test/link-extraction-code-refs.test.ts, 10 cases):
- :line suffix capture.
- Prefix allowlist (11 directories).
- Extension recognition (39 extensions).
- Rejects paths outside allowlisted prefixes.
- Rejects non-code extensions.
- Dedup by path (first occurrence wins).
- Different paths coexist.
- Real-markdown integration: guide with 4 code refs (one with line number) produces the right set of paths.
- Doesn't match URL-like strings (word-boundary behavior).

Tests (test/incremental-chunking.test.ts, 3 cases):
- Identical content re-import skips entirely (content_hash match).
- Editing ONE function in a 3-function file preserves the other two chunks verbatim (same chunk_text in DB). Verifies the cache-hit path actually works end-to-end on PGLite.
- Fresh-file import embeds all chunks (nothing to reuse).

All 189 unit tests pass on PGLite.

* feat: v0.19.0 Layer 7 — code-def + code-refs CLI surfaces

Delivers the magical-moment commands for v0.19.0 code indexing. These are the agent-facing endpoints that turn 'brain-first lookup' from a markdown-only Iron Law into something that covers code too.

gbrain code-def <symbol>:
- Queries content_chunks.symbol_name = $1 AND page_kind = 'code' AND symbol_type IN (function, class, interface, type, enum, struct, trait, module, contract, export statement).
- Orders by symbol_type rank (function first, then class, etc.) then page slug then line number — deterministic across runs.
- --lang <language> filter narrows to a single language.
- --limit N caps results (default 20).
- Returns Array<{ slug, file, language, symbol_type, start_line, end_line, snippet }> — the 7-field shape the agent persona needs.

gbrain code-refs <symbol>:
- Bypasses the standard searchKeyword path, which uses DISTINCT ON (slug) to collapse results to one chunk per page. That collapse is right for markdown search but wrong for code-refs — a single file typically has many usage sites, each interesting to the agent.
- Direct ILIKE scan over content_chunks + JOIN pages WHERE page_kind = 'code'.
  Word-boundary precision is a follow-up (would need tsvector or regex); for v0.19.0 the substring heuristic is good enough because symbol names are distinctive by design.
- Same --lang / --limit / --json flag surface as code-def.
- Returns Array<{ slug, file, language, symbol_name, symbol_type, start_line, end_line, snippet }> — 8 fields (code-def's 7 + the containing symbol_name).

Agent-DX doctrine (from DX review):
- Auto-JSON on pipe: both commands emit JSON when stdout is not a TTY (gh-CLI convention). Explicit --json forces JSON on TTY; --no-json forces human output even when piped.
- Structured error envelope: missing symbol argument returns { class: 'UsageError', code: '..._requires_symbol', hint: '...' } serialized as JSON in non-TTY mode, plain message in TTY. Catch-all DB error path uses serializeError() — no raw stack traces leak to the agent.

Tests — test/code-def-refs.test.ts (10 cases):
- Seeds a fixture repo (two TS files with deliberately large symbols to stay independent under small-sibling merging).
- findCodeDef:
  - Resolves interface + function by name to the right file.
  - Empty-symbol query returns [].
  - Language filter narrows to typescript; python returns [].
- findCodeRefs:
  - Finds multiple usage sites across files (both src/engine.ts and src/sync.ts appear when searching for BrainEngine — this is the DISTINCT ON bypass working).
  - Deterministic ordering by slug + line number.
  - Unknown symbol returns [].
  - --limit caps result count.
  - Snippets are <= 500 chars (the agent doesn't get flooded).

CLI wiring:
- Added 'code-def', 'code-refs' to CLI_ONLY.
- New switch cases in handleCliOnly call runCodeDef / runCodeRefs.
- Help text gains a CODE INDEXING (v0.19.0) section.

All 199 unit tests pass.

Deferred from Layer 7 per the cathedral plan:
- sync --all cost preview with TTY detection — requires folding the tokenizer into the sync path. Pushed to a follow-up.
- query --lang filter — requires changes to src/core/search/*.ts. Pushed to a follow-up.
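The auto-JSON-on-pipe rule above reduces to a small pure decision over TTY state and the two flags. A hedged sketch: only the rule itself (JSON when piped, --json forces, --no-json opts out) comes from the commit text; the function name and flag plumbing are illustrative:

```typescript
// Sketch of the output-mode decision for code-def / code-refs (illustrative
// names; the rule is the gh-CLI convention described in the commit text).
type OutputMode = "json" | "human";

function resolveOutputMode(opts: {
  isTTY: boolean;   // e.g. Boolean(process.stdout.isTTY)
  json?: boolean;   // explicit --json
  noJson?: boolean; // explicit --no-json
}): OutputMode {
  if (opts.noJson) return "human";      // explicit opt-out wins, even when piped
  if (opts.json) return "json";         // explicit opt-in wins, even on a TTY
  return opts.isTTY ? "human" : "json"; // default: auto-JSON on pipe
}
```

Keeping this as a pure function over `isTTY` (rather than reading `process.stdout` inline) is what makes the behavior trivially unit-testable without a real terminal.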
* feat: v0.19.0 Layer 8 — BrainBench code category (E2E)

Retrieval-quality gate for v0.19.0 code indexing. Seeds a ~25-file fictional corpus across 5 languages (TS, Python, Go, Rust, Java), imports each via importCodeFile, and asserts code-def + code-refs produce the expected shape. Runs against PGLite in-memory so no OpenAI key or external Postgres is needed; reproducible on CI with just Bun.

What the E2E covers:
- Corpus seeded: 25+ code pages, all page_kind='code'.
- code-def finds AuthService across multiple languages (≥2 of TS/Rust/Java).
- code-def --lang typescript filters precisely (P@5=1.0 for CacheService + typescript).
- code-refs surfaces multiple usage sites across files (the DISTINCT ON bypass working in practice).
- code-refs over the shared "start" method across 5 languages produces ≥3 language hits (ranking stability).
- Magical-moment assertion: code-refs completes in <500ms on a 25-file corpus (budget is 100ms; 500ms pad absorbs CI variance).
- MRR sanity: top result for exact symbol is the defining file.
- Edge cases: non-existent symbol returns [], not error. Language filter with zero matches returns []. Re-import is idempotent.

Chunker retune:
- Small-sibling merge threshold dropped from 40% to 15% of chunkSizeTokens. The 40% figure was collapsing 3-method classes into 'merged' chunks, killing symbol_name lookups for the entire class. 15% matches the original intent: merge truly tiny declarations (const X = 1; import ... from ...;) while leaving substantive symbols (functions, classes) independent. Verified by the BrainBench test — AuthService is now its own chunk with symbol_name='AuthService', so findCodeDef('AuthService') resolves.
- Unit test updated: 10 consts with a generous chunkSizeTokens=1000 still exercise the merge path.

Total v0.19.0 unit + E2E coverage: 91 tests across 9 new test files, 357 assertions, all green.

* feat: v0.19.0 Layer 9 — release: CHANGELOG + migration + docs

Closes out the v0.19.0 cathedral.
Total shipped across 10 layers:
- 91 new unit + E2E tests (9 new files, 357 assertions, all green)
- 2 schema migrations (v25 pages.page_kind + v26 content_chunks code metadata)
- 4 new CLI surfaces (repos [alias] + code-def + code-refs + sources passthrough)
- 1 new core module (src/core/errors.ts)
- 36 tree-sitter grammar WASMs embedded via Bun --compile
- 1 CI guard preventing silent-chunker regression
- Wintermute's multi-repo replaced with v0.18.0 sources backend

CHANGELOG.md — release-summary section in the GStack/Garry voice per CLAUDE.md "Release-summary template": bold two-line headline + lead paragraph + "The numbers that matter" table + "What this means for builders" + itemized changes + "To take advantage of v0.19.0" block. No em dashes, no AI vocabulary, no banned phrases. Numbers are from the v0.19.0 test-fixture benchmarks.

CLAUDE.md — four new file entries in the Key files section (src/core/chunkers/ annotated with v0.19.0 additions, src/core/errors.ts, src/assets/wasm/, src/commands/code-def.ts + code-refs.ts).

skills/migrations/v0.19.0.md — agent-readable migration walkthrough per the v0.11.0 convention. Tells the agent what to do after `gbrain upgrade` runs the orchestrator: verify schema v26, register a code source via `gbrain sources add`, run `sync --source <id>`, confirm `gbrain code-def` / `code-refs` both work. Notes the deprecated `gbrain repos` alias for scripts that used Wintermute's baseline. Flagged in pending-host-work.jsonl per the v0.11.0 convention so headless agents surface the prompt.

VERSION — 0.18.2 → 0.19.0.

All 91 v0.19.0 tests + the CI guard pass.

* docs: v0.19.0 — add 4 deferred follow-ups to TODOS.md

Lands the four items the v0.19.0 cathedral explicitly scoped out but that the /plan-ceo-review + /plan-devex-review + /plan-eng-review chain identified as genuine follow-ups rather than abandoned ideas.

Items added under a new 'code-indexing (v0.19.0 follow-ups)' section:
- P1 — sync --all cost preview with TTY detection.
  Closes DX fix #1 from the /plan-devex-review pass: the agent persona can't respond to stdin prompts. Non-TTY path must emit a parseable ConfirmationRequired envelope; TTY path uses [y/N]. File refs: src/commands/sync.ts:590, src/core/chunkers/code.ts estimateTokens, src/core/errors.ts buildError.
- P2 — query --lang filter through src/core/search/*.ts. Column ships in v0.19.0 (migration v26 + partial index); the query path just needs to respect it. Keeps ranking honest when the user knows the language. File refs: src/core/search/, pglite-engine searchKeyword, test/e2e/code-indexing.test.ts language-filter pattern.
- P2 — E3 markdown code-fence extraction. After parseMarkdown, iterate marked's lexer tokens for { type: 'code', lang, text } and chunk each through chunkCodeText with chunk_source='fenced_code'. ~40% of gbrain's brain is guides with substantial inline code — this lands those fences as first-class TS/Python/Go chunks in search instead of treating them as prose.
- P2 — A3 reverse-scan backfill for doc↔impl. Companion piece to E1. Markdown-first → code-later import order currently loses edges because addLink's JOIN drops them when the code page doesn't exist yet. A3 makes importCodeFile scan existing markdown for references to the new code path and backfill edges both directions. Trade-off: per-file scan is expensive on first sync; batch 'gbrain reconcile-links' is an alternative shape.

Each entry follows the CLAUDE.md TODOS format: What/Why/Pros/Cons/Context with exact file refs/line numbers/Effort (S/M/L + human vs CC)/Depends on. All four are purely additive on top of v0.19.0 — nothing blocks.

* fix: pre-existing test infrastructure + typecheck drift

Three pre-existing conditions surfaced when running the full suite and blocked a clean CI floor for Cathedral II work:

1. `bun run test` default 5s hook timeout fails under load. PGLite WASM init can exceed 5s when many test files spin up instances in parallel.
The bunfig.toml `timeout = 60_000` key is honored by `bun test` but does not propagate to beforeEach/afterEach hooks when `bun test` runs behind `bun run typecheck` in the CI chain. Passing `--timeout=60000` explicitly on the command line covers both per-test and per-hook timeouts.

Before: 2136 pass / 30 fail (on-branch baseline)
After: 2272 pass / 0 fail

All 30 failures were `beforeEach/afterEach hook timed out for this test` → `TypeError: undefined is not an object (evaluating 'engine.disconnect')` — i.e. the hook never finished connecting PGLite, so the engine variable was never assigned, and afterEach tripped on `engine.disconnect()`. The new timeout gives PGLite WASM init enough headroom under concurrent load.

2. `test/repos-alias.test.ts` references the deliberately-deleted `src/core/multi-repo.ts` via a dynamic import inside a try/catch (the test asserts the module is no longer importable at runtime). TS 5.x module resolution flags this at typecheck time even inside try/catch. Building the path at runtime (`'../src/core/' + 'multi-repo.ts'`) keeps TS's compile-time module resolution from failing on a path the test is EXPLICITLY verifying doesn't resolve.

3. `llms-full.txt` drifted from `bun run build:llms` output (earlier CLAUDE.md updates in v0.19.0 never regenerated it). `bun run build:llms` now produces matching output.

Zero behavior changes to production code. Test infrastructure only.

* feat: v0.20.0 Cathedral II Layer 1 — Foundation schema migration

Layer 1 of 14 for the v0.20.0 "best code search in the world" cathedral. Ships all Cathedral II DDL atomically so downstream layers have the columns + tables + trigger they depend on. Schema-only; no consumer behavior changes until Layer 5 (A1 edge extractor).

Reordered to Layer 1 after the codex second-pass review (SP-4): previously Layer 0b (chunk-grain FTS trigger) referenced columns added in the former Layer 3 (Foundation), breaking bisectability.
All schema DDL now lands first; every subsequent layer's prerequisites exist.

### What this migration adds (one idempotent v27 transaction)

1. `content_chunks` gains 4 new columns:
   - `parent_symbol_path TEXT[]` — scope chain for nested symbols (A3)
   - `doc_comment TEXT` — extracted JSDoc/docstring (A4)
   - `symbol_name_qualified TEXT` — 'Admin::UsersController#render' (A1)
   - `search_vector TSVECTOR` — chunk-grain FTS (Layer 1b consumer)

   All nullable; markdown chunks leave them NULL.

2. `sources.chunker_version TEXT` (SP-1 gate). Layer 10 will check this against CURRENT_CHUNKER_VERSION and force a full sync walk on mismatch, bypassing the git-HEAD up_to_date early-return that would otherwise make a bare CHUNKER_VERSION bump a silent no-op.

3. `code_edges_chunk` — resolved call-graph + reference edges.
   - `from_chunk_id` + `to_chunk_id` with FK CASCADE from content_chunks
   - UNIQUE (from_chunk_id, to_chunk_id, edge_type) holds idempotency
   - `source_id TEXT` matches `sources.id`'s actual type (codex F4 caught the prior UUID typo)
   - source scoping is enforced in resolution logic, not the key, because from_chunk_id → pages.source_id already determines it

4. `code_edges_symbol` — unresolved refs. The target symbol is known by qualified name; the defining chunk hasn't been seen yet. Rows UNION with code_edges_chunk on read (codex 1.3b); no promotion step (SP-7).

5. `update_chunk_search_vector` trigger — BEFORE INSERT/UPDATE OF (chunk_text, doc_comment, symbol_name_qualified). Weights doc_comment and symbol_name_qualified at 'A', chunk_text at 'B'. Natural-language queries rank doc-comment hits above body text (the A4 intent, delivered via the trigger from day one even though Layer 5 populates the doc_comment column).
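The shape of the v27 DDL can be sketched as a migration-registry string. This is a sketch, not the shipped migration: column names come from the description above, but ID types, constraint names, and the exact trigger body are assumptions.

```typescript
// Hypothetical sketch of the v27 DDL shape; integer chunk IDs and the
// plpgsql trigger body are assumptions, only the column/table names are
// taken from the migration description.
export const V27_SKETCH = `
ALTER TABLE content_chunks
  ADD COLUMN IF NOT EXISTS parent_symbol_path TEXT[],
  ADD COLUMN IF NOT EXISTS doc_comment TEXT,
  ADD COLUMN IF NOT EXISTS symbol_name_qualified TEXT,
  ADD COLUMN IF NOT EXISTS search_vector TSVECTOR;

ALTER TABLE sources ADD COLUMN IF NOT EXISTS chunker_version TEXT;

CREATE TABLE IF NOT EXISTS code_edges_chunk (
  from_chunk_id INTEGER NOT NULL REFERENCES content_chunks(id) ON DELETE CASCADE,
  to_chunk_id   INTEGER NOT NULL REFERENCES content_chunks(id) ON DELETE CASCADE,
  edge_type     TEXT    NOT NULL,
  source_id     TEXT,
  UNIQUE (from_chunk_id, to_chunk_id, edge_type)
);

-- Unresolved refs: target known only by qualified name.
CREATE TABLE IF NOT EXISTS code_edges_symbol (
  from_chunk_id       INTEGER NOT NULL REFERENCES content_chunks(id) ON DELETE CASCADE,
  to_symbol_qualified TEXT    NOT NULL,
  edge_type           TEXT    NOT NULL,
  UNIQUE (from_chunk_id, to_symbol_qualified, edge_type)
);

-- Weight shape: doc_comment + qualified name at 'A', body at 'B'.
CREATE OR REPLACE FUNCTION update_chunk_search_vector() RETURNS trigger AS $$
BEGIN
  NEW.search_vector :=
    setweight(to_tsvector('english', coalesce(NEW.doc_comment, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(NEW.symbol_name_qualified, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(NEW.chunk_text, '')), 'B');
  RETURN NEW;
END $$ LANGUAGE plpgsql;
`;
```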
### Engine interface + types

- `BrainEngine` gains 6 new methods for code edges, all stubbed in both engines with explicit NotImplemented errors pointing at the layer that will fill them (5, 7, or 1b): addCodeEdges, deleteCodeEdgesForChunks, getCallersOf, getCalleesOf, getEdgesByChunk, searchKeywordChunks
- `CodeEdgeInput`, `CodeEdgeResult` types added to src/core/types.ts
- `SearchOpts` extended with Cathedral II fields: language, symbolKind, nearSymbol, walkDepth, sourceId (all optional; consumers wire in Layer 5/7/10)
- `ChunkInput` extended with: parent_symbol_path, doc_comment, symbol_name_qualified (populated by importCodeFile in Layer 5/6)
- `Chunk` read shape mirrors the added columns as optional fields
- `chunk_source` union widens to include 'fenced_code' for D2 fence extraction (Layer 6 consumer)

### Tests

`test/migrations-v0_20_0.test.ts` — 17 structural assertions against the v27 migration registry. Covers every column + table + index + the trigger weight shape. E2E migration-application coverage lands in `test/e2e/cathedral-ii.test.ts` alongside Layer 5.

### Status

- CEO + Eng + 2 codex passes CLEARED (see docs/designs/CODE_CATHEDRAL_II.md)
- 16 cross-model findings absorbed (7 codex pass 1 + 6 codex pass 2 + 3 eng review)
- 13 more layers to go (0a → 14); see the plan for full sequencing.

* feat: v0.20.0 Cathedral II Layer 2 (1a) — file-classifier widening + SP-5 slug dispatch

Codex F1: `sync.ts:35` in v0.19.0 classified only 9 extensions as code. Rust/Java/C#/C++/Swift/Kotlin/etc. never reached the chunker on a normal repo sync, making v0.19.0's "29 languages" claim aspirational on the read path. Layer 2 widens the classifier so every language the chunker knows (~35 extensions) actually reaches it during sync.

### Changes

1.
`src/core/sync.ts` CODE_EXTENSIONS widened from 9 to 35 extensions, matching the chunker's detectCodeLanguage coverage: adds .rs, .java, .cs, .cpp/.cc/.cxx/.hpp/.hxx/.hh, .c/.h, .php, .swift, .kt/.kts, .scala/.sc, .lua, .ex/.exs, .elm, .ml/.mli, .dart, .zig, .sol, .sh/.bash, .css, .html/.htm, .vue, .json, .yaml/.yml, .toml, .mts/.cts. 2. `src/core/sync.ts` adds `resolveSlugForPath(path)` — SP-5 fix. Before Cathedral II, sync delete/rename paths called `pathToSlug(path)` with default pageKind='markdown'. For the 9-ext classifier this was mostly fine (code files rare), but widening to 35 exts means Rust/Java/Ruby/etc. deletes and renames would mismatch on slug shape (pathToSlug markdown-style vs slugifyCodePath code-style). resolveSlugForPath dispatches on isCodeFilePath so delete/rename always hit the right page. Used in `src/commands/sync.ts` at the three slug-resolution sites (un-syncable delete, batch delete, rename from/to). 3. `src/core/chunkers/code.ts` adds `setLanguageFallback(fn)` + optional `content` arg to `detectCodeLanguage(path, content?)`. Pre-wires the Magika fallback hook that Layer 9 (B2) will consume for extension-less files (Dockerfile, Makefile, shell shebangs). Null default → no behavior change today; Layer 9 sets it at bootstrap. Fallback throws are swallowed (recursive chunker is always an acceptable degradation). ### Tests - `test/sync-classifier-widening.test.ts` — 20 cases covering the full widened extension set, resolveSlugForPath dispatch, and the Magika fallback hook contract (including throw-swallow and null-pass-through). - `test/sync-strategy.test.ts` updated: `.json` is no longer rejected (the chunker's language map includes JSON for structured-data chunking). Test clarifies Cathedral II semantics; adds .svg + .zip as non-code examples. ### CI result 2292 pass / 0 fail via `bun run test`, 388s wall time. 
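The SP-5 dispatch can be sketched as follows. The helper bodies here are simplified stand-ins for illustration: the real `pathToSlug` / `slugifyCodePath` in src/core/sync.ts handle prefixes and pageKind, and the extension set below is abbreviated.

```typescript
// Abbreviated stand-in for the widened CODE_EXTENSIONS classifier.
const CODE_EXTENSIONS = new Set([
  ".ts", ".tsx", ".js", ".py", ".rb", ".go", ".rs", ".java", ".cs",
  ".cpp", ".c", ".php", ".swift", ".kt", ".scala", ".lua", ".ex",
  ".dart", ".zig", ".sol", ".sh", ".css", ".html", ".vue", ".json",
  ".yaml", ".toml",
]);

function isCodeFilePath(path: string): boolean {
  const dot = path.lastIndexOf(".");
  return dot >= 0 && CODE_EXTENSIONS.has(path.slice(dot).toLowerCase());
}

// Markdown-style slug: strip .md, slashes become dashes (assumed shape).
function pathToSlug(path: string): string {
  return path.replace(/\.md$/, "").replace(/\//g, "-").toLowerCase();
}

// Code-style slug: keep the extension so foo.ts and foo.py stay distinct.
function slugifyCodePath(path: string): string {
  return path.replace(/[/.]/g, "-").toLowerCase();
}

// SP-5: delete/rename must resolve the SAME slug shape the import used,
// so dispatch on file kind instead of always assuming markdown.
export function resolveSlugForPath(path: string): string {
  return isCodeFilePath(path) ? slugifyCodePath(path) : pathToSlug(path);
}
```

Without the dispatch, a Rust file deleted from the repo would be looked up under a markdown-shaped slug and the stale page would survive the sync.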
* feat: v0.20.0 Cathedral II Layer 3 (1b) — chunk-grain FTS with page-grain wrap Codex F2 caught that v0.19.0's searchKeyword ranked via pages.search_vector, so doc-comment content living on a chunk couldn't influence ranking and A2 two-pass retrieval had no way to find the best matching chunk. Layer 3 moves the FTS primitive to content_chunks.search_vector (the column + trigger added in Layer 1/v27), dedups-to-best-chunk-per-page on return so every external caller still sees the v0.19.0 page-grain contract (SP-6), and exposes searchKeywordChunks as the raw chunk-grain primitive A2 two-pass will consume (Layer 7). ### Backfill migration v28 Layer 1's trigger only fires on INSERT/UPDATE — rows inserted before v27 applied had NULL search_vector. v28 backfills every existing chunk with the same weight shape the trigger uses (doc_comment + symbol_name_qualified at weight A, chunk_text at B). Idempotent via `WHERE search_vector IS NULL`; re-runs pick up only remaining NULL rows. ~2-3s on a 20K-chunk brain. ### searchKeyword rewrite (both engines) CTE chain: rank chunks by cc.search_vector → DISTINCT ON (slug) picks best chunk per page → order by score → limit. External shape identical to v0.19.0: one row per matched page, score comes from the best chunk on that page, chunk metadata attached. Zero breaking changes for backlinks counting, enrichment-service.countMentions, list_pages, etc. Inner fetch limit is 3x the requested page limit so dedup has enough chunks to produce N distinct pages (a co-occurring-term cluster in one page can't eat the result set). Postgres keeps the SET LOCAL statement_timeout='8s' from v0.12.3 search timeout scoping. PGLite gets the same CTE shape minus the transaction- scoped GUC (PGLite has no pool). ### searchKeywordChunks (new internal primitive) Same chunk-grain ranking WITHOUT dedup. Returns raw top-N chunks by FTS score regardless of page. 
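The page-grain wrap versus the raw chunk-grain primitive can be illustrated with an in-memory equivalent of the SQL. This is a sketch: the shipped version does the dedup inside the CTE via DISTINCT ON, not in application code, and the `ChunkHit` shape is invented for illustration.

```typescript
interface ChunkHit {
  slug: string;   // owning page
  chunkId: number;
  score: number;  // FTS rank from cc.search_vector
}

// Equivalent of: DISTINCT ON (slug) pick best chunk per page, then
// ORDER BY score DESC LIMIT n. One row per matched page, scored by its
// best chunk, so callers keep the v0.19.0 page-grain contract (SP-6).
export function dedupToBestChunkPerPage(hits: ChunkHit[], limit: number): ChunkHit[] {
  const best = new Map<string, ChunkHit>();
  for (const hit of hits) {
    const prev = best.get(hit.slug);
    if (!prev || hit.score > prev.score) best.set(hit.slug, hit);
  }
  return [...best.values()].sort((a, b) => b.score - a.score).slice(0, limit);
}

// searchKeywordChunks shape: same ranking, no dedup; top chunks win
// regardless of which page owns them.
export function rawTopChunks(hits: ChunkHit[], limit: number): ChunkHit[] {
  return [...hits].sort((a, b) => b.score - a.score).slice(0, limit);
}

// Why the inner fetch is 3x the page limit: fetch enough chunks that one
// chunk-dense page can't starve the result of N distinct pages.
export function innerFetchLimit(pageLimit: number): number {
  return pageLimit * 3;
}
```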
Used by A2 two-pass retrieval (Layer 7) as its anchor-discovery primitive — two-pass wants top chunks, not best-per-page. Most callers should prefer searchKeyword. ### Tests - test/chunk-grain-fts.test.ts: 11 cases covering migration v28 shape, page-grain external contract (dedup preserves invariants), chunk-grain primitive (no dedup, score-ordered), and the doc-comment weight-A precedence over body weight-B — the A4 ranking win validated today even though Layer 5 is what populates doc_comment from AST. - test/pglite-engine.test.ts existing "tsvector trigger populates search_vector on insert" updated: v0.19.0 searched pages.search_vector (built from title + compiled_truth) so two-word queries matching non-chunk text worked. Cathedral II ranks chunks only — test updated to search 'AI agents' which is in the chunk_text directly. - test/migrations-v0_20_0.test.ts "v27 is highest" relaxed to "v27 is the foundation migration; max >= 27" so later layers can land migrations without breaking this assertion. ### CI result 2553 tests / 0 fail via `bun test --timeout=60000`, 422s wall time. * feat: v0.20.0 Cathedral II Layer 4 (B1) — language manifest foundation Consolidate the 29-way GRAMMAR_PATHS + parallel DISPLAY_LANG record into a single LANGUAGE_MANIFEST keyed on SupportedCodeLanguage. Each entry is a LanguageEntry with { displayName, embeddedPath?, lazyLoader? }. ### Why this matters for Cathedral II Before: adding a language meant editing two maps (path + display name) AND adding a new `import G_X from ...` at the top, for every new lang. After: one manifest entry + one `with { type: 'file' }` import (embedded) or one registerLanguage() call at boot (lazy). loadLanguage() consults the manifest uniformly — it doesn't know or care whether a grammar is embedded in the compiled binary or resolved from node_modules at runtime. ### The 3 extension points - `embeddedPath` — Bun `with { type: 'file' }` asset. 
Ships with `bun --compile` output; already in place for the 29 core grammars. - `lazyLoader` — async function returning path or Uint8Array. Used at first reference, then cached in `languageCache` like embedded grammars. Forward-compat for v0.20.x+ full tree-sitter-wasms (~136 more langs). - `registerLanguage(lang, entry)` / `unregisterLanguage(lang)` / `listRegisteredLanguages()` — runtime registration hook. Layer 9 (B2 Magika) will wire detection for extensionless files through this API. Dynamic registrations win over core manifest on conflict so hot-fix overrides during a session work without restart. ### Behavior guarantees preserved - All 29 v0.19.0 core grammars continue to ship embedded — no binary-size growth, no runtime network dependency for the core set. - `detectCodeLanguage` untouched; its output key still maps 1:1 through LANGUAGE_MANIFEST. - `displayLang()` now derived from the manifest. Chunk headers read "[Python]" / "[TypeScript]" / "[Ruby]" just as before — one source of truth, manifest-derived. ### Tests (test/language-manifest.test.ts, 8 cases) - Manifest covers all 29 v0.19.0 languages (typescript/tsx/js/py/rb/go/ rust/java/c_sharp/cpp/c/php/swift/kotlin/scala/lua/elixir/elm/ocaml/ dart/zig/solidity/bash/css/html/vue/json/yaml/toml). - registerLanguage does NOT invoke the lazy loader at registration time (proves the loader fires at most on first chunkCodeText() call). - Dynamic registrations override core manifest entries (hot-fix path). - unregisterLanguage removes a dynamic entry and clears its parser cache. - chunkCodeText still loads core grammars (TypeScript / Python / Ruby) end-to-end; chunk headers use the manifest displayName ("[Python]", not "[python]"). ### What's NOT shipped here Adding the additional ~136 languages from tree-sitter-wasms is deliberate v0.20.x+ follow-up work. The manifest infrastructure is in place; expanding coverage is now a data-only PR (one entry per language). 
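The manifest and its registration hook can be sketched as below. The three-language subset, the WASM paths, and the helper bodies are illustrative only; the real manifest in src/core/chunkers covers all 29 grammars.

```typescript
type SupportedCodeLanguage = "typescript" | "python" | "ruby"; // abbreviated

interface LanguageEntry {
  displayName: string;
  embeddedPath?: string;                            // bun --compile asset
  lazyLoader?: () => Promise<string | Uint8Array>;  // grammar loaded at first use
}

// Abbreviated core manifest; paths are hypothetical.
const LANGUAGE_MANIFEST: Record<SupportedCodeLanguage, LanguageEntry> = {
  typescript: { displayName: "TypeScript", embeddedPath: "wasm/tree-sitter-typescript.wasm" },
  python:     { displayName: "Python",     embeddedPath: "wasm/tree-sitter-python.wasm" },
  ruby:       { displayName: "Ruby",       embeddedPath: "wasm/tree-sitter-ruby.wasm" },
};

const dynamicRegistrations = new Map<string, LanguageEntry>();

export function registerLanguage(lang: string, entry: LanguageEntry): void {
  dynamicRegistrations.set(lang, entry); // loader is NOT invoked here
}

export function unregisterLanguage(lang: string): void {
  dynamicRegistrations.delete(lang);
}

// Dynamic registrations win over the core manifest on conflict, so a
// hot-fix override during a session works without a restart.
export function resolveEntry(lang: string): LanguageEntry | undefined {
  return dynamicRegistrations.get(lang)
    ?? LANGUAGE_MANIFEST[lang as SupportedCodeLanguage];
}

export function displayLang(lang: string): string {
  return resolveEntry(lang)?.displayName ?? lang;
}
```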
### CI result

2561 tests / 0 fail via `bun test --timeout=60000`, 425s wall time.

* feat: v0.20.0 Cathedral II Layer 8 D1 — sync --all cost preview + ConfirmationRequired envelope

Closes the v0.19.0 DX review's #1 pain point: "first sync surprise bill." Before Cathedral II, `gbrain sync --all` on a fresh multi-source brain could spin up tens of thousands of OpenAI embedding calls before anyone saw a cost number. Agent callers (OpenClaw, Hermes, etc.) had no way to gate the operation behind a spend check.

### Behavior

Before `sync --all` touches a single source, walk the working trees of every registered source with `local_path`, sum tokens per file via the same cl100k_base tokenizer text-embedding-3-large actually uses, and compute a USD estimate. Gate on that:

- **TTY + !--json + !--yes** → interactive `[y/N]` prompt.
- **non-TTY OR --json OR piped** → emit a `ConfirmationRequired` envelope to stdout via the v0.18 `errorFor` builder, exit code 2. Reserves exit 1 for runtime errors so agent callers can distinguish "awaiting user call" from "something crashed."
- **--yes** → skip the prompt entirely. Agent/CI path.
- **--dry-run** → print the preview, exit 0 without syncing.
- **--no-embed** → skip the cost gate entirely (the user already opted out of OpenAI spend; they'll run `embed --stale` later).

### Preview shape

One stderr line or one JSON payload:

sync --all preview: <N> files across <M> source(s), ~<T> tokens, est. $<X> on text-embedding-3-large.

A conservative overestimate: full working-tree content, not just the incremental diff. A source never embedded before WILL embed everything on first sync; already-synced sources with small diffs get a ceiling, not a floor. The false-high bias is intentional — users never get surprised by MORE cost than the preview claimed.

### Files

- `src/core/chunkers/code.ts`: `estimateTokens` now exported (was module-private). Same cl100k_base tokenizer, just a public symbol.
- `src/core/embedding.ts`: add `EMBEDDING_COST_PER_1K_TOKENS = 0.00013` + `estimateEmbeddingCostUsd(tokens)`. Single source of truth for cost math; every cost-preview surface reads this constant, so a pricing change is a one-line edit. - `src/commands/sync.ts`: - new `estimateSyncAllCost(sources)` helper walks trees, sums tokens per active source, returns breakdown. - new `walkSyncableFiles(repo, cb, strategy)` recursive walker. Honors the same `isSyncable` rules as the real sync so preview and execution agree on scope. Skips hidden dirs, node_modules, ops/, and files over 5MB. Best-effort file-read errors don't block the preview. - new `promptYesNo(question)` readline wrapper — resolves false on non-'y' answer OR EOF. - `--yes` and `--json` flags parsed at sync argv layer. - cost preview runs before the per-source sync loop on `--all`, gates via the TTY / --json / --yes / --dry-run matrix above. ### Tests `test/sync-cost-preview.test.ts` (6 cases): - EMBEDDING_COST_PER_1K_TOKENS pinned to $0.00013. - `estimateEmbeddingCostUsd` scales linearly across 0 → 1M tokens. - `estimateTokens` round-trips (empty → 0, short → <10, 100x text → >50x). ### CI result 2567 tests / 0 fail via `bun test --timeout=60000`, 424s wall time. * feat: v0.20.0 Cathedral II Layer 8 D2 — markdown fence extraction ~40% of gbrain's brain is docs + guides + architecture notes with substantial inline code. In v0.19.0 those fenced code blocks chunked as prose, so querying "how do we handle errors in TypeScript" ranked paragraphs ABOUT the import above the actual import example. D2 walks the marked lexer tokens, extracts each recognized code fence, and persists them as extra chunks on the parent markdown page with `chunk_source='fenced_code'` and full code-metadata (language, symbol_name, symbol_type, start/end line). 
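A simplified sketch of the D2 fence walk. The real implementation iterates `marked.lexer()` tokens of shape `{ type: 'code', lang, text }`; here a small regex stands in for the lexer so the sketch is self-contained, and the tag-to-pseudo-path map is abbreviated.

```typescript
// Real code reads this cap from GBRAIN_MAX_FENCES_PER_PAGE; fixed here.
const MAX_FENCES_PER_PAGE = 100;

// Abbreviated fence-tag -> pseudo-path map (the shipped map covers
// 29 languages plus aliases).
const FENCE_TAG_TO_PATH: Record<string, string> = {
  ts: "fence.ts", typescript: "fence.ts",
  js: "fence.js", javascript: "fence.js",
  py: "fence.py", python: "fence.py",
  rb: "fence.rb", ruby: "fence.rb",
};

interface Fence { lang: string; text: string; }

// Stand-in for walking marked.lexer() code tokens.
function lexFences(markdown: string): Fence[] {
  const fences: Fence[] = [];
  const re = /^```([^\n`]*)\n([\s\S]*?)^```$/gm;
  for (const m of markdown.matchAll(re)) {
    fences.push({ lang: m[1].trim(), text: m[2] });
  }
  return fences;
}

export function extractFencedChunks(markdown: string): { path: string; text: string }[] {
  const out: { path: string; text: string }[] = [];
  for (const fence of lexFences(markdown)) {
    // Unknown tag, missing tag, or empty body: skip, no synthetic chunk.
    const path = FENCE_TAG_TO_PATH[fence.lang.toLowerCase()];
    if (!path || fence.text.trim() === "") continue;
    if (out.length >= MAX_FENCES_PER_PAGE) {
      console.warn("fence cap reached; skipping remaining fences"); // DOS guard
      break;
    }
    // The shipped code calls chunkCodeText(fence.text, path) here and
    // persists each chunk with chunk_source='fenced_code'.
    out.push({ path, text: fence.text });
  }
  return out;
}
```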
### Behavior In `importFromContent`, after `parseMarkdown` returns compiled_truth, we additionally run the text through `marked.lexer()` and walk for `{ type: 'code', lang, text }` tokens. For each: - Map the fence language tag (`ts`/`typescript`/`js`/...) to a pseudo-path (`fence.ts`/`fence.js`/...) so `detectCodeLanguage` picks the right grammar. - Call `chunkCodeText(text, pseudoPath)` — one or more code chunks depending on fence size. Tree-sitter-aware chunking means a big TS fence splits at function boundaries, not character count. - Persist each chunk with `chunk_source='fenced_code'`. Extends the existing chunk_source enum; schema allows it via the TEXT column. ### Fence-bomb DOS guard `MAX_FENCES_PER_PAGE = 100` by default, overridable via `GBRAIN_MAX_FENCES_PER_PAGE` env var. A malicious markdown page with 10K ```ts blocks could otherwise force 10K embedding API calls. Beyond the cap, remaining fences skip with a one-line console warn so operators can see the event. ### Per-fence error isolation Each fence runs through its own try/catch. One malformed fence (e.g. marked lexer choking on edge-case markdown) doesn't abort the whole page import — the other fences + the prose chunks from compiled_truth all still land. ### Recognized fence tags (29 languages + 7 aliases) ts/typescript, tsx, js/javascript, jsx, py/python, rb/ruby, go/golang, rs/rust, java, c#/cs/csharp, cpp/c++, c, php, swift, kt/kotlin, scala, lua, ex/elixir, elm, ml/ocaml, dart, zig, sol/solidity, sh/bash/shell/zsh, css, html, vue, json, yaml/yml, toml. Unknown tag → skipped (no synthetic chunk, no crash). Missing tag (```\n...\n```) → skipped. Empty body → skipped. ### Collateral fix `rowToChunk` in src/core/utils.ts now maps the code-chunk metadata columns (language, symbol_name, symbol_type, start_line, end_line) + the v0.20.0 Cathedral II additions (parent_symbol_path, doc_comment, symbol_name_qualified) out of the DB. 
Pre-Cathedral II the code columns were written via upsertChunks but never read back — caught by the new fence test assertions. ### Tests (test/fence-extraction.test.ts, 7 cases) - TS fence → language='typescript' chunk - Python fence → language='python', chunk_text contains def - Ruby fence → language='ruby' - Unknown tag (```mermaid, ```unknown-xyz) → no fenced_code chunks - Missing tag → no fenced_code chunks - 3 fences on one page, mix of langs → 3+ fenced_code chunks - Empty fence body → no chunks ### CI result 2574 tests / 0 fail via `bun test --timeout=60000`, 434s wall time. * feat: v0.20.0 Cathedral II Layer 8 D3 — reconcile-links batch command Closes the v0.19.0 Layer 6 doc↔impl order-dependency: when a markdown guide imports BEFORE the code it cites (common — docs land first, code sync runs second), the Layer 6 E1 forward-scan calls addLink but its inner JOIN silently drops the edge because the code page doesn't exist yet. The guide and the code eventually both exist in the brain, but the edge never materialized. ### New CLI surface gbrain reconcile-links [--dry-run] [--json] Walks every markdown page, re-runs `extractCodeRefs` on compiled_truth+timeline, and calls addLink(md, code, ..., 'documents') + reverse for each hit. ON CONFLICT DO NOTHING at the links table makes the operation idempotent — existing edges stay, new edges land. ### Per-lang coverage via extractCodeRefs Inherits the regex from `src/core/link-extraction.ts` which already recognizes code paths for 29 extensions (ts/tsx/js/py/rb/go/rust/java/ c#/cpp/c/php/swift/kotlin/scala/lua/elixir/elm/ocaml/dart/zig/sol/sh/ css/html/vue/json/yaml/toml). Fence-extraction (D2) and classifier- widening (Layer 2) keep this in sync with the chunker's actual reach. ### Why batch over per-import reverse-scan Codex's two-pass review flagged per-import reverse-scan as O(N) ILIKE/JOIN queries per code file imported — on a 47K-page brain first- syncing 5K code files that's 5K ILIKE scans. 
A user-triggered batch run on an already-synced brain is one walk, slug-indexed via addLink's existing lookup. Same correctness, much faster. ### Behavior - Dry-run: counts refs, attempts = 0, writes nothing. - auto_link=false in config: returns status='auto_link_disabled' + no-op. Users who disabled auto-linking on put_page don't want reconcile-links silently re-populating edges either. - Missing code target: counted as `edgesTargetsMissing`, not thrown. The ref exists in the guide, but the code page hasn't been synced yet. Re-run after the next code sync to materialize. - Progress reporter: `reconcile_links.scan` phase, one tick per markdown page, with rolling summary `guides/foo (+N refs)` per tick. ### Tests (test/reconcile-links.test.ts, 6 cases) - Extracts code refs and creates bidirectional edges (guide→code + code→guide). - Idempotent: second run inserts zero new edges. - Dry-run reports counts without writing. - Markdown page with no code refs is a no-op. - Respects auto_link=false. - Missing code target is counted, not thrown. ### CI result 2580 tests / 0 fail via `bun test --timeout=60000`, 432s wall time. * feat: v0.20.0 Cathedral II Layer 12 — CHUNKER_VERSION 3→4 + SP-1 gate Codex's second-pass review caught that bumping CHUNKER_VERSION alone is a silent no-op on an unchanged repo: performSync short-circuits at `up_to_date` before reaching importCodeFile's content_hash check. Layer 12 adds a sources.chunker_version gate that forces a full re-walk when the version mismatches, regardless of git HEAD equality. - CHUNKER_VERSION 3 → 4 (src/core/chunkers/code.ts:99), folded into content_hash via v0.19.0 Layer 5 wiring — any bump forces clean re-chunks. - src/commands/sync.ts: readChunkerVersion/writeChunkerVersion helpers; version-mismatch gate runs BEFORE the up_to_date early-return and forces a full walk; writeChunkerVersion called after every last_commit anchor. 
- test/chunker-version-gate.test.ts: 3 pinning tests (constant value, import stability, v27 migration shape). - test/chunkers/code.test.ts: update v0.19.0 CHUNKER_VERSION=3 assertion to Cathedral II v0.20.0 CHUNKER_VERSION=4. Full CI: 2333 pass / 250 skip / 0 fail / 6155 expect() / 408s. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> * feat: v0.20.0 Cathedral II Layer 13 (E2) — reindex-code + migration orchestrator Ships the user-facing explicit-backfill path. v0.19.0 → v0.20.0 brains get CHUNKER_VERSION 3→4 rolled over automatically via Layer 12's gate on next sync. Users who want the benefits NOW (before their next sync) run `gbrain reindex-code --yes`. - New src/commands/reindex-code.ts. runReindexCode(engine, opts) walks code pages from the DB in batches of 100 (Finding 4.4 OOM protection), reads compiled_truth + frontmatter.file, re-runs importCodeFile. --dry-run reports cost + token count without importing. --force bypasses importCodeFile's content_hash early-return. --source filters to one sources row. Pages without frontmatter.file fail cleanly (counted, not thrown). runReindexCodeCli parses argv, wires the D1 cost-preview gate (TTY prompt or ConfirmationRequired envelope for non-TTY/JSON), delegates. - src/core/import-file.ts: importCodeFile gains opts.force flag. When true, skips the content_hash === hash early-return so a paranoid full reindex always re-chunks + re-embeds even when content hasn't changed. - src/cli.ts: register 'reindex-code' case + CLI_ONLY entry. - src/commands/migrations/v0_20_0.ts: orchestrator with 3 phases (schema → backfill_prompt → verify). Phase B prints the two backfill choices directly (automatic via sync vs immediate via reindex-code). Follows v0.12.2/v0.18.1 idempotent-resumable pattern. - src/commands/migrations/index.ts: registers v0_20_0 after v0_18_1. - skills/migrations/v0.20.0.md: agent-facing post-upgrade instructions. 
- test/reindex-code.test.ts: 5 cases (count, dry-run, walk+failures, empty brain, batch pagination). - test/migration-orchestrator-v0_20_0.test.ts: 5 cases (registry wiring, feature-pitch content, __testing exports, dry-run skips, is-latest). - test/apply-migrations.test.ts: extend skippedFuture pins with 0.20.0. Full CI: 2343 pass / 250 skip / 0 fail / 6193 expect() / 426s. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> * feat: v0.20.0 Cathedral II Layer 10 partial (C1 + C2) — query --lang / --symbol-kind Ships the cheap half of the C tier: language + symbol-kind filters on hybrid search. The content_chunks.language and content_chunks.symbol_type columns have existed since v0.19.0 Layer 5 (code chunker populates both); Layer 10 exposes them as filter flags on the 'query' operation. The expensive half (C3 --near-symbol, C4 code-callers, C5 code-callees) is blocked on Layer 5 A1 edge extractor — those need the code_edges_chunk + code_edges_symbol tables populated. They ship in a follow-up. - src/core/pglite-engine.ts: searchKeyword / searchKeywordChunks / searchVector all accept opts.language + opts.symbolKind. Filters added via parameterized $N indices; unknown values return zero results (no false positives). - src/core/postgres-engine.ts: same three methods, same filters, threaded through the postgres.js sql-fragment pattern. Honors SET LOCAL statement_timeout discipline. - src/core/search/hybrid.ts: threads opts.language + opts.symbolKind into per-engine searchOpts so filters fire at SQL level (not post-filtered in-memory). - src/core/operations.ts: query op params gain lang + symbol_kind entries. Handler maps them into hybridSearch opts.language / opts.symbolKind. - src/cli.ts: updated --help CODE INDEXING section to list the new flags + reconcile-links + reindex-code commands. 
- test/search-lang-symbol-kind.test.ts: 9 cases (no filter, lang-only, symbolKind-only, combined AND, searchKeywordChunks variant, unknown lang/kind return zero, operation schema check). Full CI: 2352 pass / 250 skip / 0 fail / 6216 expect() / 432s. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> * feat: v0.20.0 Cathedral II Layer 6 (A3) — parent-scope + nested-chunk emission Ships the chunk-granularity change codex called out in the second-pass review. Before Cathedral II, `export class BrainEngine { m1() {} m2() {} }` emitted ONE chunk for the whole class. Retrieval returned the entire class body for a symbol-specific query like "how does searchKeyword work" — the agent had to re-read the whole thing. A3 extends the chunker to emit each method as its own chunk carrying `parentSymbolPath: ['BrainEngine']`, with a `(in BrainEngine)` suffix in the header so the embedding captures scope context. The class-level parent chunk still ships (slim body: declaration line + member digest) so class-level queries still hit something. Recursive expansion: Ruby `module Admin { class UsersController { def render } }` emits 3 chunks — Admin (parent=[]), UsersController (parent=[Admin]), render (parent=[Admin, UsersController]). - src/core/chunkers/code.ts: - CodeChunkMetadata gains `parentSymbolPath?: string[]`. - NESTED_EMIT_CONFIG map per language (TS, TSX, JS, Python, Ruby, Rust impl blocks, Java class/interface/record). Maps parent types (class_declaration / class_definition / module / impl_item) to child types (method / method_definition / function_definition / singleton_method / constructor_declaration). - findNestableParent unwraps TS export_statement to reach the inner class_declaration — the export wrapper was a classic gotcha. - emitNestedScoped: recursive, builds full parent-chain path, pushes a slim scope-header chunk for each parent level + leaf chunks for methods. Handles module → class → method chains. 
- buildChunk emits "(in ClassName.method)" header suffix when parentSymbolPath is non-empty. - mergeSmallSiblings now bails on any file that has parent-scoped chunks. Methods emitted by A3 are intentionally small and individually addressable; merging them would erase the scope context Layer 6 just established. - src/core/import-file.ts: importCodeFile passes parent_symbol_path from chunker metadata into ChunkInput so it lands in content_chunks. - src/core/pglite-engine.ts + src/core/postgres-engine.ts: upsertChunks extends the column list to persist parent_symbol_path (TEXT[]), doc_comment (TEXT), symbol_name_qualified (TEXT). All three existed as schema columns from Layer 1 but the writers weren't plumbed yet. ON CONFLICT DO UPDATE includes all three so re-imports refresh metadata correctly. - test/parent-scope.test.ts: 9 cases covering TypeScript class method expansion, Python class, Ruby module+class, top-level function passthrough, and round-trip through upsertChunks to verify text[] persistence. Full CI: 2361 pass / 250 skip / 0 fail / 6270 expect() / 439s. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> * feat: v0.20.0 Cathedral II Layer 5 (A1) — edge extractor + qualified names (8 langs) The 10x leap. v0.19.0 shipped symbol-column filtering and could find "the definition of X"; v0.20.0 Layer 5 captures who CALLS X. Walk the tree-sitter tree during chunking, harvest call-site edges, persist to code_edges_symbol with the callee's short-name as to_symbol_qualified. `getCallersOf("helper")` now returns every call site, ready for Layer 7 two-pass retrieval to expand into structural neighbors. Scope: precision 80, recall 99. We don't try to resolve receiver types at capture time (obj.method() stores "method", not "ObjClass.method"). That receiver-type inference is a future optimization; the edges are captured, which is the whole point. Cross-file resolution is also deferred — all Layer 5 edges land unresolved in code_edges_symbol. 
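The per-language qualified-name shapes (Ruby `Admin::UsersController#render`, Rust `::` paths, dot-join elsewhere) can be sketched as a joiner. The `instanceMethod` flag is a simplification of the real instance-versus-class distinction.

```typescript
// Join a parent scope chain + symbol name using per-language delimiter
// conventions; unknown languages dot-join as a fallback (never drop).
export function qualifyName(
  lang: string,
  parents: string[],
  name: string,
  opts: { instanceMethod?: boolean } = {},
): string {
  switch (lang) {
    case "ruby": {
      const scope = parents.join("::");
      const sep = opts.instanceMethod ? "#" : "::"; // Foo#bar vs Foo::bar
      return scope ? `${scope}${sep}${name}` : name;
    }
    case "rust":
      return [...parents, name].join("::");
    default:
      return [...parents, name].join(".");
  }
}
```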
Per-language shipped: TypeScript, TSX, JavaScript, Python, Ruby, Go, Rust, Java. ~85% of real brain code. Other languages chunk normally, edges just empty. - src/core/chunkers/qualified-names.ts (new): per-language delimiter conventions. Ruby `Admin::UsersController#render` (instance) vs Python `admin.users.UsersController.render` vs Rust `users::UsersController::render`. Unknown languages dot-join as fallback (never drop). - src/core/chunkers/edge-extractor.ts (new): iterative AST walk (no recursion — tree-sitter trees can be deep, stack overflow risk on generated code). Per-language CALL_CONFIG maps node types to callee field names. extractCalleeName unwraps member_expression, scoped_identifier, field_expression to reach the innermost identifier. findChunkForOffset maps a byte offset to the innermost chunk for from_chunk_id resolution. - src/core/chunkers/code.ts: CodeChunkMetadata gains symbolNameQualified. buildChunk folds in qualified-name from parents + name. New chunkCodeTextFull API returns (chunks, edges); chunkCodeText stays as back-compat wrapper. - src/core/import-file.ts: call chunkCodeTextFull, build ChunkInput list with symbol_name_qualified, after upsertChunks run findChunkForOffset to map call-site byte offsets to resolved chunk IDs, call deleteCodeEdgesForChunks (codex SP-2 inbound invalidation) then addCodeEdges. Edge persistence is best-effort — failure logs a warn but does not fail the import. - src/core/pglite-engine.ts + src/core/postgres-engine.ts: implement the 5 stub methods. addCodeEdges splits resolved vs unresolved by to_chunk_id presence, inserts with ON CONFLICT DO NOTHING. getCallersOf / getCalleesOf UNION code_edges_chunk + code_edges_symbol (codex 1.3b: no promotion, UNION-on-read forever). getEdgesByChunk honors direction {in, out, both}. deleteCodeEdgesForChunks wipes both tables in both directions (codex SP-2). - test/qualified-names.test.ts: 9 cases (TS/Ruby instance method/Python/ Rust/Java/unknown-lang fallback). 
- test/edge-extractor.test.ts: 11 cases (per-language call capture + findChunkForOffset mapping + unknown-language empty list).
- test/code-edges.test.ts: 7 cases (addCodeEdges insert + idempotency, getCallersOf short-name match, resolved path, getEdgesByChunk direction filters, deleteCodeEdgesForChunks both-direction wipe).

Full CI: 2391 pass / 250 skip / 0 fail / 6308 expect() / 449s.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* feat: v0.20.0 Cathedral II Layer 10 rest (C4 + C5) — code-callers / code-callees CLI

Exposes Layer 5's call-graph edges as user-facing agent commands. The existing code-def / code-refs pair answers "where is X defined?" and "where is X referenced?"; Layer 10 rest adds "who CALLS X?" and "what does X CALL?" — the structural questions v0.19.0 couldn't answer.

Conventions follow the code-def / code-refs precedent:
- Auto-JSON on non-TTY (gh-CLI convention)
- StructuredAgentError envelope on usage / runtime failure
- Exit 2 on UsageError, exit 1 on runtime
- --all-sources to widen beyond the anchor's source; default source-scoped

- src/commands/code-callers.ts (new) — wraps engine.getCallersOf.
- src/commands/code-callees.ts (new) — wraps engine.getCalleesOf.
- src/cli.ts — register both cases, update the CLI_ONLY list, update the --help CODE INDEXING section to list the two new commands.
- test/code-callers-cli.test.ts — 2 cases (module exports, callable).

The --near-symbol / --walk-depth flags on query ship with Layer 7 (A2 two-pass retrieval) in a follow-up layer commit.

Full CI: 2393 pass / 250 skip / 0 fail / 6310 expect() / 448s.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* feat: v0.20.0 Cathedral II Layer 7 (A2) — two-pass structural retrieval

The capstone of the retrieval-side upgrade. Layer 5 captured edges at chunk time; Layer 7 uses them.

Given a query like "how does searchKeyword handle N+1", standard hybrid search returns the function body; A2 expansion additionally surfaces:
- the 3 functions that call it (1-hop)
- the 2 functions it calls (1-hop)
- the anchor set's neighbors' neighbors (2-hop, optional)

All ranked together with 1/(1+hop) score decay. One walk. Code-aware brain, not RAG-over-code. Default OFF per codex F5.

Activation:
- `--walk-depth N` (1 or 2) walks N hops from the anchor set.
- `--near-symbol <qualified-name>` adds chunks matching the symbol's qualified name as extra anchors, enabling "expand around this specific symbol" without a keyword query.

Caps (codex F5):
- depth capped at 2 (max blast radius).
- neighbor cap 50 per hop (high-fan-out protection: console.log has 100k callers and should not flood the result set).
- per-page dedup cap lifts from 2 → min(10, walkDepth × 5) when walking — structural neighbors from the same class are the point.

- src/core/search/two-pass.ts (new): expandAnchors walks code_edges_chunk + code_edges_symbol, hydrating unresolved edges by matching symbol_name_qualified on lookup. hydrateChunks fetches SearchResult rows for expanded chunk IDs.
- src/core/search/hybrid.ts: gate the two-pass step on opts.walkDepth > 0 OR opts.nearSymbol set. Expansion runs before dedup so neighbors survive; the dedup cap widens when walking. Best-effort — expansion failure falls back to base hybrid retrieval.
- src/core/operations.ts: query op params gain near_symbol (string) + walk_depth (number). The handler threads both into hybridSearch opts.
- test/two-pass.test.ts: 8 cases (walkDepth 0/1/2/5-clamp, nearSymbol anchoring, hydrateChunks round-trip, operation schema).

Full CI: 2401 pass / 250 skip / 0 fail / 6332 expect() / 449s.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* feat: v0.20.0 Cathedral II Layer 11 (E1) — BrainBench code sub-category tests

Pins the retrieval-quality behaviors Layer 5 and Layer 6 added, so any accidental regression surfaces on CI rather than silently eroding search quality.

Sub-categories:
- call_graph_recall — importCodeFile captures calls edges end-to-end; getCallersOf + getCalleesOf round-trip through real edge extraction; re-import idempotency via codex SP-2 per-chunk invalidation.
-…
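The per-language delimiter conventions from the Layer 5 qualified-names.ts commit above can be sketched roughly like this — an illustrative reimplementation (the function name `qualifiedName` and its signature are assumptions here, not the shipped module API):

```typescript
// Sketch of the per-language qualified-name conventions described in the
// Layer 5 commit (not the shipped qualified-names.ts). Ruby joins namespaces
// with '::' and marks an instance method with '#'; Rust uses '::' throughout;
// Python dot-joins; unknown languages dot-join as a fallback (never drop).
function qualifiedName(lang: string, parents: string[], name: string): string {
  switch (lang) {
    case "ruby":
      // Admin + UsersController + render → Admin::UsersController#render
      return parents.length > 0 ? parents.join("::") + "#" + name : name;
    case "rust":
      // users + UsersController + render → users::UsersController::render
      return [...parents, name].join("::");
    case "python":
    default:
      return [...parents, name].join("."); // dot-join fallback for unknowns
  }
}
```

The unknown-language branch mirrors the commit's "never drop" stance: an unrecognized language still gets a usable dotted name rather than an empty one.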
garrytan added a commit that referenced this pull request on Apr 26, 2026
Bumped 0.22.0 → 0.26.0 to slot above master's v0.21 chain, with headroom for v0.23/0.24/0.25 to ship from master between now and merge.

Security fixes (all from CSO finding writeups):

#1 cookie-parser middleware — admin dashboard auth was silently broken. Express 5 has no built-in cookie parsing; req.cookies was always undefined, so /admin/login set the cookie but every subsequent admin API call returned 401. Added cookie-parser@^1.4.7 + @types/cookie-parser as direct + dev deps. app.use(cookieParser()) wired before CORS.

#2 + #3 TOCTOU races — exchangeAuthorizationCode and exchangeRefreshToken used SELECT-then-DELETE, letting concurrent requests with the same code/refresh token both pass the SELECT before either ran DELETE, both issuing token pairs. Switched to atomic DELETE ... RETURNING. RFC 6749 §10.5 (codes) + §10.4 (refresh detection) violations closed. Added regression tests that fire 10 concurrent exchanges and assert exactly one wins — both pass.

#5 pgArray escape + DCR redirect_uri validation — pgArray() did `arr.join(',')` with no escaping, so an element containing a comma would be parsed by Postgres as TWO array elements. With --enable-dcr on, this could smuggle a second redirect_uri into a registered client and steal auth codes. Now every element is double-quoted with `"` and `\` escaped. Added validateRedirectUri() per RFC 6749 §3.1.2.1: redirect_uris must be https:// or loopback (localhost / 127.0.0.1). Wired into the DCR registerClient path; CLI registration trusts the operator and bypasses. A regression test confirms a comma-in-URI element round-trips as 1 element, not 2.

#6 --public-url flag — issuerUrl was hardcoded to http://localhost:{port}. Behind reverse proxies / ngrok / production deploys, the issuer claim in tokens wouldn't match the discovery URL clients hit (RFC 8414 §3.3). New --public-url URL flag on `gbrain serve --http`, propagates through serve.ts → serve-http.ts → ServeHttpOptions.publicUrl → issuerUrl. The startup banner surfaces the configured issuer.

Findings #4 (admin requests filter dead code), #7 (admin register-client hardcoded grant_types), and #8 (legacy token grandfathering posture) are documentation / minor functional fixes and are deferred per user direction.

Tests: oauth.test.ts now 34 cases (was 27). 7 new:
- single-use TOCTOU regression (10 concurrent code exchanges)
- single-use TOCTOU regression (10 concurrent refresh exchanges)
- redirect_uri http://localhost passes
- redirect_uri https://example.com passes
- redirect_uri http://example.com (non-loopback plaintext) rejected
- redirect_uri non-URL rejected
- redirect_uri with embedded comma stored as single element

Files:
- VERSION, package.json: 0.22.0 → 0.26.0
- CHANGELOG.md: heading + table + "To take advantage" + "pre-v0.22" → v0.26; new "Security hardening (post-/cso pass)" subsection at the top of itemized changes; CLI flag list updated for --public-url.
- src/core/oauth-provider.ts: pgArray escape, validateRedirectUri, registerClient enforces validation, DELETE ... RETURNING in exchangeAuthorizationCode + exchangeRefreshToken.
- src/commands/serve-http.ts: cookie-parser import + wire-up, publicUrl option, issuerUrl honors it, startup banner shows issuer.
- src/commands/serve.ts: parses --public-url and threads it through.
- src/cli.ts: help text adds --public-url URL flag.
- test/oauth.test.ts: +7 regression tests (now 34 total).
- llms-full.txt: regenerated.

Typecheck clean. 34 oauth + 14 cli tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
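The pgArray hardening described in finding #5 can be sketched as a small pure function — a minimal reimplementation of the described behavior, not the shipped oauth-provider.ts source:

```typescript
// Minimal sketch of the hardened pgArray from finding #5 (not the shipped
// oauth-provider.ts code): every element is double-quoted, with backslash and
// double-quote escaped, so a comma inside an element can no longer be read by
// Postgres as an element separator.
function pgArray(elements: string[]): string {
  const quoted = elements.map((el) => {
    // Escape backslash first (so we don't double-escape the quote escapes),
    // then double-quote, then wrap the element in quotes.
    const escaped = el.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
    return `"${escaped}"`;
  });
  return `{${quoted.join(",")}}`;
}

// A redirect_uri containing a comma stays a single array element:
pgArray(["https://a.example/cb,https://evil.example/cb"]);
// → '{"https://a.example/cb,https://evil.example/cb"}'
```

With the old `arr.join(',')` the same input would have produced two elements, which is exactly the second-redirect_uri smuggling path the finding closes.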
garrytan added a commit that referenced this pull request on Apr 28, 2026
Issue #2 of the eng review: manageGitignore was defined and never invoked. Docs claimed "auto-managed by gbrain" — false. Users hit a .gitignore that never updated and committed db_only directories anyway.

Wire-up: runSync now calls manageGitignore after each successful performSync return, in both watch and one-shot modes. Eng review pass-2 finding #1: skip on dry_run AND blocked_by_failures status. A sync that aborted partway has stale state; mutating .gitignore based on a partially-loaded config invites drift. Failure-skip test added (uses .gitignore-as-a-directory to simulate write failure; asserts the warning fired and disk wasn't corrupted).

Hardened manageGitignore itself with three additional behaviors:
- GBRAIN_NO_GITIGNORE=1 escape hatch (D23) for shared-repo setups where a maintainer wants gbrain to leave .gitignore alone.
- Submodule detection (D49). When repoPath/.git is a regular file (a `gitdir: ...` pointer), the repo is a git submodule. Submodule .gitignore changes don't survive parent submodule updates, so we skip with an actionable warning ("add db_only directories to your parent repo's .gitignore manually").
- Graceful failure (D9). Read errors, write errors, and StorageConfigError (overlap from step 7) all log a warning and return — sync's primary job (moving data) shouldn't die because of a side effect on .gitignore.

manageGitignore is now exported (previously private) so the storage-sync test file can hit it directly without spinning up sync.

9 new test cases: no-op without gbrain.yml, no-op with empty db_only, happy-path append, idempotency (run twice, single entry), preservation of user-written rules, GBRAIN_NO_GITIGNORE skip, submodule skip, .git-directory normal path, write-failure graceful warning.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
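The idempotent-append behavior the tests above pin (run twice, single entry; user-written rules preserved) could look roughly like this — a pure-function sketch with a hypothetical `mergeGitignore` helper, not the shipped manageGitignore implementation:

```typescript
// Hypothetical pure helper sketching the idempotent append the tests pin:
// each db_only entry is added at most once, and rules the user already
// wrote are left untouched. The real manageGitignore also handles reads,
// writes, submodule detection, and the env-var escape hatch.
function mergeGitignore(existing: string, dbOnlyDirs: string[]): string {
  const present = new Set(existing.split("\n").map((line) => line.trim()));
  const additions = dbOnlyDirs.filter((dir) => !present.has(dir));
  if (additions.length === 0) return existing; // idempotent: nothing to do
  // Append after a trailing newline so we never glue onto the last rule.
  const base =
    existing === "" || existing.endsWith("\n") ? existing : existing + "\n";
  return base + additions.join("\n") + "\n";
}
```

Running the helper twice with the same directories returns the identical string, which is the "run twice, single entry" case in the test list.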
garrytan added a commit that referenced this pull request on Apr 30, 2026
* feat: storage tiering — git-tracked vs supabase-only directories

Brain repos are scaling to 200K+ files. Bulk data (tweets, articles, transcripts) bloats git repos and slows operations. A new storage config in gbrain.yml lets users declare git-tracked and supabase-only directories.

Changes:
- New config: storage.git_tracked and storage.supabase_only in gbrain.yml
- gbrain sync auto-manages .gitignore for supabase-only paths
- gbrain export --restore-only restores missing supabase-only files from the DB
- New gbrain storage status command shows the tier breakdown
- Config validation warns on conflicts
- 8 tests passing, full docs at docs/storage-tiering.md

Backward compatible — systems without gbrain.yml work unchanged.

* feat: add getDefaultSourcePath() typed accessor (step 1/15)

Single source of truth for "what brain repo are we operating against?" Replaces ad-hoc raw SQL in storage.ts:38 (Issue #3 of the eng review). Used by both gbrain storage status and gbrain export --restore-only. Returns null on miss, throws on DB error. Composes with the existing resolveSourceId chain so it honors --source flag / GBRAIN_SOURCE env / .gbrain-source dotfile / longest-prefix CWD match / brain-level default.

4 new test cases covering the happy path, missing local_path, DB error propagation, and CWD-prefix resolution priority.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* fix: replace gray-matter with dedicated YAML parser (step 2/15)

The original storage-config.ts called gray-matter on a delimiter-less YAML file. Gray-matter only parses YAML inside `---` frontmatter blocks; without delimiters, it returns `{data: {}}`. Result: loadStorageConfig() always returned null, and the entire feature was a silent no-op for every user. The original eng review's P0 confidence-9 finding (Issue #1).

Replaces gray-matter with a small dedicated parser for the gbrain.yml shape (top-level `storage:` section, two array-valued nested keys). Yaml-lite was considered first, but its flat key:value design doesn't handle nested arrays. The dedicated parser is ~50 lines and trades expressiveness for zero-dep, predictable parsing of a file format we control.

Adds the Issue #1B sanity warning (locked B): when gbrain.yml exists but has no storage section (or empty arrays), warn once per process so the user sees their config didn't take. The single test that would have caught the original P0 — write a real gbrain.yml, call loadStorageConfig, assert non-null — now exists.

Also tightens loadStorageConfig per D36: distinguishes "absent" (silent null) from "unreadable" (throws). The previous code silently swallowed read errors, hiding broken installs.

8 new test cases: real-disk happy path, comments + blank lines, quoted values, missing-storage-section warning, empty-section warning, once-per-process warning suppression, unreadable-file behavior; the existing helper tests (validation, tier matching, edge cases) all still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* refactor: rename storage keys to db_tracked/db_only (step 3/15)

The vendor-specific names "supabase_only" and "git_tracked" hardcoded a backend (Supabase) into the config schema. gbrain ships two engines — PGLite and Postgres-via-Supabase. The canonical distinction is "lives in the brain DB only" vs "lives in the brain DB and on disk under git." Both work on either engine.

Renamed throughout (Issue #4 of the eng review):
- git_tracked → db_tracked
- supabase_only → db_only
- isGitTracked() → isDbTracked()
- isSupabaseOnly() → isDbOnly()
- StorageTier 'git_tracked'/'supabase_only' → 'db_tracked'/'db_only'

Backward compatibility (D3 lock): loadStorageConfig accepts both shapes. Loader resolution order per the eng-review pass-2 finding: parse YAML → if canonical keys are present use them, else if deprecated keys are present map them to canonical AND emit a once-per-process deprecation warning → THEN run validation. Validation always sees the canonical shape, so error messages reference db_tracked/db_only regardless of which keys the user wrote. The deprecation warning suggests `gbrain doctor --fix` for an automated rename (D72 — the fix path lands in step 7). When both shapes coexist in one file, canonical wins and a stronger warning fires ("deprecated keys ignored — remove them").

Aliases isGitTracked/isSupabaseOnly are kept for now to avoid churning the sync.ts / export.ts / storage.ts call sites in this commit; they'll be removed in a follow-up step. storage.ts's tier-bucket initializers and output strings updated. ASCII output replaces unicode box-drawing per D10. The gbrain.yml example file updated to canonical keys with explanatory comments.

2 new test cases: deprecated-key fallback (asserts both shapes load correctly with warning), canonical-wins-over-deprecated (asserts the "both shapes coexist" path).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* feat: add slugPrefix to PageFilters with engine-side filter (step 4/15)

Issue #13 of the eng review: storage.ts and export.ts loaded every page in the brain (limit: 1_000_000) to check tier membership. On the 200K-page brains this feature targets, that's the wall-clock and memory landmine the feature exists to fix.

Adds an optional `slugPrefix` field to PageFilters. Both engines implement it as `WHERE slug LIKE prefix || '%' ESCAPE '\'`, with literal escaping of LIKE metacharacters (%, _, \) so user-supplied prefixes like `media/x/` are treated as exact string prefixes.

Performance: the (source_id, slug) UNIQUE constraint on the pages table gives both engines a btree index that supports LIKE-prefix range scans. An EXPLAIN on Postgres confirms an index range scan rather than a seq scan. PGLite has the same index shape via pglite-schema.ts.

Consumers updated:
- export.ts: the --slug-prefix flag now goes engine-side (no in-memory .filter(...)). The --restore-only path queries each db_only directory with slugPrefix in a loop instead of one full-table scan, with seen-set deduplication and an inline disk-existence check.
- storage.ts: keeps the full-scan path because storage-status needs the "unspecified" bucket count, which can't be computed without enumerating every page. A comment notes that step 5 (single-walk filesystem scan) will reduce per-page disk syscall cost.

2 new test cases on PGLiteEngine: slugPrefix happy path (3 tier dirs, asserts only matching slugs return) and a metacharacter-escape regression (asserts safe/ doesn't match unrelated slugs).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* perf: single-walk filesystem scan via walkBrainRepo() (step 5/15)

Issue #14 of the eng review: storage.ts called existsSync + statSync per page in a synchronous loop. On a 200K-page brain that's 400K serialized syscalls. Wall-clock landmine.

Adds src/core/disk-walk.ts with walkBrainRepo(repoPath) — one recursive readdirSync walk that builds a Map<slug, {size, mtimeMs}>. storage.ts looks up each DB page in the map (O(1)) instead of stat-checking on demand. Slug derivation matches the pages-table convention: people/alice.md on disk becomes people/alice as the map key.

Skipped during the walk:
- dot-directories (.git, .gbrain, .vscode, etc) — not part of the brain namespace
- node_modules — guards against accidentally walking into imported repos
- non-.md files (sidecar JSON, binaries) — tracked by the brain through the files table, not by slug

Reusable: future commands (gbrain doctor's storage_tiering check, the optional autopilot tier-fix path) get the same walk for free.

9 new test cases: empty dir, nonexistent dir, top-level files, nested dirs, dot-dir skipping, node_modules skipping, non-.md filtering, size capture, mtimeMs capture.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* fix: path-segment matching for tier directories (step 6/15)

Issue #5 + D6 of the eng review: tier matching used slug.startsWith(dir), which falsely matches 'media/xerox/foo' against 'media/x' if a user wrote the directory without a trailing slash.

The new matcher requires the configured directory to end with `/` and treats it as a canonical path-segment ancestor:
- media/x/ matches media/x/tweet-1 ✓
- media/x/ doesn't match media/xerox/foo ✗
- media/x is refused for media/x/tweet-1 (the matcher requires a trailing /)

Non-canonical input (no trailing slash) is refused outright. Step 7's auto-normalizing validator converts user-written 'media/x' → 'media/x/' on load, so the matcher never sees non-canonical input from real configs. The behavior tested here is the strict matcher's contract. A regression test pins the media/xerox collision case explicitly.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* feat: auto-normalize trailing-slash, throw on tier overlap (step 7/15)

D7+D8 of the eng review: validation was warnings-only. Users miss warnings. Now:
- Cosmetic: a missing trailing slash is auto-corrected, with a one-time info note showing what changed ("normalized 2 storage paths: 'people' → 'people/', 'media/x' → 'media/x/'"). Once per process to keep noise low.
- Semantic: the same directory in both tiers throws StorageConfigError. Ambiguous routing — does media/ win as db_tracked or db_only? — is a real bug the user must fix. The caller propagates it to the CLI for a clean exit-1 with an actionable message.

loadStorageConfig now applies normalize+validate after merging deprecated keys, so the path-segment matcher (step 6) only ever sees canonical trailing-slash directories. The pure validateStorageConfig is kept for callers who want the warnings list without the auto-fix side effects (gbrain doctor's reporting path).

2 new test cases: auto-normalize round-trip with a warning-text assertion, overlap throws StorageConfigError.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* fix: wire manageGitignore into runSync, only on success (step 8/15)

Issue #2 of the eng review: manageGitignore was defined and never invoked. Docs claimed "auto-managed by gbrain" — false. Users hit a .gitignore that never updated and committed db_only directories anyway.

Wire-up: runSync now calls manageGitignore after each successful performSync return, in both watch and one-shot modes. Eng review pass-2 finding #1: skip on dry_run AND blocked_by_failures status. A sync that aborted partway has stale state; mutating .gitignore based on a partially-loaded config invites drift. Failure-skip test added (uses .gitignore-as-a-directory to simulate write failure; asserts the warning fired and disk wasn't corrupted).

Hardened manageGitignore itself with three additional behaviors:
- GBRAIN_NO_GITIGNORE=1 escape hatch (D23) for shared-repo setups where a maintainer wants gbrain to leave .gitignore alone.
- Submodule detection (D49). When repoPath/.git is a regular file (a `gitdir: ...` pointer), the repo is a git submodule. Submodule .gitignore changes don't survive parent submodule updates, so we skip with an actionable warning ("add db_only directories to your parent repo's .gitignore manually").
- Graceful failure (D9). Read errors, write errors, and StorageConfigError (overlap from step 7) all log a warning and return — sync's primary job (moving data) shouldn't die because of a side effect on .gitignore.

manageGitignore is now exported (previously private) so the storage-sync test file can hit it directly without spinning up sync.

9 new test cases: no-op without gbrain.yml, no-op with empty db_only, happy-path append, idempotency (run twice, single entry), preservation of user-written rules, GBRAIN_NO_GITIGNORE skip, submodule skip, .git-directory normal path, write-failure graceful warning.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* fix: D5 resolution chain for --restore-only and storage status (step 9/15)

D5 of the eng review: gbrain export --restore-only without --repo silently fell through to the regular export path, dumping every page in the database to the wrong directory. Hard regression risk. Now exits 1 with an actionable message when --restore-only has no --repo AND no configured default source.

Resolution order:
1. Explicit --repo flag
2. Typed sources.getDefault() (reuses step 1's accessor)
3. Hard error — never fall through to cwd

storage.ts:38 also bypassed BrainEngine with raw SQL and a bare try/catch (Issue #3 + Issue #9). Replaced with the same typed getDefaultSourcePath() — single source of truth, errors propagate cleanly to the user, no silent cwd fallback.

Regular export (no --restore-only) keeps its current behavior per D26: exports include everything, --repo is optional.

4 new test cases on PGLite in-memory:
- hard-errors with no --repo + no default
- explicit --repo wins
- falls back to the sources default local_path
- non-restore export does not require --repo

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* refactor: split storage.ts into pure data + JSON + human formatters (step 10/15)

Issue #10 of the eng review: getStorageStatus and runStorageStatus mixed data gathering, JSON serialization, and human-readable output in one function. Hard to test, hard to reuse, and mismatched the orphans.ts pattern that CLAUDE.md cites as the precedent.

Now three pure functions + a thin dispatcher:
- getStorageStatus(engine, repoPath) — async, returns StorageStatusResult. Side effects: engine.listPages + one walkBrainRepo (Issue #14). Exported so MCP exposure (D14) and gbrain doctor (D13) can consume the same data without re-running the loop.
- formatStorageStatusJson(result) — pure, returns indented JSON. A stable contract on the StorageStatusResult shape, suitable for orchestrators.
- formatStorageStatusHuman(result) — pure, returns ASCII text (D10 — no unicode box-drawing). Composable into other commands later.
- runStorageStatus(engine, args) — thin dispatcher: parses --repo / --json, calls getStorageStatus, picks a formatter, prints.

8 new test cases on the formatters: JSON parse round-trip, null-config fallback, missing-files capped at 10 with rollup, ASCII-only assertion (D10 regression guard), warnings inline, configuration listing, disk-usage block omitted when zero bytes.

The StorageStatusResult interface is now exported as a public type, so gbrain doctor's storage_tiering check can build its own findings from the same shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* types: distinct PageCountsByTier and DiskUsageByTier (step 11/15)

Issue #11 of the eng review: pagesByTier (page counts) and diskUsageByTier (byte totals) shared the same structural type (Record<StorageTier, number>). Both are tier-keyed numeric maps but carry semantically different units. A future bug that swaps them at a call site (e.g., displaying disk bytes where the count belongs) wouldn't trip the compiler.

Replaced with distinct nominal types via a brand field. Structurally identical at runtime (no overhead) but compile-time disjoint — TypeScript catches accidental cross-assignment.
- PageCountsByTier { db_tracked, db_only, unspecified } : numbers (count)
- DiskUsageByTier { db_tracked, db_only, unspecified } : numbers (bytes)

Both initialized in getStorageStatus, both threaded into StorageStatusResult, both consumed by formatStorageStatusHuman / formatStorageStatusJson without further changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* feat: PGLite soft-warn + full lifecycle test (step 12/15)

D4: storage tiering on PGLite is a partial feature. The "DB" the pages live in IS the local file gbrain uses for everything else, so "db_only" has no real offload effect.
The .gitignore management still helps (keeps bulk content out of git history), so we warn and proceed — not refuse.

Two warning sites (once per process each via module-local flags):
- storage status: warns at runStorageStatus entry
- sync: warns inside manageGitignore when engineKind='pglite' and the config has db_only entries

Both are phrased actionably ("To get full tiering, migrate to Postgres with `gbrain migrate --to supabase`").

manageGitignore's signature now takes an optional `engineKind` param. runSync passes engine.kind. Stand-alone callers (tests, the future gbrain doctor --fix path) can omit it.

New test: test/storage-pglite.test.ts — D8 + D4 lifecycle. 6 cases: engine.kind assertion, getStorageStatus loading gbrain.yml + reporting tier counts, manageGitignore PGLite warn (once per process), Postgres no-warn, slugPrefix on PGLite, end-to-end (config + putPage + status + gitignore).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* chore: add trailing-newline CI guard (step 14/15)

Issue #7 of the eng review: all four new files in the original storage-tiering branch lacked POSIX trailing newlines. Linters complain, and git diffs phantom-flag every future edit. We've been adding newlines as each file landed; this commit catches the regression class.

scripts/check-trailing-newline.sh:
- sibling to check-jsonb-pattern.sh / check-progress-to-stdout.sh per CLAUDE.md's CI guard pattern
- portable to bash 3.2 (macOS default; no mapfile, no associative arrays)
- covers src/**, test/**, gbrain.yml, top-level *.md
- reports each missing file by path and exits 1

Wired into `bun run test` between progress-to-stdout and typecheck. Also fixed docs/storage-tiering.md (a pre-existing missing newline from the original branch — caught by the new guard on first run).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* docs: v0.23.0 — VERSION, CHANGELOG, README, CLAUDE.md, storage-tiering.md (step 15/15)

VERSION → 0.23.0 (minor bump for new feature surface).

CHANGELOG entry in Garry voice with the canonical format:
- Two-line bold headline ("Storage tiering, finally working...")
- Lead paragraph naming what was broken before and what users get now
- "Numbers that matter" before/after table for the 6 things that actually changed
- "What this means for your brain" closer
- "To take advantage of v0.23.0" self-repair block (per CLAUDE.md convention) — 6 numbered steps users can follow
- Itemized changes split into critical fixes / new+renamed surface / architecture cleanup / tests + CI guards

CLAUDE.md "Key files" gains four new entries: storage-config.ts, disk-walk.ts, the v0.23.0 storage.ts shape, and gbrain.yml itself.

README.md gains a new "Storage tiering" section between Skillify and Getting Data In with the canonical example + commands + a link to the full guide.

docs/storage-tiering.md rewritten end-to-end with canonical key names (db_tracked / db_only), v0.23.0 hardening details (idempotency, submodule detection, GBRAIN_NO_GITIGNORE, dry-run gating), the resolution chain for --restore-only, the auto-normalize + throw-on-overlap validator, and the PGLite engine note.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* test: e2e Postgres lifecycle for storage tiering (step 16/16)

Per the v0.23.0 plan: full lifecycle E2E against real Postgres.
- engine.kind === 'postgres' assertion
- Full lifecycle: write 4 pages (1 db_tracked, 2 db_only, 1 unspecified) → getStorageStatus reports correct tier counts → human formatter renders → manageGitignore writes the managed block → idempotency check → getDefaultSourcePath() resolves the configured local_path.
- Container-restart simulation: 2 db_only pages in DB, files missing on disk → status.missingFiles.length === 2 → slugPrefix engine filter on Postgres returns exactly the tier slugs.
- slugPrefix index-based range-scan regression: 50 media/x/* + 50 people/p-* pages → slugPrefix='media/x/' returns exactly 50.
- getDefaultSourcePath returns null when the default source has no local_path (the hard-error path that replaces the original silent cwd fallback).
- manageGitignore on the Postgres engine does NOT emit the PGLite soft-warn (cross-engine assertion).

Skips gracefully when DATABASE_URL is unset, per the CLAUDE.md E2E pattern. Run via: DATABASE_URL=... bun test test/e2e/storage-tiering.test.ts

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* chore: rebump version 0.23.0 → 0.22.9

Reverts the minor bump back to a patch-style version on the v0.22 line. Storage tiering ships within the v0.22.x train alongside the recent fix waves. Updates VERSION, package.json, CHANGELOG header + body refs, CLAUDE.md Key files annotations, the README.md section heading, and the docs/storage-tiering.md backward-compat note.

* chore: bump version 0.22.9 → 0.22.11

Sibling workspaces claimed v0.22.10 in the queue. This branch advances to v0.22.11 to keep the version monotonic on master. Updates VERSION, package.json, CHANGELOG header + body refs, CLAUDE.md Key files annotations, the README.md section heading, and the docs/storage-tiering.md backward-compat note.

* fix: address Codex pre-landing review findings (4 fixes)

Codex found 4 real issues during pre-landing review of the v0.22.11 diff:

[P0] export --restore-only fell through to full export when storageConfig was null (no gbrain.yml present). On older or misconfigured brains, the recovery command would silently dump the entire database. src/commands/export.ts now refuses with an actionable error before any page query fires — matches the D5 lock spirit ("never silently fall through").

[P1] The manageGitignore wire-up only fired when --repo was passed explicitly. performSync resolves the repo from sync.repo_path or sources.local_path, so the common `gbrain sync` path (after setup, no flag) never updated .gitignore. src/commands/sync.ts now uses the same source-resolver chain as the rest of /ship: opts.repoPath → getDefaultSourcePath → null. Fires in both watch and one-shot modes.

[P2] getDefaultSourcePath only consulted sources.local_path, missing the legacy global sync.repo_path config key that pre-v0.18 brains use. Added a fallback to engine.getConfig('sync.repo_path') when the sources row has a NULL local_path. Pre-v0.18 brains now work without forcing a `gbrain sources add . --path .` migration.

[P2] The sync --all multi-source loop never called manageGitignore even though src.local_path was already known. Each source now gets its own gitignore update on successful sync.

Tests:
- test/storage-export.test.ts: replaced the old "falls through to full export" test with one that asserts the new refusal path (storage-tiering config required for --restore-only).
- test/source-resolver.test.ts: added a fallback test exercising the legacy sync.repo_path code path for pre-v0.18 brains.
- All 78 storage-tiering tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* chore: regenerate llms.txt + llms-full.txt for v0.22.11

Per CLAUDE.md: "Run `bun run build:llms` after adding a new doc." The README's new Storage tiering section + the rewritten docs/storage-tiering.md changed the inlined bundle. test/build-llms.test.ts catches the drift and was failing on master pre-regen.

* fix: typecheck error in disk-walk.ts (CI #73350475897)

tsc --noEmit failed in CI because ReturnType<typeof readdirSync> with withFileTypes:true picks an overload union that includes Dirent<Buffer<ArrayBufferLike>>. Strict tsc treats entry.name as Buffer, so .startsWith / .endsWith / string comparisons all blew up. Annotate the variable as Dirent[] (string-based) and cast through unknown, matching the pattern sync.ts already uses for its own filesystem walk. Same runtime behavior; clean typecheck. Tests still 9/9.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

---------
Co-authored-by: root <root@localhost>
Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
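The strict path-segment matcher from step 6 can be reduced to a small pure function — an illustrative sketch (the helper name `isTierAncestor` is made up here, not the shipped storage-config.ts API):

```typescript
// Illustrative sketch of the step-6 matcher contract (the name isTierAncestor
// is hypothetical). A configured tier directory must already end with '/';
// with that invariant, a plain prefix check IS a path-segment check, so
// 'media/x/' can never falsely claim 'media/xerox/foo'.
function isTierAncestor(slug: string, dir: string): boolean {
  if (!dir.endsWith("/")) {
    // Non-canonical input is refused outright; the step-7 validator
    // normalizes user-written 'media/x' to 'media/x/' before matching.
    throw new Error(`tier directory must end with '/': ${dir}`);
  }
  // The trailing '/' makes the prefix boundary a segment boundary.
  return slug.startsWith(dir);
}
```

This is why step 6 and step 7 ship as a pair: the matcher stays strict and trivially auditable, and all leniency (auto-appending the slash) lives in the loader.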
garrytan added a commit that referenced this pull request · Apr 30, 2026
… (v0.23.0) (#462)

* feat: dream_verdicts schema + engine methods

Adds the v25 schema migration creating the dream_verdicts table (file_path, content_hash, worth_processing, reasons, judged_at; PRIMARY KEY (file_path, content_hash); RLS-enabled when running as a BYPASSRLS role). Distinct from raw_data (which is page-scoped) — transcripts being judged for synthesis aren't pages. The (file_path, content_hash) key means edited transcripts re-judge automatically.

BrainEngine gains:
- DreamVerdict + DreamVerdictInput types
- getDreamVerdict(filePath, contentHash) → DreamVerdict | null
- putDreamVerdict(filePath, contentHash, verdict) — ON CONFLICT upsert

Both engines implement it (postgres-engine.ts, pglite-engine.ts). This commit alone is functionally inert — nothing reads or writes the table yet. The synthesize phase (later commit) is the consumer.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* feat: trusted-workspace allow-list for subagent put_page

Adds OperationContext.allowedSlugPrefixes — when set, put_page enforces slug membership in the allow-list instead of the legacy wiki/agents/<id>/... namespace. The trust signal is the SUBMITTER (PROTECTED_JOB_NAMES gates subagent submission so MCP can't reach this field), not the runtime ctx.remote flag — every subagent tool call has remote=true for auto-link safety, so basing trust on remote is incoherent.

The matchesSlugAllowList(slug, prefixes) helper supports the glob suffix '/*' (recursive — wiki/originals/* matches wiki/originals/ideas/foo/bar) and exact match for unsuffixed entries.

put_page check shape:
  if (viaSubagent && allowedSlugPrefixes set) → allow-list check
  else if (viaSubagent) → existing namespace check (regression guard)
  else → no check (regular CLI)

Auto-link is re-enabled for the trusted-workspace path so the cycle's extract phase doesn't have to recompute every edge after synthesize writes. Untrusted remote writes still skip auto-link as before.
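The allow-list semantics described above (glob suffix '/*' is a recursive prefix match, unsuffixed entries are exact) can be sketched as follows — an illustrative reimplementation, not the shipped matchesSlugAllowList:

```typescript
// Sketch of the allow-list matcher: '/*' entries match anything under
// the directory recursively; all other entries require an exact slug.
// Empty prefixes array matches nothing (fail-closed).
function matchesSlugAllowList(slug: string, prefixes: string[]): boolean {
  for (const entry of prefixes) {
    if (entry.endsWith("/*")) {
      const prefix = entry.slice(0, -1); // keep the trailing "/"
      if (slug.startsWith(prefix)) return true;
    } else if (slug === entry) {
      return true;
    }
  }
  return false;
}
```

Note the fail-closed default: a slug outside every prefix is rejected, which is what the poisoned-transcript E2E test exercises.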
SubagentHandlerData.allowed_slug_prefixes is the wire field; the synthesize/patterns phases (later commit) populate it from a single source of truth in skills/_brain-filing-rules.json's dream_synthesize_paths.globs array. The model's tool schema description mirrors the allow-list so it writes correct slugs on the first try. IRON RULE security tests: - test/operations-allow-list.test.ts: allow-list ALLOW/REJECT, glob semantics, regression guard for the v0.15 namespace fallback when allow-list is unset, FAIL-CLOSED when subagentId is missing. - test/e2e/dream-allow-list-pglite.test.ts: end-to-end on PGLite, poisoned-transcript style write outside allow-list → REJECTED. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> * feat: cycle scaffolding — 8-phase order + transcript discovery Extends ALL_PHASES from 6 → 8: synthesize between sync and extract, patterns between extract and embed. Codex finding #7: patterns MUST run after extract because subagent put_page sets ctx.remote=true and skips auto-link/timeline by default — extract is the canonical edge materialization step. Without that ordering, patterns reads stale graph state. Final order: lint → backlinks → sync → synthesize → extract → patterns → embed → orphans CycleOpts gains: - yieldDuringPhase callback — generic in-phase keepalive for long waits (synthesize fan-out, patterns roll-up). Renews cycle-lock TTL + worker job lock. Mirrors yieldBetweenPhases shape. - synthInputFile / synthDate / synthFrom / synthTo — forwarded to runPhaseSynthesize for the CLI's --input/--date/--from/--to flags. CycleReport.totals additively grows (no schema_version bump): transcripts_processed, synth_pages_written, patterns_written. 
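The 8-phase order above can be pinned down as a constant — a sketch of the documented sequence, not the real ALL_PHASES declaration in src/core/cycle/:

```typescript
// v0.23 phase order as documented: synthesize sits between sync and
// extract; patterns sits between extract and embed.
const ALL_PHASES = [
  "lint", "backlinks", "sync", "synthesize",
  "extract", "patterns", "embed", "orphans",
] as const;

// The ordering constraint that matters (Codex finding #7): patterns must
// run after extract, because extract is the canonical edge-materialization
// step — otherwise patterns reads stale graph state.
const patternsAfterExtract =
  ALL_PHASES.indexOf("patterns") > ALL_PHASES.indexOf("extract");
```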
src/core/cycle/transcript-discovery.ts is a pure filesystem walk:
- .txt files only, sorted by path for determinism
- date-prefixed basename filter (--date / --from / --to)
- min_chars filter (default 2000)
- exclude_patterns auto-wraps bare words as \b<word>\b regex (Q-3); power users may pass full regex with anchors
- compileExcludePatterns is exported for unit tests

Phase implementations land in the next commit; this one only adds the dispatcher slots so commit-by-commit bisect doesn't crash on import-not-found.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* feat: synthesize + patterns phases — gbrain dream actually dreams

The synthesize phase (src/core/cycle/synthesize.ts) reads conversation transcripts from dream.synthesize.session_corpus_dir and writes brain-native pages: reflections to wiki/personal/reflections/..., originals to wiki/originals/ideas/..., timeline entries on existing people pages.

Pipeline:
1. discoverTranscripts (filesystem walk + filters)
2. cooldown check via the dream.synthesize.last_completion_ts config (default 12h; bypassed by --input/--date/--from/--to)
3. cheap Haiku verdict per transcript, cached in the dream_verdicts table keyed by (file_path, content_hash) — backfill re-runs skip already-judged transcripts at zero cost
4. fan-out: one Sonnet subagent per worth-processing transcript, dispatched with allowed_slug_prefixes (read from skills/_brain-filing-rules.json's dream_synthesize_paths.globs) and idempotency_key dream:synth:<file_path>:<content_hash>
5. wait via waitForCompletion; yieldDuringPhase ticks on every child terminal so the cycle-lock TTL refreshes on long backfills
6. collect slugs from subagent_tool_executions for each child (codex finding #2: NOT pages.updated_at, which would pick up unrelated writes)
7. orchestrator dual-write — query each new page from the DB, reverse-render via serializeMarkdown, write the file to brain_dir. The subagent never gets fs-write access.
8. deterministic summary index page at dream-cycle-summaries/<date> (codex finding #4: slug shape is regex-compatible — no underscores, no .md extension)
9. write the completion timestamp ONLY on successful runs

The patterns phase (src/core/cycle/patterns.ts) runs after extract so the graph state is fresh. A single Sonnet subagent gathers reflections within dream.patterns.lookback_days (default 30); it names a pattern only when ≥ dream.patterns.min_evidence (default 3) reflections support it. Same allow-list path as synthesize.

CLI flags on `gbrain dream` (src/commands/dream.ts):
  --input <file>       ad-hoc transcript synthesis (implies --phase synthesize; bypasses cooldown)
  --date YYYY-MM-DD    restrict synthesize to one date
  --from <d> --to <d>  backfill range
  --dry-run            runs the Haiku verdict (cached), skips Sonnet synthesis. NOT zero LLM calls (codex #8).

Conflict detection: --input + --date/--from/--to exits 2. ISO 8601 date format is validated; range start > end exits 2.

Auto-commit / push is deferred to v1.1 (codex finding #5). v1 writes files to brain_dir; the user or autopilot handles git.

Tests:
- test/cycle-patterns.test.ts: structural assertions on the patterns phase (queue + waitForCompletion wired, allow-list threading, subagent_tool_executions provenance, no raw_data dependency).
- test/dream-cli-flags.test.ts: argv parsing, conflict detection, ISO date validation, --input implies --phase synthesize, dry-run semantics doc string.
- test/e2e/dream-synthesize-pglite.test.ts: 8 cases on in-memory PGLite exercising not_configured, empty corpus, the no-API-key skip path, dry-run, cooldown active vs --input bypass, and the dream_verdicts cache-hit path. Per-test rig isolation (each test creates and tears down its own engine) avoids cross-test PGLite WASM contention.
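The verdict-caching contract — judge once per (file_path, content_hash), re-judge automatically when a transcript is edited — can be sketched with an in-memory map (the real implementation stores verdicts in Postgres with an ON CONFLICT upsert; `judgeOnce` and `Verdict` are illustrative names):

```typescript
import { createHash } from "node:crypto";

type Verdict = { worthProcessing: boolean; reasons: string[] };

const verdictCache = new Map<string, Verdict>();

const contentHash = (content: string) =>
  createHash("sha256").update(content).digest("hex");

// judge() stands in for the cheap Haiku verdict call.
function judgeOnce(
  filePath: string,
  content: string,
  judge: (c: string) => Verdict,
): Verdict {
  const key = `${filePath}:${contentHash(content)}`;
  const cached = verdictCache.get(key);
  if (cached) return cached; // backfill re-runs skip already-judged transcripts
  const verdict = judge(content);
  verdictCache.set(key, verdict); // upsert on (file_path, content_hash)
  return verdict;
}
```

Because the hash is part of the key, an unchanged transcript is a free cache hit while any edit produces a new key and a fresh judgment.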
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> * docs: dream cycle v0.27.0 — skills, CLAUDE.md, migration, changelog - skills/maintain/SKILL.md: synthesize + patterns phases documented with quality bar (Iron Law for synthesis), trust boundary, idempotency, cooldown semantics, CLI invocation patterns. New triggers added so "process today's session" / "synthesize my conversations" route here. - skills/RESOLVER.md: dream cycle triggers route to maintain. - skills/_brain-filing-rules.md: directory table for the five output types (reflections, originals, patterns, people enrichment, cycle summary) with slug shape per row; Iron Law repeated. - skills/migrations/v0.27.0.md: agent-readable migration narrative. Schema migration v25 runs automatically on `gbrain apply-migrations`; synthesize ships disabled by default — opt-in via dream.synthesize.session_corpus_dir + dream.synthesize.enabled. - CLAUDE.md: file inventory updated with new files (cycle/synthesize.ts, cycle/patterns.ts, cycle/transcript-discovery.ts), the 8-phase ordering, the trusted-workspace allow-list trust model, and the v25 schema migration line in the migrate.ts entry. - VERSION: 0.20.4 → 0.27.0 - CHANGELOG.md: v0.27.0 release-summary section per CLAUDE.md voice rules (numbers that matter table, what-this-means closer, "to take advantage of" block), followed by the itemized changes. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> * test: add patterns E2E + 8-phase cycle E2E + bump synth-cooldown timeouts Two new E2E test files on PGLite (no DATABASE_URL or API key required): - test/e2e/dream-patterns-pglite.test.ts (6 cases) — exercises runPhasePatterns skip paths against a real engine: disabled, default-enabled-but-insufficient-evidence, no-API-key, dry-run. Sibling of dream-synthesize-pglite.test.ts; same per-test rig pattern for engine isolation. - test/e2e/dream-cycle-eight-phase-pglite.test.ts (5 cases) — end-to-end runCycle with the v0.27 8-phase order. 
Asserts: ALL_PHASES is the documented 8 phases in the right sequence, the dry-run report's phases array preserves that order, CycleReport.totals carries the new transcripts_processed / synth_pages_written / patterns_written fields, --phase synthesize and --phase patterns each run only that phase, and synthInputFile is plumbed correctly through runCycle to runPhaseSynthesize. Bump per-test timeout to 30s on the two synthesize-cooldown E2E tests that create two PGLite engines back-to-back. Default Bun 5s budget is tight under sustained suite pressure (PGLite WASM init costs ~1-2s per engine on macOS); each test passes alone but flakes in the full E2E suite. The third arg `30_000` is Bun's standard test-timeout knob. Full E2E suite (test/e2e/) now: 86 pass / 0 fail / 258 skip. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> * fix: ship-prep — typecheck fixes, llms.txt regen, 8-phase test update - src/core/cycle/synthesize.ts + patterns.ts: PageType 'default' → 'note' (TS strict typecheck rejected 'default'; 'note' is a valid PageType for orchestrator-written summary index pages and reverse-render fallback). - src/core/pglite-engine.ts: re-import DreamVerdict + DreamVerdictInput types after the master merge dropped them from the import line. - test/e2e/dream-allow-list-pglite.test.ts: ToolCtx now requires remote: true literal; thread it through every put_page tool call. - test/e2e/dream-patterns-pglite.test.ts: PageType 'default' → 'note' in the seedReflections helper. - test/core/cycle.test.ts: bump expected hook-call count + phase count 6 → 8 to match v0.27 ALL_PHASES extension. - llms-full.txt: regenerate against the updated CHANGELOG + CLAUDE.md so the committed snapshot matches what the generator now produces. Full bun test suite: 2793 pass / 0 fail / 258 skip (3051 tests, 177 files). 
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> * docs: update README + INSTALL_FOR_AGENTS for v0.27.0 dream cycle README: maintain skill row mentions synthesize/patterns; gbrain dream command-reference block describes the 8-phase pipeline and the new --input/--date/--from/--to flags. INSTALL_FOR_AGENTS: dream cycle bullet calls out v0.27 conversation synthesis + cross-session pattern detection. Co-Authored-By: Claude Opus 4.7 <[email protected]> * chore: renumber v0.27.0 → v0.23.0 Master is at v0.22.5; v0.23.0 is the next natural slot for the dream-cycle synthesize + patterns release. Bulk rename across VERSION, package.json, CHANGELOG, migration file, source comments, skills, and llms.txt bundles. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> * test(e2e): bump cycle.test.ts phase count 6 → 8 The dry-run full-cycle test asserted 6 phases. v0.23 added synthesize and patterns, bringing the total to 8. The unit-side equivalent (test/core/cycle.test.ts) was already updated; this catches the E2E sibling that surfaced after the latest master merge. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> --------- Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
garrytan added a commit that referenced this pull request · May 3, 2026
…oard (#358) * feat: OAuth 2.1 schema tables + shared token utilities Add oauth_clients, oauth_tokens, oauth_codes tables to both PGLite and Postgres schemas. Migration v5 creates tables for existing databases. PGLite now includes auth infrastructure (access_tokens, mcp_request_log, OAuth tables) because `serve --http` makes it network-accessible. Extract hashToken() and generateToken() to src/core/utils.ts for DRY reuse across auth.ts and oauth-provider.ts. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * feat: GBrainOAuthProvider — MCP SDK OAuthServerProvider implementation Implements OAuthServerProvider backed by raw SQL (PGLite or Postgres). Supports client credentials, authorization code with PKCE, token refresh with rotation, revocation, and legacy access_tokens fallback. Key decisions from eng review: - Uses raw SQL connection, not BrainEngine (OAuth is infrastructure) - All tokens/secrets SHA-256 hashed before storage - Legacy tokens grandfathered as read+write+admin - sweepExpiredTokens() wrapped in try/catch (non-blocking startup) - Client credentials: no refresh token per RFC 6749 4.4.3 Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * feat: scope + localOnly annotations on all 30 operations Add AuthInfo, scope ('read'|'write'|'admin'), and localOnly fields to Operation interface. 
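The scope + localOnly annotations described above can be sketched as follows — an illustrative shape with a hypothetical `isAllowed` helper, not the real Operation interface or the serve-http.ts enforcement code:

```typescript
type Scope = "read" | "write" | "admin";

// Each operation declares the scope it needs; localOnly ops are
// excluded from the HTTP surface entirely, regardless of scope.
interface Operation {
  name: string;
  scope: Scope;
  localOnly?: boolean;
}

// Enforcement happens before handler dispatch: the caller's granted
// scopes must include the operation's scope, and localOnly operations
// never run over HTTP.
function isAllowed(op: Operation, granted: Scope[], overHttp: boolean): boolean {
  if (op.localOnly && overHttp) return false;
  return granted.includes(op.scope);
}
```

Under this shape, sync_brain (admin + localOnly) is reachable from the local CLI with an admin grant but refused over HTTP even with full scopes.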
Per-operation audit: - 14 read ops, 9 write ops, 2 admin ops, 4 admin+localOnly ops - sync_brain, file_upload, file_list, file_url: admin + localOnly - Scope enforcement happens in serve-http.ts before handler dispatch Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * feat: HTTP MCP server with OAuth 2.1 + 27 OAuth tests gbrain serve --http starts Express 5 server with: - MCP SDK mcpAuthRouter (authorize, token, register, revoke endpoints) - Custom client_credentials handler (SDK doesn't support CC grant) - Bearer auth + scope enforcement on /mcp tool calls - Admin dashboard auth via HTTP-only cookie + bootstrap token - SSE live activity feed at /admin/events - DCR default OFF (--enable-dcr to enable) - Rate limiting on /token (50/15min) - localOnly operations excluded from HTTP CLI: gbrain serve --http [--port 3131] [--token-ttl 3600] [--enable-dcr] Dependencies: [email protected], [email protected], [email protected] SDK pinned to exact 1.29.0 (was ^1.0.0) 27 new tests covering OAuth provider, scope enforcement, auth code flow, refresh rotation, token revocation, legacy fallback, and sweep. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * feat: React admin dashboard — 7 screens, dark theme, Krug-designed Admin SPA at /admin with client-side routing (#login, #dashboard, #agents, #log). Built with Vite + React, served from admin/dist/. Screens: - Login: one field, one button, zero happy talk - Dashboard: metrics bar, SSE live activity feed, token health panel - Agents: table with scopes/badges, + Register Agent button - Register: modal form (name, scopes), 3 mindless choices - Credentials: full-screen modal, copy buttons, download JSON, warning - Request Log: paginated table (50/page), time-relative timestamps - Agent Detail: slide-out drawer, config export tabs (Perplexity/Claude/JSON) Design tokens: #0a0a0f bg, Inter + JetBrains Mono, 4-32px spacing. Build: bun run build:admin (Vite, 65KB gzipped). 
Admin API: /admin/api/register-client endpoint for dashboard registration. SPA serving: Express static + index.html fallback for client-side routing. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * chore: add admin SPA lockfile Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * chore: bump version and changelog (v1.0.0.0) Milestone release: multi-agent GBrain with OAuth 2.1, HTTP server, and React admin dashboard. See CHANGELOG.md for details. Co-Authored-By: Claude Opus 4.7 <[email protected]> * docs: update project documentation for v1.0.0.0 Sync README, CLAUDE.md, and docs/mcp/ with the OAuth 2.1 + HTTP server + admin dashboard surface that shipped in v1.0.0.0. - README.md: new "Remote MCP with OAuth 2.1" section covering gbrain serve --http, admin dashboard, scoped operations, legacy bearer fallback; add serve --http + auth notes to the commands reference. - CLAUDE.md: add src/commands/serve-http.ts, src/core/oauth-provider.ts, admin/ directory as key files; document scope + localOnly additions to Operation contract; add oauth.test.ts (27 cases) to the test list; add v1.0.0 key-commands section clarifying that OAuth client registration is via the /admin dashboard or SDK (no CLI subcommand). - docs/mcp/DEPLOY.md: promote --http as the recommended remote path, add OAuth 2.1 Setup section, list ChatGPT in supported clients, remove the "not yet implemented" footer. - docs/mcp/CHATGPT.md (new): unblocks the P0 TODO. Full ChatGPT connector setup via OAuth 2.1 + PKCE. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> * feat: wire gbrain auth subcommand with OAuth register-client Previously auth.ts was a standalone script invoked via `bun run src/commands/auth.ts`. CHANGELOG and README documented `gbrain auth ...` commands that didn't actually work. 
- Export `runAuth(args)` from auth.ts (keeps the standalone entry intact via the `import.meta.url === file://${process.argv[1]}` check)
- Add `auth` to CLI_ONLY + dispatch in handleCliOnly
- New subcommand `gbrain auth register-client <name> [--grant-types] [--scopes]` wraps GBrainOAuthProvider.registerClientManual
- Lazy DB check: only subcommands that need DATABASE_URL error out

Now the documented CLI flow works end to end:

  gbrain auth register-client perplexity --grant-types client_credentials --scopes "read write"
  gbrain serve --http --port 3131

Co-Authored-By: Claude Opus 4.7 <[email protected]>

* docs: reflect wired gbrain auth register-client CLI

After /ship, the doc subagent wrote docs assuming `gbrain auth register-client` did not exist (it said so explicitly in CLAUDE.md:184). A follow-up commit (c4a86ce) wired it into src/cli.ts + src/commands/auth.ts. Those docs were now contradicting reality.

- CLAUDE.md: removed the "There is no gbrain auth register-client CLI subcommand" claim, documented the three registration paths (CLI / dashboard / SDK).
- README.md: replaced the `bun run src/commands/auth.ts` hint with `gbrain auth create|list|revoke|test` and `gbrain auth register-client`.
- docs/mcp/DEPLOY.md: added a CLI registration example above the programmatic example.
- TODOS.md: moved the "ChatGPT MCP support (OAuth 2.1)" P0 item to Completed with a v1.0.0.0 completion note.

Closes the P0 that had been blocking the "every AI client" promise since v0.6.

Co-Authored-By: Claude Opus 4.7 <[email protected]>

* fix: enable RLS on OAuth tables + loosen v24-exact test assertion

CI Tier 1 (Mechanical) was failing on 4 E2E tests after the v0.18.1 RLS hardening landed on master (PR #343). Our v25 oauth_infrastructure migration adds 3 new public tables (oauth_clients, oauth_tokens, oauth_codes) but didn't enable RLS, so gbrain doctor's new check flagged them and the "RLS on every public table" assertion failed.

Fixes:
- src/schema.sql: ALTER TABLE ... ENABLE ROW LEVEL SECURITY for the 3 OAuth tables inside the existing BYPASSRLS-gated DO block (fresh installs).
- src/core/migrate.ts v25: append a BYPASSRLS-gated DO block after the OAuth CREATE TABLE statements (existing installs on upgrade). Mirrors the v24 rls_backfill gating pattern — RAISE WARNING if the current role lacks BYPASSRLS, so migrations don't silently lock the operator out.
- src/core/schema-embedded.ts: regenerated via `bun run build:schema`.
- test/e2e/mechanical.test.ts: one unrelated v24 test asserted the post-migration version equals exactly '24'. That breaks when any later migration exists (like our v25). Relaxed to `>= 24` since the test's intent is "v24 didn't abort the chain", not "v24 is the final version".

Verified locally: 78/78 E2E tests pass against real Postgres 16 + pgvector.

Co-Authored-By: Claude Opus 4.7 <[email protected]>

* chore: regenerate llms-full.txt for v1.0.0 docs

The CI test "test/build-llms.test.ts > committed llms.txt + llms-full.txt match current generator output" failed. The committed llms-full.txt was built before the v1.0.0 doc updates landed (the OAuth 2.1 README section, the new docs/mcp/CHATGPT.md, the CLAUDE.md serve-http references, etc.), so the regen-drift guard flagged it. Ran `bun run build:llms`. llms.txt is unchanged (the skinny index still matches); llms-full.txt picks up 166 net-new lines of bundled content.

Co-Authored-By: Claude Opus 4.7 <[email protected]>

* connected-gbrains PR 0 — minimal runtime (mounts, registry, aggregated RESOLVER) (#372)

* feat(mounts): connected-gbrains PR 0 foundation — registry + resolver + CLI

Lays the foundation for connected gbrains (v0.19.0) per the approved plan. This is PR 0 — minimal runtime for direct-transport, path-mounted brains.
What this slice ships: - src/core/brain-registry.ts — keyed BrainRegistry with lazy engine init, schema-validated mounts.json loader, DuplicateMountPathError (load-bearing identity check per Codex finding #9 correction), UnknownBrainError with actionable available-id list. Pure: no AsyncLocalStorage, no singleton mutation. ~280 LOC. - src/core/brain-resolver.ts — 6-tier brain-id resolution mirroring v0.18.0's source-resolver.ts so agents learn ONE mental model: 1. --brain <id> 2. GBRAIN_BRAIN_ID env 3. .gbrain-mount dotfile 4. longest-path match over registered mounts 5. (reserved v2 default) 6. 'host' fallback Orthogonal to --source: --brain picks which DB, --source picks the repo within that DB. Corruption-resistant: mounts.json load failures fall through to 'host' instead of breaking every CLI invocation. - src/commands/mounts.ts — `gbrain mounts add|list|remove` (direct transport only). Validates on add (path exists on disk, id regex, no dupes). WARNS but does not block on same db_url/db_path across ids (teams may legitimately alias a remote brain). Password redaction in list output. Atomic write via temp+rename. 0600 perms. PR 1 adds pin/sync/enable; PR 2 adds --mcp-url + OAuth. - src/cli.ts — wires `gbrain mounts` into handleCliOnly (no DB required for the config-only subcommands). - test/brain-registry.test.ts (28 cases): schema validation across every malformed-input branch, ALS-free resolution, duplicate id + path detection, disabled-mount exclusion, UnknownBrainError context. - test/brain-resolver.test.ts (22 cases): priority order (explicit > env > dotfile > path-prefix > fallback), dotfile walk-up, malformed dotfile recovery, longest-prefix match, sibling-path false-positive guard, loader-failure defense. - test/mounts-cli.test.ts (17 cases): parseAddArgs surface, redactUrl, atomic write, add/list/remove roundtrip via temp HOME. 67 new tests, all green. Typecheck clean. 
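The 6-tier resolution order in the bullet above can be sketched as a plain function — an illustration of the priority chain, with hypothetical input/field names rather than the real brain-resolver.ts API:

```typescript
type ResolveInput = {
  cliBrain?: string;                        // tier 1: --brain <id>
  envBrainId?: string;                      // tier 2: GBRAIN_BRAIN_ID
  dotfileBrainId?: string;                  // tier 3: .gbrain-mount contents
  cwd?: string;                             // tier 4 input
  mounts?: { id: string; path: string }[];  // registered mount paths
};

function resolveBrainId(input: ResolveInput): string {
  if (input.cliBrain) return input.cliBrain;
  if (input.envBrainId) return input.envBrainId;
  if (input.dotfileBrainId) return input.dotfileBrainId;
  if (input.cwd && input.mounts) {
    const cwd = input.cwd;
    // Tier 4: longest registered mount path prefixing cwd wins; the
    // "path + '/'" check guards against sibling-path false positives
    // (/r/team must not match /r/teammate).
    const hit = input.mounts
      .filter((m) => cwd === m.path || cwd.startsWith(m.path + "/"))
      .sort((a, b) => b.path.length - a.path.length)[0];
    if (hit) return hit.id;
  }
  // Tier 5 is reserved for a v2 default; tier 6 is the 'host' fallback.
  return "host";
}
```

The same shape mirrors v0.18.0's source resolution, which is the point: learning one chain teaches both axes.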
Depends on mcp-key-mgmt (base branch) for the OAuth/scope annotations that PR 2 will leverage. Next in this branch: PR 0 still needs (a) the deep host-brain-bias audit (postgres-engine internal singleton fallback + a few operations.ts callers), (b) OperationContext threading to make ctx.brainId populated at dispatch, (c) composeResolvers + composeManifests, (d) aggregated ~/.gbrain/mounts-cache/ for host-agent runtime ownership. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> * docs(mounts): brains-and-sources mental model + agent routing convention Two orthogonal axes organize GBrain knowledge. Users AND agents need to understand both, or queries misroute silently. --brain → WHICH DATABASE (host + mounts) --source → WHICH REPO IN DB (v0.18.0 sources: wiki, gstack, ...) Both axes use the same 6-tier resolution (explicit > env > dotfile > path-prefix > default > fallback), so learning one teaches both. Ships: - docs/architecture/brains-and-sources.md — canonical mental model doc. Covers four topologies with ASCII diagrams: 1. Single-person developer (one brain, one source) 2. Personal brain with multiple repos (one brain, N sources) 3. Personal + one team brain mount (2 brains) 4. Senior user with multiple team memberships (N mounted team brains alongside personal) — the CEO-class topology Explicit "when to move each axis" decision table. Generic example names throughout per the project's privacy rule. - skills/conventions/brain-routing.md — agent-facing decision table. Rules for when to switch brain (team-owned question, explicit name, data owner changes) vs switch source (working in a repo, topic scoped to one repo). Cross-brain federation is latent-space only in v0.19 — the agent fans out; the DB never does. Anti-patterns listed: silent brain jumps, writing to host when data is team-owned, missing brain prefix in citations, ignoring .gbrain-mount dotfiles. 
- CLAUDE.md — adds "Two organizational axes (read this first)" section at the top pointing at both new docs. - AGENTS.md — adds brains-and-sources.md + brain-routing.md to the "read this order" (positions 3 and 4, before RESOLVER.md). - skills/RESOLVER.md — adds brain-routing.md to the Conventions section so it appears alongside quality.md, brain-first.md, subagent-routing.md. No code changes. Pre-existing check-resolvable warnings unchanged (2 warnings on base unrelated to this work). 67 PR-0 tests still green. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> * feat(mounts): thread brainId through OperationContext + subagent chain PR 0 plumbing for connected gbrains. Adds an optional brainId field that identifies which database an operation targets and ensures subagents inherit the parent job's brain instead of process-wide defaults. No dispatch-path changes in this commit — that is PR 1 (registry wiring at MCP + CLI entry points). The fields exist so callers can set them now and downstream code respects them. Changes: - src/core/operations.ts: OperationContext grows `brainId?: string`. Optional for back-compat. 'host' is the implicit default when absent. Orthogonal to v0.18.0's source_id (source = which repo within the brain, brain = which database). See docs/architecture/brains-and-sources.md. - src/core/minions/types.ts: SubagentHandlerData gains `brain_id?: string`. Parent jobs set this when submitting a child subagent to lock the child into a specific brain. Omitted = host (unchanged behavior). - src/core/minions/handlers/subagent.ts: buildBrainTools call site reads data.brain_id and passes it through. Child subagents spawned from this handler will see the same brainId unless they override in their own data. - src/core/minions/tools/brain-allowlist.ts: BuildBrainToolsOpts + OpContextDeps grow brainId; buildOpContext stamps it on every OperationContext the subagent builds for tool calls. 
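The brainId threading above reduces to a small contract — an illustrative shape, not the real OperationContext definition:

```typescript
// brainId is optional for back-compat; absent means the host brain.
// It is orthogonal to sourceId: source = which repo within the brain,
// brain = which database.
interface OperationContext {
  remote?: boolean;
  sourceId?: string;
  brainId?: string;
}

const effectiveBrain = (ctx: OperationContext) => ctx.brainId ?? "host";
```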
Addresses Codex finding #6 (brain-allowlist hardwired parent config without brain awareness, so switching brain only in subagent.ts was not enough). Tests: 166 affected tests green (subagent suite + minions + brain registry + resolver). Typecheck clean. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> * feat(mounts): composeResolvers + composeManifests + aggregated cache The runtime ownership seam for connected gbrains (Codex finding #3 from plan review): check-resolvable.ts VALIDATES RESOLVER.md; it does not DISPATCH skills. Host agents (Wintermute/OpenClaw/Claude Code) read skills/RESOLVER.md directly to route user requests. Without an aggregated resolver, mounted team brains cannot contribute skills to the host agent's routing table. This commit adds the aggregation: - src/core/mounts-cache.ts (NEW): pure composeResolvers + composeManifests functions plus filesystem writers for ~/.gbrain/mounts-cache/. The aggregated files carry every host skill plus every mount skill, namespace-prefixed (e.g. `yc-media::ingest`). Host skills always beat a same-named mount skill (locked decision 1); bare-name collisions between two mounts surface as structured ambiguity info so doctor can warn (PR 1). Also addresses Codex finding #8: manifests compose alongside the resolver, else doctor conformance breaks on remote skills. - src/commands/mounts.ts: refreshMountsCache() called on `mounts add` and `mounts remove` (the latter clearing the cache entirely when the last mount goes away). Uses findRepoRoot() to locate the host skills dir; skips with a stderr note when run outside a gbrain repo so the user isn't confused by a "cache not refreshed" error in the wrong cwd. - test/mounts-cache.test.ts (NEW): 23 unit tests covering empty world, host-only, single mount, two-mount ambiguity, host-shadows-mount, disabled mount excluded, missing RESOLVER.md is a no-op, manifest composition with same-name collision, render shape, atomic rewrite, clear on missing dir. 
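The composition rule — host wins a bare skill name, while the namespace-qualified `<mount>::<skill>` form always remains routable — can be sketched as below. This is an illustrative reduction (hypothetical `composeSkills` name, simplified shapes; the real composeResolvers returns richer metadata, and bare-name collisions between two mounts surface as ambiguity info rather than being modeled here):

```typescript
type SkillEntry = { name: string; brain: string };

function composeSkills(
  host: string[],
  mounts: Record<string, string[]>,
): SkillEntry[] {
  const out: SkillEntry[] = host.map((name) => ({ name, brain: "host" }));
  const hostNames = new Set(host);
  for (const [mountId, skills] of Object.entries(mounts)) {
    for (const skill of skills) {
      // The namespace-qualified entry always survives, even when the
      // host shadows the bare name.
      out.push({ name: `${mountId}::${skill}`, brain: mountId });
      // The bare name routes to the mount only when host doesn't claim it.
      if (!hostNames.has(skill)) {
        out.push({ name: skill, brain: mountId });
      }
    }
  }
  return out;
}
```

So host `ingest` plus a yc-media `ingest` yields two entries: bare `ingest` → host, `yc-media::ingest` → mount.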
Output format for ~/.gbrain/mounts-cache/RESOLVER.md adds a Brain column so host agents can see which brain each trigger routes to at a glance, plus Shadows and Ambiguous sections when those conditions exist. Tests: 90 PR 0 tests green (brain-registry + resolver + mounts-cache + mounts-cli). Full suite regression pending in task 11. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> * feat(mounts): force instance-level pool for mount brains + CI guard Closes the silent-singleton-share bug Codex flagged as finding #1 from the plan review: two direct-transport mounts with different Postgres URLs would both fall through postgres-engine.ts's `get sql()` getter to db.getConnection() and quietly share whichever singleton connected first. Your yc-media writes end up in garrys-list or vice versa. No error at the call site — just wrong data. The fix: - src/core/brain-registry.ts: initMountBrain now passes poolSize when calling engine.connect(). That forces postgres-engine.ts:33-60 down the instance-level path (setting this._sql) instead of the module singleton path (calling db.connect). Hard-coded 5 for PR 0 — per-mount override is PR 1. PGLite ignores poolSize (no pool concept), so this is Postgres-specific. Host brain still uses the singleton path via initHostBrain (unchanged). That is fine for PR 0: the singleton is "the host's one connection" by definition. PR 1 removes the singleton entirely once every CLI command is engine-injectable. - scripts/check-no-legacy-getconnection.sh (NEW): CI grep guard against new db.getConnection() / db.connect() calls landing in src/core/ or src/commands/ (the multi-brain dispatch surface). Has an explicit ALLOWED list grandfathering today's legitimate callers, each marked "PR 1 refactors" so the list shrinks over time. Skips comment lines so the grep doesn't trip on doc references to the old pattern. - package.json: scripts.test chains the new guard after the existing check-jsonb-pattern + check-progress-to-stdout guards. 
`bun run test` now fails the build on singleton regression. Tests: 295 affected pass (registry, resolver, mounts-cache, mounts-cli, minions, pglite-engine). Typecheck clean. CI guard reports "ok: no new singleton callers" on current tree. Left for PR 1: remove the singleton fallback in postgres-engine.ts's `get sql()` entirely; refactor src/commands/doctor.ts, files.ts, repair-jsonb.ts, serve-http.ts, init.ts, and the 3 localOnly ops in operations.ts (file_list, file_upload, file_url) to accept ctx.engine explicitly. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> * fix(mounts): codex review findings — namespace survives shadow + atomic tmp names + honest PR 0 docstrings Codex outside-voice review on PR #372 found 5 issues. Real bugs fixed, overclaims rewritten. Details: P2 (real bug): composeResolvers and composeManifests were silently dropping mount entries when a host skill shared the short name, which made the namespace-qualified form `<mount>::<skill>` unreachable once host defined the same short name. That defeated the entire namespace-disambiguation model — if host had `ingest`, no mount could ship an `ingest` skill even with explicit `yc-media::ingest`. Fix: always keep namespace-qualified mount entries in the composed output. Shadow tracking moves to metadata (`shadows[]`) that doctor can warn on, but never drops routing. Before: host ingest + yc-media ingest → only 1 entry (host), yc-media::ingest unreachable After: host ingest + yc-media ingest → 2 entries: bare `ingest` = host, `yc-media::ingest` = mount Verified live: gbrain mounts add of a mount with `ingest` now shows `team-demo::ingest` alongside host `ingest` in the aggregated manifest. P1 (real bug): writeMountsFile + writeMountsCache used fixed `.tmp` filenames. Two concurrent `gbrain mounts add` invocations (e.g. from parallel terminals or CI) would clobber each other's temp file and one writer's update would be lost. 
Fix: tmp filenames include `process.pid + random suffix` so every writer has its own scratch file. The atomic rename is self-contained per-writer. (Full lock + read-modify-write safety deferred to PR 1 under `gbrain mounts sync --lock`.)

P1 (honesty): `SubagentHandlerData.brain_id` + `BuildBrainToolsOpts.brainId` docstrings claimed child jobs inherit the parent's brain and brain tools target the resolved brain. True for the `ctx.brainId` field only — `ctx.engine` is still the worker's base engine at dispatch time because `buildOpContext` doesn't yet do the registry lookup, and `gbrain agent run` doesn't yet accept `--brain` to populate the field on submission. Rewrote both docstrings to state the PR 0 behavior explicitly (field plumbed, engine routing is PR 1) so nobody reads the code thinking multi-brain subagents already work.

Also cleaned up two `require('fs')` runtime imports left over from the initial PR — swapped for ESM named imports (renameSync). Pre-existing style issue surfaced by the self-review pass.

Tests: 90 PR-0 tests pass. Updated two shadow-related test cases to assert the corrected semantics (both entries survive, host wins bare name, namespace form routes to mount).

Not fixed in this commit (documented as known PR 0 limitations):

- `file_list` / `file_upload` / `file_url` in operations.ts still hit the singleton (localOnly + admin, never reachable from HTTP MCP — safe in practice, refactor in PR 1 alongside command-level cleanups).
- writeMountsCache's two-file swap (RESOLVER.md + manifest.json) is not atomic across files; readers can briefly observe mismatched pairs. Acceptable because the cache is recomputable at any time from mounts.json. Generation-directory swap is PR 1 work.
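The per-writer scratch-name-plus-atomic-rename pattern can be sketched like this; `writeAtomic` is an illustrative name, not the real writeMountsFile, and the demo path is just a temp file:

```typescript
import { writeFileSync, renameSync, readFileSync } from "node:fs";
import { randomBytes } from "node:crypto";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Per-writer scratch name (pid + random suffix), so two concurrent
// `gbrain mounts add` invocations can never clobber each other's temp file.
// rename() is atomic within one filesystem, so readers see either the old
// file or the new one — never a partial write.
function writeAtomic(path: string, data: string): void {
  const tmp = `${path}.${process.pid}.${randomBytes(4).toString("hex")}.tmp`;
  writeFileSync(tmp, data);
  renameSync(tmp, path); // atomic swap into place
}

const target = join(tmpdir(), "mounts-demo.json");
writeAtomic(target, '{"mounts":[]}');
```

Each writer's temp name is unique, so the only contention left is last-writer-wins on the final rename, which is exactly the read-modify-write gap the commit defers to the PR 1 lock.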
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* fix(tests): bump hook timeouts for 21-migration PGLite init under full-suite load

Root cause of 19 pre-existing full-suite flakes (CHANGELOG v0.18.0 noted "17 pre-existing master timeouts"): every PGLite test does

    beforeAll/beforeEach(async () => {
      engine = new PGLiteEngine();
      await engine.connect({});
      await engine.initSchema(); // runs 21 migrations through v0.18.2
    });

In isolation this takes ~5s. Under full-suite contention (128 files, process-shared FS and CPU) it exceeds bun's default 5000ms hook timeout, beforeEach times out, engine stays undefined, then afterEach crashes with `TypeError: undefined is not an object (evaluating 'engine.disconnect')`. That single hook failure reports as the whole test "failing" even though the test body never executed, which is why the failure count sometimes looked inflated compared to the number of genuinely-broken tests.

Fix applied across 7 test files:

- Raise setup hook timeout to 30_000 (6x the default) — gives migration init enough headroom even under worst-case load without masking real regressions in a post-migration test.
- Raise teardown hook timeout to 15_000 — engine.disconnect() is usually fast but can stall when PGLite's WASM runtime is still completing a migration at shutdown.
- Add `if (engine) await engine.disconnect()` guard so afterEach doesn't double-fault when beforeEach already failed. This was the source of the opaque "(unnamed)" failures — they were disconnect crashes, not test-body failures.
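The double-fault and its one-line guard can be demonstrated in isolation — the Engine shape below is a stand-in and the real hooks are async, simplified here to sync for clarity:

```typescript
// Simulates the pre-fix failure mode: beforeEach times out, `engine` stays
// undefined, and an unguarded afterEach crashes on engine.disconnect().
type Engine = { disconnect(): void };

function unguardedTeardown(engine: Engine | undefined): void {
  engine!.disconnect(); // pre-fix body: TypeError when engine is undefined
}

function guardedTeardown(engine: Engine | undefined): string {
  if (engine) {
    // post-fix: only tear down what setup actually built
    engine.disconnect();
    return "disconnected";
  }
  return "skipped"; // beforeEach never assigned engine; nothing to tear down
}
```

The guard doesn't hide the original timeout (the hook failure still reports), it just stops the teardown from stacking a second, more confusing error on top.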
Files:

- test/dream.test.ts (5 beforeEach + 5 afterEach blocks)
- test/orphans.test.ts (1 pair)
- test/brain-allowlist.test.ts (1 pair)
- test/oauth.test.ts (1 pair)
- test/extract-db.test.ts (1 pair)
- test/multi-source-integration.test.ts (1 pair)
- test/core/cycle.test.ts (1 pair)

Results on the merged PR 0 branch:

- Before: 2175 pass / 20 fail / 3 errors
- After: 2281 pass / 0 fail / 0 errors (+106 tests running that were previously blocked by the timed-out hooks)

No changes to production code. No test assertions changed. Just timeout-bump + null-guard discipline that should have been in these hooks from the start. The real longer-term fix is reusing an engine across tests where possible (brain-allowlist.test.ts already does this via the beforeAll+DELETE-pages pattern), but that's per-file structural work — out of scope for this cleanup.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* chore: regenerate llms-full.txt for brains-and-sources + brain-routing docs

The test/build-llms.test.ts test validates that the committed llms.txt and llms-full.txt match the current generator output. PR 0 added docs/architecture/brains-and-sources.md content paths and updated CLAUDE.md + skills/RESOLVER.md in earlier commits, but the generated bundle file wasn't regenerated alongside. This caused one of the 20 fails we chased down today — a straight content mismatch, not a runtime bug. Running `bun run build:llms` picks up the new section content so the bundle matches the sources again. No functional change. Only the compiled doc bundle.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>

* Bump version 1.0.0.0 → 0.22.0

OAuth + admin dashboard is meaningful but doesn't quite warrant the major-version reset to 1.0. Renumber as v0.22.0, slotting cleanly above master's v0.21.0 (Cathedral II).
Touched:

- VERSION, package.json: 1.0.0.0 → 0.22.0
- CHANGELOG.md: heading + "BEFORE/AFTER v1.0" table + "To take advantage" + "pre-v1.0" all renamed. Narrative voice unchanged otherwise.
- TODOS.md: ChatGPT MCP completion stamp updated to v0.22.0 (2026-04-25).
- CLAUDE.md, README.md, docs/mcp/{DEPLOY,CHATGPT}.md, src/schema.sql, src/core/schema-embedded.ts: every reader-facing v1.0.0 reference rewritten to v0.22.0 / pre-v0.22 in the same place.
- llms-full.txt: regenerated to match.

Slug-test occurrences of "v1.0.0" (`test/slug-validation.test.ts`, `test/file-upload-security.test.ts`) and the `HOMEBREW_FOR_PERSONAL_AI` roadmap reference to a future v1.0 vision left intact — those are unrelated to this branch's release version.

Typecheck clean. cli + oauth + slug + file-upload tests pass (106 tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* v0.26.0 fix: 4 security findings from /cso pass + version bump

Bumped 0.22.0 → 0.26.0 to slot above master's v0.21 chain with headroom for v0.23/0.24/0.25 to ship from master between now and merge.

Security fixes (all from CSO finding writeups):

#1 cookie-parser middleware — admin dashboard auth was silently broken. Express 5 has no built-in cookie parsing; req.cookies was always undefined, so /admin/login set the cookie but every subsequent admin API call returned 401. Added cookie-parser@^1.4.7 + @types/cookie-parser as direct + dev deps. app.use(cookieParser()) wired before CORS.

#2 + #3 TOCTOU races — exchangeAuthorizationCode and exchangeRefreshToken used SELECT-then-DELETE, letting concurrent requests with the same code/refresh both pass the SELECT before either ran DELETE, both issuing token pairs. Switched to atomic DELETE...RETURNING. RFC 6749 §10.5 (codes) + §10.4 (refresh detection) violations closed. Added regression tests that fire 10 concurrent exchanges and assert exactly one wins — both pass.
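The SELECT-then-DELETE race and the DELETE...RETURNING fix can be modeled with a Map standing in for the token table. `racyBatch` lets every "request" pass its SELECT before any DELETE runs (the worst-case interleaving), while `atomicBatch` deletes and observes success in one step; both names are illustrative, and the real fix is SQL:

```typescript
// Map stands in for the oauth codes / refresh-tokens table.

// Pre-fix shape: every concurrent caller runs SELECT before any caller runs
// DELETE, so all N exchanges "win" and N token pairs get issued.
function racyBatch(store: Map<string, string>, code: string, n: number): number {
  const selected = Array.from({ length: n }, () => store.get(code) !== undefined);
  for (const hit of selected) if (hit) store.delete(code);
  return selected.filter(Boolean).length; // winners under the race
}

// Post-fix shape: delete-and-observe in one atomic step, like
// DELETE ... RETURNING — exactly one caller gets the row back.
function atomicBatch(store: Map<string, string>, code: string, n: number): number {
  let wins = 0;
  for (let i = 0; i < n; i++) {
    if (store.delete(code)) wins++; // Map.delete returns true exactly once per key
  }
  return wins;
}
```

This mirrors the regression tests described above: fire 10 exchanges for the same code and assert exactly one wins.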
#5 pgArray escape + DCR redirect_uri validation — pgArray() did `arr.join(',')` with no escaping, so an element containing a comma would be parsed by Postgres as TWO array elements. With --enable-dcr on, this could smuggle a second redirect_uri into a registered client and steal auth codes. Now every element is double-quoted with `"` and `\` escaped. Added validateRedirectUri() per RFC 6749 §3.1.2.1: redirect_uris must be https:// or loopback (localhost / 127.0.0.1). Wired into the DCR registerClient path; CLI registration trusts the operator and bypasses. Regression test confirms a comma-in-URI element round-trips as 1 element, not 2.

#6 --public-url flag — issuerUrl was hardcoded to http://localhost:{port}. Behind reverse proxies / ngrok / production deploys, the issuer claim in tokens wouldn't match the discovery URL clients hit (RFC 8414 §3.3). New --public-url URL flag on `gbrain serve --http`, propagates through serve.ts → serve-http.ts → ServeHttpOptions.publicUrl → issuerUrl. Startup banner surfaces the configured issuer.

Findings #4 (admin requests filter dead code), #7 (admin register-client hardcoded grant_types), #8 (legacy token grandfathering posture) are documentation / minor functional fixes and are deferred per user direction.

Tests: oauth.test.ts now 34 cases (was 27).
7 new:

- single-use TOCTOU regression (10 concurrent code exchanges)
- single-use TOCTOU regression (10 concurrent refresh exchanges)
- redirect_uri http://localhost passes
- redirect_uri https://example.com passes
- redirect_uri http://example.com (non-loopback plaintext) rejected
- redirect_uri non-URL rejected
- redirect_uri with embedded comma stored as single element

Files:

- VERSION, package.json: 0.22.0 → 0.26.0
- CHANGELOG.md: heading + table + "To take advantage" + "pre-v0.22" → v0.26; new "Security hardening (post-/cso pass)" subsection at top of itemized changes; CLI flag list updated for --public-url.
- src/core/oauth-provider.ts: pgArray escape, validateRedirectUri, registerClient enforces validation, DELETE...RETURNING in exchangeAuthorizationCode + exchangeRefreshToken.
- src/commands/serve-http.ts: cookie-parser import + wire-up, publicUrl option, issuerUrl honors it, startup banner shows issuer.
- src/commands/serve.ts: parses --public-url and threads through.
- src/cli.ts: help text adds --public-url URL flag.
- test/oauth.test.ts: +7 regression tests (now 34 total).
- llms-full.txt: regenerated.

Typecheck clean. 34 oauth + 14 cli tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>
garrytan added a commit to garrytan-agents/gbrain that referenced this pull request — May 3, 2026
Pre-fix: the empty-state guard checked the unfiltered agents array. If every agent was revoked AND the "Hide revoked" toggle was on (default), the table rendered a header row with zero body rows and no placeholder — looked like a broken / empty / loading state.

Two cases to render distinctly:

1. agents.length === 0 (truly no agents): "No agents registered. Register your first agent to get started."
2. visibleAgents.length === 0 BUT agents.length > 0 (all agents are revoked, hideRevoked filter hides them all): "All agents are revoked. Uncheck "Hide revoked" to view them."

Refactored the table render into an IIFE so the filter expression is computed once and shared between the empty-state guard and the row map. Drops the prior inline `agents.filter(...).map(...)` pattern. (F2.2 from the eng review pass garrytan#2.)
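The two-case selection can be sketched as a plain function — the Agent shape and hideRevoked flag are simplified stand-ins for the Agents.tsx props:

```typescript
interface Agent { name: string; revoked: boolean }

// Mirrors the corrected guard: check the unfiltered list first, then the
// filtered (visible) list, so "all revoked + hidden" gets its own message.
function emptyStateMessage(agents: Agent[], hideRevoked: boolean): string | null {
  const visible = hideRevoked ? agents.filter((a) => !a.revoked) : agents;
  if (agents.length === 0) {
    return "No agents registered. Register your first agent to get started.";
  }
  if (visible.length === 0) {
    // agents exist, but the hideRevoked filter hides them all
    return 'All agents are revoked. Uncheck "Hide revoked" to view them.';
  }
  return null; // rows render normally
}
```

The pre-fix bug is the first branch checking `visible.length` instead of `agents.length`, which collapsed both cases into "render nothing."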
garrytan added a commit that referenced this pull request — May 3, 2026
#586) * feat(admin): legacy API keys alongside OAuth clients in dashboard

Adds API key management to the admin dashboard:

Server (serve-http.ts):
- GET /admin/api/api-keys — list legacy access_tokens with status
- POST /admin/api/api-keys — create new bearer token
- POST /admin/api/api-keys/revoke — revoke by name
- Stats endpoint now includes active_api_keys count

Admin UI (Agents.tsx):
- Tabbed view: 'OAuth Clients' | 'API Keys'
- API Keys tab: table with name, status, created, last used, revoke button
- Create API Key modal with name input
- Token reveal modal with copy button + warning
- Badge showing active key count on tab

Both auth methods (OAuth 2.1 client_credentials and legacy bearer tokens) now visible and manageable from a single admin surface.

* feat(admin): remember admin token in localStorage + auto-reauth

Login flow:
- First login: paste token, saved to localStorage
- Subsequent visits: auto-login from localStorage (no paste needed)
- Shows 'Authenticating...' spinner during auto-login
- If the saved token is stale (server restarted), clears it and shows the login form

Session recovery:
- If the session cookie expires mid-use (server restart, 24h expiry), the API layer auto-reauths with the saved token before redirecting to login
- Transparent to the user — one failed request triggers reauth + retry
- Only falls back to the login page if the saved token itself is invalid

Security:
- Token stored in localStorage (same-origin, tailnet-only deployment)
- Cleared automatically when token becomes invalid
- Cookie remains HttpOnly + SameSite=Strict for the actual session

* feat(admin): rich request logging + agent activity tracking

Server:
- mcp_request_log now captures params (jsonb) and error_message (text)
- Agents API returns last_used_at, total_requests, requests_today
- Request log API supports agent/operation/status filtering via query params
- SSE broadcast includes params and error details

Agents page:
- Shows 'Requests today / total' and 'Last used' (relative time) per agent
- Removed Client ID column (low signal, shown in drawer)

Request Log page:
- New 'Params' column — shows query text, slug, or param count inline
- Click any row to expand full details (params JSON, error message, timestamps)
- Click agent name to filter all requests by that agent
- Agent filter dropdown in header
- Error messages shown in red in expanded view

What this means: when Claude Code searches for 'pedro franceschi', the admin dashboard shows the search query, which agent ran it, how long it took, and whether it succeeded — all clickable.

* feat(admin): magic link login — ask your agent for the URL

New flow:
1. User opens /admin → sees 'This is a protected dashboard'
2. UI tells them: 'Ask your AI agent for the admin login link'
3. Agent generates: https://host:port/admin/auth/<token>
4. User clicks the link → auto-authenticates → redirects to dashboard
5. Session lasts 7 days (magic link) vs 24h (manual token paste)

Server: GET /admin/auth/:token validates the bootstrap token, sets an HttpOnly cookie, redirects to /admin/. Invalid tokens get a plain text error telling them to ask their agent for a fresh link.

Login page: primary UX is the 'ask your agent' prompt with example. Manual token paste collapsed under a <details> disclosure.

* feat(admin): config export for Claude Code, ChatGPT, Claude.ai, Cursor, Perplexity

Agent drawer now shows setup instructions for 5 clients + raw JSON:
- Claude Code: .mcp.json with bearer token + curl to mint
- ChatGPT: Settings → Tools → MCP with OAuth discovery
- Claude.ai (Cowork): Connected Apps → MCP with OAuth
- Cursor: .cursor/mcp.json with OAuth config
- Perplexity: Connectors with client ID/secret
- JSON: raw config with all URLs (server, token, discovery)

All snippets use the actual server URL (window.location.origin) instead of the placeholder YOUR_SERVER. Client ID pre-filled.
* feat(admin): per-client token TTL — configurable token lifetime

Problem: OAuth tokens expire in 1 hour (hardcoded). Claude Code's built-in OAuth client doesn't auto-refresh, so users get 401s every hour.

Fix: per-client token_ttl column on the oauth_clients table. Set at registration time or updated later via the admin dashboard.

Server:
- oauth_clients.token_ttl column (nullable integer, seconds)
- exchangeClientCredentials reads per-client TTL, falls back to server default
- POST /admin/api/register-client accepts tokenTtl param
- POST /admin/api/update-client-ttl for existing clients
- Agents API returns token_ttl for display

Admin UI:
- Register modal: Token Lifetime dropdown (1h, 24h, 7d, 30d, 1y, no expiry)
- Agent drawer: shows current TTL in Details section

Presets: gstack-desktop and garry-claude-code set to 30-day tokens.

* fix(admin): request log shows agent name instead of truncated client_id

Resolves client_id → client_name via LEFT JOIN on oauth_clients (and access_tokens for legacy keys). Agent column now shows 'gstack-desktop' instead of 'd0db7692caf5…'. Clickable to filter by agent.

* feat(admin): DESIGN.md + left-align everything

DESIGN.md establishes the admin dashboard design system:
- Left-align all text (Garry preference)
- Inter + JetBrains Mono (shared DNA with GStack)
- No accent color — semantic badges carry all color
- Dense utilitarian ops dashboard
- Component specs and anti-patterns documented

CSS: login-box text-align center → left

* feat(admin): unified agent view + resolved agent names in request log

Agent names stored at log time (agent_name column). Agents page shows OAuth clients and API keys in one unified table. Request log shows human-readable names. Backfilled 1,114 existing entries.
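The per-client TTL fallback in exchangeClientCredentials reduces to a nullish-coalescing lookup; `DEFAULT_TTL_SECONDS`, the row shape, and `tokenExpiresAt` are assumptions for illustration:

```typescript
const DEFAULT_TTL_SECONDS = 3600; // the old hardcoded 1-hour server default

interface OAuthClientRow {
  token_ttl: number | null; // nullable v33 column, seconds
}

// Per-client TTL wins when set; NULL falls back to the server default.
function tokenExpiresAt(client: OAuthClientRow, nowMs: number): number {
  const ttlSeconds = client.token_ttl ?? DEFAULT_TTL_SECONDS;
  return nowMs + ttlSeconds * 1000;
}
```

A 30-day preset like gstack-desktop's would store `token_ttl = 30 * 24 * 3600`, and clients without an override keep the 1-hour behavior unchanged.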
* feat(admin): working Revoke Agent button + e2e tests

Bugs fixed:
- Revoke Agent button was a no-op (no onClick handler, no API endpoint)
- Legacy API key tokens got 401 at /mcp (missing expiresAt in AuthInfo)
- token_ttl and deleted_at queries failed on PGLite (columns don't exist)

Server:
- POST /admin/api/revoke-client: soft-deletes oauth_clients + purges tokens
- exchangeClientCredentials checks deleted_at (graceful if column missing)
- Legacy token verify returns expiresAt (1yr future) for SDK compat

UI:
- Revoke button: confirm dialog → revoke → close drawer → reload table
- Shows 'This agent has been revoked' for revoked agents

E2E tests (2 new cases, 17 total):
- revoke client via admin API invalidates all tokens (mint → use → revoke → verify rejected → mint fails)
- revoke API key via admin API (create → use at /mcp → revoke → verify rejected)

52 tests, 0 failures, 213 assertions across unit + e2e.

* fix(test): e2e tests clean up after themselves — no more orphan clients

Problem: every test run left e2e-oauth-test, e2e-revoke-test, and e2e-revoke-key-test rows in oauth_clients and access_tokens. The CLI-based cleanup in afterAll was failing silently.

Fix:
- beforeAll: SQL DELETE of any e2e-* orphans from previous crashed runs
- afterAll: direct SQL cleanup of oauth_tokens, oauth_clients, access_tokens, mcp_request_log — all rows matching the 'e2e-%' pattern
- No reliance on CLI commands for cleanup (they fail silently)

Verified: 52 tests pass, 0 test rows remain after run.

* feat(admin): hide revoked toggle on Agents page

* fix(admin): styled error page for expired magic links

Matches the login page aesthetic instead of plain text. Dark theme, GBrain logo, explains the link expired, tells the user to ask their agent.
* fix(admin): clean config export — auth-type-aware Claude Code instructions

* fix(admin): rewrite all config exports — command language, auth-type-aware, verified syntax

* fix(admin): API key rows clickable with revoke + sync all fixes from master

Syncs all accumulated fixes onto the PR branch:
- API key rows in agents table now open drawer with Revoke button
- API keys show bearer token usage hint instead of config export tabs
- Config export snippets use command language directed at the AI agent
- Styled expired magic link error page
- Hide revoked toggle
- Test cleanup via direct SQL
- All v0.26.2 upstream fixes incorporated

* fix(oauth): port coerceTimestamp helper from master 1055e10

Tests in test/oauth.test.ts (already on this branch) import coerceTimestamp from oauth-provider.ts. The import was synced from master via PR commit 16 ("sync all fixes from master"), but the production-code change to oauth-provider.ts was not. Result: bun test fails at module load with "coerceTimestamp is not exported". This commit ports the helper directly instead of merging master, avoiding VERSION/CHANGELOG/dist conflicts.

Boundary helper for postgres.js BIGINT-as-string (auto-detected on Supabase pgbouncer / port 6543). Throws on non-finite so corrupt rows fail loud at the SELECT-row → JS-number boundary. Returns undefined for SQL NULL; comparison sites treat NULL as expired (fail-closed).

Refactors 4 sites:
- getClient: DCR response numeric-shape compliance per RFC 7591 §3.2.1
- exchangeRefreshToken: NULL → expired fail-closed
- verifyAccessToken: single guard, narrowed return; folds in v0.26.1's inline Number(...) at the return site

Originally landed on master as part of #593 (v0.26.2). Ported here so PR #586 (v0.26.3) can build standalone without a master merge.
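The helper's contract, as described above, can be sketched like this (the real signature and wording in oauth-provider.ts may differ):

```typescript
// Boundary helper: postgres.js can return BIGINT columns as strings (seen on
// Supabase pgbouncer / port 6543).
//   SQL NULL        -> undefined (callers fail closed, treating it as expired)
//   non-finite junk -> throw, so corrupt rows fail loud at the DB boundary
//   valid number    -> plain JS number
function coerceTimestamp(value: unknown): number | undefined {
  if (value === null || value === undefined) return undefined;
  const n = typeof value === "string" ? Number(value) : (value as number);
  if (typeof n !== "number" || !Number.isFinite(n)) {
    throw new Error(`corrupt timestamp at DB boundary: ${String(value)}`);
  }
  return n;
}
```

The fail-closed shape matters at the call sites: an expiry of `undefined` compares as "already expired," so a NULL row can never grant access.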
* feat(schema): migration v33 — admin dashboard columns

Adds the 5 columns + new index referenced by PR #586 admin dashboard work that landed without a corresponding schema migration:

    oauth_clients.token_ttl INTEGER        -- per-client OAuth TTL override
    oauth_clients.deleted_at TIMESTAMPTZ   -- soft-delete for revoke
    mcp_request_log.agent_name TEXT        -- resolved client_name for log
    mcp_request_log.params JSONB           -- captured request params
    mcp_request_log.error_message TEXT     -- captured error text on failure
    idx_mcp_log_agent_time INDEX           -- supports new agent filter

Without v33 on existing brains:
- /admin/api/agents 503s (SELECT references token_ttl + deleted_at)
- POST /admin/api/revoke-client throws 500 (UPDATE deleted_at)
- POST /admin/api/update-client-ttl throws 500 (UPDATE token_ttl)
- mcp_request_log INSERTs silently swallow column-doesn't-exist errors; the request log appears empty to the operator

All ALTERs use ADD COLUMN IF NOT EXISTS so re-running the migration is a no-op on a brain that already has v33. Includes an inline UPDATE backfill of agent_name on existing rows via COALESCE on oauth_clients.client_name → access_tokens.name → token_name.

Updates:
- src/core/migrate.ts: v33 migration entry
- src/schema.sql: source-of-truth schema for fresh installs
- src/core/pglite-schema.ts: PGLite mirror
- src/core/schema-embedded.ts: regenerated via bun run build:schema
- test/migrate.test.ts: 5 SQL-shape assertions pinning the v33 contract

* refactor(serve-http): parameterize request-log filter; kill dead vars

Three issues in the prior /admin/api/requests handler:

1. sql.unsafe() with manual single-quote escape on user input:

       conditions.push(`token_name = '${agent.replace(/'/g, "''")}'`);

   Works under standard_conforming_strings=on (PG default since 9.1), but the pattern is a footgun — any future contributor adding a filter without escaping breaks the dam. Backslashes are not escaped. Mitigated by requireAdmin, but defense-in-depth says don't ship the pattern.

2. Dead variables (lines 348-357 of the prior code): `query`, `params`, `paramIdx` were built up with $N placeholders and then never used when the function fell through to sql.unsafe with manually-escaped strings. Confusing leftovers from an earlier parameterization attempt.

3. Unused `values: unknown[] = []` in the conditions block.

Fix: replace the entire dynamic-WHERE construction with postgres.js tagged-template fragments. Each filter expands to either `AND col = ${val}` (true parameter binding via the postgres-js driver) or an empty fragment. `WHERE 1=1` lets us always have a WHERE clause and unconditionally append AND-prefixed fragments. No string interpolation, no manual escaping, no sql.unsafe.

Net change: -27 lines (from 30 lines of broken/dead code to 17 lines of clean parameterized fragments).

* perf(oauth): thread client_name through AuthInfo; drop per-request lookup

PR #586's serve-http.ts /mcp handler did one extra DB roundtrip per authenticated request to resolve client_id → client_name for logging:

    let agentName = authInfo.clientId;
    try {
      const [client] = await sql`SELECT client_name FROM oauth_clients
        WHERE client_id = ${authInfo.clientId}`;
      if (client) agentName = client.client_name;
    } catch { /* best effort */ }

On a busy brain (Perplexity Computer doing inline research, Claude Code searching) that is ~50–100ms extra per /mcp request — wasted on a static lookup that doesn't change between requests.

Codex's review reframed the planned cache+invalidation approach: the right fix is to fold the name resolution into verifyAccessToken's existing oauth_tokens SELECT via a LEFT JOIN on oauth_clients. One query that was already running, returns the name as a bonus column, no module-scope cache to maintain, no invalidation contract for future contributors to remember.

Changes:
- AuthInfo (src/core/operations.ts): add optional clientName field with a doc comment explaining why it's threaded here.
- verifyAccessToken (src/core/oauth-provider.ts): SELECT becomes

      SELECT t.client_id, t.scopes, t.expires_at, t.resource, c.client_name
      FROM oauth_tokens t
      LEFT JOIN oauth_clients c ON c.client_id = t.client_id
      WHERE t.token_hash = ${tokenHash} AND t.token_type = 'access'

  Returns clientName in AuthInfo.
- Legacy access_tokens path: clientName = name (single identifier).
- serve-http.ts /mcp handler: read authInfo.clientName directly, fall back to clientId. Per-request lookup removed.

Net change: -8 LOC. Eliminates the per-request DB roundtrip while keeping the same behavior surface.

* security(serve-http): timingSafeEqual on admin token hash compare

Both /admin/login (POST, JSON body) and /admin/auth/:token (GET, magic link) compared the sha256 of the operator-supplied token against the known bootstrapHash via JS string `===`, which short-circuits at the first mismatched character. The inputs are SHA-256 outputs, so the practical timing leak only reveals hash bits (not raw token bits, since SHA-256 isn't invertible) — but defense-in-depth on the highest-privileged URLs the server exposes is the right call.

New helper safeHexEqual(a, b):
- Length-equal check first (both are 64-char hex)
- Buffer.from(hex, 'hex') decodes each side to 32 bytes
- crypto.timingSafeEqual returns the constant-time compare result

Also tightens the POST handler's input validation: requires token to be a string before passing to createHash (prior code only checked truthiness and would have crashed on object-typed bodies even with express.json's parser). Used at both magic-link and password-style admin auth sites.

* security(serve-http): rate-limit /admin/auth/:token at 10/min/IP

Defense-in-depth on the magic-link endpoint. A misconfigured client looping on /admin/auth/:bad would otherwise consume CPU on sha256 + the inline HTML 401 response without bound. Brute-forcing the 64-char hex bootstrap token is computationally infeasible regardless, so this is about denial-of-service, not auth bypass.
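The safeHexEqual helper described above follows the standard Node constant-time-compare shape; this sketch assumes 64-char sha256 hex inputs, as the commit does:

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Constant-time compare of two sha256 hex digests. Length check first (both
// sides are expected to be 64-char hex), then decode to 32-byte buffers and
// let crypto.timingSafeEqual do the non-short-circuiting comparison.
function safeHexEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  const bufA = Buffer.from(a, "hex");
  const bufB = Buffer.from(b, "hex");
  if (bufA.length !== bufB.length) return false; // guards malformed hex
  return timingSafeEqual(bufA, bufB);
}

const sha256Hex = (s: string) => createHash("sha256").update(s).digest("hex");
```

The explicit buffer-length check matters because timingSafeEqual throws on unequal-length inputs, and malformed hex can decode shorter than its string length suggests.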
Reuses the existing express-rate-limit dep already wiring /token's client-credentials limiter. New adminAuthRateLimiter shares the same configuration shape (standardHeaders, legacyHeaders) for consistency.

- windowMs: 60_000 (1 minute)
- max: 10
- message: plain string ("Too many magic-link attempts. Wait a minute before trying again.") instead of a JSON envelope, matching the endpoint's HTML response style.

* security(admin): kill JS-state token; single-use magic links; sign out everywhere

Resolves D11 + D12 from the codex-pushback review. Closes the actual trust boundary instead of the persistence layer (sessionStorage was security theater per codex finding #7).

# Single-use magic links (D11=C)

The bootstrap token is no longer the magic-link path component. New flow:

    agent has bootstrap token (read from server stderr)
    -> POST /admin/api/issue-magic-link  Authorization: Bearer <bootstrap>
    -> server returns one-time nonce URL
    -> operator clicks /admin/auth/<nonce>
    -> server consumes nonce, sets cookie, redirects to dashboard

Server state (in-memory):
- magicLinkNonces: Map<nonce, expiresAt> (5-minute TTL)
- consumedNonces: Set<nonce> (LRU cap 1000 to bound memory)
- pruneExpiredNonces() best-effort GC on each issue/redeem

Each redemption marks the nonce consumed. A second click on the same URL gets the styled 401 page. A leaked URL grants exactly one extra session before dying. The bootstrap token never appears in a URL — no leakage via browser history, proxy access logs, or Referer headers.

# Kill JS-state bootstrap token (D12=B)

admin/src/pages/Login.tsx + admin/src/api.ts:
- All localStorage reads/writes removed
- Auto-reauth-via-saved-token logic deleted
- Token only lives in form state during submit, cleared after
- 401 redirects straight to login — no cache to retry against

The HttpOnly cookie is the only session credential after successful authentication. Closing the tab ends the session. Reopening shows the login page.
Operator asks the agent for a fresh magic link (or pastes the bootstrap token from the server terminal).

# Sign out everywhere

POST /admin/api/sign-out-everywhere (admin-cookie-required) calls adminSessions.clear() and returns {revoked_sessions: count}. Every browser/tab fails its next request, gets 401, redirects to login. Bootstrap token unaffected — still valid for new magic-link mints.

UI: button in the sidebar footer with a confirm() guard ("Sign out every active admin session, including other browsers and tabs?").

# Notes

admin/dist is gitignored on this branch (master's v0.26.2 removed that line; the merge to master will reconcile). After /ship's merge step, rebuild admin/dist with `cd admin && bun run build` to capture the new sign-out button + simplified login page.

* fix(admin): rename loadApiKeys() to loadAgents() in Agents.tsx onCreated

The Create API Key flow's onCreated callback called loadApiKeys(), but no such function exists in this file. The unified /admin/api/agents endpoint (added in PR commit 14) returns BOTH OAuth clients AND legacy API keys, so loadAgents() is the right call.

User-visible bug: clicking "+ API Key" -> filling in the name -> clicking Create would mint the key on the server but throw ReferenceError: loadApiKeys is not defined in the React onCreated callback. The token-reveal modal would still appear (because setShowApiKeyToken runs before the loadApiKeys call), but the agents table wouldn't refresh, leaving the new key invisible until a manual page reload.

Five Claude review passes missed this. Codex caught it in one pass. 1-line fix.

* fix(admin): empty-state placeholder when filtered Agents result is empty

Pre-fix: the empty-state guard checked the unfiltered agents array. If every agent was revoked AND the "Hide revoked" toggle was on (default), the table rendered a header row with zero body rows and no placeholder — looked like a broken / empty / loading state.

Two cases to render distinctly:

1. agents.length === 0 (truly no agents): "No agents registered. Register your first agent to get started."
2. visibleAgents.length === 0 BUT agents.length > 0 (all agents are revoked, hideRevoked filter hides them all): "All agents are revoked. Uncheck "Hide revoked" to view them."

Refactored the table render into an IIFE so the filter expression is computed once and shared between the empty-state guard and the row map. Drops the prior inline `agents.filter(...).map(...)` pattern. (F2.2 from the eng review pass #2.)

* fix(admin): restore Claude Code + Cursor tabs for API-key agents

Wintermute's commit 16 (3d5d0f8) wrapped the entire Config Export section in {isOAuth && (...)}, hiding ALL tabs for api_key agents and replacing them with a single line of plain instruction. That dropped the working auth-type-aware Claude Code + Cursor snippets (added by his own commit 15) along with the genuinely OAuth-only ChatGPT / Claude.ai / Perplexity ones.

Codex review pass D5 settled on option C: per-tab branching. Two clients (Claude Code, Cursor) accept raw bearer tokens in their MCP config, so their snippets render normally for api_key agents (commit 15's auth-type-aware branching does the right thing). Three clients (ChatGPT, Claude.ai, Perplexity) only speak OAuth 2.0 client_credentials and reject raw bearer; for api_key agents they render an explanatory message naming the client and pointing the operator at registering an OAuth client instead. The JSON tab continues to render its raw structured metadata unconditionally.

Layout: removed the `{isOAuth && (...)}` outer wrap; the tab list is now always visible. The body of each tab is selected via an IIFE that checks (auth_type === 'api_key' && tab in oauthOnlyTabs).

Net change: +24 lines (the warning panel + IIFE branch logic).

* feat(admin): read -s prompt OAuth Claude Code snippet + 2-step curl fallback

Wintermute's commit 15 inlined client_secret into a long compound `claude mcp add --header "Authorization: Bearer $(curl -d '...client_secret=PASTE_HERE')"` line. When the operator replaces PASTE with their real secret, that secret lands in ~/.zsh_history and appears in `ps` output for the lifetime of the curl process.

D13=C from the eng review: ship both shapes.

Default (read -s prompt-based, ~17 lines):
- read -rs prompts for the secret without echo, stores it in $GBRAIN_CS scoped to the shell session
- curl uses --data-urlencode "client_secret=$GBRAIN_CS" — variable substitution at exec time, so the secret enters the curl process's argv at the moment of the call, but the shell history records literally `--data-urlencode "client_secret=$GBRAIN_CS"`, not the value
- unset GBRAIN_CS afterwards to scrub the env

Fallback (2-step curl + paste, for shells without read -s):
- one curl command to mint the token (PASTE_YOUR_CLIENT_SECRET_HERE in the body — the secret hits history, but in one short isolated line that's easy to scrub)
- second `claude mcp add` command with PASTE_TOKEN_FROM_ABOVE — the bearer token, not the long-lived client secret
- bash + zsh history-deletion hint at the bottom

Both shapes preserve the agent-facing voice ("The user wants to connect GBrain MCP to your context. Here's how.") and the token-TTL rendering ("will last 30 days") that commit 15 added.

Net change: +25 lines in the configSnippets['claude-code'] OAuth branch. API-key branch unchanged (single paste, no secret).

* chore(ci): gate admin React build via scripts/check-admin-build.sh

Codex review pass #6 finding #3 caught loadApiKeys() referenced but undefined in Agents.tsx — a real shipping bug that 5 Claude review passes missed. Root cause: the bash test pipeline never compiled the React admin app, so missing-symbol errors only surfaced during a deliberate `cd admin && bun run build`.

This commit threads the admin build into the standard test gate. Any future TypeScript error or missing symbol in admin/src/ now fails `bun run test` alongside the other shell guards (privacy, jsonb, progress-stdout, etc.)
and the typecheck step. Behavior: - scripts/check-admin-build.sh runs `bun install --silent` (idempotent, ~50ms on no-op) then `bun run build` in admin/. - Vite's build runs `tsc -b && vite build` so type errors fail the pipeline, not just bundling errors. - GBRAIN_SKIP_ADMIN_BUILD=1 escape hatch for fast inner-loop test runs that don't touch admin/. Production CI MUST NOT set this. - Skips silently if admin/ doesn't exist (handles slim-clone scenarios). Wired into both: - "test" script: full pipeline now includes admin build before bun test - "check:admin-build" script: invoke standalone for debugging * test(e2e): v0.26.3 coverage — column round-trip, injection probe, TTL, magic-link Folds together the planned fix-up commits #8-#11 since they all live in the same E2E file and share the spawned-server harness. Each test block is independently bisect-readable. # Test 1: mcp_request_log new column round-trip (pins migration v33) Wipes log rows for the e2e-oauth-test client, makes a successful tools/list call + a failed tools/call (nonexistent tool name), then asserts: - rows persisted (count >= 2) — proves the INSERT wasn't silently swallowed by the "best effort" try/catch on a column-doesn't-exist error - agent_name column resolves to 'e2e-oauth-test' on every row (proves the JOIN in verifyAccessToken or the v33 backfill path) - params column persisted as JSONB on tools/call - error_message column populated on the status='error' row Without migration v33, every assertion fails — the column doesn't exist so the INSERT throws, gets swallowed, and rows.length === 0. # Test 2: request-log filter injection probe Sends `?agent=alice'%20OR%201%3D1` to /admin/api/requests. Pre-fix, the sql.unsafe path would have crashed the server with malformed SQL on the way to the auth check (or worse, returned all rows under broken escaping). Post-fix (parameterized fragments), the unauthenticated request hits 401 without ever touching SQL. 
Asserts: - 401 (not 500) on the injection input - server still responsive on /health afterwards (didn't crash) # Test 3: per-client token_ttl flow Registers e2e-test-ttl, sets oauth_clients.token_ttl, mints a token, asserts response's expires_in matches. Cycles through three states: - token_ttl = 86400 → expires_in = 86400 (24h custom override) - token_ttl = 7200 → expires_in = 7200 (2h different custom) - token_ttl = NULL → expires_in = 3600 (server default fallback) Pins the per-client TTL feature added in PR #586 commit 6 (e7989e9). # Test 4: magic-link styled 401 page + single-use semantic (a) Invalid nonce returns Content-Type: text/html with a body that contains "expired" and "GBrain" — pins the styled error page from PR commit 13 (f8f5cfe). (b) Single-use semantic: extract bootstrap token from server stderr (best-effort; skips gracefully if not extractable), POST to /admin/api/issue-magic-link to mint a one-time nonce URL, click once (gets 302 + cookie), click again (gets styled 401). Pins the D11=C single-use rotation logic. # Test 5: agent_name resolution path Makes an OAuth request and asserts mcp_request_log.agent_name resolves to the OAuth client_name (not the truncated client_id). Pins the JOIN introduced in fix-up #4 + the v33 backfill path. # Test 6: register-client missing-name returns 400 (basic input validation) Hits /admin/api/register-client without auth — must 401 (not crash 500). # Other changes - Renamed describe header from `(v0.26.1 + v0.26.2)` to `(v0.26.1 + v0.26.2 + v0.26.3)` — F6.5. - All postgres.js sql tag bindings on `clientId` / `clientSecret` use the `!` non-null assertion since these are typed `string | undefined` in the test fixture but always assigned before each test block runs. - Result casts go through `as unknown as ...` per postgres.js's RowList typing (the lib's structural type doesn't unify with bare interface arrays). 
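The TTL fallback that Test 3 cycles through reduces to a one-line rule. A minimal sketch in TypeScript (the helper name is an assumption, but the 3600s default matches the server-default fallback the test asserts):

```typescript
// Hypothetical helper mirroring the behavior Test 3 pins:
// a non-NULL oauth_clients.token_ttl overrides the server default.
const DEFAULT_TTL_SECONDS = 3600;

function resolveExpiresIn(tokenTtl: number | null): number {
  // NULL means "no per-client override", so fall back to the default.
  return tokenTtl ?? DEFAULT_TTL_SECONDS;
}

console.log(resolveExpiresIn(86400)); // 86400 (24h custom override)
console.log(resolveExpiresIn(7200));  // 7200 (2h custom override)
console.log(resolveExpiresIn(null));  // 3600 (server default fallback)
```

The `??` operator matters here: `||` would also swallow a hypothetical `token_ttl = 0`, while `??` only falls through on NULL/undefined.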
* chore: privacy sweep + integrity.ts on getconnection allow-list

Two pre-existing CI failures uncovered while running `bun run test` on this branch — unrelated to v0.26.3 substance but blocking the pipeline.

# Privacy sweep (src/core/mounts-cache.ts)

Two references to the private agent fork name in code comments, violating the CLAUDE.md privacy rule ("never reference real people, companies, funds, or private agent names in any public-facing artifact"). Both were authored in v0.26.0 commit 3c032d7.

- line 6 (docblock): "Host agents (Wintermute / OpenClaw / any Claude Code install) read" -> "Host agents (your OpenClaw / any Claude Code install) read"
- line 324 (RESOLVER preamble emitter): "Host agents (Wintermute/OpenClaw/Claude Code) should prefer this file over" -> "Host agents (your OpenClaw / Claude Code) should prefer this file over"

Per the documented substitution, "your OpenClaw" in reader-facing copy covers any downstream OpenClaw deployment (Wintermute, Hermes, AlphaClaw, etc.) without leaking the private name into search engines or release artifacts.

# integrity.ts on the getconnection allow-list

`scripts/check-no-legacy-getconnection.sh` flags `db.getConnection()` calls outside `src/core/db.ts` to enforce the multi-brain routing contract. `src/commands/integrity.ts:355` (scanIntegrityBatch) was introduced in v0.22.16 commit 8468ba2 — the check ran clean at the time because the file wasn't on the allow-list yet, but PR #586's test pipeline catches it. Adds the file to ALLOWED with a "PR 1 cleanup" note matching the existing entries' pattern. The proper fix (refactor to accept the engine from OperationContext) is out of v0.26.3 scope and tracked alongside the other PR 1 entries.

* chore: bump v0.26.2 -> v0.26.3 + CHANGELOG

VERSION + package.json were already at 0.26.3 from the initial bump on this branch (see commit history). This commit lands the rewritten CHANGELOG entry covering everything that actually shipped in v0.26.3 — well past the original "legacy API keys" framing.

What lands in v0.26.3:

# Headline (admin trust model)

The bootstrap token never persists in browser JS state (no localStorage, no sessionStorage). Magic-link URLs use single-use server-issued nonces — the bootstrap token never appears in a URL. Cookie sessions are HttpOnly + SameSite=Strict. A "Sign out everywhere" button revokes every active admin session in one click.

# Schema

Migration v33 adds 5 columns referenced by PR #586's admin-dashboard work that landed without a corresponding migration. Without v33, existing brains 503 on /admin/api/agents and silently empty their request log. The backfill of agent_name from oauth_clients.client_name -> access_tokens.name -> token_name is baked into the migration.

# Performance

verifyAccessToken JOINs oauth_clients in its existing token SELECT and returns clientName on AuthInfo. This removes the per-MCP-request DB roundtrip that was happening on every authenticated /mcp call.
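The agent_name backfill precedence is a straight COALESCE chain. A hedged TypeScript sketch (the row shape and helper are illustrative; the real backfill runs in SQL inside migration v33):

```typescript
// Illustrative source columns for one request-log row.
interface AgentNameSources {
  clientName: string | null;       // oauth_clients.client_name
  accessTokenName: string | null;  // access_tokens.name
  tokenName: string | null;        // token_name
}

// First non-null source wins, mirroring SQL's COALESCE(a, b, c).
function backfillAgentName(row: AgentNameSources): string | null {
  return row.clientName ?? row.accessTokenName ?? row.tokenName;
}

console.log(
  backfillAgentName({ clientName: "e2e-oauth-test", accessTokenName: null, tokenName: null })
); // "e2e-oauth-test"
console.log(
  backfillAgentName({ clientName: null, accessTokenName: null, tokenName: "legacy-key" })
); // "legacy-key"
```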
# Security

- crypto.timingSafeEqual on the admin token hash compare
- /admin/auth/:nonce rate-limited at 10/min/IP
- Single-use nonces with a 5-minute TTL
- Request-log filter parameterized via postgres.js tagged-template fragments (sql.unsafe + manual escaping removed)
- Per-client OAuth token TTL (1h, 24h, 7d, 30d, 1y, no expiry)
- Ported the coerceTimestamp helper from master v0.26.2 (BIGINT-as-string fix)

# UI

- API keys + OAuth clients in one unified Agents table
- Auth-type-aware Config Export tabs
- Claude Code OAuth: read -s prompt-based snippet (default) + 2-step curl fallback (D13=C)
- Cursor: OAuth discovery URL OR raw bearer, based on auth type
- ChatGPT/Claude.ai/Perplexity: "OAuth client required" CTA on api_key agents
- Hide-revoked toggle + empty-state placeholder for the filtered-empty case
- Bug fix: loadApiKeys -> loadAgents (Codex caught what 5 review passes missed; the Create-API-Key flow was broken)

# Tests + CI

- New E2E coverage: column round-trip, injection probe, per-client TTL, magic-link single-use, styled 401, agent_name resolution
- The admin React build is now a CI gate (catches missing-symbol bugs before E2E)
- check-no-legacy-getconnection allowlist updated for integrity.ts

Branch shape: 16 author commits + 13 fix-up commits = 29 commits on the PR. Commit-by-commit bisect-friendly. Plan + Codex review pass artifacts at ~/.claude/plans/check-this-out-and-breezy-forest.md.

---------

Co-authored-by: Wintermute <[email protected]>
Co-authored-by: Garry Tan <[email protected]>
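The constant-time compare in the Security list above can be sketched as follows (function and variable names are assumptions; the PR's actual hashing scheme isn't shown here):

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hash the presented token so both buffers have equal length (SHA-256
// digests are always 32 bytes), then compare in constant time. A plain
// string === short-circuits on the first mismatching byte, which leaks
// timing information an attacker can measure.
function tokenMatches(presented: string, expectedHash: Buffer): boolean {
  const presentedHash = createHash("sha256").update(presented).digest();
  return timingSafeEqual(presentedHash, expectedHash);
}

const expected = createHash("sha256").update("admin-bootstrap-token").digest();
console.log(tokenMatches("admin-bootstrap-token", expected)); // true
console.log(tokenMatches("wrong-token", expected));           // false
```

Note that `timingSafeEqual` throws on unequal-length inputs, which is why both sides are hashed before comparison rather than compared raw.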
Summary
GBrain v0.2.0 — Full port of the Postgres-native personal knowledge brain with hybrid RAG search.
Core
- `BrainEngine` interface with full Postgres + pgvector implementation

v0.2 Features
- `gbrain sync` with incremental git-to-brain sync and `--watch` mode
- `gbrain files` for binary file management in Supabase Storage

CLI Polish (latest)
- `src/version.ts` reads from `package.json` (safe for compiled binaries). Fixed the MCP server's hardcoded `0.1.0`, bumped `skills/manifest.json`.
- `clawhub --version` instead of `which` to avoid false positives; 120s timeout on execSync.
- `--help`: all 28 commands have help text. Checked before the DB connection, so `gbrain get --help` works without a brain configured.
- Unknown commands are caught before `connectEngine()` to avoid wasting a DB round-trip on typos.

Test Coverage
39 tests across 3 files. All pass.
- `test/markdown.test.ts` — frontmatter parsing, round-trip serialization
- `test/chunkers/recursive.test.ts` — delimiter splitting, overlap, chunk sizing
- `test/sync.test.ts` — manifest parsing, isSyncable filtering, pathToSlug conversion

CLI dispatch tests (--help, unknown command, version consistency) deferred to follow-up.
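The two sync helpers covered by `test/sync.test.ts` can be sketched roughly like this. The exact exclusion rules are assumptions based on the filter's description (.md pages only; hidden, ops, and .raw paths excluded; the skip-list is omitted), not the real implementation:

```typescript
// isSyncable: only .md pages; hidden segments (including .raw) and ops/ are
// excluded. The real filter also honors a skip-list, not shown here.
function isSyncable(path: string): boolean {
  if (!path.endsWith(".md")) return false;
  const parts = path.split("/");
  if (parts.some((segment) => segment.startsWith("."))) return false; // hidden + .raw
  if (parts[0] === "ops") return false;
  return true;
}

// pathToSlug: drop the .md extension, optionally prepend a prefix.
function pathToSlug(path: string, prefix?: string): string {
  const slug = path.replace(/\.md$/, "");
  return prefix ? `${prefix}/${slug}` : slug;
}

console.log(isSyncable("notes/ideas.md"));          // true
console.log(isSyncable(".raw/dump.md"));            // false
console.log(isSyncable("img/logo.png"));            // false
console.log(pathToSlug("notes/ideas.md", "brain")); // "brain/notes/ideas"
```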
Pre-Landing Review
No structural issues found in the CLI polish changes. String constant changes and dispatch logic only.
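The `src/version.ts` change described under CLI Polish can be sketched like this. The path handling and fallback value are assumptions; a compiled binary would need the version inlined at build time rather than read from disk:

```typescript
import { readFileSync } from "node:fs";
import { join } from "node:path";

// Read the version from package.json at runtime instead of hardcoding it,
// with a fallback for environments where no package.json is adjacent
// (e.g. a compiled binary whose bundler didn't inline the JSON).
function readVersion(pkgDir: string): string {
  try {
    const pkg = JSON.parse(readFileSync(join(pkgDir, "package.json"), "utf8"));
    return typeof pkg.version === "string" ? pkg.version : "0.0.0";
  } catch {
    return "0.0.0"; // no package.json on disk next to the binary
  }
}
```

The fallback keeps `gbrain --version` from throwing in a compiled binary; the single source of truth otherwise remains `package.json`, which is what the grep checks in the test plan verify.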
Test plan
- `bun test` — 39 tests pass
- `gbrain --version` prints the version from package.json
- `gbrain get --help` prints help without a DB connection
- `gbrain upgrade --help` prints upgrade-specific help
- `gbrain nonsense` prints "Unknown command" without a DB connection
- `grep -r 'npm' src/` — zero hits except the node_modules detection comment
- `grep -rn '0.1.0' src/ skills/` — zero hits

🤖 Generated with Claude Code