
fix(sync): deadlock on incremental sync with > 10 modified files#132

Open
sunnnybala wants to merge 1 commit into garrytan:master from
sunnnybala:fix/sync-nested-transaction-deadlock

Conversation

@sunnnybala
Contributor

Summary

Symptom: gbrain sync (incremental) hangs indefinitely when the diff touches more than 10 syncable files. --full always works. Reproduced with a 15-file commit: unpatched sync hangs forever, patched sync completes in 3.4 seconds on the same diff.

Cause: src/commands/sync.ts wraps the add/modify loop in engine.transaction() when useTransaction > 10. Each file in the loop then calls importFromContent, which opens another engine.transaction() on the same PGLite instance. PGLite transactions are not reentrant — the inner call queues on the same _runExclusiveTransaction mutex the outer is holding, producing a classic recursive-mutex deadlock. Main thread parks in ep_poll, workers park in futex_wait_queue, zero CPU advancement.

Fix: drop the outer transaction wrap. Each file's inner transaction is already atomic; per-file atomicity is also the right granularity (one file's failure should not roll back the others' successful imports).

Diff: 21 insertions, 21 deletions, no behavior change below the ≤10 threshold.


Details

The deadlock chain

  1. Incremental sync enters the useTransaction > 10 branch at sync.ts:225:
    if (useTransaction) {
      await engine.transaction(async () => { await processAddsModifies(); });
    }
  2. engine.transaction in pglite-engine.ts delegates to PGLite:
    async transaction<T>(fn) {
      return this.db.transaction(async (tx) => {
        const txEngine = Object.create(this) as PGLiteEngine;
        Object.defineProperty(txEngine, 'db', { get: () => tx });
        return fn(txEngine);
      });
    }
  3. Closure pitfall: the sync.ts callback is async () => { await processAddsModifies(); } — it does not take txEngine as a parameter. processAddsModifies is defined in the outer scope of performSync and closes over the original engine variable, not the txEngine passed into the callback. So every inner importFile(engine, …) call uses the original PGLiteEngine, whose this.db is the raw PGlite instance (not the outer tx object).
  4. Inside importFromContent (import-file.ts:95):
    await engine.transaction(async (tx) => {
      // putPage, upsertChunks, tag reconciliation, createVersion
    });
    This calls this.db.transaction(...) on the original PGlite instance — the same instance whose exclusive-transaction mutex is currently held by the outer transaction.
  5. PGLite's transaction() (from @electric-sql/pglite chunk-HDIMFN25.js):
    async transaction(s) {
      return await this._checkReady(),
        await this._runExclusiveTransaction(async () => {
          await /*BEGIN*/ call(this, "BEGIN");
          // ... await user callback ...
          // COMMIT / ROLLBACK
        })
    }
    _runExclusiveTransaction is a mutex. The inner call queues on the same mutex the outer is still holding. The outer can't release — it's awaiting the user callback, which is awaiting the inner transaction, which is queued forever.
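The non-reentrant behavior in step 5 is easy to reproduce standalone. Below is a minimal sketch — a toy async mutex with the same queueing shape, NOT PGLite's actual implementation — raced against a timer so the demo observes the hang instead of reproducing it:

```typescript
// Minimal non-reentrant async mutex with the same queueing shape as
// _runExclusiveTransaction (a toy stand-in, NOT PGLite's real code).
class AsyncMutex {
  private tail: Promise<void> = Promise.resolve();
  runExclusive<T>(fn: () => Promise<T>): Promise<T> {
    const prev = this.tail;
    let release!: () => void;
    this.tail = new Promise<void>((res) => (release = res));
    return (async () => {
      await prev; // queue behind whoever currently holds the mutex
      try {
        return await fn();
      } finally {
        release();
      }
    })();
  }
}

const mutex = new AsyncMutex();

// Outer "transaction" opens a nested one on the same mutex — the sync.ts /
// importFromContent shape. The inner acquire queues behind the outer's
// release, and the outer's release waits on the inner: a cycle.
const nested = mutex.runExclusive(async () => {
  await mutex.runExclusive(async () => "inner"); // parks forever
  return "outer";
});

// Observe the hang without hanging: race against a 100 ms timer.
const resultPromise = Promise.race([
  nested,
  new Promise<string>((res) => setTimeout(() => res("timed out"), 100)),
]);
resultPromise.then((r) => console.log(r)); // "timed out"
```

The outer `runExclusive` can only release after its callback resolves, and that callback is awaiting an acquire queued behind that very release — the same cycle sync.ts and importFromContent form.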

Reproduction

# From a clean gbrain-brain repo
cd /your/gbrain-brain
for f in $(ls people/*.md | grep -v README | head -11); do
  printf "\n<!-- repro -->\n" >> "$f"
done
git commit -am "repro: 11 non-README .md files"

# Unpatched: hangs forever. ep_poll on main thread, futex_wait on workers.
# Zero CPU advancement after entering transaction branch.
gbrain sync --no-pull --no-embed

# Workaround that already exists: --full uses performFullSync which
# goes through runImport() (a different code path without the outer wrap).
gbrain sync --full --no-pull --no-embed

Evidence captured live:

  • Process state: /proc/<pid>/task/*/wchan → main ep_poll, two worker threads futex_wait_queue
  • CPU ticks frozen: utime 205 stime 47 unchanged across three 2-second samples
  • After applying this patch: same 16-file diff completes in real 0m3.4s with status synced, all pages affected, chunks created

Why this has flown under the radar

  • Only triggers when a single git diff contains more than 10 files that pass isSyncable() (excludes README.md, index.md, schema.md, log.md, hidden dirs, non-.md/.mdx)
  • Most day-to-day commits are well under 10 files and take the else branch, bypassing the wrap entirely
  • gbrain import (bulk ingest) and first-sync go through performFullSync → runImport, which has no outer wrap
  • sync --full works as an apparent workaround, leading users to blame "PGLite is slow" rather than report a bug
  • The hang is completely silent — no log, no error, no CPU. Looks indistinguishable from a very slow sync

Where it bites in production

  • Large initial backfills (importing months of daily notes, calendar events, email digests in one commit)
  • Citation-fixer / maintenance runs that touch many people/company pages at once — e.g. a dream-cycle job that patches 12+ pages for missing [Source: ...] citations
  • Repo migrations / mass rewrites where a single commit renames or reformats many files
  • Any agent that batches brain updates before a single git commit + gbrain sync

Why removing the outer wrap is safe

  • importFromContent is already atomic per file (its inner transaction commits the pages row, tags, chunks, and page_versions together)
  • The outer wrap did not add any atomicity guarantee beyond "all-or-nothing across files", which is actually worse semantics: a single malformed file would roll back every successfully imported file in the same sync run
  • Per-file atomicity means partial sync progress survives a transient failure mid-run, and the next sync picks up where the failed one left off
  • performFullSync path has always done per-file inner transactions with no outer wrap — this fix simply makes the incremental path match

The diff

-  // Process adds and modifies
-  const useTransaction = (filtered.added.length + filtered.modified.length) > 10;
-  const processAddsModifies = async () => {
-    for (const path of [...filtered.added, ...filtered.modified]) {
-      const filePath = join(repoPath, path);
-      if (!existsSync(filePath)) continue;
-      try {
-        const result = await importFile(engine, filePath, path, { noEmbed });
-        if (result.status === 'imported') {
-          chunksCreated += result.chunks;
-          pagesAffected.push(result.slug);
-        }
-      } catch (e: unknown) {
-        const msg = e instanceof Error ? e.message : String(e);
-        console.error(`  Warning: skipped ${path}: ${msg}`);
-      }
-    }
-  };
-
-  if (useTransaction) {
-    await engine.transaction(async () => { await processAddsModifies(); });
-  } else {
-    await processAddsModifies();
-  }
+  // Process adds and modifies.
+  //
+  // NOTE: do NOT wrap this loop in engine.transaction(). importFromContent
+  // already opens its own inner transaction per file, and PGLite transactions
+  // are not reentrant — they acquire the same _runExclusiveTransaction mutex,
+  // so a nested call from inside a user callback queues forever on the mutex
+  // the outer transaction is still holding. Result: incremental sync hangs in
+  // ep_poll whenever the diff crosses the old > 10 threshold that used to
+  // trigger the outer wrap. Per-file atomicity is also the right granularity:
+  // one file's failure should not roll back the others' successful imports.
+  for (const path of [...filtered.added, ...filtered.modified]) {
+    const filePath = join(repoPath, path);
+    if (!existsSync(filePath)) continue;
+    try {
+      const result = await importFile(engine, filePath, path, { noEmbed });
+      if (result.status === 'imported') {
+        chunksCreated += result.chunks;
+        pagesAffected.push(result.slug);
+      }
+    } catch (e: unknown) {
+      const msg = e instanceof Error ? e.message : String(e);
+      console.error(`  Warning: skipped ${path}: ${msg}`);
+    }
+  }

sync.ts wraps the add/modify loop in engine.transaction(), and each
importFromContent inside opens another one. PGLite's
_runExclusiveTransaction is a non-reentrant mutex — the second call
queues on the mutex the first is holding, and the process hangs forever
in ep_poll. Reproduced with a 15-file commit: unpatched hangs, patched
runs in 3.4s. Fix drops the outer wrap; per-file atomicity is correct
anyway (one file's failure should not roll back the others).
garrytan added a commit that referenced this pull request Apr 19, 2026
Reads src/commands/sync.ts verbatim and asserts no uncommented
engine.transaction() call appears above the add/modify loop. Protects
against silent reintroduction of the nested-mutex deadlock that hung
> 10-file syncs forever in ep_poll.
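A guard along those lines can be quite small. The real test lives in test/sync.test.ts; in this sketch the helper name and loop marker are illustrative:

```typescript
// Sketch of a source-level regression guard in the spirit described above:
// strip comments, then assert no engine.transaction( call appears before
// the add/modify loop. Helper name and loop marker are illustrative.
function guardNoOuterTransaction(source: string, loopMarker: string): boolean {
  const stripped = source
    .replace(/\/\*[\s\S]*?\*\//g, "") // drop block comments
    .replace(/\/\/[^\n]*/g, "");      // drop line comments
  const loopAt = stripped.indexOf(loopMarker);
  const head = loopAt === -1 ? stripped : stripped.slice(0, loopAt);
  return !head.includes("engine.transaction(");
}

// A source with the outer wrap fails the guard...
const bad = `await engine.transaction(async () => {
  for (const path of [...filtered.added, ...filtered.modified]) {}
});`;
// ...while a commented-out mention of the wrap passes.
const good = `// engine.transaction() must NOT wrap this loop
for (const path of [...filtered.added, ...filtered.modified]) {}`;

console.log(guardNoOuterTransaction(bad, "for (const path"));  // false
console.log(guardNoOuterTransaction(good, "for (const path")); // true
```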
garrytan added a commit that referenced this pull request Apr 19, 2026
Master shipped v0.12.1 (extract N+1 + migration timeout) and v0.12.2
(JSONB double-encode + splitBody + wiki types + parseEmbedding) while
this wave was mid-flight. Ships the remaining pieces as v0.12.3:

- sync deadlock (#132, @sunnnybala)
- statement_timeout scoping (#158, @garagon)
- Obsidian wikilinks + domain patterns (#187 slice, @knee5)
- gbrain orphans command (#187 slice, @knee5)
- tryParseEmbedding() availability helper
- doctor detection for jsonb_integrity + markdown_body_completeness

No schema, no migration, no data touch.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
garrytan added a commit that referenced this pull request Apr 19, 2026
…kilinks, orphans (#216)

* fix(sync): remove nested transaction that deadlocks > 10 file syncs

sync.ts wraps the add/modify loop in engine.transaction(), and each
importFromContent inside opens another one. PGLite's
_runExclusiveTransaction is a non-reentrant mutex — the second call
queues on the mutex the first is holding, and the process hangs forever
in ep_poll. Reproduced with a 15-file commit: unpatched hangs, patched
runs in 3.4s. Fix drops the outer wrap; per-file atomicity is correct
anyway (one file's failure should not roll back the others).

(cherry picked from commit 4a1ac00)

* test(sync): regression guard for #132 top-level engine.transaction wrap

Reads src/commands/sync.ts verbatim and asserts no uncommented
engine.transaction() call appears above the add/modify loop. Protects
against silent reintroduction of the nested-mutex deadlock that hung
> 10-file syncs forever in ep_poll.

* feat(utils): tryParseEmbedding() skip+warn sibling for availability path

parseEmbedding() throws on structural corruption — right call for ingest/
migrate paths where silent skips would be data loss. Wrong call for
search/rescore paths where one corrupt row in 10K would kill every
query that touches it.

tryParseEmbedding() wraps parseEmbedding in try/catch: returns null on
any shape that would throw, warns once per session so the bad row is
visible in logs. Use it anywhere we'd rather degrade ranking than blow
up the whole query.
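A minimal sketch of that wrapper pattern, with a toy parseEmbedding standing in for the real one (the actual embedding storage format is not shown here; this toy version parses a JSON array of numbers):

```typescript
// Toy parseEmbedding standing in for the real one: throws on structural
// corruption. Here an "embedding" is just a JSON array of numbers.
function parseEmbedding(raw: string): number[] {
  const v = JSON.parse(raw);
  if (!Array.isArray(v) || v.some((x) => typeof x !== "number")) {
    throw new Error("structurally corrupt embedding");
  }
  return v;
}

// The skip+warn sibling: null on anything that would throw, one warning
// per session so the bad row stays visible in logs.
let warnedOnce = false;
function tryParseEmbedding(raw: string): number[] | null {
  try {
    return parseEmbedding(raw);
  } catch (e) {
    if (!warnedOnce) {
      warnedOnce = true;
      console.warn(`tryParseEmbedding: skipping corrupt embedding: ${e}`);
    }
    return null; // degrade ranking instead of killing the whole query
  }
}

console.log(tryParseEmbedding("[0.1, 0.2]")); // [ 0.1, 0.2 ]
console.log(tryParseEmbedding("not json"));   // null
```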

Retrofit postgres-engine.getEmbeddingsByChunkIds (the #175 slice call
site) — the 5-line rescore loop was the direct motivator. Keep the
throwing parseEmbedding() for everything else (pglite-engine rowToChunk,
migrate-engine round-trips, ingest).

* postgres-engine: scope search statement_timeout to the transaction

searchKeyword and searchVector run on a pooled postgres.js client
(max: 10 by default). The original code bounded each search with

  await sql`SET statement_timeout = '8s'`
  try { await sql`<query>` }
  finally { await sql`SET statement_timeout = '0'` }

but every tagged template is an independent round-trip that picks an
arbitrary connection from the pool. The SET, the query, and the reset
could all land on DIFFERENT connections. In practice the GUC sticks
to whichever connection ran the SET and then gets returned to the
pool — the next unrelated caller on that connection inherits the 8s
timeout (clipping legitimate long queries) or the reset-to-0 (disabling
the guard for whoever expected it). A crash in the middle leaves the
state set permanently.

Wrap each search in sql.begin(async sql => …). postgres.js reserves
a single connection for the transaction body, so the SET LOCAL, the
query, and the implicit COMMIT all run on the same connection. SET
LOCAL scopes the GUC to the transaction — COMMIT or ROLLBACK restores
the previous value automatically, regardless of the code path out.
Error paths can no longer leak the GUC.

No API change. Timeout value and semantics are identical (8s cap on
search queries, no effect on embed --all / bulk import which runs
outside these methods). Only one transaction per search — BEGIN +
COMMIT round-trips are negligible next to a ranked FTS or pgvector
query.
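The pool misrouting is easiest to see in a toy model. This is a simulation, not postgres.js — `pick()` stands in for the pool's arbitrary connection assignment:

```typescript
// Toy model of the pool misrouting (a simulation, NOT postgres.js): every
// statement picks an arbitrary pooled connection, so the SET, the query,
// and the reset can each land somewhere different.
type Conn = { id: number; statementTimeout: string };

const pool: Conn[] = [
  { id: 0, statementTimeout: "0" },
  { id: 1, statementTimeout: "0" },
];
let next = 0;
const pick = (): Conn => pool[next++ % pool.length]; // stand-in for pool routing

// Broken pattern: independent round-trips.
pick().statementTimeout = "8s"; // SET lands on conn 0
pick();                          // the search query runs on conn 1 — unguarded
const leaked = pool.filter((c) => c.statementTimeout === "8s").length;
console.log(leaked); // 1 — conn 0 returned to the pool still carrying the GUC

// Fixed pattern: reserve ONE connection for the whole transaction; SET LOCAL
// is undone by COMMIT/ROLLBACK (simulated here by restoring the saved value).
function begin(conn: Conn, fn: (c: Conn) => void): void {
  const saved = conn.statementTimeout;
  try {
    fn(conn); // SET LOCAL + the bounded query, all on this connection
  } finally {
    conn.statementTimeout = saved; // COMMIT or ROLLBACK restores the GUC
  }
}

const reserved: Conn = { id: 2, statementTimeout: "0" };
begin(reserved, (c) => { c.statementTimeout = "8s"; /* ranked query here */ });
console.log(reserved.statementTimeout); // "0" — nothing leaks on any code path
```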

Also closes the earlier audit finding R4-F002 which reported the same
pattern on searchKeyword. This PR covers both searchKeyword and
searchVector so the pool-leak class is fully closed.

Tests (test/postgres-engine.test.ts, new file):
- No bare SET statement_timeout remains after stripping comments.
- searchKeyword and searchVector each wrap their query in sql.begin.
- Both use SET LOCAL.
- Neither explicitly clears the timeout with SET statement_timeout=0.

Source-level guardrails keep the fast unit suite DB-free. Live
Postgres coverage of the search path is in test/e2e/search-quality.test.ts,
which continues to exercise these methods end-to-end against
pgvector when DATABASE_URL is set.

(cherry picked from commit 6146c3b)

* feat(orphans): add gbrain orphans command for finding under-connected pages

Surfaces pages with zero inbound wikilinks. Essential for content
enrichment cycles in KBs with 1000+ pages. By default filters out
auto-generated pages, raw sources, and pseudo-pages where no inbound
links is expected; --include-pseudo to disable.

Supports text (grouped by domain), --json, --count outputs.
Also exposed as find_orphans MCP operation.

Tests cover basic detection, filtering, all output modes.
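The anti-join at the heart of orphan detection can be sketched in memory. Table shapes here are assumptions, not the real gbrain schema:

```typescript
// In-memory sketch of the NOT EXISTS anti-join behind orphan detection.
// Table shapes are assumptions, not the real gbrain schema.
type Page = { id: number; slug: string };
type Link = { fromPageId: number; toPageId: number };

function findOrphans(pages: Page[], links: Link[]): Page[] {
  const hasInbound = new Set(links.map((l) => l.toPageId)); // link targets
  return pages.filter((p) => !hasInbound.has(p.id));        // NOT EXISTS
}

const pages: Page[] = [
  { id: 1, slug: "people/ada" },
  { id: 2, slug: "tech/engine" },
];
const links: Link[] = [{ fromPageId: 1, toPageId: 2 }];

console.log(findOrphans(pages, links).map((p) => p.slug)); // [ "people/ada" ]
```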

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
(cherry picked from commit f50954f)

* feat(extract): support Obsidian wikilinks + wiki-style domain slugs in canonical extractor

extractEntityRefs now recognizes both syntaxes equally:
  [Name](people/slug)      -- upstream original
  [[people/slug|Name]]     -- Obsidian wikilink (new)

Extends DIR_PATTERN to include domain-organized wiki slugs used by
Karpathy-style knowledge bases:
  - entities  (legacy prefix some brains keep during migration)
  - projects  (gbrain canonical, was missing from regex)
  - tech, finance, personal, openclaw (domain-organized wiki roots)

Before this change, a 2,100-page brain with wikilinks throughout extracted
zero auto-links on put_page because the regex only matched markdown-style
[name](path). After: 1,377 new typed edges on a single extract --source db
pass over the same corpus.

Matches the behavior of the extract.ts filesystem walker (which already
handled wikilinks as of the wiki-markdown-compat fix wave), so the db and
fs sources now produce the same link graph from the same content.

Both patterns share the DIR_PATTERN constant so adding a new entity dir
only requires updating one string.
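A hedged sketch of the dual-syntax extraction — the exact production regexes and directory list may differ; this only illustrates the shared DIR_PATTERN approach:

```typescript
// Both syntaxes share one directory alternation, so adding an entity dir
// means touching a single string — mirroring the DIR_PATTERN design above.
// The exact production regexes may differ; this is an illustration.
const DIR_PATTERN =
  "(?:people|entities|projects|tech|finance|personal|openclaw)";

// [Name](people/slug) — markdown style
const MD_LINK = new RegExp(`\\[([^\\]]+)\\]\\((${DIR_PATTERN}/[\\w-]+)\\)`, "g");
// [[people/slug|Name]] — Obsidian wikilink
const WIKILINK = new RegExp(`\\[\\[(${DIR_PATTERN}/[\\w-]+)\\|([^\\]]+)\\]\\]`, "g");

function extractEntityRefs(body: string): Array<{ name: string; slug: string }> {
  const refs: Array<{ name: string; slug: string }> = [];
  for (const m of body.matchAll(MD_LINK)) refs.push({ name: m[1], slug: m[2] });
  for (const m of body.matchAll(WIKILINK)) refs.push({ name: m[2], slug: m[1] });
  return refs;
}

const body =
  "Met [Ada](people/ada-lovelace) about [[projects/engine|the Engine]].";
console.log(extractEntityRefs(body));
// [ { name: "Ada", slug: "people/ada-lovelace" },
//   { name: "the Engine", slug: "projects/engine" } ]
```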

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
(cherry picked from commit 1cfb156)

* feat(doctor): jsonb_integrity + markdown_body_completeness detection

Add two v0.12.1-era reliability checks to `gbrain doctor`:

- `jsonb_integrity` scans the 4 known write sites from the v0.12.0
  double-encode bug (pages.frontmatter, raw_data.data,
  ingest_log.pages_updated, files.metadata) and reports rows where
  jsonb_typeof(col) = 'string'. The fix hint points at
  `gbrain repair-jsonb` (the standalone repair command shipped in
  v0.12.1).

- `markdown_body_completeness` flags pages whose compiled_truth is
  <30% of the raw source content length when raw has multiple H2/H3
  boundaries. Heuristic only; suggests `gbrain sync --force` or
  `gbrain import --force <slug>`.
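A minimal sketch of that heuristic, using the thresholds quoted above (the function shape is an assumption):

```typescript
// Sketch of the completeness heuristic: flag when the compiled body is
// under 30% of the raw length AND the raw source has at least two H2/H3
// boundaries. Thresholds per the description; shape is an assumption.
function looksTruncated(raw: string, compiled: string): boolean {
  const boundaries = raw.match(/^#{2,3}\s/gm)?.length ?? 0;
  return boundaries >= 2 && compiled.length < raw.length * 0.3;
}

const raw =
  "## Intro\ntext\n## Details\nlots of detail here\n### Sub\nmore";
console.log(looksTruncated(raw, "## Intro\ntext")); // true — suspiciously short
console.log(looksTruncated(raw, raw));              // false — nothing missing
```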

Also adds test/e2e/jsonb-roundtrip.test.ts — the regression coverage
that should have caught the original double-encode bug. Hits all four
write sites against real Postgres and asserts jsonb_typeof='object'
plus `->>'key'` returns the expected scalar.

Detection only: doctor diagnoses, `gbrain repair-jsonb` treats.
No overlap with the standalone repair path.

* chore: bump to v0.12.3 + changelog (reliability wave)

Master shipped v0.12.1 (extract N+1 + migration timeout) and v0.12.2
(JSONB double-encode + splitBody + wiki types + parseEmbedding) while
this wave was mid-flight. Ships the remaining pieces as v0.12.3:

- sync deadlock (#132, @sunnnybala)
- statement_timeout scoping (#158, @garagon)
- Obsidian wikilinks + domain patterns (#187 slice, @knee5)
- gbrain orphans command (#187 slice, @knee5)
- tryParseEmbedding() availability helper
- doctor detection for jsonb_integrity + markdown_body_completeness

No schema, no migration, no data touch.

Co-Authored-By: Claude Opus 4.7 <[email protected]>

* docs: update project documentation for v0.12.3

CLAUDE.md:
- Add src/commands/orphans.ts entry
- Expand src/commands/doctor.ts with v0.12.3 jsonb_integrity +
  markdown_body_completeness check descriptions
- Update src/core/link-extraction.ts to mention Obsidian wikilinks +
  extended DIR_PATTERN (entities/projects/tech/finance/personal/openclaw)
- Update src/core/utils.ts to mention tryParseEmbedding sibling
- Update src/core/postgres-engine.ts to note statement_timeout scoping +
  tryParseEmbedding usage in getEmbeddingsByChunkIds
- Add Key commands added in v0.12.3 section (orphans, doctor checks)
- Add test/orphans.test.ts, test/postgres-engine.test.ts, updated
  descriptions for test/sync.test.ts, test/doctor.test.ts,
  test/utils.test.ts
- Add test/e2e/jsonb-roundtrip.test.ts with note on intentional overlap
- Bump operation count from ~36 to ~41 (find_orphans shipped in v0.12.3)

README.md:
- Add gbrain orphans to ADMIN commands block

Co-Authored-By: Claude Opus 4.7 <[email protected]>

---------

Co-authored-by: sunnnybala <[email protected]>
Co-authored-by: Gustavo Aragon <[email protected]>
Co-authored-by: Clevin Canales <[email protected]>
Co-authored-by: Claude Sonnet 4.6 <[email protected]>
Co-authored-by: Clevin Canales <[email protected]>
garrytan added a commit that referenced this pull request Apr 20, 2026
Extract, import, and sync now stream per-file progress to stderr through
the shared reporter. All three kept their stdout summaries + JSON
action-events intact so existing tests + agent scripts are unaffected.

- extract.ts (4 paths: links/timeline × fs/db): replaced the ad-hoc
  `process.stderr.write({event:"progress"...})` lines with reporter
  ticks. Same channel (stderr), canonical schema now, visible in both
  text and --json modes. Stdout action-events (`add_link` /
  `add_timeline`) untouched — tests grep them.
- import.ts: the logProgress() function that printed every 100 files to
  stdout is now a progress.tick() call per file. Rate-gated by the
  reporter. Stdout still gets the final "Import complete (Xs)" summary
  and the --json payload.
- sync.ts: three new phases (`sync.deletes`, `sync.renames`,
  `sync.imports`) tick per file, so big syncs show each step rather than
  a single end-of-run summary. Phase hierarchy ready to be child()-chained
  into runImport / runEmbed later, per Codex review #26.

Updated the #132 nested-transaction regression test in test/sync.test.ts
to also accept the new hoisted-loop shape — the guarantee (this loop is
not wrapped in engine.transaction) still holds.

1686 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
garrytan added a commit that referenced this pull request Apr 22, 2026
…t-visible heartbeats) (#293)

* feat(progress): step 1 - shared ProgressReporter + CliOptions

Adds the foundation for v0.14.2's bulk-action progress streaming work:

- src/core/progress.ts: dependency-free reporter with auto/human/json/quiet
  modes, TTY-aware rendering, time+item rate gating, heartbeat helper for
  slow single queries, dot-composed child phases, EPIPE defense (both sync
  throw and async 'error' event), and a singleton module-level signal
  coordinator so SIGINT/SIGTERM emits abort events for all live phases
  without leaking per-instance listeners.

- src/core/cli-options.ts: parseGlobalFlags() for --quiet /
  --progress-json / --progress-interval=<ms> (both space and = forms),
  plus cliOptsToProgressOptions() that resolves to the right mode. Non-TTY
  default is human-plain one-line-per-event; JSON is explicit opt-in so
  shell pipelines don't suddenly see structured noise.

- test/progress.test.ts (17 cases): mode resolution, rate gating, no-fake-
  totals on heartbeat paths, EPIPE paths, SIGINT singleton, child phase
  composition.

- test/cli-options.test.ts (14 cases): flag parsing, invalid values,
  interleaved flags, mode resolution.

Follow-ups wire doctor/embed/files/export/extract/import/sync/migrate/
repair-jsonb/backlinks/orphans/lint/integrity/eval/autopilot/jobs plus
the apply-migrations orchestrators through this reporter, and route
Minion handlers to job.updateProgress instead of stderr. See the plan
in ~/.claude/plans/.

1682 unit tests pass.
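The time-based rate gating mentioned above boils down to a small gate. A sketch with an injected clock so the behavior is deterministic — names are illustrative, not the real ProgressReporter API:

```typescript
// Sketch of time-based rate gating with an injected clock. Emits at most
// one tick per interval; everything in between is swallowed.
class TickGate {
  private last = -Infinity;
  constructor(
    private intervalMs: number,
    private now: () => number = Date.now,
  ) {}
  shouldEmit(): boolean {
    const t = this.now();
    if (t - this.last < this.intervalMs) return false; // too soon: swallow
    this.last = t;
    return true; // enough time elapsed: emit this tick
  }
}

// Simulated ticks at 0, 50, 120, 130, and 260 ms with a 100 ms gate.
const times = [0, 50, 120, 130, 260];
let i = 0;
const gate = new TickGate(100, () => times[i++]);

const emitted: boolean[] = [];
for (let k = 0; k < times.length; k++) emitted.push(gate.shouldEmit());
console.log(emitted); // [ true, false, true, false, true ]
```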

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* feat(progress): step 2 - wire global flags into cli.ts

Parse --quiet / --progress-json / --progress-interval from argv BEFORE
command dispatch, strip them, stash resolved CliOptions on a module-level
singleton (same pattern as Commander's program.opts()) and on every
OperationContext created for shared-op dispatch.

- src/cli.ts: parseGlobalFlags(rawArgs) at the top of main(); setCliOptions
  once; dispatch sees only the stripped argv. Fixes the "gbrain
  --progress-json doctor" unknown-command case that Codex flagged.
- src/core/cli-options.ts: expose setCliOptions/getCliOptions/
  _resetCliOptionsForTest singleton. Commands that want progress call
  getCliOptions() to construct their reporter.
- src/core/operations.ts: OperationContext gains optional cliOpts field
  so shared-op handlers (and MCP-invoked ops that need a reporter) can
  read the same settings. MCP callers leave it undefined and consumers
  default to quiet.
- test/cli-options.test.ts: +4 cases covering singleton round-trip and
  an integration smoke spawning `bun src/cli.ts --progress-json --version`
  to prove the global flag survives dispatch.

45 relevant unit tests pass (progress + cli-options + cli.test.ts).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* feat(progress): step 3a - doctor + orphans heartbeat streaming

Doctor on a 52K-page brain used to sit silent for 10+ minutes while the
DB checks ran, then get killed by an agent timeout. Wired through the
new reporter so agents see which check is running and the slow ones
heartbeat every second.

doctor.ts:
- Start a single `doctor.db_checks` phase around the DB section, with a
  per-check heartbeat before each step (connection, pgvector, rls,
  schema_version, embeddings, graph_coverage, integrity, jsonb_integrity,
  markdown_body_completeness).
- jsonb_integrity now scans 5 targets, not 4: added page_versions.
  frontmatter so the check surface matches `repair-jsonb` (per Codex
  review of the plan — the old 4-target scan missed a known repair site).
  Per-target heartbeat so 50K-row scans show incremental progress.
- markdown_body_completeness: wrap the existing query in a 1s heartbeat
  timer. The regex scan over rd.data ->> 'content' can't be paginated
  usefully; this just lets agents see life during the sequential scan.
  No fake totals — the LIMIT 100 query has no meaningful total count.
- integrity sample: same heartbeat pattern around the 500-page scan.

orphans.ts:
- findOrphans() wraps the NOT EXISTS anti-join in a 1s heartbeat.
  Keyset pagination was considered and rejected: without an index on
  links.to_page_id it's no faster than the full scan, and may re-plan
  the anti-join per batch. A schema migration adding that index is the
  right fix and is queued for v0.14.3.

Follow-ups:
- Step 3b: wire embed/files/export (the \r-only stdout offenders).
- Step 5: end-to-end progress test spawning `gbrain doctor --progress-json`
  against a fixture brain, asserting stderr events and clean stdout.

All existing unit tests continue to pass (76/76 in doctor + orphans +
progress + cli-options).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* feat(progress): step 3b - embed + files + export stderr progress

Replaces the \r-on-stdout progress pattern in the three worst offenders
(embed, files sync, export) with the shared reporter on stderr. Stdout
now carries only final summaries, so scripts and tests that grep for
counts ("Embedded N chunks", "Files sync complete", "Exported N pages")
still work when output is piped.

- embed.ts: runEmbedCore accepts an optional onProgress callback. The
  CLI wrapper builds a reporter and passes reporter.tick(); Minion
  handlers will pass job.updateProgress in Step 4. Worker-pool is
  single-threaded JS so no rate-gate race (per Codex review #18).
- files.ts syncFiles(): tick per file; summary preserved on stdout.
- export.ts: tick per page; summary preserved on stdout.

Also fixes a --quiet flag collision. `skillpack-check` has its own
--quiet mode (suppress all stdout). parseGlobalFlags strips --quiet
globally now, and skillpack-check reads the resolved CliOptions
singleton via getCliOptions() instead of re-parsing argv. Test updated
to match the stripping behavior.

1686 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* feat(progress): step 3c - extract + import + sync reporter streaming

Extract, import, and sync now stream per-file progress to stderr through
the shared reporter. All three kept their stdout summaries + JSON
action-events intact so existing tests + agent scripts are unaffected.

- extract.ts (4 paths: links/timeline × fs/db): replaced the ad-hoc
  `process.stderr.write({event:"progress"...})` lines with reporter
  ticks. Same channel (stderr), canonical schema now, visible in both
  text and --json modes. Stdout action-events (`add_link` /
  `add_timeline`) untouched — tests grep them.
- import.ts: the logProgress() function that printed every 100 files to
  stdout is now a progress.tick() call per file. Rate-gated by the
  reporter. Stdout still gets the final "Import complete (Xs)" summary
  and the --json payload.
- sync.ts: three new phases (`sync.deletes`, `sync.renames`,
  `sync.imports`) tick per file, so big syncs show each step rather than
  a single end-of-run summary. Phase hierarchy ready to be child()-chained
  into runImport / runEmbed later, per Codex review #26.

Updated the #132 nested-transaction regression test in test/sync.test.ts
to also accept the new hoisted-loop shape — the guarantee (this loop is
not wrapped in engine.transaction) still holds.

1686 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* feat(progress): step 3d - migrate/repair/backlinks/lint/integrity/eval

Wires the remaining bulk commands through the reporter:

- migrate-engine: phase starts (migrate.copy_pages, migrate.copy_links),
  per-page tick. Old "Progress: N/total" stdout logs replaced by
  stderr ticks; final stdout summary preserved.
- repair-jsonb: per-column start + a heartbeat timer while each UPDATE
  runs (minutes on 50K-row tables). CRITICAL: stdout stays clean so
  migrations/v0_12_2.ts's JSON.parse(child.stdout) still works. Per
  Codex review #12.
- backlinks: 1s heartbeat around findBacklinkGaps() (sync double-walk
  of the brain dir).
- lint: tick per page; per-issue stdout output preserved.
- integrity auto: tick per page in the main resolver loop. The separate
  ~/.gbrain/integrity-progress.jsonl resume marker is untouched (its
  role shifts from live progress reporting to resume-only).
- eval: add an onProgress option to core's runEval(), CLI wraps with a
  reporter. Phases: eval.single / eval.ab. Tick per query.

core/search/eval.ts gains a RunEvalOptions type so future callers (MCP
eval op, Minion handlers) can also hook in without the reporter.

1686 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* feat(progress): step 3e - onProgress callbacks on core libs

- src/core/embedding.ts: embedBatch() gains an optional
  EmbedBatchOptions.onBatchComplete callback, fired after each 100-item
  sub-batch. CLI wrappers pass reporter.tick; Minion handlers can pass
  job.updateProgress.
- src/core/enrichment-service.ts: enrichEntities() config gains
  onProgress(done, total, name) fired after each entity. Same split:
  CLI -> reporter, Minion -> DB-backed progress.

No CLI behavior change on its own. Wiring these callbacks into the
Minion handlers is Step 4.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* feat(progress): step 4 - orchestrators + upgrade + minion handlers

- cli-options.ts: childGlobalFlags() returns the flag suffix to append
  to child gbrain subprocesses. Empty string by default, " --quiet
  --progress-json" when the parent has them set, so child behavior
  inherits the parent's progress-mode without scattering string-concat
  logic across every execSync site.

- migrations/v0_12_2.ts: each execSync inherits the parent's global
  flags. Phase C (repair-jsonb --dry-run --json) pins explicit stdio to
  ['ignore','pipe','inherit'] so child stderr streams straight through
  while stdout stays captured for JSON.parse. Per Codex review #12.
- migrations/v0_12_0.ts + v0_11_0.ts: same childGlobalFlags wiring at
  each gbrain-subcommand execSync.

- upgrade.ts: post-upgrade timeout bumped 300s → 30min (1_800_000 ms)
  with GBRAIN_POST_UPGRADE_TIMEOUT_MS override. The old 300s cap killed
  v0.12.0 graph-backfill migrations on 50K+ brains; the heartbeat
  wiring added in v0.14.2 makes long waits observable, so a generous
  ceiling no longer means users stare at a silent terminal.

- jobs.ts: the embed Minion handler passes job.updateProgress as the
  onProgress callback, so per-job progress is durable in minion_jobs
  and readable via `gbrain jobs get <id>`. Primary Minion progress
  channel is DB-backed — stderr from `jobs work` stays coarse for
  daemon liveness only. Per Codex review #20.
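The childGlobalFlags() behavior described at the top of this commit can be sketched as follows — option names are taken from the commit text, but the real CliOptions type is richer, so this shape is an assumption:

```typescript
// Sketch of childGlobalFlags(): empty string by default, " --quiet
// --progress-json" when the parent has both set, so child gbrain
// subprocesses inherit the parent's progress mode.
type CliOptions = { quiet: boolean; progressJson: boolean };

function childGlobalFlags(opts: CliOptions): string {
  let flags = "";
  if (opts.quiet) flags += " --quiet";
  if (opts.progressJson) flags += " --progress-json";
  return flags;
}

// Usage at an execSync site: append the suffix so children inherit the mode.
const opts: CliOptions = { quiet: true, progressJson: true };
const cmd = `gbrain repair-jsonb --dry-run --json${childGlobalFlags(opts)}`;
console.log(cmd); // gbrain repair-jsonb --dry-run --json --quiet --progress-json
console.log(JSON.stringify(childGlobalFlags({ quiet: false, progressJson: false }))); // ""
```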

1686 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* feat(progress): step 5 - E2E doctor-progress test + CI guard

scripts/check-progress-to-stdout.sh greps src/ for the banned
`process.stdout.write('\r…')` pattern that v0.14.2 removed from the
bulk-action codepaths. Wired into the `bun run test` script so any
future regression that puts progress back on stdout fails fast. An
empty allowlist documents the position: every known call site was
migrated; new exceptions need a rationale in the allowlist.

test/e2e/doctor-progress.test.ts (Tier 1, needs Postgres + pgvector):
- `gbrain --progress-json doctor --json`: stderr carries JSONL progress
  events with the canonical {event, phase, ts} shape, starts + finishes
  for `doctor.db_checks`. Stdout stays parseable JSON — no progress
  pollution.
- `gbrain doctor` (no flag): human-plain progress goes to stderr only,
  stdout stays free of `[doctor.db_checks]`.
- `gbrain --quiet doctor`: reporter emits nothing; doctor still runs to
  completion.

test/cli-options.test.ts: +2 spawning integration tests. One verifies
`gbrain --progress-json --version` keeps stdout clean of progress events
(single-shot commands that don't use a reporter aren't affected). One
guards the skillpack-check --quiet regression — --quiet suppresses
stdout by reading the resolved CliOptions singleton, not re-parsing argv.

Full test matrix:
  bun run test           -> 1726 pass / 184 skipped (no DB) / 0 fail
  bun run test:e2e       -> 136 pass / 13 skipped / 0 fail

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* feat(progress): step 6 - docs + v0.14.2 release bump

- VERSION + package.json bumped to 0.14.2.
- docs/progress-events.md (new): canonical JSON event schema reference.
  Stable from v0.14.2, additive only. Lists every phase name shipped
  in this release, the five event types (start/tick/heartbeat/finish/
  abort), the TTY/non-TTY rendering rules, subprocess inheritance
  semantics, and the Minion DB-backed progress model.
- CLAUDE.md: "Bulk-action progress reporting" section under the build
  instructions; Key files entries for src/core/progress.ts,
  src/core/cli-options.ts, scripts/check-progress-to-stdout.sh, and
  docs/progress-events.md; doctor.ts entry updated to note the v0.14.2
  5-target jsonb_integrity scan + heartbeat wiring.
- CHANGELOG.md v0.14.2: full release summary per project voice rules.
  The "numbers that matter" table, per-command before/after grid,
  backward-compat warnings for stdout→stderr moves, and an itemized
  changes section covering reporter/CLI plumbing/schema/Minion
  handlers/doctor fixes/upgrade timeout/CI guard/tests. No em dashes.
  Real file paths, real commands, real numbers.
- skills/migrations/v0.14.2.md (new): agent migration note. Mechanical
  step is "nothing" since v0.14.2 is purely additive. Walks agents
  through the three new global flags, the 14 wired commands, the event
  schema cheat sheet, Minion progress via job.updateProgress, and
  scripts/verification commands.

Full test matrix:
  bun run test (unit + guards) -> 1726 pass / 184 skipped / 0 fail
  bun run test:e2e (Postgres)  -> 141 pass / 8 skipped / 0 fail

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* chore: bump version to 0.15.2, restore master's [0.14.2] CHANGELOG entry

Master sits at 0.14.2 (reliability wave). This PR lands on top as 0.15.2
(progress streaming wave). Splits the merge-time combined CHANGELOG entry
back into two discrete release sections so history stays honest:

- [0.15.2] = progress reporter, CliOptions, 14 wired commands, Minion
  embed handler, doctor jsonb_integrity 5-target fix, upgrade timeout bump,
  CI guard, progress unit+E2E tests.
- [0.14.2] = master's eight root-cause bug fixes, restored verbatim from
  origin/master.

Touched files:
- VERSION + package.json: 0.14.2 -> 0.15.2 (next patch off master).
- skills/migrations/v0.14.2.md -> skills/migrations/v0.15.2.md (rename
  + rewrite frontmatter + body to v0.15.2).
- CHANGELOG.md: split into two entries; progress-wave refs renamed
  v0.14.2 -> v0.15.2; reliability-wave entry restored from master.
- src/core/progress.ts, src/commands/doctor.ts, src/commands/sync.ts,
  src/commands/upgrade.ts, docs/progress-events.md, test/sync.test.ts:
  progress-wave v0.14.2 references -> v0.15.2. The remaining v0.14.2
  references in test/e2e/migration-flow.test.ts (Bug 3 context) and
  CLAUDE.md (reliability-wave key commands, Bug 3 ledger move) correctly
  point at master's 0.14.2 release.

Test matrix after version bump:
  bun run test -> 1780 pass / 179 skipped / 0 fail

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>