fix(extract+migrate): kill N+1 hang + v0.12.0 migration timeout (v0.12.1) #198
Merged
Conversation
Multi-row INSERT...SELECT FROM unnest() JOIN pages ON CONFLICT DO NOTHING RETURNING 1. Four array-typed bound parameters (links) or five (timeline) regardless of batch size, sidestepping Postgres's 65535-parameter cap. Returns the count of rows actually inserted (excluding ON CONFLICT no-ops and JOIN-dropped rows whose slugs don't exist). Per-row addLink / addTimelineEntry signatures and SQL behavior are unchanged; all 10 existing call sites compile and behave identically. Tests: 11 PGLite cases (empty batch, missing optionals, within-batch dedup, JOIN drops missing slug, half-existing batch, batch of 100) + 9 E2E postgres-engine cases against real Postgres+pgvector.
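For concreteness, here is a minimal sketch of that statement's shape using postgres-js tagged templates. The input field names, the context column, and the double pages join (one per slug side) are assumptions for illustration, not the PR's exact code:

```typescript
import postgres from "postgres";

const sql = postgres(process.env.DATABASE_URL!);

// Hypothetical input shape; the real LinkBatchInput lives in src/core/engine.ts.
interface LinkBatchInput {
  fromSlug: string;
  toSlug: string;
  linkType: string;
  context: string | null; // assumed 4th column
}

async function addLinksBatch(links: LinkBatchInput[]): Promise<number> {
  if (links.length === 0) return 0;
  // Pivot N rows into 4 parallel arrays: 4 bound parameters total,
  // no matter how many rows are in the batch.
  const inserted = await sql`
    INSERT INTO links (from_page_id, to_page_id, link_type, context)
    SELECT f.id, t.id, u.link_type, u.context
    FROM unnest(
      ${links.map((l) => l.fromSlug)}::text[],
      ${links.map((l) => l.toSlug)}::text[],
      ${links.map((l) => l.linkType)}::text[],
      ${links.map((l) => l.context)}::text[]
    ) AS u(from_slug, to_slug, link_type, context)
    JOIN pages f ON f.slug = u.from_slug
    JOIN pages t ON t.slug = u.to_slug
    ON CONFLICT DO NOTHING
    RETURNING 1
  `;
  // RETURNING row count = rows actually inserted; ON CONFLICT no-ops
  // and JOIN-dropped rows (unknown slugs) don't count.
  return inserted.length;
}
```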
…hema timeout
Production bug: the v0.12.0 schema migration timed out at Supabase Management API's 60s ceiling on brains with 80K+ duplicate timeline rows. The DELETE...USING self-join was O(n²) without an index on the dedup columns. Fix: pre-create idx_links_dedup_helper / idx_timeline_dedup_helper on the dedup columns BEFORE the DELETE, and drop them after. Turns O(n²) into O(n log n); on 80K+ rows the migration completes in <1s instead of timing out. Also bumps the v0.12.0 orchestrator's phaseASchema timeout 60s -> 600s as belt-and-suspenders for unforeseen slowness. Exports MIGRATIONS for structural test assertions. Tests: 2 structural assertions (helper-index DDL must appear in v8/v9 SQL in the right order — catches regression even at 0-row scale) + 2 behavioral regression tests (1000-row dedup completes <5s).
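A sketch of that migration step's shape follows. Only the helper-index name and the create/DELETE/drop ordering come from this PR; the column list and the tie-break rule (keep one row per duplicate group by lowest id) are assumptions:

```typescript
// Hypothetical v8-style dedup step; see src/core/migrate.ts for the real
// MIGRATIONS array this PR exports.
const dedupLinksStep = `
  -- Without this index the self-join below is O(n^2).
  CREATE INDEX idx_links_dedup_helper
    ON links (from_page_id, to_page_id, link_type);

  DELETE FROM links a
    USING links b
    WHERE a.from_page_id = b.from_page_id
      AND a.to_page_id   = b.to_page_id
      AND a.link_type    = b.link_type
      AND a.id > b.id;   -- keep one row per duplicate group (assumed rule)

  -- The helper only existed for the DELETE; drop it again.
  DROP INDEX idx_links_dedup_helper;
`;
```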
Production bug: gbrain extract hung 10+ minutes producing zero output on
47K-page brains. The pre-load loop called engine.getLinks(slug) (or
getTimeline) once per page across engine.listPages({limit: 100000}) — 47K
serial round-trips over the Supabase pooler before the first file was read.
Both engines already enforced uniqueness at the SQL layer
(UNIQUE(from, to, link_type) on links, idx_timeline_dedup on timeline_entries).
The in-memory dedup Set was redundant insurance that became the bottleneck.
Fix: delete the pre-load entirely. Buffer 100 candidates per file walk,
flush via engine.addLinksBatch / engine.addTimelineEntriesBatch. ~99% fewer
DB round-trips per re-extract.
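In sketch form, the buffer-and-flush loop might look like this; walkFiles and extractCandidates are stand-ins for the real extract.ts internals, while the engine method and types are the ones this PR adds:

```typescript
import type { BrainEngine, LinkBatchInput } from "../core/engine";

// Stand-ins for extract.ts internals (assumed, for illustration only):
declare function walkFiles(dir: string): AsyncIterable<string>;
declare function extractCandidates(file: string): LinkBatchInput[];

const FLUSH_AT = 100; // batch size from the commit message

async function extractLinks(dir: string, engine: BrainEngine): Promise<number> {
  let created = 0;
  let buffer: LinkBatchInput[] = [];

  const flush = async () => {
    if (buffer.length === 0) return;
    created += await engine.addLinksBatch(buffer); // one round-trip per flush
    buffer = [];
  };

  for await (const file of walkFiles(dir)) {
    buffer.push(...extractCandidates(file));
    if (buffer.length >= FLUSH_AT) await flush();
  }
  await flush(); // drain the tail
  return created; // rows actually inserted, via the batch RETURNING count
}
```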
Also fixes counter accuracy: 'created' now counts rows actually inserted
(via batch RETURNING 1 row count). Re-run on a fully-extracted brain
prints 'Done: 0 links' instead of lying.
Dry-run mode keeps a per-run dedup Set so duplicate candidates from N
markdown files print exactly once, not N times.
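A sketch of that per-run Set, assuming a composite string key over the candidate's identity columns (the real key shape isn't specified here):

```typescript
// Per-run dedup, active only in dry-run mode; key fields are assumptions.
const seen = new Set<string>();

function reportDryRunLink(c: { fromSlug: string; toSlug: string; linkType: string }) {
  const key = `${c.fromSlug}\u0000${c.toSlug}\u0000${c.linkType}`;
  if (seen.has(key)) return; // same candidate from another file: print once
  seen.add(key);
  console.log(`[dry-run] would create ${c.linkType} link ${c.fromSlug} -> ${c.toSlug}`);
}
```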
Batch errors are visible in BOTH json and human modes — silent loss of
100 rows is worse than per-row error visibility.
Tests: extract-fs.test.ts (idempotency + truthful counter + dry-run dedup
+ perf regression guard <2s).
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Reflect what shipped in v0.12.1:
- New engine methods addLinksBatch + addTimelineEntriesBatch (PGLite via unnest() + manual $N, postgres-engine via INSERT...SELECT FROM unnest($1::text[], ...) JOIN pages ON CONFLICT DO NOTHING).
- extract.ts no longer pre-loads the dedup set; candidates are buffered 100 at a time and flushed via the new batch methods.
- v0.12.0 orchestrator phaseASchema timeout bumped 60s to 600s.
- Test counts: 1297 unit / 105 E2E to 1412 unit / 119 E2E.
- New test/extract-fs.test.ts covers the N+1 regression guard.
- BrainEngine method count: 37/38 to 40.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
joedanz added a commit to joedanz/pbrain that referenced this pull request on Apr 19, 2026
…rytan#198)
Adds addLinksBatch / addTimelineEntriesBatch to the BrainEngine interface. Both engines (PGLite + Postgres) implement the batch as a single `INSERT ... SELECT FROM unnest(...) JOIN pages ON CONFLICT DO NOTHING RETURNING 1` statement — 4-5 array-typed bound parameters regardless of batch size, sidestepping the 65535-parameter cap and the postgres-js sql(rows, ...) helper's identifier-escape gotcha.
pbrain extract (file-source) now flushes candidates 100 at a time via the batch API instead of one write per link/entry. On large brains this drops wall-clock from "tens of minutes" to "seconds" end-to-end.
Schema changes required for ON CONFLICT (from_page_id, to_page_id, link_type) to match a real unique index:
- links table UNIQUE constraint widened from 2 columns to 3 (from_page_id, to_page_id, link_type), matching upstream v0.10.3 knowledge-graph layer. addLink's ON CONFLICT clause updated in both engines. Fresh installs get the new constraint from schema.sql / pglite-schema.ts; existing installs pick it up via migration v8.
- src/core/migrate.ts picks up upstream's v5-v10 schema migrations. v5-v7 (minion_jobs / inbox / attachments) create empty tables that this fork doesn't use; they're harmless and keep the schema forward-compatible with a future Minions integration. v8-v10 touch the links and timeline_entries tables we already have.
Notes on deferred upstream work:
- The DB-source `pbrain extract --source db` path remains deferred with the knowledge-graph layer (depends on src/core/link-extraction.ts which we haven't taken). File-source extract is what uses the new batch API.
- Upstream's v0.12.0 Knowledge Graph auto-wire orchestrator is NOT registered (src/commands/migrations/v0_12_0.ts deleted).
(cherry picked from commit 699db50d6cb3b96c3ba3cea8cd55c78f0f9c3bae)
Co-Authored-By: Garry Tan <[email protected]>
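A sketch of what that v8 constraint-widening step might look like; the constraint names here are assumptions, and fresh installs get the 3-column constraint directly from schema.sql / pglite-schema.ts, so only upgrades run this:

```typescript
// Hypothetical shape of migration v8 (names assumed for illustration).
const v8WidenLinksUnique = `
  ALTER TABLE links
    DROP CONSTRAINT IF EXISTS links_from_page_id_to_page_id_key;
  ALTER TABLE links
    ADD CONSTRAINT links_from_to_type_key
    UNIQUE (from_page_id, to_page_id, link_type);
`;
```

After this, addLink's ON CONFLICT (from_page_id, to_page_id, link_type) resolves against a real unique index, which Postgres requires for an explicit conflict target.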
Summary
Fixes two production-blocking bugs Garry hit on his 47K-page brain on April 18.
Performance / Bug Fix:
- gbrain extract no longer hangs 10+ minutes producing zero output on large brains. Removed the N+1 dedup pre-load that issued one engine.getLinks(slug) (or getTimeline) call per page across engine.listPages({limit: 100000}) — 47K serial round-trips over the Supabase pooler before the first file was read. Both engines already enforced uniqueness at the SQL layer; the in-memory dedup Set was redundant insurance that became the bottleneck.
- All four extract paths (extractLinksFromDir, extractTimelineFromDir, extractLinksFromDB, extractTimelineFromDB) now buffer 100 candidates and flush via new batch engine methods. ~99% fewer DB round-trips per re-extract.
- The v0.12.0 migration now pre-creates a helper index before the DELETE...USING self-join, then drops it. Turns O(n²) dedup into O(n log n). On 80K+ duplicate rows the migration completes in <1s.
New Engine API:
- addLinksBatch(LinkBatchInput[]) → Promise<number> and addTimelineEntriesBatch(TimelineBatchInput[]) → Promise<number> on both PostgresEngine and PGLiteEngine. Multi-row INSERT...SELECT FROM unnest($1::text[], ...) JOIN pages ON CONFLICT DO NOTHING RETURNING 1 — 4 (links) or 5 (timeline) array-typed bound parameters regardless of batch size, sidestepping Postgres's 65535-parameter cap.
- Per-row addLink / addTimelineEntry are unchanged. All 10 existing call sites compile and behave identically. Plugin authors can adopt the batch methods at their own pace.
Correctness:
- created counter now truthful on re-runs (returns the count of rows actually inserted via RETURNING 1, not "calls that didn't throw"). Re-run on a fully-extracted brain prints Done: 0 links instead of lying.
- --dry-run deduplicates duplicate candidates across files (per-run Set only when dryRun=true). A link extracted from 3 files prints exactly once, matching what batch insert would actually create.
Operational:
- phaseASchema timeout bumped 60s → 600s as belt-and-suspenders for unforeseen slowness.
Test Coverage
Pre-Landing Review
Pre-Landing Review: No issues found. (Pass 1 + Pass 2 of ~/.claude/skills/gstack/review/checklist.md)
(src/schema.sql:51-107)
Specialist Review
5 specialist subagents (testing, maintainability, performance, security, data-migration) ran in parallel against the diff. Real bugs caught:
- The first Postgres implementation spliced the postgres-js sql(rows, 'col1', ...) helper inside (VALUES) AS v(...) JOIN syntax. Doesn't compose — escapeIdentifiers errored at runtime with "str.replace is not a function" because the helper interpreted column names as identifiers in a context that expected values. Caught by adding the 9 E2E tests, then both engines were rewritten to use the cleaner unnest($1::text[], ...) array-parameter pattern. PGLite was rewritten for consistency too. All 73 mechanical E2E tests now pass.
Informational items (deferred):
- The DB-source extract path still has its own N+1 (getAllSlugs() + per-slug getPage()). Filed as P1 TODO — separate fix needs a getPagesBatch(slugs[]) engine method or streaming cursor.
- The dedup helper index uses a plain CREATE INDEX (no CONCURRENTLY). Accepted: short window (<1s on 80K rows), CONCURRENTLY is illegal in transactions, and schema migrations run at upgrade time.
Plan Completion
10/10 plan items DONE (per /Users/garrytan/.claude/plans/system-instruction-you-are-working-velvet-valley.md).
TODOS
- P1: DB-source extract still does a per-slug N+1 in extractLinksFromDB / extractTimelineFromDB. Needs a getPagesBatch(slugs) engine method.
Updated CLAUDE.md (commit 2b39e90) to reflect v0.12.1:
- LinkBatchInput / TimelineBatchInput exports on src/core/engine.ts; documented the new batch methods on both engines with the actual SQL shape.
- created counter is truthful on re-runs.
- phaseASchema timeout bumped from 60s to 600s.
- README.md, CONTRIBUTING.md, TODOS.md unchanged (no references to changed areas; deferred items already captured).
Test plan
Unit (bun test): 1412 pass, 0 fail
🤖 Generated with Claude Code