fix(status): paginate col.get() to stay under SQLite variable limit#1016
fix(status): paginate col.get() to stay under SQLite variable limit#1016eldar702 wants to merge 1 commit intoMemPalace:developfrom
Conversation
`mempalace status` currently calls `col.get(limit=total, include=["metadatas"])`
with no pagination. Chroma's backend issues a SQL statement whose host-parameter
count scales with the limit. Once `total > 32766` (SQLite's default
`SQLITE_MAX_VARIABLE_NUMBER`) the call raises:
chromadb.errors.InternalError: Error executing plan:
Internal error: error returned from database:
(code: 1) too many SQL variables
Any palace larger than ~32 K drawers can no longer run `mempalace status`.
Fix: paginate the .get() in 10 K batches and aggregate metadatas.
No behavior change for smaller palaces.
Fixes MemPalace#1015
92f98f9 to
492b316
Compare
Code reviewFound 1 issue:
Lines 850 to 861 in 492b316 🤖 Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎. |
|
For context — I filed #1036 this morning with the same pagination approach, not realising your PR already existed. Sorry for the dup. Yours is the narrower fix and landed first; the logic looks right to me. The fork has been running a semantically identical paginated |
|
Confirming the reproducer and fix on a large palace. Setup: mempalace v3.3.2, pipx install, chromadb 1.5.8, Python 3.13, macOS, palace with 114,232 drawers across 11 wings (5 project + 6 diary). Crash trace: After applying this PR: The offset-pagination pattern in this PR matches what Also resolves #1073 (filed today), #950, #802. +1 for merge. |
|
Ran into this exact bug last week on a ~465k-drawer palace and filed #1073 before I spotted your PR. Same root cause: One thing before you merge though. There's another call site with the same pattern that this PR leaves alone, total = drawers_col.count() # line 219
...
all_data = drawers_col.get(limit=total, include=["documents", "metadatas"])Reproduces identically via If it's easy to drop the same 10k chunking into |
Mirror the pagination pattern PR MemPalace#851 landed in miner.py:status(). A single drawers_col.get(limit=total, ...) on palaces larger than SQLite's SQLITE_MAX_VARIABLE_NUMBER (32766) crashes inside chromadb. Fetch drawers in batch_size=5000 chunks, stepping offset until the collection is drained. by_source aggregation semantics are preserved exactly — grouping, wing filter, meta capture all unchanged. Closes MemPalace#1073. Related: MemPalace#802, MemPalace#850, MemPalace#1016.
Mirror the pagination pattern PR MemPalace#851 landed in miner.py:status(). A single drawers_col.get(limit=total, ...) on palaces larger than SQLite's SQLITE_MAX_VARIABLE_NUMBER (32766) crashes inside chromadb. Fetch drawers in batch_size=5000 chunks, stepping offset until the collection is drained. by_source aggregation semantics are preserved exactly — grouping, wing filter, meta capture all unchanged. Closes MemPalace#1073. Related: MemPalace#802, MemPalace#850, MemPalace#1016.
Mirror the pagination pattern PR MemPalace#851 landed in miner.py:status(). A single drawers_col.get(limit=total, ...) on palaces larger than SQLite's SQLITE_MAX_VARIABLE_NUMBER (32766) crashes inside chromadb. Fetch drawers in batch_size=5000 chunks, stepping offset until the collection is drained. by_source aggregation semantics are preserved exactly — grouping, wing filter, meta capture all unchanged. Closes MemPalace#1073. Related: MemPalace#802, MemPalace#850, MemPalace#1016.
Fixes #1015.
Summary
mempalace statuscrashes on any palace with more than ~32 766 drawers with:Root cause:
miner.py::status()callscol.get(limit=total)unbounded. Chroma issues one SQL statement whose host-parameter count grows with the limit, and oncetotal > 32 766(SQLite's defaultSQLITE_MAX_VARIABLE_NUMBER) SQLite rejects it.Change
Paginate
col.get()in 10 000-drawer batches and aggregatemetadatas. No behavior change for smaller palaces.One file touched, 9 added / 3 removed.
Reproducer
On any palace with > 32 766 drawers:
Minimal programmatic repro:
Exact inflection point:
Test plan
mempalace statussucceeds on a 43 852-drawer palace after the patch (produced the normal Wing/Room breakdown)mempalace statussucceeds on an empty palacepytest) — not run locally; maintainers please verify in CIWhy 10 000
SQLITE_MAX_VARIABLE_NUMBERis 32 766 on most SQLite ≥ 3.32 builds; some older distributions ship it as 999. 10 000 is comfortably below both and large enough that aggregation overhead stays negligible (palace would need > 100 K drawers before the per-batch overhead mattered).