
refactor: consolidate pagination into shared iter_metadatas() (#171)#482

Closed
Formatted wants to merge 6 commits into MemPalace:develop from Formatted:fix/171-miner-status-pagination

Conversation

@Formatted commented Apr 10, 2026

What does this PR do?

Started as a CLI-path fix for the 10k truncation bug in miner.status() (#171), then evolved — per reviewer suggestion — into a codebase-wide consolidation.

Before: three separate pagination implementations:

  • mcp_server.py — private _iter_metadatas() generator
  • palace_graph.py — inline while offset < total loop with col.get(limit=1000)
  • miner.py — single col.get(limit=10000) call (the original bug)

After: one shared iter_metadatas(col, where=None, batch=500) generator in palace.py, alongside get_collection and file_already_mined. All three callers import and use it. Single place to tune batch size or fix edge cases going forward.
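A minimal sketch of what the shared helper might look like. Only the signature iter_metadatas(col, where=None, batch=500) and the None-metadatas handling are taken from this PR's discussion; the body is illustrative, assuming ChromaDB's standard col.get(limit=, offset=, where=, include=) paging:

```python
def iter_metadatas(col, where=None, batch=500):
    """Yield every metadata dict from a ChromaDB collection, one page at a time.

    Sketch based on the signature described in this PR; internals are assumed.
    """
    offset = 0
    while True:
        kwargs = {"limit": batch, "offset": offset, "include": ["metadatas"]}
        if where is not None:  # only forward a filter when one was given
            kwargs["where"] = where
        page = col.get(**kwargs)
        # ChromaDB can return None metadatas on freshly-created collections
        metadatas = page.get("metadatas") or []
        yield from metadatas
        if len(metadatas) < batch:  # short page means we just read the last one
            break
        offset += batch
```

Callers then never see pagination at all: `total = sum(1 for _ in iter_metadatas(col))`.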

How to test

pip install -e ".[dev]"
python -m pytest tests/test_miner.py::test_status_paginates_beyond_single_page -v
python -m pytest tests/ -v

The new test mocks a 1100-drawer palace across three pages (500 + 500 + 100) and asserts the correct total is printed and that col.get was called three times with advancing offsets.

Checklist

  • Tests pass (python -m pytest tests/ -v)
  • No hardcoded paths
  • Linter passes (ruff check .)

Related

@Formatted force-pushed the fix/171-miner-status-pagination branch from 84438f6 to 1d77784 on April 10, 2026 02:15

@web3guru888 left a comment


This fixes a real pain point. We hit the 10k truncation on our integration (710+ KG entities, growing) and had to work around it.

_iter_metadatas generator — Clean pattern. The batch-of-500 with early termination on short page is correct. Good defensive handling of None metadatas — we've seen ChromaDB return that on freshly-created collections.

partial flag in status() — Smart addition. Having the response self-report when it couldn't scan all metadata is useful for clients. One thought: when partial is true, would it also make sense to include total_from_meta in the response so the caller can see how many were actually scanned vs total_drawers?
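Concretely, the suggested response could look like this; only the partial flag exists in the PR, total_from_meta is the proposed addition, and the other field names are hypothetical:

```python
# Hypothetical status() payload when the metadata scan comes up short.
status_response = {
    "total_drawers": 12_000,    # from col.count()
    "total_from_meta": 10_000,  # proposed: how many metadatas were actually scanned
    "partial": True,            # scan did not cover every drawer
}

# The flag is just the comparison between the two counts.
assert status_response["partial"] == (
    status_response["total_from_meta"] < status_response["total_drawers"]
)
```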

Miner-side pagination — The parallel fix in miner.py is necessary too. Only note: unlike the MCP side which uses the shared _iter_metadatas helper, the miner reimplements its own pagination loop with _PAGE = 500. Consider extracting a shared utility to avoid the two loops drifting apart — but that's a follow-up, not a blocker for this PR.

Tests — The _TAXONOMY_BATCH=2 monkeypatch to force multi-page iteration on a 4-item seeded collection is clever. The None metadatas edge case test is particularly valuable.

Strong +1. This is the kind of fix that prevents silent data loss at scale.


@Formatted (Author)

Good call on the shared utility — went ahead and did the refactor. Extracted the pagination loop into palace.py as iter_metadatas(col, where=None, batch=500), alongside get_collection and file_already_mined (that module's own docstring already calls it out as the home for shared ChromaDB patterns). miner.status() now just iterates iter_metadatas(col) — the inline loop is gone. mcp_server.py can swap its own _iter_all_metadatas for the shared one once #307 merges.


@web3guru888 left a comment


This is exactly the right fix. We were running with the 10k truncation bug for months before we caught it — wrong wing/room counts in mempalace status that were just plausible enough to not trigger suspicion.

The pagination approach (offset loop in batches) is consistent with what palace_graph.py already does; good that you matched the pattern rather than introducing a new one.

One observation: mcp_server.py tool_status, tool_list_wings, tool_list_rooms, and tool_get_taxonomy all have the same uncapped col.get(limit=10000) call — tracked in #478. This PR handles the CLI/miner path; those four MCP tools are still outstanding. Might be worth a follow-up PR or a note in this one to make the scope explicit.

Tests look appropriate. LGTM for the miner.py pagination path.

@Formatted force-pushed the fix/171-miner-status-pagination branch from 62c984a to 1ba5ab8 on April 10, 2026 05:49
@Formatted (Author)

The MCP tools are covered — #493 fixes all four (tool_status, tool_list_wings, tool_list_rooms, tool_get_taxonomy) using the same paginated offset pattern. This PR stays scoped to the CLI/miner path.

@web3guru888

Got it — good to know #493 covers the MCP side. The split makes sense: fixing the CLI path here and the MCP tools in a focused PR there is cleaner than trying to do it all in one. Both can land independently.

@Formatted changed the title from fix(miner): paginate status() to avoid 10k ChromaDB truncation (#171) to refactor: consolidate pagination into shared iter_metadatas() (#171) on Apr 10, 2026
@Formatted (Author)

Scope has expanded since that comment. After the discussion on #478, went ahead and did the full consolidation: removed _iter_metadatas from mcp_server.py and the inline loop from palace_graph.py — both now use iter_metadatas() from palace.py. Single implementation across the whole codebase.

@web3guru888

That's a cleaner outcome than the scoped fix — single iter_metadatas() implementation across the whole codebase means there's exactly one place to update when ChromaDB changes its pagination API again (and it will). Good call on expanding scope after the #478 discussion.

Does this now supersede the pagination work in PR #563 (the col.count() > 10000 dynamic limit fix), or are they complementary? Asking because both seem to be addressing the 10K wall from different angles — #563 fixes the limit query, this PR fixes the shared implementation. Might be worth coordinating to avoid merge conflicts.

@Formatted (Author)

Looked at #563 — they conflict rather than complement. #563 replaces limit=10000 with limit=max(col.count(), 1) in mcp_server.py, which is a single unbounded fetch that scales with palace size. Our PR already removed those lines in favour of iter_metadatas(). Beyond the merge conflict, the max(col.count(), 1) approach trades one problem for another — on a 125k-drawer palace it fetches all 125k records in a single col.get() call, which is the kind of memory spike the pagination is meant to avoid.
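The working-set difference can be sketched with illustrative fakes (not the actual mcp_server.py code); the first function mirrors #563's single-fetch pattern as described above, the second the paginated pattern:

```python
def fetch_all_at_once(col):
    # #563's approach: one unbounded get() whose result buffer grows
    # linearly with palace size (125k drawers -> 125k records in memory).
    return col.get(limit=max(col.count(), 1), include=["metadatas"])["metadatas"]


def count_paginated(col, batch=500):
    # The pagination approach: at most `batch` records held per step.
    total, offset = 0, 0
    while True:
        page = col.get(limit=batch, offset=offset, include=["metadatas"])
        metas = page.get("metadatas") or []
        total += len(metas)
        if len(metas) < batch:
            return total
        offset += batch
```

Both return the same count; only the paginated version keeps peak memory bounded by the batch size.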

@web3guru888

Got it — #482 already removed those lines in favour of iter_metadatas(), so #563's max(col.count(), 1) patch is patching code that doesn't exist in this branch. The pagination approach is strictly better for large palaces anyway; the single-fetch fix just moves the memory wall up rather than removing it.

@Formatted (Author)

@milla-jovovich @bensig — rebased and conflict-free. Quick summary of where this stands:

Formatted and others added 6 commits April 11, 2026 08:58
- add partial flag to tool_status when metadata count diverges from col.count()
- fix `if where` → `if where is not None` in _iter_metadatas
- add TestTaxonomyPagination covering pagination for all 4 tools, partial flag, and tool_list_rooms with where filter

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…lace#171)

Replaces the single col.get(limit=10000) call in status() with a
500-item pagination loop, matching the fix applied to MCP taxonomy
tools in MemPalace#307. Palaces with >10k drawers now report correct drawer
counts instead of silently truncating at 10k.

Moves the pagination loop from miner.status() into palace.py alongside
get_collection and file_already_mined, so mcp_server.py can adopt the
same helper instead of maintaining a parallel implementation.

Removes the private _iter_metadatas from mcp_server.py and the inline
offset loop from palace_graph.py — both now use the shared
iter_metadatas() helper in palace.py. Single implementation, single
place to tune batch size or fix edge cases.
@Formatted force-pushed the fix/171-miner-status-pagination branch from 0e6500a to c4d4d7a on April 11, 2026 15:00
@Formatted (Author)

Rebased again onto upstream/main after PR #371 landed — that PR added inline pagination loops to mcp_server.py which conflicted with our iter_metadatas() consolidation. Resolved in favour of the shared utility, incorporating the "partial": true response key from #371's tool_status implementation. Conflicts clear, 590 tests passing.

@bensig changed the base branch from main to develop on April 11, 2026 22:22
@bensig requested a review from igorls as a code owner on April 11, 2026 22:22
@bensig (Collaborator) commented Apr 12, 2026

Closing — this is superseded by recently merged PRs to develop. Thank you for the contribution!

@bensig closed this Apr 12, 2026
@Formatted deleted the fix/171-miner-status-pagination branch on April 12, 2026 13:01


Successfully merging this pull request may close these issues.

MCP server: mempalace_list_wings and mempalace_get_taxonomy return empty despite data in palace

3 participants