Skip to content

fix: paginate large collection reads and surface errors in MCP tools (#339, #338)#371

Merged
bensig merged 9 commits intoMemPalace:mainfrom
RhettOP:fix/issue-339-338-silent-exceptions-pagination
Apr 11, 2026
Merged

fix: paginate large collection reads and surface errors in MCP tools (#339, #338)#371
bensig merged 9 commits intoMemPalace:mainfrom
RhettOP:fix/issue-339-338-silent-exceptions-pagination

Conversation

@RhettOP
Copy link
Copy Markdown

@RhettOP RhettOP commented Apr 9, 2026

Summary

Fixes #339 and #338 - both issues share the same root cause in mempalace/mcp_server.py.

Root cause

Four functions (tool_status, tool_list_wings, tool_list_rooms, tool_get_taxonomy) used:

  1. col.get(include=["metadatas"], limit=10000) - a single unbounded fetch that OOMs or hits ChromaDB HNSW errors on large collections (100k+ drawers).
  2. except Exception: pass - silently swallowed every error, making the caller receive an empty {} that looked like a valid but empty palace.

Fix

Pagination (fixes #338): Replace the single col.get() call with a while offset < total loop using batch_size=5000. Each iteration fetches one page and accumulates results.

  • tool_list_wings: paginated count-by-wing aggregation
  • tool_list_rooms: paginated count-by-room, with optional where filter passed to every batch call
  • tool_get_taxonomy: paginated nested wing->room->count tree
  • tool_status: paginated wings/rooms aggregation block

Error surfacing (fixes #339): Replace except Exception: pass with except Exception as e: return {..., "error": str(e), "partial": True}. Callers now receive the partial result collected so far plus a descriptive error message, instead of silent empty output.

Changes

mempalace/mcp_server.py: 69 lines added / 34 removed across the four affected functions. No other files changed.

@bensig bensig self-requested a review April 9, 2026 15:12
@bensig
Copy link
Copy Markdown
Collaborator

bensig commented Apr 9, 2026

@RhettOP pls fix lint

bensig
bensig previously approved these changes Apr 9, 2026
Copy link
Copy Markdown
Collaborator

@bensig bensig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix lint, ready to merge

Copy link
Copy Markdown

@web3guru888 web3guru888 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big +1 on this PR — we filed #338 and #339 originally, so this directly addresses bugs we hit in production. Our integration runs 208 discoveries across 5 domains with 710+ KG entities, and the silent except Exception: pass in status tools made debugging near-impossible when collections grew past 10K.

Pagination approach looks solid. The while offset < count loop with batch_size=5000 is clean. One edge case to consider: if drawers are being added concurrently (e.g. an active mining session while querying status), count might increase between the initial call and later batches. This is probably fine for status queries (off-by-a-few is acceptable), but worth a comment noting the race.

Error surfacing is exactly right. Returning {"error": ..., "partial": True} is much better than swallowing exceptions — MCP clients can now distinguish between "empty palace" and "something broke mid-read." This matches how we handle partial results in our own integration.

Minor observations:

  • In tool_list_rooms, total = col.count() returns the unfiltered count even when where={"wing": wing} is set. This means the loop may iterate more pages than needed when filtering by wing (extra empty batches). Not a correctness issue, just suboptimal. Could use a sentinel check: if not batch["metadatas"]: break.
  • Consider extracting the pagination loop into a shared helper (e.g. _fetch_all_metadata(col, where=None)) to reduce duplication across the 4 functions. PR #484 takes this approach — worth coordinating.
  • batch_size=5000 is a reasonable default. In our integration we use similar batch sizes with ChromaDB ≥0.6.0.

Overall: this directly unblocks better diagnostics for everyone. Thank you for picking these up.

🔭 Reviewed as part of the MemPalace-AGI integration project — autonomous research with perfect memory. Community interaction updates are posted regularly on the dashboard.

@bensig
Copy link
Copy Markdown
Collaborator

bensig commented Apr 10, 2026

Code review

Found 2 issues:

  1. tool_list_rooms uses unfiltered col.count() as the pagination loop bound, but applies a where={"wing": wing} filter inside each col.get() call. When a wing filter is active, the loop iterates ceil(total_all_drawers / 5000) times regardless of how many drawers match the filter. For a 200k-drawer palace with a 500-drawer wing, this means ~40 iterations returning empty batches instead of 1.

https://github.com/milla-jovovich/mempalace/blob/a3fec4f5653f977b28314bd6f6aad381db6ecb1c/mempalace/mcp_server.py#L240-L266

Fix: drive the loop off batch size instead of total count — break when len(batch["metadatas"]) < batch_size.

  1. All four paginated functions lack an early-exit when a batch returns fewer results than batch_size. The standard pagination pattern is if len(batch["metadatas"]) < batch_size: break. Without this, the loops always run to ceil(total / batch_size) iterations even when all data was returned in a partial last batch. This compounds issue Congratulations Milla #1 for filtered queries.

https://github.com/milla-jovovich/mempalace/blob/a3fec4f5653f977b28314bd6f6aad381db6ecb1c/mempalace/mcp_server.py#L148-L163

Both issues can be fixed by replacing the while offset < total/count pattern with a while True loop that breaks when a batch is short:

while True:
    batch = col.get(include=["metadatas"], limit=batch_size, offset=offset)
    rows = batch["metadatas"]
    # process rows
    offset += len(rows)
    if len(rows) < batch_size:
        break

Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

…p bound

- Replace 'while offset < count/total' with 'while True' + break on short batch
- Fixes tool_list_rooms iterating over unfiltered col.count() when wing filter active
- Fixes all 4 paginated functions (tool_status, tool_list_wings, tool_list_rooms,
  tool_get_taxonomy) missing early-exit when batch smaller than batch_size
- Remove unused 'total' variable in tool_list_wings, tool_list_rooms, tool_get_taxonomy
  (replaced col.count() with accessibility check only)

Per bensig review comments on PR MemPalace#371
@RhettOP RhettOP requested a review from milla-jovovich as a code owner April 10, 2026 16:15
@RhettOP
Copy link
Copy Markdown
Author

RhettOP commented Apr 10, 2026

Fixed in the latest commit. All 4 paginated functions (tool_status, tool_list_wings, tool_list_rooms, tool_get_taxonomy) now use while True with if len(rows) < batch_size: break instead of looping against col.count(). The unused total variable is also removed — replaced the count call with a simple accessibility check.

@bensig bensig self-requested a review April 11, 2026 05:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

3 participants