Skip to content

fix(mcp): paginate taxonomy tools and log instead of swallowing errors#307

Closed
dr-robert-li wants to merge 2 commits intoMemPalace:developfrom
dr-robert-li:fix/171-mcp-taxonomy-pagination
Closed

fix(mcp): paginate taxonomy tools and log instead of swallowing errors#307
dr-robert-li wants to merge 2 commits intoMemPalace:developfrom
dr-robert-li:fix/171-mcp-taxonomy-pagination

Conversation

@dr-robert-li
Copy link
Copy Markdown

What does this PR do?

tool_list_wings, tool_list_rooms, and tool_get_taxonomy each fetched col.get(limit=10000) once and wrapped the call in except Exception: pass. Two compounding bugs:

  1. Palaces with >10k drawers silently truncated at the first 10k page, returning incomplete counts.
  2. Any error from col.get (e.g., ChromaDB version incompatibility) was swallowed, returning empty {} with no diagnostic — exactly the symptom reporters saw on 95k-drawer palaces under ChromaDB 1.5.x.

Extracts a tiny _iter_all_metadatas helper that paginates via offset until exhausted and logs errors to the existing mempalace_mcp logger (already wired to stderr). All three tools now use it and shrink in the process.

Fixes #171.

How to test

python -m pytest tests/test_mcp_server.py::test_tool_list_wings_paginates_beyond_10k \
                 tests/test_mcp_server.py::test_tool_list_wings_logs_chromadb_error_instead_of_swallowing -v

Tests use a mocked collection that returns multi-page results (15k drawers across 2 pages) and a collection that raises RuntimeError from col.get. Both tests fail without the fix — the pagination test sees only 10k drawers, the log test gets an empty caplog.

Checklist

  • Tests pass (python -m pytest tests/ -v) — 119 passed, 0 failed
  • No hardcoded paths
  • Linter passes (ruff check .)

tool_list_wings, tool_list_rooms, and tool_get_taxonomy each fetched col.get(limit=10000) once and wrapped the call in 'except Exception: pass'. Two compounding bugs:

1. Palaces with >10k drawers silently truncated at the first 10k page, returning incomplete counts.

2. Any error from col.get (e.g., ChromaDB version incompatibility) was swallowed, returning empty {} with no diagnostic, which is exactly the symptom reporters saw on 95k-drawer palaces under ChromaDB 1.5.x.

Extracts a tiny _iter_all_metadatas helper that paginates via offset until exhausted and logs errors to the existing mempalace_mcp logger (already wired to stderr by logging.basicConfig). All three tools now use it and shrink in the process. Closes MemPalace#171
@bgauryy
Copy link
Copy Markdown

bgauryy commented Apr 8, 2026

PR Review: fix(mcp): paginate taxonomy tools and log instead of swallowing errors

Executive Summary

Aspect Value
PR Goal Fix 10k-drawer truncation and swallowed exceptions in MCP taxonomy tools
Files Changed 2 (mcp_server.py, tests/test_mcp_server.py)
Risk Level 🟡 MEDIUM — core data query path changes, but logic is straightforward
Review Effort 2 — small, focused bugfix
Recommendation 🔄 REQUEST_CHANGES

Affected Areas: mcp_server.py (taxonomy tools: tool_list_wings, tool_list_rooms, tool_get_taxonomy)

Business Impact: Palaces with >10k drawers now return correct counts instead of silently truncated data. Errors surface in logs instead of vanishing.

Flow Changes: Three tool functions now delegate to a shared paginating generator (_iter_all_metadatas) instead of each doing a single col.get(limit=10000).

Ratings

Aspect Score
Correctness 4/5
Security 5/5
Performance 4/5
Maintainability 4/5

PR Health

High Priority Issues

(Must fix before merge)

🔗 #1: tool_status and tool_browse have the identical bug — left unfixed

Location: mempalace/mcp_server.pytool_status (~line 50) and tool_browse (~line 121) | Confidence: ✅ HIGH

Both functions iterate col.get(include=["metadatas"])["metadatas"] with no limit or pagination, wrapped in except Exception: pass. This is the same double bug (truncation + swallowed errors) that this PR fixes for the three taxonomy tools. Leaving them unfixed means the same #171 symptom persists for two other MCP endpoints.

tool_status is especially exposed since it's the first tool agents call.

  # tool_status — current code
  try:
-     all_meta = col.get(include=["metadatas"])["metadatas"]
-     for m in all_meta:
-         w = m.get("wing", "unknown")
-         r = m.get("room", "unknown")
-         wings[w] = wings.get(w, 0) + 1
-         rooms[r] = rooms.get(r, 0) + 1
- except Exception:
-     pass
+     for m in _iter_all_metadatas(col):
+         w = m.get("wing", "unknown")
+         r = m.get("room", "unknown")
+         wings[w] = wings.get(w, 0) + 1
+         rooms[r] = rooms.get(r, 0) + 1

Same pattern applies to tool_browse.


Medium Priority Issues

(Should fix, not blocking)

🚨 #2: Mid-pagination error returns partial data with no warning

Location: mempalace/mcp_server.py_iter_all_metadatas() | Confidence: ⚠️ MED

If col.get() raises on page 2+, the generator logs the error and silently finishes. The caller receives partial counts (only page 1 data) presented as the complete result — no flag, no warning in the response dict.

For a 95k-drawer palace that fails at offset 20k, the user sees 20k drawers reported as if that's the full count.

  except Exception as e:
      logger.error("metadata iteration failed at offset %d: %s", offset, e)
+     raise  # let callers decide how to handle partial data

Alternatively, return a "warning" key in the tool response dicts so the MCP client can surface it.


🎨 #3: Tests only cover tool_list_wings — other two tools untested for pagination

Location: tests/test_mcp_server.py | Confidence: ⚠️ MED

tool_list_rooms and tool_get_taxonomy both call _iter_all_metadatas but exercise different aggregation logic (room counting, nested wing→room taxonomy). A quick parametrized or individual test for each would catch regressions in how they consume the generator — e.g., the where filter in tool_list_rooms and the setdefault nesting in tool_get_taxonomy.

def test_tool_list_rooms_paginates_with_wing_filter(monkeypatch):
    """Verify tool_list_rooms passes where-filter and paginates."""
    from unittest.mock import MagicMock
    from mempalace import mcp_server

    fake_col = MagicMock()
    fake_col.get.side_effect = [
        {"metadatas": [{"wing": "alpha", "room": "api"}] * 10000},
        {"metadatas": [{"wing": "alpha", "room": "db"}] * 3000},
    ]
    monkeypatch.setattr(mcp_server, "_get_collection", lambda create=False: fake_col)

    result = mcp_server.tool_list_rooms(wing="alpha")
    assert result["rooms"]["api"] == 10000
    assert result["rooms"]["db"] == 3000
    # Verify where-filter was forwarded
    call_kwargs = fake_col.get.call_args_list[0][1]
    assert call_kwargs["where"] == {"wing": "alpha"}

Low Priority Issues

(Nice to have)

🏗️ #4: Page size inconsistency across codebase

Location: _iter_all_metadatas vs palace_graph.py:build_graph | Confidence: ❓ LOW

_iter_all_metadatas uses a 10,000-item page size, while palace_graph.py (build_graph) uses 1,000. Both work, but a shared constant in constants.py (e.g., CHROMA_PAGE_SIZE) would prevent future drift and make the choice explicit.


🎨 #5: Naming convention — PAGE as a local variable

Location: mempalace/mcp_server.py_iter_all_metadatas, line PAGE, offset = 10000, 0 | Confidence: ❓ LOW

PAGE reads like a constant (PEP 8 UPPER_CASE), which is fine for an effectively-constant local. But page_size would better signal that it's a batch-size parameter rather than a page index.


Flow Impact Analysis

Before (each tool independently):
  tool_list_wings ──→ col.get(limit=10000) ──→ manual loop ──→ {wings: {}}
  tool_list_rooms ──→ col.get(limit=10000) ──→ manual loop ──→ {rooms: {}}
  tool_get_taxonomy → col.get(limit=10000) ──→ manual loop ──→ {taxonomy: {}}
  (all: except Exception: pass)

After (shared generator):
  _iter_all_metadatas(col, where) ──→ paginated col.get() ──→ yields metadata dicts
       ↑                                                        ↓
  tool_list_wings ──────────────────────────────────────→ for m in generator
  tool_list_rooms ──────────────────────────────────────→ for m in generator
  tool_get_taxonomy ────────────────────────────────────→ for m in generator

  Unfixed (same old pattern):
  tool_status ──→ col.get() ──→ manual loop ──→ except Exception: pass
  tool_browse ──→ col.get() ──→ manual loop ──→ except Exception: pass

Created by Octocode MCP https://octocode.ai

…rrors

tool_status used its own col.get(limit=10000, include=["metadatas"])
wrapped in `try/except: pass` — the exact truncation + swallowed-error
pattern that MemPalace#171 set out to fix. Route it through _iter_all_metadatas
so palaces with >10k drawers report accurate wing/room counts.

Also make the generator re-raise after logging: partial-page failures
previously returned truncated counts disguised as a full result. The
MCP handle_request boundary already converts raised exceptions into a
proper JSON-RPC error response, so re-raising is the right escalation.

Parameterized pagination test covers the three taxonomy tools that
lacked direct coverage (list_rooms, get_taxonomy, status). Existing
list_wings error test updated to expect propagation.

Deferred (not in this commit): mempalace/miner.py status() has the
same limit=10000 truncation shape but lives in the CLI path with no
shared helper yet — worth a separate targeted PR.

Closes review on MemPalace#171.
@dr-robert-li
Copy link
Copy Markdown
Author

Addresses review feedback in 75fd84a.

  • tool_status migrated through _iter_all_metadatas, dropping its try/except: pass (review Congratulations Milla #1). Note: tool_browse referenced in review Congratulations Milla #1 does not exist in the current codebase — tool_status was the only actually-unfixed call site.
  • _iter_all_metadatas now re-raises after logging (review Integrating MemPalace with SoulForge's code intelligence system #2). The existing handle_request boundary converts raised exceptions into a proper JSON-RPC error, so mid-pagination failures no longer return partial data disguised as a full result.
  • New parameterized test_taxonomy_tools_paginate_beyond_10k covers tool_list_rooms, tool_get_taxonomy, and tool_status (review feat: Hermes memory provider integration #3). Existing test_tool_list_wings_logs_chromadb_error updated to assert propagation.

Deferred: mempalace/miner.py status() has the same limit=10000 truncation shape but lives in the CLI path with no shared helper access. Worth a separate PR rather than expanding this one's scope.

Full suite green.

Perseusxrltd added a commit to Perseusxrltd/mnemion that referenced this pull request Apr 9, 2026
- pyproject.toml: widen chromadb to <2.0 for Python 3.14 compat (MemPalace#302)
- config.py + miner/convo_miner/mcp_server: add hnsw:space=cosine so
  similarity = 1 - distance stays in [0,1] instead of negative L2 (MemPalace#304)
- searcher.py + layers.py: guard against ChromaDB 1.x empty-outer query
  results (IndexError on fresh collections) (MemPalace#305)
- mcp_server.py: redirect stdout→stderr at import to protect JSON-RPC
  wire from chromadb/posthog chatter (MemPalace#306)
- mcp_server.py: replace 10k-limited col.get with paginating
  _iter_all_metadatas helper; stop swallowing errors silently (MemPalace#307)
- mcp_server.py: drop undocumented wait_for_previous arg injected by
  Gemini MCP clients (MemPalace#322)
- searcher.py + hybrid_searcher.py + mcp_server.py: add min_similarity
  threshold filter so callers get a clean "no results" signal (MemPalace#350)
- entity_detector.py: add CODE_KEYWORDS blocklist (~80 terms) to stop
  Rust types / React / framework names being detected as entities (MemPalace#349)

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Formatted pushed a commit to Formatted/mempalace that referenced this pull request Apr 10, 2026
…lace#171)

Replaces the single col.get(limit=10000) call in status() with a
500-item pagination loop, matching the fix applied to MCP taxonomy
tools in MemPalace#307. Palaces with >10k drawers now report correct drawer
counts instead of silently truncating at 10k.
Formatted pushed a commit to Formatted/mempalace that referenced this pull request Apr 10, 2026
…lace#171)

Replaces the single col.get(limit=10000) call in status() with a
500-item pagination loop, matching the fix applied to MCP taxonomy
tools in MemPalace#307. Palaces with >10k drawers now report correct drawer
counts instead of silently truncating at 10k.
Copy link
Copy Markdown

@web3guru888 web3guru888 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an excellent fix — and it directly addresses the class of silent failures we flagged in #338 and #339. The pattern of except Exception: pass around ChromaDB calls was one of the most painful debugging traps in the codebase.

What I like:

  • _iter_all_metadatas generator — clean extraction that all three taxonomy tools share. The pagination logic (offset += PAGE until len(metas) < PAGE) is correct and handles the empty-collection case gracefully.
  • Log-then-raise — this is the right balance. Callers get the exception for control flow, and operators get the error in logs for diagnostics. Much better than silently returning {}.
  • Test coverage — the parameterized test across all three taxonomy tools + the error propagation test are thorough. The 15k-drawer two-page mock perfectly validates the pagination boundary.

One minor thought: the where parameter passthrough in _iter_all_metadatas is a nice touch — it keeps tool_list_rooms(wing=...) working correctly without duplicating the pagination logic.

We run a ~700-entity palace and haven't hit the 10k ceiling yet, but this fix future-proofs us nicely. Strong +1.

🔭 Reviewed as part of the MemPalace-AGI integration project — autonomous research with perfect memory. Community interaction updates are posted regularly on the dashboard.

Formatted pushed a commit to Formatted/mempalace that referenced this pull request Apr 10, 2026
…lace#171)

Replaces the single col.get(limit=10000) call in status() with a
500-item pagination loop, matching the fix applied to MCP taxonomy
tools in MemPalace#307. Palaces with >10k drawers now report correct drawer
counts instead of silently truncating at 10k.
Formatted pushed a commit to Formatted/mempalace that referenced this pull request Apr 10, 2026
…lace#171)

Replaces the single col.get(limit=10000) call in status() with a
500-item pagination loop, matching the fix applied to MCP taxonomy
tools in MemPalace#307. Palaces with >10k drawers now report correct drawer
counts instead of silently truncating at 10k.
Formatted pushed a commit to Formatted/mempalace that referenced this pull request Apr 11, 2026
…lace#171)

Replaces the single col.get(limit=10000) call in status() with a
500-item pagination loop, matching the fix applied to MCP taxonomy
tools in MemPalace#307. Palaces with >10k drawers now report correct drawer
counts instead of silently truncating at 10k.
Formatted pushed a commit to Formatted/mempalace that referenced this pull request Apr 11, 2026
…lace#171)

Replaces the single col.get(limit=10000) call in status() with a
500-item pagination loop, matching the fix applied to MCP taxonomy
tools in MemPalace#307. Palaces with >10k drawers now report correct drawer
counts instead of silently truncating at 10k.
@bensig bensig changed the base branch from main to develop April 11, 2026 22:22
@bensig
Copy link
Copy Markdown
Collaborator

bensig commented Apr 12, 2026

Closing — this is superseded by recently merged PRs to develop. Thank you for the contribution!

@bensig bensig closed this Apr 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

MCP server: mempalace_list_wings and mempalace_get_taxonomy return empty despite data in palace

4 participants