fix: replace hardcoded limit=10000 with dynamic col.count() in navigation tools#563
fix: replace hardcoded limit=10000 with dynamic col.count() in navigation tools#563rolldav wants to merge 1 commit intoMemPalace:developfrom
Conversation
Navigation tools (tool_status, tool_list_wings, tool_list_rooms, tool_get_taxonomy, tool_diary_read) use col.get(limit=10000) to scan metadata. Palaces with >10,000 drawers silently lose wings and rooms from the response. Replace with max(col.count(), 1) so the limit scales with the actual collection size. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
There was a problem hiding this comment.
Pull request overview
This PR fixes navigation tools that previously truncated metadata scans at 10,000 ChromaDB records, which caused wings/rooms to be missing for larger “palaces”.
Changes:
- Replaced
limit=10000withlimit=max(col.count(), 1)in multiple metadata-scanning tools. - Updated
tool_list_roomsto compute thelimitdynamically viakwargs. - Updated
tool_diary_readto use a dynamiclimitwhen fetching diary entries.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| count = col.count() | ||
| wings = {} | ||
| rooms = {} | ||
| try: | ||
| all_meta = col.get(include=["metadatas"], limit=10000)["metadatas"] | ||
| all_meta = col.get(include=["metadatas"], limit=max(col.count(), 1))["metadatas"] |
| if not col: | ||
| return _no_palace() | ||
| wings = {} | ||
| try: | ||
| all_meta = col.get(include=["metadatas"], limit=10000)["metadatas"] | ||
| all_meta = col.get(include=["metadatas"], limit=max(col.count(), 1))["metadatas"] | ||
| for m in all_meta: | ||
| w = m.get("wing", "unknown") |
| try: | ||
| kwargs = {"include": ["metadatas"], "limit": 10000} | ||
| kwargs = {"include": ["metadatas"], "limit": max(col.count(), 1)} | ||
| if wing: | ||
| kwargs["where"] = {"wing": wing} | ||
| all_meta = col.get(**kwargs)["metadatas"] |
| if not col: | ||
| return _no_palace() | ||
| taxonomy = {} | ||
| try: | ||
| all_meta = col.get(include=["metadatas"], limit=10000)["metadatas"] | ||
| all_meta = col.get(include=["metadatas"], limit=max(col.count(), 1))["metadatas"] | ||
| for m in all_meta: |
| try: | ||
| results = col.get( | ||
| where={"$and": [{"wing": wing}, {"room": "diary"}]}, | ||
| include=["documents", "metadatas"], | ||
| limit=10000, | ||
| limit=max(col.count(), 1), | ||
| ) |
| try: | ||
| all_meta = col.get(include=["metadatas"], limit=10000)["metadatas"] | ||
| all_meta = col.get(include=["metadatas"], limit=max(col.count(), 1))["metadatas"] | ||
| for m in all_meta: |
web3guru888
left a comment
There was a problem hiding this comment.
This is a real bug with real consequences — 16,003 drawers across 3 wings returning only 1 wing from mempalace_list_wings is a silent data loss failure. The before/after in the PR description makes the impact clear.
The fix is correct. limit=max(col.count(), 1) scales dynamically with actual collection size, and the max(..., 1) guard correctly handles the empty-collection edge case where col.count() returns 0 (ChromaDB returns an error on limit=0).
One observation: calling col.count() before col.get() means two round-trips to ChromaDB per navigation call instead of one. For large collections that's probably fine (the get itself is the expensive part), but worth noting if this ever becomes a hot path. An alternative would be to fetch with a large-but-bounded limit and paginate, but that's a more complex change and the current fix is the right call for now.
All 5 affected functions are covered. LGTM.
[MemPalace-AGI integration — production stats at https://milla-jovovich.github.io/mempalace/integrations/mempalace-agi/]
|
This fix is long overdue — we hit the 10K limit ourselves on a palace with ~16K drawers and spent a while debugging why The The 10K case is the most visible failure mode, but there's a subtler one: if Also — has this been tested against palaces that are just over the limit (e.g. 10,001 drawers)? The off-by-one at the exact boundary is worth a test case. |
|
This conflicts with #482, which takes a different approach to the same problem. Rather than scaling the limit dynamically, #482 extracts a shared |
|
Note: #562 includes the same fix — |
|
@jphein — good point on the batch-with-offset approach. The The paginated-batches approach in #562 (10K offsets) is architecturally sounder for large palaces: bounded peak memory, avoids ChromaDB's serialisation overhead on very large response payloads. The trade-off is a bit more code complexity (N round trips vs one), but for a production memory tool the memory bound matters more. Given that #562 supersedes this fix and adds pagination on top, I'd say this PR's value is as a minimal standalone patch for anyone who needs just this one fix without the full #562 scope — but if #562 merges, this becomes redundant. Worth noting that in the PR description for clarity. |
|
Already fixed in #562 — status uses |
Problem
Navigation tools (
tool_status,tool_list_wings,tool_list_rooms,tool_get_taxonomy,tool_diary_read) usecol.get(include=["metadatas"], limit=10000)to scan metadata. Palaces with more than 10,000 drawers silently lose wings and rooms from the response.Observed behavior: A palace with 16,003 drawers across 3 wings returns only the first 10,000 entries (1 wing) from
mempalace_statusandmempalace_list_wings. The other 2 wings (6,003 drawers) are invisible in navigation tools.Search is NOT affected —
mempalace_searchuses ChromaDB's vector query which has no such limit.Fix
Replace
limit=10000withlimit=max(col.count(), 1)in all 5 affected functions. This scales the limit dynamically with the actual collection size, following the ChromaDB batch iteration pattern.The
max(..., 1)guard handles the edge case of an empty collection wherecol.count()returns 0.Testing
Verified on a production palace with 16,003 drawers across 3 wings:
mempalace_list_wings→{"sos_app": 10000}mempalace_list_wings→{"sos_app": 12585, "sos_medical_ai": 2553, "sos_playbooks": 865}Files Changed
mempalace/mcp_server.py— 5 occurrences oflimit=10000replaced withlimit=max(col.count(), 1)