Skip to content

fix: honor configured drawer collection everywhere#423

Closed
gut-puncture wants to merge 1 commit intoMemPalace:developfrom
gut-puncture:codex/shared-drawer-context
Closed

fix: honor configured drawer collection everywhere#423
gut-puncture wants to merge 1 commit intoMemPalace:developfrom
gut-puncture:codex/shared-drawer-context

Conversation

@gut-puncture
Copy link
Copy Markdown

@gut-puncture gut-puncture commented Apr 9, 2026

Summary

This fixes a trust problem in the current tool: different parts of MemPalace could look at different drawer collections, so a user could mine memories successfully and then have search, wake-up, status, repair, or MCP read from somewhere else.

In simple terms, this PR makes the tool use the same default drawer collection everywhere it already promises to use it.

What changed

  • mining, search, layers, CLI status, CLI repair, CLI compress, and MCP drawer tools now all resolve the same configured default drawer collection
  • palace_path from config and env now expands ~
  • python -m mempalace.mcp_server --palace ~/... now resolves ~ correctly
  • read paths no longer create a fake empty palace or collection when the palace is actually missing
  • metadata-based readers now page instead of truncating at 10,000 drawers, so large-palace status and taxonomy counts stay correct
  • MCP collection caching is now keyed by resolved (palace_path, collection_name) so context changes do not go stale

What this does not change

  • no new commands or flags
  • no public collection override
  • no knowledge-graph redesign
  • no deleted-file cleanup or refresh redesign

Tests

  • added regressions for ~ expansion from config and env
  • added regressions proving project and convo mining write to the configured collection
  • added regressions proving search, layers, and MCP read from that same configured collection
  • added regressions for exact counts above 10,000 drawers
  • added regressions for MCP cache rekeying and create-false/create-true behavior

Validation run locally:

  • .venv/bin/pytest tests/test_config.py tests/test_palace.py tests/test_miner.py tests/test_convo_miner.py tests/test_searcher.py tests/test_layers.py tests/test_mcp_server.py tests/test_cli.py
  • .venv/bin/ruff check mempalace tests
  • .venv/bin/python -m pytest

Copy link
Copy Markdown

@web3guru888 web3guru888 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: Honor Configured Drawer Collection Everywhere

This is a significant correctness fix. The core problem — different code paths silently using different collections — is exactly the kind of bug that's invisible in testing but devastating in production. We hit a variant of this when our integration's knowledge_graph_bridge.py was writing to one collection while the MCP search read from another.

Architecture

The new resolve_drawer_context() + get_drawer_collection() + open_collection_on_client() factoring in palace.py is clean. Single source of truth for collection resolution, config-aware, with proper create flag semantics.

iter_collection_metadatas() — The paginated iteration is the right fix for the limit=10000 hard cap. The offset-based pagination pattern is correct:

offset += batch_count
if batch_count < batch_size:
    break

This handles the "last batch smaller than page size" termination correctly.

Specific observations

~ expansion from config/env_expand_path() is a good single-point normalization. Previously MEMPALACE_PALACE_PATH=~/palace would silently work differently depending on whether the shell expanded ~ before Python saw it. Now it's consistent.

✅ MCP cache rekeying — The _collection_cache_key = (palace_path, collection_name) tuple ensures that changing either dimension invalidates the cache. The test_get_collection_rekeys_when_context_changes test covers path change, name change, and both change. Thorough.

create=False doesn't materialize empty palacesget_drawer_collection(create=False) returns None if the directory is missing. This prevents mempalace status from creating empty palace directories as a side effect. We added the same guard in our integration and it eliminated a confusing class of "phantom palace" bugs.

⚠️ Potential issue in cmd_repair():

palace_path, collection_name = resolve_drawer_context(palace_path=palace_arg, config=cfg)

Then later:

col = get_drawer_collection(palace_path=palace_path, create=False, config=cfg)

Then:

client = chromadb.PersistentClient(path=palace_path)

The repair flow creates a new client after already getting a collection through get_drawer_collection. This means two separate PersistentClient instances pointing at the same directory. ChromaDB's locking should handle this, but it's a bit fragile. Consider getting the client from the same path as the collection, or restructuring to use a single client.

Test coverage — Excellent. The regression tests for tilde expansion, configured collection mining, paged status, and cache rekeying are well-targeted. The _PagedCollection helper in tests is a clean way to test pagination without real ChromaDB overhead.

Scope

16 files changed, 611 additions, 222 deletions — this is wide-ranging but every change is consistent with the stated goal. No scope creep. The "What this does not change" section in the description is appreciated.

Strong PR. Addresses a real trust problem in the tool.

🔭 Reviewed as part of the MemPalace-AGI integration project — autonomous research with perfect memory. Community interaction updates are posted regularly on the dashboard.

@web3guru888 web3guru888 mentioned this pull request Apr 10, 2026
3 tasks
@skuznetsov
Copy link
Copy Markdown
Contributor

Quick merge-order note after comparing the overlap with #413:

I think #413 should land first, then #423 should rebase on top of it.

Reason:

  • both PRs touch the same collection-access surface: palace.py, mcp_server.py, miner.py, layers.py, searcher.py, plus overlapping tests
  • #413 is the lower-level storage seam / access abstraction
  • #423 is behavioral logic on top of that same surface: configured drawer resolution, ~ expansion, keyed MCP cache, paged metadata reads, and consistent read-path semantics

So they do not look like competing PRs to me; they look ordered.

If #423 lands first, then rebasing #413 later has a higher chance of accidentally flattening or reworking the new drawer-context behavior while introducing the seam.

If #413 lands first, then #423 can rebase onto the seam and carry its behavior changes forward more directly.

In other words: #413 establishes the collection-access seam, #423 refines the configured-drawer behavior on top of that seam.

Not a blocker for the design in #423 itself — just a merge-order caution so we do not create avoidable rebase churn or behavioral regressions.

@gut-puncture
Copy link
Copy Markdown
Author

Restacking this onto current main now that the overlapping collection-access surface has moved upstream. I’m rebuilding it from fresh main, keeping the same goal and scope: consistent configured drawer context, exact metadata reads, and no new feature surface.

@gut-puncture
Copy link
Copy Markdown
Author

Restack is now live on this PR branch. I rebuilt it on top of current main after the collection-access seam changes landed, kept the same user-facing scope, and reran the focused lane for this change:

  • ruff check mempalace tests
  • pytest tests/test_config.py tests/test_palace.py tests/test_searcher.py tests/test_layers.py tests/test_mcp_server.py tests/test_miner.py tests/test_convo_miner.py tests/test_cli.py

Result: 162 passed.

I also folded in one small cleanup from review while restacking: cmd_repair() now stays on the resolved collection name without mixing in an extra hardcoded collection lifecycle.

@gut-puncture
Copy link
Copy Markdown
Author

@skuznetsov @web3guru888 restack is now live on this PR branch.

This is now rebuilt on top of current main, so the earlier merge-order concern should be resolved. I kept the same scope and reran the focused validation lane for this change:

ruff check mempalace tests
pytest tests/test_config.py tests/test_palace.py tests/test_searcher.py tests/test_layers.py tests/test_mcp_server.py tests/test_miner.py tests/test_convo_miner.py tests/test_cli.py
Result: 162 passed.

Copy link
Copy Markdown

@web3guru888 web3guru888 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took another pass on the restacked diff — the merge-order concern is resolved.

The collection-access seam changes that were in conflict with #413 are now on main, and your restack picks them up cleanly. The mcp_server.py diff in particular shrank by ~100 lines (net -44) which is exactly what you'd expect after the upstream abstraction landed — less bespoke resolution code needed here.

162 passing on the focused lane (config, palace, searcher, layers, mcp_server, miner, convo_miner, cli) and ruff clean is the right validation set for this change.

The scope is still what I'd want: consistent collection resolution, ~ expansion from config/env, no create-on-read false positives, paginated metadata reads for large palaces, and keyed MCP cache. No scope creep from the restack.

LGTM — approving.

@bensig bensig changed the base branch from main to develop April 11, 2026 22:22
@bensig bensig requested a review from igorls as a code owner April 11, 2026 22:22
@bensig
Copy link
Copy Markdown
Collaborator

bensig commented Apr 12, 2026

Closing — this is superseded by recently merged PRs to develop. Thank you for the contribution!

@bensig bensig closed this Apr 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants