Мempalace backend seam#413
Conversation
PR #413 Review:
|
| Dimension | Score | Notes |
|---|---|---|
| Correctness | 7/10 | One real bug in miner.status() |
| Security | 9/10 | Directory permissions preserved, no new exposure |
| Performance | 9/10 | Minor: PersistentClient no longer cached in MCP server |
| Maintainability | 8/10 | Clean abstraction, but no injection mechanism yet |
Issues
1. [HIGH / HIGH confidence] Bug: miner.status() creates palace instead of failing gracefully
Location: mempalace/miner.py:616
The old code directly used chromadb.PersistentClient(path).get_collection(name) which raises when the collection doesn't exist. The except Exception block catches it and prints "No palace found." The new code calls get_collection(palace_path) which defaults to create=True, meaning it calls ChromaBackend.get_collection(..., create=True) which runs os.makedirs() + get_or_create_collection(). This silently creates an empty palace directory and collection instead of reporting "not found."
Old behavior (main branch):
# miner.py status() - RAISES when palace doesn't exist
client = chromadb.PersistentClient(path=palace_path)
col = client.get_collection("mempalace_drawers") # throws -> except block firesNew behavior (PR branch):
# miner.py status() - CREATES when palace doesn't exist
col = get_collection(palace_path) # create=True by default -> never failsFix:
def status(palace_path: str):
"""Show what's been filed in the palace."""
try:
- col = get_collection(palace_path)
+ col = get_collection(palace_path, create=False)
except Exception:2. [MEDIUM / HIGH confidence] No backend injection mechanism
Location: mempalace/palace.py:36
_DEFAULT_BACKEND = ChromaBackend() is a module-level singleton with no setter, no config-driven selection, and no test override path. The abstraction exists but can't be leveraged. The PR description mentions this is the "first upstreamable seam," so this may be intentional scaffolding — but tests cannot inject a mock backend without patching the private module attribute.
Suggestion (not blocking): Accept an optional backend parameter in get_collection():
def get_collection(
palace_path: str,
collection_name: str = "mempalace_drawers",
create: bool = True,
backend: ChromaBackend = None,
):
backend = backend or _DEFAULT_BACKEND
return backend.get_collection(palace_path, collection_name=collection_name, create=create)3. [MEDIUM / HIGH confidence] No tests for new backends package
Three new files (base.py, chroma.py, __init__.py) with no test coverage. The BaseCollection contract, the ChromaCollection adapter delegation, and the ChromaBackend.get_collection factory (including the create=True/False branching) are all untested. Recommended minimum:
- Test that
ChromaCollectioncorrectly delegates each method - Test that
ChromaBackend.get_collection(create=False)raises when collection is missing - Test that
ChromaBackend.get_collection(create=True)creates directory + collection
4. [LOW / MEDIUM confidence] PersistentClient no longer cached in MCP server
Location: mempalace/mcp_server.py:105 / mempalace/backends/chroma.py:45
The old mcp_server.py had a _client_cache singleton for the PersistentClient. The new code delegates to ChromaBackend.get_collection() which creates a fresh PersistentClient on every call. Since the _collection_cache is still in place, this only fires when the cache is cold or create=True is passed. For a local SQLite-backed store this is likely negligible, but worth awareness if the backend ever becomes remote.
5. [LOW / MEDIUM confidence] Test compatibility shims add implicit coupling
Location: mempalace/layers.py:26-28, mempalace/searcher.py:14-16
chromadb = _chroma_backend.chromadbThis re-exports chromadb at module level so tests can still patch mempalace.layers.chromadb.PersistentClient. It's a pragmatic choice but couples production code to test internals. Consider updating the tests to patch through the backend layer instead, then removing these shims.
Flow Impact
| Consumer | Old Pattern | New Pattern | Behavioral Change |
|---|---|---|---|
palace.py |
Direct chromadb, always creates | ChromaBackend, create param |
create=False now possible (new capability) |
layers.py (5 callsites) |
PersistentClient + get_collection |
palace.get_collection(create=False) |
Correct: read-only callers fail if missing |
searcher.py (2 callsites) |
PersistentClient + get_collection |
palace.get_collection(create=False) |
Correct: same fail-if-missing behavior |
palace_graph.py |
PersistentClient + get_collection |
palace.get_collection(create=False) |
Correct |
mcp_server.py |
Cached client + get_or_create |
palace.get_collection + local cache |
Client cache lost (minor perf) |
miner.status() |
PersistentClient + get_collection (throws) |
palace.get_collection(create=True) |
BUG: creates instead of failing |
What Looks Good
- Clean ABC with keyword-only
add/upsertsignatures prevents positional argument mix-ups - The
createparameter onget_collectionis good design — read-only callers can explicitly declare they expect the palace to exist - Directory permissions (
0o700) andos.makedirsare correctly centralized inChromaBackend - Well-scoped PR — no unrelated changes, no benchmark artifacts, no PostgreSQL spike leaking in
|
Hey @skuznetsov — I've taken a look and ran this through CLI, and the Stage 1 scope here is exactly what #266 was asking for. No Postgres code, no env-based backend selection, ChromaDB stays the only path, read paths preserve the existing "no palace found" behavior via One request before we pick this up for a full review: the checklist has ruff unchecked and the test section cites For Stage 2 (the actual Postgres backend from your fork), let's hold it until after one other in-flight piece lands that depends on the write-path seam — don't want two backend layers in flight at the same time. You'll have first pick on Stage 2 as soon as that clears. Thank you from Milla and Lu✨ |
Introduce the first upstreamable storage seam for MemPalace without bringing in the PostgreSQL spike or any benchmark artifacts. This change adds a small backend package with: - BaseCollection as the minimal collection contract - ChromaBackend/ChromaCollection as the default implementation It then routes the main runtime collection consumers through that seam: - palace.py - searcher.py - layers.py - palace_graph.py - mcp_server.py - miner.status() Behavioral constraints kept for stage 1: - ChromaDB remains the only backend and the default path - no config/env backend selection yet - no PostgreSQL code - no benchmark or research files - existing tests stay unchanged Important compatibility details: - read paths now call the seam with create=False so they still surface the existing 'no palace found' behavior instead of silently creating empty collections - write paths keep create=True semantics through palace.get_collection() - layers/searcher retain a chromadb module attribute so the existing mock-based tests can keep patching PersistentClient unchanged - ChromaBackend only creates palace directories on create=True, which preserves mocked read-path tests that use fake read-only paths Verification: - python3 -m py_compile mempalace/backends/__init__.py mempalace/backends/base.py mempalace/backends/chroma.py mempalace/palace.py mempalace/searcher.py mempalace/layers.py mempalace/palace_graph.py mempalace/mcp_server.py mempalace/miner.py - pytest -q # 529 passed, 106 deselected
Tighten the stage-1 backend abstraction branch after review. This follow-up does three small things: - keep the chromadb compatibility hook in searcher.py and layers.py, but express it through the backends.chroma module so it no longer reads like an accidental unused import - fix the palace_graph.py helper alias to avoid the local name collision flagged by ruff (imported helper vs local _get_collection wrapper) - preserve the existing mock-based test patch points unchanged while keeping the new backend seam intact Why this matters: - the direct form looked like a dead import in review, even though it was intentionally preserving the existing test seam ( and ) - palace_graph.py had a real lint issue ( redefinition) that was small but worth fixing before a public PR Verification: - /opt/homebrew/bin/ruff check mempalace/backends/__init__.py mempalace/backends/base.py mempalace/backends/chroma.py mempalace/palace.py mempalace/searcher.py mempalace/layers.py mempalace/palace_graph.py mempalace/mcp_server.py mempalace/miner.py - pytest -q tests/test_layers.py tests/test_searcher.py - pytest -q # 529 passed, 106 deselected
Add short code comments in searcher.py and layers.py explaining why the module-level `chromadb` alias remains after the stage-1 backend seam refactor. The alias is intentional: it preserves the existing mock patch points used by the current test suite (`mempalace.searcher.chromadb.PersistentClient` and `mempalace.layers.chromadb.PersistentClient`) while the runtime logic now flows through the backend abstraction. This keeps the public PR easier to review because the apparent "unused import" now has an explicit reason next to it. Verification: - /opt/homebrew/bin/ruff check mempalace/searcher.py mempalace/layers.py - pytest -q tests/test_layers.py tests/test_searcher.py
Tighten the stage-1 backend seam by promoting the default Chroma backend adapter to a module-level singleton in `mempalace/palace.py`. This keeps the stage-1 scope unchanged — Chroma is still the only backend wired in this branch — but avoids constructing a fresh `ChromaBackend()` object on every `get_collection()` call. The backend is stateless today, so this is a readability/cleanup change rather than a behavioral one. Why this helps: - makes `palace.get_collection()` read like a real default factory instead of an inline constructor call - keeps the stage-1 branch a little cleaner before opening the public PR - does not widen the backend surface or change any config/runtime behavior Verification: - python3 -m py_compile mempalace/palace.py - pytest -q tests/test_miner.py tests/test_layers.py tests/test_searcher.py - pytest -q # 529 passed, 106 deselected
Preserve the stage-1 backend abstraction while closing the real read-path regression surfaced in PR review. What changed: - make ChromaBackend.get_collection(create=False) fail fast when the palace directory does not exist instead of letting PersistentClient create it as a side effect - update miner.status() to call get_collection(..., create=False) so status keeps the historical 'No palace found' behavior - remove the temporary chromadb shim aliases from layers.py and searcher.py now that the tests patch the seam directly - add focused tests for the new backends package, including ChromaCollection delegation and ChromaBackend create=True/create=False behavior - retarget layer/searcher tests to patch the backend seam instead of patching chromadb.PersistentClient inside production modules - add a regression test that status() does not create an empty palace when the target path is missing Verification: - ruff check . - uv run pytest -q - uv run pytest -q tests/test_backends.py tests/test_cli.py tests/test_mcp_server.py tests/test_layers.py tests/test_searcher.py tests/test_miner.py Notes: - the separate benchmark/slow/stress layer was started as a soak but not used as the merge gate for this PR branch
Remove a redundant `_collection_cache = None` assignment in `mempalace/mcp_server.py` left over after the stage-1 backend seam refactor. This does not change behavior; it only trims review noise in the MCP server module after the read-path hardening pass. Verification: - ruff check mempalace/mcp_server.py - uv run pytest -q tests/test_mcp_server.py
|
One small follow-up note, but not a blocker for this stage-1 PR: The current If MemPalace later grows dynamic palace switching or backend switching inside one long-lived process, that cache will probably need to become key-based instead of process-global, e.g. keyed by I don't think that should complicate this PR, but I wanted to leave the constraint explicit so neither side forgets it later. |
33e2365 to
90407fe
Compare
web3guru888
left a comment
There was a problem hiding this comment.
Review: MemPalace Backend Seam (Storage Abstraction)
This is the kind of architectural change that needs careful review. The goal — making MemPalace backend-swappable — is valuable. We maintain a knowledge_graph_bridge.py that keeps ChromaDB and our bi-temporal KG in sync, so we're directly affected by storage abstraction changes.
Design Assessment
BaseCollection ABC — Minimal and correct. The 6-method contract (add, upsert, query, get, delete, count) covers exactly the ChromaDB surface area that MemPalace actually uses. No over-abstraction.
ChromaCollection thin adapter — Good approach. Pure delegation, no logic. This means the adapter can't drift from the underlying ChromaDB behavior.
ChromaBackend.get_collection() — Combines directory creation, permission setting, and client creation. The create=False path raises FileNotFoundError when the palace directory is missing — this is correct and matches the "don't create empty palaces on read" principle from #423.
Concerns
BaseCollection:
def add(self, *, documents, ids, metadatas=None):The * forces keyword-only args. This is fine if all callers already use collection.add(documents=[...], ids=[...]), but some existing code might use positional args. Did you verify all callsites? The test suite passing (538 passed) suggests yes, but worth confirming.
ChromaBackend.get_collection() creates a new PersistentClient on every call. The existing caching in mcp_server.py protects against this being called repeatedly, but other paths (layers, searcher) call _get_collection() per operation. If a future backend (Postgres, etc.) has connection overhead, this factory pattern will need a connection pool. Consider adding a note or a # TODO: pool clients per palace_path for future backends.
palace.py module-level singleton:
_DEFAULT_BACKEND = ChromaBackend()This creates a ChromaBackend at import time. Since ChromaBackend.__init__ is empty, this is harmless today. But when backend selection is added (env var, config), this singleton pattern won't work for dynamic dispatch. Might be cleaner as a factory function now to avoid the future refactor.
Migration Path
The test changes are clean — replacing patch("...chromadb.PersistentClient", ...) with patch("..._get_collection", ...) is the right pattern for backend-agnostic tests. Every test that previously mocked ChromaDB directly now mocks the seam.
Missing from tests: There's no test that verifies BaseCollection implementations actually conform to the interface (e.g., a contract test). For now ChromaDB is the only backend so this is fine, but when a second backend is added, consider adding an abstract test fixture that runs the same assertions against any BaseCollection implementation.
Conflict potential
This PR and #423 touch many of the same files (palace.py, mcp_server.py, layers.py, searcher.py, tests). They're solving different problems but will definitely conflict on merge. Coordinate with @gut-puncture on merge order — whichever goes in first will require the other to rebase significantly.
Clean abstraction, well-scoped stage 1. The behavioral constraints (ChromaDB only, no config selection, no Postgres) are the right approach for a safe first step.
🔭 Reviewed as part of the MemPalace-AGI integration project — autonomous research with perfect memory. Community interaction updates are posted regularly on the dashboard.
|
Quick merge-order note after comparing the overlap with I think this PR should land first, and then Reason:
So the clean order seems to be:
That should minimize rebase churn and reduce the chance of flattening the configured-drawer behavior while introducing the seam. |
|
@skuznetsov conflict |
Resolve the PR MemPalace#413 conflict after main advanced with the query sanitizer changes. The only manual conflict was in mempalace/mcp_server.py imports: keep the backend seam import from MemPalace#413 and the sanitize_query import from main. This preserves the backend abstraction while carrying forward the new query sanitizer path. Verification: uv run ruff check . -> All checks passed. uv run pytest -q -> 593 passed, 106 deselected, 413 warnings. git diff --cached --check -> exit 0. staged secret-pattern scan -> no high-risk secret patterns.
…E=palace_store) Adds a PalaceStore backend behind the ``mempalace.backends`` seam introduced in MemPalace#413, gated by the ``MEMPAL_STORAGE`` environment variable at the single choke point where ``palace.py`` instantiates its default backend: * unset / chromadb / chroma → ChromaBackend (default, unchanged) * palace_store / palace / palacestore → PalaceStoreBackend (new) The whole seam is preserved — no mempalace/*.py consumers are modified. The only mempalace change outside the backends package is one line in palace.py that replaces ``ChromaBackend()`` with ``get_default_backend()``. New files: * mempalace/backends/palace_store.py — PalaceStoreBackend + PalaceStoreCollection implementing BaseCollection by wrapping a palace_store.compat collection. Lazy-imports palace_store so users on the chromadb path never pay for its import graph. Preserves ChromaBackend's "no palace found" FileNotFoundError behavior on create=False paths so searcher/palace error surfaces stay identical across backends. Modified: * mempalace/backends/__init__.py — adds ``get_default_backend()`` dispatching on ``MEMPAL_STORAGE``. Unknown values raise rather than silently falling back — an obvious typo is better surfaced. * mempalace/palace.py — one-line swap from direct ``ChromaBackend()`` instantiation to ``get_default_backend()``. * palace_store/compat.py::PersistentClient — passes through parallel_query / max_workers / blas_threads kwargs so the PalaceStoreBackend can expose them via env vars (MEMPAL_PARALLEL_QUERY, MEMPAL_MAX_WORKERS). * pyproject.toml — adds the ``[palace-parallel]`` optional extra pulling in threadpoolctl for users who want the ~3-4x BLAS thread scoping speedup. Also routes ``benchmarks/longmemeval_bench.py`` through its own inline env var selector (not through mempalace.backends, since that seam is about BaseCollection and the benchmark needs chromadb-shaped EphemeralClient). Default unchanged; MEMPAL_STORAGE=palace_store swaps in the compat shim. Validation: * mempalace test suite: 567/567 pass under chromadb backend (no regression) * mempalace test suite: 567/567 pass under palace_store backend * palace_store unit tests: 36/36 pass * LongMemEval full 500: R@5=0.966 byte-equivalent on both backends
…E=palace_store) Adds a PalaceStore backend behind the ``mempalace.backends`` seam introduced in MemPalace#413, gated by the ``MEMPAL_STORAGE`` environment variable at the single choke point where ``palace.py`` instantiates its default backend: * unset / chromadb / chroma → ChromaBackend (default, unchanged) * palace_store / palace / palacestore → PalaceStoreBackend (new) The whole seam is preserved — no mempalace consumer is modified outside palace.py, where ``ChromaBackend()`` is replaced by ``get_default_backend()``. Zero change to the BaseCollection contract or any of the modules that went through the MemPalace#413 refactor (searcher.py, layers.py, palace_graph.py, mcp_server.py, miner.py). New: * mempalace/backends/palace_store.py — PalaceStoreBackend + PalaceStoreCollection implementing BaseCollection by wrapping a palace_store.compat collection. Lazy-imports palace_store so users on the chromadb path never pay for its import graph. Preserves ChromaBackend's "no palace found" FileNotFoundError semantics on create=False so searcher/palace error surfaces stay identical across backends. Modified (mempalace core): * mempalace/backends/__init__.py — adds ``get_default_backend()`` dispatching on ``MEMPAL_STORAGE``. Unknown values raise rather than silently falling back — an obvious typo is better surfaced. * mempalace/palace.py — one-line swap from direct ``ChromaBackend()`` instantiation to ``get_default_backend()``. Modified (shim + benchmark): * palace_store/compat.py::PersistentClient — passes through parallel_query / max_workers / blas_threads kwargs so the PalaceStoreBackend can expose them via env vars (MEMPAL_PARALLEL_QUERY, MEMPAL_MAX_WORKERS). * benchmarks/longmemeval_bench.py — inlines its own MEMPAL_STORAGE selector (not through mempalace.backends, since the benchmark needs chromadb-shaped EphemeralClient, not a BaseCollection). Test fixtures routed through the seam so the suite works under either backend without duplication: * tests/conftest.py::collection — uses palace.get_collection() * tests/test_miner.py — uses palace.get_collection() in two places * tests/test_convo_miner.py — uses palace.get_collection() * tests/test_mcp_server.py::_get_collection — uses palace.get_collection() These routed fixtures yield a BaseCollection (not a raw ChromaDB collection object), which is fine because only BaseCollection methods are actually called on the fixture. Tests that specifically exercise ChromaBackend (tests/test_backends.py) are untouched. Drive-by fix: * .github/workflows/ci.yml — adds ``develop`` to the push and pull_request triggers. The workflow was pinned to ``main`` only; when upstream introduced the ``develop`` branch (MemPalace#413 landed there), PRs targeting ``develop`` stopped getting CI runs. This one-line addition brings them back. Trivially revertable if the maintainers prefer a separate PR for this. Deps: * pyproject.toml — adds the ``[palace-parallel]`` optional extra pulling in threadpoolctl for users who want the ~3-4x BLAS thread scoping speedup. No new hard dependencies. Validation: * mempalace test suite: 598/598 pass under chromadb backend * mempalace test suite: 598/598 pass under palace_store backend * palace_store unit tests: 36/36 pass * ruff check + ruff format --check clean under ruff 0.4.10 (CI's pin) * LongMemEval full 500: R@5=0.966 byte-equivalent on both backends
* refactor: add stage-1 backend abstraction seam Introduce the first upstreamable storage seam for MemPalace without bringing in the PostgreSQL spike or any benchmark artifacts. This change adds a small backend package with: - BaseCollection as the minimal collection contract - ChromaBackend/ChromaCollection as the default implementation It then routes the main runtime collection consumers through that seam: - palace.py - searcher.py - layers.py - palace_graph.py - mcp_server.py - miner.status() Behavioral constraints kept for stage 1: - ChromaDB remains the only backend and the default path - no config/env backend selection yet - no PostgreSQL code - no benchmark or research files - existing tests stay unchanged Important compatibility details: - read paths now call the seam with create=False so they still surface the existing 'no palace found' behavior instead of silently creating empty collections - write paths keep create=True semantics through palace.get_collection() - layers/searcher retain a chromadb module attribute so the existing mock-based tests can keep patching PersistentClient unchanged - ChromaBackend only creates palace directories on create=True, which preserves mocked read-path tests that use fake read-only paths Verification: - python3 -m py_compile mempalace/backends/__init__.py mempalace/backends/base.py mempalace/backends/chroma.py mempalace/palace.py mempalace/searcher.py mempalace/layers.py mempalace/palace_graph.py mempalace/mcp_server.py mempalace/miner.py - pytest -q # 529 passed, 106 deselected * refactor: clean up stage-1 seam compatibility shims Tighten the stage-1 backend abstraction branch after review. This follow-up does three small things: - keep the chromadb compatibility hook in searcher.py and layers.py, but express it through the backends.chroma module so it no longer reads like an accidental unused import - fix the palace_graph.py helper alias to avoid the local name collision flagged by ruff (imported helper vs local _get_collection wrapper) - preserve the existing mock-based test patch points unchanged while keeping the new backend seam intact Why this matters: - the direct form looked like a dead import in review, even though it was intentionally preserving the existing test seam ( and ) - palace_graph.py had a real lint issue ( redefinition) that was small but worth fixing before a public PR Verification: - /opt/homebrew/bin/ruff check mempalace/backends/__init__.py mempalace/backends/base.py mempalace/backends/chroma.py mempalace/palace.py mempalace/searcher.py mempalace/layers.py mempalace/palace_graph.py mempalace/mcp_server.py mempalace/miner.py - pytest -q tests/test_layers.py tests/test_searcher.py - pytest -q # 529 passed, 106 deselected * docs: explain backend shim imports in search paths Add short code comments in searcher.py and layers.py explaining why the module-level `chromadb` alias remains after the stage-1 backend seam refactor. The alias is intentional: it preserves the existing mock patch points used by the current test suite (`mempalace.searcher.chromadb.PersistentClient` and `mempalace.layers.chromadb.PersistentClient`) while the runtime logic now flows through the backend abstraction. This keeps the public PR easier to review because the apparent "unused import" now has an explicit reason next to it. Verification: - /opt/homebrew/bin/ruff check mempalace/searcher.py mempalace/layers.py - pytest -q tests/test_layers.py tests/test_searcher.py * refactor: reuse a default backend instance in palace helper Tighten the stage-1 backend seam by promoting the default Chroma backend adapter to a module-level singleton in `mempalace/palace.py`. This keeps the stage-1 scope unchanged — Chroma is still the only backend wired in this branch — but avoids constructing a fresh `ChromaBackend()` object on every `get_collection()` call. The backend is stateless today, so this is a readability/cleanup change rather than a behavioral one. Why this helps: - makes `palace.get_collection()` read like a real default factory instead of an inline constructor call - keeps the stage-1 branch a little cleaner before opening the public PR - does not widen the backend surface or change any config/runtime behavior Verification: - python3 -m py_compile mempalace/palace.py - pytest -q tests/test_miner.py tests/test_layers.py tests/test_searcher.py - pytest -q # 529 passed, 106 deselected * fix: harden read-only seam behavior and update seam tests Preserve the stage-1 backend abstraction while closing the real read-path regression surfaced in PR review. What changed: - make ChromaBackend.get_collection(create=False) fail fast when the palace directory does not exist instead of letting PersistentClient create it as a side effect - update miner.status() to call get_collection(..., create=False) so status keeps the historical 'No palace found' behavior - remove the temporary chromadb shim aliases from layers.py and searcher.py now that the tests patch the seam directly - add focused tests for the new backends package, including ChromaCollection delegation and ChromaBackend create=True/create=False behavior - retarget layer/searcher tests to patch the backend seam instead of patching chromadb.PersistentClient inside production modules - add a regression test that status() does not create an empty palace when the target path is missing Verification: - ruff check . - uv run pytest -q - uv run pytest -q tests/test_backends.py tests/test_cli.py tests/test_mcp_server.py tests/test_layers.py tests/test_searcher.py tests/test_miner.py Notes: - the separate benchmark/slow/stress layer was started as a soak but not used as the merge gate for this PR branch * refactor: drop duplicate mcp collection cache declaration Remove a redundant `_collection_cache = None` assignment in `mempalace/mcp_server.py` left over after the stage-1 backend seam refactor. This does not change behavior; it only trims review noise in the MCP server module after the read-path hardening pass. Verification: - ruff check mempalace/mcp_server.py - uv run pytest -q tests/test_mcp_server.py --------- Co-authored-by: Sergey Kuznetsov <[email protected]>
Changes from Lu's review: - Fix architecture naming: consistent WING→ROOM→DRAWER throughout (was mixing "closets" from v4 private with "rooms" from v3 public) - Add AAAK to the architecture section (was missing — dialect.py listed in structure but not explained in how the system works) - Add "background everything" design principle (hooks are a core v4 feature) - Add i18n section under Contributing (8 languages shipped in PR #718) - Add backends/ to project structure (PR #413 already shipped this) - Add hooks/ directory to project structure with all hook scripts - Update architecture diagram: pluggable storage, AAAK index layer, background hooks with their event triggers - Add "adding a backend" and "adding a language" to Key Files section - Soften "100% recall is not a marketing number" to "100% recall is the design requirement" — it's the target, not a measured claim Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Formalizes the BaseCollection/BaseBackend contract introduced as a seam in #413 into an interchangeability spec that third-party backends can build to. Driven by six in-flight backend PRs (#574, #643, #665, #697, #700, #381) each implementing the interface differently. Key decisions captured: entry-point distribution, typed QueryResult/ GetResult replacing Chroma dict shape, daemon-first multi-palace model via PalaceRef, required where-clause subset (incl. $contains), mandatory embedder injection with model-identity validation, capability tokens, shared pytest conformance suite, and a backend-neutral migrate/verify CLI.
scripts/sync-upstream.sh does a fast-forward of develop to upstream/develop (pure mirror) and reports feature-branch drift. Optional --rebase-feature also rebases this branch onto upstream/develop (may hit conflicts — upstream's MemPalace#413 backend seam touched our files). .github/workflows/sync-upstream.yml runs the develop sync weekly and reports drift, also triggerable via workflow_dispatch. Docs section added to CLOUDFLARE.md explaining the sync flow so forks/downstream users know how to stay current. Upstream (MemPalace/mempalace) is local-first and won't accept this CF backend; fork stays as the home for remote-backend users.
scripts/sync-upstream.sh does a fast-forward of develop to upstream/develop (pure mirror) and reports feature-branch drift. Optional --rebase-feature also rebases this branch onto upstream/develop (may hit conflicts — upstream's MemPalace#413 backend seam touched our files). .github/workflows/sync-upstream.yml runs the develop sync weekly and reports drift, also triggerable via workflow_dispatch. Docs section added to CLOUDFLARE.md explaining the sync flow so forks/downstream users know how to stay current. Upstream (MemPalace/mempalace) is local-first and won't accept this CF backend; fork stays as the home for remote-backend users.
…alace#1072 exist, not a write-from-scratch Previous version claimed Postgres was "out of scope — requires writing a new BaseBackend implementation". That was wrong: @skuznetsov's MemPalace#665 already ships a PostgreSQL backend (pg_sorted_heap preferred, pgvector fallback), and @malakhov-dmitrii's MemPalace#1072 wires MEMPALACE_BACKEND env var through palace._DEFAULT_BACKEND so entry-point backends actually get used. RFC 001 seam (MemPalace#413, MemPalace#995) is merged; registry.py advertises mempalace_postgres as the canonical example. Reframed as parallel track: palace-daemon is deployable today (no migration, wraps current ChromaDB palace); Postgres needs both PRs to land plus a drawer migration (export_palace() + importer — not code we'd write) but eliminates the entire 1.5.x failure class at the storage layer. Starting with palace-daemon since it's deployable now; the daemon is storage-agnostic so Postgres remains available as a later swap. Also fixed palace-daemon lock path: /tmp/palace-daemon-8085.lock (per-port), not /tmp/palace-daemon.lock.
What does this PR do?
Introduce the first upstreamable storage seam for MemPalace without bringing in the PostgreSQL spike or any benchmark artifacts.
This change adds a small backend package with:
It then routes the main runtime collection consumers through that seam:
Behavioral constraints kept for stage 1:
Important compatibility details:
create=Falseso they preserve the existing "no palace found" behavior instead of creating empty palacescreate=Truesemantics throughpalace.get_collection()ChromaBackend.get_collection(create=False)now fails fast when the palace directory is missing, so read-only commands do not materialize storage as a side effectchromadbshimsHow to test
Verification:
python3 -m py_compile mempalace/backends/__init__.py mempalace/backends/base.py mempalace/backends/chroma.py mempalace/palace.py mempalace/searcher.py mempalace/layers.py mempalace/palace_graph.py mempalace/mcp_server.py mempalace/miner.pyruff check .uv run pytest -q(538 passed, 106 deselected)Checklist
uv run pytest -q)ruff check .)Refs #266