release: v3.3.2 by igorls · Pull Request #1041 · MemPalace/mempalace

igorls · 2026-04-19T19:56:16Z

Merges develop into main for the v3.3.2 release.

Version bumps

pyproject.toml → 3.3.2
mempalace/version.py → 3.3.2
README.md version badge → 3.3.2
uv.lock → 3.3.2 (also picks up the chromadb >=1.5.4,<2 specifier from fix: upgrade chromadb to >=1.5.4 for Python 3.13/3.14 compatibility + fix 1.5.x queue-stall (closes #1006) #1010)
.claude-plugin/marketplace.json → 3.3.2
.claude-plugin/plugin.json → 3.3.2
.codex-plugin/plugin.json → 3.3.2

Changelog

Develop's CHANGELOG.md hadn't received the main → develop sync after v3.3.1, so this PR takes the post-3.3.1 CHANGELOG from main as the base and adds the new [3.3.2] — 2026-04-19 section on top. Full entry in CHANGELOG.md — highlights below.

Bug Fixes

Fix silent transcript drop: .jsonl ingestion + 500 MB cap + tandem sweeper #998 — Fix silent drop of .jsonl files in the project miner; raise MAX_FILE_SIZE cap from 10 MB to 500 MB. Adds a tandem sweeper (mempalace sweep <target>) — a message-level, timestamp-coordinated, idempotent safety net that catches anything the primary miner missed.
fix(searcher): guard against None metadata in CLI print path #999, fix: guard Layer3.search_raw against None doc/meta from ChromaDB (#1011) #1013 — Guard Layer3.search_raw, searcher CLI, closet loop, miner status histogram, and MCP tool_status / list_wings / list_rooms / get_taxonomy against None doc/metadata rows from ChromaDB. Prevents AttributeError crashes on mixed-schema palaces.
fix: upgrade chromadb to >=1.5.4 for Python 3.13/3.14 compatibility + fix 1.5.x queue-stall (closes #1006) #1010 — Upgrade chromadb floor to >=1.5.4 for Python 3.13 / 3.14 compatibility; pin upper bound to <2 so future breaking majors don't silently install.
fix: replace Unicode checkmark with ASCII for Windows encoding (#535) #681 — Fix Unicode checkmark (✓) rendering on Windows terminals that can't encode the glyph — avoids UnicodeEncodeError crashes on first-run output.
feat(backends): quarantine_stale_hnsw — recover from HNSW/sqlite drift (closes #823) #1000 — quarantine_stale_hnsw: on open, detect HNSW segment directories whose data_level0.bin is significantly older than chroma.sqlite3 and rename them out of the way. Recovers from HNSW/sqlite drift (the [Feature Request]: HNSW index pruning chroma-core/chroma#2594 SIGSEGV failure mode) and rebuilds the index lazily on next use.
fix(hooks): PID file guard prevents stacking mine processes #1023 — mine writes a per-source-directory PID file and refuses to start if an existing mine is still running. Cross-platform PID liveness check — on Windows, os.kill(pid, 0) terminates the target, so a platform-aware probe is used instead.

Improvements (internal)

refactor(backends): RFC 001 §10 cleanup — typed results, PalaceRef, registry #995 — RFC 001 §10: typed QueryResult / GetResult dataclasses, PalaceRef, and registry-based backend discovery on BaseBackend. No user-facing API change.
refactor(sources): RFC 002 §9 scaffolding — BaseSourceAdapter, registry, PalaceContext #1014 — RFC 002 §9: BaseSourceAdapter, adapter registry, and PalaceContext scaffolding for future pluggable ingest sources. No user-facing API change yet.

Documentation

docs: RFC 002 — Source adapter plugin specification #990 — RFC 002: source adapter plugin specification.
docs: CLI help and README reference fictional ~/chats/ path instead of real ~/.claude/projects/ #996, docs: use real ~/.claude/projects/ path in first-run help and README (#996) #1012 — First-run help text and README reference the real ~/.claude/projects/<project>/ path shape.

Smoke test

Ran an end-to-end smoke pass against ~/.claude/projects (148 files, ~6.9K drawers) and a subset of local repos on a fresh palace:

Full convo mine → all drawers persisted, dry-run/real parity, 0 None-metadata rows.
Project mine on 2 repos (mempalace-format, mempalace-ts) → 1697 drawers persisted across correct wings.
Idempotent re-mine → 0 duplicates, 0 new drawers.
Read paths (status, wake-up, search) → verbatim content returned, no crashes.
100-upsert direct stress → count moves +100, all retrievable by ID.

Smoke-test side-note (not release-blocking)

Discovered a pre-existing ~/.mempalace/palace on the author's machine that had gone write-dead (stuck UPSERTs in embeddings_queue from the pre-#1000 develop window). mempalace repair recovered it cleanly (100% drawer retention, auto-backup). Worth calling out so users upgrading from a pre-RC develop snapshot have a recovery path if they see the symptom: mempalace repair --yes. Could also be added as a short note to the release announcement.

Post-merge checklist

Tag v3.3.2 on the squash-merge commit on main (same shape as v3.3.1 was tagged on 6889c6f)
Draft GitHub Release referencing the [3.3.2] CHANGELOG section
Sync main → develop afterwards so the CHANGELOG alignment (finalized [Unreleased] / added [3.3.1] backfill) is also on develop — this was the step that was missed after v3.3.1

Test plan

Full suite: uv run python -m pytest tests/ --ignore=tests/benchmarks -q → 1033 passed, 91s
uv run ruff check . → all checks passed
uv run ruff format --check clean on changed files (pre-existing format drift on 10 unrelated files on develop is unchanged by this PR)
Pre-commit hooks (ruff + ruff-format) pass on the release commit
Version-consistency README-claim tests pass (tests/test_readme_claims.py → 64 passed)

Introduces a version label to the command-line interface, displaying the current MemPalace version in the help text. Adds a `--version` flag to allow users to easily check the version and exit.
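
A minimal sketch of how a version label and a `--version` flag can be wired with argparse. The `build_parser` helper and the hard-coded `__version__` are illustrative assumptions; the actual wiring in mempalace/cli.py may differ.

```python
import argparse

# Assumption: in the project this value would be imported from mempalace/version.py.
__version__ = "3.3.2"


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose help text and --version flag both show the version."""
    parser = argparse.ArgumentParser(
        prog="mempalace",
        description=f"MemPalace {__version__}",  # version label shown in --help
    )
    # argparse's built-in "version" action prints the string and exits with status 0.
    parser.add_argument(
        "--version",
        action="version",
        version=f"%(prog)s {__version__}",
    )
    return parser


if __name__ == "__main__":
    build_parser().parse_args()
```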

Windows terminals using cp1251/cp1252 crash on the Unicode ✓ (U+2713) in progress output. Replace with ASCII + in convo_miner.py and split_mega_files.py. Co-Authored-By: Claude Opus 4.6 <[email protected]>

Also fix miner.py checkmark and box-drawing/arrow chars (─, →) in both miner.py and split_mega_files.py that would crash on cp1251/cp1252. Co-Authored-By: Claude Opus 4.6 <[email protected]>
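
The failure mode above can be reproduced without a Windows terminal by encoding to the code page directly. In this sketch the `ASCII_FALLBACKS` mapping and both helper names are hypothetical; the commits only state that ✓ became an ASCII `+` and that the box-drawing and arrow characters were replaced.

```python
# Hypothetical replacement table; only the "+" for the checkmark is stated in the commits.
ASCII_FALLBACKS = {
    "\u2713": "+",   # ✓ check mark
    "\u2500": "-",   # ─ box-drawing horizontal line
    "\u2192": "->",  # → rightwards arrow
}


def can_encode(text: str, encoding: str) -> bool:
    """Return True if text can be written to a stream using this encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def ascii_safe(text: str) -> str:
    """Swap glyphs that legacy Windows code pages cannot represent."""
    for glyph, replacement in ASCII_FALLBACKS.items():
        text = text.replace(glyph, replacement)
    return text


line = "\u2713 mined notes \u2192 palace"
print(can_encode(line, "cp1252"))              # the raw line is not encodable
print(can_encode(ascii_safe(line), "cp1252"))  # the ASCII-only line is
```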

…e-update # Conflicts: # website/index.md

feat(website): update landing page

- Landing: replace nonexistent `mempalace remember` CLI demo with real `mempalace mine ./notes`
- Landing: soften unverifiable absolutes ("forever available", "100% recall by design", "<50 ms", "90%+ compression", "two-thousand-year-old", "tens of thousands of entries")
- MCP tool count: 19 → 29 across mcp-integration, claude-code, openclaw, and modules; expand tool overview with Drawers, Tunnels, and System categories to match mcp_server.py
- Wake-up token range: ~170–900 → ~600–900 in cli/api-reference/python-api to match cli.py help text and concept docs
- Gemini CLI: move `--scope user` before target name and add `--` separator so `-m mempalace.mcp_server` isn't parsed as Gemini flags

fix(website): correct false claims and stale numbers in live docs

feat(cli): add version display and version flag to CLI

…e-update Co-authored-by: Copilot <[email protected]>

Extract 2002-line monolith into landing/ subfolder:

- 8 section components (FolioHeader, HeroSection, ForgettingSection, AnatomySection, DialectSection, MechanicsSection, InstallSection, CatalogFooter)
- useLandingEffects.js composable for all vanilla-JS effects
- landing.css for all styles
- Landing.vue reduced to 28-line orchestrator

Also restores upstream hero lede text ("permanent. Designed for total recall.").

feat/landing-page: Improve landing page readability

Draft plugin specification for source adapters, mirroring RFC 001's role for storage backends. Formalizes the contract six community ingester PRs (#274, #23, #169, #232, #567, #98, #702) plus #981's metadata-only mode have been reinventing ad-hoc, so adapter authors can build to a stable surface.

Key decisions:

- Single ingest() method; lazy adapters yield SourceItemMetadata ahead of drawers, eager adapters interleave
- Declared-transformation model (§1.4) replaces informal verbatim promise with a verifiable one; byte_preserving adapters declare the empty set, declared_lossy adapters enumerate. Existing miner.py and the convo_miner+normalize pipeline map cleanly
- Palace is the incremental cursor via is_current(item, metadata); no sidecar persistence
- Routing is adapter-owned; detect_room/detect_hall move into the filesystem adapter
- Flat metadata per ChromaDB (RFC 001 §1.4): entity hints as json_string field, KG triples route to SQLite knowledge graph
- Closets stay core-built as a post-step; adapters may emit flat closet_hints. Closes existing gap where convo drawers get no closets
- No per-drawer field renames: source_file, filed_at, source_mtime, added_by, normalize_version, entities, ingest_mode all preserved. Spec adds adapter_name, adapter_version, privacy_class

§9 enumerates the cleanup PR prerequisites (mempalace/sources/ module, PalaceContext facade, KnowledgeGraph.add_triple gaining backwards-compatible source_drawer_id + adapter_name params).

Tracking issue: #989

…nd registry (RFC 001 §10)

Advances RFC 001 §10 cleanup so backend-author PRs (#574 LanceDB, #665 Postgres, #700 Qdrant, #697 hosted, #643 PalaceStore, #381 Qdrant) have a stable target to align against.

Scope (this PR):

- Typed QueryResult / GetResult dataclasses replace Chroma's dict shape at the BaseCollection boundary (§1.3). A transitional _DictCompatMixin keeps existing callers working while the attribute-access migration proceeds.
- BaseCollection is now kwargs-only across add/upsert/query/get/delete/update with ABC defaults for estimated_count/close/health and a non-atomic default update() (§1.1–1.2).
- PalaceRef replaces raw path strings at the backend boundary (§2.2).
- BaseBackend ABC with get_collection/close_palace/close/health/detect (§2.3).
- mempalace.backends entry-point group + in-tree registry with resolve_backend_for_palace priority order matching §3.2–3.3.
- ChromaCollection normalizes chroma returns into typed results; unknown where-clause operators raise UnsupportedFilterError (no silent drop, §1.4).
- ChromaBackend absorbs the inode/mtime client-cache freshness check previously duplicated in mcp_server._get_client() (§10 + PR #757).
- searcher.py migrated to typed-attribute access as the reference call site; remaining callers land in a follow-up.
- pyproject: chroma registered via [project.entry-points."mempalace.backends"].

Out of scope (explicit follow-ups):

- Full caller migration off the dict-compat shim across palace.py, mcp_server.py, miner.py, convo_miner.py, dedup.py, repair.py, exporter.py, palace_graph.py, cli.py, closet_llm.py.
- Embedder injection + three-state EmbedderIdentityMismatchError check (§1.5).
- maintenance_state() / run_maintenance() benchmark hooks (§7.3).
- AbstractBackendContractSuite full coverage (§7.1–7.2).
- mempalace migrate / mempalace verify CLI rewrites through BaseCollection (§8).

Tests: 970 passed (up from 967 on develop); new coverage for typed results, empty-result outer-shape preservation, $regex rejection, registry lookup, priority resolver, and PalaceRef-kwarg ChromaBackend.get_collection.

Refs: #743 (RFC 001), #989 (RFC 002 tracking issue).

mempalace/miner.py:READABLE_EXTENSIONS contained `.json` but not `.jsonl`. Every jsonl file encountered in a mined directory was silently skipped at miner.py:722:

    if filepath.suffix.lower() not in READABLE_EXTENSIONS:
        continue

Claude Code transcripts, ChatGPT exports, and every other tool writing line-delimited JSON ship as `.jsonl`. Users running `mempalace mine` against a directory of transcripts saw the command complete with no error and no log line, and their conversations never reached the palace. Silent data loss.

Adding `.jsonl` to the whitelist alongside `.json`. jsonl is text line-by-line; the existing chunking pipeline handles it the same way it handles any other text file.

Tests: tests/test_miner_jsonl_visibility.py

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

Long Claude Code sessions routinely produce transcripts larger than 10 MB. The previous cap at miner.py:65 silently dropped them at line 732 with `if filepath.stat().st_size > MAX_FILE_SIZE: continue` — same silent-failure pattern as the .jsonl extension bug. The cap exists as a safety rail against pathological binaries, not as a limit on legitimate text. Downstream chunking at 800 chars per drawer means source file size does not affect storage or embedding cost. 500 MB leaves headroom for year-long continuous transcripts while still catching accidental multi-GB binary mines. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

Mirrors the miner.py fix in this same branch. convo_miner.py had the exact same 10 MB cap at line 58 that silently dropped long transcripts via continue. Long Claude Code sessions, multi-year ChatGPT exports, and lifetime Slack dumps all exceed 10 MB. Same silent-drop pattern, different file. Raised to 500 MB to match miner.py for consistency; downstream chunking means source file size does not affect storage or embedding cost. Tests: tests/test_convo_miner_size_cap.py (1 test) Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
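
The two file-level filters described in these commits (extension whitelist and size cap) can be sketched as below. This is not the miner's code: `classify`, `scan`, the extension subset, and the byte value used for "500 MB" are assumptions, and returning a skip reason instead of a bare `continue` is only one way to make drops visible.

```python
from pathlib import Path
from typing import Optional

# Illustrative subset; the real READABLE_EXTENSIONS in miner.py is larger.
READABLE_EXTENSIONS = {".json", ".jsonl", ".md", ".txt"}
# Assumption: 500 MB expressed in binary megabytes.
MAX_FILE_SIZE = 500 * 1024 * 1024


def classify(filepath: Path, max_size: int = MAX_FILE_SIZE) -> Optional[str]:
    """Return None if the file should be mined, else the reason it is skipped."""
    if filepath.suffix.lower() not in READABLE_EXTENSIONS:
        return "unreadable extension"
    if filepath.stat().st_size > max_size:
        return "file larger than cap"
    return None


def scan(root: Path) -> dict:
    """Walk a directory and report both the files to mine and the files skipped."""
    report = {"mine": [], "skipped": []}
    for filepath in sorted(root.rglob("*")):
        if not filepath.is_file():
            continue
        reason = classify(filepath)
        if reason is None:
            report["mine"].append(filepath)
        else:
            # Recording the reason keeps a skip from being silent.
            report["skipped"].append((filepath, reason))
    return report
```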

The primary miners (miner.py, convo_miner.py) operate at file granularity and can drop data for several reasons: size caps, silent OSError on read, dedup false positives, extensions the project miner does not recognize. Even with tonight's hotfixes, any future bug in the file-level path risks silent data loss.

The sweeper is a second, cooperating miner that works at MESSAGE granularity:

- Parses Claude Code .jsonl line by line, yielding only user/assistant records (filters progress, file-history-snapshot, etc. noise).
- For each session_id, queries the palace for max(timestamp) and treats that as the cursor.
- Ingests only messages newer than the cursor, as one small drawer per exchange (never hits a size cap; each drawer is 1-5 KB).
- Deterministic drawer IDs from session_id + message UUID make reruns idempotent; crash mid-sweep is safe.

Tandem coordination is free: if the primary miner committed up to timestamp T, the sweeper resumes from T. If the primary miner missed everything, the sweeper catches it all. Neither duplicates the other.

Smoke test on a real Claude Code transcript:

    1st run: +39 drawers, 0 already present
    2nd run: +0 drawers, 39 already present (perfect idempotence)

Opt-in via:

    mempalace sweep <file.jsonl>
    mempalace sweep <transcript-dir>

No changes to existing miners. No schema migration. Purely additive.

Tests: tests/test_sweeper.py (7 tests covering parsing, tandem coordination, idempotency, resume-from-cursor, metadata correctness).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

… logged failures

Four changes on top of the proposal's initial sweeper draft, driven by the CLAUDE.md design principles:

1. Drop the 500-char truncation on tool_use / tool_result content in _flatten_content. The "verbatim always" principle forbids lossy compression of user-adjacent data; a long code-edit diff handed to the assistant must round-trip intact. Unknown block types now also serialize their full payload instead of just a type marker. New test test_parse_preserves_tool_blocks_verbatim covers a 5000-char input.
2. Use the full session_id in drawer IDs (not session_id[:12]). Rules out cross-session collisions if a transcript source ever uses non-UUID session identifiers or shared prefixes.
3. Replace silent `except Exception: return None` in get_palace_cursor with a logger.warning, the exact anti-pattern this PR otherwise criticizes in miner.py. The fallback behavior is still safe (deterministic IDs make a missed cursor recover on the next run), but the failure is now discoverable.
4. sweep_directory now collects per-file failures into the result dict and the CLI exits non-zero when any file failed, so a partial-sweep outcome is visible rather than swallowed.

Co-Authored-By: MSL <[email protected]>

Four defects surfaced by the automated review, fixed with targeted tests:

1. BaseCollection.update() default now validates that documents / metadatas / embeddings lengths match ids, raising ValueError instead of silently misaligning pairs or raising IndexError (base.py).
2. ChromaCollection.query() now rejects the two ambiguous input shapes up front (neither or both of query_texts / query_embeddings, and empty input lists) with clear ValueError messages rather than delegating to chromadb's less-obvious errors (chroma.py).
3. QueryResult.empty() accepts embeddings_requested=True to preserve the outer-query dimension with empty hit lists when the caller asked for embeddings, matching the spec rule that included fields carry the outer shape even when empty (base.py). ChromaCollection.query() threads this through on the empty-result path (chroma.py).
4. ChromaBackend cache-freshness check now matches the semantics from mcp_server._get_client (merged via #757) on three edge cases Copilot called out: (a) invalidate when chroma.sqlite3 disappears while a cached client is held, (b) treat a 0→nonzero stat transition as a change so a cache built when the DB did not yet exist is refreshed, (c) re-stat after PersistentClient constructs the DB lazily so freshness reflects the post-creation state (chroma.py).

Tests: 978 passed (up from 970), 8 new tests covering the fixes.
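
The length check described for BaseCollection.update() can be sketched as a standalone function. `validate_update_lengths` is a hypothetical name and the error wording is invented for illustration; only the behavior (ValueError on a length mismatch with ids, before any data is touched) comes from the commit message.

```python
from typing import Optional, Sequence


def validate_update_lengths(
    ids: Sequence[str],
    documents: Optional[Sequence[str]] = None,
    metadatas: Optional[Sequence[dict]] = None,
    embeddings: Optional[Sequence[Sequence[float]]] = None,
) -> None:
    """Raise ValueError if any provided list is not the same length as ids."""
    provided = {
        "documents": documents,
        "metadatas": metadatas,
        "embeddings": embeddings,
    }
    for name, values in provided.items():
        # Omitted fields (None) are allowed; only supplied lists must align with ids.
        if values is not None and len(values) != len(ids):
            raise ValueError(
                f"{name} has {len(values)} items but ids has {len(ids)}"
            )
```

Running the check before any per-item work means a mismatch fails as one clear error rather than as an IndexError partway through, or as silently misaligned pairs.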

…mments

Six items from the automated review on PR #998:

1. **Cursor tie-break bug (correctness).** The skip condition was `rec.timestamp <= cursor`; if multiple messages share the max timestamp and only some were ingested before a crash, the rest would be lost forever. Changed to `< cursor`, relying on deterministic drawer IDs for safe re-attempt at the boundary. Regression test `test_sweep_recovers_untaken_message_at_cursor_timestamp`.
2. **`drawers_added` counted upserts, not adds.** Added a pre-flight `collection.get(ids=batch)` to distinguish new rows from already-present ones. Return value now carries `drawers_added`, `drawers_already_present`, `drawers_upserted`, and `drawers_skipped` separately. Dict-compatible access (`existing.get("ids")`) keeps it working on both the raw Chroma return and the typed `GetResult`.
3. **`sweep_directory` hid failures in the summary.** `files_processed` used to exclude failed files. Replaced with `files_attempted` (all discovered) + `files_succeeded` (subset that completed); CLI output shows `succeeded/attempted`.
4. **Coordination claim was overstated.** The primary miners don't stamp `session_id`/`timestamp` metadata, so the sweeper coordinates only with its own prior runs. Softened docstrings on module and CLI command. Uniform cross-miner metadata is flagged as a follow-up.
5. **MAX_FILE_SIZE comments were misleading.** Said source size "does not affect storage or embedding cost", which is true per-drawer, but source size still scales drawer count, embedding work, and memory usage (files are read in full, not streamed). Corrected in both `miner.py` and `convo_miner.py`.
6. Added the tie-break regression test that reproduces the correctness bug from (1).

Tests: 970 passed (was 969), ruff + pre-commit clean.

Co-Authored-By: MSL <[email protected]>
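
The cursor logic from item 1 and the counters from item 2 can be illustrated with an in-memory stand-in for the palace. Everything here is a sketch under assumptions: a plain dict replaces the ChromaDB collection, `Message`, `drawer_id`, and the `sweep_` ID prefix are hypothetical, and timestamps are assumed to be ISO-8601 strings that sort lexicographically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    session_id: str
    uuid: str
    timestamp: str  # assumed ISO-8601, so string comparison matches time order
    text: str


def drawer_id(msg: Message) -> str:
    """Deterministic ID from the full session_id plus message UUID (format is hypothetical)."""
    return f"sweep_{msg.session_id}_{msg.uuid}"


def sweep(messages: list, palace: dict) -> dict:
    """Upsert messages at or after each session's cursor into the palace dict."""
    # Cursor per session: the max timestamp already stored for that session.
    cursors: dict = {}
    for stored in palace.values():
        current = cursors.get(stored.session_id)
        if current is None or stored.timestamp > current:
            cursors[stored.session_id] = stored.timestamp

    added = already_present = skipped = 0
    for msg in messages:
        cursor = cursors.get(msg.session_id)
        # Strict "<": messages that tie with the cursor are re-attempted, not dropped.
        if cursor is not None and msg.timestamp < cursor:
            skipped += 1
            continue
        key = drawer_id(msg)
        if key in palace:
            already_present += 1
        else:
            added += 1
        palace[key] = msg  # upsert; the deterministic ID makes the re-attempt harmless
    return {
        "drawers_added": added,
        "drawers_already_present": already_present,
        "drawers_upserted": added + already_present,
        "drawers_skipped": skipped,
    }
```

With `<=` in place of `<`, a message sharing the cursor timestamp but missing from the palace would be skipped on every later run, which is the data-loss case the regression test guards against.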

…ests test_base_collection_update_default_validates_list_lengths and test_base_collection_update_default_rejects_mismatched_lengths were spinning up a real ChromaBackend and calling add(documents=...), which triggered ChromaDB's default ONNX embedding function and attempted a network download — failing in offline/sandboxed CI. BaseCollection.update() validates list lengths before any DB access, so no items need to be pre-loaded for the length-check to fire. Switch both tests to use _FakeCollection (same as the rest of the unit tests in this file) so they are pure in-memory and network-free. Also fixes a structural bug in test 1: collection._collection.add() was accidentally placed inside the pytest.raises(ValueError) block, masking the real assertion. Agent-Logs-Url: https://github.com/MemPalace/mempalace/sessions/55fc663e-b256-4b8b-88ce-4271560def8d Co-authored-by: igorls <[email protected]>

Fix silent transcript drop: .jsonl ingestion + 500 MB cap + tandem sweeper

PermissionError [WinError 32] on Windows when Path.unlink() runs while chromadb.PersistentClient still holds a handle on chroma.sqlite3. Rewrite test_chroma_cache_invalidates_when_db_file_missing to prime backend._clients/_freshness with a sentinel object instead of opening a real PersistentClient, so the unlink runs against an unheld file. The assertion is also corrected: after invalidation, ChromaBackend's _client rebuilds a fresh PersistentClient which re-creates chroma.sqlite3 and re-stats it, so freshness ends up at the post-rebuild stat (not (0, 0.0) as the assertion previously expected). The meaningful invariant is "freshness advanced past the pre-unlink value AND the sentinel was replaced", which the test now checks. Ref: Windows CI failure on 995.

Five findings from the automated review, fixed with targeted tests where behavior changed:

1. Transformation Protocol (transforms.py). The registry mixed a bytes-to-str transform (utf8_replace_invalid) with str-to-str transforms under a single Callable[..., str] type, misleading static type checkers and adapter authors. Introduced a Transformation Protocol with __call__(data: bytes|str) -> str and retyped the registry + get_transformation return.
2. Drawer-id collision risk (context.py). Switched _build_drawer_id from sha1[:16]=64 bits to sha256[:24]=96 bits. 64 bits sits uncomfortably close to the birthday bound for palace-sized corpora; 96 bits keeps the collision probability negligible while preserving the existing <prefix>_<chunk> layout adapters rely on.
3. Fresh-schema KG columns (knowledge_graph.py). source_drawer_id and adapter_name now live in the canonical CREATE TABLE so new palaces don't take an ALTER round-trip on first open. _migrate_schema stays for legacy palaces (SQLite has no ADD COLUMN IF NOT EXISTS, so PRAGMA introspection is still needed there).
4. Identity-shim comment (transforms.py). Comment said the adapter-specific transforms "raise if invoked without adapter context" but they return the input unchanged. Updated the comment to match the actual identity-shim behavior Copilot suggested.
5. Test docstring (test_sources.py). Comment mentioned default_factory=list but SourceRef.options uses default_factory=dict. Corrected.

Tests: 1020 passed (up from 1018), +2 new tests for the sha256 id shape and the fresh-schema column presence on new palaces.
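
To make the 64-bit versus 96-bit comparison in item 2 concrete, here is a sketch that truncates a SHA-256 hex digest to 24 characters (24 hex digits × 4 bits = 96 bits) and estimates collision odds with the standard birthday approximation. The function names, the hashed input, and the `<prefix>_<digest>` layout are assumptions, not the signature of `_build_drawer_id`.

```python
import hashlib
import math


def build_drawer_id(prefix: str, source_file: str, chunk_index: int) -> str:
    """Hypothetical ID builder: prefix plus the first 96 bits of a SHA-256 digest."""
    payload = f"{source_file}:{chunk_index}".encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest()[:24]  # 24 hex chars = 96 bits
    return f"{prefix}_{digest}"


def collision_probability(n_items: int, bits: int) -> float:
    """Birthday approximation: p ≈ 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = n_items * (n_items - 1) / 2 ** (bits + 1)
    # expm1 keeps precision when the probability is far below float epsilon.
    return -math.expm1(-exponent)


for bits in (64, 96):
    print(bits, collision_probability(10**6, bits))
```

Each extra bit halves the approximate collision probability, so moving from 64 to 96 bits shrinks it by a factor of about 2^32 for the same corpus size.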

…996)

The CLI help text and README told first-time users to mine from ~/chats/, a path that doesn't exist on any machine. Real location where Claude Code writes session JSONL is ~/.claude/projects/<escaped-project-path>/.

Updates three user-visible strings:

- mempalace/cli.py line 7 ("Two ways to ingest" block)
- mempalace/cli.py line 25 (Examples block)
- README.md line 58 (Quickstart)

Website guides (website/guide/mining.md, getting-started.md) still reference ~/chats/ for ChatGPT/Slack export scenarios where that remains a valid placeholder. Those can be a separate PR if the maintainers want to tilt the website examples toward Claude Code specifically.

Fixes #996.

Same class of bug as #1007: ChromaDB's query() can return None in the documents and metadatas arrays when a drawer's HNSW vector entry exists but its metadata/document rows haven't been materialized. The code in Layer3.search_raw (mempalace/layers.py) calls meta.get("wing", ...), meta.get("room", ...), meta.get("source_file", ...) directly without null safety, so it raises: AttributeError: 'NoneType' object has no attribute 'get' Two-line defensive coercion matching the pattern in #1009 / PR #999 for searcher.py: meta = meta or {}, doc = doc or "". The hit still appears with its real distance; source/wing/room fall back to their fallback values where the metadata row is missing. Frequently hit on chromadb 1.5.x (root cause #1006). Even after the chromadb floor lands (#1010), partial-state results remain possible during interrupted mines and schema upgrade boundaries, so the guard is worth having on its own. Fixes #1011.
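
The two-line coercion described above can be shown in isolation. `format_hits` and the fallback strings (`"unknown"`, `"?"`) are hypothetical stand-ins; only the `meta = meta or {}` and `doc = doc or ""` pattern is taken from the commit message.

```python
from typing import Optional, Sequence


def format_hits(
    documents: Sequence[Optional[str]],
    metadatas: Sequence[Optional[dict]],
    distances: Sequence[float],
) -> list:
    """Build hit dicts while tolerating None documents and None metadata rows."""
    hits = []
    for doc, meta, dist in zip(documents, metadatas, distances):
        # Defensive coercion: a partially materialized row yields None here.
        meta = meta or {}
        doc = doc or ""
        hits.append(
            {
                "text": doc,
                "wing": meta.get("wing", "unknown"),
                "room": meta.get("room", "unknown"),
                "source": meta.get("source_file", "?"),
                "distance": dist,  # the real distance is kept even for degraded rows
            }
        )
    return hits
```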

fix(searcher): guard against None metadata in CLI print path

…ard-1011 fix: guard Layer3.search_raw against None doc/meta from ChromaDB (#1011)

…mpat-via-581 fix: upgrade chromadb to >=1.5.4 for Python 3.13/3.14 compatibility + fix 1.5.x queue-stall (closes #1006)

…-path-996 docs: use real ~/.claude/projects/ path in first-run help and README (#996)

…folding refactor(sources): RFC 002 §9 scaffolding — BaseSourceAdapter, registry, PalaceContext

…-spec docs: RFC 002 — Source adapter plugin specification

Add a helper that renames HNSW segment directories whose `data_level0.bin` is significantly older than `chroma.sqlite3`. Drift between the on-disk HNSW graph and the live embeddings table is the root cause of a segfault class where the Rust graph-walk dereferences dangling neighbor pointers for entries in the metadata segment that no longer exist in the HNSW index, crashing in a background thread on `count()` or `query()`.

Issue #823 describes the same drift as a silent-staleness symptom (semantic search returns stale results after `add_drawer` because `data_level0.bin` lags the sqlite metadata under the default `sync_threshold=1000`). Under heavier load or after an interrupted write, the same drift can escalate from "silent stale results" to "SIGSEGV on next open," which is the failure mode observed at neo-cortex-mcp#2 (chromadb 1.5.5, Python 3.12) and acknowledged at chroma-core/chroma#2594. On one 135K-drawer palace where `index_metadata.pickle` claimed 137,813 elements against 135,464 rows in sqlite (2,349-entry drift), fresh Python processes crashed in `col.count()` 17/20 times; after renaming the segment dir out of the way and letting ChromaDB rebuild lazily, the same 20-run check went to 0 crashes.

The recovery path #823 suggests (export / recreate / reimport) is heavy: it re-embeds every drawer. This helper is lighter: rename the segment dir so ChromaDB reopens without it, and the indexer rebuilds lazily on the next write. The original directory is renamed (not deleted) so the operator can recover if the heuristic misfires.

If `chroma.sqlite3` is more than `stale_seconds` (default 3600) newer than the segment's `data_level0.bin`, the segment is considered suspect. One hour is deliberately conservative: normal HNSW flush cadence is seconds to minutes, so an hour of drift implies a crashed mid-write, not routine lag.

- Additive: exposes `quarantine_stale_hnsw(palace_path, stale_seconds)` as a helper. Not wired into `_client()` / startup on this PR; the goal is to land the primitive first so operators and higher layers can opt in. A follow-up could call it automatically on palace open behind an env var or config flag.
- Closes #823 by giving operators a first-class recovery path without having to install `chromadb-ops` or re-mine.

Four new tests in `tests/test_backends.py`:

- renames drifted segment, preserves original files under `.drift-TS` suffix
- leaves fresh segments alone
- no-op on missing palace path / missing `chroma.sqlite3`
- skips already-quarantined (`.drift-` suffixed) directories

`pytest tests/test_backends.py` → 11 passed. `ruff check` / `ruff format --check`: clean.
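
The mtime heuristic described above can be sketched roughly as follows. This is an illustration, not the helper that landed in mempalace/backends/chroma.py: the function name carries a `_sketch` suffix, the timestamp format in the `.drift-` suffix is an assumption, and segment discovery is simplified to "any direct subdirectory containing `data_level0.bin`".

```python
import time
from pathlib import Path


def quarantine_stale_hnsw_sketch(palace_path, stale_seconds: float = 3600.0) -> list:
    """Rename segment dirs whose data_level0.bin lags chroma.sqlite3 by more than stale_seconds."""
    palace = Path(palace_path)
    db_file = palace / "chroma.sqlite3"
    if not db_file.is_file():
        return []  # missing palace or missing database: nothing to compare against

    db_mtime = db_file.stat().st_mtime
    moved = []
    for segment in sorted(palace.iterdir()):
        # Skip plain files and directories that were already quarantined.
        if not segment.is_dir() or ".drift-" in segment.name:
            continue
        hnsw_file = segment / "data_level0.bin"
        if not hnsw_file.is_file():
            continue
        if db_mtime - hnsw_file.stat().st_mtime > stale_seconds:
            stamp = time.strftime("%Y%m%d-%H%M%S")  # suffix format is an assumption
            target = segment.with_name(f"{segment.name}.drift-{stamp}")
            segment.rename(target)  # rename, never delete, so the operator can restore it
            moved.append(target)
    return moved
```

Because the directory is only renamed, undoing a false positive is a matter of renaming it back while the palace is closed.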

Every stop hook fire spawned a new background `mempalace mine` via subprocess.Popen with no dedup — 4 concurrent mines at ~770% CPU observed in production. Add `_mine_already_running()` (reads `hook_state/mine.pid`, uses `os.kill(pid, 0)` as an existence check) and `_spawn_mine()` (writes the child PID to the lock file after Popen returns). `_maybe_auto_ingest` bails early when the guard reports True. Tests: 4 new unit tests for `_mine_already_running` (no file, dead PID, live PID using `os.getpid()`, corrupt file), 1 new test covering the skip-when-running branch of `_maybe_auto_ingest`, and existing spawn tests patched to redirect `_MINE_PID_FILE` into tmp_path so they don't touch the real state dir. Co-Authored-By: Claude Opus 4.7 <[email protected]>

…in OSError On Windows, os.kill(bogus_pid, 0) raises OSError[WinError 87] "The parameter is incorrect" — NOT ProcessLookupError. The old except tuple missed it, so test_mine_already_running_dead_pid failed on Windows CI. Catching OSError covers ProcessLookupError + PermissionError + FileNotFoundError on POSIX and WinError 87 on Windows. ValueError still guards the int() parse. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

Real bug surfaced on CI for this PR. On POSIX, os.kill(pid, 0) is the canonical no-op existence probe. On Windows, Python's os.kill maps to TerminateProcess(handle, sig), which *terminates* the target with exit code sig. os.kill(pid, 0) therefore kills the target with exit code 0 — silently destroying our mine child (or, as happened in test_mine_already_running_live_pid, the pytest process itself). Fix: split into _pid_alive(pid) helper with a Windows branch using ctypes.windll.kernel32.OpenProcess + GetExitCodeProcess. PROCESS_QUERY_LIMITED_INFORMATION opens a handle only if the PID exists; STILL_ACTIVE (259) distinguishes running from exited processes. No new dependencies — stdlib ctypes. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
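
A platform-aware liveness probe along the lines described in these three commits might look like the sketch below. The names `pid_alive` and `mine_already_running` mirror the commit text, but the bodies are assumptions; in particular, loading kernel32 through `ctypes.WinDLL` with explicit argtypes is a choice made here, where the commit mentions `ctypes.windll.kernel32`.

```python
import os
import sys
from pathlib import Path


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID appears to be running."""
    if pid <= 0:
        return False  # on POSIX, os.kill(0, ...) would target the whole process group
    if sys.platform == "win32":
        import ctypes
        from ctypes import wintypes

        PROCESS_QUERY_LIMITED_INFORMATION = 0x1000
        STILL_ACTIVE = 259
        kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
        kernel32.OpenProcess.argtypes = [wintypes.DWORD, wintypes.BOOL, wintypes.DWORD]
        kernel32.OpenProcess.restype = wintypes.HANDLE
        kernel32.GetExitCodeProcess.argtypes = [wintypes.HANDLE, ctypes.POINTER(wintypes.DWORD)]
        kernel32.GetExitCodeProcess.restype = wintypes.BOOL
        kernel32.CloseHandle.argtypes = [wintypes.HANDLE]
        kernel32.CloseHandle.restype = wintypes.BOOL

        handle = kernel32.OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION, False, pid)
        if not handle:
            return False  # no such PID, or it cannot be opened
        try:
            exit_code = wintypes.DWORD()
            ok = kernel32.GetExitCodeProcess(handle, ctypes.byref(exit_code))
            return bool(ok) and exit_code.value == STILL_ACTIVE
        finally:
            kernel32.CloseHandle(handle)
    try:
        os.kill(pid, 0)  # POSIX: signal 0 checks existence without delivering a signal
    except OSError:
        # Broad catch, as the commit describes. Note this also reports False on
        # PermissionError, where the process exists but belongs to another user.
        return False
    return True


def mine_already_running(pid_file: Path) -> bool:
    """Read a PID file and report whether the recorded process is still alive."""
    try:
        pid = int(pid_file.read_text().strip())
    except (OSError, ValueError):
        return False  # missing, unreadable, or corrupt PID file
    return pid_alive(pid)
```

A caller would check `mine_already_running(...)` before spawning a new background mine and write the child PID to the same file after the spawn returns.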

fix: replace Unicode checkmark with ASCII for Windows encoding (#535)

feat(backends): quarantine_stale_hnsw — recover from HNSW/sqlite drift (closes #823)

fix(hooks): PID file guard prevents stacking mine processes

Version bumps across pyproject.toml, mempalace/version.py, README badge, uv.lock, and plugin manifests (.claude-plugin/*, .codex-plugin/*). CHANGELOG aligned with main (post-3.3.1) and a new [3.3.2] section added covering the 11 PRs merged on develop since v3.3.1 — silent-transcript-drop fix + tandem sweeper (#998), None-metadata guards (#999, #1013), chromadb ≥1.5.4 for Py 3.13/3.14 (#1010), Windows Unicode (#681), HNSW quarantine recovery (#1000), PID stacking guard (#1023), doc-path cleanup (#996, #1012), and RFC 001/002 internal scaffolding (#995, #1014, #990).

Copilot

Pull request overview

Release PR for v3.3.2, merging develop into main with version bumps, dependency updates, new ingest safety tooling, backend/source plugin scaffolding, and substantial documentation/website updates.

Changes:

Bump package/plugin/docs versions to 3.3.2 and update chromadb spec to >=1.5.4,<2.
Add the tandem sweeper (mempalace sweep) and expand ingestion robustness (jsonl visibility, size caps, PID guard, None-metadata guards, HNSW drift quarantine).
Introduce RFC scaffolding for typed backend contracts and source adapter plugins, with supporting tests and docs site updates (including a new landing page).

Reviewed changes

Copilot reviewed 61 out of 64 changed files in this pull request and generated 7 comments.


File	Description
website/reference/python-api.md	Update `wake_up` token estimate
website/reference/modules.md	Update MCP tool count + module description
website/reference/cli.md	Update `wake-up` token estimate
website/reference/api-reference.md	Update `wake_up` token estimate
website/index.md	Switch homepage to `<Landing />`
website/guide/openclaw.md	Update MCP tool count
website/guide/mcp-integration.md	Update MCP tool count + tool list expansion
website/guide/gemini-cli.md	Update Gemini MCP command syntax/docs
website/guide/claude-code.md	Update MCP tool count
website/.vitepress/theme/landing/useLandingEffects.js	Landing page effects + waitlist submission
website/.vitepress/theme/landing/MechanicsSection.vue	New landing “Mechanics” section
website/.vitepress/theme/landing/InstallSection.vue	New landing “Install” section
website/.vitepress/theme/landing/HeroSection.vue	New landing hero section
website/.vitepress/theme/landing/ForgettingSection.vue	New landing demo section
website/.vitepress/theme/landing/FolioHeader.vue	New landing header/nav
website/.vitepress/theme/landing/DialectSection.vue	New landing AAAK section
website/.vitepress/theme/landing/CatalogFooter.vue	New landing footer
website/.vitepress/theme/landing/AnatomySection.vue	New landing anatomy section
website/.vitepress/theme/index.ts	Register `Landing` component
website/.vitepress/theme/Landing.vue	Compose landing sections
website/.vitepress/config.mts	Add fonts + GA scripts
tests/test_sweeper.py	Sweeper parsing/idempotency tests
tests/test_sources.py	Source adapter scaffolding tests
tests/test_searcher.py	None-metadata regression tests
tests/test_miner_jsonl_visibility.py	Miner `.jsonl` + size-cap tests
tests/test_miner.py	Miner `status()` None-metadata test
tests/test_mcp_server.py	MCP `tool_status` None-metadata test
tests/test_hooks_cli.py	PID guard tests for hook auto-ingest
tests/test_convo_miner_size_cap.py	Convo miner size-cap test
tests/test_backends.py	Typed results/registry/quarantine tests
pyproject.toml	Version bump, chromadb pin, entry points
mempalace/version.py	Version bump to 3.3.2
mempalace/sweeper.py	New message-level sweeper implementation
mempalace/split_mega_files.py	ASCII-safe CLI output
mempalace/sources/transforms.py	Reserved transformations registry
mempalace/sources/registry.py	Source adapter registry/entry points
mempalace/sources/context.py	`PalaceContext` facade + drawer-id helper
mempalace/sources/base.py	`BaseSourceAdapter` contract + typed records
mempalace/sources/__init__.py	Public exports for sources subsystem
mempalace/searcher.py	Typed-result compatibility + None-metadata guards
mempalace/miner.py	`.jsonl` support, size cap, ASCII output, None-meta guard
mempalace/migrate.py	Update migration guidance for chromadb 1.x
mempalace/mcp_server.py	None-metadata guards in MCP tools
mempalace/layers.py	None doc/meta guards in L3 search paths
mempalace/knowledge_graph.py	KG provenance cols + backward-compatible migration
mempalace/hooks_cli.py	PID file guard + cross-platform liveness probe
mempalace/convo_miner.py	Raise size cap + ASCII output
mempalace/cli.py	Add `sweep` command + `--version`
mempalace/backends/registry.py	Backend entry-point registry
mempalace/backends/chroma.py	Typed chroma backend + cache freshness + HNSW quarantine
mempalace/backends/base.py	RFC 001 typed contracts + results dataclasses
mempalace/backends/__init__.py	Re-export backend public surface
mempalace/__init__.py	Update chromadb telemetry note
README.md	Update convo mining example + version badge
CHANGELOG.md	Add 3.3.2 release notes section
.codex-plugin/plugin.json	Plugin version bump to 3.3.2
.claude-plugin/plugin.json	Plugin version bump to 3.3.2
.claude-plugin/marketplace.json	Marketplace version bump to 3.3.2


Copilot · 2026-04-19T20:00:41Z

+            raise TypeError("collection_name is required")
+        create = kwargs.pop("create", False)
+        if rest:
+            create = rest.pop(0)


_normalize_get_collection_args consumes at most one extra positional arg for create, but it never errors if more positional arguments are provided (they end up silently ignored). This can mask call-site bugs; please validate rest is empty after parsing and raise TypeError when unexpected args remain.

Suggested change

-            create = rest.pop(0)
+            create = rest.pop(0)
+            if rest:
+                raise TypeError(f"unexpected positional args: {rest!r}")
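
The stricter parsing requested here can be sketched as a standalone function. This is a minimal illustration under stated assumptions, not the code in mempalace/backends/chroma.py: the name `normalize_get_collection_args`, the way `collection_name` is looked up, and the returned tuple are hypothetical; only the `create` handling and the two TypeError messages follow the snippet above.

```python
def normalize_get_collection_args(*args, **kwargs):
    """Hypothetical helper: accept (collection_name, [create]) and nothing more."""
    rest = list(args)

    # collection_name may arrive positionally or as a keyword (assumed).
    if rest:
        collection_name = rest.pop(0)
    else:
        collection_name = kwargs.pop("collection_name", None)
    if collection_name is None:
        raise TypeError("collection_name is required")

    # create defaults to False; one extra positional arg overrides it.
    create = kwargs.pop("create", False)
    if rest:
        create = rest.pop(0)

    # Anything still left over is a call-site bug, so fail loudly.
    if rest:
        raise TypeError(f"unexpected positional args: {rest!r}")
    return collection_name, bool(create)
```

With this shape, `normalize_get_collection_args("drawers", True, "extra")` raises TypeError instead of silently discarding `"extra"`.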

Copilot · 2026-04-19T20:00:42Z

+      ['script', { async: '', src: 'https://www.googletagmanager.com/gtag/js?id=G-PPQE4Z7P1K' }],
+      ['script', {}, `window.dataLayer = window.dataLayer || [];\nfunction gtag(){dataLayer.push(arguments);}\ngtag('js', new Date());\ngtag('config', 'G-PPQE4Z7P1K');`],


Google Analytics is injected unconditionally via gtag scripts. This has privacy/compliance implications and can surprise self-hosters; consider gating it behind an env var/build flag (default off) and/or documenting the tracking/consent model for the docs site.

Copilot · 2026-04-19T20:00:42Z

+                f"  ⚠ {len(failures)} file(s) failed to sweep — see stderr / logs for details.",
+                file=sys.stderr,
+            )
+            sys.exit(2)
+    else:
+        print(f"  ✗ Not a file or directory: {target}", file=sys.stderr)


The CLI output here uses non-ASCII glyphs (⚠ and an em dash). This can reintroduce UnicodeEncodeError crashes on Windows terminals (same class of issue as the ✓ fix). Consider switching to plain ASCII (e.g., "WARNING:") for stderr messages.

Suggested change

-                f"  ⚠ {len(failures)} file(s) failed to sweep — see stderr / logs for details.",
-                file=sys.stderr,
-            )
-            sys.exit(2)
-    else:
-        print(f"  ✗ Not a file or directory: {target}", file=sys.stderr)
+                f"  WARNING: {len(failures)} file(s) failed to sweep - see stderr / logs for details.",
+                file=sys.stderr,
+            )
+            sys.exit(2)
+    else:
+        print(f"  ERROR: Not a file or directory: {target}", file=sys.stderr)
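
To make the failure mode concrete, the sketch below checks whether a message can be encoded for a given terminal code page and falls back to ASCII stand-ins. It is an illustration, not code from this PR: the `ASCII_FALLBACKS` mapping and the helper names `ascii_safe` and `can_encode` are assumptions, although the replacements match the ones proposed in the review.

```python
# Glyphs flagged in the review, mapped to ASCII stand-ins (illustrative mapping).
ASCII_FALLBACKS = {
    "\u26a0": "WARNING:",  # warning sign
    "\u2717": "ERROR:",    # ballot x
    "\u2713": "+",         # check mark
    "\u2014": "-",         # em dash
}


def ascii_safe(text: str) -> str:
    """Replace known glyphs, then degrade any other non-ASCII char to '?'."""
    for glyph, replacement in ASCII_FALLBACKS.items():
        text = text.replace(glyph, replacement)
    return text.encode("ascii", errors="replace").decode("ascii")


def can_encode(text: str, encoding: str) -> bool:
    """Return True if `text` can be written to a stream using `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


message = "  \u26a0 2 file(s) failed to sweep \u2014 see stderr / logs for details."
print(can_encode(message, "cp1252"))              # the warning sign has no cp1252 mapping
print(can_encode(ascii_safe(message), "cp1252"))  # pure ASCII encodes everywhere
```

Emitting ASCII at the source, as the suggested change does, is simpler than wrapping every `print`; a helper like `ascii_safe` is mainly useful for text that originates outside the program, such as file names or exception messages.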

Copilot · 2026-04-19T20:00:42Z

+            )
+            sys.exit(2)
+    else:
+        print(f"  ✗ Not a file or directory: {target}", file=sys.stderr)


This error path prints a Unicode ✗ glyph. For Windows terminals with non-UTF-8 encodings this can raise UnicodeEncodeError (similar to the earlier ✓ issue). Prefer ASCII-only output (e.g., "ERROR:" / "Not a file or directory").

Suggested change

-        print(f"  ✗ Not a file or directory: {target}", file=sys.stderr)
+        print(f"  ERROR: Not a file or directory: {target}", file=sys.stderr)

Copilot · 2026-04-19T20:00:42Z

+            result = sweep(str(f), palace_path, source_label=str(f))
+        except Exception as exc:
+            logger.error("sweeper: sweep failed on %s: %s", f, exc)
+            print(f"  \u26a0 sweep failed on {f}: {exc}", file=sys.stderr)


This stderr message uses the Unicode warning sign (⚠). Given the Windows encoding crashes addressed elsewhere (✓ → ASCII), this should likely be ASCII-only as well to avoid UnicodeEncodeError in non-UTF-8 terminals.

Suggested change

-            print(f"  \u26a0 sweep failed on {f}: {exc}", file=sys.stderr)
+            print(f"  WARNING: sweep failed on {f}: {exc}", file=sys.stderr)

Copilot · 2026-04-19T20:00:43Z

+  ;(function initWaitlist(){
+    const ENDPOINT = 'https://br.staging.mempalaceofficial.com/waitlist'
+    const forms = document.querySelectorAll('.mempalace-landing .waitlist')
+    const emailRe = /^[^\s@]+@[^\s@]+\.[^\s@]+$/


The waitlist endpoint is hard-coded to a staging host (br.staging.mempalaceofficial.com). Shipping this in a release will send real user emails to a staging environment. Make the endpoint production, or gate it behind a build-time env var with a safe default (disabled).

Copilot · 2026-04-19T20:00:43Z

+onBeforeUnmount(() => {
+  if (typeof document === 'undefined') return
+  document.body.classList.remove('mempalace-active')
+})


The onMounted hook creates IntersectionObservers and registers DOM event listeners (form submit/input, replay click), but onBeforeUnmount only removes the body class. If the user navigates away and back in the SPA, observers and listeners can leak or fire against detached nodes. Consider capturing the observer instances, then disconnecting the observers and removing the listeners in onBeforeUnmount.

Non-ASCII glyphs (regression of the #681 class of Windows UnicodeEncodeError):
- mempalace/cli.py: "✗" → "ERROR:", "⚠" → "WARNING:", em dash → "-"
- mempalace/sweeper.py: "⚠" → "WARNING:"

Backend arg validation:
- mempalace/backends/chroma.py: `_normalize_get_collection_args` now raises TypeError on unexpected trailing positional args instead of silently dropping them, which surfaces call-site bugs early.

Docs site:
- website/.vitepress/config.mts: gate Google Analytics scripts behind MEMPALACE_DOCS_GA_ID env var (default off). Self-hosters no longer get GA injected unconditionally.

Landing page SPA hygiene:
- website/.vitepress/theme/landing/useLandingEffects.js: collect all IntersectionObserver disconnects and removeEventListener thunks in a shared `cleanups` registry; drain it in `onBeforeUnmount` so observers and form/replay listeners don't leak across SPA navigations.

fix: address Copilot review on release/3.3.2

Conflicts resolved by taking the 3.3.2 side for all version files:
- pyproject.toml, mempalace/version.py (3.3.2)
- .claude-plugin/marketplace.json, .claude-plugin/plugin.json (3.3.2)
- .codex-plugin/plugin.json (3.3.2)
- README.md version badge (3.3.2)
- uv.lock (3.3.2)
- CHANGELOG.md keeps [3.3.2] section on top of main's [3.3.1]

No source-code conflicts; main's 3.3.1 commit footprint is already in develop's history via the earlier sync boundaries. 1033 tests pass on the merged tree.

release: merge main into release/3.3.2

almirus and others added 30 commits April 15, 2026 21:44

feat(cli): add version display and version flag to CLI

10cdd93

Introduces a version label to the command-line interface, displaying the current MemPalace version in the help text. Adds a `--version` flag to allow users to easily check the version and exit.

fix: replace Unicode checkmark with ASCII + for Windows encoding (#535)

542b53b

Windows terminals using cp1251/cp1252 crash on the Unicode ✓ (U+2713) in progress output. Replace with ASCII + in convo_miner.py and split_mega_files.py. Co-Authored-By: Claude Opus 4.6 <[email protected]>

fix: replace all non-ASCII progress markers for Windows encoding

15ea385

Also fix miner.py checkmark and box-drawing/arrow chars (─, →) in both miner.py and split_mega_files.py that would crash on cp1251/cp1252. Co-Authored-By: Claude Opus 4.6 <[email protected]>

new landing page

9893fa2

new landing page pt 2

d8ac4c3

Merge remote-tracking branch 'upstream/develop' into feat/landing-pag…

44c525d

…e-update # Conflicts: # website/index.md

chore(website): add Google Analytics

c8727b3

Merge pull request #963 from domiscd/feat/landing-page-update

51919fe

feat(website): update landing page

Merge pull request #964 from MemPalace/fix/website-false-claims

596f3d3

fix(website): correct false claims and stale numbers in live docs

Merge pull request #918 from almirus/develop

41bff26

feat(cli): add version display and version flag to CLI

landing hero container

28d4f67

Merge remote-tracking branch 'upstream/develop' into feat/landing-pag…

8c3d1ba

…e-update Co-authored-by: Copilot <[email protected]>

(landing) added Closets section

e5f5009

(landing) svg icons animations

9e8281a

Update landing.css

2e3e0b9

Merge pull request #984 from domiscd/feat/landing-page-update

e4a2cd4

feat/landing-page: Improve landing page readability

Merge pull request #998 from MemPalace/fix/silent-transcript-drop

74a31b7

Fix silent transcript drop: .jsonl ingestion + 500 MB cap + tandem sweeper

igorls and others added 17 commits April 18, 2026 17:17

Merge pull request #999 from jphein/fix/searcher-none-metadata

1b89b49

fix(searcher): guard against None metadata in CLI print path

Merge pull request #1013 from MemPalace/fix/layer3-search-raw-none-gu…

6426669

…ard-1011 fix: guard Layer3.search_raw against None doc/meta from ChromaDB (#1011)

Merge pull request #1010 from MemPalace/fix/chromadb-1-5-4-py-3-13-co…

7af3bfa

…mpat-via-581 fix: upgrade chromadb to >=1.5.4 for Python 3.13/3.14 compatibility + fix 1.5.x queue-stall (closes #1006)

Merge pull request #1012 from MemPalace/docs/use-real-claude-projects…

8a130fc

…-path-996 docs: use real ~/.claude/projects/ path in first-run help and README (#996)

Merge pull request #1014 from MemPalace/refactor/rfc-002-sources-scaf…

66090b2

…folding refactor(sources): RFC 002 §9 scaffolding — BaseSourceAdapter, registry, PalaceContext

Merge pull request #990 from MemPalace/docs/rfc-source-adapter-plugin…

109d7f2

…-spec docs: RFC 002 — Source adapter plugin specification

Merge pull request #681 from jphein/fix/unicode-checkmark

62439e1

fix: replace Unicode checkmark with ASCII for Windows encoding (#535)

Merge pull request #1000 from jphein/fix/quarantine-stale-hnsw

caf503f

feat(backends): quarantine_stale_hnsw — recover from HNSW/sqlite drift (closes #823)

Merge pull request #1023 from jphein/pr/pid-file-guard

32ec74d

fix(hooks): PID file guard prevents stacking mine processes

igorls requested review from bensig and milla-jovovich as code owners April 19, 2026 19:56

Copilot AI review requested due to automatic review settings April 19, 2026 19:56

Copilot started reviewing on behalf of igorls April 19, 2026 19:56

Copilot AI reviewed Apr 19, 2026


igorls mentioned this pull request Apr 19, 2026

fix: address Copilot review on release/3.3.2 #1045

Merged

3 tasks

igorls added 2 commits April 20, 2026 15:16

Merge pull request #1045 from MemPalace/fix/copilot-review-release-3.3.2

04e11ae

fix: address Copilot review on release/3.3.2

igorls mentioned this pull request Apr 20, 2026

release: merge main into release/3.3.2 #1052

Merged

Merge pull request #1052 from MemPalace/merge/main-into-release-3.3.2

cf0477b

release: merge main into release/3.3.2

bensig approved these changes Apr 20, 2026


bensig merged commit 87102fb into main Apr 20, 2026
7 checks passed

bensig merged 60 commits into main from release/3.3.2


9 participants
