
fix: handle Windows file-lock errors in test cleanup #6

Closed

claudlos wants to merge 2 commits into MemPalace:main from claudlos:fix/windows-test-cleanup

Conversation


claudlos commented Apr 7, 2026

What does this PR do?

Fixes PermissionError: [WinError 32] on Windows when running pytest tests/ -v. Two tests (test_convo_mining, test_project_mining) fail during temp directory cleanup because ChromaDB's PersistentClient holds file locks on HNSW data files (data_level0.bin, etc.) even after all assertions pass.

The fix:

  • Delete ChromaDB client/collection references and call gc.collect() before shutil.rmtree, releasing most file handles
  • Use shutil.rmtree(onerror=...) to gracefully handle any files still locked by the process, letting the OS clean temp dirs on its own schedule
  • Remove unused sys imports flagged by ruff

All actual test logic is unchanged — this only affects cleanup.

How to test

# On Windows (Python 3.9+)
pip install -e ".[dev]"
pytest tests/ -v

Before this fix: 2 FAILED, 7 passed (PermissionError on rmtree)
After this fix: 9 passed

Also works on Linux/macOS (gc.collect + onerror are no-ops when there are no file locks).

Checklist

  • Tests pass (python -m pytest tests/ -v)
  • No hardcoded paths
  • Linter passes (ruff check .)

ChromaDB's PersistentClient holds file locks on HNSW data files
(data_level0.bin etc.) even after assertions pass. On Windows this
causes shutil.rmtree to raise PermissionError during temp dir cleanup.

Fix:
- Delete ChromaDB client/collection references and gc.collect() before
  rmtree, releasing most handles.
- Use shutil.rmtree(onerror=...) to gracefully skip any files still
  locked, letting the OS clean temp dirs on its own schedule.
- Remove unused sys imports flagged by ruff.

Tested on Windows 11 + Python 3.13 — all 9 tests now pass clean.
Collaborator

bensig commented Apr 7, 2026

Hey, thanks for the PR! Tests pass which is great, but CI is failing on ruff format --check . — the two test files you touched need reformatting:

Would reformat: tests/test_convo_miner.py
Would reformat: tests/test_miner.py

Should just be a quick ruff format tests/ and push. Let me know if you have any questions.

- Add _force_rmtree() with gc.collect + onerror handler to all test files
  using tempfile.mkdtemp (test_convo_miner, test_miner, test_config)
- Fix ruff format violations on test_convo_miner.py and test_miner.py
- Fix test_env_override to save/restore old env var instead of deleting it
- Use Path(tempfile.gettempdir()) instead of hardcoded /tmp for
  convomem_cache benchmark cache dir
- All checks pass: ruff format --check + ruff check
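Two of the fixes in this commit follow standard patterns worth spelling out. The `convomem_cache` directory name comes from the commit message; the `with_env` helper is an illustrative sketch of the save/restore pattern, not code from the repo:

```python
import os
import tempfile
from pathlib import Path

# Portable temp location instead of a hardcoded /tmp (fails on Windows).
CACHE_DIR = Path(tempfile.gettempdir()) / "convomem_cache"

def with_env(key, value, fn):
    """Save/restore pattern from the test_env_override fix: preserve any
    pre-existing value rather than deleting the variable outright."""
    old = os.environ.get(key)
    os.environ[key] = value
    try:
        return fn()
    finally:
        if old is None:
            os.environ.pop(key, None)
        else:
            os.environ[key] = old
```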
IgorTavcar added a commit to IgorTavcar/mempalace that referenced this pull request Apr 7, 2026
Cherry-picked from upstream PR MemPalace#6 (claudlos).
- Fix PermissionError on Windows from ChromaDB HNSW file locks
- Add _force_rmtree() with gc.collect() + onerror handler
- Fix hardcoded /tmp paths to use tempfile.gettempdir()
- Tested on Windows 11 + Python 3.13

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
igorls added a commit to igorls/mempalace that referenced this pull request Apr 7, 2026
MCP tool_add_drawer:
- Make drawer_id content-based: hash full content instead of
  content[:100] + timestamp. Same content → same ID, eliminating
  TOCTOU race conditions
- Switch from col.add() to col.upsert() so re-filing with updated
  content updates the existing drawer

miner.add_drawer:
- Switch from collection.add() to collection.upsert() so re-mining
  a modified file updates instead of silently failing
- Remove the try/except catching 'already exists' — upsert handles
  this naturally

Findings: MemPalace#11 (HIGH — add ignores updates), MemPalace#6 (MEDIUM — TOCTOU),
          MemPalace#13 (MEDIUM — non-deterministic IDs)

Includes test infrastructure from PR MemPalace#131.
92 tests pass.
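The content-based ID scheme this commit describes can be sketched as below. The function name and the 16-hex-digit truncation are illustrative assumptions; the key property — hashing the full content with no timestamp component — is what the commit specifies, and it is what lets `col.upsert()` update an existing drawer in place instead of racing or duplicating:

```python
import hashlib

def drawer_id_for(content: str) -> str:
    # Hash the FULL content (not content[:100] + timestamp): identical
    # content always maps to the same ID, so re-filing is idempotent
    # and upsert overwrites the existing row.
    return "drawer-" + hashlib.sha256(content.encode("utf-8")).hexdigest()[:16]
```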
Collaborator

bensig commented Apr 7, 2026

This has been open since day 1 and conflicts with main now. The Windows file-lock approach is addressed in the test refactoring from other contributors. Closing — thanks for the early contribution!

bensig closed this Apr 7, 2026
cktang88 mentioned this pull request Apr 7, 2026
igorls added a commit to igorls/mempalace that referenced this pull request Apr 8, 2026
gnusam pushed a commit to gnusam/mempalace-pgsql that referenced this pull request Apr 9, 2026
Port of upstream a4149ab (by @igorls, findings MemPalace#6, MemPalace#11, MemPalace#13) adapted to
the PG+pgvector backend.

1. MCP/diary path ID is now hash(content), not content[:200]+timestamp.
   Same content → same deterministic ID. Eliminates TOCTOU races and
   stops identical content from piling up as distinct duplicate rows.
   Matches upstream's intent (findings MemPalace#6 TOCTOU, MemPalace#13 non-deterministic
   IDs).

2. INSERT ... ON CONFLICT (id) DO NOTHING → ON CONFLICT (id) DO UPDATE.
   Re-mining a modified file previously got the new content silently
   dropped because the slot ID matched an existing row (upstream
   finding MemPalace#11 HIGH — "add ignores updates"). True upsert: content,
   embedding, metadata, and filed_at are all refreshed.

3. add_drawer now always returns the drawer_id (both insert and update
   paths). The old "return None on conflict" signal implicitly encoded
   the stagnation bug and is no longer meaningful.

Updated tests/test_db.py::test_mine_drawer_reupsert_by_source to assert
the new upsert semantics instead of the stagnation assertion it replaced,
and added test_mcp_add_drawer_is_idempotent_on_same_content to cover the
deterministic-content-ID path. The mtime-based file_already_mined half
of bf88daa is in a follow-up commit.

Co-Authored-By: Igor Lins e Silva <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
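The DO NOTHING → DO UPDATE change in point 2 is the classic Postgres upsert idiom. A sketch of the statement shape follows; the table and column names here are illustrative, not the actual pgvector schema:

```python
# Hypothetical schema names, matching the columns the commit says are
# refreshed (content, embedding, metadata, filed_at).
UPSERT_DRAWER_SQL = """
INSERT INTO drawers (id, content, embedding, metadata, filed_at)
VALUES (%(id)s, %(content)s, %(embedding)s, %(metadata)s, now())
ON CONFLICT (id) DO UPDATE SET
    content   = EXCLUDED.content,
    embedding = EXCLUDED.embedding,
    metadata  = EXCLUDED.metadata,
    filed_at  = EXCLUDED.filed_at
"""
```

With `DO NOTHING`, a re-mined file whose deterministic ID already existed was silently dropped; `DO UPDATE` via the `EXCLUDED` pseudo-table refreshes every mutable column instead.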
brandonhon added a commit to brandonhon/mempalace that referenced this pull request Apr 10, 2026
…tion

Addresses PR MemPalace#548 review feedback about scan amplification on large
palaces. The previous implementation made up to six ChromaDB scans per
clean operation:

  1. count_drawers(drawers_col)       — scan #1
  2. count_drawers(compressed_col)    — scan #2
  3. delete_drawers(drawers_col)      — internal get(where=...) scan #3
  4. delete_drawers(drawers_col)      — delete(where=...) scan #4
  5. delete_drawers(compressed_col)   — scan #5
  6. delete_drawers(compressed_col)   — scan #6

Each call was doing its own metadata filter over the collection,
meaning a 100K-drawer palace paid the filter cost six times for a
single cleanup. This refactor drops it to exactly two scans — one per
collection — regardless of palace size.

Changes:

  * palace.py: introduce `find_drawer_ids(col, wing, room)` which
    returns the matching ID list in a single `get(where=..., include=[])`
    call. ChromaDB fetches only IDs — no documents, embeddings, or
    metadatas — so the scan is as cheap as ChromaDB can make it.
  * palace.py: `count_drawers` is now a thin wrapper around
    `find_drawer_ids`. The old standalone `delete_drawers` helper is
    removed because its count-then-delete pattern is exactly what we
    are trying to avoid.
  * cli.py::cmd_clean: call `find_drawer_ids` once per collection,
    reuse the returned ID lists for both the preview counts and the
    subsequent delete. Deletes go through `col.delete(ids=[...])`
    which is an O(n) primary-key delete, not another metadata scan.
    Empty lists are guarded to stay compatible with ChromaDB versions
    that reject `delete()` with no ids or where filter.

Why two scans and not one:

ChromaDB collections are independent — there is no cross-collection
query. `mempalace_drawers` and `mempalace_compressed` must each be
filtered separately. A single-scan variant would have to assume that
compressed IDs are a strict subset of drawer IDs and delete compressed
by the drawer IDs we already found, but `tool_delete_drawer` in
mcp_server.py does not cascade to compressed, so real palaces can
contain orphaned compressed rows whose drawer is already gone. Going
to one scan would silently leak those orphans. Two scans is the
minimum that preserves correctness.

Tests:

  * New `test_find_drawer_ids_*` unit tests cover wing-only, wing+room,
    and no-match cases.
  * New `test_find_drawer_ids_single_scan` monkey-patches
    `collection.get` to assert exactly one call.
  * New `test_clean_scans_each_collection_exactly_once` is a
    regression test that wraps `chromadb.PersistentClient` and counts
    `where`-filtered `get()` calls per collection during a full
    `cmd_clean` invocation, failing if either collection is scanned
    more than once.
  * The existing 15 CLI black-box tests stay identical — the behavior
    is unchanged, only the scan count dropped.

Full suite: 551 passed (was 549, +2 new perf regression tests).

Also in this commit: `-V` / `--version` flags, because every good app
needs a version flag and we somehow shipped three minor releases
without one. The installed version is now embedded in the `-h`
description line too, so `mempalace -h` answers "what am I running?"
without a separate invocation.
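The single-scan helper the commit describes can be sketched as below. The exact where-clause composition is an assumption (ChromaDB requires `$and` when combining multiple metadata clauses); the signature and the `include=[]` trick are taken from the commit text:

```python
def find_drawer_ids(col, wing=None, room=None):
    """One metadata-filtered scan returning only matching IDs.

    include=[] asks ChromaDB to skip documents, embeddings, and
    metadatas, so the scan is as cheap as a filtered get can be.
    """
    clauses = []
    if wing is not None:
        clauses.append({"wing": wing})
    if room is not None:
        clauses.append({"room": room})
    if not clauses:
        where = None
    elif len(clauses) == 1:
        where = clauses[0]
    else:
        # ChromaDB rejects multiple top-level keys; compose with $and.
        where = {"$and": clauses}
    return col.get(where=where, include=[])["ids"]
```

The caller then reuses the returned list for both the preview count and `col.delete(ids=ids)`, so no second metadata scan is needed.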
rusel95 added a commit to rusel95/mempalace that referenced this pull request Apr 15, 2026
Addresses review items MemPalace#5 and MemPalace#6 from @igorls:

1. Extract core sync logic from cmd_sync (~200 lines) into mempalace/sync.py
   as sync_palace(...) returning a SyncReport dataclass. cmd_sync is now a
   thin CLI wrapper. Makes sync callable from MCP tools, tests, and future
   change-detection features (PII Guard, KG sync).

2. Replace direct chromadb.PersistentClient calls in _force_clean and
   cmd_sync with ChromaBackend.get_collection. All storage access now goes
   through the backend abstraction. _force_clean is also now a thin wrapper
   around sync.force_clean.

3. Document mempalace_sync_status in website/reference/mcp-tools.md so it
   passes test_no_undocumented_tools.

Also ran ruff format with CI-pinned 0.4.x.

All 956 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
felipetruman added a commit to felipetruman/mempalace that referenced this pull request Apr 17, 2026
…ut, tests

Addresses the six Copilot review comments on the initial commit.

1) MemPalace#6 (critical) — mcp_server.py `_get_collection` bypassed ChromaBackend

   The MCP server creates its palace collection directly via
   `chromadb.PersistentClient.get_or_create_collection` in `_get_collection`,
   not through `ChromaBackend.get_collection`. That path was missing the
   `hnsw:num_threads=1` metadata, so the primary crash surface for MemPalace#974
   and MemPalace#965 was untouched by the original patch. Fixed by passing
   `hnsw:num_threads=1` at the mcp_server create site too. Documented
   in a code comment that the setting is only honored at creation
   time — existing palaces created before this fix still need a
   `mempalace nuke` + re-mine to gain the protection.

2) MemPalace#3 — mine_global_lock over-serialized mines across unrelated palaces

   Replaced the single global lock file `mine_global.lock` with a
   per-palace lock keyed by `sha256(os.path.abspath(palace_path))`
   (`mine_palace_<hash>.lock`). Mines against the same palace still
   collapse to a single runner (the correctness boundary), but mines
   against *different* palaces are now free to run in parallel.
   `mine_global_lock` is kept as a backward-compatible alias for
   `mine_palace_lock` so any external callers that imported the
   previous name keep working.

3) MemPalace#1 — hook_precompact swallowed OSError but not subprocess.TimeoutExpired

   `subprocess.run(..., timeout=60)` raises `TimeoutExpired` on slow
   palaces. The previous `except OSError` clause didn't catch it, so
   the hook could raise and fail to emit any JSON decision — leaving
   the harness without a block/passthrough signal. Fixed by catching
   `(OSError, subprocess.TimeoutExpired)` together and always falling
   through to the block decision so the hook reliably emits a response.

4) MemPalace#2 + MemPalace#4 — tests

   - tests/test_hooks_cli.py: added
     `test_precompact_first_two_attempts_block`,
     `test_precompact_passes_through_after_cap`, and
     `test_precompact_counter_is_per_session` to lock in the MemPalace#955
     deadlock fix.
   - tests/test_palace_locks.py (new): covers `mine_palace_lock`
     single-acquire, reuse-after-release, cross-process serialization
     on the same palace, non-interference across different palaces,
     path normalization, and the `mine_global_lock` back-compat alias.

5) MemPalace#5 — known limitation, documented but not auto-fixed

   Copilot suggested detecting collections missing `hnsw:num_threads=1`
   and calling `collection.modify(metadata=...)` to retrofit existing
   palaces. Verified against chromadb 1.5.7: `modify(metadata=...)`
   replaces metadata rather than merging, and re-passing
   `hnsw:space="cosine"` then raises `ValueError: Changing the
   distance function of a collection once it is created is not
   supported currently.` The HNSW runtime configuration
   (`configuration_json`) also does not expose `num_threads` in
   chromadb 1.5.x, so the flag appears to be read only at creation
   time. Rather than paper over the limitation with a best-effort
   `modify` that silently drops `hnsw:space`, documented in the
   mcp_server comment that pre-existing palaces need a
   `mempalace nuke` + re-mine to gain the protection. Fresh palaces
   are always protected.

Testing
- pytest tests/test_palace_locks.py tests/test_hooks_cli.py
  tests/test_backends.py tests/test_cli.py → **98 passed, 0 failed**.
- Runtime validation with two concurrent `mempalace mine` calls:
  - Different palaces → both complete in parallel ✓
  - Same palace     → one completes, the other exits with
    "another `mine` is already running against <palace> — exiting
    cleanly." ✓
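The per-palace lock key from point 2 can be sketched as follows. The function names here are illustrative (the commit names the locks `mine_palace_lock` / `mine_global_lock` but does not show their code); the sha256-of-abspath keying is exactly what the commit specifies:

```python
import hashlib
import os

def mine_palace_lock_name(palace_path: str) -> str:
    # Normalize first so "palace" and "./palace" collapse to the same
    # lock; hash so the filename is short and filesystem-safe.
    normalized = os.path.abspath(palace_path)
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:12]
    return f"mine_palace_{digest}.lock"

# Backward-compatible alias, mirroring the commit's mine_global_lock shim.
mine_global_lock_name = mine_palace_lock_name
```

Same palace → same lock file (mines serialize); different palaces → distinct lock files (mines run in parallel), which is the correctness boundary the commit calls out.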
igorls pushed a commit to felipetruman/mempalace that referenced this pull request Apr 25, 2026
lealvona pushed a commit to lealvona/mempalace that referenced this pull request Apr 29, 2026