fix(hooks): PID file guard prevents stacking mine processes#1023
fix(hooks): PID file guard prevents stacking mine processes#1023bensig merged 3 commits intoMemPalace:developfrom
Conversation
Every stop hook fire spawned a new background `mempalace mine` via subprocess.Popen with no dedup — 4 concurrent mines at ~770% CPU observed in production. Add `_mine_already_running()` (reads `hook_state/mine.pid`, uses `os.kill(pid, 0)` as an existence check) and `_spawn_mine()` (writes the child PID to the lock file after Popen returns). `_maybe_auto_ingest` bails early when the guard reports True. Tests: 4 new unit tests for `_mine_already_running` (no file, dead PID, live PID using `os.getpid()`, corrupt file), 1 new test covering the skip-when-running branch of `_maybe_auto_ingest`, and existing spawn tests patched to redirect `_MINE_PID_FILE` into tmp_path so they don't touch the real state dir. Co-Authored-By: Claude Opus 4.7 <[email protected]>
Both filed against upstream/develop today: - MemPalace#1023 — PID file guard prevents stacking mine processes - MemPalace#1024 — configurable chunk_size / chunk_overlap / min_chunk_size Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
There was a problem hiding this comment.
Pull request overview
Prevents Claude Code stop-hook auto-ingest from spawning multiple concurrent mempalace mine processes by adding a PID-file guard.
Changes:
- Add PID-file based “mine already running” detection and a centralized mine spawn helper.
- Update
_maybe_auto_ingestto skip spawning when an existing mine process is detected. - Add/adjust tests to cover PID guard behavior and ensure PID-file location is redirected in relevant unit tests.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
mempalace/hooks_cli.py |
Introduces mine.pid guard logic and routes _maybe_auto_ingest through _spawn_mine. |
tests/test_hooks_cli.py |
Adds tests for PID-guard behavior and updates existing auto-ingest tests to avoid writing PID state outside temp dirs. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def _mine_already_running() -> bool: | ||
| """Return True if a background mine process from a previous hook fire is still alive.""" | ||
| try: | ||
| pid = int(_MINE_PID_FILE.read_text().strip()) |
There was a problem hiding this comment.
_mine_already_running() should reject non-positive PIDs (e.g., 0, -1). On POSIX, os.kill(0, 0) targets the current process group and os.kill(-1, 0) checks “all processes”, which can incorrectly return success and permanently suppress auto-ingest until the PID file is removed. Add an explicit if pid <= 0: return False (or raise ValueError) after parsing.
| pid = int(_MINE_PID_FILE.read_text().strip()) | |
| pid = int(_MINE_PID_FILE.read_text().strip()) | |
| if pid <= 0: | |
| return False |
| _MINE_PID_FILE = STATE_DIR / "mine.pid" | ||
|
|
||
|
|
||
| def _mine_already_running() -> bool: | ||
| """Return True if a background mine process from a previous hook fire is still alive.""" | ||
| try: | ||
| pid = int(_MINE_PID_FILE.read_text().strip()) | ||
| os.kill(pid, 0) # signal 0 = existence check, no actual signal sent |
There was a problem hiding this comment.
_MINE_PID_FILE is computed from STATE_DIR at import time, so patching/changing STATE_DIR later won’t affect where the PID file is read/written. This already forces tests to patch both STATE_DIR and _MINE_PID_FILE. Consider deriving the PID path from STATE_DIR inside _mine_already_running()/_spawn_mine() (or via a small helper) to keep state-location overrides consistent.
| log_path = STATE_DIR / "hook.log" | ||
| with open(log_path, "a") as log_f: | ||
| proc = subprocess.Popen(cmd, stdout=log_f, stderr=log_f) | ||
| _MINE_PID_FILE.write_text(str(proc.pid)) |
There was a problem hiding this comment.
If the subprocess spawns successfully but writing mine.pid fails (permissions/disk full), the guard won’t work and the error is silently swallowed by _maybe_auto_ingest’s broad OSError handler. Handle PID-file write failures explicitly (e.g., catch OSError around write_text and log a warning) so it’s diagnosable and you can choose whether to keep or terminate the spawned process.
| _MINE_PID_FILE.write_text(str(proc.pid)) | |
| try: | |
| _MINE_PID_FILE.write_text(str(proc.pid)) | |
| except OSError as e: | |
| _log(f"Failed to write mine PID file {_MINE_PID_FILE}: {e}") | |
| try: | |
| proc.terminate() | |
| try: | |
| proc.wait(timeout=5) | |
| except subprocess.TimeoutExpired: | |
| proc.kill() | |
| proc.wait(timeout=5) | |
| except OSError as terminate_error: | |
| _log( | |
| f"Failed to terminate mine process {proc.pid} after PID-file write failure: " | |
| f"{terminate_error}" | |
| ) |
| def test_maybe_auto_ingest_with_env(tmp_path): | ||
| """With MEMPAL_DIR set to a valid directory, spawns subprocess.""" | ||
| mempal_dir = tmp_path / "project" | ||
| mempal_dir.mkdir() | ||
| with patch.dict("os.environ", {"MEMPAL_DIR": str(mempal_dir)}): | ||
| with patch("mempalace.hooks_cli.STATE_DIR", tmp_path): | ||
| with patch("mempalace.hooks_cli.subprocess.Popen") as mock_popen: | ||
| _maybe_auto_ingest() | ||
| mock_popen.assert_called_once() | ||
| with patch("mempalace.hooks_cli._MINE_PID_FILE", tmp_path / "mine.pid"): | ||
| with patch("mempalace.hooks_cli.subprocess.Popen") as mock_popen: | ||
| _maybe_auto_ingest() | ||
| mock_popen.assert_called_once() |
There was a problem hiding this comment.
The updated _maybe_auto_ingest tests verify that subprocess.Popen is called, but they don’t assert the new behavior that a PID is written to the PID file (the core of the guard). Add an assertion that mine.pid is created/updated with the spawned PID when Popen succeeds (mock the proc.pid).
…in OSError On Windows, os.kill(bogus_pid, 0) raises OSError[WinError 87] "The parameter is incorrect" — NOT ProcessLookupError. The old except tuple missed it, so test_mine_already_running_dead_pid failed on Windows CI. Catching OSError covers ProcessLookupError + PermissionError + FileNotFoundError on POSIX and WinError 87 on Windows. ValueError still guards the int() parse. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…ses plain OSError On Windows, os.kill(bogus_pid, 0) raises OSError[WinError 87] "The parameter is incorrect" — NOT ProcessLookupError. The old except tuple missed it, so test_mine_already_running_dead_pid failed on Windows CI. Catching OSError covers ProcessLookupError + PermissionError + FileNotFoundError on POSIX and WinError 87 on Windows. ValueError still guards the int() parse. Fix surfaced on upstream CI for PR MemPalace#1023 (mirror branch). Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Real bug surfaced on CI for this PR. On POSIX, os.kill(pid, 0) is the canonical no-op existence probe. On Windows, Python's os.kill maps to TerminateProcess(handle, sig), which *terminates* the target with exit code sig. os.kill(pid, 0) therefore kills the target with exit code 0 — silently destroying our mine child (or, as happened in test_mine_already_running_live_pid, the pytest process itself). Fix: split into _pid_alive(pid) helper with a Windows branch using ctypes.windll.kernel32.OpenProcess + GetExitCodeProcess. PROCESS_QUERY_LIMITED_INFORMATION opens a handle only if the PID exists; STILL_ACTIVE (259) distinguishes running from exited processes. No new dependencies — stdlib ctypes. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
2094c5c to
dfba247
Compare
…NATES the target Real bug surfaced on Windows CI for MemPalace#1023. On POSIX, os.kill(pid, 0) is the canonical no-op existence probe. On Windows, Python's os.kill maps to TerminateProcess(handle, sig), which *terminates* the target with exit code sig. os.kill(pid, 0) therefore kills the target with exit code 0 — silently destroying our mine child (or, as in the test, the pytest process itself). Fix: split into _pid_alive(pid) helper with a Windows branch using ctypes.windll.kernel32.OpenProcess + GetExitCodeProcess. PROCESS_QUERY_LIMITED_INFORMATION returns a handle only if the PID exists; STILL_ACTIVE (259) distinguishes running from exited processes. No new dependencies — stdlib ctypes. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Code reviewNo issues found. Checked for bugs and CLAUDE.md compliance. 🤖 Generated with Claude Code |
…emPalace#1036 Overnight/morning: - MemPalace#681, MemPalace#1000, MemPalace#1023 merged — moved from "Still ahead" to "Merged upstream (post-v3.3.1)" - bensig reviewed MemPalace#659 (wing_ prefix + agent filter) and MemPalace#1021 (silent_guard default) — both addressed on their PR branches - MemPalace#673 needed re-rebase after overnight develop merges; done - MemPalace#1036 filed: paginate miner.status(), closes upstream MemPalace#802 and MemPalace#1015 Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Scanned all 233 open upstream PRs today against our open PRs and fork-ahead / planned-work items. Findings merged into README: - P2 (decay) and P3 Tier-0 (LLM rerank): both covered by MemPalace#1032 (@zackchiutw, MERGEABLE, 2026-04-19 — Weibull decay + 4-stage rerank pipeline). Older simpler version at MemPalace#337. Dropped as fork work; watching MemPalace#1032. - P7 (alternative storage): formally out of scope. RFC 001 MemPalace#743 (@igorls) defines the plugin contract; four backend PRs already in flight (MemPalace#700, MemPalace#381 Qdrant; MemPalace#574, MemPalace#575 LanceDB). Fork consumes, does not rebuild. - P0 (multi-label tags): still fork/upstream candidate. MemPalace#1033 (@zackchiutw) ships adjacent privacy-tag + progressive disclosure but not the full multi-label scheme. - Merged MemPalace#1023 section acknowledges complementary MemPalace#976 (felipetruman) which adds broader mine_global_lock() + HNSW num_threads pin. Gives future-us a map so we don't re-file MemPalace#1036-style duplicates.
Changes (all 7 approved items): 1. Tighten line-15 status line to keep only non-redundant bits (test count, Discussion MemPalace#1017, issues link) 2. Remove "Fork Changes / Headlines" subsection — duplicated the three differentiators already in the lead paragraph 3. Remove "Planned work / Done" — 5 bullets of PR status already in the "Open upstream PRs" table 4. Trim P2 "original design notes" — MemPalace#1032 is MERGEABLE; kept the prune CLI fork-opportunity note 5. Tighten MemPalace#1023 row in "Merged upstream" — moved the MemPalace#976 mapping detail to a single compact sentence 6. Collapse "Pulled in from upstream v3.3.1" from 6 bullets to one paragraph pointing at the release notes 7. Prune three oldest "Superseded by upstream" bullets (epsilon mtime, .jsonl, max_distance — all upstream-authored or shipped by upstream before the fork claimed them as contributions) 8. Drop P7 "Original fork-mode design notes" (flashcard/AAAK/ diary modes) now that the section explicitly marks P7 as dropped fork work Net: 345 → 316 lines. No unique info removed; anything cut is either duplicated elsewhere in the same README or retrievable from the cited PR/release link.
Version bumps across pyproject.toml, mempalace/version.py, README badge, uv.lock, and plugin manifests (.claude-plugin/*, .codex-plugin/*). CHANGELOG aligned with main (post-3.3.1) and a new [3.3.2] section added covering the 11 PRs merged on develop since v3.3.1 — silent-transcript-drop fix + tandem sweeper (#998), None-metadata guards (#999, #1013), chromadb ≥1.5.4 for Py 3.13/3.14 (#1010), Windows Unicode (#681), HNSW quarantine recovery (#1000), PID stacking guard (#1023), doc-path cleanup (#996, #1012), and RFC 001/002 internal scaffolding (#995, #1014, #990).
Resolutions: - `.claude-plugin/.mcp.json`, `plugin.json` — adopt upstream's `mempalace-mcp` console-script command (added via upstream MemPalace#340 for pipx/uv). Run `pip install -e .` in plugin venv after merge to install the entry point. - `.claude-plugin/hooks/*.sh`, `hooks/*.sh` — adopt upstream's console-command resolution order (`mempalace` script → `python3 -m mempalace` → `python`). `MEMPAL_PYTHON` override still works inside `hooks_cli.py`. - `mempalace/hooks_cli.py`, `tests/test_hooks_cli.py` — keep fork's `_mempalace_python()` helper (fork-ahead #4); upstream only had `sys.executable`, which loses MEMPAL_PYTHON override. - `mempalace/miner.py` — keep fork's concurrent mining path (fork-ahead #1), apply upstream's unicode-`✓` → ASCII-`+` fix (MemPalace#681) to both paths. - `mempalace/backends/chroma.py` — take upstream's refined `quarantine_stale_hnsw` docstring (it's the version merged via our own MemPalace#1000). Brought in: 33 upstream commits including Belarusian/Chinese/German/Spanish/French entity detection, console-script entry points, hook plugin-root space quoting, and v3.3.2 tag (which contains our MemPalace#681/MemPalace#1000/MemPalace#1023). Tests: 1096 passed, 106 deselected (benchmarks). Ruff clean.
…#1023 merged, MemPalace#673 rebased - Bump fork-ahead section header to "after v3.3.2" - Strike #11 (quarantine_stale_hnsw) and MemPalace#18 (PID file guard) as merged-into-upstream-via-v3.3.2, keep entries for traceability - Add v3.3.2-shipped items to "Merged into upstream (post-v3.3.1)" - Rebuild PR table: 10 merged / 7 open / 7 closed; add MemPalace#1024 row, reclassify MemPalace#681/MemPalace#1000/MemPalace#1023 as merged, note MemPalace#673 rebased 2026-04-21 - Annotate MemPalace#661 status with the GitHub review-state machine caveat (CHANGES_REQUESTED persists until reviewer dismisses, not owed) - Bump test count 1063 → 1096 post-merge
…unts - Upstream released v3.3.2 on 2026-04-21 (our MemPalace#681/MemPalace#1000/MemPalace#1023) - Drawer count 152,682 → 165,632; wings 23 → 28; tests 1063 → 1096 - MemPalace#673 now MERGEABLE after fresh rebase + squash to 1 commit - MemPalace#661 status clarified: CHANGES_REQUESTED persists until reviewer dismisses, not an outstanding owe - Merged-upstream section split out v3.3.2 release group
Summary
Claude Code stop hooks fire every 15 user messages. Each fire calls
_maybe_auto_ingest(), which spawnsmempalace mineviasubprocess.Popen.Popenreturns immediately, so a long-running mine never blocks the next hook fire. Over a long session this stacks mine processes — 4 concurrent mines pinning ~770% CPU observed in production on my workstation before this fix.Root cause
_maybe_auto_ingesthas no dedup.Popen(...)returns before the child has finished; the next stop-hook fire has no way to see that a previous background mine is still running, so it spawns another. Each mine competes for the same ChromaDB writer and HNSW index — result is both high CPU and contention on the palace.Fix
Two new helpers in
mempalace/hooks_cli.py:_mine_already_running()— reads$STATE_DIR/mine.pid, parses it, and callsos.kill(pid, 0). Signal 0 is the documented POSIX idiom for "does this PID exist, and may I signal it?" — no signal delivered. SwallowsFileNotFoundError,ValueError,ProcessLookupError, andPermissionErrorand returnsFalse— any of those means "no live mine from us."_spawn_mine(cmd)— performs the oldPopencall plus_MINE_PID_FILE.write_text(str(proc.pid))immediately after. Centralizes the spawn so the PID-file write can't drift out of sync with the process._maybe_auto_ingestchecks the guard up front; if True, it logs"Skipping auto-ingest: mine already running"and returns. Otherwise it delegates to_spawn_mine.The PID file lives at
$STATE_DIR/mine.pid(default~/.mempalace/hook_state/mine.pid) next to the existinghook.log. No new directories.Why
os.kill(pid, 0)is safe0sends no signal — pure existence probe.ProcessLookupError→ PID is dead → guard returnsFalse, new mine spawned.PermissionError→ PID belongs to another user → treated asFalse; can't be our mine.Scope
mempalace minespawn path in_maybe_auto_ingest._mine_sync(precompact) is left alone — it's synchronous and already self-serializing.Test coverage
5 new tests in
tests/test_hooks_cli.py:test_mine_already_running_no_file— False whenmine.piddoesn't existtest_mine_already_running_dead_pid— False when PID file contains999999999test_mine_already_running_live_pid— True when PID file containsos.getpid()test_mine_already_running_corrupt_file— False on a non-integer PID filetest_maybe_auto_ingest_skips_when_mine_running— assertssubprocess.Popenis NOT called when_mine_already_runningis patched to TrueThree existing tests are patched to redirect
_MINE_PID_FILEintotmp_pathso the suite never writes to the real~/.mempalace/hook_state/mine.pid.Test plan
pytest tests/test_hooks_cli.py -q— all green (51 tests)mempalace mineprocess appears inpsgrep "Skipping auto-ingest" ~/.mempalace/hook_state/hook.log— guard log line on the second fire