Skip to content

fix(hooks): PID file guard prevents stacking mine processes#1023

Merged
bensig merged 3 commits intoMemPalace:developfrom
jphein:pr/pid-file-guard
Apr 19, 2026
Merged

fix(hooks): PID file guard prevents stacking mine processes#1023
bensig merged 3 commits intoMemPalace:developfrom
jphein:pr/pid-file-guard

Conversation

@jphein
Copy link
Copy Markdown
Collaborator

@jphein jphein commented Apr 19, 2026

Summary

Claude Code stop hooks fire every 15 user messages. Each fire calls _maybe_auto_ingest(), which spawns mempalace mine via subprocess.Popen. Popen returns immediately, so a long-running mine never blocks the next hook fire. Over a long session this stacks mine processes — 4 concurrent mines pinning ~770% CPU observed in production on my workstation before this fix.

Root cause

_maybe_auto_ingest has no dedup. Popen(...) returns before the child has finished; the next stop-hook fire has no way to see that a previous background mine is still running, so it spawns another. Each mine competes for the same ChromaDB writer and HNSW index — result is both high CPU and contention on the palace.

Fix

Two new helpers in mempalace/hooks_cli.py:

  • _mine_already_running() — reads $STATE_DIR/mine.pid, parses it, and calls os.kill(pid, 0). Signal 0 is the documented POSIX idiom for "does this PID exist, and may I signal it?" — no signal delivered. Swallows FileNotFoundError, ValueError, ProcessLookupError, and PermissionError and returns False — any of those means "no live mine from us."
  • _spawn_mine(cmd) — performs the old Popen call plus _MINE_PID_FILE.write_text(str(proc.pid)) immediately after. Centralizes the spawn so the PID-file write can't drift out of sync with the process.

_maybe_auto_ingest checks the guard up front; if True, it logs "Skipping auto-ingest: mine already running" and returns. Otherwise it delegates to _spawn_mine.

The PID file lives at $STATE_DIR/mine.pid (default ~/.mempalace/hook_state/mine.pid) next to the existing hook.log. No new directories.

Why os.kill(pid, 0) is safe

  • Signal 0 sends no signal — pure existence probe.
  • ProcessLookupError → PID is dead → guard returns False, new mine spawned.
  • PermissionError → PID belongs to another user → treated as False; can't be our mine.
  • PID reuse is a theoretical concern but not practical at the seconds-to-minutes timescales between hook fires. Palace is single-user in typical use.

Scope

  • Only touches the background mempalace mine spawn path in _maybe_auto_ingest. _mine_sync (precompact) is left alone — it's synchronous and already self-serializing.
  • No behavior change when no mine is running: identical spawn, just with a PID write afterward.
  • No new configuration flags or env vars.

Test coverage

5 new tests in tests/test_hooks_cli.py:

  • test_mine_already_running_no_file — False when mine.pid doesn't exist
  • test_mine_already_running_dead_pid — False when PID file contains 999999999
  • test_mine_already_running_live_pid — True when PID file contains os.getpid()
  • test_mine_already_running_corrupt_file — False on a non-integer PID file
  • test_maybe_auto_ingest_skips_when_mine_running — asserts subprocess.Popen is NOT called when _mine_already_running is patched to True

Three existing tests are patched to redirect _MINE_PID_FILE into tmp_path so the suite never writes to the real ~/.mempalace/hook_state/mine.pid.

pytest tests/test_hooks_cli.py -q
51 passed in 0.17s

Test plan

  • pytest tests/test_hooks_cli.py -q — all green (51 tests)
  • Fire the stop hook twice in rapid succession within the mine duration; confirm only one mempalace mine process appears in ps
  • grep "Skipping auto-ingest" ~/.mempalace/hook_state/hook.log — guard log line on the second fire
  • Wait for first mine to exit, fire hook again, confirm fresh mine spawns (guard doesn't stick)

Every stop hook fire spawned a new background `mempalace mine` via
subprocess.Popen with no dedup — 4 concurrent mines at ~770% CPU
observed in production. Add `_mine_already_running()` (reads
`hook_state/mine.pid`, uses `os.kill(pid, 0)` as an existence check)
and `_spawn_mine()` (writes the child PID to the lock file after
Popen returns). `_maybe_auto_ingest` bails early when the guard
reports True.

Tests: 4 new unit tests for `_mine_already_running` (no file, dead
PID, live PID using `os.getpid()`, corrupt file), 1 new test
covering the skip-when-running branch of `_maybe_auto_ingest`, and
existing spawn tests patched to redirect `_MINE_PID_FILE` into
tmp_path so they don't touch the real state dir.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
Copilot AI review requested due to automatic review settings April 19, 2026 03:55
jphein added a commit to jphein/mempalace that referenced this pull request Apr 19, 2026
Both filed against upstream/develop today:
- MemPalace#1023 — PID file guard prevents stacking mine processes
- MemPalace#1024 — configurable chunk_size / chunk_overlap / min_chunk_size

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Prevents Claude Code stop-hook auto-ingest from spawning multiple concurrent mempalace mine processes by adding a PID-file guard.

Changes:

  • Add PID-file based “mine already running” detection and a centralized mine spawn helper.
  • Update _maybe_auto_ingest to skip spawning when an existing mine process is detected.
  • Add/adjust tests to cover PID guard behavior and ensure PID-file location is redirected in relevant unit tests.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
mempalace/hooks_cli.py Introduces mine.pid guard logic and routes _maybe_auto_ingest through _spawn_mine.
tests/test_hooks_cli.py Adds tests for PID-guard behavior and updates existing auto-ingest tests to avoid writing PID state outside temp dirs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread mempalace/hooks_cli.py
def _mine_already_running() -> bool:
"""Return True if a background mine process from a previous hook fire is still alive."""
try:
pid = int(_MINE_PID_FILE.read_text().strip())
Copy link

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_mine_already_running() should reject non-positive PIDs (e.g., 0, -1). On POSIX, os.kill(0, 0) targets the current process group and os.kill(-1, 0) checks “all processes”, which can incorrectly return success and permanently suppress auto-ingest until the PID file is removed. Add an explicit if pid <= 0: return False (or raise ValueError) after parsing.

Suggested change
pid = int(_MINE_PID_FILE.read_text().strip())
pid = int(_MINE_PID_FILE.read_text().strip())
if pid <= 0:
return False

Copilot uses AI. Check for mistakes.
Comment thread mempalace/hooks_cli.py Outdated
Comment on lines +153 to +160
_MINE_PID_FILE = STATE_DIR / "mine.pid"


def _mine_already_running() -> bool:
"""Return True if a background mine process from a previous hook fire is still alive."""
try:
pid = int(_MINE_PID_FILE.read_text().strip())
os.kill(pid, 0) # signal 0 = existence check, no actual signal sent
Copy link

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_MINE_PID_FILE is computed from STATE_DIR at import time, so patching/changing STATE_DIR later won’t affect where the PID file is read/written. This already forces tests to patch both STATE_DIR and _MINE_PID_FILE. Consider deriving the PID path from STATE_DIR inside _mine_already_running()/_spawn_mine() (or via a small helper) to keep state-location overrides consistent.

Copilot uses AI. Check for mistakes.
Comment thread mempalace/hooks_cli.py
log_path = STATE_DIR / "hook.log"
with open(log_path, "a") as log_f:
proc = subprocess.Popen(cmd, stdout=log_f, stderr=log_f)
_MINE_PID_FILE.write_text(str(proc.pid))
Copy link

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the subprocess spawns successfully but writing mine.pid fails (permissions/disk full), the guard won’t work and the error is silently swallowed by _maybe_auto_ingest’s broad OSError handler. Handle PID-file write failures explicitly (e.g., catch OSError around write_text and log a warning) so it’s diagnosable and you can choose whether to keep or terminate the spawned process.

Suggested change
_MINE_PID_FILE.write_text(str(proc.pid))
try:
_MINE_PID_FILE.write_text(str(proc.pid))
except OSError as e:
_log(f"Failed to write mine PID file {_MINE_PID_FILE}: {e}")
try:
proc.terminate()
try:
proc.wait(timeout=5)
except subprocess.TimeoutExpired:
proc.kill()
proc.wait(timeout=5)
except OSError as terminate_error:
_log(
f"Failed to terminate mine process {proc.pid} after PID-file write failure: "
f"{terminate_error}"
)

Copilot uses AI. Check for mistakes.
Comment thread tests/test_hooks_cli.py
Comment on lines 249 to +258
def test_maybe_auto_ingest_with_env(tmp_path):
"""With MEMPAL_DIR set to a valid directory, spawns subprocess."""
mempal_dir = tmp_path / "project"
mempal_dir.mkdir()
with patch.dict("os.environ", {"MEMPAL_DIR": str(mempal_dir)}):
with patch("mempalace.hooks_cli.STATE_DIR", tmp_path):
with patch("mempalace.hooks_cli.subprocess.Popen") as mock_popen:
_maybe_auto_ingest()
mock_popen.assert_called_once()
with patch("mempalace.hooks_cli._MINE_PID_FILE", tmp_path / "mine.pid"):
with patch("mempalace.hooks_cli.subprocess.Popen") as mock_popen:
_maybe_auto_ingest()
mock_popen.assert_called_once()
Copy link

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The updated _maybe_auto_ingest tests verify that subprocess.Popen is called, but they don’t assert the new behavior that a PID is written to the PID file (the core of the guard). Add an assertion that mine.pid is created/updated with the spawned PID when Popen succeeds (mock the proc.pid).

Copilot uses AI. Check for mistakes.
…in OSError

On Windows, os.kill(bogus_pid, 0) raises OSError[WinError 87]
"The parameter is incorrect" — NOT ProcessLookupError. The old
except tuple missed it, so test_mine_already_running_dead_pid
failed on Windows CI.

Catching OSError covers ProcessLookupError + PermissionError +
FileNotFoundError on POSIX and WinError 87 on Windows. ValueError
still guards the int() parse.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
jphein added a commit to jphein/mempalace that referenced this pull request Apr 19, 2026
…ses plain OSError

On Windows, os.kill(bogus_pid, 0) raises OSError[WinError 87]
"The parameter is incorrect" — NOT ProcessLookupError. The old
except tuple missed it, so test_mine_already_running_dead_pid
failed on Windows CI.

Catching OSError covers ProcessLookupError + PermissionError +
FileNotFoundError on POSIX and WinError 87 on Windows. ValueError
still guards the int() parse.

Fix surfaced on upstream CI for PR MemPalace#1023 (mirror branch).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Real bug surfaced on CI for this PR. On POSIX, os.kill(pid, 0) is
the canonical no-op existence probe. On Windows, Python's os.kill
maps to TerminateProcess(handle, sig), which *terminates* the target
with exit code sig. os.kill(pid, 0) therefore kills the target with
exit code 0 — silently destroying our mine child (or, as happened
in test_mine_already_running_live_pid, the pytest process itself).

Fix: split into _pid_alive(pid) helper with a Windows branch using
ctypes.windll.kernel32.OpenProcess + GetExitCodeProcess.
PROCESS_QUERY_LIMITED_INFORMATION opens a handle only if the PID
exists; STILL_ACTIVE (259) distinguishes running from exited processes.

No new dependencies — stdlib ctypes.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@jphein jphein force-pushed the pr/pid-file-guard branch from 2094c5c to dfba247 Compare April 19, 2026 04:30
jphein added a commit to jphein/mempalace that referenced this pull request Apr 19, 2026
…NATES the target

Real bug surfaced on Windows CI for MemPalace#1023. On POSIX, os.kill(pid, 0)
is the canonical no-op existence probe. On Windows, Python's os.kill
maps to TerminateProcess(handle, sig), which *terminates* the target
with exit code sig. os.kill(pid, 0) therefore kills the target with
exit code 0 — silently destroying our mine child (or, as in the test,
the pytest process itself).

Fix: split into _pid_alive(pid) helper with a Windows branch using
ctypes.windll.kernel32.OpenProcess + GetExitCodeProcess.
PROCESS_QUERY_LIMITED_INFORMATION returns a handle only if the PID
exists; STILL_ACTIVE (259) distinguishes running from exited processes.

No new dependencies — stdlib ctypes.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@bensig
Copy link
Copy Markdown
Collaborator

bensig commented Apr 19, 2026

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code

@bensig bensig merged commit 32ec74d into MemPalace:develop Apr 19, 2026
6 checks passed
jphein added a commit to jphein/mempalace that referenced this pull request Apr 19, 2026
…emPalace#1036

Overnight/morning:
- MemPalace#681, MemPalace#1000, MemPalace#1023 merged — moved from "Still ahead" to "Merged upstream (post-v3.3.1)"
- bensig reviewed MemPalace#659 (wing_ prefix + agent filter) and MemPalace#1021 (silent_guard
  default) — both addressed on their PR branches
- MemPalace#673 needed re-rebase after overnight develop merges; done
- MemPalace#1036 filed: paginate miner.status(), closes upstream MemPalace#802 and MemPalace#1015

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
jphein added a commit to jphein/mempalace that referenced this pull request Apr 19, 2026
Scanned all 233 open upstream PRs today against our open PRs and
fork-ahead / planned-work items. Findings merged into README:

- P2 (decay) and P3 Tier-0 (LLM rerank): both covered by MemPalace#1032
  (@zackchiutw, MERGEABLE, 2026-04-19 — Weibull decay + 4-stage
  rerank pipeline). Older simpler version at MemPalace#337. Dropped as
  fork work; watching MemPalace#1032.
- P7 (alternative storage): formally out of scope. RFC 001 MemPalace#743
  (@igorls) defines the plugin contract; four backend PRs already
  in flight (MemPalace#700, MemPalace#381 Qdrant; MemPalace#574, MemPalace#575 LanceDB). Fork consumes,
  does not rebuild.
- P0 (multi-label tags): still fork/upstream candidate. MemPalace#1033
  (@zackchiutw) ships adjacent privacy-tag + progressive disclosure
  but not the full multi-label scheme.
- Merged MemPalace#1023 section acknowledges complementary MemPalace#976 (felipetruman)
  which adds broader mine_global_lock() + HNSW num_threads pin.

Gives future-us a map so we don't re-file MemPalace#1036-style duplicates.
jphein added a commit to jphein/mempalace that referenced this pull request Apr 19, 2026
Changes (all 7 approved items):
1. Tighten line-15 status line to keep only non-redundant bits
   (test count, Discussion MemPalace#1017, issues link)
2. Remove "Fork Changes / Headlines" subsection — duplicated the
   three differentiators already in the lead paragraph
3. Remove "Planned work / Done" — 5 bullets of PR status already
   in the "Open upstream PRs" table
4. Trim P2 "original design notes" — MemPalace#1032 is MERGEABLE; kept
   the prune CLI fork-opportunity note
5. Tighten MemPalace#1023 row in "Merged upstream" — moved the MemPalace#976 mapping
   detail to a single compact sentence
6. Collapse "Pulled in from upstream v3.3.1" from 6 bullets to
   one paragraph pointing at the release notes
7. Prune three oldest "Superseded by upstream" bullets (epsilon
   mtime, .jsonl, max_distance — all upstream-authored or shipped
   by upstream before the fork claimed them as contributions)
8. Drop P7 "Original fork-mode design notes" (flashcard/AAAK/
   diary modes) now that the section explicitly marks P7 as
   dropped fork work

Net: 345 → 316 lines. No unique info removed; anything cut is
either duplicated elsewhere in the same README or retrievable
from the cited PR/release link.
igorls added a commit that referenced this pull request Apr 19, 2026
Version bumps across pyproject.toml, mempalace/version.py, README badge,
uv.lock, and plugin manifests (.claude-plugin/*, .codex-plugin/*).

CHANGELOG aligned with main (post-3.3.1) and a new [3.3.2] section added
covering the 11 PRs merged on develop since v3.3.1 — silent-transcript-drop
fix + tandem sweeper (#998), None-metadata guards (#999, #1013),
chromadb ≥1.5.4 for Py 3.13/3.14 (#1010), Windows Unicode (#681),
HNSW quarantine recovery (#1000), PID stacking guard (#1023), doc-path
cleanup (#996, #1012), and RFC 001/002 internal scaffolding (#995, #1014, #990).
@igorls igorls mentioned this pull request Apr 19, 2026
8 tasks
jphein added a commit to jphein/mempalace that referenced this pull request Apr 21, 2026
Resolutions:
- `.claude-plugin/.mcp.json`, `plugin.json` — adopt upstream's `mempalace-mcp`
  console-script command (added via upstream MemPalace#340 for pipx/uv). Run
  `pip install -e .` in plugin venv after merge to install the entry point.
- `.claude-plugin/hooks/*.sh`, `hooks/*.sh` — adopt upstream's console-command
  resolution order (`mempalace` script → `python3 -m mempalace` → `python`).
  `MEMPAL_PYTHON` override still works inside `hooks_cli.py`.
- `mempalace/hooks_cli.py`, `tests/test_hooks_cli.py` — keep fork's
  `_mempalace_python()` helper (fork-ahead #4); upstream only had
  `sys.executable`, which loses MEMPAL_PYTHON override.
- `mempalace/miner.py` — keep fork's concurrent mining path (fork-ahead #1),
  apply upstream's unicode-`✓` → ASCII-`+` fix (MemPalace#681) to both paths.
- `mempalace/backends/chroma.py` — take upstream's refined `quarantine_stale_hnsw`
  docstring (it's the version merged via our own MemPalace#1000).

Brought in: 33 upstream commits including Belarusian/Chinese/German/Spanish/French
entity detection, console-script entry points, hook plugin-root space quoting,
and v3.3.2 tag (which contains our MemPalace#681/MemPalace#1000/MemPalace#1023).

Tests: 1096 passed, 106 deselected (benchmarks). Ruff clean.
jphein added a commit to jphein/mempalace that referenced this pull request Apr 21, 2026
…#1023 merged, MemPalace#673 rebased

- Bump fork-ahead section header to "after v3.3.2"
- Strike #11 (quarantine_stale_hnsw) and MemPalace#18 (PID file guard) as
  merged-into-upstream-via-v3.3.2, keep entries for traceability
- Add v3.3.2-shipped items to "Merged into upstream (post-v3.3.1)"
- Rebuild PR table: 10 merged / 7 open / 7 closed; add MemPalace#1024 row,
  reclassify MemPalace#681/MemPalace#1000/MemPalace#1023 as merged, note MemPalace#673 rebased 2026-04-21
- Annotate MemPalace#661 status with the GitHub review-state machine caveat
  (CHANGES_REQUESTED persists until reviewer dismisses, not owed)
- Bump test count 1063 → 1096 post-merge
jphein added a commit to jphein/mempalace that referenced this pull request Apr 21, 2026
…unts

- Upstream released v3.3.2 on 2026-04-21 (our MemPalace#681/MemPalace#1000/MemPalace#1023)
- Drawer count 152,682 → 165,632; wings 23 → 28; tests 1063 → 1096
- MemPalace#673 now MERGEABLE after fresh rebase + squash to 1 commit
- MemPalace#661 status clarified: CHANGES_REQUESTED persists until reviewer
  dismisses, not an outstanding owe
- Merged-upstream section split out v3.3.2 release group
@jphein jphein deleted the pr/pid-file-guard branch April 25, 2026 14:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants