Stop hook spawns concurrent `mempalace mine` processes that survive across sessions and bypass PID guard #1212

  ## Summary

  The Claude Code Stop hook (`hooks/mempal-stop-hook.sh` → `mempalace hook run --hook stop`) spawns `mempalace mine` subprocesses that:

  1. Run **concurrently** for the same source path, in spite of the PID-file guard added in #1023.
  2. **Survive across Claude Code sessions** as orphaned processes — they remain alive after the IDE that triggered them is closed and after the parent shell exits.
  3. **Re-spawn within seconds** of being killed, as long as the IDE is still running and the hook is still wired up — making manual cleanup ineffective without disabling the binary itself.

  Combined with #344 (unbounded growth of the HNSW `link_lists.bin`, addressed by PR #1191, which was still open at the time of writing) and the absence of a "max concurrent mines" cap, this can fill a 1.8 TB disk in roughly two hours of normal IDE use.

  ## Environment

  - MemPalace: **3.3.3** (also reproducible on 3.3.2)
  - ChromaDB: **1.5.8**
  - Claude Code: **2.1.119**
  - Plugin: `~/.claude/plugins/cache/mempalace/mempalace/3.3.2` (hooks/hooks.json registers `Stop` and `PreCompact`)
  - OS: macOS 15 (Darwin 25.3.0), Apple Silicon
  - Source dir mined: `~/.claude/projects/-Users-...-recipe-as-graph` (~684 MB of jsonl session transcripts)

  ## Steps to Reproduce

  1. Use Claude Code with the MemPalace plugin enabled.
  2. Open a project that generates substantial transcript volume in `~/.claude/projects/<encoded-path>/`.
  3. Let the Stop hook fire normally over a long session.
  4. Run `pgrep -fl 'mempalace mine'` periodically.
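Step 4 can be made less manual with a small helper that flags the moment two mines target the same source. This is a minimal sketch, assuming `pgrep -fl` prints one `PID command-line` pair per line as it does in the macOS output quoted in this report, and that the source path is the first argument after `mine`. The function names are hypothetical and not part of MemPalace.

```python
import subprocess
from collections import defaultdict


def group_mines_by_source(pgrep_output: str) -> dict[str, list[int]]:
    """Map each mined source path to the PIDs currently mining it."""
    by_source: dict[str, list[int]] = defaultdict(list)
    for line in pgrep_output.splitlines():
        parts = line.split()
        if not parts or not parts[0].isdigit() or "mine" not in parts:
            continue
        idx = parts.index("mine")
        # Assumption: the source path directly follows the `mine` subcommand
        # and contains no spaces.
        if idx + 1 < len(parts):
            by_source[parts[idx + 1]].append(int(parts[0]))
    return dict(by_source)


def concurrent_mines() -> dict[str, list[int]]:
    """Return only the sources that have more than one live mine process."""
    result = subprocess.run(
        ["pgrep", "-fl", "mempalace mine"], capture_output=True, text=True
    )
    groups = group_mines_by_source(result.stdout)
    return {src: pids for src, pids in groups.items() if len(pids) > 1}
```

Calling `concurrent_mines()` in a loop with a sleep between iterations gives a timestamped record of when the second process appears relative to Stop hook firings.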

  ## Observed Behavior

  After ~1 hour of normal use:

  $ pgrep -fl 'mempalace mine'
  1568  python -m mempalace mine /Users/.../-Users-...-recipe-as-graph --mode convos --wing sessions
  1569  python -m mempalace mine /Users/.../-Users-...-recipe-as-graph

  Two `mempalace mine` processes run against the same source path, alive at the same time and writing to the same palace. `~/.mempalace/hook_state/mine.pid` contains the PID of one of them; the other started anyway.

  After `kill -9 1568 1569`:

  $ pgrep -fl 'mempalace mine'
  45810  python -m mempalace mine ... --mode convos --wing sessions
  45811  python -m mempalace mine ...

  New PIDs within seconds. The Stop hook re-fired and the PID guard accepted the new processes because the previous PIDs were now dead.

  The only way to stop the loop without quitting the IDE was to rename the `mempalace` binary out of `$PATH` so the hook script's `command -v mempalace` check returns false:

  ```bash
  mv ~/.local/bin/mempalace ~/.local/bin/mempalace.disabled
  mv ~/.local/bin/mempalace-mcp ~/.local/bin/mempalace-mcp.disabled
  ```

  ## Impact

  Concurrent mines on the same source issue duplicate `upsert()` calls with identical drawer IDs against the same ChromaDB collection. With #344 not yet fixed (PR #1191 still open at time of writing), each duplicate upsert grows `link_lists.bin` without bound:

  -rw-r--r--  9_732_718_377_168 bytes  link_lists.bin    # 9.7 TB apparent size, ~1 TB actual disk

  This filled a 1.8 TB disk to 100% in roughly two hours of normal IDE use. After `~/.mempalace` was deleted, the hook re-spawned mines that recreated the directory within seconds, so disabling the binary was the only way to break the loop.
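The gap between the 9.7 TB apparent size and the roughly 1 TB actually consumed is what a sparse file looks like. A minimal sketch for checking both numbers on any file, assuming a Unix system where `st_blocks` counts 512-byte units (true on macOS and Linux); the helper name is hypothetical.

```python
import os


def apparent_and_allocated(path: str) -> tuple[int, int]:
    """Return (apparent size, bytes actually allocated on disk) for a file."""
    st = os.stat(path)
    # st_size is the logical length of the file. st_blocks is the number of
    # 512-byte units allocated (assumption: macOS/Linux convention), which is
    # smaller than st_size when the file contains holes.
    return st.st_size, st.st_blocks * 512
```

Running this against `link_lists.bin` inside the palace directory shows how much of the reported length is real disk usage, which is the number that matters for the disk-full condition.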

  ## Why the PID Guard (#1023) Doesn't Suffice

  `hook_state/mine.pid` only protects against re-entry of the exact same process. It does not protect against:

  - Two different mine invocations from the same hook script firing in parallel before either has written its PID file (TOCTOU on file create).
  - A second mine invocation from a different mode (`--mode convos` vs the default `--mode projects`): the lock is keyed by file existence, not by source path tuple.
  - Re-spawn after a hook fires while a previous mine is mid-shutdown but its PID file has already been removed.
  - Hook-triggered mines stacking with user-triggered mines (e.g. `mempalace mine` in a terminal during a CC session).
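The first race in the list can be reproduced deterministically. This is a minimal sketch, assuming a guard of the "PID file exists + PID alive" kind described in this report. It is illustrative only and not MemPalace's actual code; `may_start` and `record_pid` are hypothetical names, and the PIDs are the ones from the output above, used purely as labels.

```python
import os
import tempfile
from pathlib import Path


def pid_alive(pid: int) -> bool:
    """Signal 0 probes for existence without delivering a signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def may_start(pid_file: Path) -> bool:
    """Check step: allow a start unless the file names a live process."""
    try:
        recorded = int(pid_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        return True
    return not pid_alive(recorded)


def record_pid(pid_file: Path, pid: int) -> None:
    """Write step: happens some time after the check has passed."""
    pid_file.write_text(str(pid))


with tempfile.TemporaryDirectory() as tmp:
    pid_file = Path(tmp) / "mine.pid"
    # Two hook invocations interleave: both check before either writes.
    a_allowed = may_start(pid_file)
    b_allowed = may_start(pid_file)
    record_pid(pid_file, 1568)
    record_pid(pid_file, 1569)
    both_started = a_allowed and b_allowed
    last_writer = pid_file.read_text()

print(both_started, last_writer)  # prints: True 1569
```

Both invocations pass the check, and the file ends up naming only the last writer, which matches the observation that `mine.pid` held one PID while the other process ran anyway.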

  ## Suggested Fixes

  1. Lock by `(source_path, mode, wing)` tuple, not by a global `mine.pid`. A hash of the tuple in the lock filename would let concurrent mines of different sources coexist while serialising same-source calls.
  2. Use `flock(2)` (or `fcntl.lockf`) on the lock file instead of "PID file exists + PID alive" checks. The kernel releases these advisory locks when the holding process dies, so there are no stale-PID windows and no TOCTOU.
  3. Hard-cap the number of concurrent `mempalace` child processes globally (e.g. via a parent supervisor or a max-concurrency lock in `hook run`). One slow mine should not be allowed to silently stack three more behind it.
  4. Make the Stop hook a no-op fast path when a mine is already running for the relevant source. Today the hook unconditionally invokes `mempalace hook run`, which then decides whether to mine; pushing the "already running" check earlier in the hook would prevent the orphan class entirely.
  5. Reap orphans on hook entry. If the hook's `mine.pid` points to a process that has been alive for more than N minutes (configurable, default 30 min), assume it is stuck and send it `SIGTERM` before starting a new one. This is belt-and-braces alongside fix 2 above.
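Fixes 1 and 2 combine naturally. The following is a minimal sketch of a per-tuple lock file held with a non-blocking `flock`, assuming a Unix platform. The function names, the lock directory, and the file naming scheme are hypothetical; none of this is taken from the MemPalace code base.

```python
import fcntl
import hashlib
import os
from pathlib import Path
from typing import Optional


def lock_path(lock_dir: Path, source: str, mode: str, wing: str) -> Path:
    """Derive a lock file name from a hash of the (source, mode, wing) tuple."""
    key = "\0".join((os.path.realpath(source), mode, wing))
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
    return lock_dir / f"mine-{digest}.lock"


def try_acquire(path: Path) -> Optional[int]:
    """Return an fd holding an exclusive lock, or None if someone else holds it."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # LOCK_NB fails immediately instead of queueing behind the holder,
        # so a hook invocation can exit rather than stack up.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    # Keep the fd open for the lifetime of the mine. The kernel drops the
    # lock when the process exits, including after SIGKILL.
    return fd
```

A caller would run `fd = try_acquire(lock_path(...))` at the start of a mine and return early when the result is `None`. One design point worth deciding explicitly: the two processes shown under Observed Behavior differ in mode and wing, so a key built from the full tuple would still let them run side by side, whereas a key built from the source path alone would serialise them.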

  ## Severity

  High. Combined with #344 (whose fix is in the merge queue but not yet released), this consumed roughly 1 TB of disk in about two hours on a workstation, with no warning or error from MemPalace. The only user-visible signal was macOS reporting "Disk Almost Full". Even after #1191 ships, concurrent same-ID upserts will still degrade write throughput, fragment the segment metadata, and exercise the exact `link_lists.bin` growth path that #1191 mitigates rather than eliminates.

  ## Related

  - #344: HNSW `link_lists.bin` unbounded growth (root cause that turns this concurrency bug into a disk-fill incident).
  - #1191: Open PR addressing #344. Does not address the concurrency issue.
  - #1023: Original PID file guard. Necessary but insufficient.
  - #1208 / #1210: Repair-time data loss case from the same incident class.
