Stop hook spawns concurrent mempalace mine processes that survive across sessions and bypass PID guard #1212

@ttessarolo

Description

Summary

The Claude Code Stop hook (hooks/mempal-stop-hook.sh → mempalace hook run --hook stop) spawns mempalace mine subprocesses that:

  1. Run concurrently for the same source path, despite the PID-file guard added in #1023 (fix(hooks): PID file guard prevents stacking mine processes).
  2. Survive across Claude Code sessions as orphaned processes — they remain alive after the IDE that triggered them is closed and after the parent shell exits.
  3. Re-spawn within seconds of being killed, as long as the IDE is still running and the hook is still wired up — making manual cleanup ineffective without disabling the binary itself.

Combined with #344 (HNSW link_lists.bin unbounded growth, addressed by PR #1191, which is still open at time of writing) and the absence of a "max concurrent mines" cap, this can fill a 1.8 TB disk in roughly two hours of normal IDE use.

Environment

  • MemPalace: 3.3.3 (also reproducible on 3.3.2)
  • ChromaDB: 1.5.8
  • Claude Code: 2.1.119
  • Plugin: ~/.claude/plugins/cache/mempalace/mempalace/3.3.2 (hooks/hooks.json registers Stop and PreCompact)
  • OS: macOS 15 (Darwin 25.3.0), Apple Silicon
  • Source dir mined: ~/.claude/projects/-Users-...-recipe-as-graph (~684 MB of jsonl session transcripts)

Steps to Reproduce

  1. Use Claude Code with the MemPalace plugin enabled.
  2. Open a project that generates substantial transcript volume in ~/.claude/projects/<encoded-path>/.
  3. Let the Stop hook fire normally over a long session.
  4. Run pgrep -fl 'mempalace mine' periodically.
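
To make step 4 less manual, the pgrep output can be grouped by source path so that duplicates stand out. This is only a sketch: group_mines_by_source and find_duplicates are hypothetical helper names, the parser assumes the "<pid> python -m mempalace mine <source> [flags]" shape shown under Observed Behavior, and it will mis-split source paths that contain spaces.

```python
import subprocess
from collections import defaultdict


def group_mines_by_source(pgrep_output: str) -> dict[str, list[int]]:
    """Map each mined source path to the PIDs currently mining it."""
    groups: dict[str, list[int]] = defaultdict(list)
    for line in pgrep_output.splitlines():
        tokens = line.split()
        # Assumed shape: <pid> python -m mempalace mine <source> [flags...]
        if "mine" not in tokens[1:]:
            continue
        idx = tokens.index("mine", 1)
        if idx + 1 >= len(tokens) or not tokens[0].isdigit():
            continue
        groups[tokens[idx + 1]].append(int(tokens[0]))
    return dict(groups)


def find_duplicates() -> dict[str, list[int]]:
    """Return only the source paths that have more than one live mine."""
    # Same command as step 4. pgrep exits non-zero when nothing matches,
    # so the return code is deliberately not checked.
    result = subprocess.run(
        ["pgrep", "-fl", "mempalace mine"], capture_output=True, text=True
    )
    grouped = group_mines_by_source(result.stdout)
    return {src: pids for src, pids in grouped.items() if len(pids) > 1}
```

Calling find_duplicates() from a loop or a cron job every few minutes should return a non-empty dict as soon as two mines target the same source.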

Observed Behavior

After ~1 hour of normal use:

$ pgrep -fl 'mempalace mine'
1568 python -m mempalace mine /Users/.../-Users-...-recipe-as-graph --mode convos --wing sessions
1569 python -m mempalace mine /Users/.../-Users-...-recipe-as-graph

Two mempalace mine processes against the same source path, both alive at the same time, both writing to the same palace. ~/.mempalace/hook_state/mine.pid contains the PID of one of them; the other started anyway.

After kill -9 1568 1569:

$ pgrep -fl 'mempalace mine'
45810 python -m mempalace mine ... --mode convos --wing sessions
45811 python -m mempalace mine ...

New PIDs within seconds. The Stop hook re-fired and the PID guard accepted the new processes because the previous PIDs were now dead.

The only way to stop the loop without quitting the IDE was to rename the mempalace binary out of $PATH so the hook script's command -v mempalace check returns false:

mv ~/.local/bin/mempalace ~/.local/bin/mempalace.disabled
mv ~/.local/bin/mempalace-mcp ~/.local/bin/mempalace-mcp.disabled

Impact

Concurrent mines on the same source produce duplicate upsert() calls with identical drawer IDs against the same ChromaDB collection. With #344 not yet fixed (PR #1191 still open at time of writing), each duplicate upsert grows link_lists.bin without bound:

-rw-r--r--  9_732_718_377_168 bytes  link_lists.bin    # 9.7 TB apparent size, ~1 TB actual disk

This filled a 1.8 TB disk to 100% in roughly two hours of normal IDE use. After ~/.mempalace was deleted, the hook re-spawned mines that recreated the directory within seconds, so the binary had to be disabled to break the loop.
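
The gap between the 9.7 TB apparent size and the ~1 TB on disk is what a sparse file would look like, though that is an assumption on my part. A minimal way to compare the two numbers for any file on a POSIX system (apparent_and_allocated is a hypothetical helper name):

```python
import os


def apparent_and_allocated(path: str) -> tuple[int, int]:
    """Return (apparent size, bytes actually allocated on disk) for a file."""
    st = os.stat(path)
    # st_size is the logical length reported by ls -l.
    # st_blocks counts 512-byte units on POSIX systems (macOS, Linux);
    # it is not available on Windows.
    return st.st_size, st.st_blocks * 512
```

Pointing this at link_lists.bin while a mine is running shows whether the allocated figure, not just the apparent one, keeps climbing.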

Why the PID Guard (#1023) Doesn't Suffice

hook_state/mine.pid only protects against re-entry of the exact same process. It does not protect against:

- Two different mine invocations from the same hook script firing in parallel before either has written its PID file (TOCTOU on file create).
- A second mine invocation from a different mode (--mode convos vs default --mode projects) — the lock is keyed by file existence, not by source path tuple.
- Re-spawn after a hook fires while a previous mine is mid-shutdown but its PID file has already been removed.
- Hook-triggered mines stacking with user-triggered mines (e.g. mempalace mine in a terminal during a CC session).
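
The first bullet can be illustrated with a check-then-write guard. The sketch below is not the MemPalace implementation (I have not copied it here); guard_allows and record_pid are hypothetical names for the two halves of a generic "PID file exists + PID alive" check, run in the interleaving that two near-simultaneous hook invocations would produce.

```python
import os
import tempfile


def pid_alive(pid: int) -> bool:
    """True if a process with this PID exists (signal 0 sends nothing)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def guard_allows(pid_file: str) -> bool:
    """Check step: allow a new mine if no live PID is recorded."""
    try:
        with open(pid_file) as f:
            return not pid_alive(int(f.read().strip()))
    except (FileNotFoundError, ValueError):
        return True


def record_pid(pid_file: str, pid: int) -> None:
    """Write step: record the PID of the mine that was just started."""
    with open(pid_file, "w") as f:
        f.write(str(pid))


# Two hook invocations, A and B, interleaved by the scheduler:
pid_file = os.path.join(tempfile.mkdtemp(), "mine.pid")
a_allowed = guard_allows(pid_file)  # A checks: no PID file yet
b_allowed = guard_allows(pid_file)  # B checks before A has written
record_pid(pid_file, os.getpid())   # A records its mine
# B would now overwrite the file with its own PID; both mines are running.
print(a_allowed, b_allowed)  # True True
```

The same two functions also explain the respawn after kill -9: once the recorded PID is dead, guard_allows returns True again for the next hook firing.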

Suggested Fixes

1. Lock by (source_path, mode, wing) tuple, not by global mine.pid. A hash of the tuple in the lock filename would let concurrent mines of different sources coexist while serialising same-source calls.
2. Use flock(2) (or fcntl.lockf) on the lock file instead of "PID file exists + PID alive" checks. POSIX advisory locks are released on process death by the kernel — no stale-PID windows, no TOCTOU.
3. Hard-cap the number of concurrent mempalace child processes globally (e.g. via a parent supervisor or a max-concurrency lock in hook run). One slow mine should not be allowed to silently stack three more behind it.
4. Make the Stop hook a no-op fast path when a mine is already running for the relevant source. Today the hook unconditionally invokes mempalace hook run, which then decides whether to mine; pushing the "already running" check earlier in the hook would prevent this class of orphan entirely.
5. Reap orphans on hook entry. If the hook's mine.pid points to a process that's been alive for > N minutes (configurable, default 30 min), assume it's stuck and SIGTERM it before starting a new one. Belt-and-braces with fix 2 above.
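
Fixes 1 and 2 could be combined roughly as follows. This is a sketch under assumptions, not a patch: lock_path, mine_lock and the mine-<hash>.lock filename are hypothetical, and only the ~/.mempalace/hook_state directory comes from the current layout. It uses flock(2) via Python's fcntl module, so it is POSIX-only.

```python
import fcntl
import hashlib
import os
from contextlib import contextmanager

# Directory taken from the existing mine.pid location; filenames are hypothetical.
LOCK_DIR = os.path.expanduser("~/.mempalace/hook_state")


def lock_path(source_path: str, mode: str, wing: str, lock_dir: str = LOCK_DIR) -> str:
    """One lock file per (source_path, mode, wing) tuple."""
    key = "\0".join([os.path.realpath(source_path), mode, wing])
    digest = hashlib.sha256(key.encode()).hexdigest()[:16]
    return os.path.join(lock_dir, f"mine-{digest}.lock")


@contextmanager
def mine_lock(source_path: str, mode: str, wing: str, lock_dir: str = LOCK_DIR):
    """Yield True if this process now holds the lock, False if another mine does."""
    os.makedirs(lock_dir, exist_ok=True)
    fd = os.open(lock_path(source_path, mode, wing, lock_dir), os.O_RDWR | os.O_CREAT, 0o644)
    try:
        try:
            # Non-blocking exclusive lock: fails immediately if already held.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            yield False
            return
        yield True
    finally:
        # Closing the descriptor releases the lock. The kernel does the same
        # when the process dies, so no stale-PID cleanup is needed.
        os.close(fd)
```

A mine entry point would wrap its work in `with mine_lock(src, mode, wing) as held:` and exit early when held is False. Note that keying on the full tuple lets a --mode convos mine and a default-mode mine of the same source run side by side; if those should also be serialised, the key would have to be the source path alone.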

Severity

High. Combined with #344 (whose fix, #1191, is in the merge queue but not yet released) this caused a 1 TB disk fill in 2 hours on a workstation with no warning or error from MemPalace. The only user-visible signal was macOS reporting "Disk Almost Full". Even after #1191 ships, concurrent same-ID upserts will still degrade write throughput, fragment the segment metadata, and exercise the exact link_lists.bin growth path that #1191 mitigates rather than eliminates.

Related

- #344 — HNSW link_lists.bin unbounded growth (root cause that turns this concurrency bug into a disk-fill incident).
- #1191 — Open PR addressing #344. Does not address the concurrency issue.
- #1023 — Original PID file guard. Necessary but insufficient.
- #1208 / #1210 — Repair-time data loss case from the same incident class.

Labels

bug, storage
