
fix(plugin): filter PostToolUse[Write] indexing by extension and path #536

Merged
memtomem merged 2 commits into main from feat/postool-monorepo-guard
Apr 29, 2026

@memtomem (Owner)

Summary

The plugin's PostToolUse[Write] hook was indexing every Write
unconditionally, which fanned out to node_modules/, dist/,
__pycache__/, lock files, binaries, and images in monorepo
checkouts. The result was embedding-cost amplification and search-result
noise that scaled with build-artifact churn instead of source-file churn.

This PR replaces the inline command with an extension allowlist
(md, py, ts/tsx, js/jsx, go, rs, rb, java, kt,
swift, c/cpp/h/hpp, sh, toml, yaml/yml, json)
and a path blocklist (node_modules, dist, build, target,
.next, .nuxt, __pycache__, .git, .venv/venv, coverage,
.cache) expressed as inline case statements in
packages/memtomem-claude-plugin/hooks/hooks.json. No external
wrapper script — hooks.json stays self-contained.
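
To make the filter's intent concrete, here is a minimal Python model of the decision the two `case` statements make. It is a sketch, not the shipped hook: the real filter is inline shell in hooks.json, and the function name `should_index` and the use of `fnmatch` are assumptions chosen for illustration. The extension and directory names are the ones listed above.

```python
from fnmatch import fnmatchcase

# Extensions and directory names are the ones listed in this PR; the
# function name and structure are illustrative only.
ALLOWED_EXTENSIONS = {
    "md", "py", "ts", "tsx", "js", "jsx", "go", "rs", "rb", "java", "kt",
    "swift", "c", "cpp", "h", "hpp", "sh", "toml", "yaml", "yml", "json",
}
BLOCKED_DIRS = [
    "node_modules", "dist", "build", "target", ".next", ".nuxt",
    "__pycache__", ".git", ".venv", "venv", "coverage", ".cache",
]


def should_index(file_path: str) -> bool:
    """Return True when a written file passes both filters."""
    # Path blocklist: any-segment glob, like `*/node_modules/*` in a shell case.
    for name in BLOCKED_DIRS:
        if fnmatchcase(file_path, f"*/{name}/*"):
            return False
    # Extension allowlist: case-sensitive, like `*.py` in a shell case.
    return any(fnmatchcase(file_path, f"*.{ext}") for ext in ALLOWED_EXTENSIONS)
```

Under this model, `/repo/src/app.py` is indexed, while `/repo/node_modules/pkg/index.js` and `/repo/assets/logo.png` are skipped.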

docs/guides/integrations/claude-code.md Hooks Automation snippet is
synced verbatim with the plugin file (drift prevention). Caveats
section gains two bullets covering the new allowlist/blocklist
defaults and the remaining debounce gap (rapid consecutive writes
can still re-index the same file; mm index --debounce-window is
the planned follow-up).

Other plugin hooks (UserPromptSubmit, PostToolUse activity log,
Stop) are unchanged.

Why not also add PreCompact / SessionStart hooks?

Both gaps were considered together with this one and intentionally
deferred to separate RFCs because they need new CLI primitives:

  • SessionStart hook would call mm session start, but that
    command is non-idempotent (session_cmd.py:110 overwrites
    ~/.memtomem/.current_session with a fresh UUID every call). A
    hook firing on every Claude Code session would orphan the previous
    session whenever Stop failed (crash, kill, timeout). Safe
    integration needs mm session start --idempotent (or
    --resume-if-active) first.
  • PreCompact hook is most valuable when it can snapshot working
    context, but the existing CLI surface only offers mm activity log
    (marker only). A meaningful checkpoint needs both a new
    mm session checkpoint CLI and a clear story for what content the
    hook payload can actually capture.
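
The idempotency problem in the first bullet can be illustrated with a small sketch. Everything here is hypothetical: `start_session` and its `resume_if_active` parameter are not the project's API, only one possible shape for the behavior a future `--resume-if-active` flag might have. The marker file stands in for `~/.memtomem/.current_session`.

```python
import uuid
from pathlib import Path


def start_session(marker: Path, resume_if_active: bool = False) -> str:
    """Write a session id to `marker`; optionally reuse an existing one."""
    if resume_if_active and marker.exists():
        existing = marker.read_text().strip()
        if existing:
            return existing  # keep the active session instead of orphaning it
    # Default path mirrors the behavior described above: a fresh UUID
    # overwrites whatever was there.
    session_id = str(uuid.uuid4())
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.write_text(session_id)
    return session_id
```

Calling the default form twice yields two different ids, and the first is lost. The resume form returns the id already on disk.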

PostToolUse hardening has zero CLI dependency and ships standalone.

Test plan

  • python -m json.tool packages/memtomem-claude-plugin/hooks/hooks.json — JSON valid
  • pytest packages/memtomem/tests/test_docs_guards.py packages/memtomem/tests/test_config_overrides.py -x -q — 37 passed
  • jq round-trip: command field in docs snippet is byte-identical to hooks.json
  • Reviewer: visual diff of CHANGELOG ### Changed entry under [Unreleased]
  • Reviewer: confirm allowlist/blocklist patterns make sense for your stack — they are inline case statements, easy to extend per-project

Out of scope (tracked separately)

  • RFC: hooks-friendly session CLI primitives (mm session start --idempotent, etc.) — prerequisite for SessionStart hook
  • RFC: working-context capture for PreCompact (mm session checkpoint design + payload story)
  • mm index --debounce-window CLI option — debounce gap follow-up
  • Plugin v0.1.0 → marketplace publish — separate RFC

🤖 Generated with Claude Code

pandas-studio and others added 2 commits April 29, 2026 12:28
The plugin's `PostToolUse[Write]` hook in
`packages/memtomem-claude-plugin/hooks/hooks.json` ran
`mm index "${tool_input.file_path}"` unconditionally on every Write,
fanning out to `node_modules/`, `dist/`, `__pycache__/`, lock files,
binaries, and images in monorepo checkouts. Result: embedding-cost
amplification and search-result noise that scaled with build-artifact
churn rather than source-file churn.

Replace the inline command with an extension allowlist (`md`, `py`,
`ts`/`tsx`, `js`/`jsx`, `go`, `rs`, `rb`, `java`, `kt`, `swift`,
`c`/`cpp`/`h`/`hpp`, `sh`, `toml`, `yaml`/`yml`, `json`) and a path
blocklist (`node_modules`, `dist`, `build`, `target`, `.next`,
`.nuxt`, `__pycache__`, `.git`, `.venv`/`venv`, `coverage`, `.cache`)
expressed as inline `case` statements — no external wrapper script,
hooks.json stays self-contained. Adjust the patterns inline for
project-specific needs.

`docs/guides/integrations/claude-code.md` Hooks Automation snippet
synced verbatim with the plugin file (drift-prevention; the snippet
the docs render is the same JSON the plugin ships). Caveats section
gains two bullets: one documenting the allowlist/blocklist defaults,
one acknowledging the debounce gap that remains — rapid consecutive
writes still re-index the same file. Native `mm index
--debounce-window` is the planned follow-up, tracked separately.

Other plugin hooks (`UserPromptSubmit`, `PostToolUse activity log`,
`Stop`) are unchanged.

Verification: `pytest packages/memtomem/tests/test_docs_guards.py
packages/memtomem/tests/test_config_overrides.py` (37 passed); JSON
validity (`python -m json.tool`); `jq` round-trip confirms the docs
snippet `command` field is byte-identical to the plugin hooks.json.

Co-Authored-By: Claude <[email protected]>
…s polish

Address PR #536 review:

- **Parity test** (`TestPluginHooksDocsParity`) — the docs snippet in
  `claude-code.md` Hooks Automation Setup and the plugin's shipped
  `hooks.json` now lock against silent drift. Test extracts the
  fenced ``json`` block right after the
  "Add the following to ~/.claude/settings.json:" anchor, parses it,
  and asserts that every (event, matcher) entry the docs render has
  a byte-identical `command` string in the plugin file. Same shape
  as the existing `test_mcp_clients_matches_reference_footnote`
  cross-file invariant, scoped per the docstring at the top of the
  file. Two test methods so a missing entry and a drifted command
  surface as distinct failures.
- **Defensive blocklist**: `case` patterns now include both bare-form
  (`node_modules/*`) and any-segment (`*/node_modules/*`) globs.
  Previously only `*/node_modules/*` was listed, which would not
  match a `tool_input.file_path` of `node_modules/pkg/index.js` (no
  leading segment). Claude Code typically passes absolute paths so
  this is theoretical, but cheap to harden. Applied to all 12
  blocklist patterns; `hooks.json` and the docs snippet stay in sync
  (the new parity test enforces this).
- **Caveat polish** — added a sentence noting that extension matching
  is case-sensitive (`*.MD` / `*.JS` would skip the allowlist) and
  that the new patterns include both segment forms. Reworded the
  debounce caveat so the previously-named `mm index --debounce-window`
  flag (which doesn't exist yet) is described as "native debounce
  support tracked separately" — keeps the docs honest about the
  current CLI surface per the docs-as-tests rule.
- **CHANGELOG** — added the parity-test fact and a note that existing
  installs which copy-pasted the previous snippet into
  `~/.claude/settings.json` continue with the old unfiltered behavior
  until they re-pull the snippet (no auto-migration path for
  user-installed hooks).
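
The bare-form gap and the case-sensitivity caveat can both be reproduced with Python's `fnmatch.fnmatchcase`, whose `*` also matches across `/` as it does in a shell `case` pattern. This is an illustration of the glob behavior only, not the hook itself; the sample paths are made up.

```python
from fnmatch import fnmatchcase

relative = "node_modules/pkg/index.js"
absolute = "/repo/node_modules/pkg/index.js"

# Any-segment glob needs a "/" before the directory name.
any_segment_hits_relative = fnmatchcase(relative, "*/node_modules/*")   # False
any_segment_hits_absolute = fnmatchcase(absolute, "*/node_modules/*")   # True
# Bare-form glob covers the path with no leading segment.
bare_form_hits_relative = fnmatchcase(relative, "node_modules/*")       # True

# Extension matching is case-sensitive: README.MD is not matched by *.md.
upper_ext_matches = fnmatchcase("docs/README.MD", "*.md")               # False
```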

Verification: `pytest packages/memtomem/tests/test_docs_guards.py
packages/memtomem/tests/test_config_overrides.py` — 39 passed (was
37; +2 parity tests). Sabotage-tested locally by mutating one side's
command, confirmed `test_docs_snippet_commands_match_plugin` fires
with the diff in the assertion message. ruff check + format clean
on the touched test file.
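
For readers who have not seen the test file, the parity check can be sketched roughly as below. This is not the code of `TestPluginHooksDocsParity`: the helper names are invented, and the `{"hooks": {event: [{"matcher": ..., "hooks": [{"command": ...}]}]}}` layout is assumed from the usual Claude Code hooks format. Only the anchor string is taken from the commit message.

```python
import json
import re

ANCHOR = "Add the following to ~/.claude/settings.json:"
FENCE = "`" * 3  # built at runtime so this sketch can sit inside a fenced block


def extract_snippet(markdown: str) -> dict:
    """Parse the first fenced json block that follows ANCHOR."""
    tail = markdown.split(ANCHOR, 1)[1]
    pattern = re.escape(FENCE) + r"json\n(.*?)" + re.escape(FENCE)
    match = re.search(pattern, tail, re.DOTALL)
    if match is None:
        raise ValueError("no json block after anchor")
    return json.loads(match.group(1))


def commands_by_key(config: dict) -> dict:
    """Map (event, matcher) to the list of command strings under it."""
    out = {}
    for event, entries in config.get("hooks", {}).items():
        for entry in entries:
            key = (event, entry.get("matcher", ""))
            out.setdefault(key, []).extend(
                hook["command"] for hook in entry.get("hooks", [])
            )
    return out


def drifted_keys(docs: dict, plugin: dict) -> list:
    """Return docs (event, matcher) keys missing from or differing in plugin."""
    docs_cmds, plugin_cmds = commands_by_key(docs), commands_by_key(plugin)
    return [key for key, cmds in docs_cmds.items() if plugin_cmds.get(key) != cmds]
```

A test would load both files, call `drifted_keys`, and assert the result is empty. Mutating one side's command, as in the sabotage test above, makes the affected key show up in the returned list.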

Co-Authored-By: Claude <[email protected]>
@memtomem memtomem merged commit ffd38e2 into main Apr 29, 2026
7 checks passed
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 29, 2026
@memtomem memtomem deleted the feat/postool-monorepo-guard branch April 29, 2026 03:44
memtomem added a commit that referenced this pull request Apr 29, 2026
…536 documented gap) (#548)

PR #536 shipped the PostToolUse[Write] extension/path filter and
documented but didn't address the rapid-consecutive-writes gap:
codegen loops that touch the same file many times in a few seconds
re-index it on every Write, fanning out embedding cost and producing
noisy index churn. This PR closes the gap with a file-system-backed
debounce queue and three new CLI flags on ``mm index``.

The persistent queue lives at
``~/.memtomem/index_debounce_queue.json``, guarded by ``flock`` on a
sidecar lockfile (the queue file itself is replaced atomically via
``os.replace``, so locking it directly would disconnect the lock
from the file mid-mutation; the sidecar is never replaced and
therefore serializes correctly under concurrent writers — exercised
by ``test_concurrent_enqueue_does_not_lose_entries`` with 20 parallel
threads).
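
The sidecar-lock pattern described above looks roughly like this in Python. It is a minimal sketch under assumptions: the function name, the `.lock` suffix, and the entry layout are illustrative, and the real module may differ. The key point is that the lock is taken on a file that is never replaced, while the queue file is swapped in atomically with `os.replace`.

```python
import fcntl
import json
import os
import tempfile
import time
from pathlib import Path


def enqueue(queue_file: Path, path: str) -> None:
    """Record `path` in the JSON queue under an exclusive sidecar lock."""
    # Hypothetical sidecar name; it is created once and never replaced.
    lock_file = queue_file.with_name(queue_file.name + ".lock")
    with open(lock_file, "a") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # blocks until other holders release
        try:
            try:
                queue = json.loads(queue_file.read_text())
            except FileNotFoundError:
                queue = {}
            queue[path] = {"last_seen": time.time()}
            # Write to a temp file in the same directory, then swap atomically.
            fd, tmp = tempfile.mkstemp(dir=queue_file.parent)
            with os.fdopen(fd, "w") as handle:
                json.dump(queue, handle)
            os.replace(tmp, queue_file)
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
```

Because each caller opens the sidecar separately, concurrent callers block on `flock` and the read-modify-write sequences run one at a time. Locking the queue file itself would not give this guarantee, since `os.replace` leaves the lock attached to the old, now-unlinked file.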

### CLI surface

- ``--debounce-window <SECONDS>``: record PATH in the queue and drain
  entries that have been silent at least SECONDS. Repeated writes to
  the same path restart the window — the *last* hook in a burst (or
  any later hook firing on a different file) indexes the burst's final
  state once.
- ``--flush``: synchronously drain every queued entry. Blocks until
  every queued file is indexed (or recorded as an error). Worst-case
  latency ≈ queue depth × per-file index cost. The plugin's Stop hook
  now chains ``mm index --flush`` before ``mm session end --auto`` so
  the burst at session end isn't left in the queue.
- ``--status``: snapshot of the queue (depth, oldest entry). Read
  without a lock by design — concurrent hooks may add or drain
  entries between the read and any caller action. The docstring and
  ``--help`` text both flag the race so callers don't try
  status-then-flush as a correctness pattern; ``--flush`` is the
  only "drain the queue" primitive.

The three flags are mutually exclusive with each other and with the
plain ``mm index <path>`` invocation.
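
The window-restart behavior of ``--debounce-window`` can be modeled in a few lines. This is an in-memory sketch with explicit timestamps, not the shipped module: the signatures and the `first_seen`/`last_seen` field names are assumptions based on the test descriptions in this commit.

```python
def enqueue(queue: dict, path: str, now: float) -> None:
    """Record a write; a repeated write keeps first_seen and restarts last_seen."""
    entry = queue.setdefault(path, {"first_seen": now})
    entry["last_seen"] = now


def drain_ready(queue: dict, now: float, window: float) -> list:
    """Remove and return paths that have been silent for at least `window`."""
    ready = [p for p, e in queue.items() if now - e["last_seen"] >= window]
    for p in ready:
        del queue[p]
    return sorted(ready)
```

With a 5-second window, a write to `a.py` at t=0 and again at t=3 is not drained at t=6, because only 3 seconds have passed since the last write. A later hook firing for `b.py` at t=8 drains `a.py` once and leaves `b.py` queued.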

### Indexer error semantics

Indexing errors keep the entry in the queue for retry on the next
hook fire — failing files are not silently lost. Last-write-wins for
``--namespace``/``--force`` when the same path is enqueued twice (most
recent caller's intent applies on drain).
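
A rough model of these two rules, retry on error and last-write-wins options, is shown below. The function names, the callback shape, and the default namespace value are illustrative assumptions, not the project's API.

```python
from typing import Callable


def enqueue(queue: dict, path: str, namespace: str = "default", force: bool = False) -> None:
    """Later enqueues of the same path overwrite namespace/force."""
    queue[path] = {"namespace": namespace, "force": force}


def drain(queue: dict, index_one: Callable[[str, str, bool], None]) -> dict:
    """Index every entry; keep failures queued and return their error messages."""
    errors = {}
    for path, entry in list(queue.items()):
        try:
            index_one(path, entry["namespace"], entry["force"])
        except Exception as exc:  # a failing file must not be dropped
            errors[path] = str(exc)
        else:
            del queue[path]
    return errors
```

An entry leaves the queue only after its indexer call returns without raising, so a failed file is picked up again on the next drain.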

### RFC-B (PreCompact, deferred) — future-extensibility reserved

When the PreCompact hook contract lands and a checkpoint handler
wants to flush only the files Claude Code reports as in-flight,
``--flush`` will gain a ``--paths <list>`` form. The underlying
``debounce.drain_all`` already accepts an optional ``paths=`` filter
(verified by ``test_paths_filter_drains_subset_only``) so the CLI
addition is additive — no second ABI change. CHANGELOG note covers
this so a future RFC-B reviewer doesn't have to rediscover the
extension shape.
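
The optional filter makes the future CLI form additive, as this small sketch suggests. Only the name `drain_all` and the `paths=` keyword come from the commit message; the body and the in-memory queue are assumptions for illustration.

```python
from typing import Iterable, Optional


def drain_all(queue: dict, paths: Optional[Iterable[str]] = None) -> list:
    """Drain every entry, or only those named in `paths` when it is given."""
    wanted = None if paths is None else set(paths)
    drained = [p for p in queue if wanted is None or p in wanted]
    for p in drained:
        del queue[p]
    return drained
```

Callers that omit `paths` keep the full-drain behavior, so adding a ``--paths`` form later would not change existing invocations.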

### Plugin + docs sync

Plugin ``hooks.json`` and ``docs/guides/integrations/claude-code.md``
Hooks Automation Setup are updated byte-for-byte
(``TestPluginHooksDocsParity`` from #536 catches drift). The
PostToolUse[Write] hook calls ``mm index --debounce-window 5``; the
Stop hook chains ``mm index --flush; mm session end --auto`` (Stop
hook timeout bumped from 5000 to 10000 ms to accommodate the flush
worst-case latency).

The "Important Caveats" section is rewritten — the old "Debounce gap"
caveat (which pointed at native debounce as future work) is replaced
with a "Debounce mechanics" caveat that explains the queue file,
window restart semantics, the Stop-hook flush chain, the race-prone
nature of ``--status``, and the RFC-B selective-payload extension
shape.

### Tests

19 new unit tests in ``test_index_debounce.py`` covering enqueue
semantics (last-write-wins, first-seen vs last-seen, distinct paths),
``drain_ready`` (caller's own enqueue doesn't qualify, mixed-readiness
queue, error retry, namespace/force forwarding), ``drain_all`` (full
drain, subset via ``paths=``, empty-queue no-op), ``status_snapshot``
(empty/oldest-by-first-seen), and persistence + concurrency
(round-trip across calls, 20-thread parallel enqueue, partial-drain
write-back). The parity test from #536 keeps the plugin and docs
snippets aligned automatically.

Co-authored-by: pandas-studio <[email protected]>
Co-authored-by: Claude <[email protected]>