fix(plugin): filter PostToolUse[Write] indexing by extension and path #536
Merged
The plugin's `PostToolUse[Write]` hook in
`packages/memtomem-claude-plugin/hooks/hooks.json` ran
`mm index "${tool_input.file_path}"` unconditionally on every Write,
fanning out to `node_modules/`, `dist/`, `__pycache__/`, lock files,
binaries, and images in monorepo checkouts. Result: embedding-cost
amplification and search-result noise that scaled with build-artifact
churn rather than source-file churn.
Replace the inline command with an extension allowlist (`md`, `py`,
`ts`/`tsx`, `js`/`jsx`, `go`, `rs`, `rb`, `java`, `kt`, `swift`,
`c`/`cpp`/`h`/`hpp`, `sh`, `toml`, `yaml`/`yml`, `json`) and a path
blocklist (`node_modules`, `dist`, `build`, `target`, `.next`,
`.nuxt`, `__pycache__`, `.git`, `.venv`/`venv`, `coverage`, `.cache`)
expressed as inline `case` statements — no external wrapper script,
hooks.json stays self-contained. Adjust the patterns inline for
project-specific needs.
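
The shipped filter is a pair of shell `case` statements inside `hooks.json`. As a rough illustration of the same decision logic, here is a minimal Python sketch. The function name `should_index` and the two constants are hypothetical and do not exist in the plugin; only the extension and directory names are taken from the lists above.

```python
from pathlib import PurePosixPath

# Defaults copied from the commit message; the names below are illustrative only.
ALLOWED_EXTENSIONS = {
    "md", "py", "ts", "tsx", "js", "jsx", "go", "rs", "rb", "java", "kt",
    "swift", "c", "cpp", "h", "hpp", "sh", "toml", "yaml", "yml", "json",
}
BLOCKED_SEGMENTS = {
    "node_modules", "dist", "build", "target", ".next", ".nuxt",
    "__pycache__", ".git", ".venv", "venv", "coverage", ".cache",
}


def should_index(file_path: str) -> bool:
    """Return True when the path passes both the blocklist and the allowlist."""
    path = PurePosixPath(file_path)
    # Reject the file if any directory component is a blocked segment.
    if any(part in BLOCKED_SEGMENTS for part in path.parent.parts):
        return False
    # Extension comparison is case-sensitive, so "README.MD" is rejected.
    return path.suffix.lstrip(".") in ALLOWED_EXTENSIONS
```

Under these assumptions, `should_index("/repo/src/app.ts")` is accepted, while `/repo/node_modules/pkg/index.js` and `/repo/yarn.lock` are rejected. Because the sketch compares whole directory components, it treats relative and absolute paths the same way.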
`docs/guides/integrations/claude-code.md` Hooks Automation snippet
synced verbatim with the plugin file (drift-prevention; the snippet
the docs render is the same JSON the plugin ships). Caveats section
gains two bullets: one documenting the allowlist/blocklist defaults,
one acknowledging the debounce gap that remains — rapid consecutive
writes still re-index the same file. Native `mm index
--debounce-window` is the planned follow-up, tracked separately.
Other plugin hooks (`UserPromptSubmit`, `PostToolUse activity log`,
`Stop`) are unchanged.
Verification: `pytest packages/memtomem/tests/test_docs_guards.py
packages/memtomem/tests/test_config_overrides.py` (37 passed); JSON
validity (`python -m json.tool`); `jq` round-trip confirms the docs
snippet `command` field is byte-identical to the plugin hooks.json.
Co-Authored-By: Claude <[email protected]>
…s polish

Address PR #536 review:

- **Parity test** (`TestPluginHooksDocsParity`): the docs snippet in `claude-code.md` Hooks Automation Setup and the plugin's shipped `hooks.json` now lock against silent drift. Test extracts the fenced ``json`` block right after the "Add the following to ~/.claude/settings.json:" anchor, parses it, and asserts that every (event, matcher) entry the docs render has a byte-identical `command` string in the plugin file. Same shape as the existing `test_mcp_clients_matches_reference_footnote` cross-file invariant, scoped per the docstring at the top of the file. Two test methods so a missing entry and a drifted command surface as distinct failures.
- **Defensive blocklist**: `case` patterns now include both bareform (`node_modules/*`) and any-segment (`*/node_modules/*`) globs. Previously only `*/node_modules/*` was listed, which would not match a `tool_input.file_path` of `node_modules/pkg/index.js` (no leading segment). Claude Code typically passes absolute paths so this is theoretical, but cheap to harden. Applied to all 12 blocklist patterns; `hooks.json` and the docs snippet stay in sync (the new parity test enforces this).
- **Caveat polish**: added a sentence noting that extension matching is case-sensitive (`*.MD` / `*.JS` would skip the allowlist) and that the new patterns include both segment forms. Reworded the debounce caveat so the previously-named `mm index --debounce-window` flag (which doesn't exist yet) is described as "native debounce support tracked separately", which keeps the docs honest about the current CLI surface per the docs-as-tests rule.
- **CHANGELOG**: added the parity-test fact and a note that existing installs which copy-pasted the previous snippet into `~/.claude/settings.json` continue with the old unfiltered behavior until they re-pull the snippet (no auto-migration path for user-installed hooks).

Verification: `pytest packages/memtomem/tests/test_docs_guards.py packages/memtomem/tests/test_config_overrides.py`: 39 passed (was 37; +2 parity tests). Sabotage-tested locally by mutating one side's command, confirmed `test_docs_snippet_commands_match_plugin` fires with the diff in the assertion message. ruff check + format clean on the touched test file.

Co-Authored-By: Claude <[email protected]>
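
The parity check described in this commit can be approximated as follows. This is a sketch under stated assumptions, not the code of `TestPluginHooksDocsParity`: the helper names are hypothetical, and the layout `{"hooks": {event: [{"matcher": ..., "hooks": [{"command": ...}]}]}}` is assumed for both the docs snippet and the plugin file.

```python
import json
import re

ANCHOR = "Add the following to ~/.claude/settings.json:"
FENCE = "`" * 3  # built at runtime so this example does not close its own fence


def extract_snippet(markdown: str) -> dict:
    """Parse the first json fence that follows ANCHOR."""
    _, _, tail = markdown.partition(ANCHOR)
    match = re.search(FENCE + r"json\n(.*?)\n" + FENCE, tail, re.DOTALL)
    if match is None:
        raise ValueError("no json fence found after the anchor")
    return json.loads(match.group(1))


def commands_by_key(config: dict) -> dict:
    """Map (event, matcher) to the list of command strings registered for it."""
    out: dict = {}
    for event, entries in config.get("hooks", {}).items():
        for entry in entries:
            key = (event, entry.get("matcher", ""))
            out.setdefault(key, []).extend(h["command"] for h in entry.get("hooks", []))
    return out


def parity_problems(docs_markdown: str, plugin_json: str) -> tuple:
    """Return (missing, drifted) keys so the two failure modes stay distinct."""
    docs = commands_by_key(extract_snippet(docs_markdown))
    plugin = commands_by_key(json.loads(plugin_json))
    missing = sorted(k for k in docs if k not in plugin)
    drifted = sorted(k for k in docs if k in plugin and docs[k] != plugin[k])
    return missing, drifted
```

Returning the two lists separately mirrors the commit's choice of two test methods: one assertion can check `missing` and another can check `drifted`.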
memtomem added a commit that referenced this pull request on Apr 29, 2026
…536 documented gap) (#548)

PR #536 shipped the PostToolUse[Write] extension/path filter and documented but didn't address the rapid-consecutive-writes gap: codegen loops that touch the same file many times in a few seconds re-index it on every Write, fanning out embedding cost and producing noisy index churn. This PR closes the gap with a file-system-backed debounce queue and three new CLI flags on ``mm index``.

The persistent queue lives at ``~/.memtomem/index_debounce_queue.json``, guarded by ``flock`` on a sidecar lockfile. The queue file itself is replaced atomically via ``os.replace``, so locking it directly would disconnect the lock from the file mid-mutation; the sidecar is never replaced and therefore serializes correctly under concurrent writers (exercised by ``test_concurrent_enqueue_does_not_lose_entries`` with 20 parallel threads).

### CLI surface

- ``--debounce-window <SECONDS>``: record PATH in the queue and drain entries that have been silent at least SECONDS. Repeated writes to the same path restart the window, so the *last* hook in a burst (or any later hook firing on a different file) indexes the burst's final state once.
- ``--flush``: synchronously drain every queued entry. Blocks until every queued file is indexed (or recorded as an error). Worst-case latency ≈ queue depth × per-file index cost. The plugin's Stop hook now chains ``mm index --flush`` before ``mm session end --auto`` so the burst at session end isn't left in the queue.
- ``--status``: snapshot of the queue (depth, oldest entry). Read without a lock by design, so concurrent hooks may add or drain entries between the read and any caller action. The docstring and ``--help`` text both flag the race so callers don't try status-then-flush as a correctness pattern; ``--flush`` is the only "drain the queue" primitive.

The three flags are mutually exclusive with each other and with the plain ``mm index <path>`` invocation.
### Indexer error semantics

Indexing errors keep the entry in the queue for retry on the next hook fire, so failing files are not silently lost. Last-write-wins for ``--namespace``/``--force`` when the same path is enqueued twice (most recent caller's intent applies on drain).

### RFC-B (PreCompact, deferred): future-extensibility reserved

When the PreCompact hook contract lands and a checkpoint handler wants to flush only the files Claude Code reports as in-flight, ``--flush`` will gain a ``--paths <list>`` form. The underlying ``debounce.drain_all`` already accepts an optional ``paths=`` filter (verified by ``test_paths_filter_drains_subset_only``) so the CLI addition is additive, with no second ABI change. CHANGELOG note covers this so a future RFC-B reviewer doesn't have to rediscover the extension shape.

### Plugin + docs sync

Plugin ``hooks.json`` and ``docs/guides/integrations/claude-code.md`` Hooks Automation Setup are updated byte-for-byte (``TestPluginHooksDocsParity`` from #536 catches drift). The PostToolUse[Write] hook calls ``mm index --debounce-window 5``; the Stop hook chains ``mm index --flush; mm session end --auto`` (Stop hook timeout bumped from 5000 to 10000 ms to accommodate the flush worst-case latency). The "Important Caveats" section is rewritten: the old "Debounce gap" caveat (which pointed at native debounce as future work) is replaced with a "Debounce mechanics" caveat that explains the queue file, window restart semantics, the Stop-hook flush chain, the race-prone nature of ``--status``, and the RFC-B selective-payload extension shape.
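
The retry, last-write-wins, and ``paths=`` behaviors described above can be illustrated with an in-memory sketch. The signatures here are assumptions made for the example and are not the real ``debounce.drain_all`` API; persistence and locking are left out, and `index_fn` stands in for whatever performs the actual indexing.

```python
from typing import Callable, Iterable, Optional


def enqueue(queue: dict, path: str, namespace: Optional[str] = None, force: bool = False) -> None:
    """Last-write-wins: a later enqueue of the same path overwrites its options."""
    queue[path] = {"namespace": namespace, "force": force}


def drain_all(
    queue: dict,
    index_fn: Callable[..., None],
    paths: Optional[Iterable[str]] = None,
) -> dict:
    """Index queued entries, optionally restricted to `paths`.

    Entries that index cleanly are removed. Entries whose indexing raises
    stay in the queue so the next drain retries them.
    """
    selected = list(queue) if paths is None else [p for p in paths if p in queue]
    errors = {}
    for path in selected:
        opts = queue[path]
        try:
            index_fn(path, namespace=opts["namespace"], force=opts["force"])
        except Exception as exc:  # keep the entry for retry
            errors[path] = str(exc)
        else:
            del queue[path]
    return errors
```

Because the filter only narrows which entries are selected, adding a ``--paths`` form on top of such a function would not need to change how a full drain behaves.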
### Tests

19 new unit tests in ``test_index_debounce.py`` covering:

- enqueue semantics (last-write-wins, first-seen vs last-seen, distinct paths)
- ``drain_ready`` (caller's own enqueue doesn't qualify, mixed-readiness queue, error retry, namespace/force forwarding)
- ``drain_all`` (full drain, subset via ``paths=``, empty-queue no-op)
- ``status_snapshot`` (empty/oldest-by-first-seen)
- persistence + concurrency (round-trip across calls, 20-thread parallel enqueue, partial-drain write-back)

The parity test from #536 keeps the plugin and docs snippets aligned automatically.

Co-authored-by: pandas-studio <[email protected]>
Co-authored-by: Claude <[email protected]>
Summary
The plugin's `PostToolUse[Write]` hook was indexing every `Write` unconditionally, which fanned out to `node_modules/`, `dist/`, `__pycache__/`, lock files, binaries, and images in monorepo checkouts. Embedding-cost amplifier + search-result noise that scaled with build-artifact churn instead of source-file churn.

This PR replaces the inline command with an extension allowlist (`md`, `py`, `ts`/`tsx`, `js`/`jsx`, `go`, `rs`, `rb`, `java`, `kt`, `swift`, `c`/`cpp`/`h`/`hpp`, `sh`, `toml`, `yaml`/`yml`, `json`) and a path blocklist (`node_modules`, `dist`, `build`, `target`, `.next`, `.nuxt`, `__pycache__`, `.git`, `.venv`/`venv`, `coverage`, `.cache`) expressed as inline `case` statements in `packages/memtomem-claude-plugin/hooks/hooks.json`. No external wrapper script: `hooks.json` stays self-contained.

`docs/guides/integrations/claude-code.md` Hooks Automation snippet is synced verbatim with the plugin file (drift prevention). Caveats section gains two bullets covering the new allowlist/blocklist defaults and the remaining debounce gap (rapid consecutive writes can still re-index the same file; `mm index --debounce-window` is the planned follow-up).

Other plugin hooks (`UserPromptSubmit`, `PostToolUse activity log`, `Stop`) are unchanged.

Why not also add PreCompact / SessionStart hooks?
Both gaps were considered together with this one and intentionally deferred to separate RFCs because they need new CLI primitives:

- … `mm session start`, but that command is non-idempotent (`session_cmd.py:110` overwrites `~/.memtomem/.current_session` with a fresh UUID every call). A hook firing on every Claude Code session would orphan the previous session whenever `Stop` failed (crash, kill, timeout). Safe integration needs `mm session start --idempotent` (or `--resume-if-active`) first.
- … context, but the existing CLI surface only offers `mm activity log` (marker only). A meaningful checkpoint needs both a new `mm session checkpoint` CLI and a clear story for what content the hook payload can actually capture.
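
To make the idempotency concern concrete, here is a hedged sketch of what a resume-if-active start could look like. It is purely hypothetical: neither `--idempotent` nor `--resume-if-active` exists yet according to this PR, the function name and marker-file handling are assumptions for illustration, and the real `session_cmd.py` may store more than a bare UUID.

```python
import uuid
from pathlib import Path


def start_session(marker: Path, resume_if_active: bool = True) -> str:
    """Return the active session id, creating one only when none is recorded.

    With resume_if_active=False this mimics the non-idempotent behavior
    described above: every call overwrites the marker with a fresh UUID.
    """
    if resume_if_active and marker.exists():
        existing = marker.read_text().strip()
        if existing:
            return existing  # reuse instead of orphaning the previous session
    session_id = str(uuid.uuid4())
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.write_text(session_id)
    return session_id
```

Under this sketch a hook that fires on every session start would keep returning the same id until something clears the marker, which is the property a SessionStart hook would need.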
PostToolUse hardening has zero CLI dependency and ships standalone.
Test plan
- `python -m json.tool packages/memtomem-claude-plugin/hooks/hooks.json`: JSON valid
- `pytest packages/memtomem/tests/test_docs_guards.py packages/memtomem/tests/test_config_overrides.py -x -q`: 37 passed
- `jq` round-trip: `command` field in docs snippet is byte-identical to `hooks.json`
- … `### Changed` entry under `[Unreleased]`
- … `case` statements, easy to extend per-project

Out of scope (tracked separately)

- … `mm session start --idempotent`, etc.): prerequisite for SessionStart hook
- … `mm session checkpoint` design + payload story)
- `mm index --debounce-window` CLI option: debounce gap follow-up

🤖 Generated with Claude Code