fix(hooks): pass --mode convos when mining Claude Code transcript dirs#1230
fix(hooks): pass --mode convos when mining Claude Code transcript dirs#1230
Conversation
The Stop and PreCompact hooks spawn `mempalace mine <dir>` with no `--mode` flag, which defaults to `projects` in cli.py. When MEMPAL_DIR is unset, _get_mine_dir falls back to the parent of the transcript JSONL — and miner.py's READABLE_EXTENSIONS includes `.jsonl`, so the projects miner happily ingests Claude Code session JSONL as if it were source code instead of conversation. Make _get_mine_dir return (dir, mode): MEMPAL_DIR keeps `projects`, the JSONL fallback yields `convos`. Both _maybe_auto_ingest and _mine_sync now thread the mode into the spawned command.
There was a problem hiding this comment.
Pull request overview
Fixes Claude Code hook auto-ingest so transcript directories are mined using --mode convos (instead of the default projects mode) when MEMPAL_DIR is not set, preventing session .jsonl files from being ingested as source code.
Changes:
- Updated
_get_mine_dirto return(dir, mode)so hooks can chooseprojectsvsconvosappropriately. - Threaded
--mode <mode>into the spawnedmempalace mine ...commands for both Stop (background) and PreCompact (sync) hook paths. - Expanded/adjusted tests to assert the spawned command includes the correct
--modefor env vs transcript fallback cases.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
mempalace/hooks_cli.py |
Returns mine target + mode and passes --mode through to background and synchronous mine invocations. |
tests/test_hooks_cli.py |
Updates and adds tests to validate mode selection and spawned command shape for both hook paths. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| mempal_dir = os.environ.get("MEMPAL_DIR", "") | ||
| if mempal_dir and os.path.isdir(mempal_dir): | ||
| return mempal_dir | ||
| return mempal_dir, "projects" | ||
| if transcript_path: | ||
| path = Path(transcript_path).expanduser() | ||
| if path.is_file(): | ||
| return str(path.parent) | ||
| return "" | ||
| return str(path.parent), "convos" |
There was a problem hiding this comment.
_get_mine_dir() bypasses the existing transcript_path validation logic: it uses Path(transcript_path).expanduser().is_file() (no .resolve(), no extension check, no '..' rejection), so a crafted transcript_path could still cause the hook to mine an unintended parent directory. Also, MEMPAL_DIR is checked with os.path.isdir() without expanding '', so common values like "/proj" will be treated as invalid and fall back unexpectedly. Consider reusing _validate_transcript_path() for the transcript fallback and normalizing MEMPAL_DIR via Path(mempal_dir).expanduser().resolve() (and then .is_dir()).
There was a problem hiding this comment.
@copilot apply changes based on this feedback
There was a problem hiding this comment.
Fixed in 6a8beef.
- MEMPAL_DIR: now normalized via
Path(mempal_dir).expanduser().resolve()before theis_dir()check, so~/proj-style values work correctly and the resolved path is what gets passed to the miner command. - Transcript fallback: replaced the bare
Path(...).expanduser().is_file()check with a call to the existing_validate_transcript_path(), which handles.resolve(),.jsonl/.jsonextension enforcement, and..path-traversal rejection.
Added three new tests covering tilde-expansion of MEMPAL_DIR, .. traversal rejection, and non-.jsonl extension rejection in _get_mine_dir. Existing test assertions updated to compare resolved paths for cross-platform correctness.
- Normalize MEMPAL_DIR via Path.expanduser().resolve() so ~/proj paths are correctly accepted instead of falling through to transcript fallback - Replace bare Path.expanduser().is_file() transcript check with the existing _validate_transcript_path() which adds .resolve(), enforces .jsonl/.json extension, and rejects '..' path-traversal components - Update tests to compare resolved paths (cross-platform correctness) - Add tests for tilde expansion, path-traversal rejection, and non-jsonl extension rejection in _get_mine_dir Agent-Logs-Url: https://github.com/MemPalace/mempalace/sessions/f69176c7-d752-40ef-ba71-d0e4adc3a689 Co-authored-by: igorls <[email protected]>
|
Just as a heads up, I was blocked by some firewall rules while working on your feedback. Expand below for details. Warning Firewall rules blocked me from connecting to one or more addresses (expand for details)I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
|
…MEMPAL_DIR MemPalace#1230 fixed --mode convos for the case where MEMPAL_DIR was unset, but left two configurations broken: - MEMPAL_DIR set to a project dir: convos never mined (MEMPAL_DIR overrode the transcript path); only project files were ingested. - MEMPAL_DIR set to a conversations dir per the old hooks/README: the projects miner ran on JSONL — same wrong-miner behaviour. The shell hooks (mempal_save_hook.sh, mempal_precompact_hook.sh) had the same MEMPAL_DIR-overrides-transcript bug AND were missing --mode on every spawned `mempalace mine` call. Make the auto-ingest *additive*. _get_mine_dir → _get_mine_targets, returning a list of (dir, mode) pairs: - MEMPAL_DIR (when valid) contributes (dir, "projects") - A valid transcript JSONL contributes (parent, "convos") - Both can appear together; the hook spawns one ingest per target Same change applied to the shell save and precompact hooks. Precompact also gained transcript_path parsing so it can run the convos mine synchronously before context is compressed. hooks/README.md updated to describe MEMPAL_DIR as a project-files target, never a convos target.
Bumps every version source from 3.3.3 to 3.3.4: - pyproject.toml - mempalace/version.py (canonical) - .claude-plugin/plugin.json - .claude-plugin/marketplace.json - .codex-plugin/plugin.json - README.md badge Dates the CHANGELOG section and adds entries for the bug fixes that landed this cycle (#1135, #1191, #1230, #1231) plus expands the #1194 entry to credit the lookup-side recovery path from #1197. Pre-tag verification: - 1441 passed, 1 skipped (full suite minus benchmarks, all platforms) - ruff check + format clean - 44/44 in test_version_consistency + test_readme_claims (6-file sync) - JPH invariant: pyproject.toml + .claude-plugin/plugin.json both reference mempalace-mcp - Wheel build + fresh-venv install: mempalace --version reports 3.3.4, mempalace-mcp --help works (catches the v3.3.2-class regression)
22 commits from upstream/develop, including: - HNSW capacity divergence detection + BM25-only sqlite fallback when vector layer is unloadable (MemPalace#1222 / MemPalace#1227 — vector_disabled kwarg routes search through chromadb-bypass path on segfault risk). - HNSW index bloat prevention via batch_size + sync_threshold metadata on collection create (MemPalace#1191). - Hooks: always mine the active transcript as convos, additive to MEMPAL_DIR (MemPalace#1230 / MemPalace#1231). Restructured into `_get_mine_targets()` list approach + separate `_ingest_transcript()`. Daemon-strict gate preserved on each entry point. - Wing-name normalization for hyphenated dirs across miner / convo_miner / palace_graph (MemPalace#1194 / MemPalace#1195 / MemPalace#1197). - `narrow _fix_blob_seq_ids` shim + `repair --mode max-seq-id` for legacy 0.6.x BLOB-poisoned palaces (MemPalace#1135). Conflict resolution: - README.md: kept fork-shaped narrative, dropped upstream's sweep tip injection (per fork-readme handling memory). - hooks/README.md: adopted upstream's accurate `MEMPAL_DIR` additive description; kept `MEMPAL_PYTHON` env-var name (matches actual `hooks/*.sh` scripts in both forks). - mempalace/cli.py: consolidated duplicate `--mode` argparse declarations into one with all 4 choices (rebuild/legacy/reorganize/max-seq-id). - mempalace/hooks_cli.py: adopted upstream's `_get_mine_targets()` + `_ingest_transcript()` shape; added `_daemon_strict()` guard at entry of `_maybe_auto_ingest`, `_mine_sync`, and `_ingest_transcript` so the daemon-strict architecture still skips local writes. - mempalace/mcp_server.py: kept both `kind=` and `vector_disabled=` kwargs on the `search_memories` call. - mempalace/searcher.py: kept fork's `_count_in_scope`, `_sqlite_fallback_and_scope`, `_apply_kind_text_filter` AND upstream's `_bm25_only_via_sqlite`. Both `kind` and `vector_disabled` parameters on `search_memories`. Tests: 1510 passing (up from 1366 — upstream brought new test suites).
…MEMPAL_DIR MemPalace#1230 fixed --mode convos for the case where MEMPAL_DIR was unset, but left two configurations broken: - MEMPAL_DIR set to a project dir: convos never mined (MEMPAL_DIR overrode the transcript path); only project files were ingested. - MEMPAL_DIR set to a conversations dir per the old hooks/README: the projects miner ran on JSONL — same wrong-miner behaviour. The shell hooks (mempal_save_hook.sh, mempal_precompact_hook.sh) had the same MEMPAL_DIR-overrides-transcript bug AND were missing --mode on every spawned `mempalace mine` call. Make the auto-ingest *additive*. _get_mine_dir → _get_mine_targets, returning a list of (dir, mode) pairs: - MEMPAL_DIR (when valid) contributes (dir, "projects") - A valid transcript JSONL contributes (parent, "convos") - Both can appear together; the hook spawns one ingest per target Same change applied to the shell save and precompact hooks. Precompact also gained transcript_path parsing so it can run the convos mine synchronously before context is compressed. hooks/README.md updated to describe MEMPAL_DIR as a project-files target, never a convos target.
Catches up on a month of upstream work. Highlights pulled in: - MemPalace#1306 searcher: hybrid candidate union (vector ∪ BM25 reranking pool) - MemPalace#1325 mcp security: omit absolute paths from tool_get_drawer / tool_status - MemPalace#1322 chroma: wire quarantine_stale_hnsw to prevent SIGSEGV on stale HNSW - MemPalace#1320 mcp: forward valid_to / source params in kg_add / kg_invalidate - MemPalace#1321 cli: honor --palace flag in cmd_init - MemPalace#1314 kg temporal params fix - MemPalace#1244 cli: cmd_compress writes to mempalace_closets so palace can read - MemPalace#1243 mcp: case-insensitive agent name in diary_write/diary_read - MemPalace#1303 mcp_server: pass embedding_function= on collection reopen - MemPalace#1076/MemPalace#1077 hooks: quote CLAUDE_PLUGIN_ROOT / CODEX_PLUGIN_ROOT in hooks.json - Various ruff format passes on touched files Conflict resolution (CHANGELOG.md only — code files all auto-merged): - 3.3.5 unreleased section header from upstream kept above 3.3.4 - 3.3.4 section: kept our 2026-04-30 release date; merged upstream's new MemPalace#1299 SIGSEGV-on-default-EF entry in alongside our existing topic-tunnels (MemPalace#1194/MemPalace#1195/MemPalace#1197), HNSW-bloat (MemPalace#1191), max_seq_id (MemPalace#1135), and auto-ingest (MemPalace#1230/MemPalace#1231) entries. Kept our richer topic-tunnels detail (upstream's version was a strict subset). xdev patches preserved (still on this branch, untouched by merge): - 6ef44cb fix(hooks): route CC transcripts via convo_miner with cwd-based wings - 3fad61d fix(config): allow leading dash in wing names Not pushed to origin — run tests locally and decide when to push. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Summary
The Stop and PreCompact auto-ingest hooks spawn
mempalace mine <dir>with no--modeflag.--modedefaults toprojectsincli.py, so whenMEMPAL_DIRis unset and_get_mine_dirfalls back to the parent of the transcript JSONL, the projects miner runs against a directory of Claude Code session JSONL files.miner.py'sREADABLE_EXTENSIONSincludes.jsonl, so those files are silently ingested as if they were source code instead of going throughconvo_miner.This affects users who installed the hooks without setting
MEMPAL_DIR. Two call sites are wrong with the same bug:hooks_cli.py:_maybe_auto_ingest(Stop hook, background)hooks_cli.py:_mine_sync(PreCompact hook, synchronous — the "save before context dies" path, so the worse of the two)Fix
_get_mine_dirnow returns(dir, mode):MEMPAL_DIRset and valid ->(MEMPAL_DIR, "projects")— unchanged behavior(parent_dir, "convos")("", "projects")Both
_maybe_auto_ingestand_mine_syncthreadmodeinto the spawnedmempalace mine ... --mode <mode>command.Test plan
_get_mine_dirtests to assert the new(dir, mode)tupletest_maybe_auto_ingest_with_env/_with_transcriptto assert the spawned cmd carries the right--modetest_mine_sync_with_transcript_uses_convos_modeandtest_mine_sync_with_env_uses_projects_modetest_precompact_mines_transcript_dirfor the new cmd shape1411 passedruff check+ruff format --checkclean