Skip to content

fix(hooks): pass --mode convos when mining Claude Code transcript dirs#1230

Merged
igorls merged 2 commits intodevelopfrom
fix/hooks-convos-mode-1232
Apr 27, 2026
Merged

fix(hooks): pass --mode convos when mining Claude Code transcript dirs#1230
igorls merged 2 commits intodevelopfrom
fix/hooks-convos-mode-1232

Conversation

@igorls
Copy link
Copy Markdown
Member

@igorls igorls commented Apr 27, 2026

Summary

The Stop and PreCompact auto-ingest hooks spawn mempalace mine <dir> with no --mode flag. --mode defaults to projects in cli.py, so when MEMPAL_DIR is unset and _get_mine_dir falls back to the parent of the transcript JSONL, the projects miner runs against a directory of Claude Code session JSONL files. miner.py's READABLE_EXTENSIONS includes .jsonl, so those files are silently ingested as if they were source code instead of going through convo_miner.

This affects users who installed the hooks without setting MEMPAL_DIR. Two call sites are wrong with the same bug:

  • hooks_cli.py:_maybe_auto_ingest (Stop hook, background)
  • hooks_cli.py:_mine_sync (PreCompact hook, synchronous — the "save before context dies" path, so the worse of the two)

Fix

_get_mine_dir now returns (dir, mode):

  • MEMPAL_DIR set and valid -> (MEMPAL_DIR, "projects") — unchanged behavior
  • transcript JSONL fallback -> (parent_dir, "convos")
  • nothing available -> ("", "projects")

Both _maybe_auto_ingest and _mine_sync thread mode into the spawned mempalace mine ... --mode <mode> command.

Test plan

  • Updated _get_mine_dir tests to assert the new (dir, mode) tuple
  • Tightened test_maybe_auto_ingest_with_env / _with_transcript to assert the spawned cmd carries the right --mode
  • Added test_mine_sync_with_transcript_uses_convos_mode and test_mine_sync_with_env_uses_projects_mode
  • Updated test_precompact_mines_transcript_dir for the new cmd shape
  • Full suite green: 1411 passed
  • ruff check + ruff format --check clean

The Stop and PreCompact hooks spawn `mempalace mine <dir>` with no
`--mode` flag, which defaults to `projects` in cli.py. When MEMPAL_DIR
is unset, _get_mine_dir falls back to the parent of the transcript
JSONL — and miner.py's READABLE_EXTENSIONS includes `.jsonl`, so the
projects miner happily ingests Claude Code session JSONL as if it were
source code instead of conversation.

Make _get_mine_dir return (dir, mode): MEMPAL_DIR keeps `projects`,
the JSONL fallback yields `convos`. Both _maybe_auto_ingest and
_mine_sync now thread the mode into the spawned command.
Copilot AI review requested due to automatic review settings April 27, 2026 02:25
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes Claude Code hook auto-ingest so transcript directories are mined using --mode convos (instead of the default projects mode) when MEMPAL_DIR is not set, preventing session .jsonl files from being ingested as source code.

Changes:

  • Updated _get_mine_dir to return (dir, mode) so hooks can choose projects vs convos appropriately.
  • Threaded --mode <mode> into the spawned mempalace mine ... commands for both Stop (background) and PreCompact (sync) hook paths.
  • Expanded/adjusted tests to assert the spawned command includes the correct --mode for env vs transcript fallback cases.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
mempalace/hooks_cli.py Returns mine target + mode and passes --mode through to background and synchronous mine invocations.
tests/test_hooks_cli.py Updates and adds tests to validate mode selection and spawned command shape for both hook paths.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread mempalace/hooks_cli.py Outdated
Comment on lines +212 to +218
mempal_dir = os.environ.get("MEMPAL_DIR", "")
if mempal_dir and os.path.isdir(mempal_dir):
return mempal_dir
return mempal_dir, "projects"
if transcript_path:
path = Path(transcript_path).expanduser()
if path.is_file():
return str(path.parent)
return ""
return str(path.parent), "convos"
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_get_mine_dir() bypasses the existing transcript_path validation logic: it uses Path(transcript_path).expanduser().is_file() (no .resolve(), no extension check, no '..' rejection), so a crafted transcript_path could still cause the hook to mine an unintended parent directory. Also, MEMPAL_DIR is checked with os.path.isdir() without expanding '', so common values like "/proj" will be treated as invalid and fall back unexpectedly. Consider reusing _validate_transcript_path() for the transcript fallback and normalizing MEMPAL_DIR via Path(mempal_dir).expanduser().resolve() (and then .is_dir()).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot apply changes based on this feedback

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 6a8beef.

  • MEMPAL_DIR: now normalized via Path(mempal_dir).expanduser().resolve() before the is_dir() check, so ~/proj-style values work correctly and the resolved path is what gets passed to the miner command.
  • Transcript fallback: replaced the bare Path(...).expanduser().is_file() check with a call to the existing _validate_transcript_path(), which handles .resolve(), .jsonl/.json extension enforcement, and .. path-traversal rejection.

Added three new tests covering tilde-expansion of MEMPAL_DIR, .. traversal rejection, and non-.jsonl extension rejection in _get_mine_dir. Existing test assertions updated to compare resolved paths for cross-platform correctness.

- Normalize MEMPAL_DIR via Path.expanduser().resolve() so ~/proj paths
  are correctly accepted instead of falling through to transcript fallback
- Replace bare Path.expanduser().is_file() transcript check with the
  existing _validate_transcript_path() which adds .resolve(), enforces
  .jsonl/.json extension, and rejects '..' path-traversal components
- Update tests to compare resolved paths (cross-platform correctness)
- Add tests for tilde expansion, path-traversal rejection, and
  non-jsonl extension rejection in _get_mine_dir

Agent-Logs-Url: https://github.com/MemPalace/mempalace/sessions/f69176c7-d752-40ef-ba71-d0e4adc3a689

Co-authored-by: igorls <[email protected]>
Copy link
Copy Markdown
Contributor

Copilot AI commented Apr 27, 2026

Just as a heads up, I was blocked by some firewall rules while working on your feedback. Expand below for details.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • chroma-onnx-models.s3.amazonaws.com
    • Triggering command: /usr/bin/python python -m pytest tests/ -v --ignore=tests/benchmarks -x -q (dns block)
    • Triggering command: /usr/bin/python python -m pytest tests/ -v --ignore=tests/benchmarks --ignore=tests/test_closet_llm.py -q (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

@igorls igorls merged commit 8bb1772 into develop Apr 27, 2026
6 checks passed
igorls added a commit to funguf/mempalace that referenced this pull request Apr 27, 2026
…MEMPAL_DIR

MemPalace#1230 fixed --mode convos for the case where MEMPAL_DIR was unset, but
left two configurations broken:

  - MEMPAL_DIR set to a project dir: convos never mined (MEMPAL_DIR
    overrode the transcript path); only project files were ingested.
  - MEMPAL_DIR set to a conversations dir per the old hooks/README: the
    projects miner ran on JSONL — same wrong-miner behaviour.

The shell hooks (mempal_save_hook.sh, mempal_precompact_hook.sh) had
the same MEMPAL_DIR-overrides-transcript bug AND were missing --mode
on every spawned `mempalace mine` call.

Make the auto-ingest *additive*. _get_mine_dir → _get_mine_targets,
returning a list of (dir, mode) pairs:

  - MEMPAL_DIR (when valid) contributes (dir, "projects")
  - A valid transcript JSONL contributes (parent, "convos")
  - Both can appear together; the hook spawns one ingest per target

Same change applied to the shell save and precompact hooks. Precompact
also gained transcript_path parsing so it can run the convos mine
synchronously before context is compressed. hooks/README.md updated to
describe MEMPAL_DIR as a project-files target, never a convos target.
igorls added a commit that referenced this pull request Apr 27, 2026
Bumps every version source from 3.3.3 to 3.3.4:
- pyproject.toml
- mempalace/version.py (canonical)
- .claude-plugin/plugin.json
- .claude-plugin/marketplace.json
- .codex-plugin/plugin.json
- README.md badge

Dates the CHANGELOG section and adds entries for the bug fixes that
landed this cycle (#1135, #1191, #1230, #1231) plus expands the #1194
entry to credit the lookup-side recovery path from #1197.

Pre-tag verification:
- 1441 passed, 1 skipped (full suite minus benchmarks, all platforms)
- ruff check + format clean
- 44/44 in test_version_consistency + test_readme_claims (6-file sync)
- JPH invariant: pyproject.toml + .claude-plugin/plugin.json both
  reference mempalace-mcp
- Wheel build + fresh-venv install: mempalace --version reports 3.3.4,
  mempalace-mcp --help works (catches the v3.3.2-class regression)
@igorls igorls mentioned this pull request Apr 27, 2026
jphein added a commit to jphein/mempalace that referenced this pull request Apr 27, 2026
22 commits from upstream/develop, including:

- HNSW capacity divergence detection + BM25-only sqlite fallback when
  vector layer is unloadable (MemPalace#1222 / MemPalace#1227 — vector_disabled kwarg
  routes search through chromadb-bypass path on segfault risk).
- HNSW index bloat prevention via batch_size + sync_threshold metadata
  on collection create (MemPalace#1191).
- Hooks: always mine the active transcript as convos, additive to
  MEMPAL_DIR (MemPalace#1230 / MemPalace#1231). Restructured into `_get_mine_targets()`
  list approach + separate `_ingest_transcript()`. Daemon-strict gate
  preserved on each entry point.
- Wing-name normalization for hyphenated dirs across miner / convo_miner
  / palace_graph (MemPalace#1194 / MemPalace#1195 / MemPalace#1197).
- `narrow _fix_blob_seq_ids` shim + `repair --mode max-seq-id` for
  legacy 0.6.x BLOB-poisoned palaces (MemPalace#1135).

Conflict resolution:
- README.md: kept fork-shaped narrative, dropped upstream's sweep tip
  injection (per fork-readme handling memory).
- hooks/README.md: adopted upstream's accurate `MEMPAL_DIR` additive
  description; kept `MEMPAL_PYTHON` env-var name (matches actual
  `hooks/*.sh` scripts in both forks).
- mempalace/cli.py: consolidated duplicate `--mode` argparse declarations
  into one with all 4 choices (rebuild/legacy/reorganize/max-seq-id).
- mempalace/hooks_cli.py: adopted upstream's `_get_mine_targets()` +
  `_ingest_transcript()` shape; added `_daemon_strict()` guard at entry
  of `_maybe_auto_ingest`, `_mine_sync`, and `_ingest_transcript` so
  the daemon-strict architecture still skips local writes.
- mempalace/mcp_server.py: kept both `kind=` and `vector_disabled=`
  kwargs on the `search_memories` call.
- mempalace/searcher.py: kept fork's `_count_in_scope`,
  `_sqlite_fallback_and_scope`, `_apply_kind_text_filter` AND upstream's
  `_bm25_only_via_sqlite`. Both `kind` and `vector_disabled` parameters
  on `search_memories`.

Tests: 1510 passing (up from 1366 — upstream brought new test suites).
lealvona pushed a commit to lealvona/mempalace that referenced this pull request Apr 29, 2026
…MEMPAL_DIR

MemPalace#1230 fixed --mode convos for the case where MEMPAL_DIR was unset, but
left two configurations broken:

  - MEMPAL_DIR set to a project dir: convos never mined (MEMPAL_DIR
    overrode the transcript path); only project files were ingested.
  - MEMPAL_DIR set to a conversations dir per the old hooks/README: the
    projects miner ran on JSONL — same wrong-miner behaviour.

The shell hooks (mempal_save_hook.sh, mempal_precompact_hook.sh) had
the same MEMPAL_DIR-overrides-transcript bug AND were missing --mode
on every spawned `mempalace mine` call.

Make the auto-ingest *additive*. _get_mine_dir → _get_mine_targets,
returning a list of (dir, mode) pairs:

  - MEMPAL_DIR (when valid) contributes (dir, "projects")
  - A valid transcript JSONL contributes (parent, "convos")
  - Both can appear together; the hook spawns one ingest per target

Same change applied to the shell save and precompact hooks. Precompact
also gained transcript_path parsing so it can run the convos mine
synchronously before context is compressed. hooks/README.md updated to
describe MEMPAL_DIR as a project-files target, never a convos target.
xcarbo added a commit to xcarbo/mempalace that referenced this pull request May 4, 2026
Catches up on a month of upstream work. Highlights pulled in:

- MemPalace#1306 searcher: hybrid candidate union (vector ∪ BM25 reranking pool)
- MemPalace#1325 mcp security: omit absolute paths from tool_get_drawer / tool_status
- MemPalace#1322 chroma: wire quarantine_stale_hnsw to prevent SIGSEGV on stale HNSW
- MemPalace#1320 mcp: forward valid_to / source params in kg_add / kg_invalidate
- MemPalace#1321 cli: honor --palace flag in cmd_init
- MemPalace#1314 kg temporal params fix
- MemPalace#1244 cli: cmd_compress writes to mempalace_closets so palace can read
- MemPalace#1243 mcp: case-insensitive agent name in diary_write/diary_read
- MemPalace#1303 mcp_server: pass embedding_function= on collection reopen
- MemPalace#1076/MemPalace#1077 hooks: quote CLAUDE_PLUGIN_ROOT / CODEX_PLUGIN_ROOT in hooks.json
- Various ruff format passes on touched files

Conflict resolution (CHANGELOG.md only — code files all auto-merged):

- 3.3.5 unreleased section header from upstream kept above 3.3.4
- 3.3.4 section: kept our 2026-04-30 release date; merged upstream's new
  MemPalace#1299 SIGSEGV-on-default-EF entry in alongside our existing topic-tunnels
  (MemPalace#1194/MemPalace#1195/MemPalace#1197), HNSW-bloat (MemPalace#1191), max_seq_id (MemPalace#1135), and
  auto-ingest (MemPalace#1230/MemPalace#1231) entries. Kept our richer topic-tunnels detail
  (upstream's version was a strict subset).

xdev patches preserved (still on this branch, untouched by merge):
- 6ef44cb fix(hooks): route CC transcripts via convo_miner with cwd-based wings
- 3fad61d fix(config): allow leading dash in wing names

Not pushed to origin — run tests locally and decide when to push.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants