Skip to content

fix(hooks): always mine the active transcript as convos, additive to MEMPAL_DIR#1231

Merged
igorls merged 5 commits intodevelopfrom
fix/hooks-convos-additive-mining
Apr 27, 2026
Merged

fix(hooks): always mine the active transcript as convos, additive to MEMPAL_DIR#1231
igorls merged 5 commits intodevelopfrom
fix/hooks-convos-additive-mining

Conversation

@igorls
Copy link
Copy Markdown
Member

@igorls igorls commented Apr 27, 2026

Summary

Follow-up to #1230. That PR fixed --mode convos for one specific case: a Python hook user with MEMPAL_DIR unset. Two configurations were still silently broken:

  • MEMPAL_DIR set to a project directory — convos never mined; the hook ran the projects miner on MEMPAL_DIR and ignored the transcript path entirely.
  • MEMPAL_DIR set to a conversations directory (per the old hooks/README.md recommendation) — the projects miner ran on JSONL, ingesting it as if it were source code.

Both shell hooks (hooks/mempal_save_hook.sh, hooks/mempal_precompact_hook.sh) had the same MEMPAL_DIR-overrides-TRANSCRIPT_PATH bug and were missing --mode on every spawned mempalace mine. The PreCompact shell hook didn't even read transcript_path from stdin.

This PR closes all of those.

Approach

Make the auto-ingest additive, not "MEMPAL_DIR overrides transcript":

  • _get_mine_dir is renamed to _get_mine_targets and now returns list[tuple[str, str]].
  • MEMPAL_DIR (when valid) contributes (dir, "projects").
  • A valid transcript JSONL contributes (parent_dir, "convos").
  • Both can appear together — the hook then spawns one ingest per target.

The shell save and precompact hooks get the same dual-target structure (each writes one independent mempalace mine ... --mode <mode> call). The precompact shell hook also gained transcript_path parsing so it can run the convos mine synchronously before context is compressed.

hooks/README.md updated to describe MEMPAL_DIR accurately: it is a project files target, never a convos target. The active conversation transcript is mined automatically regardless of MEMPAL_DIR.

Why this exists

Milla flagged that users had been told their conversations were stored verbatim, but with the original MEMPAL_DIR-overrides-transcript semantics + the missing --mode flag, that was only true for one of four common configurations. After this PR, the auto-mine path produces verbatim convo drawers for every supported hook configuration.

Stack

This branch was cut from fix/hooks-convos-mode-1232 (#1230). The two are textually adjacent but logically independent — either can land first; if #1230 lands first, this rebases cleanly.

Test plan

  • _get_mine_targets covers all four cases (none / env-only / transcript-only / both)
  • New test_maybe_auto_ingest_with_both_set asserts two Popen spawns with the right --mode per spawn
  • New test_mine_sync_with_both_set asserts the same for the precompact-sync path
  • test_hook_mines_transcript_path updated to assert dirname "$TRANSCRIPT_PATH" + --mode convos are present in the shell hook (the previous assertion required MINE_DIR, which no longer exists)
  • Full suite green: 1416 passed, 1 skipped
  • ruff check + ruff format --check clean

igorls and others added 3 commits April 26, 2026 23:25
The Stop and PreCompact hooks spawn `mempalace mine <dir>` with no
`--mode` flag, which defaults to `projects` in cli.py. When MEMPAL_DIR
is unset, _get_mine_dir falls back to the parent of the transcript
JSONL — and miner.py's READABLE_EXTENSIONS includes `.jsonl`, so the
projects miner happily ingests Claude Code session JSONL as if it were
source code instead of conversation.

Make _get_mine_dir return (dir, mode): MEMPAL_DIR keeps `projects`,
the JSONL fallback yields `convos`. Both _maybe_auto_ingest and
_mine_sync now thread the mode into the spawned command.
- Normalize MEMPAL_DIR via Path.expanduser().resolve() so ~/proj paths
  are correctly accepted instead of falling through to transcript fallback
- Replace bare Path.expanduser().is_file() transcript check with the
  existing _validate_transcript_path() which adds .resolve(), enforces
  .jsonl/.json extension, and rejects '..' path-traversal components
- Update tests to compare resolved paths (cross-platform correctness)
- Add tests for tilde expansion, path-traversal rejection, and
  non-jsonl extension rejection in _get_mine_dir

Agent-Logs-Url: https://github.com/MemPalace/mempalace/sessions/f69176c7-d752-40ef-ba71-d0e4adc3a689

Co-authored-by: igorls <[email protected]>
…MEMPAL_DIR

#1230 fixed --mode convos for the case where MEMPAL_DIR was unset, but
left two configurations broken:

  - MEMPAL_DIR set to a project dir: convos never mined (MEMPAL_DIR
    overrode the transcript path); only project files were ingested.
  - MEMPAL_DIR set to a conversations dir per the old hooks/README: the
    projects miner ran on JSONL — same wrong-miner behaviour.

The shell hooks (mempal_save_hook.sh, mempal_precompact_hook.sh) had
the same MEMPAL_DIR-overrides-transcript bug AND were missing --mode
on every spawned `mempalace mine` call.

Make the auto-ingest *additive*. _get_mine_dir → _get_mine_targets,
returning a list of (dir, mode) pairs:

  - MEMPAL_DIR (when valid) contributes (dir, "projects")
  - A valid transcript JSONL contributes (parent, "convos")
  - Both can appear together; the hook spawns one ingest per target

Same change applied to the shell save and precompact hooks. Precompact
also gained transcript_path parsing so it can run the convos mine
synchronously before context is compressed. hooks/README.md updated to
describe MEMPAL_DIR as a project-files target, never a convos target.
Copilot AI review requested due to automatic review settings April 27, 2026 03:33
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes MemPalace Claude Code/Codex hooks so the active transcript is always mined as conversations (--mode convos) while MEMPAL_DIR (when set) becomes an additional project-files mining target (--mode projects), rather than overriding transcript ingestion.

Changes:

  • Refactors Python hook auto-ingest targeting from a single directory to a list of (dir, mode) targets and spawns one mine per target.
  • Updates shell save/precompact hooks to mine transcript parent dir with --mode convos and mine MEMPAL_DIR with --mode projects.
  • Updates tests and hook documentation to reflect the additive targeting semantics.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
mempalace/hooks_cli.py Switches auto-ingest to multi-target (dir, mode) and spawns mines per target.
hooks/mempal_save_hook.sh Mines transcript dir as convos and MEMPAL_DIR as projects (both optional/additive).
hooks/mempal_precompact_hook.sh Adds transcript_path parsing and mines transcript synchronously as convos before compaction.
hooks/README.md Clarifies MEMPAL_DIR is for project files only; transcript mining is automatic.
tests/test_hooks_cli.py Adds/updates tests for multi-target spawning and mode correctness.
tests/test_save_hook_mines.py Updates assertions to require transcript-driven --mode convos mining in the save hook.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread mempalace/hooks_cli.py
Comment on lines +213 to +220
if mempal_dir:
resolved = Path(mempal_dir).expanduser().resolve()
if resolved.is_dir():
targets.append((str(resolved), "projects"))
path = _validate_transcript_path(transcript_path)
if path is not None and path.is_file():
targets.append((str(path.parent), "convos"))
return targets
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_get_mine_targets() adds a convos target for the transcript directory, but hook_stop()/hook_precompact() already call _ingest_transcript(transcript_path), which spawns a convos mine as well (with a different --wing). This can result in duplicate conversation ingestion (and potentially into different wings), plus extra load on every save/precompact. Consider consolidating to a single transcript-ingest path (either remove the transcript-derived target here and leave transcript mining to _ingest_transcript, or remove/adjust _ingest_transcript and ensure the spawned mine uses the intended --wing consistently).

Copilot uses AI. Check for mistakes.
Comment thread mempalace/hooks_cli.py Outdated
Comment on lines +277 to +289
def _maybe_auto_ingest(transcript_path: str = ""):
"""Run mempalace mine in background if a mine directory is available."""
mine_dir = _get_mine_dir(transcript_path)
if not mine_dir:
"""Run mempalace mine in background for every available target."""
targets = _get_mine_targets(transcript_path)
if not targets:
return
if _mine_already_running():
_log("Skipping auto-ingest: mine already running")
return
try:
_spawn_mine([sys.executable, "-m", "mempalace", "mine", mine_dir])
except OSError:
pass
for mine_dir, mode in targets:
try:
_spawn_mine([sys.executable, "-m", "mempalace", "mine", mine_dir, "--mode", mode])
except OSError:
pass
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_maybe_auto_ingest() now spawns multiple background mempalace mine processes when both targets are present, but _spawn_mine() writes a single PID to _MINE_PID_FILE each time. The second spawn overwrites the first PID, so _mine_already_running() can return false while the first mine is still running, allowing overlapping mines on later hook fires. Consider tracking multiple PIDs (e.g., one PID file per target/mode) or switching to a filesystem lock (flock/portalocker) rather than a single-PID sentinel when spawning more than one process.

Copilot uses AI. Check for mistakes.
Comment thread mempalace/hooks_cli.py
Comment on lines +285 to +288
for mine_dir, mode in targets:
try:
_spawn_mine([sys.executable, "-m", "mempalace", "mine", mine_dir, "--mode", mode])
except OSError:
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The subprocesses here are spawned via sys.executable, but the module-level _mempalace_python() docstring explicitly calls out that sys.executable may be a system Python without chromadb when invoked by hooks. To avoid background/sync mines failing silently in those environments, consider using _mempalace_python() for the interpreter in these spawned mempalace mine commands.

Copilot uses AI. Check for mistakes.
Comment thread mempalace/hooks_cli.py Outdated
Comment on lines +292 to +316
def _mine_sync(transcript_path: str = ""):
"""Run mempalace mine synchronously (for precompact -- data must land first)."""
mine_dir = _get_mine_dir(transcript_path)
if not mine_dir:
"""Run mempalace mine synchronously for every target (precompact)."""
targets = _get_mine_targets(transcript_path)
if not targets:
return
try:
STATE_DIR.mkdir(parents=True, exist_ok=True)
log_path = STATE_DIR / "hook.log"
with open(log_path, "a") as log_f:
subprocess.run(
[sys.executable, "-m", "mempalace", "mine", mine_dir],
stdout=log_f,
stderr=log_f,
timeout=60,
)
except (OSError, subprocess.TimeoutExpired):
pass
STATE_DIR.mkdir(parents=True, exist_ok=True)
log_path = STATE_DIR / "hook.log"
for mine_dir, mode in targets:
try:
with open(log_path, "a") as log_f:
subprocess.run(
[
sys.executable,
"-m",
"mempalace",
"mine",
mine_dir,
"--mode",
mode,
],
stdout=log_f,
stderr=log_f,
timeout=60,
)
except (OSError, subprocess.TimeoutExpired):
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_mine_sync() now potentially runs two separate mempalace mine commands sequentially (projects + convos), each with timeout=60. In the “both set” case this can push total hook runtime well past common harness hook timeouts (e.g., 30s), and if it times out mid-way the more important transcript ingest might not run. Consider prioritizing the transcript/convos target first (or running the projects ingest async / with a smaller timeout) so precompact reliably captures the active transcript before compaction proceeds.

Copilot uses AI. Check for mistakes.
Comment thread tests/test_hooks_cli.py Outdated
def test_get_mine_targets_transcript_path_traversal_rejected(tmp_path):
"""Transcript paths with '..' components are rejected (no convos target)."""
with patch.dict("os.environ", {}, clear=True):
targets = _get_mine_targets("../../etc/passwd")
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test claims to verify '..' path traversal rejection, but the provided input ("../../etc/passwd") lacks a .json/.jsonl suffix, so it is rejected by the extension check instead and doesn't actually exercise the traversal guard. To test the traversal behavior specifically, use a .jsonl/.json path that includes '..' components (e.g. "../t.jsonl") and assert that it yields no convos target.

Suggested change
targets = _get_mine_targets("../../etc/passwd")
targets = _get_mine_targets("../t.jsonl")

Copilot uses AI. Check for mistakes.
Comment thread hooks/mempal_precompact_hook.sh Outdated
Comment on lines +69 to +78
# interpolating into shell — same contract as mempal_save_hook.sh.
eval $(echo "$INPUT" | "$MEMPAL_PYTHON_BIN" -c "
import sys, json, re
data = json.load(sys.stdin)
sid = data.get('session_id', 'unknown')
tp = data.get('transcript_path', '')
safe = lambda s: re.sub(r'[^a-zA-Z0-9_/.\-~]', '', str(s))
print(f'SESSION_ID=\"{safe(sid)}\"')
print(f'TRANSCRIPT_PATH=\"{safe(tp)}\"')
" 2>/dev/null)
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using eval $(...) here increases the blast radius of any parsing/sanitization mistake (it executes code, not just data assignment). Since you only need two scalar values, prefer a non-eval approach (e.g., have Python print NUL/line-delimited values and read them into variables) so the shell never executes generated code.

Suggested change
# interpolating into shell — same contract as mempal_save_hook.sh.
eval $(echo "$INPUT" | "$MEMPAL_PYTHON_BIN" -c "
import sys, json, re
data = json.load(sys.stdin)
sid = data.get('session_id', 'unknown')
tp = data.get('transcript_path', '')
safe = lambda s: re.sub(r'[^a-zA-Z0-9_/.\-~]', '', str(s))
print(f'SESSION_ID=\"{safe(sid)}\"')
print(f'TRANSCRIPT_PATH=\"{safe(tp)}\"')
" 2>/dev/null)
# assigning in shell — same contract as mempal_save_hook.sh.
mapfile -t _mempal_parsed < <(echo "$INPUT" | "$MEMPAL_PYTHON_BIN" -c "
import sys, json, re
data = json.load(sys.stdin)
sid = data.get('session_id', 'unknown')
tp = data.get('transcript_path', '')
safe = lambda s: re.sub(r'[^a-zA-Z0-9_/.\-~]', '', str(s))
print(safe(sid))
print(safe(tp))
" 2>/dev/null)
SESSION_ID="${_mempal_parsed[0]:-unknown}"
TRANSCRIPT_PATH="${_mempal_parsed[1]:-}"

Copilot uses AI. Check for mistakes.
Comment thread hooks/mempal_precompact_hook.sh Outdated
Comment on lines +85 to +89
# Run ingest synchronously so memories land before compaction. Two
# independent targets — both run if both are set:
# 1. TRANSCRIPT_PATH (from Claude Code) → parent dir, --mode convos
# 2. MEMPAL_DIR → --mode projects
if [ -n "$TRANSCRIPT_PATH" ] && [ -f "$TRANSCRIPT_PATH" ]; then
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before mining dirname "$TRANSCRIPT_PATH", consider validating that TRANSCRIPT_PATH is actually a transcript file (e.g., require .jsonl/.json) and reject traversal segments like ... The Python hook’s _validate_transcript_path enforces these constraints, but the shell hook currently mines any existing file path the harness provides.

Suggested change
# Run ingest synchronously so memories land before compaction. Two
# independent targets — both run if both are set:
# 1. TRANSCRIPT_PATH (from Claude Code) → parent dir, --mode convos
# 2. MEMPAL_DIR → --mode projects
if [ -n "$TRANSCRIPT_PATH" ] && [ -f "$TRANSCRIPT_PATH" ]; then
is_valid_transcript_path() {
local path="$1"
[ -n "$path" ] || return 1
case "$path" in
*.json|*.jsonl) ;;
*) return 1 ;;
esac
case "/$path/" in
*/../*|*/./*) return 1 ;;
esac
return 0
}
# Run ingest synchronously so memories land before compaction. Two
# independent targets — both run if both are set:
# 1. TRANSCRIPT_PATH (from Claude Code) → parent dir, --mode convos
# 2. MEMPAL_DIR → --mode projects
if is_valid_transcript_path "$TRANSCRIPT_PATH" && [ -f "$TRANSCRIPT_PATH" ]; then

Copilot uses AI. Check for mistakes.
Comment thread hooks/mempal_save_hook.sh
Comment on lines +169 to +170
mempalace mine "$(dirname "$TRANSCRIPT_PATH")" --mode convos \
>> "$STATE_DIR/hook.log" 2>&1 &
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The transcript-derived mine target only checks -f "$TRANSCRIPT_PATH" before running, so any existing file path (including ones with .. segments) would be accepted and its parent directory mined. Consider adding lightweight validation (e.g., require .jsonl/.json and reject .. path components) before invoking mempalace mine "$(dirname ...)" --mode convos, to align with the Python hook’s transcript_path validation.

Suggested change
mempalace mine "$(dirname "$TRANSCRIPT_PATH")" --mode convos \
>> "$STATE_DIR/hook.log" 2>&1 &
case "$TRANSCRIPT_PATH" in
*.json|*.jsonl)
case "$TRANSCRIPT_PATH" in
..|../*|*/../*|*/..)
echo "[$(date '+%H:%M:%S')] Skipping invalid transcript path: $TRANSCRIPT_PATH" >> "$STATE_DIR/hook.log"
;;
*)
mempalace mine "$(dirname "$TRANSCRIPT_PATH")" --mode convos \
>> "$STATE_DIR/hook.log" 2>&1 &
;;
esac
;;
*)
echo "[$(date '+%H:%M:%S')] Skipping non-transcript path: $TRANSCRIPT_PATH" >> "$STATE_DIR/hook.log"
;;
esac

Copilot uses AI. Check for mistakes.
…review)

Address Copilot review on #1231:

1. Stop double-mining the transcript on the Python side. ``_get_mine_targets``
   now returns only the ``MEMPAL_DIR`` projects target — the convos target
   for the transcript dir is dropped because ``_ingest_transcript`` already
   handles it on every hook fire. The duplicate spawn was using
   ``sys.executable`` (vs ``_mempalace_python()``) and a different ``--wing``,
   so each Stop/PreCompact event was writing the same transcript into two
   wings under asymmetric interpreters and overwriting the single
   ``_MINE_PID_FILE`` lock.

2. ``_maybe_auto_ingest`` and ``_mine_sync`` now spawn via
   ``_mempalace_python()`` so the resolved interpreter matches the venv
   that owns mempalace (matters under GUI-launched harnesses where
   ``sys.executable`` may resolve to a system Python without chromadb).

3. Replace ``eval $(...)`` in both shell hooks with a ``mapfile``-based
   reader. Sanitized values are still emitted by the same Python parser,
   but the shell now does plain variable assignment instead of executing
   the parser's stdout — smaller blast radius if the sanitizer is ever
   bypassed.

4. Mirror ``_validate_transcript_path`` in the shell hooks via a
   ``is_valid_transcript_path`` helper — extension + traversal-segment
   rejection, parity with the Python validator. The convos mine in each
   shell hook is now gated on the validator instead of bare ``-f``.

5. Tighten the ``..`` traversal test that previously exercised the
   suffix gate by mistake (``../../etc/passwd`` lacks ``.json[l]``).
   Use ``.jsonl`` paths with traversal segments to actually hit the
   ``..`` rejection branch.

6. README: add a one-liner pointing at ``mempalace sweep`` for users
   who want per-message recall on top of the file-level chunks the
   hooks produce. The sweeper was undiscoverable previously.

Tests: 1418 passed, 1 skipped (full suite minus benchmarks).
@igorls
Copy link
Copy Markdown
Member Author

igorls commented Apr 27, 2026

Pushed fe56797 addressing the Copilot review:

The shell-side additive structure (separate MEMPAL_DIR and transcript mines) is preserved — the shell hooks have no _ingest_transcript equivalent, so the additive shape is correct there.

Tests: 1418 passed, 1 skipped (full suite minus benchmarks); ruff check + format clean.

`bash` on the Windows CI runner resolves to `wsl.exe` which fails with
"Windows Subsystem for Linux has no installed distributions." The shell
hooks themselves are POSIX-only — Windows users use the Python entry
point — so the bash-subprocess exercise is non-applicable on win32.

The static-grep validator tests still run on every platform, so the
shell-side validation is still asserted under Windows; only the live
bash invocation is skipped.
@igorls igorls merged commit bc5d3fa into develop Apr 27, 2026
6 checks passed
igorls added a commit that referenced this pull request Apr 27, 2026
Bumps every version source from 3.3.3 to 3.3.4:
- pyproject.toml
- mempalace/version.py (canonical)
- .claude-plugin/plugin.json
- .claude-plugin/marketplace.json
- .codex-plugin/plugin.json
- README.md badge

Dates the CHANGELOG section and adds entries for the bug fixes that
landed this cycle (#1135, #1191, #1230, #1231) plus expands the #1194
entry to credit the lookup-side recovery path from #1197.

Pre-tag verification:
- 1441 passed, 1 skipped (full suite minus benchmarks, all platforms)
- ruff check + format clean
- 44/44 in test_version_consistency + test_readme_claims (6-file sync)
- JPH invariant: pyproject.toml + .claude-plugin/plugin.json both
  reference mempalace-mcp
- Wheel build + fresh-venv install: mempalace --version reports 3.3.4,
  mempalace-mcp --help works (catches the v3.3.2-class regression)
@igorls igorls mentioned this pull request Apr 27, 2026
gnusam pushed a commit to gnusam/mempalace-pgsql that referenced this pull request Apr 27, 2026
Previously the save and precompact shell hooks only auto-ingested when
MEMPAL_DIR was set, and even then ran the project-files miner against
the JSONL transcript. Result: the most common configuration in the wild
(MEMPAL_DIR set to a project dir) silently skipped the conversation
mine — the hook would block, the AI would call mempalace_diary_write,
but the raw transcript chunks never reached the palace.

Make the auto-ingest additive instead:
  - TRANSCRIPT_PATH (always present from Claude Code) → parent dir,
    --mode convos. Always runs when the path passes validation.
  - MEMPAL_DIR (user-configured) → --mode projects. Independent target,
    runs in addition to (not in place of) the transcript mine.

Adds an `is_valid_transcript_path` helper to both hooks (extension
allowlist + traversal-segment rejection) so junk paths bail before
reaching dirname/the miner — defence-in-depth over the existing
``[ -f ... ]`` check. Also tightens precompact_hook to sanitize
SESSION_ID and read transcript_path the same way the save hook does.

Adapted from upstream MemPalace eb4de04 (PR MemPalace#1231 by @igorls) +
fe56797 (PR MemPalace#1231 review). The Python ``hooks_cli.py`` portion of
those commits is N/A here — this fork ships hooks via Docker Compose,
not the marketplace flow, and the eval-removal in the shell parser
also doesn't apply because the fork already uses three separate
``python3 -c`` calls instead of a single ``eval $(...)``. The
behaviour change (always-mine-transcript) and the validator are the
load-bearing portable parts.

Co-Authored-By: Igor Lins e Silva <[email protected]>
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
jphein added a commit to jphein/mempalace that referenced this pull request Apr 27, 2026
22 commits from upstream/develop, including:

- HNSW capacity divergence detection + BM25-only sqlite fallback when
  vector layer is unloadable (MemPalace#1222 / MemPalace#1227 — vector_disabled kwarg
  routes search through chromadb-bypass path on segfault risk).
- HNSW index bloat prevention via batch_size + sync_threshold metadata
  on collection create (MemPalace#1191).
- Hooks: always mine the active transcript as convos, additive to
  MEMPAL_DIR (MemPalace#1230 / MemPalace#1231). Restructured into `_get_mine_targets()`
  list approach + separate `_ingest_transcript()`. Daemon-strict gate
  preserved on each entry point.
- Wing-name normalization for hyphenated dirs across miner / convo_miner
  / palace_graph (MemPalace#1194 / MemPalace#1195 / MemPalace#1197).
- `narrow _fix_blob_seq_ids` shim + `repair --mode max-seq-id` for
  legacy 0.6.x BLOB-poisoned palaces (MemPalace#1135).

Conflict resolution:
- README.md: kept fork-shaped narrative, dropped upstream's sweep tip
  injection (per fork-readme handling memory).
- hooks/README.md: adopted upstream's accurate `MEMPAL_DIR` additive
  description; kept `MEMPAL_PYTHON` env-var name (matches actual
  `hooks/*.sh` scripts in both forks).
- mempalace/cli.py: consolidated duplicate `--mode` argparse declarations
  into one with all 4 choices (rebuild/legacy/reorganize/max-seq-id).
- mempalace/hooks_cli.py: adopted upstream's `_get_mine_targets()` +
  `_ingest_transcript()` shape; added `_daemon_strict()` guard at entry
  of `_maybe_auto_ingest`, `_mine_sync`, and `_ingest_transcript` so
  the daemon-strict architecture still skips local writes.
- mempalace/mcp_server.py: kept both `kind=` and `vector_disabled=`
  kwargs on the `search_memories` call.
- mempalace/searcher.py: kept fork's `_count_in_scope`,
  `_sqlite_fallback_and_scope`, `_apply_kind_text_filter` AND upstream's
  `_bm25_only_via_sqlite`. Both `kind` and `vector_disabled` parameters
  on `search_memories`.

Tests: 1510 passing (up from 1366 — upstream brought new test suites).
lealvona pushed a commit to lealvona/mempalace that referenced this pull request Apr 29, 2026
…alace#1231 review)

Address Copilot review on MemPalace#1231:

1. Stop double-mining the transcript on the Python side. ``_get_mine_targets``
   now returns only the ``MEMPAL_DIR`` projects target — the convos target
   for the transcript dir is dropped because ``_ingest_transcript`` already
   handles it on every hook fire. The duplicate spawn was using
   ``sys.executable`` (vs ``_mempalace_python()``) and a different ``--wing``,
   so each Stop/PreCompact event was writing the same transcript into two
   wings under asymmetric interpreters and overwriting the single
   ``_MINE_PID_FILE`` lock.

2. ``_maybe_auto_ingest`` and ``_mine_sync`` now spawn via
   ``_mempalace_python()`` so the resolved interpreter matches the venv
   that owns mempalace (matters under GUI-launched harnesses where
   ``sys.executable`` may resolve to a system Python without chromadb).

3. Replace ``eval $(...)`` in both shell hooks with a ``mapfile``-based
   reader. Sanitized values are still emitted by the same Python parser,
   but the shell now does plain variable assignment instead of executing
   the parser's stdout — smaller blast radius if the sanitizer is ever
   bypassed.

4. Mirror ``_validate_transcript_path`` in the shell hooks via a
   ``is_valid_transcript_path`` helper — extension + traversal-segment
   rejection, parity with the Python validator. The convos mine in each
   shell hook is now gated on the validator instead of bare ``-f``.

5. Tighten the ``..`` traversal test that previously exercised the
   suffix gate by mistake (``../../etc/passwd`` lacks ``.json[l]``).
   Use ``.jsonl`` paths with traversal segments to actually hit the
   ``..`` rejection branch.

6. README: add a one-liner pointing at ``mempalace sweep`` for users
   who want per-message recall on top of the file-level chunks the
   hooks produce. The sweeper was undiscoverable previously.

Tests: 1418 passed, 1 skipped (full suite minus benchmarks).
xcarbo added a commit to xcarbo/mempalace that referenced this pull request May 1, 2026
Catches up xdev-patches with 112 commits from MemPalace/develop, including:
- v3.3.4 release
- MemPalace#1262/MemPalace#1289 ChromaDB collection-reopen crash fix (relevant to long-running
  MCP server & mempalace-api)
- MemPalace#1287 HNSW divergence floor fix
- MemPalace#1288 BLOB seq_id decode in repair
- MemPalace#1180 cross-wing tunnels by shared topics
- MemPalace#1194 wing-slug normalization for hyphenated dirs

Conflict resolution: hooks_cli.py and mcp_server.py both had local patches
(6ef44cb route CC transcripts via convo_miner; 3fad61d allow leading dash)
that overlap with upstream fixes (MemPalace#1231, MemPalace#1194). Took upstream entirely on
those two files — upstream's version handles separate transcript/project
ingest, uses _mempalace_python(), and adds _pin_hnsw_threads. The local
config.py regex relaxation auto-merged cleanly and is preserved.

Safety tag: pre-upstream-merge-20260501-091227 (rollback target).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
xcarbo added a commit to xcarbo/mempalace that referenced this pull request May 4, 2026
Catches up on a month of upstream work. Highlights pulled in:

- MemPalace#1306 searcher: hybrid candidate union (vector ∪ BM25 reranking pool)
- MemPalace#1325 mcp security: omit absolute paths from tool_get_drawer / tool_status
- MemPalace#1322 chroma: wire quarantine_stale_hnsw to prevent SIGSEGV on stale HNSW
- MemPalace#1320 mcp: forward valid_to / source params in kg_add / kg_invalidate
- MemPalace#1321 cli: honor --palace flag in cmd_init
- MemPalace#1314 kg temporal params fix
- MemPalace#1244 cli: cmd_compress writes to mempalace_closets so palace can read
- MemPalace#1243 mcp: case-insensitive agent name in diary_write/diary_read
- MemPalace#1303 mcp_server: pass embedding_function= on collection reopen
- MemPalace#1076/MemPalace#1077 hooks: quote CLAUDE_PLUGIN_ROOT / CODEX_PLUGIN_ROOT in hooks.json
- Various ruff format passes on touched files

Conflict resolution (CHANGELOG.md only — code files all auto-merged):

- 3.3.5 unreleased section header from upstream kept above 3.3.4
- 3.3.4 section: kept our 2026-04-30 release date; merged upstream's new
  MemPalace#1299 SIGSEGV-on-default-EF entry in alongside our existing topic-tunnels
  (MemPalace#1194/MemPalace#1195/MemPalace#1197), HNSW-bloat (MemPalace#1191), max_seq_id (MemPalace#1135), and
  auto-ingest (MemPalace#1230/MemPalace#1231) entries. Kept our richer topic-tunnels detail
  (upstream's version was a strict subset).

xdev patches preserved (still on this branch, untouched by merge):
- 6ef44cb fix(hooks): route CC transcripts via convo_miner with cwd-based wings
- 3fad61d fix(config): allow leading dash in wing names

Not pushed to origin — run tests locally and decide when to push.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants