Skip to content

fix(hooks): drop checkpoint write path (verbatim-only, part 1)#6

Merged
jphein merged 3 commits intomainfrom
fix/drop-checkpoint-write-path
May 6, 2026
Merged

fix(hooks): drop checkpoint write path (verbatim-only, part 1)#6
jphein merged 3 commits intomainfrom
fix/drop-checkpoint-write-path

Conversation

@jphein
Copy link
Copy Markdown
Owner

@jphein jphein commented May 6, 2026

Reopened after parent #2 squash-merge orphaned the original PR (#3 closed, branch retargeted from feature-branch base to main).

Closes the checkpoint write half of the verbatim-only architecture. Code-level changes unchanged from the original PR — drops _save_diary_direct from hook_stop and hook_precompact silent paths, removes _extract_themes/_THEME_STOPWORDS as dead code, ships the verbatim-only design spec.

Original PR: #3 (closed by parent merge)
Tests: 1558 pass, ruff clean.

🤖 Generated with Claude Code

jphein and others added 3 commits May 5, 2026 18:18
The Stop hook used to do two things on each fire:
  (a) write a 1-KB checkpoint summary diary entry into the
      mempalace_session_recovery collection
  (b) auto-mine the verbatim transcript into mempalace_drawers

The summary in (a) is redundant once (b) is searchable, AND the
recovery collection had no semantic-search MCP surface — so
checkpoints were structurally invisible to mempalace_search. Net
effect from a user's seat: agents (and JP) couldn't find recent
session content via search even though everything was on disk.

Drop the summary half. Verbatim transcript chunks already contain
every word a checkpoint summary would have surfaced. Leaving only
the transcript-ingest path matches the verbatim-only architecture
described in docs/superpowers/specs/2026-05-05-verbatim-only-design.md.

Changes:

* hook_stop silent path: remove _save_diary_direct call. Save marker
  advances unconditionally (fire-and-forget mining doesn't have a
  "confirmed save" signal to gate on; failure detection moves to
  daemon-side observability — hook.log + systemd journal).
* hook_precompact: remove the recovery-marker write. Mine + compaction
  proceed as before.
* systemMessage shape changes from "✦ N memories woven into the
  palace — themes" to "✦ Transcript ingest triggered (wing=...)".
  Lighter, accurate, no false count.
* Delete the now-unused _save_diary_direct function (~120 LOC) and
  its dependencies _extract_themes + _THEME_STOPWORDS (~30 LOC).
* Update 4 hook tests + 1 OSError test to mock _ingest_transcript
  rather than _save_diary_direct, and to expect the new
  systemMessage shape.

Scope deliberately limited: this PR removes hook-side WRITES.
mempalace_session_recovery collection still exists; the
mempalace_session_recovery_read MCP tool still works against the
existing 763 archived entries; mempalace.migrate is untouched.
A follow-up PR will dispose of the collection itself once this
change has run in production for a verification window.

Net diff: −230 / +53 LOC. 1558 tests pass, 1 skipped, lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Captures the architectural shift driving this PR family — drop
checkpoint summaries, expose only the verbatim transcripts that
mempalace_search can already reach. Documents:

* Why: the recovery collection had no semantic-search MCP surface,
  so checkpoints in it were structurally invisible to search
* What: hook write-path deletion (this PR) + collection disposal
  (follow-up PR)
* How: code-deletion checklist for both repos, migration plan with
  tarball safety net, test plan
* Open questions queued for review at merge time

Spec gates the destructive parts on the answers to those open
questions; the write-path removal in this PR is scoped narrow
enough to land without them.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Copilot review on #3 caught that the docstring claimed
the recovery collection was "now-deleted," but this PR only removes
the write call site — collection retirement is the follow-up PR #5.
Reword to match actual scope: the recovery-marker write is removed
here; the collection still exists and gets disposed in #5.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Copilot AI review requested due to automatic review settings May 6, 2026 01:18
@jphein jphein merged commit 69768fc into main May 6, 2026
7 checks passed
@jphein jphein deleted the fix/drop-checkpoint-write-path branch May 6, 2026 01:18
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR moves the hooks toward the fork’s verbatim-only recovery model by removing the direct checkpoint-summary write path from hook execution, updating hook tests to match the new stop-hook messaging/behavior, and adding the design spec for later cleanup of the recovery collection.

Changes:

  • Deletes _save_diary_direct, _extract_themes, and _THEME_STOPWORDS from hooks_cli.py.
  • Changes silent hook_stop to trigger transcript ingest + emit a wing-based system message, and removes the precompact recovery-marker write.
  • Updates hook tests for the new ingest-based behavior and adds a verbatim-only architecture spec.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
mempalace/hooks_cli.py Removes checkpoint-summary write helpers and rewires stop/precompact hook behavior around transcript ingest only.
tests/test_hooks_cli.py Updates hook expectations and mocks to reflect ingest-driven silent saves and new system message text.
docs/superpowers/specs/2026-05-05-verbatim-only-design.md Adds the design/spec document describing the verbatim-only architecture and later cleanup phases.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread mempalace/hooks_cli.py
Comment on lines +625 to +629
try:
last_save_file.write_text(str(exchange_count), encoding="utf-8")
except OSError:
pass
sys_msg = f"\u2726 Transcript ingest triggered (wing={project_wing})"
Comment thread mempalace/hooks_cli.py
Comment on lines +667 to +671
The recovery-marker write was removed in this PR (it used to land
a "where we were" diary entry in the dedicated
mempalace_session_recovery collection). The verbatim transcript
chunks captured by _ingest_transcript already cover the recovery
use case — searching for any phrase from the last few messages
Comment thread mempalace/hooks_cli.py
Comment on lines +665 to +666
"""Precompact hook: mine the transcript synchronously, then allow compaction.

Comment thread tests/test_hooks_cli.py
Comment on lines 264 to 266
def test_stop_hook_tracks_save_point(tmp_path):
"""Save marker advances on each fire so the next fire short-circuits at the same count."""
transcript = tmp_path / "t.jsonl"
jphein added a commit that referenced this pull request May 6, 2026
Four findings, batched into one cleanup PR.

PR #6 — hooks_cli.hook_precompact docstring lie. Said "mine the
transcript synchronously" but the local-mode _ingest_transcript
uses subprocess.Popen (background) and daemon-strict mode is
fire-and-forget HTTP. Rewrite docstring to describe the actual
two-step flow: (1) transcript ingest (best-effort, may still be
running when compaction starts), (2) _mine_sync of MEMPAL_DIR
(real synchronous subprocess.run).

PR #7 — cmd_mined unpaginated col.get + silent truncation. Old
flow called col.get(where=..., include=[]) once to size the loop,
then paginated with the same where clause. ChromaDB's get() has
a silent ~10K id truncation, so the upfront sizing call would
undercount on a wing >10K drawers. Drop the upfront sizing
entirely; loop on empty-batch instead. Plus an early-exit when
a batch is shorter than batch_size (saves one trailing empty
get on exact-multiple wing counts).

PR #8 — cmd_repair docstring stale. Said "rebuild palace vector
index" but the function still dispatches max-seq-id mode. Rewrite
to describe both modes + acknowledge the deleted reorganize mode.

PR #10 — non-Projects dashed-project divergence. Documented
limitation. Tried adding -dev-/-code-/etc. markers; reverted
because those are also ambiguous (`~/dev/<project>` and
`~/dev/<parent>/<project>` encode identically — a -dev-<rest>
match would catch parent+project, breaking the existing
test_wing_from_transcript_path_non_projects_layout test that
expects the last-token fallback for `-dev-MemPalace-mempalace`
→ `mempalace`). Instead document the limitation in the docstring
and add a regression test pinning the current behavior so future
"fixes" don't break the layouts that DO want the fallback.

Tests: +1 (regression test for the documented limitation). Full
suite: 1552 passed, 1 skipped, ruff clean.

Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants