Skip to content

fix(hooks): drop checkpoint write path (verbatim-only, part 1)#3

Closed
jphein wants to merge 3 commits intofix/restore-transcript-ingest-via-daemonfrom
fix/drop-checkpoint-write-path
Closed

fix(hooks): drop checkpoint write path (verbatim-only, part 1)#3
jphein wants to merge 3 commits intofix/restore-transcript-ingest-via-daemonfrom
fix/drop-checkpoint-write-path

Conversation

@jphein
Copy link
Copy Markdown
Owner

@jphein jphein commented May 6, 2026

Summary

  • Removes the 1KB checkpoint-summary diary write from both Stop and PreCompact hooks. Verbatim transcript chunks (mined via the existing _ingest_transcript path) cover everything a summary would surface.
  • Deletes the now-unused _save_diary_direct (~120 LOC), _extract_themes, and _THEME_STOPWORDS helpers — no remaining callers in source or tests.
  • Updates 4 hook tests to mock _ingest_transcript rather than _save_diary_direct, and to expect the new systemMessage shape (✦ Transcript ingest triggered (wing=...)).
  • Ships the architecture spec at docs/superpowers/specs/2026-05-05-verbatim-only-design.md.

Why

The Stop hook used to do two things on each fire:

  • (a) write a 1KB checkpoint summary diary entry into the dedicated mempalace_session_recovery collection
  • (b) auto-mine the verbatim transcript into mempalace_drawers

(a) is redundant once (b) is searchable. Worse, the recovery collection had no semantic-search MCP surface — only filter-based read via mempalace_session_recovery_read(session_id, agent, since, until, wing). So checkpoints in it were structurally invisible to mempalace_search. Net effect from a user's seat: agents (and JP) couldn't find recent session content via search even though everything was on disk.

This PR drops (a). Verbatim transcripts in mempalace_drawers carry every word a checkpoint summary would have surfaced — searching is the recovery query.

Stacked on #2

Base branch is fix/restore-transcript-ingest-via-daemon (the Phase 1 PR that restores transcript ingest). When that merges, this PR's base auto-updates to main. Reviewers see only the new commits in this PR.

Behavior change

hook_stop silent path:

  • Old: result = _save_diary_direct(...); advance save marker only if count > 0; systemMessage was ✦ N memories woven into the palace — themes.
  • New: _ingest_transcript(...) only; advance save marker unconditionally (fire-and-forget mining doesn't have a "confirmed save" signal); systemMessage is ✦ Transcript ingest triggered (wing=...). Failure detection moves to daemon-side observability (hook.log + systemd journal).

hook_precompact:

  • Old: write a recovery-marker diary entry, then ingest, then sync-mine.
  • New: ingest, then sync-mine. The recovery-marker write is removed.

Scope deliberately limited

This PR removes hook-side writes only. Read paths and existing data are untouched:

  • mempalace_session_recovery collection still exists with its 763 archived entries
  • mempalace_session_recovery_read MCP tool still works for those entries
  • mempalace.migrate module still exists
  • palace-daemon lifespan auto-migrate still runs

A follow-up PR (Phase 2b, see spec) disposes of the collection and its read paths once this change has run in production for a verification window. That PR is the destructive one and gates on JP's explicit ack of the open questions in the spec.

Test plan

  • 1558 passed, 1 skipped — full suite green on Linux 3.12 locally
  • ruff check clean
  • ruff format --check clean
  • No remaining references to _save_diary_direct, _extract_themes, or _THEME_STOPWORDS in source or tests (confirmed via grep)
  • No "internal coordination jargon" leakage (the meta-test at tests/test_corpus_origin_integration.py::test_no_internal_coordination_jargon_in_source_or_tests passes; original draft used "Phase 2" comments which it correctly flagged, swapped to feature-language)

Risk

  • Save marker advancement is now unconditional. Failure-mode change: previously a failed save left the marker in place and the next fire retried; now the marker advances regardless. If _ingest_transcript silently fails for many fires in a row, transcripts go un-mined without retry. Mitigation: the daemon's /mine endpoint logs every accept/reject, and hook.log records every _post_daemon_mine call (success or fail). Detection moves from "hook self-reports" to "daemon-side log inspection" — appropriate for the verbatim-only model where the daemon owns writes.
  • _save_diary_direct and _extract_themes are removed. Anyone who imported these from mempalace.hooks_cli externally would break. Searched repo — no callers remain. Not an upstream API.

🤖 Generated with Claude Code

jphein and others added 2 commits May 5, 2026 17:20
The Stop hook used to do two things on each fire:
  (a) write a 1-KB checkpoint summary diary entry into the
      mempalace_session_recovery collection
  (b) auto-mine the verbatim transcript into mempalace_drawers

The summary in (a) is redundant once (b) is searchable, AND the
recovery collection had no semantic-search MCP surface — so
checkpoints were structurally invisible to mempalace_search. Net
effect from a user's seat: agents (and JP) couldn't find recent
session content via search even though everything was on disk.

Drop the summary half. Verbatim transcript chunks already contain
every word a checkpoint summary would have surfaced. Leaving only
the transcript-ingest path matches the verbatim-only architecture
described in docs/superpowers/specs/2026-05-05-verbatim-only-design.md.

Changes:

* hook_stop silent path: remove _save_diary_direct call. Save marker
  advances unconditionally (fire-and-forget mining doesn't have a
  "confirmed save" signal to gate on; failure detection moves to
  daemon-side observability — hook.log + systemd journal).
* hook_precompact: remove the recovery-marker write. Mine + compaction
  proceed as before.
* systemMessage shape changes from "✦ N memories woven into the
  palace — themes" to "✦ Transcript ingest triggered (wing=...)".
  Lighter, accurate, no false count.
* Delete the now-unused _save_diary_direct function (~120 LOC) and
  its dependencies _extract_themes + _THEME_STOPWORDS (~30 LOC).
* Update 4 hook tests + 1 OSError test to mock _ingest_transcript
  rather than _save_diary_direct, and to expect the new
  systemMessage shape.

Scope deliberately limited: this PR removes hook-side WRITES.
mempalace_session_recovery collection still exists; the
mempalace_session_recovery_read MCP tool still works against the
existing 763 archived entries; mempalace.migrate is untouched.
A follow-up PR will dispose of the collection itself once this
change has run in production for a verification window.

Net diff: −230 / +53 LOC. 1558 tests pass, 1 skipped, lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Captures the architectural shift driving this PR family — drop
checkpoint summaries, expose only the verbatim transcripts that
mempalace_search can already reach. Documents:

* Why: the recovery collection had no semantic-search MCP surface,
  so checkpoints in it were structurally invisible to search
* What: hook write-path deletion (this PR) + collection disposal
  (follow-up PR)
* How: code-deletion checklist for both repos, migration plan with
  tarball safety net, test plan
* Open questions queued for review at merge time

Spec gates the destructive parts on the answers to those open
questions; the write-path removal in this PR is scoped narrow
enough to land without them.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Copilot AI review requested due to automatic review settings May 6, 2026 00:25
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Removes the hook-side checkpoint-summary diary write path in favor of verbatim transcript ingestion as the sole “silent save” mechanism, updates hook tests accordingly, and adds a design spec documenting the verbatim-only direction.

Changes:

  • Deleted _save_diary_direct / _extract_themes helpers and switched hook_stop silent mode to trigger _ingest_transcript only (with a new systemMessage format).
  • Removed the PreCompact recovery-marker write; PreCompact now just ingests transcript (then sync-mines project data).
  • Updated hook tests to mock _ingest_transcript and assert the new systemMessage shape; added a verbatim-only architecture spec.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
mempalace/hooks_cli.py Drops checkpoint diary writes in Stop (silent) and PreCompact; emits new “Transcript ingest triggered (wing=…)” system message.
tests/test_hooks_cli.py Updates tests to mock _ingest_transcript instead of _save_diary_direct and to assert new messaging/wing surfacing.
docs/superpowers/specs/2026-05-05-verbatim-only-design.md Adds design document describing the verbatim-only architecture and planned follow-on deletions.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread mempalace/hooks_cli.py Outdated
Comment on lines +665 to +672
"""Precompact hook: mine the transcript synchronously, then allow compaction.

The recovery-marker write was removed (it used to land a
"where we were" diary entry in the now-deleted
mempalace_session_recovery collection). The verbatim transcript
chunks captured by _ingest_transcript already cover the recovery
use case — searching for any phrase from the last few messages
locates the session.
Copilot review on #3 caught that the docstring claimed
the recovery collection was "now-deleted," but this PR only removes
the write call site — collection retirement is the follow-up PR #5.
Reword to match actual scope: the recovery-marker write is removed
here; the collection still exists and gets disposed in #5.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@jphein
Copy link
Copy Markdown
Owner Author

jphein commented May 6, 2026

Copilot review addressed in commit 75069ef:

  • mempalace/hooks_cli.py:672 — "now-deleted" mismatch: Reworded the hook_precompact docstring to match this PR's actual scope. The recovery-marker write is removed here; the collection itself and mempalace_session_recovery_read are retired in the follow-up refactor(mcp): retire mempalace_session_recovery collection + read tool #5. Thanks for catching the inconsistency between docstring and PR description.

Stacked on #2 — no CI here because base isn't main, but full suite passes locally: 1558 tests pass, 1 skipped, ruff clean.

@jphein jphein deleted the branch fix/restore-transcript-ingest-via-daemon May 6, 2026 01:16
@jphein jphein closed this May 6, 2026
jphein added a commit that referenced this pull request May 6, 2026
Copilot review on #3 caught that the docstring claimed
the recovery collection was "now-deleted," but this PR only removes
the write call site — collection retirement is the follow-up PR #5.
Reword to match actual scope: the recovery-marker write is removed
here; the collection still exists and gets disposed in #5.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
jphein added a commit that referenced this pull request May 6, 2026
* fix(hooks): drop checkpoint write path from Stop + PreCompact

The Stop hook used to do two things on each fire:
  (a) write a 1-KB checkpoint summary diary entry into the
      mempalace_session_recovery collection
  (b) auto-mine the verbatim transcript into mempalace_drawers

The summary in (a) is redundant once (b) is searchable, AND the
recovery collection had no semantic-search MCP surface — so
checkpoints were structurally invisible to mempalace_search. Net
effect from a user's seat: agents (and JP) couldn't find recent
session content via search even though everything was on disk.

Drop the summary half. Verbatim transcript chunks already contain
every word a checkpoint summary would have surfaced. Leaving only
the transcript-ingest path matches the verbatim-only architecture
described in docs/superpowers/specs/2026-05-05-verbatim-only-design.md.

Changes:

* hook_stop silent path: remove _save_diary_direct call. Save marker
  advances unconditionally (fire-and-forget mining doesn't have a
  "confirmed save" signal to gate on; failure detection moves to
  daemon-side observability — hook.log + systemd journal).
* hook_precompact: remove the recovery-marker write. Mine + compaction
  proceed as before.
* systemMessage shape changes from "✦ N memories woven into the
  palace — themes" to "✦ Transcript ingest triggered (wing=...)".
  Lighter, accurate, no false count.
* Delete the now-unused _save_diary_direct function (~120 LOC) and
  its dependencies _extract_themes + _THEME_STOPWORDS (~30 LOC).
* Update 4 hook tests + 1 OSError test to mock _ingest_transcript
  rather than _save_diary_direct, and to expect the new
  systemMessage shape.

Scope deliberately limited: this PR removes hook-side WRITES.
mempalace_session_recovery collection still exists; the
mempalace_session_recovery_read MCP tool still works against the
existing 763 archived entries; mempalace.migrate is untouched.
A follow-up PR will dispose of the collection itself once this
change has run in production for a verification window.

Net diff: −230 / +53 LOC. 1558 tests pass, 1 skipped, lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* docs(spec): verbatim-only architecture design

Captures the architectural shift driving this PR family — drop
checkpoint summaries, expose only the verbatim transcripts that
mempalace_search can already reach. Documents:

* Why: the recovery collection had no semantic-search MCP surface,
  so checkpoints in it were structurally invisible to search
* What: hook write-path deletion (this PR) + collection disposal
  (follow-up PR)
* How: code-deletion checklist for both repos, migration plan with
  tarball safety net, test plan
* Open questions queued for review at merge time

Spec gates the destructive parts on the answers to those open
questions; the write-path removal in this PR is scoped narrow
enough to land without them.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* docs(hooks): clarify hook_precompact docstring scope

Copilot review on #3 caught that the docstring claimed
the recovery collection was "now-deleted," but this PR only removes
the write call site — collection retirement is the follow-up PR #5.
Reword to match actual scope: the recovery-marker write is removed
here; the collection still exists and gets disposed in #5.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
jphein added a commit that referenced this pull request May 6, 2026
Follow-up to fix/drop-checkpoint-write-path. With nothing writing to
the recovery collection anymore (the hooks moved to verbatim-only on
the parent branch), the read paths and the migration that fed the
collection are dead code. Delete them.

Removed in mempalace/:
- palace.py: _SESSION_RECOVERY_COLLECTION constant,
  get_session_recovery_collection(), _CHECKPOINT_TOPICS tuple
- mcp_server.py: _get_session_recovery_collection(),
  _recovery_collection_cache global, the topic-routing branch in
  tool_diary_write (now always lands in mempalace_drawers),
  tool_session_recovery_read() and its TOOLS dict registration
- migrate.py: migrate_checkpoints_to_recovery() and the dependent
  _CHECKPOINT_TOPICS import
- cli.py: cmd_repair --mode reorganize handler + the choice flag

Removed in tests/:
- tests/test_session_recovery.py — entire file (recovery-collection
  module test suite)
- tests/test_migrate.py: TestMigrateCheckpointsToRecovery class
- tests/test_mcp_server.py: TestCheckpointRouting and
  TestSessionRecoveryRead classes

Removed in docs/:
- website/reference/mcp-tools.md: mempalace_session_recovery_read
  section (caught by the readme-claims meta-test before deploy)

Production data on disks still has the mempalace_session_recovery
collection with its 763 archived entries — this PR's code change
makes the collection unreachable through any MCP/CLI path, but does
NOT delete the on-disk data. A separate one-shot script
(scripts/phase2_purge_recovery.py per the spec) handles the
collection-level delete after deploy. JP signed off on hard-delete
in the spec ack.

Net diff: 12 files, −1300 lines / +30 lines. 1540 tests pass, 1
skipped. Ruff lint + format clean.

Stacked on #3 (which stops new writes); will
auto-rebase against main when that merges.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
jphein added a commit that referenced this pull request May 6, 2026
…ol (#8)

* refactor(mcp): retire mempalace_session_recovery collection + read tool

Follow-up to fix/drop-checkpoint-write-path. With nothing writing to
the recovery collection anymore (the hooks moved to verbatim-only on
the parent branch), the read paths and the migration that fed the
collection are dead code. Delete them.

Removed in mempalace/:
- palace.py: _SESSION_RECOVERY_COLLECTION constant,
  get_session_recovery_collection(), _CHECKPOINT_TOPICS tuple
- mcp_server.py: _get_session_recovery_collection(),
  _recovery_collection_cache global, the topic-routing branch in
  tool_diary_write (now always lands in mempalace_drawers),
  tool_session_recovery_read() and its TOOLS dict registration
- migrate.py: migrate_checkpoints_to_recovery() and the dependent
  _CHECKPOINT_TOPICS import
- cli.py: cmd_repair --mode reorganize handler + the choice flag

Removed in tests/:
- tests/test_session_recovery.py — entire file (recovery-collection
  module test suite)
- tests/test_migrate.py: TestMigrateCheckpointsToRecovery class
- tests/test_mcp_server.py: TestCheckpointRouting and
  TestSessionRecoveryRead classes

Removed in docs/:
- website/reference/mcp-tools.md: mempalace_session_recovery_read
  section (caught by the readme-claims meta-test before deploy)

Production data on disks still has the mempalace_session_recovery
collection with its 763 archived entries — this PR's code change
makes the collection unreachable through any MCP/CLI path, but does
NOT delete the on-disk data. A separate one-shot script
(scripts/phase2_purge_recovery.py per the spec) handles the
collection-level delete after deploy. JP signed off on hard-delete
in the spec ack.

Net diff: 12 files, −1300 lines / +30 lines. 1540 tests pass, 1
skipped. Ruff lint + format clean.

Stacked on #3 (which stops new writes); will
auto-rebase against main when that merges.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* docs(README): align with verbatim-only retirement

Copilot review on #5 caught that the README still
documented session_recovery_read as current, which would leave users
calling a tool that returns "Unknown tool" after this PR ships.

Updates the four user-facing sections that described the recovery
collection as a current shipping feature:

* "What just shipped" — adds 2026-05-05 update note explaining the
  split was retired, with the generalizable lesson preserved (a side
  collection without a semantic-search MCP read tool is invisible).
* "Verbatim vs derivative" principle — reframes the recovery
  collection as the failed first attempt rather than a current
  example; pattern stays valid but each future sibling needs its
  own read surface.
* "Structural retrieval fixes" bullet — describes verbatim-only
  end-state, links to the new spec, retains historical context for
  Apr 25 → May 5.
* "Deterministic hook saves" bullet — updates to the new ingest-only
  save path, removes "recovery-collection marker" mention.
* "What it looks like in production" code block — updates the
  systemMessage example to the new "Transcript ingest triggered
  (wing=...)" shape.
* "P8" architectural principle — keeps it but reframes as on-hold
  with a precondition (read-surface parity) drawn from the cycle.
* Fork-ahead tracking table — replaces the old multi-collection-split
  row with a verbatim-only retirement row pointing at the four PRs;
  also retires references to session_recovery_read in the drawer_id
  surfacing row; adds new row for the mining management CLI (#4).

42 README meta-tests pass (test_readme_claims.py).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants