Skip to content

feat(mem_add,mem_edit): canonical writer + mem_edit header preservation#464

Merged
memtomem merged 2 commits intomainfrom
feat/blockquote-tags-writer
Apr 25, 2026
Merged

feat(mem_add,mem_edit): canonical writer + mem_edit header preservation#464
memtomem merged 2 commits intomainfrom
feat/blockquote-tags-writer

Conversation

@memtomem
Copy link
Copy Markdown
Owner

Follow-up to #463 (chunker reader). Two coupled writer-side changes plus the e2e round-trip the chunker enabled.

Summary

  • Writer canonical formappend_entry now emits > tags: ["a", "b"] (explicit > prefix on every line, JSON-quoted) instead of the lazy-continuation form tags: ['a', 'b'] (Python repr()). Chunker reader (PR feat(chunking): promote per-entry blockquote tags to ChunkMetadata.tags #463) already accepts both shapes, so old files still parse.
  • mem_edit header preservation — new replace_chunk_body helper keeps the heading + > created: / > tags: blockquote intact when the caller supplies body-only new_content. Prefix new_content with ## to bypass preservation and revert to full-replacement (pre-RFC semantic).
  • mem_search(tag_filter=...) round-trip — multi-agent e2e now asserts mem_search(tag_filter=f"shared-from={src}") returns the share copy. Closes the gap discovered in PR test(multi-agent): end-to-end integration coverage #462.

Why two writer changes in one PR

The reviewer flagged on #463 that mem_edit's existing replace_lines call wasn't a new regression but became a much louder footgun once the chunker strips the blockquote from chunk content. Post-PR-A, a caller who reads chunk.content and feeds it back through mem_edit would supply body-only and silently erase the metadata header. Fixing that in the same PR as the canonical writer keeps the related contract change coherent.

What changed

  • packages/memtomem/src/memtomem/tools/memory_writer.py
    • append_entry switches to > tags: {json.dumps(list(tags))}.
    • New _find_body_start_index(chunk_lines) walks a chunk's line range and returns the index where the body starts, after heading + leading blockquote group (with lazy-continuation tolerance for legacy on-disk files).
    • New replace_chunk_body(file_path, start_line, end_line, new_content) preserves heading + leading blockquote unless new_content starts with ## (full replace).
  • packages/memtomem/src/memtomem/server/tools/memory_crud.py
    • mem_edit calls replace_chunk_body instead of replace_lines.
  • packages/memtomem/tests/test_memory_writer.py — existing tag-format pin updated to canonical form.
  • New tests/test_memory_writer_tag_format.py — 5 cases pinning the canonical output (explicit > prefix, JSON serialization, round-trip via json.loads, empty-tag omission, > created: prefix preserved).
  • New tests/test_mem_edit_header_preservation.py — 6 cases (canonical / legacy lazy / no-header / explicit ## full-replace / non-first sub-chunk of oversized section / trailing-newline round-trip).
  • tests/test_chunking_blockquote_tags.py — adds block-list-shape case (> tags:\n> - a\n> - b). Writer never emits this, but the shared _parse_tags_value helper supports it and we lock that in.
  • tests/test_multi_agent_integration.py test_share_copies_chunk_with_audit_tag — adds forward-direction tag_filter round-trip assertion.

Test plan

  • Full suite: uv run pytest -m "not ollama"2402 passed, 0 failed (PR-A baseline 2390 + 12 new).
  • uv run ruff check && uv run ruff format --check (full repo).
  • uv run mypy on changed source files — clean.

Out of scope (still tracked)

Three remaining follow-ups noted in #463 review, all in planning/mem-add-tags-blockquote-promote-rfc.md §Follow-ups:

  • _split_section line-number drift after blockquote strip (small follow-up PR; mem_edit / UI only, search unaffected).
  • add_entries_batch broadcast-tag cleanup (memory_crud.py:392–401); orthogonal cleanup PR.
  • chunk_links table to permanently close the shared-from=<old-uuid> audit-chain gap on reindex UUID churn — separate RFC mem-agent-share-chunk-links-rfc.md.

🤖 Generated with Claude Code

pandas-studio and others added 2 commits April 25, 2026 08:53
Stacks on PR #463 (chunker reader). Two coupled writer-side changes:

1. append_entry emits canonical blockquote tags

   > tags: ["a", "b"]   # explicit `> ` prefix, JSON-quoted

   instead of the legacy lazy-continuation form

   tags: ['a', 'b']     # Python repr(), no `> ` prefix

   Lazy continuation is well-defined CommonMark but easy to break
   (a single blank line ends the blockquote), and Python repr() is
   not valid YAML. The chunker's section-leading parser already
   accepts both shapes, so existing files still parse — the writer
   just stops producing the fragile form on new entries.

   New test_memory_writer_tag_format.py pins the canonical output so
   a future regression that reverts to lazy / Python-repr fails fast.

2. mem_edit preserves the per-entry header

   Pre-PR, mem_edit called replace_lines(start_line, end_line,
   new_content), which replaced the entire chunk range — heading,
   blockquote header, body — with whatever the caller supplied.
   That was already wide pre-RFC, but post-PR-A the chunker strips
   the blockquote from chunk content, so a caller working from
   chunk.content would supply body-only and silently erase the
   metadata header on disk.

   New replace_chunk_body helper detects the heading + section-
   leading blockquote group at the top of the chunk's line range and
   keeps them intact while replacing only the body. If new_content
   itself starts with `## `, the call reverts to a full replacement
   (preserving the pre-RFC mem_edit semantic for users who want to
   override the heading explicitly).

   New test_mem_edit_header_preservation.py covers six cases
   (canonical, legacy lazy, no header, full-replace via `## `,
   non-first sub-chunk of an oversized section, trailing-newline
   round-trip).

Round-trip and follow-ups:

- test_multi_agent_integration test_share_copies_chunk_with_audit_tag
  now also asserts the forward direction:
  mem_search(tag_filter="shared-from=<src>") returns the share copy
  and only the share copy. Closes the e2e gap discovered during PR
  #462 that drove RFC mem-add-tags-blockquote-promote-rfc.md.
- test_chunking_blockquote_tags adds a block-list-shape case
  (`> tags:\n> - a\n> - b`). The current writer never emits this
  shape, but the shared _parse_tags_value helper supports it and
  this locks the support in.

Implements RFC §Migration & tests #2 + #3 and the writer-PR scope
addition (mem_edit header preservation, ③-1) noted in PR #463
review.

Co-Authored-By: Claude <[email protected]>
Reviewer caught that web/routes/chunks.py:edit_chunk still called
replace_lines directly, which bypasses the header-preservation logic
mem_edit gained in this PR. Same footgun, different surface — the Web
UI editor surfaces chunk.content (already header-stripped by the
chunker), so a Save from the browser would feed body-only content into
replace_lines and silently erase the heading + > created: + > tags:
on disk.

Swap the call to replace_chunk_body so mem_edit and the Web UI route
share the same preservation contract. Add a regression test
(test_edit_chunk_preserves_blockquote_header) that drives the route
end-to-end with a real tmp-file source and asserts the metadata
header survives a body-only PATCH.

Also folded in two reviewer nits:

- mem_edit MCP-tool docstring now mentions the body-only / `## `
  bypass policy (was only documented on the underlying helper).
- replace_chunk_body has a one-line comment noting that ``## `` is the
  full-replace trigger because append_entry always emits H2; other
  heading levels in user input are body content.

grep across src/ confirms replace_lines now has zero non-test callers
in the codebase. Whether to deprecate it or keep it as a low-level
escape hatch is tracked as a follow-up; not closing that question
here.

Co-Authored-By: Claude <[email protected]>
@memtomem memtomem merged commit f215428 into main Apr 25, 2026
7 checks passed
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 25, 2026
@memtomem memtomem deleted the feat/blockquote-tags-writer branch April 27, 2026 14:56
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants