feat(mem_add,mem_edit): canonical writer + mem_edit header preservation#464
Merged
feat(mem_add,mem_edit): canonical writer + mem_edit header preservation#464
Conversation
Stacks on PR #463 (chunker reader). Two coupled writer-side changes: 1. append_entry emits canonical blockquote tags > tags: ["a", "b"] # explicit `> ` prefix, JSON-quoted instead of the legacy lazy-continuation form tags: ['a', 'b'] # Python repr(), no `> ` prefix Lazy continuation is well-defined CommonMark but easy to break (a single blank line ends the blockquote), and Python repr() is not valid YAML. The chunker's section-leading parser already accepts both shapes, so existing files still parse — the writer just stops producing the fragile form on new entries. New test_memory_writer_tag_format.py pins the canonical output so a future regression that reverts to lazy / Python-repr fails fast. 2. mem_edit preserves the per-entry header Pre-PR, mem_edit called replace_lines(start_line, end_line, new_content), which replaced the entire chunk range — heading, blockquote header, body — with whatever the caller supplied. That was already wide pre-RFC, but post-PR-A the chunker strips the blockquote from chunk content, so a caller working from chunk.content would supply body-only and silently erase the metadata header on disk. New replace_chunk_body helper detects the heading + section- leading blockquote group at the top of the chunk's line range and keeps them intact while replacing only the body. If new_content itself starts with `## `, the call reverts to a full replacement (preserving the pre-RFC mem_edit semantic for users who want to override the heading explicitly). New test_mem_edit_header_preservation.py covers six cases (canonical, legacy lazy, no header, full-replace via `## `, non-first sub-chunk of an oversized section, trailing-newline round-trip). Round-trip and follow-ups: - test_multi_agent_integration test_share_copies_chunk_with_audit_tag now also asserts the forward direction: mem_search(tag_filter="shared-from=<src>") returns the share copy and only the share copy. Closes the e2e gap discovered during PR #462 that drove RFC mem-add-tags-blockquote-promote-rfc.md. - test_chunking_blockquote_tags adds a block-list-shape case (`> tags:\n> - a\n> - b`). The current writer never emits this shape, but the shared _parse_tags_value helper supports it and this locks the support in. Implements RFC §Migration & tests #2 + #3 and the writer-PR scope addition (mem_edit header preservation, ③-1) noted in PR #463 review. Co-Authored-By: Claude <[email protected]>
Reviewer caught that web/routes/chunks.py:edit_chunk still called replace_lines directly, which bypasses the header-preservation logic mem_edit gained in this PR. Same footgun, different surface — the Web UI editor surfaces chunk.content (already header-stripped by the chunker), so a Save from the browser would feed body-only content into replace_lines and silently erase the heading + > created: + > tags: on disk. Swap the call to replace_chunk_body so mem_edit and the Web UI route share the same preservation contract. Add a regression test (test_edit_chunk_preserves_blockquote_header) that drives the route end-to-end with a real tmp-file source and asserts the metadata header survives a body-only PATCH. Also folded in two reviewer nits: - mem_edit MCP-tool docstring now mentions the body-only / `## ` bypass policy (was only documented on the underlying helper). - replace_chunk_body has a one-line comment noting that ``## `` is the full-replace trigger because append_entry always emits H2; other heading levels in user input are body content. grep across src/ confirms replace_lines now has zero non-test callers in the codebase. Whether to deprecate it or keep it as a low-level escape hatch is tracked as a follow-up; not closing that question here. Co-Authored-By: Claude <[email protected]>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Follow-up to #463 (chunker reader). Two coupled writer-side changes plus the e2e round-trip the chunker enabled.
Summary
append_entrynow emits> tags: ["a", "b"](explicit>prefix on every line, JSON-quoted) instead of the lazy-continuation formtags: ['a', 'b'](Pythonrepr()). Chunker reader (PR feat(chunking): promote per-entry blockquote tags to ChunkMetadata.tags #463) already accepts both shapes, so old files still parse.mem_editheader preservation — newreplace_chunk_bodyhelper keeps the heading +> created:/> tags:blockquote intact when the caller supplies body-onlynew_content. Prefixnew_contentwith##to bypass preservation and revert to full-replacement (pre-RFC semantic).mem_search(tag_filter=...)round-trip — multi-agent e2e now assertsmem_search(tag_filter=f"shared-from={src}")returns the share copy. Closes the gap discovered in PR test(multi-agent): end-to-end integration coverage #462.Why two writer changes in one PR
The reviewer flagged on #463 that mem_edit's existing
replace_linescall wasn't a new regression but became a much louder footgun once the chunker strips the blockquote from chunk content. Post-PR-A, a caller who readschunk.contentand feeds it back through mem_edit would supply body-only and silently erase the metadata header. Fixing that in the same PR as the canonical writer keeps the related contract change coherent.What changed
packages/memtomem/src/memtomem/tools/memory_writer.pyappend_entryswitches to> tags: {json.dumps(list(tags))}._find_body_start_index(chunk_lines)walks a chunk's line range and returns the index where the body starts, after heading + leading blockquote group (with lazy-continuation tolerance for legacy on-disk files).replace_chunk_body(file_path, start_line, end_line, new_content)preserves heading + leading blockquote unlessnew_contentstarts with##(full replace).packages/memtomem/src/memtomem/server/tools/memory_crud.pymem_editcallsreplace_chunk_bodyinstead ofreplace_lines.packages/memtomem/tests/test_memory_writer.py— existing tag-format pin updated to canonical form.tests/test_memory_writer_tag_format.py— 5 cases pinning the canonical output (explicit>prefix, JSON serialization, round-trip viajson.loads, empty-tag omission,> created:prefix preserved).tests/test_mem_edit_header_preservation.py— 6 cases (canonical / legacy lazy / no-header / explicit##full-replace / non-first sub-chunk of oversized section / trailing-newline round-trip).tests/test_chunking_blockquote_tags.py— adds block-list-shape case (> tags:\n> - a\n> - b). Writer never emits this, but the shared_parse_tags_valuehelper supports it and we lock that in.tests/test_multi_agent_integration.pytest_share_copies_chunk_with_audit_tag— adds forward-directiontag_filterround-trip assertion.Test plan
uv run pytest -m "not ollama"→ 2402 passed, 0 failed (PR-A baseline 2390 + 12 new).uv run ruff check && uv run ruff format --check(full repo).uv run mypyon changed source files — clean.Out of scope (still tracked)
Three remaining follow-ups noted in #463 review, all in
planning/mem-add-tags-blockquote-promote-rfc.md§Follow-ups:_split_sectionline-number drift after blockquote strip (small follow-up PR; mem_edit / UI only, search unaffected).add_entries_batchbroadcast-tag cleanup (memory_crud.py:392–401); orthogonal cleanup PR.chunk_linkstable to permanently close theshared-from=<old-uuid>audit-chain gap on reindex UUID churn — separate RFCmem-agent-share-chunk-links-rfc.md.🤖 Generated with Claude Code