feat(multi-agent): chunk_links writer + reader API; mem_agent_share records link#470
Merged
feat(multi-agent): chunk_links writer + reader API; mem_agent_share records link#470
Conversation
…ecords structured link PR-2 of the chunk_links series (private RFC `mem-agent-share-chunk-links-rfc.md`). PR-1 (#469) added the table and the one-shot back-fill from `shared-from=<uuid>` audit tags; this PR wires the writer into `mem_agent_share` and exposes the Python reader API. Public MCP surface is unchanged — the link is a storage invariant a follow-up `mem_share_lineage` tool (out of scope here) would expose. ## Why Provenance was only recoverable by `LIKE '%shared-from=%'` scans over `chunks.tags`. That does not benefit from an index, breaks on UUID churn (reindex re-issues chunk ids), and cannot be enforced across delete. PR-1 gave us the indexed, FK-bounded table; this PR makes `mem_agent_share` actually write into it so the structure is populated without waiting for a re-share. ## Changes **Model** (`models.py`): new `ChunkLink` dataclass — `target_id` (non-null `UUID`), `source_id: UUID | None` (null after source delete or back-fill of unresolvable tag), `link_type`, `namespace_target`, `created_at`. **Storage** (`storage/mixins/share_links.py`, new): `ShareLinkMixin` wired into `SqliteBackend` with four methods — `add_chunk_link` (INSERT OR REPLACE, idempotent on `(target_id, link_type)`; validates `link_type` against `_VALID_LINK_TYPES` in Python since the table has no CHECK constraint), `get_chunk_link`, `get_chunks_shared_from` (fanout by indexed source_id; optional `link_type` filter), `walk_share_chain` (cycle defence via visited set + `max_depth` ceiling; terminates on NULL `source_id`, including the terminal row in the result). **Writer** (`server/tools/multi_agent.py`): `mem_agent_share` now calls `_mem_add_core` instead of the MCP-wrapped `mem_add` so it can read `IndexingStats.new_chunk_ids` and pick the freshly-indexed destination UUID. Writer failure is best-effort + logged — the markdown file and the `shared-from=` tag still provide the durable record, so a link insert failure must not roll back the copy. ## Tests (+17 tests, total 2421 → 2438) - `test_chunk_links_writer.py` — `TestAddChunkLinkUnit` pins the writer contract (round-trip, `INSERT OR REPLACE` on re-share, NULL source acceptance, invalid `link_type` rejection); `TestMemAgentShareWritesLink` verifies the end-to-end integration records a row with the right source/target/namespace; a separate test covers the source-delete-nulls-source_id durability flow. - `test_chunk_links_reader.py` — `get_chunk_link`, `get_chunks_shared_from`, and `walk_share_chain` (happy path, NULL-terminal, unknown target, A↔B cycle, `max_depth=3` on a 10-deep chain, `max_depth=0` degenerate input). - `test_multi_agent_integration.py` — extends the existing `test_share_copies_chunk_with_audit_tag` to also assert the `chunk_links` row alongside the existing audit-tag assertion. Out of scope (tracked separately): `mem_share_lineage` MCP tool (optional PR-3 per the RFC); multi-target share-into-distinct- namespaces collapsing onto the same daily markdown file (pre-existing indexer behavior, not introduced here). `ruff check` / `ruff format --check` / `pytest -m "not ollama"` clean. Co-Authored-By: Claude <[email protected]>
…ailure-path test Review-driven follow-ups for PR #470: - multi_agent.py: expand the comment above add_chunk_link to flag the _merge_short_chunks edge case — when the chunker folds a short share entry into the previous trailing chunk of the daily file, new_chunk_ids[0] points at the re-merged chunk rather than a pure share copy. Also clarify that back-fill self-heal needs a _CHUNK_LINKS_BACKFILL_KEY bump, which this PR does not do. - storage/mixins/share_links.py: tighten _row_to_link's source_id guard from truthy (``if source_id``) to ``is not None`` so a hypothetical empty-string source row surfaces as UUID-parse error instead of silently collapsing to None. - test_chunk_links_writer.py: add TestMemAgentShareWriterFailureNonFatal — monkeypatches storage.add_chunk_link to raise and pins four contracts at once: the caller sees the normal success string, the warning is logged (incl. exc_info for ops), the markdown share copy is still indexed, and no chunk_links row is left behind (matches the back-fill-heals-later story). `ruff check` / `ruff format --check` / `pytest -m "not ollama"` green — 2439 passed (prev 2438). Co-Authored-By: Claude <[email protected]>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
PR-2 of the
chunk_linksseries (private RFCmem-agent-share-chunk-links-rfc.md). PR-1 (#469) added thechunk_linkstable plus a one-shot back-fill from the legacyshared-from=<uuid>audit tags. This PR wires the writer intomem_agent_shareand exposes the reader Python API so structured provenance (indexed fanout, FK-bounded cascade, O(depth) walk-back) is populated going forward, not just for pre-RFC rows.mem_agent_share's signature, return shape, and copy-on-share semantics are identical. The link is a storage invariant.mem_share_lineageMCP tool that exposeswalk_share_chainis the optional PR-3 per the RFC and is intentionally out of scope here.Why
Before PR-1, provenance was only recoverable via
LIKE '%shared-from=%'scans overchunks.tags. That doesn't benefit from an index, breaks on UUID churn (reindex re-issues chunk ids), and can't be enforced across delete. PR-1 gave us the indexed, FK-bounded table. This PR makesmem_agent_shareactually write into it so the structure is populated without waiting for a re-share.Changes
Model (
models.py)New
ChunkLinkdataclass:target_id: UUID— non-null, PK componentsource_id: UUID | None— null after source delete (ON DELETE SET NULL) or back-fill of an unresolvable taglink_type: str— currently'shared';_VALID_LINK_TYPESinsqlite_schema.pyis the single source of truthnamespace_target: str— denormalized so "list everything shared out" is one index lookupcreated_at: datetimeStorage (
storage/mixins/share_links.py, new)ShareLinkMixinwired intoSqliteBackendwith four methods:add_chunk_link(source_id, target_id, link_type, namespace_target)—INSERT OR REPLACE, idempotent on(target_id, link_type). Validateslink_typeagainst_VALID_LINK_TYPESin Python (the table has noCHECKconstraint so adding a type is one PR, not two).get_chunk_link(target_id, link_type='shared')— exact PK lookup.get_chunks_shared_from(source_id, link_type=None)— fanout viaidx_chunk_links_source; optionallink_typefilter.walk_share_chain(target_id, *, link_type='shared', max_depth=100)— walkstarget_id → source_idbackward. Cycle defence via visited set, plusmax_depthas a worst-case ceiling. Stops (and includes the terminal row) whensource_id IS NULL; returns[]for unknown targets.Abstract
StorageBackendprotocol inbase.pygets the four signatures too.Writer (
server/tools/multi_agent.py)mem_agent_sharenow calls_mem_add_core(returnsIndexingStats) instead of the MCP-wrappedmem_add(string only), so it can readstats.new_chunk_ids[0]— the freshly-indexed destination chunk — and pass it toadd_chunk_link.Writer failure is best-effort + logged, not fatal:
shared-from=tag (copy-on-share durability is untouched).Tests (+17 tests; total 2421 → 2438)
test_chunk_links_writer.py—TestAddChunkLinkUnitpins the writer contract (round-trip,INSERT OR REPLACEon re-share, NULL source acceptance, invalidlink_typerejection).TestMemAgentShareWritesLinkverifies the end-to-end integration writes a row with the right(source_id, target_id, namespace_target)shape.TestMemAgentShareLinkSurvivesSourceDeleteexercises theON DELETE SET NULLflow through the MCP surface.test_chunk_links_reader.py—get_chunk_link(missing target,link_typefilter);get_chunks_shared_from(empty fanout, multi-target ordering,link_typenarrowing);walk_share_chain(happy path, NULL-terminal, unknown target,A↔Bcycle,max_depth=3on a 10-deep chain,max_depth=0degenerate).test_multi_agent_integration.py—test_share_copies_chunk_with_audit_tagextended to assert both the pre-existing audit-tag behavior and the newchunk_linksrow.One test I did not add: a multi-target "share this into two different namespaces" case. Turned out to expose a pre-existing indexer behavior (both shares append to the same daily markdown file and the second
index_file(path, namespace=...)re-namespaces the first) — orthogonal to PR-2 and not something I want to slip into this series. Noting it here so it's tracked.Test plan
uv run ruff check packages/memtomem/src packages/memtomem/testsuv run ruff format --check packages/memtomem/src packages/memtomem/testsuv run pytest -m "not ollama"— 2438 passed, 46 deselectedpytest tests/test_chunk_links_{schema,writer,reader}.py tests/test_multi_agent_integration.py tests/test_multi_agent.py tests/test_storage_noop.py— 64 passed🤖 Generated with Claude Code