feat(multi-agent): chunk_links writer + reader API; mem_agent_share records link #470

Merged
memtomem merged 2 commits into main from feat/chunk-links-writer-reader
Apr 25, 2026

@memtomem
Owner

Summary

PR-2 of the chunk_links series (private RFC mem-agent-share-chunk-links-rfc.md). PR-1 (#469) added the chunk_links table plus a one-shot back-fill from the legacy shared-from=<uuid> audit tags. This PR wires the writer into mem_agent_share and exposes the reader Python API so structured provenance (indexed fanout, FK-bounded cascade, O(depth) walk-back) is populated going forward, not just for pre-RFC rows.

Why

Before PR-1, provenance was only recoverable via LIKE '%shared-from=%' scans over chunks.tags. That doesn't benefit from an index, breaks on UUID churn (reindex re-issues chunk ids), and can't be enforced across delete. PR-1 gave us the indexed, FK-bounded table. This PR makes mem_agent_share actually write into it so the structure is populated without waiting for a re-share.
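
To make the contrast concrete, here is a minimal sketch that asks SQLite for its query plan on both access paths. The two-table schema is a simplified assumption (the real one comes from PR-1 and is not shown here); only the index name idx_chunk_links_source is taken from this PR.

```python
import sqlite3

# Assumed, simplified schema; the real tables have more columns.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE chunks (id TEXT PRIMARY KEY, tags TEXT);
    CREATE TABLE chunk_links (
        target_id TEXT NOT NULL,
        source_id TEXT,
        link_type TEXT NOT NULL,
        PRIMARY KEY (target_id, link_type)
    );
    CREATE INDEX idx_chunk_links_source ON chunk_links(source_id);
    """
)


def plan(sql: str, params: tuple = ()) -> str:
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


# Legacy path: a leading-wildcard LIKE has to visit every row.
legacy_plan = plan("SELECT id FROM chunks WHERE tags LIKE '%shared-from=%'")

# Structured path: equality on source_id can use the dedicated index.
indexed_plan = plan(
    "SELECT target_id FROM chunk_links WHERE source_id = ?", ("some-uuid",)
)

print(legacy_plan)
print(indexed_plan)
```

The exact wording of the plan text varies between SQLite versions, but the first query should report a scan and the second a search through idx_chunk_links_source.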

Changes

Model (models.py)

New ChunkLink dataclass:

  • target_id: UUID — non-null, PK component
  • source_id: UUID | None — null after source delete (ON DELETE SET NULL) or back-fill of an unresolvable tag
  • link_type: str — currently 'shared'; _VALID_LINK_TYPES in sqlite_schema.py is the single source of truth
  • namespace_target: str — denormalized so "list everything shared out" is one index lookup
  • created_at: datetime
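
As a rough sketch, the fields listed above map onto a dataclass like the one below. The field names and types come from the list; the field order, the lack of defaults, and the example values are assumptions, since models.py itself is not shown here.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone
from uuid import UUID, uuid4


@dataclass
class ChunkLink:
    target_id: UUID          # non-null, part of the primary key
    source_id: UUID | None   # None after source delete or unresolvable back-fill
    link_type: str           # currently only "shared"
    namespace_target: str    # denormalized destination namespace
    created_at: datetime


# Example values are invented for illustration only.
orphaned = ChunkLink(
    target_id=uuid4(),
    source_id=None,  # the source chunk no longer exists
    link_type="shared",
    namespace_target="agent-b",
    created_at=datetime.now(timezone.utc),
)
print(orphaned.source_id is None)
```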

Storage (storage/mixins/share_links.py, new)

ShareLinkMixin wired into SqliteBackend with four methods:

  • add_chunk_link(source_id, target_id, link_type, namespace_target) performs an INSERT OR REPLACE, idempotent on (target_id, link_type). It validates link_type against _VALID_LINK_TYPES in Python (the table has no CHECK constraint, so adding a type is one PR, not two).
  • get_chunk_link(target_id, link_type='shared') — exact PK lookup.
  • get_chunks_shared_from(source_id, link_type=None) — fanout via idx_chunk_links_source; optional link_type filter.
  • walk_share_chain(target_id, *, link_type='shared', max_depth=100) — walks target_id → source_id backward. Cycle defence via visited set, plus max_depth as a worst-case ceiling. Stops (and includes the terminal row) when source_id IS NULL; returns [] for unknown targets.
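
The walk-back rules in the last bullet can be sketched over an in-memory mapping instead of SQLite. This is an illustration of the stated contract, not the code in share_links.py: the Link tuple, the dict-based lookup, and the choice to return an empty list for max_depth=0 are all assumptions.

```python
from __future__ import annotations

from typing import NamedTuple


class Link(NamedTuple):
    # Hypothetical stand-in for a chunk_links row (ids as plain strings).
    target_id: str
    source_id: str | None


def walk_share_chain(
    links: dict[str, Link], target_id: str, *, max_depth: int = 100
) -> list[Link]:
    """Follow target_id -> source_id backward, newest link first."""
    chain: list[Link] = []
    visited: set[str] = set()
    current: str | None = target_id
    while current is not None and len(chain) < max_depth:
        if current in visited:      # cycle defence
            break
        visited.add(current)
        link = links.get(current)
        if link is None:            # unknown target, or an original chunk
            break
        chain.append(link)          # a NULL-source row is still included
        current = link.source_id    # None ends the loop after the append
    return chain


rows = {
    "c": Link("c", "b"),
    "b": Link("b", "a"),
    "a": Link("a", None),  # source was deleted: terminal row
}
print([link.target_id for link in walk_share_chain(rows, "c")])

cyclic = {"a": Link("a", "b"), "b": Link("b", "a")}
print(len(walk_share_chain(cyclic, "a")))
```

The visited set is what terminates the A↔B case; max_depth only matters for very long acyclic chains, where it caps the work done per call.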

Abstract StorageBackend protocol in base.py gets the four signatures too.

Writer (server/tools/multi_agent.py)

mem_agent_share now calls _mem_add_core (returns IndexingStats) instead of the MCP-wrapped mem_add (string only), so it can read stats.new_chunk_ids[0] — the freshly-indexed destination chunk — and pass it to add_chunk_link.

Writer failure is best-effort + logged, not fatal:

  • The markdown file still gets the shared-from= tag (copy-on-share durability is untouched).
  • A link-table error must not surface to the caller or roll back the copy.
  • The back-fill migration already handles rows without link records, so even a silent writer drop can self-heal later. That requires a _CHUNK_LINKS_BACKFILL_KEY bump in a future release, which this PR does not do.
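
The writer flow described above, sketched under assumptions: share, the add_core callable, the storage stub, and the success string are hypothetical stand-ins for mem_agent_share, _mem_add_core, the backend, and the tool's real return value. Only the shape (copy first, then a guarded link write) follows the description.

```python
from __future__ import annotations

import logging
from dataclasses import dataclass, field

logger = logging.getLogger("share_sketch")


@dataclass
class IndexingStats:
    # Hypothetical minimal version; the real class has more fields.
    new_chunk_ids: list[str] = field(default_factory=list)


def share(storage, add_core, source_id: str, content: str, namespace: str) -> str:
    # 1. Copy-on-share happens first and is never rolled back.
    stats = add_core(content, namespace)

    # 2. Best-effort structured link; failures are logged, not raised.
    if stats.new_chunk_ids:
        target_id = stats.new_chunk_ids[0]
        try:
            storage.add_chunk_link(source_id, target_id, "shared", namespace)
        except Exception:
            logger.warning(
                "chunk_links write failed for target %s", target_id, exc_info=True
            )

    # 3. The caller sees the same result either way.
    return f"shared into {namespace}"


class FailingStorage:
    def add_chunk_link(self, *args):
        raise RuntimeError("simulated link-table failure")


def fake_add_core(content: str, namespace: str) -> IndexingStats:
    return IndexingStats(new_chunk_ids=["dst-1"])


result = share(FailingStorage(), fake_add_core, "src-1", "note body", "agent-b")
print(result)
```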

Tests (+17 tests; total 2421 → 2438)

  • test_chunk_links_writer.py: TestAddChunkLinkUnit pins the writer contract (round-trip, INSERT OR REPLACE on re-share, NULL source acceptance, invalid link_type rejection). TestMemAgentShareWritesLink verifies the end-to-end integration writes a row with the right (source_id, target_id, namespace_target) shape. TestMemAgentShareLinkSurvivesSourceDelete exercises the ON DELETE SET NULL flow through the MCP surface.
  • test_chunk_links_reader.py: get_chunk_link (missing target, link_type filter); get_chunks_shared_from (empty fanout, multi-target ordering, link_type narrowing); walk_share_chain (happy path, NULL-terminal, unknown target, A↔B cycle, max_depth=3 on a 10-deep chain, max_depth=0 degenerate).
  • test_multi_agent_integration.py: test_share_copies_chunk_with_audit_tag extended to assert both the pre-existing audit-tag behavior and the new chunk_links row.
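
The writer contract these tests pin (idempotent re-share, Python-side link_type validation, source_id going NULL on source delete) can be reproduced against a throwaway SQLite database. The schema below is an assumption reconstructed from the description, including the ON DELETE CASCADE on target_id; it is not the DDL from PR-1, and this add_chunk_link is a free function rather than the mixin method.

```python
import sqlite3
from datetime import datetime, timezone

_VALID_LINK_TYPES = frozenset({"shared"})

# Assumed schema; the real DDL is in PR-1 (#469).
SCHEMA = """
CREATE TABLE chunks (id TEXT PRIMARY KEY);
CREATE TABLE chunk_links (
    target_id TEXT NOT NULL REFERENCES chunks(id) ON DELETE CASCADE,
    source_id TEXT REFERENCES chunks(id) ON DELETE SET NULL,
    link_type TEXT NOT NULL,
    namespace_target TEXT NOT NULL,
    created_at TEXT NOT NULL,
    PRIMARY KEY (target_id, link_type)
);
CREATE INDEX idx_chunk_links_source ON chunk_links(source_id);
"""


def add_chunk_link(conn, source_id, target_id, link_type, namespace_target):
    # Validation lives in Python because the table has no CHECK constraint.
    if link_type not in _VALID_LINK_TYPES:
        raise ValueError(f"invalid link_type: {link_type!r}")
    conn.execute(
        "INSERT OR REPLACE INTO chunk_links "
        "(target_id, source_id, link_type, namespace_target, created_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (target_id, source_id, link_type, namespace_target,
         datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO chunks (id) VALUES (?)", [("src",), ("dst",)])

add_chunk_link(conn, "src", "dst", "shared", "agent-b")
add_chunk_link(conn, "src", "dst", "shared", "agent-b")  # re-share: same PK
row_count = conn.execute("SELECT COUNT(*) FROM chunk_links").fetchone()[0]

try:
    add_chunk_link(conn, "src", "dst", "bogus", "agent-b")
    rejected = False
except ValueError:
    rejected = True

# Deleting the source chunk nulls source_id but keeps the link row.
conn.execute("DELETE FROM chunks WHERE id = ?", ("src",))
conn.commit()
source_after_delete = conn.execute(
    "SELECT source_id FROM chunk_links WHERE target_id = ?", ("dst",)
).fetchone()[0]

print(row_count, rejected, source_after_delete)
```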

One test I did not add: a multi-target "share this into two different namespaces" case. Turned out to expose a pre-existing indexer behavior (both shares append to the same daily markdown file and the second index_file(path, namespace=...) re-namespaces the first) — orthogonal to PR-2 and not something I want to slip into this series. Noting it here so it's tracked.

Test plan

  • uv run ruff check packages/memtomem/src packages/memtomem/tests
  • uv run ruff format --check packages/memtomem/src packages/memtomem/tests
  • uv run pytest -m "not ollama" — 2438 passed, 46 deselected
  • Targeted pass: pytest tests/test_chunk_links_{schema,writer,reader}.py tests/test_multi_agent_integration.py tests/test_multi_agent.py tests/test_storage_noop.py — 64 passed

🤖 Generated with Claude Code

pandas-studio and others added 2 commits April 25, 2026 10:38
…ecords structured link

PR-2 of the chunk_links series (private RFC
`mem-agent-share-chunk-links-rfc.md`). PR-1 (#469) added the table
and the one-shot back-fill from `shared-from=<uuid>` audit tags; this
PR wires the writer into `mem_agent_share` and exposes the Python
reader API. Public MCP surface is unchanged — the link is a storage
invariant a follow-up `mem_share_lineage` tool (out of scope here)
would expose.

## Why

Provenance was only recoverable by `LIKE '%shared-from=%'` scans over
`chunks.tags`. That does not benefit from an index, breaks on UUID
churn (reindex re-issues chunk ids), and cannot be enforced across
delete. PR-1 gave us the indexed, FK-bounded table; this PR makes
`mem_agent_share` actually write into it so the structure is populated
without waiting for a re-share.

## Changes

**Model** (`models.py`): new `ChunkLink` dataclass — `target_id`
(non-null `UUID`), `source_id: UUID | None` (null after source delete
or back-fill of unresolvable tag), `link_type`, `namespace_target`,
`created_at`.

**Storage** (`storage/mixins/share_links.py`, new): `ShareLinkMixin`
wired into `SqliteBackend` with four methods — `add_chunk_link`
(INSERT OR REPLACE, idempotent on `(target_id, link_type)`;
validates `link_type` against `_VALID_LINK_TYPES` in Python since
the table has no CHECK constraint), `get_chunk_link`,
`get_chunks_shared_from` (fanout by indexed source_id; optional
`link_type` filter), `walk_share_chain` (cycle defence via visited
set + `max_depth` ceiling; terminates on NULL `source_id`, including
the terminal row in the result).

**Writer** (`server/tools/multi_agent.py`): `mem_agent_share` now
calls `_mem_add_core` instead of the MCP-wrapped `mem_add` so it can
read `IndexingStats.new_chunk_ids` and pick the freshly-indexed
destination UUID. Writer failure is best-effort + logged — the
markdown file and the `shared-from=` tag still provide the durable
record, so a link insert failure must not roll back the copy.

## Tests (+17 tests, total 2421 → 2438)

- `test_chunk_links_writer.py` — `TestAddChunkLinkUnit` pins the
  writer contract (round-trip, `INSERT OR REPLACE` on re-share, NULL
  source acceptance, invalid `link_type` rejection);
  `TestMemAgentShareWritesLink` verifies the end-to-end integration
  records a row with the right source/target/namespace; a separate
  test covers the source-delete-nulls-source_id durability flow.
- `test_chunk_links_reader.py` — `get_chunk_link`,
  `get_chunks_shared_from`, and `walk_share_chain` (happy path,
  NULL-terminal, unknown target, A↔B cycle, `max_depth=3` on a
  10-deep chain, `max_depth=0` degenerate input).
- `test_multi_agent_integration.py` — extends the existing
  `test_share_copies_chunk_with_audit_tag` to also assert the
  `chunk_links` row alongside the existing audit-tag assertion.

Out of scope (tracked separately): `mem_share_lineage` MCP tool
(optional PR-3 per the RFC); multi-target share-into-distinct-
namespaces collapsing onto the same daily markdown file (pre-existing
indexer behavior, not introduced here).

`ruff check` / `ruff format --check` / `pytest -m "not ollama"` clean.

Co-Authored-By: Claude <[email protected]>
…ailure-path test

Review-driven follow-ups for PR #470:

- multi_agent.py: expand the comment above add_chunk_link to flag the
  _merge_short_chunks edge case — when the chunker folds a short share
  entry into the previous trailing chunk of the daily file,
  new_chunk_ids[0] points at the re-merged chunk rather than a pure
  share copy. Also clarify that back-fill self-heal needs a
  _CHUNK_LINKS_BACKFILL_KEY bump, which this PR does not do.

- storage/mixins/share_links.py: tighten _row_to_link's source_id
  guard from truthy (``if source_id``) to ``is not None`` so a
  hypothetical empty-string source row surfaces as UUID-parse error
  instead of silently collapsing to None.

- test_chunk_links_writer.py: add TestMemAgentShareWriterFailureNonFatal
  — monkeypatches storage.add_chunk_link to raise and pins four
  contracts at once: the caller sees the normal success string, the
  warning is logged (incl. exc_info for ops), the markdown share copy
  is still indexed, and no chunk_links row is left behind (matches the
  back-fill-heals-later story).

`ruff check` / `ruff format --check` / `pytest -m "not ollama"` green —
2439 passed (prev 2438).

Co-Authored-By: Claude <[email protected]>
@memtomem memtomem merged commit 3797fc3 into main Apr 25, 2026
7 checks passed
@memtomem memtomem deleted the feat/chunk-links-writer-reader branch April 25, 2026 01:55
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 25, 2026