…-from tags

PR-1 of the chunk_links series (private RFC `mem-agent-share-chunk-links-rfc.md`). Storage-only change: no public API surface yet, no behavior change for existing callers.

`mem_agent_share` historically encoded provenance as a `shared-from=<uuid>` audit tag baked into the destination chunk's `tags` array. Tag-based provenance has three problems: it doesn't benefit from an index (fanout query is `tags LIKE '%shared-from=%'`, a full table scan), it breaks on UUID churn (reindex re-issues chunk ids and the audit chain breaks at the gap), and it can't enforce a relationship across delete.

This PR adds the storage substrate. PR-2 wires the writer (`mem_agent_share` records into `chunk_links` on share) and reader Python API; PR-3 (optional) exposes a `mem_share_lineage` MCP tool.

## Schema

`chunk_links` has `PRIMARY KEY (target_id, link_type)` so every destination chunk has at most one link of each type, plus indexes on `(source_id, link_type)` and `namespace_target` for fanout / per-NS audit queries. FK semantics:

- `ON DELETE SET NULL` on `source_id`: preserves the existing copy-on-share durability. A teammate deleting their note does not delete yours; the link row stays with `source_id=NULL` and the destination chunk lives on. Provenance is still recoverable from the markdown `shared-from=` tag (still written into content).
- `ON DELETE CASCADE` on `target_id`: destination delete drops the row, no dangling pointer.

`link_type` validation lives in Python (`_VALID_LINK_TYPES`) rather than as a CHECK constraint so adding a new value (`consolidated_from`, `reflected_from`) is one PR, not two.

## Back-fill

Existing share copies (created before this PR ships) have a `shared-from=<uuid>` tag in `chunks.tags` but no row in `chunk_links`. A one-shot pass scans those rows, parses the source UUID, resolves it against `chunks.id` (NULL if the source was already deleted), and `INSERT OR IGNORE`s into `chunk_links`.

Completion is recorded in `_memtomem_meta` (`chunk_links_backfill_v1`) so subsequent startups short-circuit. Bumping the version key triggers a re-run if the parser ever needs to widen.

## Tests

- `test_chunk_links_schema.py`: table+indexes shape, idempotent re-run, FK SET NULL on source delete, FK CASCADE on target delete, PK uniqueness conflict.
- Back-fill cases: existing source resolved, missing source → NULL, unrelated tags ignored, marker prevents re-scan of post-migration rows, malformed `tags` JSON skipped.

Full suite: 2421 passed (was 2411, +10). ruff clean.

Co-Authored-By: Claude <[email protected]>
Summary
PR-1 of the chunk_links series (private RFC `mem-agent-share-chunk-links-rfc.md`). Storage-only change: no public API surface yet, no behavior change for existing callers. PR-2 wires the writer / reader Python API; PR-3 (optional) exposes a `mem_share_lineage` MCP tool.

Why

`mem_agent_share` historically encoded provenance as a `shared-from=<uuid>` audit tag in the destination chunk's `tags` array. Tag-based provenance:

- doesn't benefit from an index: the fanout query is `tags LIKE '%shared-from=%'`, a full table scan;
- breaks on UUID churn: a reindex re-issues chunk ids and the `shared-from=<old-uuid>` chain breaks at the gap;
- can't enforce a relationship across delete.

This PR adds the storage substrate so PR-2 can wire structured provenance without a coordinated big-bang change.
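To make the index point concrete, the sketch below compares query plans for the tag-based lookup and a link-table lookup. It assumes a SQLite backend (suggested by the `INSERT OR IGNORE` syntax used in the back-fill, but not confirmed here); the column types, the trimmed-down tables, the index name `idx_chunk_links_source`, and the `plan` helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, trimmed-down tables: just enough to compare plans.
conn.executescript(
    """
    CREATE TABLE chunks (id TEXT PRIMARY KEY, namespace TEXT, tags TEXT);
    CREATE TABLE chunk_links (
        target_id TEXT NOT NULL,
        source_id TEXT,
        link_type TEXT NOT NULL,
        PRIMARY KEY (target_id, link_type)
    );
    CREATE INDEX idx_chunk_links_source ON chunk_links (source_id, link_type);
    """
)


def plan(sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


# A leading-wildcard LIKE cannot use a B-tree index, so every row is visited.
tag_plan = plan("SELECT id FROM chunks WHERE tags LIKE '%shared-from=%'")

# An equality lookup on (source_id, link_type) can use the secondary index.
link_plan = plan(
    "SELECT target_id FROM chunk_links WHERE source_id = ? AND link_type = ?",
    ("some-source-id", "shared_from"),
)

print(tag_plan)
print(link_plan)
```

The exact plan wording differs between SQLite versions, but the first plan should be a scan and the second an index search.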
Schema
`chunk_links` keyed on `(target_id, link_type)` so each destination chunk has at most one link of each type. Indexes on `(source_id, link_type)` and `namespace_target` cover fanout / per-NS queries. FK semantics:

- `ON DELETE SET NULL` on `source_id`: preserves the existing copy-on-share durability (a teammate's delete doesn't yank yours). The markdown `shared-from=` tag stays in content for human-readable provenance after the join becomes NULL.
- `ON DELETE CASCADE` on `target_id`: no dangling pointer when the destination chunk is deleted.
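A minimal sketch of how the key and FK semantics above could behave, assuming SQLite. The DDL is illustrative rather than the PR's actual migration: column types, the index names, the reduced `chunks` table, and the `shared_from` link type value are assumptions.

```python
import sqlite3

# Illustrative DDL only; the real migration may differ.
DDL = """
CREATE TABLE chunks (
    id TEXT PRIMARY KEY,
    namespace TEXT NOT NULL,
    tags TEXT NOT NULL DEFAULT '[]'
);
CREATE TABLE chunk_links (
    target_id TEXT NOT NULL REFERENCES chunks(id) ON DELETE CASCADE,
    source_id TEXT REFERENCES chunks(id) ON DELETE SET NULL,
    link_type TEXT NOT NULL,
    namespace_target TEXT NOT NULL,
    PRIMARY KEY (target_id, link_type)
);
CREATE INDEX idx_links_source ON chunk_links (source_id, link_type);
CREATE INDEX idx_links_ns ON chunk_links (namespace_target);
"""

INSERT_LINK = (
    "INSERT INTO chunk_links (target_id, source_id, link_type, namespace_target) "
    "VALUES (?, ?, ?, ?)"
)

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript(DDL)

conn.executemany(
    "INSERT INTO chunks (id, namespace) VALUES (?, ?)",
    [("src-1", "alice"), ("dst-1", "bob"), ("src-2", "alice"), ("dst-2", "bob")],
)
conn.executemany(
    INSERT_LINK,
    [
        ("dst-1", "src-1", "shared_from", "bob"),
        ("dst-2", "src-2", "shared_from", "bob"),
    ],
)

# A second link of the same type for the same target violates the primary key.
try:
    conn.execute(INSERT_LINK, ("dst-1", "src-2", "shared_from", "bob"))
    pk_conflict = False
except sqlite3.IntegrityError:
    pk_conflict = True

conn.execute("DELETE FROM chunks WHERE id = 'src-1'")  # source delete: SET NULL
conn.execute("DELETE FROM chunks WHERE id = 'dst-2'")  # target delete: CASCADE

links = conn.execute(
    "SELECT target_id, source_id FROM chunk_links ORDER BY target_id"
).fetchall()
surviving_chunks = [
    row[0] for row in conn.execute("SELECT id FROM chunks ORDER BY id")
]
print(pk_conflict, links, surviving_chunks)
```

In this sketch the `dst-1` link survives with a NULL source, the `dst-2` link disappears with its chunk, and `dst-1` itself is untouched by the source delete.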
`link_type` validation lives in Python (`_VALID_LINK_TYPES`), not a CHECK constraint, so adding a new value (`consolidated_from`, `reflected_from`) is one PR, not two.

Back-fill
One-shot pass on first startup after upgrade:

1. `SELECT id, namespace, tags FROM chunks WHERE tags LIKE '%shared-from=%'`.
2. Parse the source UUID from the tag and resolve it against `chunks.id`; missing source → `source_id=NULL` (same end-state as a post-RFC share whose source was later deleted).
3. `INSERT OR IGNORE INTO chunk_links (...)`.
4. Record completion in `_memtomem_meta` (`chunk_links_backfill_v1`).

Bumping the version key triggers a re-run if the parser ever needs to widen.
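The back-fill pass might look roughly like the sketch below. It is not the PR's implementation: the function names `parse_source_id` and `backfill_chunk_links`, the `key`/`value` columns of `_memtomem_meta`, the `shared_from` link type value, the tag regex, and the demo schema are assumptions made so the example runs on its own against SQLite.

```python
import json
import re
import sqlite3

# Assumption: the tag is one element of a JSON array, shaped "shared-from=<uuid>".
_SHARED_FROM = re.compile(r"^shared-from=([0-9a-fA-F-]{36})$")
_MARKER = "chunk_links_backfill_v1"


def parse_source_id(raw_tags):
    """Return the UUID from a shared-from tag, or None if absent or unparsable."""
    try:
        tags = json.loads(raw_tags)
    except (TypeError, ValueError):
        return None  # malformed tags JSON is skipped, not fatal
    if not isinstance(tags, list):
        return None
    for tag in tags:
        match = _SHARED_FROM.match(tag) if isinstance(tag, str) else None
        if match:
            return match.group(1)
    return None


def backfill_chunk_links(conn):
    """Run the one-shot pass; return the number of link rows inserted."""
    done = conn.execute(
        "SELECT 1 FROM _memtomem_meta WHERE key = ?", (_MARKER,)
    ).fetchone()
    if done:
        return 0  # marker present: later startups short-circuit
    rows = conn.execute(
        "SELECT id, namespace, tags FROM chunks WHERE tags LIKE '%shared-from=%'"
    ).fetchall()
    inserted = 0
    for chunk_id, namespace, raw_tags in rows:
        source_uuid = parse_source_id(raw_tags)
        if source_uuid is None:
            continue
        hit = conn.execute(
            "SELECT id FROM chunks WHERE id = ?", (source_uuid,)
        ).fetchone()
        cur = conn.execute(
            "INSERT OR IGNORE INTO chunk_links "
            "(target_id, source_id, link_type, namespace_target) VALUES (?, ?, ?, ?)",
            (chunk_id, hit[0] if hit else None, "shared_from", namespace),
        )
        inserted += cur.rowcount
    conn.execute(
        "INSERT INTO _memtomem_meta (key, value) VALUES (?, 'done')", (_MARKER,)
    )
    conn.commit()
    return inserted


# Demo against a hypothetical minimal schema.
SRC = "11111111-1111-1111-1111-111111111111"
GONE = "22222222-2222-2222-2222-222222222222"
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE chunks (id TEXT PRIMARY KEY, namespace TEXT, tags TEXT);
    CREATE TABLE chunk_links (
        target_id TEXT NOT NULL REFERENCES chunks(id) ON DELETE CASCADE,
        source_id TEXT REFERENCES chunks(id) ON DELETE SET NULL,
        link_type TEXT NOT NULL,
        namespace_target TEXT NOT NULL,
        PRIMARY KEY (target_id, link_type)
    );
    CREATE TABLE _memtomem_meta (key TEXT PRIMARY KEY, value TEXT);
    """
)
conn.executemany(
    "INSERT INTO chunks (id, namespace, tags) VALUES (?, ?, ?)",
    [
        (SRC, "alice", "[]"),
        ("dst-ok", "bob", json.dumps([f"shared-from={SRC}"])),
        ("dst-orphan", "bob", json.dumps([f"shared-from={GONE}"])),
        ("dst-bad", "bob", "not json shared-from="),
        ("other", "bob", json.dumps(["topic=x"])),
    ],
)
first_run = backfill_chunk_links(conn)
second_run = backfill_chunk_links(conn)
links = conn.execute(
    "SELECT target_id, source_id FROM chunk_links ORDER BY target_id"
).fetchall()
print(first_run, second_run, links)
```

Here the second call returns early because the marker row exists, which is the short-circuit the version key is meant to provide; changing `_MARKER` would make the pass run again, and `INSERT OR IGNORE` keeps that re-run from duplicating rows.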
Test plan
- `test_chunk_links_schema.py`, 10 cases:
  - table + indexes shape, `create_tables` idempotent.
  - FK `SET NULL` on source delete, `CASCADE` on target delete, PK uniqueness conflict.
  - Back-fill: existing source resolved, missing source → NULL, unrelated tags ignored, marker prevents re-scan of post-migration rows, malformed `tags` JSON skipped.
- `uv run pytest -m "not ollama"` → 2421 passed (was 2411, +10 new cases). No regression.
- `ruff check` + `ruff format --check` → clean.

🤖 Generated with Claude Code