Conversation
…undtrip
Adds tests/test_export_import_roundtrip.py with four hermetic ONNX-backed
baselines for cross-instance export/import behaviour. The tests do not try
to assert a future desired state — they pin down what the current impl
actually does today, so Phase 2 has measurable targets.
Why:
The mm export/import plan wants cross-PC roundtrip fidelity plus additive
ingestion. Before designing the fix, we needed concrete numbers for what
breaks today. This file is that measurement.
What the baselines show:
* Single cross-PC roundtrip: content/metadata/top-k search are fully
preserved; only chunk UUIDs are reassigned on import.
* Re-importing the same bundle creates exact 2x row duplication
(4 contents -> 8 rows, max 2 rows per content_hash) because
upsert_chunks dedupes by UUID and import assigns a fresh UUID.
* Disjoint merge works (A+B rows, no hash collisions, B's native
content stays searchable post-import).
* Content-collision merge also duplicates: identical content on both
sides produces two rows with the same content_hash but different UUIDs.
Two of the four tests hard-assert the current buggy behaviour (2x dup on
re-import, duplicate row per collision). They are intentional regression
markers: when Phase 2 introduces content_hash-based dedup, these must fail
and be flipped to assert the corrected counts.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
5 tasks
The Phase 1 baseline previously printed metadata and top-k overlap as
diagnostic output without enforcing them in hard asserts. The PR body
claimed "content / metadata / top-k search fully preserved" but only
the content-hash equality was automatically guarded, so a future
regression in metadata wiring or embedder drift would silently slip
through with just a print delta.
Promote both to hard asserts in test_phase1_baseline:
* metadata mismatches across tags, namespace, heading_hierarchy, and
source_file must be empty (pair by content_hash).
* top-3 result sets for all four probe queries must match exactly
between the source and the re-imported instance (set equality,
order-insensitive to tolerate any tie-breaking noise).
Verified locally: overlap=3/3 for all queries, metadata mismatches=0.
Claim in the PR body now matches what the test actually enforces.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds
tests/test_export_import_roundtrip.py— four hermetic ONNX-backed baselines that pin down whatmem_export/mem_importactually do today, as the foundation for Phase 2 (cross-PC roundtrip fidelity + additive ingestion). This is a measurement-only PR: no source code changes.Why
The
mmexport/import plan wants two guarantees: (1) cross-PC roundtrip fidelity (export on PC1, import on PC2 reconstitutes the original) and (2) additive ingestion that merges cleanly back. Before designing the fix, we needed concrete numbers on what breaks today. This file is that measurement.What the four baselines show
content_hash) —upsert_chunksdedupes by UUID, import assigns fresh UUID.content_hashbut different UUIDs.Two of the four tests hard-assert the current buggy behaviour (
2×duplication on re-import; duplicate row per collision). They are intentional regression markers — when Phase 2 addscontent_hash-based dedup, these tests must fail and be flipped to assert the corrected counts.Phase 2 requirements derived from these numbers
version="2": include per-chunkcontent_hash+ originalchunk_id.import_chunksgainson_conflictparameter:"skip"(default, idempotent re-import) /"update"/"duplicate"(back-compat).Test plan
uv run ruff check packages/memtomem/tests/test_export_import_roundtrip.py— cleanuv run ruff format --check …— formatteduv run mypy packages/memtomem/tests/test_export_import_roundtrip.py— cleanuv run pytest packages/memtomem/tests/test_export_import_roundtrip.py -s -v— 4 passed, 3.29s (ONNX model cached)uv run pytest -m "not ollama" -q— 2257 passed, 0 regressions, 36.82s🤖 Generated with Claude Code