
test(export-import): Phase 1 baseline for mem_export -> mem_import roundtrip #451

Merged
memtomem merged 2 commits into main from feat/export-import-roundtrip-baseline on Apr 24, 2026

Conversation

@pandas-studio (Collaborator)

Summary

Adds tests/test_export_import_roundtrip.py — four hermetic ONNX-backed baselines that pin down what mem_export / mem_import actually do today, as the foundation for Phase 2 (cross-PC roundtrip fidelity + additive ingestion). This is a measurement-only PR: no source code changes.

Why

The mm export/import plan wants two guarantees: (1) cross-PC roundtrip fidelity (export on PC1, import on PC2 reconstitutes the original) and (2) additive ingestion that merges cleanly back. Before designing the fix, we needed concrete numbers on what breaks today. This file is that measurement.

What the four baselines show

  • Cross-PC single roundtrip: content, metadata, and top-k search are fully preserved; only chunk UUIDs are reassigned.
  • Re-import of the same bundle twice: exact 2× row duplication (4 contents → 8 rows, max 2 rows per content_hash) because upsert_chunks dedupes by UUID and import assigns a fresh UUID.
  • Disjoint merge (different content on PC_A / PC_B): works cleanly: A+B row count, no hash collisions, B's native content remains searchable post-import.
  • Merge with content collision: byte-identical content on both sides produces two rows with the same content_hash but different UUIDs.

Two of the four tests hard-assert the current buggy behaviour (2× duplication on re-import; a duplicate row per collision). They are intentional regression markers: when Phase 2 adds content_hash-based dedup, these tests must fail and be flipped to assert the corrected counts.
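
To make the second baseline's failure mode concrete, here is a self-contained toy (plain Python, not the memtomem API; every name is illustrative) that reproduces the upsert-by-UUID duplication:

```python
# Toy model of the re-import duplication: rows are keyed by UUID, each import
# mints fresh UUIDs, so byte-identical content lands twice. Illustrative only.
import hashlib
import uuid

store: dict[str, dict] = {}  # uuid -> row, mimicking upsert-by-UUID semantics

def import_bundle(contents: list[str]) -> None:
    for text in contents:
        row_id = str(uuid.uuid4())  # fresh UUID on every import, as the baseline observed
        store[row_id] = {
            "content_hash": hashlib.sha256(text.encode()).hexdigest(),
            "content": text,
        }

bundle = ["chunk A", "chunk B", "chunk C", "chunk D"]
import_bundle(bundle)
import_bundle(bundle)  # re-import the same bundle

assert len(store) == 8  # 4 contents -> 8 rows: the 2x duplication the test pins down
hashes = [row["content_hash"] for row in store.values()]
assert max(hashes.count(h) for h in set(hashes)) == 2  # max 2 rows per content_hash
```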

Phase 2 requirements derived from these numbers

  • Bundle schema version="2": include per-chunk content_hash + original chunk_id.
  • import_chunks gains an on_conflict parameter: "skip" (default, idempotent re-import) / "update" / "duplicate" (back-compat); see the sketch after this list.
  • Optional UUID preservation from the bundle.
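
A minimal sketch of what the on_conflict behaviour could look like; the signature, the chunk fields, and the store helpers are assumptions for illustration, not the existing import_chunks API:

```python
from typing import Literal

def import_chunks(store, chunks, on_conflict: Literal["skip", "update", "duplicate"] = "skip"):
    """Hypothetical Phase 2 shape; names and lookup helpers are illustrative."""
    for chunk in chunks:
        existing = store.find_by_content_hash(chunk.content_hash)  # assumed lookup helper
        if existing is None or on_conflict == "duplicate":
            store.insert(chunk)  # "duplicate" preserves today's behaviour (back-compat)
        elif on_conflict == "update":
            store.update(existing.id, chunk)  # overwrite in place, keep one row per hash
        # "skip": content already present, do nothing -> idempotent re-import
```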

Test plan

  • uv run ruff check packages/memtomem/tests/test_export_import_roundtrip.py — clean
  • uv run ruff format --check … — formatted
  • uv run mypy packages/memtomem/tests/test_export_import_roundtrip.py — clean
  • uv run pytest packages/memtomem/tests/test_export_import_roundtrip.py -s -v — 4 passed, 3.29s (ONNX model cached)
  • uv run pytest -m "not ollama" -q — 2257 passed, 0 regressions, 36.82s

🤖 Generated with Claude Code

test(export-import): Phase 1 baseline for mem_export -> mem_import roundtrip

Adds tests/test_export_import_roundtrip.py with four hermetic ONNX-backed
baselines for cross-instance export/import behaviour. The tests do not try
to assert a future desired state — they pin down what the current impl
actually does today, so Phase 2 has measurable targets.

Why:
  The mm export/import plan wants cross-PC roundtrip fidelity plus additive
  ingestion. Before designing the fix, we needed concrete numbers for what
  breaks today. This file is that measurement.

What the baselines show:
  * Single cross-PC roundtrip: content/metadata/top-k search are fully
    preserved; only chunk UUIDs are reassigned on import.
  * Re-importing the same bundle creates exact 2x row duplication
    (4 contents -> 8 rows, max 2 rows per content_hash) because
    upsert_chunks dedupes by UUID and import assigns a fresh UUID.
  * Disjoint merge works (A+B rows, no hash collisions, B's native
    content stays searchable post-import).
  * Content-collision merge also duplicates: identical content on both
    sides produces two rows with the same content_hash but different UUIDs.

Two of the four tests hard-assert the current buggy behaviour (2x dup on
re-import, duplicate row per collision). They are intentional regression
markers: when Phase 2 introduces content_hash-based dedup, these must fail
and be flipped to assert the corrected counts.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

The Phase 1 baseline previously printed metadata and top-k overlap as
diagnostic output without enforcing them in hard asserts. The PR body
claimed "content / metadata / top-k search fully preserved" but only
the content-hash equality was automatically guarded, so a future
regression in metadata wiring or embedder drift would silently slip
through with just a print delta.

Promote both to hard asserts in test_phase1_baseline:
  * metadata mismatches across tags, namespace, heading_hierarchy, and
    source_file must be empty (pair by content_hash).
  * top-3 result sets for all four probe queries must match exactly
    between the source and the re-imported instance (set equality,
    order-insensitive to tolerate any tie-breaking noise).

Verified locally: overlap=3/3 for all queries, metadata mismatches=0.
Claim in the PR body now matches what the test actually enforces.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
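
For reference, the assertion shape this commit describes looks roughly like the following; the store objects, the search() call, and the field names used for pairing are illustrative stand-ins, not the actual fixtures in test_phase1_baseline:

```python
# Illustrative shape of the promoted hard asserts; every identifier here is a
# stand-in, not the real test's fixtures or API.
METADATA_FIELDS = ("tags", "namespace", "heading_hierarchy", "source_file")

def assert_roundtrip_fidelity(source, imported, probe_queries, k=3):
    # Top-k parity: set equality so tie-breaking order differences don't matter.
    for query in probe_queries:
        src_top = {hit.content_hash for hit in source.search(query, k=k)}
        dst_top = {hit.content_hash for hit in imported.search(query, k=k)}
        assert src_top == dst_top, f"top-{k} drift for {query!r}"

    # Metadata parity: pair rows by content_hash (UUIDs differ after import).
    for src_row in source.all_chunks():
        dst_row = imported.get_by_content_hash(src_row.content_hash)
        for field in METADATA_FIELDS:
            assert getattr(src_row, field) == getattr(dst_row, field), field
```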
memtomem merged commit 18ded9b into main on Apr 24, 2026
7 checks passed
github-actions bot locked and limited conversation to collaborators on Apr 24, 2026
memtomem deleted the feat/export-import-roundtrip-baseline branch on April 24, 2026 at 14:21