Embedding-loss race when two processes use different models on the same DB (follow-up from #691) #707

@memtomem

Description

What

Follow-up from PR #705 review: with the new INSERT OR IGNORE race-loser path, two processes that compute different embeddings for the same content_hash will keep whichever one commits first and silently drop the other. This is a non-issue in steady state but matters during embedding-model swaps.
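The race-loser behavior can be shown with a standalone sqlite3 sketch (hypothetical table and column names for illustration, not the project's actual schema):

```python
import sqlite3

# Two writers INSERT OR IGNORE the same content_hash with different
# embeddings; the first commit wins and the second is dropped silently.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE chunks (content_hash TEXT PRIMARY KEY, embedding BLOB)"
)

# Process A (old model) commits first.
db.execute(
    "INSERT OR IGNORE INTO chunks VALUES (?, ?)", ("abc123", b"old-model-vec")
)
# Process B (new model) loses the race: no error, no row change.
db.execute(
    "INSERT OR IGNORE INTO chunks VALUES (?, ?)", ("abc123", b"new-model-vec")
)

row = db.execute(
    "SELECT embedding FROM chunks WHERE content_hash = ?", ("abc123",)
).fetchone()
print(row[0])  # b'old-model-vec': B's vector is gone, with no error raised
```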

Why it's safe today

content_hash = sha256(NFC(content)) (see models.py:Chunk.__post_init__). The hash is a function of content alone; the embedding doesn't enter it. As long as both processes run the same embedding model, both compute byte-identical vectors for the same hash. INSERT OR IGNORE keeps one row's embedding and drops the other process's, but since the two vectors are byte-identical, nothing is actually lost.

The unique key (namespace, source_file, content_hash, start_line) is also embedding-independent.
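A sketch of that hashing scheme (the real implementation is models.py:Chunk.__post_init__; the UTF-8 encoding step here is an assumption):

```python
import hashlib
import unicodedata

def content_hash(content: str) -> str:
    # sha256 over NFC-normalized content. The embedding never enters the
    # hash, so two processes running different models compute the same
    # hash for the same text.
    return hashlib.sha256(
        unicodedata.normalize("NFC", content).encode("utf-8")
    ).hexdigest()

h1 = content_hash("café")        # precomposed é
h2 = content_hash("cafe\u0301")  # e + combining acute; NFC folds them together
print(h1 == h2)  # True
```

NFC normalization is what makes visually identical strings with different codepoint sequences hash the same, so the key is stable across writers.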

When it would bite

A user runs two processes at the same time with different embedding models configured — e.g.:

  • Process A is mm web started before a model swap (still serving bge-m3).
  • User runs mm embedding-reset --mode apply-current to switch to a different model, restarts MCP server with the new config, and the new MCP starts indexing.
  • The two processes briefly co-exist with different embedding models pointing at the same DB.

In that window the race-loser's embedding (computed against a different model) is dropped. The keeper's embedding might be from the old model even though the user thought they were upgrading.

This is already mostly guarded by the dim=0 / real-provider gate (sqlite_schema.py line ~212, issue #298): the storage layer refuses to start when the stored vector dimension doesn't match the configured one. But that gate only fires on a dimension mismatch, not on a same-dimension model swap.
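A hypothetical sketch of the shape of that gate (the real check lives near sqlite_schema.py line ~212; the function name and exact policy here are assumptions):

```python
def check_dim_gate(stored_dim: int, configured_dim: int) -> None:
    # Hypothetical sketch: refuse to start when the stored vector dimension
    # disagrees with the configured model's dimension. stored_dim == 0 is
    # treated as "no vectors yet", so a fresh DB always passes.
    if stored_dim != 0 and stored_dim != configured_dim:
        raise RuntimeError(
            f"stored embedding dim {stored_dim} != configured dim {configured_dim}"
        )

check_dim_gate(1024, 1024)  # bge-m3 -> another 1024-d model: passes
# check_dim_gate(1024, 768) would raise, catching a cross-dimension swap
```

The same-dimension swap sails straight through this check, which is exactly the gap described above.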

Suggested handling

Options, smallest-blast first:

  1. Document the constraint. Note in docs/guides/ that mm embedding-reset should be run after stopping all mm processes, not while another is alive. Cheap, non-invasive.
  2. Single-writer guard. Have mm web and the MCP server fight over a process-level lock file (e.g., ~/.memtomem/.indexing.pid) so only one writes at a time. Fixes other multi-process concerns too but bigger change.
  3. Embed-on-read. Stop persisting embeddings; recompute on search. Solves all model-swap races but unacceptable latency hit.

Probably (1) for now, with (2) as a separate RFC if it keeps coming up.
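For reference, option (2) could be as small as an atomic PID-file lock. This is a sketch only: the path would be the ~/.memtomem/.indexing.pid suggested above, and stale-lock cleanup after a crash is deliberately out of scope here.

```python
import os

def acquire_lock(lock_path: str) -> bool:
    """Atomically create a PID file; return False if another writer holds it."""
    os.makedirs(os.path.dirname(lock_path), exist_ok=True)
    try:
        # O_CREAT | O_EXCL makes creation atomic: exactly one process wins.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))
    return True

def release_lock(lock_path: str) -> None:
    try:
        os.remove(lock_path)
    except FileNotFoundError:
        pass
```

A second process calling acquire_lock while the file exists gets False and would skip indexing; the stale-lock question (crashed holder leaves the file behind) is the part that would need the RFC.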

Reproduction sketch

Not currently reproducible without manual setup — would need:

  • Start mm web with bge-m3 config.
  • Edit ~/.memtomem/config.json to change the embedding model to a different one with the same dimension (so the dim gate doesn't fire — e.g., another 1024-d model).
  • Start an mm index from a separate shell against the same DB while mm web is still alive.
  • Inspect chunks_vec rows: some will hold model-A vectors and some model-B vectors, with no way to tell them apart.

Severity

Low. Realistic users don't run two embedding models against one DB on purpose, and the dim mismatch path already catches the common reset misuse. Tracking here so it isn't forgotten when the multi-process surface grows.

Metadata

Labels: bug