
docs(embeddings): warn that embedding-reset needs an idle DB#710

Merged
memtomem merged 2 commits into main from
docs/707-embedding-reset-single-process
May 2, 2026

@memtomem (Owner) commented May 2, 2026

Summary

Why now

Issue #707 (review follow-up from #705) flags that `INSERT OR IGNORE` on a content-hash key keeps whichever embedding commits first when two processes embed the same chunk under different models; the other process's embedding is silently dropped. The dim gate (`sqlite_schema.py` ~line 212, issue #298) catches the dimension-mismatch case, but same-dimension model swaps slip past it (e.g. swapping one 1024-d model for another). The realistic trigger is a user leaving `mm web` running, editing `~/.memtomem/config.json` to a new same-dim model, then invoking `mm embedding-reset --mode apply-current` from another shell: exactly the flow our existing docs lay out, without any warning.
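
To make the failure mode concrete, here is a minimal sketch using Python's standard-library `sqlite3` module. The table layout, the `dim_gate` and `write_chunk` helpers, and the model names are hypothetical simplifications, not memtomem's actual schema or code. Four-element vectors stand in for 1024-d embeddings, and the two processes are simulated sequentially on one connection. It shows that a same-dimension second writer passes the dimension check and its `INSERT OR IGNORE` is dropped without error, while a different-dimension writer is rejected.

```python
import sqlite3


# Hypothetical, simplified table keyed by content hash (NOT memtomem's real schema).
def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE chunks ("
        " content_hash TEXT PRIMARY KEY,"
        " model TEXT NOT NULL,"
        " dim INTEGER NOT NULL,"
        " embedding TEXT NOT NULL)"
    )
    return conn


def dim_gate(conn: sqlite3.Connection, dim: int) -> None:
    """Hypothetical stand-in for the dim gate: rejects only a dimension mismatch."""
    row = conn.execute("SELECT dim FROM chunks LIMIT 1").fetchone()
    if row is not None and row[0] != dim:
        raise ValueError(f"dimension mismatch: index has {row[0]}, model emits {dim}")


def write_chunk(conn: sqlite3.Connection, content_hash: str, model: str,
                vector: list[float]) -> int:
    """Write one embedding; return how many rows were actually inserted (0 or 1)."""
    dim_gate(conn, len(vector))
    before = conn.total_changes
    conn.execute(
        "INSERT OR IGNORE INTO chunks (content_hash, model, dim, embedding) "
        "VALUES (?, ?, ?, ?)",
        (content_hash, model, len(vector), ",".join(map(str, vector))),
    )
    conn.commit()
    # An ignored insert modifies no rows, so the difference is 0.
    return conn.total_changes - before


if __name__ == "__main__":
    conn = make_db()
    # Process A (e.g. a long-running `mm web`) still embeds with the old model.
    write_chunk(conn, "hash-1", "old-model", [0.1, 0.2, 0.3, 0.4])
    # Process B (e.g. a reset in another shell) uses a new model of the SAME dimension.
    inserted = write_chunk(conn, "hash-1", "new-model", [0.9, 0.8, 0.7, 0.6])
    stored = conn.execute(
        "SELECT model FROM chunks WHERE content_hash = ?", ("hash-1",)
    ).fetchone()[0]
    print(inserted, stored)  # prints "0 old-model": no error, new embedding dropped
```

Running it prints `0 old-model`: the dimension check passes and the second write is ignored without any signal, which is why the docs now ask users to stop other processes before running the reset.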

Test plan

  • uv run pytest packages/memtomem/tests/test_docs_guards.py -q — 11 passed.
  • uv run ruff check docs/ packages/memtomem/src + ruff format --check packages/memtomem/src — clean.
  • Rendered both edits locally; the link embeddings.md → configuration.md#reset-flow still resolves, and the new callout sits between the two-mode resolution block and the mem_status warning schema.

🤖 Generated with Claude Code

pandas-studio and others added 2 commits May 2, 2026 14:06
Follow-up to PR #705. The new INSERT OR IGNORE chunk path is keyed on
content hash, so when two concurrent processes point at the same SQLite
file with *different* embedding models, the race loser's embedding is
silently dropped and the first writer's embedding sticks. The
dim-mismatch gate (#298) only catches the case where the new model has a
different dimension; same-dim swaps slip through.

Users realistically hit this by leaving 'mm web' running, switching
config.json to a new same-dim model, then invoking
'mm embedding-reset --mode apply-current' from another shell. This
follows the issue's chosen mitigation (option 1: document the
constraint, and leave a single-writer guard for a separate RFC if the
problem keeps coming up).

- configuration.md#reset-flow: callout explaining the failure mode and
  the stop-other-processes invariant.
- embeddings.md "Switching Models on an Existing Index": short inline
  warning pointing at the canonical callout.

Co-Authored-By: Claude <[email protected]>
Inline single newline inside bold renders as a space in CommonMark/GFM,
so `**same-\ndimension**` came out as "same- dimension" on GitHub. Move
the wrap so the bolded phrase stays on one line.

Co-Authored-By: Claude <[email protected]>
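
The rendering bug fixed by the second commit can also be caught mechanically. Below is a hypothetical Python helper (not the contents of `test_docs_guards.py`) that scans Markdown for a `**bold**` span wrapped immediately after a hyphen and reports how it would render. It is a regex sketch, not a CommonMark parser: it ignores code fences and only matches bold spans that contain no `*` and exactly one line break.

```python
import re

# Hypothetical guard: a **bold** span (no '*' inside) whose text wraps right after a hyphen.
_HYPHEN_WRAPPED_BOLD = re.compile(r"\*\*([^*\n]*-\n[^*\n]*)\*\*")


def find_hyphen_wrapped_bold(markdown: str) -> list[tuple[int, str]]:
    """Return (1-based line number, rendered text) for each offending bold span."""
    hits = []
    for match in _HYPHEN_WRAPPED_BOLD.finditer(markdown):
        line_no = markdown.count("\n", 0, match.start()) + 1
        # CommonMark/GFM renders the soft line break inside the span as a space.
        rendered = match.group(1).replace("\n", " ")
        hits.append((line_no, rendered))
    return hits


if __name__ == "__main__":
    bad = "Same-model swaps with\n**same-\ndimension** vectors slip past.\n"
    good = "Same-model swaps with\n**same-dimension** vectors slip past.\n"
    print(find_hyphen_wrapped_bold(bad))   # [(2, 'same- dimension')]
    print(find_hyphen_wrapped_bold(good))  # []
```

The reported text reproduces the "same- dimension" output described in the commit message, so a check like this could fail CI before the broken wrap reaches GitHub.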
@memtomem changed the title from "docs(embeddings): warn that embedding-reset needs an idle DB (#707)" to "docs(embeddings): warn that embedding-reset needs an idle DB" May 2, 2026
@memtomem merged commit 3843f97 into main May 2, 2026
8 of 9 checks passed
@github-actions (bot) locked and limited conversation to collaborators May 2, 2026
@memtomem deleted the docs/707-embedding-reset-single-process branch May 2, 2026 05:18

Successfully merging this pull request may close these issues.

Embedding-loss race when two processes use different models on the same DB (follow-up from #691)
