docs(embeddings): warn that embedding-reset needs an idle DB#710
Merged
Follow-up to PR #705. The new INSERT OR IGNORE chunk path is content-hash keyed, so when two concurrent processes point at the same SQLite file with *different* embedding models, the race loser's embedding is silently dropped and the first-committed one sticks. The dim-mismatch gate (#298) only catches the case where the new model has a different dimension; same-dim swaps slip through. Realistic users hit this by leaving 'mm web' running, switching to a new same-dim model in config.json, then invoking 'mm embedding-reset --mode apply-current' from another shell.

This follows the issue's chosen mitigation (option 1: document the constraint, leaving a single-writer guard for a separate RFC if it keeps coming up).

- configuration.md#reset-flow: callout explaining the failure mode and the stop-other-processes invariant.
- embeddings.md "Switching Models on an Existing Index": short inline warning pointing at the canonical callout.

Co-Authored-By: Claude <[email protected]>
A single newline inside inline bold renders as a space in CommonMark/GFM, so `**same-\ndimension**` came out as "same- dimension" on GitHub. Move the wrap so the bolded phrase stays on one line.

Co-Authored-By: Claude <[email protected]>
Summary
- Warn that `mm embedding-reset` needs an idle DB, so users don't trigger the same-dim model-swap race that the `INSERT OR IGNORE` path from PR #705 (fix(storage): block duplicate chunk inserts via UNIQUE + INSERT OR IGNORE (#691)) can't catch.
- Add a callout in `configuration.md#reset-flow`, and a one-line warning in `embeddings.md` "Switching Models on an Existing Index" (which already links to the canonical section).

Why now
Issue #707 (review follow-up from #705) flags that `INSERT OR IGNORE` on a content-hash key keeps whichever embedding commits first when two processes embed the same chunk under different models. The dim gate (`sqlite_schema.py` ~line 212, issue #298) catches the dimension-mismatch case, but same-dimension model swaps slip past it (e.g. swapping one 1024-d model for another). The realistic trigger is a user leaving `mm web` running, editing `~/.memtomem/config.json` to a new same-dim model, then invoking `mm embedding-reset --mode apply-current` from another shell, which is exactly the flow our existing docs lay out without warning.

Test plan
- `uv run pytest packages/memtomem/tests/test_docs_guards.py -q`: 11 passed.
- `uv run ruff check docs/ packages/memtomem/src` + `ruff format --check packages/memtomem/src`: clean.
- The `embeddings.md` → `configuration.md#reset-flow` link still resolves, and the new callout sits between the two-mode resolution block and the `mem_status` warning schema.

🤖 Generated with Claude Code
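Illustrative sketch

For reviewers who want to see the race described under "Why now" in miniature, the snippet below is a rough sketch rather than memtomem's actual code. The `chunks` table, its columns, the `insert_chunk` helper, and the model names `model-a` / `model-b` are all hypothetical, and a single in-memory connection stands in for the two processes. It only shows the general SQLite behaviour: with `INSERT OR IGNORE` on a content-hash key, the first committed row wins, and a dimension check cannot tell the two writers apart when both models produce vectors of the same length.

```python
import hashlib
import sqlite3


def content_hash(text: str) -> str:
    """Hash the chunk text; the key does not depend on the embedding model."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def insert_chunk(conn: sqlite3.Connection, text: str, model: str, embedding: list) -> bool:
    """Hypothetical writer: INSERT OR IGNORE keyed on the content hash.

    Returns True if this call stored a row, False if it was ignored.
    """
    cur = conn.execute(
        "INSERT OR IGNORE INTO chunks (content_hash, model, dim, embedding) "
        "VALUES (?, ?, ?, ?)",
        (content_hash(text), model, len(embedding), repr(embedding)),
    )
    return cur.rowcount == 1


# Hypothetical schema, not memtomem's real one.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chunks ("
    "content_hash TEXT PRIMARY KEY, model TEXT, dim INTEGER, embedding TEXT)"
)

text = "same chunk text"

# "Process 1" (e.g. a long-running server) still embeds with the old model.
first_stored = insert_chunk(conn, text, "model-a", [0.1, 0.2, 0.3, 0.4])

# "Process 2" (e.g. a reset run) embeds the same chunk with a new same-dim model.
second_stored = insert_chunk(conn, text, "model-b", [0.9, 0.8, 0.7, 0.6])

stored_model, stored_dim = conn.execute(
    "SELECT model, dim FROM chunks WHERE content_hash = ?",
    (content_hash(text),),
).fetchone()

# The second insert is silently ignored, and the stored dimension matches
# what the new model would have produced, so a dim-only gate sees no problem.
print(first_stored, second_stored, stored_model, stored_dim)
```

In this sketch the only signal that the second write was dropped is the cursor's row count, which is why the documented mitigation is to stop other processes before running the reset rather than to rely on the insert path noticing.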