Summary
When a DB was initialized with provider=none (NoopEmbedder, BM25-only mode) the stored _memtomem_meta.embedding_dimension is 0. If the config is later switched to a real provider (onnx/ollama) without running mm embedding-reset, startup silently loads the mismatch: runtime embedder produces real vectors, but chunks_vec was never created (only created when dimension > 0 at storage/sqlite_schema.py:169). Every subsequent upsert_chunks crashes with no such table: chunks_vec.
Proposal: startup should fail fast (or surface a prominent warning) when stored.provider != 'none' but stored.dimension == 0, instead of silently loading into a broken state.
Repro
- Run mm init with provider=none (or use an install that never ran an embedding-aware init).
- Edit ~/.memtomem/config.json to set embedding.provider=onnx, model=bge-m3, dimension=1024.
- Start mm web or mm serve: no error or warning appears.
- Trigger any indexing (mm index <path> or reindex via UI). Every file fails:
ERROR Indexing failed for <path>: upsert_chunks failed, transaction rolled back:
no such table: chunks_vec
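The underlying SQLite error can be reproduced in isolation. The following is a minimal sketch, not memtomem code: it opens an empty in-memory database and attempts an insert into chunks_vec. The column names (rowid, embedding) and the helper try_vector_insert are hypothetical and chosen only for illustration.

```python
import sqlite3
from typing import Optional


def try_vector_insert(conn: sqlite3.Connection) -> Optional[str]:
    """Attempt an insert into chunks_vec; return the SQLite error message, if any."""
    try:
        # Hypothetical columns; the real chunks_vec schema may differ.
        conn.execute(
            "INSERT INTO chunks_vec (rowid, embedding) VALUES (?, ?)",
            (1, b"\x00" * 16),
        )
    except sqlite3.OperationalError as exc:
        conn.rollback()
        return str(exc)
    return None


# An empty database stands in for a DB whose chunks_vec table was never created.
conn = sqlite3.connect(":memory:")
error_message = try_vector_insert(conn)
print(error_message)
```

Because the table is missing, the statement fails when SQLite prepares it, so every upsert hits the same error regardless of the data being written.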
Observed 2026-04-19
Smoke on #295's initial-scan design flooded the log with ~200 identical "no such table: chunks_vec" lines before we diagnosed the root cause. User's _memtomem_meta had dim=0, provider=onnx, model=bge-m3 with 0 chunks populated — a contradictory combination that startup accepted without complaint.
Recovery (current): mm embedding-reset --mode apply-current drops chunks_vec if it exists, recreates it with the configured dimension, updates meta. Safe when chunks=0 (no data loss); destructive otherwise.
Proposed fix direction
In storage/sqlite_schema.py create_tables (around L140-L167 where stored provider/model are validated):
- Add a new validation branch: if stored_provider not in (None, 'none') AND stored_dim == 0, this is a legacy mismatch; surface it to the caller (StorageBackend.initialize) as a distinct error type.
- Either raise on startup with a clear remediation message ("DB has legacy NoopEmbedder meta (dim=0) but provider={provider}. Run 'mm embedding-reset --mode apply-current' to resolve.") or flag it as a startup warning and set a dim0_mismatch flag consumed by the embedding-status endpoint / web banner.
Fail-fast is probably the right call — silent loading produces hundreds of log lines before the user sees anything actionable.
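The fail-fast variant could look roughly like the sketch below. The names LegacyDimensionMismatchError and check_dim0_mismatch are hypothetical (the issue only asks for "a distinct error type"), and how create_tables actually reads the stored provider and dimension is not shown here.

```python
from typing import Optional


class LegacyDimensionMismatchError(RuntimeError):
    """Hypothetical error type: stored meta names a real provider but dimension is 0."""


def check_dim0_mismatch(stored_provider: Optional[str], stored_dim: int) -> None:
    """Raise if the stored meta combines a real provider with dimension 0."""
    if stored_provider not in (None, "none") and stored_dim == 0:
        raise LegacyDimensionMismatchError(
            f"DB has legacy NoopEmbedder meta (dim=0) but provider={stored_provider}. "
            "Run 'mm embedding-reset --mode apply-current' to resolve."
        )


# The legitimate BM25-only case passes silently.
check_dim0_mismatch("none", 0)

# The contradictory case fails with the remediation message.
try:
    check_dim0_mismatch("onnx", 0)
except LegacyDimensionMismatchError as exc:
    print(exc)
```

Subclassing RuntimeError is one possible choice; a project-specific base exception would work equally well if one exists.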
Scope boundaries
- Only affects startup code path. No schema migration needed; the fix is a gate + clear message.
- Doesn't change mm embedding-reset behavior; it remains the recovery tool.
- CLI / web should both emit the same diagnostic (consider moving the check into create_tables so it covers both).
Tests
- Add a unit test that constructs a DB with _memtomem_meta = {dim:0, provider:onnx} and asserts StorageBackend.initialize() raises the new error type.
- Add the reverse: a DB with dim:0, provider:none should still initialize cleanly (that's the legitimate BM25-only case).
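The two tests might be shaped like the sketch below. It is self-contained and makes several assumptions: _memtomem_meta is modeled as a key/value table with keys embedding_provider and embedding_dimension, initialize is a stand-in for StorageBackend.initialize() that models only the proposed gate, and LegacyDimensionMismatchError is a hypothetical name for the new error type. The real tests would use the project's actual backend and schema.

```python
import sqlite3


class LegacyDimensionMismatchError(RuntimeError):
    """Hypothetical error type: real provider stored alongside dimension 0."""


def make_meta_db(provider: str, dimension: int) -> sqlite3.Connection:
    # Assumed layout: _memtomem_meta as a key/value table. The real schema may differ.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE _memtomem_meta (key TEXT PRIMARY KEY, value TEXT)")
    conn.executemany(
        "INSERT INTO _memtomem_meta (key, value) VALUES (?, ?)",
        [("embedding_provider", provider), ("embedding_dimension", str(dimension))],
    )
    return conn


def initialize(conn: sqlite3.Connection) -> None:
    # Stand-in for StorageBackend.initialize(); only the proposed gate is modeled.
    meta = dict(conn.execute("SELECT key, value FROM _memtomem_meta"))
    provider = meta.get("embedding_provider")
    dimension = int(meta.get("embedding_dimension", "0"))
    if provider not in (None, "none") and dimension == 0:
        raise LegacyDimensionMismatchError(
            f"DB has legacy NoopEmbedder meta (dim=0) but provider={provider}."
        )


def test_real_provider_with_dim0_raises() -> None:
    conn = make_meta_db("onnx", 0)
    try:
        initialize(conn)
    except LegacyDimensionMismatchError:
        return
    raise AssertionError("expected LegacyDimensionMismatchError")


def test_none_provider_with_dim0_initializes() -> None:
    # The legitimate BM25-only case must not raise.
    initialize(make_meta_db("none", 0))
```

Both functions follow the test_ naming convention, so a runner such as pytest would collect them as written.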
Related
feedback_chunks_vec_dim0_legacy.mdas a debugging recognition pattern.