feat(web): surface fastembed model load state via banner (#696) #703
Conversation
PR #703 review: ``embedding/onnx.py:_resolve_model`` (short-name → fastembed-id) was duplicated by a hand-rolled ``_resolve_fastembed_model_id`` in ``web/routes/system.py``, and the approximate-size map in ``embedding/readiness.py`` was a third copy that drifted from both the wizard text in ``cli/init_cmd.py`` and fastembed's own ``size_in_GB`` metadata.

The reviewer surfaced concrete drift on bge-m3:

* fastembed metadata (and ``add_custom_model(size_in_gb=2.3)``): 2300 MB
* readiness banner copy (correct): 2300 MB
* init wizard text (wrong): 1.2 GB

A user who picked bge-m3 after reading "~1.2 GB" in the wizard would then see "Downloading bge-m3 (~2300 MB)…" in the banner — a ~2× jump. ``bge-small-en-v1.5`` and ``all-MiniLM-L6-v2`` had similar mismatches.

This commit:

* Adds ``embedding/aliases.py`` as the single source of truth for short alias → (fastembed id, dim, MB) plus a separate reranker size table. Sizes match ``TextEmbedding.list_supported_models()`` / ``TextCrossEncoder.list_supported_models()`` exactly. Custom-registered models (just bge-m3 today) carry the size declared on their ``add_custom_model`` call.
* Updates ``embedding/onnx.py`` to import ``resolve_embedder_id`` from aliases instead of carrying its own ``_ONNX_MODELS`` map.
* Drops the duplicate ``_resolve_fastembed_model_id`` and the local ``_APPROX_SIZE_MB`` from ``web/routes/system.py`` and ``embedding/readiness.py``; both now read from aliases.
* Updates the init wizard at ``cli/init_cmd.py`` to render sizes via ``aliases.format_size`` so the user-facing copy and the runtime banner are guaranteed to agree.
* Corrects approximate sizes that were wrong even in the readiness table — bge-small-en-v1.5 67 MB (was 130), nomic-embed-text-v1.5 520 MB (was 280), jina-reranker-v2 1110 MB (was 1100).
* Adds ``tests/test_embedding_aliases.py`` covering both directions of the lookup plus a snapshot test that pins the legacy short-name contract — a future refactor that re-introduces a private alias map fails this test instead of silently drifting.

The remaining minor notes from the review (raw ``_load_error`` in the API response, polling-cap silent stop, blob-completeness probe) are deliberately out of scope here — see the PR thread.

Co-Authored-By: Claude <[email protected]>
Addressed both blocking review issues:

#1 Size discrepancy — used fastembed's own ``size_in_GB`` metadata as the reference. The wizard's "bge-m3 is ~1.2 GB (similar to Ollama models)" copy was also adjusted to "~2.3 GB (substantial download)" — calling 2.3 GB "similar to Ollama" was misleading.

#2 Alias map dedup — added ``embedding/aliases.py`` as the single source of truth, plus a drift-guard snapshot test. Remaining minor notes are deliberately out of scope here.

Tests: 3923 passed (added 15 alias/dedup tests on top of the 3908 baseline).
``OnnxEmbedder`` and ``FastEmbedReranker`` lazily instantiate fastembed models on first use. Until now there was no way to tell from outside the class whether a download was in flight, the model was loaded, or the cache was simply cold — fastembed itself does not surface progress, so the calling layer was blind for the entire 30-second-to-multi-minute window.

Add two observability fields to both lazy loaders:

* ``_loading: bool`` — True between entering ``_get_model()`` and the fastembed constructor returning.
* ``_load_error: str | None`` — last failure message, set on exception inside the constructor and re-raised.

Plain attribute reads/writes; bool/``Optional[str]`` assignment is atomic under CPython, and the upcoming ``GET /api/system/model-readiness`` endpoint is allowed to observe transient states without taking a lock.

Also add ``embedding/readiness.py`` with ``model_snapshot_present()`` — a filesystem-only check for whether a complete fastembed snapshot exists in the cache directory. The function walks ``cache_dir/models--<sanitized>/snapshots/`` and accepts the first subdirectory that contains ``config.json``, ``tokenizer.json``, and either a flat ``model.onnx`` or a nested ``onnx/model.onnx`` (fastembed uses the nested form for ``BAAI/bge-m3`` and the multilingual reranker; the flat form for the smaller English models). A small ``_APPROX_SIZE_MB`` map populated from the documented model list in ``cli/init_cmd.py`` lets the upcoming banner render "Downloading bge-m3 (~2.3 GB)…" without an extra network call.

Co-Authored-By: Claude <[email protected]>
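The flag discipline described above can be sketched as follows. The class and attribute names match the commit; the ``_construct`` hook standing in for the real fastembed constructor call is a hypothetical seam for illustration:

```python
from typing import Any


class OnnxEmbedder:
    """Illustrative sketch of a lazy loader with observability flags."""

    def __init__(self, model_name: str) -> None:
        self._model_name = model_name
        self._model: Any | None = None
        self._loading: bool = False          # True while constructor is in flight
        self._load_error: str | None = None  # last failure message, if any

    def _get_model(self) -> Any:
        if self._model is None:
            self._loading = True
            try:
                # Stand-in for fastembed's TextEmbedding(...) call, which may
                # download a multi-GB snapshot when the cache is cold.
                self._model = self._construct()
                self._load_error = None
            except Exception as exc:
                # Record the failure for the readiness endpoint, then re-raise
                # so the caller still sees the original error.
                self._load_error = str(exc)
                raise
            finally:
                self._loading = False
        return self._model

    def _construct(self) -> Any:
        raise NotImplementedError  # real code constructs the fastembed model here
```

Because the endpoint only reads these attributes (single bool/str assignments, atomic under CPython's GIL), no lock is needed for the observer side.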
Adds a read-only endpoint the SPA can poll to populate a "Downloading
bge-m3 (~2.3 GB)…" / "Loading model…" banner instead of leaving the
user staring at a frozen Search button while a multi-GB fastembed
snapshot streams in.
Response covers both the embedder and the reranker:
```
GET /api/system/model-readiness
→ { embedder: {state, provider, model, cache_present, approx_size_mb, error},
reranker: {state, ...} // state="skipped" when rerank.enabled is False
}
```
State per component, derived from the ``_model`` / ``_loading`` /
``_load_error`` flags introduced in the previous commit plus a
filesystem probe of ``cache_dir/models--<sanitized>/snapshots/<sha>/``:
* ``ready`` — model loaded in memory.
* ``loading`` — cache present, constructor in flight.
* ``downloading`` — cache absent, constructor in flight.
* ``cold`` — nothing in flight (cache may or may not be present).
* ``error`` — last constructor attempt raised.
* ``skipped`` — provider routes through Ollama/Cohere/etc., or the
component is disabled.
Providers introspected through this endpoint are restricted to the
fastembed-backed paths (``"onnx"`` for the embedder, ``"fastembed"``
for the reranker). Ollama and Cohere have their own connection-based
readiness model and are reported as ``skipped`` — wiring them in
deserves a separate decision pass, not a quiet conflation here.
The endpoint never calls ``_get_model()`` itself, so polling it cannot
amplify load on a struggling installation. Cache-presence probes go
through ``model_snapshot_present`` which is filesystem-only.
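A filesystem-only probe of the kind described could look like this. The function name, the ``models--<sanitized>/snapshots/`` layout, and the required-file list all come from the commit text; the exact traversal is an illustrative sketch:

```python
from pathlib import Path


def model_snapshot_present(cache_dir: Path, model_id: str) -> bool:
    """Return True if a complete fastembed snapshot exists on disk.

    Walks cache_dir/models--<sanitized>/snapshots/ and accepts the first
    subdirectory containing config.json, tokenizer.json, and either a flat
    model.onnx or a nested onnx/model.onnx. Never touches the network.
    """
    sanitized = model_id.replace("/", "--")
    snapshots = cache_dir / f"models--{sanitized}" / "snapshots"
    if not snapshots.is_dir():
        return False
    for snap in snapshots.iterdir():
        if not snap.is_dir():
            continue
        has_config = (snap / "config.json").is_file()
        has_tokenizer = (snap / "tokenizer.json").is_file()
        has_onnx = (
            (snap / "model.onnx").is_file()
            or (snap / "onnx" / "model.onnx").is_file()
        )
        if has_config and has_tokenizer and has_onnx:
            return True
    return False
```

Because the probe is pure ``pathlib`` stat calls, polling the endpoint stays cheap even while a download saturates the disk.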
Schema lives in ``web/schemas/config.py`` next to the existing
``EmbeddingStatusResponse`` / ``EmbeddingResetResponse`` so the
embedding-related types stay colocated. UI wiring lands in the next
commit.
Co-Authored-By: Claude <[email protected]>
Surfaces the readiness endpoint added in the previous commit as a header banner so a cold-cache install no longer leaves users staring at a frozen Search button while ``BAAI/bge-m3`` (~2.3 GB) streams in. Closes the user-visible half of #696.

Banner copy is built from ``GET /api/system/model-readiness``:

* Both components downloading → "Downloading bge-m3 (~2300 MB) and jina-reranker (~1100 MB)…"
* One downloading → "Downloading bge-m3 (~2300 MB)…" (or the ``..._no_size`` variant for unknown models)
* Loading from cache, no download → "Loading model…"
* Hard error → "Model failed to load — check Settings."
* Both ready / skipped → banner hidden, polling stops.

Polling uses the same single-flight + setTimeout idiom as ``_indexingPollUntilIdle`` (4-second interval, capped at 200 ticks ≈ 13 min so a stuck server doesn't yield infinite background fetches). Three entry points kick the loop:

1. Boot — ``_modelReadinessHydrate()`` runs from the DOMContentLoaded handler. It fetches once and only starts continuous polling if at least one component is actively loading or has errored.
2. ``visibilitychange`` — re-hydrates when the tab regains focus so a load that finished while backgrounded doesn't leave the banner stuck up.
3. ``doSearch()`` pre-flight — kicks ``_modelReadinessPoll()`` on every search submission. The first tick may race the request and observe ``state="cold"``; ``cold`` is intentionally non-terminal here so the next tick catches the ``_loading=True`` flip on the server side.

Five new ``banner.model_*`` i18n keys land in both ``en.json`` and ``ko.json``, plus two fallback name keys for use when the response omits the model identifier. The CSS reuses the visual language of ``.dev-mode-banner`` (accent-tinted background, single row).

``index.html`` cache busters bumped: ``style.css?v=76→77`` (banner class added) and ``app.js?v=94→96`` (polling logic + ``doSearch`` pre-flight). The ``v=96`` jump leapfrogs an in-flight v=95 from a sibling PR; if that lands first, this rebase will need a bump.

Co-Authored-By: Claude <[email protected]>
Closes #696 (option B).
Summary
* Adds ``GET /api/system/model-readiness`` reporting per-component (embedder + reranker) load state derived from ``_loading``/``_load_error`` flags newly attached to ``OnnxEmbedder`` and ``FastEmbedReranker``, plus a filesystem probe of the fastembed cache.
* Adds a header banner — "Downloading bge-m3 (~2300 MB)…" / "Loading model…" / "Model failed to load — check Settings." — so the first search after a cold-cache boot no longer feels like a hung UI.
* A boot hydrate, a ``visibilitychange`` re-hydrate, and a ``doSearch()`` pre-flight cover the three entry points.

Why option B

``fastembed`` does not expose snapshot-download progress events, so a real percent bar (option C) would mean wrapping ``huggingface_hub.snapshot_download`` ourselves and replicating fastembed's resolution logic. Option B is the smallest change that converts "frozen Search button" into "I see what's happening" without that complexity. See the issue body for the option A/B/C tradeoffs.

Commits
Split for review; one PR for atomic ship:
1. ``feat(embedding): track loading state on lazy fastembed loaders`` — ``_loading``/``_load_error`` flags + ``embedding/readiness.py`` cache probe + size map.
2. ``feat(web): add /api/system/model-readiness endpoint`` — schema, route, endpoint tests. No UI.
3. ``feat(web): show model-loading banner with polling`` — banner DOM + CSS, polling logic, i18n keys, cache busters.
State machine
* Provider not in {``onnx``, ``fastembed``} OR component disabled → ``skipped``
* ``_load_error`` set → ``error``
* ``_model is not None`` → ``ready``
* ``_loading=True`` and cache absent → ``downloading``
* ``_loading=True`` and cache present → ``loading``
* Otherwise → ``cold``

``cold`` is intentionally non-terminal for ``doSearch()``-initiated polls — the first tick can race the request and observe ``cold`` before the backend's lazy loader flips ``_loading=True``. The boot hydrate does treat ``cold`` as terminal so a fully-warm install doesn't run a needless 4 s/tick background loop.

Out of scope (deliberate)
Ollama/Cohere readiness — those providers are reported as ``state="skipped"`` here.

Cache-buster note
``app.js?v=94→96`` and ``style.css?v=76→77``. The ``v=96`` jump leapfrogs an in-flight ``v=95`` from sibling PR #694 (namespace-tooltip). Whichever lands first, the second will need a rebase + bump.
Backend:

* ``uv run pytest packages/memtomem/tests/test_embedding_readiness.py packages/memtomem/tests/test_web_model_readiness.py`` — 20 tests cover the cache probe + state machine.
* ``uv run pytest -m "not ollama"`` — 3908 passed locally.
* ``uv run ruff check packages/memtomem/src && uv run ruff format --check packages/memtomem/src packages/memtomem/tests`` — clean.

Manual UI (cold-cache reproduction):
* Korean toggle: switch language and re-run; banner copy must come from ``ko.json`` strings.

🤖 Generated with Claude Code