
feat(provider): add mlx provider with local context-window detection #275

Merged

LeeCheneler merged 3 commits into main from feat/add-mlx-provider on Apr 10, 2026
Conversation

@LeeCheneler
Owner

Summary

Adds MLX (mlx_lm.server) as a first-class provider type so users can point
tomo at a local MLX server without masquerading it as Ollama. Also teaches
the provider to read the context window from the HuggingFace cache on disk,
since mlx_lm doesn't expose that info over HTTP.

GitHub Issue

N/A

What Changed

New mlx provider type. Adds "mlx" to providerTypeSchema, wires up a
default URL (http://127.0.0.1:8080) and MLX_API_KEY env var in
provider/client.ts, and exposes it in the settings dropdown. Because MLX
is OpenAI-compatible, model listing flows through the existing /v1/models
path — no new client implementation required. Previously users who selected
"ollama" and pointed the URL at MLX hit Ollama's native /api/show and
/api/tags endpoints and got confusing results.
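As a rough sketch of the wiring described above, assuming providerTypeSchema is a zod enum; the table names (defaultBaseUrls, apiKeyEnvVars) and the non-mlx entries are illustrative, not the repo's actual identifiers:

```ts
import { z } from "zod";

// "mlx" joins the existing provider types; the other entry shown is illustrative.
export const providerTypeSchema = z.enum(["ollama", "mlx" /* ...existing types... */]);
export type ProviderType = z.infer<typeof providerTypeSchema>;

// Hypothetical lookup tables mirroring the wiring in provider/client.ts:
// mlx_lm.server listens on 127.0.0.1:8080 by default, and the API key
// (if the server was started with one) is read from MLX_API_KEY.
export const defaultBaseUrls: Record<ProviderType, string> = {
  ollama: "http://127.0.0.1:11434",
  mlx: "http://127.0.0.1:8080",
};

export const apiKeyEnvVars: Partial<Record<ProviderType, string>> = {
  mlx: "MLX_API_KEY",
};
```

Because MLX speaks the OpenAI wire format, nothing beyond this mapping is needed for model listing.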

Context window from HuggingFace config.json. mlx_lm.server's /v1/models
returns only id/object/created — no context length — and there's no
other endpoint that exposes it. Since mlx_lm downloads HF models to the
local cache before serving, we read config.json directly from
~/.cache/huggingface/hub/models--<org>--<repo>/snapshots/*/. Field lookup
walks max_position_embeddings → max_sequence_length → seq_length →
n_positions → sliding_window at the top level first, then falls back to
text_config for multimodal models (e.g. gemma-3). HF_HUB_CACHE and
HF_HOME env vars are honoured, and absolute local model paths are
supported. Any failure falls through to the existing 8192 default.
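A minimal sketch of that lookup, not the actual implementation; the exact signature of fetchMlxContextWindow and the helper names are illustrative, and here any failure surfaces as undefined so the caller can apply the 8192 default:

```ts
import fs from "node:fs";
import os from "node:os";
import path from "node:path";

// Field names checked in order, at the top level and then under text_config.
const CONTEXT_FIELDS = [
  "max_position_embeddings",
  "max_sequence_length",
  "seq_length",
  "n_positions",
  "sliding_window",
] as const;

// Resolve the HuggingFace hub cache, honouring HF_HUB_CACHE and HF_HOME.
function hfHubCacheDir(): string {
  if (process.env.HF_HUB_CACHE) return process.env.HF_HUB_CACHE;
  if (process.env.HF_HOME) return path.join(process.env.HF_HOME, "hub");
  return path.join(os.homedir(), ".cache", "huggingface", "hub");
}

function readContextWindow(config: Record<string, unknown>): number | undefined {
  // Top level first, then text_config for multimodal models (e.g. gemma-3).
  const sources = [config, config.text_config as Record<string, unknown> | undefined];
  for (const source of sources) {
    if (!source) continue;
    for (const field of CONTEXT_FIELDS) {
      const value = source[field];
      if (typeof value === "number" && value > 0) return value;
    }
  }
  return undefined;
}

// `model` is either an absolute local path or a hub id like "org/repo".
export function fetchMlxContextWindow(model: string): number | undefined {
  try {
    let configPath: string;
    if (path.isAbsolute(model)) {
      configPath = path.join(model, "config.json");
    } else {
      const [org, repo] = model.split("/");
      const snapshots = path.join(hfHubCacheDir(), `models--${org}--${repo}`, "snapshots");
      const [firstSnapshot] = fs.readdirSync(snapshots);
      if (!firstSnapshot) return undefined;
      configPath = path.join(snapshots, firstSnapshot, "config.json");
    }
    const config = JSON.parse(fs.readFileSync(configPath, "utf8"));
    return readContextWindow(config);
  } catch {
    return undefined;
  }
}
```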

Test-stability fixups. Spinner assertions in chat/model-selector/providers
tests hardcoded frame 0 (⠋), which raced against the 80ms spinner tick —
relaxed to match any frame in the cycle. Also bumped flushInkFrames from
25ms to 50ms to give ink input parsing more headroom on slower hosts.
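The relaxed assertion looks roughly like the sketch below, assuming the standard 80ms dots spinner cycle; the helper name is hypothetical and the repo's actual assertion may differ:

```ts
// All frames of the dots spinner; which one is on screen depends on where the
// 80ms tick landed relative to the test's frame flush.
const SPINNER_FRAMES = ["⠋", "⠙", "⠹", "⠸", "⠼", "⠴", "⠦", "⠧", "⠇", "⠏"];

// Hypothetical helper: true if the rendered frame contains any spinner glyph.
export function containsSpinnerFrame(frame: string | undefined): boolean {
  return frame !== undefined && SPINNER_FRAMES.some((glyph) => frame.includes(glyph));
}

// Before (flaky): expect(lastFrame()).toContain("⠋");
// After (stable): expect(containsSpinnerFrame(lastFrame())).toBe(true);
```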

Notes for Reviewers

  • fetchMlxContextWindow is synchronous (fs reads are cheap) but called
    from the async fetchContextWindow wrapper — the outer try/catch still
    catches any unexpected throw (sketched after this list).
  • The HF cache layout uses models--<org>--<repo>/snapshots/<hash>/ — we
    take the first snapshot directory found. Multiple snapshots from stale
    revisions shouldn't matter in practice since max_position_embeddings
    is architectural, not per-revision.
  • Coverage is at 100% across the new code and the rest of the repo.
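For reference, the wrapper shape being described is roughly the following; this reuses the fetchMlxContextWindow sketch above, and the module path and non-mlx branches are placeholders:

```ts
// fetchMlxContextWindow refers to the sketch above; the module path is illustrative.
import { fetchMlxContextWindow } from "./mlx-context-window";

const DEFAULT_CONTEXT_WINDOW = 8192;

// The MLX lookup is synchronous (local fs reads), but the async signature keeps
// the call site uniform with providers that need an HTTP round trip. The outer
// try/catch maps any unexpected throw to the existing default.
export async function fetchContextWindow(providerType: string, model: string): Promise<number> {
  try {
    if (providerType === "mlx") {
      return fetchMlxContextWindow(model) ?? DEFAULT_CONTEXT_WINDOW;
    }
    // ...HTTP-based lookups for the other provider types would go here...
    return DEFAULT_CONTEXT_WINDOW;
  } catch {
    return DEFAULT_CONTEXT_WINDOW;
  }
}
```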

Commits

MLX server (mlx_lm.server) exposes an OpenAI-compatible API on
http://127.0.0.1:8080 by default. Treating it as its own provider type
ensures model listing and context-window detection go through the
/v1/models path rather than Ollama's native /api/show endpoint.

mlx_lm.server's /v1/models doesn't expose any context size, so look
up max_position_embeddings (falling back through max_sequence_length,
seq_length, n_positions, sliding_window) directly from the HuggingFace
cache on disk. Checks top-level first, then text_config for multimodal
models like gemma-3. Honours HF_HUB_CACHE and HF_HOME, and supports
absolute local model paths.

Spinner ticks on an 80ms interval, so hardcoding frame 0 (⠋) races
with the flush delay. Match any frame in the cycle instead. Also bump
flushInkFrames from 25ms to 50ms for more timing headroom on slower
hosts.
LeeCheneler merged commit 2fe1ef0 into main on Apr 10, 2026
4 checks passed
LeeCheneler deleted the feat/add-mlx-provider branch on April 10, 2026 at 19:10
