fix(embedding): support custom processor input preparation#369

Merged
jundot merged 1 commit into jundot:main from MasakiMu319:fix/qwen3-vl-embedding
Mar 29, 2026

Conversation

Contributor

@MasakiMu319 MasakiMu319 commented Mar 24, 2026

Summary

  • route embedding requests through custom processor hooks when available
  • keep the existing generic tokenizer path for standard text embedding models
  • add regression coverage for both compiled and eager embedding execution

Why

Qwen3-VL embedding models can be loaded through mlx-embeddings, but oMLX always used the generic `processor(texts, ...)` path for embedding requests. For custom processors such as `qwen3_vl`, that positional call is interpreted as image input, which breaks `/v1/embeddings` even when the model is explicitly treated as an embedding model.

Testing

  • `uv run pytest tests/test_embedding.py -k "custom_processor or compiled_path_fallback_on_failure or is_compiled_false_uses_eager_path"`
  • started oMLX with Qwen3-VL-Embedding-2B-mxfp8 forced to embedding mode and verified `/v1/embeddings` returns vectors for single and batch text inputs

What:
Detect processors that expose custom embedding input hooks and route embedding requests through `prepare_embedding_inputs`/`prepare_model_inputs` instead of the generic tokenizer path. Keep the existing path for standard text processors, and add regression coverage for both compiled and eager execution.
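For reviewers, a minimal sketch of the routing described above (not the actual oMLX code): the hook names `prepare_embedding_inputs` and `prepare_model_inputs` come from this PR, while the processor classes, their internals, and the `build_embedding_inputs` helper are hypothetical stand-ins for illustration.

```python
class GenericTextProcessor:
    """Stand-in for a standard text processor: a positional call tokenizes text."""

    def __call__(self, texts):
        # Toy tokenization: one "token id" per text, its character length.
        return {"input_ids": [[len(t)] for t in texts]}


class Qwen3VLProcessor:
    """Stand-in for a custom processor where a positional call would be
    interpreted as image input, so it exposes explicit embedding hooks."""

    def prepare_embedding_inputs(self, texts):
        return {"texts": texts}

    def prepare_model_inputs(self, prepared):
        return {"input_ids": [[len(t)] for t in prepared["texts"]]}


def build_embedding_inputs(processor, texts):
    # Route through the custom hooks when the processor provides them;
    # otherwise fall back to the generic positional-call path.
    if hasattr(processor, "prepare_embedding_inputs"):
        prepared = processor.prepare_embedding_inputs(texts)
        return processor.prepare_model_inputs(prepared)
    return processor(texts)


print(build_embedding_inputs(Qwen3VLProcessor(), ["hi"]))      # {'input_ids': [[2]]}
print(build_embedding_inputs(GenericTextProcessor(), ["hey"])) # {'input_ids': [[3]]}
```

The key point is that the custom path never invokes `processor(texts, ...)` positionally, so a multimodal processor cannot misread text as image input, while standard text models keep their original code path untouched.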
@jundot
Owner

jundot commented Mar 29, 2026

Thanks for the fix, @MasakiMu319. The approach looks clean and correct.

Routing through `prepare_embedding_inputs` / `prepare_model_inputs` when the processor exposes them makes sense, and the fallback to the generic `prepare_inputs` path keeps things safe for standard text models.

I verified the existing text embedding path is not affected by this change. The tests cover both compiled and eager execution with custom processors, which is what I want to see.

Merging this.

@jundot jundot merged commit 6d132e4 into jundot:main Mar 29, 2026
AlexTzk pushed a commit to AlexTzk/omlx that referenced this pull request Mar 29, 2026