fix(embedding): support custom processor input preparation#369

Merged
jundot merged 1 commit into jundot:main from MasakiMu319:fix/qwen3-vl-embedding
Mar 29, 2026

Conversation

Contributor

@MasakiMu319 MasakiMu319 commented Mar 24, 2026

Summary

  • route embedding requests through custom processor hooks when available
  • keep the existing generic tokenizer path for standard text embedding models
  • add regression coverage for both compiled and eager embedding execution

Why

Qwen3-VL embedding models can be loaded through mlx-embeddings, but oMLX always used the generic `processor(texts, ...)` path for embedding requests. For custom processors such as `qwen3_vl`, that positional call is interpreted as image input, which breaks `/v1/embeddings` even when the model is explicitly treated as an embedding model.

Testing

  • `uv run pytest tests/test_embedding.py -k "custom_processor or compiled_path_fallback_on_failure or is_compiled_false_uses_eager_path"`
  • started oMLX with Qwen3-VL-Embedding-2B-mxfp8 forced to embedding mode and verified `/v1/embeddings` returns vectors for single and batch text inputs

What:
Detect processors that expose custom embedding input hooks and route embedding requests through `prepare_embedding_inputs`/`prepare_model_inputs` instead of the generic tokenizer path. Keep the existing path for standard text processors, and add regression coverage for both compiled and eager execution.
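For reviewers, a minimal sketch of the routing described above (not the actual oMLX code): the hook names `prepare_embedding_inputs` and `prepare_model_inputs` come from this PR, while the processor classes, their internals, and the `build_embedding_inputs` helper are hypothetical stand-ins for illustration.

```python
class GenericTextProcessor:
    """Stand-in for a standard text processor: a positional call tokenizes text."""

    def __call__(self, texts):
        # Toy tokenization: one "token id" per text, its character length.
        return {"input_ids": [[len(t)] for t in texts]}


class Qwen3VLProcessor:
    """Stand-in for a custom processor where a positional call would be
    interpreted as image input, so it exposes explicit embedding hooks."""

    def prepare_embedding_inputs(self, texts):
        return {"texts": texts}

    def prepare_model_inputs(self, prepared):
        return {"input_ids": [[len(t)] for t in prepared["texts"]]}


def build_embedding_inputs(processor, texts):
    # Route through the custom hooks when the processor provides them;
    # otherwise fall back to the generic positional-call path.
    if hasattr(processor, "prepare_embedding_inputs"):
        prepared = processor.prepare_embedding_inputs(texts)
        return processor.prepare_model_inputs(prepared)
    return processor(texts)


print(build_embedding_inputs(Qwen3VLProcessor(), ["hi"]))      # {'input_ids': [[2]]}
print(build_embedding_inputs(GenericTextProcessor(), ["hey"])) # {'input_ids': [[3]]}
```

The key point is that the custom path never invokes `processor(texts, ...)` positionally, so a multimodal processor cannot misread text as image input, while standard text models keep their original code path untouched.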
@jundot
Owner

jundot commented Mar 29, 2026

Thanks for the fix, @MasakiMu319. The approach looks clean and correct.

Routing through `prepare_embedding_inputs` / `prepare_model_inputs` when the processor exposes them makes sense, and the fallback to the generic `prepare_inputs` path keeps things safe for standard text models.

I verified the existing text embedding path is not affected by this change. The tests cover both compiled and eager execution with custom processors, which is what I want to see.

Merging this.

@jundot jundot merged commit 6d132e4 into jundot:main Mar 29, 2026
AlexTzk pushed a commit to AlexTzk/omlx that referenced this pull request Mar 29, 2026