feat(embedding): add native loading for BERT/XLMRoBERTa embedding models #330
Merged
jundot merged 2 commits into jundot:main on Mar 21, 2026
Conversation
added 2 commits on March 21, 2026 08:45
- Add _load_native() method for loading embedding models via omlx's native xlm_roberta.py implementation (BERT, XLMRoBERTa architectures)
- Fall back to mlx-embeddings for other embedding architectures
- Fix tokenizer handling for both PreTrainedTokenizer (callable) and tokenizers.Tokenizer (encode method) in embed()
- Fix config ModelArgs property setter issue for embedding mode

Supports BERT and XLMRoBERTa embedding models without the mlx-embeddings dependency.
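The dual tokenizer handling in this commit can be sketched roughly as follows. Note that `encode_ids` is a hypothetical helper for illustration, not the actual `embed()` internals:

```python
def encode_ids(tokenizer, text):
    # transformers' PreTrainedTokenizer is callable and returns a mapping
    # with an "input_ids" entry; tokenizers.Tokenizer is not callable and
    # instead exposes encode(), whose result carries the ids in .ids.
    # (encode_ids is a hypothetical helper, not oMLX's actual code.)
    if callable(tokenizer):
        return tokenizer(text)["input_ids"]
    return tokenizer.encode(text).ids
```

Duck-typing on `callable(...)` keeps one code path in `embed()` regardless of which tokenizer flavor the model repo ships.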
- Test _load_native() for BERT and XLMRoBERTa architectures
- Test fallback for unknown architectures
- Test that embed() produces L2-normalized vectors
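The L2-normalization property these tests assert can be illustrated with a tiny pure-Python sketch (oMLX itself operates on MLX arrays; this is just the math):

```python
import math

def l2_normalize(vec, eps=1e-12):
    # Scale the vector to unit Euclidean length, guarding against a
    # zero vector with a small epsilon.
    # (Illustrative sketch, not oMLX's actual implementation.)
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]
```

A test along the lines above then just checks that the norm of any returned embedding is 1 to within floating-point tolerance.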
f6faf2f to c2beead
jundot (Owner) approved these changes on Mar 21, 2026 and left a comment:
Reviewed the code. Clean scope, no issues found.
Native loading reuses the existing xlm_roberta.py implementation nicely, and the mlx-embeddings fallback keeps backward compatibility intact. Attention mask handling for padding is also correct.
LGTM, good to merge.
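The attention-mask handling the reviewer calls out can be sketched like this; a pure-Python illustration under assumed shapes, not oMLX's actual MLX-array code:

```python
def masked_mean_pool(hidden, mask):
    # hidden: list of per-token vectors; mask: 1 for real tokens, 0 for
    # padding. Padded positions are excluded from both the sum and the
    # divisor, so pad tokens cannot skew the sentence embedding.
    # (Illustrative sketch; the real code operates on batched MLX arrays.)
    dim = len(hidden[0])
    total = [0.0] * dim
    count = 0
    for vec, m in zip(hidden, mask):
        if m:
            count += 1
            for i, x in enumerate(vec):
                total[i] += x
    return [t / max(count, 1) for t in total]
```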
What
Add native embedding support for BERT/XLMRoBERTa-family MLX models in omlx/models/embedding.py, with mlx-embeddings kept as the fallback for unsupported architectures.

Why
Some embedding models (for example BAAI/bge-m3 MLX variants) are not supported by mlx-embeddings, even though oMLX already has a native XLM-RoBERTa implementation that can produce normalized text embeddings.

Changes
- Add _load_native() to detect and load local BertModel / BertForMaskedLM / XLMRobertaModel embedding models natively
- Keep the mlx-embeddings path as the fallback for other architectures
- Fix tokenizer handling in embed() so it works on both the native and mlx-embeddings paths

Validation

Local smoke-tested with real models:

- bge-small-en-v1.5
- mxbai-embed-large-v1
- mlx-community/bge-m3-mlx-fp16

All loaded successfully and returned embeddings end-to-end.
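The native-vs-fallback dispatch described under Changes can be sketched as follows. `pick_loader` is a hypothetical stand-in for the check inside `_load_native()`, not the actual code:

```python
# Architectures served by oMLX's native xlm_roberta.py implementation.
NATIVE_ARCHITECTURES = {"BertModel", "BertForMaskedLM", "XLMRobertaModel"}

def pick_loader(config):
    # Route on the "architectures" field of the model's config.json:
    # supported BERT/XLM-RoBERTa variants go through the native loader,
    # everything else falls back to mlx-embeddings.
    # (pick_loader is an illustrative stand-in, not oMLX's actual code.)
    archs = config.get("architectures") or []
    if any(a in NATIVE_ARCHITECTURES for a in archs):
        return "native"
    return "mlx-embeddings"
```

Keying the decision off config.json's architectures list means unknown architectures degrade gracefully to the existing mlx-embeddings path rather than failing to load.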
Notes
This PR is intentionally kept narrow to embedding only. Jina reranker support is excluded and can be proposed separately after dedicated verification.