
feat(embedding): add native loading for BERT/XLMRoBERTa embedding models#330

Merged
jundot merged 2 commits into jundot:main from yes999zc:feat/native-embedding-clean
Mar 21, 2026

Conversation

@yes999zc
Contributor

What

Add native embedding support for BERT/XLMRoBERTa-family MLX models in omlx/models/embedding.py, with mlx-embeddings kept as the fallback for unsupported architectures.

Why

Some embedding models (for example BAAI/bge-m3 MLX variants) are not supported by mlx-embeddings, even though oMLX already has a native XLM-RoBERTa implementation that can produce normalized text embeddings.

Changes

  • add _load_native() to detect and load local BertModel / BertForMaskedLM / XLMRobertaModel embedding models natively
  • keep mlx-embeddings path as the fallback for other architectures
  • support native tokenization + forward pass in embed()
  • preserve existing compiled eager/fallback behavior for the mlx-embeddings path
  • add tests for native BERT/XLMRoBERTa embedding loading
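The dispatch described above can be sketched roughly as follows. This is an illustrative sketch only: the architecture set comes from the PR description, but `select_loader` and its return values are hypothetical names, not oMLX's actual API.

```python
# Hypothetical sketch of the architecture-detection dispatch in _load_native():
# check the model's config.json "architectures" field and choose the native
# path for supported classes, otherwise fall back to mlx-embeddings.
NATIVE_ARCHITECTURES = {"BertModel", "BertForMaskedLM", "XLMRobertaModel"}

def select_loader(config: dict) -> str:
    """Pick the loading path from a parsed config.json dict."""
    archs = config.get("architectures") or []
    if any(a in NATIVE_ARCHITECTURES for a in archs):
        return "native"        # load via omlx's xlm_roberta implementation
    return "mlx-embeddings"    # fallback for unsupported architectures
```

With this shape, adding another natively supported architecture is a one-line change to the set.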

Validation

Local smoke-tested with real models:

  • bge-small-en-v1.5
  • mxbai-embed-large-v1
  • mlx-community/bge-m3-mlx-fp16

All loaded successfully and returned embeddings end-to-end.

Notes

This PR is intentionally kept narrow to embedding only. Jina reranker support is excluded and can be proposed separately after dedicated verification.

FocusFlow Dev added 2 commits March 21, 2026 08:45
- Add _load_native() method for loading embedding models via omlx's
  native xlm_roberta.py implementation (BERT, XLMRoBERTa architectures)
- Fall back to mlx-embeddings for other embedding architectures
- Fix tokenizer handling for both PreTrainedTokenizer (callable) and
  tokenizers.Tokenizer (encode method) in embed()
- Fix config ModelArgs property setter issue for embedding mode

Supports: BERT, XLMRoBERTa embedding models without mlx-embeddings dependency
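The tokenizer fix mentioned in this commit can be illustrated with duck typing: a `transformers` `PreTrainedTokenizer` is callable and returns a dict containing `input_ids`, while a raw `tokenizers.Tokenizer` exposes `encode()` returning an `Encoding` with an `.ids` attribute. The helper name below is an assumption, not the actual code in `embed()`.

```python
# Illustrative sketch (hypothetical helper name) of handling both tokenizer
# flavors: callable PreTrainedTokenizer vs. tokenizers.Tokenizer with encode().
def tokenize_ids(tokenizer, text: str) -> list:
    if callable(tokenizer):
        # transformers PreTrainedTokenizer path: __call__ returns a dict
        return tokenizer(text)["input_ids"]
    # tokenizers.Tokenizer path: encode() returns an Encoding with .ids
    return tokenizer.encode(text).ids
```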
- Test _load_native() for BERT and XLMRoBERTa architectures
- Test fallback for unknown architectures
- Test that embed produces L2-normalized vectors
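The L2-normalization property tested above is straightforward to check: a normalized vector has Euclidean norm 1. A minimal self-contained sketch of both the normalization and the assertion a test might make (plain Python, not the actual test code):

```python
import math

def l2_normalize(vec):
    """Scale a vector so its Euclidean (L2) norm is 1."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def is_unit_norm(vec, tol=1e-6):
    """True if the vector's L2 norm is 1 within tolerance."""
    return abs(math.sqrt(sum(x * x for x in vec)) - 1.0) < tol
```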
@jundot force-pushed the main branch 7 times, most recently from f6faf2f to c2beead on March 21, 2026 05:58
Owner

@jundot left a comment

Reviewed the code. Clean scope, no issues found.

Native loading reuses the existing xlm_roberta.py implementation nicely, and the mlx-embeddings fallback keeps backward compatibility intact. Attention mask handling for padding is also correct.

LGTM, good to merge.

@jundot merged commit 1e41845 into jundot:main Mar 21, 2026