server: validate n_batch == n_ubatch for embeddings (#6263) #18123
+15
−9
Fixes #6263
Problem
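The server accepts mismatched `--batch-size` and `--ubatch-size` values when `--embedding` is enabled, which leads to an incoherent configuration. Embeddings use non-causal attention, which requires all tokens of an input to be processed in a single ubatch (`n_batch == n_ubatch`). If `n_batch` is larger than `n_ubatch`, the server can accept a batch that would have to be split across several ubatches, which non-causal attention cannot handle. Because the defaults differ (`n_batch=2048`, `n_ubatch=512`), users frequently run into this issue.
Solution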
Add parameter validation in `main()`: when `--embedding` is enabled and `n_batch != n_ubatch`, log warnings and set both values to `min(n_batch, n_ubatch)`.
This auto-correction approach (suggested by @mirekphd) gives a better user experience than rejecting the configuration outright.
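A minimal C++ sketch of the check described above, assuming a simplified options struct. `embd_params_sketch` and `validate_embedding_batch` are hypothetical names used only for illustration, not the actual llama.cpp server code, and the warning text is likewise illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for the server's parsed options (not the real
// llama.cpp struct); defaults are the values cited in this PR.
struct embd_params_sketch {
    int  n_batch   = 2048;  // --batch-size
    int  n_ubatch  = 512;   // --ubatch-size
    bool embedding = false; // --embedding
};

// Hypothetical helper: if embeddings are enabled and the batch sizes differ,
// warn and set both to the smaller value. Returns true if anything changed.
bool validate_embedding_batch(embd_params_sketch & p) {
    // The sketch assumes both sizes were already checked to be positive.
    assert(p.n_batch > 0 && p.n_ubatch > 0);

    if (!p.embedding || p.n_batch == p.n_ubatch) {
        return false; // not in embedding mode, or already consistent
    }

    const int n = std::min(p.n_batch, p.n_ubatch);
    std::fprintf(stderr,
                 "warning: --embedding requires n_batch == n_ubatch (got %d and %d); "
                 "setting both to %d\n",
                 p.n_batch, p.n_ubatch, n);
    p.n_batch  = n;
    p.n_ubatch = n;
    return true;
}
```

With the default sizes and `embedding = true` (the `-b 2048 -ub 512 --embedding` case), the function prints a warning and sets both sizes to 512. With `-b 512 -ub 512` it returns without logging. Taking the minimum rather than the maximum means neither limit ends up larger than the user asked for.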
Testing
✅ Builds successfully
✅ Validation triggers: `-b 2048 -ub 512 --embedding` → logs warnings, sets both to 512
✅ No false positives: `-b 512 -ub 512 --embedding` → silent
✅ Tested on macOS M3 Pro with an embedding model