@yifant-code
Contributor

Fixes #6263

Problem

The server accepts mismatched --batch-size and --ubatch-size values when --embedding is enabled, which leaves it with an incoherent configuration.

Embeddings use non-causal attention, which requires all tokens to be processed in a single ubatch (n_batch == n_ubatch). The default values differ (n_batch=2048, n_ubatch=512), so users frequently hit this, even without setting either flag.

Solution

Add parameter validation in main():

  • Detect when --embedding is enabled and n_batch != n_ubatch
  • Log warnings explaining the requirement
  • Automatically set both to min(n_batch, n_ubatch)
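
The three steps above can be sketched roughly as follows. This is a minimal, standalone illustration and not the code from this PR: `batch_params_sketch` and `fix_embedding_batch` are hypothetical names standing in for the server's parsed parameters and the check added in main(), and the warning text is made up for the example.

```cpp
#include <algorithm>
#include <cstdio>

// Hypothetical stand-in for the server's parsed parameters; only the
// three fields discussed in this PR are modelled.
struct batch_params_sketch {
    int  n_batch   = 2048;  // logical batch size (-b), default from the PR text
    int  n_ubatch  = 512;   // physical micro-batch size (-ub), default from the PR text
    bool embedding = false; // true when --embedding is passed
};

// Hypothetical helper: if embeddings are enabled and the two sizes differ,
// warn and clamp both to the smaller value. Returns true if it adjusted them.
inline bool fix_embedding_batch(batch_params_sketch & p) {
    if (!p.embedding || p.n_batch == p.n_ubatch) {
        return false; // embeddings disabled, or sizes already match
    }

    const int n = std::min(p.n_batch, p.n_ubatch);

    // Illustrative messages only; the real log wording may differ.
    std::fprintf(stderr,
        "warning: embeddings require n_batch == n_ubatch (got %d and %d)\n",
        p.n_batch, p.n_ubatch);
    std::fprintf(stderr, "warning: setting both to %d\n", n);

    p.n_batch  = n;
    p.n_ubatch = n;
    return true;
}
```

Clamping to the minimum rather than the maximum means the adjusted values never exceed either size the user asked for.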

This uses the auto-correction approach suggested by @mirekphd, which gives better UX than strict rejection.

Testing

✅ Builds successfully
✅ Validation triggers: -b 2048 -ub 512 --embedding → logs warnings, sets both=512
✅ No false positives: -b 512 -ub 512 --embedding → silent
✅ Tested on macOS M3 Pro with embedding model

Fixes ggml-org#6263, where the server accepts mismatched batch/ubatch values with
embeddings, leading to suboptimal or incorrect behavior.

Problem: Embeddings and reranking use non-causal attention, which requires
all tokens to be processed within a single ubatch. When n_batch != n_ubatch,
the configuration is incoherent. The default values differ (n_batch=2048,
n_ubatch=512), so users encounter this frequently.

Solution:
- Add parameter validation in main() after common_params_parse()
- When embeddings are enabled and n_batch != n_ubatch:
  * Log warnings explaining the requirement
  * Automatically set both to min(n_batch, n_ubatch)
  * Ensure coherent configuration

This follows the auto-correction approach suggested by @mirekphd
and provides better UX than strict rejection.

Testing:
✅ Builds successfully
✅ Validation triggers: -b 2048 -ub 512 --embedding → logs warnings, adjusts both to 512
✅ No false positives: -b 512 -ub 512 --embedding → no warnings
✅ Verified on macOS M3 Pro with embedding model
Development

Successfully merging this pull request may close these issues.

server: exit failure if --embedding is set with an incoherent --ubatch-size
