
feat: add XTC sampler support#337

Merged
jundot merged 5 commits into jundot:main from blightbow:feat/xtc-sampler
Mar 29, 2026

@blightbow
Contributor

Thread mlx-lm's XTC (eXclude Top Choices) sampling parameters through the full request pipeline. XTC was the only mlx-lm sampler missing from the omlx API surface.

  • Add xtc_probability and xtc_threshold fields to SamplingParams dataclass (default 0.0 = disabled)
  • Add optional xtc_probability and xtc_threshold to both ChatCompletionRequest and CompletionRequest API models
  • Extend get_sampling_params() to resolve XTC values with the same request > default priority as other sampling params
  • Thread XTC params through chat_kwargs dicts and direct engine calls across all API endpoints (chat, completion, anthropic messages, responses)
  • Extract XTC params from kwargs in BatchedEngine and VLMBatchedEngine SamplingParams construction
  • Pass xtc_probability, xtc_threshold, and xtc_special_tokens to both make_sampler() call sites in the scheduler
  • Add _get_xtc_special_tokens() helper to Scheduler, delegating to _get_stop_tokens() for EOS coverage and caching the result at init time
  • Add 10 new tests covering defaults, passthrough, API model acceptance, and special token derivation
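
The "request > default" resolution described in the list above can be sketched roughly as follows. This is a minimal illustration, not the omlx code: `resolve` and `build_params` are hypothetical helper names, and only the XTC-related fields of `SamplingParams` are shown. The field defaults follow the values listed in this description; a later commit in this thread changes the `xtc_threshold` default to 0.1.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SamplingParams:
    # Trimmed-down stand-in: the real dataclass carries more fields.
    temperature: float = 1.0
    xtc_probability: float = 0.0  # 0.0 keeps XTC disabled
    xtc_threshold: float = 0.0    # value from this description, changed later in the thread


def resolve(request_value: Optional[float], default_value: float) -> float:
    """Hypothetical helper: a value set on the request wins over the default.

    `None` means "not provided", so an explicit 0.0 from the client is kept.
    """
    return default_value if request_value is None else request_value


def build_params(
    req_xtc_probability: Optional[float],
    req_xtc_threshold: Optional[float],
    defaults: SamplingParams,
) -> SamplingParams:
    """Hypothetical helper: merge optional request fields over the defaults."""
    return SamplingParams(
        temperature=defaults.temperature,
        xtc_probability=resolve(req_xtc_probability, defaults.xtc_probability),
        xtc_threshold=resolve(req_xtc_threshold, defaults.xtc_threshold),
    )
```

Comparing against `None` rather than relying on truthiness matters here, because 0.0 is a meaningful value for both parameters.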

@blightbow
Contributor Author

blightbow commented Mar 21, 2026

I aligned to the upstream mlx-lm defaults on this PR. This means both parameters must be specified in order for the feature to work. It should be noted that the original author's implementation (oobabooga/text-generation-webui#6335) uses a threshold of 0.1:

https://github.com/p-e-w/text-generation-webui/blob/29d38a74ba8d47206f2371b44a746111fac80f4c/modules/presets.py#L47-L48

@jundot I personally think the principle of least astonishment favors the 0.1 default (don't silently fail when the user wants the feature enabled but doesn't set a threshold), but I'll defer to your preference on this one. Feel free to tweak that value yourself if you accept the PR.

@blightbow
Contributor Author

blightbow commented Mar 21, 2026

On second thought, it doesn't make sense to model our defaults after upstream on this. An xtc_threshold of 0.0 when xtc_probability > 0 doesn't actually disable it! I had Claude Code verify my analysis:

❯ Wait, let's double check that. xtc_threshold of 0.0 when xtc_probability > 0.0 is not treated as disabled?

(snip)

⏺ Good catch to verify. With xtc_threshold=0.0:

mask = probs > mx.where(probs > 0.0, probs, mx.inf).min()

Since all probabilities are > 0.0 (after softmax), mx.where(probs > 0.0, probs, mx.inf) returns all probs unchanged, and .min() gives the minimum probability. The mask then selects all tokens with probability greater than the minimum — so all tokens except the least probable one get masked to -inf.

That's not "disabled" — it's maximally aggressive. It would exclude every token except the lowest-probability one (plus special tokens). With xtc_probability=0.5, that would happen on 50% of token samples.

So upstream's 0.0/0.0 is safe only because both default to 0 and the if xtc_probability > 0.0 gate prevents apply_xtc from ever being called. But if a user sets xtc_probability without setting xtc_threshold, they'd get destructive behavior silently.

As such, I would consider upstream's default to be a bug. I'll change our own default and log that bug separately.
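
The effect of the threshold can be checked on a toy distribution. The sketch below re-implements the quoted mask expression in plain Python (no MLX) purely for illustration; `softmax` and `xtc_excluded` are hypothetical names, and the random gate on xtc_probability is left out so that only the masking step is shown.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def xtc_excluded(logits, xtc_threshold, special_tokens=()):
    """Return the token indices that would be masked to -inf when XTC fires.

    Mirrors the quoted expression:
        mask = probs > where(probs > threshold, probs, inf).min()
    Special tokens are never masked.
    """
    probs = softmax(logits)
    # Smallest probability that still clears the threshold (inf if none does).
    cutoff = min((p for p in probs if p > xtc_threshold), default=math.inf)
    return [
        i for i, p in enumerate(probs)
        if p > cutoff and i not in special_tokens
    ]


logits = [2.0, 1.0, 0.0, -1.0]  # probabilities of roughly 0.64, 0.24, 0.09, 0.03

print(xtc_excluded(logits, 0.0))  # threshold 0.0: everything but the least likely token
print(xtc_excluded(logits, 0.1))  # threshold 0.1: only the top choice
```

With a threshold of 0.0 the cutoff becomes the smallest probability in the distribution, so indices 0, 1 and 2 are excluded and only the least likely token survives. With 0.1, only index 0 is excluded, which matches the intended "exclude top choices" behavior.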

Thread mlx-lm's XTC (eXclude Top Choices) sampling parameters
through the full request pipeline. XTC was the only mlx-lm
sampler missing from the omlx API surface.

- Add xtc_probability and xtc_threshold fields to SamplingParams
  dataclass (default 0.0 and 0.1 respectively)
- Default xtc_threshold to 0.1 instead of upstream's 0.0 to
  prevent destructive sampling when only probability is set
  (upstream threshold=0.0 excludes all tokens except the least
  probable one)
- Add optional xtc_probability and xtc_threshold to both
  ChatCompletionRequest and CompletionRequest API models
- Extend get_sampling_params() to resolve XTC values with the
  same request > default priority as other sampling params
- Thread XTC params through chat_kwargs dicts and direct engine
  calls across all API endpoints (chat, completion, anthropic
  messages, responses)
- Extract XTC params from kwargs in BatchedEngine and
  VLMBatchedEngine SamplingParams construction
- Pass xtc_probability, xtc_threshold, and xtc_special_tokens
  to both make_sampler() call sites in the scheduler
- Add _get_xtc_special_tokens() helper to Scheduler, delegating
  to _get_stop_tokens() for EOS coverage and caching the result
  at init time
- Add 10 new tests covering defaults, passthrough, API model
  acceptance, and special token derivation

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: Blightbow <[email protected]>
@jundot jundot force-pushed the main branch 2 times, most recently from 475d3bf to a47ef55 Compare March 22, 2026 15:37
@jundot jundot force-pushed the main branch 7 times, most recently from bef2aeb to 86720d8 Compare March 23, 2026 19:49
@jundot
Owner

jundot commented Mar 29, 2026

Reviewed the full diff. Looking good.

The safe default for xtc_threshold (0.1 instead of upstream's 0.0) is a nice touch. Upstream's 0.0 would nuke sampling if someone only sets probability without thinking about threshold.

_get_xtc_special_tokens() reusing _get_stop_tokens() keeps things clean. All 5 get_sampling_params() call sites are updated, tests cover the important paths.
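
For readers who have not opened the diff, the helper's shape is roughly as sketched below. This is an illustration under stated assumptions, not the merged code: `SchedulerSketch` and `FakeTokenizer` are hypothetical stand-ins, and protecting the newline token in addition to the stop tokens is an assumption of this sketch rather than something stated in this thread.

```python
class FakeTokenizer:
    """Stand-in tokenizer so the sketch runs without loading a model."""

    eos_token_id = 2

    def encode(self, text):
        # Toy vocabulary: only the newline maps to a token id.
        return [13] if text == "\n" else []


class SchedulerSketch:
    """Hypothetical, trimmed-down scheduler showing only the XTC helper."""

    def __init__(self, tokenizer, extra_stop_token_ids=()):
        self.tokenizer = tokenizer
        self.extra_stop_token_ids = set(extra_stop_token_ids)
        # Computed once at init so per-request sampler setup does no tokenizer work.
        self._xtc_special_tokens = self._get_xtc_special_tokens()

    def _get_stop_tokens(self):
        """EOS plus any additional stop ids configured for the model."""
        return {self.tokenizer.eos_token_id} | self.extra_stop_token_ids

    def _get_xtc_special_tokens(self):
        """Tokens XTC must never exclude, reusing the stop-token set."""
        newline_ids = self.tokenizer.encode("\n")  # assumption of this sketch
        return sorted(self._get_stop_tokens() | set(newline_ids))


scheduler = SchedulerSketch(FakeTokenizer(), extra_stop_token_ids=[7])
print(scheduler._xtc_special_tokens)
```

Reusing the stop-token set means that any model-specific EOS handling is picked up without a second code path.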

The growing tuple return from get_sampling_params() (now 10 elements) is getting unwieldy but that's pre-existing tech debt, not something to hold this up for. I'll clean that up separately.

No regression concerns. XTC defaults to disabled (probability=0.0) so existing behavior is untouched.

@jundot jundot merged commit f18c6a4 into jundot:main Mar 29, 2026
@jundot
Owner

jundot commented Mar 29, 2026

@blightbow v0.3.0rc1 is out with your XTC sampler included. https://github.com/jundot/omlx/releases/tag/v0.3.0rc1 — if you get a chance, please give it a test and let me know if anything looks off. thanks!
