I aligned with the upstream mlx-lm defaults in this PR, which means both parameters must be specified for the feature to work. Note that the original author's implementation (oobabooga/text-generation-webui#6335) uses a threshold of 0.1.

@jundot I personally think the principle of least astonishment favors the 0.1 default (don't silently fail when the user wants the feature enabled but doesn't set a threshold), but I'll defer to your preference on this one. Feel free to tweak that value yourself if you accept the PR.
On second thought, it doesn't make sense to model this after upstream. An xtc_threshold of 0.0 when xtc_probability > 0 doesn't actually disable the sampler! I had Claude Code verify my analysis:

As such, I consider upstream's default a bug. I'll change our own default and log that bug separately.
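To make the failure mode concrete, here's a plain-Python sketch of XTC's semantics. This is not mlx-lm's implementation; the function name and shape are illustrative, but it follows the rule XTC is built on: when triggered, exclude every token at or above the threshold except the least probable one among them.

```python
import random

def xtc_filter(probs, threshold, probability, rng=random.random):
    """Illustrative XTC (eXclude Top Choices) filter over a token
    probability list. With chance `probability`, remove every token
    whose probability meets `threshold` EXCEPT the least probable
    qualifying one, steering sampling away from the top choices."""
    if probability <= 0.0 or rng() >= probability:
        return probs  # sampler not triggered this step

    # Indices of tokens at or above the threshold, most -> least probable.
    above = sorted(
        (i for i, p in enumerate(probs) if p >= threshold),
        key=lambda i: probs[i],
        reverse=True,
    )
    if len(above) < 2:
        return probs  # nothing to exclude

    filtered = list(probs)
    for i in above[:-1]:  # keep only the least probable qualifying token
        filtered[i] = 0.0
    total = sum(filtered)
    return [p / total for p in filtered]

# With threshold=0.0 every token qualifies, so only the single least
# probable token survives -- the destructive case described above.
probs = [0.6, 0.3, 0.1]
print(xtc_filter(probs, threshold=0.0, probability=1.0, rng=lambda: 0.0))
# -> [0.0, 0.0, 1.0]
```

With any realistic threshold (e.g. 0.1) most of the vocabulary sits below it and survives; with 0.0, nothing does.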
Thread mlx-lm's XTC (eXclude Top Choices) sampling parameters through the full request pipeline. XTC was the only mlx-lm sampler missing from the omlx API surface.

- Add xtc_probability and xtc_threshold fields to SamplingParams dataclass (defaults 0.0 and 0.1 respectively)
- Default xtc_threshold to 0.1 instead of upstream's 0.0 to prevent destructive sampling when only probability is set (upstream threshold=0.0 excludes all tokens except the least probable one)
- Add optional xtc_probability and xtc_threshold to both ChatCompletionRequest and CompletionRequest API models
- Extend get_sampling_params() to resolve XTC values with the same request > default priority as other sampling params
- Thread XTC params through chat_kwargs dicts and direct engine calls across all API endpoints (chat, completion, anthropic messages, responses)
- Extract XTC params from kwargs in BatchedEngine and VLMBatchedEngine SamplingParams construction
- Pass xtc_probability, xtc_threshold, and xtc_special_tokens to both make_sampler() call sites in the scheduler
- Add _get_xtc_special_tokens() helper to Scheduler, delegating to _get_stop_tokens() for EOS coverage and caching the result at init time
- Add 10 new tests covering defaults, passthrough, API model acceptance, and special token derivation

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: Blightbow <[email protected]>
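As a hedged illustration of the "request > default" resolution the commit message describes (names are assumptions, not omlx's actual signatures, and only the XTC fields are shown):

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class SamplingParams:
    # Only the XTC fields are sketched; the real dataclass has more.
    xtc_probability: float = 0.0  # disabled unless explicitly enabled
    xtc_threshold: float = 0.1    # safe default instead of upstream's 0.0

def resolve_xtc(
    request_probability: Optional[float],
    request_threshold: Optional[float],
    defaults: SamplingParams,
) -> Tuple[float, float]:
    """Per-request values win; unset (None) fields fall back to defaults."""
    prob = defaults.xtc_probability if request_probability is None else request_probability
    thresh = defaults.xtc_threshold if request_threshold is None else request_threshold
    return prob, thresh

# A request that enables XTC but omits the threshold still gets 0.1:
print(resolve_xtc(0.5, None, SamplingParams()))  # -> (0.5, 0.1)
```

Checking `is None` rather than truthiness matters here: an explicit request value of 0.0 must be honored, not silently replaced by the default.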
force-pushed from 3a99fbd to 5e71737
force-pushed from 475d3bf to a47ef55
force-pushed from bef2aeb to 86720d8
Reviewed the full diff. Looking good. The safe default for xtc_threshold (0.1 instead of upstream's 0.0) is a nice touch; upstream's 0.0 would nuke sampling if someone set only the probability without thinking about the threshold. _get_xtc_special_tokens() reusing _get_stop_tokens() keeps things clean. All 5 get_sampling_params() call sites are updated, and the tests cover the important paths.

The growing tuple return from get_sampling_params() (now 10 elements) is getting unwieldy, but that's pre-existing tech debt, not something to hold this up for. I'll clean it up separately.

No regression concerns: XTC defaults to disabled (probability=0.0), so existing behavior is untouched.
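For what it's worth, the tuple cleanup could look something like this sketch: replace the positional return with a frozen dataclass so call sites read named fields instead of unpacking ten values. Field names here are assumptions, not omlx's actual API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ResolvedSampling:
    # Illustrative subset; the real return currently has ten elements.
    temperature: float = 1.0
    top_p: float = 1.0
    xtc_probability: float = 0.0
    xtc_threshold: float = 0.1

# Call sites name what they need rather than depending on tuple order,
# so adding an eleventh field later doesn't touch existing unpack sites.
params = ResolvedSampling(xtc_probability=0.5)
print(params.xtc_threshold)  # -> 0.1
```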
@blightbow v0.3.0rc1 is out with your XTC sampler included: https://github.com/jundot/omlx/releases/tag/v0.3.0rc1. If you get a chance, please give it a test and let me know if anything looks off. Thanks!