feat: add partial mode for assistant message prefill#306

Merged
jundot merged 1 commit into jundot:main from blightbow:feat/partial-mode
Mar 21, 2026

Conversation

Contributor

@blightbow blightbow commented Mar 18, 2026

This PR implements support for Moonshot's "partial mode", an extension to the OpenAI chat completion API that allows the final assistant message to be prefilled and role anchors to be renamed. This enables prefill patterns (JSON mode via `"content": "{"`) and named-assistant persona consistency (Kimi K2/K2.5 `name` field rendering).

Full disclosure: Moonshot has not publicly documented how "partial mode" is implemented on their API backend, but when you compare their documentation to the model templates for K2 and K2.5, it becomes pretty obvious what they're doing here.

The underlying infrastructure is already present in mlx_lm and in the chat templates of Kimi K2 and K2.5. This change passes the `name` field through to the backend unconditionally, and reserves the `partial` field as a boolean toggle between `continue_final_message=True` and `add_generation_prompt=True` in `apply_chat_template`. Unlike `name`, `partial` is consumed by the API server and never passed to the model template.
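For illustration, a client request using partial mode might look like the sketch below. The model name and exact payload shape are assumptions; only the `partial` flag and the prefill-via-`content` semantics come from the PR description.

```python
# Hypothetical request payload: the final assistant message carries a
# prefill and `partial: true`, so the server continues that message
# instead of opening a fresh assistant turn.
payload = {
    "model": "kimi-k2",  # assumed model name, for illustration only
    "messages": [
        {"role": "system", "content": "Reply with a JSON object."},
        {"role": "user", "content": "List three primary colors."},
        # Prefill: generation resumes after the opening brace,
        # nudging the model into emitting JSON.
        {"role": "assistant", "content": "{", "partial": True},
    ],
}
```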

For more information, refer to Moonshot's documentation on Partial Mode.


Schema changes:

  • Add `name` and `partial` fields to the Message model
  • Add `detect_and_strip_partial()` helper in api/utils.py
  • Preserve `name`/`partial` through `extract_text_content()` and `extract_multimodal_content()` message extraction
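A minimal sketch of what the helper named above might look like; the function name comes from this PR, but the body here is an assumption and the real implementation may differ:

```python
def detect_and_strip_partial(messages):
    """Return (messages, is_partial).

    is_partial is True when the final message is an assistant message
    flagged `partial: true`. The flag is popped from every message so it
    never leaks into the chat template.
    """
    is_partial = bool(
        messages
        and messages[-1].get("role") == "assistant"
        and messages[-1].get("partial")
    )
    for msg in messages:
        msg.pop("partial", None)  # strip from all messages, not just the last
    return messages, is_partial
```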

Engine changes:

  • `BatchedEngine._apply_chat_template()`: conditional generation prompt based on partial mode detection
  • `VLMBatchedEngine`: strip the `partial` field in both template methods but always use `add_generation_prompt=True` (vision models do not support continuation)
  • `MLXLanguageModel.chat()`: same partial mode logic as `BatchedEngine`
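The engine-side branch can be sketched as follows. `continue_final_message` and `add_generation_prompt` are the real `apply_chat_template` kwargs named in this PR; the function name and surrounding structure here are illustrative assumptions:

```python
def apply_chat_template_for_engine(tokenizer, messages):
    """Sketch of the conditional generation prompt described above.

    When the final assistant message was flagged partial, render the
    prompt with continue_final_message=True so generation resumes inside
    that message; otherwise open a fresh assistant turn.
    """
    last = messages[-1] if messages else {}
    is_partial = last.get("role") == "assistant" and bool(last.get("partial"))
    for msg in messages:
        msg.pop("partial", None)  # the flag never reaches the template
    if is_partial:
        return tokenizer.apply_chat_template(
            messages,
            continue_final_message=True,
            add_generation_prompt=False,
        )
    return tokenizer.apply_chat_template(messages, add_generation_prompt=True)
```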


Ported from mlx-openai-server PR jundot#213.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@blightbow
Contributor Author

Part of me doesn't like the isolated pop loop in `detect_and_strip_partial()`, but the only clean way to avoid it is an explicit kwarg for partial mode on the engine. Overriding generation mode via `chat_template_kwargs` would be clumsy because the mode should be deterministic.

I'm happy to rework that, just let me know what your preference is.

@jundot jundot force-pushed the main branch 7 times, most recently from f6faf2f to c2beead on March 21, 2026 at 05:58
Owner

@jundot jundot left a comment


Thanks for this; it's a clean implementation with solid test coverage. The `partial`/`name` flow through the extraction pipeline and engine layer looks well thought out, and the VLM handling (strip the flag but always use `add_generation_prompt=True`) is the right call.

One minor thing: `_merge_consecutive_roles` merges consecutive assistant messages and keeps only the first message's dict, so a second consecutive assistant message's `partial: true` would get dropped before it reaches `detect_and_strip_partial`. Pretty unlikely to happen in practice though, so not a concern for this PR.
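The edge case can be demonstrated with a simplified stand-in merge (an illustrative sketch; the real `_merge_consecutive_roles` may differ in details):

```python
def merge_consecutive_roles(messages):
    """Simplified sketch: consecutive same-role messages are folded into
    the FIRST message's dict, so extra keys on later messages
    (e.g. `partial`) are silently dropped."""
    merged = []
    for msg in messages:
        if merged and merged[-1]["role"] == msg["role"]:
            merged[-1]["content"] += msg["content"]  # first dict's keys win
        else:
            merged.append(dict(msg))
    return merged

merged = merge_consecutive_roles([
    {"role": "assistant", "content": "Partial answer "},
    {"role": "assistant", "content": "{", "partial": True},
])
# The second message's partial flag does not survive the merge.
```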

@jundot jundot merged commit e9530f5 into jundot:main Mar 21, 2026