feat: Gemma 4 reasoning parser and agentic tool calling #565
jundot merged 1 commit into jundot:main
Conversation
Force-pushed from dc2274f to 6499c0b
Adds generic output parser sessions, Gemma 4 reasoning channel parsing,
and complete agentic tool calling support for Gemma 4 VLM models.
## Reasoning parser
Refactors the scheduler to support pluggable per-model output parser
sessions. First implementation handles Gemma 4's reasoning channel:
strips <|channel>thought…<channel|> from streamed output and re-emits
it as <think>…</think> in reasoning_content.
- omlx/adapter/output_parser.py (new): detect_output_parser() factory
- omlx/adapter/gemma4.py (new): Gemma4OutputParserSession
- omlx/scheduler.py: threads parser session through generation loop
- omlx/utils/tokenizer.py: helpers for the session
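The streaming behavior of such a session can be sketched roughly as follows. The class name and `feed()` interface here are illustrative assumptions, not omlx's actual API; only the marker strings come from the description above.

```python
# Hypothetical sketch of a streaming reasoning-channel parser session.
# Marker strings follow the PR description; the feed() interface is an
# assumption for illustration.

OPEN = "<|channel>thought"
CLOSE = "<channel|>"

class ReasoningParserSession:
    """Splits streamed model output into visible content and reasoning.

    Text between OPEN and CLOSE is re-emitted wrapped in <think>...</think>
    on the reasoning side; everything else passes through as content.
    """

    def __init__(self):
        self._buf = ""           # holds text that may still be a partial marker
        self._in_thought = False

    def feed(self, chunk: str):
        """Consume one streamed chunk; return (content, reasoning_content)."""
        self._buf += chunk
        content, reasoning = [], []
        while self._buf:
            marker = CLOSE if self._in_thought else OPEN
            idx = self._buf.find(marker)
            if idx != -1:
                piece, self._buf = self._buf[:idx], self._buf[idx + len(marker):]
                self._emit(piece, content, reasoning)
                reasoning.append("</think>" if self._in_thought else "<think>")
                self._in_thought = not self._in_thought
                continue
            # No full marker: hold back a tail that could still become one.
            keep = self._partial_tail(marker)
            cut = len(self._buf) - keep
            piece, self._buf = self._buf[:cut], self._buf[cut:]
            self._emit(piece, content, reasoning)
            break
        return "".join(content), "".join(reasoning)

    def _emit(self, piece, content, reasoning):
        (reasoning if self._in_thought else content).append(piece)

    def _partial_tail(self, marker: str) -> int:
        # Longest suffix of the buffer that is a prefix of the marker.
        for n in range(min(len(marker) - 1, len(self._buf)), 0, -1):
            if marker.startswith(self._buf[-n:]):
                return n
        return 0
```

Holding back a partial-marker tail is the key detail: a marker split across two streamed chunks must not leak half-emitted into content.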
## Tool call output parsing
_inject_tool_calling was using mlx_lm._infer_tool_parser, which has no
knowledge of Gemma 4's <|tool_call> marker: it returned None,
has_tool_calling was never set, and raw markup leaked into response content.
The pinned mlx-vlm (43b9b20) ships mlx_vlm.tool_parsers — a superset
that adds <|tool_call> -> gemma4 detection and a correct per-model
parser. _inject_tool_calling now prefers this, falling back to the
mlx_lm path for older installs.
## Tool result ingestion (message extractor pattern)
The Gemma 4 chat template has no handling for role=tool messages. Tool
results must appear on a model-role turn as tool_responses:
{"role": "assistant", "tool_responses": [{"name": fn, "response": ...}]}
Passing raw role=tool messages caused the template to emit
<|tool_response> literals in content and halt after the first tool call.
Two secondary bugs: _merge_consecutive_roles was collapsing the
tool_calls and tool_responses turns into one (fixed with
_PRESERVE_BOUNDARY_KEY); _drop_void_assistant_messages was stripping
the tool_responses turn (fixed with a tool_responses guard).
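The required conversion can be illustrated with a small self-contained helper. The function name is hypothetical; the real logic (including the merge and drop guards above) lives in omlx/adapter/gemma4.py:

```python
# Hypothetical sketch: fold OpenAI-style role=tool messages into the
# model-role tool_responses turn the Gemma 4 chat template expects.

def fold_tool_results(messages: list[dict]) -> list[dict]:
    out = []
    pending = []  # consecutive role=tool results awaiting a fold
    for msg in messages:
        if msg.get("role") == "tool":
            pending.append({"name": msg.get("name"),
                            "response": msg.get("content")})
            continue
        if pending:
            # Emit collected results as their own assistant-role turn,
            # kept separate from the preceding tool_calls turn.
            out.append({"role": "assistant", "tool_responses": pending})
            pending = []
        out.append(msg)
    if pending:
        out.append({"role": "assistant", "tool_responses": pending})
    return out
```

After folding, no role=tool message reaches the chat template, so the `<|tool_response>` literals never appear in rendered content.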
Design mirrors detect_output_parser — model-specific logic stays out of
server.py:
- detect_message_extractor() in output_parser.py returns the extractor
- BatchedEngine/VLMBatchedEngine expose it as message_extractor property
- server.py uses getattr(engine, 'message_extractor', None)
- All Gemma4 logic lives in omlx/adapter/gemma4.py
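The detection pattern from the bullets above can be sketched as follows; the function and property names follow the list, while all internals are stand-in assumptions:

```python
# Sketch of the message-extractor detection pattern. Names mirror the
# PR description; bodies are placeholders, not the real implementations.

def gemma4_message_extractor(messages):
    """Stand-in for the real extractor in omlx/adapter/gemma4.py."""
    return messages  # real code folds tool results, merges roles, etc.

def detect_message_extractor(model_type: str):
    # Return the model-specific extractor, or None when no rewriting
    # is needed (the common case).
    if model_type == "gemma4":
        return gemma4_message_extractor
    return None

class VLMBatchedEngine:
    def __init__(self, model_type: str):
        self.message_extractor = detect_message_extractor(model_type)

def prepare_messages(engine, messages):
    # server.py stays model-agnostic: it never names gemma4, it only
    # asks whether the engine exposes an extractor.
    extractor = getattr(engine, "message_extractor", None)
    return extractor(messages) if extractor else messages
```

The getattr default keeps server.py working against engines that predate the property, which is what makes the change backward compatible.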
## Tests
- tests/test_gemma4_messages.py: 11 tests covering message conversion, tool result folding, name resolution, and multi-turn agentic loops
- tests/test_output_parser.py: reasoning parser session
- tests/test_scheduler.py: scheduler integration
- tests/test_utils_tokenizer.py: tokenizer helpers
Force-pushed from 2917d21 to 397287e
Note there is an underlying bug in mlx-vlm that I discovered, so some cases of tool use are broken even with this implementation. See Blaizzy/mlx-vlm#914
Note on my earlier comment: Blaizzy/mlx-vlm#914 has been fixed, and tool calls are now properly parsed from mlx-vlm as of commit b8c0c5d.
@TipKnuckle I've updated mlx-lm and mlx-vlm to the latest versions, which I expect will resolve many of the existing issues. I'm currently working through the related changes needed on the omlx side. Please allow me some time - will follow up here soon!
@jundot Thanks. I realize this is an architectural change, but I do think it's the right direction for handling custom parsing like Gemma 4 has introduced. SOLAR is another model that would benefit from this (though it would require its own parser as well). Updating mlx-lm and mlx-vlm will unfortunately not help with Gemma 4 tool calling on your current main (compare upstream main vs. this PR in omlx/engine/vlm.py).
The problem: the latest mlx-lm commit (4469ad4, "Add gemma 4") doesn't add tool parser support; it's model architecture support, not tool calling. So even after updating mlx-lm to latest, Gemma 4 tool calling still won't work without the mlx-vlm tool parser change.
jundot left a comment
Reviewed the full diff against current main (6 commits ahead of the v0.3.2 base). No merge conflicts, no functional overlap with recent changes.
Really clean work. The adapter pattern generalizing Harmony-only code into pluggable output parser sessions is well thought out. Model-specific logic stays out of server.py, Harmony backward compatibility is fully preserved, and the message extractor property is a nice touch. Tool calling coverage is thorough; all the edge cases I can think of are handled. Test coverage is solid too.
Merging.
Great, thanks!
Adds generic output parser sessions and complete Gemma 4 reasoning + agentic tool calling support. Supersedes #561.