
feat: Gemma 4 reasoning parser and agentic tool calling#565

Merged
jundot merged 1 commit into jundot:main from TipKnuckle:feat/gemma4-reasoning-parser-clean
Apr 4, 2026

Conversation

@TipKnuckle
Contributor

@TipKnuckle TipKnuckle commented Apr 4, 2026

Adds generic output parser sessions and complete Gemma 4 reasoning + agentic tool calling support. Supersedes #561.


Reasoning parser

Refactors the scheduler to support pluggable per-model output parser sessions. First implementation handles Gemma 4's <|channel>thought reasoning channel — strips it from streamed output and re-emits as <think>…</think> in reasoning_content.

  • omlx/adapter/output_parser.py (new): detect_output_parser() factory
  • omlx/adapter/gemma4.py (new): Gemma4OutputParserSession
  • omlx/scheduler.py: threads parser session through generation loop
  • omlx/utils/tokenizer.py: helpers for the session
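The parsing step can be sketched as follows. This is an illustrative, non-streaming version only; the real `Gemma4OutputParserSession` in `omlx/adapter/gemma4.py` works incrementally inside the generation loop, and the exact marker strings are taken from the PR text, not verified against the model.

```python
# Marker names as described in this PR; treat them as assumptions.
THOUGHT_OPEN = "<|channel>thought"
THOUGHT_CLOSE = "<channel|>"

def parse_gemma4_output(text: str) -> tuple[str, str]:
    """Strip the thought channel from a complete generation and
    re-emit it as <think>...</think> reasoning content."""
    content_parts, reasoning_parts = [], []
    pos = 0
    while True:
        start = text.find(THOUGHT_OPEN, pos)
        if start == -1:
            content_parts.append(text[pos:])
            break
        content_parts.append(text[pos:start])
        end = text.find(THOUGHT_CLOSE, start)
        if end == -1:
            # Unterminated channel: treat the remainder as reasoning.
            reasoning_parts.append(text[start + len(THOUGHT_OPEN):])
            break
        reasoning_parts.append(text[start + len(THOUGHT_OPEN):end])
        pos = end + len(THOUGHT_CLOSE)
    reasoning = "".join(reasoning_parts)
    if reasoning:
        reasoning = f"<think>{reasoning}</think>"
    return "".join(content_parts), reasoning
```

A session object doing this per-token additionally has to hold back a possible partial marker at the end of each decoded chunk, which is what the scheduler threading is for.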

Tool call output parsing

_inject_tool_calling was using mlx_lm._infer_tool_parser which has no knowledge of Gemma 4's <|tool_call> marker — it returned None, has_tool_calling was never set, and raw markup leaked into response content. The pinned mlx-vlm (43b9b20) ships mlx_vlm.tool_parsers as a superset that adds <|tool_call>gemma4 detection and the correct parser. _inject_tool_calling now prefers this, falling back to the mlx_lm path for older installs.

Tool result ingestion (message extractor pattern)

The Gemma 4 chat template has no handling for role=tool messages. Tool results must appear on a model-role turn as tool_responses:

{"role": "assistant", "tool_responses": [{"name": fn_name, "response": {...}}]}

Passing raw role=tool messages caused the template to emit <|tool_response> literals in content and halt after the first tool call. Two secondary bugs: _merge_consecutive_roles was collapsing the tool_calls and tool_responses turns into one (fixed with _PRESERVE_BOUNDARY_KEY); _drop_void_assistant_messages was stripping the tool_responses turn (fixed with a tool_responses guard).
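The folding step can be sketched as below. This is a hypothetical standalone version; the real conversion lives in `omlx/adapter/gemma4.py` and also handles name resolution against the preceding `tool_calls` turn.

```python
def fold_tool_results(messages: list[dict]) -> list[dict]:
    """Rewrite role=tool messages into the assistant-turn
    tool_responses shape the Gemma 4 chat template expects."""
    out: list[dict] = []
    for msg in messages:
        if msg.get("role") != "tool":
            out.append(msg)
            continue
        response = {"name": msg.get("name"), "response": msg.get("content")}
        # Fold consecutive tool results into one model-role turn.
        if out and out[-1].get("tool_responses") is not None:
            out[-1]["tool_responses"].append(response)
        else:
            out.append({"role": "assistant", "tool_responses": [response]})
    return out
```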

Design mirrors detect_output_parser — model-specific logic stays out of server.py:

  • detect_message_extractor() in output_parser.py returns the right extractor callable
  • BatchedEngine / VLMBatchedEngine expose it as a message_extractor property
  • server.py calls getattr(engine, "message_extractor", None) — no model-type chains
  • All Gemma 4 logic lives in omlx/adapter/gemma4.py
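The dispatch pattern above might look like this minimal sketch. The registry shape, the `register_extractor` decorator, and the stand-in extractor body are assumptions; only `detect_message_extractor`, the `message_extractor` property, and the `getattr` probe are named in the PR.

```python
from typing import Callable, Optional

# Assumed registry; the real factory lives in omlx/adapter/output_parser.py.
_EXTRACTORS: dict[str, Callable] = {}

def register_extractor(model_type: str):
    def deco(fn):
        _EXTRACTORS[model_type] = fn
        return fn
    return deco

def detect_message_extractor(model_type: str) -> Optional[Callable]:
    return _EXTRACTORS.get(model_type)

@register_extractor("gemma4")
def gemma4_extractor(messages):
    return messages  # stand-in for the real Gemma 4 conversion

class BatchedEngine:
    def __init__(self, model_type: str):
        self._model_type = model_type

    @property
    def message_extractor(self) -> Optional[Callable]:
        return detect_message_extractor(self._model_type)

# server.py side: a getattr probe, no model-type if/elif chains.
engine = BatchedEngine("gemma4")
extractor = getattr(engine, "message_extractor", None)
```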

Tests

| File | Coverage |
| --- | --- |
| tests/test_gemma4_messages.py | 11 tests: message conversion, tool result folding, name resolution, multi-turn agentic loops |
| tests/test_output_parser.py | Reasoning parser session |
| tests/test_scheduler.py | Scheduler integration |
| tests/test_utils_tokenizer.py | Tokenizer helpers |

@TipKnuckle TipKnuckle changed the title Normalize Gemma 4 reasoning output behind generic parser sessions feat: Gemma 4 reasoning parser + agentic tool calling Apr 4, 2026
@TipKnuckle TipKnuckle force-pushed the feat/gemma4-reasoning-parser-clean branch from dc2274f to 6499c0b on April 4, 2026 at 03:47
Adds generic output parser sessions, Gemma 4 reasoning channel parsing,
and complete agentic tool calling support for Gemma 4 VLM models.

## Reasoning parser

Refactors the scheduler to support pluggable per-model output parser
sessions. First implementation handles Gemma 4's reasoning channel:
strips <|channel>thought…<channel|> from streamed output and re-emits
it as <think>…</think> in reasoning_content.

- omlx/adapter/output_parser.py (new): detect_output_parser() factory
- omlx/adapter/gemma4.py (new): Gemma4OutputParserSession
- omlx/scheduler.py: threads parser session through generation loop
- omlx/utils/tokenizer.py: helpers for the session

## Tool call output parsing

_inject_tool_calling was using mlx_lm._infer_tool_parser which has no
knowledge of Gemma4's <|tool_call> marker, returning None and leaving
has_tool_calling unset so raw markup leaked into response content.

The pinned mlx-vlm (43b9b20) ships mlx_vlm.tool_parsers — a superset
that adds <|tool_call> -> gemma4 detection and a correct per-model
parser. _inject_tool_calling now prefers this, falling back to the
mlx_lm path for older installs.

## Tool result ingestion (message extractor pattern)

The Gemma 4 chat template has no handling for role=tool messages. Tool
results must appear on a model-role turn as tool_responses:

    {"role": "assistant", "tool_responses": [{"name": fn, "response": ...}]}

Passing raw role=tool messages caused the template to emit
<|tool_response> literals in content and halt after the first tool call.

Two secondary bugs: _merge_consecutive_roles was collapsing the
tool_calls and tool_responses turns into one (fixed with
_PRESERVE_BOUNDARY_KEY); _drop_void_assistant_messages was stripping
the tool_responses turn (fixed with a tool_responses guard).

Design mirrors detect_output_parser — model-specific logic stays out of
server.py:
- detect_message_extractor() in output_parser.py returns the extractor
- BatchedEngine/VLMBatchedEngine expose it as message_extractor property
- server.py uses getattr(engine, 'message_extractor', None)
- All Gemma4 logic lives in omlx/adapter/gemma4.py

## Tests

tests/test_gemma4_messages.py  - 11 tests: message conversion, tool
                                  result folding, name resolution,
                                  multi-turn agentic loops
tests/test_output_parser.py    - reasoning parser session
tests/test_scheduler.py        - scheduler integration
tests/test_utils_tokenizer.py  - tokenizer helpers

@TipKnuckle TipKnuckle force-pushed the feat/gemma4-reasoning-parser-clean branch from 2917d21 to 397287e on April 4, 2026 at 03:58
@TipKnuckle TipKnuckle changed the title feat: Gemma 4 reasoning parser + agentic tool calling feat: Gemma 4 reasoning parser and agentic tool calling Apr 4, 2026
@TipKnuckle
Contributor Author

Note there is an underlying bug in mlx-vlm that I discovered, so some cases of tool use are broken even with this implementation. See Blaizzy/mlx-vlm#914

@TipKnuckle
Contributor Author

Note on earlier comment: Blaizzy/mlx-vlm#914 has been fixed, and tool calls are being properly parsed from mlx-vlm in commit b8c0c5d.

@jundot
Owner

jundot commented Apr 4, 2026

@TipKnuckle I've updated mlx-lm and mlx-vlm to the latest versions, which I expect will resolve many of the existing issues. I'm currently working through the related changes needed on the omlx side. Please allow me some time - will follow up here soon!

@TipKnuckle
Contributor Author

TipKnuckle commented Apr 4, 2026

@jundot Thanks, I realize this is an architectural change, but I do think it's the right direction for handling custom parsing like Gemma 4 has introduced. SOLAR is another model that would benefit from this (though it would require its own parser as well).

Updating mlx-lm and mlx-vlm will unfortunately not help with Gemma 4 tool calling on your current main:

Upstream main (omlx/engine/vlm.py):

  • from mlx_lm.tokenizer_utils import _infer_tool_parser
  • Only uses mlx-lm's tool parser detection.

This PR (omlx/engine/vlm.py):

  • Prefer mlx_vlm.tool_parsers (superset; knows about Gemma4 etc.)
    from mlx_vlm.tool_parsers import _infer_tool_parser, load_tool_module
    Tries mlx-vlm first (which has Gemma 4 <|tool_call> support), falls back to mlx-lm.

The problem:

  • mlx-lm's _infer_tool_parser has no knowledge of Gemma 4's <|tool_call> marker
  • It returns None for Gemma 4 → has_tool_calling never set → raw markup leaks into response
  • mlx-vlm's tool_parsers module is a superset that includes gemma4 detection

The latest mlx-lm commit (4469ad4 - "Add gemma 4") doesn't add tool parser support - it's model architecture support, not tool calling. So even updating mlx-lm to latest, Gemma 4 tool calling still won't work without the mlx-vlm tool parser change.

Owner

@jundot jundot left a comment


Reviewed the full diff against current main (6 commits ahead of the v0.3.2 base). No merge conflicts, no functional overlap with recent changes.

Really clean work. The adapter pattern generalizing Harmony-only code into pluggable output parser sessions is well thought out. Model-specific logic stays out of server.py, Harmony backward compat is fully preserved, and the message extractor property is a nice touch. Tool calling coverage is thorough; all the edge cases I can think of are handled. Test coverage is solid too.

Merging.

@jundot jundot merged commit 6d84ed6 into jundot:main Apr 4, 2026
@TipKnuckle
Contributor Author

Great, thanks!
