common : add parser for ministral/mistral large 3/devstral 2 #17713

Merged
aldehir merged 9 commits into ggml-org:master from aldehir:ministral-3 on Dec 9, 2025

Conversation

@aldehir (Contributor) commented Dec 3, 2025

Parser implementation for Ministral 3 Reasoning/Instruct, Mistral Large 3 Instruct, and Devstral 2. It deviates from the current Mistral implementation by accepting tool calls in the form: [TOOL_CALLS]tool_name[ARGS]{"arg1": ... }...

Features

  • Extracts reasoning to reasoning_content when reasoning_format = auto/deepseek. If reasoning_format = none, the traces are left in content.
  • Formats system and assistant messages containing reasoning_content into the {"type": "thinking", "thinking": "..."} content blocks the chat template expects (see #17700: server: thinking type rejected as invalid but used by Ministral 3).
  • Supports tool calling for both tool_choice = auto and tool_choice = required (with thinking).
  • Supports parallel tool calls.
  • Supports response_format with thinking.

Additional Changes

  • Exposed reasoning_format during chat param init to build the appropriate parser.
  • Added make_peg_parser helper in tests/test-chat.cpp for use with peg parsers.
  • Added a temporary fix for the chat template. Will remove after updating Minja to support the official template.

@aldehir aldehir changed the title common : add parser for ministral/mistral 3 common : add parser for ministral/mistral large 3 Dec 3, 2025
@github-actions bot added the documentation, testing, examples, and server labels on Dec 3, 2025
@aldehir aldehir marked this pull request as ready for review December 4, 2025 07:13
@aldehir aldehir requested a review from pwilkin December 4, 2025 07:18
@aldehir (Contributor, Author) commented Dec 4, 2025

@pwilkin don't worry, I'm still leaving Qwen3-Coder for you. I added some testing logic since you had ideas about how we should test.

@pwilkin (Member) left a comment

Any chance you could make those test cases into universal test cases for a testcase class / lambda where the only parameters for the actual test case are the parser and the expected string? So that the test case invocations look like:

test_hello_world(&p, "Hello world");
...
test_tool_with_reasoning(&p, "{iamthinking}Thinking of calling tool get_weather{iamnotthinking}{iamcallingatool}get_weather{withparameter}place{ofvalue}Paris{imdonecallingatool}");

?

@aldehir (Contributor, Author) commented Dec 4, 2025

I get where you're coming from. I'll play around with it. Another option is table-driven tests; they work well at reducing duplication when you have many similar-looking tests.

One thing I don't like about some of the helpers at the top is that I have to scroll up to find the actual content they compare against.

@pwilkin (Member) commented Dec 4, 2025

One thing I don't like about some of the helpers at the top is that I have scroll up to find the actual content it compares against.

I think a decent idea to combat that is to have a comment that keeps a template of all the tests with placeholders - then when you implement the model, you just move the comment to after the new model :)

@aldehir (Contributor, Author) commented Dec 7, 2025

Any chance you could make those test cases into universal test cases for a testcase class / lambda where the only parameters for the actual test case are the parser and the expected string? So that the test case invocations look like:

test_hello_world(&p, "Hello world");
...
test_tool_with_reasoning(&p, "{iamthinking}Thinking of calling tool get_weather{iamnotthinking}{iamcallingatool}get_weather{withparameter}place{ofvalue}Paris{imdonecallingatool}");

?

Unfortunately, it's not so cut and dried. The parser requires the tool definitions and the reasoning format, because it is built on the fly. Also, many templates require at least one user message, and the requirements may differ between models. This isn't a problem with the previous parser because it doesn't require a template.

At minimum, the functions would look like this:

```cpp
template <typename T>
static void test_hello_world(T parse, const std::string & input) {
    common_chat_msg expected;
    expected.role = "assistant";
    expected.content = "Hello world";
    test_parser_with_streaming(expected, input, parse);
}

// ...

// Test basic message
common_chat_templates_inputs inputs;
inputs.messages = {msg};

test_hello_world(make_peg_parser(tmpls.get(), inputs), "Hello world");
```

And for a tool call:

```cpp
template <typename T>
static void test_tool_with_reasoning(T parse, const std::string & input) {
    common_chat_msg expected;
    expected.role = "assistant";
    expected.reasoning_content = "I need to get the weather in New York City";
    expected.tool_calls = {{
        /* .name = */      "get_weather",
        /* .arguments = */ R"({"location": "New York City, NY"})",
        /* .id = */        {},
    }};
    test_parser_with_streaming(expected, input, parse);
}

// ...

// Define a set of tools
common_chat_tool tool_get_weather = {
    /* .name = */        "get_weather",
    /* .description = */ "get the weather",
    /* .parameters  = */ R"({
        "type": "object",
        "properties": {
            "location": {"type": "string"}
        }
    })"
};

// ...

// Test tool calls
common_chat_templates_inputs inputs;
inputs.messages = {msg};
inputs.reasoning_format = COMMON_REASONING_FORMAT_AUTO;
inputs.tools = {tool_get_weather};

test_tool_with_reasoning(make_peg_parser(tmpls.get(), inputs),
    "[THINK]I need to get the weather in New York City[/THINK]"
    R"([TOOL_CALLS]get_weather[ARGS]{"location": "New York City, NY"})"
);
```

So, keeping the idea of a convenience function, how about something like this:

```cpp
// Test basic message
test_peg_parser(tmpls.get(), [&](peg_test_case & t) {
    t.input = "Hello, world!\nWhat's up?";
    t.expect = message_assist;
});

// Test basic message and reasoning with reasoning_format = none
test_peg_parser(tmpls.get(), [&](peg_test_case & t) {
    t.input = "[THINK]I'm\nthinking[/THINK]Hello, world!\nWhat's up?";
    t.expect.content = "[THINK]I'm\nthinking[/THINK]Hello, world!\nWhat's up?";
});

// Test basic message and reasoning with reasoning_format = auto
test_peg_parser(tmpls.get(), [&](peg_test_case & t) {
    t.input = "[THINK]I'm\nthinking[/THINK]Hello, world!\nWhat's up?";
    t.params.reasoning_format = COMMON_REASONING_FORMAT_AUTO;

    t.expect = message_assist_thoughts;
});
```

I also leveraged the predefined message_* messages at the top, and created a new function to house the peg parser tests.

@pwilkin (Member) commented Dec 7, 2025

Yeah, this looks clean enough. I'll take a detailed look tomorrow.

@ggerganov (Member) commented

Just checking what is the status here - anything we can do to test/verify the implementation?

@aldehir (Contributor, Author) commented Dec 9, 2025

@ggerganov I fixed the merge conflicts.

The parsing works well. I tested a 20-30 turn agentic session several times.

Optimal tool calling depends on minja updates to support the official chat template. The template throws an exception if the assistant message is null or empty. I'll submit a PR tomorrow. For the time being, the Unsloth template works.

@ggerganov (Member) commented

For the time being, the Unsloth template works.

For my understanding, this currently works if we explicitly specify --chat-template-file ./models/templates/unsloth-mistral-Ministral-3-14B-Reasoning-2512.jinja? And later when the minja fixes are done, it will work without specifying this template explicitly?

@aldehir (Contributor, Author) commented Dec 9, 2025

For the time being, the Unsloth template works.

For my understanding, this currently works if we explicitly specify --chat-template-file ./models/templates/unsloth-mistral-Ministral-3-14B-Reasoning-2512.jinja? And later when the minja fixes are done, it will work without specifying this template explicitly?

It's a bit nuanced. It works as-is without the explicit template, but it uses minja's polyfills, which differ from how the model was trained. It may not work as Mistral intended.

To get the best behavior, yes: use the Unsloth template. It won't be needed later, once minja is updated.

@flooryyyy commented Dec 9, 2025

edit: nothing to do with the changes in this repo - it's from the actual code base

@ggerganov I fixed the merge conflicts.

The parsing works well. I tested a 20-30 turn agentic session several times.

Optimal tool calling depends on minja updates to support the official chat template. The template throws an exception if the assistant message is null or empty. I'll submit a PR tomorrow. For the time being, the Unsloth template works.

Would like to say that it no longer builds after the changes you've made.

logs:

❯ nt (nt = nh os test or nixos-rebuild test)
> Building NixOS configuration
warning: Git tree '/mnt/chonky/dotfiles/nixos' is dirty
evaluation warning: 'system' has been renamed to/replaced by 'stdenv.hostPlatform.system'
evaluation warning: floory profile: `programs.ssh` default values will be removed in the future.
                    Consider setting `programs.ssh.enableDefaultConfig` to false,
                    and manually set the default values you want to keep at
                    `programs.ssh.matchBlocks."*"`.
these 14 derivations will be built:
  /nix/store/x2yc7q7hjl0lpyav5vhvpl7nfb1lfcbh-llama-cpp-7330.drv
  /nix/store/avk1himz0chvzf40yvy5icyzh4rz9pcd-system-path.drv
  /nix/store/188iw6h81bbp4v6ra32bzf75a6ihkkc1-X-Restart-Triggers-polkit.drv
  /nix/store/f066fqn9wscfahfz22jf01x87r6dqrp3-unit-polkit.service.drv
  /nix/store/x7wp1nn3qxf422lc46i201gg951j1zzh-config.yaml.drv
  /nix/store/rsjc3h0gc3kr7lx3f11zn90s734vzjx2-unit-llama-swap.service.drv
  /nix/store/pwm7lgbyw5in5ffn956a63vdp5wc73p8-dbus-1.drv
  /nix/store/w9jcwn1rvj3icpjqvj89ksjhdmaacr2v-X-Restart-Triggers-dbus-broker.drv
  /nix/store/ynb1ggk436dng4cilx6rb5w64x3pkhgk-unit-dbus-broker.service.drv
  /nix/store/11qk3bs08fmkr1a39911qy27mn39b9pi-system-units.drv
  /nix/store/dxp5ybbgf8mdrj20xcb3jx2ixxs0z535-unit-dbus-broker.service.drv
  /nix/store/bxvfyq8qxgrryw3r7ydbi9fczikf08xf-user-units.drv
  /nix/store/sxg6ci04b8c0v0b7rj6jimhgxl02fb6w-etc.drv
  /nix/store/qa5q1zqmncpckavbzqvs9dhhnwqzcq55-nixos-system-nixos-26.05.20251205.f61125a.drv
llama-cpp> building '/nix/store/x2yc7q7hjl0lpyav5vhvpl7nfb1lfcbh-llama-cpp-7330.drv'
llama-cpp> Running phase: unpackPhase
llama-cpp> unpacking source archive /nix/store/6pl6ni41rdxrm8bk21r947z6wqh77xma-source
llama-cpp> source root is source
llama-cpp> Running phase: patchPhase
llama-cpp> applying patch /nix/store/g6qzqghdcw6ha4wk5c5iglvrkmab6six-17713.patch
llama-cpp> patching file common/chat.cpp
llama-cpp> patching file common/chat.cpp
llama-cpp> patching file common/chat.cpp
llama-cpp> patching file tests/test-chat.cpp
llama-cpp> Hunk #2 FAILED at 445.
llama-cpp> Hunk #3 succeeded at 500 (offset 28 lines).
llama-cpp> Hunk #4 succeeded at 3455 (offset 133 lines).
llama-cpp> 1 out of 4 hunks FAILED -- saving rejects to file tests/test-chat.cpp.rej
llama-cpp> patching file tests/test-chat.cpp
llama-cpp> Hunk #2 succeeded at 544 (offset 21 lines).
llama-cpp> Hunk #3 succeeded at 3473 (offset 126 lines).
llama-cpp> Hunk #4 succeeded at 3687 (offset 126 lines).
llama-cpp> patching file tests/test-chat.cpp
llama-cpp> patching file tests/test-chat.cpp
llama-cpp> Hunk #1 succeeded at 3491 (offset -32 lines).
llama-cpp> Hunk #2 succeeded at 3511 (offset -32 lines).
llama-cpp> Hunk #3 succeeded at 3520 (offset -32 lines).
llama-cpp> Hunk #4 succeeded at 3530 (offset -32 lines).
llama-cpp> Hunk #5 succeeded at 3549 (offset -32 lines).
llama-cpp> patching file tests/test-chat.cpp
llama-cpp> Reversed (or previously applied) patch detected!  Assume -R? [n]
llama-cpp> Apply anyway? [n]
llama-cpp> Skipping patch.
llama-cpp> 2 out of 2 hunks ignored -- saving rejects to file tests/test-chat.cpp.rej
llama-cpp> patching file tools/server/server-common.cpp
llama-cpp> patching file models/templates/unsloth-mistral-Ministral-3-14B-Reasoning-2512.jinja
error: Cannot build '/nix/store/x2yc7q7hjl0lpyav5vhvpl7nfb1lfcbh-llama-cpp-7330.drv'.
       Reason: builder failed with exit code 1.
       Output paths:
         /nix/store/w4jg0l30mirm178pgimskdr6iskxg94n-llama-cpp-7330
       Last 25 log lines:
       > patching file common/chat.cpp
       > patching file common/chat.cpp
       > patching file tests/test-chat.cpp
       > Hunk #2 FAILED at 445.
       > Hunk #3 succeeded at 500 (offset 28 lines).
       > Hunk #4 succeeded at 3455 (offset 133 lines).
       > 1 out of 4 hunks FAILED -- saving rejects to file tests/test-chat.cpp.rej
       > patching file tests/test-chat.cpp
       > Hunk #2 succeeded at 544 (offset 21 lines).
       > Hunk #3 succeeded at 3473 (offset 126 lines).
       > Hunk #4 succeeded at 3687 (offset 126 lines).
       > patching file tests/test-chat.cpp
       > patching file tests/test-chat.cpp
       > Hunk #1 succeeded at 3491 (offset -32 lines).
       > Hunk #2 succeeded at 3511 (offset -32 lines).
       > Hunk #3 succeeded at 3520 (offset -32 lines).
       > Hunk #4 succeeded at 3530 (offset -32 lines).
       > Hunk #5 succeeded at 3549 (offset -32 lines).
       > patching file tests/test-chat.cpp
       > Reversed (or previously applied) patch detected!  Assume -R? [n]
       > Apply anyway? [n]
       > Skipping patch.
       > 2 out of 2 hunks ignored -- saving rejects to file tests/test-chat.cpp.rej
       > patching file tools/server/server-common.cpp
       > patching file models/templates/unsloth-mistral-Ministral-3-14B-Reasoning-2512.jinja
       For full logs, run:
         nix log /nix/store/x2yc7q7hjl0lpyav5vhvpl7nfb1lfcbh-llama-cpp-7330.drv
error: Cannot build '/nix/store/x7wp1nn3qxf422lc46i201gg951j1zzh-config.yaml.drv'.
       Reason: 1 dependency failed.
       Output paths:
         /nix/store/p8plas6ir2z5l572hpbb3ddadm571kby-config.yaml
error: Cannot build '/nix/store/avk1himz0chvzf40yvy5icyzh4rz9pcd-system-path.drv'.
       Reason: 1 dependency failed.
       Output paths:
         /nix/store/ka9qc3kx695xgapjp2az5z531y1fjpag-system-path
error: Cannot build '/nix/store/qa5q1zqmncpckavbzqvs9dhhnwqzcq55-nixos-system-nixos-26.05.20251205.f61125a.drv'.
       Reason: 1 dependency failed.
       Output paths:
         /nix/store/7vqv54s1s0l00f8xhzlx909vl6rx3z7v-nixos-system-nixos-26.05.20251205.f61125a
┏━ 4 Errors:
 ⋮
┃        > Hunk #2 succeeded at 544 (offset 21 lines).
┃        > Hunk #3 succeeded at 3473 (offset 126 lines).
┃        > Hunk #4 succeeded at 3687 (offset 126 lines).
┃        > patching file tests/test-chat.cpp
┃        > patching file tests/test-chat.cpp
┃        > Hunk #1 succeeded at 3491 (offset -32 lines).
┃        > Hunk #2 succeeded at 3511 (offset -32 lines).
┃        > Hunk #3 succeeded at 3520 (offset -32 lines).
┃        > Hunk #4 succeeded at 3530 (offset -32 lines).
┃        > Hunk #5 succeeded at 3549 (offset -32 lines).
┃        > patching file tests/test-chat.cpp
┃        > Reversed (or previously applied) patch detected!  Assume -R? [n]
┃        > Apply anyway? [n]
┃        > Skipping patch.
┃        > 2 out of 2 hunks ignored -- saving rejects to file tests/test-chat.cpp.rej
┃        > patching file tools/server/server-common.cpp
┃        > patching file models/templates/unsloth-mistral-Ministral-3-14B-Reasoning-2512.jinja
┃        For full logs, run:
┃          nix log /nix/store/x2yc7q7hjl0lpyav5vhvpl7nfb1lfcbh-llama-cpp-7330.drv
┣━ Dependency Graph:
┃       ┌─ ⏸ unit-dbus-broker.service waiting for 1 ⏵
┃    ┌─ ⏸ user-units
┃    │        ┌─ ⏸ dbus-1 waiting for 1 ⏵
┃    │     ┌─ ⏸ X-Restart-Triggers-dbus-broker
┃    │  ┌─ ⏸ unit-dbus-broker.service
┃    │  │  ┌─ ⏸ config.yaml waiting for 1 ⏵
┃    │  ├─ ⏸ unit-llama-swap.service
┃    │  │  ┌─ ⏸ X-Restart-Triggers-polkit waiting for 1 ⏵
┃    │  ├─ ⏸ unit-polkit.service
┃    ├─ ⏸ system-units
┃ ┌─ ⏸ etc
┃ │  ┌─ ⏵ llama-cpp-7330 (patchPhase)
┃ ├─ ⏸ system-path
┃ ⏸ nixos-system-nixos-26.05.20251205.f61125a
┣━━━ Builds
┗━ ∑ ⏵ 1 │ ✔ 0 │ ⏸ 13 │ ⚠ Exited with 4 errors reported by nix at 11:58:25 after 9s
Error:
   0: Failed to build configuration
   1: Command exited with status Exited(1)
Location:
   src/commands.rs:693

@flooryyyy commented

```nix
(pkgs.fetchpatch {
  url = "https://github.com/ggml-org/llama.cpp/commit/636fc17a376dacc01da20d508e6986a299b1f819.patch";
  revert = true;
  hash = "sha256-zjjUOupJJQiMsgoGfjVXB9ylWQA74X1t5+TQOqBt/h8=";
})
```

This fixes the issue for me, so it seems that commit is the cause.

@aldehir (Contributor, Author) commented Dec 9, 2025

Now that Devstral has been released, people will undoubtedly want to play around with it. It shares the same chat output format as Ministral-Instruct, which is handled here.

I added a temporary hack to fix the chat template on the fly, similar to what we did for gpt-oss. The ggml-org/Ministral* models now work as intended, and the Devstral one should too once it's out. After we sync up Minja, I can remove the temporary hack.

How does that sound @ggerganov?

@pwilkin Have you had a chance to take a look? I think we're waiting on you 😊

@aldehir (Contributor, Author) commented Dec 9, 2025

@flooryyyy are you applying the commits here as patches onto master? If so, the conflicts will prevent that. I have resolved those conflicts with a merge and not a rebase for the same reason. This shouldn't be an issue once this gets squashed and merged onto master.

@aldehir aldehir changed the title common : add parser for ministral/mistral large 3 common : add parser for ministral/mistral large 3/devstral 2 Dec 9, 2025
@pwilkin (Member) left a comment

Yeah, it's looking quite nice and elegant :)

@aldehir (Contributor, Author) commented Dec 9, 2025

@pwilkin thanks! I'll merge once the tests pass.

@CISC (Member) commented Dec 9, 2025

You'd think they would know Pythonic style (which Jinja for large parts is) at Mistral, but apparently not. What is all this nonsense about checking for none, undefined, length > 0, etc.? Just do this, dammit:

{% if message['content'] %}
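For contrast, the defensive pattern being criticized looks roughly like this (a reconstruction for illustration, not the actual Mistral template):

```jinja
{#- reconstruction of the verbose pattern -#}
{%- if message['content'] is defined and message['content'] is not none and message['content'] | length > 0 -%}
    {{- message['content'] -}}
{%- endif -%}

{#- the equivalent Pythonic truthiness check -#}
{%- if message['content'] -%}
    {{- message['content'] -}}
{%- endif -%}
```

For string and list content the two behave the same, since undefined, none, and empty values are all falsy in Jinja, which is the point of the complaint.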

@aldehir (Contributor, Author) commented Dec 9, 2025

After patching the Mistral Vibe CLI to add "stream": true (mistralai/mistral-vibe#11), it works well with this.

They probably also need to add "parallel_tool_calls": true if they support it; I know the models do.

@aldehir aldehir merged commit 2fbe3b7 into ggml-org:master Dec 9, 2025
70 of 72 checks passed
@flooryyyy commented

@flooryyyy are you applying the commits here as patches onto master? If so, the conflicts will prevent that. I have resolved those conflicts with a merge and not a rebase for the same reason. This shouldn't be an issue once this gets squashed and merged onto master.

I'm on NixOS, so all I did was override the llama-cpp (llama-cpp-rocm) package attributes, revert #17376, and then apply #17713.

lol, I forgot to hit "comment"; I wrote this an hour after you asked.

Ethan-a2 pushed a commit to Ethan-a2/llama.cpp that referenced this pull request Dec 12, 2025
0Marble pushed a commit to 0Marble/llama.cpp that referenced this pull request Dec 18, 2025
Anico2 added a commit to Anico2/llama.cpp that referenced this pull request Jan 15, 2026
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
phuongncn pushed a commit to phuongncn/llama.cpp-gx10-dgx-sparks-deepseekv4 that referenced this pull request Apr 28, 2026
ljubomirj pushed a commit to ljubomirj/llama.cpp that referenced this pull request May 6, 2026