common : add parser for ministral/mistral large 3/devstral 2 #17713
aldehir merged 9 commits into ggml-org:master
Conversation
@pwilkin don't worry, I'm still leaving Qwen3-Coder for you. I added some testing logic, since you had ideas about how we should test.
pwilkin left a comment:
Any chance you could make those test cases into universal test cases for a test-case class / lambda, where the only parameters for the actual test case are the parser and the expected string? So that the test case invocations look like:

```cpp
test_hello_world(&p, "Hello world");
...
test_tool_with_reasoning(&p, "{iamthinking}Thinking of calling tool get_weather{iamnotthinking}{iamcallingatool}get_weather{withparameter}place{ofvalue}Paris{imdonecallingatool}");
```
I get where you're coming from. I'll play around with it. Another option is table-driven tests; they work well at reducing duplication when you have many similar-looking tests. One thing I don't like about some of the helpers at the top is that I have to scroll up to find the actual content they compare against.
I think a decent way to combat that is to keep a comment holding a template of all the tests with placeholders - then when you implement a model, you just move the comment to after the new model :)
Unfortunately, it's not so cut and dried. The parser requires the tool definitions and reasoning format, because it is built on the fly. Also, many templates require at least one user message, and the requirements might differ between models. This isn't a problem with the previous parser because it doesn't require a template. At minimum, the functions would look like this:

```cpp
template <typename T>
static void test_hello_world(T parse, const std::string & input) {
    common_chat_msg expected;
    expected.role    = "assistant";
    expected.content = "Hello world";
    test_parser_with_streaming(expected, input, parse);
}

// ...

// Test basic message
common_chat_templates_inputs inputs;
inputs.messages = {msg};
test_hello_world(make_peg_parser(tmpls.get(), inputs), "Hello world");
```

And for a tool call:

```cpp
template <typename T>
static void test_tool_with_reasoning(T parse, const std::string & input) {
    common_chat_msg expected;
    expected.role              = "assistant";
    expected.reasoning_content = "I need to get the weather in New York City";
    expected.tool_calls = {{
        /* .name =      */ "get_weather",
        /* .arguments = */ R"({"location": "New York City, NY"})",
        /* .id =        */ {},
    }};
    test_parser_with_streaming(expected, input, parse);
}

// ...

// Define a set of tools
common_chat_tool tool_get_weather = {
    /* .name =        */ "get_weather",
    /* .description = */ "get the weather",
    /* .parameters =  */ R"({
        "type": "object",
        "properties": {
            "location": {"type": "string"}
        }
    })"
};

// ...

// Test tool calls
common_chat_templates_inputs inputs;
inputs.messages = {msg};
inputs.reasoning_format = COMMON_REASONING_FORMAT_AUTO;
inputs.tools = {tool_get_weather};
test_tool_with_reasoning(make_peg_parser(tmpls.get(), inputs),
    "[THINK]I need to get the weather in New York City[/THINK]"
    R"([TOOL_CALLS]get_weather[ARGS]{"location": "New York City, NY"})"
);
```

So, keeping the idea of a convenience function, how about something like this:

```cpp
// Test basic message
test_peg_parser(tmpls.get(), [&](peg_test_case & t) {
    t.input  = "Hello, world!\nWhat's up?";
    t.expect = message_assist;
});

// Test basic message and reasoning with reasoning_format = none
test_peg_parser(tmpls.get(), [&](peg_test_case & t) {
    t.input          = "[THINK]I'm\nthinking[/THINK]Hello, world!\nWhat's up?";
    t.expect.content = "[THINK]I'm\nthinking[/THINK]Hello, world!\nWhat's up?";
});

// Test basic message and reasoning with reasoning_format = auto
test_peg_parser(tmpls.get(), [&](peg_test_case & t) {
    t.input = "[THINK]I'm\nthinking[/THINK]Hello, world!\nWhat's up?";
    t.params.reasoning_format = COMMON_REASONING_FORMAT_AUTO;
    t.expect = message_assist_thoughts;
});
```

I also leveraged the predefined test messages (`message_assist`, `message_assist_thoughts`).
Yeah, this looks clean enough. I'll take a detailed look tomorrow.
Just checking what the status is here - anything we can do to test/verify the implementation?
@ggerganov I fixed the merge conflicts. The parsing works well; I tested a 20-30 turn agentic session several times. Optimal tool calling depends on minja updates to support the official chat template: the template throws an exception if the assistant message is null or empty. I'll submit a PR tomorrow. For the time being, the Unsloth template works.
For my understanding, this currently works only if we explicitly specify the (Unsloth) chat template?
It's a bit nuanced. It works as is without the explicit template, but it uses minja's polyfills, which differ from how the model was trained, so it may not work as Mistral intended. To get the best behavior: yes, use the Unsloth template. It won't be needed later, once minja is updated.
edit: nothing to do with the changes in this repo - it's from the actual code base

Would like to say that it no longer builds after the changes you've made. Logs:
```nix
(pkgs.fetchpatch {
  url =
    "https://github.com/ggml-org/llama.cpp/commit/636fc17a376dacc01da20d508e6986a299b1f819.patch";
  revert = true;
  hash = "sha256-zjjUOupJJQiMsgoGfjVXB9ylWQA74X1t5+TQOqBt/h8=";
})
```

This fixes the issue for me, so it seems like that commit is the problem.
Now that Devstral has been released, people will undoubtedly want to play around with it. It shares the same chat output format as Ministral-Instruct, which is handled here. I added a temporary hack to fix the chat template on the fly, similar to how we did for other models. How does that sound, @ggerganov? @pwilkin Have you had a chance to take a look? I think we're waiting on you 😊
@flooryyyy are you applying the commits here as patches onto master? If so, the conflicts will prevent that. I resolved those conflicts with a merge, not a rebase, for the same reason. This shouldn't be an issue once this gets squashed and merged onto master.
pwilkin left a comment:
Yeah, it's looking quite nice and elegant :)
@pwilkin thanks! I'll merge once the tests pass.
You'd think they'd know Pythonic style (which Jinja, for large parts, is) at Mistral, but apparently not. What is all this nonsense about checking for `{% if message['content'] %}`?
After patching the Mistral Vibe CLI to add … They probably also need to add …
I'm on NixOS, so all I did was override the "llama-cpp" (llama-cpp-rocm) package attributes, revert #17376, and then apply #17713. lol, I forgot to hit "comment" - I wrote this an hour after you asked.
Parser implementation for Ministral 3 Reasoning/Instruct, Mistral Large 3 Instruct, and Devstral 2. It deviates from the current Mistral implementation by accepting tool calls in the form:

```
[TOOL_CALLS]tool_name[ARGS]{"arg1": ... }
```

Features
- Parses reasoning traces into `reasoning_content` when `reasoning_format = auto/deepseek`. If `reasoning_format = none`, the traces are left in `content`.
- Converts `system` and `assistant` messages containing `reasoning_content` into the `{"type": "thinking", "thinking": "..."}` content blocks the chat template expects. (server: thinking type rejected as invalid but used by Ministral 3 #17700)
- Supports `tool_choice = auto` and `tool_choice = required` (with thinking).
- Supports `response_format` with thinking.

Additional Changes
- Use `reasoning_format` during chat param init to build the appropriate parser.
- Add a `make_peg_parser` helper in `tests/test-chat.cpp` for use with peg parsers.