Skip to content

Eval bug: unsloth/Qwen3.5-35B-A3B-GGUF peg-native chat format parser fails when model outputs text before <tool_call> (thinking model + tool calling) #20260

@zhuangzhao923

Description

@zhuangzhao923

Name and Version

version: 8240 (d088d5b)
built with AppleClang 17.0.0.17000603 for Darwin arm64

Operating systems

Mac

GGML backends

Metal

Hardware

m4 max

Models

unsloth/Qwen3.5-35B-A3B-GGUF:Q8_0
https://huggingface.co/unsloth/Qwen3.5-35B-A3B-GGUF?show_file_info=Qwen3.5-35B-A3B-Q8_0.gguf

Problem description & steps to reproduce

When using a thinking model (Qwen3.5-35B-A3B) as a backend for GitHub Copilot Chat in Agent mode (VS Code), the server returns a 500 error Failed to parse input at pos N during multi-turn agentic tool-calling workflows.

Start the server

llama-server \
  -hf unsloth/Qwen3.5-35B-A3B-GGUF:Q8_0 \
  --jinja \
  --reasoning-format deepseek \
  -lv 4 \
  --log-timestamps \
  --log-file ./llamaLog.txt

The llama-server is configured as an OpenAI-compatible backend for VS Code's GitHub Copilot Chat Agent mode. In this workflow:

  1. Copilot sends a chat completion request with many tools (read_file, grep_search, semantic_search, list_dir, memory, etc.)
  2. The model thinks inside <think>...</think>, then calls a tool via <tool_call>...</tool_call>
  3. The tool result is returned to the model, which thinks again and calls more tools
  4. This multi-step loop continues until the model produces a final answer

During these multi-turn interactions, the model frequently outputs a short natural language transition sentence between </think> and <tool_call>, such as:

  • 让我再查看一些额外的信息来完善分析。 ("Let me check some more information to refine the analysis.")
  • 让我继续查看更多关键文件来完善分析。 ("Let me continue reviewing more key files.")

The lazy grammar trigger correctly identifies <tool_call> and constrains the generation from that point onward (the tool call XML is well-formed). However, when the post-generation PEG parser (Parsing PEG input with format peg-native) tries to parse the complete model output, it receives the entire output including the prefix text before <tool_call>. Since the grammar's root ::= tool-call expects the input to start with <tool_call>, any prefix text causes a parse failure. This breaks the entire agentic loop.

First Bad Commit

No response

Relevant log output

Logs
<tool_call>
<function=read_file>
<parameter=filePath>
/Users/zhao/Own/Projects/Smortex/pnpm-workspace.yaml
</parameter>
<parameter=startLine>
1
</parameter>
<parameter=endLine>
100
</parameter>
</function>
</tool_call>
�[0mParsing PEG input with format peg-native: 让我再查看一些额外的信息来完善分析。

<tool_call>
<function=read_file>
<parameter=filePath>
/Users/zhao/Own/Projects/Smortex/pnpm-workspace.yaml
</parameter>
<parameter=startLine>
1
</parameter>
<parameter=endLine>
100
</parameter>
</function>
</tool_call>
�[0msrv    operator(): http: streamed chunk: data: {"error":{"code":500,"message":"Failed to parse input at pos 274: ","type":"server_error"}}


�[0msrv    operator(): http: stream ended
�[0mres  remove_waiti: remove task 2051 from waiting list. cu

[llamaLog.txt](https://github.com/user-attachments/files/25830831/llamaLog.txt)

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions