-
Notifications
You must be signed in to change notification settings - Fork 16.3k
Eval bug: unsloth/Qwen3.5-35B-A3B-GGUF peg-native chat format parser fails when model outputs text before <tool_call> (thinking model + tool calling) #20260
Description
Name and Version
version: 8240 (d088d5b)
built with AppleClang 17.0.0.17000603 for Darwin arm64
Operating systems
Mac
GGML backends
Metal
Hardware
m4 max
Models
unsloth/Qwen3.5-35B-A3B-GGUF:Q8_0
https://huggingface.co/unsloth/Qwen3.5-35B-A3B-GGUF?show_file_info=Qwen3.5-35B-A3B-Q8_0.gguf
Problem description & steps to reproduce
When using a thinking model (Qwen3.5-35B-A3B) as a backend for GitHub Copilot Chat in Agent mode (VS Code), the server returns a 500 error Failed to parse input at pos N during multi-turn agentic tool-calling workflows.
Start the server
llama-server \
-hf unsloth/Qwen3.5-35B-A3B-GGUF:Q8_0 \
--jinja \
--reasoning-format deepseek \
-lv 4 \
--log-timestamps \
--log-file ./llamaLog.txtThe llama-server is configured as an OpenAI-compatible backend for VS Code's GitHub Copilot Chat Agent mode. In this workflow:
- Copilot sends a chat completion request with many tools (
read_file,grep_search,semantic_search,list_dir,memory, etc.) - The model thinks inside
<think>...</think>, then calls a tool via<tool_call>...</tool_call> - The tool result is returned to the model, which thinks again and calls more tools
- This multi-step loop continues until the model produces a final answer
During these multi-turn interactions, the model frequently outputs a short natural language transition sentence between </think> and <tool_call>, such as:
让我再查看一些额外的信息来完善分析。("Let me check some more information to refine the analysis.")让我继续查看更多关键文件来完善分析。("Let me continue reviewing more key files.")
The lazy grammar trigger correctly identifies <tool_call> and constrains the generation from that point onward (the tool call XML is well-formed). However, when the post-generation PEG parser (Parsing PEG input with format peg-native) tries to parse the complete model output, it receives the entire output including the prefix text before <tool_call>. Since the grammar's root ::= tool-call expects the input to start with <tool_call>, any prefix text causes a parse failure. This breaks the entire agentic loop.
First Bad Commit
No response
Relevant log output
Logs
<tool_call>
<function=read_file>
<parameter=filePath>
/Users/zhao/Own/Projects/Smortex/pnpm-workspace.yaml
</parameter>
<parameter=startLine>
1
</parameter>
<parameter=endLine>
100
</parameter>
</function>
</tool_call>
�[0mParsing PEG input with format peg-native: 让我再查看一些额外的信息来完善分析。
<tool_call>
<function=read_file>
<parameter=filePath>
/Users/zhao/Own/Projects/Smortex/pnpm-workspace.yaml
</parameter>
<parameter=startLine>
1
</parameter>
<parameter=endLine>
100
</parameter>
</function>
</tool_call>
�[0msrv operator(): http: streamed chunk: data: {"error":{"code":500,"message":"Failed to parse input at pos 274: ","type":"server_error"}}
�[0msrv operator(): http: stream ended
�[0mres remove_waiti: remove task 2051 from waiting list. cu
[llamaLog.txt](https://github.com/user-attachments/files/25830831/llamaLog.txt)