qwen3/Ollama models: Streaming tool calls incompatibility with parseStreamingJson #4892
🐛 Bug Description
Models that send complete JSON tool call arguments in a single streaming chunk (e.g., qwen3 via Ollama) fail to work correctly with Moltbot's openai-completions provider. The streaming parser expects incremental character-by-character argument transmission but receives complete JSON objects instead.
🔍 Root Cause
Expected behavior (OpenAI/Claude):

```
// Chunk 1
{"delta": {"tool_calls": [{"function": {"arguments": "{\"path"}}]}}
// Chunk 2
{"delta": {"tool_calls": [{"function": {"arguments": "\":\"MEMORY.md"}}]}}
// Chunk 3
{"delta": {"tool_calls": [{"function": {"arguments": "\"}"}}]}}
```

Actual behavior (qwen3/Ollama):

```
// Chunk 1 - Complete JSON sent at once
{"delta": {"tool_calls": [{"function": {"arguments": "{\"path\":\"MEMORY.md\"}"}}]}}
// Chunk 2 - Finish reason only
{"delta": {"role": "assistant", "content": ""}, "finish_reason": "tool_calls"}
```

This causes `parseStreamingJson()` in `openai-completions.js` to fail or return incomplete results, leading to:
- Subagent timeouts (no response after tool execution)
- `NO_TOOL_RESULT` errors
- Tools being called but results not processed
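To see why a buffer-then-parse approach can cover both cases, here is a minimal standalone sketch (illustrative variable names, not Moltbot's or pi-ai's actual code): whatever the per-chunk behavior, both delta patterns concatenate to the same final argument string, so a parse of the accumulated buffer yields the same result either way.

```javascript
// Two observed delta patterns for the same tool call.
const incrementalDeltas = ['{"path', '":"MEMORY.md', '"}']; // OpenAI/Claude-style
const completeDeltas = ['{"path":"MEMORY.md"}'];            // qwen3/Ollama-style

// Accumulate argument fragments into one buffer, as the provider does
// with partialArgs.
function accumulate(deltas) {
  let partialArgs = "";
  for (const delta of deltas) partialArgs += delta;
  return partialArgs;
}

const a = accumulate(incrementalDeltas);
const b = accumulate(completeDeltas);
console.log(a === b);            // true
console.log(JSON.parse(a).path); // MEMORY.md
```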
📋 Reproduction Steps

1. Configure Moltbot to use an Ollama model with tool support:

   ```
   model: ollama/lucifers/qwen3-30B-coder-tools.Q4_0:latest
   ```

2. Spawn a subagent with a tool-calling task:

   ```
   sessions_spawn({ task: "Read MEMORY.md and tell me the main sections", model: "ollama/lucifers/qwen3-30B-coder-tools.Q4_0:latest" })
   ```

3. Observe: the subagent times out after 30-60 seconds with no output.
🔧 Proposed Fix
Modify `@mariozechner/pi-ai/dist/providers/openai-completions.js` to handle both streaming patterns:

Location 1: Tool call delta processing (around line 217)

Before:

```js
if (toolCall.function?.arguments) {
    delta = toolCall.function.arguments;
    currentBlock.partialArgs += toolCall.function.arguments;
    currentBlock.arguments = parseStreamingJson(currentBlock.partialArgs);
}
```

After:

```js
if (toolCall.function?.arguments) {
    delta = toolCall.function.arguments;
    currentBlock.partialArgs += toolCall.function.arguments;
    // Handle models that send complete JSON in one chunk (e.g., qwen3/Ollama):
    // try parsing as complete JSON first, fall back to the streaming parser
    try {
        currentBlock.arguments = JSON.parse(currentBlock.partialArgs);
    } catch {
        // Not complete JSON yet, use the streaming parser
        currentBlock.arguments = parseStreamingJson(currentBlock.partialArgs);
    }
}
```

Location 2: `finishCurrentBlock` function (around line 92)
Before:

```js
else if (block.type === "toolCall") {
    block.arguments = JSON.parse(block.partialArgs || "{}");
    delete block.partialArgs;
    // ...
}
```

After:

```js
else if (block.type === "toolCall") {
    // Only parse if arguments haven't been set yet (handles complete JSON from qwen3)
    if (Object.keys(block.arguments).length === 0 && block.partialArgs) {
        try {
            block.arguments = JSON.parse(block.partialArgs);
        } catch {
            block.arguments = {};
        }
    }
    delete block.partialArgs;
    // ...
}
```

✅ Verification
After applying the fix:

Test script:

```python
# scripts/test_qwen_raw_response.py
import json
import requests

response = requests.post(
    "http://localhost:11434/v1/chat/completions",
    json={
        "model": "lucifers/qwen3-30B-coder-tools.Q4_0:latest",
        "messages": [{"role": "user", "content": "Read MEMORY.md"}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "read",
                "description": "Read file",
                "parameters": {"type": "object", "properties": {"path": {"type": "string"}}},
            },
        }],
        "stream": True,
    },
    stream=True,
)
for line in response.iter_lines():
    if line and line.startswith(b'data: '):
        print(line.decode('utf-8'))
```

Expected result:
- ✅ Tool calls are correctly parsed
- ✅ Tool results are processed
- ✅ Final response is generated within 20-30 seconds
Actual result after fix:
Stats: runtime 23s • tokens 7.3k • task completed successfully
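The proposed fallback can also be exercised in isolation. The sketch below is standalone and uses a trivial stand-in for `parseStreamingJson` (the real pi-ai function is a tolerant partial-JSON parser; this stub only models the interface). It feeds both chunk patterns through the same delta-handling logic and checks that each produces fully parsed arguments.

```javascript
// Trivial stand-in for pi-ai's parseStreamingJson: returns {} when the
// buffer cannot be parsed yet. The real implementation recovers partial
// objects; this stub only models the interface for the test.
function parseStreamingJsonStub(partial) {
  try { return JSON.parse(partial); } catch { return {}; }
}

// The proposed logic: try a strict parse of the accumulated buffer first,
// then fall back to the tolerant streaming parser.
function applyDelta(block, argDelta) {
  block.partialArgs += argDelta;
  try {
    block.arguments = JSON.parse(block.partialArgs);              // complete JSON
  } catch {
    block.arguments = parseStreamingJsonStub(block.partialArgs);  // still streaming
  }
  return block;
}

// qwen3/Ollama pattern: everything arrives in one chunk.
const single = { partialArgs: "", arguments: {} };
applyDelta(single, '{"path":"MEMORY.md"}');
console.log(single.arguments.path); // MEMORY.md

// OpenAI/Claude pattern: three fragments.
const multi = { partialArgs: "", arguments: {} };
for (const d of ['{"path', '":"MEMORY.md', '"}']) applyDelta(multi, d);
console.log(multi.arguments.path); // MEMORY.md
```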
🌍 Impact
This affects all Ollama models with tool/function calling support that don't implement character-by-character streaming of JSON arguments, including but not limited to:
- qwen3 series (qwen3:32b, qwen3-30B-coder-tools, etc.)
- Possibly other local models via Ollama
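To triage whether a given model is affected, one quick check is to record the per-chunk `arguments` strings for a tool call and classify the pattern. A hedged sketch (hypothetical helper, assuming you have captured the raw deltas yourself):

```javascript
// Classify a tool call's argument deltas: did the model stream fragments,
// or send one complete JSON object in a single chunk?
function classifyPattern(argDeltas) {
  if (argDeltas.length === 1) {
    try {
      JSON.parse(argDeltas[0]);
      return "complete-in-one-chunk"; // qwen3/Ollama-style
    } catch {
      // A single chunk that is not yet valid JSON: treat as incremental.
    }
  }
  return "incremental"; // OpenAI/Claude-style
}

console.log(classifyPattern(['{"path":"MEMORY.md"}']));         // complete-in-one-chunk
console.log(classifyPattern(['{"path', '":"MEMORY.md', '"}'])); // incremental
```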
📚 Related
- Ollama API docs: https://github.com/ollama/ollama/blob/main/docs/api.md#generate-a-chat-completion
- OpenAI streaming format: https://platform.openai.com/docs/api-reference/streaming
🧪 Test Environment
- Moltbot version: 2026.1.27-beta.1
- @mariozechner/pi-ai version: 0.49.3
- Ollama version: Latest (serving qwen3-30B-coder-tools.Q4_0)
- Node.js: v24.13.0 (though issue reproduced on v20.12.2 as well)
Note: This is a compatibility issue between Ollama's streaming implementation and the current parsing logic, not a bug in either system individually. The fix maintains backward compatibility with standard OpenAI-style streaming while adding support for complete-JSON-in-one-chunk patterns.