
qwen3/Ollama models: Streaming tool calls incompatibility with parseStreamingJson #4892

@PeterPetrelli2026

Description


🐛 Bug Description

Models that send complete JSON tool call arguments in a single streaming chunk (e.g., qwen3 via Ollama) fail to work correctly with Moltbot's openai-completions provider. The streaming parser expects incremental character-by-character argument transmission but receives complete JSON objects instead.

🔍 Root Cause

Expected behavior (OpenAI/Claude):

// Chunk 1
{"delta": {"tool_calls": [{"function": {"arguments": "{\"path"}}]}}

// Chunk 2
{"delta": {"tool_calls": [{"function": {"arguments": "\":\"MEMORY.md"}}]}}

// Chunk 3
{"delta": {"tool_calls": [{"function": {"arguments": "\"}"}}]}}

Actual behavior (qwen3/Ollama):

// Chunk 1 - Complete JSON sent at once
{"delta": {"tool_calls": [{"function": {"arguments": "{\"path\":\"MEMORY.md\"}"}}]}}

// Chunk 2 - Finish reason only
{"delta": {"role": "assistant", "content": ""}, "finish_reason": "tool_calls"}

This causes parseStreamingJson() in openai-completions.js to fail or return incomplete results, leading to:

  • Subagent timeouts (no response after tool execution)
  • NO_TOOL_RESULT errors
  • Tools being called but results not processed
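The key observation behind the fix is that both transports deliver the same bytes overall; only the chunk boundaries differ. A minimal sketch (chunk contents taken from the examples above) shows that concatenating each sequence yields identical, parseable JSON, so a buffer-then-parse approach can serve both patterns:

```javascript
// OpenAI-style incremental chunks vs. qwen3/Ollama single complete chunk.
const openaiChunks = ['{"path', '":"MEMORY.md', '"}'];
const qwen3Chunks = ['{"path":"MEMORY.md"}'];

// Concatenating either sequence produces the same final JSON string.
const joinedA = openaiChunks.join("");
const joinedB = qwen3Chunks.join("");
console.log(joinedA === joinedB); // true
console.log(JSON.parse(joinedA)); // { path: 'MEMORY.md' }
```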

📋 Reproduction Steps

  1. Configure Moltbot to use an Ollama model with tool support:

    model: ollama/lucifers/qwen3-30B-coder-tools.Q4_0:latest
  2. Spawn a subagent with a tool-calling task:

    sessions_spawn({
      task: "Read MEMORY.md and tell me the main sections",
      model: "ollama/lucifers/qwen3-30B-coder-tools.Q4_0:latest"
    })
  3. Observe: Subagent times out after 30-60 seconds with no output

🔧 Proposed Fix

Modify @mariozechner/pi-ai/dist/providers/openai-completions.js to handle both streaming patterns:

Location 1: Tool call delta processing (around line 217)

Before:

if (toolCall.function?.arguments) {
    delta = toolCall.function.arguments;
    currentBlock.partialArgs += toolCall.function.arguments;
    currentBlock.arguments = parseStreamingJson(currentBlock.partialArgs);
}

After:

if (toolCall.function?.arguments) {
    delta = toolCall.function.arguments;
    currentBlock.partialArgs += toolCall.function.arguments;
    
    // Handle models that send complete JSON in one chunk (e.g., qwen3/Ollama)
    // Try parsing as complete JSON first, fall back to streaming parser
    try {
        const completeJson = JSON.parse(currentBlock.partialArgs);
        currentBlock.arguments = completeJson;
    } catch {
        // Not a complete JSON yet, use streaming parser
        currentBlock.arguments = parseStreamingJson(currentBlock.partialArgs);
    }
}
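The patched logic can be exercised in isolation. The sketch below extracts it into a standalone function (`applyArgumentDelta` is a hypothetical name, and `parseStreamingJsonStub` is a naive stand-in for the library's `parseStreamingJson`, assumed to return a best-effort object for incomplete JSON); both chunking patterns converge on the same final arguments:

```javascript
// Stand-in for the library's parseStreamingJson (assumption: best-effort
// object for incomplete JSON; here we simply return {} until it closes).
function parseStreamingJsonStub(partial) {
  try { return JSON.parse(partial); } catch { return {}; }
}

// The patched Location-1 logic as a standalone function (hypothetical name).
function applyArgumentDelta(block, argDelta) {
  block.partialArgs += argDelta;
  try {
    // Complete JSON in one chunk (qwen3/Ollama pattern)
    block.arguments = JSON.parse(block.partialArgs);
  } catch {
    // Not complete yet: fall back to the streaming parser
    block.arguments = parseStreamingJsonStub(block.partialArgs);
  }
}

const patterns = {
  incremental: ['{"path', '":"MEMORY.md', '"}'],
  singleChunk: ['{"path":"MEMORY.md"}'],
};
const results = {};
for (const [name, chunks] of Object.entries(patterns)) {
  const block = { type: "toolCall", partialArgs: "", arguments: {} };
  for (const delta of chunks) applyArgumentDelta(block, delta);
  results[name] = block.arguments;
}
console.log(results); // both entries end up as { path: 'MEMORY.md' }
```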

Location 2: finishCurrentBlock function (around line 92)

Before:

else if (block.type === "toolCall") {
    block.arguments = JSON.parse(block.partialArgs || "{}");
    delete block.partialArgs;
    // ...
}

After:

else if (block.type === "toolCall") {
    // Only parse if arguments haven't been set yet (handles complete JSON from qwen3).
    // Guard against block.arguments being undefined when no argument deltas arrived.
    if ((!block.arguments || Object.keys(block.arguments).length === 0) && block.partialArgs) {
        try {
            block.arguments = JSON.parse(block.partialArgs);
        } catch {
            block.arguments = {};
        }
    }
    delete block.partialArgs;
    // ...
}
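The Location-2 guard covers three cases: arguments already populated by the complete-JSON path, arguments still pending in the buffer, and malformed input. A standalone sketch (`finishToolCallBlock` is a hypothetical extraction of the patched branch) illustrates all three:

```javascript
// Hypothetical standalone version of the patched finish step: only parse
// partialArgs when arguments were never populated, and never throw on
// malformed input.
function finishToolCallBlock(block) {
  const hasArgs = block.arguments && Object.keys(block.arguments).length > 0;
  if (!hasArgs && block.partialArgs) {
    try {
      block.arguments = JSON.parse(block.partialArgs);
    } catch {
      block.arguments = {}; // malformed args: fail soft instead of crashing
    }
  }
  delete block.partialArgs;
  return block;
}

// Complete-JSON case: arguments already set, partialArgs is left alone
const a = finishToolCallBlock({ arguments: { path: "MEMORY.md" }, partialArgs: "{}" });
// Incremental case: arguments filled in from the accumulated buffer
const b = finishToolCallBlock({ arguments: {}, partialArgs: '{"path":"MEMORY.md"}' });
// Malformed case: fail soft with empty arguments
const c = finishToolCallBlock({ arguments: {}, partialArgs: '{"path":' });
```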

✅ Verification

After applying the fix:

Test script:

# scripts/test_qwen_raw_response.py
import json
import requests

response = requests.post(
    "http://localhost:11434/v1/chat/completions",
    json={
        "model": "lucifers/qwen3-30B-coder-tools.Q4_0:latest",
        "messages": [{"role": "user", "content": "Read MEMORY.md"}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "read",
                "description": "Read file",
                "parameters": {"type": "object", "properties": {"path": {"type": "string"}}}
            }
        }],
        "stream": True
    },
    stream=True
)

for line in response.iter_lines():
    if line and line.startswith(b'data: '):
        print(line.decode('utf-8'))
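To confirm the single-chunk pattern without a running server, the captured SSE lines can be inspected offline. This sketch (sample line built from the chunk shown in the root-cause section) extracts the `tool_calls` argument deltas; a single delta that is already complete JSON confirms the qwen3/Ollama behavior:

```javascript
// Sample SSE lines as captured from the stream (no server required).
const sseLines = [
  'data: {"choices":[{"delta":{"tool_calls":[{"function":{"arguments":"{\\"path\\":\\"MEMORY.md\\"}"}}]}}]}',
  "data: [DONE]",
];

// Collect every tool-call argument delta across the stream.
const deltas = [];
for (const line of sseLines) {
  const payload = line.slice("data: ".length);
  if (payload === "[DONE]") continue; // end-of-stream sentinel, not JSON
  const chunk = JSON.parse(payload);
  for (const tc of chunk.choices?.[0]?.delta?.tool_calls ?? []) {
    if (tc.function?.arguments) deltas.push(tc.function.arguments);
  }
}
console.log(deltas); // one delta, already valid JSON: the qwen3 pattern
```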

Expected result:

  • ✅ Tool calls are correctly parsed
  • ✅ Tool results are processed
  • ✅ Final response is generated within 20-30 seconds

Observed result after fix:

Stats: runtime 23s • tokens 7.3k • task completed successfully

🌍 Impact

This affects all Ollama models with tool/function calling support that don't implement character-by-character streaming of JSON arguments, including but not limited to:

  • qwen3 series (qwen3:32b, qwen3-30B-coder-tools, etc.)
  • Possibly other local models via Ollama

🧪 Test Environment

  • Moltbot version: 2026.1.27-beta.1
  • @mariozechner/pi-ai version: 0.49.3
  • Ollama version: Latest (serving qwen3-30B-coder-tools.Q4_0)
  • Node.js: v24.13.0 (though issue reproduced on v20.12.2 as well)

Note: This is a compatibility issue between Ollama's streaming implementation and the current parsing logic, not a bug in either system individually. The fix maintains backward compatibility with standard OpenAI-style streaming while adding support for complete-JSON-in-one-chunk patterns.

Metadata


Labels

bug (Something isn't working), close:duplicate (Closed as duplicate), dedupe:child (Duplicate issue/PR child in dedupe cluster)
