### Summary
OpenClaw currently connects to Ollama through the OpenAI compatibility layer (`/v1/chat/completions`), which silently drops tool calls when streaming is enabled. Since OpenClaw hardcodes `stream: true`, no Ollama model can use tools — the model decides to call a tool, but the streaming response returns empty content with `finish_reason: "stop"`, losing the tool call entirely.

Meanwhile, Ollama's native API (`/api/chat`) has fully supported streaming + tool calling since May 2025 (blog post, PR ollama/ollama#10415). The problem isn't Ollama — it's that OpenClaw routes through a broken compatibility layer instead of using the native endpoint.
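Concretely, the failure mode described above looks roughly like this on the compat layer's stream: the final SSE chunk carries an empty delta and `finish_reason: "stop"`, with no `tool_calls` anywhere (an illustrative shape, not a captured response):

```ts
// Illustrative final chunk from the /v1/chat/completions stream when the model
// actually decided to call a tool: empty delta, finish_reason "stop", no tool_calls.
const lastCompatChunk = {
  object: "chat.completion.chunk",
  choices: [{ index: 0, delta: {}, finish_reason: "stop" }],
};
```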
### Root causes identified (3 issues)

| # | Problem | Impact |
|---|---------|--------|
| 1 | OpenAI compat endpoint drops `tool_calls` when streaming | Tool calls silently lost — model produces them, response doesn't contain them |
| 2 | Ollama sends `tool_calls` in intermediate chunks (`done: false`), not the final `done: true` chunk | Native API client must accumulate `tool_calls` across all chunks |
| 3 | Ollama defaults `num_ctx` to 4096 tokens regardless of the model's actual context window | Large system prompts + 23 tool definitions get silently truncated, model never sees the tool schemas |
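For root cause #2, the native `/api/chat` stream delivers the tool call on an intermediate chunk rather than the final one, so a client that only inspects the `done: true` chunk misses it. A sketch of the chunk shapes (field names follow Ollama's native API; the values are illustrative):

```ts
// Intermediate chunk: tool_calls arrive here, while done is still false.
const intermediateChunk = {
  message: {
    role: "assistant",
    content: "",
    tool_calls: [
      { function: { name: "read_file", arguments: { path: "src/index.ts" } } },
    ],
  },
  done: false,
};

// Final chunk: done is true, but there are no tool_calls to read off it.
const finalChunk = {
  message: { role: "assistant", content: "" },
  done: true,
  done_reason: "stop",
};
```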
### Proposed solution

Add a dedicated `ollama` API provider type that talks to Ollama's native `/api/chat` endpoint directly, with proper streaming chunk handling and context window configuration.
Config example:

```json
{
  "models": {
    "providers": {
      "ollama": {
        "baseUrl": "http://localhost:11434",
        "api": "ollama",
        "models": [{
          "id": "qwen3:32b",
          "name": "Qwen3 32B",
          "reasoning": true,
          "input": ["text"],
          "contextWindow": 131072,
          "maxTokens": 16384
        }]
      }
    }
  }
}
```

What this enables:
| Aspect | `openai-completions` (current) | `ollama` (proposed) |
|---|---|---|
| Endpoint | `/v1/chat/completions` | `/api/chat` |
| Streaming + Tools | ❌ Broken | ✅ Works |
| Response format | OpenAI schema | Ollama native schema |
| Context window | Not configurable | Set via `num_ctx` from model config |
| Tool call parsing | N/A (dropped) | Accumulates from intermediate chunks |
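For illustration, the request the new provider would send to `/api/chat`, with `num_ctx` taken from the model's `contextWindow` (the `buildRequest` helper and its exact shape are a sketch, not existing OpenClaw code; the body fields follow Ollama's documented native API):

```ts
// Sketch of building the native /api/chat request body. The options.num_ctx
// field overrides Ollama's 4096-token default so large prompts aren't truncated.
function buildRequest(
  model: { id: string; contextWindow?: number },
  messages: unknown[],
  tools: unknown[],
) {
  return {
    model: model.id,
    messages,
    tools,
    stream: true,
    options: { num_ctx: model.contextWindow ?? 65536 },
  };
}
```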
Implementation scope:

- Add `"ollama"` to the `Api` type union
- Create native Ollama API client (request/response mapping)
- Handle streaming chunks — accumulate `tool_calls` from intermediate `done: false` chunks (see the sketch after this list)
- Set `num_ctx` from the model's `contextWindow` config (default 65536) to prevent prompt truncation
- Convert messages/tools between SDK format and Ollama native format
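A minimal sketch of the chunk handling, assuming the NDJSON framing of `/api/chat` responses (the names `streamChat`, `OllamaChunk`, and `OllamaToolCall` are illustrative, not existing OpenClaw code):

```ts
interface OllamaToolCall {
  function: { name: string; arguments: Record<string, unknown> };
}

interface OllamaChunk {
  message?: { role: string; content?: string; tool_calls?: OllamaToolCall[] };
  done: boolean;
}

// Streams a native /api/chat response and accumulates tool_calls across all
// chunks, since they arrive on intermediate done:false chunks.
async function streamChat(baseUrl: string, body: unknown) {
  const res = await fetch(`${baseUrl}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });

  const toolCalls: OllamaToolCall[] = [];
  let text = "";
  let buffer = "";

  // The response body is NDJSON: one JSON chunk per line.
  const reader = res.body!.pipeThrough(new TextDecoderStream()).getReader();
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += value;
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep any partial trailing line for the next read
    for (const line of lines) {
      if (!line.trim()) continue;
      const chunk: OllamaChunk = JSON.parse(line);
      text += chunk.message?.content ?? "";
      if (chunk.message?.tool_calls) {
        toolCalls.push(...chunk.message.tool_calls);
      }
    }
  }
  return { text, toolCalls };
}
```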
### Verified behavior

Tested with qwen3:32b (32B parameters) on a MacBook Pro M4 Pro 48GB:

- ✅ curl → Ollama native API with `num_ctx=65536` + 23 tools + system prompt → `tool_calls` generated correctly
- ✅ Streaming text works with all Ollama models
- ✅ Tool call accumulation from intermediate chunks works
- ✅ All 13 unit tests pass
### Alternatives considered

- PR fix(ollama): add streamToolCalls fallback for tool calling #5783 (`streamToolCalls: false` fallback): disables streaming when tools are present. This works but sacrifices the streaming UX — users see no output until the full response is ready.
- Wait for Ollama to fix `/v1/chat/completions`: tracked in ollama#12557, but no timeline. The native API already works, so there's no reason to wait.
- jokelord's `supportedParameters` patch: adds config-level tool support declaration for local models (sglang/vLLM). Solves a different problem (tool detection) but doesn't fix the streaming issue with Ollama.
### Additional context
- Ollama (and other local models): streaming breaks tool calling — need stream:false fallback #5769 — Original issue: streaming breaks tool calling for Ollama
- fix(ollama): add streamToolCalls fallback for tool calling #5783 — Workaround PR (disable streaming when tools present)
- Ollama streaming tool calling blog post (May 2025)
- ollama/ollama#10415 — Ollama's fix for native API
- ollama/ollama#12557 — OpenAI compat endpoint still broken
- OpenCode issue #1034 — Same `num_ctx` problem reported in the OpenCode project
Tested environment:
- OpenClaw v2026.1.29
- Ollama v0.15.4
- Models: qwen3:32b, glm-4.7-flash, mistral-small3.1:24b, devstral
- OS: macOS (Apple Silicon M4 Pro, 48GB)