### Summary
OpenClaw currently connects to Ollama through the OpenAI compatibility layer (`/v1/chat/completions`), which silently drops tool calls when streaming is enabled. Since OpenClaw hardcodes `stream: true`, no Ollama model can use tools — the model decides to call a tool, but the streaming response returns empty content with `finish_reason: "stop"`, losing the tool call entirely.

Meanwhile, Ollama's native API (`/api/chat`) has fully supported streaming + tool calling since May 2025 (blog post, PR ollama/ollama#10415). The problem isn't Ollama — it's that OpenClaw routes through a broken compatibility layer instead of using the native endpoint.
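Concretely, the failure mode described above looks roughly like this on the compat layer's stream: the final SSE chunk carries an empty delta and `finish_reason: "stop"`, with no `tool_calls` anywhere (an illustrative shape, not a captured response):

```ts
// Illustrative final chunk from the /v1/chat/completions stream when the model
// actually decided to call a tool: empty delta, finish_reason "stop", no tool_calls.
const lastCompatChunk = {
  object: "chat.completion.chunk",
  choices: [{ index: 0, delta: {}, finish_reason: "stop" }],
};
```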
### Root causes identified (3 issues)

| # | Problem | Impact |
|---|---------|--------|
| 1 | OpenAI compat endpoint drops `tool_calls` when streaming | Tool calls silently lost — model produces them, response doesn't contain them |
| 2 | Ollama sends `tool_calls` in intermediate chunks (`done: false`), not the final `done: true` chunk | Native API client must accumulate `tool_calls` across all chunks |
| 3 | Ollama defaults `num_ctx` to 4096 tokens regardless of the model's actual context window | Large system prompts + 23 tool definitions get silently truncated, model never sees the tool schemas |
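For root cause #2, the native `/api/chat` stream delivers the tool call on an intermediate chunk rather than the final one, so a client that only inspects the `done: true` chunk misses it. A sketch of the chunk shapes (field names follow Ollama's native API; the values are illustrative):

```ts
// Intermediate chunk: tool_calls arrive here, while done is still false.
const intermediateChunk = {
  message: {
    role: "assistant",
    content: "",
    tool_calls: [
      { function: { name: "read_file", arguments: { path: "src/index.ts" } } },
    ],
  },
  done: false,
};

// Final chunk: done is true, but there are no tool_calls to read off it.
const finalChunk = {
  message: { role: "assistant", content: "" },
  done: true,
  done_reason: "stop",
};
```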
### Proposed solution

Add a dedicated `ollama` API provider type that talks to Ollama's native `/api/chat` endpoint directly, with proper streaming chunk handling and context window configuration.
Config example:

```json
{
  "models": {
    "providers": {
      "ollama": {
        "baseUrl": "http://localhost:11434",
        "api": "ollama",
        "models": [{
          "id": "qwen3:32b",
          "name": "Qwen3 32B",
          "reasoning": true,
          "input": ["text"],
          "contextWindow": 131072,
          "maxTokens": 16384
        }]
      }
    }
  }
}
```

What this enables:
| Aspect | `openai-completions` (current) | `ollama` (proposed) |
|---|---|---|
| Endpoint | `/v1/chat/completions` | `/api/chat` |
| Streaming + Tools | ❌ Broken | ✅ Works |
| Response format | OpenAI schema | Ollama native schema |
| Context window | Not configurable | Set via `num_ctx` from model config |
| Tool call parsing | N/A (dropped) | Accumulates from intermediate chunks |
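For illustration, the request the new provider would send to `/api/chat`, with `num_ctx` taken from the model's `contextWindow` (the `buildRequest` helper and its exact shape are a sketch, not existing OpenClaw code; the body fields follow Ollama's documented native API):

```ts
// Sketch of building the native /api/chat request body. The options.num_ctx
// field overrides Ollama's 4096-token default so large prompts aren't truncated.
function buildRequest(
  model: { id: string; contextWindow?: number },
  messages: unknown[],
  tools: unknown[],
) {
  return {
    model: model.id,
    messages,
    tools,
    stream: true,
    options: { num_ctx: model.contextWindow ?? 65536 },
  };
}
```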
Implementation scope:

- Add `"ollama"` to the `Api` type union
- Create native Ollama API client (request/response mapping)
- Handle streaming chunks — accumulate `tool_calls` from intermediate `done: false` chunks (see the sketch after this list)
- Set `num_ctx` from the model's `contextWindow` config (default 65536) to prevent prompt truncation
- Convert messages/tools between SDK format and Ollama native format
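A minimal sketch of the chunk handling, assuming the NDJSON framing of `/api/chat` responses (the names `streamChat`, `OllamaChunk`, and `OllamaToolCall` are illustrative, not existing OpenClaw code):

```ts
interface OllamaToolCall {
  function: { name: string; arguments: Record<string, unknown> };
}

interface OllamaChunk {
  message?: { role: string; content?: string; tool_calls?: OllamaToolCall[] };
  done: boolean;
}

// Streams a native /api/chat response and accumulates tool_calls across all
// chunks, since they arrive on intermediate done:false chunks.
async function streamChat(baseUrl: string, body: unknown) {
  const res = await fetch(`${baseUrl}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });

  const toolCalls: OllamaToolCall[] = [];
  let text = "";
  let buffer = "";

  // The response body is NDJSON: one JSON chunk per line.
  const reader = res.body!.pipeThrough(new TextDecoderStream()).getReader();
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += value;
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep any partial trailing line for the next read
    for (const line of lines) {
      if (!line.trim()) continue;
      const chunk: OllamaChunk = JSON.parse(line);
      text += chunk.message?.content ?? "";
      if (chunk.message?.tool_calls) {
        toolCalls.push(...chunk.message.tool_calls);
      }
    }
  }
  return { text, toolCalls };
}
```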
### Verified behavior

Tested with qwen3:32b (32B parameters) on a MacBook Pro M4 Pro 48GB:

- ✅ curl → Ollama native API with `num_ctx=65536` + 23 tools + system prompt → `tool_calls` generated correctly
- ✅ Streaming text works with all Ollama models
- ✅ Tool call accumulation from intermediate chunks works
- ✅ All 13 unit tests pass
### Alternatives considered

- PR fix(ollama): add streamToolCalls fallback for tool calling #5783 (`streamToolCalls: false` fallback): disables streaming when tools are present. This works but sacrifices the streaming UX — users see no output until the full response is ready.
- Wait for Ollama to fix `/v1/chat/completions`: tracked in ollama#12557, but no timeline. The native API already works, so there's no reason to wait.
- jokelord's `supportedParameters` patch: adds config-level tool support declaration for local models (sglang/vLLM). Solves a different problem (tool detection) but doesn't fix the streaming issue with Ollama.
### Additional context
- Ollama (and other local models): streaming breaks tool calling — need stream:false fallback #5769 — Original issue: streaming breaks tool calling for Ollama
- fix(ollama): add streamToolCalls fallback for tool calling #5783 — Workaround PR (disable streaming when tools present)
- Ollama streaming tool calling blog post (May 2025)
- ollama/ollama#10415 — Ollama's fix for native API
- ollama/ollama#12557 — OpenAI compat endpoint still broken
- OpenCode issue #1034 — Same `num_ctx` problem reported in the OpenCode project
Tested environment:
- OpenClaw v2026.1.29
- Ollama v0.15.4
- Models: qwen3:32b, glm-4.7-flash, mistral-small3.1:24b, devstral
- OS: macOS (Apple Silicon M4 Pro, 48GB)