
feat(ollama): add native /api/chat provider for streaming + tool calling #11853

Merged
steipete merged 15 commits into openclaw:main from BrokenFinger98:feat/11828-ollama-native-api on Feb 14, 2026

Conversation

@BrokenFinger98 (Contributor) commented Feb 8, 2026

Summary

Adds a native Ollama /api/chat provider to enable streaming + tool calling with local LLMs.

Three critical fixes for Ollama tool calling:

  • Switch from broken OpenAI compat endpoint to native /api/chat (tool_calls were silently dropped)
  • Accumulate tool_calls from intermediate streaming chunks (Ollama sends them in done:false, not done:true)
  • Set num_ctx from model's contextWindow config (Ollama defaults to 4096, truncating system prompts)
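
For illustration, the request shape involved looks roughly like this (field names follow Ollama's documented /api/chat API; the helper name and exact wiring are simplified and illustrative, not the actual implementation in ollama-stream.ts):

interface OllamaTool {
  type: "function";
  function: { name: string; description?: string; parameters?: unknown };
}

// Illustrative sketch: builds an /api/chat request body, forwarding the model's
// configured context window as num_ctx so Ollama does not fall back to 4096.
function buildOllamaChatBody(
  model: { id: string; contextWindow?: number },
  messages: Array<{ role: string; content: string }>,
  tools: OllamaTool[],
) {
  return {
    model: model.id,
    messages,
    tools,
    stream: true,
    options: { num_ctx: model.contextWindow ?? 65536 },
  };
}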

Changes

File Change
src/agents/ollama-stream.ts New native Ollama streaming client with message/tool conversion, NDJSON parser, num_ctx config
src/agents/ollama-stream.test.ts 13 unit tests covering message conversion, response building, and streaming
src/agents/pi-embedded-runner/run/attempt.ts Inject createOllamaStreamFn when model.api === "ollama"
src/agents/providers.ts Register "ollama" as valid API type

Verified with local testing

  • qwen3:32b + 23 tools + system prompt → tool_calls generated correctly (with num_ctx=65536)
  • Streaming text works with all tested Ollama models
  • All 13 unit tests pass, oxlint clean, oxfmt clean

Test plan

  • Unit tests: vitest run src/agents/ollama-stream.test.ts (13 tests)
  • Lint: oxlint src/agents/ollama-stream.ts (0 errors)
  • Format: oxfmt src/agents/ollama-stream.ts
  • curl verification: native API + 23 tools + num_ctx=65536 → tool_calls work
  • E2E: Run OpenClaw with api: "ollama" and verify tool calling loop

Closes #11828
Fixes #4028
Fixes #8630

@openclaw-barnacle bot added the docs (Improvements or additions to documentation) and agents (Agent runtime and tooling) labels on Feb 8, 2026
@greptile-apps bot (Contributor) left a comment


1 file reviewed, 1 comment


@BrokenFinger98 force-pushed the feat/11828-ollama-native-api branch 4 times, most recently from 19d1576 to 9989c63 on February 9, 2026
@jerinoommen22

Is this the reason why all the tools don't work? @BrokenFinger98

@BrokenFinger98
Contributor Author

Yes, this PR addresses the root causes. There were actually 3 issues preventing tools from working with Ollama:

  1. num_ctx default of 4096 — Ollama silently truncates the system prompt + tool definitions when they exceed 4096 tokens. This PR sets num_ctx from the model's configured contextWindow, ensuring the full prompt is processed.

  2. OpenAI-compat endpoint drops tool_calls — The /v1/chat/completions endpoint loses tool call data during streaming. This PR uses the native /api/chat endpoint instead.

  3. tool_calls sent in intermediate chunks — Ollama sends tool_calls in done:false chunks, not in the final done:true chunk. This PR accumulates them across all chunks.
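
Roughly, point 3 in code (a simplified illustration with hypothetical names, not the actual parser in ollama-stream.ts; the chunk fields follow Ollama's documented NDJSON streaming format):

interface OllamaToolCall {
  function: { name: string; arguments: Record<string, unknown> };
}

interface OllamaChunk {
  message?: { content?: string; tool_calls?: OllamaToolCall[] };
  done: boolean;
}

// Collect text and tool_calls across the whole stream, not just the final chunk.
function accumulateChunks(chunks: OllamaChunk[]) {
  let text = "";
  const toolCalls: OllamaToolCall[] = [];
  for (const chunk of chunks) {
    text += chunk.message?.content ?? "";
    // Ollama emits tool_calls on done:false chunks, so gather them here too.
    if (chunk.message?.tool_calls) toolCalls.push(...chunk.message.tool_calls);
    if (chunk.done) break;
  }
  return { text, toolCalls };
}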

Are you also experiencing tool calling issues with Ollama? If so, which model are you using?

@jerinoommen22

@BrokenFinger98 Yeah, I'm running qwen3-coder.

@BrokenFinger98
Contributor Author

@jerinoommen22 Great, qwen3-coder is in the same Qwen3 family, so this PR should directly help your setup too.

Our E2E test results with qwen3:32b on this branch:

  • Tool calling works correctly when requesting in English (e.g., exec tool ran date && whoami && uname -m and returned real output successfully).
  • However, when requesting in non-English languages (we tested Korean), the model often fails to invoke tools and instead hallucinates output. This appears to be a model-level limitation, not a code issue — the native Ollama API is sending the correct tool definitions and the model just doesn't reliably follow tool-use instructions in non-English.

Could you help us test with a larger model? We suspect a 70B model (e.g., llama3.3:70b or qwen3:72b) would handle tool calling more reliably across languages. Unfortunately, our test machine (M4 Pro 48GB) can't comfortably run 70B models.

If you have the hardware for it, please pull this branch and test:

git fetch origin feat/11828-ollama-native-api
git checkout feat/11828-ollama-native-api

Let us know how tool calling behaves with qwen3-coder on this branch!

@stintel

stintel commented Feb 9, 2026

Thanks for this PR. It makes Ollama usable. Tested with devstral-small-2:24b. Unfortunately I don't have access to bigger hardware to test a larger model.

@BrokenFinger98
Contributor Author

@stintel Thanks for testing and confirming! Great to hear it works well with devstral-small-2:24b.

That's now 3 different model families confirmed working with this PR:

Model Tester Status
qwen3:32b @BrokenFinger98 ✅ Tool calling works
GLM-4.7 Flash @blastronaut (via #11828) ✅ Working nicely
devstral-small-2:24b @stintel ✅ Makes Ollama usable

No worries about bigger hardware — the fact that it works across multiple model families at different sizes is already strong validation. Appreciate the feedback!

@lym000000

@BrokenFinger98 I’ve been testing this on a custom Qwen3-4B-Instruct model, and it looks like tool calling doesn’t work unless there’s full compatibility with Ollama’s streaming tool-calling spec:
https://docs.ollama.com/capabilities/tool-calling#tool-calling-with-streaming

This fork + commit works for me:
lym000000@fee6794

After applying the additional fixes, I’m now getting consistent tool calling, even on the low-end 4B model.

It was vibe-coded using Kimi K2.5 Instant with the prompt:

Make it work with Ollama native API for tool calling with streaming https://docs.ollama.com/capabilities/tool-calling#tool-calling-with-streaming


Model: https://huggingface.co/HauhauCS/Qwen3-4B-2507-Instruct-Uncensored-HauhauCS-Aggressive/blob/main/Qwen3-4B-2507-Instruct-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf
Modelfile:

FROM ./Qwen3-4B-2507-Instruct-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf

PARAMETER stop <|im_end|>
PARAMETER stop <|endoftext|>

TEMPLATE """
{{- if .Messages }}
{{- if or .System .Tools }}<|im_start|>system
{{- if .System }}
{{ .System }}
{{- end }}
{{- if .Tools }}

# Tools

You may call one or more functions to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{{- range .Tools }}
{"type": "function", "function": {{ .Function }}}
{{- end }}
</tools>

For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call>
{{- end }}<|im_end|>
{{ end }}
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 -}}
{{- if eq .Role "user" }}<|im_start|>user
{{ .Content }}<|im_end|>
{{ else if eq .Role "assistant" }}<|im_start|>assistant
{{ if .Content }}{{ .Content }}
{{- else if .ToolCalls }}<tool_call>
{{ range .ToolCalls }}{"name": "{{ .Function.Name }}", "arguments": {{ .Function.Arguments }}}
{{ end }}</tool_call>
{{- end }}{{ if not $last }}<|im_end|>
{{ end }}
{{- else if eq .Role "tool" }}<|im_start|>user
<tool_response>
{{ .Content }}
</tool_response><|im_end|>
{{ end }}
{{- if and (ne .Role "assistant") $last }}<|im_start|>assistant
{{ end }}
{{- end }}
{{- else }}
{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
{{ end }}{{ .Response }}{{ if .Response }}<|im_end|>{{ end }}
"""

Why this model: to test tool calling under extreme, low-quality, and high-risk conditions in a containerized setup, to surface edge cases that don’t usually appear with safer or higher-end models.

Why a custom Modelfile: the stock Ollama template didn’t work reliably in my setup.

@blastronaut

blastronaut commented Feb 9, 2026

It would also be nice if this integrated with memory search, for consistency's sake.

Invalid config at /Users/xxx/.openclaw/openclaw.json:\n- agents.defaults.memorySearch.provider: Invalid input

 "memorySearch": {

        "enabled": true,

        "provider": "ollama",

        "model": "qwen3-embedding:0.6b",

        "sources": [

          "memory"

        ]

      }


@BrokenFinger98
Contributor Author

@lym000000 Thanks for the detailed testing and the fork reference! Great to see tool calling working even on a 4B model.

I've adopted your key finding — tool_name in tool result messages. Just pushed commit 83d38c2 which:

  • Adds tool_name field to OllamaChatMessage interface
  • Extracts toolName from SDK toolResult messages and forwards it to the native /api/chat endpoint
  • Includes 2 new test cases covering presence and absence of tool_name
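
In rough terms, the conversion now looks like this (a simplified illustration rather than the merged code verbatim; tool_name is the field from Ollama's native /api/chat message format, and the helper name here is hypothetical):

interface OllamaChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
  tool_name?: string; // lets the model associate a result with the originating tool call
}

// Hypothetical conversion of an SDK toolResult message into an Ollama tool message.
function toOllamaToolMessage(result: { toolName?: string; output: string }): OllamaChatMessage {
  return {
    role: "tool",
    content: result.output,
    ...(result.toolName ? { tool_name: result.toolName } : {}),
  };
}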

Regarding the other changes in your fork:

  • break removal after done:true: Our current approach works reliably across all tested models (qwen3:32b, devstral, GLM-4.7). Keeping the break for now as it's simpler and we haven't seen edge cases where chunks arrive after done:true.
  • thinking field accumulation: Interesting for qwen3 models, but it's a separate concern — we'd want to handle that in a follow-up rather than expanding this PR's scope.

Your custom Modelfile approach for GGUF models is a great workaround. Would be interesting to see if the tool_name fix alone resolves your issues without the custom template.

That brings us to 4 confirmed model families:

Model Tester Status
qwen3:32b @BrokenFinger98 ✅ Tool calling works
GLM-4.7 Flash @blastronaut ✅ Working nicely
devstral-small-2:24b @stintel ✅ Makes Ollama usable
Qwen3-4B (custom GGUF) @lym000000 ✅ Works with additional fixes

@BrokenFinger98
Contributor Author

@blastronaut Good point about memory search consistency. However, memorySearch.provider validation is handled by the config schema layer (src/config/schema.ts), which is outside the scope of this PR. This PR focuses on the streaming/tool-calling client for the native /api/chat endpoint.

Could you open a separate issue for Ollama as a memorySearch.provider? That would involve:

  1. Adding "ollama" to the allowed provider enum in the config schema
  2. Ensuring the embedding endpoint (/api/embed) is supported
  3. Wiring up the Ollama embedding model configuration

Happy to help with that in a follow-up PR!
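
For point 2 above, the embedding call itself would just be a thin wrapper around Ollama's documented /api/embed endpoint. A hypothetical sketch, not existing OpenClaw code:

// Minimal embedding call against Ollama's native embedding endpoint.
async function ollamaEmbed(baseUrl: string, model: string, input: string[]): Promise<number[][]> {
  const res = await fetch(`${baseUrl}/api/embed`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, input }),
  });
  if (!res.ok) throw new Error(`Ollama embed failed: ${res.status}`);
  const data = (await res.json()) as { embeddings: number[][] };
  return data.embeddings;
}

// e.g. ollamaEmbed("http://127.0.0.1:11434", "qwen3-embedding:0.6b", ["hello world"])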

@BrokenFinger98 force-pushed the feat/11828-ollama-native-api branch 2 times, most recently from 6c19c22 to 3c67d50 on February 10, 2026
@lym000000

> I've adopted your key finding — tool_name in tool result messages. Just pushed commit 83d38c2 which:
>
> Regarding the other changes in your fork:
>
>   • break removal after done:true: Our current approach works reliably across all tested models (qwen3:32b, devstral, GLM-4.7). Keeping the break for now as it's simpler and we haven't seen edge cases where chunks arrive after done:true.
>   • thinking field accumulation: Interesting for qwen3 models, but it's a separate concern — we'd want to handle that in a follow-up rather than expanding this PR's scope.

@BrokenFinger98
Pulled and tested, confirmed that it works without the additional changes.

Thanks!

@gonesurfing

This is excellent work! For the first time, I was able to get a small model to use a tool.

Current: ollama/qwen3:4b-instruct-2507-q4_K_M

I'm going to do more testing, but this is already 100 times further than I've gotten so far.

@m08594589-source

> @stintel Thanks for testing and confirming! Great to hear it works well with devstral-small-2:24b.
>
> That's now 3 different model families confirmed working with this PR:
>
> Model Tester Status
> qwen3:32b @BrokenFinger98 ✅ Tool calling works
> GLM-4.7 Flash @blastronaut (via #11828) ✅ Working nicely
> devstral-small-2:24b @stintel ✅ Makes Ollama usable
>
> No worries about bigger hardware — the fact that it works across multiple model families at different sizes is already strong validation. Appreciate the feedback!

Could you take a look at the 'openclaw.json' configuration file? The qwen3-32b-awq model I use always returns <tool_call> instead of calling the tool

@BrokenFinger98 force-pushed the feat/11828-ollama-native-api branch from 3c67d50 to ab6d099 on February 12, 2026
@BrokenFinger98
Contributor Author

@gonesurfing Great to hear! That's now 5 confirmed model families working with the native API:

Model Tester Status
qwen3:32b @BrokenFinger98 ✅ Tool calling works
GLM-4.7 Flash @blastronaut ✅ Working nicely
devstral-small-2:24b @stintel ✅ Makes Ollama usable
Qwen3-4B (custom GGUF) @lym000000 ✅ Works with tool_name fix
qwen3:4b-instruct @gonesurfing ✅ First small model confirmed

Exciting that even a 4B quantized model can handle tool calls with the native API. Looking forward to your further testing results!

@BrokenFinger98
Contributor Author

@m08594589-source The <tool_call> text output (instead of structured JSON tool calls) is a known issue with AWQ-quantized models in Ollama.

Root cause: Ollama's structured tool calling requires a specific chat template baked into the model's Modelfile. The standard qwen3:32b from the Ollama library includes this template, but AWQ imports typically don't — so the model falls back to outputting raw <tool_call> tags as plain text.

Recommended fix:

  1. Use the standard Ollama library version instead:

    ollama pull qwen3:32b
    

    This includes the correct chat template for structured tool calling and works out of the box.

  2. If you specifically need the AWQ version, you'd need a custom Modelfile with the proper tool calling template. But the standard GGUF quantizations (Q4_K_M, Q8_0) generally perform just as well for tool calling.

The openclaw.json configuration itself looks fine — this is an Ollama model template issue, not a provider config issue.

@gonesurfing

ollama/llama3.2:3b-instruct-q4_K_M with 16k context continues to work well with write and exec tools.

ollama/qwen3:1.7b-q4_K_M works with write as long as context is set to 32k. Write won't work at all with 16k. Exec won't work with either, as it gives a sandbox security error (maybe a hallucination?).

Even after stripping down all the tools, the context is still usually over 20k. That's another issue, but it affects tool reliability when the context gets cut off between Ollama and OpenClaw.

I still think this PR fixes the underlying inability of openclaw to use tools with local ollama models.

BrokenFinger98 and others added 9 commits February 14, 2026 01:20
)

- Handle SDK "toolResult" role (camelCase) in message conversion
- Replace module-level mutable counter with crypto.randomUUID()
- Extract and pass tools from context to Ollama request body
- Unify duplicate OLLAMA_BASE_URL constants
- Remove unused SimpleStreamOptions import
- Add warning logs for malformed NDJSON lines
- Fix tool call test assertions (empty content, UUID format)
…penclaw#11828)

- Use createAssistantMessageEventStream() factory instead of class constructor
- Align content types: toolCall (not tool_use), arguments (not input)
- Use SDK StopReason: "toolUse" (not "end_turn"), "error" event type
- Cast context through unknown to satisfy strict type checks
- Import randomUUID from node:crypto explicitly
- Remove unused Context import
- Fix oxlint: remove useless spread fallback and unnecessary type assertions
- Fix oxfmt formatting
…penclaw#11828)

Ollama sends tool_calls in done:false chunks, not in the final done:true
chunk. The previous code only checked the final chunk, silently dropping
all tool call responses. Also removes debug console.warn logging.
…nclaw#11828)

Ollama defaults num_ctx to 4096 tokens, which silently truncates
large system prompts and tool definitions. This caused local models
to miss tool schemas entirely and respond with plain text instead
of tool_calls.

Set num_ctx from the model's configured contextWindow (fallback 65536)
so the full prompt + all tool definitions fit in the context.
…pec (openclaw#11828)

Ollama's native /api/chat spec accepts tool_name on tool result messages
to help models associate results with the originating tool call. Extract
toolName from SDK toolResult messages and forward it to improve tool
calling reliability, especially on smaller models (e.g. 4B).
…licate name (openclaw#11828)

CI code-size check flags duplicate function names across files.
Renamed to avoid collision with ui/src/ui/chat/grouped-render.ts.
@steipete force-pushed the feat/11828-ollama-native-api branch from 33fbf09 to 0a723f9 on February 14, 2026
@steipete merged commit 1170229 into openclaw:main on Feb 14, 2026
9 checks passed
@steipete
Contributor

Merged via squash.

Thanks @BrokenFinger98!

steipete added a commit to azade-c/openclaw that referenced this pull request Feb 14, 2026
…ing (openclaw#11853)

Merged via /review-pr -> /prepare-pr -> /merge-pr.

Prepared head SHA: 0a723f9
Co-authored-by: BrokenFinger98 <[email protected]>
Co-authored-by: steipete <[email protected]>
Reviewed-by: @steipete
@gonesurfing

gonesurfing commented Feb 14, 2026

Thanks for merging this. Confirmed working.

mverrilli pushed a commit to mverrilli/openclaw that referenced this pull request Feb 14, 2026
…ing (openclaw#11853)

Merged via /review-pr -> /prepare-pr -> /merge-pr.

Prepared head SHA: 0a723f9
Co-authored-by: BrokenFinger98 <[email protected]>
Co-authored-by: steipete <[email protected]>
Reviewed-by: @steipete
GwonHyeok pushed a commit to learners-superpumped/openclaw that referenced this pull request Feb 15, 2026
…ing (openclaw#11853)

Merged via /review-pr -> /prepare-pr -> /merge-pr.

Prepared head SHA: 0a723f9
Co-authored-by: BrokenFinger98 <[email protected]>
Co-authored-by: steipete <[email protected]>
Reviewed-by: @steipete
@BrokenFinger98
Contributor Author

Thank you @steipete for merging this and for the kind words! It's been a great experience contributing to OpenClaw.

Huge thanks to everyone who tested and provided feedback throughout this PR — @gonesurfing, @stintel, @lym000000, @jerinoommen22, and @ShadowJonathan. The community validation across 6 model families really helped ensure this was production-ready.

Looking forward to continuing to contribute!

@BrokenFinger98
Contributor Author

@ShadowJonathan Thanks for the review suggestion about custom provider names!

The merged code already addresses your case — it checks params.model.baseUrl as the first priority, so custom provider aliases (e.g. ollama-remote instead of ollama) resolve correctly:

const modelBaseUrl =
  typeof params.model.baseUrl === "string" ? params.model.baseUrl.trim() : "";
const providerBaseUrl =
  typeof providerConfig?.baseUrl === "string" ? providerConfig.baseUrl.trim() : "";
const ollamaBaseUrl = modelBaseUrl || providerBaseUrl || OLLAMA_NATIVE_BASE_URL;

Resolution order: model.baseUrl → providerConfig.baseUrl → OLLAMA_NATIVE_BASE_URL

Could you confirm it works on your setup with the merged version? If you're still seeing issues, happy to look into it further.

@ShadowJonathan

It works with my setup, yes

cloud-neutral pushed a commit to cloud-neutral-toolkit/openclawbot.svc.plus that referenced this pull request Feb 15, 2026
…ing (openclaw#11853)

Merged via /review-pr -> /prepare-pr -> /merge-pr.

Prepared head SHA: 0a723f9
Co-authored-by: BrokenFinger98 <[email protected]>
Co-authored-by: steipete <[email protected]>
Reviewed-by: @steipete
jiulingyun added a commit to jiulingyun/openclaw-cn that referenced this pull request Feb 15, 2026
@anilkumar-info

Hi all, I'm a newbie. Can someone please walk me through, step by step, how to make Ollama work without the 4K context window? It still happens even after following the steps I found via Google, ChatGPT, etc.

@BrokenFinger98
Contributor Author

@anilkumar-info Welcome! This PR actually solves the 4K context window issue you're hitting. Here's how to set it up:

1. Update OpenClaw to the latest version

npm install -g @anthropics/openclaw@latest

2. Configure Ollama as a native provider

Add this to your ~/.openclaw/openclaw.json:

{
  "models": {
    "providers": {
      "ollama": {
        "baseUrl": "http://127.0.0.1:11434",
        "api": "ollama",
        "apiKey": "ollama-local",
        "models": [
          {
            "id": "YOUR_MODEL_NAME",
            "name": "Your Model",
            "reasoning": false,
            "input": ["text"],
            "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
            "contextWindow": 131072,
            "maxTokens": 8192
          }
        ]
      }
    }
  }
}

Replace YOUR_MODEL_NAME with your Ollama model (e.g. qwen3:32b, llama3.2:3b, etc.).

Why this works

The key setting is "api": "ollama" (not "openai-completions"). This tells OpenClaw to use the native Ollama API (/api/chat) instead of the OpenAI-compatible layer (/v1/chat/completions).

With the native API, OpenClaw automatically sets num_ctx to your configured contextWindow value (e.g. 131072), overriding Ollama's default 4096.

3. Set your default model

{
  "agents": {
    "defaults": {
      "model": {
        "primary": "ollama/YOUR_MODEL_NAME"
      }
    }
  }
}

That's it! Tool calling should also work reliably with this setup.
