[Bug]: reasoning.effort not forwarded to Ollama — only minimal thinking despite thinking=high #13575
Description
OpenClaw logs show thinking=high for embedded runs, but the reasoning.effort parameter is not being forwarded to Ollama's /v1/responses API. The model produces only minimal thinking (1 sentence, ~30 tokens) instead of deep chain-of-thought reasoning (200+ tokens). Direct API calls to the same Ollama endpoint with "reasoning": {"effort": "high"} produce full reasoning output.
Environment
- OpenClaw version: 2026.2.9
- Ollama version: 0.15.6
- Model: `glm-4.7-flash-thinking` (custom Modelfile with `$.IsThinkSet` / `$.Think` template, based on `glm-4.7-flash:latest`)
- API mode: `openai-responses`
- Channel: Nextcloud Talk
- OS: Ubuntu 24.04 (VM)
Config (relevant parts)
```json
{
  "models": {
    "providers": {
      "ollama": {
        "baseUrl": "http://ollama.internal:11434/v1",
        "api": "openai-responses",
        "models": [{
          "id": "glm-4.7-flash-thinking",
          "reasoning": true,
          "contextWindow": 128000
        }]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": { "primary": "ollama/glm-4.7-flash-thinking" },
      "thinkingDefault": "high"
    }
  }
}
```
Steps to Reproduce
1. Configure the Ollama provider with `"api": "openai-responses"` and a reasoning-capable model
2. Set `agents.defaults.thinkingDefault` to `high`
3. Set `reasoning: true` on the model entry
4. Send a message via Nextcloud Talk (or any channel)
5. Observe run duration and session history
Expected Behavior
Runs should take 2-10 seconds with detailed chain-of-thought reasoning (200+ tokens). The reasoning.effort: "high" parameter should be included in the HTTP request to Ollama's /v1/responses endpoint.
Actual Behavior
Gateway logs show thinking=high, but runs complete in ~400ms with only minimal thinking output.
Gateway logs:

```
embedded run start: model=glm-4.7-flash-thinking thinking=high messageChannel=nextcloud-talk
embedded run done: durationMs=402 aborted=false
```
Session history shows minimal thinking (1 sentence instead of detailed reasoning):
```json
{
  "type": "thinking",
  "thinking": "Manuel fragt nach 9 mal 20. Das ist 180. Ich antworte kurz auf Deutsch."
}
// ("Manuel asks about 9 times 20. That is 180. I answer briefly in German.")
// usage: {"input": 9429, "output": 39}
```

This minimal thinking comes from the model's `<think>` template tags, not from the `reasoning.effort` API parameter. The effort level is not reaching Ollama.
Proof: Ollama works correctly when called directly
/v1/responses (same endpoint OpenClaw uses):

```shell
curl http://ollama.internal:11434/v1/responses -d '{
  "model": "glm-4.7-flash-thinking",
  "input": "Was ist 15 * 37?",
  "reasoning": {"effort": "high"},
  "stream": false
}'
# Result: 313 tokens (290 reasoning, multi-step with verification), 2.0 seconds
```

/v1/chat/completions:

```shell
curl http://ollama.internal:11434/v1/chat/completions -d '{
  "model": "glm-4.7-flash-thinking",
  "messages": [{"role": "user", "content": "Was ist 15 * 37?"}],
  "stream": false
}'
# Result: reasoning field with 289 tokens, 1.86 seconds
```

Both Ollama API endpoints produce detailed reasoning output when called directly. The `reasoning.effort` parameter is not reaching Ollama through OpenClaw.
Comparison
| Source | Thinking | Output tokens | Duration |
|---|---|---|---|
| Ollama direct (effort: high) | Detailed, 3 methods, verification | 290+ | 2-10s |
| OpenClaw (thinking=high) | 1 sentence, no detail | ~30 | ~400ms |
Root Cause Analysis
The `reasoning.effort` parameter is not being included in the HTTP request body sent to Ollama, despite being configured and tracked internally by OpenClaw.
The call chain and where it breaks
1. `thinkingDefault: "high"` → stored in `sessionEntry.thinkingLevel`
2. `mapThinkingLevel()` in `src/agents/pi-embedded-runner/utils.ts` passes `"high"` through
3. `Agent._runLoop()` in `packages/agent/src/agent.ts` sets `reasoning: "high"` in `AgentLoopConfig`
4. `streamSimpleOpenAIResponses()` in `packages/ai/src/providers/openai-responses.ts` should map this to `reasoningEffort: "high"` — likely broken here?
5. `buildParams()` in the same file gates on `model.reasoning` AND `options.reasoningEffort`:

```ts
if (model.reasoning) {
  if (options?.reasoningEffort || options?.reasoningSummary) {
    params.reasoning = {
      effort: options?.reasoningEffort || "medium",
    };
  }
}
```
Since we observe some thinking (from the template's `<think>` tags), `model.reasoning` is likely `true` (set via user config). But `options.reasoningEffort` appears to be `undefined` when it reaches `buildParams()`, so the inner if-block is never entered and `params.reasoning` is never added to the request.
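To make the gating behavior concrete, here is a minimal standalone reproduction of the gate from the snippet above. The type shapes are simplified assumptions for illustration, not OpenClaw's real interfaces:

```typescript
type ReasoningEffort = "low" | "medium" | "high";

interface ModelLike {
  reasoning: boolean;
}
interface OptionsLike {
  reasoningEffort?: ReasoningEffort;
  reasoningSummary?: string;
}

// Mirrors the two-level gate in buildParams(): the reasoning parameter is
// only serialized when model.reasoning is true AND an effort/summary is set.
function buildReasoningParam(model: ModelLike, options?: OptionsLike) {
  const params: { reasoning?: { effort: ReasoningEffort } } = {};
  if (model.reasoning) {
    if (options?.reasoningEffort || options?.reasoningSummary) {
      params.reasoning = { effort: options?.reasoningEffort ?? "medium" };
    }
  }
  return params;
}

// The observed bug: model.reasoning is true, but reasoningEffort arrives
// undefined, so no reasoning param is emitted at all.
console.log(buildReasoningParam({ reasoning: true }, {})); // {}
// A working chain would pass the effort through:
console.log(buildReasoningParam({ reasoning: true }, { reasoningEffort: "high" }));
```

This explains why the request can silently lack `reasoning.effort` even though both `model.reasoning` and the thinking level look correct in the logs.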
Two possible locations for the mapping failure
Location A: src/agents/pi-embedded-runner/model.ts (~line 180)
The fallback model construction for Ollama models not in the built-in catalog hardcodes `reasoning: false`:
```ts
const fallbackModel: Model<Api> = normalizeModelCompat({
  // ...
  reasoning: false, // ← ignores user config
  // ...
});
```

If the fallback path is used instead of the inline provider path, this blocks the entire reasoning block in `buildParams()`. However, the minimal thinking we observe suggests the inline path IS being used and `model.reasoning` is `true`.
Location B: Thinking-to-effort mapping (between steps 3-4)
Even with `model.reasoning: true`, the mapping from OpenClaw's internal `thinkingLevel: "high"` to the Responses API's `reasoningEffort: "high"` may be broken for Ollama/custom providers. The `reasoningEffort` value arrives as `undefined` at `buildParams()`, so the effort parameter is never serialized into the HTTP request.
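A sketch of what the missing step 3→4 hop would need to guarantee. The function name and types here are hypothetical illustrations, not OpenClaw's actual code; the point is that every non-off level must survive the translation instead of being dropped to `undefined`:

```typescript
type ThinkingLevel = "off" | "low" | "medium" | "high";
type ReasoningEffort = "low" | "medium" | "high";

// "off" or unset must map to undefined so buildParams() omits the parameter;
// any other level should pass through unchanged so the effort reaches Ollama.
function thinkingLevelToReasoningEffort(
  level: ThinkingLevel | undefined,
): ReasoningEffort | undefined {
  if (!level || level === "off") return undefined;
  return level;
}

console.log(thinkingLevelToReasoningEffort("high")); // high
```

If any intermediate layer performs this mapping only for known catalog models (or only for certain providers), custom Ollama models would hit exactly the `undefined` behavior observed here.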
Secondary gate: src/agents/model-selection.ts
`resolveThinkingDefault()` checks `candidate?.reasoning` from the model catalog, not from the inline provider config. For unknown Ollama models this defaults to `"off"`, which is why `agents.defaults.thinkingDefault: "high"` is also needed in the config.
Suggested Fix
In `src/agents/pi-embedded-runner/model.ts`, the generic fallback in `resolveModel()` should propagate all inline provider config properties instead of hardcoding defaults:
```diff
+const matchedModel = (providerCfg?.models ?? []).find(m => m.id === modelId);
+
 const fallbackModel: Model<Api> = normalizeModelCompat({
   id: modelId,
   name: modelId,
   api: providerCfg?.api ?? "openai-responses",
   provider,
   baseUrl: providerCfg?.baseUrl,
-  reasoning: false,
+  reasoning: matchedModel?.reasoning ?? false,
-  input: ["text"],
+  input: matchedModel?.input ?? ["text"],
-  cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
+  cost: matchedModel?.cost ?? { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
-  contextWindow: providerCfg?.models?.[0]?.contextWindow ?? DEFAULT_CONTEXT_TOKENS,
+  contextWindow: matchedModel?.contextWindow ?? DEFAULT_CONTEXT_TOKENS,
-  maxTokens: providerCfg?.models?.[0]?.maxTokens ?? DEFAULT_CONTEXT_TOKENS,
+  maxTokens: matchedModel?.maxTokens ?? DEFAULT_CONTEXT_TOKENS,
 } as Model<Api>);
```

This ensures that user-configured model properties (especially `reasoning: true`) are respected in the fallback path, instead of being silently overridden.
Why the inline match at step 2 may fail
The inline model match in resolveModel() uses a strict comparison:
```ts
const inlineMatch = inlineModels.find(
  (entry) => normalizeProviderId(entry.provider) === normalizedProvider && entry.id === modelId,
);
```

If provider normalization causes a mismatch (e.g., `"ollama"` vs `"Ollama"` or prefix handling), step 2 fails silently and the fallback at step 5 fires with `reasoning: false`. The suggested fix catches this case by looking up the model properties directly from `providerCfg.models`.
Possibly related issues
- Ollama silently truncates context to 4096 tokens: num_ctx not passed via OpenAI-compatible API #4028 (same pattern of provider config being ignored in the fallback)
- Ollama provider: model runs but produces no output #8505 / Ollama models return "(no output)": enforceFinalTag incorrectly applied to all Ollama models #2279
- Discussion #4332 (think value "low" not supported)