[Bug]: reasoning.effort not forwarded to Ollama — only minimal thinking despite thinking=high #13575

@manuelcherubim

Description

OpenClaw logs show thinking=high for embedded runs, but the reasoning.effort parameter is not being forwarded to Ollama's /v1/responses API. The model produces only minimal thinking (1 sentence, ~30 tokens) instead of deep chain-of-thought reasoning (200+ tokens). Direct API calls to the same Ollama endpoint with "reasoning": {"effort": "high"} produce full reasoning output.

Environment

  • OpenClaw version: 2026.2.9
  • Ollama version: 0.15.6
  • Model: glm-4.7-flash-thinking (custom Modelfile with $.IsThinkSet/$.Think template, based on glm-4.7-flash:latest)
  • API mode: openai-responses
  • Channel: Nextcloud Talk
  • OS: Ubuntu 24.04 (VM)

Config (relevant parts)

{
  "models": {
    "providers": {
      "ollama": {
        "baseUrl": "http://ollama.internal:11434/v1",
        "api": "openai-responses",
        "models": [{
          "id": "glm-4.7-flash-thinking",
          "reasoning": true,
          "contextWindow": 128000
        }]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": { "primary": "ollama/glm-4.7-flash-thinking" },
      "thinkingDefault": "high"
    }
  }
}

Steps to Reproduce

  1. Configure Ollama provider with "api": "openai-responses" and a reasoning-capable model
  2. Set agents.defaults.thinkingDefault to high
  3. Set reasoning: true on the model entry
  4. Send a message via Nextcloud Talk (or any channel)
  5. Observe run duration and session history

Expected Behavior

Runs should take 2-10 seconds with detailed chain-of-thought reasoning (200+ tokens). The reasoning.effort: "high" parameter should be included in the HTTP request to Ollama's /v1/responses endpoint.

Actual Behavior

Gateway logs show thinking=high, but runs complete in ~400ms with only minimal thinking output.

Gateway logs:

embedded run start: model=glm-4.7-flash-thinking thinking=high messageChannel=nextcloud-talk
embedded run done: durationMs=402 aborted=false

Session history shows minimal thinking (1 sentence instead of detailed reasoning):

{
  "type": "thinking",
  "thinking": "Manuel fragt nach 9 mal 20. Das ist 180. Ich antworte kurz auf Deutsch."
}
// usage: {"input": 9429, "output": 39}
// translation: "Manuel asks what 9 times 20 is. That is 180. I'll answer briefly in German."

This minimal thinking comes from the model's <think> template tags, not from the reasoning.effort API parameter. The effort level is not reaching Ollama.

Proof: Ollama works correctly when called directly

/v1/responses (same endpoint OpenClaw uses)

curl http://ollama.internal:11434/v1/responses -d '{
  "model": "glm-4.7-flash-thinking",
  "input": "Was ist 15 * 37?",
  "reasoning": {"effort": "high"},
  "stream": false
}'
# Result: 313 tokens (290 reasoning, multi-step with verification), 2.0 seconds

/v1/chat/completions

curl http://ollama.internal:11434/v1/chat/completions -d '{
  "model": "glm-4.7-flash-thinking",
  "messages": [{"role": "user", "content": "Was ist 15 * 37?"}],
  "stream": false
}'
# Result: reasoning field with 289 tokens, 1.86 seconds

Both Ollama API endpoints produce detailed reasoning output when called directly. The reasoning.effort parameter is not reaching Ollama through OpenClaw.

Comparison

| Source | Thinking | Output tokens | Duration |
|---|---|---|---|
| Ollama direct (effort: high) | Detailed, 3 methods, verification | 290+ | 2-10s |
| OpenClaw (thinking=high) | 1 sentence, no detail | ~30 | ~400ms |

Root Cause Analysis

The reasoning.effort parameter is not being included in the HTTP request body sent to Ollama, despite being configured and tracked internally by OpenClaw.

The call chain and where it breaks

  1. thinkingDefault: "high" → stored in sessionEntry.thinkingLevel
  2. mapThinkingLevel() in src/agents/pi-embedded-runner/utils.ts passes "high" through
  3. Agent._runLoop() in packages/agent/src/agent.ts sets reasoning: "high" in AgentLoopConfig
  4. streamSimpleOpenAIResponses() in packages/ai/src/providers/openai-responses.ts should map this to reasoningEffort: "high" (likely broken here?)
  5. buildParams() in the same file gates on model.reasoning AND options.reasoningEffort:
    if (model.reasoning) {
        if (options?.reasoningEffort || options?.reasoningSummary) {
            params.reasoning = {
                effort: options?.reasoningEffort || "medium",
            };
        }
    }

Since we observe some thinking (from template <think> tags), model.reasoning is likely true (set via user config). But options.reasoningEffort appears to be undefined when it reaches buildParams(), so the inner if-block is never entered and params.reasoning is never added to the request.
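The gate can be reproduced in isolation. The following is a hypothetical standalone sketch (the types and the function name are mine, not OpenClaw's) showing that with model.reasoning: true but an undefined reasoningEffort, no reasoning block is ever serialized into the request:

```typescript
// Standalone reproduction of the gating logic quoted above.
// Names and types are illustrative assumptions, not the actual source.
type ReasoningEffort = "low" | "medium" | "high";

interface ModelInfo { reasoning: boolean; }
interface StreamOptions { reasoningEffort?: ReasoningEffort; reasoningSummary?: string; }

function buildReasoningParam(model: ModelInfo, options?: StreamOptions) {
  const params: { reasoning?: { effort: ReasoningEffort } } = {};
  if (model.reasoning) {
    // Both gates must pass; an undefined effort skips this block entirely.
    if (options?.reasoningEffort || options?.reasoningSummary) {
      params.reasoning = { effort: options?.reasoningEffort || "medium" };
    }
  }
  return params;
}

// The observed bug state: reasoning enabled, but effort lost in the mapping.
console.log(JSON.stringify(buildReasoningParam({ reasoning: true }, {}))); // {}

// Once the effort survives, the block is serialized as Ollama expects.
console.log(JSON.stringify(buildReasoningParam({ reasoning: true }, { reasoningEffort: "high" })));
// {"reasoning":{"effort":"high"}}
```

This matches the observed symptoms: the model still emits its template-driven &lt;think&gt; snippet, but the HTTP body carries no reasoning.effort at all.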

Two possible locations for the mapping failure

Location A: src/agents/pi-embedded-runner/model.ts (~line 180)

The fallback model construction for Ollama models not in the built-in catalog hardcodes reasoning: false:

const fallbackModel: Model<Api> = normalizeModelCompat({
  // ...other hardcoded defaults elided...
  reasoning: false,  // ← ignores user config
});

If the fallback path is used instead of the inline provider path, this blocks the entire reasoning block in buildParams(). However, the minimal thinking we observe suggests the inline path IS being used and model.reasoning is true.

Location B: Thinking-to-effort mapping (between steps 3 and 4)

Even with model.reasoning: true, the mapping from OpenClaw's internal thinkingLevel: "high" to the Responses API's reasoningEffort: "high" may be broken for Ollama/custom providers. The reasoningEffort value arrives as undefined at buildParams(), so the effort parameter is never serialized into the HTTP request.

Secondary gate: src/agents/model-selection.ts

resolveThinkingDefault() checks candidate?.reasoning from the model catalog, not from the inline provider config. For unknown Ollama models this defaults to "off", which is why agents.defaults.thinkingDefault: "high" is also needed in the config.
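A hypothetical sketch of that gate (catalog shape and function name are assumptions, not the actual model-selection.ts code) shows why a custom Ollama model collapses to "off":

```typescript
// Sketch of a catalog-only reasoning gate. Illustrative only: the real
// resolveThinkingDefault() signature is not shown in this report.
interface CatalogEntry {
  id: string;
  reasoning: boolean;
}

function resolveThinkingDefaultSketch(
  catalog: CatalogEntry[],
  modelId: string,
  configuredDefault: "off" | "low" | "medium" | "high",
): string {
  // Only the built-in catalog is consulted; inline provider config is ignored.
  const candidate = catalog.find((m) => m.id === modelId);
  return candidate?.reasoning ? configuredDefault : "off";
}

// A custom Modelfile model is absent from the built-in catalog:
console.log(resolveThinkingDefaultSketch([], "glm-4.7-flash-thinking", "high")); // "off"
```

This is why the explicit agents.defaults.thinkingDefault: "high" override is required at all for models outside the catalog.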

Suggested Fix

In src/agents/pi-embedded-runner/model.ts, the generic fallback in resolveModel() should propagate all inline provider config properties instead of hardcoding defaults:

+const matchedModel = (providerCfg?.models ?? []).find(m => m.id === modelId);
+
 const fallbackModel: Model<Api> = normalizeModelCompat({
   id: modelId,
   name: modelId,
   api: providerCfg?.api ?? "openai-responses",
   provider,
   baseUrl: providerCfg?.baseUrl,
-  reasoning: false,
+  reasoning: matchedModel?.reasoning ?? false,
-  input: ["text"],
+  input: matchedModel?.input ?? ["text"],
-  cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
+  cost: matchedModel?.cost ?? { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
-  contextWindow: providerCfg?.models?.[0]?.contextWindow ?? DEFAULT_CONTEXT_TOKENS,
+  contextWindow: matchedModel?.contextWindow ?? DEFAULT_CONTEXT_TOKENS,
-  maxTokens: providerCfg?.models?.[0]?.maxTokens ?? DEFAULT_CONTEXT_TOKENS,
+  maxTokens: matchedModel?.maxTokens ?? DEFAULT_CONTEXT_TOKENS,
 } as Model<Api>);

This ensures that user-configured model properties (especially reasoning: true) are respected in the fallback path, instead of being silently overridden.
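With the provider config from this report, the proposed lookup resolves the inline entry's properties as intended. A minimal sketch, with types reduced to plain objects for illustration:

```typescript
// Minimal demonstration of the proposed matchedModel lookup, fed with the
// "ollama" provider block from this report. Types are simplified sketches.
interface InlineModel {
  id: string;
  reasoning?: boolean;
  contextWindow?: number;
}

const providerCfg: { models?: InlineModel[] } = {
  models: [{ id: "glm-4.7-flash-thinking", reasoning: true, contextWindow: 128000 }],
};

const modelId = "glm-4.7-flash-thinking";
const matchedModel = (providerCfg?.models ?? []).find((m) => m.id === modelId);

// The fallback now inherits the user's flags instead of hardcoded defaults:
console.log(matchedModel?.reasoning ?? false); // true
console.log(matchedModel?.contextWindow ?? 4096); // 128000
```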

Why the inline match at step 2 may fail

The inline model match in resolveModel() uses a strict comparison:

const inlineMatch = inlineModels.find(
  (entry) => normalizeProviderId(entry.provider) === normalizedProvider && entry.id === modelId,
);

If provider normalization causes a mismatch (e.g., "ollama" vs "Ollama" or prefix handling), step 2 fails silently and the fallback at step 5 fires with reasoning: false. The suggested fix catches this case by looking up the model properties directly from providerCfg.models.
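The prefix-handling failure mode can be sketched concretely (values are hypothetical; the real session-side model reference format is not shown in this report):

```typescript
// Illustrative sketch: if the session carries a provider-prefixed model ref
// while the inline entry stores the bare id, the strict id comparison fails
// silently and the reasoning: false fallback fires.
const inlineModels = [{ provider: "ollama", id: "glm-4.7-flash-thinking" }];
const normalizedProvider = "ollama";

function findInline(modelId: string) {
  return inlineModels.find(
    (entry) => entry.provider === normalizedProvider && entry.id === modelId,
  );
}

console.log(findInline("glm-4.7-flash-thinking") !== undefined); // true — inline match
console.log(findInline("ollama/glm-4.7-flash-thinking") !== undefined); // false — fallback fires
```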

Metadata

Assignees

No one assigned

    Labels

    bug (Something isn't working), stale (Marked as stale due to inactivity)
