[Bug]: reasoning.effort not forwarded to Ollama — only minimal thinking despite thinking=high #13575
Description
OpenClaw logs show thinking=high for embedded runs, but the reasoning.effort parameter is not being forwarded to Ollama's /v1/responses API. The model produces only minimal thinking (1 sentence, ~30 tokens) instead of deep chain-of-thought reasoning (200+ tokens). Direct API calls to the same Ollama endpoint with "reasoning": {"effort": "high"} produce full reasoning output.
Environment
- OpenClaw version: 2026.2.9
- Ollama version: 0.15.6
- Model: `glm-4.7-flash-thinking` (custom Modelfile with `$.IsThinkSet` / `$.Think` template, based on `glm-4.7-flash:latest`)
- API mode: `openai-responses`
- Channel: Nextcloud Talk
- OS: Ubuntu 24.04 (VM)
Config (relevant parts)
```json
{
  "models": {
    "providers": {
      "ollama": {
        "baseUrl": "http://ollama.internal:11434/v1",
        "api": "openai-responses",
        "models": [{
          "id": "glm-4.7-flash-thinking",
          "reasoning": true,
          "contextWindow": 128000
        }]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": { "primary": "ollama/glm-4.7-flash-thinking" },
      "thinkingDefault": "high"
    }
  }
}
```
Steps to Reproduce
1. Configure the Ollama provider with `"api": "openai-responses"` and a reasoning-capable model
2. Set `agents.defaults.thinkingDefault` to `high`
3. Set `reasoning: true` on the model entry
4. Send a message via Nextcloud Talk (or any channel)
5. Observe run duration and session history
Expected Behavior
Runs should take 2-10 seconds with detailed chain-of-thought reasoning (200+ tokens). The reasoning.effort: "high" parameter should be included in the HTTP request to Ollama's /v1/responses endpoint.
Actual Behavior
Gateway logs show thinking=high, but runs complete in ~400ms with only minimal thinking output.
Gateway logs:

```
embedded run start: model=glm-4.7-flash-thinking thinking=high messageChannel=nextcloud-talk
embedded run done: durationMs=402 aborted=false
```
Session history shows minimal thinking (1 sentence instead of detailed reasoning):
```json
{
  "type": "thinking",
  "thinking": "Manuel fragt nach 9 mal 20. Das ist 180. Ich antworte kurz auf Deutsch."
}
// ("Manuel asks about 9 times 20. That is 180. I answer briefly in German.")
// usage: {"input": 9429, "output": 39}
```

This minimal thinking comes from the model's `<think>` template tags, not from the `reasoning.effort` API parameter. The effort level is not reaching Ollama.
Proof: Ollama works correctly when called directly
/v1/responses (same endpoint OpenClaw uses):

```shell
curl http://ollama.internal:11434/v1/responses -d '{
  "model": "glm-4.7-flash-thinking",
  "input": "Was ist 15 * 37?",
  "reasoning": {"effort": "high"},
  "stream": false
}'
# Result: 313 tokens (290 reasoning, multi-step with verification), 2.0 seconds
```

/v1/chat/completions:

```shell
curl http://ollama.internal:11434/v1/chat/completions -d '{
  "model": "glm-4.7-flash-thinking",
  "messages": [{"role": "user", "content": "Was ist 15 * 37?"}],
  "stream": false
}'
# Result: reasoning field with 289 tokens, 1.86 seconds
```

Both Ollama API endpoints produce detailed reasoning output when called directly. The `reasoning.effort` parameter is not reaching Ollama through OpenClaw.
Comparison
| Source | Thinking | Output tokens | Duration |
|---|---|---|---|
| Ollama direct (effort: high) | Detailed, 3 methods, verification | 290+ | 2-10s |
| OpenClaw (thinking=high) | 1 sentence, no detail | ~30 | ~400ms |
Root Cause Analysis
The `reasoning.effort` parameter is not being included in the HTTP request body sent to Ollama, despite being configured and tracked internally by OpenClaw.
The call chain and where it breaks
1. `thinkingDefault: "high"` → stored in `sessionEntry.thinkingLevel`
2. `mapThinkingLevel()` in `src/agents/pi-embedded-runner/utils.ts` passes `"high"` through
3. `Agent._runLoop()` in `packages/agent/src/agent.ts` sets `reasoning: "high"` in `AgentLoopConfig`
4. `streamSimpleOpenAIResponses()` in `packages/ai/src/providers/openai-responses.ts` should map this to `reasoningEffort: "high"` — likely broken here?
5. `buildParams()` in the same file gates on `model.reasoning` AND `options.reasoningEffort`:

```ts
if (model.reasoning) {
  if (options?.reasoningEffort || options?.reasoningSummary) {
    params.reasoning = {
      effort: options?.reasoningEffort || "medium",
    };
  }
}
```
Since we observe some thinking (from the template's `<think>` tags), `model.reasoning` is likely `true` (set via user config). But `options.reasoningEffort` appears to be `undefined` when it reaches `buildParams()`, so the inner if-block is never entered and `params.reasoning` is never added to the request.
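To make the gating behavior concrete, here is a minimal standalone reproduction of the gate from the snippet above. The type shapes are simplified assumptions for illustration, not OpenClaw's real interfaces:

```typescript
type ReasoningEffort = "low" | "medium" | "high";

interface ModelLike {
  reasoning: boolean;
}
interface OptionsLike {
  reasoningEffort?: ReasoningEffort;
  reasoningSummary?: string;
}

// Mirrors the two-level gate in buildParams(): the reasoning parameter is
// only serialized when model.reasoning is true AND an effort/summary is set.
function buildReasoningParam(model: ModelLike, options?: OptionsLike) {
  const params: { reasoning?: { effort: ReasoningEffort } } = {};
  if (model.reasoning) {
    if (options?.reasoningEffort || options?.reasoningSummary) {
      params.reasoning = { effort: options?.reasoningEffort ?? "medium" };
    }
  }
  return params;
}

// The observed bug: model.reasoning is true, but reasoningEffort arrives
// undefined, so no reasoning param is emitted at all.
console.log(buildReasoningParam({ reasoning: true }, {})); // {}
// A working chain would pass the effort through:
console.log(buildReasoningParam({ reasoning: true }, { reasoningEffort: "high" }));
```

This explains why the request can silently lack `reasoning.effort` even though both `model.reasoning` and the thinking level look correct in the logs.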
Two possible locations for the mapping failure
Location A: src/agents/pi-embedded-runner/model.ts (~line 180)
The fallback model construction for Ollama models not in the built-in catalog hardcodes `reasoning: false`:
```ts
const fallbackModel: Model<Api> = normalizeModelCompat({
  // ...
  reasoning: false, // ← ignores user config
  // ...
});
```

If the fallback path is used instead of the inline provider path, this blocks the entire reasoning block in `buildParams()`. However, the minimal thinking we observe suggests the inline path IS being used and `model.reasoning` is `true`.
Location B: Thinking-to-effort mapping (between steps 3-4)
Even with `model.reasoning: true`, the mapping from OpenClaw's internal `thinkingLevel: "high"` to the Responses API's `reasoningEffort: "high"` may be broken for Ollama/custom providers. The `reasoningEffort` value arrives as `undefined` at `buildParams()`, so the effort parameter is never serialized into the HTTP request.
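A sketch of what the missing step 3→4 hop would need to guarantee. The function name and types here are hypothetical illustrations, not OpenClaw's actual code; the point is that every non-off level must survive the translation instead of being dropped to `undefined`:

```typescript
type ThinkingLevel = "off" | "low" | "medium" | "high";
type ReasoningEffort = "low" | "medium" | "high";

// "off" or unset must map to undefined so buildParams() omits the parameter;
// any other level should pass through unchanged so the effort reaches Ollama.
function thinkingLevelToReasoningEffort(
  level: ThinkingLevel | undefined,
): ReasoningEffort | undefined {
  if (!level || level === "off") return undefined;
  return level;
}

console.log(thinkingLevelToReasoningEffort("high")); // high
```

If any intermediate layer performs this mapping only for known catalog models (or only for certain providers), custom Ollama models would hit exactly the `undefined` behavior observed here.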
Secondary gate: src/agents/model-selection.ts
`resolveThinkingDefault()` checks `candidate?.reasoning` from the model catalog, not from the inline provider config. For unknown Ollama models this defaults to `"off"`, which is why `agents.defaults.thinkingDefault: "high"` is also needed in the config.
Suggested Fix
In `src/agents/pi-embedded-runner/model.ts`, the generic fallback in `resolveModel()` should propagate all inline provider config properties instead of hardcoding defaults:
```diff
+const matchedModel = (providerCfg?.models ?? []).find(m => m.id === modelId);
+
 const fallbackModel: Model<Api> = normalizeModelCompat({
   id: modelId,
   name: modelId,
   api: providerCfg?.api ?? "openai-responses",
   provider,
   baseUrl: providerCfg?.baseUrl,
-  reasoning: false,
+  reasoning: matchedModel?.reasoning ?? false,
-  input: ["text"],
+  input: matchedModel?.input ?? ["text"],
-  cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
+  cost: matchedModel?.cost ?? { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
-  contextWindow: providerCfg?.models?.[0]?.contextWindow ?? DEFAULT_CONTEXT_TOKENS,
+  contextWindow: matchedModel?.contextWindow ?? DEFAULT_CONTEXT_TOKENS,
-  maxTokens: providerCfg?.models?.[0]?.maxTokens ?? DEFAULT_CONTEXT_TOKENS,
+  maxTokens: matchedModel?.maxTokens ?? DEFAULT_CONTEXT_TOKENS,
 } as Model<Api>);
```

This ensures that user-configured model properties (especially `reasoning: true`) are respected in the fallback path, instead of being silently overridden.
Why the inline match at step 2 may fail
The inline model match in resolveModel() uses a strict comparison:
```ts
const inlineMatch = inlineModels.find(
  (entry) => normalizeProviderId(entry.provider) === normalizedProvider && entry.id === modelId,
);
```

If provider normalization causes a mismatch (e.g., `"ollama"` vs `"Ollama"` or prefix handling), step 2 fails silently and the fallback at step 5 fires with `reasoning: false`. The suggested fix catches this case by looking up the model properties directly from `providerCfg.models`.
Possibly related issues
- Ollama silently truncates context to 4096 tokens: num_ctx not passed via OpenAI-compatible API #4028 (same pattern of provider config being ignored in the fallback)
- Ollama provider: model runs but produces no output #8505 / Ollama models return "(no output)": enforceFinalTag incorrectly applied to all Ollama models #2279
- Discussion #4332 (think value "low" not supported)