[Bug]: 4096 token hard cap on input when using Ollama local models - conversation history never passed to model #27278

@sandeepmamidi

Summary

When using OpenClaw with Ollama local models, every single LLM call is capped at exactly 4096 input tokens regardless of the configured contextWindow. This means the model never receives conversation history, causing it to forget everything said earlier in the same conversation.

Steps to reproduce

Install Ollama with any local model (tested with qwen3:8b, qwen3-coder, glm-4.7-flash)
Configure OpenClaw with Ollama provider:
```json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "ollama/qwen3:8b"
      }
    }
  },
  "models": {
    "providers": {
      "ollama": {
        "api": "openai-completions",
        "apiKey": "ollama-local",
        "baseUrl": "http://127.0.0.1:11434/v1",
        "models": [
          {
            "id": "qwen3:8b",
            "contextWindow": 131072,
            "maxTokens": 16384
          }
        ]
      }
    }
  }
}
```
Connect any channel (tested with Telegram)
Start a conversation and tell the assistant your name:
User: my name is Sandeep
Assistant responds acknowledging the name
Ask the assistant your name a few messages later:
User: what is my name?
Assistant says it doesn't know or forgets
Check the JSONL session file:
```bash
cat ~/.openclaw/agents/main/sessions/.jsonl | grep '"input"'
```
Observe every single message shows exactly "input":4096 regardless of conversation length
Expected result at step 8: Input tokens should grow with each message as conversation history accumulates, up to the configured contextWindow of 131072.
Actual result at step 8: Every message is exactly "input":4096 — hard capped, conversation history never included.
Quick verification that Ollama itself is NOT the issue:
```bash
curl http://localhost:11434/api/chat -d '{
  "model": "qwen3:8b",
  "messages": [
    {"role": "user", "content": "my name is Sandeep"},
    {"role": "assistant", "content": "Nice to meet you Sandeep!"},
    {"role": "user", "content": "what is my name?"}
  ]
}'
```
Ollama correctly returns "Your name is Sandeep" — proving the 4096 cap is introduced by OpenClaw, not Ollama.
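
The curl check above sends only three short messages, far below 4096 tokens, so it does not exercise the cap itself. A stricter check could send a conversation that is clearly longer than 4096 tokens to both Ollama's native `/api/chat` endpoint and the OpenAI-compatible `/v1/chat/completions` endpoint that the config points at, then compare the prompt token counts each one reports. The sketch below is a minimal, untested-against-OpenClaw illustration. The helper names (`build_long_history`, `native_payload`, `post_json`) and the filler text are hypothetical, and the idea that the context length Ollama loads the model with might be involved is an assumption to rule out, not a confirmed cause.

```python
import json
import sys
import urllib.request

OLLAMA_BASE = "http://127.0.0.1:11434"  # same host and port as baseUrl in the config
MODEL = "qwen3:8b"


def build_long_history(turns, filler_words=200):
    """Build a chat that states the name first, then pads with filler turns."""
    messages = [
        {"role": "user", "content": "my name is Sandeep"},
        {"role": "assistant", "content": "Nice to meet you Sandeep!"},
    ]
    filler = " ".join(["lorem"] * filler_words)  # arbitrary padding text
    for i in range(turns):
        messages.append({"role": "user", "content": f"note {i}: {filler}"})
        messages.append({"role": "assistant", "content": "Noted."})
    messages.append({"role": "user", "content": "what is my name?"})
    return messages


def native_payload(model, messages, num_ctx=None):
    """Request body for Ollama's native /api/chat, optionally setting num_ctx."""
    payload = {"model": model, "messages": messages, "stream": False}
    if num_ctx is not None:
        payload["options"] = {"num_ctx": num_ctx}
    return payload


def post_json(url, payload):
    """POST a JSON body and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=600) as response:
        return json.load(response)


def main():
    messages = build_long_history(turns=40)  # intended to exceed 4096 tokens

    native = post_json(f"{OLLAMA_BASE}/api/chat", native_payload(MODEL, messages))
    print("native, default context:", native.get("prompt_eval_count"))

    widened = post_json(
        f"{OLLAMA_BASE}/api/chat", native_payload(MODEL, messages, num_ctx=32768)
    )
    print("native, num_ctx=32768:  ", widened.get("prompt_eval_count"))

    compat = post_json(
        f"{OLLAMA_BASE}/v1/chat/completions", {"model": MODEL, "messages": messages}
    )
    print("/v1 prompt_tokens:      ", compat.get("usage", {}).get("prompt_tokens"))


# Only contact the server when explicitly asked to, e.g. `python check.py --run`.
if __name__ == "__main__" and "--run" in sys.argv:
    main()
```

If all three counts grow well past 4096, the cap is being introduced before the request reaches Ollama. If the default-context calls report roughly 4096 while the `num_ctx=32768` call reports more, the truncation would be happening on the Ollama side of the request instead.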

Expected behavior

Full conversation history should be passed to the model up to the configured contextWindow limit.
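
As an illustration of what "up to the configured contextWindow limit" could mean in practice, the sketch below keeps the most recent messages that fit a token budget. It is not OpenClaw's actual implementation. The function names, the 4-characters-per-token estimate, and the choice to reserve `maxTokens` for the reply are all assumptions made for the example.

```python
def estimate_tokens(text):
    """Crude assumption: about 4 characters per token (a real client would use the tokenizer)."""
    return max(1, len(text) // 4)


def fit_history(messages, context_window, reserve_for_output):
    """Keep the newest messages whose estimated size fits the input budget.

    Returns the kept messages in chronological order and the tokens used.
    """
    budget = context_window - reserve_for_output
    kept = []
    used = 0
    # Walk from newest to oldest so recent turns are preferred.
    for message in reversed(messages):
        cost = estimate_tokens(message["content"])
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()
    return kept, used
```

With the values from the config in this report (`contextWindow` 131072, `maxTokens` 16384), the input budget under these assumptions would be 114688 tokens, so a short Telegram conversation should fit in full rather than being cut at 4096.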

Actual behavior

Every call is hard capped at 4096 input tokens. Model has no memory of anything said earlier in the conversation.

OpenClaw version

OpenClaw version: 2026.2.24

Operating system

Ubuntu 22.04

Install method

Ollama launch openclaw

Logs, screenshots, and evidence

Every single message shows exactly 4096 input tokens regardless of conversation length:
```json
{"role":"assistant","content":[...],"api":"openai-completions","provider":"ollama","model":"qwen3:8b","usage":{"input":4096,"output":270,"cacheRead":0,"cacheWrite":0,"totalTokens":4366}}
{"role":"assistant","content":[...],"api":"openai-completions","provider":"ollama","model":"qwen3:8b","usage":{"input":4096,"output":147,"cacheRead":0,"cacheWrite":0,"totalTokens":4243}}
{"role":"assistant","content":[...],"api":"openai-completions","provider":"ollama","model":"qwen3:8b","usage":{"input":4096,"output":308,"cacheRead":0,"cacheWrite":0,"totalTokens":4404}}
```
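
To check a whole session file rather than eyeballing grep output, a small script can collect `usage.input` from every record and report whether the value ever changes. This is a minimal sketch. It assumes each line is a JSON object with a `usage` block shaped like the excerpts above, possibly nested under another key, and the function names are hypothetical.

```python
import json


def find_usage(node):
    """Return the first dict found under a "usage" key that has an "input" field."""
    if isinstance(node, dict):
        usage = node.get("usage")
        if isinstance(usage, dict) and "input" in usage:
            return usage
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_usage(child)
        if found is not None:
            return found
    return None


def summarize(lines):
    """Summarize input token counts across the JSONL lines of one session."""
    counts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        usage = find_usage(record)
        if usage is not None:
            counts.append(usage["input"])
    distinct = sorted(set(counts))
    return {
        "calls": len(counts),
        "distinct": distinct,
        # "capped" means several calls were recorded and all report the same input size.
        "capped": len(counts) > 1 and len(distinct) == 1,
    }
```

Calling `summarize(open(path).read().splitlines())` on the session file from the reproduction steps would be expected to report a single distinct value of 4096 if the behavior described in this issue is present.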

Impact and severity

No response

Additional information

No response
