[Bug]: 4096 token hard cap on input when using Ollama local models - conversation history never passed to model #27278
Description
Summary
When using OpenClaw with Ollama local models, every single LLM call is capped at exactly 4096 input tokens regardless of the configured contextWindow. This means the model never receives conversation history, causing it to forget everything said earlier in the same conversation.
Steps to reproduce
Install Ollama with any local model (tested with qwen3:8b, qwen3-coder, glm-4.7-flash)
Configure OpenClaw with Ollama provider:
```json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "ollama/qwen3:8b"
      }
    }
  },
  "models": {
    "providers": {
      "ollama": {
        "api": "openai-completions",
        "apiKey": "ollama-local",
        "baseUrl": "http://127.0.0.1:11434/v1",
        "models": [
          {
            "id": "qwen3:8b",
            "contextWindow": 131072,
            "maxTokens": 16384
          }
        ]
      }
    }
  }
}
```
Connect any channel (tested with Telegram)
Start a conversation and tell the assistant your name:
User: my name is Sandeep
Assistant responds acknowledging the name
Ask the assistant your name a few messages later:
User: what is my name?
Assistant says it doesn't know or forgets
Check the JSONL session file:
```bash
cat ~/.openclaw/agents/main/sessions/.jsonl | grep '"input"'
```
Observe every single message shows exactly "input":4096 regardless of conversation length
Expected result at step 8: Input tokens should grow with each message as conversation history accumulates, up to the configured contextWindow of 131072.
Actual result at step 8: Every message is exactly "input":4096 — hard capped, conversation history never included.
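To make the JSONL check from the steps above easier to read at a glance, the distinct `input` values in a session file can be tallied. This is a minimal sketch, assuming each record serializes usage as `"input":<number>` exactly as in the log excerpts in this report. The function name `summarize_input_tokens` and the `SESSION_FILE` variable are made up for illustration and are not part of OpenClaw.

```bash
#!/usr/bin/env bash

# Hypothetical helper: tally the distinct "input" token counts in a session file.
# Assumes usage is serialized as "input":<number> with no space after the colon.
summarize_input_tokens() {
  grep -o '"input":[0-9][0-9]*' "$1" \
    | cut -d: -f2 \
    | sort -n \
    | uniq -c \
    | awk '{ print $2 " x" $1 }'
}
```

Called as `summarize_input_tokens "$SESSION_FILE"`, a session that shows this bug would print a single line such as `4096 x3`, whereas a session whose history is growing should print several different values.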
Quick verification Ollama itself is NOT the issue:
```bash
curl http://localhost:11434/api/chat -d '{
  "model": "qwen3:8b",
  "messages": [
    {"role": "user", "content": "my name is Sandeep"},
    {"role": "assistant", "content": "Nice to meet you Sandeep!"},
    {"role": "user", "content": "what is my name?"}
  ]
}'
```
Ollama correctly returns "Your name is Sandeep" — proving the 4096 cap is introduced by OpenClaw, not Ollama.
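The check above uses Ollama's native `/api/chat` endpoint, while the configuration in this report points OpenClaw at the OpenAI-compatible `/v1` base URL. A complementary check, sketched here under assumptions, is to send a prompt longer than 4096 tokens to `/v1/chat/completions` and read the `usage` object in the reply. The helper names `build_long_payload` and `check_prompt_tokens` are hypothetical, the 6000-word filler is an arbitrary size chosen to exceed the cap, and the `grep` pattern assumes the reply is compact JSON with a flat `usage` object.

```bash
#!/usr/bin/env bash

# Hypothetical helper: build an OpenAI-style chat payload whose single user
# message repeats a filler word N times, so the prompt can exceed 4096 tokens.
build_long_payload() {
  local model="$1" words="$2" filler
  # printf reuses its format once per argument; %.0s prints nothing for the
  # argument itself, so this emits "hello " exactly N times.
  filler=$(printf 'hello %.0s' $(seq 1 "$words"))
  printf '{"model":"%s","messages":[{"role":"user","content":"%sReply with OK."}]}' \
    "$model" "$filler"
}

# Hypothetical helper: post the payload to the OpenAI-compatible endpoint and
# print the usage object. Needs a running Ollama server, so it is only defined
# here and must be called by hand.
check_prompt_tokens() {
  build_long_payload "qwen3:8b" 6000 \
    | curl -s http://127.0.0.1:11434/v1/chat/completions \
        -H 'Content-Type: application/json' -d @- \
    | grep -o '"usage":{[^}]*}'
}
```

If `prompt_tokens` in that output also reads 4096, the truncation is happening behind the `/v1` endpoint. If it reports the full prompt length, the cap is being applied before the request leaves OpenClaw.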
Expected behavior
Full conversation history should be passed to the model up to the configured contextWindow limit.
Actual behavior
Every call is hard capped at 4096 input tokens. Model has no memory of anything said earlier in the conversation.
OpenClaw version
OpenClaw version: 2026.2.24
Operating system
Ubuntu 22.04
Install method
Ollama launch openclaw
Logs, screenshots, and evidence
Every single message shows exactly 4096 input tokens regardless of conversation length:
```json
{"role":"assistant","content":[...],"api":"openai-completions","provider":"ollama","model":"qwen3:8b","usage":{"input":4096,"output":270,"cacheRead":0,"cacheWrite":0,"totalTokens":4366}}
{"role":"assistant","content":[...],"api":"openai-completions","provider":"ollama","model":"qwen3:8b","usage":{"input":4096,"output":147,"cacheRead":0,"cacheWrite":0,"totalTokens":4243}}
{"role":"assistant","content":[...],"api":"openai-completions","provider":"ollama","model":"qwen3:8b","usage":{"input":4096,"output":308,"cacheRead":0,"cacheWrite":0,"totalTokens":4404}}
```
Impact and severity
No response
Additional information
No response