Behavior bug (incorrect output/state without crash)
Beta release blocker
No
Summary
OpenClaw's default agents.defaults.bootstrapMaxChars: 20000, combined with the verbose content of the auto-generated workspace bootstrap files (~13 KB across AGENTS.md, SOUL.md, TOOLS.md, IDENTITY.md, USER.md, HEARTBEAT.md, BOOTSTRAP.md), produces a system prompt of ~27 K chars once framework-level guidance is added. On small/mid models (≤14B params) this exceeds the prompt-length threshold beyond which tool-use instructions are deprioritized, so the agent hallucinates tool calls (produces plausible-sounding text describing tool use) instead of actually invoking the structured tools.
Steps to reproduce
Install OpenClaw 2026.4.9 standalone on a host with a small/mid model serving locally (the symptom reproduces broadly; we tested Hermes-3-Llama-3.1-8B on vLLM with --tool-call-parser hermes).
Run openclaw doctor --fix to bootstrap default config + auto-generate the workspace files (default behavior).
Point inference at the small/mid model (any local-inference setup; we used vLLM at http://127.0.0.1:8002/v1).
Restart gateway and run a tool-warranted prompt:
openclaw agent --session-id repro -m "Fetch https://example.com and summarize the page in one sentence." --json
Inspect result.meta.systemPromptReport.systemPrompt.chars and the session jsonl at ~/.openclaw/agents/main/sessions/<sessionId>.jsonl for toolCall events.
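As a quick sketch of the last step: the session jsonl can be grepped for toolCall events to tell real dispatch from hallucination. The snippet below runs against fabricated sample lines (the event shape mirrors what we saw in our runs, but this is illustrative data, not captured output):

```shell
# Minimal sketch: detect missing tool dispatch in a session jsonl.
# The sample lines below are illustrative, not a captured session.
SESSION=/tmp/repro-session.jsonl
cat > "$SESSION" <<'EOF'
{"type":"message","role":"user","ctypes":["text"]}
{"type":"message","role":"assistant","ctypes":["text"]}
EOF

# A healthy tool-warranted run should contain at least one toolCall event.
if grep -q 'toolCall' "$SESSION"; then
  echo "tool dispatch: OK"
else
  echo "tool dispatch: NONE (hallucinated tool use suspected)"
fi
```

On the default-config repro this check reports no dispatch, matching the empty gateway-log grep shown below in the evidence section.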
Expected behavior
For local-inference users on small/mid models (the most common consumer-GPU configuration), the default bootstrap configuration should produce a system prompt that fits within the prompt-length threshold for reliable tool-use instruction following on those models (~2,000–4,000 chars per AGENTIF benchmark and Runyard 2026 best-practices guidance). Either:
(a) Lower the default bootstrapMaxChars to ~1,500–2,000, OR
(b) Trim the auto-generated bootstrap content (especially AGENTS.md at 7,809 chars) to be substantially shorter while preserving load-bearing instructions, OR
(c) Introduce a bootstrapTier config knob (per open issue feat: Tiered bootstrap file loading for progressive context control #22438 / PR feat(workspace): add tiered bootstrap loading with configurable bootstrapTier #22439) defaulting to minimal for new installs and surfacing standard | full as opt-in for users on capable models, OR
(d) Have openclaw doctor emit a warning when the configured primary model is small/mid AND the system prompt exceeds the recommended threshold, with a one-command fix.
Actual behavior
With OpenClaw defaults, the systemPromptReport reports systemPrompt.chars: 27345 for any agent run. Breakdown: projectContextChars: 13465 (the 7 workspace bootstrap files) + nonProjectContextChars: 13880 (framework-supplied tool/agent-shell guidance). On Hermes-3-Llama-3.1-8B with this prompt, tool dispatch silently fails: the agent produces plausible-sounding text describing tool use (e.g. "I searched memory for X but found no matches", "The fetched page at https://example.com appears to be an example domain") with zero toolCall events in the session jsonl and zero tool/fetch/invoke markers in the gateway log. The user sees what looks like a correct response and may not realize tools aren't actually firing.
Setting agents.defaults.bootstrapMaxChars: 1500 and bootstrapTotalMaxChars: 12000 (no other changes) drops systemPrompt.chars to ~16 KB and restores real tool dispatch — wire-level confirmed: 1 toolCall + 1 toolResult for single-tool prompts, 3 toolCall + 3 toolResult for chained 3-step prompts.
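For reference, a sketch of the config fragment that produced the recovery above (the file location and surrounding key structure are assumed from defaults; only these two keys were changed):

```json
{
  "agents": {
    "defaults": {
      "bootstrapMaxChars": 1500,
      "bootstrapTotalMaxChars": 12000
    }
  }
}
```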
OpenClaw version
2026.4.9 (build 0512059)
Operating system
Ubuntu 24.04 LTS aarch64 (Linux 6.17.0-1014-nvidia)
Install method
npm global (npm install -g [email protected]), Node v22.22.2 via nvm
Model
NousResearch/Hermes-3-Llama-3.1-8B (also reproduced on Qwen3-14B with default thinking + Mistral-7B-Instruct-v0.3, where the same prompt-bloat pressure manifests in different but equally broken failure modes — Qwen3 produces empty payloads via reasoning_content split, Mistral echoes the system prompt back as its response)
Provider / routing chain
openclaw (standalone host gateway) → vLLM (http://127.0.0.1:8002/v1) → small/mid local model
Additional provider/model setup details
Standalone OpenClaw on a host (no NemoClaw sandbox). vLLM 0.19.1 Docker container at :8002 with --enable-auto-tool-choice --tool-call-parser hermes --gpu-memory-utilization 0.20 --max-model-len 32768. ~14 GB GPU resident. Gateway in local mode, primary model inference/hermes-3-llama-3.1-8b.
Logs, screenshots, and evidence
Default-config systemPromptReport (excerpt):
{
"systemPrompt": {
"chars": 27345,
"projectContextChars": 13465,
"nonProjectContextChars": 13880
},
"injectedWorkspaceFiles": [
{"name": "AGENTS.md", "rawChars": 7809, "injectedChars": 7809, "truncated": false},
{"name": "SOUL.md", "rawChars": 1738, "injectedChars": 1738, "truncated": false},
{"name": "TOOLS.md", "rawChars": 850, "injectedChars": 850, "truncated": false},
{"name": "IDENTITY.md", "rawChars": 633, "injectedChars": 633, "truncated": false},
{"name": "USER.md", "rawChars": 474, "injectedChars": 474, "truncated": false},
{"name": "HEARTBEAT.md", "rawChars": 192, "injectedChars": 192, "truncated": false},
{"name": "BOOTSTRAP.md", "rawChars": 1450, "injectedChars": 1450, "truncated": false}
]
}
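Sanity arithmetic on the excerpt: the two context buckets account exactly for the total, while the seven per-file injectedChars sum to 13,146, leaving ~319 chars of projectContextChars unattributed (presumably per-file framing added at injection time; that is our assumption, not confirmed from source):

```shell
# Breakdown check on the numbers reported above.
echo $((13465 + 13880))                               # 27345 total chars
echo $((7809 + 1738 + 850 + 633 + 474 + 192 + 1450))  # 13146 injected-file sum
echo $((13465 - 13146))                               # 319 unattributed framing chars
```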
Default config + tool-warranted prompt — gateway log delta during the agent run:
$ tail -c 6321 /tmp/openclaw/openclaw-2026-04-30.log | grep -iE 'tool_call|tool.*name|web_fetch|fetch.*url|invoke|dispatch'
(empty — no tool dispatch markers)
Default config — session jsonl event sequence:
L5 message role=user
L6 message role=assistant ctypes=['text'] text="The fetched page at https://example.com appears to be an example domain..."
← hallucinated; example.com is famously a placeholder, so the model could describe it without fetching
After lowering `bootstrapMaxChars` to 1500 (no other changes) — same prompt, same model, same vLLM, same parser:
L5 message role=user
L6 message role=assistant ctypes=['toolCall']
L7 message role=toolResult ctypes=['text'] text='{"url":"https://example.com","status":200,"contentType":"text/html",...}'
L8 message role=assistant ctypes=['text'] text="The fetched page at https://example.com is a security notice..."
← real summary of real fetched HTML body
Public-domain validation (research showing that prompt lengths above ~2,000 words degrade tool-following on small/mid models):
- "Writing System Prompts for AI Agents: Best Practices for 2026" (Runyard) — "Prompts over 2,000 words tend to produce agents that follow early instructions well and ignore later ones."
- AGENTIF benchmark (Tsinghua KEG) — quantifies "performance degradation as instruction length increases."
- BFCL v3 leaderboard 2026 — Qwen3 8B at F1 0.933, Hermes-3 mid-tier; performance is sensitive to prompt shape.
Impact and severity
Affected: every OpenClaw user running a small/mid local model (the most common consumer-GPU configuration) who relies on tool dispatch. Cloud-API users on capable models (Sonnet 4.6, Opus 4.7, GPT-5.4) generally don't hit this because those models tolerate long prompts well — but local-inference users on Llama-3.1 / Qwen3 / Hermes / Mistral / similar 7B–14B models do. This is a growing user segment (the entire "self-host on consumer GPU" cohort).
Severity: medium-high. The failure mode is silent and confidence-inducing: hallucinated tool replies often sound correct (Hermes-3 8B's example.com summary read like a real summary), so users may not realize tools aren't actually firing for hours or days. Real-world consequences are particularly bad for action-taking agents (skills that send messages, modify files, execute commands) — the agent claims success while doing nothing, or worse, claims a fabricated success that the user acts on downstream.
Frequency: deterministic on default config + small/mid model.
Consequence: silent erosion of tool-call reliability across the local-inference user base. Combined with the surface-area of the issue (any tool call, on any small/mid model), this is the kind of bug that quietly drives users away from the framework or onto larger (cloud-only) models even when their local hardware would be sufficient if the prompt were lean.
Additional information
Related issues:
Agent silently fails to reply when workspace bootstrap file exceeds bootstrapMaxChars limit #42084 (closed COMPLETED 2026-04-24) — adjacent but distinct failure mode. #42084 was the zero-payload silent fail when truncation broke the message sequence (orphaned user turn dropped). The shipped fix (per steipete's closing comment): bootstrap truncation warnings appended to the current-turn prompt, orphaned-user-turn repair, and regression coverage that the run still returns payloads. Our case is different: with the default bootstrapMaxChars: 20000, no truncation occurs (each file is individually under the cap); the system prompt is simply large at ~27 K chars, and the resulting tool-dispatch degradation (the model produces hallucinated tool-use plausibility text instead of structured toolCall events) is not addressed by #42084's fix. The shipped warnings and payload-still-returns guarantees are necessary but not sufficient for tool-use reliability on small/mid models.
feat: Tiered bootstrap file loading for progressive context control #22438 / PR feat(workspace): add tiered bootstrap loading with configurable bootstrapTier #22439 — proposes bootstrapTier: minimal | standard | full. PR #22439 is the implementation candidate; landing it would resolve this issue if minimal becomes the default for new installs (or if doctor selects an appropriate default based on detected model capability).
Suggested fixes (any one materially helps; combining them is best):
Lower the default bootstrapMaxChars to ~1,500–2,000. Backwards-compatible (existing users with explicit settings are unaffected). One-line change to the schema default.
Ship bootstrapTier: minimal as the default for new installs (per #22438 / PR #22439).
Add a doctor warning when systemPrompt.chars > 8000 AND the configured primary model is in the small/mid class (≤14B params). Suggest the trim and link to the bootstrap-tier docs.
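A rough outline of what the proposed doctor check could look like, as a shell sketch; the threshold, report path, and message wording are all assumptions on our part, not existing OpenClaw behavior:

```shell
# Hypothetical doctor check: warn when the assembled system prompt is too
# large for reliable tool-following on small/mid models. All names assumed.
THRESHOLD=8000
REPORT=/tmp/systemPromptReport.json   # illustrative path, not the real location

cat > "$REPORT" <<'EOF'
{"systemPrompt": {"chars": 27345}}
EOF

# Pull the first "chars" value out of the report.
chars=$(grep -o '"chars": *[0-9]*' "$REPORT" | grep -o '[0-9]*$' | head -1)
if [ "$chars" -gt "$THRESHOLD" ]; then
  echo "WARN: system prompt is ${chars} chars (> ${THRESHOLD}) for a small/mid model"
  echo "Suggested fix: set agents.defaults.bootstrapMaxChars to 1500"
fi
```

With the default-config numbers from this report (27,345 chars) the check fires; with the lowered bootstrap caps (~16 K chars) it would still fire at this threshold, which is intentional: 8,000 is a warning line, not the recovery line we measured.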