Heartbeat can enter unbounded tool-call loops (despite no-tools contract), causing extreme token burn #21597
Description
Bug Report: Heartbeat Tool-Loop Burned Tens of Millions of Tokens
Title
Heartbeat can enter unbounded tool-call loops (despite "no tools" contract), causing extreme token burn
Summary
On February 17, 2026 (UTC), a heartbeat-only session consumed 47,659,109 assistant tokens in one day while repeatedly calling session_status instead of returning a single-line HEARTBEAT_OK/ALERT response.
This is consistent with a prior incident on February 15, 2026, in which a heartbeat loop consumed 117,299,323 assistant tokens within a single session (preserved as a backup).
Environment
- OpenClaw runtime reported by session_status: OpenClaw 2026.2.12 (f9e444d)
- Current local source branch/head inspected: main@80abb5ab9
- Heartbeat config (~/.openclaw/openclaw.json):
  - agents.defaults.heartbeat.session = "agent:main:heartbeat-safe"
  - heartbeat prompt explicitly says: "Never emit tool calls"
  - tools.profile = "minimal" (which allows session_status)
- Workspace heartbeat contract (~/.openclaw/workspace/HEARTBEAT.md) also forbids tools
Expected Behavior
Heartbeat runs should obey hard no-tool behavior and return only:
- HEARTBEAT_OK, or
- ALERT: <single sentence issue>
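The expected output contract is narrow enough to check mechanically. The following is a minimal sketch of such a check, assuming a hypothetical helper named parseHeartbeatReply (not part of the OpenClaw codebase); only the two reply forms come from this report.

```typescript
// Hypothetical helper for illustration; not an existing OpenClaw function.
type HeartbeatVerdict =
  | { kind: "ok" }
  | { kind: "alert"; message: string }
  | { kind: "invalid"; reason: string };

function parseHeartbeatReply(raw: string): HeartbeatVerdict {
  const text = raw.trim();
  // The contract allows exactly one line of output.
  if (text.includes("\n")) {
    return { kind: "invalid", reason: "reply spans multiple lines" };
  }
  if (text === "HEARTBEAT_OK") {
    return { kind: "ok" };
  }
  const prefix = "ALERT: ";
  if (text.startsWith(prefix) && text.length > prefix.length) {
    return { kind: "alert", message: text.slice(prefix.length) };
  }
  return { kind: "invalid", reason: "reply is neither HEARTBEAT_OK nor ALERT: <issue>" };
}
```

A runner could treat any "invalid" verdict as a failed heartbeat rather than continuing the turn.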
Actual Behavior
Heartbeat repeatedly invoked tools and retried through error states inside single runs, including:
- Model "... is not allowed."
- Unknown sessionKey/sessionId ...
- Agent-to-agent status is disabled...
Impact
- Severe token/cost blowups
- Long-running blocked heartbeat lanes (multiple runs near 10-minute timeout)
- Operational noise and potential queue starvation risk
Related (not duplicate)
- "the tokens got burned by dragging a huge context forward" (#1594) describes context carry-over token burn in normal sessions (different failure path).
- "fix: stop LLM retry loop when browser control service is unavailable" (#17673) fixed a browser-tool retry loop (same class of unbounded retry behavior, different subsystem).
Evidence
Incident A: Feb 17 session
File: ~/.openclaw/agents/main/sessions/7e3de49e-72c4-459e-a88c-4230a62e3c2d.jsonl
- assistant_tokens = 47,659,109
- assistant_tool_calls = 779
- stopReason[toolUse] = 765
- stopReason[stop] = 18
- Tool distribution: session_status = 779
Largest per-heartbeat turns:
- 2026-02-17T08:43:53.618Z -> 13,513,404 tokens
- 2026-02-17T09:13:53.750Z -> 16,834,197 tokens
- 2026-02-17T09:43:53.689Z -> 16,043,178 tokens
Runtime log confirms near-timeout runs:
- /tmp/openclaw/openclaw-2026-02-16.log contains heartbeat lane durations 600066ms, 600118ms, 512977ms
Incident B: Feb 15 backup
File: ~/.openclaw/incident-backups/20260215-heartbeat-loop/669bd3e3-b1e6-4af9-9c5d-e73e08049867.jsonl
- assistant_tokens = 117,299,323
- assistant_tool_calls = 1,855
- stopReason[toolUse] = 1849
- Top tools: cron = 923, gateway = 917
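Counters like the ones above can be tallied from a session file in a few lines. The sketch below assumes a hypothetical record shape (role, stopReason, tokens, toolCalls); the real session JSONL schema is not shown in this report, so the field names would need adjusting before use.

```typescript
// ASSUMPTION: this record shape is hypothetical; the actual JSONL schema may differ.
interface AssumedRecord {
  role?: string;
  stopReason?: string;
  tokens?: number;
  toolCalls?: { name: string }[];
}

interface SessionSummary {
  assistantTokens: number;
  assistantToolCalls: number;
  stopReasons: Record<string, number>;
  toolCounts: Record<string, number>;
}

function summarizeSession(jsonlLines: string[]): SessionSummary {
  const summary: SessionSummary = {
    assistantTokens: 0,
    assistantToolCalls: 0,
    stopReasons: {},
    toolCounts: {},
  };
  for (const line of jsonlLines) {
    if (line.trim() === "") continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // skip lines that are not valid JSON
    }
    if (typeof parsed !== "object" || parsed === null) continue;
    const record = parsed as AssumedRecord;
    if (record.role !== "assistant") continue;
    summary.assistantTokens += record.tokens ?? 0;
    if (record.stopReason) {
      summary.stopReasons[record.stopReason] = (summary.stopReasons[record.stopReason] ?? 0) + 1;
    }
    for (const call of record.toolCalls ?? []) {
      summary.assistantToolCalls += 1;
      summary.toolCounts[call.name] = (summary.toolCounts[call.name] ?? 0) + 1;
    }
  }
  return summary;
}
```

Feeding it the lines of a session file would yield totals comparable to assistant_tokens, assistant_tool_calls, and the per-tool distribution, provided the assumed fields are mapped to the real ones.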
Suspected Root Cause
The no-tool heartbeat contract appears to be prompt-only, not runtime-enforced:
- Heartbeat path calls getReplyFromConfig(..., { isHeartbeat: true }) without hard tool disable (src/infra/heartbeat-runner.ts:545).
- Agent execution forwards heartbeat runs into embedded runner without disableTools: true (src/auto-reply/reply/agent-runner-execution.ts:264).
- Embedded runner already supports disableTools?: boolean (src/agents/pi-embedded-runner/run/params.ts:68) but heartbeat does not use it.
- System prompt still contains: "If you need current date/time, run session_status" (src/agents/system-prompt.ts:468), which conflicts with heartbeat no-tools policy.
- Minimal tool profile explicitly allows session_status (src/agents/tool-policy.ts:67).
- Default timeout is 600 seconds (src/agents/timeout.ts:3), allowing large burns before abort.
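To illustrate the gap between a prompt-only rule and a runtime-enforced one, here is a minimal sketch of how the heartbeat flag could be translated into a hard tool disable. Only isHeartbeat and disableTools are names from this report; the interfaces and the function toEmbeddedRunParams are hypothetical stand-ins, not the actual signatures in the cited files.

```typescript
// Hypothetical shapes for illustration; the real types in the cited files may differ.
interface ReplyOptions {
  isHeartbeat?: boolean;
  disableTools?: boolean;
}

interface EmbeddedRunParams {
  prompt: string;
  disableTools: boolean;
}

// Derive run parameters so heartbeat runs default to no tools,
// while an explicit caller setting still takes precedence.
function toEmbeddedRunParams(prompt: string, opts: ReplyOptions): EmbeddedRunParams {
  const disableTools = opts.disableTools ?? (opts.isHeartbeat === true);
  return { prompt, disableTools };
}

// Runtime guard: whatever the model requests, no tool call is executed
// when tools are disabled for the run.
function filterToolCalls<T>(params: EmbeddedRunParams, requested: T[]): T[] {
  return params.disableTools ? [] : requested;
}
```

With a guard of this kind, a model that ignores the "Never emit tool calls" instruction would still have no tool executed on its behalf.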
Proposed Fixes
- Enforce heartbeat runtime hard guard: disableTools: true by default for heartbeat runs.
- Add heartbeat safety budget:
  - max tool calls per heartbeat run (prefer 0)
  - max token ceiling per heartbeat run
  - fast abort on repeated identical tool errors
- Suppress or override session_status guidance when isHeartbeat=true.
- Optionally isolate/rotate heartbeat session to avoid contaminated history effects.
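The safety budget could be sketched roughly as follows. The class HeartbeatBudget, its method names, and the limit fields are all hypothetical; this shows the shape of the three checks (tool-call cap, token ceiling, repeated identical errors), not an existing OpenClaw API.

```typescript
// Hypothetical sketch of a per-run heartbeat budget; names are illustrative only.
interface HeartbeatBudgetLimits {
  maxToolCalls: number; // the report suggests preferring 0
  maxTokens: number;
  maxIdenticalErrors: number;
}

type BudgetDecision = { abort: false } | { abort: true; reason: string };

class HeartbeatBudget {
  private readonly limits: HeartbeatBudgetLimits;
  private toolCalls = 0;
  private tokens = 0;
  private lastError: string | null = null;
  private identicalErrorStreak = 0;

  constructor(limits: HeartbeatBudgetLimits) {
    this.limits = limits;
  }

  recordToolCall(): BudgetDecision {
    this.toolCalls += 1;
    if (this.toolCalls > this.limits.maxToolCalls) {
      return { abort: true, reason: `tool calls (${this.toolCalls}) exceed limit (${this.limits.maxToolCalls})` };
    }
    return { abort: false };
  }

  recordTokens(count: number): BudgetDecision {
    this.tokens += count;
    if (this.tokens > this.limits.maxTokens) {
      return { abort: true, reason: `tokens (${this.tokens}) exceed ceiling (${this.limits.maxTokens})` };
    }
    return { abort: false };
  }

  // Counts consecutive identical error messages; a different message resets the streak.
  recordToolError(message: string): BudgetDecision {
    this.identicalErrorStreak = message === this.lastError ? this.identicalErrorStreak + 1 : 1;
    this.lastError = message;
    if (this.identicalErrorStreak >= this.limits.maxIdenticalErrors) {
      return { abort: true, reason: `same tool error repeated ${this.identicalErrorStreak} times` };
    }
    return { abort: false };
  }
}
```

A runner would create one budget per heartbeat run and stop the run as soon as any method returns abort: true, instead of waiting for the 600-second timeout.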
Regression Tests Requested
- Heartbeat no-tool contract test: heartbeat prompts cannot emit tool calls.
- Repeated tool-error loop test: run aborts early with bounded tokens/time.
- Prompt conflict test: heartbeat mode does not inherit generic "run session_status" guidance.
- Session contamination test: prior tool errors do not trigger repeated retry loops across heartbeat runs.
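As one example, the prompt conflict test could take roughly the following shape. The function buildSystemPrompt and its section-list input are hypothetical stand-ins for whatever src/agents/system-prompt.ts actually exposes; only the quoted guidance string comes from this report.

```typescript
// Guidance text quoted in this report; everything else here is a hypothetical stand-in.
const SESSION_STATUS_GUIDANCE = "If you need current date/time, run session_status";

// Hypothetical prompt builder: drops tool guidance for heartbeat runs.
function buildSystemPrompt(sections: string[], isHeartbeat: boolean): string {
  const kept = isHeartbeat
    ? sections.filter((section) => !section.includes("session_status"))
    : sections;
  return kept.join("\n");
}

// Returns a list of failure messages; an empty list means the check passed.
function checkPromptConflict(): string[] {
  const failures: string[] = [];
  const sections = ["You are a monitoring agent.", SESSION_STATUS_GUIDANCE];
  if (buildSystemPrompt(sections, true).includes("session_status")) {
    failures.push("heartbeat prompt still mentions session_status");
  }
  if (!buildSystemPrompt(sections, false).includes(SESSION_STATUS_GUIDANCE)) {
    failures.push("normal prompt lost the session_status guidance");
  }
  return failures;
}
```

The same pattern (build the heartbeat input, assert on what the model is allowed to see or do) would apply to the other three requested tests.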
Agent-Signoff: Clawdius-Shellesar