Heartbeat can enter unbounded tool-call loops (despite no-tools contract), causing extreme token burn #21597

@ProfessahX

Bug Report: Heartbeat Tool-Loop Burned Tens of Millions of Tokens

Title

Heartbeat can enter unbounded tool-call loops (despite "no tools" contract), causing extreme token burn

Summary

On February 17, 2026 (UTC), a heartbeat-only session consumed 47,659,109 assistant tokens in one day while repeatedly calling session_status instead of returning a single-line HEARTBEAT_OK/ALERT response.

This matches a prior incident on February 15, 2026, in which a heartbeat loop consumed 117,299,323 assistant tokens in a single session (preserved as an incident backup).

Environment

  • OpenClaw runtime reported by session_status: OpenClaw 2026.2.12 (f9e444d)
  • Current local source branch/head inspected: main @ 80abb5ab9
  • Heartbeat config (~/.openclaw/openclaw.json):
    • agents.defaults.heartbeat.session = "agent:main:heartbeat-safe"
    • heartbeat prompt explicitly says: "Never emit tool calls"
    • tools.profile = "minimal" (which allows session_status)
  • Workspace heartbeat contract (~/.openclaw/workspace/HEARTBEAT.md) also forbids tools

Expected Behavior

Heartbeat runs should never emit tool calls (enforced at runtime, not only by the prompt) and should return only one of:

  • HEARTBEAT_OK, or
  • ALERT: <single sentence issue>
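
The expected output contract above is small enough to check mechanically. The following is a minimal sketch of such a check, not OpenClaw code: `parseHeartbeatReply` and the `HeartbeatReply` type are hypothetical names introduced here. It only verifies that an alert fits on one line, since "single sentence" cannot be checked reliably.

```typescript
// Hypothetical validator for the heartbeat reply contract described above.
// None of these names come from the OpenClaw source.
type HeartbeatReply =
  | { kind: "ok" }
  | { kind: "alert"; issue: string }
  | { kind: "invalid"; raw: string };

function parseHeartbeatReply(raw: string): HeartbeatReply {
  const text = raw.trim();
  if (text === "HEARTBEAT_OK") {
    return { kind: "ok" };
  }
  const prefix = "ALERT: ";
  if (text.indexOf(prefix) === 0) {
    const issue = text.slice(prefix.length).trim();
    // Accept only a non-empty, single-line issue description.
    if (issue.length > 0 && !/[\r\n]/.test(issue)) {
      return { kind: "alert", issue };
    }
  }
  // Anything else (extra prose, multiple lines, tool output) violates the contract.
  return { kind: "invalid", raw };
}
```

A caller could treat any `invalid` result as a contract violation and end the run instead of continuing the turn.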

Actual Behavior

Heartbeat runs repeatedly invoked tools and kept retrying through tool errors within a single run, including:

  • Model "... is not allowed."
  • Unknown sessionKey/sessionId ...
  • Agent-to-agent status is disabled...

Impact

  • Severe token/cost blowups
  • Long-running, blocked heartbeat lanes (multiple runs near the 10-minute timeout)
  • Operational noise and risk of queue starvation

Related (not duplicate)

Evidence

Incident A: Feb 17 session

File: ~/.openclaw/agents/main/sessions/7e3de49e-72c4-459e-a88c-4230a62e3c2d.jsonl

  • assistant_tokens = 47,659,109
  • assistant_tool_calls = 779
  • stopReason[toolUse] = 765
  • stopReason[stop] = 18
  • Tool distribution: session_status = 779

Largest per-heartbeat turns:

  • 2026-02-17T08:43:53.618Z -> 13,513,404 tokens
  • 2026-02-17T09:13:53.750Z -> 16,834,197 tokens
  • 2026-02-17T09:43:53.689Z -> 16,043,178 tokens
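
To put these figures in proportion, the arithmetic below uses only the numbers reported for Incident A. It assumes the three per-turn figures are counted within the session's `assistant_tokens` total; the variable names are illustrative.

```typescript
// Figures copied from the Incident A evidence above.
const totalTokens = 47659109;
const toolCalls = 779;
const largestTurns = [13513404, 16834197, 16043178];

// Sum of the three largest heartbeat turns.
const topThree = largestTurns.reduce((sum, n) => sum + n, 0);

// Share of the day's tokens spent in those three turns, and the
// average assistant tokens attributable to each tool call.
const topThreeShare = topThree / totalTokens;
const tokensPerToolCall = totalTokens / toolCalls;

console.log(topThree);                          // 46390779
console.log((topThreeShare * 100).toFixed(1));  // "97.3"
console.log(Math.round(tokensPerToolCall));     // 61180
```

Under that assumption, three heartbeat turns account for roughly 97% of the day's assistant tokens, at an average of about 61,000 tokens per `session_status` call.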

Runtime log confirms near-timeout runs:

  • /tmp/openclaw/openclaw-2026-02-16.log contains heartbeat lane durations 600066ms, 600118ms, 512977ms

Incident B: Feb 15 backup

File: ~/.openclaw/incident-backups/20260215-heartbeat-loop/669bd3e3-b1e6-4af9-9c5d-e73e08049867.jsonl

  • assistant_tokens = 117,299,323
  • assistant_tool_calls = 1,855
  • stopReason[toolUse] = 1,849
  • Top tools: cron = 923, gateway = 917

Suspected Root Cause

The no-tool heartbeat contract appears to be prompt-only, not runtime-enforced:

  1. The heartbeat path calls getReplyFromConfig(..., { isHeartbeat: true }) without a hard tool disable (src/infra/heartbeat-runner.ts:545).
  2. Agent execution forwards heartbeat runs into the embedded runner without disableTools: true (src/auto-reply/reply/agent-runner-execution.ts:264).
  3. The embedded runner already supports disableTools?: boolean (src/agents/pi-embedded-runner/run/params.ts:68), but the heartbeat path does not set it.
  4. The system prompt still contains "If you need current date/time, run session_status" (src/agents/system-prompt.ts:468), which conflicts with the heartbeat no-tools policy.
  5. The minimal tool profile explicitly allows session_status (src/agents/tool-policy.ts:67).
  6. The default timeout is 600 seconds (src/agents/timeout.ts:3), which allows large burns before abort.
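
Items 1 to 3 amount to a flag that exists but is never forwarded. The sketch below shows the shape of that forwarding in isolation. Only `isHeartbeat` and `disableTools` are names taken from the report; `ReplyOptions`, `EmbeddedRunParams`, and `buildEmbeddedRunParams` are hypothetical stand-ins, not the actual OpenClaw types or functions.

```typescript
// Hypothetical, simplified shapes. The real OpenClaw types are not shown
// in this report and will differ.
interface ReplyOptions {
  isHeartbeat?: boolean;
  disableTools?: boolean;
}

interface EmbeddedRunParams {
  prompt: string;
  disableTools: boolean;
}

function buildEmbeddedRunParams(
  prompt: string,
  opts: ReplyOptions = {},
): EmbeddedRunParams {
  // Heartbeat runs default to tools off; an explicit caller value still wins,
  // so the guard is a default rather than an unconditional override.
  const disableTools = opts.disableTools ?? (opts.isHeartbeat === true);
  return { prompt, disableTools };
}
```

With this shape, the heartbeat path gets `disableTools: true` without each call site having to remember it, while regular replies are unaffected.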

Proposed Fixes

  1. Enforce a hard runtime guard: set disableTools: true by default for heartbeat runs.
  2. Add heartbeat safety budget:
    • max tool calls per heartbeat run (prefer 0)
    • max token ceiling per heartbeat run
    • fast abort on repeated identical tool errors
  3. Suppress or override session_status guidance when isHeartbeat=true.
  4. Optionally isolate/rotate heartbeat session to avoid contaminated history effects.
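
The safety budget in fix 2 could look roughly like the following. This is a sketch under stated assumptions, not a proposal for the exact API: `HeartbeatBudget`, its limits, and its method names are all hypothetical. Each `record*` method returns an abort reason once a limit is crossed, or `null` while the run is still within budget.

```typescript
// Hypothetical per-run budget guard; none of these names exist in OpenClaw.
interface HeartbeatBudgetLimits {
  maxToolCalls: number;      // the report prefers 0
  maxTokens: number;         // token ceiling per heartbeat run
  maxRepeatedErrors: number; // identical consecutive tool errors tolerated
}

class HeartbeatBudget {
  private limits: HeartbeatBudgetLimits;
  private toolCalls = 0;
  private tokens = 0;
  private lastError: string | null = null;
  private repeatedErrors = 0;

  constructor(limits: HeartbeatBudgetLimits) {
    this.limits = limits;
  }

  recordToolCall(): string | null {
    this.toolCalls += 1;
    return this.toolCalls > this.limits.maxToolCalls
      ? `tool-call limit exceeded (${this.toolCalls} > ${this.limits.maxToolCalls})`
      : null;
  }

  recordTokens(count: number): string | null {
    this.tokens += count;
    return this.tokens > this.limits.maxTokens
      ? `token ceiling exceeded (${this.tokens} > ${this.limits.maxTokens})`
      : null;
  }

  recordToolError(message: string): string | null {
    // Count consecutive identical errors; a different error resets the streak.
    this.repeatedErrors = message === this.lastError ? this.repeatedErrors + 1 : 1;
    this.lastError = message;
    return this.repeatedErrors > this.limits.maxRepeatedErrors
      ? `repeated tool error: ${message}`
      : null;
  }
}
```

A runner would create one budget per heartbeat run and stop the run as soon as any method returns a non-null reason. With `maxToolCalls: 0`, the first tool call already aborts.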

Regression Tests Requested

  1. Heartbeat no-tool contract test: heartbeat prompts cannot emit tool calls.
  2. Repeated tool-error loop test: run aborts early with bounded tokens/time.
  3. Prompt conflict test: heartbeat mode does not inherit generic "run session_status" guidance.
  4. Session contamination test: prior tool errors do not trigger repeated retry loops across heartbeat runs.
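
As an illustration of tests 1 and 2, the sketch below drives a toy run loop with a fake model that requests `session_status` whenever tools are offered. Everything here is a hypothetical stand-in (`runHeartbeat`, `ModelTurn`, `loopingModel`); a real regression test would exercise the actual heartbeat runner instead.

```typescript
// Hypothetical test harness; these types do not mirror OpenClaw internals.
type ModelTurn =
  | { stopReason: "toolUse"; tool: string }
  | { stopReason: "stop"; text: string };

type Model = (toolsEnabled: boolean) => ModelTurn;

interface RunResult {
  text: string;
  toolCalls: number;
  aborted: boolean;
}

function runHeartbeat(
  model: Model,
  opts: { disableTools: boolean; maxTurns: number },
): RunResult {
  let toolCalls = 0;
  for (let turn = 0; turn < opts.maxTurns; turn++) {
    const reply = model(!opts.disableTools);
    if (reply.stopReason === "stop") {
      return { text: reply.text, toolCalls, aborted: false };
    }
    if (opts.disableTools) {
      // A tool request while tools are disabled is a contract violation:
      // abort without executing it.
      return { text: "", toolCalls, aborted: true };
    }
    toolCalls += 1; // pretend the tool ran and returned an error
  }
  // Turn limit reached without a final answer.
  return { text: "", toolCalls, aborted: true };
}

// Fake model that loops on session_status whenever tools are available.
const loopingModel: Model = (toolsEnabled) =>
  toolsEnabled
    ? { stopReason: "toolUse", tool: "session_status" }
    : { stopReason: "stop", text: "HEARTBEAT_OK" };

// Tools disabled: the run ends with zero tool calls.
const guarded = runHeartbeat(loopingModel, { disableTools: true, maxTurns: 5 });
// Tools enabled: the loop is cut off by the turn limit instead of running on.
const bounded = runHeartbeat(loopingModel, { disableTools: false, maxTurns: 5 });
```

The first call models the no-tool contract test (zero tool calls, a plain `HEARTBEAT_OK`), and the second models the bounded-loop test (the run aborts after a fixed number of turns).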

Agent-Signoff: Clawdius-Shellesar
