Skip to content

Feature: heartbeat.skipIfRunActive — skip heartbeat when an agent run is in progress #29537

@Paul-Kyle

Description

@Paul-Kyle

Problem

Heartbeats fire on a fixed schedule regardless of whether the agent is mid-execution. The current queue check in runHeartbeatOnce only catches pending incoming messages:

if (getQueueSize(CommandLane.Main) > 0) return { status: "skipped", reason: "requests-in-flight" };

But when a session is actively running — tool calls, subagents, extended agentic work — the queue is empty. All work is in-flight, not pending. So the heartbeat fires mid-run anyway, either competing as a concurrent session, queuing up to fire the moment the run finishes, or in some queue modes steering directly into the active session.

The Gap

isEmbeddedPiRunActive() already exists in the codebase and is used throughout the queue mode logic (reply-*.js). It correctly detects an active agent run. It just isn't called in runHeartbeatOnce before deciding whether to fire.

Proposed Fix

Add an opt-in config key:

{
  agents: {
    defaults: {
      heartbeat: {
        every: "30m",
        skipIfRunActive: true  // skip this heartbeat tick if isEmbeddedPiRunActive() is true
      }
    }
  }
}

And in runHeartbeatOnce, after the requests-in-flight check:

if (opts.skipIfRunActive && isEmbeddedPiRunActive(sessionId)) {
  return { status: "skipped", reason: "run-active" };
}

Why This Is Different From #19382 and #11393

#19382 proposes skipIfRecentActivity — a time-based heuristic on when the user last sent a message. That doesn't help when the agent has been running a long autonomous task for 10+ minutes with no new user message in the window.

#11393 is about heartbeat responses ending up in the wrong conversation thread — a routing problem, not a scheduling one.

This is about run-state awareness. The heartbeat scheduler should know when the agent is actively executing and yield. The state it needs (isEmbeddedPiRunActive) is already tracked — it just needs to reach the heartbeat runner.

Use Case

On gateways running long agentic tasks — coding sessions, research chains, multi-tool workflows — there's currently no way to prevent heartbeats from firing mid-run short of disabling them entirely (every: "0m"). That's fine for gateways that are always busy, but it removes a genuinely useful background health check for gateways that are idle most of the time.

Metadata

Metadata

Assignees

No one assigned

    Labels

    staleMarked as stale due to inactivity

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions