Skip to content

Fix exec heartbeat wake scoping to prevent cross-agent fan-out#31729

Closed
altaywtf wants to merge 3 commits intoopenclaw:mainfrom
altaywtf:fix/heartbeat-exec-wake-scoping
Closed

Fix exec heartbeat wake scoping to prevent cross-agent fan-out#31729
altaywtf wants to merge 3 commits intoopenclaw:mainfrom
altaywtf:fix/heartbeat-exec-wake-scoping

Conversation

@altaywtf
Copy link
Copy Markdown
Member

@altaywtf altaywtf commented Mar 2, 2026

Summary

  • Exec completion wake events were emitted unscoped in key paths, which could trigger heartbeat evaluation outside the originating context.
  • In multi-agent setups, that can produce duplicate/unexpected heartbeats (for example unrelated agent sessions), adding token cost and noise.
  • Fix: exec-driven wake sources now include sessionKey when calling requestHeartbeatNow, and regression tests enforce no cross-agent fan-out for scoped exec wakes.
  • Out of scope: heartbeat scheduling semantics, heartbeat prompt content, and session-store architecture redesign.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes: None (no issue number provided)
  • Related: None

User-visible / Behavior Changes

  • Exec-triggered heartbeat wakes are now session-scoped instead of global in:
    • exec runtime notify-on-exit path
    • node exec event forwarding path
  • Unrelated agents are no longer woken by non-origin exec completions.

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No

Repro + Verification

Environment

  • OS: macOS (local dev)
  • Runtime/container: Node + pnpm + Vitest
  • Model/provider: N/A
  • Integration/channel (if any): multi-agent heartbeat flow; node exec event ingestion
  • Relevant config (redacted): multiple heartbeat-enabled agents, exec notify-on-exit enabled

Steps

  1. Trigger exec completion events in one session/agent context.
  2. Observe heartbeat wake dispatch and runner routing.
  3. Confirm unrelated agents are not invoked for session-scoped exec wakes.

Expected

  • Only the originating session/agent context is woken.

Actual

  • Scoped wake payloads are emitted and runner executes only the targeted context.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Trace/log snippets (local):

  • pnpm test src/agents/bash-tools.exec-runtime.test.ts src/gateway/server-node-events.test.ts src/agents/bash-tools.test.ts
    • Test Files 3 passed (3)
    • Tests 38 passed (38)
  • pnpm test src/infra/heartbeat-runner.scheduler.test.ts src/infra/heartbeat-wake.test.ts
    • Test Files 2 passed (2)
    • Tests 20 passed (20)
  • pnpm check (format/type/lint guardrails passed)

Human Verification (required)

  • Verified scenarios:
    • emitExecSystemEvent emits scoped wake calls with sessionKey.
    • Node exec event handling emits scoped wake calls with sessionKey.
    • Scheduler regression confirms scoped exec-event wakes do not trigger unrelated agents.
  • Edge cases checked:
    • Empty/no-op exec completion paths remain suppressed where expected.
    • Existing wake scheduler behavior (coalescing/retry/target routing) remains green.
  • Not verified:
    • Live end-to-end run against production-like stale/orphan session-store data outside tests.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No

Failure Recovery (if this breaks)

  • How to disable/revert quickly:
    • Revert commit 6124db32a.
    • Temporary mitigation: set tools.exec.notifyOnExit=false to stop exec-driven wake paths.
  • Files/config to restore:
    • src/agents/bash-tools.exec-runtime.ts
    • src/gateway/server-node-events.ts
  • Known bad symptoms to watch for:
    • Missing heartbeat wake after valid session-scoped exec completion.
    • Unexpected fallback to unrelated-agent execution after scoped exec wakes.

Risks and Mitigations

  • Risk: Non-agent session keys still resolve to main by existing session-key semantics.
    • Mitigation: This PR still prevents multi-agent fan-out; regression tests assert no unrelated-agent runs for scoped wakes.
  • Risk: Any caller that omits sessionKey still gets unscoped behavior.
    • Mitigation: Updated key exec-event sources now pass sessionKey; tests enforce this contract.

@openclaw-barnacle openclaw-barnacle bot added gateway Gateway runtime agents Agent runtime and tooling size: S labels Mar 2, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6124db32ad

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


enqueueSystemEvent(text, { sessionKey, contextKey: runId ? `exec:${runId}` : "exec" });
requestHeartbeatNow({ reason: "exec-event" });
requestHeartbeatNow({ reason: "exec-event", sessionKey });
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid session-scoping wakes for non-agent node exec keys

Passing sessionKey unconditionally here makes exec-event wakes targeted even when the payload used the fallback non-agent key (node-... when obj.sessionKey is missing). In that case the runner resolves the target agent from the session key (resolveAgentIdFromSessionKey) to main and returns skipped/disabled if main is not an enabled heartbeat agent, so node exec completions can stop triggering any heartbeat in configs that only define non-main agents; before this change, the unscoped wake still ran enabled agents.

Useful? React with 👍 / 👎.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 2, 2026

Greptile Summary

This PR correctly scopes exec-driven heartbeat wake events to their originating session by passing sessionKey to requestHeartbeatNow in two paths: the maybeNotifyOnExit/emitExecSystemEvent functions in bash-tools.exec-runtime.ts and the exec.started/exec.finished/exec.denied handler in server-node-events.ts. Previously these calls were unscoped (global), causing the heartbeat runner to fan out across all configured agents on exec events.

Key observations:

  • Both changed call sites were already guarded by an early-return on empty/missing sessionKey, so the scoping is applied only when a valid key is present and existing no-op paths remain unchanged.
  • The heartbeat runner routes scoped wakes via resolveAgentIdFromSessionKey, which only parses agent:<id>:<rest>-format keys. Node events that fall back to the node-{nodeId} synthetic key (i.e. payloads without an explicit sessionKey) will resolve to the main agent — correctly preventing fan-out but potentially not targeting the intended agent in multi-agent deployments where a node is logically owned by a non-main agent. This is explicitly documented as a known risk in the PR.
  • Test coverage is thorough: unit-level mocking in bash-tools.exec-runtime.test.ts, an integration-level setHeartbeatWakeHandler poll test in bash-tools.test.ts, updated assertions in server-node-events.test.ts, and a new scheduler regression test in heartbeat-runner.scheduler.test.ts that directly asserts no finance agent fan-out for a scoped exec wake.

Confidence Score: 4/5

  • Safe to merge; the targeted fix is correct and well-tested with no behavioural regressions expected.
  • The two-line production changes are minimal and each is preceded by existing guard logic that already validates sessionKey. Tests cover unit, integration, and scheduler regression levels. The only remaining gap is node events with non-agent: session keys silently falling back to main, but this is pre-existing behaviour and explicitly acknowledged.
  • No files require special attention. The server-node-events.ts fallback to node-{nodeId} is worth a follow-up if multi-agent node routing is ever needed, but it is out of scope for this fix.

Last reviewed commit: 6124db3

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9ebf1c465e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

}
enqueueSystemEvent(text, { sessionKey, contextKey: opts.contextKey });
requestHeartbeatNow({ reason: "exec-event" });
requestHeartbeatNow({ reason: "exec-event", sessionKey });
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Guard scoped exec wakes to agent session keys

emitExecSystemEvent now always passes sessionKey to requestHeartbeatNow, but defaults.sessionKey is not guaranteed to be agent-scoped (the caller keeps non-agent keys too, e.g. global; see src/agents/bash-tools.exec.ts:195-201), and targeted wakes are routed via resolveAgentIdFromSessionKey (src/infra/heartbeat-runner.ts:1122-1124), which maps non-agent keys to main. In configs where heartbeat is enabled for non-main agents, exec completions from legacy/global sessions will be dropped as skipped/disabled instead of waking the enabled agent(s), which is a regression from the previous unscoped behavior.

Useful? React with 👍 / 👎.

@altaywtf altaywtf force-pushed the fix/heartbeat-exec-wake-scoping branch from 9ebf1c4 to 9952460 Compare March 2, 2026 14:29
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9952460a39

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

: `Exec ${status} (${session.id.slice(0, 8)}, ${exitLabel})`;
enqueueSystemEvent(summary, { sessionKey });
requestHeartbeatNow({ reason: `exec:${session.id}:exit` });
requestHeartbeatNow({ reason: `exec:${session.id}:exit`, sessionKey });
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep notify-on-exit wakes unscoped for non-agent sessions

Adding sessionKey unconditionally here makes notify-on-exit wakes targeted even when exec runs are tied to non-agent keys (for example createExecTool forwards defaults.sessionKey directly in src/agents/bash-tools.exec.ts:195,485, and callers can pass sessionId when no session key exists in src/agents/pi-embedded-runner/run/attempt.ts:587). Targeted wakes resolve non-agent keys to main (src/infra/heartbeat-runner.ts:1122-1128), so installations with heartbeat enabled only on non-main agents will skip these exec completion wakes entirely, whereas the prior unscoped behavior still ran enabled agents.

Useful? React with 👍 / 👎.

@altaywtf altaywtf force-pushed the fix/heartbeat-exec-wake-scoping branch from ec5463a to d1beab4 Compare March 2, 2026 16:24
@altaywtf
Copy link
Copy Markdown
Member Author

altaywtf commented Mar 3, 2026

Superseding with a fresh draft PR (same body) to reset branch history after rebase.

@altaywtf altaywtf closed this Mar 3, 2026
@altaywtf
Copy link
Copy Markdown
Member Author

altaywtf commented Mar 3, 2026

Closed in favor of new draft PR:

@altaywtf
Copy link
Copy Markdown
Member Author

altaywtf commented Mar 3, 2026

Superseded by the new draft PR: #32724

@altaywtf altaywtf deleted the fix/heartbeat-exec-wake-scoping branch March 3, 2026 07:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling gateway Gateway runtime size: S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant