fix(cron): eliminate double-announce and replace delivery polling with push-based flow by tyler6204 · Pull Request #39089 · openclaw/openclaw

tyler6204 · 2026-03-07T18:31:24Z

Summary

Two related cron delivery bugs fixed in one commit:

1. Double-announce guard (original fix)

Early-return paths inside deliverViaAnnounce returned without setting deliveryAttempted = true. The heartbeat timer would then see deliveryAttempted = false and fire a redundant enqueueSystemEvent fallback, delivering the same message twice.

Fix: Both early-return paths (active-subagent suppression and stale-interim suppression) now set deliveryAttempted = true before returning.
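
The interaction between the flag and the timer guard can be illustrated with a minimal sketch. `DispatchResult`, `shouldEnqueueFallback`, `suppressBeforeFix`, and `suppressAfterFix` are hypothetical stand-ins, not the project's actual types or functions. The sketch assumes the fallback fires only when a run was neither delivered nor marked as attempted.

```typescript
// Hypothetical, simplified model of the dispatch result and the timer guard.
// Field names mirror the PR text; the real types carry more fields.
interface DispatchResult {
  delivered?: boolean;
  deliveryAttempted?: boolean;
}

// Assumed guard: enqueue the fallback system event only when the run was
// neither delivered nor marked as attempted.
function shouldEnqueueFallback(result: DispatchResult): boolean {
  return result.delivered !== true && result.deliveryAttempted !== true;
}

// Before the fix: a suppression path returns without touching the flag.
function suppressBeforeFix(): DispatchResult {
  return { delivered: false };
}

// After the fix: the same path marks the delivery as attempted first.
function suppressAfterFix(): DispatchResult {
  return { delivered: false, deliveryAttempted: true };
}

console.log(shouldEnqueueFallback(suppressBeforeFix())); // true: fallback fires, second message
console.log(shouldEnqueueFallback(suppressAfterFix())); // false: fallback is skipped
```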

2. Replace 500ms polling loop with push-based `agent.wait`

waitForDescendantSubagentSummary in subagent-followup.ts previously polled every 500ms via countActiveDescendantRuns until all descendants finished — wasting tokens and wall-clock time on each cron run that spawns subagents.

Fix: The polling loop is replaced with concurrent agent.wait RPC calls (one per active descendant run) via Promise.allSettled. Iterations continue only if new descendants appear (e.g. spawned by first-level subagents). Only a short bounded grace period (5s, 200ms poll) remains to capture the cron agent's post-orchestration synthesis after descendants settle.
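
A rough sketch of the push-based flow described above, for orientation only. `listActive` and `waitForRun` are hypothetical injected dependencies standing in for the run registry and the agent.wait RPC; the real function takes different parameters and also handles the grace period.

```typescript
// Sketch only: `listActive` and `waitForRun` are hypothetical stand-ins for
// the run registry lookup and the agent.wait gateway call.
type ListActive = () => string[];
type WaitForRun = (runId: string, timeoutMs: number) => Promise<unknown>;

async function waitForDescendants(
  listActive: ListActive,
  waitForRun: WaitForRun,
  timeoutMs: number,
): Promise<number> {
  const deadline = Date.now() + timeoutMs;
  let pending = new Set(listActive());
  let rounds = 0;

  while (pending.size > 0 && Date.now() < deadline) {
    const remainingMs = Math.max(1, deadline - Date.now());
    // One wait per active run; allSettled keeps a single failed wait from
    // rejecting the whole round.
    await Promise.allSettled(
      [...pending].map((runId) => waitForRun(runId, remainingMs)),
    );
    rounds += 1;
    // Re-read the registry so runs spawned mid-wait get another pass.
    pending = new Set(listActive());
  }
  return rounds; // number of wait rounds performed
}
```

The sketch has no delay between rounds, so a `waitForRun` that fails immediately while the registry still lists the run would make the loop spin until the deadline.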

Key improvements:

- No more busy-polling; waits are event-driven via gateway RPC
- Handles nested descendants (new runs that appear mid-wait get a second pass)
- Gateway errors are absorbed gracefully via allSettled
- FAST_TEST_MODE shrinks the grace period to 50ms in tests
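
The bounded grace period could look roughly like the sketch below. `readSummary`, `GraceOptions`, and `pollDuringGrace` are hypothetical names, and the injectable `sleep`/`now` exist only to make the sketch testable; the 5s/200ms/50ms values come from the description above.

```typescript
// Sketch only: `readSummary` is a hypothetical stand-in for whatever reads the
// cron agent's latest synthesized reply; it returns undefined until one exists.
interface GraceOptions {
  graceMs: number; // total budget, e.g. 5_000 (50 in fast test mode)
  pollMs: number; // interval between checks, e.g. 200
  sleep?: (ms: number) => Promise<void>;
  now?: () => number;
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function pollDuringGrace(
  readSummary: () => string | undefined,
  { graceMs, pollMs, sleep = defaultSleep, now = Date.now }: GraceOptions,
): Promise<string | undefined> {
  const deadline = now() + graceMs;
  for (;;) {
    const summary = readSummary();
    if (summary !== undefined) return summary;
    const remaining = deadline - now();
    if (remaining <= 0) return undefined; // budget spent, give up
    await sleep(Math.min(pollMs, remaining));
  }
}
```

Because the loop checks before sleeping, a summary that is already present returns immediately without spending any of the grace budget.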

Tests

- New waitForDescendantSubagentSummary test suite (8 tests) covering push-based waits, error handling, multi-descendant cases, NO_REPLY, and early-exit paths
- All existing cron tests pass (57 test files, 464 tests)

Thanks @tyler6204

aisle-research-bot · 2026-03-07T18:31:58Z

🔒 Aisle Security Analysis

We found 2 potential security issue(s) in this PR:

| # | Severity | Title |
|---|----------|-------|
| 1 | 🟡 Medium | Tight-loop gateway fanout in descendant subagent wait can amplify failures (DoS) |
| 2 | 🟡 Medium | Cron delivery can be silently suppressed by misusing `deliveryAttempted` flag (notification DoS) |

1. 🟡 Tight-loop gateway fanout in descendant subagent wait can amplify failures (DoS)

| Property | Value |
|----------|-------|
| Severity | Medium |
| CWE | CWE-400 |
| Location | `src/cron/isolated-agent/subagent-followup.ts:144-161` |

Description

The new push-based descendant waiting loop in waitForDescendantSubagentSummary() can spin rapidly and repeatedly fan out RPC calls when the gateway call fails fast.

Key points:

- pendingRunIds is refreshed from listDescendantRunsForRequester() each iteration.
- The loop performs Promise.allSettled([...pendingRunIds].map(runId => callGateway({method:"agent.wait", ...}).catch(() => undefined))).
- If callGateway() fails quickly (e.g., gateway down / connection refused / close during handshake), the await Promise.allSettled(...) resolves quickly.
- If the registry still reports those runs as active (endedAt unset), pendingRunIds remains non-empty and the while loop immediately repeats without any delay/backoff, causing:
  - high CPU usage in this cron worker
  - repeated WebSocket connection attempts and request amplification against the gateway
  - potentially large concurrent fanout per iteration when many descendants are active

This is an availability risk (internal DoS) that can be triggered by common infra failure modes (gateway unavailable) and becomes worse with higher configured subagent limits or multiple concurrent cron sessions.

Vulnerable code:

while (pendingRunIds.size > 0 && Date.now() < deadline) {
  const remainingMs = Math.max(1, deadline - Date.now());
  await Promise.allSettled(
    [...pendingRunIds].map((runId) =>
      callGateway({
        method: "agent.wait",
        params: { runId, timeoutMs: remainingMs },
        timeoutMs: remainingMs + 2_000,
      }).catch(() => undefined),
    ),
  );

  pendingRunIds = new Set<string>(getActiveRuns().map((e) => e.runId));
}

Recommendation

Add bounded retry/backoff and concurrency limiting for the push-wait rounds, and avoid immediate retries when the gateway is unhealthy.

Recommended changes:

- Backoff/sleep per iteration (especially after any failures) to prevent tight loops.
- Cap concurrency when waiting on many runIds (batching or p-limit).
- Stop retrying on repeated gateway failures and fall back to a slower polling strategy (or break early and rely on grace-period polling / fallback reply).

Example mitigation (sketch):

import pLimit from "p-limit";

const limit = pLimit(10); // or config-driven
let backoffMs = 200;
let consecutiveGatewayFailures = 0;

while (pendingRunIds.size > 0 && Date.now() < deadline) {
  const remainingMs = Math.max(1, deadline - Date.now());

  const results = await Promise.allSettled(
    [...pendingRunIds].map((runId) =>
      limit(() =>
        callGateway({
          method: "agent.wait",
          params: { runId, timeoutMs: remainingMs },
          timeoutMs: remainingMs + 2_000,
        })
      )
    )
  );

  const anyRejected = results.some((r) => r.status === "rejected");
  if (anyRejected) {
    consecutiveGatewayFailures += 1;
    await new Promise((r) => setTimeout(r, backoffMs));
    backoffMs = Math.min(backoffMs * 2, 5_000);
    if (consecutiveGatewayFailures >= 3) {
      break; // or switch to slower poll mode
    }
  } else {
    consecutiveGatewayFailures = 0;
    backoffMs = 200;
  }

  pendingRunIds = new Set(getActiveRuns().map((e) => e.runId));
}

Also consider enforcing a maximum descendant run count to wait on per request (even if spawning is limited elsewhere) to prevent registry corruption/misconfiguration from causing unbounded fanout.
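
A dependency-free way to combine the run-count cap with the batching option mentioned above might look like this. All names (`capRunIds`, `toBatches`, `waitInBatches`) and the default limits are hypothetical; `waitForRun` stands in for the per-run agent.wait call.

```typescript
// Sketch only: `maxRuns` and `batchSize` are hypothetical tuning knobs.
function capRunIds(runIds: Iterable<string>, maxRuns: number): string[] {
  // Ignore anything beyond the cap so a corrupted registry cannot cause
  // unbounded fanout.
  return [...runIds].slice(0, Math.max(0, maxRuns));
}

function toBatches<T>(items: readonly T[], batchSize: number): T[][] {
  const size = Math.max(1, Math.floor(batchSize));
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

async function waitInBatches(
  runIds: Iterable<string>,
  waitForRun: (runId: string) => Promise<unknown>,
  maxRuns = 50,
  batchSize = 10,
): Promise<PromiseSettledResult<unknown>[]> {
  const results: PromiseSettledResult<unknown>[] = [];
  for (const batch of toBatches(capRunIds(runIds, maxRuns), batchSize)) {
    // At most `batchSize` waits are in flight at any time.
    const settled = await Promise.allSettled(batch.map((id) => waitForRun(id)));
    results.push(...settled);
  }
  return results;
}
```

Batching is simpler than a pool but each batch waits for its slowest member, so total wait time can exceed full fanout; a limiter such as p-limit keeps slots busy instead.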

2. 🟡 Cron delivery can be silently suppressed by misusing `deliveryAttempted` flag (notification DoS)

| Property | Value |
|----------|-------|
| Severity | Medium |
| CWE | CWE-440 |
| Location | `src/cron/isolated-agent/delivery-dispatch.ts:319-349` |

Description

In dispatchCronDelivery(), two early-return suppression paths now set deliveryAttempted = true even though no outbound delivery was attempted.

Because deliveryAttempted is used by the cron timer’s fallback guard (shouldEnqueueCronMainSummary) to decide whether to enqueue a system-event summary, this change can suppress the only remaining delivery mechanism in some cases.

Impact:

- When activeSubagentRuns > 0 after waiting, dispatch returns without sending any message, but deliveryAttempted=true prevents the timer from enqueueing enqueueSystemEvent fallback.
- When the "stale interim" suppression triggers (descendants existed but no synthesized update arrived), dispatch returns without delivering, but again marks deliveryAttempted=true and suppresses the timer fallback.
- If a descendant run is buggy/malicious and never ends (or if descendant completion announcements cannot reach the user due to missing/invalid delivery origin), the user may receive no completion signal and no fallback summary.

Vulnerable logic (suppression paths marked as “attempted”):

if (activeSubagentRuns > 0) {
  deliveryAttempted = true;
  return params.withRunSession({ status: "ok", summary, outputText, deliveryAttempted, ... });
}

if (hadDescendants && /* stale interim */) {
  deliveryAttempted = true;
  return params.withRunSession({ status: "ok", summary, outputText, deliveryAttempted, ... });
}

Why this is availability-relevant:

- The timer fallback only triggers when deliveryAttempted !== true.
- Setting deliveryAttempted=true here changes the meaning from “an outbound send was attempted” to “we intentionally suppressed delivery”, which downstream logic does not distinguish.

This enables a notification DoS where cron runs can end with no user-visible message if descendants keep activeSubagentRuns > 0 (or keep the stale-interim condition true) long enough.

Recommendation

Treat “suppressed” as a distinct state from “attempted delivery”, so the timer can apply an appropriate fallback policy.

Recommended fixes (pick one):

1. Introduce a separate flag (e.g. deliverySuppressed / suppressionReason) and keep deliveryAttempted reserved for true outbound attempts.

// suppression path
const deliverySuppressed = true;
return params.withRunSession({
  status: "ok",
  summary,
  outputText,
  deliveryAttempted: false,
  deliverySuppressed,
  ...params.telemetry,
});

Then update the timer fallback logic to consider deliverySuppressed (e.g., enqueue a non-user-facing internal event, or enqueue a user-facing “still running” / “timed out waiting for descendants” message after a cap).

2. Add a bounded suppression timeout: if descendants remain active beyond a maximum, enqueue a fallback system event (or schedule a follow-up dispatch) so the user receives some completion/timeout signal.
3. If deliveryAttempted must remain true to prevent double-announces, add an explicit mechanism that guarantees a later delivery attempt (e.g., on descendant settle) and/or emits an internal alert when suppression happens for too long.
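
One way to picture "suppressed" as a distinct state with a bounded timeout is a small decision function on the timer side. Everything here (`DeliveryOutcome`, `FallbackAction`, `decideFallback`, `maxSuppressionMs`) is hypothetical; the recommendation above does not prescribe a concrete policy.

```typescript
// Sketch only: a hypothetical tri-state outcome plus timer-side policy.
type DeliveryOutcome =
  | { kind: "delivered" }
  | { kind: "attempted" }
  | { kind: "suppressed"; reason: "active-subagents" | "stale-interim"; sinceMs: number }
  | { kind: "none" };

type FallbackAction = "skip" | "enqueue-summary" | "enqueue-timeout-notice";

function decideFallback(
  outcome: DeliveryOutcome,
  nowMs: number,
  maxSuppressionMs: number,
): FallbackAction {
  switch (outcome.kind) {
    case "delivered":
    case "attempted":
      return "skip"; // a send happened or was tried; avoid a double-announce
    case "suppressed":
      // Stay quiet while suppression is recent, but guarantee a signal
      // once the cap is exceeded.
      return nowMs - outcome.sinceMs >= maxSuppressionMs
        ? "enqueue-timeout-notice"
        : "skip";
    case "none":
      return "enqueue-summary"; // nothing was sent or suppressed
  }
}
```

This keeps the double-announce protection for recent suppressions while ensuring a run cannot stay silent indefinitely.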

Analyzed PR: #39089 at commit cfe819e

Last updated on: 2026-03-07T21:06:02Z

greptile-apps · 2026-03-07T18:34:53Z

Greptile Summary

This PR fixes a double-announce bug in cron job delivery where two early-return paths inside deliverViaAnnounce (active subagent suppression at lines 319-330 and stale interim message suppression at lines 332-350) returned without setting deliveryAttempted = true. The timer's shouldEnqueueCronMainSummary guard in heartbeat-policy.ts (line 45) checks params.deliveryAttempted !== true — when both delivered and deliveryAttempted were false/undefined, the guard would fire enqueueSystemEvent, sending a redundant second message.

Key changes:

- In delivery-dispatch.ts, both early-return paths now set deliveryAttempted = true and pass it explicitly to withRunSession before returning, correctly signaling that delivery was intentionally suppressed, not missed.
- A new test file (delivery-dispatch.double-announce.test.ts) adds 5 targeted tests covering both suppression paths, normal announce success, announce-failure fallback to direct delivery, and no-delivery-requested scenarios.

Implementation detail: The fix correctly distinguishes between the SILENT_REPLY_TOKEN early return (lines 351-358, which sets delivered: true in the returned result and does not require deliveryAttempted) and the two suppression paths (which now correctly set deliveryAttempted = true). Both approaches prevent the timer's fallback guard from firing — one by marking delivery as successful, the other by marking it as intentionally suppressed. All 457 existing cron tests continue to pass.

Confidence Score: 5/5

- This PR is safe to merge: the fix is minimal, targeted, and well-tested with zero risk of regression.
- The change is confined to two symmetric, low-risk additions of deliveryAttempted = true in clearly-identified early-return paths. The root cause analysis matches the code exactly: both early-return paths were missing the flag that the shouldEnqueueCronMainSummary guard checks to prevent the fallback. The heartbeat-policy.ts guard logic (line 45) confirms exactly why the fix works. The five new tests directly exercise both bug paths, normal announce success, announce-failure fallback, and the no-delivery-requested path, with assertions on both the state flags and the timer guard itself. The fix does not alter any other code paths or existing behavior; it only prevents the unintended timer fallback. All 457 existing cron tests continue to pass.
- No files require special attention.

Last reviewed commit: bb765a3

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5d9bbd1390

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-07T18:52:17Z

src/cron/isolated-agent/subagent-followup.ts

+  let pendingRunIds = new Set<string>(initialActiveRuns.map((e) => e.runId));
+
+  while (pendingRunIds.size > 0 && Date.now() < deadline) {
+    const remainingMs = Math.max(1, deadline - Date.now());


Keep polling for late-starting descendants

When observedActiveDescendants is true but initialActiveRuns is empty (a race that happens when the parent emits a follow-up hint before the registry marks child runs active), pendingRunIds starts empty and this loop is skipped entirely. The new implementation then never re-checks listDescendantRunsForRequester for descendants that become active a moment later, so dispatchCronDelivery can proceed with stale/interim text instead of waiting for the real subagent result.
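
A possible shape for closing this race is a short discovery window before the wait loop. This is a sketch under assumptions: `discoverActiveRuns`, `listActive`, and the `windowMs`/`intervalMs` options are hypothetical, and `sleep`/`now` are injected only so the behaviour can be tested deterministically.

```typescript
// Sketch only: `listActive` is a hypothetical registry read; the window and
// interval values are illustrative, not taken from the PR.
interface DiscoveryOptions {
  windowMs: number;
  intervalMs: number;
  sleep: (ms: number) => Promise<void>;
  now: () => number;
}

async function discoverActiveRuns(
  listActive: () => string[],
  observedActiveDescendants: boolean,
  opts: DiscoveryOptions,
): Promise<string[]> {
  let runs = listActive();
  // Nothing to discover if the snapshot is non-empty or nobody saw descendants.
  if (runs.length > 0 || !observedActiveDescendants) return runs;

  // The caller saw descendants but the snapshot is empty: re-check the
  // registry for a bounded window instead of skipping the wait entirely.
  const deadline = opts.now() + opts.windowMs;
  while (runs.length === 0 && opts.now() < deadline) {
    await opts.sleep(Math.min(opts.intervalMs, deadline - opts.now()));
    runs = listActive();
  }
  return runs;
}
```

The returned IDs would then seed the pending set; an empty result after the window means the wait can be skipped as before.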

chatgpt-codex-connector · 2026-03-07T18:52:17Z

src/cron/isolated-agent/subagent-followup.ts

+        callGateway<{ status?: string }>({
+          method: "agent.wait",
+          params: { runId, timeoutMs: remainingMs },
+          timeoutMs: remainingMs + 2_000,
+        }).catch(() => undefined),


Add backoff after agent.wait failures

This call path swallows agent.wait errors and immediately continues the outer loop; if the gateway is temporarily unavailable and callGateway fails fast, the same active run IDs are retried in a tight loop until deadline. That can create high CPU usage and rapid reconnect churn during outages, unlike the previous fixed polling delay behavior.
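
One lightweight way to restore a floor on loop frequency, without changing the happy path, is to pad any round that finishes too quickly. The helper below is a hypothetical sketch (`padDelayMs` and `minRoundMs` are not from the PR); it only computes how long to sleep before the next round.

```typescript
// Sketch only: `minRoundMs` is a hypothetical floor on how long one round of
// waits must take before the next round may start.
function padDelayMs(
  roundStartedAtMs: number,
  nowMs: number,
  deadlineMs: number,
  minRoundMs: number,
): number {
  const elapsed = Math.max(0, nowMs - roundStartedAtMs);
  const needed = Math.max(0, minRoundMs - elapsed); // time still owed to the floor
  const untilDeadline = Math.max(0, deadlineMs - nowMs); // never sleep past the deadline
  return Math.min(needed, untilDeadline);
}

console.log(padDelayMs(0, 5, 10_000, 500)); // 495: round failed fast, pad it
console.log(padDelayMs(0, 800, 10_000, 500)); // 0: a real wait already took long enough
console.log(padDelayMs(0, 5, 100, 500)); // 95: clamp to the remaining deadline
```

A caller would sleep for the returned value after each Promise.allSettled round, so fast gateway failures cost at most one round per `minRoundMs` while successful long waits add no delay.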

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c7a7241ff6

chatgpt-codex-connector · 2026-03-07T20:06:41Z

src/cron/isolated-agent/subagent-followup.ts

+  let pendingRunIds = new Set<string>(initialActiveRuns.map((e) => e.runId));
+
+  while (pendingRunIds.size > 0 && Date.now() < deadline) {


Keep discovering descendants after an empty initial snapshot

When observedActiveDescendants is true but initialActiveRuns is empty (a common registry timing race), this code initializes pendingRunIds to an empty set and skips the wait loop entirely, so it never refreshes descendant activity before moving to grace-period transcript polling. That can return undefined/interim text even though descendants become active moments later, which regresses the old behavior that re-checked activity each cycle. Fresh evidence in this commit is that pendingRunIds is seeded once from initialActiveRuns and the while (pendingRunIds.size > 0 ...) guard blocks any later getActiveRuns() refresh when the first snapshot is empty.

chatgpt-codex-connector · 2026-03-07T20:06:41Z

src/cron/isolated-agent/subagent-followup.ts

+        callGateway<{ status?: string }>({
+          method: "agent.wait",
+          params: { runId, timeoutMs: remainingMs },
+          timeoutMs: remainingMs + 2_000,
+        }).catch(() => undefined),


Back off after fast agent.wait gateway failures

The per-run agent.wait call now catches and suppresses gateway errors, but the outer loop has no delay path when those failures are immediate; if the gateway is temporarily down and callGateway fails fast, the same active run IDs are retried in a tight loop until deadline. This can create avoidable CPU/reconnect churn during outages. Fresh evidence in this commit is the new callGateway(...).catch(() => undefined) branch inside the retry loop without any sleep/backoff before the next iteration.

…h push-based flow

- Set deliveryAttempted=true in announce early-return paths (active-subagent suppression and stale-interim suppression) so the heartbeat timer no longer fires a redundant enqueueSystemEvent fallback (double-announce bug).
- Refactor waitForDescendantSubagentSummary to use event-based agent.wait RPC calls instead of a 500ms busy-poll loop. Each active descendant run is now awaited concurrently via Promise.allSettled, and only a short bounded grace period (5s) remains to capture the cron agent's post-orchestration synthesis. Eliminates O(n*timeoutMs/500ms) gateway calls and wasted wall-clock time.
- Add FAST_TEST_MODE (OPENCLAW_TEST_FAST=1) to subagent-followup.ts to keep the grace-period tests instant in CI.
- Add comprehensive tests for the new waitForDescendantSubagentSummary behaviour (push-based wait, error resilience, NO_REPLY handling, multi-descendant waits).

tyler6204 · 2026-03-07T20:14:00Z

Merged via squash.

Merge commit: e554c59

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cfe819eaaa

chatgpt-codex-connector · 2026-03-07T20:18:20Z

src/cron/isolated-agent/subagent-followup.ts

+  let pendingRunIds = new Set<string>(initialActiveRuns.map((e) => e.runId));
+
+  while (pendingRunIds.size > 0 && Date.now() < deadline) {


Keep discovering descendants after empty initial snapshot

When observedActiveDescendants is true but initialActiveRuns is empty (a timing race where descendants are observed by the caller before the registry snapshot includes them), pendingRunIds is initialized empty and this loop never runs. That means getActiveRuns() is never refreshed before moving to grace-period transcript polling, so late-starting descendants can be missed and cron delivery can proceed with stale/interim text instead of waiting for the actual descendant completion.

chatgpt-codex-connector · 2026-03-07T20:18:20Z

src/cron/isolated-agent/subagent-followup.ts

+          method: "agent.wait",
+          params: { runId, timeoutMs: remainingMs },
+          timeoutMs: remainingMs + 2_000,
+        }).catch(() => undefined),


Add retry backoff after failed agent.wait calls

Each agent.wait call suppresses gateway errors via .catch(() => undefined), but the outer loop immediately retries the same active run IDs with no sleep. If the gateway is temporarily unavailable and failures are fast, this produces a tight retry loop until deadline, driving unnecessary CPU usage and connection churn in the cron worker.

@tyler6204

…h push-based flow (openclaw#39089)

* fix(cron): eliminate double-announce and replace delivery polling with push-based flow
  - Set deliveryAttempted=true in announce early-return paths (active-subagent suppression and stale-interim suppression) so the heartbeat timer no longer fires a redundant enqueueSystemEvent fallback (double-announce bug).
  - Refactor waitForDescendantSubagentSummary to use event-based agent.wait RPC calls instead of a 500ms busy-poll loop. Each active descendant run is now awaited concurrently via Promise.allSettled, and only a short bounded grace period (5s) remains to capture the cron agent's post-orchestration synthesis. Eliminates O(n*timeoutMs/500ms) gateway calls and wasted wall-clock time.
  - Add FAST_TEST_MODE (OPENCLAW_TEST_FAST=1) to subagent-followup.ts to keep the grace-period tests instant in CI.
  - Add comprehensive tests for the new waitForDescendantSubagentSummary behaviour (push-based wait, error resilience, NO_REPLY handling, multi-descendant waits).
* fix: prep cron double-announce followup tests (openclaw#39089) (thanks @tyler6204)

@tyler6204

…h push-based flow (openclaw#39089)

* fix(cron): eliminate double-announce and replace delivery polling with push-based flow
  - Set deliveryAttempted=true in announce early-return paths (active-subagent suppression and stale-interim suppression) so the heartbeat timer no longer fires a redundant enqueueSystemEvent fallback (double-announce bug).
  - Refactor waitForDescendantSubagentSummary to use event-based agent.wait RPC calls instead of a 500ms busy-poll loop. Each active descendant run is now awaited concurrently via Promise.allSettled, and only a short bounded grace period (5s) remains to capture the cron agent's post-orchestration synthesis. Eliminates O(n*timeoutMs/500ms) gateway calls and wasted wall-clock time.
  - Add FAST_TEST_MODE (OPENCLAW_TEST_FAST=1) to subagent-followup.ts to keep the grace-period tests instant in CI.
  - Add comprehensive tests for the new waitForDescendantSubagentSummary behaviour (push-based wait, error resilience, NO_REPLY handling, multi-descendant waits).
* fix: prep cron double-announce followup tests (openclaw#39089) (thanks @tyler6204)

(cherry picked from commit e554c59)

openclaw-barnacle bot added size: M maintainer Maintainer-authored PR labels Mar 7, 2026

tyler6204 force-pushed the fix/cron-double-announce branch from bb765a3 to 5d9bbd1 Compare March 7, 2026 18:48

openclaw-barnacle bot added size: L and removed size: M labels Mar 7, 2026

tyler6204 changed the title ~~fix(cron): prevent double-announce when deliverViaAnnounce returns early~~ fix(cron): eliminate double-announce and replace delivery polling with push-based flow Mar 7, 2026

chatgpt-codex-connector bot reviewed Mar 7, 2026

tyler6204 self-assigned this Mar 7, 2026

tyler6204 added a commit that referenced this pull request Mar 7, 2026

fix: prep cron double-announce followup tests (#39089) (thanks @tyler…

3ddd0a0

…6204)

tyler6204 force-pushed the fix/cron-double-announce branch from 5d9bbd1 to 3ddd0a0 Compare March 7, 2026 19:59

tyler6204 added a commit that referenced this pull request Mar 7, 2026

fix: prep cron double-announce followup tests (#39089) (thanks @tyler…

c7a7241

…6204)

tyler6204 force-pushed the fix/cron-double-announce branch from 3ddd0a0 to c7a7241 Compare March 7, 2026 20:02

chatgpt-codex-connector bot reviewed Mar 7, 2026

tyler6204 added 2 commits March 7, 2026 12:07

fix: prep cron double-announce followup tests (#39089) (thanks @tyler…

cfe819e

…6204)

tyler6204 force-pushed the fix/cron-double-announce branch from c7a7241 to cfe819e Compare March 7, 2026 20:13

tyler6204 merged commit e554c59 into main Mar 7, 2026
10 of 11 checks passed

tyler6204 deleted the fix/cron-double-announce branch March 7, 2026 20:13

github-actions bot mentioned this pull request Mar 7, 2026

📡 Upstream Digest — 2026-03-07 20:17 UTC curtismercier/openclaw-mods#204

Open

chatgpt-codex-connector bot reviewed Mar 7, 2026

github-actions bot mentioned this pull request Mar 8, 2026

上游更新: v2026.3.7 — 19 P0 + 122 P1 待合并 jiulingyun/openclaw-cn#486

Open

tyler6204 mentioned this pull request Mar 8, 2026

fix(cron): announce delivery + wake event cause double message delivery #40178

Closed

alexey-pelykh mentioned this pull request Mar 21, 2026

Cherry-pick issue #870: upstream cron/daemon updates remoteclaw/remoteclaw#1760

Merged

wonderchook mentioned this pull request Mar 22, 2026

Cron announce delivery fails with explicit to targets (v2026.2.22) #24176

Open
