fix(whatsapp): widen reconnect-window retries #43978

Open
stim64045-spec wants to merge 1 commit into openclaw:main from stim64045-spec:fix/issue-14827

Conversation

@stim64045-spec
Contributor

Summary

Fixes #14827.

When WhatsApp Web is reconnecting after a transient 408/503-style disconnect, replies could exhaust their retry budget before the socket was usable again. The delivery failure was then only logged by the dispatcher, so the agent had no structured signal that the message never reached the user.

What changed

1) Widen the WhatsApp send retry window

In src/web/auto-reply/deliver-reply.ts:

  • increase default retry attempts from 3 → 5
  • increase linear backoff from 500ms × attempt → 1000ms × attempt

That extends the retry window enough to cover the typical 3–5s reconnect gap without introducing a broader outbound queue.
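A minimal sketch of the retry shape described above, not the actual contents of deliver-reply.ts. It assumes the loop sleeps `backoff × attempt` after each failed attempt except the last, and that any thrown error is retried; `sendWithRetry`, `totalBackoffMs`, and the injectable `sleep` are hypothetical names used only for illustration.

```typescript
// Illustrative only: the constant names come from the review discussion,
// everything else here is a hypothetical stand-in for the real code.
const SEND_RETRY_MAX_ATTEMPTS = 5; // previously 3
const SEND_RETRY_BACKOFF_MS = 1000; // previously 500

type Sleep = (ms: number) => Promise<void>;

const defaultSleep: Sleep = (ms) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Total time spent waiting if every attempt fails, assuming the loop
// sleeps after each failed attempt except the final one.
function totalBackoffMs(maxAttempts: number, backoffMs: number): number {
  let total = 0;
  for (let attempt = 1; attempt < maxAttempts; attempt++) {
    total += backoffMs * attempt;
  }
  return total;
}

// Linear-backoff retry: wait backoffMs * attempt between attempts and
// rethrow the last error once the attempt budget is exhausted.
async function sendWithRetry<T>(
  send: () => Promise<T>,
  maxAttempts: number = SEND_RETRY_MAX_ATTEMPTS,
  backoffMs: number = SEND_RETRY_BACKOFF_MS,
  sleep: Sleep = defaultSleep,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await send();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        await sleep(backoffMs * attempt);
      }
    }
  }
  throw lastError;
}
```

Under those assumptions, `totalBackoffMs(3, 500)` gives 1500 ms for the old settings and `totalBackoffMs(5, 1000)` gives 10000 ms for the new ones, which is why the wider window can outlast a 3–5s reconnect gap where the old one could not.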

2) Surface delivery failures to the agent

In src/web/auto-reply/monitor/process-message.ts:

  • keep the existing outbound error log
  • additionally enqueue a system event when WhatsApp final delivery fails

That means the failure is no longer silent from the agent/session perspective.

Why this scope

This is the smallest safe fix I could make with high confidence:

  • it improves the common reconnect-window case directly
  • it avoids introducing a new queue / flush lifecycle in the WhatsApp transport path
  • it makes unrecovered failures visible instead of silently dropping them

Tradeoff: this does not add a durable outbound queue. If the reconnect gap exceeds the wider retry window, delivery can still fail — but now it is surfaced instead of disappearing silently.

Tests

  • pnpm exec vitest run src/web/auto-reply/deliver-reply.test.ts src/web/auto-reply/monitor/process-message.inbound-contract.test.ts
  • pnpm exec tsc --noEmit

@openclaw-barnacle openclaw-barnacle bot added the channel: whatsapp-web and size: S labels on Mar 12, 2026
@greptile-apps
Contributor

greptile-apps bot commented Mar 12, 2026

Greptile Summary

This PR widens the WhatsApp reconnect retry window (3→5 attempts, 500ms→1000ms base backoff, max ~10 s of wait) and surfaces final delivery failures as structured system events so agents get a signal when a message is permanently undelivered. The changes are targeted and well-tested.

Key observations:

  • The retry constants (SEND_RETRY_MAX_ATTEMPTS, SEND_RETRY_BACKOFF_MS) are a clean extraction and align with the new test expectations.
  • vi.clearAllMocks() in beforeEach for deliver-reply.test.ts is a welcome hygiene fix that prevents cross-test mock state leakage.
  • The enqueueSystemEvent call inside onError fires unconditionally for all info.kind values ("tool", "block", "final"). The PR description and the new test only target the "final" case; if the dispatcher invokes onError for non-final kinds (e.g., internal errors), misleading system events would be emitted. A guard on info.kind === "final" would align runtime behaviour with stated intent.

Confidence Score: 3/5

  • Mostly safe to merge; one logic gap where enqueueSystemEvent fires for all error kinds rather than only final delivery failures as the PR intends.
  • The retry window changes are correct and well-tested. The system event plumbing is valuable, but calling enqueueSystemEvent unconditionally inside onError — without a guard for info.kind === "final" — could emit confusing or spurious events if the dispatcher invokes onError for non-final kinds. The test coverage only validates the final case, so this gap is unverified by the test suite.
  • src/web/auto-reply/monitor/process-message.ts — the onError handler should guard enqueueSystemEvent to info.kind === "final" only.

Comments Outside Diff (1)

  1. src/web/auto-reply/monitor/process-message.ts, lines 440-453

    enqueueSystemEvent fires for all error kinds, not just "final"

    The onError callback can be invoked for info.kind === "tool", "block", or "final", but enqueueSystemEvent is called unconditionally for all three. The deliver function early-returns for non-final kinds (no network operation is attempted), so a delivery failure for "tool" or "block" cannot happen through the deliver path — but the dispatcher may invoke onError for internal/model-level errors independently of deliver, which would produce a misleading system event like "WhatsApp block update delivery failed to +1555: ...".

    The PR description says "enqueue a system event when WhatsApp final delivery fails", and the new test only exercises { kind: "final" }. Consider adding a guard so enqueueSystemEvent is only called when info.kind === "final", bringing the runtime behaviour in line with the stated intent and the test coverage.
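    One possible shape for that guard, as a sketch only. The real signatures of `onError` and `enqueueSystemEvent` are not shown in this thread, so `makeOnError`, the `deps` object, and the message wording below are assumptions for illustration.

```typescript
type ReplyKind = "tool" | "block" | "final";

interface ErrorInfo {
  kind: ReplyKind;
}

// Hypothetical dependencies; the real enqueueSystemEvent may take
// different arguments (for example a session key or structured payload).
interface OnErrorDeps {
  to: string;
  logError: (message: string) => void;
  enqueueSystemEvent: (text: string) => void;
}

function makeOnError(deps: OnErrorDeps) {
  return (err: unknown, info: ErrorInfo): void => {
    const reason = err instanceof Error ? err.message : String(err);

    // Keep the existing outbound error log for every kind.
    deps.logError(`whatsapp ${info.kind} reply failed to ${deps.to}: ${reason}`);

    // Only a failed final delivery means the user never got the reply,
    // so only that case is surfaced to the agent as a system event.
    if (info.kind !== "final") return;

    deps.enqueueSystemEvent(
      `WhatsApp final delivery failed to ${deps.to}: ${reason}`,
    );
  };
}
```

    With this guard, a "tool" or "block" error is still logged but produces no system event, which matches what the existing test for `{ kind: "final" }` already asserts.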

Last reviewed commit: 91a9a55

Development

Successfully merging this pull request may close these issues.

WhatsApp: Messages silently dropped during reconnection window
