[Bug]: Telegram messages silently lost across all streamMode settings (off/partial/block) — tool error dispatch race + draft cleanup #19001

@mudrii

Summary

Telegram outbound messages are silently lost across all three streamMode settings (off, partial, block). The bug is most severe when tool calls (especially exec) fail — the error notification and/or the agent's actual response vanish from chat. Users see messages appear briefly then disappear, or never appear at all.

This significantly extends the scope of #18244: the problem is not limited to draft stream cleanup, and it also affects streamMode: "off", where no draft stream exists.

Environment

  • OpenClaw version: 2026.2.15
  • OS: macOS 26.3
  • Channel: Telegram (polling mode, group chat)
  • Node: v25.6.1
  • Config: 3 agents (main, work, group), maxConcurrent: 6

Systematic Test Results

We conducted controlled testing across all 3 stream modes with identical test protocols: send messages, trigger exec errors, send follow-up messages, compare screenshots against session history.

Test Protocol

  1. Send a message before exec error (marker: test-001)
  2. Trigger deliberate exec failure (cd /nonexistent)
  3. Send a message after exec error (marker: test-002)
  4. Trigger more exec failures
  5. Send analysis/summary message
  6. Compare Telegram screenshots against agent session history
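
The comparison in step 6 can be made mechanical. The following is a minimal sketch, assuming the markers from session history and the markers visible in the chat have already been collected into arrays by hand; `diffDelivery` and `DeliveryReport` are hypothetical names, not part of OpenClaw.

```typescript
// Hypothetical helper: compare what the session history says was sent
// against what is actually visible in the Telegram chat.
interface DeliveryReport {
  lost: string[]; // markers that were sent but are not visible
  lossRate: number; // fraction in [0, 1]
}

function diffDelivery(sent: string[], visible: string[]): DeliveryReport {
  const seen = new Set(visible);
  const lost = sent.filter((marker) => !seen.has(marker));
  const lossRate = sent.length === 0 ? 0 : lost.length / sent.length;
  return { lost, lossRate };
}

// Illustrative input only; these are not the recorded test results.
const report = diffDelivery(
  ["test-001", "exec-error-1", "test-002"],
  ["test-001", "test-002"],
);
console.log(report.lost, report.lossRate.toFixed(2));
```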

Results by Mode

streamMode: "off" — Messages survive, but exec error notifications can vanish

| Sent | Visible in Telegram | Status |
|---|---|---|
| off-mode-clean-test-001 | ✅ Yes | Survived |
| ⚠️ Exec error (cd /fake/directory) | Missing | Vanished |
| off-mode-clean-test-002 | ✅ Yes | Survived |
| ⚠️ Exec error (kubectl get pods) | ✅ Yes | Survived |
| off-mode-clean-test-003 | ✅ Yes | Survived |

Pattern: Response messages survive. Exec error notifications can silently vanish (1 of 2 lost). No draft stream exists in off mode — this is a different code path.

streamMode: "partial" — Massive message loss with exec errors

| Sent | Visible in Telegram | Status |
|---|---|---|
| Response with test markers (test-001, test-002) | Missing | Vanished |
| ⚠️ Exec error (cd /nonexistent) | ✅ Yes | Survived |
| Replication analysis response | Missing | Vanished |
| ⚠️ Exec error (cat /nonexistent) | ✅ Yes | Survived |
| ⚠️ Exec error (python3 /nonexistent) | Missing | Vanished |
| ⚠️ Exec error (docker ps) | Missing | Vanished |
| ⚠️ Exec error (git -C /fake-repo) | ✅ Yes | Survived |
| Stress test summary response | Missing | Vanished |
| Analysis responses (multiple) | Missing | Vanished |
| Short acknowledgments | Missing | Vanished |
| PR CI results (no exec error context) | ✅ Yes | Survived |

Pattern: ~60% of messages lost. Both response messages AND exec error notifications vanish. Exec error notifications: 3/5 survived, 2/5 vanished. Most agent responses after exec errors are gone.

streamMode: "block" — Long responses after exec errors vanish

| Sent | Visible in Telegram | Status |
|---|---|---|
| Subagent spawn + announcement | ✅ Yes | Survived |
| Subagent "test-block-001" result | ✅ Yes | Survived |
| Short analysis responses | ✅ Yes | Survived |
| ⚠️ Exec error (git push fail) | ✅ Yes | Survived |
| Long CI analysis (~2000 chars) after exec error | Missing | Vanished |
| Follow-up responses | ✅ Yes | Survived |

Pattern: Long responses immediately following exec errors vanish. Short responses and subagent announcements survive.

Additional Finding: Gateway Restart Message Loss

4 inbound user messages were lost during SIGUSR1 config-patch restarts. Messages sent during the ~5s restart window are silently dropped — no delivery recovery (delivery-recovery: 0 recovered).

Root Cause Analysis

Cause 1: Draft stream finally block deletes preview with undelivered content (partial/block)

In bot-message-dispatch.ts, the finally block:

if (!finalizedViaPreviewMessage) await draftStream?.clear();

When tool errors arrive as isError payloads, they don't finalize the preview → clear() deletes the draft containing the agent's streamed text.

Confirmed at: src/telegram/bot-message-dispatch.ts:421 (v2026.2.15 source line 55556 in minified bundle)
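
To make the failure concrete, here is a minimal sketch of the pattern described above. It is not the actual dispatch code: `FakeDraftStream`, `Payload`, `dispatch`, and the `guardUndelivered` flag are hypothetical stand-ins, and only the `finally` condition mirrors the quoted line.

```typescript
interface Payload {
  text: string;
  isError?: boolean;
}

// Hypothetical stand-in for the draft stream; the real interface is not shown here.
class FakeDraftStream {
  preview: string | null = null;
  update(text: string): void {
    this.preview = text;
  }
  async clear(): Promise<void> {
    this.preview = null; // models deleting the preview message
  }
}

// Error-only finals never finalize the preview, so the finally block
// clears streamed text that was never re-sent as a normal message.
async function dispatch(
  draftStream: FakeDraftStream | undefined,
  finals: Payload[],
  guardUndelivered: boolean,
): Promise<void> {
  let finalizedViaPreviewMessage = false;
  try {
    for (const payload of finals) {
      if (!payload.isError && draftStream) {
        draftStream.update(payload.text);
        finalizedViaPreviewMessage = true;
      }
    }
  } finally {
    // Assumed guard: keep a preview that still holds undelivered text.
    const hasUndeliveredText = draftStream?.preview != null;
    const skipClear = guardUndelivered && hasUndeliveredText;
    if (!finalizedViaPreviewMessage && !skipClear) await draftStream?.clear();
  }
}

async function demoDraftClear(): Promise<[string | null, string | null]> {
  const errorOnly: Payload[] = [{ text: "exec failed", isError: true }];
  const current = new FakeDraftStream();
  current.update("streamed agent reply");
  await dispatch(current, errorOnly, false); // reported behavior
  const guarded = new FakeDraftStream();
  guarded.update("streamed agent reply");
  await dispatch(guarded, errorOnly, true); // guarded variant
  return [current.preview, guarded.preview];
}
```

With the guard off, the streamed reply is gone after dispatch; with it on, the preview survives. A guard like this only preserves the text. Something would still have to finalize or resend the preserved draft, otherwise it is left orphaned.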

Cause 2: disableBlockStreaming is undefined when streamMode === "off"

When streamMode === "off", draftStream is undefined, so draftStream?.disableBlockStreaming evaluates to undefined rather than an explicit true. Code paths that check if (disableBlockStreaming) therefore never trigger, which allows block streaming logic to run even in off mode.
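
The optional-chaining behavior is easy to reproduce in isolation. The sketch below assumes a simplified draft stream shape; `DraftStreamLike`, `makeDraftStream`, and `resolveDisableBlockStreaming` are hypothetical names used only to illustrate deriving an explicit boolean from the mode.

```typescript
type StreamMode = "off" | "partial" | "block";

// Simplified, assumed shape; the real draft stream has more members.
interface DraftStreamLike {
  disableBlockStreaming?: boolean;
}

// Models the report: no draft stream is created in off mode.
function makeDraftStream(mode: StreamMode): DraftStreamLike | undefined {
  return mode === "off" ? undefined : { disableBlockStreaming: false };
}

// Hypothetical helper: always returns a real boolean instead of
// letting undefined leak out of the optional chain.
function resolveDisableBlockStreaming(
  mode: StreamMode,
  draftStream: DraftStreamLike | undefined,
): boolean {
  if (mode === "off") return true;
  return draftStream?.disableBlockStreaming === true;
}

const offStream = makeDraftStream("off");
const implicit = offStream?.disableBlockStreaming; // undefined, so if (implicit) is skipped
const explicit = resolveDisableBlockStreaming("off", offStream); // true
console.log(implicit, explicit);
```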

Cause 3: Exec error notification races with response dispatch

The ⚠️ exec error notification and the agent's actual response go through separate dispatch paths. When both are dispatched rapidly:

  • In partial mode: both create draft previews, cleanup of one interferes with the other
  • In block mode: error notification displaces the response in the delivery queue
  • In off mode: rapid sendMessage calls may hit undocumented Telegram rate limits or internal queue conflicts
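
One way to stop two rapid dispatches from interfering with each other is to serialize outbound sends per chat. This is a sketch of that general technique, not OpenClaw's delivery queue: `PerChatQueue` and `fakeSend` are hypothetical, and the timers only simulate network latency.

```typescript
type SendTask<T> = () => Promise<T>;

// Hypothetical per-chat outbound queue: tasks for the same chat run strictly
// one after another, in the order they were enqueued.
class PerChatQueue {
  private tails = new Map<string, Promise<unknown>>();

  enqueue<T>(chatId: string, task: SendTask<T>): Promise<T> {
    const tail = this.tails.get(chatId) ?? Promise.resolve();
    const next = tail.then(() => task());
    // Store a tail that never rejects, so one failed send does not block later ones.
    this.tails.set(chatId, next.catch(() => undefined));
    return next;
  }
}

async function demoQueue(): Promise<string[]> {
  const delivered: string[] = [];
  const queue = new PerChatQueue();
  // Fake transport: records the text after a delay.
  const fakeSend = (text: string, delayMs: number): SendTask<void> => () =>
    new Promise<void>((resolve) =>
      setTimeout(() => {
        delivered.push(text);
        resolve();
      }, delayMs),
    );
  await Promise.all([
    queue.enqueue("chat-1", fakeSend("agent response", 20)),
    queue.enqueue("chat-1", fakeSend("exec error notice", 0)),
  ]);
  return delivered;
}
```

Even though the error notice would finish first if both ran concurrently, the queue delivers the response first because it was enqueued first. The map of tails is never pruned in this sketch; a real implementation would need to clean up idle chats.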

Cause 4: Unknown off mode loss (no draft stream involved)

In off mode, draftStream is undefined — none of the draft cleanup logic applies. Yet 1 of 2 exec error notifications vanished. This points to:

  • Telegram Bot API silently dropping messages (no 429/retry_after in logs)
  • Internal dispatch queue with drop policy
  • sendMessage call failing silently (no error in gateway.err.log)
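
Distinguishing between these possibilities requires a record of every send attempt. The following is a minimal sketch of such instrumentation, assuming the send function resolves with an object carrying a message_id (as the Bot API's sendMessage result does); `sendWithLogging`, `SendFn`, and `SendLogEntry` are hypothetical names.

```typescript
interface SentMessage {
  message_id: number;
}

type SendFn = (chatId: string, text: string) => Promise<SentMessage>;

interface SendLogEntry {
  chatId: string;
  attempt: number;
  ok: boolean;
  messageId?: number;
  error?: string;
}

// Hypothetical wrapper: records every attempt and its outcome, with a bounded retry.
async function sendWithLogging(
  send: SendFn,
  chatId: string,
  text: string,
  log: SendLogEntry[],
  maxAttempts = 2,
): Promise<SentMessage> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const sent = await send(chatId, text);
      log.push({ chatId, attempt, ok: true, messageId: sent.message_id });
      return sent;
    } catch (err) {
      lastError = err;
      log.push({ chatId, attempt, ok: false, error: String(err) });
    }
  }
  throw lastError; // surface the failure instead of swallowing it
}

async function demoSend(): Promise<SendLogEntry[]> {
  const log: SendLogEntry[] = [];
  let calls = 0;
  // Fake transport: fails once, then succeeds.
  const flakySend: SendFn = async () => {
    calls += 1;
    if (calls === 1) throw new Error("socket hang up");
    return { message_id: 42 };
  };
  await sendWithLogging(flakySend, "chat-1", "exec error notice", log);
  return log;
}
```

If a message has a logged message_id but is missing from the chat, it was deleted or dropped after delivery rather than never sent. Blind retries can produce duplicates when the first attempt actually reached Telegram, so the retry count here is an assumption to tune.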

Cause 5: SIGUSR1 restart drops inbound

Config-patch triggers SIGUSR1 → gateway restarts → Telegram polling stops → messages sent during restart window are lost. delivery-recovery finds 0 messages to recover.
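
In polling mode, Telegram keeps unconfirmed updates on its side (for a limited time) and treats an update as confirmed once getUpdates is called with an offset higher than its update_id. Whether the loss reported here comes from offset handling is not established in this issue. Under that assumption, the sketch below advances the offset only after an update has been handled, so an interrupted batch is re-fetched after restart. `pollOnce` and the fake transport are hypothetical.

```typescript
interface Update {
  update_id: number;
}

type GetUpdates = (offset: number) => Promise<Update[]>;
type Handler = (update: Update) => Promise<void>;

interface PollResult {
  offset: number; // next offset that is safe to persist
  error?: unknown; // set when a handler failed mid-batch
}

// Advance the offset past an update only after its handler has completed.
async function pollOnce(
  getUpdates: GetUpdates,
  handle: Handler,
  committedOffset: number,
): Promise<PollResult> {
  const updates = await getUpdates(committedOffset);
  let offset = committedOffset;
  for (const update of updates) {
    try {
      await handle(update);
    } catch (error) {
      return { offset, error }; // stop here; this update stays unconfirmed
    }
    offset = update.update_id + 1;
  }
  return { offset };
}

async function demoRestart(): Promise<{ handled: number[]; offset: number }> {
  const pending: Update[] = [{ update_id: 10 }, { update_id: 11 }, { update_id: 12 }];
  // Fake server: returns every update at or above the requested offset.
  const fakeGetUpdates: GetUpdates = async (offset) =>
    pending.filter((u) => u.update_id >= offset);
  const handled: number[] = [];
  let interrupted = false;
  const handle: Handler = async (u) => {
    if (u.update_id === 11 && !interrupted) {
      interrupted = true;
      throw new Error("gateway restarting"); // simulated restart mid-batch
    }
    handled.push(u.update_id);
  };
  const first = await pollOnce(fakeGetUpdates, handle, 0);
  const second = await pollOnce(fakeGetUpdates, handle, first.offset);
  return { handled, offset: second.offset };
}
```

The caller would persist the returned offset across restarts; the update that was interrupted is fetched again on the next poll instead of being skipped.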

Log Evidence

  • gateway.err.log: No 429/flood/rate_limit errors from Telegram API
  • gateway.err.log: Suppressed AbortError events correlate with config reloads
  • gateway.log: delivery-recovery: 0 recovered, 0 failed — system thinks all messages were delivered
  • gateway.err.log: All 5 test exec failures logged, but no corresponding dispatch failures

Related Issues

Related PRs — Why None Fully Fix This

There are 6 open PRs touching Telegram message dispatch, but none covers all the failure modes we documented:

#18678 — Preserve draft when all finals are errors

  • What it fixes: Partial mode — prevents clear() from deleting drafts when only error payloads arrive
  • What it misses: ❌ Does NOT fix off mode loss (no draft exists). ❌ Does NOT fix block mode loss. ❌ Creates orphan message risk (drafts never cleaned up). ❌ Does not address the exec error race condition.

#18909 — Suppress recovered tool failure warnings

  • What it fixes: Partial/block — suppresses exec/bash error warnings when user already saw the streamed reply
  • What it misses: ❌ Does NOT fix off mode loss. ❌ Only handles exec/bash tool errors, not other tool types. ❌ Does not prevent the underlying race — just hides it.

#17953 — Block reply delivery tracking + duplicate prevention

  • What it fixes: In block mode, adds delivery state tracking and duplicate message prevention. Most comprehensive of the first three PRs listed here.
  • What it misses: ❌ Does NOT fix the error-overwrite problem (Cause 1). ❌ Does NOT fix off mode loss (Cause 4). ❌ Does not decouple error notifications from response dispatch.

#17766 — Fix duplicate resend/delete in partial streaming

  • What it fixes: In partial mode, tracks lastAppliedText() to short-circuit unchanged final edits, preventing the failover/duplicate behavior triggered by Telegram's "message is not modified" rejection.
  • What it misses: ❌ Fixes duplicates, not message loss. ❌ Does NOT fix off or block mode loss. ❌ Does not address exec error race conditions. ❌ Complementary to ac2ede5bb (reactive error handling), not a root cause fix.

#16633 — Keep block-stream replies across assistant messages

  • What it fixes: In block mode, adds a Telegram-scoped block reply break override so that blockStreaming: true uses message_end boundaries. Fixes #16604, "Telegram channel with blockStreaming: true drops messages" (intermediate assistant messages not delivered).
  • What it misses: ❌ Addresses message boundary handling, not message loss from error dispatch. ❌ Does NOT fix off or partial mode. ❌ Does not address draft cleanup or error notification racing.

#17252 — Skip message_thread_id for private chats

  • What it fixes: Private DM chats — prevents silent message drops caused by invalid message_thread_id in non-forum private chats.
  • What it misses: ❌ Only fixes private chats. ❌ Our bug is in group chats. ❌ Completely different root cause (thread ID vs dispatch pipeline).

Gap Analysis

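| Root Cause | #18678 | #18909 | #17953 | #17766 | #16633 | #17252 |
|---|---|---|---|---|---|---|
| 1. Draft finally deletes preview | ⚠️ Partial | | | | | |
| 2. disableBlockStreaming undefined in off mode | | | | | | |
| 3. Error notification races with response | | ⚠️ Hides | | | | |
| 4. off mode silent loss | | | | | | |
| 5. SIGUSR1 restart drops inbound | | | | | | |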

No single PR, nor even all 6 combined, covers all 5 root causes.

What a Complete Fix Needs

  1. Decouple error notification from response dispatch — error notifications should not go through the same draft stream / delivery pipeline as the agent's response
  2. Fix the finally block — only clear() when deliveryState.delivered is true AND no content exists in the preview
  3. Force disableBlockStreaming = true when streamMode === "off" — explicit boolean, not relying on undefined-is-falsy
  4. Investigate and fix off mode dispatch — add logging to sendMessage calls to capture silent failures; add retry logic
  5. Queue inbound messages during SIGUSR1 restart — buffer Telegram updates during the restart window
  6. Add delivery confirmation logging — log every sendMessage/editMessage/deleteMessage call with message IDs for debugging

Expected Behavior

  1. Agent's conversational response should always be delivered to Telegram
  2. Tool error notifications should appear as separate messages, never overwriting the response
  3. Both error notifications and responses should survive regardless of streamMode
  4. Messages sent during gateway restart should be queued and delivered after restart
  5. delivery-recovery should detect and retry failed sends
