[Bug]: Telegram messages silently lost across all streamMode settings (off/partial/block) — tool error dispatch race + draft cleanup #19001
Description
Summary
Telegram outbound messages are silently lost across all three streamMode settings (off, partial, block). The bug is most severe when tool calls (especially exec) fail — the error notification and/or the agent's actual response vanish from chat. Users see messages appear briefly then disappear, or never appear at all.
This extends the scope of #18244 significantly: the problem is not limited to draft stream cleanup and affects streamMode: "off" where no draft stream exists.
Environment
- OpenClaw version: 2026.2.15
- OS: macOS 26.3
- Channel: Telegram (polling mode, group chat)
- Node: v25.6.1
- Config: 3 agents (main, work, group), maxConcurrent: 6
Systematic Test Results
We conducted controlled testing across all 3 stream modes with identical test protocols: send messages, trigger exec errors, send follow-up messages, compare screenshots against session history.
Test Protocol
- Send a message before exec error (marker: test-001)
- Trigger deliberate exec failure (cd /nonexistent)
- Send a message after exec error (marker: test-002)
- Trigger more exec failures
- Send analysis/summary message
- Compare Telegram screenshots against agent session history
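The final comparison step can be scripted. The sketch below is a minimal illustration, assuming both sides are available as plain lists of message texts (one exported from the agent session history, one transcribed from the Telegram screenshots). `findLostMessages` and the sample strings are hypothetical and not part of OpenClaw.

```typescript
// Hypothetical helper: returns every sent text that has no match among
// the texts visible in Telegram. Whitespace at the edges is ignored.
function findLostMessages(sent: string[], visible: string[]): string[] {
  const seen = visible.map((text) => text.trim());
  return sent.filter((text) => seen.indexOf(text.trim()) === -1);
}

// Illustrative data only, modelled on the markers used in the protocol.
const sentFromHistory = ["test-001", "exec error: cd /nonexistent", "test-002"];
const visibleInTelegram = ["exec error: cd /nonexistent"];

const lost = findLostMessages(sentFromHistory, visibleInTelegram);
console.log(lost); // the two marker messages that never showed up
```

Matching on exact text is deliberately strict; messages that Telegram split or re-formatted would need a looser comparison.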
Results by Mode
streamMode: "off" — Messages survive, but exec error notifications can vanish
| Sent | Visible in Telegram | Status |
|---|---|---|
| off-mode-clean-test-001 | ✅ Yes | Survived |
| Exec error notification (cd /fake/directory) | ❌ Missing | Vanished |
| off-mode-clean-test-002 | ✅ Yes | Survived |
| Exec error notification (kubectl get pods) | ✅ Yes | Survived |
| off-mode-clean-test-003 | ✅ Yes | Survived |
Pattern: Response messages survive. Exec error notifications can silently vanish (1 of 2 lost). No draft stream exists in off mode — this is a different code path.
streamMode: "partial" — Massive message loss with exec errors
| Sent | Visible in Telegram | Status |
|---|---|---|
| Response with test markers (test-001, test-002) | ❌ Missing | Vanished |
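| Exec error notification (cd /nonexistent) | ✅ Yes | Survived |
| Replication analysis response | ❌ Missing | Vanished |
| Exec error notification (cat /nonexistent) | ✅ Yes | Survived |
| Exec error notification (python3 /nonexistent) | ❌ Missing | Vanished |
| Exec error notification (docker ps) | ❌ Missing | Vanished |
| Exec error notification (git -C /fake-repo) | ✅ Yes | Survived |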
| Stress test summary response | ❌ Missing | Vanished |
| Analysis responses (multiple) | ❌ Missing | Vanished |
| Short acknowledgments | ❌ Missing | Vanished |
| PR CI results (no exec error context) | ✅ Yes | Survived |
Pattern: ~60% of messages lost. Both response messages AND exec error notifications vanish. Exec error notifications: 3/5 survived, 2/5 vanished. Most agent responses after exec errors are gone.
streamMode: "block" — Long responses after exec errors vanish
| Sent | Visible in Telegram | Status |
|---|---|---|
| Subagent spawn + announcement | ✅ Yes | Survived |
| Subagent "test-block-001" result | ✅ Yes | Survived |
| Short analysis responses | ✅ Yes | Survived |
| ✅ Yes | Survived | |
| Long CI analysis (~2000 chars) after exec error | ❌ Missing | Vanished |
| Follow-up responses | ✅ Yes | Survived |
Pattern: Long responses immediately following exec errors vanish. Short responses and subagent announcements survive.
Additional Finding: Gateway Restart Message Loss
4 inbound user messages were lost during SIGUSR1 config-patch restarts. Messages sent during the ~5s restart window are silently dropped — no delivery recovery (delivery-recovery: 0 recovered).
Root Cause Analysis
Cause 1: Draft stream finally block deletes preview with undelivered content (partial/block)
In bot-message-dispatch.ts, the finally block:

    if (!finalizedViaPreviewMessage) await draftStream?.clear();

When tool errors arrive as isError payloads, they don't finalize the preview, so clear() deletes the draft containing the agent's streamed text.
Confirmed at: src/telegram/bot-message-dispatch.ts:421 (v2026.2.15 source line 55556 in minified bundle)
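A simplified, synchronous model of that cleanup path shows how an error-only run loses the streamed text. `DraftPreview`, `FinalPayload`, and `finishDispatch` are illustrative names assumed for this sketch; only the `finally` condition mirrors the line quoted above, and the real code is asynchronous.

```typescript
// Illustrative types, not the actual OpenClaw definitions.
interface FinalPayload {
  text: string;
  isError: boolean;
}

class DraftPreview {
  text: string | null;

  constructor(text: string | null) {
    this.text = text;
  }

  // Stands in for deleting the Telegram preview message.
  clear(): void {
    this.text = null;
  }
}

function finishDispatch(draft: DraftPreview, finals: FinalPayload[]): boolean {
  let finalizedViaPreviewMessage = false;
  try {
    for (const payload of finals) {
      if (!payload.isError) {
        // A normal final reply is edited into the preview message.
        draft.text = payload.text;
        finalizedViaPreviewMessage = true;
      }
    }
  } finally {
    // Same shape as the reported condition: an error-only run never
    // sets the flag, so the preview is cleared.
    if (!finalizedViaPreviewMessage) draft.clear();
  }
  return finalizedViaPreviewMessage;
}

const preview = new DraftPreview("streamed agent text");
finishDispatch(preview, [{ text: "exec failed", isError: true }]);
console.log(preview.text); // null: the streamed text is gone
```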
Cause 2: disableBlockStreaming is undefined when streamMode === "off"
When streamMode === "off", draftStream is undefined, so draftStream?.disableBlockStreaming evaluates to undefined (falsy but not true). Code paths that check if (disableBlockStreaming) don't trigger, allowing block streaming logic to run even in off mode.
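The difference between reading the flag through optional chaining and resolving it explicitly can be sketched as follows. `DraftStreamLike`, `readFlag`, and `resolveDisableBlockStreaming` are hypothetical names for illustration; the real draft stream object is assumed to have more members than shown.

```typescript
type StreamMode = "off" | "partial" | "block";

// Illustrative shape only.
interface DraftStreamLike {
  disableBlockStreaming?: boolean;
}

// Mirrors the reported pattern: with no draft stream the result is
// undefined, which is falsy and therefore never satisfies an if-check.
function readFlag(draftStream?: DraftStreamLike): boolean | undefined {
  return draftStream?.disableBlockStreaming;
}

// Explicit resolution: off mode always disables block streaming,
// and every mode yields a real boolean.
function resolveDisableBlockStreaming(
  streamMode: StreamMode,
  draftStream?: DraftStreamLike
): boolean {
  if (streamMode === "off") return true;
  return draftStream?.disableBlockStreaming === true;
}

console.log(readFlag(undefined)); // undefined
console.log(resolveDisableBlockStreaming("off", undefined)); // true
```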
Cause 3: Exec error notification races with response dispatch
The exec error notification and the agent's response race with each other during dispatch:
- In partial mode: both create draft previews, and cleanup of one interferes with the other
- In block mode: the error notification displaces the response in the delivery queue
- In off mode: rapid sendMessage calls may hit undocumented Telegram rate limits or internal queue conflicts
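One possible way to remove the shared pipeline is to split payloads into two lanes before anything is sent, so that cleanup of one lane cannot touch the other. This is a sketch only: `OutboundPayload`, `DispatchPlan`, and `planDispatch` are hypothetical names, and how each lane is actually delivered is left out.

```typescript
// Illustrative payload shape; isError follows the flag named in this report.
interface OutboundPayload {
  text: string;
  isError: boolean;
}

interface DispatchPlan {
  responseLane: OutboundPayload[]; // agent responses (draft stream or direct)
  errorLane: OutboundPayload[]; // tool error notifications, sent as separate messages
}

function planDispatch(payloads: OutboundPayload[]): DispatchPlan {
  return {
    responseLane: payloads.filter((payload) => !payload.isError),
    errorLane: payloads.filter((payload) => payload.isError),
  };
}

const plan = planDispatch([
  { text: "exec failed: cd /nonexistent", isError: true },
  { text: "test-002", isError: false },
]);
console.log(plan.responseLane.length, plan.errorLane.length); // 1 1
```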
Cause 4: Unknown off mode loss (no draft stream involved)
In off mode, draftStream is undefined — none of the draft cleanup logic applies. Yet 1 of 2 exec error notifications vanished. This points to:
- Telegram Bot API silently dropping messages (no 429/retry_after in logs)
- Internal dispatch queue with drop policy
- sendMessage call failing silently (no error in gateway.err.log)
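To tell these three hypotheses apart, every outbound call could be recorded before it is issued and again when it resolves. The ledger below is a hypothetical diagnostic sketch, not existing OpenClaw code; the method names are the Telegram Bot API methods, and everything else (`DeliveryLedger`, `localId`, the sample IDs) is assumed for illustration.

```typescript
type ApiMethod = "sendMessage" | "editMessageText" | "deleteMessage";
type Outcome = "pending" | "confirmed" | "failed";

interface LedgerEntry {
  localId: string;
  method: ApiMethod;
  outcome: Outcome;
  telegramMessageId?: number;
  error?: string;
}

class DeliveryLedger {
  private entries: LedgerEntry[] = [];

  // Record the call before it is issued.
  attempt(localId: string, method: ApiMethod): void {
    this.entries.push({ localId, method, outcome: "pending" });
  }

  // Record the message_id the call returned or acted on.
  confirm(localId: string, telegramMessageId: number): void {
    const entry = this.findPending(localId);
    if (entry) {
      entry.outcome = "confirmed";
      entry.telegramMessageId = telegramMessageId;
    }
  }

  // Record an API rejection or thrown error.
  fail(localId: string, error: string): void {
    const entry = this.findPending(localId);
    if (entry) {
      entry.outcome = "failed";
      entry.error = error;
    }
  }

  // Everything that was attempted but never confirmed.
  unconfirmed(): LedgerEntry[] {
    return this.entries.filter((entry) => entry.outcome !== "confirmed");
  }

  private findPending(localId: string): LedgerEntry | undefined {
    for (const entry of this.entries) {
      if (entry.localId === localId && entry.outcome === "pending") return entry;
    }
    return undefined;
  }
}

const ledger = new DeliveryLedger();
ledger.attempt("exec-error-1", "sendMessage");
ledger.attempt("response-1", "sendMessage");
ledger.confirm("response-1", 4242);
console.log(ledger.unconfirmed().map((entry) => entry.localId)); // ["exec-error-1"]
```

Read against the list above: an entry that stays pending suggests the call never resolved or was dropped internally, a failed entry points at a swallowed error, and a confirmed entry that is still missing from chat points at Telegram or at a later delete.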
Cause 5: SIGUSR1 restart drops inbound
Config-patch triggers SIGUSR1 → gateway restarts → Telegram polling stops → messages sent during restart window are lost. delivery-recovery finds 0 messages to recover.
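The Telegram Bot API stores undelivered updates on its servers (for up to 24 hours) and treats an update as confirmed only once getUpdates is called with an offset higher than its update_id. Under that model, updates arriving during a short restart stay recoverable as long as the offset is never advanced past an update that was not fully processed. Whether OpenClaw's poller advances the offset early is an assumption, not something verified here; `ReceivedUpdate` and `nextSafeOffset` are hypothetical names.

```typescript
interface ReceivedUpdate {
  updateId: number;
  processed: boolean; // true once the handler finished for this update
}

// Returns the offset for the next getUpdates call such that no
// unprocessed update is confirmed (and thereby discarded) by Telegram.
function nextSafeOffset(received: ReceivedUpdate[], currentOffset: number): number {
  const sorted = received.slice().sort((a, b) => a.updateId - b.updateId);
  let offset = currentOffset;
  for (const update of sorted) {
    if (update.updateId < offset) continue; // already confirmed earlier
    if (!update.processed) break; // stop at the first update still in flight
    offset = update.updateId + 1;
  }
  return offset;
}

// Update 101 was interrupted by the restart, so polling resumes at 101.
console.log(
  nextSafeOffset(
    [
      { updateId: 100, processed: true },
      { updateId: 101, processed: false },
      { updateId: 102, processed: true },
    ],
    100
  )
); // 101
```

Resuming at 101 means update 102 is fetched a second time, so a handler built this way would also need to skip updates it has already processed.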
Log Evidence
- gateway.err.log: No 429/flood/rate_limit errors from Telegram API
- gateway.err.log: Suppressed AbortError events correlate with config reloads
- gateway.log: delivery-recovery: 0 recovered, 0 failed (the system thinks all messages were delivered)
- gateway.err.log: All 5 test exec failures logged, but no corresponding dispatch failures
Related Issues
- #18244, "[Bug]: Tool errors overwrite conversational responses in Telegram (2026.2.15)": same root cause, narrower scope
- #8691, "Draft streaming conflicts with message tool sends in same reply cycle": draft stream message deletion
- #16604, "Telegram channel with blockStreaming: true drops messages": message loss in streaming mode
- #17668, "Bug: Telegram messages appear as edited instead of new when streamMode is partial/block": Telegram message delivery reliability
- #18195, "streamMode 'partial' silently drops all message delivery on long multi-tool runs": silent message loss
- #18859, "[Bug]: Telegram streaming preview duplication with Anthropic-compatible providers (MiniMax)": draft cleanup issues
- #9208, "Message runs interrupted by network errors are not retried, causing silent message loss"
- #12029, "Session Amnesia: Orphan detection deletes legitimate user messages"
Related PRs — Why None Fully Fix This
There are 6 open PRs touching Telegram message dispatch, but none covers all the failure modes we documented:
#18678 — Preserve draft when all finals are errors
- What it fixes: Partial mode. Prevents clear() from deleting drafts when only error payloads arrive.
- What it misses: ❌ Does NOT fix off mode loss (no draft exists). ❌ Does NOT fix block mode loss. ❌ Creates orphan message risk (drafts never cleaned up). ❌ Does not address the exec error race condition.
#18909 — Suppress recovered tool failure warnings
- What it fixes: Partial/block. Suppresses exec/bash error warnings when the user already saw the streamed reply.
- What it misses: ❌ Does NOT fix off mode loss. ❌ Only handles exec/bash tool errors, not other tool types. ❌ Does not prevent the underlying race, just hides it.
#17953 — Block reply delivery tracking + duplicate prevention
- What it fixes: Block mode. Adds delivery state tracking and duplicate message prevention. Most comprehensive of the first three PRs listed here.
- What it misses: ❌ Does NOT fix the error-overwrite problem (Cause 1). ❌ Does NOT fix off mode loss (Cause 4). ❌ Does not decouple error notifications from response dispatch.
#17766 — Fix duplicate resend/delete in partial streaming
- What it fixes: Partial mode. Tracks lastAppliedText() to short-circuit unchanged final edits, preventing the failover/duplicate behavior triggered by Telegram's "message is not modified" rejection.
- What it misses: ❌ Fixes duplicates, not message loss. ❌ Does NOT fix off or block mode loss. ❌ Does not address exec error race conditions. ❌ Complementary to ac2ede5bb (reactive error handling), not a root cause fix.
#16633 — Keep block-stream replies across assistant messages
- What it fixes: Block mode. Adds a Telegram-scoped block reply break override so blockStreaming: true uses message_end boundaries. Fixes #16604 (intermediate assistant messages not delivered).
- What it misses: ❌ Addresses message boundary handling, not message loss from error dispatch. ❌ Does NOT fix off or partial mode. ❌ Does not address draft cleanup or error notification racing.
#17252 — Skip message_thread_id for private chats
- What it fixes: Private DM chats. Prevents silent message drops caused by an invalid message_thread_id in non-forum private chats.
- What it misses: ❌ Only fixes private chats. ❌ Our bug is in group chats. ❌ Completely different root cause (thread ID vs dispatch pipeline).
Gap Analysis
| Root Cause | #18678 | #18909 | #17953 | #17766 | #16633 | #17252 |
|---|---|---|---|---|---|---|
| 1. Draft finally deletes preview | Partial | ❌ | ❌ | ❌ | ❌ | ❌ |
| 2. disableBlockStreaming undefined in off mode | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| 3. Error notification races with response | ❌ | Partial | ❌ | ❌ | ❌ | ❌ |
| 4. off mode silent loss | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| 5. SIGUSR1 restart drops inbound | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
No single PR — or even all 6 combined — covers all 5 root causes. Specifically:
- Root cause 2 (disableBlockStreaming undefined) is unaddressed by any PR
- Root cause 4 (off mode silent loss) is unaddressed by any PR
- Root cause 5 (SIGUSR1 restart) is unaddressed by any PR
- The exec error race condition (cause 3) is only superficially addressed by #18909 (which hides the symptom rather than fixing the dispatch pipeline)
What a Complete Fix Needs
- Decouple error notification from response dispatch: error notifications should not go through the same draft stream / delivery pipeline as the agent's response
- Fix the finally block: only clear() when deliveryState.delivered is true AND no content exists in the preview
- Force disableBlockStreaming = true when streamMode === "off": use an explicit boolean instead of relying on undefined being falsy
- Investigate and fix off mode dispatch: add logging to sendMessage calls to capture silent failures; add retry logic
- Queue inbound messages during SIGUSR1 restart: buffer Telegram updates during the restart window
- Add delivery confirmation logging: log every sendMessage/editMessage/deleteMessage call with message IDs for debugging
Expected Behavior
- Agent's conversational response should always be delivered to Telegram
- Tool error notifications should appear as separate messages, never overwriting the response
- Both error notifications and responses should survive regardless of streamMode
- Messages sent during gateway restart should be queued and delivered after restart
- delivery-recovery should detect and retry failed sends