[Bug]: Telegram polling loop hangs indefinitely when downloading large files (no download timeout) #40074

@BTNG-studio

Bug type

Crash (process/app exits or hangs)

Summary

When a large file (photo or document) is sent via Telegram,
the inbound polling loop hangs indefinitely and stops
processing all subsequent messages. The gateway process
stays alive, but the affected bot account (Billy in this
report) becomes unresponsive until a full gateway restart.

Root cause (traced in source):
In fetchRemoteMedia → readResponseWithLimit, the response
body is read via a streaming reader.read() loop with no
timeout or AbortController signal. If Telegram's file
download stalls mid-stream (e.g., large file, network
congestion), the loop never resolves and permanently blocks
the Telegram polling loop for that account.

The overflow (maxBytes) check works correctly when
Content-Length is present — but Telegram does not always
send Content-Length for large files, so the size check is
skipped and the blocking download begins.
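
To illustrate the gap, here is a minimal sketch of a read loop that enforces the byte cap on the bytes actually received (so it still works without Content-Length) and gives up when no chunk arrives for a while. The function name, its options, and the error messages are assumptions made for illustration; this is not the actual readResponseWithLimit code.

```javascript
// Hypothetical helper, not OpenClaw's actual readResponseWithLimit.
// Reads a web ReadableStream, enforcing a byte cap on the bytes actually
// received and an idle timeout on each read() call.
async function readWithLimitAndIdleTimeout(stream, { maxBytes, idleMs }) {
  const reader = stream.getReader();
  const chunks = [];
  let total = 0;
  try {
    for (;;) {
      let timer;
      const idle = new Promise((_, reject) => {
        timer = setTimeout(
          () => reject(new Error(`no data received for ${idleMs}ms`)),
          idleMs,
        );
      });
      let result;
      try {
        // Whichever settles first wins: the next chunk or the idle timer.
        result = await Promise.race([reader.read(), idle]);
      } finally {
        clearTimeout(timer);
      }
      if (result.done) break;
      total += result.value.byteLength;
      if (total > maxBytes) {
        throw new Error(`body exceeds ${maxBytes} bytes`);
      }
      chunks.push(result.value);
    }
    return Buffer.concat(chunks);
  } catch (err) {
    // Cancel the stream so the underlying connection can be released.
    reader.cancel(err).catch(() => {});
    throw err;
  }
}
```

With a loop like this, a stalled stream rejects after idleMs instead of blocking forever, and an oversized body is rejected as soon as the running total passes maxBytes.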

Relevant code: local-roots-ClyFgL1E.js →
readResponseWithLimit and fetchRemoteMedia

Expected behavior: The download should be aborted after a
reasonable timeout (e.g., 30s) and the message skipped with
a warning — not block the polling loop.

Workaround: Set channels.telegram.mediaMaxMb to a low value
(e.g., 4) so files with a Content-Length header are
rejected before download. This does not help when
Content-Length is absent.

Suggested fix: Pass an AbortSignal with a timeout (e.g.,
AbortSignal.timeout(30_000)) into the fetchWithSsrFGuard
call inside fetchRemoteMedia.
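
A rough sketch of the suggested fix, using plain fetch() as a stand-in because the signature of fetchWithSsrFGuard is not shown in this issue. The function name downloadWithDeadline is hypothetical. It assumes a runtime with global fetch and AbortSignal.timeout (Node 18 or later).

```javascript
// Sketch only: plain fetch() stands in for fetchWithSsrFGuard, whose
// signature is not shown in this issue. downloadWithDeadline is a
// hypothetical name.
async function downloadWithDeadline(url, timeoutMs = 30_000) {
  // The signal covers the whole request, including reading the body,
  // so a stream that stalls mid-download rejects instead of hanging.
  const response = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
  if (!response.ok) {
    throw new Error(`download failed with HTTP ${response.status}`);
  }
  return Buffer.from(await response.arrayBuffer());
}
```

Note that this is a deadline for the whole download, not an idle timeout, so a slow but healthy transfer of a large file would also be cut off at 30s.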

Steps to reproduce

  1. Set up OpenClaw with a Telegram bot account (polling
    mode)
  2. Send a photo or file ≥5MB (the default mediaMaxMb) to
    the bot via Telegram, preferably as a document/file
    rather than a compressed photo, since Telegram may then
    omit the Content-Length header
  3. Observe that the bot stops responding to all subsequent
    messages

Expected behavior

The file download should time out after a reasonable period
(e.g. 30s), the oversized/stalled message should be
skipped with a warning logged, and the polling loop should
continue processing subsequent messages normally. If
sendOversizeWarning is enabled, the user should receive a
reply like ⚠️ File too large. Maximum size is 5MB.
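
As a sketch of the expected behavior, the loop below isolates each update so that a rejected (for example, timed-out) media download costs one message rather than the whole loop. All names here (processUpdates, handleUpdate, warn) are hypothetical and are not OpenClaw APIs; it assumes the download itself rejects once its timeout fires.

```javascript
// Hypothetical names throughout: processUpdates, handleUpdate and warn are
// illustrative and are not OpenClaw APIs.
async function processUpdates(updates, handleUpdate, warn = console.warn) {
  let lastProcessedId = null;
  for (const update of updates) {
    try {
      await handleUpdate(update);
    } catch (err) {
      // A failed or timed-out media download costs one message, not the loop.
      warn(`skipping update ${update.update_id}: ${err.message}`);
    }
    // Advance past the update either way so it is not retried forever.
    lastProcessedId = update.update_id;
  }
  return lastProcessedId;
}
```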

Actual behavior

The polling loop for the affected bot account hangs
indefinitely. All subsequent Telegram messages, including
plain text, are never processed. The gateway process
remains alive and healthy (no crash, no OOM), so systemd
sees nothing to restart and the health monitor does not
detect the hang itself. The health monitor's stale-socket
check eventually fires (after ~300s) and restarts the
socket layer, but the polling loop remains blocked, so the
restart does not recover the bot. A full gateway restart
(openclaw gateway restart) is required to recover.

OpenClaw version

2026.3.7 (and 2026.3.2)

Operating system

macOS

Install method

npm

Logs, screenshots, and evidence

Last successful message processed:
  2026-03-08T14:20:38.902Z [telegram] sendMessage ok
  chat=8473334963 message=2884

  Health monitor detected stale socket and attempted restart
  ~33 minutes later:
  2026-03-08T14:53:04.753Z [health-monitor] [telegram:billy]
  health-monitor: restarting (reason: stale-socket)
  2026-03-08T14:53:04.818Z [telegram] [billy] starting
  provider (@billybtng_bot)

  No further activity from the Billy account for the next 83
  minutes despite two plain-text "Ping?" messages being sent
  by the user. Telegram API confirmed the messages were
  queued but never consumed:
  Pending updates: 2
  update_id: 943947904  →  text: "Ping?"
  update_id: 943947905  →  text: "Ping"

  Update offset file (update-offset-billy.json) remained
  unchanged from 14:20:38 until manual gateway restart at
  16:17, confirming the polling loop was permanently blocked,
  not retrying.

Impact and severity

Severity: High

  • Complete loss of Telegram responsiveness — all bots on
    the affected account stop responding to every message type
    (text, commands, media) until a manual gateway restart
  • Silent failure — no error is surfaced to the user via
    Telegram, no alert is triggered, no log entry marks the
    hang; the only signal is absence of responses
  • Health monitor ineffective — the existing recovery
    mechanism (stale-socket restart) does not fix the
    condition, giving a false sense of resilience
  • Data loss — Telegram updates consumed during the hung
    polling cycle (in this case 3 updates: IDs
    943947901–943947903) are permanently lost from the queue
    and cannot be replayed after recovery
  • Reproducible with a single message — any user with access
    to the bot can inadvertently (or deliberately) trigger a
    full outage by sending one large file
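
One possible way to make this kind of hang detectable is a progress heartbeat for the polling loop, separate from the socket check. This is a hypothetical sketch and not an existing OpenClaw mechanism: the loop would call beat() after every completed polling cycle (including cycles that return no updates), and a monitor would call isStalled() periodically.

```javascript
// Hypothetical watchdog, not an OpenClaw API: the polling loop calls beat()
// after every completed cycle, and a monitor calls isStalled() periodically.
function createLoopWatchdog(maxSilenceMs, now = Date.now) {
  let lastBeat = now();
  return {
    beat() {
      lastBeat = now();
    },
    isStalled() {
      return now() - lastBeat > maxSilenceMs;
    },
  };
}
```

The clock is injectable only to keep the sketch easy to test. In practice maxSilenceMs would need to be comfortably longer than the long-poll timeout so that an idle but healthy loop is not flagged.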

Additional information

No response

Labels

bug (Something isn't working), bug:crash (Process/app exits unexpectedly or hangs)