Telegram long-polling silently dies on Android/Termux when multi-agent gateway is under load #32048
Description
Environment
- OpenClaw version: 2026.2.26 (bc50708)
- OS: Android 15 (arm64) via Termux
- Device: Samsung Galaxy Z Fold 7
- Node: v24.13.0
- Model: anthropic/claude-opus-4-6
- Setup: Single gateway, 2 agents (main + secondary), Telegram + WhatsApp channels
Problem
Telegram long-polling for one agent silently stops receiving messages while the other agent on the same gateway continues to work perfectly. WhatsApp channel for the affected agent also continues to work.
Steps to Reproduce
- Configure gateway with 2 agents, each bound to a separate Telegram bot account
- Agent "main" handles: Telegram DMs, Telegram groups, WhatsApp DMs, WhatsApp groups, heartbeats (5m), cron jobs
- Agent "secondary" handles: Telegram DMs only, heartbeats (2m)
- Send messages to both agents via Telegram
- After some time (30 minutes to 4 hours), agent "main" stops receiving Telegram DMs
- Agent "secondary" continues to respond normally on the same gateway
- Agent "main" continues to respond via WhatsApp — only Telegram polling is dead
Observed Behavior
- Telegram shows double blue checkmarks (delivered) but the gateway never receives the message
- No error in logs — the polling just silently stops
- Gateway logs show no `lane enqueue` entries for the affected agent's Telegram DMs after the polling dies
- Heartbeats via Telegram for the affected agent also stop (confirming the polling connection is dead)
- The only fix is a full gateway restart (`kill -9` + restart)
Additional Context
SIGUSR1 graceful restart fails on Android
When the health monitor detects Telegram is unresponsive and sends SIGUSR1:
Gateway is draining for restart; new tasks are not accepted
shutdown timed out; exiting without full cleanup
The graceful shutdown always times out on Android/Termux, killing the gateway uncleanly. This leaves stale session locks.
Compaction hangs on large sessions
When sessions grow large (1MB+), the compaction process hangs indefinitely:
embedded run compaction start → (never completes)
This blocks the agent from responding to any messages. Moving the session file and restarting resolves it temporarily, but sessions grow back quickly with active usage.
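One way to bound this hang would be to run compaction under a deadline. The following is a minimal sketch in Python, not the gateway's actual code (the gateway runs on Node): `run_with_deadline`, `CompactionTimeout`, and the `compact` callable are hypothetical names used only for illustration.

```python
import asyncio


class CompactionTimeout(Exception):
    """Raised when compaction does not finish before its deadline."""


async def run_with_deadline(compact, session_path: str, timeout_s: float):
    """Await compact(session_path), giving up after timeout_s seconds.

    `compact` is a hypothetical async callable standing in for the real
    compaction step; it is not an OpenClaw API.
    """
    try:
        return await asyncio.wait_for(compact(session_path), timeout=timeout_s)
    except asyncio.TimeoutError:
        # wait_for cancels the inner task before raising, so the caller
        # can go back to handling messages instead of blocking forever.
        raise CompactionTimeout(
            f"compaction of {session_path} exceeded {timeout_s}s"
        ) from None
```

A deadline like this only helps if the hang is at an await point (for example, a stalled model or I/O call). A CPU-bound loop would need to run in a separate process or worker so that it can be terminated.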
Network change crash (separate but related)
When the Android device changes WiFi networks, the mDNS/Bonjour module crashes the entire gateway:
ERR_ASSERTION: Reached illegal state! IPV4 address change from defined to undefined!
This is from the ciao module and kills the gateway process entirely.
Session accumulation
Heartbeats every 2-5 minutes create new session files over time. After ~20 days, 407+ orphan session files accumulated, degrading performance.
Expected Behavior
- Telegram polling should auto-reconnect when the connection drops
- SIGUSR1 restart should complete cleanly on Android/Termux (or have a fallback)
- Compaction should have a timeout and not block the agent indefinitely
- Network/IP changes should not crash the gateway
- Health monitor should detect dead Telegram polling and reconnect the channel without restarting the entire gateway
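The per-channel detection in the last bullet could be sketched as a small watchdog. This is an illustrative Python sketch under stated assumptions, not OpenClaw's health monitor: `PollingWatchdog`, `record_poll`, and the channel names are hypothetical. It relies on the fact that a Telegram `getUpdates` long-poll request returns when its `timeout` elapses even if there are no updates, so a healthy poller completes a request at regular intervals.

```python
import time
from typing import Callable, Dict, List


class PollingWatchdog:
    """Tracks the last completed poll per channel and reports stale ones."""

    def __init__(self, stale_after_s: float,
                 clock: Callable[[], float] = time.monotonic):
        # stale_after_s should be a few multiples of the long-poll timeout.
        self.stale_after_s = stale_after_s
        self._clock = clock
        self._last_ok: Dict[str, float] = {}

    def record_poll(self, channel: str) -> None:
        # Call after every poll request that returns, even with zero updates.
        self._last_ok[channel] = self._clock()

    def stale_channels(self) -> List[str]:
        # Channels whose last completed poll is older than the threshold.
        now = self._clock()
        return sorted(
            ch for ch, t in self._last_ok.items()
            if now - t > self.stale_after_s
        )
```

A health monitor could call `stale_channels()` periodically and restart only the pollers it returns (for example, the "main" Telegram bot), leaving the other agent and the WhatsApp channel untouched.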
Workaround
- `kill -9 <PID>` and manually restart the gateway
- Periodically clean session files to prevent accumulation
- Move large session `.jsonl` files to backup before restart
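The two file-handling steps above could be scripted roughly as follows. This is a sketch with assumptions: the session and backup directories are passed in by the caller (the issue does not state where they live), "orphan" is approximated by file modification age, and both function names are hypothetical. It is intended to run while the gateway is stopped.

```python
import shutil
import time
from pathlib import Path
from typing import List, Optional


def archive_large_sessions(session_dir: Path, backup_dir: Path,
                           max_bytes: int = 1_000_000) -> List[Path]:
    """Move .jsonl files larger than max_bytes into backup_dir."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for f in sorted(session_dir.glob("*.jsonl")):
        if f.is_file() and f.stat().st_size > max_bytes:
            dest = backup_dir / f.name
            shutil.move(str(f), str(dest))
            moved.append(dest)
    return moved


def prune_old_sessions(session_dir: Path, max_age_days: float,
                       now: Optional[float] = None) -> List[Path]:
    """Delete .jsonl files not modified within max_age_days.

    Age by mtime is an assumption about what counts as an orphan session.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for f in sorted(session_dir.glob("*.jsonl")):
        if f.is_file() and f.stat().st_mtime < cutoff:
            f.unlink()
            removed.append(f)
    return removed
```

The 1 MB default mirrors the session size at which compaction was observed to hang. The age threshold is a judgment call and should stay well above the heartbeat interval.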