
Telegram long-polling silently dies on Android/Termux when multi-agent gateway is under load #32048

@eros-plusval

Description


Environment

  • OpenClaw version: 2026.2.26 (bc50708)
  • OS: Android 15 (arm64) via Termux
  • Device: Samsung Galaxy Z Fold 7
  • Node: v24.13.0
  • Model: anthropic/claude-opus-4-6
  • Setup: Single gateway, 2 agents (main + secondary), Telegram + WhatsApp channels

Problem

Telegram long-polling for one agent silently stops receiving messages, while the other agent on the same gateway keeps working normally. The WhatsApp channel for the affected agent also keeps working.

Steps to Reproduce

  1. Configure a gateway with 2 agents, each bound to a separate Telegram bot account
  2. Agent "main" handles: Telegram DMs, Telegram groups, WhatsApp DMs, WhatsApp groups, heartbeats (every 5 min), cron jobs
  3. Agent "secondary" handles: Telegram DMs only, heartbeats (every 2 min)
  4. Send messages to both agents via Telegram
  5. After 30 minutes to 4 hours, agent "main" stops receiving Telegram DMs
  6. Agent "secondary" continues to respond normally on the same gateway
  7. Agent "main" continues to respond via WhatsApp; only its Telegram polling is dead

Observed Behavior

  • Telegram shows double blue checkmarks (delivered) but the gateway never receives the message
  • No errors in the logs; polling simply stops without any warning
  • Gateway logs show no lane enqueue entries for the affected agent's Telegram DMs after the polling dies
  • Heartbeats via Telegram for the affected agent also stop (confirming the polling connection is dead)
  • The only fix is a full gateway restart (kill -9 + restart)

Additional Context

SIGUSR1 graceful restart fails on Android

When the health monitor detects that Telegram is unresponsive and sends SIGUSR1, the gateway logs:

Gateway is draining for restart; new tasks are not accepted
shutdown timed out; exiting without full cleanup

The graceful shutdown always times out on Android/Termux, so the gateway exits uncleanly and leaves stale session locks behind.
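One possible fallback for the stale-lock problem is a startup check that asks whether the process that wrote each lock is still alive. The sketch below assumes, hypothetically, that a lock file stores the owner's PID as plain text; this issue does not describe OpenClaw's actual lock format, and `isLockStale` / `isProcessAlive` are illustrative names, not OpenClaw APIs. It relies on Node's `process.kill(pid, 0)`, which performs an existence check without delivering a signal.

```typescript
// Hypothetical sketch: assumes a session lock file contains the owner's PID as text.
// Not an OpenClaw API; names are illustrative.

/** Returns true if a process with this PID currently exists. */
function isProcessAlive(pid: number): boolean {
  try {
    // Signal 0 performs only the existence/permission check; nothing is delivered.
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but we may not signal it. ESRCH: no such process.
    return (err as { code?: string }).code === "EPERM";
  }
}

/** A lock is stale when its recorded owner PID is invalid or no longer running. */
function isLockStale(
  lockContents: string,
  alive: (pid: number) => boolean = isProcessAlive,
): boolean {
  const pid = Number.parseInt(lockContents.trim(), 10);
  // A lock that cannot be attributed to a valid PID cannot belong to a live gateway.
  if (!Number.isInteger(pid) || pid <= 0) {
    return true;
  }
  return !alive(pid);
}
```

PIDs can be reused after a crash, so a sturdier check would also record something like the process start time; this sketch ignores that case.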

Compaction hangs on large sessions

When sessions grow large (1 MB or more), the compaction process hangs indefinitely:

embedded run compaction start → (never completes)

This blocks the agent from responding to any messages. Moving the session file aside and restarting fixes it temporarily, but sessions quickly grow large again under active use.
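A compaction timeout could be built from a small deadline wrapper like the one sketched here. `withTimeout` is a hypothetical helper, not part of OpenClaw. Note that `Promise.race` only stops the caller from *waiting*: the hung compaction keeps running unless it also honors a cancellation signal, so a real fix would likely pair this with an `AbortSignal` passed into the compaction step.

```typescript
// Hypothetical helper (not an OpenClaw API): reject if `work` has not settled within `ms`.
// It stops the caller from waiting forever; it does NOT cancel the underlying work.
function withTimeout<T>(work: Promise<T>, ms: number, label = "operation"): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => {
      const err = new Error(`${label} did not finish within ${ms} ms`);
      err.name = "TimeoutError";
      reject(err);
    }, ms);
  });
  // Whichever promise settles first decides the result; the timer is always cleared
  // so a finished operation does not leave a pending timeout behind.
  return Promise.race([work, deadline]).finally(() => clearTimeout(timer));
}
```

On a `TimeoutError`, the agent could skip compaction for that turn and keep serving messages instead of blocking; `compact(session)` in a call such as `withTimeout(compact(session), 120000, "compaction")` would be a hypothetical function, and the 120 s budget is only an example.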

Network change crash (separate but related)

When the Android device switches Wi-Fi networks, the mDNS/Bonjour module crashes the entire gateway:

ERR_ASSERTION: Reached illegal state! IPV4 address change from defined to undefined!

The assertion is thrown by the ciao mDNS module and terminates the gateway process.

Session accumulation

Heartbeats running every 2 to 5 minutes keep creating new session files. After about 20 days, at least 407 orphaned session files had accumulated, degrading performance.
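A periodic cleanup job for orphaned heartbeat sessions could start from a selection rule like the one below. It is a hypothetical sketch: it assumes the gateway can report which session files are still active, it uses file modification time as the age signal, and the 7-day default is an arbitrary example rather than an OpenClaw setting.

```typescript
// Hypothetical pruning rule; not an OpenClaw API.
interface SessionFileInfo {
  name: string; // file name, e.g. "abc.jsonl"
  mtimeMs: number; // last modification time in ms since the epoch
}

const DAY_MS = 24 * 60 * 60 * 1000;

/**
 * Returns names of session files that are not in `activeNames` and have not been
 * modified for more than `maxAgeDays`. Pure function: it deletes nothing itself.
 */
function selectForPruning(
  files: readonly SessionFileInfo[],
  activeNames: ReadonlySet<string>,
  nowMs: number,
  maxAgeDays = 7,
): string[] {
  const cutoffMs = nowMs - maxAgeDays * DAY_MS;
  return files
    .filter((f) => !activeNames.has(f.name) && f.mtimeMs < cutoffMs)
    .map((f) => f.name);
}
```

Keeping selection separate from deletion makes it easy to log or dry-run the list before removing anything with `fs.promises.unlink`.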

Expected Behavior

  1. Telegram polling should auto-reconnect when the connection drops
  2. SIGUSR1 restart should complete cleanly on Android/Termux (or have a fallback)
  3. Compaction should have a timeout and not block the agent indefinitely
  4. Network/IP changes should not crash the gateway
  5. Health monitor should detect dead Telegram polling and reconnect the channel without restarting the entire gateway
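Expected behaviors 1 and 5 could be met with a per-channel liveness check. With Telegram Bot API long polling, the server answers each `getUpdates` request within roughly its `timeout` parameter, returning an empty list if nothing arrived, so a healthy loop completes requests at a steady rate; a gap much longer than that timeout suggests the loop is stuck. The sketch below is hypothetical: `PollingWatchdog`, `markPoll`, and the `onStall` callback are illustrative names, and it assumes the polling loop can be instrumented to call `markPoll()` after each completed request and that a single channel can be restarted on its own.

```typescript
// Hypothetical per-channel watchdog; not an OpenClaw or Telegram API.
type Clock = () => number;

class PollingWatchdog {
  private lastPollMs: number;

  constructor(
    private readonly channel: string,
    private readonly stallAfterMs: number,
    private readonly onStall: (channel: string, silentMs: number) => void,
    private readonly now: Clock = Date.now,
  ) {
    this.lastPollMs = this.now();
  }

  /** Call after every completed long-poll request, including empty responses. */
  markPoll(): void {
    this.lastPollMs = this.now();
  }

  /** Call periodically (e.g. from the health monitor). Returns true if a stall was reported. */
  check(): boolean {
    const silentMs = this.now() - this.lastPollMs;
    if (silentMs <= this.stallAfterMs) {
      return false;
    }
    this.onStall(this.channel, silentMs);
    // Reset so the same stall is not reported on every tick while the channel restarts.
    this.lastPollMs = this.now();
    return true;
  }
}
```

In the reported setup, the stall handler would restart only agent "main"'s Telegram poller and leave agent "secondary" and WhatsApp untouched. Setting the threshold to a few multiples of the `getUpdates` timeout helps avoid false alarms on slow mobile networks.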

Workaround

  • kill -9 <PID> and manually restart the gateway
  • Periodically clean session files to prevent accumulation
  • Move large session .jsonl files to a backup location before restarting
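The last workaround could be scripted. This hypothetical helper moves `.jsonl` session files above a size threshold into a backup directory. The 1 MB default mirrors the size at which compaction was seen to hang, and both directories are parameters because the issue does not give OpenClaw's session location. It is meant to run while the gateway is stopped, since a running gateway may still be appending to an open session file.

```typescript
import { promises as fs } from "node:fs";
import * as path from "node:path";

// Hypothetical workaround helper; run only while the gateway is stopped.
// Moves *.jsonl files larger than `maxBytes` from `sessionsDir` into `backupDir`.
async function backupLargeSessions(
  sessionsDir: string,
  backupDir: string,
  maxBytes = 1024 * 1024, // ~1 MB, the size at which compaction was observed to hang
): Promise<string[]> {
  await fs.mkdir(backupDir, { recursive: true });
  const moved: string[] = [];
  for (const name of await fs.readdir(sessionsDir)) {
    if (!name.endsWith(".jsonl")) {
      continue;
    }
    const src = path.join(sessionsDir, name);
    const info = await fs.stat(src);
    if (info.isFile() && info.size > maxBytes) {
      // rename() is a cheap move within one filesystem; it fails with EXDEV across devices.
      await fs.rename(src, path.join(backupDir, name));
      moved.push(name);
    }
  }
  return moved;
}
```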

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
