Telegram long-polling silently dies on Android/Termux when multi-agent gateway is under load #32048

## Environment
- **OpenClaw version:** 2026.2.26 (bc50708)
- **OS:** Android 15 (arm64) via Termux
- **Device:** Samsung Galaxy Z Fold 7
- **Node:** v24.13.0
- **Model:** anthropic/claude-opus-4-6
- **Setup:** Single gateway, 2 agents (main + secondary), Telegram + WhatsApp channels

## Problem
Telegram long-polling for one agent silently stops receiving messages while the other agent on the same gateway continues to work perfectly. WhatsApp channel for the affected agent also continues to work.

## Steps to Reproduce
1. Configure gateway with 2 agents, each bound to a separate Telegram bot account
2. Agent "main" handles: Telegram DMs, Telegram groups, WhatsApp DMs, WhatsApp groups, heartbeats (5m), cron jobs
3. Agent "secondary" handles: Telegram DMs only, heartbeats (2m)
4. Send messages to both agents via Telegram
5. After some time (30 minutes to 4 hours), agent "main" stops receiving Telegram DMs
6. Agent "secondary" continues to respond normally on the same gateway
7. Agent "main" continues to respond via WhatsApp; only its Telegram polling is dead

## Observed Behavior
- Telegram shows double blue checkmarks (delivered) but the gateway never receives the message
- No error in logs; the polling just silently stops
- Gateway logs show no `lane enqueue` entries for the affected agent's Telegram DMs after the polling dies
- Heartbeats via Telegram for the affected agent also stop (confirming the polling connection is dead)
- The only fix is a full gateway restart (`kill -9` + restart)
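
Because the stall produces no error, one way to detect it is by elapsed time rather than by exceptions. Telegram's `getUpdates` accepts a `timeout` (in seconds) and returns an empty result when no update arrives in that window, so a healthy long-poll loop should complete a request at least once per window. The following is a minimal sketch of that idea, not OpenClaw's actual health check; `PollWatchdog` and all of its members are hypothetical names.

```typescript
// Hypothetical watchdog: none of these names come from OpenClaw.
class PollWatchdog {
  private readonly pollTimeoutSec: number; // the `timeout` passed to getUpdates
  private readonly graceMs: number;        // slack for network latency
  private lastCompletedMs: number;

  constructor(pollTimeoutSec: number, graceMs: number, startedAtMs: number) {
    this.pollTimeoutSec = pollTimeoutSec;
    this.graceMs = graceMs;
    this.lastCompletedMs = startedAtMs;
  }

  // Call whenever a getUpdates request returns, even with zero updates.
  markCompleted(nowMs: number): void {
    this.lastCompletedMs = nowMs;
  }

  // True when no request has completed within one poll window plus grace.
  isStalled(nowMs: number): boolean {
    const limitMs = this.pollTimeoutSec * 1000 + this.graceMs;
    return nowMs - this.lastCompletedMs > limitMs;
  }
}

const watchdog = new PollWatchdog(30, 15_000, 0);
watchdog.markCompleted(20_000);
console.log(watchdog.isStalled(60_000)); // false: 40 s since last completion, limit is 45 s
console.log(watchdog.isStalled(70_000)); // true: 50 s since last completion
```

A per-bot watchdog like this would let a health monitor flag only the affected agent's Telegram channel instead of the whole gateway.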

## Additional Context

### SIGUSR1 graceful restart fails on Android
When the health monitor detects that Telegram is unresponsive and sends SIGUSR1, the gateway logs:
```
Gateway is draining for restart; new tasks are not accepted
shutdown timed out; exiting without full cleanup
```
The graceful shutdown always times out on Android/Termux, killing the gateway uncleanly. This leaves stale session locks.
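
One possible mitigation for the stale locks is to reclaim, at startup, any lock whose owning process no longer exists. The sketch below assumes a lock record that stores the owner's PID; the real lock format is not described in this report, so `SessionLock`, `findStaleLocks`, and the session IDs are all hypothetical. The liveness check is injected so the logic stays testable; in Node it could be backed by `process.kill(pid, 0)`, which probes for a process without signalling it.

```typescript
// Hypothetical lock record; the real lock format is not documented in this report.
interface SessionLock {
  sessionId: string;
  ownerPid: number;
}

// Returns the locks whose owning process is gone and can be reclaimed on startup.
function findStaleLocks(
  locks: SessionLock[],
  isAlive: (pid: number) => boolean,
): SessionLock[] {
  return locks.filter((lock) => !isAlive(lock.ownerPid));
}

// Example: PID 1234 belonged to the killed gateway, PID 4321 is still running.
const livePids = new Set<number>([4321]);
const staleLocks = findStaleLocks(
  [
    { sessionId: "main-telegram", ownerPid: 1234 },
    { sessionId: "secondary-telegram", ownerPid: 4321 },
  ],
  (pid) => livePids.has(pid),
);
console.log(staleLocks.map((lock) => lock.sessionId)); // only "main-telegram" is reported
```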

### Compaction hangs on large sessions
When sessions grow large (1MB+), the compaction process hangs indefinitely:
```
embedded run compaction start → (never completes)
```
This blocks the agent from responding to any messages. Moving the session file and restarting resolves it temporarily, but sessions grow back quickly with active usage.
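
A timeout wrapper would at least keep a hung compaction from blocking the agent forever. The following is a minimal sketch using a promise race against a timer; `withTimeout` and `hungCompaction` are hypothetical names, and nothing here reflects how OpenClaw actually schedules compaction. Note that this only unblocks the caller: the underlying work is not cancelled, so real code would also need a way to abort it.

```typescript
// Rejects with an error named "TimeoutError" if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      const err = new Error(`${label} timed out after ${ms} ms`);
      err.name = "TimeoutError";
      reject(err);
    }, ms);
    work.then(
      (value) => {
        clearTimeout(timer); // work finished first: cancel the pending timeout
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// Stand-in for a compaction run that never completes.
const hungCompaction = new Promise<void>(() => {});

withTimeout(hungCompaction, 50, "compaction").catch((err: Error) => {
  console.log(err.name); // "TimeoutError"
});
```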

### Network change crash (separate but related)
When the Android device changes WiFi networks, the mDNS/Bonjour module crashes the entire gateway:
```
ERR_ASSERTION: Reached illegal state! IPV4 address change from defined to undefined!
```
This is from the `ciao` module and kills the gateway process entirely.

### Session accumulation
Heartbeats, which fire every 2-5 minutes, create new session files over time. After ~20 days, 407+ orphaned session files had accumulated, degrading performance.

## Expected Behavior
1. Telegram polling should auto-reconnect when the connection drops
2. SIGUSR1 restart should complete cleanly on Android/Termux (or have a fallback)
3. Compaction should have a timeout and not block the agent indefinitely
4. Network/IP changes should not crash the gateway
5. Health monitor should detect dead Telegram polling and reconnect the channel without restarting the entire gateway
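
For items 1 and 5, a per-channel reconnect would typically retry with capped exponential backoff rather than restart the gateway. The sketch below shows only the delay calculation, as an illustration of the requested behavior; `backoffDelayMs`, `withJitter`, and the 1 s / 60 s values are assumptions, not existing OpenClaw settings.

```typescript
// Delay before reconnect attempt `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs: number, maxMs: number): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// "Full jitter": pick uniformly in [0, delayMs) so several bots do not retry in lockstep.
// The random source is injectable to keep the function deterministic in tests.
function withJitter(delayMs: number, random: () => number = Math.random): number {
  return Math.floor(random() * delayMs);
}

const delays = [0, 1, 2, 3, 4, 5, 6, 7].map((n) => backoffDelayMs(n, 1000, 60_000));
console.log(delays); // [1000, 2000, 4000, 8000, 16000, 32000, 60000, 60000]
```

Resetting the attempt counter after the first successful poll keeps a long-lived connection from inheriting a large delay.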

## Workaround
- `kill -9 <PID>` and manually restart the gateway
- Periodically clean session files to prevent accumulation
- Move large session `.jsonl` files to backup before restart
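
The cleanup steps above can be sketched as a selection rule: archive any session file that is either older than some age or larger than some size. This is a rough illustration only; `SessionFile`, `ArchivePolicy`, `selectForArchive`, the file names, and the thresholds (7 days, 1 MB) are assumptions chosen to match the numbers in this report, and gathering the file metadata (for example with `fs.stat`) is left out.

```typescript
// Hypothetical metadata for one session `.jsonl` file.
interface SessionFile {
  path: string;
  sizeBytes: number;
  modifiedAtMs: number;
}

interface ArchivePolicy {
  maxAgeMs: number;     // archive files not modified for longer than this
  maxSizeBytes: number; // archive files larger than this
}

// Returns the files that should be moved to backup before the next restart.
function selectForArchive(
  files: SessionFile[],
  nowMs: number,
  policy: ArchivePolicy,
): SessionFile[] {
  return files.filter(
    (f) => nowMs - f.modifiedAtMs > policy.maxAgeMs || f.sizeBytes > policy.maxSizeBytes,
  );
}

const DAY_MS = 24 * 60 * 60 * 1000;
const now = 30 * DAY_MS;
const toArchive = selectForArchive(
  [
    { path: "heartbeat-old.jsonl", sizeBytes: 2_000, modifiedAtMs: now - 20 * DAY_MS },
    { path: "main-active.jsonl", sizeBytes: 1_500_000, modifiedAtMs: now - 60_000 },
    { path: "secondary-active.jsonl", sizeBytes: 40_000, modifiedAtMs: now - 60_000 },
  ],
  now,
  { maxAgeMs: 7 * DAY_MS, maxSizeBytes: 1_000_000 },
);
console.log(toArchive.map((f) => f.path)); // the old heartbeat file and the oversized main session
```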
