[Bug]: health-monitor stale-socket breaks webhook mode with self-signed certs #39303
Description
Bug type
Regression (worked before, now fails)
Summary
Version: 2026.3.2 (stable), Node v22.22.1, arm64
Setup: 3 Telegram bots using webhook mode with self-signed cert (direct IP, nginx reverse proxy on port 8443)
Problem: health-monitor fires stale-socket every ~35 min even in webhook mode (no long-polling socket exists). When it restarts channels, it calls setWebhook WITHOUT re-uploading the self-signed certificate. Telegram then can't verify SSL → stops delivering webhooks → all bots go dead.
Logs (49 events in 7 hours):
[health-monitor] [telegram:default] restarting (reason: stale-socket)
[health-monitor] [telegram:dev] restarting (reason: stale-socket)
[health-monitor] [telegram:finance] restarting (reason: stale-socket)
[health-monitor] [whatsapp:default] restarting (reason: stale-socket)
(repeats every ~35 min)
After each restart: getWebhookInfo shows has_custom_certificate: false
Workaround: Cron job every 1 min checks has_custom_certificate and re-registers webhook with cert if needed.
Suggested fix (any of these):
- Skip stale-socket detection in webhook mode (no socket to go stale)
- Persist + re-upload self-signed cert on channel restart
- Add config option webhookCertPath so cert is auto-included in every setWebhook call
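A minimal sketch of the third option, assuming the PEM text has already been read from the proposed `webhookCertPath` setting. `buildSetWebhookForm` and `registerWebhook` are illustrative names, not existing OpenClaw functions. The sketch relies on the Telegram Bot API accepting `certificate` as a multipart file upload on `setWebhook`, and on the global `FormData`, `Blob`, and `fetch` available in current Node releases.

```typescript
// Hypothetical helper: builds the multipart body for Telegram's setWebhook.
// `certPem` would come from the proposed (not yet existing) webhookCertPath option.
function buildSetWebhookForm(webhookUrl: string, certPem?: string): FormData {
  const form = new FormData();
  form.set("url", webhookUrl);
  if (certPem !== undefined) {
    // Telegram expects the public certificate as an uploaded file.
    form.set("certificate", new Blob([certPem]), "cert.pem");
  }
  return form;
}

// Sketch of the call a channel restart could make; botToken is a placeholder.
async function registerWebhook(
  botToken: string,
  webhookUrl: string,
  certPem?: string,
): Promise<boolean> {
  const res = await fetch(`https://api.telegram.org/bot${botToken}/setWebhook`, {
    method: "POST",
    body: buildSetWebhookForm(webhookUrl, certPem),
  });
  const json = (await res.json()) as { ok: boolean };
  return json.ok;
}
```

If every restart went through a path like this, `has_custom_certificate` should stay `true` after a restart, because the certificate travels with each `setWebhook` call.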
Steps to reproduce
- Configure 3 Telegram bot accounts with webhook mode using self-signed certificate (direct IP, no domain)
- Set webhookUrl, webhookPort, webhookPath per account in openclaw.json
- Manually register webhooks via Telegram Bot API setWebhook with certificate parameter (self-signed PEM)
- Wait ~35 minutes
Expected behavior
In webhook mode, health-monitor should not perform stale-socket detection (there is no long-polling socket). If it must restart channels, it should re-upload the self-signed certificate when calling setWebhook.
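The expected behavior can be sketched as a guard in the staleness check. The types and field names below (`TransportMode`, `ChannelState`, `lastSocketActivityMs`) are hypothetical and chosen for illustration; the real health-monitor data model may look different.

```typescript
// Hypothetical channel model; not taken from the OpenClaw source.
type TransportMode = "polling" | "webhook";

interface ChannelState {
  id: string;
  mode: TransportMode;
  lastSocketActivityMs: number; // timestamp of the last activity on the polling socket
}

// Returns true only when a polling channel has been silent longer than the threshold.
function isStaleSocket(channel: ChannelState, nowMs: number, thresholdMs: number): boolean {
  if (channel.mode === "webhook") {
    return false; // no long-polling socket exists, so nothing can go stale
  }
  return nowMs - channel.lastSocketActivityMs > thresholdMs;
}
```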
Actual behavior
health-monitor detects "stale-socket" every ~35 min even in webhook mode. When it restarts channels, it calls setWebhook WITHOUT re-uploading the self-signed certificate. Telegram then reports has_custom_certificate: false, SSL verification fails, and all webhook deliveries stop. All bots become unresponsive.
49 stale-socket events observed in 7 hours:
[health-monitor] [telegram:default] restarting (reason: stale-socket)
[health-monitor] [telegram:dev] restarting (reason: stale-socket)
[health-monitor] [telegram:finance] restarting (reason: stale-socket)
[health-monitor] [whatsapp:default] restarting (reason: stale-socket)
(repeats every ~35 min)
OpenClaw version
2026.3.2 (build 85377a2)
Operating system
Ubuntu 24.04 arm64 (AWS EC2 m7g.2xlarge)
Install method
npm global
Logs, screenshots, and evidence
# health-monitor fires stale-socket every ~35 min in webhook mode (49 events in 7 hours):
Mar 07 17:30:52 [health-monitor] [telegram:default] restarting (reason: stale-socket)
Mar 07 17:30:52 [health-monitor] [telegram:dev] restarting (reason: stale-socket)
Mar 07 17:30:52 [health-monitor] [telegram:finance] restarting (reason: stale-socket)
Mar 07 17:30:52 [health-monitor] [whatsapp:default] restarting (reason: stale-socket)
Mar 07 18:05:52 [health-monitor] [telegram:default] restarting (reason: stale-socket)
Mar 07 18:40:52 [health-monitor] [telegram:default] restarting (reason: stale-socket)
Mar 07 19:15:52 [health-monitor] [telegram:default] restarting (reason: stale-socket)
... (continues every ~35 min)
# After each restart, Telegram webhook cert is lost:
# getWebhookInfo shows has_custom_certificate: false
# Telegram stops delivering webhooks → all bots unresponsive
# Workaround: cron job every 1 min re-registers webhook with cert
Impact and severity
Affected: All Telegram webhook users with self-signed certificates (direct IP, no domain)
Severity: High (blocks all message delivery)
Frequency: 100% repro — triggers every ~35 minutes automatically
Consequence: All Telegram bots become completely unresponsive after each health-monitor restart. Users receive no replies. Messages sent during the dead window are lost. In our case, 49 outages in 7 hours across 3 bot accounts.
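The ~35 minute cadence can be checked directly against the log excerpt. The helper below is an illustrative sketch (`restartGapsMinutes` is not part of any tool) that assumes all lines fall on the same day and carry an `HH:MM:SS` timestamp.

```typescript
// Extracts restart timestamps for one channel and returns the gaps in minutes.
// Assumes all lines are from the same day, as in the excerpt in this report.
function restartGapsMinutes(logLines: string[], channel: string): number[] {
  const times: number[] = [];
  for (const line of logLines) {
    if (!line.includes(`[${channel}] restarting (reason: stale-socket)`)) continue;
    const m = line.match(/(\d{2}):(\d{2}):(\d{2})/);
    if (m) times.push(Number(m[1]) * 3600 + Number(m[2]) * 60 + Number(m[3]));
  }
  const gaps: number[] = [];
  for (let i = 1; i < times.length; i++) {
    gaps.push((times[i] - times[i - 1]) / 60);
  }
  return gaps;
}

const sample = [
  "Mar 07 17:30:52 [health-monitor] [telegram:default] restarting (reason: stale-socket)",
  "Mar 07 17:30:52 [health-monitor] [telegram:dev] restarting (reason: stale-socket)",
  "Mar 07 18:05:52 [health-monitor] [telegram:default] restarting (reason: stale-socket)",
  "Mar 07 18:40:52 [health-monitor] [telegram:default] restarting (reason: stale-socket)",
  "Mar 07 19:15:52 [health-monitor] [telegram:default] restarting (reason: stale-socket)",
];

console.log(restartGapsMinutes(sample, "telegram:default")); // [35, 35, 35]
```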
Additional information
First known bad version: 2026.3.2 (only version tested). The issue has two root causes: (1) stale-socket detection runs in webhook mode where no long-polling socket exists, and (2) channel restart calls setWebhook without re-uploading the self-signed certificate. Temporary workaround: a cron job running every 1 minute that checks getWebhookInfo for has_custom_certificate: false and re-registers the webhook with the cert file if needed. Suggested fix: add a webhookCertPath config option and include the cert in every setWebhook call, or skip stale-socket detection when webhook mode is active.
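The check performed by the cron workaround can be sketched as follows. `needsCertReupload` and `fetchWebhookInfo` are illustrative names; comparing the registered URL as well is an added assumption beyond the `has_custom_certificate` check described above. The `WebhookInfo` fields shown are a subset of what the Bot API's `getWebhookInfo` returns.

```typescript
// Subset of Telegram's WebhookInfo object.
interface WebhookInfo {
  url: string;
  has_custom_certificate: boolean;
  pending_update_count: number;
}

// True when the webhook should be registered again with the self-signed cert.
// The URL comparison is an extra safety check assumed for this sketch.
function needsCertReupload(info: WebhookInfo, expectedUrl: string): boolean {
  return info.url !== expectedUrl || !info.has_custom_certificate;
}

// Fetches the current webhook state; botToken is a placeholder.
async function fetchWebhookInfo(botToken: string): Promise<WebhookInfo> {
  const res = await fetch(`https://api.telegram.org/bot${botToken}/getWebhookInfo`);
  const json = (await res.json()) as { ok: boolean; result: WebhookInfo };
  if (!json.ok) throw new Error("getWebhookInfo failed");
  return json.result;
}
```

A periodic job would call `fetchWebhookInfo` for each bot and repeat the `setWebhook` call with the certificate whenever `needsCertReupload` returns `true`.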