[Bug]: Webhook health endpoints always return 200 without checking dependencies #11803
Description
Summary
The Telegram webhook server and Nextcloud Talk webhook server both expose /healthz endpoints that unconditionally return 200 OK without checking any dependencies (bot connectivity, API reachability, message processing capability). When deployed behind an orchestrator (Kubernetes, Docker Swarm, Render), this prevents detection of unhealthy instances, so traffic keeps reaching them and user messages are silently dropped.
Executive Risk Snapshot
- CVSS v3.1: 7.5 (High)
- CVSS v4.0: 8.7 (High)
- Primary risk: The Telegram webhook server and Nextcloud Talk webhook server both expose /healthz endpoints that unconditionally return 200 OK without checking any dependencies (bot connectivity, API reachability, message processing capability).
Technical Analysis
Affected Code
File: src/telegram/webhook.ts:54-58

    const server = createServer((req, res) => {
      if (req.url === healthPath) {
        res.writeHead(200);
        res.end("ok");
        return;
      }

File: extensions/nextcloud-talk/src/monitor.ts:81-86

    const server = createServer(async (req: IncomingMessage, res: ServerResponse) => {
      if (req.url === HEALTH_PATH) {
        res.writeHead(200, { "Content-Type": "text/plain" });
        res.end("ok");
        return;
      }

Contrast with the main gateway health endpoint (src/gateway/server-methods/health.ts:9-27), which properly checks dependencies via refreshHealthSnapshot() and returns error status when checks fail:

    health: async ({ respond, context, params }) => {
      const { getHealthCache, refreshHealthSnapshot, logHealth } = context;
      const wantsProbe = params?.probe === true;
      // ... caching logic ...
      try {
        const snap = await refreshHealthSnapshot({ probe: wantsProbe });
        respond(true, snap, undefined);
      } catch (err) {
        respond(false, undefined, errorShape(ErrorCodes.UNAVAILABLE, formatForLog(err)));
      }
    },

Steps to Reproduce
- Deploy the Telegram webhook server in a Kubernetes pod with a liveness/readiness probe pointing to /healthz:

      livenessProbe:
        httpGet:
          path: /healthz
          port: 8787
        initialDelaySeconds: 10
        periodSeconds: 15

- Revoke the Telegram bot token or block network access to api.telegram.org.
- Send a message to the bot via Telegram.
- Observe: the /healthz endpoint continues returning 200 OK, Kubernetes does not restart the pod, and the message is lost (the webhookCallback handler fails silently or throws an unhandled error).
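The observation in the last step can be modelled in a few lines. The sketch below is a simplified model of a liveness probe loop, not kubelet code. It assumes the documented Kubernetes rule that an HTTP status from 200 to 399 counts as success and that a restart follows `failureThreshold` consecutive failures (default 3). The function names `isProbeSuccess` and `restartTriggeredAt` are made up for illustration.

```typescript
// Simplified model of an HTTP liveness probe (assumption: status in [200, 400)
// is a success, and `failureThreshold` consecutive failures trigger a restart).
function isProbeSuccess(statusCode: number): boolean {
  return statusCode >= 200 && statusCode < 400;
}

// Returns the zero-based index of the probe that triggers a restart,
// or -1 if the orchestrator never restarts the container.
function restartTriggeredAt(statusCodes: number[], failureThreshold = 3): number {
  let consecutiveFailures = 0;
  let index = 0;
  for (const code of statusCodes) {
    consecutiveFailures = isProbeSuccess(code) ? 0 : consecutiveFailures + 1;
    if (consecutiveFailures >= failureThreshold) {
      return index;
    }
    index += 1;
  }
  return -1;
}

// Current behaviour: /healthz answers 200 even while the bot is broken.
const unconditionalResponses = [200, 200, 200, 200, 200];
// Dependency-aware behaviour: 503 from the second probe onwards.
const dependencyAwareResponses = [200, 503, 503, 503, 503];

console.log(restartTriggeredAt(unconditionalResponses)); // -1: never restarted
console.log(restartTriggeredAt(dependencyAwareResponses)); // 3: restarted on the fourth probe
```

With an endpoint that always answers 200, no sequence of probes can ever reach the failure threshold, which is why the broken pod keeps receiving traffic.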
Recommended Fix
Add minimal dependency checks to the health endpoint. At minimum, verify the bot object is initialized and the last webhook processing did not produce a persistent error:

    // Telegram webhook - src/telegram/webhook.ts
    let lastError: string | null = null;
    let lastSuccessTs = Date.now();
    const HEALTH_STALE_MS = 5 * 60 * 1000; // 5 minutes
    const server = createServer((req, res) => {
      if (req.url === healthPath) {
        const now = Date.now();
        const isStale = now - lastSuccessTs > HEALTH_STALE_MS;
        if (lastError || isStale) {
          res.writeHead(503, { "Content-Type": "application/json" });
          res.end(JSON.stringify({
            status: "unhealthy",
            error: lastError ?? "no successful webhook processing in last 5 minutes",
          }));
          return;
        }
        res.writeHead(200);
        res.end("ok");
        return;
      }
      // ... existing handler ...
      // On success: lastSuccessTs = Date.now(); lastError = null;
      // On error: lastError = errMsg;

Apply the same pattern to the Nextcloud Talk webhook server.
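One possible way to share the pattern between both servers is to move the state into a small tracker object. The following is a minimal sketch under stated assumptions, not the project's code: `HealthTracker`, `HealthReport`, `WebhookHandler`, and `createWebhookServer` are hypothetical names, the clock is injectable only to make the logic testable, and the healthy response is JSON rather than the plain "ok" used above.

```typescript
import { createServer, type IncomingMessage, type Server, type ServerResponse } from "node:http";

const HEALTH_STALE_MS = 5 * 60 * 1000; // 5 minutes, as in the snippet above

interface HealthReport {
  statusCode: 200 | 503;
  status: "ok" | "unhealthy";
  error?: string;
}

// Hypothetical helper: records the outcome of webhook processing.
class HealthTracker {
  private lastError: string | null = null;
  private lastSuccessTs: number;
  private readonly now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now;
    this.lastSuccessTs = now();
  }

  recordSuccess(): void {
    this.lastSuccessTs = this.now();
    this.lastError = null;
  }

  recordError(err: unknown): void {
    this.lastError = err instanceof Error ? err.message : String(err);
  }

  // Unhealthy if the last attempt failed or nothing succeeded recently.
  report(): HealthReport {
    const isStale = this.now() - this.lastSuccessTs > HEALTH_STALE_MS;
    if (this.lastError !== null || isStale) {
      return {
        statusCode: 503,
        status: "unhealthy",
        error: this.lastError ?? "no successful webhook processing in last 5 minutes",
      };
    }
    return { statusCode: 200, status: "ok" };
  }
}

type WebhookHandler = (req: IncomingMessage, res: ServerResponse) => Promise<void>;

// Hypothetical wiring: the same function could back either webhook server.
function createWebhookServer(healthPath: string, tracker: HealthTracker, handleWebhook: WebhookHandler): Server {
  return createServer((req, res) => {
    if (req.url === healthPath) {
      const { statusCode, ...body } = tracker.report();
      res.writeHead(statusCode, { "Content-Type": "application/json" });
      res.end(JSON.stringify(body));
      return;
    }
    handleWebhook(req, res).then(
      () => tracker.recordSuccess(),
      (err) => {
        tracker.recordError(err);
        if (!res.headersSent) res.writeHead(500);
        if (!res.writableEnded) res.end();
      },
    );
  });
}
```

One design caveat: the staleness rule treats silence as failure, so a low-traffic bot that receives no messages for five minutes would report 503. Depending on the deployment, it may be safer to use the stale check for readiness only, or to pair it with an active dependency probe.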
Detailed Risk Analysis
CVSS Assessment
| Metric | v3.1 | v4.0 |
|---|---|---|
| Score | 7.5 / 10.0 | 8.7 / 10.0 |
| Severity | High | High |
| Vector | CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H | CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:N/VI:N/VA:H/SC:N/SI:N/SA:N |
Attack Surface
How is this reached?
- [x] Network (HTTP/WebSocket endpoint, API call)
- [ ] Adjacent Network (same LAN, requires network proximity)
- [ ] Local (local file, CLI argument, environment variable)
- [ ] Physical (requires physical access to machine)
Authentication required?
- [x] None (unauthenticated/public access)
- [ ] Low (any authenticated user)
- [ ] High (admin/privileged user only)
Entry point: GET /healthz on the Telegram webhook server (default port 8787) or Nextcloud Talk webhook server (default port 8788)
Exploit Conditions
Complexity:
- [x] Low (no special conditions, works reliably)
- [ ] High (requires race condition, specific config, or timing)
User interaction:
- [x] None (automatic, no victim action needed)
- [ ] Required (victim must click, visit, or perform action)
Prerequisites:
- Webhook server deployed behind an orchestrator or load balancer that relies on /healthz for health probing
- Any internal failure condition: Telegram API unreachable, bot token revoked, network partition, grammY bot disconnected
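Some of the failure conditions listed above (API unreachable, token revoked) could also be detected actively instead of waiting for a webhook to fail. The sketch below is one hedged possibility: it calls the Telegram Bot API `getMe` method and classifies the answer. The names `ProbeResult`, `FetchLike`, `classifyGetMe`, and `probeTelegram` are hypothetical, and the sketch assumes Telegram answers a rejected token with HTTP 401.

```typescript
// Hypothetical helper names; only the Bot API `getMe` method and its
// `{ ok: boolean }` response envelope come from Telegram's documentation.
type ProbeResult = "healthy" | "token_rejected" | "api_error" | "unreachable";

// Minimal structural type so a caller can pass the global `fetch` or a stub.
type FetchLike = (
  url: string,
  init: { signal: AbortSignal },
) => Promise<{ status: number; json(): Promise<unknown> }>;

// Pure classification step, kept separate so it can be tested without network.
function classifyGetMe(httpStatus: number, body: unknown): ProbeResult {
  const ok = typeof body === "object" && body !== null && (body as { ok?: unknown }).ok === true;
  if (httpStatus === 200 && ok) return "healthy";
  if (httpStatus === 401) return "token_rejected"; // assumption: revoked tokens yield 401
  return "api_error";
}

async function probeTelegram(token: string, fetchFn: FetchLike, timeoutMs = 3000): Promise<ProbeResult> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchFn(`https://api.telegram.org/bot${token}/getMe`, {
      signal: controller.signal,
    });
    return classifyGetMe(res.status, await res.json());
  } catch {
    // Network failure, timeout, or a body that is not valid JSON.
    return "unreachable";
  } finally {
    clearTimeout(timer);
  }
}
```

On Node 18 or later, the global `fetch` should be usable as `fetchFn`. Running the probe on every /healthz request would add a round trip to api.telegram.org per health check, so caching the last result for a short interval is probably preferable.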
Impact Assessment
Scope:
- [x] Unchanged (impact limited to vulnerable component)
- [ ] Changed (can affect other components, escape sandbox)
What can an attacker do?
| Impact Type | Level | Description |
|---|---|---|
| Confidentiality | None | No data exposure |
| Integrity | None | No data modification |
| Availability | High | Orchestrator cannot detect unhealthy webhook instances; traffic continues routing to broken bots; user messages are silently dropped or fail without triggering automatic recovery |
References
- CWE: CWE-393 - Return of Wrong Status Code