[Bug]: Webhook health endpoints always return 200 without checking dependencies #11803

@coygeek

Summary

The Telegram webhook server and Nextcloud Talk webhook server both expose /healthz endpoints that unconditionally return 200 OK without checking any dependencies (bot connectivity, API reachability, message processing capability). When the servers are deployed behind an orchestrator (Kubernetes, Docker Swarm, Render), unhealthy instances go undetected and user messages are silently dropped.

Executive Risk Snapshot

  • CVSS v3.1: 7.5 (High)
  • CVSS v4.0: 8.7 (High)
  • Primary risk: An orchestrator that probes /healthz cannot detect a broken webhook instance, so traffic keeps routing to it and user messages are dropped without triggering automatic recovery.

Technical Analysis

Affected Code

File: src/telegram/webhook.ts:54-58

  const server = createServer((req, res) => {
    if (req.url === healthPath) {
      res.writeHead(200);
      res.end("ok");
      return;
    }

File: extensions/nextcloud-talk/src/monitor.ts:81-86

  const server = createServer(async (req: IncomingMessage, res: ServerResponse) => {
    if (req.url === HEALTH_PATH) {
      res.writeHead(200, { "Content-Type": "text/plain" });
      res.end("ok");
      return;
    }

Contrast with the main gateway health endpoint (src/gateway/server-methods/health.ts:9-27), which properly checks dependencies via refreshHealthSnapshot() and returns error status when checks fail:

  health: async ({ respond, context, params }) => {
    const { getHealthCache, refreshHealthSnapshot, logHealth } = context;
    const wantsProbe = params?.probe === true;
    // ... caching logic ...
    try {
      const snap = await refreshHealthSnapshot({ probe: wantsProbe });
      respond(true, snap, undefined);
    } catch (err) {
      respond(false, undefined, errorShape(ErrorCodes.UNAVAILABLE, formatForLog(err)));
    }
  },
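A webhook-side check in the same spirit (an optional probe with a cached result) might look like the sketch below. Every name in it (`createCachedProbe`, `Probe`, `ProbeResult`, `ttlMs`) is hypothetical and does not exist in the repository. The probe is injected as a function so the sketch carries no dependencies. For Telegram it could plausibly wrap a call such as grammY's `bot.api.getMe()`, but that wiring is an assumption.

```typescript
// Hypothetical helper: caches the outcome of an injected dependency probe.
// `probe` resolves when the dependency is reachable and rejects otherwise.
export type Probe = () => Promise<void>;

export interface ProbeResult {
  ok: boolean;
  checkedAt: number; // timestamp (ms) of the probe that produced this result
  error?: string;
}

export function createCachedProbe(
  probe: Probe,
  ttlMs: number,
  now: () => number = Date.now, // injectable clock, mainly for tests
) {
  let cached: ProbeResult | null = null;

  // Returns the cached result while it is fresh; otherwise runs the probe.
  // Passing force = true bypasses the cache.
  return async function check(force = false): Promise<ProbeResult> {
    if (!force && cached !== null && now() - cached.checkedAt < ttlMs) {
      return cached;
    }
    try {
      await probe();
      cached = { ok: true, checkedAt: now() };
    } catch (err) {
      cached = {
        ok: false,
        checkedAt: now(),
        error: err instanceof Error ? err.message : String(err),
      };
    }
    return cached;
  };
}
```

A health handler could then answer 200 when `ok` is true and 503 otherwise. The cache keeps frequent orchestrator probes from turning into one upstream API call per probe.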

Steps to Reproduce

  1. Deploy the Telegram webhook server in a Kubernetes pod with a liveness or readiness probe pointing to /healthz:
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8787
      initialDelaySeconds: 10
      periodSeconds: 15
  2. Revoke the Telegram bot token or block network access to api.telegram.org.
  3. Send a message to the bot via Telegram.
  4. Observe: the /healthz endpoint continues returning 200 OK, Kubernetes does not restart the pod, and the message is lost (the webhookCallback handler fails silently or throws an unhandled error).

Recommended Fix

Add minimal dependency checks to the health endpoint. At minimum, verify the bot object is initialized and the last webhook processing did not produce a persistent error:

// Telegram webhook - src/telegram/webhook.ts
let lastError: string | null = null;
let lastSuccessTs = Date.now();
const HEALTH_STALE_MS = 5 * 60 * 1000; // 5 minutes

const server = createServer((req, res) => {
  if (req.url === healthPath) {
    const now = Date.now();
    const isStale = now - lastSuccessTs > HEALTH_STALE_MS;
    if (lastError || isStale) {
      res.writeHead(503, { "Content-Type": "application/json" });
      res.end(JSON.stringify({
        status: "unhealthy",
        error: lastError ?? "no successful webhook processing in last 5 minutes",
      }));
      return;
    }
    res.writeHead(200, { "Content-Type": "text/plain" });
    res.end("ok");
    return;
  }
  // ... existing handler ...
  // On success: lastSuccessTs = Date.now(); lastError = null;
  // On error: lastError = errMsg;
});

Apply the same pattern to the Nextcloud Talk webhook server.
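One way to share the pattern between both servers is a small tracker that each handler reports into. The following is a minimal sketch under stated assumptions, not code from the repository: `createHealthTracker`, `handleHealth`, `HealthState`, and the structural `Req`/`Res` types are hypothetical names. The `Req`/`Res` types stand in for Node's `IncomingMessage` and `ServerResponse` so the sketch needs no type packages.

```typescript
// Hypothetical shared helper; none of these names exist in the repository.
export interface HealthState {
  healthy: boolean;
  status: number; // HTTP status the health endpoint should return
  error?: string;
}

export function createHealthTracker(staleMs: number, now: () => number = Date.now) {
  let lastError: string | null = null;
  let lastSuccessTs = now();

  return {
    // Call after a webhook update was processed successfully.
    recordSuccess(): void {
      lastSuccessTs = now();
      lastError = null;
    },
    // Call when webhook processing fails.
    recordError(err: unknown): void {
      lastError = err instanceof Error ? err.message : String(err);
    },
    check(): HealthState {
      if (lastError !== null) {
        return { healthy: false, status: 503, error: lastError };
      }
      if (now() - lastSuccessTs > staleMs) {
        const error = `no successful webhook processing in last ${staleMs} ms`;
        return { healthy: false, status: 503, error };
      }
      return { healthy: true, status: 200 };
    },
  };
}

export type HealthTracker = ReturnType<typeof createHealthTracker>;

// Minimal structural stand-ins for the Node request and response objects.
interface Req { url?: string }
interface Res {
  writeHead(status: number, headers?: Record<string, string>): unknown;
  end(body?: string): unknown;
}

// Writes the health response; returns true when the request was a health check.
export function handleHealth(tracker: HealthTracker, healthPath: string, req: Req, res: Res): boolean {
  if (req.url !== healthPath) return false;
  const state = tracker.check();
  if (state.healthy) {
    res.writeHead(200, { "Content-Type": "text/plain" });
    res.end("ok");
  } else {
    res.writeHead(503, { "Content-Type": "application/json" });
    res.end(JSON.stringify({ status: "unhealthy", error: state.error }));
  }
  return true;
}
```

Each server would call `handleHealth` at the top of its request handler and report outcomes through `recordSuccess` and `recordError`. One design point to weigh: a staleness window treats a quiet bot as unhealthy, so `staleMs` probably needs to exceed the longest expected idle gap, or the check may fit a readiness probe better than a liveness probe.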

Detailed Risk Analysis

CVSS Assessment

| Metric | v3.1 | v4.0 |
| --- | --- | --- |
| Score | 7.5 / 10.0 | 8.7 / 10.0 |
| Severity | High | High |
| Vector | CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H | CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:N/VI:N/VA:H/SC:N/SI:N/SA:N |

Attack Surface

How is this reached?

  • [x] Network (HTTP/WebSocket endpoint, API call)
  • [ ] Adjacent Network (same LAN, requires network proximity)
  • [ ] Local (local file, CLI argument, environment variable)
  • [ ] Physical (requires physical access to machine)

Authentication required?

  • [x] None (unauthenticated/public access)
  • [ ] Low (any authenticated user)
  • [ ] High (admin/privileged user only)

Entry point: GET /healthz on the Telegram webhook server (default port 8787) or Nextcloud Talk webhook server (default port 8788)

Exploit Conditions

Complexity:

  • [x] Low (no special conditions, works reliably)
  • [ ] High (requires race condition, specific config, or timing)

User interaction:

  • [x] None (automatic, no victim action needed)
  • [ ] Required (victim must click, visit, or perform action)

Prerequisites:

  • Webhook server deployed behind an orchestrator or load balancer that relies on /healthz for health probing
  • Any internal failure condition: Telegram API unreachable, bot token revoked, network partition, grammy bot disconnected

Impact Assessment

Scope:

  • [x] Unchanged (impact limited to vulnerable component)
  • [ ] Changed (can affect other components, escape sandbox)

What can an attacker do?

| Impact Type | Level | Description |
| --- | --- | --- |
| Confidentiality | None | No data exposure |
| Integrity | None | No data modification |
| Availability | High | Orchestrator cannot detect unhealthy webhook instances; traffic continues routing to broken bots; user messages are silently dropped or fail without triggering automatic recovery |

References

  • CWE: CWE-393 - Return of Wrong Status Code

Labels: bug, stale
