
[Bug]: Clawdbot Gateway Crashes Repeatedly #3815


Description


Clawdbot Gateway Crash Bug Report

Date: 2026-01-29
Reporter: Parker (@parkerati)
Clawdbot Version: 2026.1.24-3
Platform: macOS (Darwin 24.6.0)
Node Version: v22.22.0


Summary

The Clawdbot gateway crashes repeatedly due to unhandled promise rejections from network failures. Any failed HTTP request (Telegram API, web_fetch, etc.) causes the entire gateway process to terminate with no graceful recovery.

Severity: CRITICAL - Gateway requires manual restarts multiple times per session


Crash Timeline (2026-01-29)

Crash #1: ~00:16 EST (05:16 UTC)

  • Trigger: Telegram setMyCommands API failures
  • Pattern: Repeated network fetch failures starting at 05:11 UTC
  • Result: Silent crash, no error logged for actual exit

Crash #2: ~00:48 EST (05:48 UTC)

  • Trigger: Unknown (silent crash during normal operation)
  • Last Log: 05:48:33 UTC - exec tool call, then process died
  • Result: No error message, no exception logged

Crash #3: 01:27 EST (06:27 UTC)

  • Trigger: web_fetch 403 error from Investing.com
  • Log Entry:
06:15:41 [tools] web_fetch failed: Web fetch failed (403): Just a moment...
06:27:03 [clawdbot] Unhandled promise rejection: TypeError: fetch failed
    at node:internal/deps/undici/undici:14902:13
    at processTicksAndRejections (node:internal/process/task_queues:105:5)

Crash #4: 01:31 EST (06:31 UTC)

  • Trigger: Unknown network fetch failure during normal operation
  • Log Entry:
06:28:52 [hooks] loaded 3 internal hook handlers
06:28:53 [telegram] [default] starting provider (@lisaparkerbot)
06:29:07 [agent/embedded] Removed orphaned user message
06:31:25 [clawdbot] Unhandled promise rejection: TypeError: fetch failed
    at node:internal/deps/undici/undici:14902:13
    at processTicksAndRejections (node:internal/process/task_queues:105:5)
  • Note: Crash occurred during normal conversation, not during tool use

Crash #5+: 01:36-01:38 EST

  • Trigger: Local file exceptions / file operations
  • Pattern: Gateway also crashes when local file operations fail or throw exceptions
  • Note: Not just network failures - ANY unhandled exception crashes the gateway

Root Cause

Network operations (Telegram API, web_fetch, etc.) AND local file operations surface as unhandled promise rejections when they fail. Node.js terminates the process on an unhandled rejection by default.

Crash triggers include:

  • Network fetch failures (Telegram API, web_fetch tool)
  • Local file exceptions (reading non-existent files, permission errors)
  • Any unhandled promise rejection from any operation

Example Log Pattern (Telegram crashes):

{
  "subsystem": "gateway/channels/telegram",
  "message": "telegram setMyCommands failed: HttpError: Network request for 'setMyCommands' failed!",
  "logLevelName": "ERROR",
  "time": "2026-01-29T05:11:13.656Z"
}

{
  "message": "Unhandled promise rejection: TypeError: fetch failed\n    at node:internal/deps/undici/undici:14902:13\n    at processTicksAndRejections (node:internal/process/task_queues:105:5)",
  "logLevelName": "ERROR",
  "time": "2026-01-29T05:11:13.656Z"
}

This pattern repeated 10+ times between 05:11 and 05:22 UTC, with the gateway crash-looping until the Telegram channel was disabled.
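
The underlying Node behavior is easy to demonstrate outside Clawdbot. A minimal sketch (not Clawdbot code): on Node 15+, any rejected promise that is never awaited or caught terminates the process with a non-zero exit code, which matches the silent-crash pattern above.

// repro.js - minimal demonstration of Node's default unhandled-rejection behavior
async function failingFetch() {
  // Any rejected promise that is never awaited or .catch()'d will do.
  throw new TypeError('fetch failed');
}

failingFetch(); // no await, no .catch() -> unhandled rejection
console.log('gateway still running...'); // prints, then the process exits non-zero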


Reproduction Steps

  1. Start gateway with Telegram enabled
  2. Trigger network failure (disconnect internet, block Telegram API, etc.)
  3. Gateway attempts Telegram API call on startup
  4. API call fails with network error
  5. Unhandled promise rejection crashes entire gateway process

Alternative: Use the web_fetch tool on a URL that returns a 403 or times out → same crash pattern


Impact

User Experience

  • Gateway requires manual restart after each crash
  • Web UI disconnects and cannot reconnect until manual restart
  • Telegram channel becomes unusable
  • No automatic recovery despite LaunchAgent supervision (stale locks prevent restart)

Current Workarounds

  1. Disable Telegram channel temporarily
  2. Avoid web_fetch tools on unreliable endpoints
  3. Manual restarts via clawdbot gateway stop && clawdbot gateway start

Expected Behavior

Network failures should:

  1. Be caught and logged - not crash the process
  2. Retry with backoff - especially for startup operations like Telegram init
  3. Gracefully degrade - disable failing channel/tool instead of killing gateway
  4. Clean up locks - allow supervisor to restart if crash occurs

Suggested Fixes

1. Global Unhandled Rejection Handler

Add a process-level handler to catch and log unhandled rejections:

process.on('unhandledRejection', (reason, promise) => {
  logger.error('Unhandled Promise Rejection:', reason);
  // Don't exit - log and continue
});
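
As a zero-code stopgap, the gateway process could also be launched with Node's --unhandled-rejections=warn flag, which logs unhandled rejections instead of terminating the process. The in-process handler above is still preferable, since it routes the error through the gateway's own logger.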

2. Wrap Network Operations

All fetch/HTTP operations should use try-catch or .catch():

// Telegram init
try {
  await telegram.setMyCommands(commands);
} catch (error) {
  logger.error('Telegram init failed:', error);
  // Disable channel or retry, don't throw
}

// web_fetch tool
async function webFetch(url) {
  try {
    return await fetch(url);
  } catch (error) {
    logger.error(`web_fetch failed for ${url}:`, error);
    return { status: 'error', error: error.message };
  }
}
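
To cover the "retry with backoff" expectation for startup operations, a small helper could wrap calls like setMyCommands. A sketch (retryWithBackoff is a hypothetical helper, not an existing Clawdbot API; retry counts and delays are illustrative):

// Generic retry helper with exponential backoff (sketch).
async function retryWithBackoff(fn, { retries = 5, baseDelayMs = 500 } = {}) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt === retries) throw error; // let the caller decide after the last attempt
      const delay = baseDelayMs * 2 ** attempt;
      logger.warn(`attempt ${attempt + 1} failed, retrying in ${delay}ms:`, error);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

// Example: Telegram init that retries before giving up and disabling the channel.
try {
  await retryWithBackoff(() => telegram.setMyCommands(commands));
} catch (error) {
  logger.error('Telegram init failed after retries, disabling channel:', error);
}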

3. Startup Resilience

Channel initialization should not block gateway startup:

  • Try to init channels asynchronously
  • If channel fails to init, mark as disabled and log error
  • Continue gateway startup with remaining channels
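
A sketch of the idea, assuming each channel exposes an init() method and the gateway keeps a per-channel disabled flag (both assumptions for illustration):

// Initialize channels independently; a failing channel is disabled and logged
// instead of aborting gateway startup.
async function initChannels(channels) {
  const results = await Promise.allSettled(channels.map((channel) => channel.init()));
  results.forEach((result, i) => {
    if (result.status === 'rejected') {
      logger.error(`Channel ${channels[i].name} failed to init, disabling:`, result.reason);
      channels[i].disabled = true; // gateway skips disabled channels elsewhere
    }
  });
}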

4. Lock File Cleanup

When the gateway crashes, stale lock files prevent the LaunchAgent from auto-restarting it. Options:

  • Use process monitoring instead of file locks
  • Clean stale locks on startup (check if PID is actually running)
  • Implement lock timeout/expiration
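
For the stale-lock check, startup could verify that the PID recorded in the lock file still belongs to a live process. A sketch (the lock file path and PID-as-contents format are assumptions):

const fs = require('node:fs');

// Remove a lock file left behind by a crashed gateway process.
// Assumes the lock file contains the owning PID as plain text.
function cleanStaleLock(lockPath) {
  if (!fs.existsSync(lockPath)) return;
  const pid = Number(fs.readFileSync(lockPath, 'utf8').trim());
  if (!Number.isInteger(pid)) return; // unreadable lock file, leave it alone
  try {
    process.kill(pid, 0); // signal 0 only checks whether the process exists
  } catch {
    logger.warn(`Removing stale lock for dead pid ${pid}`);
    fs.unlinkSync(lockPath);
  }
}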

Additional Context

LaunchAgent Configuration

Gateway is supervised by macOS LaunchAgent with KeepAlive = true, but auto-restart fails due to stale lock conflicts.

System Resources

Not a resource issue - crashes happen with plenty of available memory/CPU. This is purely an error-handling problem.

Frequency

Tonight (2026-01-29): 5+ crashes in ~1.5 hours of active use. The gateway is completely unstable, requiring a manual restart every 10-15 minutes on average. The system is unusable for production.


Log Files

Full logs available at:

  • /tmp/clawdbot/clawdbot-2026-01-29.log
  • /Users/parker/.clawdbot/logs/gateway.log
  • /Users/parker/.clawdbot/logs/gateway.err.log

Relevant excerpts included above.


Priority

CRITICAL - Gateway is unusable in production without constant manual intervention. This affects:

  • All channel integrations (Telegram, etc.)
  • Tool reliability (web_fetch, web_search)
  • User confidence in system stability
