Commit e94ebfa

juliabush and obviyus authored
fix: harden gateway SIGTERM shutdown (#51242) (thanks @juliabush)
* fix: increase shutdown timeout to avoid SIGTERM hang
* fix(telegram): abort polling fetch on shutdown to prevent SIGTERM hang
* fix(gateway): enforce hard exit on shutdown timeout for SIGTERM
* fix: tighten gateway shutdown watchdog
* fix: harden gateway SIGTERM shutdown (#51242) (thanks @juliabush)

---------

Co-authored-by: Ayaan Zaidi <[email protected]>
1 parent 95fec66 commit e94ebfa

3 files changed: 15 additions and 5 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -218,6 +218,7 @@ Docs: https://docs.openclaw.ai
 - Web search/onboarding: clarify provider labels, key prompts, and missing-key notes so setup/configure more clearly names the required provider credential for Gemini, Kimi, Grok, Brave Search, Firecrawl, Perplexity, and Tavily. Thanks @vincentkoc.
 - macOS/canvas actions: keep unattended local agent actions on trusted in-app canvas surfaces only, and stop exposing the deep-link fallback key to arbitrary page scripts. (#46790) Thanks @vincentkoc.
 - Agents/compaction: extend the enclosing run deadline once while compaction is actively in flight, and abort the underlying SDK compaction on timeout/cancel so large-session compactions stop freezing mid-run. (#46889) Thanks @asyncjason.
+- Gateway/Telegram shutdown: abort stalled Telegram polling fetches on shutdown, clean up per-cycle abort listeners, and keep the in-process watchdog ahead of supervisor stop timeouts so SIGTERM no longer leaves zombie gateways behind. (#51242) Thanks @juliabush.
 - Telegram/setup: warn when setup leaves DMs on pairing without an allowlist, and show valid account-scoped remediation commands. (#50710) Thanks @ernestodeoliveira.
 - Doctor/Telegram: replace the fresh-install empty group-allowlist false positive with first-run guidance that explains DM pairing approval and the next group setup steps, so new Telegram installs get actionable setup help instead of a broken-config warning. Thanks @vincentkoc.
 - Doctor/extensions: keep Matrix DM `allowFrom` repairs on the canonical `dm.allowFrom` path and stop treating Zalouser group sender gating as if it fell back to `allowFrom`, so doctor warnings and `--fix` stay aligned with runtime access control. Thanks @vincentkoc.

extensions/telegram/src/polling-session.ts

Lines changed: 8 additions & 0 deletions
@@ -196,6 +196,13 @@ export class TelegramPollingSession {
     const runner = run(bot, this.opts.runnerOptions);
     this.#activeRunner = runner;
     const fetchAbortController = this.#activeFetchAbort;
+    const abortFetch = () => {
+      fetchAbortController?.abort();
+    };
+
+    if (this.opts.abortSignal && fetchAbortController) {
+      this.opts.abortSignal.addEventListener("abort", abortFetch, { once: true });
+    }
     let stopPromise: Promise<void> | undefined;
     let stalledRestart = false;
     let forceCycleTimer: ReturnType<typeof setTimeout> | undefined;
@@ -291,6 +298,7 @@ export class TelegramPollingSession {
     if (forceCycleTimer) {
       clearTimeout(forceCycleTimer);
     }
+    this.opts.abortSignal?.removeEventListener("abort", abortFetch);
     this.opts.abortSignal?.removeEventListener("abort", stopOnAbort);
     await waitForGracefulStop(stopRunner);
     await waitForGracefulStop(stopBot);
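
The listener wiring in this hunk can be tried in isolation. The following is a minimal sketch, not code from the commit: `linkAbort`, `shutdown`, `cycleA`, and `cycleB` are hypothetical names, and the early check for an already-aborted outer signal is an extra assumption that the diff itself does not make. It shows an outer shutdown signal aborting a per-cycle fetch controller, and the listener being removed when a cycle ends normally so listeners do not accumulate across polling cycles.

```typescript
// Hypothetical helper (not in the commit): forwards an outer shutdown signal
// to a per-cycle AbortController and returns a cleanup function.
function linkAbort(
  outer: AbortSignal | undefined,
  inner: AbortController | undefined,
): () => void {
  if (!outer || !inner) {
    return () => {};
  }
  const abortInner = () => inner.abort();
  if (outer.aborted) {
    // Assumption beyond the diff: a listener added after the abort event
    // would never fire, so propagate the abort immediately.
    abortInner();
    return () => {};
  }
  outer.addEventListener("abort", abortInner, { once: true });
  // Mirrors the removeEventListener call in the cleanup path of the diff.
  return () => outer.removeEventListener("abort", abortInner);
}

const shutdown = new AbortController();

// Cycle A finishes normally, so its listener is removed before shutdown.
const cycleA = new AbortController();
const unlinkA = linkAbort(shutdown.signal, cycleA);
unlinkA();

// Cycle B is still in flight when shutdown is requested.
const cycleB = new AbortController();
linkAbort(shutdown.signal, cycleB);

shutdown.abort();
console.log(cycleA.signal.aborted, cycleB.signal.aborted); // false true
```

Passing `cycleB.signal` to `fetch` would make a stalled request reject with an `AbortError` as soon as `shutdown.abort()` runs, which is the effect the changelog entry describes for stalled polling fetches.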

src/cli/gateway-cli/run-loop.ts

Lines changed: 6 additions & 5 deletions
@@ -97,7 +97,8 @@ export async function runGatewayLoop(params: {
   };
 
   const DRAIN_TIMEOUT_MS = 90_000;
-  const SHUTDOWN_TIMEOUT_MS = 5_000;
+  const SUPERVISOR_STOP_TIMEOUT_MS = 30_000;
+  const SHUTDOWN_TIMEOUT_MS = SUPERVISOR_STOP_TIMEOUT_MS - 5_000;
 
   const request = (action: GatewayRunSignalAction, signal: string) => {
     if (shuttingDown) {
@@ -112,10 +113,10 @@ export async function runGatewayLoop(params: {
     const forceExitMs = isRestart ? DRAIN_TIMEOUT_MS + SHUTDOWN_TIMEOUT_MS : SHUTDOWN_TIMEOUT_MS;
     const forceExitTimer = setTimeout(() => {
       gatewayLog.error("shutdown timed out; exiting without full cleanup");
-      // Exit non-zero on restart timeout so launchd/systemd treats it as a
-      // failure and triggers a clean process restart instead of assuming the
-      // shutdown was intentional. Stop-timeout stays at 0 (graceful). (#36822)
-      exitProcess(isRestart ? 1 : 0);
+      // Keep the in-process watchdog below the supervisor stop budget so this
+      // path wins before launchd/systemd escalates to a hard kill. Exit
+      // non-zero on any timeout so supervised installs restart cleanly.
+      exitProcess(1);
     }, forceExitMs);
 
     void (async () => {
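
The timeout arithmetic in this hunk can be checked with a small standalone sketch. The constants are copied from the diff. `forceExitBudgetMs` and `armWatchdog` are hypothetical names introduced here for illustration, and the exit function is injected so that the example never terminates the process.

```typescript
// Constants as they appear in the diff.
const DRAIN_TIMEOUT_MS = 90_000;
const SUPERVISOR_STOP_TIMEOUT_MS = 30_000;
const SHUTDOWN_TIMEOUT_MS = SUPERVISOR_STOP_TIMEOUT_MS - 5_000;

// Hypothetical helper: the same expression the diff assigns to forceExitMs.
function forceExitBudgetMs(isRestart: boolean): number {
  return isRestart ? DRAIN_TIMEOUT_MS + SHUTDOWN_TIMEOUT_MS : SHUTDOWN_TIMEOUT_MS;
}

// Hypothetical wrapper: arms a watchdog that calls exit(1) on timeout and
// returns a function that disarms it once shutdown completes in time.
function armWatchdog(ms: number, exit: (code: number) => void): () => void {
  const timer = setTimeout(() => exit(1), ms);
  return () => clearTimeout(timer);
}

console.log(forceExitBudgetMs(false)); // 25000
console.log(forceExitBudgetMs(true)); // 115000

// Record exit codes instead of exiting, then disarm as a clean shutdown would.
const exitCodes: number[] = [];
const disarm = armWatchdog(forceExitBudgetMs(false), (code) => exitCodes.push(code));
disarm();
console.log(exitCodes.length); // 0
```

On the stop path the watchdog fires 5 seconds before the 30-second supervisor budget assumed by the diff. On the restart path the budget is the drain timeout plus the shutdown timeout.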
