# msteams provider starts twice on gateway boot, causing EADDRINUSE restart loop #22169
### Description

#### Summary

#### Environment
- OpenClaw: 2026.2.19-2 (45d9b20)
- @openclaw/msteams plugin: 2026.2.19
- OS: Linux 6.6.87.2-microsoft-standard-WSL2 (x64), Node 22.22.0
- Gateway: systemd service, local loopback
#### Description
On every gateway start/restart, the msteams provider enters an infinite EADDRINUSE restart loop. The first instance binds port 3978 successfully, but the channel manager immediately triggers auto-restart attempt 1/10 — starting a second instance that fails to bind the same port. This cascades into 10 restart attempts with exponential backoff, and the health-monitor compounds the issue by restarting the provider again after attempts are exhausted.
#### Steps to reproduce

1. Install the plugin: `openclaw plugins install @openclaw/msteams`
2. Configure the msteams channel with a valid `appId`, `appPassword`, and `tenantId`
3. Start the gateway: `openclaw gateway start`
4. Watch the logs: `openclaw logs --follow | grep -E 'server error|auto-restart|starting provider'`
5. Observe: the provider starts successfully on port 3978, then auto-restart attempt 1/10 triggers immediately; the second instance fails with EADDRINUSE, and the loop continues through 10/10
6. Send an @mention to the bot in a Teams channel during the restart loop
7. Check the logs: the message is received and processed, but no outbound reply is sent to Teams
#### Expected behavior
The msteams provider should start once, bind port 3978, and remain stable. No auto-restart should be triggered unless the provider actually crashes or becomes unhealthy.
#### Actual behavior

```text
20:07:43.516 info  starting provider (port 3978)
20:07:44.019 info  msteams provider started on port 3978                                    ← SUCCESS
20:07:44.018 info  [default] auto-restart attempt 1/10 in 5s                                ← WHY? Provider just started fine
20:07:49.515 info  msteams provider started on port 3978                                    ← SECOND instance
20:07:49.516 error msteams server error: Error: listen EADDRINUSE: address already in use :::3978
20:07:49.515 info  [default] auto-restart attempt 2/10 in 10s
...continues through 10/10, then the health-monitor restarts the cycle
```
### OpenClaw version
2026.2.19-2 (45d9b20)
### Operating system
Linux 6.6.87.2-microsoft-standard-WSL2 (x64), Node 22.22.0
### Install method
_No response_
### Logs, screenshots, and evidence

_No response_

### Impact and severity

_No response_

### Additional information
#### Root cause analysis
The `monitorMSTeamsProvider()` function in `extensions/msteams/src/monitor.ts` returns `{ app, shutdown }` immediately after calling `expressApp.listen(port)`. Since `.listen()` is asynchronous (its callback fires only after the TCP socket binds), the function's promise resolves before the server is actually listening.

The channel manager in `gateway-cli-*.js` (around line 2060) interprets the resolved promise as "provider exited" and triggers the auto-restart logic. The restart creates a second provider instance that tries to bind the same port, which fails with EADDRINUSE.

Relevant code in `monitor.ts` (line 277):

```ts
const httpServer = expressApp.listen(port, () => {
  log.info(`msteams provider started on port ${port}`);
});
// ...
return { app: expressApp, shutdown }; // resolves before the listen callback fires
```
#### Additional issues found

- **Error detail swallowed:** the `httpServer.on("error")` handler at line 282 logs `{ error: String(err) }` as structured metadata, but the log formatter doesn't render it. The log output shows only `"msteams server error"` with no error code, syscall, or stack trace, making diagnosis extremely difficult.
- **Shutdown doesn't force-close connections:** the `shutdown()` function calls `httpServer.close()` but not `httpServer.closeAllConnections()`, so keep-alive connections can hold the port open after shutdown, compounding the EADDRINUSE issue on restart.
- **Delivery recovery stalls:** the restart loop creates 8+ pending delivery entries that exceed the recovery time budget on each restart: `"Recovery time budget exceeded — 8 entries deferred to next restart"`. Outbound replies generated during the unstable period are permanently lost.
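The keep-alive problem in the second issue above can be illustrated with a small sketch (the `shutdown` helper name is ours, not the plugin's): `server.close()` only stops accepting new connections and waits for existing sockets to drain, while `closeAllConnections()` (available since Node 18.2) destroys idle keep-alive sockets so the port is released immediately.

```typescript
import http from "node:http";

// Hypothetical shutdown helper: close() alone can wait indefinitely for
// idle keep-alive sockets; closeAllConnections() destroys them so the
// close callback fires and the port is actually freed.
export function shutdown(server: http.Server): Promise<void> {
  return new Promise((resolve, reject) => {
    server.close((err) => (err ? reject(err) : resolve()));
    // Without this line, an idle keep-alive connection holds the port open.
    server.closeAllConnections();
  });
}
```

With a forced close, a restarted provider can rebind the port right away instead of racing the old instance's lingering sockets.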
#### Impact

- **Messages received but replies never sent:** the provider receives inbound webhooks (Teams retries help here), but outbound replies generated by the agent are silently dropped during the restart cycle.
- **Noisy logs:** dozens of error/restart lines per gateway boot obscure real issues.
- **Health-monitor compounds the problem:** after 10 failed restart attempts, the health-monitor detects `running: false` and restarts the whole cycle.
#### Suggested fix

**Option A (minimal):** make `monitorMSTeamsProvider` return a promise that doesn't resolve until the server closes:

```ts
// Wrap listen in a promise that resolves on clean shutdown and rejects on a fatal error
await new Promise<void>((resolve, reject) => {
  httpServer.once("error", reject); // e.g. EADDRINUSE rejects instead of looking like a clean exit
  httpServer.listen(port, () => {
    log.info(`msteams provider started on port ${port}`);
  });
  // Don't resolve on 'listening'; keep the provider "alive" from the channel manager's perspective.
  // Only resolve on the abort signal (clean shutdown).
  opts.abortSignal?.addEventListener("abort", () => resolve());
});
```

**Option B (defensive):** add EADDRINUSE detection. If the port is already held by the same process, skip binding and log a warning instead of crashing.
Also recommended:

- Log the full error object inline: ``log.error(`msteams server error: ${err.message} [code=${err.code}]`)``
- Call `httpServer.closeAllConnections()` in `shutdown()`
- Add a deduplication guard so the channel manager can't start a second instance while one is already running on the configured port
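The deduplication guard could look roughly like this (a sketch; `startOnce` and the map-by-port scheme are our assumptions, not the channel manager's actual structure). A second start request for a port that already has an in-flight provider returns the existing promise instead of binding again.

```typescript
// Hypothetical dedup guard: track in-flight providers by port so a
// duplicate start on the same port reuses the running instance.
const runningByPort = new Map<number, Promise<void>>();

export function startOnce(
  port: number,
  start: (port: number) => Promise<void>,
): Promise<void> {
  const existing = runningByPort.get(port);
  if (existing) return existing; // already running: don't rebind
  const p = start(port).finally(() => runningByPort.delete(port));
  runningByPort.set(port, p);
  return p;
}
```

Once the provider's promise settles (clean shutdown or crash), the entry is removed, so a genuine restart is still possible.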
#### Workaround

We patched `monitor.ts` locally to detect EADDRINUSE and skip duplicate binds:

```ts
httpServer.on("error", (err) => {
  const code = (err as any).code;
  if (code === "EADDRINUSE") {
    log.warn(`msteams port ${port} already in use by another provider instance — skipping duplicate bind`);
    return;
  }
  log.error(`msteams server error: ${String(err)} [code=${code}]`);
});

await new Promise<void>((resolve) => {
  httpServer.listen(port, () => {
    log.info(`msteams provider started on port ${port}`);
    resolve();
  });
  httpServer.once("error", () => resolve());
});
```

This prevents the crash loop but doesn't fix the root cause (the channel manager starting the provider twice), and the patch is overwritten on plugin update.
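For completeness, the root-cause mechanism is easy to reproduce in isolation. This toy supervisor (a sketch, not OpenClaw's actual channel-manager code) awaits the provider's start promise and treats resolution as an exit; a start function that resolves as soon as it is called therefore burns through every restart attempt even though nothing crashed.

```typescript
// Toy supervisor: "await start(); it resolved, so the provider must have
// exited — restart it." A start() that resolves too early (like the
// current monitorMSTeamsProvider) makes this loop fire immediately.
type StartFn = () => Promise<void>;

export async function supervise(
  start: StartFn,
  maxRestarts: number,
): Promise<number> {
  let restarts = 0;
  while (restarts < maxRestarts) {
    await start(); // resolution is interpreted as "provider exited"
    restarts++;
  }
  return restarts; // a correctly long-lived start() would keep this at 0
}
```

With a start function following the Option A contract (pending until shutdown), the first `await` never returns during normal operation and no spurious restart is triggered.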