
msteams provider starts twice on gateway boot, causing EADDRINUSE restart loop #22169

Summary

Environment

  • OpenClaw: 2026.2.19-2 (45d9b20)
  • @openclaw/msteams plugin: 2026.2.19
  • OS: Linux 6.6.87.2-microsoft-standard-WSL2 (x64), Node 22.22.0
  • Gateway: systemd service, local loopback

Description

On every gateway start/restart, the msteams provider enters an infinite EADDRINUSE restart loop. The first instance binds port 3978 successfully, but the channel manager immediately triggers auto-restart attempt 1/10 — starting a second instance that fails to bind the same port. This cascades into 10 restart attempts with exponential backoff, and the health-monitor compounds the issue by restarting the provider again after attempts are exhausted.

Steps to reproduce

  1. Install @openclaw/msteams plugin (openclaw plugins install @openclaw/msteams)
  2. Configure msteams channel with valid appId, appPassword, tenantId
  3. Start gateway: openclaw gateway start
  4. Watch logs: openclaw logs --follow | grep -E 'server error|auto-restart|starting provider'
  5. Observe: provider starts successfully on port 3978, then auto-restart attempt 1/10 triggers immediately, second instance fails with EADDRINUSE, loop continues through 10/10
  6. Send an @mention to the bot in a Teams channel during the restart loop
  7. Check logs: message is received and processed, but no outbound reply is sent to Teams

Expected behavior

The msteams provider should start once, bind port 3978, and remain stable. No auto-restart should be triggered unless the provider actually crashes or becomes unhealthy.

Actual behavior

```
20:07:43.516 info starting provider (port 3978)
20:07:44.019 info msteams provider started on port 3978 ← SUCCESS
20:07:44.018 info [default] auto-restart attempt 1/10 in 5s ← WHY? Provider just started fine
20:07:49.515 info msteams provider started on port 3978 ← SECOND instance
20:07:49.516 error msteams server error: Error: listen EADDRINUSE: address already in use :::3978
20:07:49.515 info [default] auto-restart attempt 2/10 in 10s
```

...continues through 10/10, then the health-monitor restarts the cycle.


Additional information

Root cause analysis

The monitorMSTeamsProvider() function in extensions/msteams/src/monitor.ts returns { app, shutdown } immediately after calling expressApp.listen(port). Since .listen() is async (the callback fires after the TCP socket binds), the function's promise resolves before the server is actually listening.

The channel manager in gateway-cli-*.js (around line 2060) interprets the resolved promise as "provider exited" and triggers the auto-restart logic. The restart creates a second provider instance that tries to bind the same port → EADDRINUSE.

Relevant code in monitor.ts (line 277):

```ts
const httpServer = expressApp.listen(port, () => {
    log.info(`msteams provider started on port ${port}`);
});
// ...
return { app: expressApp, shutdown };  // resolves before the listen callback fires
```
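
For context, the channel manager appears to await that result and treat settlement as the provider having exited. A minimal sketch of that supervision pattern, with hypothetical names (the actual gateway-cli code likely differs):

```ts
// Hypothetical sketch of the channel manager's supervision loop, not the actual gateway-cli code.
// If startProvider() resolves as soon as listen() is called, the "exited" branch fires
// immediately even though the first instance is healthy and still holds port 3978.
async function superviseProvider(startProvider: () => Promise<void>, maxAttempts = 10): Promise<void> {
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        await startProvider();          // resolves early → misread as "provider exited"
        const delaySec = 5 * attempt;   // 5s, 10s, ... backoff, matching the observed log lines
        console.info(`[default] auto-restart attempt ${attempt}/${maxAttempts} in ${delaySec}s`);
        await new Promise<void>((resolve) => setTimeout(resolve, delaySec * 1000));
    }
}
```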

Additional issues found

  1. Error detail swallowed: The httpServer.on("error") handler at line 282 logs { error: String(err) } as structured metadata, but the log formatter doesn't render it. The log output shows only "msteams server error" with no error code, syscall, or stack trace — making diagnosis extremely difficult.

  2. Shutdown doesn't force-close connections: The shutdown() function calls httpServer.close() but doesn't call httpServer.closeAllConnections(), so keep-alive connections can hold the port open after shutdown, compounding the EADDRINUSE issue on restart.

  3. Delivery recovery stalls: The restart loop creates 8+ pending delivery entries that exceed the recovery time budget on each restart: "Recovery time budget exceeded — 8 entries deferred to next restart". Outbound replies generated during the unstable period are permanently lost.

Impact

  • Messages received but replies never sent: The provider receives inbound webhooks (Teams retries help here), but outbound replies generated by the agent are silently dropped during the restart cycle.
  • Noisy logs: Dozens of error/restart lines per gateway boot obscure real issues.
  • Health-monitor compounds the problem: After 10 failed restart attempts, health-monitor detects running: false and restarts the whole cycle.

Suggested fix

Option A (minimal): Make monitorMSTeamsProvider return a promise that doesn't resolve until the server closes:

```ts
// Keep the returned promise pending while the server is running: resolve only on clean
// shutdown (abort signal), reject on a fatal listen error. From the channel manager's
// perspective the provider stays "alive" until it genuinely stops.
await new Promise<void>((resolve, reject) => {
    httpServer.listen(port, () => {
        log.info(`msteams provider started on port ${port}`);
    });
    httpServer.once("error", (err) => reject(err));
    opts.abortSignal?.addEventListener("abort", () => resolve());
});
```
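
This way the promise only settles when the provider actually stops (clean shutdown) or fails to bind (fatal error), so the channel manager never misreads a healthy start as an exit and the 10-attempt auto-restart path stays dormant.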

Option B (defensive): Add EADDRINUSE/duplicate-bind detection: if the port is already held by the same process, skip binding and log a warning instead of crashing (sketched below).
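
A minimal sketch, assuming a module-level registry of ports this process has already bound (boundPorts and startListener are illustrative names, not existing OpenClaw APIs):

```ts
import type { Server } from "node:http";
import type { Express } from "express";

// Illustrative guard: ports the current process has already bound for msteams.
const boundPorts = new Set<number>();

function startListener(
    expressApp: Express,
    port: number,
    log: { info: (msg: string) => void; warn: (msg: string) => void },
): Server | undefined {
    if (boundPorts.has(port)) {
        // A provider instance in this process already owns the port: skip the duplicate bind.
        log.warn(`msteams port ${port} already bound by this process, skipping duplicate bind`);
        return undefined;
    }
    const httpServer = expressApp.listen(port, () => {
        log.info(`msteams provider started on port ${port}`);
    });
    boundPorts.add(port);
    httpServer.once("close", () => boundPorts.delete(port));
    return httpServer;
}
```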

Also recommended:

  • Log the full error object inline: log.error(`msteams server error: ${err.message} [code=${err.code}]`)
  • Call httpServer.closeAllConnections() in shutdown (see the sketch after this list)
  • Add a deduplication guard so the channel manager can't start a second instance while one is already running on the configured port
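
For the shutdown change, a sketch assuming Node 18.2+ (where http.Server.closeAllConnections() is available); makeShutdown is an illustrative wrapper, not an existing OpenClaw helper:

```ts
import type { Server } from "node:http";

// Illustrative shutdown helper: stop accepting new connections, then destroy any
// lingering keep-alive sockets so the port is actually released before a restart.
function makeShutdown(httpServer: Server, log: { info: (msg: string) => void }) {
    return async function shutdown(): Promise<void> {
        await new Promise<void>((resolve, reject) => {
            httpServer.close((err) => (err ? reject(err) : resolve()));
            // Without this, idle keep-alive connections can hold the port open after close().
            httpServer.closeAllConnections();
        });
        log.info("msteams provider stopped");
    };
}
```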

Workaround

We patched monitor.ts locally to detect EADDRINUSE and skip duplicate binds:

httpServer.on("error", (err) => {
    const code = (err as any).code;
    if (code === "EADDRINUSE") {
        log.warn(`msteams port ${port} already in use by another provider instance — skipping duplicate bind`);
        return;
    }
    log.error(`msteams server error: ${String(err)} [code=${code}]`);
});

await new Promise<void>((resolve) => {
    httpServer.listen(port, () => {
        log.info(`msteams provider started on port ${port}`);
        resolve();
    });
    httpServer.once("error", () => resolve());
});

This prevents the crash loop but doesn't fix the root cause (channel manager starting the provider twice). The patch is overwritten on plugin update.
