
msteams provider starts twice on gateway boot, causing EADDRINUSE restart loop #22169

Summary

Environment

  • OpenClaw: 2026.2.19-2 (45d9b20)
  • @openclaw/msteams plugin: 2026.2.19
  • OS: Linux 6.6.87.2-microsoft-standard-WSL2 (x64), Node 22.22.0
  • Gateway: systemd service, local loopback

Description

On every gateway start/restart, the msteams provider enters an infinite EADDRINUSE restart loop. The first instance binds port 3978 successfully, but the channel manager immediately triggers auto-restart attempt 1/10 — starting a second instance that fails to bind the same port. This cascades into 10 restart attempts with exponential backoff, and the health-monitor compounds the issue by restarting the provider again after attempts are exhausted.

Steps to reproduce

  1. Install @openclaw/msteams plugin (openclaw plugins install @openclaw/msteams)
  2. Configure msteams channel with valid appId, appPassword, tenantId
  3. Start gateway: openclaw gateway start
  4. Watch logs: openclaw logs --follow | grep -E 'server error|auto-restart|starting provider'
  5. Observe: provider starts successfully on port 3978, then auto-restart attempt 1/10 triggers immediately, second instance fails with EADDRINUSE, loop continues through 10/10
  6. Send an @mention to the bot in a Teams channel during the restart loop
  7. Check logs: message is received and processed, but no outbound reply is sent to Teams

Expected behavior

The msteams provider should start once, bind port 3978, and remain stable. No auto-restart should be triggered unless the provider actually crashes or becomes unhealthy.

Actual behavior

```
20:07:43.516 info starting provider (port 3978)
20:07:44.019 info msteams provider started on port 3978 ← SUCCESS
20:07:44.018 info [default] auto-restart attempt 1/10 in 5s ← WHY? Provider just started fine
20:07:49.515 info msteams provider started on port 3978 ← SECOND instance
20:07:49.516 error msteams server error: Error: listen EADDRINUSE: address already in use :::3978
20:07:49.515 info [default] auto-restart attempt 2/10 in 10s
```

...continues through 10/10, then the health-monitor restarts the cycle.


Additional information

Root cause analysis

The monitorMSTeamsProvider() function in extensions/msteams/src/monitor.ts returns { app, shutdown } immediately after calling expressApp.listen(port). Since .listen() is async (the callback fires after the TCP socket binds), the function's promise resolves before the server is actually listening.

The channel manager in gateway-cli-*.js (around line 2060) interprets the resolved promise as "provider exited" and triggers the auto-restart logic. The restart creates a second provider instance that tries to bind the same port → EADDRINUSE.

Relevant code in monitor.ts (line 277):

```ts
const httpServer = expressApp.listen(port, () => {
    log.info(`msteams provider started on port ${port}`);
});
// ...
return { app: expressApp, shutdown };  // resolves before the listen callback fires
```
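
For context, the channel manager appears to await that result and treat settlement as the provider having exited. A minimal sketch of that supervision pattern, with hypothetical names (the actual gateway-cli code likely differs):

```ts
// Hypothetical sketch of the channel manager's supervision loop, not the actual gateway-cli code.
// If startProvider() resolves as soon as listen() is called, the "exited" branch fires
// immediately even though the first instance is healthy and still holds port 3978.
async function superviseProvider(startProvider: () => Promise<void>, maxAttempts = 10): Promise<void> {
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        await startProvider();          // resolves early → misread as "provider exited"
        const delaySec = 5 * attempt;   // 5s, 10s, ... backoff, matching the observed log lines
        console.info(`[default] auto-restart attempt ${attempt}/${maxAttempts} in ${delaySec}s`);
        await new Promise<void>((resolve) => setTimeout(resolve, delaySec * 1000));
    }
}
```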

Additional issues found

  1. Error detail swallowed: The httpServer.on("error") handler at line 282 logs { error: String(err) } as structured metadata, but the log formatter doesn't render it. The log output shows only "msteams server error" with no error code, syscall, or stack trace — making diagnosis extremely difficult.

  2. Shutdown doesn't force-close connections: The shutdown() function calls httpServer.close() but doesn't call httpServer.closeAllConnections(), so keep-alive connections can hold the port open after shutdown, compounding the EADDRINUSE issue on restart.

  3. Delivery recovery stalls: The restart loop creates 8+ pending delivery entries that exceed the recovery time budget on each restart: "Recovery time budget exceeded — 8 entries deferred to next restart". Outbound replies generated during the unstable period are permanently lost.

Impact

  • Messages received but replies never sent: The provider receives inbound webhooks (Teams retries help here), but outbound replies generated by the agent are silently dropped during the restart cycle.
  • Noisy logs: Dozens of error/restart lines per gateway boot obscure real issues.
  • Health-monitor compounds the problem: After 10 failed restart attempts, health-monitor detects running: false and restarts the whole cycle.

Suggested fix

Option A (minimal): Make monitorMSTeamsProvider return a promise that doesn't resolve until the server closes:

```ts
// Keep the returned promise pending while the server is running: resolve only on clean
// shutdown (abort signal), reject on a fatal listen error. From the channel manager's
// perspective the provider stays "alive" until it genuinely stops.
await new Promise<void>((resolve, reject) => {
    httpServer.listen(port, () => {
        log.info(`msteams provider started on port ${port}`);
    });
    httpServer.once("error", (err) => reject(err));
    opts.abortSignal?.addEventListener("abort", () => resolve());
});
```
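
This way the promise only settles when the provider actually stops (clean shutdown) or fails to bind (fatal error), so the channel manager never misreads a healthy start as an exit and the 10-attempt auto-restart path stays dormant.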

Option B (defensive): Add EADDRINUSE/duplicate-bind detection: if the port is already held by the same process, skip binding and log a warning instead of crashing (sketched below).
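
A minimal sketch, assuming a module-level registry of ports this process has already bound (boundPorts and startListener are illustrative names, not existing OpenClaw APIs):

```ts
import type { Server } from "node:http";
import type { Express } from "express";

// Illustrative guard: ports the current process has already bound for msteams.
const boundPorts = new Set<number>();

function startListener(
    expressApp: Express,
    port: number,
    log: { info: (msg: string) => void; warn: (msg: string) => void },
): Server | undefined {
    if (boundPorts.has(port)) {
        // A provider instance in this process already owns the port: skip the duplicate bind.
        log.warn(`msteams port ${port} already bound by this process, skipping duplicate bind`);
        return undefined;
    }
    const httpServer = expressApp.listen(port, () => {
        log.info(`msteams provider started on port ${port}`);
    });
    boundPorts.add(port);
    httpServer.once("close", () => boundPorts.delete(port));
    return httpServer;
}
```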

Also recommended:

  • Log the full error object inline: log.error(`msteams server error: ${err.message} [code=${err.code}]`)
  • Call httpServer.closeAllConnections() in shutdown (see the sketch after this list)
  • Add a deduplication guard so the channel manager can't start a second instance while one is already running on the configured port
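
For the shutdown change, a sketch assuming Node 18.2+ (where http.Server.closeAllConnections() is available); makeShutdown is an illustrative wrapper, not an existing OpenClaw helper:

```ts
import type { Server } from "node:http";

// Illustrative shutdown helper: stop accepting new connections, then destroy any
// lingering keep-alive sockets so the port is actually released before a restart.
function makeShutdown(httpServer: Server, log: { info: (msg: string) => void }) {
    return async function shutdown(): Promise<void> {
        await new Promise<void>((resolve, reject) => {
            httpServer.close((err) => (err ? reject(err) : resolve()));
            // Without this, idle keep-alive connections can hold the port open after close().
            httpServer.closeAllConnections();
        });
        log.info("msteams provider stopped");
    };
}
```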

Workaround

We patched monitor.ts locally to detect EADDRINUSE and skip duplicate binds:

httpServer.on("error", (err) => {
    const code = (err as any).code;
    if (code === "EADDRINUSE") {
        log.warn(`msteams port ${port} already in use by another provider instance — skipping duplicate bind`);
        return;
    }
    log.error(`msteams server error: ${String(err)} [code=${code}]`);
});

await new Promise<void>((resolve) => {
    httpServer.listen(port, () => {
        log.info(`msteams provider started on port ${port}`);
        resolve();
    });
    httpServer.once("error", () => resolve());
});

This prevents the crash loop but doesn't fix the root cause (channel manager starting the provider twice). The patch is overwritten on plugin update.
