fix(msteams): prevent EADDRINUSE crash on duplicate provider start by sudie-codes · Pull Request #49583 · openclaw/openclaw

sudie-codes · 2026-03-18T07:42:48Z

Summary

Fixes #25790, related #22169 — Duplicate monitorMSTeamsProvider() calls cause EADDRINUSE crash loop when the gateway boots the provider twice due to a race condition.

What & Why

Problem: When the gateway starts the MS Teams provider twice (race condition during restart/deployment), the second call to expressApp.listen(port) throws EADDRINUSE, entering an infinite crash-restart loop. The bot becomes completely unresponsive until manually restarted.

Root cause: monitorMSTeamsProvider() in monitor.ts had no guard against being called concurrently or after already starting. Each invocation created a new Express app and attempted to bind the same port.

Fix: Refactored monitorMSTeamsProvider() into a thin singleton wrapper:

Module-level activeInstance variable stores the in-flight startup Promise
Duplicate calls (whether startup is still in-flight or already running) return the same promise immediately with a warning log
On startup failure, the guard is cleared so a legitimate retry can proceed
On shutdown(), the guard is cleared so the provider can restart cleanly
Exported resetMSTeamsProviderInstance() for test isolation

Files changed: extensions/msteams/src/monitor.ts, extensions/msteams/src/monitor.lifecycle.test.ts

Screenshots

N/A — This is a server-side startup guard fix. The race condition occurs during gateway boot when monitorMSTeamsProvider() is called twice before the first call completes. Verification requires simulating concurrent provider starts. The fix is validated via unit test.

Test Results

New regression test: "returns existing instance on duplicate start without crashing" — verifies both calls return the same Promise reference
All 235 msteams tests pass (27 test files, ~9s)

AI Disclosure

AI-assisted (Claude Code with team orchestration)
Fully tested — new regression test + all existing tests pass
I understand what the code does: singleton guard prevents duplicate Express server bind by returning the existing startup promise on re-entrant calls

greptile-apps · 2026-03-18T07:44:50Z

Greptile Summary

This PR addresses two independent issues: a singleton guard in monitor.ts to prevent EADDRINUSE crash loops caused by duplicate provider starts, and a security hardening fix in message-handler.ts that closes a sender-allowlist bypass (GHSA-g7cr-9h7q-4qxq).

The singleton guard approach is the right fix for the described race condition and the accompanying regression test is well-structured. However, two issues were found:

Premature singleton clear in shutdown() (monitor.ts lines 350–361): activeInstancePromise is set to null at the very start of shutdown(), before httpServer.close() completes. Any monitorMSTeamsProvider() call that races into the window between the clear and the port being released will spin up a new _startMSTeamsProvider(), hit expressApp.listen() while the port is still occupied, and reproduce the EADDRINUSE crash loop. The sentinel should be cleared inside the httpServer.close() callback, after the port is confirmed released.
Second caller's AbortSignal is silently discarded (monitor.ts lines 78–83): When a duplicate call carries a different AbortSignal, the guard returns the existing promise but ignores the new signal entirely. This means aborting the second caller will not stop the server, and aborting the first will stop it regardless of the second caller's intent. The warning message should call this out explicitly, and ideally the guard should handle or at least log the signal mismatch.

Confidence Score: 3/5

The singleton guard partially solves the EADDRINUSE problem, but a race window in shutdown() can re-introduce it on server restart.
The core idea is correct and the security fix looks sound, but the premature activeInstancePromise = null in shutdown() leaves a meaningful race that undermines the stated goal of the PR. The AbortSignal discard is a less urgent but still real lifecycle bug for callers with independent abort controllers.
extensions/msteams/src/monitor.ts — specifically the shutdown() closure and the duplicate-call guard.

Comments Outside Diff (1)

extensions/msteams/src/monitor.ts, line 350-361 (link)

Singleton cleared before server close completes

activeInstancePromise = null is set at the very beginning of shutdown(), before httpServer.close() finishes. This creates a race window: any caller that invokes monitorMSTeamsProvider() after shutdown() is entered but before the OS releases the port will start a fresh _startMSTeamsProvider(), hit expressApp.listen() while the old server is still closing, and get EADDRINUSE — the exact crash loop this PR is trying to prevent.

The sentinel should be cleared only after the port is confirmed released:

Prompt To Fix With AI

This is a comment left during a code review.
Path: extensions/msteams/src/monitor.ts
Line: 350-361

Comment:
**Singleton cleared before server close completes**

`activeInstancePromise = null` is set at the very beginning of `shutdown()`, before `httpServer.close()` finishes. This creates a race window: any caller that invokes `monitorMSTeamsProvider()` after `shutdown()` is entered but before the OS releases the port will start a fresh `_startMSTeamsProvider()`, hit `expressApp.listen()` while the old server is still closing, and get `EADDRINUSE` — the exact crash loop this PR is trying to prevent.

The sentinel should be cleared only after the port is confirmed released:



How can I resolve this? If you propose a fix, please make it concise.

Prompt To Fix All With AI

This is a comment left during a code review.
Path: extensions/msteams/src/monitor.ts
Line: 350-361

Comment:
**Singleton cleared before server close completes**

`activeInstancePromise = null` is set at the very beginning of `shutdown()`, before `httpServer.close()` finishes. This creates a race window: any caller that invokes `monitorMSTeamsProvider()` after `shutdown()` is entered but before the OS releases the port will start a fresh `_startMSTeamsProvider()`, hit `expressApp.listen()` while the old server is still closing, and get `EADDRINUSE` — the exact crash loop this PR is trying to prevent.

The sentinel should be cleared only after the port is confirmed released:

```suggestion
  const shutdown = async () => {
    log.info("shutting down msteams provider");
    return new Promise<void>((resolve) => {
      httpServer.close((err) => {
        if (err) {
          log.debug?.("msteams server close error", { error: String(err) });
        }
        activeInstancePromise = null;
        resolve();
      });
    });
  };
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: extensions/msteams/src/monitor.ts
Line: 78-83

Comment:
**Second caller's `abortSignal` is silently discarded**

When a duplicate call arrives with a *different* `AbortSignal`, the guard returns the existing promise — but `_startMSTeamsProvider` is already wired to the first call's signal via `keepHttpServerTaskAlive`. The second caller's signal is completely ignored, meaning:

- Aborting the second caller will **not** stop the server.
- Aborting the first caller **will** stop the server even if the second caller never intended it.

In practice, both callers from the gateway race may share the same controller, so this might not manifest today. But this is a hidden contract that could bite future callers. At minimum, a log warning that the incoming signal is being ignored would make debugging easier. A more robust approach is to check whether the already-stored promise was created with a compatible (same or parent) signal and warn loudly if not:

```typescript
if (activeInstancePromise) {
  const core = getMSTeamsRuntime();
  const log = core.logging.getChildLogger({ name: "msteams" });
  log.warn?.("msteams provider already started; returning existing instance — incoming abortSignal ignored");
  return activeInstancePromise;
}
```

How can I resolve this? If you propose a fix, please make it concise.

_{Last reviewed commit: "fix(msteams): preven..."}

greptile-apps · 2026-03-18T07:44:56Z

extensions/msteams/src/monitor.ts

+  if (activeInstancePromise) {
+    const core = getMSTeamsRuntime();
+    const log = core.logging.getChildLogger({ name: "msteams" });
+    log.warn?.("msteams provider already started; returning existing instance");
+    return activeInstancePromise;
+  }


Second caller's abortSignal is silently discarded

When a duplicate call arrives with a different AbortSignal, the guard returns the existing promise — but _startMSTeamsProvider is already wired to the first call's signal via keepHttpServerTaskAlive. The second caller's signal is completely ignored, meaning:

Aborting the second caller will not stop the server.

Aborting the first caller will stop the server even if the second caller never intended it.

In practice, both callers from the gateway race may share the same controller, so this might not manifest today. But this is a hidden contract that could bite future callers. At minimum, a log warning that the incoming signal is being ignored would make debugging easier. A more robust approach is to check whether the already-stored promise was created with a compatible (same or parent) signal and warn loudly if not:

if (activeInstancePromise) { const core = getMSTeamsRuntime(); const log = core.logging.getChildLogger({ name: "msteams" }); log.warn?.("msteams provider already started; returning existing instance — incoming abortSignal ignored"); return activeInstancePromise; }

Prompt To Fix With AI

This is a comment left during a code review. Path: extensions/msteams/src/monitor.ts Line: 78-83 Comment: **Second caller's `abortSignal` is silently discarded** When a duplicate call arrives with a *different* `AbortSignal`, the guard returns the existing promise — but `_startMSTeamsProvider` is already wired to the first call's signal via `keepHttpServerTaskAlive`. The second caller's signal is completely ignored, meaning: - Aborting the second caller will **not** stop the server. - Aborting the first caller **will** stop the server even if the second caller never intended it. In practice, both callers from the gateway race may share the same controller, so this might not manifest today. But this is a hidden contract that could bite future callers. At minimum, a log warning that the incoming signal is being ignored would make debugging easier. A more robust approach is to check whether the already-stored promise was created with a compatible (same or parent) signal and warn loudly if not: ```typescript if (activeInstancePromise) { const core = getMSTeamsRuntime(); const log = core.logging.getChildLogger({ name: "msteams" }); log.warn?.("msteams provider already started; returning existing instance — incoming abortSignal ignored"); return activeInstancePromise; } ``` How can I resolve this? If you propose a fix, please make it concise.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 795ec4212f

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-03-18T07:50:16Z

extensions/msteams/src/monitor.ts

+  if (activeInstancePromise) {
+    const core = getMSTeamsRuntime();
+    const log = core.logging.getChildLogger({ name: "msteams" });
+    log.warn?.("msteams provider already started; returning existing instance");
+    return activeInstancePromise;


Honor later abort signals when deduplicating provider start

Returning the first in-flight promise here drops lifecycle inputs from subsequent calls, especially abortSignal. In the duplicate-start race this change is targeting, the gateway creates a new AbortController per start attempt (src/gateway/server-channels.ts), but stop/restart aborts only the controller currently stored for that account; when the second start call is deduped to the first promise, its signal is never connected to the running provider, so stop can abort the wrong controller and leave the Teams server running (or stopChannel waiting indefinitely). Deduped starts need to merge/track later abort signals instead of ignoring them.

Useful? React with 👍 / 👎.

vignesh07 · 2026-03-18T08:44:35Z

Thanks for digging into this and for putting together the repro/test case.

I pushed an update to this branch that fixes the duplicate-start race in the shared gateway channel manager instead of inside the MS Teams monitor.

Why I changed direction:

The underlying race is in the shared startup path, where two overlapping startChannel calls can both get past the pre-start checks before the task is registered.
Fixing that in the gateway prevents duplicate boots for the same account across all channels, including MS Teams, instead of adding a Teams-only guard.
I added a regression test that reproduces the problem with concurrent starts, and a second one that verifies stopping an account mid-boot cancels startup cleanly.

I also removed the extra authz change from extensions/msteams/src/monitor-handler/message-handler.ts from this PR.

Reasoning there:

It looked unrelated to the duplicate-start bug this PR is fixing.
In the current flow, that hunk did not appear to change the final group authz decision, because route allowlist enforcement and sender allowlist enforcement already happen later in the handler.
There is already targeted coverage for that route-allowlist/sender-authz scenario, and it passes on current main without that change.

So the branch is now scoped to the actual root cause: serializing per-account startup in the gateway.

Tests I ran:

pnpm test -- src/gateway/server-channels.test.ts
pnpm test -- extensions/msteams/src/monitor.lifecycle.test.ts
pnpm exec oxfmt --check src/gateway/server-channels.ts src/gateway/server-channels.test.ts
pnpm exec oxlint src/gateway/server-channels.ts src/gateway/server-channels.test.ts

Thanks again — I think this ends up as a stronger fix with a better regression test surface.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e6e9e867ea

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-18T08:46:46Z

src/gateway/server-channels.ts

+        const existingStart = store.starting.get(id);
+        if (existingStart) {
+          await existingStart;
          return;


Don't drop restart requests while an earlier start is pending

When a channel is still inside an async preflight such as plugin.config.isConfigured (for example extensions/zalouser/src/shared.ts:70), stopChannel() aborts it but returns before store.starting clears. The next startChannel() in restart flows like src/gateway/server-reload-handlers.ts:140-143 lands here, waits for the old start gate, and then returns without scheduling a new task. Once the original start notices the abort/manual stop, it exits and the channel stays offline even though the restart path completed successfully.

Useful? React with 👍 / 👎.

…ie-codes

…ie-codes)

vignesh07 · 2026-03-18T08:57:33Z

Thanks @sudie-codes!

Landed via temp rebase onto main.

Focused validation:

      -           -         - Source commit: https://github.com/openclaw/openclaw/commit/f80ed12ece232fca44cdee3d3384d207df70416b

Landed commit: 1890089

The broader CI/gate issues were being fixed separately on main during landing.

vignesh07 · 2026-03-18T08:58:12Z

Thanks again @sudie-codes!

Source commit: f80ed12
Landed commit: 1890089

The broader CI/gate issues were being fixed separately on main during landing.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f80ed12ece

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-18T09:05:54Z

src/gateway/server-channels.ts

+        const existingStart = store.starting.get(id);
+        if (existingStart) {
+          await existingStart;
          return;


Propagate preflight errors to deduplicated start callers

existingStart only waits on startGate, and startGate is always resolved in the finally block. If the original startup rejects before store.tasks is handed off — for example an async plugin.config.isConfigured() check like extensions/zalouser/src/shared.ts:70 throws — the first startChannel() rejects but every overlapping caller that hits this branch resolves successfully. Callers such as src/gateway/channel-health-monitor.ts:164-168 then record a successful restart and enter cooldown even though the channel never came up. The deduplicated path needs to await the original startup result, not a separate resolve-only gate.

Useful? React with 👍 / 👎.

@sudie-codes

* main: (230 commits) fix llm-task invalid thinking timeout Build: narrow tsdown unresolved import guard Plugins: sync contract registry image providers Build: fail on unresolved tsdown imports Deps: align pi-agent-core for declaration builds Build: fail on plugin SDK declaration errors Release: add plugin npm publish workflow (openclaw#47678) fix(security): block build-tool and glibc env injection vectors in host exec sandbox (openclaw#49702) test simplify zero-state boundary guards ci enforce boundary guardrails test: enable vmForks for targeted channel test runs test(telegram): fix incomplete sticker-cache mocks in tests Config: align model compat thinking format types Tlon: pin api-beta to current known-good commit Plugin SDK: harden provider auth seams fix(config): add missing qwen-chat-template to thinking format schema Plugin SDK: register provider auth login entrypoint Plugin SDK: split provider auth login seam fix: serialize duplicate channel starts (openclaw#49583) (thanks @sudie-codes) Telegram: fix reply-runtime test typings ...

…ie-codes

…ie-codes)

…ie-codes

…ie-codes)

…ie-codes

…ie-codes)

…ie-codes

…ie-codes)

…ie-codes

…ie-codes)

…ie-codes

…ie-codes)

…ie-codes

…ie-codes)

openclaw-barnacle bot added channel: msteams Channel integration: msteams size: S labels Mar 18, 2026

greptile-apps bot reviewed Mar 18, 2026

View reviewed changes

chatgpt-codex-connector bot reviewed Mar 18, 2026

View reviewed changes

sudie-codes mentioned this pull request Mar 18, 2026

msteams: fix sender allowlist bypass when route allowlist is configured (GHSA-g7cr-9h7q-4qxq) #49582

Merged

3 tasks

vignesh07 self-assigned this Mar 18, 2026

openclaw-barnacle bot added gateway Gateway runtime size: M and removed channel: msteams Channel integration: msteams size: S labels Mar 18, 2026

chatgpt-codex-connector bot reviewed Mar 18, 2026

View reviewed changes

fix: serialize duplicate channel starts (openclaw#49583) (thanks @sud…

f80ed12

…ie-codes)

vignesh07 force-pushed the fix/msteams-eaddrinuse-crash-loop branch from e6e9e86 to f80ed12 Compare March 18, 2026 08:57

vignesh07 merged commit 1890089 into openclaw:main Mar 18, 2026
7 checks passed

chatgpt-codex-connector bot reviewed Mar 18, 2026

View reviewed changes

github-actions bot mentioned this pull request Mar 18, 2026

📡 Upstream Digest — 2026-03-18 10:43 UTC curtismercier/openclaw-mods#297

Open

FradSer mentioned this pull request Mar 18, 2026

fix: resolve web UI chat model dropdown desync and wrong provider prefix #49757

Closed

6 tasks

ssfdre38 pushed a commit to ssfdre38/openclaw-community-edition that referenced this pull request Mar 18, 2026

fix: serialize duplicate channel starts (openclaw#49583) (thanks @sud…

2a6ea0d

…ie-codes)

analysoor-assistant pushed a commit to analysoor-assistant/openclaw that referenced this pull request Mar 19, 2026

fix: serialize duplicate channel starts (openclaw#49583) (thanks @sud…

12cb5c8

…ie-codes)

RickyTong1 pushed a commit to RickyTong1/openclaw that referenced this pull request Mar 19, 2026

fix: serialize duplicate channel starts (openclaw#49583) (thanks @sud…

cceffc2

…ie-codes)

brandontyler pushed a commit to brandontyler/clawdbot that referenced this pull request Mar 19, 2026

fix: serialize duplicate channel starts (openclaw#49583) (thanks @sud…

5bc70d0

…ie-codes)

fuller-stack-dev pushed a commit to fuller-stack-dev/openclaw that referenced this pull request Mar 20, 2026

fix: serialize duplicate channel starts (openclaw#49583) (thanks @sud…

f9dacdb

…ie-codes)

fuller-stack-dev pushed a commit to fuller-stack-dev/openclaw that referenced this pull request Mar 20, 2026

fix: serialize duplicate channel starts (openclaw#49583) (thanks @sud…

33dddf6

…ie-codes)

pholpaphankorn pushed a commit to pholpaphankorn/openclaw that referenced this pull request Mar 22, 2026

fix: serialize duplicate channel starts (openclaw#49583) (thanks @sud…

fe6d0be

…ie-codes)

BradGroux mentioned this pull request Mar 26, 2026

msteams provider exits immediately, causing infinite auto-restart loop #27885

Closed

BradGroux mentioned this pull request Mar 26, 2026

WORKING: All Microsoft Issues and PRs #49126

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(msteams): prevent EADDRINUSE crash on duplicate provider start#49583

fix(msteams): prevent EADDRINUSE crash on duplicate provider start#49583
vignesh07 merged 1 commit intoopenclaw:mainfrom
sudie-codes:fix/msteams-eaddrinuse-crash-loop

sudie-codes commented Mar 18, 2026 •

edited

Loading

Uh oh!

greptile-apps bot commented Mar 18, 2026 •

edited

Loading

Comments Outside Diff (1)

Uh oh!

greptile-apps bot Mar 18, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Mar 18, 2026

Uh oh!

vignesh07 commented Mar 18, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Mar 18, 2026

Uh oh!

Uh oh!

vignesh07 commented Mar 18, 2026

Uh oh!

vignesh07 commented Mar 18, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Mar 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

sudie-codes commented Mar 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What & Why

Screenshots

Test Results

AI Disclosure

Uh oh!

greptile-apps bot commented Mar 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 3/5

Comments Outside Diff (1)

Uh oh!

greptile-apps bot Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

vignesh07 commented Mar 18, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

vignesh07 commented Mar 18, 2026

Uh oh!

vignesh07 commented Mar 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

sudie-codes commented Mar 18, 2026 •

edited

Loading

greptile-apps bot commented Mar 18, 2026 •

edited

Loading

vignesh07 commented Mar 18, 2026 •

edited

Loading