
fix(msteams): await abort signal to prevent EADDRINUSE restart loop#25582

Closed
byungsker wants to merge 1 commit into openclaw:main from byungsker:fix/msteams-eaddrinuse-await-until-abort

Conversation


@byungsker byungsker commented Feb 24, 2026

Problem

monitorMSTeamsProvider() returned as soon as expressApp.listen() bound to the port. The gateway's startAccount runner treats a resolved promise as "provider stopped" and schedules an auto-restart; the second bind attempt then fails with EADDRINUSE, producing an infinite restart loop until MAX_RESTART_ATTEMPTS is exhausted:

[default] auto-restart attempt 1/10 in 5s
EADDRINUSE: address already in use :::3978
[default] auto-restart attempt 2/10 in 11s
...

Fix

Replace the fire-and-forget abort listener with an await-until-abort block so the startAccount promise stays pending while the HTTP server is running:

-  // Handle abort signal
+  // Keep the provider alive until the abort signal fires.
+  // Without this await the startAccount promise resolves immediately after
+  // expressApp.listen() binds to the port, causing the gateway to interpret
+  // the provider as "stopped" and triggering an auto-restart loop that quickly
+  // fails with EADDRINUSE on the second bind attempt.
   if (opts.abortSignal) {
-    opts.abortSignal.addEventListener("abort", () => {
-      void shutdown();
-    });
+    if (!opts.abortSignal.aborted) {
+      await new Promise<void>((resolve) => {
+        opts.abortSignal!.addEventListener("abort", () => resolve(), { once: true });
+      });
+    }
+    await shutdown();
   }

Behaviour matrix:

| abortSignal | aborted on entry | Result |
| --- | --- | --- |
| provided | no | awaits abort → calls `shutdown()` |
| provided | yes | calls `shutdown()` immediately |
| absent | — | returns immediately (unchanged) |

This matches the pattern already used by the zalouser monitor (extensions/zalouser/src/monitor.ts).
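The pattern above can be exercised in isolation. Below is a minimal, self-contained sketch (the `monitor`/`shutdown` names and the event log are illustrative, not the actual extension code) showing that the promise stays pending until the signal fires:

```typescript
// Minimal sketch of the await-until-abort pattern. `monitor` and
// `shutdown` are illustrative stand-ins, not the real provider code.
async function monitor(opts: { abortSignal?: AbortSignal }): Promise<string[]> {
  const events: string[] = [];
  const shutdown = async () => { events.push("shutdown"); };

  events.push("listening"); // stands in for expressApp.listen()

  if (opts.abortSignal) {
    if (!opts.abortSignal.aborted) {
      // Stay pending until the signal fires, so the caller cannot
      // mistake a resolved promise for "provider stopped".
      await new Promise<void>((resolve) => {
        opts.abortSignal!.addEventListener("abort", () => resolve(), { once: true });
      });
    }
    await shutdown();
  }
  return events;
}

// Usage: the promise only settles after abort() is called.
const ctrl = new AbortController();
const run = monitor({ abortSignal: ctrl.signal });
ctrl.abort();
run.then((events) => console.log(events.join(","))); // "listening,shutdown"
```

Note the `{ once: true }` option: the listener is removed after firing, so a signal that aborts repeatedly (or a reused controller) cannot resolve the promise twice.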

Fixes #25527

Greptile Summary

Fixes the MS Teams provider's EADDRINUSE restart loop by keeping the monitorMSTeamsProvider promise pending until the abort signal fires, rather than resolving immediately after expressApp.listen() binds to the port.

  • Replaces the fire-and-forget addEventListener("abort", ...) with an await new Promise that blocks until abort, followed by await shutdown() — matching the established pattern in extensions/zalouser/src/monitor.ts
  • Correctly handles edge cases: signal not yet aborted (awaits), already aborted on entry (immediate shutdown), and absent signal (returns immediately, unchanged behavior)
  • No issues found — the fix is minimal, well-targeted, and well-documented with a clear comment explaining the rationale

Confidence Score: 5/5

  • This PR is safe to merge — it's a focused, minimal fix that follows an established pattern in the codebase.
  • The change is small (7 lines net), well-documented, and directly addresses a clear bug (EADDRINUSE restart loop). It follows the same await-until-abort pattern already used by the zalouser monitor. All three abort-signal cases (provided/not-aborted, provided/already-aborted, absent) are handled correctly. The gateway always passes abortSignal when calling this function, so the fix covers the production path. No new dependencies, no behavioral changes to other code paths.
  • No files require special attention.

Last reviewed commit: 1d92f36

monitorMSTeamsProvider() returned immediately after expressApp.listen()
bound to the port. The gateway's startAccount runner treats a resolved
promise as "provider stopped" and schedules an auto-restart; the second
bind attempt then fails with EADDRINUSE, producing an infinite restart
loop until MAX_RESTART_ATTEMPTS is exhausted.

Replace the fire-and-forget abort listener with an await-until-abort
block: when an abortSignal is provided the function now stays pending
until the signal fires, then calls shutdown() before returning. If the
signal is already aborted on entry the shutdown is called immediately.
When no abortSignal is provided the existing behaviour is preserved
(server keeps running; caller can invoke shutdown() directly).

Fixes openclaw#25527
@BradGroux
Contributor

Field report from a live production recovery (sanitized, no secrets / no env values). Posting this in case it helps maintainers and others triaging the same restart-loop class.

Executive summary

We hit a Microsoft Teams provider auto-restart loop with the same user-visible signature described here:

  • provider logs startup and directory/channel resolution
  • then immediately reports auto-restart attempts (1/10, 2/10, ...)
  • loop persists despite successful startup-side lookups

In our incident, there were three independent contributing factors. Fixing only one did not fully resolve the loop:

  1. Package collision (legacy global package co-installed with current package)
  2. Missing Teams bot credential field (app secret not set)
  3. Upstream lifecycle bug pattern (provider promise appears to resolve too early, matching this issue family)

What we observed (timeline-style)

Phase A — restart loop with repeated startup success signals

  • Teams provider repeatedly logged startup, user resolution, and channel resolution.
  • Immediately after these success logs, gateway health/auto-restart kicked in.
  • Backoff pattern matched known restart-loop behavior.

Phase B — eliminated local package-collision factor

  • Found old global package and current package both installed.
  • Removed legacy global package.
  • Result: removed one clear conflict vector, but loop still present.

Phase C — fixed missing credential config

  • Added missing Teams bot app password field (client secret value) in channel config.
  • Triggered config reload / gateway restart.
  • Result: credential state improved, but restart loop still present.

Phase D — remaining behavior matches known monitor/lifecycle bug class

  • Even after cleanup + credentials corrected, pattern remained:
    • start provider
    • resolve users/channels
    • immediate auto-restart
  • This aligns with issue reports where monitor/start function resolves prematurely and channel manager interprets that as provider exit.

Distinguishing signals that helped triage

These indicators were the most useful to separate local misconfiguration from upstream bug behavior:

  1. Success-before-failure pattern

    • If user/group/channel resolution consistently succeeds before restart, network/auth may not be the primary blocker.
  2. Stable repeating loop shape

    • Consistent startup → resolution → restart sequence with backoff strongly suggests lifecycle contract mismatch.
  3. Persistence across remediation layers

    • If loop persists after:
      • removing duplicate installs,
      • setting required credentials,
      • clean restart,
        then upstream monitor lifecycle is likely involved.

Secure remediation checklist that worked best for us

(Generalized to avoid host-specific details)

1) Eliminate package duplication first

  • Verify only one canonical global installation is active.
  • Remove legacy/conflicting package names that may still register plugins.
  • Re-check plugin load path consistency.

2) Validate Teams auth fields completely

  • Ensure app ID, tenant ID, and app password (client secret value) are all present.
  • Confirm config validation passes before restart.

3) Restart and verify by behavior, not process state

  • Don’t trust “running” status alone.
  • Verify no new auto-restart attempt lines over a timed observation window.
  • Verify inbound/outbound Teams flow during that same window.
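The behavioral check in step 3 can be automated against captured gateway logs. A rough sketch (the log format is illustrative, taken from the restart lines quoted earlier; `windowLogs` is a hypothetical sample from the observation window):

```typescript
// Rough sketch: declare the provider healthy only if no auto-restart
// attempt line appears in the timed observation window.
// The regex matches lines like "[default] auto-restart attempt 1/10 in 5s".
function hasRestartAttempt(logLines: string[]): boolean {
  return logLines.some((line) => /auto-restart attempt \d+\/\d+/.test(line));
}

// Hypothetical log sample from a healthy observation window.
const windowLogs = [
  "[default] msteams provider started",
  "[default] resolved 3 channels",
];
console.log(hasRestartAttempt(windowLogs) ? "still looping" : "no restarts observed");
```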

4) If still looping, classify as likely upstream lifecycle bug

  • Capture exact log sequence around each restart boundary.
  • Attach sanitized sequence to issue/PR for maintainers.

What to include in repro data (high value for maintainers)

Recommend sharing these artifacts (all sanitized):

  • startup line for msteams provider
  • user/group/channel resolution lines
  • first line indicating restart scheduling
  • health-monitor line indicating “reason: stopped” (if present)
  • whether duplicate package installations were found
  • whether credentials were complete at time of test
  • whether loop persists after those are corrected

Suggested maintainer-facing acceptance test

A robust guardrail test for this bug class would assert:

  1. startAccount() for Teams does not resolve while provider is healthy.
  2. Resolver success (users/channels) alone does not trigger subsystem restart logic.
  3. Restart path only activates on explicit stop, fatal error, or abort signal.
  4. No duplicate listener bind occurs during healthy run (prevents EADDRINUSE cascades).
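Guardrail 1 above can be sketched as a plain async assertion without any test framework. `fakeMonitor` is a hypothetical stand-in for the real provider entry point; the 50 ms window is an arbitrary "healthy runtime" interval for illustration:

```typescript
// Sketch of guardrail 1: the monitor promise must not settle while the
// provider is healthy, and must settle promptly on explicit abort.
async function fakeMonitor(signal: AbortSignal): Promise<void> {
  if (!signal.aborted) {
    await new Promise<void>((resolve) =>
      signal.addEventListener("abort", () => resolve(), { once: true }),
    );
  }
}

async function assertStaysPending(): Promise<void> {
  const ctrl = new AbortController();
  let settled = false;
  const run = fakeMonitor(ctrl.signal).then(() => { settled = true; });

  // Give the provider a few ticks of "healthy" runtime.
  await new Promise((r) => setTimeout(r, 50));
  if (settled) throw new Error("monitor resolved while provider was healthy");

  ctrl.abort();  // explicit stop
  await run;     // now it must settle promptly
  if (!settled) throw new Error("monitor did not settle after abort");
  console.log("ok");
}

assertStaysPending();
```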

Current status from this field report

  • Local config/package hygiene issues: addressed.
  • Remaining loop signature: still consistent with upstream lifecycle bug described in this issue family.
  • Net: this report supports merging a monitor/start lifecycle fix that keeps provider task pending until true shutdown.

If helpful, I can provide a compact sanitized log excerpt in follow-up showing exact line ordering (startup → resolution → restart) without any identifiers.

@steipete
Contributor

steipete commented Mar 2, 2026

Superseded by already-merged MSTeams monitor lifecycle fixes:

monitorMSTeamsProvider now keeps the provider run pending until abort/shutdown, resolves startup/close races, and includes regression coverage in main.

Closing this as duplicate to keep the queue focused. Thank you for the patch and write-up.

@steipete steipete closed this Mar 2, 2026

Labels

channel: msteams (Channel integration: msteams) · size: XS

Projects

None yet

Development

Successfully merging this pull request may close these issues.

MS Teams provider: EADDRINUSE restart loop — missing await-until-abort in monitorMSTeamsProvider

3 participants