
fix(msteams): keep provider promise pending until abort to stop auto-restart loop#22605

Closed
OpakAlex wants to merge 2 commits into openclaw:main from OpakAlex:fix/msteams-provider-promise-lifecycle
Conversation


@OpakAlex OpakAlex commented Feb 21, 2026

Summary

Describe the problem and fix in 2–5 bullets:

  • Problem: MS Teams provider logs "starting provider (port 3978)" then immediately "auto-restart attempt N/10 in Xs" in a loop; health monitor may log "restarting (reason: stopped)" and reset the attempt counter.
  • Why it matters: The channel never stays "running"; the gateway keeps restarting the provider until "giving up after 10 restart attempts."
  • What changed: monitorMSTeamsProvider returns a promise that stays pending until abort + shutdown (not on first listen); added gateway-lifecycle doc and a server-channels test that a pending startAccount does not trigger auto-restart.
  • What did NOT change (scope boundary): No gateway or health-monitor logic changes; no config/API; other channels unchanged.
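The change described above can be sketched as the following pending-until-abort pattern (a minimal sketch; `MonitorOptions` and `startServer` are illustrative placeholders, not the real monitor.ts API):

```typescript
// Hedged sketch of the pending-until-abort pattern; startServer and the
// returned shutdown function are placeholders, not the real monitor.ts API.

interface MonitorOptions {
  abortSignal?: AbortSignal;
}

// Stand-in for expressApp.listen(...): resolves once listening and
// returns an async shutdown function.
async function startServer(): Promise<() => Promise<void>> {
  return async () => {};
}

async function monitorProvider(opts: MonitorOptions): Promise<void> {
  const shutdown = await startServer();

  // Do NOT resolve after listen. Stay pending so the gateway keeps
  // treating the account as running; resolve only after abort + shutdown.
  return new Promise<void>((resolve) => {
    if (!opts.abortSignal) return; // no signal: promise never resolves (documented)
    opts.abortSignal.addEventListener(
      "abort",
      () => {
        void shutdown().then(resolve); // resolve only after cleanup finishes
      },
      { once: true },
    );
  });
}
```

The key point is that resolving right after `listen()` is what the gateway reads as "channel exited", so the promise must outlive the healthy run of the server.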

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #
  • Related #

User-visible / Behavior Changes

MS Teams channel no longer enters an auto-restart loop; provider stays running until the user stops the channel or the gateway exits. New doc for extension authors: gateway channel lifecycle (startAccount contract).

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: any
  • Runtime/container: Node 22+
  • Integration/channel: msteams enabled with valid credentials
  • Relevant config (redacted): channels.msteams.enabled, webhook.port, credentials

Steps

  1. Enable MS Teams channel and start the gateway.
  2. Watch logs for "msteams" and "auto-restart".

Expected

One "starting provider (port 3978)" and "msteams provider started on port 3978"; no repeated "auto-restart attempt N/10" unless the provider actually crashes.

Actual (before fix)

"starting provider (port 3978)" followed immediately by "auto-restart attempt 1/10 in 5s", then cycle repeats with backoff up to 10 attempts.

Evidence

  • New test: server-channels.test.ts — "does not auto-restart when startAccount promise stays pending" (startAccount returns never-resolving promise → one call, running stays true).
  • Spec: docs/channels/gateway-lifecycle.md — startAccount promise contract and MS Teams fix.
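The shape of that guardrail test can be sketched as follows (hedged; `observe` and its return shape are illustrative, not the real API in server-channels.test.ts):

```typescript
// Hedged sketch of what the new guardrail test checks. A startAccount
// promise that settles without an abort is treated as an exit, which is
// what would schedule an auto-restart in the real gateway.
type StartAccount = (opts: { abortSignal: AbortSignal }) => Promise<void>;

async function observe(start: StartAccount, windowMs: number) {
  const ac = new AbortController();
  let exited = false;
  start({ abortSignal: ac.signal }).then(() => {
    exited = true;
  });
  await new Promise((r) => setTimeout(r, windowMs));
  // Settled without an abort => the gateway would auto-restart.
  const wouldRestart = exited && !ac.signal.aborted;
  return { running: !exited, wouldRestart };
}
```

A never-resolving `startAccount` should observe as `{ running: true, wouldRestart: false }`, while one that resolves immediately after listen should observe as `wouldRestart: true` — the pre-fix behavior.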

Human Verification (required)

  • Verified scenarios: Unit test passes; code review of promise lifecycle (pending until abort, then resolve after shutdown).
  • Edge cases checked: No abort signal → promise never resolves (documented); normal stop uses abort.
  • What you did not verify: Live gateway with real MS Teams app (no credentials in env).

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No
  • If yes, exact upgrade steps: N/A

Failure Recovery (if this breaks)

  • How to disable/revert: Disable msteams channel or revert this commit.
  • Files/config to restore: None.
  • Known bad symptoms: If the abort listener failed to fire, stopping the channel could hang; in practice shutdown is still invoked on abort, so stop behavior is unchanged.

Risks and Mitigations

  • Risk: Callers that awaited the old return value for immediate use could break.
    • Mitigation: Gateway only awaits for lifecycle (stopChannel); it does not use the resolved value. No such callers in repo.
  • Risk: None otherwise.
    • Mitigation: N/A

Greptile Summary

Fixes the MS Teams provider auto-restart loop by making monitorMSTeamsProvider return a promise that stays pending while the server is running, matching the gateway's startAccount contract. Previously, the promise resolved immediately after expressApp.listen(), causing the gateway to treat the channel as "exited" and enter a restart loop.

  • extensions/msteams/src/monitor.ts: Wraps the return value in a Promise that stays pending until the abort signal fires and shutdown completes. Without an abort signal, returns a never-resolving promise.
  • src/gateway/server-channels.test.ts: Adds a test confirming that a pending startAccount promise does not trigger auto-restart.
  • docs/channels/gateway-lifecycle.md: New documentation describing the startAccount promise contract for extension channel plugins.

Confidence Score: 5/5

  • This PR is safe to merge — it's a targeted, well-tested fix that correctly aligns the MS Teams provider with the gateway's startAccount promise contract.
  • The change is minimal and focused: it wraps the existing return value in a pending promise (with abort-based resolution), which is the documented correct pattern. The fix is backed by a new test case. No gateway or health-monitor logic was changed. Early return paths for disabled/unconfigured channels are guarded by the gateway's own checks. The abort signal is always freshly created by the gateway, so there's no risk of pre-aborted signals. No new dependencies, no API changes, backward compatible.
  • No files require special attention

Last reviewed commit: 0dc50ed


@openclaw-barnacle openclaw-barnacle bot added docs Improvements or additions to documentation channel: msteams Channel integration: msteams gateway Gateway runtime size: XS size: S channel: discord Channel integration: discord app: macos App: macos security Security documentation commands Command implementations agents Agent runtime and tooling size: XL and removed size: XS size: S labels Feb 21, 2026
@OpakAlex OpakAlex force-pushed the fix/msteams-provider-promise-lifecycle branch from 6cb542d to 612f04a Compare February 21, 2026 12:12
@openclaw-barnacle openclaw-barnacle bot added size: S and removed channel: discord Channel integration: discord app: macos App: macos security Security documentation commands Command implementations agents Agent runtime and tooling size: XL labels Feb 21, 2026
@OpakAlex
Author

@steipete your commit adds HTML tags: 073651fb570a9ef555c44e4f1ea54b3d78d84ec2

## Sponsors
+
+<table align="center">
+  <tr>
+    <td align="center" valign="middle" bgcolor="#111827" width="240" height="72">
+      <a href="https://openai.com/" target="_blank" rel="noopener">
+        <img src="docs/assets/sponsors/openai.svg" alt="OpenAI" height="34" />
+      </a>
+    </td>
+    <td width="16"></td>
+    <td align="center" valign="middle" bgcolor="#111827" width="240" height="72">
+      <a href="https://blacksmith.sh/" target="_blank" rel="noopener">
+        <img src="docs/assets/sponsors/blacksmith.svg" alt="Blacksmith" height="34" />
+      </a>
+    </td>
+  </tr>
+</table>
+

Should we allow HTML tags in CI?

Thanks

@OpakAlex
Author

@obviyus Can you please check?

@OpakAlex OpakAlex force-pushed the fix/msteams-provider-promise-lifecycle branch from eb13354 to 67183f7 Compare February 21, 2026 18:05
Alexandr Opak and others added 2 commits February 21, 2026 19:17
…restart loop

- monitorMSTeamsProvider now returns a promise that stays pending until
  opts.abortSignal fires and shutdown() completes, so the gateway no
  longer treats the channel as exited and restarts it in a loop.
- Add docs/channels/gateway-lifecycle.md describing startAccount contract.
- Gateway test: startAccount that resolves on abort does not trigger
  auto-restart; call stopChannel so test cleanup exits.
- MSTeams test: use file URL for OneDrive mediaUrl so isLocalPath works
  on all platforms (e.g. Windows CI).
- Apply oxfmt to gateway-lifecycle.md and src/browser/* (format check).

Co-authored-by: Cursor <[email protected]>
@OpakAlex OpakAlex force-pushed the fix/msteams-provider-promise-lifecycle branch from 67183f7 to c94c7b3 Compare February 21, 2026 18:17
@Glucksberg
Contributor

Just noticed a connection:

Several other PRs seem to address the same problem:

Issue #22169 reports the msteams provider starting twice on gateway boot, causing EADDRINUSE; PR #22182 fixes the false auto-restart loop by keeping the provider promise pending until abort.

Both approaches have merit — might be worth coordinating.

Related issue(s): #22169

If any of these links don't look right, let me know and I'll correct them.

@openclaw-barnacle

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

@openclaw-barnacle openclaw-barnacle bot added the stale Marked as stale due to inactivity label Mar 1, 2026
@pandego
Contributor

pandego commented Mar 1, 2026

Closing this PR as superseded by follow-up work on the same issue path.

The msteams lifecycle fix is covered in #22182, so I am closing this one to keep the queue clean and avoid duplicate maintenance.

Thanks everyone for the review context and cross-links.

@BradGroux
Contributor

Field report from a live production recovery (sanitized, no secrets / no env values). Posting this in case it helps maintainers and others triaging the same restart-loop class.

Executive summary

We hit a Microsoft Teams provider auto-restart loop with the same user-visible signature described here:

  • provider logs startup and directory/channel resolution
  • then immediately reports auto-restart attempts (1/10, 2/10, ...)
  • loop persists despite successful startup-side lookups

In our incident, there were three independent contributors. Fixing only one did not fully resolve the loop:

  1. Package collision (legacy global package co-installed with current package)
  2. Missing Teams bot credential field (app secret not set)
  3. Upstream lifecycle bug pattern (provider promise appears to resolve too early, matching this issue family)

What we observed (timeline-style)

Phase A — restart loop with repeated startup success signals

  • Teams provider repeatedly logged startup, user resolution, and channel resolution.
  • Immediately after these success logs, gateway health/auto-restart kicked in.
  • Backoff pattern matched known restart-loop behavior.

Phase B — eliminated local package-collision factor

  • Found old global package and current package both installed.
  • Removed legacy global package.
  • Result: removed one clear conflict vector, but loop still present.

Phase C — fixed missing credential config

  • Added missing Teams bot app password field (client secret value) in channel config.
  • Triggered config reload / gateway restart.
  • Result: credential state improved, but restart loop still present.

Phase D — remaining behavior matches known monitor/lifecycle bug class

  • Even after cleanup + credentials corrected, pattern remained:
    • start provider
    • resolve users/channels
    • immediate auto-restart
  • This aligns with issue reports where monitor/start function resolves prematurely and channel manager interprets that as provider exit.

Distinguishing signals that helped triage

These indicators were the most useful to separate local misconfiguration from upstream bug behavior:

  1. Success-before-failure pattern: if user/group/channel resolution consistently succeeds before each restart, network/auth is unlikely to be the primary blocker.
  2. Stable repeating loop shape: a consistent startup → resolution → restart sequence with backoff strongly suggests a lifecycle contract mismatch.
  3. Persistence across remediation layers: if the loop persists after removing duplicate installs, setting required credentials, and a clean restart, the upstream monitor lifecycle is likely involved.

Secure remediation checklist that worked best for us

(Generalized to avoid host-specific details)

1) Eliminate package duplication first

  • Verify only one canonical global installation is active.
  • Remove legacy/conflicting package names that may still register plugins.
  • Re-check plugin load path consistency.

2) Validate Teams auth fields completely

  • Ensure app ID, tenant ID, and app password (client secret value) are all present.
  • Confirm config validation passes before restart.

3) Restart and verify by behavior, not process state

  • Don’t trust “running” status alone.
  • Verify no new auto-restart attempt lines over a timed observation window.
  • Verify inbound/outbound Teams flow during that same window.

4) If still looping, classify as likely upstream lifecycle bug

  • Capture exact log sequence around each restart boundary.
  • Attach sanitized sequence to issue/PR for maintainers.
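Step 3 of the checklist above can be checked mechanically by counting restart-scheduling lines in a captured log window (a hedged sketch; the regex matches the "auto-restart attempt N/10 in Xs" message quoted in this thread, which may differ across versions):

```typescript
// Count auto-restart scheduling lines in a captured log window.
// Pattern matches the message quoted in this thread:
//   "auto-restart attempt 1/10 in 5s"
const RESTART_LINE = /auto-restart attempt (\d+)\/(\d+) in \d+s/;

function countRestartAttempts(logWindow: string): number {
  return logWindow.split("\n").filter((line) => RESTART_LINE.test(line))
    .length;
}
```

A healthy timed observation window should report zero attempts; anything above zero within the window means the loop is still active, regardless of what the process status says.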

What to include in repro data (high value for maintainers)

Recommend sharing these artifacts (all sanitized):

  • startup line for msteams provider
  • user/group/channel resolution lines
  • first line indicating restart scheduling
  • health-monitor line indicating “reason: stopped” (if present)
  • whether duplicate package installations were found
  • whether credentials were complete at time of test
  • whether loop persists after those are corrected

Suggested maintainer-facing acceptance test

A robust guardrail test for this bug class would assert:

  1. startAccount() for Teams does not resolve while provider is healthy.
  2. Resolver success (users/channels) alone does not trigger subsystem restart logic.
  3. Restart path only activates on explicit stop, fatal error, or abort signal.
  4. No duplicate listener bind occurs during healthy run (prevents EADDRINUSE cascades).
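Assertion 4 can be sketched as a standalone check (hedged; this uses node:net directly for illustration, while the real provider binds via Express on port 3978): a second bind on an already-bound port must surface EADDRINUSE rather than silently double-listening.

```typescript
// Hedged sketch of assertion 4: a second bind on the same port must fail
// with EADDRINUSE, the cascade the acceptance test guards against.
import net from "node:net";

function listen(port: number): Promise<net.Server> {
  return new Promise((resolve, reject) => {
    const srv = net.createServer();
    srv.once("error", reject);
    srv.listen(port, "127.0.0.1", () => resolve(srv));
  });
}

async function bindTwice(): Promise<string> {
  const first = await listen(0); // ephemeral port
  const { port } = first.address() as net.AddressInfo;
  try {
    const second = await listen(port);
    second.close();
    return "double-bind"; // would indicate the bug class
  } catch (err) {
    return (err as NodeJS.ErrnoException).code ?? "unknown";
  } finally {
    first.close();
  }
}
```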

Current status from this field report

  • Local config/package hygiene issues: addressed.
  • Remaining loop signature: still consistent with upstream lifecycle bug described in this issue family.
  • Net: this report supports merging a monitor/start lifecycle fix that keeps provider task pending until true shutdown.

If helpful, I can provide a compact sanitized log excerpt in follow-up showing exact line ordering (startup → resolution → restart) without any identifiers.

@steipete
Contributor

steipete commented Mar 2, 2026

Superseded by already-merged MSTeams lifecycle work:

The provider lifecycle now stays pending until abort/shutdown, with monitor lifecycle regression coverage in main. This addresses the same restart-loop root cause.

Closing as duplicate/superseded to keep the queue focused. Thank you for the thorough write-up and docs context.

@steipete steipete closed this Mar 2, 2026

Labels

channel: msteams Channel integration: msteams docs Improvements or additions to documentation gateway Gateway runtime size: S stale Marked as stale due to inactivity
