
fix(msteams): block provider promise until abort signal fires#27968

Closed
Anandesh-Sharma wants to merge 1 commit into openclaw:main from Anandesh-Sharma:fix/msteams-provider-blocking-27885

Conversation

@Anandesh-Sharma
Contributor

Summary

  • Fix msteams provider exiting immediately after Express server start, causing infinite auto-restart loop in the channel manager
  • Add the same blocking await pattern used by the Slack provider: the monitor promise now only resolves when the abort signal fires
  • Add { once: true } to the existing abort listener to prevent event listener leaks

Root Cause

monitorMSTeamsProvider() returned { app, shutdown } immediately after expressApp.listen(). The channel manager in server-channels.ts treats promise resolution as "provider stopped" and triggers auto-restart with exponential backoff — resulting in 10 restart attempts cycling indefinitely.
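The interaction can be illustrated with a minimal sketch. Names here are hypothetical (`runWithRestarts` and its parameters are illustrative, not the actual `server-channels.ts` code); it only shows why a monitor that resolves right after startup produces a restart cycle:

```typescript
// Hypothetical illustration, not server-channels.ts source: a manager that
// treats the run promise resolving as "provider stopped" will keep restarting
// a provider whose monitor returns immediately after startup.
async function runWithRestarts(
  startProvider: () => Promise<void>,
  maxAttempts: number,
  backoffMs: (attempt: number) => number,
): Promise<number> {
  let attempt = 0;
  while (attempt < maxAttempts) {
    await startProvider(); // resolving here is read as "provider stopped"
    attempt += 1;
    await new Promise((r) => setTimeout(r, backoffMs(attempt)));
  }
  return attempt; // gave up after maxAttempts restart attempts
}
```

With the pre-fix msteams monitor, `startProvider` resolves instantly every time, so the loop burns through all attempts back to back.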

The Slack provider (src/slack/monitor/provider.ts:355-359) correctly blocks with:

await new Promise<void>((resolve) => {
  opts.abortSignal?.addEventListener("abort", () => resolve(), { once: true });
});

This same pattern is now applied to the msteams provider.

Test plan

  • Start msteams provider and verify it stays running (no auto-restart messages in logs)
  • Send abort signal and verify clean shutdown
  • Verify no event listener leaks with { once: true }

Closes #27885

🤖 Generated with Claude Code

The msteams provider returned immediately after starting the Express
server. The channel manager in server-channels.ts treats promise
resolution as "provider stopped" and triggers auto-restart with
exponential backoff, causing an infinite restart loop.

Add the same blocking pattern used by the Slack provider: await a
promise that only resolves when the abort signal fires. Also add
{ once: true } to the existing abort listener to prevent leaks.

Closes openclaw#27885

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@openclaw-barnacle bot added the channel: msteams and size: XS labels on Feb 26, 2026

@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 83caf25d28


Comment on lines +311 to +315
if (opts.abortSignal?.aborted) {
  resolve();
  return;
}
opts.abortSignal?.addEventListener("abort", () => resolve(), { once: true });


P1: Propagate server startup failures out of abort wait

This new wait only resolves when opts.abortSignal aborts. If expressApp.listen fails (for example with EADDRINUSE), the error handler logs the failure, but this promise never settles, and the channel manager keeps the account marked as running instead of entering its restart/error path. In practice, a failed MSTeams bind can now hang indefinitely until a manual stop triggers abort, so startup failures should also resolve or reject this wait.


@greptile-apps
Contributor

greptile-apps bot commented Feb 26, 2026

Greptile Summary

Prevents infinite restart loop by blocking the provider promise until abort signal fires. Previously returned immediately after expressApp.listen(), causing channel manager to interpret instant resolution as provider crash and trigger auto-restart. Now matches Slack provider pattern: holds promise until abort fires, ensuring controlled lifecycle.

  • Added blocking promise at extensions/msteams/src/monitor.ts:310-316 that only resolves when abortSignal fires
  • Added { once: true } to abort listener at line 304 to prevent event listener leaks
  • Includes early-exit check for already-aborted signals
  • Consistent with Slack provider pattern (src/slack/monitor/provider.ts:361-365)

Confidence Score: 5/5

  • Safe to merge - straightforward fix for critical auto-restart bug
  • Fix directly addresses the root cause (immediate promise resolution) using established pattern from Slack provider. Changes are minimal, well-commented, and include defensive programming ({ once: true }, early abort check). No new dependencies or complex logic introduced.
  • No files require special attention

Last reviewed commit: 83caf25

@BradGroux
Contributor

Field report from a live production recovery (sanitized, no secrets / no env values). Posting this in case it helps maintainers and others triaging the same restart-loop class.

Executive summary

We hit a Microsoft Teams provider auto-restart loop with the same user-visible signature described here:

  • provider logs startup and directory/channel resolution
  • then immediately reports auto-restart attempts (1/10, 2/10, ...)
  • loop persists despite successful startup-side lookups

In our incident, there were three independent contributors. Fixing only one did not fully resolve the loop:

  1. Package collision (legacy global package co-installed with current package)
  2. Missing Teams bot credential field (app secret not set)
  3. Upstream lifecycle bug pattern (provider promise appears to resolve too early, matching this issue family)

What we observed (timeline-style)

Phase A — restart loop with repeated startup success signals

  • Teams provider repeatedly logged startup, user resolution, and channel resolution.
  • Immediately after these success logs, gateway health/auto-restart kicked in.
  • Backoff pattern matched known restart-loop behavior.

Phase B — eliminated local package-collision factor

  • Found old global package and current package both installed.
  • Removed legacy global package.
  • Result: removed one clear conflict vector, but loop still present.

Phase C — fixed missing credential config

  • Added missing Teams bot app password field (client secret value) in channel config.
  • Triggered config reload / gateway restart.
  • Result: credential state improved, but restart loop still present.

Phase D — remaining behavior matches known monitor/lifecycle bug class

  • Even after cleanup + credentials corrected, pattern remained:
    • start provider
    • resolve users/channels
    • immediate auto-restart
  • This aligns with issue reports where monitor/start function resolves prematurely and channel manager interprets that as provider exit.

Distinguishing signals that helped triage

These indicators were the most useful to separate local misconfiguration from upstream bug behavior:

  1. Success-before-failure pattern

    • If user/group/channel resolution consistently succeeds before restart, network/auth may not be the primary blocker.
  2. Stable repeating loop shape

    • Consistent startup → resolution → restart sequence with backoff strongly suggests lifecycle contract mismatch.
  3. Persistence across remediation layers

    • If the loop persists after removing duplicate installs, setting required credentials, and a clean restart, then the upstream monitor lifecycle is likely involved.

Secure remediation checklist that worked best for us

(Generalized to avoid host-specific details)

1) Eliminate package duplication first

  • Verify only one canonical global installation is active.
  • Remove legacy/conflicting package names that may still register plugins.
  • Re-check plugin load path consistency.

2) Validate Teams auth fields completely

  • Ensure app ID, tenant ID, and app password (client secret value) are all present.
  • Confirm config validation passes before restart.

3) Restart and verify by behavior, not process state

  • Don’t trust “running” status alone.
  • Verify no new auto-restart attempt lines over a timed observation window.
  • Verify inbound/outbound Teams flow during that same window.

4) If still looping, classify as likely upstream lifecycle bug

  • Capture exact log sequence around each restart boundary.
  • Attach sanitized sequence to issue/PR for maintainers.

What to include in repro data (high value for maintainers)

Recommend sharing these artifacts (all sanitized):

  • startup line for msteams provider
  • user/group/channel resolution lines
  • first line indicating restart scheduling
  • health-monitor line indicating “reason: stopped” (if present)
  • whether duplicate package installations were found
  • whether credentials were complete at time of test
  • whether loop persists after those are corrected

Suggested maintainer-facing acceptance test

A robust guardrail test for this bug class would assert:

  1. startAccount() for Teams does not resolve while provider is healthy.
  2. Resolver success (users/channels) alone does not trigger subsystem restart logic.
  3. Restart path only activates on explicit stop, fatal error, or abort signal.
  4. No duplicate listener bind occurs during healthy run (prevents EADDRINUSE cascades).
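Assertion 1 in the list above can be sketched as a pending-check helper (hypothetical; `staysPending` is illustrative, not part of the openclaw test suite):

```typescript
// Hypothetical guardrail helper: race the provider's run promise against a
// timer. If the timer wins, the promise was still pending over the window,
// which is what a healthy provider should look like.
async function staysPending(run: Promise<unknown>, windowMs: number): Promise<boolean> {
  const sentinel = Symbol("pending");
  const winner = await Promise.race([
    run,
    new Promise<typeof sentinel>((r) => setTimeout(() => r(sentinel), windowMs)),
  ]);
  return winner === sentinel; // true => still pending after the window
}
```

A test would assert `staysPending(startAccountRun, window)` is true while healthy, and false once an abort or fatal error settles the run.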

Current status from this field report

  • Local config/package hygiene issues: addressed.
  • Remaining loop signature: still consistent with upstream lifecycle bug described in this issue family.
  • Net: this report supports merging a monitor/start lifecycle fix that keeps provider task pending until true shutdown.

If helpful, I can provide a compact sanitized log excerpt in follow-up showing exact line ordering (startup → resolution → restart) without any identifiers.

@steipete
Contributor

steipete commented Mar 2, 2026

Superseded by already-merged MSTeams monitor lifecycle fixes:

monitorMSTeamsProvider now keeps the provider run pending until abort/shutdown, resolves startup/close races, and includes regression coverage in main.

Closing this as duplicate to keep the queue focused. Thank you for the patch and write-up.

@steipete steipete closed this Mar 2, 2026

Labels

channel: msteams, size: XS


Development

Successfully merging this pull request may close these issues.

msteams provider exits immediately, causing infinite auto-restart loop
