Skip to content

fix(msteams): inject system event on message delivery failure#60430

Merged
BradGroux merged 2 commits intoopenclaw:mainfrom
BradGroux:bgod/fix-53911-delivery-failure-notify
Apr 4, 2026
Merged

fix(msteams): inject system event on message delivery failure#60430
BradGroux merged 2 commits intoopenclaw:mainfrom
BradGroux:bgod/fix-53911-delivery-failure-notify

Conversation

@BradGroux
Copy link
Copy Markdown
Contributor

Fixes #53911

When message delivery to Teams fails (Bot Framework 502, timeout, etc.), the agent now receives a system event notification so it knows the user didn't receive the response.

Changes:

  • Added delivery failure detection with error classification (transient/permanent) in reply-dispatcher.ts
  • Injects a system event via enqueueSystemEvent on delivery failure
  • Added 2 new tests covering the failure → notification path

Testing:

  • pnpm test -- extensions/msteams/src/reply-dispatcher.test.ts — 12/12 pass
  • pnpm check — clean

Copilot AI review requested due to automatic review settings April 3, 2026 16:57
@openclaw-barnacle openclaw-barnacle bot added docs Improvements or additions to documentation channel: msteams Channel integration: msteams agents Agent runtime and tooling extensions: stepfun size: L maintainer Maintainer-authored PR labels Apr 3, 2026
@BradGroux BradGroux self-assigned this Apr 3, 2026
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 3, 2026

Greptile Summary

This PR bundles two independent changes: the stated MSTeams fix (inject a system event when Bot Framework message delivery fails) and a new StepFun provider plugin (extensions/stepfun) that was co-landed from PR #60032.

MSTeams delivery-failure notification (extensions/msteams)

  • createMSTeamsReplyDispatcher now accepts sessionKey and, in flushPendingMessages, calls queueDeliveryFailureSystemEvent if any individual retry fails after a batch failure.
  • Error classification, message formatting, and contextKey scoping to the conversation ID look correct.
  • message-handler.ts wires route.sessionKey through — the one-liner is complete.
  • Two new tests cover the failure path and the success (no-event) path; both exercise the real markDispatchIdleflushPendingMessages flow.

StepFun provider plugin (extensions/stepfun)

  • Adds stepfun (standard) and stepfun-plan providers with China/global endpoints, step-3.5-flash on both catalogs, and step-3.5-flash-2603 on Step Plan only.
  • Region resolution is layered cleanly: explicit base URL → cross-surface base URL → profile ID suffix → STEPFUN_API_KEY env → intl default.
  • Manifest, package, catalog, onboard helpers, labeler entry, generated env-var file, docs, and provider catalog tests are all present.

Minor observations:

  • The two new test assertions for enqueueSystemEventMock both match the same single call; adding toHaveBeenCalledTimes(1) would make the expectation unambiguous.
  • The \"extensions: stepfun\" entry in .github/labeler.yml is inserted between deepseek and anthropic, breaking the alphabetical ordering of the extension label block.

Confidence Score: 4/5

Safe to merge; both changes are additive with no breaking surface modifications.

The MSTeams fix is narrowly scoped, well-tested, and handles edge cases correctly (lastFailedError initialisation, failed === 0 guard). The StepFun extension follows established provider plugin patterns and includes catalog, onboard, manifest, docs, generated files, and integration tests. The only nits are a mildly misleading test assertion structure and an out-of-order labeler entry — neither blocks merging.

No files require special attention; the two P2 style notes are in extensions/msteams/src/reply-dispatcher.test.ts and .github/labeler.yml.

Prompt To Fix All With AI
This is a comment left during a code review.
Path: .github/labeler.yml
Line: 249-252

Comment:
**Out-of-alphabetical-order label entry**

The `"extensions: stepfun"` block is inserted between `"extensions: deepseek"` and `"extensions: anthropic"`, which breaks the alphabetical ordering of the extension label section (D → S → A). `stepfun` should appear after `sglang`/`signal` and before `telegram` in the sorted block.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: extensions/msteams/src/reply-dispatcher.test.ts
Line: 239-249

Comment:
**Two assertions, one call — potentially misleading structure**

`queueDeliveryFailureSystemEvent` calls `core.system.enqueueSystemEvent` exactly once, producing a single joined sentence. Both `toHaveBeenCalledWith` matchers here match that same call because the full message contains both substrings. The two-assertion pattern looks as if two separate calls are expected, which could confuse future maintainers.

Consider adding `toHaveBeenCalledTimes(1)` alongside the existing assertions to make the single-call expectation explicit, and collapse the substring checks into one matcher to remove any ambiguity about how many times `enqueueSystemEvent` fires per failure batch.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "feat: add bundled StepFun provider plugi..." | Re-trigger Greptile

Comment on lines +249 to +252
"extensions: stepfun":
- changed-files:
- any-glob-to-any-file:
- "extensions/stepfun/**"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Out-of-alphabetical-order label entry

The "extensions: stepfun" block is inserted between "extensions: deepseek" and "extensions: anthropic", which breaks the alphabetical ordering of the extension label section (D → S → A). stepfun should appear after sglang/signal and before telegram in the sorted block.

Prompt To Fix With AI
This is a comment left during a code review.
Path: .github/labeler.yml
Line: 249-252

Comment:
**Out-of-alphabetical-order label entry**

The `"extensions: stepfun"` block is inserted between `"extensions: deepseek"` and `"extensions: anthropic"`, which breaks the alphabetical ordering of the extension label section (D → S → A). `stepfun` should appear after `sglang`/`signal` and before `telegram` in the sorted block.

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines +239 to +249
expect(enqueueSystemEventMock).toHaveBeenCalledWith(
expect.stringContaining("Microsoft Teams delivery failed"),
expect.objectContaining({
sessionKey: "agent:main:main",
contextKey: "msteams:delivery-failure:conv",
}),
);
expect(enqueueSystemEventMock).toHaveBeenCalledWith(
expect.stringContaining("The user may not have received the full reply"),
expect.any(Object),
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Two assertions, one call — potentially misleading structure

queueDeliveryFailureSystemEvent calls core.system.enqueueSystemEvent exactly once, producing a single joined sentence. Both toHaveBeenCalledWith matchers here match that same call because the full message contains both substrings. The two-assertion pattern looks as if two separate calls are expected, which could confuse future maintainers.

Consider adding toHaveBeenCalledTimes(1) alongside the existing assertions to make the single-call expectation explicit, and collapse the substring checks into one matcher to remove any ambiguity about how many times enqueueSystemEvent fires per failure batch.

Prompt To Fix With AI
This is a comment left during a code review.
Path: extensions/msteams/src/reply-dispatcher.test.ts
Line: 239-249

Comment:
**Two assertions, one call — potentially misleading structure**

`queueDeliveryFailureSystemEvent` calls `core.system.enqueueSystemEvent` exactly once, producing a single joined sentence. Both `toHaveBeenCalledWith` matchers here match that same call because the full message contains both substrings. The two-assertion pattern looks as if two separate calls are expected, which could confuse future maintainers.

Consider adding `toHaveBeenCalledTimes(1)` alongside the existing assertions to make the single-call expectation explicit, and collapse the substring checks into one matcher to remove any ambiguity about how many times `enqueueSystemEvent` fires per failure batch.

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves reliability/observability for Microsoft Teams replies by notifying the agent when outbound delivery fails, and it also introduces a new bundled StepFun model-provider plugin (including onboarding + docs) so StepFun models can be configured and discovered automatically.

Changes:

  • MSTeams: detect message delivery failures during flush fallback sending and enqueue a system event to the agent session.
  • Providers: add bundled StepFun provider plugin (stepfun, stepfun-plan) with region-aware base URL inference and onboarding presets.
  • Docs/tests/metadata: add StepFun docs + changelog entry + labeler rule, plus new provider-resolution tests and env var plumbing.

Reviewed changes

Copilot reviewed 20 out of 21 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/plugins/bundled-provider-auth-env-vars.generated.ts Adds STEPFUN_API_KEY mapping for stepfun / stepfun-plan provider auth discovery.
src/agents/models-config.providers.stepfun.test.ts Adds StepFun implicit-provider resolution tests (env + profiles + baseUrl inference).
src/agents/models-config.e2e-harness.ts Allows STEPFUN env var to be preserved in implicit-provider test harness env snapshots.
pnpm-lock.yaml Registers new extensions/stepfun workspace importer.
extensions/stepfun/provider-catalog.ts Defines StepFun/Step Plan provider IDs, endpoints, and built-in model catalogs.
extensions/stepfun/package.json Declares the StepFun extension package and entrypoint.
extensions/stepfun/openclaw.plugin.json Declares StepFun plugin manifest, auth choices, and env-var wiring.
extensions/stepfun/onboard.ts Provides onboarding preset appliers for standard/plan and CN/Intl endpoints.
extensions/stepfun/index.ts Registers both providers + auth flows and implements region/baseUrl inference for catalog discovery.
extensions/msteams/src/reply-dispatcher.ts Injects a system event when message delivery fails during flush/send fallback logic.
extensions/msteams/src/reply-dispatcher.test.ts Adds tests covering failure → system-event notification and success → no notification.
extensions/msteams/src/monitor-handler/message-handler.ts Passes sessionKey into reply dispatcher to support system event enqueueing.
docs/start/wizard-cli-reference.md Updates wizard docs to include StepFun in onboarding options.
docs/reference/wizard.md Adds StepFun to wizard reference provider list/details.
docs/providers/stepfun.md New StepFun provider documentation page (endpoints, onboarding, model refs, snippets).
docs/providers/models.md Adds StepFun to “starter set” provider quickstart list and normalizes list ordering/formatting.
docs/providers/index.md Adds StepFun to providers index page.
docs/docs.json Registers new StepFun docs page in docs navigation.
docs/concepts/model-providers.md Adds StepFun to bundled provider examples and env-var/provider-id list.
CHANGELOG.md Adds StepFun provider plugin feature entry.
.github/labeler.yml Adds label automation for extensions/stepfun/** changes.
Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported

Comment on lines 8 to 12

- Channels/context visibility: add configurable `contextVisibility` per channel (`all`, `allowlist`, `allowlist_quote`) so supplemental quote, thread, and fetched history context can be filtered by sender allowlists instead of always passing through as received.
- Matrix/exec approvals: add Matrix-native exec approval prompts with account-scoped approvers, channel-or-DM delivery, and room-thread aware resolution handling. (#58635) Thanks @gumadeiras.
- Providers/StepFun: add the bundled StepFun provider plugin with standard and Step Plan endpoints, China/global onboarding choices, `step-3.5-flash` on both catalogs, and `step-3.5-flash-2603` currently exposed on Step Plan. (#60032) Thanks @hengm3467.

Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR title/description focus on the MSTeams delivery-failure system event, but this change set also adds the bundled StepFun provider plugin (plus docs + changelog). Please either update the PR title/description to reflect both changes, or split StepFun into a separate PR to keep review scope and release notes aligned.

Copilot uses AI. Check for mistakes.
Comment on lines +12 to +17
it("includes standard and Step Plan providers when STEPFUN_API_KEY is configured", async () => {
const agentDir = mkdtempSync(join(tmpdir(), "openclaw-test-"));
const providers = await resolveImplicitProvidersForTest({
agentDir,
env: { ...process.env, STEPFUN_API_KEY: "test-stepfun-key" },
});
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test passes env: { ...process.env, STEPFUN_API_KEY: ... }, which can accidentally include other provider API keys from the developer/CI environment and make the implicit-provider resolution nondeterministic. Prefer passing a minimal env object (or using the existing test harness helpers that snapshot/unset implicit env vars) so the test is isolated and stable.

Copilot uses AI. Check for mistakes.
Comment on lines +67 to +72
it("uses China endpoints when explicit config points the paired surface at the China host", async () => {
const agentDir = mkdtempSync(join(tmpdir(), "openclaw-test-"));
const providers = await resolveImplicitProvidersForTest({
agentDir,
env: { ...process.env, STEPFUN_API_KEY: "test-stepfun-key" },
config: {
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue here: spreading process.env into the env passed to resolveImplicitProvidersForTest can pull in unrelated provider keys and make the baseUrl/region assertions flaky. Pass only the env vars needed for this test case (or explicitly unset the implicit-provider env allowlist first).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 165533a377

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +13 to +14
"method": "standard-api-key-cn",
"choiceId": "stepfun-standard-api-key-cn",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Make inferred StepFun auth default to global endpoint

Placing standard-api-key-cn first makes non-interactive auth inference pick the China flow when users pass only --stepfun-api-key (no --auth-choice). resolveManifestProviderOnboardAuthFlags de-dupes shared flags by keeping the first matching choice, and inferAuthChoiceFromFlags then uses that first match, so the default becomes stepfun-standard-api-key-cn. For global users this writes .com endpoints by default and causes model calls to fail with global keys; ordering the global choice first avoids that regression.

Useful? React with 👍 / 👎.

@openclaw-barnacle openclaw-barnacle bot added size: S and removed docs Improvements or additions to documentation agents Agent runtime and tooling extensions: stepfun size: L labels Apr 4, 2026
@BradGroux BradGroux merged commit 1b2fb6b into openclaw:main Apr 4, 2026
43 of 81 checks passed
KimGLee pushed a commit to KimGLee/openclaw that referenced this pull request Apr 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

channel: msteams Channel integration: msteams maintainer Maintainer-authored PR size: S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: msteams agent unaware of message delivery failure — silently drops errors

3 participants