Skip to content

MS Teams: add channel archive persistence and deleted-channel cleanup#43190

Open
hddevteam wants to merge 19 commits intoopenclaw:mainfrom
hddevteam:feat/msteams-channel-archive
Open

MS Teams: add channel archive persistence and deleted-channel cleanup#43190
hddevteam wants to merge 19 commits intoopenclaw:mainfrom
hddevteam:feat/msteams-channel-archive

Conversation

@hddevteam
Copy link
Copy Markdown

Summary

  • add a new extensions/msteams-channel-archive plugin for persistent Teams channel archiving
  • persist channel messages plus archived attachments, and expose archive query tools
  • add deleted-channel cleanup, including a channel_deleted plugin hook driven by Teams conversationUpdate/channelDeleted

What changed

  • added the new msteams-channel-archive extension with durable JSONL archive storage and attachment copying
  • enriched inbound hook metadata so the archive plugin receives Teams-specific conversation, reply, media, and provider metadata
  • added plugin hook support for channel_deleted and emitted it from the Teams monitor on channelDeleted
  • pruned archived conversation state when Teams emits channel deletion lifecycle events
  • kept the existing Graph-based cleanup sweep as a backstop for older/archive-state records
  • added changelog entries and focused regression tests for archive storage, cleanup, Teams handler dispatch, and hook runner wiring

Validation

  • pnpm vitest run src/plugins/wired-hooks-message.test.ts extensions/msteams/src/monitor-handler.file-consent.test.ts extensions/msteams-channel-archive/index.test.ts extensions/msteams-channel-archive/src/archive-store.test.ts extensions/msteams-channel-archive/src/channel-cleanup.test.ts src/hooks/message-hook-mappers.test.ts
  • validated on the Azure Teams gateway node with a fresh channel message followed by a restorable channel delete; the runtime logged channelDeleted and pruned the new channel archive successfully

Notes

  • repo-wide pnpm build still hits pre-existing baseline failures in src/agents/** and src/commands/openai-codex-oauth.ts
  • the Azure node is now configured to run the latest checkout via systemd for continued validation

Copilot AI review requested due to automatic review settings March 11, 2026 13:03
@openclaw-barnacle openclaw-barnacle bot added channel: msteams Channel integration: msteams size: XL labels Mar 11, 2026
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 11, 2026

Greptile Summary

This PR introduces a msteams-channel-archive extension that persistently archives Teams channel messages and attachments to local JSONL files, adds deleted-channel cleanup (both event-driven via channelDeleted and periodic via Graph), and enriches the inbound hook metadata (reply chain, team/channel IDs, provider metadata) so downstream plugins receive richer context.

Key highlights:

  • New msteams-channel-archive plugin — JSONL-backed message store with attachment deduplication, content-hash storage, and four query tools (search, get-message, get-thread, search-attachments).
  • channel_deleted plugin hook — new lifecycle hook dispatched from the Teams monitor when a conversationUpdate/channelDeleted event arrives; the archive plugin prunes the deleted channel's data in response.
  • Graph attachment improvements/chats/ fallback URLs for channel threads where teamId is not a UUID, plus /$value fetching for hosted content whose bytes aren't inlined in the collection response.
  • Three issues found:
    • DEFAULT_MEDIA_AUTH_HOST_ALLOWLIST adds "trafficmanager.net" (suffix-matched), causing Bearer tokens to be forwarded to any *.trafficmanager.net host rather than just the intended smba.trafficmanager.net.
    • archiveAttachments is called outside the index file lock, creating a TOCTOU race with pruneConversation that can leave a message record referencing a deleted attachment.
    • onlyHtmlAttachments is now set with some() instead of every() — the variable name is misleading after this intentional semantic change.

Confidence Score: 3/5

  • Mostly safe but has a credential-scope widening in the auth allowlist and a TOCTOU race in the archive store that should be addressed before merging.
  • The PR is well-structured, thoroughly tested, and the overall design is sound. However, the trafficmanager.net entry in DEFAULT_MEDIA_AUTH_HOST_ALLOWLIST broadens token forwarding to any *.trafficmanager.net host (a large Azure-wide namespace) instead of the specific smba.trafficmanager.net — a meaningful security regression. The TOCTOU race between archiveAttachments and pruneConversation is narrow in practice but can produce silently broken archive records. These two issues lower confidence below a straight merge.
  • extensions/msteams/src/attachments/shared.ts (auth allowlist scope) and extensions/msteams-channel-archive/src/archive-store.ts (attachment copy vs. lock ordering) need attention before merging.

Comments Outside Diff (1)

  1. extensions/msteams-channel-archive/src/archive-store.ts, line 671-746 (link)

    TOCTOU race between archiveAttachments and pruneConversation

    archiveAttachments (which physically copies files to the attachment store) is called before the file lock is acquired. The lock is only taken when writing the message record and updating the index. This creates a race:

    1. Thread A calls archiveAttachments and copies attachment → attachments/ab/ab<sha>/file.bin (no lock held).
    2. Thread B calls pruneConversation under the lock, collects all referenced attachment paths from the current index (which doesn't include Thread A's not-yet-written message), and deletes attachments/ab/ab<sha>/file.bin as "unreferenced".
    3. Thread A acquires the lock, writes a message record whose storedPath points to the now-deleted file.

    The result is a message record that permanently references a missing attachment with no way to recover it. The missing: true flag in withAttachmentStatus will surface this, but the data is gone.

    The simplest mitigation is to move the archiveAttachments call inside the lock body so the file's reference is recorded atomically with the copy, or alternatively reverse the pruning logic so it only removes a file after confirming no in-flight writers could reference it (e.g., by re-scanning the index after the message file has been written).

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: extensions/msteams-channel-archive/src/archive-store.ts
    Line: 671-746
    
    Comment:
    **TOCTOU race between `archiveAttachments` and `pruneConversation`**
    
    `archiveAttachments` (which physically copies files to the attachment store) is called **before** the file lock is acquired. The lock is only taken when writing the message record and updating the index. This creates a race:
    
    1. Thread A calls `archiveAttachments` and copies attachment → `attachments/ab/ab<sha>/file.bin` (no lock held).
    2. Thread B calls `pruneConversation` under the lock, collects all referenced attachment paths from the current index (which doesn't include Thread A's not-yet-written message), and deletes `attachments/ab/ab<sha>/file.bin` as "unreferenced".
    3. Thread A acquires the lock, writes a message record whose `storedPath` points to the now-deleted file.
    
    The result is a message record that permanently references a missing attachment with no way to recover it. The `missing: true` flag in `withAttachmentStatus` will surface this, but the data is gone.
    
    The simplest mitigation is to move the `archiveAttachments` call inside the lock body so the file's reference is recorded atomically with the copy, or alternatively reverse the pruning logic so it only removes a file after confirming no in-flight writers could reference it (e.g., by re-scanning the index *after* the message file has been written).
    
    How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: 8155db0

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new Microsoft Teams channel-archive extension and wires new lifecycle/metadata plumbing so plugins can persist Teams channel history (including attachments) and clean up archived state when a channel is deleted.

Changes:

  • Introduces extensions/msteams-channel-archive with JSONL-backed archive storage, attachment copying, and archive query tools.
  • Adds a new plugin hook (channel_deleted) and emits it from the Teams monitor on conversationUpdate/channelDeleted.
  • Enriches inbound message hook metadata with Teams-specific fields (reply/thread/media/provider metadata) to support archiving.

Reviewed changes

Copilot reviewed 24 out of 25 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
src/plugins/wired-hooks-message.test.ts Adds hook-runner regression test coverage for the new channel_deleted hook runner method.
src/plugins/types.ts Extends plugin hook names/types to include channel_deleted and defines its event type.
src/plugins/hooks.ts Adds runChannelDeleted to the hook runner implementation/export surface.
src/hooks/message-hook-mappers.ts Extends canonical inbound hook context and maps additional Teams/media/thread/provider fields into plugin event metadata.
src/hooks/message-hook-mappers.test.ts Updates tests to assert the new mapped metadata fields.
src/auto-reply/templating.ts Adds ProviderMetadata to message context typing for provider-specific untrusted metadata propagation.
pnpm-lock.yaml Adds dependency lock entries for the new extension workspace.
extensions/msteams/src/monitor-handler/message-handler.ts Enriches inbound message context with Teams provider metadata (team/channel IDs, replyToId, etc.).
extensions/msteams/src/monitor-handler/inbound-media.ts Adjusts Graph fallback triggering logic for HTML attachments when no media downloaded.
extensions/msteams/src/monitor-handler.ts Emits channel_deleted hook on Teams conversationUpdate/channelDeleted events via the global hook runner.
extensions/msteams/src/monitor-handler.file-consent.test.ts Adds test coverage verifying channelDeleted events are forwarded to plugin hooks.
extensions/msteams/src/attachments/shared.ts Extends auth-host allowlist for attachment downloads.
extensions/msteams/src/attachments/graph.ts Adds /chats/ URL fallbacks and improves hosted-content download by fetching $value when contentBytes isn’t inlined.
extensions/msteams-channel-archive/src/types.ts Defines archive store record types and search/prune result shapes.
extensions/msteams-channel-archive/src/tools.ts Registers archive query tools with TypeBox schemas and JSON-formatted responses.
extensions/msteams-channel-archive/src/channel-cleanup.ts Implements periodic Graph-based deleted-channel cleanup sweep and prune logic.
extensions/msteams-channel-archive/src/channel-cleanup.test.ts Adds unit tests for cleanup sweep pruning/skip behavior.
extensions/msteams-channel-archive/src/archive-store.ts Implements JSONL archive store, attachment copying, searching, listing, and pruning.
extensions/msteams-channel-archive/src/archive-store.test.ts Adds store tests for dedupe, search/thread queries, and prune/orphan-attachment removal behavior.
extensions/msteams-channel-archive/package.json Adds new extension package manifest and TypeBox dependency.
extensions/msteams-channel-archive/openclaw.plugin.json Declares plugin metadata and config schema for the new extension.
extensions/msteams-channel-archive/index.ts Wires the extension: registers tools/services, archives message_received, prunes on channel_deleted.
extensions/msteams-channel-archive/index.test.ts Adds basic extension wiring test for channel_deleted pruning.
CHANGELOG.md Adds changelog entries describing the Teams channel archive plugin and delete-event pruning.
.github/labeler.yml Adds GitHub label automation for the new extension path.
Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8155db0e99

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@hddevteam
Copy link
Copy Markdown
Author

Follow-up pushed in f9d9f5086.

Addressed the substantive review items:

  • tightened the Teams media auth allowlist from trafficmanager.net to smba.trafficmanager.net
  • moved attachment archiving under the archive store lock to avoid the archive/prune race
  • preserved legacy Teams allowlist compatibility by checking the runtime team ID first and Graph team ID as an alternate key
  • fixed legacy deleted-channel cleanup fallback to resolve Graph team IDs using the archived channelId
  • renamed the HTML-attachment fallback flag to match the current behavior

Validation rerun:

  • pnpm vitest run extensions/msteams/src/policy.test.ts extensions/msteams/src/monitor-handler.file-consent.test.ts extensions/msteams-channel-archive/src/channel-cleanup.test.ts extensions/msteams-channel-archive/src/archive-store.test.ts src/hooks/message-hook-mappers.test.ts src/plugins/wired-hooks-message.test.ts extensions/msteams-channel-archive/index.test.ts

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f9d9f50866

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@hddevteam
Copy link
Copy Markdown
Author

Addressed second round of review feedback in commit 872be20:

Fixed (4 issues):

  1. JSONL error tolerance (archive-store.ts) — parseJsonLine now returns null on malformed lines and the caller filters them out with a console.warn; a single corrupt line no longer breaks all reads for that archive.

  2. Bounded top-N search (archive-store.ts) — searchMessages and searchAttachments now maintain an O(limit)-memory bounded accumulator instead of collecting all matches and sorting at the end. This avoids unbounded allocations when archives are large and limit is small.

  3. Graph team discovery cache (channel-cleanup.ts) — runArchiveCleanupSweep now caches teamId::channelId → graphTeamId across the per-archive loop so a full-tenant Graph scan is only performed once per distinct key per sweep, not once per archived channel.

  4. maxBytes guard before buffering (graph.ts) — Checks content-length before calling arrayBuffer() on hosted-content /$value responses; oversized payloads are skipped without being materialized.

Outdated thread (P2 — compare archive channel ID when resolving Graph team) was already fixed in the previous commit (f9d9f5086) — resolveGraphTeamIdForArchive now compares channel.id against archive.channelId, not teamRuntimeId.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 872be2053a

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

if (!Array.isArray(value)) {
return [];
}
return value.map((entry) => (typeof entry === "string" ? entry.trim() : "")).filter(Boolean);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve mediaTypes index alignment with mediaPaths

readStringArray removes empty strings, but Teams inbound payloads intentionally keep MediaTypes cardinality aligned with MediaPaths (missing MIME slots are represented as empty strings). When any attachment lacks a detected content type, filtering empties shifts later MIME values left, so archiveMessage records incorrect MIME types for attachments by index (and can drop the last MIME). This breaks MIME-based archive queries and attachment metadata accuracy for mixed-type uploads.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c074e4418a

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

}

try {
accessToken ??= await getAccessToken();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid pruning with a token from the wrong tenant

The sweep acquires one Graph token once (accessToken ??= await getAccessToken()) and reuses it for every archive, but archived channels can belong to different tenants; when that happens, fetchGraphChannelExists can return 404 under the wrong tenant token and this path prunes still-active channels as if they were deleted. This is especially risky in cross-tenant Teams setups introduced elsewhere in this change, so cleanup should resolve/use tenant-specific credentials per archive before treating 404 as deletion.

Useful? React with 👍 / 👎.


try {
accessToken ??= await getAccessToken();
const cacheKey = `${archive.teamId}::${archive.channelId}`;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Cache Graph team resolution by team id instead of channel id

Fresh evidence: the new cache key includes channelId (${archive.teamId}::${archive.channelId}), so archives from the same team but different channels still miss the cache and each call re-runs resolveGraphTeamIdForArchive's full tenant scan. In normal usage (one archive per channel), this keeps the sweep near the original N× behavior and can still trigger Graph throttling on large tenants; cache by team/runtime-team id (or precompute a map once per sweep) to get actual reuse.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 36b16dc3d0

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

conversationType,
conversationId,
conversationMessageId: conversationMessageId ?? undefined,
conversationTenantId: conversation?.tenantId ?? undefined,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Forward conversation tenant to Graph media resolution

This new conversationTenantId argument is passed into resolveMSTeamsInboundMedia, but that helper does not propagate it to downloadMSTeamsGraphMedia, so _convTenantId remains undefined and the cross-tenant token branch never runs. In cross-tenant Teams channels, hosted-content/attachment Graph fetches will still use the bot tenant token and continue failing despite the new token-selection logic.

Useful? React with 👍 / 👎.

Comment on lines +128 to +132
cleanupTimer = setInterval(() => {
void runSweep().catch((error: unknown) => {
api.logger.warn?.("msteams-channel-archive: deleted-channel cleanup sweep failed", {
error: error instanceof Error ? error.message : String(error),
});
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Prevent overlapping archive cleanup sweeps

The periodic sweep is started with setInterval but there is no in-flight guard, so if one run takes longer than intervalMs a second run begins concurrently. On large tenants this can trigger parallel Graph scans/prunes against the same archive set, increasing throttling and racey duplicate work right when cleanup is already under load.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b7d8ec6912

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@@ -1,4 +1,5 @@
import type { OpenClawConfig, RuntimeEnv } from "openclaw/plugin-sdk/msteams";
import { getGlobalHookRunner } from "../../../src/plugins/hook-runner-global.js";
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Stop importing core source tree from msteams plugin

Replace this relative import with a stable public API dependency: when @openclaw/msteams is installed as an npm plugin (the normal plugin installer target is ~/.openclaw/extensions/<id>), ../../../src/plugins/hook-runner-global.js resolves outside the plugin package and is not present, causing module-load failure before the Teams handler starts. That makes the channel unusable in npm-installed deployments even though it works in the monorepo layout.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0475260e00

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

hasChannelDeletedHooks,
});
if (conversationId && hookRunner && hasChannelDeletedHooks) {
await hookRunner.runChannelDeleted(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Run channel_deleted hooks off the webhook critical path

This await hookRunner.runChannelDeleted(...) runs plugin hook work synchronously during the Teams webhook turn, so heavy handlers (for example the new archive plugin pruning files under a lock) can delay the HTTP response long enough to trigger Bot Framework retries/timeouts on large archives. The issue appears when a channelDeleted event hits a tenant with substantial archived data, and it can cause duplicate deliveries and extra load; this hook should be dispatched fire-and-forget or queued asynchronously like message_received hooks.

Useful? React with 👍 / 👎.

Comment on lines +218 to +222
accessToken,
archive,
});
graphTeamIdCache.set(cacheKey, graphTeamId);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid caching unresolved Graph team IDs per runtime team

The sweep caches graphTeamId even when resolution returns null, keyed only by tenant+runtime team id; for non-GUID team ids, resolution depends on the current archive's channelId, so if the first channel checked is already gone, null is cached and all subsequent archives from that same team are skipped without another lookup. In practice this leaves stale archives unpruned whenever iteration order encounters an unresolvable channel before a resolvable one in the same team.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b8d4caedfd

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

return null;
}

const appId = normalizeSecretInputString(teamsConfig.appId);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Fall back to env creds when resolving Graph cleanup auth

resolveGraphCredentials only reads channels.msteams.* from config, so deployments that authenticate Teams via MSTEAMS_APP_ID / MSTEAMS_APP_PASSWORD / MSTEAMS_TENANT_ID (which the main Teams channel already supports in extensions/msteams/src/token.ts) will always return null here and skip the cleanup service entirely. In those environments, deleted-channel sweeps never run, leaving stale archives behind even though Teams messaging itself is configured correctly.

Useful? React with 👍 / 👎.

@hddevteam
Copy link
Copy Markdown
Author

hddevteam commented Mar 16, 2026

TL;DR for reviewers following the stack:

The child PR is framing this as a default-policy fix, not a replyToId loss bug.

  • mention gating (requireMention) and reply destination (replyStyle) are separate concerns
  • channel replies should still default to thread
  • creating a new post should only happen with an explicit replyStyle: "top-level" override

Follow-up validation run:

  • pnpm test -- extensions/msteams/src/policy.test.ts extensions/msteams/src/messenger.test.ts

Ming and others added 12 commits March 17, 2026 14:38
…inlined

Graph API's /hostedContents collection endpoint does not return contentBytes
inline in the response — the field is always null/absent. The previous code
would skip every item with an empty contentBytes, causing all image/file
attachments sent via Teams to be silently dropped.

Fix: when contentBytes is absent but the item has an id, fall back to fetching
the actual bytes via the individual /{id}/$value endpoint, which always returns
the binary content. The content-type header from that response is also used to
set itemContentType for accurate MIME detection downstream.

Verified working against Microsoft Teams (channel messages) with both image
and file attachments.
When a Teams channel message contains a mix of image and html attachments,
every() required ALL attachments to be html before trying the Graph path.
This caused image attachments to be silently skipped.

Use some() so the Graph path is attempted whenever at least one attachment
has a text/html content type.
…LIST

DM image attachments in Microsoft Teams use a direct download URL on
smba.trafficmanager.net (the Bot Framework attachment service, SMBA =
Service Messaging Bot Attachments). This domain was absent from the auth
token allowlist, so download requests were sent without a Bearer token,
resulting in 401 responses and silent download failures.

Add "trafficmanager.net" so that Bot Framework tokens are automatically
attached to requests targeting this domain.
…lback for channels

Two related fixes for buildMSTeamsGraphMessageUrls:

1. DM/group chat: prefer channelData.chatId (the proper Graph chat GUID)
   over conversationId, which may be a Bot Framework conversation ID in
   the form "a:xxx" that Graph does not recognise.

2. Channel messages: Teams channel IDs surfaced in channelData.team.id
   are sometimes in thread format ("19:[email protected]") rather than a
   GUID. The /teams/{guid}/channels/ Graph endpoint rejects these with
   400. /chats/{threadId}/messages/ accepts thread-format IDs, so append
   /chats/ candidate URLs as a fallback after the /teams/ URLs.
- archive-store: tolerate malformed JSONL lines (skip + warn instead of throw)
- archive-store: bounded top-N accumulator in searchMessages/searchAttachments
  avoids O(n) memory with large archives; O(limit) memory instead
- channel-cleanup: cache Graph team/channel resolution across archives in
  one sweep to avoid N× full-tenant Graph scan per archived channel
- graph: check content-length header before materialising hosted-content
  arrayBuffer to prevent large allocations beyond maxBytes

Co-authored-by: Copilot <[email protected]>
…ctivities

Some Bot Framework clients (e.g. certain MS Teams clients) send activity
payloads that contain invalid JSON escape sequences such as bare backslashes
followed by characters not defined in RFC 8259 (e.g. \p, \q, \c).

The previous `express.json()` middleware uses body-parser's strict JSON
parser which throws `SyntaxError: Bad escaped character in JSON` on such
payloads. This causes the webhook to return a non-200 response, putting
Azure Bot Service into exponential backoff and dropping subsequent messages.

This fix replaces `express.json()` with `express.raw()` (raw Buffer body),
then adds a two-pass JSON parse middleware:

1. First tries standard JSON.parse on the raw body.
2. If that fails (SyntaxError), attempts to repair the payload by
   double-escaping bare backslashes: /\\([^"\\/bfnrtu])/g → \\\\$1
3. If the repaired payload parses successfully, logs a warning and
   continues normally so the activity is processed.
4. If the payload cannot be repaired, responds with HTTP 200 to prevent
   Azure Bot Service backoff, and logs the error for investigation.

Fixes: SyntaxError: Bad escaped character in JSON at position N
Tested: conversationUpdate and message activities from MS Teams clients
…iddleware

Covers the bug where express.json()'s strict parser throws SyntaxError on
Bot Framework activities containing invalid escape sequences (e.g. \\p, \\q),
causing Azure Bot Service to enter exponential backoff and drop messages.

Tests verify:
1. Original JSON.parse() fails on bad-escape payloads (proving the bug)
2. Two-pass repair middleware recovers such payloads
3. Normal well-formed payloads still work (no regression)
4. Valid escape sequences (\\n \\t \\\\ etc.) are not double-escaped
5. Completely unrecoverable JSON is re-thrown (caller returns HTTP 200)"
hddevteam and others added 7 commits March 17, 2026 14:40
Greptile code review (PR comment) identified an edge case in the
repair regex: /\\([^\"\\\/bfnrtu])/g incorrectly matches the second \
of a valid \\\\ pair followed by an invalid-escape char.

Example of the bug:
  Input JSON text: \"C:\\\\q\" (valid: literal string C:\\q)
  Old regex transforms \\q → \\\\q producing \"C:\\\\\\\\q\" (C:\\\\q) — wrong!

Fix: use alternation so valid \\\\ pairs are consumed first before
the invalid-escape branch can inspect their constituent backslashes.

  /(\\\\\\\\)+|\\\\([^\"\\\\\\\/bfnrtu])/g
    ^^^^^^^^^ consume valid \\\\ pairs (leave unchanged)
              ^^^^^^^^^^^^^^^^^^^^^^^^^ only then repair lone bare escapes

This ensures that:
  \\\\q  → unchanged   (valid: single literal backslash + q)
  \\q    → \\\\q       (repaired: invalid bare escape)
  \\\\\\q → \\\\\\\\q  (repaired: two-backslash pair OK, trailing lone \\q repaired)

Also adds a regression test for the consecutive-backslash edge case."
Copilot Reviewer identified that the express mock in monitor.lifecycle.test.ts
exported json() but not raw(). Since monitor.ts now uses express.raw() instead
of express.json(), the test suite would throw 'express.raw is not a function'
at startup.

Add raw() as a no-op middleware factory alongside json() in the mock, matching
the pattern already used for json().
CI 'Check types and lint and oxfmt' step failed because our new files
were not formatted with oxfmt. Run oxfmt --write on both files to fix.
@hddevteam hddevteam force-pushed the feat/msteams-channel-archive branch from 1272481 to 173dbc5 Compare March 17, 2026 06:41
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 173dbc5782

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

// IMPORTANT: the alternation /(\\)+|\([^...])/ consumes valid \\\\ pairs
// first so that they are never misidentified as bare-backslash escapes.
// See repairJsonEscapes() above for full explanation.
const fixed = repairJsonEscapes(rawText);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Define JSON escape repair helper before calling it

When JSON.parse(rawText) fails for a malformed Bot Framework payload, this branch calls repairJsonEscapes(rawText), but no such function is defined in the module. In that scenario the middleware throws a ReferenceError before reaching the fallback parse, so the webhook still returns an error (instead of recovering malformed escapes) and Bot Framework can continue retry/backoff behavior for affected channelDeleted/conversationUpdate deliveries.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

channel: msteams Channel integration: msteams size: XL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants