MS Teams: add channel archive persistence and deleted-channel cleanup#43190
MS Teams: add channel archive persistence and deleted-channel cleanup#43190hddevteam wants to merge 19 commits intoopenclaw:mainfrom
Conversation
Greptile SummaryThis PR introduces a Key highlights:
Confidence Score: 3/5
|
There was a problem hiding this comment.
Pull request overview
Adds a new Microsoft Teams channel-archive extension and wires new lifecycle/metadata plumbing so plugins can persist Teams channel history (including attachments) and clean up archived state when a channel is deleted.
Changes:
- Introduces
extensions/msteams-channel-archivewith JSONL-backed archive storage, attachment copying, and archive query tools. - Adds a new plugin hook (
channel_deleted) and emits it from the Teams monitor onconversationUpdate/channelDeleted. - Enriches inbound message hook metadata with Teams-specific fields (reply/thread/media/provider metadata) to support archiving.
Reviewed changes
Copilot reviewed 24 out of 25 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| src/plugins/wired-hooks-message.test.ts | Adds hook-runner regression test coverage for the new channel_deleted hook runner method. |
| src/plugins/types.ts | Extends plugin hook names/types to include channel_deleted and defines its event type. |
| src/plugins/hooks.ts | Adds runChannelDeleted to the hook runner implementation/export surface. |
| src/hooks/message-hook-mappers.ts | Extends canonical inbound hook context and maps additional Teams/media/thread/provider fields into plugin event metadata. |
| src/hooks/message-hook-mappers.test.ts | Updates tests to assert the new mapped metadata fields. |
| src/auto-reply/templating.ts | Adds ProviderMetadata to message context typing for provider-specific untrusted metadata propagation. |
| pnpm-lock.yaml | Adds dependency lock entries for the new extension workspace. |
| extensions/msteams/src/monitor-handler/message-handler.ts | Enriches inbound message context with Teams provider metadata (team/channel IDs, replyToId, etc.). |
| extensions/msteams/src/monitor-handler/inbound-media.ts | Adjusts Graph fallback triggering logic for HTML attachments when no media downloaded. |
| extensions/msteams/src/monitor-handler.ts | Emits channel_deleted hook on Teams conversationUpdate/channelDeleted events via the global hook runner. |
| extensions/msteams/src/monitor-handler.file-consent.test.ts | Adds test coverage verifying channelDeleted events are forwarded to plugin hooks. |
| extensions/msteams/src/attachments/shared.ts | Extends auth-host allowlist for attachment downloads. |
| extensions/msteams/src/attachments/graph.ts | Adds /chats/ URL fallbacks and improves hosted-content download by fetching $value when contentBytes isn’t inlined. |
| extensions/msteams-channel-archive/src/types.ts | Defines archive store record types and search/prune result shapes. |
| extensions/msteams-channel-archive/src/tools.ts | Registers archive query tools with TypeBox schemas and JSON-formatted responses. |
| extensions/msteams-channel-archive/src/channel-cleanup.ts | Implements periodic Graph-based deleted-channel cleanup sweep and prune logic. |
| extensions/msteams-channel-archive/src/channel-cleanup.test.ts | Adds unit tests for cleanup sweep pruning/skip behavior. |
| extensions/msteams-channel-archive/src/archive-store.ts | Implements JSONL archive store, attachment copying, searching, listing, and pruning. |
| extensions/msteams-channel-archive/src/archive-store.test.ts | Adds store tests for dedupe, search/thread queries, and prune/orphan-attachment removal behavior. |
| extensions/msteams-channel-archive/package.json | Adds new extension package manifest and TypeBox dependency. |
| extensions/msteams-channel-archive/openclaw.plugin.json | Declares plugin metadata and config schema for the new extension. |
| extensions/msteams-channel-archive/index.ts | Wires the extension: registers tools/services, archives message_received, prunes on channel_deleted. |
| extensions/msteams-channel-archive/index.test.ts | Adds basic extension wiring test for channel_deleted pruning. |
| CHANGELOG.md | Adds changelog entries describing the Teams channel archive plugin and delete-event pruning. |
| .github/labeler.yml | Adds GitHub label automation for the new extension path. |
Files not reviewed (1)
- pnpm-lock.yaml: Language not supported
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8155db0e99
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
Follow-up pushed in Addressed the substantive review items:
Validation rerun:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f9d9f50866
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
Addressed second round of review feedback in commit 872be20: Fixed (4 issues):
Outdated thread (P2 — compare archive channel ID when resolving Graph team) was already fixed in the previous commit ( |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 872be2053a
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if (!Array.isArray(value)) { | ||
| return []; | ||
| } | ||
| return value.map((entry) => (typeof entry === "string" ? entry.trim() : "")).filter(Boolean); |
There was a problem hiding this comment.
Preserve mediaTypes index alignment with mediaPaths
readStringArray removes empty strings, but Teams inbound payloads intentionally keep MediaTypes cardinality aligned with MediaPaths (missing MIME slots are represented as empty strings). When any attachment lacks a detected content type, filtering empties shifts later MIME values left, so archiveMessage records incorrect MIME types for attachments by index (and can drop the last MIME). This breaks MIME-based archive queries and attachment metadata accuracy for mixed-type uploads.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c074e4418a
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| } | ||
|
|
||
| try { | ||
| accessToken ??= await getAccessToken(); |
There was a problem hiding this comment.
Avoid pruning with a token from the wrong tenant
The sweep acquires one Graph token once (accessToken ??= await getAccessToken()) and reuses it for every archive, but archived channels can belong to different tenants; when that happens, fetchGraphChannelExists can return 404 under the wrong tenant token and this path prunes still-active channels as if they were deleted. This is especially risky in cross-tenant Teams setups introduced elsewhere in this change, so cleanup should resolve/use tenant-specific credentials per archive before treating 404 as deletion.
Useful? React with 👍 / 👎.
|
|
||
| try { | ||
| accessToken ??= await getAccessToken(); | ||
| const cacheKey = `${archive.teamId}::${archive.channelId}`; |
There was a problem hiding this comment.
Cache Graph team resolution by team id instead of channel id
Fresh evidence: the new cache key includes channelId (${archive.teamId}::${archive.channelId}), so archives from the same team but different channels still miss the cache and each call re-runs resolveGraphTeamIdForArchive's full tenant scan. In normal usage (one archive per channel), this keeps the sweep near the original N× behavior and can still trigger Graph throttling on large tenants; cache by team/runtime-team id (or precompute a map once per sweep) to get actual reuse.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 36b16dc3d0
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| conversationType, | ||
| conversationId, | ||
| conversationMessageId: conversationMessageId ?? undefined, | ||
| conversationTenantId: conversation?.tenantId ?? undefined, |
There was a problem hiding this comment.
Forward conversation tenant to Graph media resolution
This new conversationTenantId argument is passed into resolveMSTeamsInboundMedia, but that helper does not propagate it to downloadMSTeamsGraphMedia, so _convTenantId remains undefined and the cross-tenant token branch never runs. In cross-tenant Teams channels, hosted-content/attachment Graph fetches will still use the bot tenant token and continue failing despite the new token-selection logic.
Useful? React with 👍 / 👎.
| cleanupTimer = setInterval(() => { | ||
| void runSweep().catch((error: unknown) => { | ||
| api.logger.warn?.("msteams-channel-archive: deleted-channel cleanup sweep failed", { | ||
| error: error instanceof Error ? error.message : String(error), | ||
| }); |
There was a problem hiding this comment.
Prevent overlapping archive cleanup sweeps
The periodic sweep is started with setInterval but there is no in-flight guard, so if one run takes longer than intervalMs a second run begins concurrently. On large tenants this can trigger parallel Graph scans/prunes against the same archive set, increasing throttling and racey duplicate work right when cleanup is already under load.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b7d8ec6912
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| @@ -1,4 +1,5 @@ | |||
| import type { OpenClawConfig, RuntimeEnv } from "openclaw/plugin-sdk/msteams"; | |||
| import { getGlobalHookRunner } from "../../../src/plugins/hook-runner-global.js"; | |||
There was a problem hiding this comment.
Stop importing core source tree from msteams plugin
Replace this relative import with a stable public API dependency: when @openclaw/msteams is installed as an npm plugin (the normal plugin installer target is ~/.openclaw/extensions/<id>), ../../../src/plugins/hook-runner-global.js resolves outside the plugin package and is not present, causing module-load failure before the Teams handler starts. That makes the channel unusable in npm-installed deployments even though it works in the monorepo layout.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0475260e00
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| hasChannelDeletedHooks, | ||
| }); | ||
| if (conversationId && hookRunner && hasChannelDeletedHooks) { | ||
| await hookRunner.runChannelDeleted( |
There was a problem hiding this comment.
Run channel_deleted hooks off the webhook critical path
This await hookRunner.runChannelDeleted(...) runs plugin hook work synchronously during the Teams webhook turn, so heavy handlers (for example the new archive plugin pruning files under a lock) can delay the HTTP response long enough to trigger Bot Framework retries/timeouts on large archives. The issue appears when a channelDeleted event hits a tenant with substantial archived data, and it can cause duplicate deliveries and extra load; this hook should be dispatched fire-and-forget or queued asynchronously like message_received hooks.
Useful? React with 👍 / 👎.
| accessToken, | ||
| archive, | ||
| }); | ||
| graphTeamIdCache.set(cacheKey, graphTeamId); | ||
| } |
There was a problem hiding this comment.
Avoid caching unresolved Graph team IDs per runtime team
The sweep caches graphTeamId even when resolution returns null, keyed only by tenant+runtime team id; for non-GUID team ids, resolution depends on the current archive's channelId, so if the first channel checked is already gone, null is cached and all subsequent archives from that same team are skipped without another lookup. In practice this leaves stale archives unpruned whenever iteration order encounters an unresolvable channel before a resolvable one in the same team.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b8d4caedfd
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| return null; | ||
| } | ||
|
|
||
| const appId = normalizeSecretInputString(teamsConfig.appId); |
There was a problem hiding this comment.
Fall back to env creds when resolving Graph cleanup auth
resolveGraphCredentials only reads channels.msteams.* from config, so deployments that authenticate Teams via MSTEAMS_APP_ID / MSTEAMS_APP_PASSWORD / MSTEAMS_TENANT_ID (which the main Teams channel already supports in extensions/msteams/src/token.ts) will always return null here and skip the cleanup service entirely. In those environments, deleted-channel sweeps never run, leaving stale archives behind even though Teams messaging itself is configured correctly.
Useful? React with 👍 / 👎.
|
TL;DR for reviewers following the stack: The child PR is framing this as a default-policy fix, not a
Follow-up validation run:
|
…inlined
Graph API's /hostedContents collection endpoint does not return contentBytes
inline in the response — the field is always null/absent. The previous code
would skip every item with an empty contentBytes, causing all image/file
attachments sent via Teams to be silently dropped.
Fix: when contentBytes is absent but the item has an id, fall back to fetching
the actual bytes via the individual /{id}/$value endpoint, which always returns
the binary content. The content-type header from that response is also used to
set itemContentType for accurate MIME detection downstream.
Verified working against Microsoft Teams (channel messages) with both image
and file attachments.
…already string | undefined)
When a Teams channel message contains a mix of image and html attachments, every() required ALL attachments to be html before trying the Graph path. This caused image attachments to be silently skipped. Use some() so the Graph path is attempted whenever at least one attachment has a text/html content type.
…LIST DM image attachments in Microsoft Teams use a direct download URL on smba.trafficmanager.net (the Bot Framework attachment service, SMBA = Service Messaging Bot Attachments). This domain was absent from the auth token allowlist, so download requests were sent without a Bearer token, resulting in 401 responses and silent download failures. Add "trafficmanager.net" so that Bot Framework tokens are automatically attached to requests targeting this domain.
…lback for channels
Two related fixes for buildMSTeamsGraphMessageUrls:
1. DM/group chat: prefer channelData.chatId (the proper Graph chat GUID)
over conversationId, which may be a Bot Framework conversation ID in
the form "a:xxx" that Graph does not recognise.
2. Channel messages: Teams channel IDs surfaced in channelData.team.id
are sometimes in thread format ("19:[email protected]") rather than a
GUID. The /teams/{guid}/channels/ Graph endpoint rejects these with
400. /chats/{threadId}/messages/ accepts thread-format IDs, so append
/chats/ candidate URLs as a fallback after the /teams/ URLs.
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
- archive-store: tolerate malformed JSONL lines (skip + warn instead of throw) - archive-store: bounded top-N accumulator in searchMessages/searchAttachments avoids O(n) memory with large archives; O(limit) memory instead - channel-cleanup: cache Graph team/channel resolution across archives in one sweep to avoid N× full-tenant Graph scan per archived channel - graph: check content-length header before materialising hosted-content arrayBuffer to prevent large allocations beyond maxBytes Co-authored-by: Copilot <[email protected]>
…er context Co-authored-by: Copilot <[email protected]>
…ctivities Some Bot Framework clients (e.g. certain MS Teams clients) send activity payloads that contain invalid JSON escape sequences such as bare backslashes followed by characters not defined in RFC 8259 (e.g. \p, \q, \c). The previous `express.json()` middleware uses body-parser's strict JSON parser which throws `SyntaxError: Bad escaped character in JSON` on such payloads. This causes the webhook to return a non-200 response, putting Azure Bot Service into exponential backoff and dropping subsequent messages. This fix replaces `express.json()` with `express.raw()` (raw Buffer body), then adds a two-pass JSON parse middleware: 1. First tries standard JSON.parse on the raw body. 2. If that fails (SyntaxError), attempts to repair the payload by double-escaping bare backslashes: /\\([^"\\/bfnrtu])/g → \\\\$1 3. If the repaired payload parses successfully, logs a warning and continues normally so the activity is processed. 4. If the payload cannot be repaired, responds with HTTP 200 to prevent Azure Bot Service backoff, and logs the error for investigation. Fixes: SyntaxError: Bad escaped character in JSON at position N Tested: conversationUpdate and message activities from MS Teams clients
…iddleware Covers the bug where express.json()'s strict parser throws SyntaxError on Bot Framework activities containing invalid escape sequences (e.g. \\p, \\q), causing Azure Bot Service to enter exponential backoff and drop messages. Tests verify: 1. Original JSON.parse() fails on bad-escape payloads (proving the bug) 2. Two-pass repair middleware recovers such payloads 3. Normal well-formed payloads still work (no regression) 4. Valid escape sequences (\\n \\t \\\\ etc.) are not double-escaped 5. Completely unrecoverable JSON is re-thrown (caller returns HTTP 200)"
Greptile code review (PR comment) identified an edge case in the
repair regex: /\\([^\"\\\/bfnrtu])/g incorrectly matches the second \
of a valid \\\\ pair followed by an invalid-escape char.
Example of the bug:
Input JSON text: \"C:\\\\q\" (valid: literal string C:\\q)
Old regex transforms \\q → \\\\q producing \"C:\\\\\\\\q\" (C:\\\\q) — wrong!
Fix: use alternation so valid \\\\ pairs are consumed first before
the invalid-escape branch can inspect their constituent backslashes.
/(\\\\\\\\)+|\\\\([^\"\\\\\\\/bfnrtu])/g
^^^^^^^^^ consume valid \\\\ pairs (leave unchanged)
^^^^^^^^^^^^^^^^^^^^^^^^^ only then repair lone bare escapes
This ensures that:
\\\\q → unchanged (valid: single literal backslash + q)
\\q → \\\\q (repaired: invalid bare escape)
\\\\\\q → \\\\\\\\q (repaired: two-backslash pair OK, trailing lone \\q repaired)
Also adds a regression test for the consecutive-backslash edge case."
Copilot Reviewer identified that the express mock in monitor.lifecycle.test.ts exported json() but not raw(). Since monitor.ts now uses express.raw() instead of express.json(), the test suite would throw 'express.raw is not a function' at startup. Add raw() as a no-op middleware factory alongside json() in the mock, matching the pattern already used for json().
CI 'Check types and lint and oxfmt' step failed because our new files were not formatted with oxfmt. Run oxfmt --write on both files to fix.
1272481 to
173dbc5
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 173dbc5782
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| // IMPORTANT: the alternation /(\\)+|\([^...])/ consumes valid \\\\ pairs | ||
| // first so that they are never misidentified as bare-backslash escapes. | ||
| // See repairJsonEscapes() above for full explanation. | ||
| const fixed = repairJsonEscapes(rawText); |
There was a problem hiding this comment.
Define JSON escape repair helper before calling it
When JSON.parse(rawText) fails for a malformed Bot Framework payload, this branch calls repairJsonEscapes(rawText), but no such function is defined in the module. In that scenario the middleware throws a ReferenceError before reaching the fallback parse, so the webhook still returns an error (instead of recovering malformed escapes) and Bot Framework can continue retry/backoff behavior for affected channelDeleted/conversationUpdate deliveries.
Useful? React with 👍 / 👎.
Summary
extensions/msteams-channel-archiveplugin for persistent Teams channel archivingchannel_deletedplugin hook driven by TeamsconversationUpdate/channelDeletedWhat changed
msteams-channel-archiveextension with durable JSONL archive storage and attachment copyingchannel_deletedand emitted it from the Teams monitor onchannelDeletedValidation
pnpm vitest run src/plugins/wired-hooks-message.test.ts extensions/msteams/src/monitor-handler.file-consent.test.ts extensions/msteams-channel-archive/index.test.ts extensions/msteams-channel-archive/src/archive-store.test.ts extensions/msteams-channel-archive/src/channel-cleanup.test.ts src/hooks/message-hook-mappers.test.tschannelDeletedand pruned the new channel archive successfullyNotes
pnpm buildstill hits pre-existing baseline failures insrc/agents/**andsrc/commands/openai-codex-oauth.ts