fix(msteams): fix image attachment download for channel and DM messages#40463
fix(msteams): fix image attachment download for channel and DM messages#40463hddevteam wants to merge 7 commits intoopenclaw:mainfrom
Conversation
Greptile SummaryThis PR fixes a real bug where Teams image/file attachments were silently dropped because the Microsoft Graph API never returns Key changes in
Confidence Score: 4/5
Last reviewed commit: 830077f |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 830077f60f
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| } | ||
| const ct = valResp.headers.get("content-type"); | ||
| if (ct) itemContentType = ct; | ||
| buffer = Buffer.from(await valResp.arrayBuffer()); |
There was a problem hiding this comment.
Enforce maxBytes before buffering hosted content
valResp.arrayBuffer() reads the entire /hostedContents/{id}/$value payload into memory before the size guard runs (buffer.byteLength > params.maxBytes), so oversized attachments are still fully downloaded and buffered. In environments where users can send large Teams files, this can cause avoidable memory spikes or OOM despite maxBytes being configured; the response body should be size-limited while reading (or rejected from headers before buffering).
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Pull request overview
Fixes MSTeams Graph hosted content downloads by handling the Graph API behavior where contentBytes is not inlined in /hostedContents list responses, ensuring Teams images/files aren’t silently dropped.
Changes:
- Decode hosted content from
contentByteswhen present; otherwise fetch bytes via.../hostedContents/{id}/$value. - Capture
content-typefrom the$valueresponse and feed it into MIME detection / storage.
| const valueUrl = `${params.messageUrl}/hostedContents/${encodeURIComponent(item.id)}/$value`; | ||
| const fetchFn = params.fetchFn ?? fetch; | ||
| const { response: valResp, release: valRelease } = await fetchWithSsrFGuard({ | ||
| url: valueUrl, | ||
| fetchImpl: fetchFn, | ||
| init: { headers: { Authorization: `Bearer ${params.accessToken}` } }, | ||
| policy: params.ssrfPolicy, | ||
| auditContext: "msteams.graph.hostedcontent.value", | ||
| }); |
There was a problem hiding this comment.
The fetchWithSsrFGuard(...) call in the item.id branch can throw (network error, SSRF block, invalid URL). Because it's not wrapped in a try/catch, a single hosted content download failure would reject downloadGraphHostedContent and abort all subsequent attachment handling for the message. Consider catching errors around this fetch and continueing so failures remain per-item, consistent with the rest of this loop's error-handling strategy.
| continue; | ||
| } | ||
| const ct = valResp.headers.get("content-type"); | ||
| if (ct) itemContentType = ct; |
There was a problem hiding this comment.
Buffer.from(await valResp.arrayBuffer()) reads the full response into memory before enforcing maxBytes. This bypasses the repo's usual max-bytes protection (e.g., readResponseWithLimit / fetchRemoteMedia patterns) and could allow oversized hosted content to cause high memory usage even though you later continue when buffer.byteLength > params.maxBytes. Prefer enforcing maxBytes while reading (e.g., check content-length and/or read the body with a byte limit) so large payloads are rejected early.
| if (ct) itemContentType = ct; | |
| if (ct) itemContentType = ct; | |
| const contentLengthHeader = valResp.headers.get("content-length"); | |
| if (contentLengthHeader) { | |
| const contentLength = Number(contentLengthHeader); | |
| if (!Number.isNaN(contentLength) && contentLength > params.maxBytes) { | |
| continue; | |
| } | |
| } |
| const ct = valResp.headers.get("content-type"); | ||
| if (ct) itemContentType = ct; | ||
| buffer = Buffer.from(await valResp.arrayBuffer()); |
There was a problem hiding this comment.
When overriding itemContentType from the response header (content-type), consider normalizing/trimming it (e.g., via normalizeContentType) before passing it into mime detection / saveMediaBuffer. This keeps behavior consistent with normalizeGraphAttachment and avoids propagating header values with unexpected whitespace/casing.
| } else if (item.id) { | ||
| // Graph API does not inline contentBytes in the collection response. | ||
| // Fetch the actual bytes via the /$value endpoint instead. | ||
| const valueUrl = `${params.messageUrl}/hostedContents/${encodeURIComponent(item.id)}/$value`; | ||
| const fetchFn = params.fetchFn ?? fetch; | ||
| const { response: valResp, release: valRelease } = await fetchWithSsrFGuard({ |
There was a problem hiding this comment.
There are existing tests for hostedContents when contentBytes is inlined, but this new fallback path (item.id -> /{id}/$value) isn’t covered. Please add a unit test that simulates a hostedContents list item with id but no contentBytes, and asserts that the code fetches .../hostedContents/{id}/$value and stores the returned bytes/content-type.
|
The two failing CI checks are pre-existing issues unrelated to this PR:
Both failures reproduce on |
|
Hi @hddevteam — thanks for the submission. I’m the new Microsoft Teams maintainer for OpenClaw. Please give me a day or two to work through the open Teams backlog. Also, join the Twitter community for daily MS Teams feedback + updates: https://x.com/i/communities/2031170403607974228 |
2090dd9 to
eec664e
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: eec664e0cb
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| "graph.microsoft.de", | ||
| "graph.microsoft.cn", | ||
| // Bot Framework attachment service (smba.trafficmanager.net) used for DM images | ||
| "trafficmanager.net", |
There was a problem hiding this comment.
Restrict auth token forwarding to specific Traffic Manager host
Adding "trafficmanager.net" to the auth allowlist causes fetchWithAuthFallback to attach bearer tokens to any https://*.trafficmanager.net URL after a 401/403, because host checks are suffix-based (isUrlAllowed). Since Traffic Manager subdomains can be tenant-controlled, a malicious attachment URL like https://attacker.trafficmanager.net/... can trigger token forwarding and exfiltrate Graph/Bot Framework access tokens; this should be narrowed to the exact Microsoft endpoint(s) needed (for example smba.trafficmanager.net) rather than the whole parent domain.
Useful? React with 👍 / 👎.
|
Totally understand, @BradGroux — no rush at all! Thanks for taking on the Teams maintainer role and for letting us know. Looking forward to your review whenever you get a chance. 🙏 |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4039a52195
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| const contentLength = Number(valResp.headers.get("content-length") ?? "0"); | ||
| if (contentLength > 0 && contentLength > params.maxBytes) { | ||
| continue; | ||
| } | ||
| buffer = Buffer.from(await valResp.arrayBuffer()); |
There was a problem hiding this comment.
Enforce byte limit when /$value lacks Content-Length
The new hosted-content fallback only rejects oversized payloads when content-length is present, then calls arrayBuffer() unconditionally; if the server omits or misreports that header (valid for chunked responses), an attachment larger than maxBytes is still fully buffered in memory before being discarded. This defeats the configured size guard for that response class and can cause avoidable memory spikes under large DM/channel attachments.
Useful? React with 👍 / 👎.
…inlined
Graph API's /hostedContents collection endpoint does not return contentBytes
inline in the response — the field is always null/absent. The previous code
would skip every item with an empty contentBytes, causing all image/file
attachments sent via Teams to be silently dropped.
Fix: when contentBytes is absent but the item has an id, fall back to fetching
the actual bytes via the individual /{id}/$value endpoint, which always returns
the binary content. The content-type header from that response is also used to
set itemContentType for accurate MIME detection downstream.
Verified working against Microsoft Teams (channel messages) with both image
and file attachments.
…already string | undefined)
When a Teams channel message contains a mix of image and html attachments, every() required ALL attachments to be html before trying the Graph path. This caused image attachments to be silently skipped. Use some() so the Graph path is attempted whenever at least one attachment has a text/html content type.
…LIST DM image attachments in Microsoft Teams use a direct download URL on smba.trafficmanager.net (the Bot Framework attachment service, SMBA = Service Messaging Bot Attachments). This domain was absent from the auth token allowlist, so download requests were sent without a Bearer token, resulting in 401 responses and silent download failures. Add "trafficmanager.net" so that Bot Framework tokens are automatically attached to requests targeting this domain.
…lback for channels
Two related fixes for buildMSTeamsGraphMessageUrls:
1. DM/group chat: prefer channelData.chatId (the proper Graph chat GUID)
over conversationId, which may be a Bot Framework conversation ID in
the form "a:xxx" that Graph does not recognise.
2. Channel messages: Teams channel IDs surfaced in channelData.team.id
are sometimes in thread format ("19:[email protected]") rather than a
GUID. The /teams/{guid}/channels/ Graph endpoint rejects these with
400. /chats/{threadId}/messages/ accepts thread-format IDs, so append
/chats/ candidate URLs as a fallback after the /teams/ URLs.
4039a52 to
f43d155
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 442621bb7e
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| accessToken = await provider.getAccessToken( | ||
| { | ||
| clientId: provider.connectionSettings.clientId, | ||
| clientSecret: provider.connectionSettings.clientSecret, | ||
| tenantId: conversationTenantId, |
There was a problem hiding this comment.
Use token provider API contract when fetching tenant token
downloadMSTeamsGraphMedia accepts tokenProvider as MSTeamsAccessTokenProvider (single-argument getAccessToken(scope)), but this branch calls getAccessToken(authConfig, scope) whenever conversationTenantId differs. If a provider follows the declared interface yet exposes connectionSettings metadata, the first argument becomes an object instead of a scope string, so token acquisition can fail and Graph media fallback returns tokenError for cross-tenant messages. This should be gated by an explicit capability check or a dedicated tenant-scoped provider path rather than invoking an undeclared overload.
Useful? React with 👍 / 👎.
Problem
When Teams users send images or file attachments, the bot silently drops them. This PR fixes multiple root causes discovered in a cross-tenant deployment (Bot App registered in Azure subscription tenant A, Teams conversations in M365 org tenant B). All fixes have been validated in production.
Fixes
1. Fetch hostedContents via
/$valuewhencontentBytesnot inlinedThe Graph API
/hostedContentscollection endpoint never returnscontentBytesinline in the list response. The original code treated an emptycontentBytesas "no content" andcontinued, so every attachment was skipped unconditionally.Fix: when
contentBytesis absent but the item has anid, fall back to fetching the binary via/{id}/$value, which reliably returns the file content. TheContent-Typeheader from that response is also captured for accurate MIME detection.2.
some()instead ofevery()for HTML attachment detection (inbound-media.ts)When a channel message contains a mix of image and
text/htmlattachments,every()required ALL attachments to be html before attempting the Graph path. This silently skipped image attachments in mixed messages.Fix: changed to
some()so the Graph path is attempted whenever at least one attachment has atext/htmlcontent type.3. Add
trafficmanager.nettoDEFAULT_MEDIA_AUTH_HOST_ALLOWLIST(shared.ts)DM image attachments use direct download URLs on
smba.trafficmanager.net(SMBA = Service Messaging Bot Attachments, served via Azure Traffic Manager). This domain was missing from the auth token allowlist, so requests were sent without a Bearer token, resulting in 401 and silent download failures.Fix: added
"trafficmanager.net"toDEFAULT_MEDIA_AUTH_HOST_ALLOWLIST.4. Prefer
channelData.chatIdfor DM/group chats; add/chats/fallback for channel messages (graph.ts)Two related fixes in
buildMSTeamsGraphMessageUrls:conversationIdmay be a Bot Framework conversation ID (a:xxx) not accepted by Graph. PreferchannelData.chatId(the proper Graph chat GUID) when available.channelData.team.idis sometimes a Teams thread ID (19:[email protected]) instead of a GUID. The/teams/{guid}/channels/endpoint rejects this with 400./chats/{threadId}/messages/accepts thread-format IDs, so/chats/URLs are now appended as fallback candidates after the/teams/URLs.Testing
@mariozechner/pi-aidependency unrelated to this PR).