fix(outbound): multi-layer deduplication for all channels#33455
fix(outbound): multi-layer deduplication for all channels#33455Youhai020616 wants to merge 3 commits intoopenclaw:mainfrom
Conversation
Greptile SummaryThis PR adds three layers of channel-agnostic outbound message deduplication (process-level TTL cache, text self-deduplication, and per-dispatcher lifecycle dedup) to fix duplicate message delivery across Telegram, WhatsApp, and other channels. The overall architecture is well-thought-out, and the separation of concerns between the three layers is clear. Key issues found:
Confidence Score: 3/5
Last reviewed commit: 35a458c |
| const hasChannelData = Boolean( | ||
| payload.channelData && Object.keys(payload.channelData).length > 0, | ||
| ); | ||
|
|
||
| // Nothing to deduplicate for empty payloads. | ||
| if (!rawText.trim() && mediaUrls.length === 0 && !hasChannelData) { | ||
| return null; | ||
| } | ||
|
|
||
| const normalizedText = rawText ? normalizeTextForComparison(rawText) : ""; | ||
| const sortedMediaUrls = mediaUrls.toSorted().join(","); | ||
| const replyTo = payload.replyToId ?? ""; | ||
| const thread = threadId != null ? String(threadId) : ""; | ||
| const account = accountId ?? ""; | ||
|
|
||
| return `out|${channel}|${account}|${to}|${thread}|${normalizedText}|${sortedMediaUrls}|${replyTo}`; |
There was a problem hiding this comment.
channelData content excluded from dedup key
The key construction at line 55 does not include the channelData value — it only checks whether channelData is non-empty to avoid returning null for "empty" payloads. For payloads that consist entirely of channelData (no text, no mediaUrl/mediaUrls), the key collapses to out|${channel}|${account}|${to}|${thread}|||, which is identical for any two such payloads sent to the same recipient and thread.
This means two distinct channelData-only messages (e.g., two different Telegram flex/carousel layouts) delivered within the 30s TTL to the same recipient would collide on the same key — the second delivery would be silently suppressed as a false duplicate.
A serialized representation of channelData should be included in the key:
// After the existing key parts, include channelData fingerprint
const channelDataFingerprint = hasChannelData
? JSON.stringify(payload.channelData, Object.keys(payload.channelData ?? {}).sort())
: "";
return `out|${channel}|${account}|${to}|${thread}|${normalizedText}|${sortedMediaUrls}|${replyTo}|${channelDataFingerprint}`;Prompt To Fix With AI
This is a comment left during a code review.
Path: src/infra/outbound/outbound-dedupe.ts
Line: 40-55
Comment:
**`channelData` content excluded from dedup key**
The key construction at line 55 does not include the `channelData` value — it only checks whether `channelData` is non-empty to avoid returning `null` for "empty" payloads. For payloads that consist entirely of `channelData` (no `text`, no `mediaUrl`/`mediaUrls`), the key collapses to `out|${channel}|${account}|${to}|${thread}|||`, which is identical for any two such payloads sent to the same recipient and thread.
This means two distinct `channelData`-only messages (e.g., two different Telegram flex/carousel layouts) delivered within the 30s TTL to the same recipient would collide on the same key — the second delivery would be silently suppressed as a false duplicate.
A serialized representation of `channelData` should be included in the key:
```ts
// After the existing key parts, include channelData fingerprint
const channelDataFingerprint = hasChannelData
? JSON.stringify(payload.channelData, Object.keys(payload.channelData ?? {}).sort())
: "";
return `out|${channel}|${account}|${to}|${thread}|${normalizedText}|${sortedMediaUrls}|${replyTo}|${channelDataFingerprint}`;
```
How can I resolve this? If you propose a fix, please make it concise.| const thread = threadId != null ? String(threadId) : ""; | ||
| const account = accountId ?? ""; | ||
|
|
||
| return `out|${channel}|${account}|${to}|${thread}|${normalizedText}|${sortedMediaUrls}|${replyTo}`; |
There was a problem hiding this comment.
Unescaped | separator risks key collisions
The key uses a bare | as a field separator without escaping the field values. If any field (to, thread, channel, etc.) itself contains a | character, the resulting key is ambiguous. For example:
{ to: "a|b", threadId: "" }→...||a|b||...{ to: "a", threadId: "b" }→...||a|b||...
Both produce the same segment, so messages to different recipients or threads could collide. While common channel identifiers (phone numbers, integer chat IDs) rarely contain pipes, custom channel IDs or topic thread identifiers might.
Consider using a separator that cannot appear in the field values (e.g. \0, or encoding fields with a safe delimiter), or using JSON.stringify([channel, account, to, thread, normalizedText, ...]) to produce an unambiguous composite key.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/infra/outbound/outbound-dedupe.ts
Line: 55
Comment:
**Unescaped `|` separator risks key collisions**
The key uses a bare `|` as a field separator without escaping the field values. If any field (`to`, `thread`, `channel`, etc.) itself contains a `|` character, the resulting key is ambiguous. For example:
- `{ to: "a|b", threadId: "" }` → `...||a|b||...`
- `{ to: "a", threadId: "b" }` → `...||a|b||...`
Both produce the same segment, so messages to different recipients or threads could collide. While common channel identifiers (phone numbers, integer chat IDs) rarely contain pipes, custom channel IDs or topic thread identifiers might.
Consider using a separator that cannot appear in the field values (e.g. `\0`, or encoding fields with a safe delimiter), or using `JSON.stringify([channel, account, to, thread, normalizedText, ...])` to produce an unambiguous composite key.
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 35a458cd7e
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
src/infra/outbound/deliver.ts
Outdated
| channel, | ||
| to, | ||
| accountId, | ||
| threadId: params.threadId, | ||
| payload, |
There was a problem hiding this comment.
Include resolved reply target in outbound dedupe key
The dedupe key is built from payload before delivery, but the actual send target uses effectivePayload.replyToId ?? params.replyToId; this means two replies with identical text/media sent to the same chat but different top-level replyToId values are treated as duplicates and the second reply is skipped. This can drop legitimate threaded replies when payload-level replyToId is unset, so the dedupe params should include the resolved reply target used by sendOverrides.
Useful? React with 👍 / 👎.
| const thread = threadId != null ? String(threadId) : ""; | ||
| const account = accountId ?? ""; | ||
|
|
||
| return `out|${channel}|${account}|${to}|${thread}|${normalizedText}|${sortedMediaUrls}|${replyTo}`; |
There was a problem hiding this comment.
Differentiate channelData payloads in dedupe key
The dedupe key string omits payload.channelData contents, so messages that share text/media/to/thread but carry different structured channel payloads (for example different LINE flex/template data or different button sets) collapse to the same key and later sends are suppressed for 30s. Because channelData changes delivery semantics, the key should include a stable representation/hash of it rather than only a presence check.
Useful? React with 👍 / 👎.
|
Thanks for the sharp observation @benBeideman — you're absolutely right.
I've removed
I've also pushed a v2 commit addressing all automated review feedback:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 329f463158
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| // Include channelData fingerprint so distinct structured payloads are not | ||
| // collapsed into the same key (e.g. different Telegram flex layouts). | ||
| const channelDataFp = hasChannelData | ||
| ? JSON.stringify(payload.channelData, Object.keys(payload.channelData!).sort()) |
There was a problem hiding this comment.
Serialize full channelData when building dedupe fingerprints
Using JSON.stringify(payload.channelData, Object.keys(payload.channelData!).sort()) only whitelists top-level keys, so nested properties are dropped from the fingerprint (for example two payloads with different channelData.template contents both stringify to the same value). In channels that send structured/nested payloads, distinct outbound messages can be treated as duplicates and skipped for 30 seconds.
Useful? React with 👍 / 👎.
| return null; | ||
| } | ||
|
|
||
| const normalizedText = rawText ? normalizeTextForComparison(rawText) : ""; |
There was a problem hiding this comment.
Keep emoji-only outbound texts distinct in dedupe keys
The dedupe key uses normalizeTextForComparison, which strips emoji; for emoji-only messages this produces an empty normalized text, so different messages like "😀" and "😢" collapse to the same key for the same recipient/channel. That causes legitimate emoji-only sends within the TTL window to be dropped as duplicates.
Useful? React with 👍 / 👎.
| if (deliveredPayloadKeys.has(dedupeKey)) { | ||
| return false; | ||
| } | ||
| deliveredPayloadKeys.add(dedupeKey); |
There was a problem hiding this comment.
Register per-dispatch dedupe keys only after successful delivery
The dispatcher marks a payload as delivered before options.deliver runs, and the key is never removed on error. If the first attempt fails transiently, any later identical payload in the same dispatch lifecycle is skipped, which blocks retry-by-reemission and can lose the reply entirely.
Useful? React with 👍 / 👎.
329f463 to
6419274
Compare
Add channel-agnostic outbound message deduplication to fix duplicate message delivery across Telegram, WhatsApp, and all other channels. Layer 1 (Cross-turn TTL dedup): Process-level 30s TTL cache skips identical payloads recently sent to the same recipient — covers queue race conditions, cron announce repeats, and tool execution races. Layer 2 (Text self-duplication): Normalization step detects and fixes paragraph-level and full-text repetition before delivery — covers BlueBubbles streaming concatenation bug. Key design: - channelData fingerprint (sorted JSON) included in dedup key - Resolved replyToId (payload-level ?? dispatch-level fallback) - JSON.stringify array key format to prevent delimiter collisions - Only registers after successful delivery (failed sends can retry) Fixes openclaw#30316, openclaw#25192 Supersedes openclaw#30478
6419274 to
a5ae2e4
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a5ae2e4d0d
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if (!key) { | ||
| return false; | ||
| } | ||
| return dedupeCache.peek(key, now); |
There was a problem hiding this comment.
Atomically claim dedupe key before sending outbound payload
isOutboundDuplicate only performs a peek, so concurrent deliveries of the same payload can both observe a cache miss and proceed to send before either path calls registerOutboundDelivered; this leaves the race window that this layer is meant to close (e.g., two queue/tool executions starting at the same time). Because registration happens after network delivery in deliverOutboundPayloadsCore, dedupe is not effective for in-flight duplicates and should use an atomic claim/reserve step (with rollback on failure) instead of a read-then-write sequence.
Useful? React with 👍 / 👎.
src/infra/outbound/deliver.ts
Outdated
| threadId: params.threadId, | ||
| resolvedReplyToId: params.replyToId, | ||
| payload, |
There was a problem hiding this comment.
Derive dedupe params from hook-modified payload
The dedupe key is built from payload before applyMessageSendingHook, but delivery uses effectivePayload after hooks may rewrite content; this means plugin-hook deployments can incorrectly suppress distinct outbound messages (same pre-hook payload, different post-hook text) and miss true duplicates (different pre-hook payloads rewritten to identical output). Build/check/register keys from the same effective payload that is actually sent.
Useful? React with 👍 / 👎.
…ta, emoji, post-hook dedup Fix 5 issues identified during code review: 1. channelData fingerprint: Replace shallow JSON.stringify replacer with recursive stableStringify that correctly handles nested objects at all depth levels. Prevents false-duplicate suppression for structured payloads like Telegram flex/carousel layouts with different nested content. 2. Emoji-only messages: Fall back to trimmed raw text when normalizeTextForComparison strips all emoji, so distinct emoji messages produce different dedup keys instead of collapsing to the same empty key. 3. Atomic claim/rollback: Replace peek-then-register pattern with claimOutboundDelivery (atomic check+register) and rollbackOutboundClaim (remove on failure). Closes the TOCTOU race window where two concurrent deliveries could both peek, both see cache miss, and both send. Add DedupeCache.remove() method to support rollback. 4. Post-hook dedup: Move dedup check after applyMessageSendingHook so the key is built from effectivePayload (the content actually delivered), not the pre-hook payload. Prevents false positives/negatives when hooks modify message content. 5. Dispatcher Layer 3 retry: Move deliveredPayloadKeys.add() from enqueue time to after successful delivery in sendChain.then(), consistent with Layer 1 design. Failed sends within a dispatch lifecycle can now retry. Tests: 85 passing (32 outbound-dedupe + 14 self-dedup + 39 deliver) TypeScript: tsc --noEmit passes
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d2587ab43d
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| // Register the key only after successful delivery so failures can retry. | ||
| if (dedupeKey) { | ||
| deliveredPayloadKeys.add(dedupeKey); |
There was a problem hiding this comment.
Reserve dedupe key when enqueuing dispatcher payloads
The per-dispatch dedupe check runs before queuing, but the key is only inserted after await options.deliver(...), so two identical replies enqueued back-to-back in the same turn both pass has() and both get delivered. This means layer-3 dedupe does not actually block common synchronous duplicate emissions (tool/block/final bursts), and in dispatcher paths that send directly (without deliverOutboundPayloads), duplicate outbound messages still leak; the key should be claimed at enqueue time and removed on delivery failure.
Useful? React with 👍 / 👎.
| // Fallback to trimmed raw text when normalization produces empty string | ||
| // (e.g. emoji-only messages where normalizeTextForComparison strips all emoji). | ||
| const effectiveText = normalizedText || rawText.trim(); | ||
| const sortedMediaUrls = mediaUrls.toSorted().join(","); |
There was a problem hiding this comment.
Preserve media URL boundaries in dedupe fingerprint
Joining sorted media URLs with commas can collide distinct payloads when a URL itself contains commas (for example, ["a,b","c"] and ["a","b,c"] both fingerprint as "a,b,c"), which can incorrectly suppress non-duplicate media sends during the dedupe TTL window. The fingerprint should keep media URLs as a structured list (or another unambiguous encoding) instead of a delimiter-joined string.
Useful? React with 👍 / 👎.
1. Reset dedup cache in test/setup.ts afterEach to prevent cross-test
pollution. The outbound dedup cache is a process-level singleton;
without reset, heartbeat/cron tests fail on Windows CI shards when
a prior test's cached keys block delivery in subsequent tests.
Fixes: heartbeat-runner.ghost-reminder, heartbeat-runner.returns-
default-unset, heartbeat-runner.respects-ackmaxchars, and
cron/isolated-agent test failures on checks-windows shards 2-4.
2. Media URL fingerprint: Replace comma-joined media URLs with
JSON.stringify(mediaUrls.toSorted()) to prevent collision when URLs
contain commas (e.g. ["a,b","c"] vs ["a","b,c"]).
3. Dispatcher Layer 3: Claim dedup key at enqueue time (instead of
after delivery) to block back-to-back synchronous duplicate
emissions within the same dispatch cycle. On delivery failure,
delete the key (rollback) so the payload can be retried. This
matches the atomic claim/rollback pattern used by Layer 1.
Tests: 85 passing (32 outbound-dedupe + 14 self-dedup + 39 deliver)
+ heartbeat-runner tests verified passing locally
TypeScript: tsc --noEmit passes
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 04f3d2d92f
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| // Fallback to trimmed raw text when normalization produces empty string | ||
| // (e.g. emoji-only messages where normalizeTextForComparison strips all emoji). | ||
| const effectiveText = normalizedText || rawText.trim(); | ||
| const sortedMediaUrls = JSON.stringify(mediaUrls.toSorted()); |
There was a problem hiding this comment.
Keep media URL order in the dedupe fingerprint
buildOutboundDedupeKey canonicalizes media with mediaUrls.toSorted(), so two payloads with the same files in different orders hash to the same key and one is dropped during the TTL window. In deliverOutboundPayloadsCore, media are actually sent in original order and the first item gets the caption, so ['a','b'] vs ['b','a'] are observably different outbound messages and should not be treated as duplicates.
Useful? React with 👍 / 👎.
| const normalizedText = rawText ? normalizeTextForComparison(rawText) : ""; | ||
| // Fallback to trimmed raw text when normalization produces empty string | ||
| // (e.g. emoji-only messages where normalizeTextForComparison strips all emoji). | ||
| const effectiveText = normalizedText || rawText.trim(); |
There was a problem hiding this comment.
Preserve emoji distinctions in normalized text dedupe keys
The key uses normalizeTextForComparison(rawText) whenever it is non-empty, and that normalization strips emoji; therefore messages like "Build passed ✅" and "Build passed ❌" collapse to the same dedupe key and the later send can be suppressed even though the meaning changed. The fallback to raw text only runs for emoji-only inputs, so mixed text+emoji status updates remain incorrectly conflated.
Useful? React with 👍 / 👎.
|
Maintainer triage call: close as-is. Reason:
If you want to continue this work, please open narrowly scoped follow-up PRs per subsystem with explicit risk bounds. |
Summary
deliverOutboundPayloadsCoreskips identical payloads recently sent to the same recipient — covers queue race conditions, cron announce repeats, and tool execution races (iMessage: duplicate message delivery when processing batched/queued messages #25192, [Bug]: Signal: duplicate message delivery when processing queued messages (same as #25192) #33150, [Bug] Cron jobs with mode:announce sending duplicate messages #27921, [Bug]: Race condition during tool use can cause duplicate message replies #20740)Fixes #30316, #25192
Supersedes #30478
Changes in v2 (review feedback)
Addressed all review feedback from @greptile-apps and @chatgpt-codex-connector:
channelDataincluded in dedup key: Sorted JSON fingerprint ofchannelDatais now part of the composite key, so distinct structured payloads (e.g. different Telegram flex layouts) are no longer silently suppressed as false duplicatesreplyToId: Dedup key now uses the resolved reply target (payload.replyToId ?? params.replyToId) matching actual send behavior, preventing legitimate threaded replies from being droppedJSON.stringify([...])to eliminate delimiter collision risk when field values contain|Test plan
outbound-dedupe.test.ts: 22 unit tests (was 17) — key building, TTL behavior, peek vs register, channelData fingerprinting, resolvedReplyToId, delimiter collision resistanceself-dedup.test.ts: 14 unit tests — short text, paragraph dedup, BlueBubbles pattern, three-way repetition, full-text repetitiondeliver.test.ts: 4 integration tests — cross-turn dedup skip, different recipients allowed, failed delivery retryable, self-duplicated text auto-fixedpnpm tsgo)pnpm check)🤖 Generated with Claude Code