Skip to content

fix(auto-reply): add unified outbound text sanitizer for NO_REPLY and streaming leaks#38232

Open
who96 wants to merge 5 commits intoopenclaw:mainfrom
who96:fix/unified-outbound-sanitizer
Open

fix(auto-reply): add unified outbound text sanitizer for NO_REPLY and streaming leaks#38232
who96 wants to merge 5 commits intoopenclaw:mainfrom
who96:fix/unified-outbound-sanitizer

Conversation

@who96
Copy link
Copy Markdown

@who96 who96 commented Mar 6, 2026

User Impact

Users on chat surfaces such as Telegram and Discord could see leaked control text like {"action":"NO_REPLY"} or a bare Reasoning: fragment show up as visible replies. This change suppresses those leaks once in the shared outbound reply path so existing and future channels inherit the same protection.

Closes #37727
Closes #37890

What this PR does

This PR adds a unified outbound text sanitizer at the shared auto-reply reply exit, instead of adding more per-channel ad-hoc patches.

It suppresses:

  • JSON-wrapped NO_REPLY payloads in lightweight forms
  • bare NO_REPLY sentinels
  • bare streaming Reasoning: leaks with no body

The sanitizer is wired into buildReplyPayloads, so media-only and channelData payloads are preserved while text-only silent/leaked payloads are suppressed.

Background: #37904, #37891, #37892, and #37900 addressed this in scattered places. This PR unifies the behavior at the shared outbound reply layer.
Newer overlapping fixes such as #38659 and #38689 address the same symptom in Telegram-specific paths. This PR intentionally fixes the shared outbound reply layer so the suppression rules are inherited across existing and future channels instead of being patched per channel.

AI Disclosure

  • This PR was AI-assisted (Claude + Codex review)
  • Degree of testing: fully tested (reply-utils, agent-runner-payloads, get-reply-directives tests)
  • I understand what the code does
  • Bot review conversations have been addressed

@openclaw-barnacle openclaw-barnacle bot added agents Agent runtime and tooling size: L labels Mar 6, 2026
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 6, 2026

Greptile Summary

This PR introduces a unified outbound text sanitizer (sanitizeOutboundText in reply-utils.ts) wired into buildReplyPayloads, replacing scattered per-channel patches for two known leaks: JSON-wrapped {"action":"NO_REPLY"} control payloads and bare Reasoning: streaming prefixes. The refactor correctly threads a {rawText, payload} entry shape through the deduplication pipeline so that content-key lookups against the block-streaming pipeline always use the pre-sanitize text, while the delivered payload carries the sanitized (suppressed) text.

Key changes and observations:

  • reply-utils.ts: sanitizeOutboundText has a clean three-path contract (undefined pass-through, null suppress, original string). NO_REPLY_JSON_SENTINEL_MAX_LENGTH = 200 is named and documented. The Reasoning: check uses .trim() correctly and is case-sensitive and colon-required (exact leak pattern only).
  • agent-runner-payloads.ts: The rawText/sanitized-payload entry shape correctly preserves original content keys for block-streaming deduplication. filterMessagingToolMediaDuplicates is a .map() (same-length output), so the index-based re-pairing at line 223–224 is safe.
  • block-reply-pipeline.ts: hasSentContentKey cleanly separates pre-computed key lookup from hasSentPayload (payload-object lookup), which is necessary now that the caller sanitizes text before dedup.
  • Test coverage: The sanitizeOutboundText unit tests and buildReplyPayloads integration tests are thorough and cover the exact edge cases (media-with-NO_REPLY-text, channelData-with-Reasoning-text, non-documented Reasoning variants, aborted-streaming raw-key dedup).
  • Unrelated cleanups: Dead test helper removal in model-compat.test.ts and a formatting-only change in message-handler.ts are benign.

Confidence Score: 4/5

  • This PR is safe to merge; the sanitizer logic is correct, index-based mapping is safe, and test coverage is comprehensive.
  • The implementation is well-designed and addresses real production leaks. Previously raised concerns (regex exactness, magic number, falsy-check collapse, index alignment, Set membership) have all been resolved or explicitly accepted by the author. No new logic bugs were found in this review. Score of 4 rather than 5 reflects the inherent complexity of the multi-stage entry-tracking refactor in buildReplyPayloads, which could be a subtle source of future regressions if the entry shape evolves — though the current implementation is correct.
  • No files require special attention — all changed files are in good shape and well-tested.

Last reviewed commit: 797360d

Comment on lines +39 to +41
if (/^Reasoning:?$/i.test(text.trim())) {
return null;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reasoning regex makes colon optional and case-insensitive

The regex /^Reasoning:?$/i uses :? (optional colon) and the i case-insensitive flag. The documented leak pattern is specifically the streaming prefix "Reasoning:" — always capitalized, always with a colon. Making the colon optional means the bare word "Reasoning" (no colon) sent by a user as their entire reply would also be silently suppressed. Similarly, the i flag suppresses "reasoning" or "REASONING" as standalone replies.

Consider requiring the colon and keeping the match case-sensitive to scope the suppression precisely to the known leak pattern:

Suggested change
if (/^Reasoning:?$/i.test(text.trim())) {
return null;
}
if (/^Reasoning:$/.test(text.trim())) {
return null;
}
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/auto-reply/reply/reply-utils.ts
Line: 39-41

Comment:
**Reasoning regex makes colon optional and case-insensitive**

The regex `/^Reasoning:?$/i` uses `:?` (optional colon) and the `i` case-insensitive flag. The documented leak pattern is specifically the streaming prefix `"Reasoning:"` — always capitalized, always with a colon. Making the colon optional means the bare word `"Reasoning"` (no colon) sent by a user as their entire reply would also be silently suppressed. Similarly, the `i` flag suppresses `"reasoning"` or `"REASONING"` as standalone replies.

Consider requiring the colon and keeping the match case-sensitive to scope the suppression precisely to the known leak pattern:

```suggestion
  if (/^Reasoning:$/.test(text.trim())) {
    return null;
  }
```

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already exact at the current head, and this was locked in b3ad9ed68f58e39c2264693702205acb93b91bdc. The regression coverage now proves Reasoning and REASONING: still pass through while only exact Reasoning: is suppressed.


function isJsonWrappedNoReply(text: string): boolean {
const trimmed = text.trim();
if (trimmed.length === 0 || trimmed.length > 200) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Magic number for JSON payload length threshold

The value 200 is used as a hard cutoff for what counts as a "lightweight" JSON NO_REPLY payload, but it is not documented or extracted into a named constant. A reader has no way to know where 200 came from or whether it was intentional. There is also an off-by-one: a 200-character string passes the trimmed.length > 200 check (it equals 200, not greater), while a 201-character {"action":"NO_REPLY",...} slips through unsuppressed.

Consider extracting the constant with a comment explaining the rationale:

Suggested change
if (trimmed.length === 0 || trimmed.length > 200) {
if (trimmed.length === 0 || trimmed.length > MAX_NO_REPLY_JSON_BYTES) {
return false;
}

and at module level:

/** Maximum byte length of a JSON payload that can be treated as a bare NO_REPLY sentinel. */
const MAX_NO_REPLY_JSON_BYTES = 200;
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/auto-reply/reply/reply-utils.ts
Line: 5

Comment:
**Magic number for JSON payload length threshold**

The value `200` is used as a hard cutoff for what counts as a "lightweight" JSON NO_REPLY payload, but it is not documented or extracted into a named constant. A reader has no way to know where 200 came from or whether it was intentional. There is also an off-by-one: a 200-character string passes the `trimmed.length > 200` check (it equals 200, not greater), while a 201-character `{"action":"NO_REPLY",...}` slips through unsuppressed.

Consider extracting the constant with a comment explaining the rationale:

```suggestion
  if (trimmed.length === 0 || trimmed.length > MAX_NO_REPLY_JSON_BYTES) {
    return false;
  }
```

and at module level:
```ts
/** Maximum byte length of a JSON payload that can be treated as a bare NO_REPLY sentinel. */
const MAX_NO_REPLY_JSON_BYTES = 200;
```

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 5b1bc7ed48850f54d4ebecba1c8b323fdbbfc7f8. That commit extracts NO_REPLY_JSON_SENTINEL_MAX_LENGTH and documents why only tiny control-like JSON blobs are treated as sentinels.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: abbf9dcea3

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

});
provider = modelState.provider;
model = modelState.model;
resolvedReasoningLevel ??= await modelState.resolveDefaultReasoningLevel();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Persist explicit reasoning off overrides across turns

This line now backfills resolvedReasoningLevel from the model default, which can be "on", but /reasoning off still clears sessionEntry.reasoningLevel instead of storing an explicit off override (src/auto-reply/reply/directive-handling.impl.ts, lines 321-323). In that setup, disabling reasoning only affects the current turn; the next message has no stored override, so this fallback re-enables reasoning, making user intent non-persistent on reasoning-capable models.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is scoped to #37940, not this sanitizer PR. #37940 fixes the catalog-aware resolvedReasoning path for session_status and the synchronous status surfaces.

Comment on lines +383 to +385
const reasoningLevel =
args.resolvedReasoning ??
(args.config ? resolveReasoningDefault({ cfg: args.config, provider, model }) : "off");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Catalog-less fallback misses catalog-only reasoning models

The new resolveReasoningDefault fallback here checks user config but receives no catalog, so any model whose reasoning: true flag comes purely from Pi SDK catalog discovery (not from an explicit entry in the user's models.providers config) will return "off".

The most important production callsite that hits this path is session-status-tool.ts (line 433), which calls buildStatusMessage without supplying resolvedReasoning. The updated system prompt (also in this PR) now tells agents to "ALWAYS call session_status first" for reasoning/model questions — yet the session_status tool will report Reasoning: off for any user running gpt-5.4 via catalog-based discovery rather than an explicit config entry.

The /status command path already does the right thing (resolvedReasoning: resolvedReasoningLevel ?? (await resolveDefaultReasoningLevel())), so the inconsistency is specifically in the session_status tool path. To close the gap, session-status-tool.ts should resolve and pass resolvedReasoning the same way commands-status.ts does.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/auto-reply/status.ts
Line: 383-385

Comment:
**Catalog-less fallback misses catalog-only reasoning models**

The new `resolveReasoningDefault` fallback here checks user config but receives no catalog, so any model whose `reasoning: true` flag comes purely from Pi SDK catalog discovery (not from an explicit entry in the user's `models.providers` config) will return `"off"`.

The most important production callsite that hits this path is `session-status-tool.ts` (line 433), which calls `buildStatusMessage` without supplying `resolvedReasoning`. The updated system prompt (also in this PR) now tells agents to "ALWAYS call session_status first" for reasoning/model questions — yet the `session_status` tool will report `Reasoning: off` for any user running gpt-5.4 via catalog-based discovery rather than an explicit config entry.

The `/status` command path already does the right thing (`resolvedReasoning: resolvedReasoningLevel ?? (await resolveDefaultReasoningLevel())`), so the inconsistency is specifically in the `session_status` tool path. To close the gap, `session-status-tool.ts` should resolve and pass `resolvedReasoning` the same way `commands-status.ts` does.

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also scoped to #37940, not this sanitizer PR. #37940 threads catalog-aware reasoning resolution through session_status and the other synchronous status surfaces.

@who96 who96 force-pushed the fix/unified-outbound-sanitizer branch from 23b07b1 to 5b1bc7e Compare March 6, 2026 18:12
@openclaw-barnacle openclaw-barnacle bot added size: M and removed agents Agent runtime and tooling size: L labels Mar 6, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5b1bc7ed48

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

}

function isDocumentedReasoningLeak(text: string): boolean {
return text.trim() === DOCUMENTED_REASONING_STREAM_LEAK;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Restrict bare Reasoning: suppression to leak contexts

This check suppresses any outbound message whose trimmed text is exactly Reasoning:, regardless of whether it came from a leaked reasoning stream. In normal chat flows, a legitimate user-requested reply like “send exactly Reasoning:” will now be dropped by buildReplyPayloads (text becomes undefined and the payload is filtered out), which is a behavior regression from prior commits; consider gating this rule on an explicit reasoning/leak signal instead of raw text alone.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR intentionally matches the documented leak token at the shared outbound boundary. Adding explicit leak-context metadata would require a broader reply-pipeline change and is outside the scope of this sanitizer PR.

@who96
Copy link
Copy Markdown
Author

who96 commented Mar 6, 2026

@greptile-apps review - New commit b3ad9ed adds negative test cases locking the exact Reasoning: matcher. The implementation already uses strict equality (text.trim() === 'Reasoning:'), not the regex pattern flagged in the previous review. REASONING:/reasoning: variants are confirmed to pass through. The other two comments (session-status resolvedReasoning and duplicate constants) are scoped to separate PRs #37940 and #38173 respectively.

const text = sanitizeOutboundText(normalized.payload.text);
return {
...normalized.payload,
text: text ? text : undefined,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Falsy check discards empty-string text that sanitizer explicitly preserves

sanitizeOutboundText has a dedicated early-return for empty strings — if (text === "") return text — specifically to distinguish "preserve as-is" ("") from "actively suppress" (null). The three-valued contract is:

  • null → suppress (caller should drop the field)
  • undefined → input was undefined (pass-through)
  • string (including "") → keep unchanged

Using text ? text : undefined collapses the null and "" cases into the same outcome (undefined). A payload arriving with text: "" plus a mediaUrl or channelData will silently lose its (admittedly empty) text field, even though the sanitizer said to preserve it.

The straightforward fix is a null-specific check:

Suggested change
text: text ? text : undefined,
text: text !== null ? text : undefined,

Or equivalently, text ?? undefined works too, since ?? only short-circuits on null/undefined, not on "".

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/auto-reply/reply/agent-runner-payloads.ts
Line: 86

Comment:
**Falsy check discards empty-string text that sanitizer explicitly preserves**

`sanitizeOutboundText` has a dedicated early-return for empty strings — `if (text === "") return text` — specifically to distinguish "preserve as-is" (`""`) from "actively suppress" (`null`). The three-valued contract is:
- `null` → suppress (caller should drop the field)
- `undefined` → input was undefined (pass-through)
- `string` (including `""`) → keep unchanged

Using `text ? text : undefined` collapses the `null` and `""` cases into the same outcome (`undefined`). A payload arriving with `text: ""` plus a `mediaUrl` or `channelData` will silently lose its (admittedly empty) text field, even though the sanitizer said to preserve it.

The straightforward fix is a null-specific check:

```suggestion
      text: text !== null ? text : undefined,
```

Or equivalently, `text ?? undefined` works too, since `??` only short-circuits on `null`/`undefined`, not on `""`.

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left that collapse intentional in this PR. The sanitizer only needs null to suppress leaked text at the shared outbound boundary, and preserving empty-string payload shape exactly would be a separate payload-normalization follow-up.

@who96
Copy link
Copy Markdown
Author

who96 commented Mar 7, 2026

Maintainer note: #38659 and #38689 overlap with this PR at the symptom level, but both are Telegram-specific fixes. #38232 is intentionally scoped one layer lower at the shared outbound reply boundary, so the NO_REPLY / bare Reasoning leak suppression is inherited by existing and future channels instead of being patched per channel.

@who96
Copy link
Copy Markdown
Author

who96 commented Mar 7, 2026

Note: this PR provides a unified sanitizer that covers the same NO_REPLY suppression pattern fixed in #32183 (suppress NO_REPLY token in chat transcript rendering by @ademczuk), but applied at the shared reply-utils layer so all channels (Telegram, Discord, WhatsApp, webchat) benefit without per-channel patches.

Related per-channel fixes this supersedes: #37904, #37891, #37892.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1e50878b3a

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

blockReplyPipeline: null,
replyToMode: "first",
...overrides,
}).replyPayloads;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Await buildReplyPayloads before reading replyPayloads

buildReplyPayloads returns a Promise, but this helper accesses .replyPayloads synchronously, so build(...) becomes undefined instead of an array. That means the new outbound-sanitizer tests in this block are not exercising the intended code path (and will fail/assert on wrong values), which undermines coverage for the regression this commit is trying to prevent; the helper and related assertions should await the async call.

Useful? React with 👍 / 👎.

@who96 who96 force-pushed the fix/unified-outbound-sanitizer branch from fd4dc6a to 3652483 Compare March 12, 2026 21:33
@who96
Copy link
Copy Markdown
Author

who96 commented Mar 12, 2026

The async issue flagged by the Codex review on commit 1e50878 has been fixed in the current head (fd4dc6a) — all buildReplyPayloads calls in reply-utils.test.ts now properly await the result (lines 848 and 924). 45 tests pass.

@chatgpt-codex-connector review — please re-evaluate against the latest commit.

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Can't wait for the next one!

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@who96 who96 force-pushed the fix/unified-outbound-sanitizer branch from 3652483 to 7384372 Compare March 12, 2026 21:43
@openclaw-barnacle openclaw-barnacle bot added app: macos App: macos size: M and removed size: S labels Mar 12, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 738437206c

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

payload: parsed,
payload: {
...normalized.payload,
text: text ? text : undefined,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve streamed payload keys when sanitizing final text

When block streaming aborts (didStream() === true and isAborted() === true), buildReplyPayloads falls back to final payloads and de-duplicates against blockReplyPipeline.hasSentPayload(...), which uses a content key that includes text (createBlockReplyContentKey in block-reply-pipeline.ts). Sanitizing normalized.payload.text here can change a streamed payload like {"action":"NO_REPLY"} (or bare Reasoning:) into undefined, so the fallback key no longer matches what was already streamed and the same media/channelData payload can be sent again.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Codex Review: Didn't find any major issues. 🎉

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@who96
Copy link
Copy Markdown
Author

who96 commented Mar 14, 2026

@greptileai review

Comment on lines +223 to +229
const mediaFilteredPayloadEntries = dedupedPayloadEntries.flatMap((entry, index) => {
const payload = mediaFilteredPayloads[index];
if (!payload || !isRenderablePayload(payload)) {
return [];
}
return [{ ...entry, payload }];
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Index misalignment when media duplicates are removed mid-array

mediaFilteredPayloadEntries iterates dedupedPayloadEntries by position and indexes into mediaFilteredPayloads using the same index. But mediaFilteredPayloads is a filtered subset of dedupedPayloads — if any payload is removed from the middle (e.g. payload at index 1 of 3 is a media duplicate), every subsequent entry picks up the wrong payload and the last entry silently disappears.

Concrete example:

  • dedupedPayloads = [payA, payB, payC] — payB has a media URL already sent
  • mediaFilteredPayloads = [payA, payC] — payB removed
  • Iteration: dedupedPayloadEntries[1] (entryB) gets mediaFilteredPayloads[1] = payC (wrong), while entryC's mediaFilteredPayloads[2] is undefined → dropped entirely

The effect is that entryC is silently lost and payC is emitted with entryB's rawText, which can cause the content-key dedup in the block-streaming path to compare the wrong raw text against the pipeline's sentContentKeys.

The dedupedPayloadSet approach used a few lines above is the correct pattern — match by object identity rather than position:

const mediaFilteredPayloadSet = new Set(mediaFilteredPayloads);
const mediaFilteredPayloadEntries = dedupedPayloadEntries.filter(
  (entry) => mediaFilteredPayloadSet.has(entry.payload),
);
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/auto-reply/reply/agent-runner-payloads.ts
Line: 223-229

Comment:
**Index misalignment when media duplicates are removed mid-array**

`mediaFilteredPayloadEntries` iterates `dedupedPayloadEntries` by position and indexes into `mediaFilteredPayloads` using the same index. But `mediaFilteredPayloads` is a filtered subset of `dedupedPayloads` — if any payload is removed from the middle (e.g. payload at index 1 of 3 is a media duplicate), every subsequent entry picks up the wrong payload and the last entry silently disappears.

Concrete example:
- `dedupedPayloads = [payA, payB, payC]` — payB has a media URL already sent
- `mediaFilteredPayloads = [payA, payC]` — payB removed
- Iteration: `dedupedPayloadEntries[1]` (entryB) gets `mediaFilteredPayloads[1]` = **payC** (wrong), while entryC's `mediaFilteredPayloads[2]` is `undefined` → dropped entirely

The effect is that entryC is silently lost and payC is emitted with entryB's `rawText`, which can cause the content-key dedup in the block-streaming path to compare the wrong raw text against the pipeline's `sentContentKeys`.

The `dedupedPayloadSet` approach used a few lines above is the correct pattern — match by object identity rather than position:

```ts
const mediaFilteredPayloadSet = new Set(mediaFilteredPayloads);
const mediaFilteredPayloadEntries = dedupedPayloadEntries.filter(
  (entry) => mediaFilteredPayloadSet.has(entry.payload),
);
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +1109 to 1141
public let transport: String

public init(
ok: Bool,
status: Int,
apnsid: String?,
reason: String?,
tokensuffix: String,
topic: String,
environment: String)
environment: String,
transport: String)
{
self.ok = ok
self.status = status
self.apnsid = apnsid
self.reason = reason
self.tokensuffix = tokensuffix
self.topic = topic
self.environment = environment
self.transport = transport
}

private enum CodingKeys: String, CodingKey {
case ok
case status
case apnsid = "apnsId"
case reason
case tokensuffix = "tokenSuffix"
case topic
case environment
case transport
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated model change included in sanitizer PR

The addition of transport: String to PushTestResult (and the identical change in apps/shared/OpenClawKit/Sources/OpenClawProtocol/GatewayModels.swift) appears unrelated to the outbound text sanitizer described in this PR. The PR description, linked issues, and background context all focus on the auto-reply NO_REPLY/Reasoning leak fix; there is no mention of push test transport metadata.

Because transport is a non-optional String it is a source-breaking change for any Swift call-sites that construct PushTestResult directly. If this change belongs to a different fix or feature, landing it here makes it harder to revert either change independently and adds noise to the sanitizer's review surface.

Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/macos/Sources/OpenClawProtocol/GatewayModels.swift
Line: 1109-1141

Comment:
**Unrelated model change included in sanitizer PR**

The addition of `transport: String` to `PushTestResult` (and the identical change in `apps/shared/OpenClawKit/Sources/OpenClawProtocol/GatewayModels.swift`) appears unrelated to the outbound text sanitizer described in this PR. The PR description, linked issues, and background context all focus on the auto-reply NO_REPLY/Reasoning leak fix; there is no mention of push test transport metadata.

Because `transport` is a non-optional `String` it is a source-breaking change for any Swift call-sites that construct `PushTestResult` directly. If this change belongs to a different fix or feature, landing it here makes it harder to revert either change independently and adds noise to the sanitizer's review surface.

How can I resolve this? If you propose a fix, please make it concise.

@who96 who96 force-pushed the fix/unified-outbound-sanitizer branch from 433e867 to 0e77ecf Compare March 14, 2026 19:06
@openclaw-barnacle openclaw-barnacle bot added gateway Gateway runtime agents Agent runtime and tooling and removed app: macos App: macos labels Mar 14, 2026
@who96
Copy link
Copy Markdown
Author

who96 commented Mar 14, 2026

Fixed both issues from the last review:

  1. Index misalignment in mediaFilteredPayloadEntries — replaced positional indexing with Set-based object identity filtering (matching the dedupedPayloadSet pattern above).
  2. Unrelated Swift changes — already resolved after rebasing onto latest main.

@greptileai review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fb91f562e0

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +223 to +225
const mediaFilteredPayloadSet = new Set(mediaFilteredPayloads);
const mediaFilteredPayloadEntries = dedupedPayloadEntries.filter((entry) =>
mediaFilteredPayloadSet.has(entry.payload),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve entries when media dedupe rewrites payload objects

buildReplyPayloads rebuilds mediaFilteredPayloadEntries by checking object identity (new Set(mediaFilteredPayloads) + has(entry.payload)), but filterMessagingToolMediaDuplicates creates new payload objects when it strips duplicate media. That means rewritten payloads never match their original entry.payload references and get dropped entirely, so a reply like {text:"hello", mediaUrl:"..."} with a deduped media URL can become no final payload instead of preserving the text-only reply.

Useful? React with 👍 / 👎.

Comment on lines +223 to +226
const mediaFilteredPayloadSet = new Set(mediaFilteredPayloads);
const mediaFilteredPayloadEntries = dedupedPayloadEntries.filter((entry) =>
mediaFilteredPayloadSet.has(entry.payload),
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Set membership check breaks for media-stripped payloads

filterMessagingToolMediaDuplicates is implemented with payloads.map(...). When a payload has a duplicate media URL to strip it returns a new object ({...payload, mediaUrl: undefined, ...}); only unchanged payloads return the original reference (return payload; // No change at reply-payloads.ts:152).

Because mediaFilteredPayloadSet is built from those potentially-new references, and dedupedPayloadEntries still holds the original entry.payload references, the has() check silently fails for every payload that had its media URL stripped:

dedupedPayloadEntries[i].payload  →  originalPayloadRef
mediaFilteredPayloads[i]          →  newPayloadRef  (different object after .map spread)
mediaFilteredPayloadSet.has(originalPayloadRef)  →  false  ← entry dropped!

The consequence is that any payload whose media URL was deduplicated — but still has non-empty text or channelData that should be delivered — is completely silently dropped from replyPayloads. Before this refactor, mediaFilteredPayloads flowed directly to filteredPayloads, so the modified payload (with stripped media URL) was still sent. After this refactor it is lost.

The filterMessagingToolDuplicates path does not have this problem because it uses .filter() and always preserves object references.

One fix: track entries by content key rather than object reference for the media-filter step, or perform the media-dedup directly on the entries array so the same object always flows through:

// Instead of a separate media-filtered payloads array, apply the media filter
// directly on entries and let each entry carry a potentially-updated payload.
const mediaFilteredPayloadEntries = dedupeMessagingToolPayloads
  ? applyMediaFilterToEntries(dedupedPayloadEntries, messagingToolSentMediaUrls)
  : dedupedPayloadEntries;
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/auto-reply/reply/agent-runner-payloads.ts
Line: 223-226

Comment:
**Set membership check breaks for media-stripped payloads**

`filterMessagingToolMediaDuplicates` is implemented with `payloads.map(...)`. When a payload has a duplicate media URL to strip it returns a **new object** (`{...payload, mediaUrl: undefined, ...}`); only unchanged payloads return the original reference (`return payload; // No change` at `reply-payloads.ts:152`).

Because `mediaFilteredPayloadSet` is built from those potentially-new references, and `dedupedPayloadEntries` still holds the original `entry.payload` references, the `has()` check silently fails for every payload that had its media URL stripped:

```
dedupedPayloadEntries[i].payload  →  originalPayloadRef
mediaFilteredPayloads[i]          →  newPayloadRef  (different object after .map spread)
mediaFilteredPayloadSet.has(originalPayloadRef)  →  false  ← entry dropped!
```

The consequence is that any payload whose media URL was deduplicated — but still has non-empty text or `channelData` that should be delivered — is completely silently dropped from `replyPayloads`. Before this refactor, `mediaFilteredPayloads` flowed directly to `filteredPayloads`, so the modified payload (with stripped media URL) was still sent. After this refactor it is lost.

The `filterMessagingToolDuplicates` path does not have this problem because it uses `.filter()` and always preserves object references.

One fix: track entries by content key rather than object reference for the media-filter step, or perform the media-dedup directly on the entries array so the same object always flows through:

```ts
// Instead of a separate media-filtered payloads array, apply the media filter
// directly on entries and let each entry carry a potentially-updated payload.
const mediaFilteredPayloadEntries = dedupeMessagingToolPayloads
  ? applyMediaFilterToEntries(dedupedPayloadEntries, messagingToolSentMediaUrls)
  : dedupedPayloadEntries;
```

How can I resolve this? If you propose a fix, please make it concise.

@who96
Copy link
Copy Markdown
Author

who96 commented Mar 14, 2026

Fixed the media dedup regression: applied filterMessagingToolMediaDuplicates directly on the entries array so each entry carries the potentially-updated payload reference. No more Set-based identity mismatch when .map() + spread creates new objects.

@greptileai review

@who96 who96 force-pushed the fix/unified-outbound-sanitizer branch from 797360d to 006ce8b Compare March 15, 2026 13:36
@openclaw-barnacle openclaw-barnacle bot removed gateway Gateway runtime agents Agent runtime and tooling labels Mar 15, 2026
@who96
Copy link
Copy Markdown
Author

who96 commented Mar 15, 2026

Rebased onto latest main (c4265a5) to resolve merge conflicts. Two earlier commits dropped as already obsoleted by upstream changes. All 5 substantive fix commits preserved. CI re-running.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4608103a7c

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

payload: normalized.payload,
normalizeMediaPaths: params.normalizeMediaPaths,
});
const text = sanitizeOutboundText(rawPayload.text);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Sanitize block-streamed text before emitting replies

This sanitizer is only executed in buildReplyPayloads, but when block streaming succeeds the same function drops all final payloads (shouldDropFinalPayloads), so this path is skipped and streamed chunks are emitted from createBlockReplyDeliveryHandler via normalizeStreamingText in src/auto-reply/reply/agent-runner-execution.ts without sanitizeOutboundText. In block-streaming environments, leaked control text like {"action":"NO_REPLY"} or bare Reasoning: will still be delivered to users, so the fix does not cover the primary streamed delivery path.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. The streaming path (normalizeStreamingText in agent-runner-execution.ts) does bypass this sanitizer. This PR only covers the final payload path in buildReplyPayloads. Fixing the block streaming path would also require changes in createBlockReplyDeliveryHandler, so I kept that out of scope here. I can open a follow-up for that.

@who96
Copy link
Copy Markdown
Author

who96 commented Mar 16, 2026

Addressing remaining bot review conversations:


Re: Await buildReplyPayloads before reading replyPayloads (Codex P2, reply-utils.test.ts)
Good catch — the test helper calls buildReplyPayloads without await, so assertions run against undefined instead of the populated array. Will fix the test helper to properly await the async call.

Re: Index misalignment when media duplicates are removed mid-array (Greptile, agent-runner-payloads.ts)
Valid — mediaFilteredPayloads is a filtered subset but is indexed positionally against dedupedPayloadEntries, causing index drift when midl entries are dropped. This needs a join-by-identity approach instead of positional indexing.

Re: Preserve entries when media dedupe rewrites payload objects (Codex P1) + Set membership check breaks for media-stripped payloads (Greptile)
These two flag the same root cause: filterMessagingToolMediaDuplicates creates new objects for stripped payloads, but mediaFilteredPayloadSet uses Set.has() with object identity. The original entry.payload references won't match the new objects, silently dropping text-only payloads that had media stripped. Fix: track by index or use a Map instead of Set identity.

Re: Unrelated model change included in sanitizer PR (Greptile, GatewayModels.swift)
Acknowledged — the transport: String addition to PushTestResult was an unrelated change that leaked into this branch. Will remove it to keep the PR focused on the sanitizer.

Re: Codex Review: No major issues 🎉 (agent-runner-payloads.ts)
Noted — this was from an earlier revision before the media dedupe issues were identified in the subsequent review round.

@who96 who96 force-pushed the fix/unified-outbound-sanitizer branch 4 times, most recently from daff254 to 7306ac9 Compare March 18, 2026 15:17
who96 added 5 commits March 19, 2026 23:14
Positional indexing into mediaFilteredPayloads broke when
filterMessagingToolMediaDuplicates removed elements mid-array,
causing subsequent entries to pair with wrong payloads and the
last entry to be silently dropped.

Use Set-based object identity filtering (matching the
dedupedPayloadSet pattern) instead.
…yload refs

filterMessagingToolMediaDuplicates uses .map() + spread, creating new
object references when stripping duplicate media URLs. The previous
Set-based identity check (Set.has(entry.payload)) silently dropped
these rewritten payloads because the new references didn't match.

Now apply media filtering on entries directly so each entry carries the
potentially-updated payload, then filter out non-renderable entries.
@who96 who96 force-pushed the fix/unified-outbound-sanitizer branch from 7306ac9 to 0f2b02e Compare March 19, 2026 15:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

1 participant