Skip to content

fix: redact raw error messages from channel-facing agent failure replies#51308

Open
henry-the-frog wants to merge 1 commit intoopenclaw:mainfrom
henry-the-frog:fix/redact-preflight-errors-from-channels
Open

fix: redact raw error messages from channel-facing agent failure replies#51308
henry-the-frog wants to merge 1 commit intoopenclaw:mainfrom
henry-the-frog:fix/redact-preflight-errors-from-channels

Conversation

@henry-the-frog
Copy link
Copy Markdown

Summary

Fixes #51275 — agent preflight errors (sandbox security violations, config issues, etc.) were being forwarded verbatim to channels like Discord, leaking internal filesystem paths, OS usernames, and sandbox configuration details.

Changes

src/auto-reply/reply/agent-runner-execution.ts:

  • Added isSandboxOrSecurity classifier for sandbox security error messages
  • Sandbox/security errors now show: "Agent failed to start due to a configuration error"
  • All other unclassified errors now show: "Agent failed before reply" (generic, no raw message)
  • Removed unused safeMessage/trimmedMessage variables (error details are no longer interpolated into user-facing text)
  • Full error details are still logged via defaultRuntime.error() and visible in openclaw logs --follow

src/auto-reply/reply.triggers...e2e.test.ts:

  • Updated expected error text to match new generic format

Security Impact

Before: channel messages could contain:

  • Full filesystem paths including OS username
  • Sandbox configuration details (bind mounts, allowed roots)
  • Internal workspace directory structure

After: all error details stay in operator logs only.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 20, 2026

Greptile Summary

This PR fixes a security issue where raw agent error messages — containing internal filesystem paths, OS usernames, and sandbox configuration details — were being forwarded verbatim to external channels (e.g., Discord). The fix replaces all raw error interpolation in user-facing fallback replies with static, generic strings, while preserving full error details in operator logs via defaultRuntime.error().

Key changes:

  • Added an isSandboxOrSecurity classifier (/sandbox security|outside allowed roots/i) to show a slightly more descriptive "configuration error" message for that class of failures.
  • All other unclassified errors now produce a fully generic "Agent failed before reply." message — no raw error text is ever sent to channels.
  • Removed the now-unnecessary safeMessage/trimmedMessage variables that were previously used to sanitize and interpolate error details.
  • The existing test case for "sandbox is not defined." validates the generic fallback correctly, but the new isSandboxOrSecurity-specific message ("Agent failed to start due to a configuration error...") has no test coverage — a case with e.g. "sandbox security violation" or "outside allowed roots" in the error string should be added.

Confidence Score: 4/5

  • Safe to merge — the security fix is correct and the fallback chain is strictly safer than before; the only gap is missing test coverage for the new isSandboxOrSecurity branch.
  • The logic change is minimal and directionally correct: raw error text is never forwarded to channels in any code path. The isSandboxOrSecurity classifier is additive (missing matches fall to the safe generic message), so the regex scope doesn't introduce regression risk. Score is 4 rather than 5 only because the new classifier branch lacks a dedicated test case.
  • The test file (src/auto-reply/reply.triggers.trigger-handling.targets-active-session-native-stop.e2e.test.ts) is missing a test case that exercises the isSandboxOrSecurity branch in agent-runner-execution.ts.

Comments Outside Diff (1)

  1. src/auto-reply/reply.triggers.trigger-handling.targets-active-session-native-stop.e2e.test.ts, line 153-163 (link)

    P2 Missing test case for isSandboxOrSecurity branch

    The test updates the expected output for "sandbox is not defined.", which correctly hits the generic fallback because "sandbox is not defined." doesn't match the new /sandbox security|outside allowed roots/i regex. However, the new isSandboxOrSecurity classifier branch — which produces the distinct "⚠️ Agent failed to start due to a configuration error. Check \openclaw logs --follow` for details."` message — has no test coverage at all.

    Consider adding a test case that exercises the actual patterns the regex targets, e.g.:

    Without this, a future refactor of the regex could silently break the specific "configuration error" UX without any test catching it.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: src/auto-reply/reply.triggers.trigger-handling.targets-active-session-native-stop.e2e.test.ts
    Line: 153-163
    
    Comment:
    **Missing test case for `isSandboxOrSecurity` branch**
    
    The test updates the expected output for `"sandbox is not defined."`, which correctly hits the generic fallback because `"sandbox is not defined."` doesn't match the new `/sandbox security|outside allowed roots/i` regex. However, the new `isSandboxOrSecurity` classifier branch — which produces the distinct `"⚠️ Agent failed to start due to a configuration error. Check \`openclaw logs --follow\` for details."` message — has no test coverage at all.
    
    Consider adding a test case that exercises the actual patterns the regex targets, e.g.:
    
    
    
    Without this, a future refactor of the regex could silently break the specific "configuration error" UX without any test catching it.
    
    How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/auto-reply/reply.triggers.trigger-handling.targets-active-session-native-stop.e2e.test.ts
Line: 153-163

Comment:
**Missing test case for `isSandboxOrSecurity` branch**

The test updates the expected output for `"sandbox is not defined."`, which correctly hits the generic fallback because `"sandbox is not defined."` doesn't match the new `/sandbox security|outside allowed roots/i` regex. However, the new `isSandboxOrSecurity` classifier branch — which produces the distinct `"⚠️ Agent failed to start due to a configuration error. Check \`openclaw logs --follow\` for details."` message — has no test coverage at all.

Consider adding a test case that exercises the actual patterns the regex targets, e.g.:

```suggestion
      const errorCases = [
        {
          error: "sandbox is not defined.",
          expected: "⚠️ Agent failed before reply. Check `openclaw logs --follow` for details.",
        },
        {
          error: "sandbox security violation: outside allowed roots /home/user/secret",
          expected:
            "⚠️ Agent failed to start due to a configuration error. Check `openclaw logs --follow` for details.",
        },
        {
          error: "Context window exceeded",
          expected:
            "⚠️ Context overflow — prompt too large for this model. Try a shorter message or a larger-context model.",
        },
      ] as const;
```

Without this, a future refactor of the regex could silently break the specific "configuration error" UX without any test catching it.

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: "fix: redact raw erro..."

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 98c8282e4d

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines 631 to +635
: isRoleOrderingError
? "⚠️ Message ordering conflict - please try again. If this persists, use /new to start a fresh session."
: `⚠️ Agent failed before reply: ${trimmedMessage}.\nLogs: openclaw logs --follow`;
: isSandboxOrSecurity
? "⚠️ Agent failed to start due to a configuration error. Check `openclaw logs --follow` for details."
: "⚠️ Agent failed before reply. Check `openclaw logs --follow` for details.";
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve actionable failures on internal webchat runs

runAgentTurnWithFallback() already distinguishes the internal control surface (webchat) via shouldSurfaceToControlUi, and runReplyAgent() returns this fallback text directly to that chat surface. After this change, the final branch always emits the same generic copy here, so a webchat session that throws before any lifecycle error event is emitted (for example the existing sandbox is not defined path in src/auto-reply/reply.triggers.trigger-handling.targets-active-session-native-stop.e2e.test.ts) loses the only actionable error the operator sees. That makes the commit redact internal debugging output as well, even though the title says the change is for channel-facing replies.

Useful? React with 👍 / 👎.

Comment on lines 631 to +635
: isRoleOrderingError
? "⚠️ Message ordering conflict - please try again. If this persists, use /new to start a fresh session."
: `⚠️ Agent failed before reply: ${trimmedMessage}.\nLogs: openclaw logs --follow`;
: isSandboxOrSecurity
? "⚠️ Agent failed to start due to a configuration error. Check `openclaw logs --follow` for details."
: "⚠️ Agent failed before reply. Check `openclaw logs --follow` for details.";
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep sanitized retry guidance for exhausted transient HTTP failures

This branch still retries isTransientHttp errors once, but if the second attempt also fails it now falls straight into the generic Agent failed before reply text. That regresses the existing safe channel-facing copy from sanitizeUserFacingText() / formatRawAssistantErrorForUi() (for example the friendly 521/529 unavailable/overloaded messages covered in src/agents/pi-embedded-helpers.formatassistanterrortext.test.ts), so common provider outages stop telling users that the failure is transient and retryable. The helper already strips raw HTML/API payloads, so this is a behavior loss, not just a secrecy hardening.

Useful? React with 👍 / 👎.

@henry-the-frog henry-the-frog force-pushed the fix/redact-preflight-errors-from-channels branch from 98c8282 to 1236d8c Compare March 21, 2026 20:46
Preflight errors (sandbox security, config issues, etc.) were being
forwarded verbatim to channels like Discord, leaking filesystem paths,
OS usernames, and sandbox configuration details.

Now:
- Sandbox/security errors get a specific generic message
- All other unclassified errors get a generic message
- Full error details remain in openclaw logs for operator debugging

Fixes openclaw#51275
@henry-the-frog henry-the-frog force-pushed the fix/redact-preflight-errors-from-channels branch from 1236d8c to ba97c76 Compare March 23, 2026 14:33
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ba97c76779

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

const errorCases = [
{
error: "sandbox is not defined.",
expected: "⚠️ Agent failed before reply. Check `openclaw logs --follow` for details.",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Sync the harness mock with the redacted fallback copy

This expectation now asserts the new redacted message, but the file-level runReplyAgent mock still hard-codes the old ⚠️ Agent failed before reply: …\nLogs: … format in src/auto-reply/reply.triggers.trigger-handling.targets-active-session-native-stop.e2e.test.ts:42-47. As written, the first error case will fail even if the runtime change is correct, so this regression test no longer validates the behavior the commit is trying to lock in.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Sandbox security errors posted as channel messages — leaks internal paths and PII

1 participant