fix: redact raw error messages from channel-facing agent failure replies#51308
fix: redact raw error messages from channel-facing agent failure replies#51308henry-the-frog wants to merge 1 commit intoopenclaw:mainfrom
Conversation
Greptile SummaryThis PR fixes a security issue where raw agent error messages — containing internal filesystem paths, OS usernames, and sandbox configuration details — were being forwarded verbatim to external channels (e.g., Discord). The fix replaces all raw error interpolation in user-facing fallback replies with static, generic strings, while preserving full error details in operator logs via Key changes:
Confidence Score: 4/5
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 98c8282e4d
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| : isRoleOrderingError | ||
| ? "⚠️ Message ordering conflict - please try again. If this persists, use /new to start a fresh session." | ||
| : `⚠️ Agent failed before reply: ${trimmedMessage}.\nLogs: openclaw logs --follow`; | ||
| : isSandboxOrSecurity | ||
| ? "⚠️ Agent failed to start due to a configuration error. Check `openclaw logs --follow` for details." | ||
| : "⚠️ Agent failed before reply. Check `openclaw logs --follow` for details."; |
There was a problem hiding this comment.
Preserve actionable failures on internal webchat runs
runAgentTurnWithFallback() already distinguishes the internal control surface (webchat) via shouldSurfaceToControlUi, and runReplyAgent() returns this fallback text directly to that chat surface. After this change, the final branch always emits the same generic copy here, so a webchat session that throws before any lifecycle error event is emitted (for example the existing sandbox is not defined path in src/auto-reply/reply.triggers.trigger-handling.targets-active-session-native-stop.e2e.test.ts) loses the only actionable error the operator sees. That makes the commit redact internal debugging output as well, even though the title says the change is for channel-facing replies.
Useful? React with 👍 / 👎.
| : isRoleOrderingError | ||
| ? "⚠️ Message ordering conflict - please try again. If this persists, use /new to start a fresh session." | ||
| : `⚠️ Agent failed before reply: ${trimmedMessage}.\nLogs: openclaw logs --follow`; | ||
| : isSandboxOrSecurity | ||
| ? "⚠️ Agent failed to start due to a configuration error. Check `openclaw logs --follow` for details." | ||
| : "⚠️ Agent failed before reply. Check `openclaw logs --follow` for details."; |
There was a problem hiding this comment.
Keep sanitized retry guidance for exhausted transient HTTP failures
This branch still retries isTransientHttp errors once, but if the second attempt also fails it now falls straight into the generic Agent failed before reply text. That regresses the existing safe channel-facing copy from sanitizeUserFacingText() / formatRawAssistantErrorForUi() (for example the friendly 521/529 unavailable/overloaded messages covered in src/agents/pi-embedded-helpers.formatassistanterrortext.test.ts), so common provider outages stop telling users that the failure is transient and retryable. The helper already strips raw HTML/API payloads, so this is a behavior loss, not just a secrecy hardening.
Useful? React with 👍 / 👎.
98c8282 to
1236d8c
Compare
Preflight errors (sandbox security, config issues, etc.) were being forwarded verbatim to channels like Discord, leaking filesystem paths, OS usernames, and sandbox configuration details. Now: - Sandbox/security errors get a specific generic message - All other unclassified errors get a generic message - Full error details remain in openclaw logs for operator debugging Fixes openclaw#51275
1236d8c to
ba97c76
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ba97c76779
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| const errorCases = [ | ||
| { | ||
| error: "sandbox is not defined.", | ||
| expected: "⚠️ Agent failed before reply. Check `openclaw logs --follow` for details.", |
There was a problem hiding this comment.
Sync the harness mock with the redacted fallback copy
This expectation now asserts the new redacted message, but the file-level runReplyAgent mock still hard-codes the old ⚠️ Agent failed before reply: …\nLogs: … format in src/auto-reply/reply.triggers.trigger-handling.targets-active-session-native-stop.e2e.test.ts:42-47. As written, the first error case will fail even if the runtime change is correct, so this regression test no longer validates the behavior the commit is trying to lock in.
Useful? React with 👍 / 👎.
Summary
Fixes #51275 — agent preflight errors (sandbox security violations, config issues, etc.) were being forwarded verbatim to channels like Discord, leaking internal filesystem paths, OS usernames, and sandbox configuration details.
Changes
src/auto-reply/reply/agent-runner-execution.ts:isSandboxOrSecurityclassifier for sandbox security error messagessafeMessage/trimmedMessagevariables (error details are no longer interpolated into user-facing text)defaultRuntime.error()and visible inopenclaw logs --followsrc/auto-reply/reply.triggers...e2e.test.ts:Security Impact
Before: channel messages could contain:
After: all error details stay in operator logs only.