sanitizeUserFacingText false positive: normal assistant replies containing billing keywords get replaced with the billing error message #11649

@jabezborja

Description

Bug Description

sanitizeUserFacingText() in pi-embedded-helpers runs isBillingErrorMessage() against ordinary assistant reply text before delivering it to a channel (e.g., Telegram). When a reply naturally contains billing-related keywords (for example, when the assistant discusses API billing, credits, or payment issues), the check produces a false positive and the sanitizer replaces the entire reply with the canned billing error string:

⚠️ API provider returned a billing error — your API key has run out of credits or has an insufficient balance...

Steps to Reproduce

  1. Configure a Telegram channel with streamMode: "partial"
  2. Send a message that causes the assistant to discuss billing, credits, or API payment topics
  3. The assistant's reply streams correctly to webchat (via WebSocket, pre-sanitization)
  4. The reply delivered to Telegram is replaced with the billing error message

Root Cause

In pi-embedded-helpers, sanitizeUserFacingText() calls isBillingErrorMessage() on all assistant text, not just error payloads:

function sanitizeUserFacingText(text) {
  // ...
  if (isBillingErrorMessage(trimmed)) return BILLING_ERROR_USER_MESSAGE;
  // ...
}

And isBillingErrorMessage() matches broadly on keyword combinations:

function isBillingErrorMessage(raw) {
  const value = raw.toLowerCase();
  if (matchesErrorPatterns(value, ERROR_PATTERNS.billing)) return true;
  return value.includes("billing") && 
    (value.includes("upgrade") || value.includes("credits") || 
     value.includes("payment") || value.includes("plan"));
}

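This means any assistant reply that mentions "billing" together with "upgrade", "credits", "payment", or "plan" (or that matches one of the ERROR_PATTERNS.billing patterns, which appear to cover phrases like "insufficient balance") will be flagged and replaced, even when the assistant is legitimately discussing those topics in conversation. Because includes() is a plain substring check, even a word like "explanation" satisfies the "plan" condition.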
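To make the breadth of the fallback concrete, here is a stand-alone copy of only the keyword branch of isBillingErrorMessage() quoted above, run against a few made-up conversational replies. ERROR_PATTERNS.billing is not shown in this issue, so that branch is deliberately left out; the function name billingKeywordFallback is just a label for this sketch.

```javascript
// Stand-alone copy of the keyword branch of isBillingErrorMessage() quoted above.
// ERROR_PATTERNS.billing is not shown in this issue, so that branch is omitted.
function billingKeywordFallback(raw) {
  const value = raw.toLowerCase();
  return (
    value.includes("billing") &&
    (value.includes("upgrade") ||
      value.includes("credits") ||
      value.includes("payment") ||
      value.includes("plan"))
  );
}

// Ordinary conversational replies (made-up examples).
const replies = [
  "Your billing page lists the credits left on your plan.",
  "Here is an explanation of the billing error you saw.", // "explanation" contains "plan"
  "To fix billing, update your payment method in the console.",
  "I'd plan the migration before the next release.", // no "billing", so not flagged
];

for (const reply of replies) {
  // true means the whole reply would be swapped for the canned billing message
  console.log(billingKeywordFallback(reply), reply);
}
```

The second reply is exactly the kind of answer a user gets after asking about the billing error, which is how the loop described under Impact starts.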

Expected Behavior

sanitizeUserFacingText() should only replace text that is an actual API error payload, not normal conversational text. Possible fixes:

  1. Only run error sanitization on messages flagged as errors (e.g., via a metadata field or error role), not on all assistant text
  2. Require stricter pattern matching (e.g., JSON error payloads, HTTP status codes) rather than broad keyword matching
  3. Add a minimum confidence threshold or structural check, since error payloads follow specific formats that natural language does not
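A minimal sketch combining fixes 1 and 2, assuming the reply reaches the sanitizer as an object with a hypothetical `isError` flag. `sanitizeReply`, `looksLikeErrorPayload`, and the `isError` field are illustrative names, not existing OpenClaw APIs; the keyword check is copied unchanged from the excerpt above, so the only change is *when* it runs. The canned message is a placeholder because the issue quotes it only partially, and the JSON payload below is a made-up example.

```javascript
// Placeholder: the canned string is only partially quoted in the issue.
const BILLING_ERROR_USER_MESSAGE = "[billing error message]";

// Keyword fallback copied from the isBillingErrorMessage() excerpt above
// (ERROR_PATTERNS.billing is not shown in the issue, so it is omitted).
function billingKeywordFallback(raw) {
  const value = raw.toLowerCase();
  return (
    value.includes("billing") &&
    (value.includes("upgrade") ||
      value.includes("credits") ||
      value.includes("payment") ||
      value.includes("plan"))
  );
}

// Fix 2 (structural check, hypothetical helper): treat text as an error
// payload only if it parses as a JSON object carrying an `error` object.
function looksLikeErrorPayload(text) {
  const trimmed = text.trim();
  if (!trimmed.startsWith("{")) return false;
  try {
    const parsed = JSON.parse(trimmed);
    return typeof parsed.error === "object" && parsed.error !== null;
  } catch {
    return false; // not valid JSON, so treat it as conversational text
  }
}

// Fix 1 + 2: keyword matching runs only for replies explicitly flagged as
// errors (hypothetical `isError` field) or shaped like an error payload.
function sanitizeReply(reply) {
  const isErrorCandidate = reply.isError === true || looksLikeErrorPayload(reply.text);
  if (isErrorCandidate && billingKeywordFallback(reply.text)) {
    return BILLING_ERROR_USER_MESSAGE;
  }
  return reply.text;
}

const chat = { text: "Your billing page shows the credits left on your plan." };
const payload = {
  text: '{"error":{"type":"billing_error","message":"Insufficient credits. Please upgrade your plan."}}',
};
console.log(sanitizeReply(chat)); // unchanged conversational text
console.log(sanitizeReply(payload)); // "[billing error message]"
```

Gating on an explicit error flag is the more robust of the two; the JSON check is a fallback for paths where no flag is available. An HTTP status check (fix 2's other example) could be added to `looksLikeErrorPayload` in the same way.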

Environment

  • OpenClaw version: 2026.2.6-1
  • Channel: Telegram (streamMode: "partial")
  • Model: anthropic/claude-opus-4-6
  • OS: macOS (Darwin arm64)

Impact

Every reply containing the trigger keywords is silently replaced on Telegram, making the assistant appear broken. The user sees the billing error repeatedly while the actual reply is visible only in webchat. This is particularly ironic: once the user asks about the billing error, every subsequent reply about the issue also triggers the false positive, creating a self-reinforcing loop of confusion.

Metadata

Labels

bug (Something isn't working)