LLM error messages are over-normalized: raw error details lost in logs #51387
Description
Problem
When an LLM request fails, formatAssistantErrorText() normalizes many different errors into a single generic message like "LLM request timed out.". The raw error message is never logged, making it impossible to diagnose the actual failure.
Error patterns that all map to "LLM request timed out"
The ERROR_PATTERNS.timeout array matches 15+ patterns:
- `timeout`, `timed out`
- `service unavailable`
- `connection error`, `network error`
- `fetch failed`, `socket hang up`
- `ECONNREFUSED`, `ECONNRESET`, `ECONNABORTED`
- `ETIMEDOUT`, `ENETUNREACH`, `EHOSTUNREACH`
- And more...
These represent very different failure modes (real timeout vs. connection refused vs. network error), but users and operators only see "LLM request timed out."
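To make the collapse concrete, here is a minimal sketch. The pattern table is copied from the list above, but the function name and matching logic are illustrative stand-ins, not the actual OpenClaw implementation:

```typescript
// Illustrative only: a simplified stand-in for ERROR_PATTERNS.timeout
// and formatAssistantErrorText(), not the actual OpenClaw code.
const TIMEOUT_PATTERNS: string[] = [
  "timeout", "timed out", "service unavailable",
  "connection error", "network error",
  "fetch failed", "socket hang up",
  "econnrefused", "econnreset", "econnaborted",
  "etimedout", "enetunreach", "ehostunreach",
];

function formatAssistantErrorTextSketch(raw: string): string {
  const lower = raw.toLowerCase();
  if (TIMEOUT_PATTERNS.some((p) => lower.includes(p))) {
    return "LLM request timed out.";
  }
  return raw;
}

// Three very different failures, one indistinguishable message:
console.log(formatAssistantErrorTextSketch("connect ECONNREFUSED 10.0.0.1:443"));
console.log(formatAssistantErrorTextSketch("read ECONNRESET"));
console.log(formatAssistantErrorTextSketch("getaddrinfo ENETUNREACH api.example.com"));
// each prints "LLM request timed out."
```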
Impact
In our deployment using a custom provider (custom-idealab-alibaba-inc-com), we see frequent "LLM request timed out" errors in gateway logs. Some occur within 0.4 seconds of the request starting — clearly not a 30-second timeout. Without the raw error, we cannot determine whether the issue is:
- An actual timeout
- A connection reset
- A DNS failure
- A TLS error
- The request being aborted by something else
Where the raw error is lost
In handleAgentEnd(), the error flows through:
- `lastAssistant.errorMessage` (raw) → `formatAssistantErrorText()` → `safeErrorText` (normalized)
- `safeErrorText` is what gets logged via `consoleMessage`
- `buildApiErrorObservationFields()` further redacts the raw error
The raw errorMessage is never emitted to any log output.
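One way to preserve the raw message would be to emit it next to the formatted one at the agent-end site. A sketch under stated assumptions: `logAgentEndError` and the exact log-line format are hypothetical, not existing OpenClaw functions:

```typescript
// Hypothetical sketch: keep the raw error next to the normalized one.
// logAgentEndError and the log-line format are assumptions for illustration.
function normalize(raw: string): string {
  return "LLM request timed out."; // stand-in for formatAssistantErrorText()
}

function logAgentEndError(runId: string, rawError: string): string {
  const formatted = normalize(rawError);
  // Both messages in one line: the user-facing text stays stable,
  // but operators can still see what actually failed.
  return `embedded run agent end: runId=${runId} error=${formatted} rawError=${rawError}`;
}

console.log(logAgentEndError("run-1", "read ECONNRESET"));
```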
Suggested fix
- Log the raw error alongside the formatted one, at minimum at debug/warn level:

  `embedded run agent end: runId=... error=LLM request timed out. rawError=<original error>`
- Consider differentiating error categories: instead of mapping everything to "timed out", use distinct user-facing messages:
  - "LLM request timed out (no response within Xs)"
  - "LLM request failed: connection error"
  - "LLM request failed: service unavailable"
- Make the LLM request timeout configurable per provider in `openclaw.json`:

  ```json
  {
    "models": {
      "providers": {
        "my-provider": { "requestTimeoutMs": 120000 }
      }
    }
  }
  ```

  Currently the timeout is hardcoded at 30 seconds (`3e4` in the `GatewayClient` constructor).
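The category split suggested above could be sketched as follows. The pattern groupings are copied from the list in this issue; `classifyLlmError` and `userFacingMessage` are hypothetical helpers, not existing code:

```typescript
// Hypothetical sketch of suggestion 2: distinct user-facing messages
// per failure category instead of one blanket "timed out".
type ErrorCategory = "timeout" | "connection" | "unavailable" | "unknown";

function classifyLlmError(raw: string): ErrorCategory {
  const lower = raw.toLowerCase();
  if (/timed out|timeout|etimedout/.test(lower)) return "timeout";
  if (/econnrefused|econnreset|econnaborted|socket hang up|connection error|network error|fetch failed|enetunreach|ehostunreach/.test(lower)) {
    return "connection";
  }
  if (lower.includes("service unavailable")) return "unavailable";
  return "unknown";
}

function userFacingMessage(raw: string, timeoutSecs: number = 30): string {
  switch (classifyLlmError(raw)) {
    case "timeout":
      return `LLM request timed out (no response within ${timeoutSecs}s)`;
    case "connection":
      return "LLM request failed: connection error";
    case "unavailable":
      return "LLM request failed: service unavailable";
    default:
      return "LLM request failed";
  }
}

console.log(userFacingMessage("connect ECONNREFUSED 10.0.0.1:443"));
console.log(userFacingMessage("request timed out after 30000ms"));
```

With this split, a 0.4-second `ECONNRESET` would surface as a connection error rather than masquerading as a 30-second timeout.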
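For the configurable timeout, resolving a per-provider value with the current 30-second fallback could look like this sketch. The config shape mirrors the `openclaw.json` fragment in this issue; `resolveRequestTimeoutMs` is a hypothetical helper:

```typescript
// Hypothetical sketch of suggestion 3: per-provider timeout lookup
// falling back to today's hardcoded 3e4 (30 000 ms) default.
interface ProviderConfig {
  requestTimeoutMs?: number;
}
interface OpenClawConfig {
  models?: { providers?: Record<string, ProviderConfig> };
}

const DEFAULT_TIMEOUT_MS = 3e4; // the value currently hardcoded in GatewayClient

function resolveRequestTimeoutMs(cfg: OpenClawConfig, provider: string): number {
  return cfg.models?.providers?.[provider]?.requestTimeoutMs ?? DEFAULT_TIMEOUT_MS;
}

const cfg: OpenClawConfig = {
  models: { providers: { "my-provider": { requestTimeoutMs: 120000 } } },
};
console.log(resolveRequestTimeoutMs(cfg, "my-provider")); // 120000
console.log(resolveRequestTimeoutMs(cfg, "other-provider")); // 30000 (default)
```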
Environment
- OpenClaw version: 2026.3.13
- Provider: custom Anthropic-compatible proxy (anthropic-messages API)
- Model: claude-opus-4-6
- Channel: DingTalk