
fix(errors): classify connection errors as retryable failover reason #43710

Open
fagemx wants to merge 2 commits into openclaw:main from fagemx:fix/connection-error-classification-v2

Conversation

Contributor

@fagemx fagemx commented Mar 12, 2026

Summary

Rebased successor of #15163 (stale due to merge conflicts with the failover-matches.ts refactor).

Connection-level failures (ECONNREFUSED, ECONNRESET, ENOTFOUND, socket hang up, fetch failed, APIConnectionError, dns lookup failed, etc.) were previously unclassified — they leaked raw error text to channels and were not retried by the failover mechanism.

This PR:

  • Adds a connection error pattern set to failover-matches.ts (aligned with the refactored architecture that moved patterns out of errors.ts)
  • Exposes isConnectionErrorMessage() for connection-specific detection
  • Wires it into formatAssistantErrorText() to return a friendly user-facing message instead of raw error text
  • Wires it into classifyFailoverReason() mapped to "timeout" (retryable), so the failover mechanism can retry on a different provider
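
For readers unfamiliar with the pattern-matching approach, here is a minimal sketch of how a connection pattern set and its detector might fit together. It is an illustration only, not the code in this PR: the `ErrorPattern` type, the `matchesAny` helper, and the reduced `FailoverReason` type are assumptions. Only the pattern entries and the names `isConnectionErrorMessage` and `classifyFailoverReason` come from this PR.

```typescript
// Hypothetical sketch; the real failover-matches.ts may be structured differently.
type ErrorPattern = string | RegExp;

// Entries as quoted from the connection set in this PR (first commit).
const CONNECTION_PATTERNS: ErrorPattern[] = [
  "connection error",
  "apiconnectionerror",
  "socket hang up",
  /\beconn(?:refused|reset|aborted)\b/i,
  /\benotfound\b/i,
  "fetch failed",
  "network error",
  "dns lookup failed",
];

// Assumed helper: strings match as case-insensitive substrings,
// regular expressions match via test().
function matchesAny(raw: string, patterns: ErrorPattern[]): boolean {
  const lower = raw.toLowerCase();
  return patterns.some((p) =>
    typeof p === "string" ? lower.includes(p) : p.test(raw),
  );
}

function isConnectionErrorMessage(raw: string): boolean {
  return raw.length > 0 && matchesAny(raw, CONNECTION_PATTERNS);
}

// Reduced to the one reason discussed here; the real union has more members.
type FailoverReason = "timeout" | null;

function classifyFailoverReason(raw: string): FailoverReason {
  // Connection failures are reported as "timeout" so failover treats them as retryable.
  if (isConnectionErrorMessage(raw)) return "timeout";
  return null;
}
```

Mapping to the existing "timeout" reason, rather than introducing a new one, means the failover mechanism needs no changes to retry on a different provider.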

What changed vs #15163

The original PR defined patterns inline in errors.ts. Since then, main refactored all error patterns into failover-matches.ts. This PR follows the new architecture — the connection pattern set and isConnectionErrorMessage() live in failover-matches.ts alongside the existing pattern categories.

Test plan

  • New isConnectionErrorMessage test suite (3 tests: SDK message, common patterns, negative cases)
  • New formatAssistantErrorText tests for connection error messages (2 tests)
  • All existing tests pass
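
The three test groups above could be expressed as a table of cases, roughly as sketched below. The inputs are illustrative and not copied from the PR's test file, the matcher is a compact stand-in so the snippet runs on its own, and the plain `runCases` helper is hypothetical (the repository's actual test framework is not shown in this PR description).

```typescript
// Stand-in matcher so the sketch is self-contained; the PR's real
// isConnectionErrorMessage() lives in failover-matches.ts.
const CONNECTION_RE =
  /connection error|apiconnectionerror|socket hang up|\beconn(?:refused|reset|aborted)\b|\benotfound\b|fetch failed|network error|dns lookup failed/i;

function isConnectionErrorMessage(raw: string): boolean {
  return CONNECTION_RE.test(raw);
}

interface Case {
  group: "SDK message" | "common patterns" | "negative cases";
  input: string;
  expected: boolean;
}

// Illustrative inputs only.
const cases: Case[] = [
  { group: "SDK message", input: "APIConnectionError: Connection error.", expected: true },
  { group: "common patterns", input: "connect ECONNREFUSED 127.0.0.1:443", expected: true },
  { group: "common patterns", input: "read ECONNRESET", expected: true },
  { group: "common patterns", input: "getaddrinfo ENOTFOUND api.example.com", expected: true },
  { group: "common patterns", input: "TypeError: fetch failed", expected: true },
  { group: "negative cases", input: "429 rate limit exceeded", expected: false },
  { group: "negative cases", input: "invalid api key", expected: false },
  { group: "negative cases", input: "", expected: false },
];

// Returns a description of every case whose result differs from the expectation.
function runCases(): string[] {
  const failures: string[] = [];
  for (const c of cases) {
    if (isConnectionErrorMessage(c.input) !== c.expected) {
      failures.push(`${c.group}: ${JSON.stringify(c.input)}`);
    }
  }
  return failures;
}
```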

Fixes #15083
Supersedes #15163

🤖 Generated with Claude Code

Transient APIConnectionError from the Anthropic SDK (and similar network
failures) was not matched by any error pattern, so it leaked raw error
text to user channels instead of being classified and retried.

Add connection error patterns (ECONNREFUSED, ECONNRESET, socket hang up,
fetch failed, etc.), wire into classifyFailoverReason as "timeout"
(retryable), and return a friendly user-facing message.

Fixes openclaw#15083

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Contributor

greptile-apps bot commented Mar 12, 2026

Greptile Summary

This PR classifies connection-level errors (ECONNREFUSED, ECONNRESET, ENOTFOUND, APIConnectionError, etc.) as retryable failover events and surfaces a friendly user message instead of raw error text. It follows the established architecture by adding a connection pattern set to failover-matches.ts alongside the existing categories.

Key changes:

  • New connection pattern set and isConnectionErrorMessage() function in failover-matches.ts
  • formatAssistantErrorText() now returns "The AI service encountered a connection error. Please try again in a moment." for connection errors, before the existing timeout path
  • classifyFailoverReason() maps connection errors to "timeout" (retryable), inserted before the existing isTimeoutErrorMessage check
  • isConnectionErrorMessage is exported via pi-embedded-helpers.ts and errors.ts

Minor gap: The new connection pattern set is missing /\beai_again\b/i (temporary DNS failure) and "network request failed", both of which are present in the timeout pattern set and are semantically connection-level errors. These errors will still be retried correctly via isTimeoutErrorMessage, but will display "LLM request timed out." rather than the new friendly connection error message in formatAssistantErrorText.
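
To make the gap concrete, the sketch below models the check order described above (connection first, then timeout) and compares the message produced for an `EAI_AGAIN` error with and without the two extra patterns. The `formatErrorText` function is a simplified, hypothetical stand-in for `formatAssistantErrorText()`, and the timeout set lists only the two entries named in this review; the real set contains more.

```typescript
type ErrorPattern = string | RegExp;

// Connection set as first submitted, before the review fix.
const connection: ErrorPattern[] = [
  "connection error",
  "apiconnectionerror",
  "socket hang up",
  /\beconn(?:refused|reset|aborted)\b/i,
  /\benotfound\b/i,
  "fetch failed",
  "network error",
  "dns lookup failed",
];

// Only the two timeout entries mentioned in the review; others are omitted.
const timeout: ErrorPattern[] = [/\beai_again\b/i, "network request failed"];

const CONNECTION_MESSAGE =
  "The AI service encountered a connection error. Please try again in a moment.";
const TIMEOUT_MESSAGE = "LLM request timed out.";

function matches(raw: string, patterns: ErrorPattern[]): boolean {
  const lower = raw.toLowerCase();
  return patterns.some((p) =>
    typeof p === "string" ? lower.includes(p) : p.test(raw),
  );
}

// Hypothetical stand-in: the connection check runs before the timeout check.
function formatErrorText(
  raw: string,
  connectionSet: ErrorPattern[],
): string | undefined {
  if (matches(raw, connectionSet)) return CONNECTION_MESSAGE;
  if (matches(raw, timeout)) return TIMEOUT_MESSAGE;
  return undefined;
}

const dnsError = "getaddrinfo EAI_AGAIN api.example.com";

// With the original set, the error falls through to the timeout message.
const before = formatErrorText(dnsError, connection);

// With both patterns added, the connection message wins.
const after = formatErrorText(dnsError, [
  ...connection,
  /\beai_again\b/i,
  "network request failed",
]);
```

In both cases the error is still matched by one of the two sets, which is why retry behaviour is unaffected and only the user-facing text differs.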

Confidence Score: 4/5

  • This PR is safe to merge — changes are additive and all existing tests pass alongside the new test cases.
  • The implementation correctly follows the existing pattern-matching architecture, the new classifyFailoverReason and formatAssistantErrorText paths are logically sound, and test coverage addresses the main cases. The only issue is a minor inconsistency — /\beai_again\b/i and "network request failed" are in timeout but absent from connection, meaning those two error flavours will show a timeout message rather than the new friendly connection message. Failover retry behaviour is unaffected. Score is 4 rather than 5 solely for that gap.
  • src/agents/pi-embedded-helpers/failover-matches.ts — the connection pattern set should include /\beai_again\b/i and "network request failed" for consistency with the timeout set.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/agents/pi-embedded-helpers/failover-matches.ts
Line: 95-104

Comment:
**Missing `EAI_AGAIN` and `"network request failed"` patterns**

The `timeout` pattern set (lines 28–46) includes `/\beai_again\b/i` and `"network request failed"`, but neither is present in the new `connection` set. `EAI_AGAIN` is a temporary DNS failure — semantically identical to `ENOTFOUND` (which is included) — so this looks like an oversight. As a result, `EAI_AGAIN` errors will:
- Return `"LLM request timed out."` in `formatAssistantErrorText` instead of the new friendly connection error message
- Still be caught by `isTimeoutErrorMessage` for failover classification, so retry behaviour is unaffected

Consider adding both patterns for consistency:

```suggestion
  connection: [
    "connection error",
    "apiconnectionerror",
    "socket hang up",
    /\beconn(?:refused|reset|aborted)\b/i,
    /\benotfound\b/i,
    /\beai_again\b/i,
    "fetch failed",
    "network error",
    "network request failed",
    "dns lookup failed",
  ],
```

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: fc26a3f

Comment on lines +95 to +104

    connection: [
      "connection error",
      "apiconnectionerror",
      "socket hang up",
      /\beconn(?:refused|reset|aborted)\b/i,
      /\benotfound\b/i,
      "fetch failed",
      "network error",
      "dns lookup failed",
    ],

Missing EAI_AGAIN and "network request failed" patterns

The timeout pattern set (lines 28–46) includes /\beai_again\b/i and "network request failed", but neither is present in the new connection set. EAI_AGAIN is a temporary DNS failure — semantically identical to ENOTFOUND (which is included) — so this looks like an oversight. As a result, EAI_AGAIN errors will:

  • Return "LLM request timed out." in formatAssistantErrorText instead of the new friendly connection error message
  • Still be caught by isTimeoutErrorMessage for failover classification, so retry behaviour is unaffected

Consider adding both patterns for consistency:

Suggested change

    -  connection: [
    -    "connection error",
    -    "apiconnectionerror",
    -    "socket hang up",
    -    /\beconn(?:refused|reset|aborted)\b/i,
    -    /\benotfound\b/i,
    -    "fetch failed",
    -    "network error",
    -    "dns lookup failed",
    -  ],
    +  connection: [
    +    "connection error",
    +    "apiconnectionerror",
    +    "socket hang up",
    +    /\beconn(?:refused|reset|aborted)\b/i,
    +    /\benotfound\b/i,
    +    /\beai_again\b/i,
    +    "fetch failed",
    +    "network error",
    +    "network request failed",
    +    "dns lookup failed",
    +  ],

…atterns

Address review feedback: include /\beai_again\b/i and "network request
failed" in the connection error pattern set for consistency with the
timeout pattern set.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

Labels

agents (Agent runtime and tooling), size: S

Development

Successfully merging this pull request may close these issues.

Transient APIConnectionError surfaced to user channel instead of being handled silently

1 participant