Implement ErrorClassifier #22

@alexey-pelykh

Summary

Implement a regex-based error classifier that maps CLI subprocess stderr output and error messages to actionable error categories. This is a pure function module consumed by ChannelBridge (#32) to determine retry behavior and error reporting.

File: src/middleware/error-classifier.ts
Test: src/middleware/error-classifier.test.ts
Depends on: types module (PR #4, merged)

Error Categories

The ErrorCategory type defines five categories. The classifier itself produces only the first three; the runtime layer sets the other two:

| Category | Meaning | Caller Behavior |
| --- | --- | --- |
| retryable | Transient failure worth retrying with backoff | Caller may retry with exponential backoff |
| context_overflow | Input exceeds model context window | Caller should reduce context or abort |
| fatal | Unrecoverable error (auth failures + catch-all default) | Surface to user, do not retry |
| timeout | Hard timeout exceeded | Not produced by classifier; set by CLIRuntimeBase watchdog |
| aborted | External cancellation via AbortSignal | Not produced by classifier; set by CLIRuntimeBase abort handler |

Important: timeout and aborted are defined in the ErrorCategory type for completeness (they appear in the same discriminated space), but the classifier itself only produces retryable, context_overflow, or fatal. The runtime layer sets timeout and aborted directly.

API Surface

```typescript
/** Error categories for CLI subprocess failures. */
export type ErrorCategory = "retryable" | "fatal" | "context_overflow" | "timeout" | "aborted";

/**
 * Classify an error message string into an actionable category.
 *
 * Uses first-match-wins semantics across ordered pattern arrays:
 * 1. Retryable patterns checked first
 * 2. Context overflow patterns checked second
 * 3. Fatal auth patterns checked third
 * 4. Default: "fatal" for unmatched messages
 *
 * Case-insensitive matching throughout.
 */
export function classifyError(message: string): ErrorCategory;
```

Pattern Specification

All patterns are case-insensitive, unanchored regexes tested against the error message string: a pattern matches if it occurs anywhere in the message.
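
To illustrate these matching semantics, here is a minimal sketch using one pattern from the retryable table. The sample messages are invented for illustration and are not captured provider output.

```typescript
// Unanchored and case-insensitive: matches anywhere in the message.
const rateLimit = /rate.?limit/i;

// Hypothetical sample messages.
const samples = [
  "Error: Rate Limit exceeded, retry later", // space separator, mixed case
  "api error: rate_limit_error",             // underscore separator
  "ratelimit hit",                           // no separator (.? matches zero chars)
  "invalid request",                         // no match
];

const matches = samples.map((s) => rateLimit.test(s));
// matches: [true, true, true, false]
```

The pattern has no `g` flag, so `test()` carries no `lastIndex` state between calls and the same RegExp object can be reused safely.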

Retryable Patterns

These indicate transient failures that may succeed on retry:

| Pattern | Rationale | Example Providers |
| --- | --- | --- |
| rate.?limit | API rate limiting (text and underscore variants) | All providers |
| 429 | HTTP 429 Too Many Requests | All providers |
| 503 | HTTP 503 Service Unavailable | All providers |
| overloaded | API overloaded responses | Claude, Gemini |
| ETIMEDOUT | Network connection timeout | All (Node.js network) |
| ECONNRESET | Connection reset by peer | All (Node.js network) |
| ECONNREFUSED | Connection refused | All (Node.js network) |
| network | Generic network error text | All providers |

Context Overflow Patterns

These indicate the input exceeds the model's context window:

| Pattern | Rationale | Example Providers |
| --- | --- | --- |
| context.?length | Context length exceeded | Claude |
| context.?window | Context window exceeded | Claude, Gemini |
| context.?overflow | Context overflow error | Generic |
| too many tokens | Token count exceeded | Codex, OpenCode |
| maximum context | Maximum context reached | Generic |
| token.?limit | Token limit exceeded | Gemini, OpenCode |

Fatal Auth Patterns

These indicate unrecoverable authentication/authorization failures:

| Pattern | Rationale | Example Providers |
| --- | --- | --- |
| 401 | HTTP 401 Unauthorized | All providers |
| 403 | HTTP 403 Forbidden | All providers |
| unauthorized | Unauthorized access text | All providers |
| forbidden | Forbidden access text | All providers |
| invalid.?key | Invalid API key | All providers |
| authentication | Authentication failed | All providers |

Default Behavior

Any error message that does not match retryable, context_overflow, or fatal auth patterns defaults to fatal. This is intentionally conservative — unknown errors should not be retried.

Implementation Notes

  • ~40 lines of production code: Three readonly pattern arrays + single classify function
  • First-match-wins: Check retryable → context_overflow → fatal_auth → default fatal
  • Case-insensitive: All regex patterns should use the i flag
  • Pure function: No state, no side effects, no dependencies beyond the type definition
  • Export both: ErrorCategory type AND classifyError function
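
The notes above could be realized roughly as follows. This is a sketch under assumptions, not the final implementation: the constant names and the `matchesAny` helper are hypothetical, and the real module would `export` the type and the function.

```typescript
type ErrorCategory = "retryable" | "fatal" | "context_overflow" | "timeout" | "aborted";

// Pattern lists copied from the Pattern Specification tables; all use the `i` flag.
// Constant names are illustrative, not prescribed by this issue.
const RETRYABLE_PATTERNS: readonly RegExp[] = [
  /rate.?limit/i, /429/i, /503/i, /overloaded/i,
  /ETIMEDOUT/i, /ECONNRESET/i, /ECONNREFUSED/i, /network/i,
];

const CONTEXT_OVERFLOW_PATTERNS: readonly RegExp[] = [
  /context.?length/i, /context.?window/i, /context.?overflow/i,
  /too many tokens/i, /maximum context/i, /token.?limit/i,
];

const FATAL_AUTH_PATTERNS: readonly RegExp[] = [
  /401/i, /403/i, /unauthorized/i, /forbidden/i, /invalid.?key/i, /authentication/i,
];

// Hypothetical helper: true if any pattern occurs anywhere in the message.
const matchesAny = (patterns: readonly RegExp[], message: string): boolean =>
  patterns.some((p) => p.test(message));

function classifyError(message: string): ErrorCategory {
  // First-match-wins: the order of these checks is the category priority.
  if (matchesAny(RETRYABLE_PATTERNS, message)) return "retryable";
  if (matchesAny(CONTEXT_OVERFLOW_PATTERNS, message)) return "context_overflow";
  if (matchesAny(FATAL_AUTH_PATTERNS, message)) return "fatal";
  return "fatal"; // conservative default for unmatched (including empty) messages
}
```

The auth check and the default both return "fatal"; keeping the explicit auth branch documents the ordering and leaves room to distinguish the two later without restructuring.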

Test Requirements

Test file: src/middleware/error-classifier.test.ts

Tests should cover:

Retryable classification

  • Rate limit text variants (rate limit, rate_limit, Rate Limit)
  • HTTP status codes (429, 503)
  • Overloaded API messages
  • Network errors (ETIMEDOUT, ECONNRESET, ECONNREFUSED, generic network error)

Context overflow classification

  • Context length/window/overflow variants
  • Token count messages (too many tokens, token limit)
  • Maximum context reached

Fatal classification

  • Auth errors (401, 403, unauthorized, forbidden)
  • Invalid key variants
  • Authentication failed messages

Default behavior

  • Unknown error messages → fatal
  • Empty string → fatal

Case insensitivity

  • Mixed case variants should classify correctly

First-match-wins verification

  • Message matching multiple categories should classify as the first matching category
    (e.g., a message containing both a retryable and fatal pattern should classify as retryable)
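
One way to organize these cases is a table of message/expected pairs. The sketch below is framework-free so it stays self-contained; in the real test file the same table would likely feed vitest's `it.each`. The sample messages, the `Classify` type, and `failingCases` are hypothetical names introduced for illustration.

```typescript
// Any function with the shape of classifyError can be checked against the table.
type Classify = (message: string) => string;

// [message, expected category]; messages are invented examples.
const cases: ReadonlyArray<readonly [string, string]> = [
  ["rate_limit exceeded", "retryable"],
  ["HTTP 503 Service Unavailable", "retryable"],
  ["connect ECONNREFUSED 127.0.0.1:443", "retryable"],
  ["Context Window exceeded", "context_overflow"],
  ["too many tokens in request", "context_overflow"],
  ["403 Forbidden", "fatal"],
  ["Invalid key provided", "fatal"],
  ["something unexpected happened", "fatal"],
  ["", "fatal"],
  // First-match-wins: the retryable pattern (429) beats the fatal auth pattern (unauthorized).
  ["429 unauthorized", "retryable"],
];

// Returns a description of every failing case; an empty array means all passed.
function failingCases(classify: Classify): string[] {
  return cases
    .filter(([message, expected]) => classify(message) !== expected)
    .map(([message, expected]) => `"${message}" should be ${expected}`);
}
```

Calling `failingCases(classifyError)` against a conforming implementation should return an empty array.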

Integration Context

This classifier is consumed by ChannelBridge (#32) which:

  1. Captures stderr from CLI subprocess execution (collected by CLIRuntimeBase)
  2. Passes stderr content through classifyError()
  3. Uses the category to determine: retry with backoff (retryable), reduce context (context_overflow), or surface error to user (fatal)
  4. The errorSubtype field in AgentRunResult may carry the classified category for downstream consumers

The classifier does NOT need to handle timeout or aborted — those are set directly by CLIRuntimeBase (see cli-runtime-base.ts lines 176-190).
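
As a rough illustration of step 3, the consumer could map each category to an action with an exhaustive switch. Everything here except the ErrorCategory type is an assumption: the `BridgeAction` shape, `nextAction`, the delay constants, and the choice to surface timeout and aborted to the user are hypothetical, since this issue does not specify the ChannelBridge API.

```typescript
type ErrorCategory = "retryable" | "fatal" | "context_overflow" | "timeout" | "aborted";

// Hypothetical action type; not part of the ChannelBridge spec.
type BridgeAction =
  | { kind: "retry"; delayMs: number }
  | { kind: "reduce_context" }
  | { kind: "surface_error" };

const BASE_DELAY_MS = 1_000; // assumed base delay
const MAX_DELAY_MS = 30_000; // assumed cap

// `attempt` is the zero-based count of retries already made.
function nextAction(category: ErrorCategory, attempt: number): BridgeAction {
  switch (category) {
    case "retryable":
      // Exponential backoff: base * 2^attempt, capped at MAX_DELAY_MS.
      return { kind: "retry", delayMs: Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS) };
    case "context_overflow":
      return { kind: "reduce_context" };
    case "fatal":
    case "timeout": // set by the runtime, not the classifier; handling here is assumed
    case "aborted": // set by the runtime, not the classifier; handling here is assumed
      return { kind: "surface_error" };
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a category is added.
      const unreachable: never = category;
      return unreachable;
    }
  }
}
```

The `never` assignment in the default branch means that adding a sixth category to ErrorCategory produces a compile error here until the new case is handled.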

Acceptance Criteria

  • src/middleware/error-classifier.ts exports ErrorCategory type and classifyError() function
  • All retryable patterns classified correctly (rate limit variants, 429, 503, overloaded, network errors)
  • All context overflow patterns classified correctly (context length/window/overflow, token limits)
  • All fatal auth patterns classified correctly (401, 403, unauthorized, forbidden, invalid key, authentication)
  • Unknown/empty strings default to fatal
  • Case-insensitive matching works for all patterns
  • First-match-wins ordering verified (retryable > context_overflow > fatal_auth > default)
  • timeout and aborted are in the ErrorCategory type but NOT produced by classifyError()
  • All tests pass: npx vitest run src/middleware/error-classifier.test.ts
  • Full suite passes: npx vitest run

Labels: enhancement