
feat: add model fallback support with TTFT-based timeout#13189

Open
keakon wants to merge 1 commit into anomalyco:dev from keakon:feat/model-fallback

Conversation


@keakon keakon commented Feb 11, 2026

What does this PR do?

This PR adds model fallback support for agents. When a primary model fails or is too slow to respond, the system automatically cycles through configured fallback models.
Related feature requests: #7602, #9575

Key changes:

  • New config options: `fallback_models` (list of fallback models in priority order) and `first_token_timeout` (base timeout in ms for first-token detection, scaled by input size: `base + inputChars * 0.5`)
  • Fallback cycling logic (SessionFallback): builds a deduplicated model list, cycles through all models starting from the last successful one, and remembers which model worked per session/agent pair
  • TTFT timeout: measures time-to-first-token from after LLM.stream() returns (excluding framework init overhead), so the timeout reflects actual network/model latency
  • Short retry before switching: for retryable API errors, one retry attempt is made on the same model before falling back to the next
  • Error handling: context overflow and user abort errors skip fallback (they are not model-specific); FirstTokenTimeoutError is unwrapped from AbortError so it triggers fallback instead of being treated as a user cancellation
  • Toast notification: when a fallback switch occurs, a toast is shown indicating the model change
  • Edge case fixes: clamps out-of-bounds fallbackStartIndex to prevent infinite loops when models are removed from config; resets both indices when the remembered model fails to resolve
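A minimal sketch of the cycling helpers described above, assuming the behavior stated in this summary. The function names follow the PR description (`buildModelList`, `nextIndex`), but the signatures and details are assumptions, not the actual implementation in src/session/fallback.ts:

```typescript
// Hypothetical sketch of the SessionFallback cycling logic.
function buildModelList(primary: string, fallbacks: string[]): string[] {
  // Deduplicate while preserving priority order, primary model first.
  return [...new Set([primary, ...fallbacks])]
}

function clampStartIndex(start: number, length: number): number {
  // Clamp a remembered index that may be out of bounds after models are
  // removed from config, preventing the infinite-loop edge case above.
  return start >= 0 && start < length ? start : 0
}

function nextIndex(current: number, length: number): number {
  // Cycle through all models with wraparound.
  return (current + 1) % length
}
```

Starting the cycle from the last successful model (rather than always from the primary) avoids repeatedly timing out on a known-slow primary within the same session.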

Files changed:

  • src/session/fallback.ts — new module for fallback state management and model cycling
  • src/session/processor.ts — main integration: TTFT timer, fallback loop, error classification
  • src/agent/agent.ts / src/config/config.ts — schema additions for fallback_models and first_token_timeout
  • src/session/prompt.ts / src/session/message-v2.ts — pass fallback config to processor; handle FirstTokenTimeoutError
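The error classification in src/session/processor.ts is described but not shown. A hedged sketch of how it might look, with error names taken from the PR description; the `cause`-unwrapping convention and the decision type are illustrative assumptions:

```typescript
// Hypothetical error classifier for the fallback loop.
type FallbackDecision = "fallback" | "retry-same" | "abort"

function classifyError(err: Error): FallbackDecision {
  if (err.name === "FirstTokenTimeoutError") return "fallback"
  if (err.name === "AbortError") {
    // A FirstTokenTimeoutError may arrive wrapped in an AbortError; unwrap it
    // so a slow model triggers fallback instead of looking like a user cancel.
    const cause = (err as { cause?: unknown }).cause
    if (cause instanceof Error && cause.name === "FirstTokenTimeoutError") {
      return "fallback"
    }
    return "abort" // genuine user cancellation: skip fallback entirely
  }
  if (err.name === "ContextOverflowError") return "abort" // not model-specific
  return "retry-same" // retryable API error: one retry before switching models
}
```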

How did you verify your code works?

  • Added unit tests for SessionFallback (buildModelList, nextIndex cycling, recordSuccess/getStartIndex, full cycle simulations including remembered-model and out-of-bounds edge cases) — see test/session/fallback.test.ts
  • Added unit tests for TTFT computation (estimateInputChars, computeTtftTimeout with various input sizes) and FirstTokenTimeoutError serialization — see test/session/processor-ttft.test.ts
  • Manual testing with multiple model configurations to verify fallback cycling, TTFT timeout triggering, and toast notifications
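The TTFT computation exercised by those tests might look roughly like this. The `base + inputChars * 0.5` scaling rule comes from the PR description; the race helper and the error class shape are illustrative assumptions:

```typescript
// Hypothetical sketch of the TTFT timeout mechanism.
class FirstTokenTimeoutError extends Error {
  constructor(readonly timeoutMs: number) {
    super(`no first token within ${timeoutMs}ms`)
    this.name = "FirstTokenTimeoutError"
  }
}

function computeTtftTimeout(baseMs: number, inputChars: number): number {
  // Larger prompts get proportionally more time before the timeout fires.
  return baseMs + inputChars * 0.5
}

async function raceFirstToken<T>(firstToken: Promise<T>, timeoutMs: number): Promise<T> {
  // Called only after the stream call has returned, so framework init
  // overhead is excluded and the timeout reflects network/model latency.
  let timer: ReturnType<typeof setTimeout>
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new FirstTokenTimeoutError(timeoutMs)), timeoutMs)
  })
  try {
    return await Promise.race([firstToken, timeout])
  } finally {
    clearTimeout(timer!)
  }
}
```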

When a primary model fails or is too slow to respond, the system
automatically cycles through configured fallback models.

- Add `fallback_models` and `first_token_timeout` config options
- Implement SessionFallback module for model cycling and state tracking
- Measure TTFT from HTTP request initiation (excluding framework init)
- Add short retry before switching models for retryable API errors
- Clamp out-of-bounds fallbackStartIndex to prevent infinite loops
- Handle FirstTokenTimeoutError separately from user abort
- Show toast notification on fallback model switch
- Add unit tests for fallback cycling and TTFT computation
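For illustration, a config fragment using the two new options might look like this; the key names come from the PR, but the surrounding structure and model identifiers are hypothetical:

```json
{
  "agent": {
    "model": "provider/primary-model",
    "fallback_models": ["provider/backup-model-1", "provider/backup-model-2"],
    "first_token_timeout": 3000
  }
}
```

With a 3000 ms base and the `base + inputChars * 0.5` scaling, a 10,000-character prompt would wait up to 8000 ms for the first token before falling back.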
@github-actions
Contributor

Thanks for your contribution!

This PR doesn't have a linked issue. All PRs must reference an existing issue.

Please:

  1. Open an issue describing the bug/feature (if one doesn't exist)
  2. Add Fixes #<number> or Closes #<number> to this PR description

See CONTRIBUTING.md for details.

@github-actions
Contributor

The following comment was generated by an LLM and may be inaccurate:

Based on my search, I found three potentially related PRs:

  1. PR #13172 - fix(copilot): add gpt-5.3-codex fallback model

  2. PR #11739 - feat: add runtime model fallback on retry exhaustion

  3. PR #8669 - fix(opencode): correct model fallback index tracking and config parsing

These PRs may represent earlier attempts at implementing fallback functionality or related features. PR #13189 appears to be a more comprehensive implementation with TTFT-based timeout support.

@aaryan-rampal

Seems useful. Any updates on getting this merged?
Otherwise, I wonder whether this adds latency when deciding to switch models. Is it always running a token-speed test, or only on delegation? I would also like multiple options for selecting a model: simple sequential (with wraparound), round-robin, etc. Perhaps users could even define their own logic. For my purposes, I want this functionality with a simple sequential ordering.

