
feat(llm): cascade routing — try cheap provider first, escalate on degenerate output#1721

Merged
bug-ops merged 2 commits into main from cascade-routing-try-cheap-mode
Mar 14, 2026

@bug-ops (Owner) commented Mar 14, 2026

Closes #1339.

Summary

Implements RouterStrategy::Cascade in zeph-llm. When strategy = "cascade" is configured, the router tries providers in chain order (the chain should list the cheapest provider first) and escalates to the next provider only when the response is classified as degenerate.

  • Heuristic classifier (default, zero LLM calls): detects degenerate outputs (empty, repetitive, incoherent, truncated). It is documented as a degenerate-output detector, not a semantic quality gate.
  • Judge mode: optional LLM-based quality scoring with automatic fallback to the heuristic classifier (scoring itself is not implemented yet; see Known limitations)
  • Best-seen response returned on budget exhaustion instead of a NoProviders error
  • max_cascade_tokens caps cumulative token cost across escalation levels
  • Skipped for chat_with_tools, which falls through to standard routing
  • Thompson/EMA distributions are not contaminated by quality-based escalations
  • Config boundary validation: quality_threshold clamped to [0.0, 1.0], window_size minimum 1, collect_stream capped at 1 MiB
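The escalation rules above can be read as a single loop. The block below is a minimal synchronous sketch, assuming hypothetical `Response`, `Provider`, and `CascadeError` types and a caller-supplied `score` function; the real zeph-llm router types, async handling, and streaming path are not shown in this PR, so treat this as an illustration of the rules rather than the implementation.

```rust
// Hypothetical stand-ins for zeph-llm types; real names and signatures may differ.
#[derive(Debug, Clone)]
struct Response {
    text: String,
    tokens: u32, // estimated tokens consumed by this call
}

#[derive(Debug)]
enum CascadeError {
    NoProviders, // every provider failed at the network/API level
}

// A provider returns a response, or Err on a network/API failure.
type Provider = Box<dyn Fn(&str) -> Result<Response, String>>;

/// Sketch of cascade routing: providers are tried in chain order (cheapest first)
/// and the next one is used only when `score` falls below `quality_threshold`.
fn cascade_chat(
    providers: &[Provider],
    prompt: &str,
    score: impl Fn(&str) -> f64,
    quality_threshold: f64,
    max_escalations: usize,
    max_cascade_tokens: Option<u32>,
) -> Result<Response, CascadeError> {
    let mut best: Option<(f64, Response)> = None;
    let mut escalations = 0;
    let mut tokens_spent: u32 = 0;

    for provider in providers {
        let resp = match provider(prompt) {
            Ok(resp) => resp,
            // Errors skip to the next provider without using escalation budget.
            Err(_) => continue,
        };
        tokens_spent = tokens_spent.saturating_add(resp.tokens);
        let s = score(&resp.text);
        if s >= quality_threshold {
            return Ok(resp);
        }
        // Remember the best degenerate response in case the budget runs out.
        if best.as_ref().map_or(true, |(b, _)| s > *b) {
            best = Some((s, resp));
        }
        let over_budget = max_cascade_tokens.map_or(false, |cap| tokens_spent >= cap);
        if escalations >= max_escalations || over_budget {
            break;
        }
        escalations += 1;
    }
    // Budget exhausted: return the best-seen response instead of NoProviders.
    best.map(|(_, resp)| resp).ok_or(CascadeError::NoProviders)
}
```

With `max_escalations = 2` this sketch tries at most three providers, and NoProviders only surfaces when no provider returned anything at all.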

Configuration

```toml
[llm.router]
strategy = "cascade"

[llm.router.cascade]
quality_threshold = 0.5        # escalate if score is below this
max_escalations = 2            # max provider hops
classifier_mode = "heuristic"  # or "judge"
window_size = 10               # rolling quality window
# max_cascade_tokens = 4096    # optional token budget cap
```
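The boundary validation listed in the summary (quality_threshold clamped to [0.0, 1.0], window_size at least 1) can be pictured as a post-parse pass over a struct mirroring the TOML keys above. This is a hypothetical sketch: `CascadeConfig`, `validated`, and `parse_classifier_mode` are illustrative names, not the zeph-llm API, and serde deserialization is omitted to keep it std-only.

```rust
/// Classifier choice from `classifier_mode`. Per the known limitations,
/// Judge is accepted but currently falls back to the heuristic classifier.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ClassifierMode {
    Heuristic,
    Judge,
}

/// Hypothetical mirror of `[llm.router.cascade]`; the real struct may differ.
#[derive(Debug, Clone, PartialEq)]
struct CascadeConfig {
    quality_threshold: f64,
    max_escalations: u32,
    classifier_mode: ClassifierMode,
    window_size: usize,
    max_cascade_tokens: Option<u32>,
}

impl CascadeConfig {
    /// Apply the boundary rules described in the PR after parsing.
    fn validated(mut self) -> Self {
        self.quality_threshold = self.quality_threshold.clamp(0.0, 1.0);
        self.window_size = self.window_size.max(1); // a zero-length window is meaningless
        self
    }
}

/// Map the TOML string to the enum; unknown values are rejected.
fn parse_classifier_mode(s: &str) -> Option<ClassifierMode> {
    match s {
        "heuristic" => Some(ClassifierMode::Heuristic),
        "judge" => Some(ClassifierMode::Judge),
        _ => None,
    }
}
```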

Known limitations (follow-up issues)

  • cascade_chat_stream does not track best-seen response (asymmetric with cascade_chat)
  • ClassifierMode::Judge is wired into config but not implemented; it falls through to the heuristic classifier
  • cost_tiers config deferred (providers ordered by chain order for now)
  • Non-escalation streaming path replays as a single chunk (no incremental streaming on cheap-model path)

Test plan

  • Unit tests for heuristic classifier (all signal types)
  • Unit tests for escalation logic (threshold, budget, best-seen)
  • Regression test for BUG-1 (best-seen at budget exhaustion)
  • Unit test that token estimation returns at least 1 (BUG-2)
  • cargo nextest run --workspace --features full: 5404/5404 passed

@github-actions added the documentation, llm (zeph-llm crate), rust, core (zeph-core crate), enhancement, and size/XL (500+ lines) labels on Mar 14, 2026
…generate output

Implements RouterStrategy::Cascade in zeph-llm (closes #1339).

When `strategy = "cascade"` is configured, the router tries providers in
chain order and escalates to the next only when the response is classified
as degenerate (empty, repetitive, incoherent, truncated).
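A zero-LLM-call detector for those four classes can be built from cheap text statistics. The sketch below is an assumption throughout: the PR does not show zeph-llm's signals, thresholds, or scores, so `heuristic_score`, the 0.3 distinct-word ratio, the "mostly symbols" rule, and the mid-sentence truncation proxy are illustrative choices only. Lower scores mean more degenerate, and anything below `quality_threshold` escalates.

```rust
use std::collections::HashSet;

/// Hypothetical degenerate-output score in [0.0, 1.0]; all rules are illustrative.
fn heuristic_score(text: &str) -> f64 {
    let trimmed = text.trim();
    // Empty: nothing usable.
    if trimmed.is_empty() {
        return 0.0;
    }
    let mut score: f64 = 1.0;
    let words: Vec<&str> = trimmed.split_whitespace().collect();

    // Repetitive: few distinct words relative to the total (e.g. a looping model).
    if words.len() >= 8 {
        let distinct: HashSet<&str> = words.iter().copied().collect();
        if (distinct.len() as f64) / (words.len() as f64) < 0.3 {
            score = score.min(0.2);
        }
    }

    // Incoherent: more than half the chars are neither letters/digits,
    // whitespace, nor ASCII punctuation (chars, not bytes, so non-ASCII text counts fairly).
    let total = trimmed.chars().count();
    let noisy = trimmed
        .chars()
        .filter(|c| !(c.is_alphanumeric() || c.is_whitespace() || c.is_ascii_punctuation()))
        .count();
    if noisy * 2 > total {
        score = score.min(0.2);
    }

    // Truncated: a longer answer that stops mid-sentence (a crude proxy).
    if words.len() >= 8 {
        if let Some(last) = trimmed.chars().last() {
            if last.is_alphanumeric() || last == ',' {
                score = score.min(0.4);
            }
        }
    }
    score
}

/// Escalate when the score falls below the configured threshold.
fn is_degenerate(text: &str, quality_threshold: f64) -> bool {
    heuristic_score(text) < quality_threshold
}
```

Because every rule looks only at surface form, a fluent but wrong answer scores 1.0, which matches the PR's framing of this as a degenerate-output detector rather than a semantic quality gate.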

Key behaviors:
- Heuristic classifier (default): zero LLM calls; detects degenerate outputs
  only, not semantic failures, and is documented as such
- Judge mode: LLM-based quality scoring with automatic fallback to heuristic
  (scoring not yet implemented; currently always falls back)
- Network/API errors do not consume escalation budget
- Best-seen response returned on budget exhaustion instead of NoProviders
- max_cascade_tokens caps cumulative token cost across escalation levels
- Skipped for chat_with_tools (falls through to standard routing)
- Thompson/EMA distributions not contaminated by quality-based failures
- quality_threshold validated at config boundary (clamped to [0.0, 1.0])
- window_size clamped to minimum 1
- collect_stream capped at 1 MiB to prevent unbounded allocation
- Token estimation uses chars().count() for correct non-ASCII handling
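The last two items can be sketched together. Assumptions, labeled: the ~4 characters per token ratio, the helper names `estimate_tokens` and `collect_stream`, and the choice to truncate (rather than error) at the cap are all hypothetical, and a real stream would be async. What the sketch does show from the PR is counting chars rather than bytes, the floor of 1 (BUG-2), and the 1 MiB bound.

```rust
const MAX_COLLECT_BYTES: usize = 1024 * 1024; // 1 MiB cap, as stated in the PR

/// Hypothetical token estimate (~4 chars per token is an assumed ratio).
/// chars().count() avoids overcounting multi-byte UTF-8 text, and the floor
/// of 1 ensures every call is charged against max_cascade_tokens.
fn estimate_tokens(text: &str) -> u32 {
    let chars = text.chars().count();
    u32::try_from(chars / 4).unwrap_or(u32::MAX).max(1)
}

/// Hypothetical collector: concatenates chunks but stops at the cap,
/// cutting on a char boundary so the result stays valid UTF-8.
fn collect_stream<I: IntoIterator<Item = String>>(chunks: I) -> String {
    let mut out = String::new();
    for chunk in chunks {
        let remaining = MAX_COLLECT_BYTES - out.len();
        if chunk.len() <= remaining {
            out.push_str(&chunk);
        } else {
            // Back off to the nearest char boundary within the budget (0 always qualifies).
            let mut cut = remaining;
            while !chunk.is_char_boundary(cut) {
                cut -= 1;
            }
            out.push_str(&chunk[..cut]);
            break;
        }
    }
    out
}
```

Counting bytes instead would estimate "日本語のテキスト" (24 bytes, 8 chars) at 6 tokens rather than 2 under the same ratio.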

Config: [llm.router.cascade] with quality_threshold (default 0.5),
max_escalations (default 2), classifier_mode, window_size, max_cascade_tokens.
@bug-ops bug-ops force-pushed the cascade-routing-try-cheap-mode branch from 91cd1cf to 5edee99 on March 14, 2026 00:55
@bug-ops bug-ops enabled auto-merge (squash) March 14, 2026 00:55
…zard, tests

- REV-01: call record_availability(false) on stream-open errors in the
  cascade_chat_stream early-provider loop, matching cascade_chat error handling
- REV-02: add doc comment to RouterConfig::chain field
- REV-03: add round-trip serde test for RouterStrategyConfig::Cascade
- REV-04: add Cascade option to --init wizard step_router with quality_threshold
  and max_escalations prompts; wire through WizardState and build_config
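REV-01 and the "Thompson/EMA distributions not contaminated" rule fit together: availability statistics should move on transport failures, not on quality escalations. Only the name `record_availability` comes from the PR; `ProviderStats`, the EMA smoothing factor, the `Attempt` enum, and the choice to leave stats untouched on degenerate answers are assumptions for this sketch.

```rust
/// Hypothetical per-provider availability tracker using an exponential moving average.
struct ProviderStats {
    availability_ema: f64, // 1.0 = always reachable
    alpha: f64,            // smoothing factor (assumed value supplied by the caller)
}

impl ProviderStats {
    fn new(alpha: f64) -> Self {
        Self { availability_ema: 1.0, alpha }
    }

    /// Fold one availability sample (1.0 for success, 0.0 for failure) into the EMA.
    fn record_availability(&mut self, ok: bool) {
        let sample = if ok { 1.0 } else { 0.0 };
        self.availability_ema = self.alpha * sample + (1.0 - self.alpha) * self.availability_ema;
    }
}

/// Hypothetical outcome of one cascade attempt.
enum Attempt {
    OpenError,  // request or stream could not be opened
    Degenerate, // provider answered, but the classifier escalated
    Accepted,   // provider answered and passed the threshold
}

fn record_attempt(stats: &mut ProviderStats, attempt: &Attempt) {
    match attempt {
        // Stream-open failures count against availability in both chat paths.
        Attempt::OpenError => stats.record_availability(false),
        // A degenerate answer is a quality signal, not an outage: leave stats untouched.
        Attempt::Degenerate => {}
        Attempt::Accepted => stats.record_availability(true),
    }
}
```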
@bug-ops bug-ops merged commit 4b23454 into main Mar 14, 2026
15 checks passed
@bug-ops bug-ops deleted the cascade-routing-try-cheap-mode branch March 14, 2026 01:08


Linked issue: Cascade routing: try cheap model first, escalate on quality threshold