feat(llm): cascade routing - try cheap provider first, escalate on degenerate output #1721
Merged
This was referenced Mar 14, 2026
…generate output

Implements RouterStrategy::Cascade in zeph-llm (closes #1339). When `strategy = "cascade"` is configured, the router tries providers in chain order and escalates to the next only when the response is classified as degenerate (empty, repetitive, incoherent, or truncated).

Key behaviors:
- Heuristic classifier (default): zero LLM calls; detects degenerate outputs only, not semantic failures (documented accurately)
- Judge mode: LLM-based quality scoring with automatic fallback to heuristic
- Network/API errors do not consume escalation budget
- Best-seen response returned on budget exhaustion (not NoProviders)
- max_cascade_tokens caps cumulative token cost across escalation levels
- Skipped for chat_with_tools (falls through to standard routing)
- Thompson/EMA distributions not contaminated by quality-based failures
- quality_threshold validated at config boundary (clamped to [0.0, 1.0])
- window_size clamped to minimum 1
- collect_stream capped at 1 MiB to prevent unbounded allocation
- Token estimation uses chars().count() for correct non-ASCII handling

Config: [llm.router.cascade] with quality_threshold (default 0.5), max_escalations (default 2), classifier_mode, window_size, max_cascade_tokens.
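A minimal sketch of a zero-LLM-call degeneracy check together with Judge-mode fallback, assuming the classifier produces a score in [0.0, 1.0] that is compared against `quality_threshold`. The scoring rule (ratio of distinct word windows), `ClassifierMode` as written here, `judge_score`, `quality_score`, `is_degenerate`, and the 4-chars-per-token ratio are hypothetical and not zeph-llm's API. The sketch only covers empty and repetitive output, not incoherent or truncated output.

```rust
use std::collections::HashSet;

/// Hypothetical modes named after the `classifier_mode` config key.
#[derive(Debug, Clone, Copy)]
enum ClassifierMode {
    Heuristic,
    Judge,
}

/// Toy heuristic score in [0.0, 1.0]: the fraction of distinct
/// `window_size`-word windows. Empty output scores 0.0; looping output
/// scores low. Incoherence and truncation are not detected here.
fn heuristic_score(text: &str, window_size: usize) -> f64 {
    let w = window_size.max(1); // `slice::windows(0)` would panic
    let words: Vec<&str> = text.split_whitespace().collect();
    if words.is_empty() {
        return 0.0;
    }
    if words.len() < w {
        return 1.0; // too short to judge repetition
    }
    let total = words.len() - w + 1;
    let distinct: HashSet<&[&str]> = words.windows(w).collect();
    distinct.len() as f64 / total as f64
}

/// Hypothetical LLM-judge call; None means "no judge score available".
/// It always returns None here, so Judge mode exercises the fallback.
fn judge_score(_text: &str) -> Option<f64> {
    None
}

fn quality_score(mode: ClassifierMode, text: &str, window_size: usize) -> f64 {
    match mode {
        ClassifierMode::Heuristic => heuristic_score(text, window_size),
        // Fall back to the heuristic whenever the judge yields no score.
        ClassifierMode::Judge => {
            judge_score(text).unwrap_or_else(|| heuristic_score(text, window_size))
        }
    }
}

fn is_degenerate(mode: ClassifierMode, text: &str, window_size: usize, quality_threshold: f64) -> bool {
    quality_score(mode, text, window_size) < quality_threshold
}

/// Counts Unicode scalar values, not bytes, so "日本語テキスト" (7 chars,
/// 21 bytes) is not over-counted. The ~4 chars per token ratio is assumed.
fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

fn main() {
    let looping = "the cat sat ".repeat(20);
    println!("{}", is_degenerate(ClassifierMode::Heuristic, &looping, 3, 0.5)); // true
    println!("{}", is_degenerate(ClassifierMode::Judge, "", 3, 0.5)); // true
    println!("{}", estimate_tokens("日本語テキスト")); // 2
}
```

Scoring by distinct windows keeps the check purely local and cheap, which matches the "zero LLM calls" goal, but it cannot tell a fluent wrong answer from a right one.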
Force-pushed from 91cd1cf to 5edee99
…zard, tests

- REV-01: record_availability(false) on stream-open errors in cascade_chat_stream early-provider loop, symmetric with cascade_chat error handling
- REV-02: add doc comment to RouterConfig::chain field
- REV-03: add round-trip serde test for RouterStrategyConfig::Cascade
- REV-04: add Cascade option to --init wizard step_router with quality_threshold and max_escalations prompts; wire through WizardState and build_config
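The escalation rules from the commits (a provider error marks that provider unavailable and moves on without consuming escalation budget, the best-seen response is returned on exhaustion, and `max_cascade_tokens` caps cumulative cost) can be sketched as a synchronous loop. This is an illustration under stated assumptions, not zeph-llm's code: `MockProvider`, `CascadeParams`, the `unavailable` list (standing in for `record_availability(false)`), the injected `score` function, and counting prompt plus response tokens toward the cap are all hypothetical.

```rust
/// Hypothetical provider stand-in with a canned reply; zeph-llm's real
/// providers are async and expose a different API.
struct MockProvider {
    name: &'static str,
    // Err simulates a network/API failure.
    reply: Result<&'static str, &'static str>,
}

/// Hypothetical knobs mirroring the `[llm.router.cascade]` keys.
struct CascadeParams {
    quality_threshold: f64,
    max_escalations: usize,
    max_cascade_tokens: Option<usize>,
}

/// Char-based estimate, ~4 Unicode scalar values per token (assumed ratio).
fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Try providers in chain order, escalating only on low-scoring output.
/// Returns the accepted (or best-seen) text plus the providers that errored.
fn cascade_chat(
    chain: &[MockProvider],
    prompt: &str,
    params: &CascadeParams,
    score: impl Fn(&str) -> f64,
) -> (Result<String, String>, Vec<&'static str>) {
    let mut unavailable = Vec::new(); // stands in for record_availability(false)
    let mut escalations = 0;
    let mut tokens_spent = 0;
    let mut best: Option<(f64, String)> = None;

    for provider in chain {
        let text = match provider.reply {
            // Network/API error: mark the provider unavailable and try the
            // next one without consuming escalation budget.
            Err(_) => {
                unavailable.push(provider.name);
                continue;
            }
            Ok(text) => text.to_string(),
        };
        // Cumulative cost across levels (prompt + response, assumed).
        tokens_spent += estimate_tokens(prompt) + estimate_tokens(&text);
        let s = score(&text);
        if s >= params.quality_threshold {
            return (Ok(text), unavailable);
        }
        // Degenerate: keep the best-scoring response seen so far.
        if best.as_ref().map_or(true, |(b, _)| s > *b) {
            best = Some((s, text));
        }
        let over_tokens = params.max_cascade_tokens.is_some_and(|cap| tokens_spent >= cap);
        if escalations == params.max_escalations || over_tokens {
            break; // budget exhausted
        }
        escalations += 1;
    }
    // Return the best-seen response instead of a NoProviders-style error;
    // fail only if every provider errored.
    let result = best
        .map(|(_, text)| text)
        .ok_or_else(|| "every provider failed".to_string());
    (result, unavailable)
}

fn main() {
    // Toy scorer: longer answers score higher, capped at 1.0.
    let score = |t: &str| (t.chars().count() as f64 / 10.0).min(1.0);
    let params = CascadeParams { quality_threshold: 0.5, max_escalations: 1, max_cascade_tokens: Some(10_000) };
    let chain = [
        MockProvider { name: "flaky", reply: Err("timeout") },
        MockProvider { name: "cheap", reply: Ok("") },
        MockProvider { name: "strong", reply: Ok("Paris is the capital.") },
    ];
    let (result, unavailable) = cascade_chat(&chain, "Capital of France?", &params, score);
    println!("{result:?} {unavailable:?}"); // Ok("Paris is the capital.") ["flaky"]
}
```

Because errors `continue` before the budget check, a flaky provider early in the chain never shortens the escalation budget, which is the property REV-01 makes symmetric between the streaming and non-streaming paths.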
Closes #1339.
Summary
Implements `RouterStrategy::Cascade` in `zeph-llm`. When `strategy = "cascade"` is configured, the router tries providers in chain order (cheapest first) and escalates to the next provider only when the response is classified as degenerate.
- Best-seen response returned on budget exhaustion (not a `NoProviders` error)
- `max_cascade_tokens` caps cumulative token cost across escalation levels
- Skipped for `chat_with_tools` (falls through to standard routing)
- `quality_threshold` clamped to [0.0, 1.0], `window_size` minimum 1, `collect_stream` capped at 1 MiB

Configuration
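The validation rules listed above can be sketched as a config struct normalized once at load time, plus a byte-capped stream collector. Only the key names, the 0.5 and 2 defaults, the heuristic default, and the clamping bounds come from this PR; `CascadeConfig`, `validated`, the `collect_stream` signature, the `window_size` default of 3, `max_cascade_tokens` as `Option<usize>`, the NaN handling, and truncating at the cap (rather than erroring) are assumptions.

```rust
/// Hypothetical classifier modes named after the `classifier_mode` key.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ClassifierMode {
    Heuristic,
    Judge,
}

/// Hypothetical mirror of `[llm.router.cascade]`.
#[derive(Debug)]
struct CascadeConfig {
    quality_threshold: f64,
    max_escalations: usize,
    classifier_mode: ClassifierMode,
    window_size: usize,
    max_cascade_tokens: Option<usize>,
}

impl Default for CascadeConfig {
    fn default() -> Self {
        Self {
            quality_threshold: 0.5,                     // default stated in the PR
            max_escalations: 2,                         // default stated in the PR
            classifier_mode: ClassifierMode::Heuristic, // stated default classifier
            window_size: 3,                             // assumption: not stated in the PR
            max_cascade_tokens: None,                   // assumption: uncapped unless set
        }
    }
}

impl CascadeConfig {
    /// Normalize values once, at the config boundary.
    fn validated(mut self) -> Self {
        // f64::clamp passes NaN through, so map NaN to the default (assumption).
        self.quality_threshold = if self.quality_threshold.is_nan() {
            0.5
        } else {
            self.quality_threshold.clamp(0.0, 1.0)
        };
        // A zero window would make `slice::windows` panic downstream.
        self.window_size = self.window_size.max(1);
        self
    }
}

/// 1 MiB cap on collected stream output, per the PR.
const MAX_STREAM_BYTES: usize = 1024 * 1024;

/// Concatenate streamed chunks, truncating at the cap on a UTF-8 char
/// boundary and dropping the rest (truncation behavior is an assumption).
fn collect_stream<I: IntoIterator<Item = String>>(chunks: I) -> String {
    let mut out = String::new();
    for chunk in chunks {
        let room = MAX_STREAM_BYTES - out.len();
        if chunk.len() <= room {
            out.push_str(&chunk);
        } else {
            // Back off to a char boundary so the slice below cannot panic.
            let mut end = room;
            while !chunk.is_char_boundary(end) {
                end -= 1;
            }
            out.push_str(&chunk[..end]);
            break;
        }
    }
    out
}

fn main() {
    let cfg = CascadeConfig { quality_threshold: 1.7, window_size: 0, ..CascadeConfig::default() }.validated();
    println!("{cfg:?}");
    let text = collect_stream(vec!["partial ".to_string(), "answer".to_string()]);
    println!("{text}"); // partial answer
}
```

Clamping once at the boundary means the classifier and router can trust `quality_threshold` and `window_size` without re-checking them on every request.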
Known limitations (follow-up issues)
- `cascade_chat_stream` does not track the best-seen response (asymmetric with `cascade_chat`)
- `ClassifierMode::Judge` is wired but not implemented; it falls through to the heuristic
- `cost_tiers` config deferred (providers are ordered by chain order for now)

Test plan
cargo nextest run --workspace --features full: 5404/5404 passed