Cascade routing: try cheap model first, escalate on quality threshold #1339
Closed
Labels: enhancement (New feature or request), research (Research-driven improvement)
Description
Research
An ICLR 2025 paper on unified routing and cascading reports a consistent 4% improvement. The pattern: route to the cheapest model first, run a quality evaluator on its output, and escalate if the score falls below a threshold. This complements the existing Thompson Sampling and EMA routing.
Proposal
- Add a cascade mode to the router that tries the cheapest provider first
- Implement a lightweight output quality classifier trained on the self-learning data
- Escalate to the next provider only if the predicted quality is below the threshold
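
A minimal sketch of the cascade loop described above, to make the control flow concrete. Everything named here is an assumption for illustration and not taken from the existing router: the `Provider` dataclass, its `cost` field, the `cascade` function, the `quality` callable (standing in for the proposed classifier), and the default threshold of 0.7.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Provider:
    """Hypothetical provider record; field names are illustrative."""
    name: str
    cost: float                      # assumed relative cost per call
    generate: Callable[[str], str]   # prompt -> model output


def cascade(
    prompt: str,
    providers: Sequence[Provider],
    quality: Callable[[str, str], float],
    threshold: float = 0.7,
) -> Tuple[str, List[str]]:
    """Try providers cheapest-first and stop at the first acceptable output.

    `quality(prompt, output)` is a stand-in for the proposed classifier and
    is assumed to return a score in [0, 1]. If no provider meets the
    threshold, the output of the most expensive provider is returned.
    """
    if not providers:
        raise ValueError("at least one provider is required")

    tried: List[str] = []
    output = ""
    for provider in sorted(providers, key=lambda p: p.cost):
        output = provider.generate(prompt)
        tried.append(provider.name)
        if quality(prompt, output) >= threshold:
            break  # good enough, no escalation needed
    return output, tried


if __name__ == "__main__":
    # Stub providers and a toy length-based scorer, for demonstration only.
    cheap = Provider("cheap", 1.0, lambda p: "short")
    strong = Provider("strong", 10.0, lambda p: "a much longer and detailed answer")
    toy_quality = lambda prompt, out: min(len(out) / 20.0, 1.0)

    answer, path = cascade("explain cascading", [strong, cheap], toy_quality)
    print(path, answer)
```

Returning the list of providers tried alongside the output is a design choice in this sketch: it would let the escalation path be logged, which could feed back into the self-learning data and the existing routing statistics.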
Sources