research: PILOT adaptive LLM routing via contextual bandit with budget constraints (arXiv:2508.21141) #2230
Closed
Labels
- P2: High value, medium complexity
- llm: zeph-llm crate (Ollama, Claude)
- research: Research-driven improvement
Description
Source
arXiv:2508.21141 — "Adaptive LLM Routing under Budget Constraints" (EMNLP 2025)
Summary
Frames LLM routing as a contextual bandit problem using a shared query-model embedding space and LinUCB-based PILOT algorithm. Adapts online to observed quality feedback without requiring pre-labelled model-query pairs. Includes a multi-choice knapsack cost policy for per-request token budget enforcement.
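For orientation, here is a minimal sketch of plain per-arm LinUCB, the building block the summary refers to. It is not the PILOT algorithm from the paper: the shared query-model embedding is assumed to arrive as a ready-made feature vector `x`, and the names `Arm`, `select`, and `alpha` are hypothetical. The inverse matrix is maintained with the Sherman-Morrison rank-one update so that no linear-algebra crate is needed.

```rust
/// One LinUCB arm (one candidate model), dependency-free for illustration.
/// `a_inv` is the inverse of A = I + sum(x x^T); `b` is sum(reward * x).
struct Arm {
    a_inv: Vec<Vec<f64>>,
    b: Vec<f64>,
}

impl Arm {
    fn new(dim: usize) -> Self {
        // Start from A = I, whose inverse is also the identity.
        let mut a_inv = vec![vec![0.0; dim]; dim];
        for i in 0..dim {
            a_inv[i][i] = 1.0;
        }
        Arm { a_inv, b: vec![0.0; dim] }
    }

    /// Matrix-vector product A^{-1} x.
    fn a_inv_times(&self, x: &[f64]) -> Vec<f64> {
        self.a_inv.iter().map(|row| dot(row, x)).collect()
    }

    /// Upper confidence bound: theta^T x + alpha * sqrt(x^T A^{-1} x),
    /// where theta = A^{-1} b is the ridge-regression estimate.
    fn ucb(&self, x: &[f64], alpha: f64) -> f64 {
        let theta = self.a_inv_times(&self.b);
        let ax = self.a_inv_times(x);
        dot(&theta, x) + alpha * dot(x, &ax).max(0.0).sqrt()
    }

    /// Rank-one update of A^{-1} via Sherman-Morrison, then b += reward * x.
    fn update(&mut self, x: &[f64], reward: f64) {
        let ax = self.a_inv_times(x);
        let denom = 1.0 + dot(x, &ax);
        for i in 0..x.len() {
            for j in 0..x.len() {
                self.a_inv[i][j] -= ax[i] * ax[j] / denom;
            }
        }
        for (bi, xi) in self.b.iter_mut().zip(x) {
            *bi += reward * xi;
        }
    }
}

fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(p, q)| p * q).sum()
}

/// Index of the arm with the highest UCB for query features `x`
/// (ties go to the earliest arm).
fn select(arms: &[Arm], x: &[f64], alpha: f64) -> usize {
    let mut best = 0;
    for i in 1..arms.len() {
        if arms[i].ucb(x, alpha) > arms[best].ucb(x, alpha) {
            best = i;
        }
    }
    best
}
```

A caller would invoke `select` per request, run the chosen model, and feed the observed quality score back through `update` on that arm only.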
Applicability to Zeph
HIGH — zeph-llm router (triage/thompson/cascade strategies).
Current Zeph routing is static (triage = rule-based) or heuristic (thompson = reputation sampling). PILOT's online bandit formulation would:
- Adapt routing based on actual quality outcomes observed during use
- Handle budget constraints natively (maps to `[cost] max_daily_cents`)
- No retraining required: the policy updates from each inference call
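To make the budget point concrete, the sketch below filters candidates by remaining daily spend before taking the best score. It is a deliberately simple greedy filter, not the paper's multi-choice knapsack policy, and every type and field name (`Candidate`, `DailyBudget`, `est_cost_cents`, `spent_cents`) is an assumption made for illustration. Only `max_daily_cents` is taken from the existing config key.

```rust
/// A candidate provider with a bandit score and an estimated request cost.
/// Hypothetical type; not part of zeph-llm.
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    name: &'static str,
    score: f64,
    est_cost_cents: u32,
}

/// Tracks spend against a daily cap, mirroring the `max_daily_cents` setting.
struct DailyBudget {
    max_daily_cents: u32,
    spent_cents: u32,
}

impl DailyBudget {
    /// Cents still available today; never underflows.
    fn remaining(&self) -> u32 {
        self.max_daily_cents.saturating_sub(self.spent_cents)
    }
}

/// Highest-scoring candidate whose estimated cost fits the remaining budget,
/// or `None` when nothing is affordable.
fn pick_within_budget<'a>(
    cands: &'a [Candidate],
    budget: &DailyBudget,
) -> Option<&'a Candidate> {
    cands
        .iter()
        .filter(|c| c.est_cost_cents <= budget.remaining())
        .max_by(|a, b| a.score.total_cmp(&b.score))
}
```

With plenty of budget left the most capable model wins; as spend approaches the cap the selection falls back to cheaper providers, and returns `None` once nothing fits.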
Implementation Direction
- New `LlmRoutingStrategy::Bandit` variant alongside existing `Thompson`, `Triage`, `Cascade`
- Shared query-model embedding for provider selection
- Online updates via `[llm.router.reputation]` infrastructure (already exists)
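As a rough illustration of the first item, the enum could gain a `Bandit` variant that is selectable by name. The definition below is a guess at the shape, since the real `LlmRoutingStrategy` in zeph-llm is not shown in this issue; the lowercase config strings and the `FromStr` implementation are assumptions.

```rust
/// Hypothetical shape of the strategy enum; the real definition may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LlmRoutingStrategy {
    Thompson,
    Triage,
    Cascade,
    /// Proposed contextual-bandit strategy.
    Bandit,
}

impl std::str::FromStr for LlmRoutingStrategy {
    type Err = String;

    /// Parses a strategy name case-insensitively (assumed config values).
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "thompson" => Ok(Self::Thompson),
            "triage" => Ok(Self::Triage),
            "cascade" => Ok(Self::Cascade),
            "bandit" => Ok(Self::Bandit),
            other => Err(format!("unknown routing strategy: {other}")),
        }
    }
}
```

Keeping `Bandit` as a sibling variant means existing configurations that name `thompson`, `triage`, or `cascade` would keep working unchanged.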
Priority: P2 — high-impact improvement to routing quality and cost efficiency
Discovered: CI-211 research scan (2026-03-27)