research: PILOT adaptive LLM routing via contextual bandit with budget constraints (arXiv:2508.21141) #2230

@bug-ops

Description

Source

arXiv:2508.21141 — "Adaptive LLM Routing under Budget Constraints" (EMNLP 2025)

Summary

Frames LLM routing as a contextual bandit problem, using a shared query-model embedding space and the LinUCB-based PILOT algorithm. Adapts online to observed quality feedback without requiring pre-labelled model-query pairs. Includes a multi-choice knapsack cost policy for per-request token budget enforcement.
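
To make the bandit framing concrete, here is a minimal sketch of plain LinUCB with one arm per candidate model. It is not the paper's exact PILOT variant: the names `LinUcbArm`, `select_arm`, and `alpha`, the identity ridge prior, and the use of a raw query embedding as the context vector `x` are all assumptions made for illustration.

```rust
/// One LinUCB arm (one candidate model). Hypothetical type, not Zeph's API.
/// Stores A^{-1} directly, where A = I + sum(x x^T), plus b = sum(reward * x).
struct LinUcbArm {
    a_inv: Vec<Vec<f64>>,
    b: Vec<f64>,
}

impl LinUcbArm {
    /// Start from A = I (ridge prior), so A^{-1} = I and b = 0.
    fn new(dim: usize) -> Self {
        let mut a_inv = vec![vec![0.0; dim]; dim];
        for i in 0..dim {
            a_inv[i][i] = 1.0;
        }
        Self { a_inv, b: vec![0.0; dim] }
    }

    /// Computes A^{-1} * v.
    fn mat_vec(&self, v: &[f64]) -> Vec<f64> {
        self.a_inv
            .iter()
            .map(|row| row.iter().zip(v).map(|(a, x)| a * x).sum::<f64>())
            .collect()
    }

    /// Upper confidence bound: theta^T x + alpha * sqrt(x^T A^{-1} x),
    /// with theta = A^{-1} b.
    fn score(&self, x: &[f64], alpha: f64) -> f64 {
        let theta = self.mat_vec(&self.b);
        let a_inv_x = self.mat_vec(x);
        let mean: f64 = theta.iter().zip(x).map(|(t, v)| t * v).sum();
        let var: f64 = a_inv_x.iter().zip(x).map(|(u, v)| u * v).sum();
        mean + alpha * var.max(0.0).sqrt()
    }

    /// Online update after observing `reward` for context `x`.
    /// Uses the Sherman-Morrison rank-one update so no matrix inversion is needed.
    fn update(&mut self, x: &[f64], reward: f64) {
        let u = self.mat_vec(x); // A^{-1} x (A^{-1} is symmetric)
        let denom = 1.0 + u.iter().zip(x).map(|(a, v)| a * v).sum::<f64>();
        let n = u.len();
        for i in 0..n {
            for j in 0..n {
                self.a_inv[i][j] -= u[i] * u[j] / denom;
            }
        }
        for (bi, xi) in self.b.iter_mut().zip(x) {
            *bi += reward * xi;
        }
    }
}

/// Returns the index of the arm with the highest UCB score for context `x`.
fn select_arm(arms: &[LinUcbArm], x: &[f64], alpha: f64) -> usize {
    let mut best = 0;
    let mut best_score = f64::NEG_INFINITY;
    for (i, arm) in arms.iter().enumerate() {
        let s = arm.score(x, alpha);
        if s > best_score {
            best = i;
            best_score = s;
        }
    }
    best
}
```

A caller would embed the query, call `select_arm`, run inference on the chosen model, and then feed the observed quality signal back through `update`. A larger `alpha` favors exploring models with little feedback so far.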

Applicability to Zeph

HIGH: zeph-llm router (triage/thompson/cascade strategies).

Current Zeph routing is static (triage = rule-based) or heuristic (thompson = reputation sampling). PILOT's online bandit formulation would:

  • Adapt routing based on actual quality outcomes observed during use
  • Handle budget constraints natively (maps to [cost] max_daily_cents)
  • Require no retraining, since the policy updates from each inference call
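
As a rough illustration of the budget point, the sketch below filters candidates by remaining daily budget and then picks the highest estimated quality. This is a simple greedy filter, not the paper's multi-choice knapsack policy. `Candidate`, `DailyBudget`, and `pick_within_budget` are hypothetical names, and the only thing taken from the issue is the idea of a `max_daily_cents` cap.

```rust
/// A routable model with bandit-estimated quality and an estimated request cost.
/// Hypothetical type for illustration only.
struct Candidate {
    name: &'static str,
    est_quality: f64,
    est_cost_cents: f64,
}

/// Tracks spend against a daily cap (assumed to mirror `[cost] max_daily_cents`).
struct DailyBudget {
    max_daily_cents: f64,
    spent_cents: f64,
}

impl DailyBudget {
    /// Cents still available today, never negative.
    fn remaining(&self) -> f64 {
        (self.max_daily_cents - self.spent_cents).max(0.0)
    }

    /// Records the cost of a completed request.
    fn charge(&mut self, cents: f64) {
        self.spent_cents += cents;
    }
}

/// Picks the highest-quality candidate whose estimated cost fits the remaining
/// budget, or `None` when nothing is affordable.
fn pick_within_budget<'a>(
    candidates: &'a [Candidate],
    budget: &DailyBudget,
) -> Option<&'a Candidate> {
    let remaining = budget.remaining();
    candidates
        .iter()
        .filter(|c| c.est_cost_cents <= remaining)
        .max_by(|a, b| a.est_quality.total_cmp(&b.est_quality))
}
```

In a real integration `est_quality` would presumably come from the bandit's score for the current query, so the budget acts as a constraint on top of the learned policy and does not replace it.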

Implementation Direction

  • New LlmRoutingStrategy::Bandit variant alongside existing Thompson, Triage, Cascade
  • Shared query-model embedding for provider selection
  • Online updates via [llm.router.reputation] infrastructure (already exists)
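
One possible shape for the new variant, sketched without access to the actual zeph-llm definition: the real `LlmRoutingStrategy` enum may carry data or derive different traits, and the lowercase config names and the `FromStr` implementation here are assumptions.

```rust
use std::str::FromStr;

/// Sketch of the routing strategy enum with the proposed `Bandit` variant.
/// The existing variants are named in this issue; their real definitions may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LlmRoutingStrategy {
    Thompson,
    Triage,
    Cascade,
    Bandit,
}

impl FromStr for LlmRoutingStrategy {
    type Err = String;

    /// Parses a strategy name as it might appear in a config file (assumed lowercase).
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "thompson" => Ok(Self::Thompson),
            "triage" => Ok(Self::Triage),
            "cascade" => Ok(Self::Cascade),
            "bandit" => Ok(Self::Bandit),
            other => Err(format!("unknown routing strategy: {other}")),
        }
    }
}
```

Keeping `Bandit` as a sibling of the existing variants would let the strategy be switched through configuration while the current strategies stay available as fallbacks.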

Priority: P2 — high-impact improvement to routing quality and cost efficiency
Discovered: CI-211 research scan (2026-03-27)

Metadata

Labels

P2: High value, medium complexity; llm: zeph-llm crate (Ollama, Claude); research: Research-driven improvement
