research: PILOT adaptive LLM routing via contextual bandit with budget constraints (arXiv:2508.21141) #2230

## Source

arXiv:2508.21141 — "Adaptive LLM Routing under Budget Constraints" (EMNLP 2025)

## Summary

Frames LLM routing as a contextual bandit problem, using a shared embedding space for queries and models and the LinUCB-based PILOT algorithm. Adapts online from bandit feedback (the observed quality of the model actually chosen), so it does not need every query labelled against every model up front. Includes a multi-choice knapsack cost policy for per-request token budget enforcement.
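
A minimal sketch of the LinUCB core that PILOT extends, assuming the generic disjoint formulation: one linear model per provider over a query feature vector, scored as `theta^T x + alpha * sqrt(x^T A^-1 x)`. `A^-1` is maintained with the Sherman-Morrison rank-one update, so no linear-algebra crate is needed. PILOT's shared query-model embedding and its cost policy are not modelled here, and `LinUcbArm`, `select`, and `alpha` are hypothetical names, not Zeph or paper APIs.

```rust
/// One LinUCB arm (here: one LLM provider). Hypothetical type, not Zeph's API.
struct LinUcbArm {
    a_inv: Vec<Vec<f64>>, // inverse of A = I + sum(x x^T), kept via Sherman-Morrison
    b: Vec<f64>,          // sum of reward * x
}

impl LinUcbArm {
    fn new(d: usize) -> Self {
        let mut a_inv = vec![vec![0.0; d]; d];
        for i in 0..d {
            a_inv[i][i] = 1.0; // A starts as the identity, so A^-1 does too
        }
        Self { a_inv, b: vec![0.0; d] }
    }

    /// Computes A^-1 * v.
    fn mat_vec(&self, v: &[f64]) -> Vec<f64> {
        self.a_inv
            .iter()
            .map(|row| row.iter().zip(v).map(|(a, x)| a * x).sum::<f64>())
            .collect()
    }

    /// Upper confidence bound: theta^T x + alpha * sqrt(x^T A^-1 x), with theta = A^-1 b.
    fn ucb(&self, x: &[f64], alpha: f64) -> f64 {
        let theta = self.mat_vec(&self.b);
        let ax = self.mat_vec(x);
        let mean: f64 = theta.iter().zip(x).map(|(t, xi)| t * xi).sum();
        let var: f64 = ax.iter().zip(x).map(|(a, xi)| a * xi).sum();
        mean + alpha * var.max(0.0).sqrt()
    }

    /// Records reward `r` observed for context `x`: A += x x^T, b += r x.
    fn update(&mut self, x: &[f64], r: f64) {
        // A^-1 is symmetric, so A^-1 x x^T A^-1 = (A^-1 x)(A^-1 x)^T.
        let ax = self.mat_vec(x);
        let denom = 1.0 + ax.iter().zip(x).map(|(a, xi)| a * xi).sum::<f64>();
        let d = x.len();
        for i in 0..d {
            for j in 0..d {
                self.a_inv[i][j] -= ax[i] * ax[j] / denom;
            }
        }
        for (bi, xi) in self.b.iter_mut().zip(x) {
            *bi += r * xi;
        }
    }
}

/// Index of the arm with the highest UCB for context `x` (0 if `arms` is empty).
fn select(arms: &[LinUcbArm], x: &[f64], alpha: f64) -> usize {
    let mut best = 0;
    let mut best_score = f64::NEG_INFINITY;
    for (i, arm) in arms.iter().enumerate() {
        let s = arm.ucb(x, alpha);
        if s > best_score {
            best = i;
            best_score = s;
        }
    }
    best
}

fn main() {
    let mut arms = vec![LinUcbArm::new(2), LinUcbArm::new(2)];
    let x = [1.0, 0.0];
    // Provider 1 consistently scores higher quality on this query type.
    for _ in 0..50 {
        arms[0].update(&x, 0.2);
        arms[1].update(&x, 0.9);
    }
    println!("chosen provider: {}", select(&arms, &x, 1.0));
}
```

After 50 updates provider 1's estimated quality dominates and it is chosen; with few observations, the `alpha` term keeps under-explored providers in contention.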

## Applicability to Zeph

**HIGH** — `zeph-llm` router (triage/thompson/cascade strategies).

Current Zeph routing is static (triage = rule-based) or heuristic (thompson = reputation sampling). PILOT's online bandit formulation would:
- Adapt routing based on actual quality outcomes observed during use
- Handle budget constraints natively (maps to `[cost] max_daily_cents`)
- Need no retraining, since the model updates after each inference call
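
One simple way a bandit score could respect `[cost] max_daily_cents` is a budget gate: drop providers whose estimated cost exceeds what remains of the daily budget, then take the highest UCB among the rest. This is a deliberately simpler greedy stand-in, not PILOT's multi-choice knapsack policy. `Candidate`, `DailyBudget`, and `select_within_budget` are hypothetical, and a per-call cost estimate is assumed to be available before dispatch.

```rust
/// Hypothetical per-provider candidate: bandit UCB score plus estimated cost in cents.
struct Candidate {
    name: &'static str,
    ucb: f64,
    est_cost_cents: u64,
}

/// Hypothetical daily budget tracker mirroring `[cost] max_daily_cents`.
struct DailyBudget {
    max_daily_cents: u64,
    spent_cents: u64,
}

impl DailyBudget {
    fn remaining(&self) -> u64 {
        self.max_daily_cents.saturating_sub(self.spent_cents)
    }

    fn charge(&mut self, cents: u64) {
        self.spent_cents += cents;
    }
}

/// Among providers whose estimated cost fits the remaining budget, picks the highest UCB.
/// Returns `None` if nothing is affordable.
fn select_within_budget<'a>(cands: &'a [Candidate], budget: &DailyBudget) -> Option<&'a Candidate> {
    cands
        .iter()
        .filter(|c| c.est_cost_cents <= budget.remaining())
        .max_by(|a, b| a.ucb.total_cmp(&b.ucb))
}

fn main() {
    let cands = [
        Candidate { name: "large", ucb: 0.95, est_cost_cents: 40 },
        Candidate { name: "small", ucb: 0.70, est_cost_cents: 2 },
    ];
    let mut budget = DailyBudget { max_daily_cents: 100, spent_cents: 70 };
    // 30 cents remain, so "large" (40) is filtered out and "small" wins.
    let pick = select_within_budget(&cands, &budget).expect("an affordable provider");
    println!("{}", pick.name);
    budget.charge(pick.est_cost_cents);
}
```

Returning `None` when nothing fits leaves the caller to decide whether to reject the request or fall back to another strategy.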

## Implementation Direction

- New `LlmRoutingStrategy::Bandit` variant alongside existing `Thompson`, `Triage`, `Cascade`
- Shared query-model embedding for provider selection
- Online updates via `[llm.router.reputation]` infrastructure (already exists)
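
A sketch of how the proposed variant might sit beside the existing ones, assuming a plain enum. One consequence worth flagging: a LinUCB-style update (`A += x x^T`, `b += r x`) needs the query embedding `x` that was routed, not only a per-provider score, so an outcome record fed back through the reputation path would have to carry it for `Bandit`. The enum shape, `RoutingOutcome`, `needs_query_context`, and `record_outcome` are hypothetical; the real `zeph-llm` types may differ.

```rust
/// Hypothetical mirror of the strategy enum named in this issue.
#[derive(Debug, Clone, PartialEq)]
enum LlmRoutingStrategy {
    Triage,
    Thompson,
    Cascade,
    /// Proposed variant; `alpha` is the LinUCB exploration weight.
    Bandit { alpha: f64 },
}

impl LlmRoutingStrategy {
    /// True when the feedback path must keep the query embedding:
    /// a LinUCB update cannot be applied from a per-provider score alone.
    fn needs_query_context(&self) -> bool {
        matches!(self, LlmRoutingStrategy::Bandit { .. })
    }
}

/// Hypothetical outcome record emitted after each inference call.
#[derive(Debug, Clone)]
struct RoutingOutcome {
    provider: String,
    quality: f64, // clamped to [0, 1]
    query_embedding: Option<Vec<f64>>,
}

/// Builds the outcome record, retaining the embedding only when the strategy needs it.
fn record_outcome(
    strategy: &LlmRoutingStrategy,
    provider: &str,
    quality: f64,
    embedding: &[f64],
) -> RoutingOutcome {
    RoutingOutcome {
        provider: provider.to_string(),
        quality: quality.clamp(0.0, 1.0),
        query_embedding: strategy.needs_query_context().then(|| embedding.to_vec()),
    }
}

fn main() {
    let emb = [0.1, 0.7, 0.2];
    let bandit = record_outcome(&LlmRoutingStrategy::Bandit { alpha: 1.0 }, "provider-a", 0.8, &emb);
    let thompson = record_outcome(&LlmRoutingStrategy::Thompson, "provider-a", 0.8, &emb);
    println!("{:?}\n{:?}", bandit, thompson);
}
```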

**Priority**: P2 — high-impact improvement to routing quality and cost efficiency  
**Discovered**: CI-211 research scan (2026-03-27)
