…dit (#2230, #2192)

Add LlmRoutingStrategy::Bandit, an online contextual bandit router based on the LinUCB algorithm from "Adaptive LLM Routing under Budget Constraints" (arXiv:2508.21141).

Key properties:
- Feature vector: query embedding truncated to configurable dim (default 32), L2-normalized
- Selection: argmax UCB = theta^T*x + alpha*sqrt(x^T*A^{-1}*x) with Gaussian elimination (partial pivoting)
- Online updates: A += x*x^T, b += reward*x after each inference call
- Cold start: falls back to Thompson sampling until warmup_queries threshold is reached
- Embed cache: 512-entry FIFO cache with 50ms hard timeout; zero-vector fallback on miss
- Config validation: dim clamped to [1,256], alpha > 0, decay_factor in (0,1]
- State persistence: atomic JSON write, same pattern as Thompson/Reputation

Budget enforcement uses a closure (budget_filter: Box<dyn Fn(&str) -> bool + Send + Sync>) to avoid a circular dependency between zeph-llm and zeph-core's CostTracker.

Also documents SLM provider recommendations (#2192): adds doc comments on all *_provider config fields identifying subsystems suitable for lightweight models (gpt-4o-mini, claude-haiku-4-5, qwen3:8b), with an SLM guide table in config/default.toml.

Closes #2230, closes #2192.
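The selection and update rules listed in the commit message can be illustrated with a minimal sketch. This is not the code from `bandit.rs`: the names `Arm`, `features`, `solve`, and `select` are hypothetical, `A` is assumed to start as the identity matrix (the usual LinUCB choice, not stated in the source), and the `decay_factor`, Thompson warmup, and persistence parts are left out.

```rust
/// Hypothetical per-arm state: A is dim x dim, b has length dim.
struct Arm {
    a: Vec<Vec<f64>>,
    b: Vec<f64>,
}

fn dot(u: &[f64], v: &[f64]) -> f64 {
    u.iter().zip(v).map(|(p, q)| p * q).sum()
}

/// Truncate (or zero-pad) an embedding to `dim` entries and L2-normalize it.
fn features(embedding: &[f64], dim: usize) -> Vec<f64> {
    let mut x: Vec<f64> = embedding.iter().copied().take(dim).collect();
    x.resize(dim, 0.0);
    let norm = dot(&x, &x).sqrt();
    if norm > 0.0 {
        for v in &mut x {
            *v /= norm;
        }
    }
    x
}

/// Solve A y = rhs by Gaussian elimination with partial pivoting.
/// Returns None if A is (numerically) singular.
fn solve(a: &[Vec<f64>], rhs: &[f64]) -> Option<Vec<f64>> {
    let n = rhs.len();
    // Augmented matrix [A | rhs].
    let mut m: Vec<Vec<f64>> = a
        .iter()
        .zip(rhs)
        .map(|(row, r)| {
            let mut row = row.clone();
            row.push(*r);
            row
        })
        .collect();
    for col in 0..n {
        // Pick the row with the largest absolute value in this column.
        let pivot = (col..n).max_by(|&i, &j| m[i][col].abs().total_cmp(&m[j][col].abs()))?;
        if m[pivot][col].abs() < 1e-12 {
            return None;
        }
        m.swap(col, pivot);
        for row in (col + 1)..n {
            let factor = m[row][col] / m[col][col];
            for k in col..=n {
                let v = m[col][k];
                m[row][k] -= factor * v;
            }
        }
    }
    // Back substitution.
    let mut y = vec![0.0; n];
    for i in (0..n).rev() {
        let mut s = m[i][n];
        for k in (i + 1)..n {
            s -= m[i][k] * y[k];
        }
        y[i] = s / m[i][i];
    }
    Some(y)
}

impl Arm {
    /// Assumption: A starts as the identity so it is invertible from the start.
    fn new(dim: usize) -> Self {
        let mut a = vec![vec![0.0; dim]; dim];
        for i in 0..dim {
            a[i][i] = 1.0;
        }
        Arm { a, b: vec![0.0; dim] }
    }

    /// UCB = theta^T x + alpha * sqrt(x^T A^{-1} x), with theta = A^{-1} b.
    fn ucb(&self, x: &[f64], alpha: f64) -> Option<f64> {
        let theta = solve(&self.a, &self.b)?;
        let a_inv_x = solve(&self.a, x)?;
        Some(dot(&theta, x) + alpha * dot(x, &a_inv_x).max(0.0).sqrt())
    }

    /// Online update: A += x x^T, b += reward * x.
    fn update(&mut self, x: &[f64], reward: f64) {
        for i in 0..x.len() {
            for j in 0..x.len() {
                self.a[i][j] += x[i] * x[j];
            }
            self.b[i] += reward * x[i];
        }
    }
}

/// Index of the arm with the highest UCB score for context x.
fn select(arms: &[Arm], x: &[f64], alpha: f64) -> Option<usize> {
    let mut best: Option<(usize, f64)> = None;
    for (i, arm) in arms.iter().enumerate() {
        let score = arm.ucb(x, alpha)?;
        if best.map_or(true, |(_, s)| score > s) {
            best = Some((i, score));
        }
    }
    best.map(|(i, _)| i)
}
```

In this sketch a caller would build `x` with `features`, pick an arm with `select`, run the inference call, and then pass the observed reward to `update` on the chosen arm. Solving the linear system on each call avoids storing an explicit inverse, which is plausible at the small default `dim` but is a design guess rather than something the PR states.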
Summary
- `LlmRoutingStrategy::Bandit`: a PILOT-inspired LinUCB contextual bandit router that adapts online to observed quality feedback without retraining (#2230, "research: PILOT adaptive LLM routing via contextual bandit with budget constraints (arXiv:2508.21141)")
- SLM provider recommendations in `config/default.toml` (#2192, "research(architecture): SLMs are the Future of Agentic AI — LLM-to-SLM conversion algorithm (arXiv:2506.02153)")
- Tests in `bandit.rs`; 7188/7188 tests pass

Changes

New: `crates/zeph-llm/src/router/bandit.rs`
LinUCB bandit core: `BanditState`, `LinUcbArm`, `BanditConfig`, `BanditEmbedCache`.

Key properties:
- Feature vector: query embedding truncated to configurable `dim` (default 32), L2-normalized
- Cold start: falls back to Thompson sampling until the `warmup_queries` threshold is reached

Modified: `crates/zeph-llm/src/router/mod.rs`
Wires `RouterStrategy::Bandit` through the `chat`/`stream`/`tools` paths. Budget enforcement uses a `budget_filter: Box<dyn Fn(&str) -> bool + Send + Sync>` closure to avoid a circular dependency with `zeph-core::CostTracker`.

Modified: `crates/zeph-config/src/providers.rs`
`BanditConfig` struct with all fields; `LlmRoutingStrategy::Bandit` variant.

Modified: `crates/zeph-core/src/bootstrap/provider.rs`
Wires bandit from config at bootstrap.
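The `budget_filter` closure described for `router/mod.rs` can be sketched as follows, assuming the core crate builds the closure at bootstrap and hands it to the router. Only the closure type comes from the PR text; `Router`, `CostTracker`, `within_budget`, `make_filter`, and `eligible` are hypothetical stand-ins, not the real zeph types or methods.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Closure type quoted in the PR description.
type BudgetFilter = Box<dyn Fn(&str) -> bool + Send + Sync>;

/// Hypothetical router-side type: it knows nothing about cost tracking,
/// only that a provider name can be accepted or rejected.
struct Router {
    providers: Vec<String>,
    budget_filter: Option<BudgetFilter>,
}

impl Router {
    /// Providers that pass the budget filter (all of them if no filter is set).
    fn eligible(&self) -> Vec<&str> {
        self.providers
            .iter()
            .map(String::as_str)
            .filter(|&name| self.budget_filter.as_ref().map_or(true, |f| f(name)))
            .collect()
    }
}

/// Hypothetical core-side tracker; the real CostTracker API is not shown in the PR.
struct CostTracker {
    spent: Mutex<HashMap<String, f64>>,
    limit: f64,
}

impl CostTracker {
    fn within_budget(&self, provider: &str) -> bool {
        let spent = self.spent.lock().unwrap();
        spent.get(provider).copied().unwrap_or(0.0) < self.limit
    }
}

/// Built where both crates are visible; the router only ever sees the closure.
fn make_filter(tracker: Arc<CostTracker>) -> BudgetFilter {
    Box::new(move |provider: &str| tracker.within_budget(provider))
}
```

The point of the indirection is that the router crate depends only on a function type, so the crate that owns the tracker can depend on the router without a cycle. The `Send + Sync` bounds let the boxed closure be shared across async tasks or threads.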
Modified: `config/default.toml`
`[llm.router.bandit]` config section; SLM guide table documenting 10 subsystems suitable for lightweight models.

Test plan
- `cargo +nightly fmt --check`: pass
- `cargo clippy --all-targets --features full --workspace -- -D warnings`: pass
- `cargo nextest run --workspace --features full --lib --bins`: 7188/7188 pass

Closes #2230, closes #2192.