feat(llm): PILOT LinUCB bandit routing strategy with SLM subsystem audit by bug-ops · Pull Request #2390 · bug-ops/zeph

bug-ops · 2026-03-29T02:19:02Z

Summary

Implements LlmRoutingStrategy::Bandit — a PILOT-inspired LinUCB contextual bandit router that adapts online to observed quality feedback without retraining (research: PILOT adaptive LLM routing via contextual bandit with budget constraints (arXiv:2508.21141) #2230)
Documents SLM provider recommendations for all narrow/repetitive subsystems via doc comments and a guide table in config/default.toml (research(architecture): SLMs are the Future of Agentic AI — LLM-to-SLM conversion algorithm (arXiv:2506.02153) #2192)
29 unit tests added in bandit.rs; 7188/7188 tests pass

Changes

New: `crates/zeph-llm/src/router/bandit.rs`

LinUCB bandit core: BanditState, LinUcbArm, BanditConfig, BanditEmbedCache.
Key properties:

Feature vector: query embedding truncated to dim (default 32), L2-normalized
Selection: argmax UCB = θᵀx + α√(xᵀA⁻¹x) via Gaussian elimination with partial pivoting
Online updates: A += xxᵀ, b += reward·x after each inference call
Cold start: Thompson fallback until warmup_queries threshold
Embed cache: 512-entry FIFO + 50ms hard timeout; zero-vector fallback on miss
Config validation: dim clamped to [1,256], alpha > 0, decay_factor ∈ (0,1]
State persistence: atomic JSON write (same pattern as Thompson/Reputation)
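The selection and update rules listed above can be sketched in a few lines of std-only Rust. This is a minimal illustration, not the code in `bandit.rs`: the `Arm` struct, the helper names (`features`, `solve`, `select`), and the identity initialization of A are assumptions following the standard LinUCB formulation. Warmup, decay, and persistence are left out.

```rust
/// Hypothetical per-provider arm: A is a dim x dim matrix, b a dim-vector.
struct Arm {
    a: Vec<Vec<f64>>,
    b: Vec<f64>,
}

fn dot(u: &[f64], v: &[f64]) -> f64 {
    u.iter().zip(v).map(|(p, q)| p * q).sum()
}

/// Truncate (or zero-pad) an embedding to `dim`, then L2-normalize it.
/// A zero vector is returned unchanged.
fn features(embedding: &[f64], dim: usize) -> Vec<f64> {
    let mut x: Vec<f64> = embedding.iter().copied().take(dim).collect();
    x.resize(dim, 0.0);
    let norm = dot(&x, &x).sqrt();
    if norm > 0.0 {
        for v in &mut x {
            *v /= norm;
        }
    }
    x
}

/// Solve A*y = x by Gaussian elimination with partial pivoting.
/// Returns None when A is numerically singular.
fn solve(a: &[Vec<f64>], x: &[f64]) -> Option<Vec<f64>> {
    let n = x.len();
    // Augmented matrix [A | x].
    let mut m: Vec<Vec<f64>> = a
        .iter()
        .zip(x)
        .map(|(row, &xi)| {
            let mut r = row.clone();
            r.push(xi);
            r
        })
        .collect();
    for col in 0..n {
        // Partial pivoting: pick the row with the largest |entry| in this column.
        let pivot = (col..n).max_by(|&i, &j| m[i][col].abs().total_cmp(&m[j][col].abs()))?;
        if m[pivot][col].abs() < 1e-12 {
            return None;
        }
        m.swap(col, pivot);
        for row in (col + 1)..n {
            let f = m[row][col] / m[col][col];
            for k in col..=n {
                let delta = f * m[col][k];
                m[row][k] -= delta;
            }
        }
    }
    // Back substitution.
    let mut y = vec![0.0; n];
    for i in (0..n).rev() {
        let s: f64 = ((i + 1)..n).map(|k| m[i][k] * y[k]).sum();
        y[i] = (m[i][n] - s) / m[i][i];
    }
    Some(y)
}

impl Arm {
    /// Assumption: A starts as the identity and b as zero (standard LinUCB).
    fn new(dim: usize) -> Self {
        let a = (0..dim)
            .map(|i| (0..dim).map(|j| if i == j { 1.0 } else { 0.0 }).collect())
            .collect();
        Arm { a, b: vec![0.0; dim] }
    }

    /// UCB = theta^T x + alpha * sqrt(x^T A^-1 x), with theta = A^-1 b.
    /// A is symmetric, so theta^T x equals b^T (A^-1 x) and one solve suffices.
    fn ucb(&self, x: &[f64], alpha: f64) -> Option<f64> {
        let y = solve(&self.a, x)?; // y = A^-1 x
        let mean = dot(&self.b, &y);
        let variance = dot(x, &y).max(0.0);
        Some(mean + alpha * variance.sqrt())
    }

    /// Online update: A += x x^T, b += reward * x.
    fn update(&mut self, x: &[f64], reward: f64) {
        for i in 0..x.len() {
            for j in 0..x.len() {
                self.a[i][j] += x[i] * x[j];
            }
            self.b[i] += reward * x[i];
        }
    }
}

/// Index of the arm with the highest UCB score; ties go to the earliest arm.
fn select(arms: &[Arm], x: &[f64], alpha: f64) -> Option<usize> {
    let mut best: Option<(usize, f64)> = None;
    for (i, arm) in arms.iter().enumerate() {
        let score = arm.ucb(x, alpha)?;
        if best.map_or(true, |(_, s)| score > s) {
            best = Some((i, score));
        }
    }
    best.map(|(i, _)| i)
}
```

In this sketch a caller would compute `features` once per query, call `select` to choose a provider, and call `update` on the chosen arm once a reward is observed. Solving A·y = x directly avoids forming A⁻¹ explicitly, which is one plausible reading of the "Gaussian elimination with partial pivoting" note.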

Modified: `crates/zeph-llm/src/router/mod.rs`

Wires RouterStrategy::Bandit through chat/stream/tools paths. Budget enforcement uses a budget_filter: Box<dyn Fn(&str) -> bool + Send + Sync> closure to avoid circular dependency with zeph-core::CostTracker.
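The closure-based decoupling can be illustrated roughly as follows. Only the `budget_filter` type is taken from the description above. `RouterSketch`, `make_budget_filter`, and the atomic spend counter standing in for `CostTracker` are hypothetical names for this sketch, and the real filtering policy may differ.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Closure type quoted in the PR description.
type BudgetFilter = Box<dyn Fn(&str) -> bool + Send + Sync>;

/// Hypothetical router: it only knows "may I use this provider?",
/// not how the answer is computed.
struct RouterSketch {
    providers: Vec<String>,
    budget_filter: Option<BudgetFilter>,
}

impl RouterSketch {
    /// Providers that pass the budget filter (all of them if none is set).
    fn eligible(&self) -> Vec<&str> {
        self.providers
            .iter()
            .map(String::as_str)
            .filter(|&name| match &self.budget_filter {
                Some(allowed) => allowed(name),
                None => true,
            })
            .collect()
    }
}

/// Built on the caller's side (zeph-core in the PR). The shared counter is an
/// assumed stand-in for whatever state the real CostTracker exposes.
fn make_budget_filter(
    spent_cents: Arc<AtomicU64>,
    limit_cents: u64,
    premium: Vec<String>,
) -> BudgetFilter {
    Box::new(move |name: &str| {
        let over_budget = spent_cents.load(Ordering::Relaxed) >= limit_cents;
        // Once over budget, reject only the providers marked as premium.
        !(over_budget && premium.iter().any(|p| p.as_str() == name))
    })
}
```

Because the router crate sees only a boxed `Fn(&str) -> bool`, it needs no compile-time dependency on the crate that owns cost tracking. The `Send + Sync` bounds let the router be shared across async tasks.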

Modified: `crates/zeph-config/src/providers.rs`

BanditConfig struct with all fields; LlmRoutingStrategy::Bandit variant.
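The validation rules listed under the key properties (dim clamped to [1,256], alpha > 0, decay_factor ∈ (0,1]) might look roughly like the sketch below. The field names come from the PR text, but the field types, the `BanditConfigSketch` name, and the choice to reject invalid `alpha` and `decay_factor` with an error rather than clamp them are assumptions.

```rust
/// Hypothetical mirror of the bandit config; types are assumed.
#[derive(Debug, Clone, PartialEq)]
struct BanditConfigSketch {
    dim: usize,
    alpha: f64,
    decay_factor: f64,
    warmup_queries: u64,
}

impl BanditConfigSketch {
    /// Clamp `dim` into [1, 256]; reject alpha <= 0 and decay_factor outside (0, 1].
    /// Non-finite values (NaN, infinity) are rejected as well.
    fn validated(mut self) -> Result<Self, String> {
        self.dim = self.dim.clamp(1, 256);
        if !self.alpha.is_finite() || self.alpha <= 0.0 {
            return Err(format!("alpha must be > 0, got {}", self.alpha));
        }
        let d = self.decay_factor;
        if !d.is_finite() || d <= 0.0 || d > 1.0 {
            return Err(format!("decay_factor must be in (0, 1], got {d}"));
        }
        Ok(self)
    }
}
```

Clamping `dim` keeps an oversized value from inflating the per-arm dim × dim matrix, while the other two fields have no obviously safe fallback, which is why this sketch returns an error for them.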

Modified: `crates/zeph-core/src/bootstrap/provider.rs`

Wires bandit from config at bootstrap.

Modified: `config/default.toml`

[llm.router.bandit] config section; SLM guide table documenting 10 subsystems suitable for lightweight models.

Test plan

cargo +nightly fmt --check — pass
cargo clippy --all-targets --features full --workspace -- -D warnings — pass
cargo nextest run --workspace --features full --lib --bins — 7188/7188 pass
Adversarial architecture critique reviewed and addressed (2 BLOCKING, 6 MAJOR resolved)
Impl critique reviewed (2 MAJOR fixed: Thompson cold-start initialization, zero-vector fallback for embed failures)
Security audit: no critical/high findings (3 MEDIUM addressed: config validation added)

Closes #2230, closes #2192.

…dit (#2230, #2192)

Add LlmRoutingStrategy::Bandit, an online contextual bandit router based on the LinUCB algorithm from "Adaptive LLM Routing under Budget Constraints" (arXiv:2508.21141).

Key properties:
- Feature vector: query embedding truncated to configurable dim (default 32), L2-normalized
- Selection: argmax UCB = theta^T*x + alpha*sqrt(x^T*A^{-1}*x) with Gaussian elimination (partial pivoting)
- Online updates: A += x*x^T, b += reward*x after each inference call
- Cold start: falls back to Thompson sampling until warmup_queries threshold is reached
- Embed cache: 512-entry FIFO cache with 50ms hard timeout; zero-vector fallback on miss
- Config validation: dim clamped to [1,256], alpha > 0, decay_factor in (0,1]
- State persistence: atomic JSON write, same pattern as Thompson/Reputation

Budget enforcement uses a closure (budget_filter: Box<dyn Fn(&str) -> bool + Send + Sync>) to avoid a circular dependency between zeph-llm and zeph-core's CostTracker.

Also documents SLM provider recommendations (#2192): adds doc comments on all *_provider config fields identifying subsystems suitable for lightweight models (gpt-4o-mini, claude-haiku-4-5, qwen3:8b), with an SLM guide table in config/default.toml.

Closes #2230, closes #2192.

github-actions bot added the documentation, llm, rust, core, and config labels Mar 29, 2026

bug-ops enabled auto-merge (squash) March 29, 2026 02:19

github-actions bot added the enhancement and size/XL labels Mar 29, 2026

bug-ops merged commit 5f49eb5 into main Mar 29, 2026
27 checks passed

bug-ops deleted the feat-issue-2192-research-architecture-slms-are branch March 29, 2026 02:26

bug-ops mentioned this pull request Mar 30, 2026

research(llm): BaRP preference-conditioned bandit routing — runtime cost/quality trade-off dial (arXiv:2510.07429) #2415

Closed
