feat(llm): PILOT bandit state not persisted — save_bandit_state() not called in save_router_state() #2394
Description
Summary
LinUCB bandit state is never saved to disk. The bandit resets to cold-start (Thompson fallback) on every session restart, making online learning ineffective.
Root Cause
File: crates/zeph-llm/src/any.rs:136
save_router_state() calls save_thompson_state() and save_reputation_state() but NOT save_bandit_state():
    pub fn save_router_state(&self) {
        if let Self::Router(p) = self {
            p.save_thompson_state();
            p.save_reputation_state();
            // MISSING: p.save_bandit_state();
        }
    }

save_bandit_state() exists on RouterProvider but is never called from the shutdown path.
Observed Behavior (CI-271, 2026-03-29, v0.18.0)
With routing = "bandit" and embedding_provider = "openai-stt":
- Bandit correctly routes:
  [status] bandit: routing to openai
- Rewards recorded:
  bandit: recorded reward provider="openai" reward=0.7 quality=0.7
- After restart: no state file found, bandit starts fresh
Additionally: the bandit config section must be placed under [llm.router.bandit], not [llm.bandit]. The documentation should clarify this.
Fix
Add p.save_bandit_state(); to AnyProvider::save_router_state() after the existing calls.
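A minimal sketch of the proposed change, for illustration only. The RouterProvider and AnyProvider definitions below are simplified stand-ins, not the real zeph-llm types: the counter fields and the Other variant are hypothetical, and the save methods only bump counters instead of writing to disk, so the call sequence can be checked in isolation.

```rust
use std::cell::Cell;

// Hypothetical stand-in for the real RouterProvider in zeph-llm.
// Each save method increments a counter instead of writing a state file.
#[derive(Default)]
pub struct RouterProvider {
    pub thompson_saves: Cell<u32>,
    pub reputation_saves: Cell<u32>,
    pub bandit_saves: Cell<u32>,
}

impl RouterProvider {
    pub fn save_thompson_state(&self) {
        self.thompson_saves.set(self.thompson_saves.get() + 1);
    }

    pub fn save_reputation_state(&self) {
        self.reputation_saves.set(self.reputation_saves.get() + 1);
    }

    pub fn save_bandit_state(&self) {
        self.bandit_saves.set(self.bandit_saves.get() + 1);
    }
}

// Simplified stand-in for AnyProvider; `Other` is a placeholder for
// every non-router variant (assumption, not the real variant list).
pub enum AnyProvider {
    Router(RouterProvider),
    Other,
}

impl AnyProvider {
    pub fn save_router_state(&self) {
        if let Self::Router(p) = self {
            p.save_thompson_state();
            p.save_reputation_state();
            // The call this issue reports as missing.
            p.save_bandit_state();
        }
    }
}
```

With this shape, a regression test could assert that a single save_router_state() call on a Router variant triggers all three saves, and that non-router variants remain a no-op.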
Priority
P2 — PILOT feature non-functional for long-term learning, but cold-start fallback to Thompson works correctly.