feat(llm): cascade_chat_stream best-seen response tracking #1722
Closed
Labels
enhancement: New feature or request
llm: zeph-llm crate (Ollama, Claude)
Description
Context
Cascade routing (PR #1721, closes #1339) implements best-seen response tracking in cascade_chat but not in cascade_chat_stream. The asymmetry is documented as a known limitation.
Problem
When all providers in the cascade chain fail and the escalation budget is exhausted, cascade_chat_stream propagates the error instead of returning the best response seen so far. cascade_chat, by contrast, correctly returns the best-seen response on exhaustion.
Fix
Add a best_seen: Option<String> buffer in cascade_chat_stream's early-provider loop to accumulate the best non-error response, then return it on budget exhaustion.
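The proposed fix could look roughly like the sketch below. It is a simplified, synchronous model and not the actual zeph-llm code: each provider's stream is assumed to be already collected into a `String`, and the names `Attempt`, `score`, `cascade_with_best_seen`, `min_score`, and `escalation_budget` are hypothetical. The length-based quality heuristic is only a stand-in for whatever classifier the real cascade uses.

```rust
/// Hypothetical outcome of one provider attempt: the fully buffered
/// response text, or an error message. (Assumption, not the real type.)
type Attempt = Result<String, String>;

/// Stand-in quality heuristic (assumption): longer trimmed text scores higher.
fn score(text: &str) -> usize {
    text.trim().len()
}

/// Tries providers in order. Returns the first response that meets
/// `min_score`; otherwise returns the best response seen once the
/// providers or the escalation budget run out, and the last error
/// only if no usable response was seen at all.
fn cascade_with_best_seen(
    attempts: impl IntoIterator<Item = Attempt>,
    min_score: usize,
    escalation_budget: usize,
) -> Result<String, String> {
    let mut best_seen: Option<String> = None;
    let mut last_err = String::from("no usable response");
    let mut escalations = 0;

    for attempt in attempts {
        match attempt {
            // Good enough: stop cascading and return immediately.
            Ok(text) if score(&text) >= min_score => return Ok(text),
            // Below threshold: keep it only if it beats the current best.
            // Empty responses score 0 and are never stored.
            Ok(text) => {
                let current_best = best_seen.as_ref().map_or(0, |b| score(b));
                if score(&text) > current_best {
                    best_seen = Some(text);
                }
            }
            // Provider error: remember it in case nothing usable is seen.
            Err(e) => last_err = e,
        }
        // Moving on to the next provider costs one escalation.
        if escalations == escalation_budget {
            break;
        }
        escalations += 1;
    }

    // Exhausted: prefer the best-seen response over propagating the error.
    best_seen.ok_or(last_err)
}
```

With a budget of 2 and attempts `Ok("hi")`, `Err("timeout")`, `Ok("a longer but still weak answer")` against a `min_score` of 100, this model returns the third response rather than an error. In a real streaming implementation the early providers' chunks would presumably need to be buffered before they can be scored, which is why `best_seen` is described as a buffer.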
Priority
Medium: affects reliability of streaming cascade mode under all-fail conditions.