feat(llm): add best-seen response tracking to cascade_chat_stream #1753
Merged
cascade_chat_stream now accumulates best_seen: Option<(String, f64)> in the early-provider loop, mirroring the existing cascade_chat behavior. On budget exhaustion (should_escalate=true), returns the highest-scoring prior response instead of the current one. On last-provider stream error, returns best_seen if available rather than propagating the error. Closes #1722.
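The selection rule described above can be sketched in a simplified, synchronous form. This is an illustrative model, not the code from this PR: `Attempt`, `cascade_with_best_seen`, the `threshold` used to decide escalation, and the token accounting are all hypothetical names and assumptions, and each provider's stream is assumed to be already collected into a full string with a score.

```rust
/// Hypothetical result of one provider attempt, with the stream already
/// collected into a single string (the real function is async and streaming).
#[derive(Debug, Clone)]
enum Attempt {
    /// Response text, quality score, and tokens consumed.
    Ok { text: String, score: f64, tokens: u32 },
    /// The provider's stream failed.
    Err(String),
}

/// Assumed model: a response scoring below `threshold` is degenerate
/// (should_escalate = true); `budget` caps tokens spent on escalations.
fn cascade_with_best_seen(
    attempts: &[Attempt],
    threshold: f64,
    budget: u32,
) -> Result<String, String> {
    let mut best_seen: Option<(String, f64)> = None;
    let mut spent: u32 = 0;

    let Some((last, early)) = attempts.split_last() else {
        return Err("no providers configured".to_string());
    };

    // Early-provider loop: accept a good response, otherwise remember the best.
    for attempt in early {
        let Attempt::Ok { text, score, tokens } = attempt else {
            continue; // an early error simply escalates to the next provider
        };
        if *score >= threshold {
            return Ok(text.clone());
        }
        spent = spent.saturating_add(*tokens);
        // Strict comparison: on a tie the earlier response is kept.
        if best_seen.as_ref().map_or(true, |(_, best)| *score > *best) {
            best_seen = Some((text.clone(), *score));
        }
        // Budget exhausted: return the highest-scoring response seen so far,
        // which may not be the current provider's response.
        if spent >= budget {
            if let Some((best_text, _)) = best_seen.take() {
                return Ok(best_text);
            }
        }
    }

    // Last provider: its response is returned as-is; on error, fall back to
    // best_seen and only propagate the error when nothing was accumulated.
    match last {
        Attempt::Ok { text, .. } => Ok(text.clone()),
        Attempt::Err(e) => best_seen.map(|(t, _)| t).ok_or_else(|| e.clone()),
    }
}
```

Under these assumptions, a run where p1 scores higher than p2 and the budget runs out at p2 returns p1's text, and a run where every provider errors returns the last error because `best_seen` is still `None`.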
Summary
- Adds best_seen: Option<(String, f64)> accumulation in cascade_chat_stream's early-provider loop, mirroring the existing cascade_chat behavior introduced in "feat(llm): cascade routing, try cheap provider first, escalate on degenerate output" (#1721)
- On budget exhaustion (should_escalate=true), returns the highest-scoring prior response instead of the current provider's response
- On last-provider stream error, returns best_seen if available rather than propagating the error

Test plan
- cascade_stream_budget_returns_best_seen: budget exhausts in early loop, best_seen returned
- cascade_stream_budget_returns_best_seen_not_current: 4 providers, budget=17 tokens, p1 (16t) < p2 (1t) → p1 returned over p2 at exhaustion
- cascade_stream_last_fails_returns_best_seen: last provider errors, best_seen from early providers returned
- cascade_stream_all_fail_returns_error: no best_seen accumulated, error propagated

All 5431 tests pass.
Closes #1722.