Merged
Contributor:
Any way to confirm this for sure before merging?
seanshi-scale (Contributor) left a comment:
Could you write how changes were tested?
edgan8 approved these changes on Mar 28, 2024.
edgan8 (Contributor) left a comment:
Talked with Katie; sse-starlette was on an older version two weeks ago, so this looks good. Could you add more details to the PR explaining the context for what broke and why we need this specific version?
Pull Request Summary
The updated version of sse-starlette has odd streaming behavior in the http-forwarder: tokens streamed back were batched together, so time to first token was very long. We expect tokens to come back at a steady rate instead.
I tried sse-starlette 1.8.2 to check whether the major version bump (1.8.2 -> 2.0.0) was what broke things, but 1.8.2 showed the same batching behavior, so this downgrades to 1.6.1, the original version from before the security scan updates.
This should still pass the security scan.
Test Plan and Usage Guide
Tested that using this image for the HTTP forwarder of a Llama 2 endpoint in the training cluster fixes the oncall issue of long streaming time to first token (verified by curling localhost:5000 from the HTTP forwarder).
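The regression described above (buffered tokens vs. a steady stream) can be checked by measuring time to first token. Below is a minimal, self-contained sketch of that check; the function names and the simulated streams are illustrative only and are not the forwarder's actual code or sse-starlette's API.

```python
import time

def time_to_first_token(stream):
    """Return (seconds until the first chunk arrives, the first chunk)."""
    start = time.monotonic()
    first = next(iter(stream))
    return time.monotonic() - start, first

def steady(tokens, delay=0.01):
    # Expected behavior: each token is yielded as soon as it is produced.
    for t in tokens:
        time.sleep(delay)
        yield t

def batched(tokens, delay=0.01):
    # Regressed behavior: tokens are buffered and emitted only at the end,
    # so the first chunk arrives after the whole generation finishes.
    buf = []
    for t in tokens:
        time.sleep(delay)
        buf.append(t)
    yield "".join(buf)

tokens = ["tok"] * 10
ttft_steady, _ = time_to_first_token(steady(tokens))
ttft_batched, _ = time_to_first_token(batched(tokens))
print(ttft_steady < ttft_batched)  # steady streaming yields a token much sooner
```

Against the real endpoint, the analogous check is timing how long `curl` takes to print the first SSE event rather than the whole response.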