feat: handle large stream chunk responses to support Nano Banana [Test Needed] #18884
Conversation
fixes #17626
@Classic298 @rgaricano If you have time, please review the code. This code has been running in our production environment for over six months without any issues detected, but it's still advisable to take a careful look.
@ShirasawaSama,
And if we don't use a buffer to store long lines (to prevent memory issues when dealing with very large lines), we can also process the chunks directly:
Maybe a mixed solution: use a limited buffer and chunk long lines as they arrive:
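A minimal sketch of what such a mixed approach might look like; the function name and the cap are illustrative, not the actual code from this PR:

```python
def iter_lines_capped(chunks, max_buffer=1024 * 1024):
    """Yield complete lines from an iterable of byte chunks, but flush
    any single line that grows past max_buffer in fixed-size pieces."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Emit every complete line currently in the buffer.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            yield line
        # A single line larger than the cap is flushed in partial pieces
        # instead of being buffered whole; the consumer must then
        # tolerate receiving partial lines.
        while len(buffer) > max_buffer:
            yield buffer[:max_buffer]
            buffer = buffer[max_buffer:]
    if buffer:
        yield buffer
```

The trade-off is the one discussed below: memory use is bounded, but downstream code can no longer assume every yielded item is a complete line.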
@rgaricano Have you tested your code? I noticed that it seems to require returning a complete line each time, rather than a portion of a line. As for why line limits are necessary: my production cluster has multiple instances with Redis enabled. Sometimes models include built-in MCP calls, which transmit the entire call process. When executing web searches, this also sends web data. Occasionally, when reading PDF pages, it directly transmits tens of megabytes of data. Combined with socket.io broadcasts across multiple instances, this caused my Redis cluster to crash.
@ShirasawaSama Lines are chunked data; when we process the stream we are also chunking the data into lines, so that isn't a problem (at least it doesn't aggravate the problem... which in fact exists due to the use of lines as delimiters in SSE). To manage that situation, it is probably better to use io.BytesIO and a controlled line_partial_size if necessary.
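One way to read that suggestion, using io.BytesIO as the accumulator; line_partial_size is the comment's own name, everything else in this sketch is illustrative:

```python
import io

def read_capped_lines(chunks, line_partial_size=65536):
    """Accumulate the current (possibly partial) line in a BytesIO and
    yield it early once it exceeds line_partial_size, so one enormous
    SSE line cannot grow the buffer without bound."""
    buf = io.BytesIO()
    for chunk in chunks:
        start = 0
        while True:
            nl = chunk.find(b"\n", start)
            if nl == -1:
                buf.write(chunk[start:])
                break
            buf.write(chunk[start:nl])
            yield buf.getvalue()  # a complete line
            buf = io.BytesIO()
            start = nl + 1
        if buf.tell() > line_partial_size:
            yield buf.getvalue()  # an oversized partial line, flushed early
            buf = io.BytesIO()
    if buf.tell():
        yield buf.getvalue()
```

As with any early-flush scheme, the consumer has to tolerate that an item flushed at the cap is not a complete line.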
Leaving aside buffering, at the very least it should have the option to delete or split large content, e.g. to remove or split (maintaining the integrity of possible JSON objects in the data) with 2 env vars:
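The snippet and env var names from this comment were not captured in the transcript, but the remove-or-split idea could look something like this; all names here are hypothetical:

```python
import os

# Hypothetical env vars: what to do with oversized lines, and the cutoff.
LARGE_LINE_ACTION = os.environ.get("CHAT_STREAM_LARGE_LINE_ACTION", "split")
LARGE_LINE_THRESHOLD = int(os.environ.get("CHAT_STREAM_LARGE_LINE_THRESHOLD", "1048576"))

def handle_large_line(line, action=LARGE_LINE_ACTION, threshold=LARGE_LINE_THRESHOLD):
    if len(line) <= threshold:
        return [line]
    if action == "remove":
        return []  # drop the oversized line entirely
    # "split": cut into threshold-sized pieces; a real implementation would
    # also try to keep the JSON object in an SSE "data:" line intact.
    return [line[i:i + threshold] for i in range(0, len(line), threshold)]
```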
@rgaricano Additionally, I feel your code is a bit too complex, making it nearly impossible for others to maintain later on. In fact, the major issues I've encountered aren't limited to that. I added the feature to skip excessively long lines because we previously discovered that certain third-party model providers would output massive amounts of unnecessary content within a single line. This would directly cause the Redis cluster connected to OpenWebUI to crash and result in an instantaneous surge of traffic on the OpenWebUI backend servers. We simply need to skip these lines entirely. This is purely a protective measure.
@silentoplayz Are there any error messages? Can you see the file size of the output image in base64?
There aren't any error messages thrown/triggered to be displayed that I am aware of. I've checked both frontend and backend logs and the browser console, and there's no error. As for network requests, here's what that looks like from the start of a new chat. I've tested on both Chrome and Firefox web browsers.
Nano Banana support was added in dev. Is this PR still needed? |
CHAT_STREAM_RESPONSE_CHUNK_MAX_BUFFER_SIZE = os.environ.get(
    "CHAT_STREAM_RESPONSE_CHUNK_MAX_BUFFER_SIZE", "10485760"  # 10MB
)
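For context, a byte-size env var like this is typically parsed to an int and then used as a guard; a minimal sketch, where the guard function is illustrative rather than the PR's actual code:

```python
import os

CHAT_STREAM_RESPONSE_CHUNK_MAX_BUFFER_SIZE = int(
    os.environ.get("CHAT_STREAM_RESPONSE_CHUNK_MAX_BUFFER_SIZE", "10485760")  # 10MB
)

def within_limit(line, limit=None):
    # Lines above the cap are skipped rather than buffered.
    if limit is None:
        limit = CHAT_STREAM_RESPONSE_CHUNK_MAX_BUFFER_SIZE
    return len(line) <= limit
```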
Any reasons for setting it to 10MB specifically?
10MB is the typical size for the base64 strings returned by most image generation models. This has been tested with models like Gemini Image 2.5, Qwen Image Edit, Doubao Seedream, and GPT Image.
Of course, I can accommodate values larger or smaller than this. This limit exists solely to prevent an LLM from returning excessively large data in a single response, which could crash the backend service.
(All data below is single-line after base64 encoding) For Gemini Image 2.5, the image size is approximately under 2.5MB; for gpt image, it's around 3.5MB; and for doubao seedream, it's roughly under 7MB. Therefore, I consider 10MB to be an acceptable value.
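As a sanity check on these figures: base64 encodes 3 raw bytes into 4 ASCII characters, so a base64 line is about 4/3 the raw image size (a ~7 MB base64 line corresponds to roughly 5.25 MB of raw image data):

```python
import base64

raw = b"\x00" * 3_000_000          # 3 MB of raw image bytes
encoded = base64.b64encode(raw)    # exact 4/3 expansion, no padding needed here
assert len(encoded) == 4_000_000
```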
@silentoplayz Could you please provide the complete SSE data from OpenRouter? You can directly use JavaScript's fetch method; it's possible that the issue lies solely with the data OpenRouter's API returns.
Sorry, but the OpenRouter API key I used to test with was given to me for testing purposes only, and I don't know how to obtain the SSE data from OpenRouter myself. I don't have access to the OpenRouter dashboard or anything like that.
Force-pushed from bb78ce9 to ce1079d
@silentoplayz I confirmed this is because the image data returned by OpenRouter appears non-standard: it places the image base64 in an unexpected field. I recommend opening a separate PR to handle this non-standard output. This does not affect my current PR, which handles large single-line SSE text.
@ShirasawaSama this generally looks good and seems like it can be merged as-is. Another qq: will this potentially affect any existing behaviours?
I cannot guarantee with absolute certainty that there will be no other repercussions. However, this code has been running in our production environment for over eight months without any incidents or user-reported bugs. |
Merging this, but it will be disabled by default for now. Thanks!






Pull Request Checklist
Note to first-time contributors: Please open a discussion post in Discussions and describe your changes before submitting a pull request.
Before submitting, make sure you've checked the following:
dev branch. Not targeting the dev branch may lead to immediate closure of the PR.
Changelog Entry
Description
Handle large stream chunk responses to support the gemini-2.5-flash-image model (fixes #17626). Additionally, some third-party service providers may return excessively large data sets (>100MB, such as web search data), necessitating length restrictions.
Therefore, an environment variable named CHAT_STREAM_RESPONSE_CHUNK_MAX_BUFFER_SIZE (in bytes) has been introduced, which can be used to set the maximum read length for each chunk.
Added
Changed
Deprecated
Removed
Fixed
Security
Breaking Changes
Additional Information
Screenshots or Videos
Contributor License Agreement
By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA), and I am providing my contributions under its terms.