
feat: handle large stream chunks responses to support Nano Banana [Test Needed]#18884

Merged
tjbck merged 2 commits into open-webui:dev from ShirasawaSama:feature/handle-large-stream-chunks
Nov 10, 2025

Conversation

@ShirasawaSama
Contributor

@ShirasawaSama ShirasawaSama commented Nov 3, 2025

Pull Request Checklist

Note to first-time contributors: Please open a discussion post in Discussions and describe your changes before submitting a pull request.

Before submitting, make sure you've checked the following:

  • Target branch: Verify that the pull request targets the dev branch. Not targeting the dev branch may lead to immediate closure of the PR.
  • Description: Provide a concise description of the changes made in this pull request.
  • Changelog: Ensure a changelog entry following the format of Keep a Changelog is added at the bottom of the PR description.
  • Documentation: If necessary, update relevant documentation Open WebUI Docs like environment variables, the tutorials, or other documentation sources.
  • Dependencies: Are there any new dependencies? Have you updated the dependency versions in the documentation?
  • Testing: Perform manual tests to verify the implemented fix/feature works as intended AND does not break any other functionality. Take this as an opportunity to make screenshots of the feature/fix and include it in the PR description.
  • Agentic AI Code: Confirm this Pull Request is not written by any AI Agent or has at least gone through additional human review and manual testing. If any AI Agent is the co-author of this PR, it may lead to immediate closure of the PR.
  • Code review: Have you performed a self-review of your code, addressing any coding standard issues and ensuring adherence to the project's coding standards?
  • Title Prefix: To clearly categorize this pull request, prefix the pull request title using one of the following:
    • BREAKING CHANGE: Significant changes that may affect compatibility
    • build: Changes that affect the build system or external dependencies
    • ci: Changes to our continuous integration processes or workflows
    • chore: Refactor, cleanup, or other non-functional code changes
    • docs: Documentation update or addition
    • feat: Introduces a new feature or enhancement to the codebase
    • fix: Bug fix or error correction
    • i18n: Internationalization or localization changes
    • perf: Performance improvement
    • refactor: Code restructuring for better maintainability, readability, or scalability
    • style: Changes that do not affect the meaning of the code (white space, formatting, missing semi-colons, etc.)
    • test: Adding missing tests or correcting existing tests
    • WIP: Work in progress, a temporary label for incomplete or ongoing work

Changelog Entry

Description

Handle large stream chunks responses to support gemini-2.5-flash-image model (fixes #17626)

Additionally, some third-party service providers may return excessively large data sets (>100MB, such as web search data), necessitating length restrictions.

Therefore, an environment variable named CHAT_STREAM_RESPONSE_CHUNK_MAX_BUFFER_SIZE (in bytes) has been introduced, which can be used to set the maximum read length for each stream chunk.
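As a minimal sketch of how a byte-size cap like this could be read from the environment and applied to buffered SSE lines (the helper name `should_skip` is illustrative, not the PR's actual code; only the env var name and its 10 MB default come from this PR):

```python
import os

# Read the cap from the environment, falling back to the PR's 10 MB default.
MAX_BUFFER_SIZE = int(os.environ.get(
    "CHAT_STREAM_RESPONSE_CHUNK_MAX_BUFFER_SIZE", "10485760"  # 10 MB
))


def should_skip(line: bytes, max_size: int = MAX_BUFFER_SIZE) -> bool:
    """Return True when a single SSE line exceeds the configured byte cap."""
    return len(line) > max_size


print(should_skip(b"data: {}"))                     # False: a tiny line
print(should_skip(b"x" * (10 * 1024 * 1024 + 1)))   # True: over the 10 MB cap
```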

Added

  • feat: handle large stream chunks responses

Changed

  • [List any changes, updates, refactorings, or optimizations]

Deprecated

  • [List any deprecated functionality or features that have been removed]

Removed

  • [List any removed features, files, or functionalities]

Fixed

  • [List any fixes, corrections, or bug fixes]

Security

  • [List any new or updated security-related changes, including vulnerability fixes]

Breaking Changes

  • BREAKING CHANGE: [List any breaking changes affecting compatibility or functionality]

Additional Information

image

Screenshots or Videos

  • [Attach any relevant screenshots or videos demonstrating the changes]

Contributor License Agreement

By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA), and I am providing my contributions under its terms.

@ShirasawaSama
Contributor Author

fixes #17626

@ShirasawaSama
Contributor Author

@Classic298 @rgaricano If you have time, please review the code.

This code has been running in our production environment for over six months without any issues detected, but it's still advisable to take a careful look.

@Classic298
Collaborator

CC @silentoplayz

@ShirasawaSama ShirasawaSama changed the title from "feat: handle large stream chunks responses [Test Needed]" to "feat: handle large stream chunks responses to support Nano Banana [Test Needed]" Nov 3, 2025
@rgaricano
Contributor

@ShirasawaSama,
Not sure about this approach:
why remove longer lines?
They can be split into shorter chunks, processed, and removed from the buffer.
e.g. (the max buffer size and all the transmitted data are maintained).

async def handle_large_stream_chunks(stream: aiohttp.StreamReader, max_buffer_size: int = CHAT_STREAM_RESPONSE_CHUNK_MAX_BUFFER_SIZE):
    """
    Handle large stream chunks for streaming responses.
    Breaks large lines into manageable chunks without skipping any data.
    
    :param stream: The stream reader to handle.
    :param max_buffer_size: The maximum size of each chunk to yield.
    :return: An async generator yielding chunks of the stream.
    """
    buffer = bytearray()
    
    async for chunk, _ in stream.iter_chunks():
        if not chunk:
            continue
            
        # Add new chunk to buffer
        buffer.extend(chunk)
        
        # Process all complete lines
        while b'\n' in buffer:
            # Find first newline position
            line_end = buffer.find(b'\n')
            line = buffer[:line_end]
            
            # Process line in chunks if it's too large
            if len(line) > max_buffer_size:
                # Break large line into chunks
                for i in range(0, len(line), max_buffer_size):
                    yield line[i:i + max_buffer_size]
            else:
                yield line
            
            # Remove processed line from buffer
            buffer = buffer[line_end + 1:]
            
    # Yield remaining data if any
    if buffer:
        yield buffer

@rgaricano
Contributor

And if we don't use a buffer to store the long lines (to prevent memory issues when dealing with very large lines), we can also process the chunks directly:

async def handle_large_stream_chunks(stream: aiohttp.StreamReader, max_buffer_size: int = CHAT_STREAM_RESPONSE_CHUNK_MAX_BUFFER_SIZE):
    """
    Handle large stream chunks without storing entire large lines in memory.
    
    :param stream: The stream reader to handle.
    :param max_buffer_size: The maximum size of each chunk to yield.
    :return: An async generator yielding chunks of the stream.
    """
    async for chunk, _ in stream.iter_chunks():
        if not chunk:
            continue
            
        # Process each chunk directly without storing in a buffer
        lines = chunk.split(b'\n')
        
        for line in lines[:-1]:
            # Process complete lines
            if len(line) > max_buffer_size:
                # Break large line into chunks
                for i in range(0, len(line), max_buffer_size):
                    yield line[i:i + max_buffer_size]
            else:
                yield line
                
        # Handle last partial line if exists
        if lines and lines[-1]:
            yield lines[-1]

@rgaricano
Contributor

Maybe a mixed solution, using a limited buffer and chunking long lines as they arrive:

async def handle_large_stream_chunks(stream: aiohttp.StreamReader, 
                                    max_buffer_size: int = CHAT_STREAM_RESPONSE_CHUNK_MAX_BUFFER_SIZE):
    """
    Handle large stream chunks with a buffer that processes chunks as they arrive,
    without storing entire lines in memory.
    
    :param stream: The stream reader to handle.
    :param max_buffer_size: The maximum size of each chunk to yield.
    :return: An async generator yielding chunks of the stream.
    """
    # Initialize buffer with first chunk to start processing
    first_chunk = await stream.readany()
    if not first_chunk:
        return
    
    # Start processing from the beginning of the first chunk
    current_line = first_chunk
    
    async for chunk in stream.iter_any():  # iterate raw chunks; bare `async for` over a StreamReader yields lines
        if not chunk:
            continue
            
        # Combine current_line with new chunk
        data = current_line + chunk
        
        # Process complete lines
        lines = data.split(b'\n')
        
        for line in lines[:-1]:
            # If line exceeds max size, break it into chunks
            if len(line) > max_buffer_size:
                for i in range(0, len(line), max_buffer_size):
                    yield line[i:i + max_buffer_size]
            else:
                yield line
                
        # Handle last partial line
        if lines:
            current_line = lines[-1]
            if len(current_line) > max_buffer_size:
                # If the partial line is too large, break it into chunks
                for i in range(0, len(current_line), max_buffer_size):
                    yield current_line[i:i + max_buffer_size]
                current_line = b""  # already yielded; don't re-emit it next iteration
                
    # Process remaining data in current_line
    if current_line:
        if len(current_line) > max_buffer_size:
            for i in range(0, len(current_line), max_buffer_size):
                yield current_line[i:i + max_buffer_size]
        else:
            yield current_line

@ShirasawaSama
Contributor Author

ShirasawaSama commented Nov 3, 2025

@rgaricano Have you tested your code?

Because I noticed that it seems to require returning a complete line each time, rather than a portion of a line.

https://github.com/open-webui/open-webui/blob/main/backend/open_webui/utils/middleware.py#L2321

Additionally, regarding why line-length limitations are necessary: My production cluster has multiple instances with Redis enabled. Sometimes models include built-in MCP calls, which transmit the entire call process.

When executing web searches, this also sends web data. Occasionally, when reading PDF pages, it directly transmits tens of megabytes of data. Combined with socket.io broadcasts across multiple instances, this caused my Redis cluster to crash.

In fact, to fix the issue where the Redis cluster was overwhelmed by socket.io broadcast traffic, I converted the chat socket.io data back to SSE and switched to incremental updates. This reduced the original full request size of over 70MB to just 300KB. However, I believe Tim is highly unlikely to accept this code modification.
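The point raised above, that the downstream middleware expects a complete line each time, can be seen in a small sketch: a typical SSE consumer strips the `data: ` prefix and parses the rest as JSON, so a line split mid-payload fails to parse rather than producing a usable delta (the parser below is illustrative, not Open WebUI's actual middleware):

```python
import json


def parse_sse_line(line: bytes):
    """Hypothetical SSE consumer: strip the `data: ` prefix, parse JSON."""
    text = line.decode("utf-8")
    if text.startswith("data: "):
        return json.loads(text[len("data: "):])
    return None


full = b'data: {"choices": [{"delta": {"content": "hi"}}]}'
print(parse_sse_line(full)["choices"][0]["delta"]["content"])  # hi

half = full[: len(full) // 2]  # simulate a chunk boundary falling mid-line
try:
    parse_sse_line(half)
except json.JSONDecodeError:
    print("partial line is not parseable")
```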

@rgaricano
Contributor

rgaricano commented Nov 3, 2025

@ShirasawaSama
No, I haven't tried it (and probably won't be able to try it in the short term).

Lines are chunked data; when we process the stream we are also chunking the data into lines, so that isn't a problem (at least it doesn't aggravate the problem, which in fact exists due to the use of lines as delimiters in SSE).

Probably, to manage that situation, it's better to use io.BytesIO and a controlled line_partial_size if necessary.
(But it doesn't matter; it can be changed later.)

@rgaricano
Contributor

Leaving aside buffering,
another question: what happens when there are JSON objects in the stream, and someone does not want large content removed?

At the very least, there should be an option to delete or split large content.

e.g. to remove or split (maintaining the integrity of any JSON objects in the data):

import json
import logging

log = logging.getLogger(__name__)

async def handle_large_stream_chunks(
    stream: aiohttp.StreamReader, 
    max_buffer_size: int = CHAT_STREAM_RESPONSE_CHUNK_MAX_BUFFER_SIZE,
    split_large_content: bool = True  # New parameter
):
    """
    Handle stream response chunks with configurable behavior for oversized lines.

    :param stream: The stream reader to handle.
    :param max_buffer_size: The maximum buffer size in bytes.
    :param split_large_content: If True, split large content; if False, skip it.
    :return: An async generator that yields the stream data.
    """

    buffer = b""
    skip_mode = False

    async for data, _ in stream.iter_chunks():
        if not data:
            continue

        if skip_mode and len(buffer) > max_buffer_size:
            buffer = b""

        lines = (buffer + data).split(b"\n")

        for i in range(len(lines) - 1):
            line = lines[i]

            if skip_mode:
                if len(line) <= max_buffer_size:
                    skip_mode = False
                    yield line
                else:
                    yield b"data: {}"
            else:
                if len(line) > max_buffer_size:
                    if split_large_content:
                        # Try to split the content dynamically
                        try:
                            line_str = line.decode('utf-8', 'replace')
                            if line_str.startswith('data:'):
                                data_str = line_str[len('data:'):].strip()
                                data_obj = json.loads(data_str)

                                # Split large content field
                                choices = data_obj.get('choices', [])
                                if choices and 'delta' in choices[0]:
                                    content = choices[0]['delta'].get('content', '')
                                    if len(content) > max_buffer_size:
                                        # Emit in chunks
                                        for j in range(0, len(content), max_buffer_size):
                                            chunk = content[j:j + max_buffer_size]
                                            chunked_data = {
                                                **data_obj,
                                                'choices': [{
                                                    **choices[0],
                                                    'delta': {**choices[0]['delta'], 'content': chunk}
                                                }]
                                            }
                                            yield f"data: {json.dumps(chunked_data)}\n".encode('utf-8')
                                        continue
                        except Exception as e:
                            log.warning(f"Failed to split large content, skipping: {e}")

                    # Fallback to skip mode
                    skip_mode = True
                    yield b"data: {}"
                    log.info(f"Skip mode triggered, line size: {len(line)}")
                else:
                    yield line

        buffer = lines[-1]

        if not skip_mode and len(buffer) > max_buffer_size:
            skip_mode = True
            log.info(f"Skip mode triggered, buffer size: {len(buffer)}")
            buffer = b""

    if buffer and not skip_mode:
        yield buffer

with 2 env vars:

CHAT_STREAM_RESPONSE_CHUNK_MAX_BUFFER_SIZE = int(os.getenv("CHAT_STREAM_RESPONSE_CHUNK_MAX_BUFFER_SIZE", 16384))
CHAT_STREAM_SPLIT_LARGE_CONTENT = os.getenv("CHAT_STREAM_SPLIT_LARGE_CONTENT", "true").lower() == "true"

@silentoplayz silentoplayz added the testing wanted Testing from the community is needed label Nov 3, 2025
@ShirasawaSama
Contributor Author

ShirasawaSama commented Nov 4, 2025

@rgaricano Do you think -1 means not skipping overly large single-line content?

Additionally, I feel your code is a bit too complex, making it nearly impossible for others to maintain later on. In fact, the major issues I've encountered aren't limited to the choices.delta—they often appear in the tools, references, or thinking fields as well.

I added the feature to skip excessively long lines because we previously discovered that certain third-party model providers would output massive amounts of unnecessary content within a single line. This would directly cause the Redis cluster connected to OpenWebUI to crash and result in an instantaneous surge of traffic on the OpenWebUI backend servers. We simply need to skip these lines entirely.

This is purely a protective measure.
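The protective behavior described here can be condensed into a small sketch: oversized lines are replaced with an empty SSE data event instead of being forwarded, so a single runaway line cannot flood Redis or the backend. This is illustrative only; the merged implementation's details may differ.

```python
def filter_lines(lines, max_size):
    """Yield each SSE line, replacing any line over `max_size` bytes."""
    for line in lines:
        if len(line) > max_size:
            yield b"data: {}"  # drop the oversized payload, keep the stream valid
        else:
            yield line


out = list(filter_lines([b"data: ok", b"x" * 100], max_size=50))
print(out)  # [b'data: ok', b'data: {}']
```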

@tjbck tjbck self-assigned this Nov 5, 2025
@silentoplayz
Collaborator

I've attempted to test this PR using the Google: Gemini 2.5 Flash Image (Nano Banana) model provided by OpenRouter and it appears to silently fail to provide back an image to me.
image
image
image

@ShirasawaSama
Contributor Author

@silentoplayz Are there any error messages?

Can you see the file size of the output image in base64?

@ShirasawaSama ShirasawaSama marked this pull request as draft November 6, 2025 03:39
@silentoplayz
Collaborator

@silentoplayz Are there any error messages?

Can you see the file size of the output image in base64?

There aren't any error messages thrown/triggered to be displayed that I am aware of. I've checked both frontend+backend logs and the browser console and there's no error.

As for network requests, here's what that looks like from the start of a new chat with Google: Gemini 2.5 Flash Image Preview (Nano Banana).
image

I've tested on both Chrome and Firefox web browsers.

@Classic298
Collaborator

Nano Banana support was added in dev. Is this PR still needed?

@ShirasawaSama
Contributor Author

Nano Banana support was added in dev. Is this PR still needed?

Yes, Clipboard_Screenshot_1762485187



CHAT_STREAM_RESPONSE_CHUNK_MAX_BUFFER_SIZE = os.environ.get(
"CHAT_STREAM_RESPONSE_CHUNK_MAX_BUFFER_SIZE", "10485760" # 10MB
Contributor

Any reasons for setting it to 10MB specifically?

Contributor Author

@ShirasawaSama ShirasawaSama Nov 7, 2025

10MB is the typical size for the base64 strings returned by most image generation models. This has been tested with models like Gemini Image 2.5, Qwen Image Edit, Doubao Seedream, and GPT Image.

Of course, I can accommodate values larger or smaller than this. This limit is solely to prevent LLM from returning excessively large data in a single response that could crash the backend service.

Contributor Author

(All data below is single-line after base64 encoding) For Gemini Image 2.5, the image size is approximately under 2.5MB; for gpt image, it's around 3.5MB; and for doubao seedream, it's roughly under 7MB. Therefore, I consider 10MB to be an acceptable value.
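A back-of-the-envelope check of the sizes quoted above: base64 inflates binary data by a factor of 4/3 (encoded length is 4·⌈n/3⌉ for n input bytes), so even the largest case mentioned, roughly 7 MB decoded, stays comfortably under the 10 MB default cap once encoded. The helper below is purely illustrative arithmetic:

```python
import base64
import math


def b64_len(raw_bytes: int) -> int:
    """Length in bytes of the base64 encoding of `raw_bytes` input bytes."""
    return 4 * math.ceil(raw_bytes / 3)


raw = 7 * 1024 * 1024  # ~7 MB image, the largest size quoted above
print(b64_len(raw))                              # 9786712 (~9.3 MB of text)
print(b64_len(raw) < 10 * 1024 * 1024)           # True: fits the 10 MB default
print(b64_len(10) == len(base64.b64encode(b"x" * 10)))  # True: formula matches
```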

@ShirasawaSama
Contributor Author

@silentoplayz Could you please provide the complete SSE data from OpenRouter? You can directly use JavaScript's fetch method; it's possible that the issue lies solely with OpenRouter's API returning data.

@silentoplayz
Collaborator

silentoplayz commented Nov 7, 2025

@silentoplayz Could you please provide the complete SSE data from OpenRouter? Nin can directly use JavaScript's fetch method; it's possible that the issue lies solely with OpenRouter's API returning data.

Sorry, but the OpenRouter API key I used to test with was given to me for testing purposes only and I don't know how to obtain the SSE data from OpenRouter myself. I don't have access to the OpenRouter dashboard or anything like that.

@ShirasawaSama ShirasawaSama force-pushed the feature/handle-large-stream-chunks branch from bb78ce9 to ce1079d on November 7, 2025 07:00
@ShirasawaSama
Contributor Author

Clipboard_Screenshot_1762499903

@silentoplayz I confirmed this is because the image data returned by OpenRouter appears non-standard. It places the image base64 in choices[0].images.image_url instead of choices[0].content, which is required for markdown image syntax display.

I recommend opening a separate PR to handle this non-standard output.

And, this does not affect my current PR handling large single-line SSE text.
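The separate fix suggested above could look like the sketch below: if a delta carries the non-standard `images` list instead of markdown in `content`, rewrite it so the frontend can render the image. The field names follow the OpenRouter payload shape described in this thread and are an assumption, not a confirmed schema:

```python
def normalize_delta(delta: dict) -> dict:
    """Rewrite a non-standard `images` field into markdown image content."""
    images = delta.get("images") or []
    if images and not delta.get("content"):
        urls = [img.get("image_url", {}).get("url", "") for img in images]
        delta = {**delta, "content": "".join(f"![image]({u})" for u in urls)}
    return delta


delta = {"images": [{"image_url": {"url": "data:image/png;base64,AAAA"}}]}
print(normalize_delta(delta)["content"])  # ![image](data:image/png;base64,AAAA)
```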

@tjbck
Contributor

tjbck commented Nov 8, 2025

@ShirasawaSama this generally looks good and seems like it can be merged as-is, another qq: will this potentially affect any existing behaviours?

@ShirasawaSama
Contributor Author

@ShirasawaSama this generally looks good and seems like it can be merged as-is, another qq: will this potentially effect any existing behaviours?

I cannot guarantee with absolute certainty that there will be no other repercussions. However, this code has been running in our production environment for over eight months without any incidents or user-reported bugs.

@ShirasawaSama ShirasawaSama marked this pull request as ready for review November 8, 2025 06:54
@tjbck
Contributor

tjbck commented Nov 10, 2025

Merging this, but it will be disabled by default for now. Thanks!

@tjbck tjbck merged commit 27df461 into open-webui:dev Nov 10, 2025
0 of 2 checks passed

Labels

testing wanted Testing from the community is needed


5 participants