[BUG]: User message text is silently dropped when message contains both text and image in OpenAI format #8067

@kawakami-o3

Description

Summary

In the OpenAI provider format, when a user message contains both text (MessageContent::Text) and image (MessageContent::Image) content blocks, the text portion is silently discarded and never sent to the LLM.

Bug Location

File: crates/goose/src/providers/formats/openai.rs, format_messages function

Lines 263-267 (approximate):

if !content_array.is_empty() {
    converted["content"] = json!(content_array);  // Only content_array is used
} else if !text_array.is_empty() {
    converted["content"] = json!(text_array.join("\n"));
}

The if-else logic means that when content_array is non-empty (i.e., an image is present), text_array is completely ignored.

How items are routed:

  • MessageContent::Text → text_array (line ~101)
  • MessageContent::Image → content_array (line ~219)

When both are present, the final assembly only uses content_array, dropping all text.
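
The branch above can be reproduced in isolation. The following is a minimal sketch, not the actual goose code: the `Content` enum and `assemble_buggy` function are hypothetical stand-ins for the JSON value assigned to converted["content"], and content parts are modelled as plain strings to keep the example dependency-free.

```rust
// Hypothetical stand-in for the JSON value assigned to converted["content"].
#[derive(Debug, PartialEq)]
enum Content {
    Text(String),       // plain string content
    Parts(Vec<String>), // array of content parts (e.g. image entries)
    Missing,            // "content" was never set
}

// Mirrors the if / else-if branch quoted in "Bug Location".
fn assemble_buggy(text_array: &[String], content_array: &[String]) -> Content {
    if !content_array.is_empty() {
        // text_array is never consulted on this path
        Content::Parts(content_array.to_vec())
    } else if !text_array.is_empty() {
        Content::Text(text_array.join("\n"))
    } else {
        Content::Missing
    }
}
```

Calling `assemble_buggy` with one text entry and one image entry returns only the image part, which matches the reported behavior.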

Steps to Reproduce

  1. Use any OpenAI-compatible provider (OpenAI, Databricks, etc.)
  2. Send a user message that contains both a text block and an image block (e.g., via ContentBlock::Image in ACP)
  3. Observe that the text portion of the message is not sent to the LLM

Expected Behavior

Both text and image content should be sent to the LLM. The text from text_array should be merged into content_array as a {"type": "text", "text": "..."} entry.

Actual Behavior

Only the image is sent. The text is silently dropped. This causes:

  • The LLM responds as if no text instruction was given
  • If the user writes in a non-English language, the response defaults to the system prompt language (since the user's language instruction is lost)

Proposed Fix

if !content_array.is_empty() {
    if !text_array.is_empty() {
        let combined_text = text_array.join("\n");
        content_array.insert(0, json!({"type": "text", "text": combined_text}));
    }
    converted["content"] = json!(content_array);
} else if !text_array.is_empty() {
    converted["content"] = json!(text_array.join("\n"));
}
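
To check the merged behavior without the rest of format_messages, here is a self-contained sketch of the same logic. It is an illustration under assumptions: `Part`, `Content`, and `assemble_fixed` are hypothetical names, and the `ImageUrl` variant only approximates the image entry that the real code builds with json!.

```rust
// Hypothetical stand-ins for the JSON entries built in format_messages.
#[derive(Debug, Clone, PartialEq)]
enum Part {
    Text(String),     // models {"type": "text", "text": "..."}
    ImageUrl(String), // models an image entry (shape assumed for illustration)
}

#[derive(Debug, PartialEq)]
enum Content {
    Text(String),     // plain string content
    Parts(Vec<Part>), // array of content parts
    Missing,          // "content" was never set
}

// Same structure as the proposed fix: text is prepended when images exist.
fn assemble_fixed(text_array: &[String], mut content_array: Vec<Part>) -> Content {
    if !content_array.is_empty() {
        if !text_array.is_empty() {
            // Join all text blocks and place them ahead of the image parts.
            content_array.insert(0, Part::Text(text_array.join("\n")));
        }
        Content::Parts(content_array)
    } else if !text_array.is_empty() {
        // Text-only messages keep the plain-string form.
        Content::Text(text_array.join("\n"))
    } else {
        Content::Missing
    }
}
```

With two text entries and one image, the result is a parts list whose first element is the joined text, followed by the image. Text-only input still produces a plain string, so existing behavior for that case is unchanged.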

Test Coverage Gap

The existing test test_format_messages_multiple_text_blocks only covers multiple text blocks without images. There is no test for mixed text + image content in a single user message.

Introduced By

Commit c82a0dd8 (November 24, 2025) — "fix(#5626 #5832): handle multiple content chunks & images better (#5839)"

This commit introduced the text_array / content_array split pattern but missed the case where both are populated.
