Images in Anthropic tool_result content blocks are silently dropped for VLM models #393

@GeorgeTheo99

Description

Bug Description

When using oMLX as an Anthropic Messages API proxy for VLM models (e.g., Qwen3.5-VL), images embedded inside tool_result content blocks are silently dropped. This means VLM-capable models cannot see images returned by tool calls (such as Claude Code's Read tool reading PNG/JPG files).

Steps to Reproduce

  1. Use a VLM model via the /v1/messages endpoint (e.g., qwen3.5-VL-397b-a17b-vlm-6bit-300gb)
  2. Send a conversation containing a tool_result with image content:
{
  "role": "user",
  "content": [
    {
      "type": "tool_result",
      "tool_use_id": "toolu_123",
      "content": [
        {
          "type": "image",
          "source": {
            "type": "base64",
            "media_type": "image/png",
            "data": "<base64 data>"
          }
        },
        {
          "type": "text",
          "text": "Image content from /tmp/screenshot.png"
        }
      ]
    }
  ]
}
  3. Observe that the model receives only the text content; the image is lost
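For convenience, the reproduction payload from step 2 can be built programmatically. This is a minimal sketch: the placeholder bytes stand in for a real PNG, and posting to a running oMLX server is left to the reader.

```python
import base64
import json

# Placeholder image data; substitute real PNG bytes when reproducing.
TINY_PNG = base64.b64encode(b"\x89PNG\r\n\x1a\n").decode()

payload = {
    "model": "qwen3.5-VL-397b-a17b-vlm-6bit-300gb",
    "max_tokens": 64,
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "tool_result",
                    "tool_use_id": "toolu_123",
                    "content": [
                        {
                            "type": "image",
                            "source": {
                                "type": "base64",
                                "media_type": "image/png",
                                "data": TINY_PNG,
                            },
                        },
                        {
                            "type": "text",
                            "text": "Image content from /tmp/screenshot.png",
                        },
                    ],
                }
            ],
        }
    ],
}

# POST this JSON body to the /v1/messages endpoint; the model's reply
# makes it clear it never saw the image.
print(json.dumps(payload)[:80])
```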

Root Cause

In omlx/api/anthropic_utils.py, the _extract_tool_result_content() function only extracts type: "text" blocks from list content and ignores type: "image" blocks:

elif isinstance(content, list):
    text_parts = []
    for item in content:
        if isinstance(item, dict):
            if item.get("type") == "text":
                text_parts.append(item.get("text", ""))
        # <-- "image" blocks are silently skipped

Additionally, in convert_anthropic_to_internal(), even when preserve_images=True, the tool_result handlers don't check for image blocks inside the tool result's content — they only call _extract_tool_result_content() which strips images.

This affects both the native tool calling path (line ~185) and the non-native fallback path (line ~232).

Impact

This breaks image support for any client that sends images via tool results, including:

  • Claude Code: When the Read tool reads an image file, it returns the image data inside a tool_result block. VLM models never see the image.
  • Any other Anthropic API client that returns images as tool outputs

Suggested Fix

  1. Add a helper to extract images from tool result content:
def _extract_images_from_tool_result_content(content, image_parts):
    """Extract image blocks from tool result content for VLM processing."""
    if isinstance(content, list):
        for item in content:
            if isinstance(item, dict) and item.get("type") == "image":
                _append_anthropic_image_part(image_parts, item)
    elif isinstance(content, dict) and content.get("type") == "image":
        _append_anthropic_image_part(image_parts, content)
  2. In both tool_result handlers within convert_anthropic_to_internal(), after processing the tool result text, extract images when preserve_images=True:
# After processing tool_result text...
if preserve_images:
    _extract_images_from_tool_result_content(tool_result_content, image_parts)

The extracted images accumulate in image_parts and get included in the next user message via _build_message_from_parts(), where extract_images_from_messages() in the VLM engine picks them up.
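Putting the pieces together, here is a self-contained sketch of the proposed flow. _append_anthropic_image_part is stubbed to collect (media_type, data) tuples, which is an assumption about its role; its real signature may differ.

```python
# Stub: collects image parts as (media_type, data) tuples (assumed shape).
def _append_anthropic_image_part(image_parts, block):
    source = block.get("source", {})
    image_parts.append((source.get("media_type"), source.get("data")))

# Proposed helper from the suggested fix above.
def _extract_images_from_tool_result_content(content, image_parts):
    """Extract image blocks from tool result content for VLM processing."""
    if isinstance(content, list):
        for item in content:
            if isinstance(item, dict) and item.get("type") == "image":
                _append_anthropic_image_part(image_parts, item)
    elif isinstance(content, dict) and content.get("type") == "image":
        _append_anthropic_image_part(image_parts, content)

image_parts = []
tool_result_content = [
    {"type": "image",
     "source": {"type": "base64", "media_type": "image/png", "data": "<b64>"}},
    {"type": "text", "text": "Image content from /tmp/screenshot.png"},
]

preserve_images = True
if preserve_images:
    _extract_images_from_tool_result_content(tool_result_content, image_parts)

# The image now survives alongside the text instead of being dropped.
print(image_parts)
```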

Environment

  • oMLX version: 0.2.20.dev2
  • Model: qwen3.5-VL-397b-a17b-vlm-6bit-300gb
  • Client: Claude Code via Anthropic Messages API (/v1/messages)
