[BUG] read_file tool descriptions don't document image support despite full implementation #10440

@nabilfreeman

Description

Problem (one or two sentences)

The read_file tool description says it "may not handle other binary files properly", but the implementation supports 9 image formats (PNG, JPG, JPEG, GIF, BMP, SVG, WEBP, ICO, AVIF), including model capability checks and size validation. As a result, agents are discouraged from reading images even though doing so works.

Context (who is affected and when)

All users with vision-capable models (like Claude) are affected when agents need to analyze images. Agents avoid using read_file for images and resort to workarounds like the open command (which doesn't provide context), reducing the system's visual analysis capabilities. This became more noticeable after switching to native tool-calling as the default.

Reproduction steps

  1. Environment: Roo Code with any Claude model (which supports images)
  2. Action: Ask agent to analyze an image file (e.g., "read and describe src/assets/images/roo.png")
  3. Observe: Agent either avoids reading the image or uses alternative commands instead of read_file
  4. Root cause: Tool descriptions at src/core/prompts/tools/read-file.ts:8 and src/core/prompts/tools/native-tools/read_file.ts:5 state "may not handle other binary files properly" without mentioning image support

Expected result

Agents should know they can use read_file to analyze images on vision-capable models.

Actual result

Agents avoid reading images with read_file due to misleading documentation.

Variations tried

Verified that reading src/assets/images/roo.png with read_file works. The implementation behaves correctly; only the tool description is wrong.

App Version

Current main branch

API Provider

Anthropic

Model Used

Claude Sonnet (and other vision-capable models)


Technical Details for Implementation

Root Cause Analysis

The implementation in ReadFileTool.ts:369-413 fully supports image reading:

  • Detects 9 image formats via imageHelpers.ts:23-35
  • Checks modelInfo.supportsImages at line 139
  • Validates image size and memory limits at lines 376-391
  • Processes images and returns data URLs at lines 393-401
  • Has comprehensive test coverage in readFileTool.spec.ts:1478-1821

However, the tool descriptions don't mention this capability.
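
For readers unfamiliar with the flow listed above, the checks can be sketched roughly as follows. This is a simplified illustration, not the code in ReadFileTool.ts: the function name, the result type, the MIME mapping, and the size limit parameter are all assumptions made for the example.

```typescript
// Illustrative only: names, limits, and the MIME table are assumptions,
// not the actual contents of ReadFileTool.ts or imageHelpers.ts.
const MIME_BY_EXTENSION: Record<string, string> = {
  ".png": "image/png",
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".gif": "image/gif",
  ".bmp": "image/bmp",
  ".svg": "image/svg+xml",
  ".webp": "image/webp",
  ".ico": "image/x-icon",
  ".avif": "image/avif",
}

type ImageReadOutcome =
  | { ok: true; dataUrl: string }
  | { ok: false; reason: "not_image" | "unsupported_model" | "too_large" }

// Returns the lowercased extension including the dot, or "" if there is none.
function getExtension(filePath: string): string {
  const dot = filePath.lastIndexOf(".")
  return dot === -1 ? "" : filePath.slice(dot).toLowerCase()
}

function readImageSketch(
  filePath: string,
  base64Data: string,
  sizeInBytes: number,
  supportsImages: boolean,
  maxSizeInBytes: number,
): ImageReadOutcome {
  // 1. Format detection by file extension.
  const mimeType = MIME_BY_EXTENSION[getExtension(filePath)]
  if (mimeType === undefined) return { ok: false, reason: "not_image" }
  // 2. Model capability check.
  if (!supportsImages) return { ok: false, reason: "unsupported_model" }
  // 3. Size validation.
  if (sizeInBytes > maxSizeInBytes) return { ok: false, reason: "too_large" }
  // 4. Return the image as a data URL.
  return { ok: true, dataUrl: `data:${mimeType};base64,${base64Data}` }
}
```

The point of the sketch is that every step the description would advertise already has a corresponding check, so the missing piece is only the wording shown to the model.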

Required Fix

1. XML Protocol Description (src/core/prompts/tools/read-file.ts:8)

Current text (line 8):

Supports text extraction from PDF and DOCX files, but may not handle other binary files properly.

Should dynamically become (when model supports images):

Supports text extraction from PDF and DOCX files. When the model supports images, automatically processes and returns image files (PNG, JPG, JPEG, GIF, BMP, SVG, WEBP, ICO, AVIF) for visual analysis. May not handle other binary files properly.

Implementation approach:
The getReadFileDescription() function already receives ToolArgs, which should include model info. Check whether the model supports images and include the image format documentation only in that case.
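
A minimal sketch of that conditional, assuming the model info is reachable from the arguments. The HypotheticalToolArgs interface and the describeFileSupport helper are invented for illustration; the real ToolArgs type may expose model capabilities differently or not at all.

```typescript
// Hypothetical shape: the real ToolArgs may differ.
interface HypotheticalToolArgs {
  modelInfo?: { supportsImages?: boolean }
}

const IMAGE_FORMATS = ["PNG", "JPG", "JPEG", "GIF", "BMP", "SVG", "WEBP", "ICO", "AVIF"]

// Builds only the "Supports ..." sentence of the read_file description.
function describeFileSupport(args: HypotheticalToolArgs): string {
  // Treat a missing flag as "no image support".
  const supportsImages = args.modelInfo?.supportsImages ?? false
  const base = "Supports text extraction from PDF and DOCX files"
  if (!supportsImages) {
    return `${base}, but may not handle other binary files properly.`
  }
  return (
    `${base}. When the model supports images, automatically processes and returns image files ` +
    `(${IMAGE_FORMATS.join(", ")}) for visual analysis. May not handle other binary files properly.`
  )
}
```

Keeping the format list in one array means the description cannot drift from the list quoted in the text if a format is added later.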

2. Native Tool Description (src/core/prompts/tools/native-tools/read_file.ts:3-5)

Current constant:

const READ_FILE_SUPPORTS_NOTE = `Supports text extraction from PDF and DOCX files, but may not handle other binary files properly.`

Solution:
Convert to a function that takes model capabilities:

function getReadFileSupportsNote(supportsImages: boolean): string {
  if (supportsImages) {
    return `Supports text extraction from PDF and DOCX files. Automatically processes and returns image files (PNG, JPG, JPEG, GIF, BMP, SVG, WEBP, ICO, AVIF) for visual analysis. May not handle other binary files properly.`;
  }
  return `Supports text extraction from PDF and DOCX files, but may not handle other binary files properly.`;
}

Then update createReadFileTool() signature to accept model capabilities and use the dynamic note.
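
One possible shape for that signature change is sketched below. The SketchToolDefinition type, the options parameter, and the placeholder description prefix are assumptions for the example; the actual return type and description text of createReadFileTool() are not shown in this issue.

```typescript
// Hypothetical, simplified tool shape; the real definition carries more fields.
interface SketchToolDefinition {
  name: string
  description: string
}

function getReadFileSupportsNote(supportsImages: boolean): string {
  if (supportsImages) {
    return "Supports text extraction from PDF and DOCX files. Automatically processes and returns image files (PNG, JPG, JPEG, GIF, BMP, SVG, WEBP, ICO, AVIF) for visual analysis. May not handle other binary files properly."
  }
  return "Supports text extraction from PDF and DOCX files, but may not handle other binary files properly."
}

// The options object defaults to {} so existing call sites keep the old text.
function createReadFileToolSketch(options: { supportsImages?: boolean } = {}): SketchToolDefinition {
  const note = getReadFileSupportsNote(options.supportsImages ?? false)
  return {
    name: "read_file",
    // "Read one or more files." is placeholder wording, not the real description.
    description: `Read one or more files. ${note}`,
  }
}
```

Defaulting the option to "no image support" keeps the change backward compatible: callers that are not yet model-aware continue to produce the current description.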

3. Key Implementation Details

Image format detection: Already implemented in imageHelpers.ts:89-91

Model capability check: Pattern exists at ReadFileTool.ts:139:

const supportsImages = modelInfo.supportsImages ?? false

Image validation: Already checks model support at imageHelpers.ts:104-109:

if (!supportsImages) {
  return {
    isValid: false,
    reason: "unsupported_model",
    notice: "Image file detected but current model does not support images..."
  }
}

4. Why Dynamic Injection is Critical

Not all models support vision capabilities. From the codebase analysis:

  • Claude models: supportsImages: true
  • Many other models: supportsImages: false or undefined

If we document image support unconditionally, non-vision models would attempt to read images and fail. The documentation must match the runtime capabilities of each model.
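
To make the undefined case concrete, the gating condition can be sketched as a strict check. SketchModelInfo and shouldDocumentImages are hypothetical names used only for this example.

```typescript
// Hypothetical minimal model info; only the flag relevant here is shown.
interface SketchModelInfo {
  supportsImages?: boolean
}

// Image support is documented only when the flag is explicitly true.
// Missing model info, a missing flag, and false all disable the image text.
function shouldDocumentImages(modelInfo: SketchModelInfo | undefined): boolean {
  return modelInfo?.supportsImages === true
}

const cases: Array<SketchModelInfo | undefined> = [
  { supportsImages: true },
  { supportsImages: false },
  {},
  undefined,
]
const results = cases.map(shouldDocumentImages) // [true, false, false, false]
```

Failing closed here means a model with unknown capabilities sees the current, conservative description.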

Testing Verification

The functionality is already tested extensively:

  • Image format detection: lines 1561-1599
  • Image reading: lines 1601-1656
  • Model capability handling: lines 1657-1706
  • Binary file handling: lines 1738-1771

No new tests needed - just verify existing tests still pass after documentation update.

Files to Modify

  1. src/core/prompts/tools/read-file.ts - Update getReadFileDescription() to dynamically include image support based on model capabilities
  2. src/core/prompts/tools/native-tools/read_file.ts - Convert READ_FILE_SUPPORTS_NOTE to function with model capability parameter

Summary

This is a documentation-only fix - the implementation is complete and tested. The agent implementing this should focus on making the tool descriptions model-aware, showing image support only when modelInfo.supportsImages === true.

Metadata

Assignees: No one assigned
Labels: Documentation (Improvements or additions to documentation); Issue/PR - Triage (New issue. Needs quick review to confirm validity and assign labels.)
Type: No type
Status: Done
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests