Description
Problem (one or two sentences)
The read_file tool description says it "may not handle other binary files properly", but the implementation actually supports 9 image formats (PNG, JPG, JPEG, GIF, BMP, SVG, WEBP, ICO, AVIF). As a result, agents are discouraged from reading images even though read_file handles them correctly.
Context (who is affected and when)
All users with vision-capable models (like Claude) are affected when agents need to analyze images. Agents avoid using read_file for images and resort to workarounds like the open command (which doesn't provide context), reducing the system's visual analysis capabilities. This became more noticeable after switching to native tool-calling as the default.
Reproduction steps
- Environment: Roo Code with any Claude model (which supports images)
- Action: Ask agent to analyze an image file (e.g., "read and describe src/assets/images/roo.png")
- Observe: Agent either avoids reading the image or uses alternative commands instead of read_file
- Root cause: Tool descriptions at src/core/prompts/tools/read-file.ts:8 and src/core/prompts/tools/native-tools/read_file.ts:5 state "may not handle other binary files properly" without mentioning image support
Expected result
Agents should know they can use read_file to analyze images on vision-capable models.
Actual result
Agents avoid reading images with read_file due to misleading documentation.
Variations tried
Verified that read_file successfully reads src/assets/images/roo.png; the implementation works, and only the documentation is incorrect.
App Version
Current main branch
API Provider
Anthropic
Model Used
Claude Sonnet (and other vision-capable models)
Technical Details for Implementation
Root Cause Analysis
The implementation in ReadFileTool.ts:369-413 fully supports image reading:
- Detects 9 image formats via imageHelpers.ts:23-35
- Checks modelInfo.supportsImages at line 139
- Validates image size and memory limits at lines 376-391
- Processes images and returns data URLs at lines 393-401
- Has comprehensive test coverage in readFileTool.spec.ts:1478-1821
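The runtime steps listed above (format detection, model check, size validation, data URL) can be summarized in a short sketch. The helper names, the extension-to-MIME map, the discriminated-union result, and the `maxBytes` parameter are hypothetical illustrations, not the actual exports or limits of imageHelpers.ts and ReadFileTool.ts; base64 encoding of the file bytes is assumed to happen elsewhere.

```typescript
// Hypothetical sketch of read_file's image path; names and limits are assumptions.

// The 9 formats listed in this issue, mapped to their commonly used MIME types.
const IMAGE_MIME_TYPES = new Map<string, string>([
  ["png", "image/png"],
  ["jpg", "image/jpeg"],
  ["jpeg", "image/jpeg"],
  ["gif", "image/gif"],
  ["bmp", "image/bmp"],
  ["svg", "image/svg+xml"],
  ["webp", "image/webp"],
  ["ico", "image/x-icon"],
  ["avif", "image/avif"],
])

// Lowercased extension without the dot, or "" if the path has none.
function extensionOf(path: string): string {
  const dot = path.lastIndexOf(".")
  return dot === -1 ? "" : path.slice(dot + 1).toLowerCase()
}

type ImageValidation =
  | { isValid: true; mimeType: string }
  | { isValid: false; reason: "not_an_image" | "unsupported_model" | "too_large" }

function validateImage(
  path: string,
  sizeBytes: number,
  supportsImages: boolean,
  maxBytes: number,
): ImageValidation {
  const mimeType = IMAGE_MIME_TYPES.get(extensionOf(path))
  if (mimeType === undefined) return { isValid: false, reason: "not_an_image" }
  // Mirrors the "unsupported_model" branch quoted from imageHelpers.ts.
  if (!supportsImages) return { isValid: false, reason: "unsupported_model" }
  if (sizeBytes > maxBytes) return { isValid: false, reason: "too_large" }
  return { isValid: true, mimeType }
}

// Wraps already base64-encoded file bytes in a data URL.
function toDataUrl(mimeType: string, base64: string): string {
  return `data:${mimeType};base64,${base64}`
}
```

Using a `Map` rather than a plain object keeps lookups like `"constructor"` from matching inherited prototype keys.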
However, the tool descriptions don't mention this capability.
Required Fix
1. XML Protocol Description (src/core/prompts/tools/read-file.ts:8)
Current text (line 8):
Supports text extraction from PDF and DOCX files, but may not handle other binary files properly.
Should dynamically become (when the model supports images):
Supports text extraction from PDF and DOCX files. When the model supports images, automatically processes and returns image files (PNG, JPG, JPEG, GIF, BMP, SVG, WEBP, ICO, AVIF) for visual analysis. May not handle other binary files properly.
Implementation approach:
The getReadFileDescription() function already receives ToolArgs, which should include model info. Check whether the model supports images and, if it does, include the image format documentation.
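A minimal sketch of the XML-protocol side, assuming ToolArgs can carry a `supportsImages` flag. The issue only says ToolArgs *should* include model info, so the field name and the reduced interface here are hypothetical, and the existing description text is replaced by a placeholder string; only the capability note is generated.

```typescript
// Hypothetical, reduced ToolArgs: whether model info arrives as
// `supportsImages` is an assumption, and the real interface has more fields.
interface ToolArgs {
  supportsImages?: boolean
}

const IMAGE_FORMATS = "PNG, JPG, JPEG, GIF, BMP, SVG, WEBP, ICO, AVIF"

// Wording taken from the proposed text in this issue.
function getSupportsText(supportsImages: boolean): string {
  if (supportsImages) {
    return (
      "Supports text extraction from PDF and DOCX files. When the model supports images, " +
      `automatically processes and returns image files (${IMAGE_FORMATS}) for visual analysis. ` +
      "May not handle other binary files properly."
    )
  }
  return "Supports text extraction from PDF and DOCX files, but may not handle other binary files properly."
}

function getReadFileDescription(args: ToolArgs): string {
  // A missing flag means "no image support", like `?? false` in ReadFileTool.ts.
  const supportsImages = args.supportsImages ?? false
  // Placeholder for the existing description text, not reproduced here.
  const baseDescription = "## read_file\nDescription: <existing read_file description>"
  return `${baseDescription} ${getSupportsText(supportsImages)}`
}
```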
2. Native Tool Description (src/core/prompts/tools/native-tools/read_file.ts:3-5)
Current constant:
const READ_FILE_SUPPORTS_NOTE = `Supports text extraction from PDF and DOCX files, but may not handle other binary files properly.`
Solution:
Convert to a function that takes model capabilities:
function getReadFileSupportsNote(supportsImages: boolean): string {
    if (supportsImages) {
        return `Supports text extraction from PDF and DOCX files. Automatically processes and returns image files (PNG, JPG, JPEG, GIF, BMP, SVG, WEBP, ICO, AVIF) for visual analysis. May not handle other binary files properly.`;
    }
    return `Supports text extraction from PDF and DOCX files, but may not handle other binary files properly.`;
}
Then update createReadFileTool() signature to accept model capabilities and use the dynamic note.
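A sketch of what the updated createReadFileTool() could look like with the capability passed in. The returned object follows the OpenAI-style function-tool shape (`type: "function"` with `name`, `description`, `parameters`); that this matches Roo Code's native tool definitions, the placeholder base description, and the single `path` parameter are all assumptions for illustration. getReadFileSupportsNote is repeated from above so the block runs standalone.

```typescript
// Assumed OpenAI-style function tool shape; not taken from the Roo Code source.
interface FunctionTool {
  type: "function"
  function: {
    name: string
    description: string
    parameters: Record<string, unknown>
  }
}

// Same wording as the proposed getReadFileSupportsNote() in this issue.
function getReadFileSupportsNote(supportsImages: boolean): string {
  if (supportsImages) {
    return "Supports text extraction from PDF and DOCX files. Automatically processes and returns image files (PNG, JPG, JPEG, GIF, BMP, SVG, WEBP, ICO, AVIF) for visual analysis. May not handle other binary files properly."
  }
  return "Supports text extraction from PDF and DOCX files, but may not handle other binary files properly."
}

// Hypothetical placeholder for the existing read_file description text.
const BASE_DESCRIPTION = "<existing read_file description>"

// Proposed signature: the capability is a parameter, so the note is chosen per model.
function createReadFileTool(supportsImages: boolean): FunctionTool {
  return {
    type: "function",
    function: {
      name: "read_file",
      description: `${BASE_DESCRIPTION} ${getReadFileSupportsNote(supportsImages)}`,
      // Hypothetical, simplified parameter schema.
      parameters: {
        type: "object",
        properties: { path: { type: "string" } },
        required: ["path"],
      },
    },
  }
}
```

A caller would pass `modelInfo.supportsImages ?? false`, the same pattern ReadFileTool.ts:139 already uses, so the advertised capability matches the runtime check.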
3. Key Implementation Details
Image format detection: Already implemented in imageHelpers.ts:89-91
Model capability check: Pattern exists at ReadFileTool.ts:139:
const supportsImages = modelInfo.supportsImages ?? false
Image validation: Already checks model support at imageHelpers.ts:104-109:
if (!supportsImages) {
    return {
        isValid: false,
        reason: "unsupported_model",
        notice: "Image file detected but current model does not support images..."
    }
}
4. Why Dynamic Injection is Critical
Not all models support vision capabilities. From the codebase analysis:
- Claude models: supportsImages: true
- Many other models: supportsImages: false or undefined
If we document image support unconditionally, non-vision models would attempt to read images and fail. The documentation must match the runtime capabilities of each model.
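A small sketch of why the prompt and the runtime should read the same flag. ModelInfo is reduced here to the one optional field this issue discusses, and the sample models are hypothetical. The prompt side uses `=== true` (as the Summary states) and the runtime side uses `?? false` (as ReadFileTool.ts:139 does); for a `boolean | undefined` field the two agree, so a model is never told about images that the runtime would reject.

```typescript
// Reduced, hypothetical ModelInfo: only the field relevant to this issue.
interface ModelInfo {
  supportsImages?: boolean
}

// Prompt side: advertise image support only for an explicit `true`.
function shouldDocumentImages(info: ModelInfo): boolean {
  return info.supportsImages === true
}

// Runtime side: the same pattern as `modelInfo.supportsImages ?? false`.
function canReadImages(info: ModelInfo): boolean {
  return info.supportsImages ?? false
}

// Hypothetical sample models covering the three possible flag states.
const samples: Array<[string, ModelInfo]> = [
  ["vision model", { supportsImages: true }],
  ["text-only model", { supportsImages: false }],
  ["model without the flag", {}],
]

for (const [label, info] of samples) {
  const documented = shouldDocumentImages(info)
  const allowed = canReadImages(info)
  console.log(`${label}: documented=${documented}, allowed=${allowed}`)
}
```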
Testing Verification
The functionality is already tested extensively:
- Image format detection: lines 1561-1599
- Image reading: lines 1601-1656
- Model capability handling: lines 1657-1706
- Binary file handling: lines 1738-1771
No new tests are needed; just verify that existing tests still pass after the documentation update.
Files to Modify
- src/core/prompts/tools/read-file.ts - Update getReadFileDescription() to dynamically include image support based on model capabilities
- src/core/prompts/tools/native-tools/read_file.ts - Convert READ_FILE_SUPPORTS_NOTE to a function with a model capability parameter
Summary
This is a documentation-only fix - the implementation is complete and tested. The agent implementing this should focus on making the tool descriptions model-aware, showing image support only when modelInfo.supportsImages === true.