feat(gemini): Phase 4 -- vision / multimodal (inlineData) #1596

@bug-ops

Description

Part of #1592

Scope

Wire up image input via Gemini's inlineData parts format.

Files to Modify

  • crates/zeph-llm/src/gemini.rs -- convert MessagePart::Image to inlineData part in message builder

Key Implementation Details

  • Images sent as inlineData parts within contents[].parts[]:
    { "inlineData": { "mimeType": "image/jpeg", "data": "base64..." } }
  • All Gemini 2.0+ models support vision natively (no separate vision model needed)
  • Multiple images per message supported
  • Mixed text + image parts in single message supported
  • Zeph's ImageData { data: Vec<u8>, mime_type: String } maps directly: mime_type becomes mimeType, and the raw bytes in data are base64-encoded into the data string
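
A minimal sketch of the conversion described above, using only the standard library so it runs on its own. The `ImageData` shape is taken from this issue; `base64_encode` and `inline_data_part` are hypothetical names, and the real code in gemini.rs would more likely rely on the `base64` and `serde_json` crates than on hand-rolled encoding and string formatting.

```rust
/// Image payload with the shape described in this issue.
struct ImageData {
    data: Vec<u8>,
    mime_type: String,
}

/// Standard base64 (RFC 4648 alphabet, `=` padding).
/// Std-only stand-in for the `base64` crate, used here for illustration.
fn base64_encode(input: &[u8]) -> String {
    const TABLE: &[u8; 64] =
        b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling the tail.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(TABLE[((n >> 18) & 63) as usize] as char);
        out.push(TABLE[((n >> 12) & 63) as usize] as char);
        // Missing input bytes are represented by `=` padding.
        out.push(if chunk.len() > 1 { TABLE[((n >> 6) & 63) as usize] as char } else { '=' });
        out.push(if chunk.len() > 2 { TABLE[(n & 63) as usize] as char } else { '=' });
    }
    out
}

/// Hypothetical helper: renders one `inlineData` part as a JSON string.
/// Assumes the MIME type contains no characters that need JSON escaping.
fn inline_data_part(image: &ImageData) -> String {
    format!(
        r#"{{"inlineData":{{"mimeType":"{}","data":"{}"}}}}"#,
        image.mime_type,
        base64_encode(&image.data)
    )
}

fn main() {
    // First three bytes of a JPEG file, used as a mock payload.
    let image = ImageData {
        data: vec![0xFF, 0xD8, 0xFF],
        mime_type: "image/jpeg".to_string(),
    };
    // Prints: {"inlineData":{"mimeType":"image/jpeg","data":"/9j/"}}
    println!("{}", inline_data_part(&image));
}
```

The point to notice is that `data` holds raw bytes, so the mapping is only "direct" once the bytes have been base64-encoded; passing them through unencoded would produce an invalid request body.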

Acceptance Criteria

  • supports_vision() returns true (already set in Phase 1)
  • MessagePart::Image correctly converted to inlineData format
  • Multiple images in single message work
  • Mixed text + image parts produce correct parts array
  • Unit tests with mock image payloads
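
The criteria above could be exercised with a test along these lines. This is a sketch under assumptions: `MessagePart` is reduced to two variants, `GeminiPart` and `to_gemini_parts` are hypothetical names, and base64 encoding is deferred to serialization so the assertions can compare raw bytes. The mock payloads are just the PNG signature and the JPEG start-of-image bytes, not complete images.

```rust
/// Image payload with the shape described in this issue.
#[derive(Debug, Clone, PartialEq)]
struct ImageData {
    data: Vec<u8>,
    mime_type: String,
}

/// Simplified stand-in for Zeph's message part type (assumed shape).
#[derive(Debug, Clone, PartialEq)]
enum MessagePart {
    Text(String),
    Image(ImageData),
}

/// Hypothetical typed form of an entry in `contents[].parts[]`.
#[derive(Debug, PartialEq)]
enum GeminiPart {
    Text(String),
    InlineData { mime_type: String, data: Vec<u8> },
}

/// Maps message parts one-to-one, preserving their order.
fn to_gemini_parts(parts: &[MessagePart]) -> Vec<GeminiPart> {
    parts
        .iter()
        .map(|part| match part {
            MessagePart::Text(text) => GeminiPart::Text(text.clone()),
            MessagePart::Image(image) => GeminiPart::InlineData {
                mime_type: image.mime_type.clone(),
                data: image.data.clone(),
            },
        })
        .collect()
}

/// Mock payload: the 8-byte PNG file signature.
fn mock_png() -> MessagePart {
    MessagePart::Image(ImageData {
        data: vec![0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A],
        mime_type: "image/png".to_string(),
    })
}

/// Mock payload: the JPEG start-of-image marker plus one more marker byte.
fn mock_jpeg() -> MessagePart {
    MessagePart::Image(ImageData {
        data: vec![0xFF, 0xD8, 0xFF],
        mime_type: "image/jpeg".to_string(),
    })
}

fn main() {
    // Mixed text + two images in a single message.
    let message = vec![
        MessagePart::Text("Compare these two images".to_string()),
        mock_png(),
        mock_jpeg(),
    ];
    let parts = to_gemini_parts(&message);

    assert_eq!(parts.len(), 3);
    assert!(matches!(&parts[0], GeminiPart::Text(_)));
    assert!(matches!(
        &parts[1],
        GeminiPart::InlineData { mime_type, .. } if mime_type.as_str() == "image/png"
    ));
    assert!(matches!(
        &parts[2],
        GeminiPart::InlineData { mime_type, .. } if mime_type.as_str() == "image/jpeg"
    ));
    println!("parts array preserves order: {}", parts.len());
}
```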

Metadata

Labels: enhancement (New feature or request)
