SD Agent Tutorial: Integrating Multi-Modal Endpoints #280

@kovtcharov

Description

Overview

Create a minimal, teachable example demonstrating how to integrate Stable Diffusion image generation into a GAIA agent. This serves as a reference implementation for adding any multi-modal endpoint (image, audio, video) to agents using the GAIA SDK.

Goal: A developer can follow this tutorial and understand how to:

  1. Call the Lemonade Server SD endpoint from an agent
  2. Register image generation as a tool
  3. Handle base64 image responses
  4. Display/save generated images

Motivation

There is currently no simple example showing how to add image generation capabilities to a GAIA agent. Developers need a minimal working example that:

  • Is small enough to understand in one sitting (~100 lines)
  • Demonstrates the complete flow from prompt to saved image
  • Can be extended for more complex use cases
  • Follows GAIA SDK patterns and best practices

Deliverables

1. Minimal Agent Implementation

File: examples/sd_agent_minimal.py (~100 lines)

"""
Minimal SD Agent Example

Demonstrates how to integrate Stable Diffusion image generation
into a GAIA agent using the Lemonade Server endpoint.
"""

from pathlib import Path
import base64
import requests
from gaia.agents.base import Agent
from gaia.agents.base.tools import tool


class SimpleSDAgent(Agent):
    """A minimal agent that can generate images from text descriptions."""

    def __init__(
        self,
        base_url: str = "http://localhost:8000",
        output_dir: str = "./generated_images",
        **kwargs
    ):
        super().__init__(**kwargs)
        self.sd_endpoint = f"{base_url}/api/v1/images/generations"
        self.output_dir = Path(output_dir)
        self.output_dir.mkdir(parents=True, exist_ok=True)
        self._register_tools()

    def _get_system_prompt(self) -> str:
        return """You are an image generation assistant.

When the user asks you to create or generate an image, use the generate_image tool.
Enhance simple prompts with artistic details for better results.

Example enhancements:
- "a cat" → "a fluffy orange cat, soft lighting, detailed fur, photorealistic"
- "sunset" → "vibrant sunset over ocean, golden hour, dramatic clouds, 4k"
"""

    def _register_tools(self):
        @tool
        def generate_image(
            prompt: str,
            model: str = "SD-Turbo",
            size: str = "512x512",
            steps: int = 4
        ) -> dict:
            """
            Generate an image from a text prompt using Stable Diffusion.

            Args:
                prompt: Text description of the image to generate
                model: SD model to use (SD-Turbo or SDXL-Turbo)
                size: Image dimensions (512x512, 768x768, 1024x1024)
                steps: Number of inference steps (4 for Turbo models)

            Returns:
                Dict with image_path and generation details
            """
            # Call SD endpoint
            response = requests.post(
                self.sd_endpoint,
                json={
                    "prompt": prompt,
                    "model": model,
                    "size": size,
                    "n": 1,
                    "response_format": "b64_json"
                },
                timeout=60
            )
            response.raise_for_status()

            # Decode and save image
            data = response.json()
            image_b64 = data["data"][0]["b64_json"]
            image_bytes = base64.b64decode(image_b64)

            # Generate filename from prompt
            safe_name = "".join(c if c.isalnum() else "_" for c in prompt[:30])
            image_path = self.output_dir / f"{safe_name}.png"
            image_path.write_bytes(image_bytes)

            return {
                "image_path": str(image_path),
                "prompt": prompt,
                "model": model,
                "size": size
            }

        self.register_tool(generate_image)


# CLI usage
if __name__ == "__main__":
    agent = SimpleSDAgent()
    agent.run("Generate an image of a dragon perched on a mountain cliff at sunset")
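The base64 handling in generate_image can be exercised without a running Lemonade Server. The sketch below fakes the b64_json payload the endpoint would return and runs the same decode-and-save steps; the PNG header bytes are hand-rolled stand-ins, not a real image:

```python
import base64
from pathlib import Path
from tempfile import TemporaryDirectory

# Stand-in for the endpoint's response body; a real response carries
# actual PNG bytes in data[0]["b64_json"].
fake_png_bytes = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16
fake_response = {
    "data": [{"b64_json": base64.b64encode(fake_png_bytes).decode("ascii")}]
}

# Same decode-and-save steps as generate_image
image_b64 = fake_response["data"][0]["b64_json"]
image_bytes = base64.b64decode(image_b64)

with TemporaryDirectory() as tmp:
    image_path = Path(tmp) / "test.png"
    image_path.write_bytes(image_bytes)
    assert image_path.read_bytes() == fake_png_bytes
```

This is also a convenient pattern for unit-testing the agent's response handling with a mocked requests.post.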

2. User Guide

File: docs/guides/sd-tutorial.mdx

Structure:

  1. Introduction - What you'll build and learn
  2. Prerequisites - Lemonade Server running with SD model
  3. Quick Start - Run the example in 2 minutes
  4. Code Walkthrough - Line-by-line explanation
  5. Customization - How to modify for your needs
  6. Next Steps - Link to full SD Agent plan

Key sections:

Prerequisites

# Ensure Lemonade Server is running with SD model
lemonade-server serve --model SD-Turbo

# Verify SD endpoint is available
curl http://localhost:8000/api/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{"prompt": "test", "model": "SD-Turbo"}'

Quick Start

# Run the minimal example
python examples/sd_agent_minimal.py

# Or use interactively
python -c "
from examples.sd_agent_minimal import SimpleSDAgent
agent = SimpleSDAgent()
agent.run('Create an image of a cyberpunk city')
"

Code Walkthrough

| Component                   | Purpose                              |
| --------------------------- | ------------------------------------ |
| sd_endpoint                 | URL to Lemonade Server's SD API      |
| @tool decorator             | Registers the function as an agent tool |
| response_format: b64_json   | Requests a base64-encoded image      |
| base64.b64decode()          | Converts the response to image bytes |
| register_tool()             | Makes the tool available to the agent |
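One detail the walkthrough should call out is how the example derives a filename: it keeps the first 30 characters of the prompt, then replaces anything non-alphanumeric with an underscore. The expression can be checked in isolation:

```python
def safe_filename(prompt: str) -> str:
    """Mirror of the filename logic in generate_image: truncate the prompt
    to 30 characters, then replace non-alphanumeric characters with
    underscores."""
    return "".join(c if c.isalnum() else "_" for c in prompt[:30])

print(safe_filename("a dragon perched on a mountain cliff"))
# → a_dragon_perched_on_a_mountain
```

Note that distinct prompts sharing the same first 30 characters map to the same filename, so repeated generations overwrite earlier images; appending a timestamp is a natural extension.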

3. Playbook

File: docs/playbooks/sd-agent/part-1-minimal.mdx

Step-by-step tutorial:

Part 1: Your First Image Agent (30 min)

  1. Set Up Environment

    • Install GAIA SDK
    • Start Lemonade Server with SD model
    • Verify endpoint connectivity
  2. Create the Agent Class

    • Inherit from Agent base class
    • Configure SD endpoint URL
    • Set up output directory
  3. Register the Image Generation Tool

    • Use @tool decorator
    • Define parameters with types and defaults
    • Document with docstring (used by LLM)
  4. Handle the SD API Response

    • Parse JSON response
    • Decode base64 image data
    • Save to file system
  5. Write the System Prompt

    • Instruct agent when to use the tool
    • Add prompt enhancement guidance
  6. Test Your Agent

    • Run with simple prompts
    • Verify images are generated
    • Check output directory

Exercises

  1. Add JPEG support - Modify to save as JPEG with quality parameter
  2. Add prompt enhancement - Create a second tool that enhances prompts before generation
  3. Add image display - Use term-image to show results in terminal
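Exercise 2 can start from something as simple as keyword appending. The sketch below is one possible shape for it (the style table and function name are illustrative, not part of the GAIA SDK); in the real agent it would be wrapped with @tool and registered alongside generate_image:

```python
# Hypothetical helper for exercise 2: append style keywords by subject type.
STYLE_HINTS = {
    "portrait": "soft lighting, shallow depth of field, detailed",
    "landscape": "golden hour, dramatic clouds, 4k",
}
DEFAULT_HINTS = "highly detailed, photorealistic"

def enhance_prompt(prompt: str, style: str = "default") -> str:
    """Return the prompt with comma-separated style keywords appended."""
    hints = STYLE_HINTS.get(style, DEFAULT_HINTS)
    return f"{prompt}, {hints}"

print(enhance_prompt("a cat", style="portrait"))
# → a cat, soft lighting, shallow depth of field, detailed
```

A more ambitious version could ask the LLM itself to rewrite the prompt, which is the pattern hinted at in the system prompt's enhancement examples.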

4. Integration Test

File: tests/examples/test_sd_agent_minimal.py

"""Test the minimal SD agent example."""

import pytest
from pathlib import Path


@pytest.fixture
def sd_agent():
    from examples.sd_agent_minimal import SimpleSDAgent
    return SimpleSDAgent(output_dir="./test_output")


@pytest.mark.integration
@pytest.mark.requires_lemonade
def test_generate_image(sd_agent, tmp_path):
    """Test basic image generation."""
    sd_agent.output_dir = tmp_path

    result = sd_agent.tools["generate_image"](
        prompt="a simple red circle",
        model="SD-Turbo",
        size="512x512"
    )

    assert Path(result["image_path"]).exists()
    assert result["model"] == "SD-Turbo"


def test_agent_has_tool(sd_agent):
    """Test that generate_image tool is registered."""
    assert "generate_image" in sd_agent.tools

Acceptance Criteria

  • examples/sd_agent_minimal.py runs standalone with python examples/sd_agent_minimal.py
  • Example is under 100 lines of code (excluding comments)
  • Works with default Lemonade Server configuration
  • User guide explains every line of code
  • Playbook can be completed in 30 minutes by a Python developer
  • Integration test passes when Lemonade Server is running
  • Code follows GAIA SDK patterns (inherits Agent, uses @tool decorator)

Labels

documentation, example, tutorial, image-generation, good-first-issue

Estimate

  • Agent implementation: 2 hours
  • User guide: 3 hours
  • Playbook: 4 hours
  • Tests: 1 hour

Total: ~10 hours
