Description
Overview
Create a minimal, teachable example demonstrating how to integrate Stable Diffusion image generation into a GAIA agent. This serves as a reference implementation for adding any multi-modal endpoint (image, audio, video) to agents using the GAIA SDK.
Goal: A developer can follow this tutorial and understand how to:
- Call the Lemonade Server SD endpoint from an agent
- Register image generation as a tool
- Handle base64 image responses
- Display/save generated images
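The base64-handling and saving steps listed above can be tried without a running server. The following is a minimal sketch, assuming the response uses the `{"data": [{"b64_json": ...}]}` shape that the example agent in this issue reads; the helper name `save_first_image` is hypothetical and not part of the GAIA SDK.

```python
import base64
import tempfile
from pathlib import Path


def save_first_image(payload: dict, out_path: Path) -> Path:
    """Decode the first base64 image in `payload` and write it to `out_path`.

    Assumes the payload shape {"data": [{"b64_json": "<base64 string>"}]}.
    """
    items = payload.get("data") or []
    if not items or "b64_json" not in items[0]:
        raise ValueError("response contains no base64 image data")
    image_bytes = base64.b64decode(items[0]["b64_json"])
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_bytes(image_bytes)
    return out_path


# Stand-in payload so the sketch runs offline; a real response would carry PNG bytes.
fake_bytes = b"\x89PNG\r\n\x1a\n-not-a-real-image"
fake_payload = {"data": [{"b64_json": base64.b64encode(fake_bytes).decode("ascii")}]}

with tempfile.TemporaryDirectory() as tmp:
    saved = save_first_image(fake_payload, Path(tmp) / "images" / "demo.png")
    assert saved.read_bytes() == fake_bytes
```

Keeping the decode step separate from the HTTP call makes it possible to exercise it with a stand-in payload, which may be useful for the tutorial's walkthrough.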
Motivation
Currently, there's no simple example showing how to add image generation capabilities to a GAIA agent. Developers need a minimal working example that:
- Is small enough to understand in one sitting (~100 lines)
- Demonstrates the complete flow from prompt to saved image
- Can be extended for more complex use cases
- Follows GAIA SDK patterns and best practices
Deliverables
1. Minimal Agent Implementation
File: examples/sd_agent_minimal.py (~100 lines)
"""
Minimal SD Agent Example

Demonstrates how to integrate Stable Diffusion image generation
into a GAIA agent using the Lemonade Server endpoint.
"""
import base64
from pathlib import Path

import requests

from gaia.agents.base import Agent
from gaia.agents.base.tools import tool


class SimpleSDAgent(Agent):
    """A minimal agent that can generate images from text descriptions."""

    def __init__(
        self,
        base_url: str = "http://localhost:8000",
        output_dir: str = "./generated_images",
        **kwargs
    ):
        super().__init__(**kwargs)
        self.sd_endpoint = f"{base_url}/api/v1/images/generations"
        self.output_dir = Path(output_dir)
        self.output_dir.mkdir(parents=True, exist_ok=True)
        self._register_tools()

    def _get_system_prompt(self) -> str:
        return """You are an image generation assistant.
When the user asks you to create or generate an image, use the generate_image tool.
Enhance simple prompts with artistic details for better results.
Example enhancements:
- "a cat" → "a fluffy orange cat, soft lighting, detailed fur, photorealistic"
- "sunset" → "vibrant sunset over ocean, golden hour, dramatic clouds, 4k"
"""

    def _register_tools(self):
        @tool
        def generate_image(
            prompt: str,
            model: str = "SD-Turbo",
            size: str = "512x512",
            steps: int = 4
        ) -> dict:
            """
            Generate an image from a text prompt using Stable Diffusion.

            Args:
                prompt: Text description of the image to generate
                model: SD model to use (SD-Turbo or SDXL-Turbo)
                size: Image dimensions (512x512, 768x768, 1024x1024)
                steps: Number of inference steps (4 for Turbo models)

            Returns:
                Dict with image_path and generation details
            """
            # Call SD endpoint.
            # Note: `steps` is accepted by the tool but is not sent in this request.
            response = requests.post(
                self.sd_endpoint,
                json={
                    "prompt": prompt,
                    "model": model,
                    "size": size,
                    "n": 1,
                    "response_format": "b64_json"
                },
                timeout=60
            )
            response.raise_for_status()

            # Decode and save image
            data = response.json()
            image_b64 = data["data"][0]["b64_json"]
            image_bytes = base64.b64decode(image_b64)

            # Generate filename from prompt (the same prompt overwrites the same file)
            safe_name = "".join(c if c.isalnum() else "_" for c in prompt[:30])
            image_path = self.output_dir / f"{safe_name}.png"
            image_path.write_bytes(image_bytes)

            return {
                "image_path": str(image_path),
                "prompt": prompt,
                "model": model,
                "size": size
            }

        self.register_tool(generate_image)


# CLI usage
if __name__ == "__main__":
    agent = SimpleSDAgent()
    agent.run("Generate an image of a dragon perched on a mountain cliff at sunset")
2. User Guide
File: docs/guides/sd-tutorial.mdx
Structure:
- Introduction - What you'll build and learn
- Prerequisites - Lemonade Server running with SD model
- Quick Start - Run the example in 2 minutes
- Code Walkthrough - Line-by-line explanation
- Customization - How to modify for your needs
- Next Steps - Link to full SD Agent plan
Key sections:
Prerequisites
# Ensure Lemonade Server is running with SD model
lemonade-server serve --model SD-Turbo
# Verify SD endpoint is available
curl http://localhost:8000/api/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{"prompt": "test", "model": "SD-Turbo"}'
Quick Start
# Run the minimal example
python examples/sd_agent_minimal.py
# Or use interactively
python -c "
from examples.sd_agent_minimal import SimpleSDAgent
agent = SimpleSDAgent()
agent.run('Create an image of a cyberpunk city')
"
Code Walkthrough
| Component | Purpose |
|---|---|
| `sd_endpoint` | URL to Lemonade Server's SD API |
| `@tool` decorator | Registers function as agent tool |
| `response_format: b64_json` | Request base64-encoded image |
| `base64.b64decode()` | Convert response to image bytes |
| `register_tool()` | Make tool available to agent |
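The request side of the walkthrough can be sketched in isolation. The block below only builds the JSON body and does not contact a server; `build_generation_request` and `ALLOWED_SIZES` are hypothetical names, and the size list is taken from the tool docstring in this issue rather than from Lemonade Server documentation.

```python
import json

# Sizes listed in the example tool's docstring; treated here as an assumption.
ALLOWED_SIZES = ("512x512", "768x768", "1024x1024")


def build_generation_request(
    prompt: str, model: str = "SD-Turbo", size: str = "512x512"
) -> dict:
    """Return the JSON body that the example agent posts to the SD endpoint."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if size not in ALLOWED_SIZES:
        raise ValueError(f"unsupported size: {size!r}")
    return {
        "prompt": prompt,
        "model": model,
        "size": size,
        "n": 1,  # one image per call, as in the example agent
        "response_format": "b64_json",  # ask for base64 instead of a URL
    }


body = build_generation_request("a fluffy orange cat")
print(json.dumps(body, indent=2))
```

Validating the size before the call is a design choice for this sketch; the example agent itself passes the value through unchecked.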
3. Playbook
File: docs/playbooks/sd-agent/part-1-minimal.mdx
Step-by-step tutorial:
Part 1: Your First Image Agent (30 min)
- Set Up Environment
  - Install GAIA SDK
  - Start Lemonade Server with SD model
  - Verify endpoint connectivity
- Create the Agent Class
  - Inherit from `Agent` base class
  - Configure SD endpoint URL
  - Set up output directory
- Register the Image Generation Tool
  - Use `@tool` decorator
  - Define parameters with types and defaults
  - Document with docstring (used by LLM)
- Handle the SD API Response
  - Parse JSON response
  - Decode base64 image data
  - Save to file system
- Write the System Prompt
  - Instruct agent when to use the tool
  - Add prompt enhancement guidance
- Test Your Agent
  - Run with simple prompts
  - Verify images are generated
  - Check output directory
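The "verify endpoint connectivity" step in the playbook could also be done from Python rather than curl. This is a rough sketch using only the standard library; `endpoint_is_reachable` is a hypothetical helper, and the request body mirrors the curl check in the prerequisites. As with the curl command, a successful call asks the server to generate an image, so it may take a while.

```python
import json
import urllib.request


def endpoint_is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if a POST to `url` completes with a 2xx status."""
    body = json.dumps({"prompt": "test", "model": "SD-Turbo"}).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError, HTTPError and socket timeouts all derive from OSError.
        return False
```

A typical call would be `endpoint_is_reachable("http://localhost:8000/api/v1/images/generations", timeout=60)`, using the default URL from the example agent.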
Exercises
- Add JPEG support - Modify to save as JPEG with quality parameter
- Add prompt enhancement - Create a second tool that enhances prompts before generation
- Add image display - Use `term-image` to show results in terminal
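As a starting point for the prompt enhancement exercise, the enhancement logic can be prototyped as a plain function before it is registered as a tool. Everything here is illustrative: `enhance_prompt`, `STYLE_SUFFIXES`, and the comma heuristic are assumptions, with descriptors borrowed from the system prompt examples in this issue.

```python
# Hypothetical style presets; the descriptors echo the system prompt examples.
STYLE_SUFFIXES = {
    "photo": "soft lighting, detailed, photorealistic",
    "landscape": "golden hour, dramatic clouds, 4k",
}


def enhance_prompt(prompt: str, style: str = "photo") -> str:
    """Append style descriptors to a short prompt, leaving detailed prompts alone."""
    cleaned = " ".join(prompt.split())  # collapse stray whitespace
    if not cleaned:
        raise ValueError("prompt must not be empty")
    # Heuristic (an assumption): a prompt with commas already carries detail.
    if "," in cleaned:
        return cleaned
    suffix = STYLE_SUFFIXES.get(style, STYLE_SUFFIXES["photo"])
    return f"{cleaned}, {suffix}"


print(enhance_prompt("a cat"))
print(enhance_prompt("sunset", style="landscape"))
```

A deterministic function like this is easy to test; whether enhancement is better done in code or left to the LLM via the system prompt is a choice the exercise leaves open.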
4. Integration Test
File: tests/examples/test_sd_agent_minimal.py
"""Test the minimal SD agent example."""
from pathlib import Path

import pytest


@pytest.fixture
def sd_agent(tmp_path):
    # Write into pytest's per-test temp directory instead of ./test_output
    from examples.sd_agent_minimal import SimpleSDAgent
    return SimpleSDAgent(output_dir=str(tmp_path))


@pytest.mark.integration
@pytest.mark.requires_lemonade
def test_generate_image(sd_agent):
    """Test basic image generation."""
    result = sd_agent.tools["generate_image"](
        prompt="a simple red circle",
        model="SD-Turbo",
        size="512x512"
    )
    assert Path(result["image_path"]).exists()
    assert result["model"] == "SD-Turbo"


def test_agent_has_tool(sd_agent):
    """Test that generate_image tool is registered."""
    assert "generate_image" in sd_agent.tools
Acceptance Criteria
- `examples/sd_agent_minimal.py` runs standalone with `python examples/sd_agent_minimal.py`
- Example is under 100 lines of code (excluding comments)
- Works with default Lemonade Server configuration
- User guide explains every line of code
- Playbook can be completed in 30 minutes by a Python developer
- Integration test passes when Lemonade Server is running
- Code follows GAIA SDK patterns (inherits Agent, uses @tool decorator)
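The line-count criterion above could be checked mechanically. The following is a simple heuristic sketch, not a defined project tool: `count_code_lines` is a hypothetical helper that skips blank lines and lines consisting only of a `#` comment, and it counts docstring lines as code, which is an assumption about how "excluding comments" should be read.

```python
def count_code_lines(source: str) -> int:
    """Count lines that are neither blank nor pure `#` comments.

    Docstring lines are counted as code here; that choice is an assumption.
    """
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            count += 1
    return count


sample = "import os\n\n# a comment\nprint(os.getcwd())  # trailing comment\n"
assert count_code_lines(sample) == 2
```

To apply it to the example, the source could be read with `Path("examples/sd_agent_minimal.py").read_text()` and the result compared against 100.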
Related
- SD Agent Full Plan - Complete SD optimization agent
- Agent System Docs - Base Agent class reference
- Tool Decorator Docs - @tool decorator usage
Labels
documentation, example, tutorial, image-generation, good-first-issue
Estimate
- Agent implementation: 2 hours
- User guide: 3 hours
- Playbook: 4 hours
- Tests: 1 hour
Total: ~10 hours