Improve SD Agent Playbook and Agent Core Reliability #296
Merged
kovtcharov merged 47 commits into main on Feb 4, 2026
Split SD agent playbook into 3 parts for better learning progression:
- Part 1: Quick start + build your first agent (25 min)
- Part 2: Architecture deep dive (20 min)
- Part 3: Advanced patterns and variations (20 min)

Improved SD agent reliability:
- Default to generating one image unless explicitly requested
- Fix empty string handling in create_story_from_last_image
- Include story text in final answer for better UX

Updated documentation:
- Added Lemonade Server architecture explanation
- Added Mermaid diagrams with AMD branding
- Added 5 video placeholders for production
- Removed presentation references from docs.json

Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>
Generate random seeds by default to produce unique images on each run. Users can still specify --seed option for reproducible results. Updated documentation to explain seed behavior and reproducibility. Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>
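The seed behavior described in this commit can be sketched as follows. This is a minimal illustration, not the SD agent's actual code: `resolve_seed` and `MAX_SEED` are hypothetical names, and the upper bound is an assumption.

```python
import random
from typing import Optional

# Assumed upper bound for seeds; the real backend may accept a different range.
MAX_SEED = 2**32 - 1


def resolve_seed(seed: Optional[int] = None) -> int:
    """Return the user-supplied seed, or draw a fresh random one.

    A fixed seed gives reproducible images; no seed gives a new image per run.
    """
    if seed is not None:
        return seed
    return random.randint(0, MAX_SEED)
```

Checking `seed is not None` rather than truthiness keeps an explicit seed of 0 reproducible.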
Use clearer section headers and separators in final answer:
- Story text displayed prominently first
- Clean separator line between sections
- Enhanced prompt and file paths grouped separately

Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>
Provide full example story with proper spacing and formatting so LLM knows exactly how to structure the final answer with story text. Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>
Restore the technical introduction explaining multi-modal architecture, tool composition through mixins, and Lemonade Server integration. Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>
Removed redundant sections:
- Deleted "Quick Concepts" section (duplicated "What You'll Build")
- Deleted standalone "Choosing SD Models" section (moved to Step 2 accordion)
- Removed redundant "Run Your Agent" section (integrated into Step 3)

Added technical depth to "What You'll Build":
- MRO chain, HTTP endpoints, tool signatures
- Complete tool registry explanation
- Instance state details

Fixed technical inaccuracy:
- Corrected model formats: LLMs use GGUF, SD uses safetensors
- Added specific format details for each model

Added missing prerequisite:
- Virtual environment creation step

Result: 14% shorter, more accurate, better flow.

Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>
Clarify what runs where:
- GGUF models (Qwen3-8B, Qwen3-VL-4B) run on iGPU (Radeon) via Vulkan
- SDXL-Turbo (safetensors) currently runs on CPU

Added tabbed selection for venv activation (Windows/Linux).

Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>
Add --python 3.12 flag to uv venv command for consistency with quickstart guide and to ensure correct Python version. Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>
Add proper 4-space indentation to method definitions in Step 2 so they can be directly pasted under the class definition from Step 1. Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>
Specify model_id="Qwen3-8B-GGUF" in super().__init__() to use the model downloaded by gaia init --profile sd, not the default Qwen3-Coder-30B which isn't included in the SD profile. This fixes "model_load_error" when running the example. Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>
Replace simple system prompt with the actual SDAgent prompts that include:
- Research-backed prompt enhancement strategies
- Model-specific guidelines (SDXL-Turbo, SD-Turbo, etc.)
- Workflow instructions for tool usage

Tool schemas (parameters, models) are auto-injected by Agent base class. This fixes issues where LLM uses wrong models or parameters.

Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>
Add get_sd_system_prompt() method to SDToolsMixin so agents can compose system prompts from inherited mixins instead of manually importing prompt fragments.

Pattern:
- Mixins provide both tools AND domain-specific prompt fragments
- Agents compose them: return self.get_sd_system_prompt()
- Tool schemas auto-injected by Agent base class

Benefits: Single responsibility, reusability, composability.

Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>
Add template method pattern for automatic prompt composition:
- _get_mixin_prompts() - Auto-collects from inherited mixins
- _compose_system_prompt() - Composes mixin + agent prompts
- Mixins provide get_*_system_prompt() methods

Benefits:
- Mixins own their domain knowledge (tools + prompts)
- Agents automatically inherit behavior
- Can modify, extend, or override prompts
- Fully backwards compatible

SDToolsMixin now provides get_sd_system_prompt() with research-backed prompt engineering guidelines.

Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>
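One way the auto-collection described in this commit could work is sketched below. The method names `_get_mixin_prompts()`, `_compose_system_prompt()`, and `get_*_system_prompt()` come from the commit message; the discovery mechanism (walking the MRO and matching method names), the prompt strings, and `StoryAgent` are assumptions made for illustration and may not match the real Agent base class.

```python
import re
from typing import List


class SDToolsMixin:
    def get_sd_system_prompt(self) -> str:
        return "SD guidelines: keep prompts concrete and visual."


class VLMToolsMixin:
    def get_vlm_system_prompt(self) -> str:
        return "VLM guidelines: describe only what is visible."


class Agent:
    # Assumed naming convention used to discover mixin prompt providers.
    _PROMPT_METHOD = re.compile(r"^get_\w+_system_prompt$")

    def _get_system_prompt(self) -> str:
        """Agent-specific instructions; empty means mixin prompts only."""
        return ""

    def _get_mixin_prompts(self) -> List[str]:
        """Collect prompt fragments from every class in the MRO, in MRO order."""
        prompts: List[str] = []
        seen = set()
        for cls in type(self).__mro__:
            for name in vars(cls):
                if name in seen or not self._PROMPT_METHOD.match(name):
                    continue
                seen.add(name)
                text = getattr(self, name)()
                if text:
                    prompts.append(text)
        return prompts

    def _compose_system_prompt(self) -> str:
        """Template method: mixin fragments first, then the agent's own text."""
        parts = self._get_mixin_prompts()
        custom = self._get_system_prompt()
        if custom:
            parts.append(custom)
        return "\n\n".join(parts)


class StoryAgent(Agent, SDToolsMixin, VLMToolsMixin):  # hypothetical agent
    def _get_system_prompt(self) -> str:
        return "Always finish with a two-sentence story."
```

Calling `StoryAgent()._compose_system_prompt()` yields the SD fragment, the VLM fragment, and the custom instruction, separated by blank lines. An agent that overrides nothing still inherits both mixin fragments.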
Moved SD prompts from src/gaia/agents/sd/ to src/gaia/sd/ to keep mixins self-contained (tools + prompts in same package).

Changes:
- Created src/gaia/sd/prompts.py (moved from agents/sd/)
- SDToolsMixin imports from new location
- Added get_vlm_system_prompt() to VLMToolsMixin for consistency
- SDAgent uses mixin's get_sd_system_prompt() (no manual import)
- Deprecated old prompts.py location

Benefits:
- Mixins are truly self-contained (src/gaia/sd/ has everything for SD)
- Cleaner agent implementations (just call mixin methods)
- Better separation of concerns

Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>
Prompts now live with the mixin in src/gaia/sd/prompts.py. Removed deprecated file from src/gaia/agents/sd/. Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>
Added comprehensive documentation of composable system prompts:
- Part 1: Explains the simple usage (return self.get_sd_system_prompt())
- Part 2: Deep dive with 5 usage patterns and debugging guide

Patterns covered:
1. Use mixin prompts as-is (automatic)
2. Return mixin prompt explicitly
3. Extend with custom instructions
4. Modify mixin prompts
5. Custom composition order

Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>
Updated Part 1 example to show:
- Using mixin tools directly (generate_image, analyze_image)
- Creating custom tools that wrap mixin functionality
- Adding custom prompt instructions

Example now creates create_story_about_image() that wraps VLM's create_story_from_image with custom metadata, demonstrating the wrapper pattern for building specialized tools.

Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>
Changed _get_system_prompt from abstract to concrete method with empty string default. This allows agents using only mixin prompts to avoid implementing it.

Fixed SDAgent duplicate prompt bug:
- Was returning self.get_sd_system_prompt() causing duplication
- Now returns "" to use automatic mixin composition

Backwards compatible: existing agents continue to work.

Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>
Enhanced comments and docstrings to better explain:
- Why create a custom tool (specialization vs. generic)
- What the wrapper pattern adds (fixed style, metadata, extensibility)
- How it calls mixin methods via inheritance

Fixed incorrect system prompt explanation to accurately describe composition: mixin prompts + custom instructions.

Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>
Solved initialization order issues with hybrid approach:
- Mixins provide static base guidelines (no instance state needed)
- get_*_system_prompt() composes base + instance-specific gracefully
- Falls back to base if called before mixin initialization

Changes:
- SDToolsMixin: get_base_sd_guidelines() (static) + get_sd_system_prompt() (instance)
- VLMToolsMixin: get_base_vlm_guidelines() (static) + get_vlm_system_prompt() (instance)
- Agent: system_prompt now lazy property (composes on first access)
- Composition includes: mixin prompts + custom + tools + response format

Benefits:
- No initialization order issues
- Graceful degradation (works even if called early)
- Simple, robust, debuggable

Added debugging documentation with 5 methods to observe prompts.

Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>
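The hybrid approach in this commit (static base guidelines, instance-specific additions, lazy composition) might look roughly like the sketch below. `get_base_sd_guidelines()`, `get_sd_system_prompt()`, and the lazy `system_prompt` property are named in the commit; `init_sd()`, the `_sd_default_model` attribute, and the prompt text are hypothetical stand-ins for whatever instance state the real mixin holds.

```python
from typing import Optional


class SDToolsMixin:
    @staticmethod
    def get_base_sd_guidelines() -> str:
        """Static text: needs no instance state, so it is always safe to call."""
        return "Base SD guidelines."

    def init_sd(self, default_model: str) -> None:  # hypothetical initializer
        self._sd_default_model = default_model

    def get_sd_system_prompt(self) -> str:
        base = self.get_base_sd_guidelines()
        model = getattr(self, "_sd_default_model", None)
        if model is None:
            # Called before init_sd(): degrade gracefully to the static text.
            return base
        return f"{base}\nDefault model: {model}"


class Agent:
    def __init__(self) -> None:
        self._system_prompt_cache: Optional[str] = None

    def _compose_system_prompt(self) -> str:
        return ""

    @property
    def system_prompt(self) -> str:
        """Compose on first access, after subclasses have finished __init__."""
        if self._system_prompt_cache is None:
            self._system_prompt_cache = self._compose_system_prompt()
        return self._system_prompt_cache


class SDAgent(SDToolsMixin, Agent):
    def __init__(self, default_model: str) -> None:
        super().__init__()            # the base class no longer builds the prompt here
        self.init_sd(default_model)   # mixin state is set afterwards

    def _compose_system_prompt(self) -> str:
        return self.get_sd_system_prompt()
```

Because composition is deferred to the first read of `system_prompt`, it no longer matters that the mixin is initialized after `Agent.__init__()` runs.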
CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
- Fix VLMToolsMixin tool attribution: create_story_from_image is a custom tool, not provided by VLMToolsMixin
- Update SDAgent example in Part 2 to match actual implementation (includes educational style and file saving)
- Fix max_steps default references (20 for base Agent, 10 for SDAgent)
- Add educational style to Part 1 code examples
- Fix composition pattern examples to use mixin clients directly
- Update gaia init documentation to mention 8K context configuration
- Fix SD agent unit tests for composable system prompts architecture
- Fix util/lint.py to fall back to direct tool execution when uvx not available
- Remove vlm-tools-expansion.md issue template
- Correct tool counts: 5 from mixins + 1 custom = 6 total
7d25d08 to e409fb3
Implements $PREV.field and $STEP_N.field placeholder syntax to resolve parameter dependencies between plan steps. This fixes the issue where agents would hallucinate placeholder paths instead of using actual tool results.

Core changes:
- Add _resolve_plan_parameters() method to Agent base class with recursion depth limit
- Integrate parameter resolution into plan execution loop
- Update SDAgent and example to use placeholder syntax in system prompts
- Clear step_results on error recovery to prevent stale data contamination
- Add comprehensive unit tests including edge cases

Documentation:
- Add "Multi-Step Planning with Dynamic Parameters" section to Part 2
- Simplify Part 1 by removing redundant sections and example output
- Add GitHub issue reporting and contact info to troubleshooting

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>
Remove redundant 're' import in _resolve_plan_parameters() - already imported at module level. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>
The 8K context was insufficient for multi-step planning with dynamic parameters. Increased to 16K to handle:
- SD system prompts with placeholder examples
- VLM system prompts
- Agent custom prompts
- Multi-step plan with tool schemas
- Conversation history

Changes:
- SDAgentConfig.ctx_size: 8192 → 16384
- SD profile min_context_size: 8192 → 16384
- Fix Pylint W0201: Initialize _ctx_verified and _ctx_warning in __init__
- Add warning when context verification fails for LLM models
- Update all documentation references from 8K to 16K

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>
When testing model inference during init, unload the model first if already loaded, then reload with save_options=True. This ensures recipe_options with ctx_size are properly persisted.

Also improved display to show warning emoji when context verification fails:
- ✓ Qwen3-8B-GGUF - OK (ctx: 16384) ← Success
- ✓ Qwen3-8B-GGUF - OK ⚠️ Context unverified! ← Warning

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>
Merged index.mdx and part-1-building-agent.mdx into a single streamlined playbook. Deleted parts 2 and 3 which were overly verbose and redundant.

Changes:
- Merged index intro, video, and quick test into main playbook
- Kept all hands-on tutorial content from part-1
- Removed architecture deep-dive (part-2) and variations (part-3)
- Updated docs.json navigation to show single page
- Result: One comprehensive, focused guide instead of fragmented multi-part tutorial

The consolidated guide maintains the strong structure of both original files while eliminating unnecessary complexity.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>
After image generation and story creation complete, the agent was creating additional stories in a loop. Added explicit instruction to provide final answer immediately after both tools complete. Fix prevents infinite loop detection warning when agent tries to call create_story_from_image multiple times. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>
…opping

Changed from single-repeat detection to consecutive-count detection. Agent now allows up to 3 consecutive identical tool calls before triggering the loop detector.

Changes:
- Replaced last_tool_call with tool_call_history (tracks last 5 calls)
- Count consecutive identical calls
- Trigger after 3 consecutive repeats (was: 1 repeat)

This allows legitimate use cases like "create 3 robot designs" while still preventing infinite loops.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>
Added max_consecutive_repeats parameter to Agent.__init__() to allow customization of how many consecutive identical tool calls are allowed before triggering loop detection.

Default: 4 consecutive calls (increased from hardcoded 3)

This allows users to adjust sensitivity:
- Lower value (2-3): More aggressive loop detection
- Higher value (5-10): More tolerant of repetition
- For SD agent: Default 4 allows multiple variations while preventing infinite loops

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>
Use console.print_repeated_tool_warning() exclusively instead of also logging. Console provides better user visibility and Rich formatting. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>
Fixed outdated comment that still referenced 8K context. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>
Fixed all references to:
- 8K context → 16K context for multi-step planning
- Updated context size table for all models
- Updated troubleshooting with correct ctx-size command
- Updated playbook reference from "3-part" to single comprehensive guide
- Removed broken link to deleted part-2-architecture.mdx

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>
kovtcharov-amd approved these changes on Feb 4, 2026
375e666 to 52dd702
Summary
This PR significantly improves the SD Agent's reliability and usability through architectural enhancements, documentation consolidation, and bug fixes.
Core Agent Capabilities & Improvements
1. Dynamic Parameter Substitution for Multi-Step Plans
Problem: Agent planning system couldn't handle parameter dependencies between steps. When creating multi-step plans, the LLM would hallucinate placeholder paths instead of using actual tool results, causing "Image not found" errors.
Solution: Implemented dynamic parameter substitution with $PREV.field and $STEP_N.field placeholder syntax.

Implementation:
- Add _resolve_plan_parameters() method to Agent base class
- Clear step_results on error recovery to prevent stale data contamination

Example:

    {
      "plan": [
        {"tool": "generate_image", "tool_args": {"prompt": "robot kitten"}},
        {"tool": "create_story_from_image", "tool_args": {"image_path": "$PREV.image_path"}}
      ]
    }
    # System automatically substitutes $PREV.image_path with actual path from step 1

Impact: Enables complex multi-step workflows for ALL agents, not just SDAgent.
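A minimal sketch of how placeholder resolution along these lines could be written is shown below. It is not the code in agent.py: the standalone function `resolve_plan_parameters`, the `MAX_DEPTH` value, and the choice that `$STEP_1` refers to the first step (1-based) are all assumptions, since the PR text does not spell them out.

```python
import re
from typing import Any, Dict, List

# Matches a whole-string placeholder such as "$PREV.image_path" or "$STEP_2.story".
_PLACEHOLDER = re.compile(r"^\$(PREV|STEP_(\d+))\.(\w+)$")
MAX_DEPTH = 10  # assumed recursion limit for nested arguments


def resolve_plan_parameters(
    value: Any, step_results: List[Dict[str, Any]], depth: int = 0
) -> Any:
    """Replace placeholders in tool_args with fields from earlier step results."""
    if depth > MAX_DEPTH:
        return value  # stop descending into very deep structures
    if isinstance(value, dict):
        return {
            k: resolve_plan_parameters(v, step_results, depth + 1)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [resolve_plan_parameters(v, step_results, depth + 1) for v in value]
    if not isinstance(value, str):
        return value
    match = _PLACEHOLDER.match(value)
    if match is None:
        return value
    which, step_no, field = match.groups()
    if which == "PREV":
        index = len(step_results) - 1
    else:
        index = int(step_no) - 1  # assumption: $STEP_1 is the first step
    if 0 <= index < len(step_results) and field in step_results[index]:
        return step_results[index][field]
    return value  # unresolved placeholders are left untouched


step_results = [{"status": "success", "image_path": "out/robot_kitten.png"}]
print(resolve_plan_parameters({"image_path": "$PREV.image_path"}, step_results))
# prints {'image_path': 'out/robot_kitten.png'}
```

Leaving an unresolvable placeholder unchanged keeps the failure visible in the downstream tool call instead of silently passing an empty value.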
2. Configurable Loop Detection
Problem: Loop detector was too aggressive - stopped after single repeat, preventing legitimate use cases like "create 3 robot designs".
Solution:
- Added max_consecutive_repeats parameter (default: 4)

Impact: Agents can now handle multi-iteration requests while still preventing infinite loops.
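Consecutive-count loop detection of this kind can be sketched as below. Only `max_consecutive_repeats` and `tool_call_history` are names from this PR; the `LoopDetector` class, its `record()` method, and the exact threshold (triggering once the count exceeds the limit) are assumptions for illustration.

```python
import json
from collections import deque
from typing import Any, Deque, Dict


class LoopDetector:  # hypothetical helper, not a class from the PR
    def __init__(self, max_consecutive_repeats: int = 4) -> None:
        self.max_consecutive_repeats = max_consecutive_repeats
        # One slot more than the limit is enough to see the limit exceeded;
        # with the default of 4 this keeps the last 5 calls.
        self.tool_call_history: Deque[str] = deque(
            maxlen=max_consecutive_repeats + 1
        )

    def record(self, tool: str, tool_args: Dict[str, Any]) -> bool:
        """Record a tool call; return True when loop detection should trigger."""
        key = json.dumps([tool, tool_args], sort_keys=True, default=str)
        self.tool_call_history.append(key)
        consecutive = 0
        for previous in reversed(self.tool_call_history):
            if previous != key:
                break
            consecutive += 1
        return consecutive > self.max_consecutive_repeats
```

Under these assumptions, four identical `generate_image` calls in a row pass and a fifth triggers the detector, while any different call in between resets the count.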
3. Context Size Optimization
Problem: 8K context was insufficient for multi-step planning with dynamic parameters. Workflow hit "context exceeded" errors (9154 tokens needed vs 8192 available).
Solution:
- min_context_size in init system (SD profile: 8192 → 16384)

Impact: SD multi-step workflows now complete without context errors.
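The arithmetic behind the move from 8K to 16K can be checked with a small sketch. `smallest_fitting_context` is a hypothetical helper, and doubling from 8192 is an assumption about how context sizes are stepped; only the 9154, 8192, and 16384 figures come from this PR.

```python
def smallest_fitting_context(tokens_needed: int, start: int = 8192) -> int:
    """Double the context size from `start` until the prompt fits."""
    size = start
    while size < tokens_needed:
        size *= 2
    return size


needed = 9154  # token count quoted in the problem statement above
chosen = smallest_fitting_context(needed)
headroom = chosen - needed
print(chosen, headroom)  # prints 16384 7230
```

The 7,230 tokens of headroom are what remains for conversation history and longer plans once the measured workflow fits.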
Documentation Improvements
Consolidated Playbook Structure
Before: 4 files, ~1,900 lines
After: 1 file, 543 lines (~70% reduction)
Result: Users can build a working multi-modal agent without being overwhelmed by implementation details.
Testing & Quality
New Tests
- test_parameter_substitution() - Basic placeholder resolution
- test_parameter_substitution_edge_cases() - Edge cases

Test Results
Files Changed
Core Implementation:
- src/gaia/agents/base/agent.py (+291 lines) - Dynamic parameter substitution, configurable loop detection
- src/gaia/agents/sd/agent.py - Updated to 16K context, placeholder syntax in prompts
- examples/sd_agent_example.py - Consistent with SDAgent implementation

Configuration:
- src/gaia/installer/init_command.py - 16K context for SD profile, force unload/reload, context verification

Documentation:
- docs/playbooks/sd-agent/index.mdx - Consolidated single-page guide
- docs/docs.json - Updated navigation

Tests:
- tests/unit/test_sd_agent.py (+163 lines) - Comprehensive test coverage

Utilities:
- util/lint.py - uvx fallback when command not available

Backward Compatibility
✅ 100% Backward Compatible
Security & Robustness
Security:
Reliability:
What's Next
Users should test the workflow after this PR merges:
    gaia init --profile sd
    gaia sd "create a robot exploring ancient ruins"

Expected: Image generated + story created with no errors or warnings.