Improve SD Agent Playbook and Agent Core Reliability by kovtcharov · Pull Request #296 · amd/gaia

kovtcharov · 2026-02-03T06:07:37Z

Summary

This PR improves the SD Agent's reliability and usability through three changes to the agent core (dynamic parameter substitution, configurable loop detection, and a larger context window), a consolidated playbook, and bug fixes.

Core Agent Capabilities & Improvements

1. Dynamic Parameter Substitution for Multi-Step Plans

Problem: The agent planning system couldn't handle parameter dependencies between steps. When creating multi-step plans, the LLM would hallucinate placeholder paths instead of using actual tool results, causing "Image not found" errors.

Solution: Implemented dynamic parameter substitution with $PREV.field and $STEP_N.field placeholder syntax.

Implementation:

Added _resolve_plan_parameters() method to Agent base class
Recursively resolves placeholders in tool arguments from previous step results
Integrated into plan execution loop with state management
Stack overflow protection (MAX_DEPTH=50)
Clears step_results on error recovery to prevent stale data contamination

Example:

{
  "plan": [
    {"tool": "generate_image", "tool_args": {"prompt": "robot kitten"}},
    {"tool": "create_story_from_image", "tool_args": {"image_path": "$PREV.image_path"}}  
  ]
}
# System automatically substitutes $PREV.image_path with actual path from step 1

Impact: Enables complex multi-step workflows for ALL agents, not just SDAgent.
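
A minimal sketch of how this kind of placeholder resolution could work is shown below. It is not the actual `_resolve_plan_parameters()` from `agent.py`: the free function `resolve_plan_parameters`, the regular expression, and the choice to treat `$STEP_N` as a zero-based index into `step_results` are all assumptions made for illustration.

```python
import re
from typing import Any

MAX_DEPTH = 50  # recursion limit named in the PR description

# Matches a string that is exactly "$PREV.field" or "$STEP_N.field".
_PLACEHOLDER = re.compile(r"\$(PREV|STEP_(\d+))\.(\w+)")


def resolve_plan_parameters(value: Any, step_results: list, depth: int = 0) -> Any:
    """Replace placeholders in tool arguments with fields from earlier results."""
    if depth > MAX_DEPTH:
        return value  # nested too deeply: stop recursing, return as-is
    if isinstance(value, dict):
        return {k: resolve_plan_parameters(v, step_results, depth + 1)
                for k, v in value.items()}
    if isinstance(value, list):
        return [resolve_plan_parameters(v, step_results, depth + 1) for v in value]
    if not isinstance(value, str):
        return value  # primitives (int, bool, None, ...) pass through unchanged

    match = _PLACEHOLDER.fullmatch(value)
    if match is None:
        return value  # ordinary string, nothing to substitute

    target, step_number, field = match.groups()
    # Assumption: $STEP_N indexes step_results from zero; $PREV is the last entry.
    index = len(step_results) - 1 if target == "PREV" else int(step_number)
    if not 0 <= index < len(step_results):
        return value  # invalid reference degrades gracefully
    result = step_results[index]
    if not isinstance(result, dict) or field not in result:
        return value  # non-dict result or unknown field: keep the placeholder
    return result[field]


# Hypothetical result of step 1, then the arguments for step 2.
step_results = [{"image_path": "out/robot_kitten.png"}]
print(resolve_plan_parameters({"image_path": "$PREV.image_path"}, step_results))
# → {'image_path': 'out/robot_kitten.png'}
```

Returning the placeholder unchanged on any failure mirrors the "degrade gracefully" behavior listed under Backward Compatibility.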

2. Configurable Loop Detection

Problem: The loop detector was too aggressive: it stopped after a single repeated tool call, which blocked legitimate requests such as "create 3 robot designs".

Solution:

Changed from single-repeat detection to consecutive-count tracking
Made threshold configurable via max_consecutive_repeats parameter (default: 4)
Allows users to adjust sensitivity per agent

Impact: Agents can now handle multi-iteration requests while still preventing infinite loops.
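
The consecutive-count idea can be sketched as follows. The `LoopDetector` class and its `record()` method are hypothetical names, not the code in `agent.py`; only the `max_consecutive_repeats` parameter and its default of 4 come from this PR. The sketch also assumes the detector fires when the count reaches the threshold, which the description does not spell out.

```python
import json


class LoopDetector:
    """Counts consecutive identical tool calls (illustrative helper only)."""

    def __init__(self, max_consecutive_repeats: int = 4):
        self.max_consecutive_repeats = max_consecutive_repeats
        self._last_call = None
        self._count = 0

    def record(self, tool: str, tool_args: dict) -> bool:
        """Record one tool call; return True once the threshold is reached."""
        # Serialize arguments with sorted keys so key order does not matter.
        call = (tool, json.dumps(tool_args, sort_keys=True))
        if call == self._last_call:
            self._count += 1
        else:
            # A different call resets the streak.
            self._last_call, self._count = call, 1
        return self._count >= self.max_consecutive_repeats


detector = LoopDetector()
flags = [detector.record("generate_image", {"prompt": "robot"}) for _ in range(4)]
print(flags)  # → [False, False, False, True]
```

Under this assumption, a lower threshold makes the detector stricter and a higher one tolerates more repetition, which matches the sensitivity adjustment described above.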

3. Context Size Optimization

Problem: 8K context was insufficient for multi-step planning with dynamic parameters. Workflow hit "context exceeded" errors (9154 tokens needed vs 8192 available).

Solution:

Increased SDAgent context from 8K to 16K
Updated SD profile min_context_size in init system
Force unload/reload LLM models during init to ensure context settings persist
Added warning display when context verification fails

Impact: SD multi-step workflows now complete without context errors.
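
As a rough illustration of the numbers above and of the warning display, the sketch below checks a token budget against a context size and formats a status line. The function names are hypothetical and the verification rule (reported context at least the profile minimum) is an assumption; the two output formats follow the init output quoted in this PR's commit messages.

```python
from typing import Optional

SD_MIN_CONTEXT_SIZE = 16384  # SD profile minimum after this PR (was 8192)


def fits_in_context(tokens_needed: int, ctx_size: int) -> bool:
    """True when the prompt plus plan fit inside the model's context window."""
    return tokens_needed <= ctx_size


def format_model_status(model: str, reported_ctx: Optional[int],
                        min_ctx: int = SD_MIN_CONTEXT_SIZE) -> str:
    """Build the init status line, flagging contexts that could not be verified."""
    # Assumption: "verified" means the server reported a context >= the minimum.
    if reported_ctx is not None and reported_ctx >= min_ctx:
        return f"✓ {model} - OK (ctx: {reported_ctx})"
    return f"✓ {model} - OK ⚠️ Context unverified!"


print(fits_in_context(9154, 8192))    # → False (the failure described above)
print(fits_in_context(9154, 16384))   # → True
print(format_model_status("Qwen3-8B-GGUF", 16384))
print(format_model_status("Qwen3-8B-GGUF", None))
```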

Documentation Improvements

Consolidated Playbook Structure

Before: 4 files, ~1,900 lines

index.mdx (overview)
part-1-building-agent.mdx (tutorial)
part-2-architecture.mdx (deep dive - 628 lines)
part-3-variations.mdx (patterns - 389 lines)

After: 1 file, 543 lines (~70% reduction)

Single comprehensive guide with focused, practical content
Removed redundant architecture explanations (MRO, composable prompts, 5 debugging methods)
Removed advanced variations that added cognitive load
Added troubleshooting with GitHub issue reporting and contact info

Result: Users can build a working multi-modal agent without being overwhelmed by implementation details.

Testing & Quality

New Tests

test_parameter_substitution() - Basic placeholder resolution
test_parameter_substitution_edge_cases() - Edge cases:
- Empty step_results
- Non-dict results
- Recursion depth limit (51 levels)
- Circular references
- Unicode field names
- Special characters
- Primitive type preservation

Test Results

All 5 SD agent unit tests pass
All lint checks pass (Black, isort, Pylint, Flake8)
Comprehensive edge case coverage

Files Changed

Core Implementation:

src/gaia/agents/base/agent.py (+291 lines) - Dynamic parameter substitution, configurable loop detection
src/gaia/agents/sd/agent.py - Updated to 16K context, placeholder syntax in prompts
examples/sd_agent_example.py - Consistent with SDAgent implementation

Configuration:

src/gaia/installer/init_command.py - 16K context for SD profile, force unload/reload, context verification

Documentation:

docs/playbooks/sd-agent/index.mdx - Consolidated single-page guide
Deleted: part-1, part-2, part-3 (-1,017 lines)
docs/docs.json - Updated navigation

Tests:

tests/unit/test_sd_agent.py (+163 lines) - Comprehensive test coverage

Utilities:

util/lint.py - uvx fallback when command not available

Backward Compatibility

✅ 100% Backward Compatible

Plans without placeholders work unchanged
Invalid placeholders degrade gracefully (returned as-is)
No breaking changes to existing APIs
All existing tests still pass
New parameters have sensible defaults

Security & Robustness

Security:

Stack overflow protection (MAX_DEPTH limit)
State isolation (step_results cleared on error recovery)
No code injection risk (string substitution only)

Reliability:

Comprehensive test coverage
Edge case handling (empty results, circular refs, Unicode)
Graceful degradation on errors
Clear user warnings via console

What's Next

Users should test the workflow after this PR merges:

gaia init --profile sd
gaia sd "create a robot exploring ancient ruins"

Expected: Image generated + story created with no errors or warnings.

Split SD agent playbook into 3 parts for better learning progression:
- Part 1: Quick start + build your first agent (25 min)
- Part 2: Architecture deep dive (20 min)
- Part 3: Advanced patterns and variations (20 min)
Improved SD agent reliability:
- Default to generating one image unless explicitly requested
- Fix empty string handling in create_story_from_last_image
- Include story text in final answer for better UX
Updated documentation:
- Added Lemonade Server architecture explanation
- Added Mermaid diagrams with AMD branding
- Added 5 video placeholders for production
- Removed presentation references from docs.json
Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>

Generate random seeds by default to produce unique images on each run. Users can still specify --seed option for reproducible results. Updated documentation to explain seed behavior and reproducibility. Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>

Use clearer section headers and separators in final answer:
- Story text displayed prominently first
- Clean separator line between sections
- Enhanced prompt and file paths grouped separately
Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>

Provide full example story with proper spacing and formatting so LLM knows exactly how to structure the final answer with story text. Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>

Restore the technical introduction explaining multi-modal architecture, tool composition through mixins, and Lemonade Server integration. Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>

Removed redundant sections:
- Deleted "Quick Concepts" section (duplicated "What You'll Build")
- Deleted standalone "Choosing SD Models" section (moved to Step 2 accordion)
- Removed redundant "Run Your Agent" section (integrated into Step 3)
Added technical depth to "What You'll Build":
- MRO chain, HTTP endpoints, tool signatures
- Complete tool registry explanation
- Instance state details
Fixed technical inaccuracy:
- Corrected model formats: LLMs use GGUF, SD uses safetensors
- Added specific format details for each model
Added missing prerequisite:
- Virtual environment creation step
Result: 14% shorter, more accurate, better flow.
Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>

Clarify what runs where:
- GGUF models (Qwen3-8B, Qwen3-VL-4B) run on iGPU (Radeon) via Vulkan
- SDXL-Turbo (safetensors) currently runs on CPU
Added tabbed selection for venv activation (Windows/Linux).
Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>

Add --python 3.12 flag to uv venv command for consistency with quickstart guide and to ensure correct Python version. Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>

Add proper 4-space indentation to method definitions in Step 2 so they can be directly pasted under the class definition from Step 1. Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>

Specify model_id="Qwen3-8B-GGUF" in super().__init__() to use the model downloaded by gaia init --profile sd, not the default Qwen3-Coder-30B which isn't included in the SD profile. This fixes "model_load_error" when running the example. Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>

Replace simple system prompt with the actual SDAgent prompts that include:
- Research-backed prompt enhancement strategies
- Model-specific guidelines (SDXL-Turbo, SD-Turbo, etc.)
- Workflow instructions for tool usage
Tool schemas (parameters, models) are auto-injected by Agent base class. This fixes issues where LLM uses wrong models or parameters.
Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>

Add get_sd_system_prompt() method to SDToolsMixin so agents can compose system prompts from inherited mixins instead of manually importing prompt fragments.
Pattern:
- Mixins provide both tools AND domain-specific prompt fragments
- Agents compose them: return self.get_sd_system_prompt()
- Tool schemas auto-injected by Agent base class
Benefits: Single responsibility, reusability, composability.
Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>

Add template method pattern for automatic prompt composition:
- _get_mixin_prompts() - Auto-collects from inherited mixins
- _compose_system_prompt() - Composes mixin + agent prompts
- Mixins provide get_*_system_prompt() methods
Benefits:
- Mixins own their domain knowledge (tools + prompts)
- Agents automatically inherit behavior
- Can modify, extend, or override prompts
- Fully backwards compatible
SDToolsMixin now provides get_sd_system_prompt() with research-backed prompt engineering guidelines.
Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>
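
The template-method composition described in this commit could look roughly like the sketch below. The method names `_get_mixin_prompts()`, `_compose_system_prompt()`, `_get_system_prompt()`, `get_sd_system_prompt()` and `get_vlm_system_prompt()` appear in the commit messages; everything else (the MRO scan, the join order, the placeholder prompt text, and the `StoryAgent` class) is an assumption for illustration and not the GAIA implementation.

```python
class Agent:
    """Stand-in base class; only prompt composition is sketched here."""

    def _get_system_prompt(self) -> str:
        # Agent-specific instructions; an empty string means "mixin prompts only".
        return ""

    def _get_mixin_prompts(self) -> list:
        """Collect prompts from every get_*_system_prompt() method in the MRO."""
        prompts, seen = [], set()
        for cls in type(self).__mro__:
            for name, attr in vars(cls).items():
                is_prompt_method = (
                    name.startswith("get_")
                    and name.endswith("_system_prompt")
                    and callable(attr)
                )
                if is_prompt_method and name not in seen:
                    seen.add(name)
                    # getattr on the instance honors overrides in subclasses.
                    prompts.append(getattr(self, name)())
        return prompts

    def _compose_system_prompt(self) -> str:
        """Join mixin prompts and the agent's own prompt, skipping empty parts."""
        parts = self._get_mixin_prompts() + [self._get_system_prompt()]
        return "\n\n".join(p for p in parts if p)


class SDToolsMixin:
    def get_sd_system_prompt(self) -> str:
        return "SD guidelines (placeholder text)"


class VLMToolsMixin:
    def get_vlm_system_prompt(self) -> str:
        return "VLM guidelines (placeholder text)"


class StoryAgent(Agent, SDToolsMixin, VLMToolsMixin):  # hypothetical agent
    def _get_system_prompt(self) -> str:
        return "Write a short story after generating the image."


print(StoryAgent()._compose_system_prompt())
```

Because the agent's own prompt is appended separately, an agent that also returned a mixin prompt from `_get_system_prompt()` would include it twice in a scheme like this.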

Moved SD prompts from src/gaia/agents/sd/ to src/gaia/sd/ to keep mixins self-contained (tools + prompts in same package).
Changes:
- Created src/gaia/sd/prompts.py (moved from agents/sd/)
- SDToolsMixin imports from new location
- Added get_vlm_system_prompt() to VLMToolsMixin for consistency
- SDAgent uses mixin's get_sd_system_prompt() (no manual import)
- Deprecated old prompts.py location
Benefits:
- Mixins are truly self-contained (src/gaia/sd/ has everything for SD)
- Cleaner agent implementations (just call mixin methods)
- Better separation of concerns
Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>

Prompts now live with the mixin in src/gaia/sd/prompts.py. Removed deprecated file from src/gaia/agents/sd/. Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>

Added comprehensive documentation of composable system prompts:
- Part 1: Explains the simple usage (return self.get_sd_system_prompt())
- Part 2: Deep dive with 5 usage patterns and debugging guide
Patterns covered:
1. Use mixin prompts as-is (automatic)
2. Return mixin prompt explicitly
3. Extend with custom instructions
4. Modify mixin prompts
5. Custom composition order
Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>

Updated Part 1 example to show:
- Using mixin tools directly (generate_image, analyze_image)
- Creating custom tools that wrap mixin functionality
- Adding custom prompt instructions
Example now creates create_story_about_image() that wraps VLM's create_story_from_image with custom metadata, demonstrating the wrapper pattern for building specialized tools.
Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>

Changed _get_system_prompt from abstract to concrete method with empty string default. This allows agents using only mixin prompts to avoid implementing it.
Fixed SDAgent duplicate prompt bug:
- Was returning self.get_sd_system_prompt() causing duplication
- Now returns "" to use automatic mixin composition
Backwards compatible: existing agents continue to work.
Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>

Enhanced comments and docstrings to better explain:
- Why create a custom tool (specialization vs. generic)
- What the wrapper pattern adds (fixed style, metadata, extensibility)
- How it calls mixin methods via inheritance
Fixed incorrect system prompt explanation to accurately describe composition: mixin prompts + custom instructions.
Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>

Solved initialization order issues with hybrid approach:
- Mixins provide static base guidelines (no instance state needed)
- get_*_system_prompt() composes base + instance-specific gracefully
- Falls back to base if called before mixin initialization
Changes:
- SDToolsMixin: get_base_sd_guidelines() (static) + get_sd_system_prompt() (instance)
- VLMToolsMixin: get_base_vlm_guidelines() (static) + get_vlm_system_prompt() (instance)
- Agent: system_prompt now lazy property (composes on first access)
- Composition includes: mixin prompts + custom + tools + response format
Benefits:
- No initialization order issues
- Graceful degradation (works even if called early)
- Simple, robust, debuggable
Added debugging documentation with 5 methods to observe prompts.
Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>

.github/workflows/docs.yml

.github/workflows/pypi.yml

github-advanced-security

CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

- Fix VLMToolsMixin tool attribution: create_story_from_image is a custom tool, not provided by VLMToolsMixin
- Update SDAgent example in Part 2 to match actual implementation (includes educational style and file saving)
- Fix max_steps default references (20 for base Agent, 10 for SDAgent)
- Add educational style to Part 1 code examples
- Fix composition pattern examples to use mixin clients directly
- Update gaia init documentation to mention 8K context configuration
- Fix SD agent unit tests for composable system prompts architecture
- Fix util/lint.py to fall back to direct tool execution when uvx not available
- Remove vlm-tools-expansion.md issue template
- Correct tool counts: 5 from mixins + 1 custom = 6 total

Implements $PREV.field and $STEP_N.field placeholder syntax to resolve parameter dependencies between plan steps. This fixes the issue where agents would hallucinate placeholder paths instead of using actual tool results.
Core changes:
- Add _resolve_plan_parameters() method to Agent base class with recursion depth limit
- Integrate parameter resolution into plan execution loop
- Update SDAgent and example to use placeholder syntax in system prompts
- Clear step_results on error recovery to prevent stale data contamination
- Add comprehensive unit tests including edge cases
Documentation:
- Add "Multi-Step Planning with Dynamic Parameters" section to Part 2
- Simplify Part 1 by removing redundant sections and example output
- Add GitHub issue reporting and contact info to troubleshooting
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>

Remove redundant 're' import in _resolve_plan_parameters() - already imported at module level. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>

The 8K context was insufficient for multi-step planning with dynamic parameters. Increased to 16K to handle:
- SD system prompts with placeholder examples
- VLM system prompts
- Agent custom prompts
- Multi-step plan with tool schemas
- Conversation history
Changes:
- SDAgentConfig.ctx_size: 8192 → 16384
- SD profile min_context_size: 8192 → 16384
- Fix Pylint W0201: Initialize _ctx_verified and _ctx_warning in __init__
- Add warning when context verification fails for LLM models
- Update all documentation references from 8K to 16K
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>

When testing model inference during init, unload the model first if already loaded, then reload with save_options=True. This ensures recipe_options with ctx_size are properly persisted.
Also improved display to show warning emoji when context verification fails:
- ✓ Qwen3-8B-GGUF - OK (ctx: 16384) ← Success
- ✓ Qwen3-8B-GGUF - OK ⚠️ Context unverified! ← Warning
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>

Merged index.mdx and part-1-building-agent.mdx into a single streamlined playbook. Deleted parts 2 and 3 which were overly verbose and redundant.
Changes:
- Merged index intro, video, and quick test into main playbook
- Kept all hands-on tutorial content from part-1
- Removed architecture deep-dive (part-2) and variations (part-3)
- Updated docs.json navigation to show single page
- Result: One comprehensive, focused guide instead of fragmented multi-part tutorial
The consolidated guide maintains the strong structure of both original files while eliminating unnecessary complexity.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>

After image generation and story creation complete, the agent was creating additional stories in a loop. Added explicit instruction to provide final answer immediately after both tools complete. Fix prevents infinite loop detection warning when agent tries to call create_story_from_image multiple times. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>

…opping
Changed from single-repeat detection to consecutive-count detection. Agent now allows up to 3 consecutive identical tool calls before triggering the loop detector.
Changes:
- Replaced last_tool_call with tool_call_history (tracks last 5 calls)
- Count consecutive identical calls
- Trigger after 3 consecutive repeats (was: 1 repeat)
This allows legitimate use cases like "create 3 robot designs" while still preventing infinite loops.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>

Added max_consecutive_repeats parameter to Agent.__init__() to allow customization of how many consecutive identical tool calls are allowed before triggering loop detection.
Default: 4 consecutive calls (increased from hardcoded 3)
This allows users to adjust sensitivity:
- Lower value (2-3): More aggressive loop detection
- Higher value (5-10): More tolerant of repetition
- For SD agent: Default 4 allows multiple variations while preventing infinite loops
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>

Use console.print_repeated_tool_warning() exclusively instead of also logging. Console provides better user visibility and Rich formatting. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>

Fixed outdated comment that still referenced 8K context. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>

Fixed all references to:
- 8K context → 16K context for multi-step planning
- Updated context size table for all models
- Updated troubleshooting with correct ctx-size command
- Updated playbook reference from "3-part" to single comprehensive guide
- Removed broken link to deleted part-2-architecture.mdx
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 (1M context) <[email protected]>

kovtcharov and others added 5 commits February 2, 2026 21:36

Show concrete story example in system prompt for better formatting

287eaf6

remove pr desc

4e81d80

kovtcharov added this to the v0.15.3 milestone Feb 3, 2026

kovtcharov self-assigned this Feb 3, 2026

kovtcharov requested a review from kovtcharov-amd as a code owner February 3, 2026 06:07

kovtcharov added the documentation (Documentation changes) label Feb 3, 2026

github-actions bot added the devops (DevOps/infrastructure changes), agents (Agent system changes), llm (LLM backend changes), and performance (Performance-critical changes) labels Feb 3, 2026

kovtcharov and others added 17 commits February 2, 2026 22:13

lint

79b9957

Add introduction to SD agent playbook landing page

cc35d1a

Specify Python version in venv creation

f847d25

Indent Step 2 code for easy copy/paste under class

599d632

Remove deprecated prompts.py (moved to src/gaia/sd/)

0fd9456

github-actions bot added the eval (Evaluation framework changes), tests (Test changes), electron (Electron app changes), and security (Security-sensitive changes) labels Feb 4, 2026

github-advanced-security bot found potential problems Feb 4, 2026

.github/workflows/docs.yml Fixed

.github/workflows/pypi.yml Fixed

github-advanced-security bot found potential problems Feb 4, 2026

kovtcharov force-pushed the kalin/sd-playbook branch from 7d25d08 to e409fb3 Compare February 4, 2026 01:31

kovtcharov and others added 11 commits February 3, 2026 18:12

Merge branch 'main' into kalin/sd-playbook

ec17413

kovtcharov enabled auto-merge February 4, 2026 03:05

kovtcharov and others added 2 commits February 3, 2026 19:07

removed unnecessary files

db462b2

kovtcharov changed the title ~~Improve SD Agent Playbook and Reliability~~ Improve SD Agent Playbook and Agent Core Reliability Feb 4, 2026

kovtcharov-amd approved these changes Feb 4, 2026

kovtcharov force-pushed the kalin/sd-playbook branch from 375e666 to 52dd702 Compare February 4, 2026 03:47

lint

4bc1f72

kovtcharov added this pull request to the merge queue Feb 4, 2026

Merged via the queue into main with commit 3e921dd Feb 4, 2026
51 checks passed

kovtcharov deleted the kalin/sd-playbook branch February 4, 2026 04:38

kovtcharov merged 47 commits into main from kalin/sd-playbook

2 participants
