Description
Build a tool-agnostic MCP agent that enables GAIA to control desktop environments through external MCP
servers. The Computer Use Desktop Control Agent (CUA) acts as a transparent proxy, dynamically
discovering and forwarding tools from external MCP servers without hardcoding tool definitions.
This capability allows language models running in GAIA to interact with desktop applications, UI
elements, and system controls through a standardized MCP interface.
Motivation
Desktop automation capabilities are essential for enabling AI agents to assist with complex workflows
that span multiple applications. By wrapping external MCP servers that provide desktop control
primitives, GAIA can offer computer use capabilities while remaining agnostic to the underlying
implementation.
Technical Approach
Architecture Pattern
Follow the Docker MCP pattern as reference:
- Inherit from MCPAgent and Agent base classes
- Implement dynamic tool discovery via get_mcp_tool_definitions()
- Implement tool execution proxying via execute_mcp_tool()
- Use AgentMCPServer wrapper for MCP server implementation
- Create standalone CLI following code agent pattern
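As a rough illustration of the pattern above, the sketch below shows a proxy whose two methods only forward to an external server. It is not the GAIA implementation: MCPAgentStub is a stand-in for the real MCPAgent base class (whose exact signatures may differ), and Transport, DesktopControlProxy, and fake_transport are hypothetical names introduced for this example. The method names tools/list and tools/call come from the MCP specification.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List

# Hypothetical transport: sends one JSON-RPC method call to the external
# MCP server and returns the "result" object of the response.
Transport = Callable[[str, Dict[str, Any]], Dict[str, Any]]


class MCPAgentStub(ABC):
    """Stand-in for GAIA's MCPAgent base class; real signatures may differ."""

    @abstractmethod
    def get_mcp_tool_definitions(self) -> List[Dict[str, Any]]:
        ...

    @abstractmethod
    def execute_mcp_tool(self, tool_name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
        ...


class DesktopControlProxy(MCPAgentStub):
    """Forwards discovery and execution to an external server; defines no tools itself."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport

    def get_mcp_tool_definitions(self) -> List[Dict[str, Any]]:
        # "tools/list" is the MCP method for enumerating a server's tools.
        return self._transport("tools/list", {}).get("tools", [])

    def execute_mcp_tool(self, tool_name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
        # "tools/call" is the MCP method for invoking a tool by name.
        return self._transport("tools/call", {"name": tool_name, "arguments": arguments})


def fake_transport(method: str, params: Dict[str, Any]) -> Dict[str, Any]:
    """In-memory substitute for a real server, used only for this demo."""
    if method == "tools/list":
        return {"tools": [{"name": "click", "description": "Click at x,y",
                           "inputSchema": {"type": "object"}}]}
    if method == "tools/call":
        return {"content": [{"type": "text", "text": f"ran {params['name']}"}],
                "isError": False}
    raise ValueError(f"unsupported method: {method}")
```

Because the proxy only sees whatever the transport returns, swapping the external server changes the advertised tools without touching the agent code.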
Core Design Principles
- Tool-Agnostic Design
  - Agent must NOT hardcode tool definitions
  - Tools are discovered dynamically from external MCP server at runtime
  - Agent remains compatible with any MCP server providing desktop control tools
- Stateless Proxy
  - Agent doesn't maintain tool state
  - Each request is independent
  - External MCP server handles all state management
- Graceful Degradation
  - Agent assumes external MCP server is already running
  - Returns user-friendly error messages when server unavailable
  - No automatic retry or recovery mechanisms
- No Process Management
  - Agent does NOT start, stop, or monitor external MCP servers
  - Agent ONLY connects to already-running servers
  - Clear error messages guide users to start external server
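The graceful-degradation and no-process-management principles can be sketched as a single forwarding function that tries once and turns a connection failure into a readable message. This is only an assumption about how it might look: forward, Send, and the wording of the error message are hypothetical and not taken from GAIA.

```python
from typing import Any, Callable, Dict

# Hypothetical callable that performs one request against the external server.
Send = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def forward(send: Send, method: str, params: Dict[str, Any]) -> Dict[str, Any]:
    """Forward one request. No retry, no attempt to start the server."""
    try:
        return {"ok": True, "result": send(method, params)}
    except (ConnectionError, TimeoutError) as exc:
        # Single attempt only: report the failure and tell the user what to do.
        return {
            "ok": False,
            "error": (
                "Could not reach the external MCP server "
                f"({type(exc).__name__}). Start the server and try again."
            ),
        }
```

Each call is independent and keeps no state between requests, so a server that comes up later is picked up by the next request without any recovery logic in the agent.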
Connection Modes
Support multiple connection types:
- stdio: Connect via stdin/stdout using JSON-RPC 2.0
- HTTP: Connect via HTTP endpoint
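To make the two modes concrete, the sketch below builds one JSON-RPC 2.0 request and prepares it for each transport without sending anything. It assumes the stdio transport uses newline-delimited JSON (as the MCP specification describes) and that the HTTP endpoint accepts a JSON POST body. The helper names and any URL passed in are hypothetical.

```python
import json
import urllib.request
from itertools import count
from typing import Any, Dict, Optional

_ids = count(1)  # JSON-RPC request ids; any unique value per request works


def build_request(method: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request object."""
    return {"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params or {}}


def encode_for_stdio(message: Dict[str, Any]) -> bytes:
    """One message per line: json.dumps escapes embedded newlines in strings."""
    return (json.dumps(message) + "\n").encode("utf-8")


def build_http_request(url: str, message: Dict[str, Any]) -> urllib.request.Request:
    """Prepare (but do not send) an HTTP POST carrying the same message."""
    return urllib.request.Request(
        url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_response(raw: str) -> Dict[str, Any]:
    """Return the 'result' member, or raise if the server sent an 'error' member."""
    message = json.loads(raw)
    if "error" in message:
        raise RuntimeError(message["error"].get("message", "unknown JSON-RPC error"))
    return message["result"]
```

Keeping message construction separate from the transport means the same request object can be written to a subprocess pipe or posted to an endpoint, which is one way to keep the agent logic identical across both modes.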
Success Criteria
Functional Requirements
- Agent wraps external MCP servers providing desktop control capabilities
- Dynamic tool discovery - no hardcoded tool definitions
- Graceful error handling when external server unavailable
- Lemonade can discover available tools through agent
- Lemonade can execute discovered tools through agent
- Standalone CLI interface
- Response times <5 seconds end-to-end
- Memory footprint <16GB
Quality Requirements
- Test coverage >90% for agent code
- Documentation follows existing GAIA patterns
- Clean integration with GAIA codebase
- Follows established MCP agent patterns
Implementation Requirements
Agent Class
The CUA agent must:
- Support both stdio and HTTP connection modes
- Track server availability with graceful degradation
- Cache discovered tools to minimize external requests
- Provide clear, user-friendly error messages
- Use JSON-RPC 2.0 protocol for external server communication
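One possible reading of the caching and availability requirements is sketched below: tool definitions are fetched once and reused, while a failed fetch marks the server unavailable and caches nothing. CachedToolDiscovery and its fetch_tools argument are hypothetical names. Only tool definitions are cached here, which is an assumption about how the caching requirement fits with the stateless-proxy principle.

```python
from typing import Any, Callable, Dict, List, Optional

ToolList = List[Dict[str, Any]]


class CachedToolDiscovery:
    """Caches tool definitions and tracks whether the last fetch succeeded."""

    def __init__(self, fetch_tools: Callable[[], ToolList]) -> None:
        self._fetch = fetch_tools
        self._tools: Optional[ToolList] = None
        self.available = False

    def get_tools(self) -> ToolList:
        if self._tools is not None:
            return self._tools  # served from cache, no external request
        try:
            tools = self._fetch()
        except (ConnectionError, TimeoutError):
            # Degrade gracefully: report no tools and do not cache the failure.
            self.available = False
            return []
        self.available = True
        self._tools = tools
        return tools

    def invalidate(self) -> None:
        """Drop the cache so the next call re-discovers tools."""
        self._tools = None
```

A failed fetch is not retried inside get_tools; the next caller simply triggers a fresh attempt, so the agent recovers once the user has started the server.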
MCP Server Launcher
Create launcher that:
- Uses AgentMCPServer wrapper
- Accepts configuration (port, host, verbosity, server URL)
- Follows Docker MCP reference pattern
Standalone CLI
Create standalone CLI that:
- Follows code agent CLI pattern
- Provides standard options (port, host, verbose, mcp-server-url)
- Runs as: python -m gaia.agents.os_automation.cli
- Not integrated into main gaia/cli.py
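A minimal argparse sketch of the options listed above might look like the following. The flag spellings are derived from the option names in this issue, while the default values and the body of main are placeholders rather than decisions made by the spec.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="python -m gaia.agents.os_automation.cli",
        description="Run the desktop control agent as an MCP server.",
    )
    # Defaults below are placeholders, not values specified by this issue.
    parser.add_argument("--host", default="localhost", help="Host to bind the agent's MCP server to")
    parser.add_argument("--port", type=int, default=8080, help="Port to bind the agent's MCP server to")
    parser.add_argument("--verbose", action="store_true", help="Enable verbose logging")
    parser.add_argument("--mcp-server-url", default=None,
                        help="URL of the already-running external MCP server")
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    # A real CLI would construct the agent and start the launcher here.
    print(f"Would serve on {args.host}:{args.port}, proxying to {args.mcp_server_url}")
    return 0
```

In an actual cli.py module, main() would be called under an `if __name__ == "__main__":` guard so that the `python -m` invocation works.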
Testing Strategy
Unit Tests
Mock external server connections and verify:
- Graceful initialization when server unavailable
- Tool discovery from external server
- Tool execution proxying
- Error handling and user-friendly messages
- Connection mode support (stdio and HTTP)
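The unit-test checks above can be sketched with unittest.mock, replacing the external connection with a Mock. ProxyUnderTest is a hypothetical, simplified stand-in for the real agent so that the example is self-contained. The actual tests would import the agent class instead.

```python
import unittest
from unittest.mock import Mock


class ProxyUnderTest:
    """Simplified stand-in for the agent; the transport is injected for mocking."""

    def __init__(self, transport):
        self.transport = transport

    def get_mcp_tool_definitions(self):
        try:
            return self.transport("tools/list", {})["tools"]
        except ConnectionError:
            return []

    def execute_mcp_tool(self, name, arguments):
        try:
            return self.transport("tools/call", {"name": name, "arguments": arguments})
        except ConnectionError:
            text = "External MCP server is not reachable. Start it and try again."
            return {"isError": True, "content": [{"type": "text", "text": text}]}


class ProxyTests(unittest.TestCase):
    def test_discovery_comes_from_server(self):
        transport = Mock(return_value={"tools": [{"name": "screenshot"}]})
        tools = ProxyUnderTest(transport).get_mcp_tool_definitions()
        self.assertEqual(tools, [{"name": "screenshot"}])
        transport.assert_called_once_with("tools/list", {})

    def test_execution_is_forwarded_unchanged(self):
        transport = Mock(return_value={"isError": False, "content": []})
        ProxyUnderTest(transport).execute_mcp_tool("click", {"x": 10, "y": 20})
        transport.assert_called_once_with(
            "tools/call", {"name": "click", "arguments": {"x": 10, "y": 20}}
        )

    def test_unavailable_server_yields_friendly_error(self):
        transport = Mock(side_effect=ConnectionRefusedError("refused"))
        proxy = ProxyUnderTest(transport)
        self.assertEqual(proxy.get_mcp_tool_definitions(), [])
        result = proxy.execute_mcp_tool("click", {})
        self.assertTrue(result["isError"])
        self.assertIn("Start it", result["content"][0]["text"])
```

Injecting the transport keeps these tests free of real sockets or subprocesses, and the same cases can be repeated with a stdio-style and an HTTP-style mock to cover both connection modes.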
Integration Tests
Test with real external MCP servers:
- Tool discovery from live server
- Tool execution through agent
- Full MCP server startup workflow
- Lemonade integration and tool usage
Reference Files
Study these files to understand the implementation pattern:
- gaia/mcp/servers/docker_mcp.py - MCP server launcher reference
- gaia/agents/docker/agent.py - Docker agent implementation
- gaia/agents/base/mcp_agent.py - Base class for MCP agents
- gaia/mcp/agent_mcp_server.py - Generic MCP server wrapper
- gaia/apps/code/cli.py - Code agent standalone CLI pattern
Deliverables
- CUA agent implementation with dynamic tool discovery
- Graceful error handling for unavailable external servers
- Support for stdio and HTTP connection modes
- Standalone CLI implementation
- MCP server launcher
- Unit tests achieving >90% coverage
- Integration tests with external MCP servers
- Documentation matching GAIA patterns