Computer Use Desktop Control Agent (CUA) - MCP Client Integration #224

@itomek

Description

Build a tool-agnostic MCP agent that enables GAIA to control desktop environments through external MCP
servers. The Computer Use Desktop Control Agent (CUA) acts as a transparent proxy, dynamically
discovering and forwarding tools from external MCP servers without hardcoding tool definitions.

This capability allows language models running in GAIA to interact with desktop applications, UI
elements, and system controls through a standardized MCP interface.

Motivation

Desktop automation capabilities are essential for enabling AI agents to assist with complex workflows
that span multiple applications. By wrapping external MCP servers that provide desktop control
primitives, GAIA can offer computer use capabilities while remaining agnostic to the underlying
implementation.

Technical Approach

Architecture Pattern

Follow the Docker MCP pattern as reference:

  • Inherit from MCPAgent and Agent base classes
  • Implement dynamic tool discovery via get_mcp_tool_definitions()
  • Implement tool execution proxying via execute_mcp_tool()
  • Use AgentMCPServer wrapper for MCP server implementation
  • Create standalone CLI following code agent pattern
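The two proxy methods named above could look roughly like the following minimal sketch. The method names come from this issue; their signatures, the `Transport` callable, and the `CUAAgentSketch` class are assumptions made for illustration, and the real class would inherit from GAIA's MCPAgent and Agent rather than stand alone. The `tools/list` and `tools/call` method names are the ones defined by the MCP specification.

```python
from typing import Any, Callable, Dict, List

# Hypothetical transport: sends one JSON-RPC method with params and returns
# the "result" object from the external MCP server's response.
Transport = Callable[[str, Dict[str, Any]], Dict[str, Any]]


class CUAAgentSketch:
    """Stand-in for a class that would inherit from MCPAgent and Agent."""

    def __init__(self, send: Transport) -> None:
        self._send = send

    def get_mcp_tool_definitions(self) -> List[Dict[str, Any]]:
        # Forward whatever the external server advertises; nothing is hardcoded.
        return self._send("tools/list", {}).get("tools", [])

    def execute_mcp_tool(self, tool_name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
        # Proxy the call unchanged to the external server.
        return self._send("tools/call", {"name": tool_name, "arguments": arguments})


def fake_send(method: str, params: Dict[str, Any]) -> Dict[str, Any]:
    """In-memory fake server used only to exercise the sketch."""
    if method == "tools/list":
        return {"tools": [{"name": "click", "inputSchema": {"type": "object"}}]}
    return {"content": [{"type": "text", "text": f"ran {params['name']}"}]}
```

Because the agent only forwards names and arguments, swapping in a different desktop-control server should not require any change to the agent itself.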

Core Design Principles

  1. Tool-Agnostic Design
     • Agent must NOT hardcode tool definitions
     • Tools are discovered dynamically from external MCP server at runtime
     • Agent remains compatible with any MCP server providing desktop control tools
  2. Stateless Proxy
     • Agent doesn't maintain tool state
     • Each request is independent
     • External MCP server handles all state management
  3. Graceful Degradation
     • Agent assumes external MCP server is already running
     • Returns user-friendly error messages when server unavailable
     • No automatic retry or recovery mechanisms
  4. No Process Management
     • Agent does NOT start, stop, or monitor external MCP servers
     • Agent ONLY connects to already-running servers
     • Clear error messages guide users to start external server
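As an illustration of the graceful-degradation and no-process-management principles, a single-attempt call wrapper might be sketched as below. The function name `call_without_retry`, the result shape, and the message wording are hypothetical and not taken from the GAIA codebase.

```python
from typing import Any, Callable, Dict


def call_without_retry(send: Callable[[], Dict[str, Any]], server_url: str) -> Dict[str, Any]:
    """Try the external server exactly once and never attempt to start it."""
    try:
        return {"status": "ok", "result": send()}
    except OSError as exc:
        # ConnectionError and TimeoutError are both subclasses of OSError.
        return {
            "status": "error",
            "error": (
                f"Cannot reach the external MCP server at {server_url}. "
                f"Start the server, then try again. ({exc})"
            ),
        }
```

Returning a structured error instead of raising keeps the failure visible to the caller while leaving recovery entirely to the user.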

Connection Modes

Support multiple connection types:

  • stdio: Connect via stdin/stdout using JSON-RPC 2.0
  • HTTP: Connect via HTTP endpoint
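Both modes carry the same JSON-RPC 2.0 message and differ only in framing. The sketch below shows one plausible way to build a request and frame it for each mode; the helper names and the example URL are hypothetical, and the exact HTTP headers an external server expects may differ from what is shown.

```python
import json
import urllib.request
from typing import Any, Dict


def make_request(request_id: int, method: str, params: Dict[str, Any]) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request object."""
    return {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}


def frame_for_stdio(message: Dict[str, Any]) -> bytes:
    """Serialize as one newline-terminated line to write to the server's stdin."""
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


def frame_for_http(message: Dict[str, Any], url: str) -> urllib.request.Request:
    """Wrap the same message in an HTTP POST; the request is built but not sent."""
    return urllib.request.Request(
        url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

For example, `frame_for_stdio(make_request(1, "tools/list", {}))` yields a single line that can be written to the subprocess pipe of an already-running server, while `frame_for_http` produces an object that `urllib.request.urlopen` could send.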

Success Criteria

Functional Requirements

  • Agent wraps external MCP servers providing desktop control capabilities
  • Dynamic tool discovery - no hardcoded tool definitions
  • Graceful error handling when external server unavailable
  • Lemonade can discover available tools through agent
  • Lemonade can execute discovered tools through agent
  • Standalone CLI interface
  • Response times <5 seconds end-to-end
  • Memory footprint <16GB

Quality Requirements

  • Test coverage >90% for agent code
  • Documentation follows existing GAIA patterns
  • Clean integration with GAIA codebase
  • Follows established MCP agent patterns

Implementation Requirements

Agent Class

The CUA agent must:

  • Support both stdio and HTTP connection modes
  • Track server availability with graceful degradation
  • Cache discovered tools to minimize external requests
  • Provide clear, user-friendly error messages
  • Use JSON-RPC 2.0 protocol for external server communication
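The caching and availability-tracking requirements could be combined in a small helper like the one sketched here. `ToolCache`, its `fetch` callable, and the `server_available` flag are illustrative names, not existing GAIA APIs, and the invalidation policy is left open.

```python
from typing import Any, Callable, Dict, List, Optional


class ToolCache:
    """Caches the discovered tool list and records whether the server answered."""

    def __init__(self, fetch: Callable[[], List[Dict[str, Any]]]) -> None:
        self._fetch = fetch  # hypothetical callable that performs tool discovery
        self._tools: Optional[List[Dict[str, Any]]] = None
        self.server_available = False

    def get_tools(self) -> List[Dict[str, Any]]:
        if self._tools is not None:
            return self._tools  # served from cache, no external request
        try:
            self._tools = self._fetch()
            self.server_available = True
        except OSError:
            # Degrade gracefully: report no tools rather than raising.
            self.server_available = False
            return []
        return self._tools

    def invalidate(self) -> None:
        """Drop the cached list so the next call queries the server again."""
        self._tools = None
```

Failed discoveries are deliberately not cached, so a later call can succeed once the user has started the external server.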

MCP Server Launcher

Create a launcher that:

  • Uses AgentMCPServer wrapper
  • Accepts configuration (port, host, verbosity, server URL)
  • Follows Docker MCP reference pattern

Standalone CLI

Create a standalone CLI that:

  • Follows code agent CLI pattern
  • Provides standard options (port, host, verbose, mcp-server-url)
  • Runs as: python -m gaia.agents.os_automation.cli
  • Not integrated into main gaia/cli.py
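A minimal argument parser for the options listed above might look like the following sketch. The option names follow this issue; the default values and help text are assumptions, and the actual code agent CLI pattern in GAIA may structure this differently.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="python -m gaia.agents.os_automation.cli",
        description="Standalone CLI sketch for the CUA MCP agent.",
    )
    # Defaults below are placeholders, not values taken from GAIA.
    parser.add_argument("--host", default="localhost", help="Host to bind the MCP server to")
    parser.add_argument("--port", type=int, default=8080, help="Port to bind the MCP server to")
    parser.add_argument("--verbose", action="store_true", help="Enable verbose logging")
    parser.add_argument("--mcp-server-url", default=None, help="URL of the already-running external MCP server")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    return build_parser().parse_args(argv)
```

Keeping the parser in its own function makes the options testable without launching a server.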

Testing Strategy

Unit Tests

Mock external server connections and verify:

  • Graceful initialization when server unavailable
  • Tool discovery from external server
  • Tool execution proxying
  • Error handling and user-friendly messages
  • Connection mode support (stdio and HTTP)
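One way to mock the external connection is with `unittest.mock`, as in the sketch below. The `discover_tools` function and the client's `list_tools` method are hypothetical stand-ins for whatever the agent ends up exposing; only the mocking technique is the point.

```python
import unittest
from unittest.mock import Mock


def discover_tools(client):
    """Hypothetical function under test: returns (tools, error_message)."""
    try:
        return client.list_tools(), None
    except OSError:
        return [], "External MCP server is unavailable. Start it and try again."


class DiscoverToolsTest(unittest.TestCase):
    def test_returns_tools_from_server(self):
        client = Mock()
        client.list_tools.return_value = [{"name": "screenshot"}]
        tools, error = discover_tools(client)
        self.assertEqual(tools, [{"name": "screenshot"}])
        self.assertIsNone(error)
        client.list_tools.assert_called_once_with()

    def test_friendly_error_when_server_unavailable(self):
        client = Mock()
        # Simulate a server that is not running.
        client.list_tools.side_effect = ConnectionRefusedError("refused")
        tools, error = discover_tools(client)
        self.assertEqual(tools, [])
        self.assertIn("unavailable", error)
```

The same pattern extends to tool execution and to each connection mode by swapping the mocked client.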

Integration Tests

Test with real external MCP servers:

  • Tool discovery from live server
  • Tool execution through agent
  • Full MCP server startup workflow
  • Lemonade integration and tool usage

Reference Files

Study these files to understand the implementation pattern:

  1. gaia/mcp/servers/docker_mcp.py - MCP server launcher reference
  2. gaia/agents/docker/agent.py - Docker agent implementation
  3. gaia/agents/base/mcp_agent.py - Base class for MCP agents
  4. gaia/mcp/agent_mcp_server.py - Generic MCP server wrapper
  5. gaia/apps/code/cli.py - Code agent standalone CLI pattern

Deliverables

  • CUA agent implementation with dynamic tool discovery
  • Graceful error handling for unavailable external servers
  • Support for stdio and HTTP connection modes
  • Standalone CLI implementation
  • MCP server launcher
  • Unit tests achieving >90% coverage
  • Integration tests with external MCP servers
  • Documentation matching GAIA patterns

Labels

  • agents: Agent system changes
  • cua: Computer Use Agent
