
Implement AgentRuntime interface and type contracts #3

@alexey-pelykh

Context

RemoteClaw is middleware that connects CLI-based AI agents (Claude, Gemini, Codex, OpenCode) to messaging channels (WhatsApp, Telegram, Slack, Discord, etc.). It replaces the upstream OpenClaw execution engine (an in-process Pi-based orchestrator) with a subprocess model where each agent CLI runs as a child process.

This issue defines the foundational type contracts that every other middleware component depends on.

Problem

The middleware layer (src/middleware/) does not yet exist. All downstream work (CLI runtime implementations, event extractors, error classifier, session map, channel bridge, MCP server, delivery adapter) depends on the type contracts defined here.

The key architectural insight is that in the subprocess model, the delivery pipeline has two sources of truth:

  1. CLI subprocess output — text, session info, usage, timing, abort status (captured as AgentRunResult)
  2. Gateway-side MCP server — what messages the agent sent, how many cron jobs were added (captured as McpSideEffects)

These combine into AgentDeliveryResult, which is what the delivery pipeline consumers receive.

Architecture

CLI subprocess (claude/gemini/codex/opencode)
    |
    |  NDJSON stream → AgentEvent[] → AgentRunResult
    |
    v
ChannelBridge.handle()
    |                   \
    |  AgentRunResult    \  McpSideEffects (from MCP server)
    |                     \
    v                      v
AgentDeliveryResult = AgentRunResult + McpSideEffects + derived payloads
    |
    v
Delivery pipeline
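The left-hand arrow of the diagram (event stream to AgentRunResult) can be sketched as a small consumer. This is a minimal sketch with simplified local types. The field names (`result`, `message`) and the policy of rejecting when the stream ends without a `done` event are illustrative assumptions, not part of the contract.

```typescript
// Simplified stand-ins for the contracts in src/middleware/types.ts (shapes assumed).
type RunResult = { text: string; sessionId: string | undefined; durationMs: number; aborted: boolean };

type Event =
  | { type: "text"; text: string }
  | { type: "error"; message: string }
  | { type: "done"; result: RunResult };

// Drain the event stream and return the AgentRunResult carried by the `done` event.
// Hypothetical policy: a stream that ends without `done` is treated as a failure.
async function collectRunResult(events: AsyncIterable<Event>): Promise<RunResult> {
  const errors: string[] = [];
  for await (const event of events) {
    if (event.type === "error") errors.push(event.message);
    if (event.type === "done") return event.result;
  }
  throw new Error(`stream ended without a done event (${errors.length} error event(s))`);
}

// Usage: a fake stream standing in for a CLI subprocess.
async function* fakeStream(): AsyncGenerator<Event> {
  yield { type: "text", text: "Hello" };
  yield { type: "done", result: { text: "Hello", sessionId: "s-1", durationMs: 12, aborted: false } };
}

collectRunResult(fakeStream()).then((r) => console.log(r.text)); // prints "Hello"
```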

Tasks

Create src/middleware/types.ts with the following type contracts:

1. AgentRuntime interface

The core interface that all CLI runtime implementations (Claude, Gemini, Codex, OpenCode) will implement:

export interface AgentRuntime {
    execute(params: AgentExecuteParams): AsyncIterable<AgentEvent>;
}
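To show what implementing the interface involves, here is a hypothetical in-memory runtime. It is a sketch only: `EchoRuntime`, the `prompt`/`abortSignal` field names, and the `done` payload shape are assumptions, and a real runtime would spawn the CLI as a child process and translate its output. An async generator method satisfies `AsyncIterable<AgentEvent>` directly.

```typescript
// Minimal local versions of the contracts (field names are assumptions).
type AgentExecuteParams = { prompt: string; abortSignal?: AbortSignal };
type AgentEvent =
  | { type: "text"; text: string }
  | { type: "done"; result: { text: string; aborted: boolean } };

interface AgentRuntime {
  execute(params: AgentExecuteParams): AsyncIterable<AgentEvent>;
}

// Hypothetical runtime that echoes the prompt word by word instead of running a CLI.
class EchoRuntime implements AgentRuntime {
  async *execute(params: AgentExecuteParams): AsyncGenerator<AgentEvent> {
    let text = "";
    for (const word of params.prompt.split(" ")) {
      // If the caller aborted, stop early but still emit a terminal `done` event.
      if (params.abortSignal?.aborted) {
        yield { type: "done", result: { text, aborted: true } };
        return;
      }
      text += (text ? " " : "") + word;
      yield { type: "text", text: word };
    }
    yield { type: "done", result: { text, aborted: false } };
  }
}
```

Callers consume any runtime the same way, with `for await (const event of runtime.execute(params))`, so the bridge does not need to know which CLI backs it.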

2. AgentExecuteParams

Input to execute() — the prompt, session context, MCP config, abort signal, working directory, environment variables.
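The issue lists the inputs but not their names or types, so the following is one possible shape, offered as a sketch. Every field name here is hypothetical, the stdio-style `command`/`args` MCP entry is an assumed transport, and `remoteclaw-mcp` is a made-up command.

```typescript
// Hypothetical MCP server entry (stdio-style transport assumed).
type McpServerConfig = { command: string; args: string[]; env?: Record<string, string> };

// Hypothetical shape for AgentExecuteParams.
type AgentExecuteParams = {
  prompt: string;                              // user message forwarded to the CLI
  sessionId?: string | undefined;              // resume an existing CLI session, if any
  mcpServers: Record<string, McpServerConfig>; // gateway-side MCP server(s) the CLI connects to
  abortSignal?: AbortSignal | undefined;       // cancels the subprocess
  workingDirectory: string;                    // cwd for the child process
  env: Record<string, string>;                 // extra environment variables for the child
};

// Example: resuming a session with one MCP server attached (all values are made up).
const params: AgentExecuteParams = {
  prompt: "Summarize today's messages",
  sessionId: "abc123",
  mcpServers: { remoteclaw: { command: "remoteclaw-mcp", args: ["--stdio"] } },
  workingDirectory: "/tmp/agent-workspace",
  env: { NO_COLOR: "1" },
};
```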

3. AgentEvent discriminated union

Events emitted during CLI subprocess execution:

Event type    Purpose
text          Text content from the agent
tool_use      Agent is invoking a tool
tool_result   Tool execution result
error         Error during execution
done          Execution complete; carries the AgentRunResult
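A discriminated union keyed on `type` lets consumers switch exhaustively over events. In this sketch, only the five `type` tags come from the issue; the payload fields on each variant (`toolUseId`, `toolName`, `isError`, `result`, and so on) are assumptions.

```typescript
type AgentRunResult = { text: string; aborted: boolean }; // trimmed for the example

// Payload fields are assumed; the tags match the table above.
type AgentEvent =
  | { type: "text"; text: string }
  | { type: "tool_use"; toolUseId: string; toolName: string; input: unknown }
  | { type: "tool_result"; toolUseId: string; output: string; isError: boolean }
  | { type: "error"; message: string }
  | { type: "done"; result: AgentRunResult };

// Compile-time exhaustiveness guard: a new event type that is not handled below fails to build.
function assertNever(x: never): never {
  throw new Error(`unhandled event: ${JSON.stringify(x)}`);
}

function describeEvent(event: AgentEvent): string {
  switch (event.type) {
    case "text":
      return `text(${event.text.length} chars)`;
    case "tool_use":
      return `tool_use(${event.toolName})`;
    case "tool_result":
      return `tool_result(${event.isError ? "error" : "ok"})`;
    case "error":
      return `error(${event.message})`;
    case "done":
      return `done(aborted=${event.result.aborted})`;
    default:
      return assertNever(event);
  }
}
```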

4. AgentRunResult

Final CLI output summary:

export type AgentRunResult = {
    text: string;
    sessionId: string | undefined;
    durationMs: number;
    usage: AgentUsage | undefined;
    aborted: boolean;
    totalCostUsd?: number | undefined;
    apiDurationMs?: number | undefined;
    numTurns?: number | undefined;
    stopReason?: string | undefined;
    errorSubtype?: string | undefined;
    permissionDenials?: PermissionDenial[] | undefined;
};
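The `?: T | undefined` spelling on the optional fields matters if the project compiles with TypeScript's `exactOptionalPropertyTypes` flag (the issue does not say whether it does). With that flag, `?: T` alone rejects an explicitly assigned `undefined`, which would make field-by-field mapping from a parsed CLI record awkward. The sketch below uses a trimmed copy of AgentRunResult and a made-up `RawFinal` record; neither `RawFinal` nor `toRunResult` is part of the spec.

```typescript
type AgentUsage = { inputTokens: number; outputTokens: number }; // trimmed

// Trimmed copy of AgentRunResult (some optional fields omitted).
type AgentRunResult = {
  text: string;
  sessionId: string | undefined;
  durationMs: number;
  usage: AgentUsage | undefined;
  aborted: boolean;
  totalCostUsd?: number | undefined;
  numTurns?: number | undefined;
  stopReason?: string | undefined;
};

// Hypothetical parsed "final" record from a CLI; each backend reports a different subset.
type RawFinal = { text?: string; session?: string; costUsd?: number; turns?: number; stop?: string };

// Fields are copied straight across. Because the optional fields accept `undefined`
// explicitly, this compiles even with exactOptionalPropertyTypes enabled.
function toRunResult(raw: RawFinal, durationMs: number, aborted: boolean): AgentRunResult {
  return {
    text: raw.text ?? "",
    sessionId: raw.session,
    durationMs,
    usage: undefined, // this hypothetical backend reports no token counts
    aborted,
    totalCostUsd: raw.costUsd,
    numTurns: raw.turns,
    stopReason: raw.stop,
  };
}
```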

5. McpSideEffects and McpMessageTarget

Gateway-side MCP server tracking:

export type McpMessageTarget = {
    tool: string;       // e.g., "message", "sessions_send", "telegram_send"
    provider: string;   // e.g., "telegram", "discord", "slack"
    accountId?: string;
    to?: string;        // target identifier (chat ID, channel ID, etc.)
};

export type McpSideEffects = {
    sentTexts: string[];
    sentMediaUrls: string[];
    sentTargets: McpMessageTarget[];
    cronAdds: number;
};
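On the gateway side, something has to accumulate these values while the CLI runs. Below is a minimal sketch of such an accumulator. The class name and `record*` methods are hypothetical; `drain()` mirrors the `mcpServer.drain()` call mentioned later in this issue. It assumes one tracker per run, so concurrent runs would need separate instances (for example, keyed by session).

```typescript
type McpMessageTarget = { tool: string; provider: string; accountId?: string; to?: string };
type McpSideEffects = { sentTexts: string[]; sentMediaUrls: string[]; sentTargets: McpMessageTarget[]; cronAdds: number };

// Hypothetical per-run accumulator inside the gateway-side MCP server.
// Tool handlers call record*(); ChannelBridge calls drain() after the CLI run finishes.
class McpSideEffectTracker {
  private effects: McpSideEffects = McpSideEffectTracker.empty();

  private static empty(): McpSideEffects {
    return { sentTexts: [], sentMediaUrls: [], sentTargets: [], cronAdds: 0 };
  }

  recordSend(target: McpMessageTarget, text?: string, mediaUrl?: string): void {
    this.effects.sentTargets.push(target);
    if (text !== undefined) this.effects.sentTexts.push(text);
    if (mediaUrl !== undefined) this.effects.sentMediaUrls.push(mediaUrl);
  }

  recordCronAdd(): void {
    this.effects.cronAdds += 1;
  }

  // Return everything recorded so far and reset to zero values for the next run.
  drain(): McpSideEffects {
    const out = this.effects;
    this.effects = McpSideEffectTracker.empty();
    return out;
  }
}
```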

6. AgentDeliveryResult

The three-type delivery contract (AgentRunResult + McpSideEffects = AgentDeliveryResult):

export type AgentDeliveryResult = {
    payloads: ReplyPayload[];
    run: AgentRunResult;
    mcp: McpSideEffects;
    error?: string | undefined;
};
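Combining the two sources of truth is then a small pure function. This sketch applies the `text ? [{ text }] : []` payload rule stated under Design Decisions. The `ReplyPayload` shape and the `buildDeliveryResult` name are assumptions, and the other types are trimmed copies.

```typescript
// ReplyPayload's real shape is not given in the issue; only `text` is used here.
type ReplyPayload = { text?: string; mediaUrl?: string };
type AgentRunResult = { text: string; sessionId: string | undefined; durationMs: number; aborted: boolean };
type McpSideEffects = { sentTexts: string[]; sentMediaUrls: string[]; sentTargets: unknown[]; cronAdds: number };
type AgentDeliveryResult = { payloads: ReplyPayload[]; run: AgentRunResult; mcp: McpSideEffects; error?: string | undefined };

// Hypothetical helper: wrap CLI output and MCP side effects without flattening either.
function buildDeliveryResult(run: AgentRunResult, mcp: McpSideEffects, error?: string): AgentDeliveryResult {
  // Empty CLI text yields no payloads, so consumers never receive a blank reply.
  const payloads: ReplyPayload[] = run.text ? [{ text: run.text }] : [];
  return { payloads, run, mcp, error };
}
```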

Why composite (.run + .mcp) instead of flat?

  • Clean separation: CLI output vs gateway-side tracking
  • Self-documenting: result.mcp.sentTexts is clear about origin
  • No field collisions
  • ChannelBridge constructs it naturally: { payloads, run: agentResult, mcp: mcpServer.drain() }

7. Supporting types

  • AgentUsage — token counts (input, output, cache read/write)
  • PermissionDenial — permission denial tracking
  • ChannelMessage — incoming message from a channel
  • BridgeCallbacks — streaming callbacks (onPartialReply, onBlockReply, onToolResult)
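The issue names the supporting types and their purpose but not their fields. The following is a sketch of plausible shapes; every field name and callback signature here is an assumption, and `totalTokens` is a hypothetical helper.

```typescript
// Token counts for one run (field names assumed).
type AgentUsage = {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheWriteTokens: number;
};

type PermissionDenial = { toolName: string; reason?: string | undefined };

// Incoming message from a channel (field names assumed).
type ChannelMessage = {
  provider: string; // e.g. "telegram"
  chatId: string;
  senderId: string;
  text: string;
  receivedAt: Date;
};

// Optional streaming hooks the bridge may invoke as events arrive (signatures assumed).
type BridgeCallbacks = {
  onPartialReply?: (text: string) => void | Promise<void>;
  onBlockReply?: (text: string) => void | Promise<void>;
  onToolResult?: (toolName: string, output: string) => void | Promise<void>;
};

// Total tokens reported for a run, including cache reads and writes.
function totalTokens(u: AgentUsage): number {
  return u.inputTokens + u.outputTokens + u.cacheReadTokens + u.cacheWriteTokens;
}
```

Making every callback optional lets channels that cannot stream (or edit messages) simply omit the hooks they do not support.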

Acceptance Criteria

  • src/middleware/types.ts exists and exports all types listed above
  • AgentRuntime interface with execute(params) -> AsyncIterable<AgentEvent>
  • AgentEvent is a discriminated union covering text, tool_use, tool_result, error, done
  • Three-type delivery contract: AgentRunResult + McpSideEffects = AgentDeliveryResult
  • pnpm build passes with the new types
  • No runtime code — this is types only (implementations come in subsequent issues)

Design Decisions

  • Composite over flat: AgentDeliveryResult wraps run and mcp as nested objects rather than flattening all fields. This preserves origin clarity and avoids naming collisions.
  • payloads at top level: The delivery pipeline universally operates on ReplyPayload[]. For CLI backends, payloads is text ? [{ text }] : []. The conversion happens once in ChannelBridge.
  • McpSideEffects has zero-value defaults: All fields default to [] or 0 — no optionality needed.
  • didSendViaMcpTool is derived: sentTargets.length > 0 (no separate boolean field).
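The last two decisions can be sketched as plain functions. This is a minimal sketch: `emptyMcpSideEffects` and `sentProviders` are hypothetical helpers. A factory returns fresh arrays on each call, which avoids sharing one mutable "empty" object between runs.

```typescript
type McpMessageTarget = { tool: string; provider: string; accountId?: string; to?: string };
type McpSideEffects = { sentTexts: string[]; sentMediaUrls: string[]; sentTargets: McpMessageTarget[]; cronAdds: number };

// Zero values: a run with no MCP tool calls reports this, so no field needs to be optional.
function emptyMcpSideEffects(): McpSideEffects {
  return { sentTexts: [], sentMediaUrls: [], sentTargets: [], cronAdds: 0 };
}

// Derived rather than stored, so it can never disagree with sentTargets.
function didSendViaMcpTool(mcp: McpSideEffects): boolean {
  return mcp.sentTargets.length > 0;
}

// Hypothetical derived query: which providers did the agent message directly (first-seen order)?
function sentProviders(mcp: McpSideEffects): string[] {
  return Array.from(new Set(mcp.sentTargets.map((t) => t.provider)));
}
```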
