Skip to content

Implement ClaudeCliRuntime concrete class #8

@alexey-pelykh

Description

@alexey-pelykh

Context

RemoteClaw's middleware architecture uses CLI subprocesses to interact with AI agents. The abstract base class CLIRuntimeBase (in src/middleware/cli-runtime-base.ts) handles subprocess spawning, NDJSON parsing, watchdog timers, abort signal propagation, and stdin prompt delivery. Concrete runtimes extend it and implement three abstract methods.

This issue implements the Claude CLI runtime — the first and primary concrete runtime.

Architecture

AgentRuntime (interface, src/middleware/types.ts)
  └── CLIRuntimeBase (abstract, src/middleware/cli-runtime-base.ts)
        └── ClaudeCliRuntime ← THIS ISSUE

CLIRuntimeBase requires subclasses to implement:

/** Construct CLI-specific command-line arguments. */
protected abstract buildArgs(params: AgentExecuteParams): string[];

/** Parse a single NDJSON line into an AgentEvent (or null to skip). */
protected abstract extractEvent(line: string): AgentEvent | null;

/** Construct provider-specific environment variables. */
protected abstract buildEnv(params: AgentExecuteParams): Record<string, string>;

Dependencies

  • src/middleware/types.tsAgentRuntime, AgentExecuteParams, AgentEvent, AgentRunResult, etc.
  • src/middleware/cli-runtime-base.tsCLIRuntimeBase abstract class

Both exist on main.

Specification

File: src/middleware/runtimes/claude.ts

Create ClaudeCliRuntime extending CLIRuntimeBase.

Constructor

constructor() {
  super("claude"); // CLI binary name
}

buildArgs(params: AgentExecuteParams): string[]

Build the Claude CLI argument list:

Flag Value When
-p (none) Always — enables pipe/print mode
--output-format stream-json Always — NDJSON streaming output
--verbose (none) Always — enables usage/cost reporting in output
--resume params.sessionId When params.sessionId is provided
--mcp-config <JSON string> When params.mcpServers has entries
(positional) params.prompt When prompt length ≤ 10,000 chars (stdin threshold is handled by CLIRuntimeBase)

MCP config format (Claude JSON format, passed as inline string):

{
  "mcpServers": {
    "<server-name>": {
      "command": "<command>",
      "args": ["<arg1>", "<arg2>"],
      "env": { "<KEY>": "<VALUE>" }
    }
  }
}

The --mcp-config flag accepts JSON strings directly (no temp file needed). Pass the serialized JSON as a CLI argument: --mcp-config '{"mcpServers": {...}}'. This eliminates temp file lifecycle management entirely.

Important: The prompt is passed as a positional argument (last arg) only when it fits within CLI argument limits. CLIRuntimeBase handles the >10KB stdin delivery case — buildArgs() should always include the prompt as the final positional argument; the base class writes to stdin in addition when the threshold is exceeded.

extractEvent(line: string): AgentEvent | null

Parse a single NDJSON line from Claude's stream-json output into an AgentEvent.

Claude stream-json format (headless docs, SDK streaming docs):

Each NDJSON line is one of:

  1. A stream_event envelope wrapping a standard Claude API streaming event
  2. A final line (result) emitted after all streaming events

stream_event envelope structure:

{
  "type": "stream_event",
  "uuid": "<UUID>",
  "session_id": "<session-id>",
  "parent_tool_use_id": "<tool-use-id> | null",
  "event": { /* Standard Claude API RawMessageStreamEvent */ }
}

Event mapping (stream_event lines — where line.type === "stream_event"):

Inner event.type Condition Maps To Notes
message_start Skip Extract session_id from envelope
content_block_start content_block.type === "text" Skip Text content arrives via deltas
content_block_start content_block.type === "tool_use" Buffer Store name, id from content_block; init input accumulator
content_block_delta delta.type === "text_delta" AgentTextEvent { type: "text", text: delta.text }
content_block_delta delta.type === "input_json_delta" Accumulate Append delta.partial_json to buffered tool input
content_block_delta delta.type === "thinking_delta" Skip Extended thinking content, not user-facing
content_block_stop Tool buffered AgentToolUseEvent Emit with JSON.parse(accumulated_input); clear buffer
content_block_stop No tool buffered Skip End of text block
message_delta Skip Extract delta.stop_reason and usage for result metadata
message_stop Skip
ping Skip Keepalive

Event mapping (final result line — where line.type === "result"):

The result line is emitted after all stream_event lines. It contains cost, usage, session ID, and run metadata. Map to AgentDoneEvent with populated AgentRunResult.

Note on tool results: The Claude CLI handles tool execution internally. AgentToolResultEvent mapping requires empirical verification — the CLI may or may not emit observable events for tool results. If it does, they likely appear as part of the interleaved conversation flow (the next response's stream_event lines will have parent_tool_use_id set). Initial implementation may omit AgentToolResultEvent and add it when the exact format is confirmed.

Stateful fields (instance-level, reset per execute() call):

  • currentSessionId: string | undefined — from first stream_event.session_id envelope
  • accumulatedText: string — concatenated text deltas for AgentRunResult.text
  • toolBuffer: { name: string; id: string; input: string } | null — in-progress tool_use block
  • lastUsage: AgentUsage | undefined — from message_delta event's usage field
  • lastStopReason: string | undefined — from message_delta event's delta.stop_reason

Session ID tracking: Every stream_event envelope contains a session_id field. Capture it from the first event. Also available on the final result line. Store as instance state for inclusion in AgentRunResult.

Usage extraction (from message_delta inner event and/or result line):

// message_delta event contains usage in its top-level usage field:
// { output_tokens: number }
// The result line contains cumulative usage:
{
  input_tokens: number;
  output_tokens: number;
  cache_read_input_tokens?: number;
  cache_creation_input_tokens?: number;
}

Map to AgentUsage:

  • inputTokensinput_tokens
  • outputTokensoutput_tokens
  • cacheReadTokenscache_read_input_tokens
  • cacheWriteTokenscache_creation_input_tokens

Result metadata mapping (from result line → AgentRunResult):

  • text ← accumulated from all text_delta events
  • sessionId ← from stream_event.session_id envelope or result.session_id
  • usage ← from result line usage fields
  • totalCostUsdresult.cost_usd
  • apiDurationMsresult.duration_api_ms
  • numTurnsresult.num_turns
  • stopReason ← from message_delta delta.stop_reason or result.subtype

Note on field names: The exact field names on the result line (e.g., cost_usd vs costUsd, duration_api_ms vs apiDurationMs) require empirical verification via claude -p --output-format stream-json output. The names above are best-guess based on SDK type definitions; implementation should adapt to actual output.

Note: CLIRuntimeBase.execute() currently synthesizes a minimal AgentDoneEvent with empty fields after the subprocess exits. extractEvent() should emit its own AgentDoneEvent from the result line before the stream closes, which will be yielded to consumers. The base class's synthetic done event will follow — consumers should handle receiving the richer one first. Alternatively, refactor CLIRuntimeBase to skip its synthetic done event if a subclass already emitted one (preferred if straightforward).

buildEnv(params: AgentExecuteParams): Record<string, string>

Return environment variable overrides for the Claude subprocess. Currently returns an empty record {}.

Auth credentials (ANTHROPIC_API_KEY, CLAUDE_CODE_OAUTH_TOKEN) are passed through params.env by the caller, not hardcoded in the runtime. The runtime should not assume any particular auth mechanism.

File: src/middleware/runtimes/claude.test.ts

Unit tests verifying:

  1. Argument construction (6+ test cases):

    • Basic invocation: -p --output-format stream-json --verbose <prompt>
    • Session resume: adds --resume <session-id>
    • MCP config: serializes JSON, adds --mcp-config <json-string>
    • Short prompt: included as positional arg
    • All flags present: session + MCP + prompt combined
    • No session, no MCP: minimal args
  2. Event extraction (10+ test cases):

    • stream_event with message_start → skip (but session ID captured from envelope)
    • stream_event with content_block_delta / text_deltaAgentTextEvent
    • stream_event with content_block_start / tool_use → buffers tool name+id
    • stream_event with content_block_delta / input_json_delta → accumulates tool input
    • stream_event with content_block_stop (after tool_use) → AgentToolUseEvent with parsed input
    • stream_event with content_block_stop (after text) → skip
    • stream_event with message_delta → extracts stop_reason and usage
    • stream_event with thinking_delta → skip (returns null)
    • result line → AgentDoneEvent with full AgentRunResult (usage, cost, session, etc.)
    • ping → skip (returns null)
    • Unknown event type → skip (returns null)
    • Malformed JSON → handled by CLIRuntimeBase (skip at base level)
  3. Environment construction (2+ test cases):

    • Returns empty record (no hardcoded env vars)
    • Does not inject auth vars (caller responsibility)
  4. MCP config serialization (2+ test cases):

    • JSON string is correctly serialized from McpServerConfig entries
    • --mcp-config arg is omitted when no MCP servers configured

Acceptance Criteria

  • src/middleware/runtimes/claude.ts exists and exports ClaudeCliRuntime
  • Class extends CLIRuntimeBase and implements all three abstract methods
  • buildArgs() produces correct Claude CLI flags for all parameter combinations
  • extractEvent() correctly maps Claude stream_event envelopes and result line to AgentEvent types
  • extractEvent() is stateful: buffers tool_use blocks, accumulates text, tracks session ID
  • Session ID is extracted from stream_event envelope and included in the done result
  • Usage/cost metadata is populated from the result event
  • MCP server config is serialized to JSON string and passed via --mcp-config
  • Unit tests cover argument construction, event extraction, and environment setup
  • pnpm build passes
  • pnpm test passes

Reference

  • The existing upstream CLI runner at src/agents/cli-runner.ts and src/agents/cli-backends.ts shows how Claude is invoked today (using --output-format json, collecting full output). Our implementation uses stream-json for streaming NDJSON events instead, which is the key architectural difference.
  • Claude Code headless mode docs — defines stream-json output format and stream_event envelope
  • Agent SDK streaming output docs — defines SDKPartialAssistantMessage type (stream_event with RawMessageStreamEvent inner events)
  • Known limitation: When maxThinkingTokens is explicitly set (extended thinking mode), StreamEvent messages are not emitted — the SDK returns only the final AssistantMessage and ResultMessage.
  • Empirical verification needed: Field-level names on the result line and exact tool result event format require capture of actual claude -p --output-format stream-json output during implementation.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions