-
Notifications
You must be signed in to change notification settings - Fork 0
Implement ClaudeCliRuntime concrete class #8
Description
Context
RemoteClaw's middleware architecture uses CLI subprocesses to interact with AI agents. The abstract base class CLIRuntimeBase (in src/middleware/cli-runtime-base.ts) handles subprocess spawning, NDJSON parsing, watchdog timers, abort signal propagation, and stdin prompt delivery. Concrete runtimes extend it and implement three abstract methods.
This issue implements the Claude CLI runtime — the first and primary concrete runtime.
Architecture
AgentRuntime (interface, src/middleware/types.ts)
└── CLIRuntimeBase (abstract, src/middleware/cli-runtime-base.ts)
└── ClaudeCliRuntime ← THIS ISSUE
CLIRuntimeBase requires subclasses to implement:
/** Construct CLI-specific command-line arguments. */
protected abstract buildArgs(params: AgentExecuteParams): string[];
/** Parse a single NDJSON line into an AgentEvent (or null to skip). */
protected abstract extractEvent(line: string): AgentEvent | null;
/** Construct provider-specific environment variables. */
protected abstract buildEnv(params: AgentExecuteParams): Record<string, string>;Dependencies
src/middleware/types.ts—AgentRuntime,AgentExecuteParams,AgentEvent,AgentRunResult, etc.src/middleware/cli-runtime-base.ts—CLIRuntimeBaseabstract class
Both exist on main.
Specification
File: src/middleware/runtimes/claude.ts
Create ClaudeCliRuntime extending CLIRuntimeBase.
Constructor
constructor() {
super("claude"); // CLI binary name
}buildArgs(params: AgentExecuteParams): string[]
Build the Claude CLI argument list:
| Flag | Value | When |
|---|---|---|
-p |
(none) | Always — enables pipe/print mode |
--output-format |
stream-json |
Always — NDJSON streaming output |
--verbose |
(none) | Always — enables usage/cost reporting in output |
--resume |
params.sessionId |
When params.sessionId is provided |
--mcp-config |
<JSON string> |
When params.mcpServers has entries |
| (positional) | params.prompt |
When prompt length ≤ 10,000 chars (stdin threshold is handled by CLIRuntimeBase) |
MCP config format (Claude JSON format, passed as inline string):
{
"mcpServers": {
"<server-name>": {
"command": "<command>",
"args": ["<arg1>", "<arg2>"],
"env": { "<KEY>": "<VALUE>" }
}
}
}The --mcp-config flag accepts JSON strings directly (no temp file needed). Pass the serialized JSON as a CLI argument: --mcp-config '{"mcpServers": {...}}'. This eliminates temp file lifecycle management entirely.
Important: The prompt is passed as a positional argument (last arg) only when it fits within CLI argument limits. CLIRuntimeBase handles the >10KB stdin delivery case — buildArgs() should always include the prompt as the final positional argument; the base class writes to stdin in addition when the threshold is exceeded.
extractEvent(line: string): AgentEvent | null
Parse a single NDJSON line from Claude's stream-json output into an AgentEvent.
Claude stream-json format (headless docs, SDK streaming docs):
Each NDJSON line is one of:
- A
stream_eventenvelope wrapping a standard Claude API streaming event - A final line (
result) emitted after all streaming events
stream_event envelope structure:
{
"type": "stream_event",
"uuid": "<UUID>",
"session_id": "<session-id>",
"parent_tool_use_id": "<tool-use-id> | null",
"event": { /* Standard Claude API RawMessageStreamEvent */ }
}Event mapping (stream_event lines — where line.type === "stream_event"):
Inner event.type |
Condition | Maps To | Notes |
|---|---|---|---|
message_start |
— | Skip | Extract session_id from envelope |
content_block_start |
content_block.type === "text" |
Skip | Text content arrives via deltas |
content_block_start |
content_block.type === "tool_use" |
Buffer | Store name, id from content_block; init input accumulator |
content_block_delta |
delta.type === "text_delta" |
AgentTextEvent |
{ type: "text", text: delta.text } |
content_block_delta |
delta.type === "input_json_delta" |
Accumulate | Append delta.partial_json to buffered tool input |
content_block_delta |
delta.type === "thinking_delta" |
Skip | Extended thinking content, not user-facing |
content_block_stop |
Tool buffered | AgentToolUseEvent |
Emit with JSON.parse(accumulated_input); clear buffer |
content_block_stop |
No tool buffered | Skip | End of text block |
message_delta |
— | Skip | Extract delta.stop_reason and usage for result metadata |
message_stop |
— | Skip | — |
ping |
— | Skip | Keepalive |
Event mapping (final result line — where line.type === "result"):
The result line is emitted after all stream_event lines. It contains cost, usage, session ID, and run metadata. Map to AgentDoneEvent with populated AgentRunResult.
Note on tool results: The Claude CLI handles tool execution internally.
AgentToolResultEventmapping requires empirical verification — the CLI may or may not emit observable events for tool results. If it does, they likely appear as part of the interleaved conversation flow (the next response'sstream_eventlines will haveparent_tool_use_idset). Initial implementation may omitAgentToolResultEventand add it when the exact format is confirmed.
Stateful fields (instance-level, reset per execute() call):
currentSessionId: string | undefined— from firststream_event.session_idenvelopeaccumulatedText: string— concatenated text deltas forAgentRunResult.texttoolBuffer: { name: string; id: string; input: string } | null— in-progresstool_useblocklastUsage: AgentUsage | undefined— frommessage_deltaevent'susagefieldlastStopReason: string | undefined— frommessage_deltaevent'sdelta.stop_reason
Session ID tracking: Every stream_event envelope contains a session_id field. Capture it from the first event. Also available on the final result line. Store as instance state for inclusion in AgentRunResult.
Usage extraction (from message_delta inner event and/or result line):
// message_delta event contains usage in its top-level usage field:
// { output_tokens: number }
// The result line contains cumulative usage:
{
input_tokens: number;
output_tokens: number;
cache_read_input_tokens?: number;
cache_creation_input_tokens?: number;
}Map to AgentUsage:
inputTokens←input_tokensoutputTokens←output_tokenscacheReadTokens←cache_read_input_tokenscacheWriteTokens←cache_creation_input_tokens
Result metadata mapping (from result line → AgentRunResult):
text← accumulated from alltext_deltaeventssessionId← fromstream_event.session_idenvelope orresult.session_idusage← fromresultline usage fieldstotalCostUsd←result.cost_usdapiDurationMs←result.duration_api_msnumTurns←result.num_turnsstopReason← frommessage_deltadelta.stop_reasonorresult.subtype
Note on field names: The exact field names on the
resultline (e.g.,cost_usdvscostUsd,duration_api_msvsapiDurationMs) require empirical verification viaclaude -p --output-format stream-jsonoutput. The names above are best-guess based on SDK type definitions; implementation should adapt to actual output.
Note: CLIRuntimeBase.execute() currently synthesizes a minimal AgentDoneEvent with empty fields after the subprocess exits. extractEvent() should emit its own AgentDoneEvent from the result line before the stream closes, which will be yielded to consumers. The base class's synthetic done event will follow — consumers should handle receiving the richer one first. Alternatively, refactor CLIRuntimeBase to skip its synthetic done event if a subclass already emitted one (preferred if straightforward).
buildEnv(params: AgentExecuteParams): Record<string, string>
Return environment variable overrides for the Claude subprocess. Currently returns an empty record {}.
Auth credentials (ANTHROPIC_API_KEY, CLAUDE_CODE_OAUTH_TOKEN) are passed through params.env by the caller, not hardcoded in the runtime. The runtime should not assume any particular auth mechanism.
File: src/middleware/runtimes/claude.test.ts
Unit tests verifying:
-
Argument construction (6+ test cases):
- Basic invocation:
-p --output-format stream-json --verbose <prompt> - Session resume: adds
--resume <session-id> - MCP config: serializes JSON, adds
--mcp-config <json-string> - Short prompt: included as positional arg
- All flags present: session + MCP + prompt combined
- No session, no MCP: minimal args
- Basic invocation:
-
Event extraction (10+ test cases):
stream_eventwithmessage_start→ skip (but session ID captured from envelope)stream_eventwithcontent_block_delta/text_delta→AgentTextEventstream_eventwithcontent_block_start/tool_use→ buffers tool name+idstream_eventwithcontent_block_delta/input_json_delta→ accumulates tool inputstream_eventwithcontent_block_stop(after tool_use) →AgentToolUseEventwith parsed inputstream_eventwithcontent_block_stop(after text) → skipstream_eventwithmessage_delta→ extracts stop_reason and usagestream_eventwiththinking_delta→ skip (returnsnull)resultline →AgentDoneEventwith fullAgentRunResult(usage, cost, session, etc.)ping→ skip (returnsnull)- Unknown event type → skip (returns
null) - Malformed JSON → handled by
CLIRuntimeBase(skip at base level)
-
Environment construction (2+ test cases):
- Returns empty record (no hardcoded env vars)
- Does not inject auth vars (caller responsibility)
-
MCP config serialization (2+ test cases):
- JSON string is correctly serialized from
McpServerConfigentries --mcp-configarg is omitted when no MCP servers configured
- JSON string is correctly serialized from
Acceptance Criteria
-
src/middleware/runtimes/claude.tsexists and exportsClaudeCliRuntime - Class extends
CLIRuntimeBaseand implements all three abstract methods -
buildArgs()produces correct Claude CLI flags for all parameter combinations -
extractEvent()correctly maps Claudestream_eventenvelopes andresultline toAgentEventtypes -
extractEvent()is stateful: buffers tool_use blocks, accumulates text, tracks session ID - Session ID is extracted from
stream_eventenvelope and included in the done result - Usage/cost metadata is populated from the
resultevent - MCP server config is serialized to JSON string and passed via
--mcp-config - Unit tests cover argument construction, event extraction, and environment setup
-
pnpm buildpasses -
pnpm testpasses
Reference
- The existing upstream CLI runner at
src/agents/cli-runner.tsandsrc/agents/cli-backends.tsshows how Claude is invoked today (using--output-format json, collecting full output). Our implementation usesstream-jsonfor streaming NDJSON events instead, which is the key architectural difference. - Claude Code headless mode docs — defines
stream-jsonoutput format andstream_eventenvelope - Agent SDK streaming output docs — defines
SDKPartialAssistantMessagetype (stream_eventwithRawMessageStreamEventinner events) - Known limitation: When
maxThinkingTokensis explicitly set (extended thinking mode),StreamEventmessages are not emitted — the SDK returns only the finalAssistantMessageandResultMessage. - Empirical verification needed: Field-level names on the
resultline and exact tool result event format require capture of actualclaude -p --output-format stream-jsonoutput during implementation.