Implement GeminiCliRuntime concrete class #10

@alexey-pelykh

Context

RemoteClaw's middleware architecture uses CLI subprocesses to interact with AI agents. The abstract base class CLIRuntimeBase (in src/middleware/cli-runtime-base.ts) handles subprocess spawning, NDJSON parsing, watchdog timers, abort signal propagation, and stdin prompt delivery. Concrete runtimes extend it and implement three abstract methods.

This issue implements the Gemini CLI runtime — the second concrete runtime, targeting Google's gemini CLI from google-gemini/gemini-cli.

Architecture

AgentRuntime (interface, src/middleware/types.ts)
  └── CLIRuntimeBase (abstract, src/middleware/cli-runtime-base.ts)
        ├── ClaudeCliRuntime (src/middleware/runtimes/claude.ts) ✅ done
        └── GeminiCliRuntime ← THIS ISSUE

CLIRuntimeBase requires subclasses to implement:

/** Construct CLI-specific command-line arguments. */
protected abstract buildArgs(params: AgentExecuteParams): string[];

/** Parse a single NDJSON line into an AgentEvent (or null to skip). */
protected abstract extractEvent(line: string): AgentEvent | null;

/** Construct provider-specific environment variables. */
protected abstract buildEnv(params: AgentExecuteParams): Record<string, string>;

Additionally, subclasses may override:

  • get supportsStdinPrompt(): boolean — whether the CLI accepts prompts via stdin (default: true)
  • execute() — to wrap the base execution with per-call setup/teardown

Dependencies

  • src/middleware/types.ts: AgentRuntime, AgentExecuteParams, AgentEvent, AgentRunResult, etc.
  • src/middleware/cli-runtime-base.ts: CLIRuntimeBase abstract class
  • src/middleware/runtimes/claude.ts: reference implementation (same pattern)

All exist on main.

Specification

File: src/middleware/runtimes/gemini.ts

Create GeminiCliRuntime extending CLIRuntimeBase.

Constructor

constructor() {
  super("gemini"); // CLI binary name
}

get supportsStdinPrompt(): boolean

Override to return false. The Gemini CLI does not support stdin prompt delivery — prompts must be passed via the -p flag.

buildArgs(params: AgentExecuteParams): string[]

Build the Gemini CLI argument list:

| Flag | Value | When |
|------|-------|------|
| --output-format | stream-json | Always (NDJSON streaming output) |
| -p | params.prompt | Always (prompt delivery via flag) |
| -r | params.sessionId | When params.sessionId is provided |

Note on prompt delivery: Unlike Claude (positional arg), Gemini requires the -p flag for prompt delivery. Since supportsStdinPrompt is false, CLIRuntimeBase will not attempt stdin delivery regardless of prompt length. The -p flag should always be included.

Note on missing flags: The --verbose flag (used by Claude) is not applicable to Gemini. The stream-json output format already includes all available metadata in the result event.
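The flag table above could be implemented roughly as follows. This is a minimal sketch, not the final method: GeminiArgParams is a hypothetical stand-in for the two AgentExecuteParams fields used here, and buildGeminiArgs is written as a free function so it can be read in isolation.

```typescript
// Hypothetical stand-in for the AgentExecuteParams fields this sketch needs.
interface GeminiArgParams {
  prompt: string;
  sessionId?: string;
}

/** Build the Gemini CLI argument list described in the flag table. */
function buildGeminiArgs(params: GeminiArgParams): string[] {
  // Always request NDJSON streaming output and deliver the prompt via -p.
  const args: string[] = ["--output-format", "stream-json", "-p", params.prompt];

  // Resume an existing session only when a session ID was supplied.
  if (params.sessionId) {
    args.push("-r", params.sessionId);
  }
  return args;
}
```

For example, buildGeminiArgs({ prompt: "hello", sessionId: "abc" }) yields the four always-present arguments followed by -r abc.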

execute() override — MCP config file lifecycle

The Gemini CLI reads MCP server configuration from a settings file (.gemini/settings.json in the working directory or ~/.gemini/settings.json globally). There is no --mcp-config CLI flag.

When params.mcpServers has entries, the runtime must manage a project-local settings file:

async *execute(params: AgentExecuteParams): AsyncIterable<AgentEvent> {
  this.resetState();

  const mcpConfigManager = params.mcpServers && Object.keys(params.mcpServers).length > 0
    ? new GeminiMcpConfigManager(params.workingDirectory, params.mcpServers)
    : null;

  try {
    await mcpConfigManager?.setup();

    for await (const event of super.execute(params)) {
      if (event.type === "done") {
        this.enrichDoneEvent(event);
      }
      yield event;
    }
  } finally {
    await mcpConfigManager?.teardown();
  }
}

MCP config file management (internal helper class or methods):

  1. Setup:

    • Check whether .gemini/settings.json exists in params.workingDirectory
    • If it exists: read it, save a copy, merge the mcpServers key into it, and write it back
    • If it doesn't exist: create the .gemini/ directory if needed, then write { "mcpServers": {...} }
    • Track what was created (directory, file, or just modified) for cleanup
  2. Teardown (in finally block, so it always runs):

    • If the original file existed: restore the saved copy
    • If the file was created: delete it
    • If the .gemini/ directory was created by setup and is now empty: rmdir it

Gemini settings.json MCP format:

{
  "mcpServers": {
    "<server-name>": {
      "command": "<command>",
      "args": ["<arg1>", "<arg2>"],
      "env": { "<KEY>": "<VALUE>" }
    }
  }
}

Implementation note: Check during implementation whether the Gemini CLI supports a --settings-dir or --config flag. If it does, a cleaner approach would be to create a temp directory with the settings file and point the flag there, avoiding any file collision concerns. The merge-restore pattern described above is the fallback.
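The merge-restore fallback could be sketched as below. Several things here are assumptions rather than part of the spec: the McpServerConfig shape is inferred from the settings.json format above, incoming servers are assumed to override same-named existing entries, and synchronous fs calls are used only to keep the sketch short (awaiting these methods from execute() still works, though a real implementation would probably use fs/promises).

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Assumed shape of one "mcpServers" entry, inferred from the settings.json format.
interface McpServerConfig {
  command: string;
  args?: string[];
  env?: Record<string, string>;
}

class GeminiMcpConfigManager {
  private readonly dir: string;
  private readonly file: string;
  private readonly mcpServers: Record<string, McpServerConfig>;
  private originalContent: string | null = null; // saved copy of a pre-existing file
  private createdFile = false;
  private createdDir = false;

  constructor(workingDirectory: string, mcpServers: Record<string, McpServerConfig>) {
    this.dir = path.join(workingDirectory, ".gemini");
    this.file = path.join(this.dir, "settings.json");
    this.mcpServers = mcpServers;
  }

  setup(): void {
    let settings: Record<string, unknown> = {};
    if (fs.existsSync(this.file)) {
      // Existing file: keep a verbatim copy so teardown can restore it.
      this.originalContent = fs.readFileSync(this.file, "utf8");
      settings = JSON.parse(this.originalContent) as Record<string, unknown>;
    } else {
      if (!fs.existsSync(this.dir)) {
        fs.mkdirSync(this.dir, { recursive: true });
        this.createdDir = true;
      }
      this.createdFile = true;
    }
    // Assumption: incoming servers win over same-named existing entries.
    const existing = (settings.mcpServers ?? {}) as Record<string, unknown>;
    settings.mcpServers = { ...existing, ...this.mcpServers };
    fs.writeFileSync(this.file, JSON.stringify(settings, null, 2));
  }

  teardown(): void {
    if (this.originalContent !== null) {
      fs.writeFileSync(this.file, this.originalContent); // restore the saved copy
    } else if (this.createdFile) {
      fs.rmSync(this.file, { force: true }); // delete the file we created
    }
    // Remove .gemini/ only if setup created it and nothing else was put there.
    if (this.createdDir && fs.existsSync(this.dir) && fs.readdirSync(this.dir).length === 0) {
      fs.rmdirSync(this.dir);
    }
  }
}
```

Because teardown() only acts on what setup() recorded, calling it after a setup() that never ran or failed early is a no-op, which suits the finally block in execute().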

extractEvent(line: string): AgentEvent | null

Parse a single NDJSON line from Gemini's stream-json output into an AgentEvent.

Gemini stream-json format (verified from google-gemini/gemini-cli source: packages/core/src/output/types.ts, stream-json-formatter.ts):

Each NDJSON line is a bare JSON object (no envelope) with a type discriminator and a timestamp base field. There are 6 output event types.

Event mapping:

| Gemini type | Condition | Maps To | Notes |
|-------------|-----------|---------|-------|
| init | (any) | Skip | Extract session_id as currentSessionId |
| message | delta === true AND role === "assistant" | AgentTextEvent | { type: "text", text: content } |
| message | delta === false OR role !== "assistant" | Skip | Final message echo or user message |
| tool_use | (any) | AgentToolUseEvent | { type: "tool_use", toolName: tool_name, toolId: tool_id, input: parameters } |
| tool_result | (any) | AgentToolResultEvent | { type: "tool_result", toolId: tool_id, output: output, isError: status === "error" } |
| error | (any) | AgentErrorEvent | { type: "error", message: message, code: severity } |
| result | (any) | Store for enrichment | Extract stats for usage data; do not emit directly |

Stateful fields (instance-level, reset per execute() call):

  • currentSessionId: string | undefined — from init event's session_id field
  • accumulatedText: string — concatenated text from message deltas for AgentRunResult.text
  • resultStats: GeminiResultStats | undefined — from result event's stats field

Session ID tracking: The init event contains a session_id field. Capture it into instance state. Include in AgentRunResult via done event enrichment.
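One way the event mapping and the stateful fields could fit together is sketched below. The AgentEvent union is a hypothetical, reduced stand-in for the types in src/middleware/types.ts, the field names on message events (content) still need empirical verification, and skipping malformed lines is an assumption rather than a stated requirement.

```typescript
// Hypothetical, reduced stand-in for the AgentEvent union in src/middleware/types.ts.
type AgentEvent =
  | { type: "text"; text: string }
  | { type: "tool_use"; toolName: string; toolId: string; input: unknown }
  | { type: "tool_result"; toolId: string; output: unknown; isError: boolean }
  | { type: "error"; message: string; code?: string };

class GeminiEventExtractor {
  // Instance-level state; a runtime would reset these per execute() call.
  currentSessionId: string | undefined = undefined;
  accumulatedText = "";
  resultStats: Record<string, number> | undefined = undefined;

  extractEvent(line: string): AgentEvent | null {
    let raw: any;
    try {
      raw = JSON.parse(line);
    } catch {
      return null; // Assumption: malformed lines are skipped, not treated as fatal.
    }
    if (raw === null || typeof raw !== "object") return null;

    switch (raw.type) {
      case "init":
        this.currentSessionId = raw.session_id; // captured, but nothing is emitted
        return null;
      case "message": {
        // Only assistant deltas are emitted; final echoes and user messages are skipped.
        if (raw.delta !== true || raw.role !== "assistant") return null;
        const text = typeof raw.content === "string" ? raw.content : "";
        this.accumulatedText += text;
        return { type: "text", text };
      }
      case "tool_use":
        return { type: "tool_use", toolName: raw.tool_name, toolId: raw.tool_id, input: raw.parameters };
      case "tool_result":
        return { type: "tool_result", toolId: raw.tool_id, output: raw.output, isError: raw.status === "error" };
      case "error":
        return { type: "error", message: raw.message, code: raw.severity };
      case "result":
        this.resultStats = raw.stats; // stored for done-event enrichment
        return null;
      default:
        return null; // unknown event types are skipped
    }
  }
}
```

Keeping the state on the extractor makes the "text accumulation across multiple message deltas" test a matter of feeding several lines and reading accumulatedText afterwards.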

Usage extraction (from result event's stats field):

// result.stats structure:
{
  total_tokens: number;
  input_tokens: number;
  output_tokens: number;
  cached: number;          // cache read tokens
  duration_ms: number;     // API duration
  tool_calls: number;      // number of tool invocations (analogous to "turns")
}

Map to AgentUsage:

  • inputTokens ← stats.input_tokens
  • outputTokens ← stats.output_tokens
  • cacheReadTokens ← stats.cached (when > 0)

Additional result metadata:

  • apiDurationMs ← stats.duration_ms
  • numTurns ← stats.tool_calls
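The mapping from result.stats to usage and result metadata could look like this. It is a sketch only: AgentUsage and MappedResultMetadata are hypothetical stand-ins for the real types, and mapGeminiStats is an illustrative helper name.

```typescript
// Field names follow the result.stats structure listed above.
interface GeminiResultStats {
  total_tokens: number;
  input_tokens: number;
  output_tokens: number;
  cached: number;
  duration_ms: number;
  tool_calls: number;
}

// Hypothetical stand-in for AgentUsage in src/middleware/types.ts.
interface AgentUsage {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens?: number;
}

// Illustrative container for the mapped values; not a type from the codebase.
interface MappedResultMetadata {
  usage: AgentUsage;
  apiDurationMs: number;
  numTurns: number;
}

function mapGeminiStats(stats: GeminiResultStats): MappedResultMetadata {
  const usage: AgentUsage = {
    inputTokens: stats.input_tokens,
    outputTokens: stats.output_tokens,
  };
  // cacheReadTokens is only set when the CLI reported cached tokens.
  if (stats.cached > 0) {
    usage.cacheReadTokens = stats.cached;
  }
  return { usage, apiDurationMs: stats.duration_ms, numTurns: stats.tool_calls };
}
```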

Note on tool_result events: The Gemini CLI source defines a tool_result output type, but channel adapter behavior with this event type should be tested during integration. The formatter maps internal events to output types; actual emission of tool_result may vary depending on tool execution patterns.

buildEnv(params: AgentExecuteParams): Record<string, string>

Return environment variable overrides for the Gemini subprocess. Currently returns an empty record {}.

Auth credentials (GEMINI_API_KEY) are passed through params.env by the caller, not hardcoded in the runtime. The runtime should not assume any particular auth mechanism.

Done event enrichment

Same pattern as ClaudeCliRuntime: override execute() to intercept the done event from CLIRuntimeBase and enrich AgentRunResult with accumulated state.

Result metadata mapping (from accumulated state → AgentRunResult):

  • text ← accumulated from all message delta events
  • sessionId ← from init event's session_id
  • usage ← from result.stats (see usage extraction above)
  • apiDurationMs ← from result.stats.duration_ms
  • numTurns ← from result.stats.tool_calls
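A possible shape for the enrichment step is sketched below, with attention to the "missing stats" case. The AgentDoneEvent shape (a result field on the done event) and the GeminiRunState container are assumptions made for illustration; the real types and the way ClaudeCliRuntime mutates the event should be checked in the codebase.

```typescript
// Hypothetical, reduced shapes; the real ones live in src/middleware/types.ts.
interface AgentRunResult {
  text: string;
  sessionId?: string;
  usage?: { inputTokens: number; outputTokens: number; cacheReadTokens?: number };
  apiDurationMs?: number;
  numTurns?: number;
}

// Assumption: the done event carries the run result on a `result` field.
interface AgentDoneEvent {
  type: "done";
  result: AgentRunResult;
}

// State accumulated by extractEvent() during one execute() call.
interface GeminiRunState {
  accumulatedText: string;
  currentSessionId?: string;
  resultStats?: {
    input_tokens: number;
    output_tokens: number;
    cached: number;
    duration_ms: number;
    tool_calls: number;
  };
}

/** Copy accumulated state onto the done event's result, in place. */
function enrichDoneEvent(event: AgentDoneEvent, state: GeminiRunState): void {
  const result = event.result;
  result.text = state.accumulatedText;
  if (state.currentSessionId !== undefined) {
    result.sessionId = state.currentSessionId;
  }

  const stats = state.resultStats;
  if (!stats) return; // no result event was seen; leave usage fields unset

  const usage: NonNullable<AgentRunResult["usage"]> = {
    inputTokens: stats.input_tokens,
    outputTokens: stats.output_tokens,
  };
  if (stats.cached > 0) usage.cacheReadTokens = stats.cached;

  result.usage = usage;
  result.apiDurationMs = stats.duration_ms;
  result.numTurns = stats.tool_calls;
}
```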

File: src/middleware/runtimes/gemini.test.ts

Unit tests following the same pattern as claude.test.ts (testable subclass exposing protected methods).
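The testable-subclass pattern can be illustrated with a reduced example. RuntimeBaseSketch and GeminiRuntimeSketch are hypothetical stand-ins (the real CLIRuntimeBase has more abstract members and a constructor argument), and the actual helper names in claude.test.ts may differ.

```typescript
// Hypothetical, heavily reduced stand-in for CLIRuntimeBase: one abstract method only.
abstract class RuntimeBaseSketch {
  protected abstract buildEnv(params: { env?: Record<string, string> }): Record<string, string>;
}

// Reduced stand-in for GeminiCliRuntime: no hardcoded environment overrides.
class GeminiRuntimeSketch extends RuntimeBaseSketch {
  protected buildEnv(_params: { env?: Record<string, string> }): Record<string, string> {
    return {};
  }
}

// Test-only subclass that re-exposes the protected method under a public name.
class TestableGeminiRuntime extends GeminiRuntimeSketch {
  public testBuildEnv(params: { env?: Record<string, string> }): Record<string, string> {
    return this.buildEnv(params);
  }
}
```

A test can then call new TestableGeminiRuntime().testBuildEnv({ env: { GEMINI_API_KEY: "k" } }) and assert that the returned record is empty, without loosening the visibility of buildEnv on the production class.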

  1. Argument construction (5+ test cases):

    • Basic invocation: --output-format stream-json -p <prompt>
    • Session resume: adds -r <session-id>
    • No session: no -r flag
    • Prompt always via -p flag (not positional)
    • All flags present: session + prompt combined
  2. Event extraction (10+ test cases):

    • init → skip (but session ID captured)
    • message with delta: true, role: "assistant" → AgentTextEvent
    • message with delta: false → skip (final message)
    • message with role: "user" → skip
    • tool_use → AgentToolUseEvent with toolName, toolId, input
    • tool_result with status: "success" → AgentToolResultEvent with isError: false
    • tool_result with status: "error" → AgentToolResultEvent with isError: true
    • error → AgentErrorEvent with message and code from severity
    • result → stores stats, returns null
    • Unknown event type → skip (returns null)
    • Text accumulation across multiple message deltas
  3. Environment construction (2+ test cases):

    • Returns empty record (no hardcoded env vars)
    • Does not inject auth vars (caller responsibility)
  4. MCP config file management (4+ test cases):

    • Settings file is created when mcpServers has entries (mock filesystem or check args)
    • Settings file is not created when mcpServers is empty/undefined
    • Correct JSON structure written ({ "mcpServers": {...} })
    • Cleanup restores original file if it existed
  5. supportsStdinPrompt (1 test case):

    • Returns false
  6. Done event enrichment (3+ test cases):

    • Enriches with accumulated text, session ID, usage from stats
    • Maps duration_ms to apiDurationMs, tool_calls to numTurns
    • Handles missing stats gracefully

Acceptance Criteria

  • src/middleware/runtimes/gemini.ts exists and exports GeminiCliRuntime
  • Class extends CLIRuntimeBase and implements all three abstract methods
  • supportsStdinPrompt returns false
  • buildArgs() produces correct Gemini CLI flags (--output-format stream-json -p <prompt>)
  • buildArgs() adds -r <session-id> when sessionId is provided
  • extractEvent() correctly maps all 6 Gemini event types to AgentEvent types
  • extractEvent() is stateful: accumulates text, tracks session ID from init, stores result stats
  • Session ID is extracted from init event and included in the done result
  • Usage metadata is populated from the result.stats field
  • MCP server config is written to .gemini/settings.json in the working directory when mcpServers has entries
  • MCP config file is cleaned up after execution completes (including on error)
  • Existing .gemini/settings.json is preserved (merge-restore pattern)
  • Unit tests cover argument construction, event extraction, environment setup, MCP config lifecycle, and done enrichment
  • pnpm build passes
  • pnpm test passes

Reference

  • src/middleware/runtimes/claude.ts — reference implementation following the same pattern
  • src/middleware/runtimes/claude.test.ts — reference test file with testable subclass pattern
  • google-gemini/gemini-cli source:
    • packages/core/src/output/types.ts — output event type definitions
    • packages/core/src/output/stream-json-formatter.ts — NDJSON formatter mapping internal events to output types
  • Feature shipped in gemini-cli v0.11.0. session_id was added in PR #14504 (Dec 2025); the cached/input token breakdown was added in PR #15021.
  • Internal GeminiEventType has 18 values but the formatter maps/aggregates to the 6 output types — thought, citation, retry, loop_detected etc. are NOT emitted as separate output events.
  • Empirical verification needed: Exact field names on message events (e.g., content vs text) and tool_result emission patterns require capture of actual gemini --output-format stream-json output during implementation.
