Implement GeminiCliRuntime concrete class #10
Description
Context
RemoteClaw's middleware architecture uses CLI subprocesses to interact with AI agents. The abstract base class CLIRuntimeBase (in src/middleware/cli-runtime-base.ts) handles subprocess spawning, NDJSON parsing, watchdog timers, abort signal propagation, and stdin prompt delivery. Concrete runtimes extend it and implement three abstract methods.
This issue implements the Gemini CLI runtime — the second concrete runtime, targeting Google's gemini CLI from google-gemini/gemini-cli.
Architecture
AgentRuntime (interface, src/middleware/types.ts)
└── CLIRuntimeBase (abstract, src/middleware/cli-runtime-base.ts)
├── ClaudeCliRuntime (src/middleware/runtimes/claude.ts) ✅ done
└── GeminiCliRuntime ← THIS ISSUE
CLIRuntimeBase requires subclasses to implement:
/** Construct CLI-specific command-line arguments. */
protected abstract buildArgs(params: AgentExecuteParams): string[];
/** Parse a single NDJSON line into an AgentEvent (or null to skip). */
protected abstract extractEvent(line: string): AgentEvent | null;
/** Construct provider-specific environment variables. */
protected abstract buildEnv(params: AgentExecuteParams): Record<string, string>;
Additionally, subclasses may override:
- `get supportsStdinPrompt(): boolean`: whether the CLI accepts prompts via stdin (default: `true`)
- `execute()`: to wrap the base execution with per-call setup/teardown
Dependencies
- `src/middleware/types.ts`: `AgentRuntime`, `AgentExecuteParams`, `AgentEvent`, `AgentRunResult`, etc.
- `src/middleware/cli-runtime-base.ts`: `CLIRuntimeBase` abstract class
- `src/middleware/runtimes/claude.ts`: reference implementation (same pattern)
All exist on main.
Specification
File: src/middleware/runtimes/gemini.ts
Create GeminiCliRuntime extending CLIRuntimeBase.
Constructor
constructor() {
  super("gemini"); // CLI binary name
}
get supportsStdinPrompt(): boolean
Override to return false. The Gemini CLI does not support stdin prompt delivery — prompts must be passed via the -p flag.
buildArgs(params: AgentExecuteParams): string[]
Build the Gemini CLI argument list:
| Flag | Value | When |
|---|---|---|
| `--output-format` | `stream-json` | Always: NDJSON streaming output |
| `-p` | `params.prompt` | Always: prompt delivery via flag |
| `-r` | `params.sessionId` | When `params.sessionId` is provided |
Note on prompt delivery: Unlike Claude (positional arg), Gemini requires the -p flag for prompt delivery. Since supportsStdinPrompt is false, CLIRuntimeBase will not attempt stdin delivery regardless of prompt length. The -p flag should always be included.
Note on missing flags: The --verbose flag (used by Claude) is not applicable to Gemini. The stream-json output format already includes all available metadata in the result event.
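The flag table above can be sketched as a small pure function. This is a minimal illustration, not the final implementation: `ExecuteParamsSketch` and `buildGeminiArgs` are hypothetical names, and the real method would take the full `AgentExecuteParams` from `src/middleware/types.ts`.

```typescript
// Hypothetical, trimmed-down stand-in for AgentExecuteParams.
interface ExecuteParamsSketch {
  prompt: string;
  sessionId?: string;
}

// Builds the Gemini CLI argument list described in the flag table.
function buildGeminiArgs(params: ExecuteParamsSketch): string[] {
  // Always request NDJSON output and always pass the prompt via -p.
  const args = ["--output-format", "stream-json", "-p", params.prompt];
  // Only add the resume flag when a session ID was provided.
  if (params.sessionId) {
    args.push("-r", params.sessionId);
  }
  return args;
}
```

Because the arguments are returned as an array rather than a joined string, the prompt is passed to the subprocess as a single argument regardless of the spaces or quotes it contains.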
execute() override — MCP config file lifecycle
The Gemini CLI reads MCP server configuration from a settings file (.gemini/settings.json in the working directory or ~/.gemini/settings.json globally). There is no --mcp-config CLI flag.
When params.mcpServers has entries, the runtime must manage a project-local settings file:
async *execute(params: AgentExecuteParams): AsyncIterable<AgentEvent> {
  // Clear per-call state (session ID, accumulated text, result stats).
  this.resetState();
  // Only manage a settings file when at least one MCP server is configured.
  const mcpConfigManager =
    params.mcpServers && Object.keys(params.mcpServers).length > 0
      ? new GeminiMcpConfigManager(params.workingDirectory, params.mcpServers)
      : null;
  try {
    await mcpConfigManager?.setup();
    for await (const event of super.execute(params)) {
      if (event.type === "done") {
        this.enrichDoneEvent(event);
      }
      yield event;
    }
  } finally {
    // Runs on success, on error, and when the consumer stops iterating early.
    await mcpConfigManager?.teardown();
  }
}
MCP config file management (internal helper class or methods):
- Setup:
  - Check if `.gemini/settings.json` exists in `params.workingDirectory`
  - If it exists: read it, save a copy, merge the `mcpServers` key into it, write it back
  - If it doesn't exist: create the `.gemini/` directory if needed, write `{ "mcpServers": {...} }`
  - Track what was created (directory, file, or just modified) for cleanup
- Teardown (in `finally` block, always runs):
  - If the original file existed: restore the saved copy
  - If the file was created: delete it
  - If the `.gemini/` directory was created (was empty): rmdir it
Gemini settings.json MCP format:
{
  "mcpServers": {
    "<server-name>": {
      "command": "<command>",
      "args": ["<arg1>", "<arg2>"],
      "env": { "<KEY>": "<VALUE>" }
    }
  }
}
Implementation note: Check during implementation whether the Gemini CLI supports a --settings-dir or --config flag. If it does, a cleaner approach would be to create a temp directory with the settings file and point the flag there, avoiding any file collision concerns. The merge-restore pattern described above is the fallback.
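For the merge-restore fallback, one option is to keep the merge and the teardown decision in a pure function and leave the actual file reads and writes to the caller. The sketch below assumes that approach; `planSettings`, `SettingsPlan`, and `TeardownAction` are hypothetical names, and the choice to let per-call servers override existing entries of the same name is an assumption the spec does not settle. Directory creation and removal are not covered here.

```typescript
// Shape of one MCP server entry, following the settings.json format above.
interface McpServerEntry {
  command: string;
  args?: string[];
  env?: Record<string, string>;
}

// What the finally block should do once execution ends.
type TeardownAction =
  | { kind: "restore"; contents: string } // original file existed: write it back verbatim
  | { kind: "delete" }; // file was created by the runtime: remove it

interface SettingsPlan {
  contents: string; // JSON text to write to .gemini/settings.json
  teardown: TeardownAction;
}

// `existing` is the raw text of the settings file, or null when it does not exist.
function planSettings(
  existing: string | null,
  mcpServers: Record<string, McpServerEntry>,
): SettingsPlan {
  if (existing === null) {
    return {
      contents: JSON.stringify({ mcpServers }, null, 2),
      teardown: { kind: "delete" },
    };
  }
  const parsed = JSON.parse(existing) as Record<string, unknown>;
  const current = (parsed.mcpServers ?? {}) as Record<string, McpServerEntry>;
  // Assumption: per-call servers win over existing entries with the same name.
  const merged = { ...parsed, mcpServers: { ...current, ...mcpServers } };
  return {
    contents: JSON.stringify(merged, null, 2),
    teardown: { kind: "restore", contents: existing },
  };
}
```

Restoring the saved raw text, rather than re-serializing the parsed object, leaves the user's original formatting and key order untouched after teardown.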
extractEvent(line: string): AgentEvent | null
Parse a single NDJSON line from Gemini's stream-json output into an AgentEvent.
Gemini stream-json format (verified from google-gemini/gemini-cli source: packages/core/src/output/types.ts, stream-json-formatter.ts):
Each NDJSON line is bare JSON (no envelope) with a type discriminator and timestamp base field. There are 6 output event types.
Event mapping:
| Gemini `type` | Condition | Maps To | Notes |
|---|---|---|---|
| `init` | - | Skip | Extract `session_id` as `currentSessionId` |
| `message` | `delta === true` AND `role === "assistant"` | `AgentTextEvent` | `{ type: "text", text: content }` |
| `message` | `delta === false` OR `role !== "assistant"` | Skip | Final message echo or user message |
| `tool_use` | - | `AgentToolUseEvent` | `{ type: "tool_use", toolName: tool_name, toolId: tool_id, input: parameters }` |
| `tool_result` | - | `AgentToolResultEvent` | `{ type: "tool_result", toolId: tool_id, output: output, isError: status === "error" }` |
| `error` | - | `AgentErrorEvent` | `{ type: "error", message: message, code: severity }` |
| `result` | - | Store for enrichment | Extract `stats` for usage data; do not emit directly |
Stateful fields (instance-level, reset per execute() call):
- `currentSessionId: string | undefined`, from the `init` event's `session_id` field
- `accumulatedText: string`, concatenated text from `message` deltas for `AgentRunResult.text`
- `resultStats: GeminiResultStats | undefined`, from the `result` event's `stats` field
Session ID tracking: The init event contains a session_id field. Capture it into instance state. Include in AgentRunResult via done event enrichment.
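The event mapping and stateful fields above can be sketched as a standalone class. This is an illustration under stated assumptions: `SketchEvent` and `GeminiEventExtractor` are hypothetical stand-ins for the real `AgentEvent` union and runtime class, the `content` field name on `message` events still needs empirical verification, and returning `null` for malformed JSON is an assumption rather than something the spec requires.

```typescript
// Hypothetical stand-ins for the AgentEvent union in src/middleware/types.ts.
type SketchEvent =
  | { type: "text"; text: string }
  | { type: "tool_use"; toolName: string; toolId: string; input: unknown }
  | { type: "tool_result"; toolId: string; output: unknown; isError: boolean }
  | { type: "error"; message: string; code: string | undefined };

class GeminiEventExtractor {
  currentSessionId: string | undefined;
  accumulatedText = "";
  resultStats: Record<string, number> | undefined;

  extractEvent(line: string): SketchEvent | null {
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      return null; // assumption: malformed lines are skipped, not surfaced as errors
    }
    if (typeof parsed !== "object" || parsed === null) return null;
    const raw = parsed as Record<string, unknown>;
    switch (raw.type) {
      case "init":
        // Capture the session ID but emit nothing.
        this.currentSessionId = raw.session_id as string | undefined;
        return null;
      case "message": {
        // Only assistant deltas become text events; echoes and user messages are skipped.
        if (raw.delta !== true || raw.role !== "assistant") return null;
        const text = String(raw.content ?? "");
        this.accumulatedText += text;
        return { type: "text", text };
      }
      case "tool_use":
        return { type: "tool_use", toolName: String(raw.tool_name), toolId: String(raw.tool_id), input: raw.parameters };
      case "tool_result":
        return { type: "tool_result", toolId: String(raw.tool_id), output: raw.output, isError: raw.status === "error" };
      case "error":
        return { type: "error", message: String(raw.message), code: raw.severity as string | undefined };
      case "result":
        // Stored for done-event enrichment; never emitted directly.
        this.resultStats = raw.stats as Record<string, number> | undefined;
        return null;
      default:
        return null; // unknown event types are skipped
    }
  }
}
```

Note that this sketch treats a `message` with no `delta` field the same as `delta: false`, which is the conservative reading of the table.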
Usage extraction (from result event's stats field):
// result.stats structure:
{
  total_tokens: number;
  input_tokens: number;
  output_tokens: number;
  cached: number;       // cache read tokens
  duration_ms: number;  // API duration
  tool_calls: number;   // number of tool invocations (analogous to "turns")
}
Map to AgentUsage:
- `inputTokens` ← `stats.input_tokens`
- `outputTokens` ← `stats.output_tokens`
- `cacheReadTokens` ← `stats.cached` (when > 0)

Additional result metadata:
- `apiDurationMs` ← `stats.duration_ms`
- `numTurns` ← `stats.tool_calls`
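The mapping above, including the "when > 0" rule for cached tokens, could look roughly like this. `UsageSketch`, `ResultMetadataSketch`, and `mapResultStats` are hypothetical names; the real `AgentUsage` and `AgentRunResult` types live in `src/middleware/types.ts` and may carry more fields.

```typescript
// Field names follow the result.stats structure listed above.
interface GeminiResultStats {
  total_tokens: number;
  input_tokens: number;
  output_tokens: number;
  cached: number;
  duration_ms: number;
  tool_calls: number;
}

// Hypothetical, reduced shapes for illustration only.
interface UsageSketch {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens?: number;
}

interface ResultMetadataSketch {
  usage?: UsageSketch;
  apiDurationMs?: number;
  numTurns?: number;
}

function mapResultStats(stats: GeminiResultStats | undefined): ResultMetadataSketch {
  // No result event was seen: leave every field unset instead of reporting zeros.
  if (!stats) return {};
  const usage: UsageSketch = {
    inputTokens: stats.input_tokens,
    outputTokens: stats.output_tokens,
  };
  // Only report cache reads when there were any.
  if (stats.cached > 0) usage.cacheReadTokens = stats.cached;
  return { usage, apiDurationMs: stats.duration_ms, numTurns: stats.tool_calls };
}
```

Returning an empty object for missing stats keeps the done-event enrichment a plain object spread, with no special casing at the call site.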
Note on `tool_result` events: The Gemini CLI source defines a `tool_result` output type, but channel adapter behavior with this event type should be tested during integration. The formatter maps internal events to output types; actual emission of `tool_result` may vary depending on tool execution patterns.
buildEnv(params: AgentExecuteParams): Record<string, string>
Return environment variable overrides for the Gemini subprocess. Currently returns an empty record {}.
Auth credentials (GEMINI_API_KEY) are passed through params.env by the caller, not hardcoded in the runtime. The runtime should not assume any particular auth mechanism.
Done event enrichment
Same pattern as ClaudeCliRuntime: override execute() to intercept the done event from CLIRuntimeBase and enrich AgentRunResult with accumulated state.
Result metadata mapping (from accumulated state → AgentRunResult):
- `text` ← accumulated from all `message` delta events
- `sessionId` ← from `init` event's `session_id`
- `usage` ← from `result.stats` (see usage extraction above)
- `apiDurationMs` ← from `result.stats.duration_ms`
- `numTurns` ← from `result.stats.tool_calls`
File: src/middleware/runtimes/gemini.test.ts
Unit tests following the same pattern as claude.test.ts (testable subclass exposing protected methods).
- Argument construction (5+ test cases):
  - Basic invocation: `--output-format stream-json -p <prompt>`
  - Session resume: adds `-r <session-id>`
  - No session: no `-r` flag
  - Prompt always via `-p` flag (not positional)
  - All flags present: session + prompt combined
- Event extraction (10+ test cases):
  - `init` → skip (but session ID captured)
  - `message` with `delta: true`, `role: "assistant"` → `AgentTextEvent`
  - `message` with `delta: false` → skip (final message)
  - `message` with `role: "user"` → skip
  - `tool_use` → `AgentToolUseEvent` with `toolName`, `toolId`, `input`
  - `tool_result` with `status: "success"` → `AgentToolResultEvent` with `isError: false`
  - `tool_result` with `status: "error"` → `AgentToolResultEvent` with `isError: true`
  - `error` → `AgentErrorEvent` with `message` and `code` from `severity`
  - `result` → stores stats, returns null
  - Unknown event type → skip (returns `null`)
  - Text accumulation across multiple message deltas
- Environment construction (2+ test cases):
  - Returns empty record (no hardcoded env vars)
  - Does not inject auth vars (caller responsibility)
- MCP config file management (4+ test cases):
  - Settings file is created when `mcpServers` has entries (mock filesystem or check args)
  - Settings file is not created when `mcpServers` is empty/undefined
  - Correct JSON structure written (`{ "mcpServers": {...} }`)
  - Cleanup restores original file if it existed
- `supportsStdinPrompt` (1 test case):
  - Returns `false`
- Done event enrichment (3+ test cases):
  - Enriches with accumulated text, session ID, usage from stats
  - Maps `duration_ms` to `apiDurationMs`, `tool_calls` to `numTurns`
  - Handles missing stats gracefully
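The testable-subclass pattern mentioned at the top of this section can be sketched as follows, using the environment-construction case as the example. All class and method names here are hypothetical; `claude.test.ts` may name its helpers differently, and `RuntimeBaseSketch` only stands in for `CLIRuntimeBase`.

```typescript
// Parameter shape reduced to the one field relevant to this test.
type EnvParamsSketch = { env?: Record<string, string> };

// Hypothetical stand-in for CLIRuntimeBase, reduced to the method under test.
abstract class RuntimeBaseSketch {
  protected abstract buildEnv(params: EnvParamsSketch): Record<string, string>;
}

class GeminiRuntimeSketch extends RuntimeBaseSketch {
  protected buildEnv(_params: EnvParamsSketch): Record<string, string> {
    // No hardcoded variables: auth credentials come from the caller's params.env.
    return {};
  }
}

// Test-only subclass: re-exposes the protected method under a public name.
class TestableGeminiRuntime extends GeminiRuntimeSketch {
  public testBuildEnv(params: EnvParamsSketch): Record<string, string> {
    return this.buildEnv(params);
  }
}
```

A test can then call `new TestableGeminiRuntime().testBuildEnv({ env: { GEMINI_API_KEY: "k" } })` and check that the returned record is empty, which covers both environment-construction cases without spawning a subprocess.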
Acceptance Criteria
- `src/middleware/runtimes/gemini.ts` exists and exports `GeminiCliRuntime`
- Class extends `CLIRuntimeBase` and implements all three abstract methods
- `supportsStdinPrompt` returns `false`
- `buildArgs()` produces correct Gemini CLI flags (`--output-format stream-json -p <prompt>`)
- `buildArgs()` adds `-r <session-id>` when `sessionId` is provided
- `extractEvent()` correctly maps all 6 Gemini event types to `AgentEvent` types
- `extractEvent()` is stateful: accumulates text, tracks session ID from `init`, stores result stats
- Session ID is extracted from `init` event and included in the done result
- Usage metadata is populated from the `result.stats` field
- MCP server config is written to `.gemini/settings.json` in the working directory when `mcpServers` has entries
- MCP config file is cleaned up after execution completes (including on error)
- Existing `.gemini/settings.json` is preserved (merge-restore pattern)
- Unit tests cover argument construction, event extraction, environment setup, MCP config lifecycle, and done enrichment
- `pnpm build` passes
- `pnpm test` passes
Reference
- `src/middleware/runtimes/claude.ts`: reference implementation following the same pattern
- `src/middleware/runtimes/claude.test.ts`: reference test file with testable subclass pattern
- `google-gemini/gemini-cli` source:
  - `packages/core/src/output/types.ts`: output event type definitions
  - `packages/core/src/output/stream-json-formatter.ts`: NDJSON formatter mapping internal events to output types
- Feature shipped in gemini-cli v0.11.0. `session_id` added in PR feat(agents): detect and recover from consecutive empty model responses (0 output tokens) openclaw/openclaw#14504 (Dec 2025). Token `cached`/`input` breakdown added in PR fix: session file locks not released after write (#15000) openclaw/openclaw#15021.
- Internal `GeminiEventType` has 18 values but the formatter maps/aggregates to the 6 output types: `thought`, `citation`, `retry`, `loop_detected`, etc. are NOT emitted as separate output events.
- Empirical verification needed: Exact field names on `message` events (e.g., `content` vs `text`) and `tool_result` emission patterns require capture of actual `gemini --output-format stream-json` output during implementation.