Implement CodexCliRuntime concrete class #12

@alexey-pelykh

Context

RemoteClaw's middleware architecture uses CLI subprocesses to interact with AI agents. The abstract base class CLIRuntimeBase (in src/middleware/cli-runtime-base.ts) handles subprocess spawning, NDJSON parsing, watchdog timers, abort signal propagation, and stdin prompt delivery. Concrete runtimes extend it and implement three abstract methods.

This issue implements the Codex CLI runtime — targeting OpenAI's codex CLI from openai/codex.

Architecture

AgentRuntime (interface, src/middleware/types.ts)
  └── CLIRuntimeBase (abstract, src/middleware/cli-runtime-base.ts)
        ├── ClaudeCliRuntime  (src/middleware/runtimes/claude.ts)  ✅ done
        ├── GeminiCliRuntime  (src/middleware/runtimes/gemini.ts)  ✅ done
        └── CodexCliRuntime   ← THIS ISSUE

CLIRuntimeBase requires subclasses to implement:

/** Construct CLI-specific command-line arguments. */
protected abstract buildArgs(params: AgentExecuteParams): string[];

/** Parse a single NDJSON line into an AgentEvent (or null to skip). */
protected abstract extractEvent(line: string): AgentEvent | null;

/** Construct provider-specific environment variables. */
protected abstract buildEnv(params: AgentExecuteParams): Record<string, string>;

Additionally, subclasses may override:

  • get supportsStdinPrompt(): boolean — whether the CLI accepts prompts via stdin (default: true)
  • execute() — to wrap the base execution with per-call setup/teardown

Dependencies

  • src/middleware/types.ts: AgentRuntime, AgentExecuteParams, AgentEvent, AgentRunResult, etc.
  • src/middleware/cli-runtime-base.ts: CLIRuntimeBase abstract class
  • src/middleware/runtimes/claude.ts / gemini.ts: reference implementations (same pattern)

All exist on main.

Specification

File: src/middleware/runtimes/codex.ts

Create CodexCliRuntime extending CLIRuntimeBase.

Constructor

constructor() {
  super("codex"); // CLI binary name
}

get supportsStdinPrompt(): boolean

Override to return false. The Codex CLI accepts prompts only as positional arguments, not via stdin.

buildArgs(params: AgentExecuteParams): string[]

Build the Codex CLI argument list. Codex uses the exec subcommand with different syntax for new vs resumed sessions.

New session:

| Arg | Value | When |
| --- | --- | --- |
| exec | (subcommand) | Always |
| --json | (none) | Always (NDJSON streaming output) |
| --color | never | Always (prevents ANSI escape codes in output) |
| (positional) | params.prompt | Always (new session) |

Result: ["exec", "--json", "--color", "never", params.prompt]

Session resume:

| Arg | Value | When |
| --- | --- | --- |
| exec | (subcommand) | Always |
| resume | (resume sub-subcommand) | When params.sessionId is provided |
| (positional) | params.sessionId | Session ID to resume |
| --json | (none) | Always |
| --color | never | Always |

Result: ["exec", "resume", params.sessionId, "--json", "--color", "never"]

Important: Prompt is excluded on resume. This is a documented Codex CLI limitation — codex exec resume <id> does not accept a new prompt. The agent continues from where it left off. The prompt from params.prompt is ignored when params.sessionId is provided.
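
The two argument layouts above can be sketched as a standalone function. This is a minimal illustration, not the final method: `buildCodexArgs` and the trimmed-down `AgentExecuteParams` shape are assumptions made so the example runs on its own; the real type lives in src/middleware/types.ts.

```typescript
// Assumed minimal shape; the real AgentExecuteParams has more fields.
interface AgentExecuteParams {
  prompt: string;
  sessionId?: string;
}

// Flags shared by new and resumed sessions.
const COMMON_FLAGS: readonly string[] = ["--json", "--color", "never"];

// Hypothetical free-function version of buildArgs().
function buildCodexArgs(params: AgentExecuteParams): string[] {
  if (params.sessionId) {
    // Resume: the session ID is the positional argument and the prompt is dropped.
    return ["exec", "resume", params.sessionId, ...COMMON_FLAGS];
  }
  // New session: the prompt is the trailing positional argument.
  return ["exec", ...COMMON_FLAGS, params.prompt];
}
```

Each array element is passed to the subprocess as its own argument, so the prompt needs no shell quoting as long as the base class spawns without a shell.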

MCP config is not handled via CLI args — see the execute() override section below.

execute() override — state reset + MCP config + done enrichment

Same pattern as GeminiCliRuntime: override execute() to manage per-execution state and MCP config file lifecycle.

async *execute(params: AgentExecuteParams): AsyncIterable<AgentEvent> {
  this.resetState();

  const mcpConfigManager =
    params.mcpServers && Object.keys(params.mcpServers).length > 0
      ? new CodexMcpConfigManager(params.workingDirectory, params.mcpServers)
      : null;

  try {
    await mcpConfigManager?.setup();

    for await (const event of super.execute(params)) {
      if (event.type === "done") {
        this.enrichDoneEvent(event);
      }
      yield event;
    }
  } finally {
    await mcpConfigManager?.teardown();
  }
}

MCP config file management:

The Codex CLI reads MCP config from ~/.codex/config.toml (global) or a project-local equivalent. The config uses TOML format.

The CodexMcpConfigManager follows the same merge-restore pattern as GeminiMcpConfigManager:

  1. Setup: Check for existing config file, save copy if it exists, merge mcp_servers section, write back
  2. Teardown: Restore original or delete created file
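
The merge-restore steps can be sketched as follows. This is an illustration under stated assumptions, not the shape of GeminiMcpConfigManager: file access sits behind a hypothetical `FileStore` interface so the example runs without touching disk, the methods are synchronous for brevity (a real manager would likely use async file APIs), and "merge" is simplified to appending the generated section.

```typescript
// Hypothetical storage abstraction; a real manager would wrap the file system.
interface FileStore {
  read(path: string): string | undefined; // undefined when the file is absent
  write(path: string, content: string): void;
  remove(path: string): void;
}

// In-memory store used only to demonstrate the pattern.
class MemoryStore implements FileStore {
  private files = new Map<string, string>();
  read(path: string): string | undefined { return this.files.get(path); }
  write(path: string, content: string): void { this.files.set(path, content); }
  remove(path: string): void { this.files.delete(path); }
}

class MergeRestoreConfig {
  private store: FileStore;
  private path: string;
  private section: string;
  private original: string | undefined;
  private active = false;

  constructor(store: FileStore, path: string, section: string) {
    this.store = store;
    this.path = path;
    this.section = section;
  }

  // Step 1: remember the existing content (if any), then write the merged file.
  setup(): void {
    this.original = this.store.read(this.path);
    const base = this.original ? this.original.trimEnd() + "\n\n" : "";
    this.store.write(this.path, base + this.section);
    this.active = true;
  }

  // Step 2: restore the saved content, or delete the file this run created.
  teardown(): void {
    if (!this.active) return;
    if (this.original === undefined) this.store.remove(this.path);
    else this.store.write(this.path, this.original);
    this.active = false;
  }
}
```

Appending is only safe when the existing file does not already define a table with the same name, since duplicate tables are invalid TOML; a fuller implementation would need to detect that case.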

Codex TOML MCP format:

[mcp_servers.server_name]
type = "stdio"
command = ["node", "server.js"]

[mcp_servers.server_name.env]
KEY = "VALUE"

Note the differences from our McpServerConfig type:

  • command is an array: [config.command, ...(config.args ?? [])]
  • type = "stdio" is required (always "stdio" for RemoteClaw's MCP servers)

TOML generation: Since the MCP config TOML structure is simple and predictable, generate it manually without a TOML library. The format is straightforward string concatenation of [mcp_servers.<name>] sections. If a TOML library is preferred, check smol-toml (small, zero-dependency).

Implementation note: Check during implementation whether the Codex CLI supports a --config flag pointing to a custom config file. If it does, use a temp directory approach (cleaner). The merge-restore pattern on ~/.codex/config.toml is the fallback.
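
A manual generator for the format described above might look like the sketch below. `renderMcpServersToml`, `tomlString`, `tomlKey`, and the `McpServerConfig` shape are assumptions for illustration. Strings are escaped with JSON.stringify, which matches TOML basic-string escaping for typical values; unusual input such as the DEL character or lone surrogates would need extra handling.

```typescript
// Assumed shape of McpServerConfig; the real type lives in src/middleware/types.ts.
interface McpServerConfig {
  command: string;
  args?: string[];
  env?: Record<string, string>;
}

// Shortcut: JSON string escaping is compatible with TOML basic strings for common input.
const tomlString = (value: string): string => JSON.stringify(value);

// Bare TOML keys allow only A-Z, a-z, 0-9, "_" and "-"; anything else gets quoted.
const tomlKey = (key: string): string =>
  /^[A-Za-z0-9_-]+$/.test(key) ? key : tomlString(key);

function renderMcpServersToml(servers: Record<string, McpServerConfig>): string {
  const blocks: string[] = [];
  for (const [name, config] of Object.entries(servers)) {
    // command is emitted as an array: [command, ...args], as the spec describes.
    const command = [config.command, ...(config.args ?? [])].map(tomlString).join(", ");
    const lines = [
      `[mcp_servers.${tomlKey(name)}]`,
      `type = "stdio"`,
      `command = [${command}]`,
    ];
    const env = Object.entries(config.env ?? {});
    if (env.length > 0) {
      lines.push("", `[mcp_servers.${tomlKey(name)}.env]`);
      for (const [key, value] of env) {
        lines.push(`${tomlKey(key)} = ${tomlString(value)}`);
      }
    }
    blocks.push(lines.join("\n"));
  }
  return blocks.join("\n\n") + "\n";
}
```

Quoting keys that are not valid bare keys keeps a server named, say, "my.server" from being parsed as a nested table.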

extractEvent(line: string): AgentEvent | null

Parse a single NDJSON line from Codex's --json output into an AgentEvent.

Codex --json format (verified from openai/codex SDK source: sdk/typescript/src/events.ts, items.ts and official docs):

Each NDJSON line is bare JSON (no envelope) with a type discriminator. There are 8 event types. Items within events have their own item.type discriminator with 8 sub-types.

Event types:

| Event Type | Description |
| --- | --- |
| thread.started | New thread created, contains thread_id |
| turn.started | Agent turn begins |
| item.started | Item lifecycle start, contains full item data |
| item.updated | Item state update (progressive text for agent_message) |
| item.completed | Item lifecycle end, contains final item data |
| turn.completed | Agent turn ends, contains usage |
| turn.failed | Agent turn failed |
| error | Stream-level error |

Item types (discriminated by item.type):

| Item Type | Description | Relevant For |
| --- | --- | --- |
| agent_message | Text output from agent | AgentTextEvent |
| command_execution | Shell command execution | AgentToolUseEvent / AgentToolResultEvent |
| mcp_tool_call | MCP tool invocation | AgentToolUseEvent / AgentToolResultEvent |
| file_change | File modification | Skip (or AgentToolUseEvent if useful) |
| reasoning | Reasoning/thinking content | Skip |
| web_search | Web search invocation | Skip (or AgentToolUseEvent if useful) |
| error | Error item | AgentErrorEvent |
| todo_list | Task/todo tracking | Skip |

Event mapping:

| Codex Event | Condition | Maps To | Notes |
| --- | --- | --- | --- |
| thread.started | | Skip | Extract thread_id as currentSessionId |
| turn.started | | Skip | Turn lifecycle boundary |
| item.started | item.type === "command_execution" | AgentToolUseEvent | { toolName: "command_execution", toolId, input: { command } } |
| item.started | item.type === "mcp_tool_call" | AgentToolUseEvent | { toolName: item.name, toolId, input: item.arguments } |
| item.started | other | Skip | |
| item.updated | item.type === "agent_message" | AgentTextEvent | Delta computation (see below) |
| item.updated | other | Skip | Intermediate state |
| item.completed | item.type === "agent_message" | AgentTextEvent | Emit final delta if any |
| item.completed | item.type === "command_execution" | AgentToolResultEvent | { toolId, output: item.output, isError: item.exit_code !== 0 } |
| item.completed | item.type === "mcp_tool_call" | AgentToolResultEvent | { toolId, output: item.output, isError: !!item.error } |
| item.completed | item.type === "error" | AgentErrorEvent | { message: item.message } |
| item.completed | other | Skip | |
| turn.completed | | Store for enrichment | Extract usage field |
| turn.failed | | AgentErrorEvent | { message, code: "turn_failed" } |
| error | | AgentErrorEvent | { message } |
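
The tool and error rows of the mapping can be sketched as a stateless function. Several things here are assumptions: the `type` discriminators on the output events ("tool_use", "tool_result", "error"), every Codex item field name (the issue itself flags these as unverified), and the location of the failure message on turn.failed. Text deltas and the state captured from thread.started and turn.completed are left out because they need instance state.

```typescript
// Assumed event shapes; the real AgentEvent union lives in src/middleware/types.ts.
type AgentEvent =
  | { type: "tool_use"; toolName: string; toolId: string; input: unknown }
  | { type: "tool_result"; toolId: string; output: unknown; isError: boolean }
  | { type: "error"; message: string; code?: string };

// Loosely typed view of Codex output; field names are unverified assumptions.
interface CodexItem {
  id?: string;
  type?: string;
  name?: string;
  command?: string;
  arguments?: unknown;
  output?: unknown;
  exit_code?: number;
  error?: unknown;
  message?: string;
}
interface CodexEvent {
  type?: string;
  item?: CodexItem;
  message?: string;
  error?: { message?: string };
}

function mapCodexLine(line: string): AgentEvent | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(line);
  } catch {
    return null; // Not JSON: skip the line rather than abort the stream.
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const event = parsed as CodexEvent;
  const item = event.item;
  const toolId = item?.id ?? "codex-item-unknown"; // placeholder fallback for this sketch

  switch (event.type) {
    case "item.started":
      if (item?.type === "command_execution") {
        return { type: "tool_use", toolName: "command_execution", toolId, input: { command: item.command } };
      }
      if (item?.type === "mcp_tool_call") {
        return { type: "tool_use", toolName: item.name ?? "unknown", toolId, input: item.arguments };
      }
      return null;
    case "item.completed":
      if (item?.type === "command_execution") {
        return { type: "tool_result", toolId, output: item.output, isError: item.exit_code !== 0 };
      }
      if (item?.type === "mcp_tool_call") {
        return { type: "tool_result", toolId, output: item.output, isError: !!item.error };
      }
      if (item?.type === "error") {
        return { type: "error", message: item.message ?? "Unknown Codex item error" };
      }
      return null;
    case "turn.failed":
      return {
        type: "error",
        message: event.error?.message ?? event.message ?? "Codex turn failed",
        code: "turn_failed",
      };
    case "error":
      return { type: "error", message: event.message ?? "Unknown Codex error" };
    default:
      return null; // Lifecycle events, item.updated, and unknown types are skipped here.
  }
}
```

Returning null for unparseable lines and unknown event types keeps the stream alive if a newer Codex release adds events this runtime does not know about.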

Text streaming — delta computation:

The item.updated event for agent_message contains the full accumulated text so far (not just the new delta). To emit incremental AgentTextEvents:

// Instance state:
private lastEmittedTextLength = 0;

// In item.updated handler for agent_message:
const fullText = /* extract text from item content */;
const delta = fullText.substring(this.lastEmittedTextLength);
this.lastEmittedTextLength = fullText.length;
if (delta) {
  this.accumulatedText += delta;
  return { type: "text", text: delta };
}
return null;

Reset lastEmittedTextLength to 0 on each new item.started for agent_message.
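
As a worked example, the delta logic can be isolated in a small helper and fed a sequence of accumulated texts. `TextDeltaTracker` is a hypothetical name used only for this sketch; in the runtime the same state would live on the class itself.

```typescript
class TextDeltaTracker {
  private lastEmittedTextLength = 0;

  // Call on item.started for an agent_message item.
  reset(): void {
    this.lastEmittedTextLength = 0;
  }

  // Returns the not-yet-emitted suffix of fullText, or null when nothing is new.
  next(fullText: string): string | null {
    const delta = fullText.substring(this.lastEmittedTextLength);
    this.lastEmittedTextLength = fullText.length;
    return delta.length > 0 ? delta : null;
  }
}

// Simulated item.updated payloads: each one carries the full text so far.
const tracker = new TextDeltaTracker();
const deltas = ["Hel", "Hello", "Hello", "Hello, world"].map((text) => tracker.next(text));
// deltas is ["Hel", "lo", null, ", world"]
```

The sketch assumes the accumulated text only ever grows by appending. If Codex were to rewrite earlier text within an item, length-based slicing would emit wrong deltas, which is worth checking against captured output.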

Tool ID generation: Codex items have SDK-native IDs. Extract from item.id field. If not present, generate with codex-item-${counter}.

Stateful fields (instance-level, reset per execute() call):

  • currentSessionId: string | undefined (from thread.started event's thread_id)
  • accumulatedText: string (concatenated text from message deltas for AgentRunResult.text)
  • lastEmittedTextLength: number (for delta computation within a message item)
  • lastUsage: AgentUsage | undefined (from turn.completed event's usage field)
  • currentToolId: string | undefined (tracks the active tool item ID for correlating item.started → item.completed)
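
One way the tool ID fallback and currentToolId could work together is sketched below. `ToolIdTracker` is a hypothetical helper, and the sketch assumes at most one tool item is in flight at a time, which has not been verified against real Codex output.

```typescript
interface CodexItemRef {
  id?: string;
}

class ToolIdTracker {
  private counter = 0;
  private currentToolId: string | undefined;

  // On item.started: prefer the SDK-native ID, otherwise mint codex-item-N.
  start(item: CodexItemRef): string {
    this.currentToolId = item.id ?? `codex-item-${++this.counter}`;
    return this.currentToolId;
  }

  // On item.completed: reuse the ID handed out at start when item.id is absent,
  // so the tool result correlates with its tool use.
  complete(item: CodexItemRef): string {
    const toolId = item.id ?? this.currentToolId ?? `codex-item-${++this.counter}`;
    this.currentToolId = undefined;
    return toolId;
  }

  // Called at the start of each execute() along with the other state resets.
  reset(): void {
    this.counter = 0;
    this.currentToolId = undefined;
  }
}
```

If captured output shows that item.id is always present, the fallback path and currentToolId could be dropped entirely.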

Usage extraction (from turn.completed event's usage field):

// turn.completed.usage structure:
{
  input_tokens: number;
  cached_input_tokens: number;
  output_tokens: number;
}

Map to AgentUsage:

  • inputTokens ← usage.input_tokens
  • outputTokens ← usage.output_tokens
  • cacheReadTokens ← usage.cached_input_tokens (when > 0)
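
The mapping is small enough to show in full. `mapCodexUsage` is a hypothetical helper name, and the `AgentUsage` shape below is an assumed subset of the real type in src/middleware/types.ts.

```typescript
interface CodexUsage {
  input_tokens: number;
  cached_input_tokens: number;
  output_tokens: number;
}

// Assumed subset of AgentUsage.
interface AgentUsage {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens?: number;
}

function mapCodexUsage(usage: CodexUsage): AgentUsage {
  const mapped: AgentUsage = {
    inputTokens: usage.input_tokens,
    outputTokens: usage.output_tokens,
  };
  // Only report cache reads when there were any, per the "(when > 0)" rule.
  if (usage.cached_input_tokens > 0) {
    mapped.cacheReadTokens = usage.cached_input_tokens;
  }
  return mapped;
}
```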

buildEnv(params: AgentExecuteParams): Record<string, string>

Return environment variable overrides for the Codex subprocess.

Cross-contamination prevention: The Codex runtime must keep ANTHROPIC_API_KEY out of the Codex subprocess environment to prevent accidental cross-provider auth leakage. Implement this in buildEnv() by overriding the variable with an empty string:

protected buildEnv(_params: AgentExecuteParams): Record<string, string> {
  return {
    ANTHROPIC_API_KEY: "",  // Prevent cross-provider leakage
  };
}

Auth credentials (OPENAI_API_KEY) are passed through params.env by the caller, not hardcoded in the runtime.

Done event enrichment

Same pattern as Claude/Gemini: intercept the done event and enrich AgentRunResult with accumulated state.

Result metadata mapping (from accumulated state → AgentRunResult):

  • text ← accumulated from all agent_message delta events
  • sessionId ← from thread.started event's thread_id
  • usage ← from turn.completed event's usage (last one, if multiple turns)

Note: Codex does not report cost, API duration, or stop reason in its NDJSON output.
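
A minimal sketch of the enrichment step follows. The done event's `result` field, the trimmed `AgentRunResult` shape, and the `CodexRunState` grouping are assumptions for illustration; in the runtime the state would be read from instance fields rather than passed in.

```typescript
// Assumed shapes; the real types live in src/middleware/types.ts.
interface AgentUsage {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens?: number;
}
interface AgentRunResult {
  text: string;
  sessionId?: string;
  usage?: AgentUsage;
}
interface AgentDoneEvent {
  type: "done";
  result: AgentRunResult;
}

// Hypothetical bundle of the per-execution state fields.
interface CodexRunState {
  accumulatedText: string;
  currentSessionId?: string;
  lastUsage?: AgentUsage;
}

// Mutates the event in place, matching how execute() yields it afterwards.
function enrichDoneEvent(event: AgentDoneEvent, state: CodexRunState): void {
  event.result.text = state.accumulatedText;
  if (state.currentSessionId !== undefined) {
    event.result.sessionId = state.currentSessionId;
  }
  // Usage stays unset when no turn.completed event was seen.
  if (state.lastUsage !== undefined) {
    event.result.usage = state.lastUsage;
  }
}
```

Because lastUsage is overwritten on every turn.completed, the last turn's usage wins when a run spans multiple turns.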

File: src/middleware/runtimes/codex.test.ts

Unit tests following the same testable-subclass pattern.

  1. Argument construction (6+ test cases):

    • New session: ["exec", "--json", "--color", "never", "<prompt>"]
    • Session resume: ["exec", "resume", "<session-id>", "--json", "--color", "never"] — no prompt
    • No session: no resume sub-subcommand
    • Verify prompt is excluded on resume
    • Verify --color never always present
    • Verify exec always first arg
  2. Event extraction (12+ test cases):

    • thread.started → skip (but thread_id captured as session ID)
    • turn.started → skip
    • item.started + command_execution → AgentToolUseEvent
    • item.started + mcp_tool_call → AgentToolUseEvent with tool name and arguments
    • item.started + agent_message → skip
    • item.updated + agent_message → AgentTextEvent with delta from accumulated text
    • item.updated + agent_message (multiple) → correct incremental deltas
    • item.completed + command_execution → AgentToolResultEvent with exit_code check
    • item.completed + mcp_tool_call → AgentToolResultEvent
    • item.completed + error → AgentErrorEvent
    • turn.completed → stores usage, returns null
    • turn.failed → AgentErrorEvent
    • error → AgentErrorEvent
    • Unknown event type → skip
  3. Environment construction (2+ test cases):

    • Strips ANTHROPIC_API_KEY (cross-contamination prevention)
    • Does not inject OPENAI_API_KEY (caller responsibility)
  4. MCP config file management (4+ test cases):

    • TOML file created when mcpServers has entries
    • Correct TOML structure ([mcp_servers.<name>] sections)
    • command array correctly formed from McpServerConfig.command + args
    • Cleanup on teardown
  5. supportsStdinPrompt (1 test case):

    • Returns false
  6. Done event enrichment (3+ test cases):

    • Enriches with accumulated text, session ID from thread_id, usage
    • Handles missing usage gracefully
    • Handles multiple turns (uses last turn's usage)

Acceptance Criteria

  • src/middleware/runtimes/codex.ts exists and exports CodexCliRuntime
  • Class extends CLIRuntimeBase and implements all three abstract methods
  • supportsStdinPrompt returns false
  • buildArgs() produces exec --json --color never <prompt> for new sessions
  • buildArgs() produces exec resume <id> --json --color never for resumed sessions (no prompt)
  • extractEvent() correctly maps all 8 Codex event types to AgentEvent types
  • extractEvent() handles the two-level event model: events (thread/turn) + items (8 sub-types)
  • Text streaming uses delta computation from progressive item.updated events
  • Tool events: item.started → AgentToolUseEvent, item.completed → AgentToolResultEvent
  • Session ID extracted from thread.started event's thread_id
  • Usage extracted from turn.completed event's usage field
  • buildEnv() strips ANTHROPIC_API_KEY (cross-contamination prevention)
  • MCP config written to TOML format in Codex config location when mcpServers has entries
  • MCP config file cleaned up after execution (merge-restore pattern)
  • Unit tests cover argument construction, event extraction, environment setup, MCP config, and done enrichment
  • pnpm build passes
  • pnpm test passes

Reference

  • src/middleware/runtimes/claude.ts / gemini.ts — reference implementations
  • src/middleware/runtimes/claude.test.ts / gemini.test.ts — reference test files
  • openai/codex SDK source:
    • sdk/typescript/src/events.ts — event type definitions
    • sdk/typescript/src/items.ts — item type definitions
  • Official Codex non-interactive docs: codex exec --json format
  • Historical: --json was previously --experimental-json; agent_message was previously assistant_message
  • The app-server protocol (codex app-server) uses a richer format with slash-delimited events — NOT relevant for CLI --json output
  • Known limitation: Prompt is excluded on session resume (codex exec resume <id> does not accept a new prompt)
  • Empirical verification needed: Exact item field names (e.g., item.output vs item.result, item.command vs item.input) and item.id availability require capture of actual codex exec --json output during implementation
