Implement CodexCliRuntime concrete class #12
Description
Context
RemoteClaw's middleware architecture uses CLI subprocesses to interact with AI agents. The abstract base class CLIRuntimeBase (in src/middleware/cli-runtime-base.ts) handles subprocess spawning, NDJSON parsing, watchdog timers, abort signal propagation, and stdin prompt delivery. Concrete runtimes extend it and implement three abstract methods.
This issue implements the Codex CLI runtime — targeting OpenAI's codex CLI from openai/codex.
Architecture
AgentRuntime (interface, src/middleware/types.ts)
└── CLIRuntimeBase (abstract, src/middleware/cli-runtime-base.ts)
    ├── ClaudeCliRuntime (src/middleware/runtimes/claude.ts) ✅ done
    ├── GeminiCliRuntime (src/middleware/runtimes/gemini.ts) ✅ done
    └── CodexCliRuntime ← THIS ISSUE
CLIRuntimeBase requires subclasses to implement:

    /** Construct CLI-specific command-line arguments. */
    protected abstract buildArgs(params: AgentExecuteParams): string[];

    /** Parse a single NDJSON line into an AgentEvent (or null to skip). */
    protected abstract extractEvent(line: string): AgentEvent | null;

    /** Construct provider-specific environment variables. */
    protected abstract buildEnv(params: AgentExecuteParams): Record<string, string>;

Additionally, subclasses may override:
- `get supportsStdinPrompt(): boolean`: whether the CLI accepts prompts via stdin (default: `true`)
- `execute()`: to wrap the base execution with per-call setup/teardown
Dependencies
- `src/middleware/types.ts`: `AgentRuntime`, `AgentExecuteParams`, `AgentEvent`, `AgentRunResult`, etc.
- `src/middleware/cli-runtime-base.ts`: `CLIRuntimeBase` abstract class
- `src/middleware/runtimes/claude.ts` / `gemini.ts`: reference implementations (same pattern)
All exist on main.
Specification
File: src/middleware/runtimes/codex.ts
Create CodexCliRuntime extending CLIRuntimeBase.
Constructor

    constructor() {
      super("codex"); // CLI binary name
    }

get supportsStdinPrompt(): boolean
Override to return false. The Codex CLI accepts prompts only as positional arguments, not via stdin.
buildArgs(params: AgentExecuteParams): string[]
Build the Codex CLI argument list. Codex uses the exec subcommand with different syntax for new vs resumed sessions.
New session:
| Arg | Value | When |
|---|---|---|
| `exec` | (subcommand) | Always |
| `--json` | (none) | Always (NDJSON streaming output) |
| `--color` | `never` | Always (prevents ANSI escape codes in output) |
| (positional) | `params.prompt` | Always (new session) |
Result: ["exec", "--json", "--color", "never", params.prompt]
Session resume:
| Arg | Value | When |
|---|---|---|
| `exec` | (subcommand) | Always |
| `resume` | (resume sub-subcommand) | When `params.sessionId` is provided |
| (positional) | `params.sessionId` | Session ID to resume |
| `--json` | (none) | Always |
| `--color` | `never` | Always |
Result: ["exec", "resume", params.sessionId, "--json", "--color", "never"]
Important: The prompt is excluded on resume. This spec treats it as a Codex CLI limitation: `codex exec resume <id>` does not accept a new prompt, so the agent continues from where it left off and `params.prompt` is ignored whenever `params.sessionId` is provided. Confirm this against the installed CLI version during implementation, since resume behavior may differ between releases.
MCP config is not handled via CLI args — see the execute() override section below.
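The two argument layouts above can be sketched as a pure function. This is a minimal illustration, not the final method: `ArgParams` and `buildCodexArgs` are hypothetical names, and the real `buildArgs()` receives the full `AgentExecuteParams`.

```typescript
// Hypothetical minimal shape; the real AgentExecuteParams lives in
// src/middleware/types.ts and carries more fields.
interface ArgParams {
  prompt: string;
  sessionId?: string;
}

/** Builds the argument list for a new or a resumed Codex session. */
function buildCodexArgs(params: ArgParams): string[] {
  // Flags shared by both layouts.
  const outputFlags = ["--json", "--color", "never"];

  if (params.sessionId) {
    // Resume: the session ID is positional and the prompt is dropped.
    return ["exec", "resume", params.sessionId, ...outputFlags];
  }

  // New session: the prompt is the final positional argument.
  return ["exec", ...outputFlags, params.prompt];
}

const newSessionArgs = buildCodexArgs({ prompt: "List the files" });
const resumeArgs = buildCodexArgs({ prompt: "ignored", sessionId: "abc-123" });
```

Keeping the shared flags in one array makes it harder for the two layouts to drift apart when a flag is added later.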
execute() override — state reset + MCP config + done enrichment
Same pattern as GeminiCliRuntime: override execute() to manage per-execution state and MCP config file lifecycle.

    async *execute(params: AgentExecuteParams): AsyncIterable<AgentEvent> {
      this.resetState();

      const mcpConfigManager =
        params.mcpServers && Object.keys(params.mcpServers).length > 0
          ? new CodexMcpConfigManager(params.workingDirectory, params.mcpServers)
          : null;

      try {
        await mcpConfigManager?.setup();
        for await (const event of super.execute(params)) {
          if (event.type === "done") {
            this.enrichDoneEvent(event);
          }
          yield event;
        }
      } finally {
        await mcpConfigManager?.teardown();
      }
    }

MCP config file management:
The Codex CLI reads MCP config from ~/.codex/config.toml (global) or a project-local equivalent. The config uses TOML format.
The CodexMcpConfigManager follows the same merge-restore pattern as GeminiMcpConfigManager:
- Setup: Check for an existing config file, save a copy if it exists, merge the `mcp_servers` section, write back
- Teardown: Restore the original or delete the created file
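The merge-restore lifecycle can be illustrated without touching disk. The sketch below is an assumption-laden stand-in, not `CodexMcpConfigManager` itself: `InMemoryStore` and `MergeRestoreConfig` are hypothetical names, the store replaces real file I/O, and "merge" is reduced to appending a pre-rendered section.

```typescript
// Hypothetical in-memory file store so the sketch runs without file I/O.
class InMemoryStore {
  private files = new Map<string, string>();
  read(path: string): string | undefined { return this.files.get(path); }
  write(path: string, content: string): void { this.files.set(path, content); }
  remove(path: string): void { this.files.delete(path); }
}

class MergeRestoreConfig {
  private store: InMemoryStore;
  private path: string;
  private section: string;
  private original: string | undefined;
  private active = false;

  constructor(store: InMemoryStore, path: string, section: string) {
    this.store = store;
    this.path = path;
    this.section = section;
  }

  /** Saves the current file content (if any) and writes the merged version. */
  setup(): void {
    this.original = this.store.read(this.path);
    // Naive merge: append the section. A real merge must also handle a
    // server name that already exists in the user's config.
    const base = this.original ? this.original.trimEnd() + "\n\n" : "";
    this.store.write(this.path, base + this.section);
    this.active = true;
  }

  /** Restores the saved content, or deletes the file if setup created it. */
  teardown(): void {
    if (!this.active) return; // setup never ran or teardown already ran
    if (this.original === undefined) {
      this.store.remove(this.path);
    } else {
      this.store.write(this.path, this.original);
    }
    this.active = false;
  }
}
```

Because `teardown()` is a no-op unless `setup()` completed, it is safe to call from a `finally` block even when setup throws partway through.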
Codex TOML MCP format:

    [mcp_servers.server_name]
    type = "stdio"
    command = ["node", "server.js"]

    [mcp_servers.server_name.env]
    KEY = "VALUE"

Note the differences from our McpServerConfig type:
- `command` is an array: `[config.command, ...(config.args ?? [])]`
- `type = "stdio"` is required (always `"stdio"` for RemoteClaw's MCP servers)
TOML generation: Since the MCP config TOML structure is simple and predictable, generate it manually without a TOML library. The format is straightforward string concatenation of [mcp_servers.<name>] sections. If a TOML library is preferred, check smol-toml (small, zero-dependency).
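A minimal sketch of the manual generation approach, assuming the TOML shape shown above is what the installed Codex CLI expects (verify this during implementation). `McpServerConfigLike`, `tomlString`, `tomlKey`, and `renderMcpServersToml` are hypothetical names. The string quoting reuses JSON escaping, which matches TOML basic-string escaping for ordinary printable values but is not a complete TOML encoder.

```typescript
// Hypothetical stand-in for the project's McpServerConfig type.
interface McpServerConfigLike {
  command: string;
  args?: string[];
  env?: Record<string, string>;
}

/** Quotes a value as a TOML basic string (JSON escaping, see caveat above). */
function tomlString(value: string): string {
  return JSON.stringify(value);
}

/** Uses a bare key when TOML allows it, otherwise a quoted key. */
function tomlKey(name: string): string {
  return /^[A-Za-z0-9_-]+$/.test(name) ? name : tomlString(name);
}

/** Renders one [mcp_servers.<name>] section per configured server. */
function renderMcpServersToml(servers: Record<string, McpServerConfigLike>): string {
  const sections: string[] = [];
  for (const name of Object.keys(servers)) {
    const config = servers[name];
    const key = tomlKey(name);
    // command is an array: the executable followed by its arguments.
    const command = [config.command, ...(config.args ?? [])].map(tomlString).join(", ");
    const lines = [`[mcp_servers.${key}]`, `type = "stdio"`, `command = [${command}]`];

    const env = config.env ?? {};
    const envKeys = Object.keys(env);
    if (envKeys.length > 0) {
      lines.push("", `[mcp_servers.${key}.env]`);
      for (const envKey of envKeys) {
        lines.push(`${tomlKey(envKey)} = ${tomlString(env[envKey])}`);
      }
    }
    sections.push(lines.join("\n"));
  }
  return sections.join("\n\n") + "\n";
}

const exampleToml = renderMcpServersToml({
  files: { command: "node", args: ["server.js"], env: { KEY: "VALUE" } },
});
```

If server names or environment values can contain unusual characters, switching to a TOML library is the safer choice than extending this escaping by hand.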
Implementation note: Check during implementation whether the Codex CLI supports a --config flag pointing to a custom config file. If it does, use a temp directory approach (cleaner). The merge-restore pattern on ~/.codex/config.toml is the fallback.
extractEvent(line: string): AgentEvent | null
Parse a single NDJSON line from Codex's --json output into an AgentEvent.
Codex --json format (verified from openai/codex SDK source: sdk/typescript/src/events.ts, items.ts and official docs):
Each NDJSON line is bare JSON (no envelope) with a type discriminator. There are 8 event types. Items within events have their own item.type discriminator with 8 sub-types.
Event types:
| Event Type | Description |
|---|---|
| `thread.started` | New thread created, contains `thread_id` |
| `turn.started` | Agent turn begins |
| `item.started` | Item lifecycle start, contains full item data |
| `item.updated` | Item state update (progressive text for `agent_message`) |
| `item.completed` | Item lifecycle end, contains final item data |
| `turn.completed` | Agent turn ends, contains `usage` |
| `turn.failed` | Agent turn failed |
| `error` | Stream-level error |
Item types (discriminated by item.type):
| Item Type | Description | Relevant For |
|---|---|---|
| `agent_message` | Text output from agent | `AgentTextEvent` |
| `command_execution` | Shell command execution | `AgentToolUseEvent` / `AgentToolResultEvent` |
| `mcp_tool_call` | MCP tool invocation | `AgentToolUseEvent` / `AgentToolResultEvent` |
| `file_change` | File modification | Skip (or `AgentToolUseEvent` if useful) |
| `reasoning` | Reasoning/thinking content | Skip |
| `web_search` | Web search invocation | Skip (or `AgentToolUseEvent` if useful) |
| `error` | Error item | `AgentErrorEvent` |
| `todo_list` | Task/todo tracking | Skip |
Event mapping:
| Codex Event | Condition | Maps To | Notes |
|---|---|---|---|
| `thread.started` | (any) | Skip | Extract `thread_id` as `currentSessionId` |
| `turn.started` | (any) | Skip | Turn lifecycle boundary |
| `item.started` | `item.type === "command_execution"` | `AgentToolUseEvent` | `{ toolName: "command_execution", toolId, input: { command } }` |
| `item.started` | `item.type === "mcp_tool_call"` | `AgentToolUseEvent` | `{ toolName: item.name, toolId, input: item.arguments }` |
| `item.started` | other | Skip | |
| `item.updated` | `item.type === "agent_message"` | `AgentTextEvent` | Delta computation (see below) |
| `item.updated` | other | Skip | Intermediate state |
| `item.completed` | `item.type === "agent_message"` | `AgentTextEvent` | Emit final delta if any |
| `item.completed` | `item.type === "command_execution"` | `AgentToolResultEvent` | `{ toolId, output: item.output, isError: item.exit_code !== 0 }` |
| `item.completed` | `item.type === "mcp_tool_call"` | `AgentToolResultEvent` | `{ toolId, output: item.output, isError: !!item.error }` |
| `item.completed` | `item.type === "error"` | `AgentErrorEvent` | `{ message: item.message }` |
| `item.completed` | other | Skip | |
| `turn.completed` | (any) | Store for enrichment | Extract `usage` field |
| `turn.failed` | (any) | `AgentErrorEvent` | `{ message, code: "turn_failed" }` |
| `error` | (any) | `AgentErrorEvent` | `{ message }` |
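The mapping table can be read as a two-level dispatch: first on the event `type`, then on `item.type`. The sketch below covers only the tool and error rows and leaves text and usage handling out. Everything in it is illustrative: `SketchEvent` and `CodexEventMapper` are hypothetical, the `type` tags `"tool_use"`, `"tool_result"`, and `"error"` are assumed rather than taken from `src/middleware/types.ts`, and the item field names still need the empirical verification noted under Reference.

```typescript
// Hypothetical event shapes; the real AgentEvent types may differ.
type SketchEvent =
  | { type: "tool_use"; toolName: string; toolId: string; input: unknown }
  | { type: "tool_result"; toolId: string; output: string; isError: boolean }
  | { type: "error"; message: string; code?: string };

class CodexEventMapper {
  sessionId: string | undefined;
  private currentToolId: string | undefined;
  private counter = 0;

  /** Maps one NDJSON line to an event, or null when the line is skipped. */
  extract(line: string): SketchEvent | null {
    let event: any;
    try {
      event = JSON.parse(line);
    } catch {
      return null; // Malformed line: skip it instead of failing the stream.
    }
    if (typeof event !== "object" || event === null) return null;
    const item = event.item ?? {};

    switch (event.type) {
      case "thread.started":
        this.sessionId = event.thread_id; // Captured, but nothing is emitted.
        return null;
      case "item.started":
        return this.mapStarted(item);
      case "item.completed":
        return this.mapCompleted(item);
      case "turn.failed":
        // Assumption: the failure text sits in event.error.message.
        return { type: "error", message: String(event.error?.message ?? "turn failed"), code: "turn_failed" };
      case "error":
        return { type: "error", message: String(event.message ?? "") };
      default:
        return null; // turn.started, item.updated, unknown types
    }
  }

  private mapStarted(item: any): SketchEvent | null {
    if (item.type !== "command_execution" && item.type !== "mcp_tool_call") return null;
    // Prefer the SDK-native ID; fall back to a generated one.
    const toolId: string = item.id ?? `codex-item-${++this.counter}`;
    this.currentToolId = toolId;
    return item.type === "command_execution"
      ? { type: "tool_use", toolName: "command_execution", toolId, input: { command: item.command } }
      : { type: "tool_use", toolName: String(item.name), toolId, input: item.arguments };
  }

  private mapCompleted(item: any): SketchEvent | null {
    // Without a native ID, correlate with the most recently started tool.
    const toolId: string = item.id ?? this.currentToolId ?? "unknown";
    switch (item.type) {
      case "command_execution":
        return { type: "tool_result", toolId, output: String(item.output ?? ""), isError: item.exit_code !== 0 };
      case "mcp_tool_call":
        return { type: "tool_result", toolId, output: String(item.output ?? ""), isError: !!item.error };
      case "error":
        return { type: "error", message: String(item.message ?? "") };
      default:
        return null;
    }
  }
}
```

Returning `null` for anything unrecognized keeps the runtime tolerant of event or item types added in later CLI releases.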
Text streaming — delta computation:
The item.updated event for agent_message contains the full accumulated text so far (not just the new delta). To emit incremental AgentTextEvents:

    // Instance state:
    private lastEmittedTextLength = 0;

    // In item.updated handler for agent_message:
    const fullText = /* extract text from item content */;
    const delta = fullText.substring(this.lastEmittedTextLength);
    this.lastEmittedTextLength = fullText.length;
    if (delta) {
      this.accumulatedText += delta;
      return { type: "text", text: delta };
    }
    return null;

Reset lastEmittedTextLength to 0 on each new item.started for agent_message.
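As a worked example of the delta logic, the sketch below isolates it in a small helper and traces a short stream. `TextDeltaTracker` is a hypothetical name used only for illustration; in the runtime the same state would live on the class instance.

```typescript
class TextDeltaTracker {
  private lastEmittedLength = 0;
  accumulatedText = "";

  /** Call on item.started for agent_message: a new message starts from zero. */
  startMessage(): void {
    this.lastEmittedLength = 0;
  }

  /** Returns only the not-yet-emitted suffix, or null when nothing is new. */
  next(fullText: string): string | null {
    const delta = fullText.substring(this.lastEmittedLength);
    this.lastEmittedLength = fullText.length;
    if (!delta) return null;
    this.accumulatedText += delta;
    return delta;
  }
}

const tracker = new TextDeltaTracker();
tracker.startMessage();
const deltas = [
  tracker.next("Hel"),    // "Hel"
  tracker.next("Hello"),  // "lo"
  tracker.next("Hello"),  // null (repeated update, nothing new)
  tracker.next("Hello!"), // "!"  (e.g. the final text on item.completed)
];

// A second message resets the offset but keeps accumulating the run text.
tracker.startMessage();
const secondMessageDelta = tracker.next("Bye"); // "Bye"
```

After this trace `tracker.accumulatedText` is `"Hello!Bye"`, which is why the offset must reset per message while the accumulated text resets only per `execute()` call.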
Tool ID generation: Codex items have SDK-native IDs. Extract from item.id field. If not present, generate with codex-item-${counter}.
Stateful fields (instance-level, reset per execute() call):
- `currentSessionId: string | undefined`: from the `thread.started` event's `thread_id`
- `accumulatedText: string`: concatenated text from message deltas for `AgentRunResult.text`
- `lastEmittedTextLength: number`: for delta computation within a message item
- `lastUsage: AgentUsage | undefined`: from the `turn.completed` event's `usage` field
- `currentToolId: string | undefined`: tracks the active tool item ID for correlating `item.started` → `item.completed`
Usage extraction (from turn.completed event's usage field):

    // turn.completed.usage structure:
    {
      input_tokens: number;
      cached_input_tokens: number;
      output_tokens: number;
    }

Map to AgentUsage:
- `inputTokens` ← `usage.input_tokens`
- `outputTokens` ← `usage.output_tokens`
- `cacheReadTokens` ← `usage.cached_input_tokens` (when > 0)
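The field mapping is small enough to express as a single function. In this sketch `AgentUsageLike` is a hypothetical stand-in for the real `AgentUsage` type, assumed to have an optional `cacheReadTokens` field.

```typescript
// Usage payload as described for turn.completed.
interface CodexUsage {
  input_tokens: number;
  cached_input_tokens: number;
  output_tokens: number;
}

// Hypothetical stand-in for AgentUsage from src/middleware/types.ts.
interface AgentUsageLike {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens?: number;
}

/** Converts Codex snake_case usage into the runtime's camelCase shape. */
function mapCodexUsage(usage: CodexUsage): AgentUsageLike {
  const mapped: AgentUsageLike = {
    inputTokens: usage.input_tokens,
    outputTokens: usage.output_tokens,
  };
  // Only report cache reads when there were any.
  if (usage.cached_input_tokens > 0) {
    mapped.cacheReadTokens = usage.cached_input_tokens;
  }
  return mapped;
}

const withCache = mapCodexUsage({ input_tokens: 120, cached_input_tokens: 40, output_tokens: 30 });
const withoutCache = mapCodexUsage({ input_tokens: 10, cached_input_tokens: 0, output_tokens: 5 });
```

Omitting the key instead of setting it to `0` lets consumers distinguish "no cache reads" from "cache reads not reported" with a simple presence check.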
buildEnv(params: AgentExecuteParams): Record<string, string>
Return environment variable overrides for the Codex subprocess.
Cross-contamination prevention: The Codex runtime strips ANTHROPIC_API_KEY from the subprocess environment (by overriding it with an empty string) so that credentials for another provider cannot leak into the Codex process. Implement this in buildEnv():

    protected buildEnv(_params: AgentExecuteParams): Record<string, string> {
      return {
        ANTHROPIC_API_KEY: "", // Prevent cross-provider leakage
      };
    }

Auth credentials (OPENAI_API_KEY) are passed through params.env by the caller, not hardcoded in the runtime.
Done event enrichment
Same pattern as Claude/Gemini: intercept the done event and enrich AgentRunResult with accumulated state.
Result metadata mapping (from accumulated state → AgentRunResult):
- `text` ← accumulated from all `agent_message` delta events
- `sessionId` ← from the `thread.started` event's `thread_id`
- `usage` ← from the `turn.completed` event's `usage` (last one, if multiple turns)
Note: Codex does not report cost, API duration, or stop reason in its NDJSON output.
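A sketch of the enrichment step under stated assumptions: the done event is assumed to carry a mutable `result` object, and `DoneEventLike`, `RunResultLike`, and `AccumulatedState` are hypothetical shapes. The real `enrichDoneEvent` would read the state from the runtime instance and should follow whatever shape the Claude and Gemini runtimes already use.

```typescript
interface UsageLike {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens?: number;
}

// Hypothetical shapes; check src/middleware/types.ts for the real ones.
interface RunResultLike {
  text: string;
  sessionId?: string;
  usage?: UsageLike;
}
interface DoneEventLike {
  type: "done";
  result: RunResultLike;
}
interface AccumulatedState {
  accumulatedText: string;
  currentSessionId?: string;
  lastUsage?: UsageLike;
}

/** Copies per-execution state onto the done event's result, in place. */
function enrichDoneEvent(event: DoneEventLike, state: AccumulatedState): void {
  event.result.text = state.accumulatedText;
  // Leave optional fields untouched when Codex never reported them.
  if (state.currentSessionId !== undefined) {
    event.result.sessionId = state.currentSessionId;
  }
  if (state.lastUsage !== undefined) {
    event.result.usage = state.lastUsage;
  }
}

const doneEvent: DoneEventLike = { type: "done", result: { text: "" } };
enrichDoneEvent(doneEvent, {
  accumulatedText: "Hello!",
  currentSessionId: "thread-1",
  // lastUsage intentionally missing: no turn.completed was seen.
});
```

Since `lastUsage` is overwritten on every `turn.completed`, a multi-turn run naturally reports the last turn's usage.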
File: src/middleware/runtimes/codex.test.ts
Unit tests following the same testable-subclass pattern.
- Argument construction (6+ test cases):
  - New session: `["exec", "--json", "--color", "never", "<prompt>"]`
  - Session resume: `["exec", "resume", "<session-id>", "--json", "--color", "never"]` (no prompt)
  - No session: no `resume` sub-subcommand
  - Verify prompt is excluded on resume
  - Verify `--color never` always present
  - Verify `exec` always first arg
- Event extraction (12+ test cases):
  - `thread.started` → skip (but `thread_id` captured as session ID)
  - `turn.started` → skip
  - `item.started` + `command_execution` → `AgentToolUseEvent`
  - `item.started` + `mcp_tool_call` → `AgentToolUseEvent` with tool name and arguments
  - `item.started` + `agent_message` → skip
  - `item.updated` + `agent_message` → `AgentTextEvent` with delta from accumulated text
  - `item.updated` + `agent_message` (multiple) → correct incremental deltas
  - `item.completed` + `command_execution` → `AgentToolResultEvent` with `exit_code` check
  - `item.completed` + `mcp_tool_call` → `AgentToolResultEvent`
  - `item.completed` + `error` → `AgentErrorEvent`
  - `turn.completed` → stores usage, returns null
  - `turn.failed` → `AgentErrorEvent`
  - `error` → `AgentErrorEvent`
  - Unknown event type → skip
- Environment construction (3+ test cases):
  - Strips `ANTHROPIC_API_KEY` (cross-contamination prevention)
  - Does not inject `OPENAI_API_KEY` (caller responsibility)
- MCP config file management (4+ test cases):
  - TOML file created when `mcpServers` has entries
  - Correct TOML structure (`[mcp_servers.<name>]` sections)
  - `command` array correctly formed from `McpServerConfig.command` + `args`
  - Cleanup on teardown
- `supportsStdinPrompt` (1 test case):
  - Returns `false`
- Done event enrichment (3+ test cases):
  - Enriches with accumulated text, session ID from `thread_id`, usage
  - Handles missing usage gracefully
  - Handles multiple turns (uses last turn's usage)
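For readers unfamiliar with the testable-subclass pattern mentioned above, a minimal sketch follows. All classes here are stubs invented for the illustration: `RuntimeBaseStub` stands in for `CLIRuntimeBase`, `CodexRuntimeStub` for `CodexCliRuntime`, and the existing Claude and Gemini test files remain the authoritative reference for how the project does this.

```typescript
interface EnvParams {
  env?: Record<string, string>;
}

// Stand-in for CLIRuntimeBase, reduced to one protected abstract method.
abstract class RuntimeBaseStub {
  protected abstract buildEnv(params: EnvParams): Record<string, string>;
}

// Stand-in for CodexCliRuntime with the buildEnv behavior from the spec.
class CodexRuntimeStub extends RuntimeBaseStub {
  protected buildEnv(_params: EnvParams): Record<string, string> {
    return { ANTHROPIC_API_KEY: "" };
  }
}

// Test-only subclass: re-exposes the protected method through a public wrapper.
class TestableCodexRuntime extends CodexRuntimeStub {
  public testBuildEnv(params: EnvParams = {}): Record<string, string> {
    return this.buildEnv(params);
  }
}

const envUnderTest = new TestableCodexRuntime().testBuildEnv();
const stripsAnthropicKey = envUnderTest.ANTHROPIC_API_KEY === "";
const injectsOpenAiKey = "OPENAI_API_KEY" in envUnderTest;
```

The wrapper keeps the production class's methods `protected` while still letting tests call them directly, without casting to `any`.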
Acceptance Criteria
- `src/middleware/runtimes/codex.ts` exists and exports `CodexCliRuntime`
- Class extends `CLIRuntimeBase` and implements all three abstract methods
- `supportsStdinPrompt` returns `false`
- `buildArgs()` produces `exec --json --color never <prompt>` for new sessions
- `buildArgs()` produces `exec resume <id> --json --color never` for resumed sessions (no prompt)
- `extractEvent()` correctly maps all 8 Codex event types to `AgentEvent` types
- `extractEvent()` handles the two-level event model: events (thread/turn) + items (8 sub-types)
- Text streaming uses delta computation from progressive `item.updated` events
- Tool events: `item.started` → `AgentToolUseEvent`, `item.completed` → `AgentToolResultEvent`
- Session ID extracted from `thread.started` event's `thread_id`
- Usage extracted from `turn.completed` event's `usage` field
- `buildEnv()` strips `ANTHROPIC_API_KEY` (cross-contamination prevention)
- MCP config written to TOML format in Codex config location when `mcpServers` has entries
- MCP config file cleaned up after execution (merge-restore pattern)
- Unit tests cover argument construction, event extraction, environment setup, MCP config, and done enrichment
- `pnpm build` passes
- `pnpm test` passes
Reference
- `src/middleware/runtimes/claude.ts` / `gemini.ts`: reference implementations
- `src/middleware/runtimes/claude.test.ts` / `gemini.test.ts`: reference test files
- `openai/codex` SDK source:
  - `sdk/typescript/src/events.ts`: event type definitions
  - `sdk/typescript/src/items.ts`: item type definitions
- Official Codex non-interactive docs: `codex exec --json` format
- Historical: `--json` was previously `--experimental-json`; `agent_message` was previously `assistant_message`
- The app-server protocol (`codex app-server`) uses a richer format with slash-delimited events; it is NOT relevant for CLI `--json` output
- Known limitation: Prompt is excluded on session resume (`codex exec resume <id>` does not accept a new prompt)
- Empirical verification needed: Exact item field names (e.g., `item.output` vs `item.result`, `item.command` vs `item.input`) and `item.id` availability require capture of actual `codex exec --json` output during implementation