Implement CodexCliRuntime concrete class

## Context

RemoteClaw's middleware architecture uses CLI subprocesses to interact with AI agents. The abstract base class `CLIRuntimeBase` (in `src/middleware/cli-runtime-base.ts`) handles subprocess spawning, NDJSON parsing, watchdog timers, abort signal propagation, and stdin prompt delivery. Concrete runtimes extend it and implement three abstract methods.

This issue implements the **Codex CLI runtime** — targeting OpenAI's `codex` CLI from `openai/codex`.

### Architecture

```
AgentRuntime (interface, src/middleware/types.ts)
  └── CLIRuntimeBase (abstract, src/middleware/cli-runtime-base.ts)
        ├── ClaudeCliRuntime  (src/middleware/runtimes/claude.ts)  ✅ done
        ├── GeminiCliRuntime  (src/middleware/runtimes/gemini.ts)  ✅ done
        └── CodexCliRuntime   ← THIS ISSUE
```

`CLIRuntimeBase` requires subclasses to implement:

```typescript
/** Construct CLI-specific command-line arguments. */
protected abstract buildArgs(params: AgentExecuteParams): string[];

/** Parse a single NDJSON line into an AgentEvent (or null to skip). */
protected abstract extractEvent(line: string): AgentEvent | null;

/** Construct provider-specific environment variables. */
protected abstract buildEnv(params: AgentExecuteParams): Record<string, string>;
```

Additionally, subclasses may override:
- `get supportsStdinPrompt(): boolean` — whether the CLI accepts prompts via stdin (default: `true`)
- `execute()` — to wrap the base execution with per-call setup/teardown

### Dependencies

- `src/middleware/types.ts` — `AgentRuntime`, `AgentExecuteParams`, `AgentEvent`, `AgentRunResult`, etc.
- `src/middleware/cli-runtime-base.ts` — `CLIRuntimeBase` abstract class
- `src/middleware/runtimes/claude.ts` / `gemini.ts` — reference implementations (same pattern)

All exist on `main`.

## Specification

### File: `src/middleware/runtimes/codex.ts`

Create `CodexCliRuntime` extending `CLIRuntimeBase`.

#### Constructor

```typescript
constructor() {
  super("codex"); // CLI binary name
}
```

#### `get supportsStdinPrompt(): boolean`

Override to return `false`. The Codex CLI accepts prompts only as positional arguments, not via stdin.

#### `buildArgs(params: AgentExecuteParams): string[]`

Build the Codex CLI argument list. Codex uses the `exec` subcommand with different syntax for new vs resumed sessions.

**New session:**

| Arg | Value | When |
|-----|-------|------|
| `exec` | (subcommand) | Always |
| `--json` | (none) | Always — NDJSON streaming output |
| `--color` | `never` | Always — prevents ANSI escape codes in output |
| *(positional)* | `params.prompt` | Always (new session) |

Result: `["exec", "--json", "--color", "never", params.prompt]`

**Session resume:**

| Arg | Value | When |
|-----|-------|------|
| `exec` | (subcommand) | Always |
| `resume` | (resume sub-subcommand) | When `params.sessionId` is provided |
| *(positional)* | `params.sessionId` | When `params.sessionId` is provided (the session ID to resume) |
| `--json` | (none) | Always |
| `--color` | `never` | Always |

Result: `["exec", "resume", params.sessionId, "--json", "--color", "never"]`

**Important: Prompt is excluded on resume.** This is a documented Codex CLI limitation — `codex exec resume <id>` does not accept a new prompt. The agent continues from where it left off. The prompt from `params.prompt` is ignored when `params.sessionId` is provided.
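
The two argument layouts above can be sketched as a standalone function. This is a minimal illustration, not the final method: `ArgParams` and `buildCodexArgs` are hypothetical names, and the real `AgentExecuteParams` type in `src/middleware/types.ts` has more fields than shown here.

```typescript
// Hypothetical minimal shape; the real AgentExecuteParams lives in src/middleware/types.ts.
interface ArgParams {
  prompt: string;
  sessionId?: string;
}

// Standalone version of the buildArgs() logic described in the tables above.
function buildCodexArgs(params: ArgParams): string[] {
  const outputFlags = ["--json", "--color", "never"];
  if (params.sessionId) {
    // Resume: the session ID is positional and the prompt is dropped.
    return ["exec", "resume", params.sessionId, ...outputFlags];
  }
  // New session: the prompt is the final positional argument.
  return ["exec", ...outputFlags, params.prompt];
}
```

Keeping the shared output flags in one array makes it harder for the two branches to drift apart when a flag is added later.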

**MCP config** is not handled via CLI args — see the `execute()` override section below.

#### `execute()` override — state reset + MCP config + done enrichment

Same pattern as `GeminiCliRuntime`: override `execute()` to manage per-execution state and MCP config file lifecycle.

```typescript
async *execute(params: AgentExecuteParams): AsyncIterable<AgentEvent> {
  this.resetState();

  const mcpConfigManager =
    params.mcpServers && Object.keys(params.mcpServers).length > 0
      ? new CodexMcpConfigManager(params.workingDirectory, params.mcpServers)
      : null;

  try {
    await mcpConfigManager?.setup();

    for await (const event of super.execute(params)) {
      if (event.type === "done") {
        this.enrichDoneEvent(event);
      }
      yield event;
    }
  } finally {
    await mcpConfigManager?.teardown();
  }
}
```

**MCP config file management:**

The Codex CLI reads MCP config from `~/.codex/config.toml` (global) or a project-local equivalent. The config uses TOML format.

The `CodexMcpConfigManager` follows the same merge-restore pattern as `GeminiMcpConfigManager`:
1. **Setup**: Check for existing config file, save copy if it exists, merge `mcp_servers` section, write back
2. **Teardown**: Restore original or delete created file
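
A rough sketch of the setup/teardown steps follows, under several assumptions: `TextFileStore` is a hypothetical seam introduced only so the logic can be exercised without touching a real config file, `McpConfigMergeRestore` is an illustrative name rather than the actual `CodexMcpConfigManager`, and "merge" is simplified to appending pre-rendered sections. Appending would produce invalid TOML if the existing file already defines a table with the same name, so a real implementation may need to detect that case.

```typescript
// Hypothetical file-access seam (not part of the codebase).
interface TextFileStore {
  read(path: string): Promise<string | undefined>; // undefined when the file is absent
  write(path: string, content: string): Promise<void>;
  remove(path: string): Promise<void>;
}

class McpConfigMergeRestore {
  private readonly store: TextFileStore;
  private readonly configPath: string;
  private readonly mcpToml: string; // pre-rendered [mcp_servers.*] sections
  private original: string | undefined;
  private didSetup = false;

  constructor(store: TextFileStore, configPath: string, mcpToml: string) {
    this.store = store;
    this.configPath = configPath;
    this.mcpToml = mcpToml;
  }

  async setup(): Promise<void> {
    // Keep a copy of whatever was there so teardown can put it back.
    this.original = await this.store.read(this.configPath);
    const merged =
      this.original === undefined
        ? this.mcpToml
        : `${this.original.trimEnd()}\n\n${this.mcpToml}`;
    await this.store.write(this.configPath, merged);
    this.didSetup = true;
  }

  async teardown(): Promise<void> {
    if (!this.didSetup) return; // setup never wrote anything, so leave the file alone
    if (this.original === undefined) {
      await this.store.remove(this.configPath); // we created the file, so delete it
    } else {
      await this.store.write(this.configPath, this.original); // restore the saved copy
    }
    this.didSetup = false;
  }
}
```

The `didSetup` guard matters because `teardown()` runs in a `finally` block: if `setup()` throws before writing, teardown should not delete or overwrite a file it never touched.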

**Codex TOML MCP format:**

```toml
[mcp_servers.server_name]
type = "stdio"
command = ["node", "server.js"]

[mcp_servers.server_name.env]
KEY = "VALUE"
```

Note the differences from our `McpServerConfig` type:
- `command` is an array: `[config.command, ...(config.args ?? [])]`
- `type = "stdio"` is required (always `"stdio"` for RemoteClaw's MCP servers)

**TOML generation**: Since the MCP config TOML structure is simple and predictable, generate it manually without a TOML library: concatenate one `[mcp_servers.<name>]` section per server, quoting and escaping every string value. If a TOML library is preferred, consider `smol-toml` (small, zero-dependency).
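
As an illustration of the manual approach, here is a minimal sketch. `McpServerConfigSketch`, `tomlString`, `tomlKey`, and `renderMcpServersToml` are hypothetical names, and the config shape is assumed from the notes above rather than copied from `src/middleware/types.ts`. The sketch borrows `JSON.stringify` for string escaping, which matches TOML basic-string escaping for ordinary printable text but is not a complete TOML encoder.

```typescript
// Assumed shape of McpServerConfig; the real type lives in src/middleware/types.ts.
interface McpServerConfigSketch {
  command: string;
  args?: string[];
  env?: Record<string, string>;
}

// JSON escaping used as a stand-in for TOML basic-string escaping (an approximation).
const tomlString = (value: string): string => JSON.stringify(value);

// Bare TOML keys allow only A-Z, a-z, 0-9, "_" and "-"; anything else is quoted.
const tomlKey = (key: string): string =>
  /^[A-Za-z0-9_-]+$/.test(key) ? key : tomlString(key);

function renderMcpServersToml(servers: Record<string, McpServerConfigSketch>): string {
  const lines: string[] = [];
  for (const [name, config] of Object.entries(servers)) {
    // Codex expects the binary and its arguments as a single array.
    const command = [config.command, ...(config.args ?? [])];
    lines.push(`[mcp_servers.${tomlKey(name)}]`);
    lines.push(`type = "stdio"`);
    lines.push(`command = [${command.map(tomlString).join(", ")}]`);
    const env = Object.entries(config.env ?? {});
    if (env.length > 0) {
      lines.push("", `[mcp_servers.${tomlKey(name)}.env]`);
      for (const [key, value] of env) {
        lines.push(`${tomlKey(key)} = ${tomlString(value)}`);
      }
    }
    lines.push(""); // blank line between servers, trailing newline at the end
  }
  return lines.join("\n");
}
```

Called with `{ server_name: { command: "node", args: ["server.js"], env: { KEY: "VALUE" } } }`, this reproduces the TOML example shown earlier in this section.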

**Implementation note**: Check during implementation whether the Codex CLI supports a `--config` flag pointing to a custom config file. If it does, use a temp directory approach (cleaner). The merge-restore pattern on `~/.codex/config.toml` is the fallback.

#### `extractEvent(line: string): AgentEvent | null`

Parse a single NDJSON line from Codex's `--json` output into an `AgentEvent`.

**Codex `--json` format** (verified from `openai/codex` SDK source: `sdk/typescript/src/events.ts`, `items.ts` and [official docs](https://developers.openai.com/codex/noninteractive/)):

Each NDJSON line is bare JSON (no envelope) with a `type` discriminator. There are 8 event types. Items within events have their own `item.type` discriminator with 8 sub-types.

**Event types:**

| Event Type | Description |
|------------|-------------|
| `thread.started` | New thread created, contains `thread_id` |
| `turn.started` | Agent turn begins |
| `item.started` | Item lifecycle start, contains full `item` data |
| `item.updated` | Item state update (progressive text for `agent_message`) |
| `item.completed` | Item lifecycle end, contains final `item` data |
| `turn.completed` | Agent turn ends, contains `usage` |
| `turn.failed` | Agent turn failed |
| `error` | Stream-level error |

**Item types** (discriminated by `item.type`):

| Item Type | Description | Relevant For |
|-----------|-------------|-------------|
| `agent_message` | Text output from agent | `AgentTextEvent` |
| `command_execution` | Shell command execution | `AgentToolUseEvent` / `AgentToolResultEvent` |
| `mcp_tool_call` | MCP tool invocation | `AgentToolUseEvent` / `AgentToolResultEvent` |
| `file_change` | File modification | Skip (or `AgentToolUseEvent` if useful) |
| `reasoning` | Reasoning/thinking content | Skip |
| `web_search` | Web search invocation | Skip (or `AgentToolUseEvent` if useful) |
| `error` | Error item | `AgentErrorEvent` |
| `todo_list` | Task/todo tracking | Skip |

**Event mapping:**

| Codex Event | Condition | Maps To | Notes |
|-------------|-----------|---------|-------|
| `thread.started` | — | *Skip* | Extract `thread_id` as `currentSessionId` |
| `turn.started` | — | *Skip* | Turn lifecycle boundary |
| `item.started` | `item.type === "command_execution"` | `AgentToolUseEvent` | `{ toolName: "command_execution", toolId, input: { command } }` |
| `item.started` | `item.type === "mcp_tool_call"` | `AgentToolUseEvent` | `{ toolName: item.name, toolId, input: item.arguments }` |
| `item.started` | other | *Skip* | |
| `item.updated` | `item.type === "agent_message"` | `AgentTextEvent` | Delta computation (see below) |
| `item.updated` | other | *Skip* | Intermediate state |
| `item.completed` | `item.type === "agent_message"` | `AgentTextEvent` | Emit final delta if any |
| `item.completed` | `item.type === "command_execution"` | `AgentToolResultEvent` | `{ toolId, output: item.output, isError: item.exit_code !== 0 }` |
| `item.completed` | `item.type === "mcp_tool_call"` | `AgentToolResultEvent` | `{ toolId, output: item.output, isError: !!item.error }` |
| `item.completed` | `item.type === "error"` | `AgentErrorEvent` | `{ message: item.message }` |
| `item.completed` | other | *Skip* | |
| `turn.completed` | — | Store for enrichment | Extract `usage` field |
| `turn.failed` | — | `AgentErrorEvent` | `{ message, code: "turn_failed" }` |
| `error` | — | `AgentErrorEvent` | `{ message }` |
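
The mapping table can be read as a two-level switch, first on the event `type` and then on `item.type`. The sketch below is one possible shape, not the final `extractEvent()`: `SketchEvent` and `CodexEventMapper` are hypothetical, the `type` tags other than `"text"` are placeholders for whatever `AgentEvent` actually uses, and the item field names (`item.text`, `item.command`, `item.output`, `item.id`, `error.message`) are assumptions that still need to be checked against captured `codex exec --json` output.

```typescript
// Simplified stand-ins for the AgentEvent union in src/middleware/types.ts.
type SketchEvent =
  | { type: "text"; text: string }
  | { type: "tool_use"; toolName: string; toolId: string; input: unknown }
  | { type: "tool_result"; toolId: string; output: unknown; isError: boolean }
  | { type: "error"; message: string; code?: string };

class CodexEventMapper {
  sessionId: string | undefined;
  usage: unknown;
  private emittedLength = 0;

  extract(line: string): SketchEvent | null {
    let raw: any;
    try {
      raw = JSON.parse(line);
    } catch {
      return null; // not JSON, skip the line
    }
    if (raw === null || typeof raw !== "object") return null;
    const item = raw.item ?? {};
    switch (raw.type) {
      case "thread.started":
        this.sessionId = raw.thread_id;
        return null;
      case "turn.completed":
        this.usage = raw.usage; // kept for done enrichment
        return null;
      case "turn.failed":
        return { type: "error", message: String(raw.error?.message ?? "turn failed"), code: "turn_failed" };
      case "error":
        return { type: "error", message: String(raw.message ?? "unknown error") };
      case "item.started":
        return this.onItemStarted(item);
      case "item.updated":
        return item.type === "agent_message" ? this.textDelta(item) : null;
      case "item.completed":
        return this.onItemCompleted(item);
      default:
        return null; // turn.started and unknown event types
    }
  }

  private onItemStarted(item: any): SketchEvent | null {
    if (item.type === "agent_message") this.emittedLength = 0;
    if (item.type === "command_execution") {
      return { type: "tool_use", toolName: "command_execution", toolId: item.id, input: { command: item.command } };
    }
    if (item.type === "mcp_tool_call") {
      return { type: "tool_use", toolName: item.name, toolId: item.id, input: item.arguments };
    }
    return null;
  }

  private onItemCompleted(item: any): SketchEvent | null {
    switch (item.type) {
      case "agent_message": {
        const event = this.textDelta(item); // final delta, if any text is still unsent
        this.emittedLength = 0; // the next message starts from scratch
        return event;
      }
      case "command_execution":
        return { type: "tool_result", toolId: item.id, output: item.output, isError: item.exit_code !== 0 };
      case "mcp_tool_call":
        return { type: "tool_result", toolId: item.id, output: item.output, isError: !!item.error };
      case "error":
        return { type: "error", message: String(item.message) };
      default:
        return null;
    }
  }

  private textDelta(item: any): SketchEvent | null {
    const fullText: string = typeof item.text === "string" ? item.text : "";
    const delta = fullText.substring(this.emittedLength);
    this.emittedLength = fullText.length;
    return delta ? { type: "text", text: delta } : null;
  }
}
```

Resetting the emitted length after `item.completed` as well as on `item.started` is a defensive choice in this sketch: it keeps the delta logic correct even if a message arrives only as a single `item.completed` event.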

**Text streaming — delta computation:**

The `item.updated` event for `agent_message` contains the full accumulated text so far (not just the new delta). To emit incremental `AgentTextEvent`s:

```typescript
// Instance state:
private lastEmittedTextLength = 0;

// In item.updated handler for agent_message:
const fullText = /* extract text from item content */;
const delta = fullText.substring(this.lastEmittedTextLength);
this.lastEmittedTextLength = fullText.length;
if (delta) {
  this.accumulatedText += delta;
  return { type: "text", text: delta };
}
return null;
```

Reset `lastEmittedTextLength` to 0 on each new `item.started` for `agent_message`.

**Tool ID generation**: Codex items have SDK-native IDs. Extract from `item.id` field. If not present, generate with `codex-item-${counter}`.
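
A small sketch of that fallback, assuming `item.id` is a string when present. `ToolIdSource` is a hypothetical helper; the counter would be reset together with the other per-execution state.

```typescript
// Hypothetical helper: prefer the SDK-native ID, otherwise synthesize one.
class ToolIdSource {
  private counter = 0;

  idFor(item: { id?: unknown }): string {
    if (typeof item.id === "string" && item.id.length > 0) {
      return item.id;
    }
    this.counter += 1;
    return `codex-item-${this.counter}`;
  }
}
```

A generated ID has to be remembered between `item.started` and `item.completed` so that both events carry the same `toolId`, because calling `idFor` a second time would produce a new value.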

**Stateful fields** (instance-level, reset per `execute()` call):
- `currentSessionId: string | undefined` — from `thread.started` event's `thread_id`
- `accumulatedText: string` — concatenated text from message deltas for `AgentRunResult.text`
- `lastEmittedTextLength: number` — for delta computation within a message item
- `lastUsage: AgentUsage | undefined` — from `turn.completed` event's `usage` field
- `currentToolId: string | undefined` — tracks the active tool item ID for correlating `item.started` → `item.completed`

**Usage extraction** (from `turn.completed` event's `usage` field):

```typescript
// turn.completed.usage structure:
{
  input_tokens: number;
  cached_input_tokens: number;
  output_tokens: number;
}
```

Map to `AgentUsage`:
- `inputTokens` ← `usage.input_tokens`
- `outputTokens` ← `usage.output_tokens`
- `cacheReadTokens` ← `usage.cached_input_tokens` (when > 0)
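
The field mapping is small enough to show in full. In this sketch `AgentUsageSketch` is a stand-in for the real `AgentUsage` type, and `cacheReadTokens` is assumed to be optional there; `mapCodexUsage` is a hypothetical helper name.

```typescript
// Mirrors the turn.completed.usage structure shown above.
interface CodexUsage {
  input_tokens: number;
  cached_input_tokens: number;
  output_tokens: number;
}

// Stand-in for AgentUsage; cacheReadTokens is assumed to be optional.
interface AgentUsageSketch {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens?: number;
}

function mapCodexUsage(usage: CodexUsage): AgentUsageSketch {
  const mapped: AgentUsageSketch = {
    inputTokens: usage.input_tokens,
    outputTokens: usage.output_tokens,
  };
  // Only report cache reads when Codex actually served tokens from cache.
  if (usage.cached_input_tokens > 0) {
    mapped.cacheReadTokens = usage.cached_input_tokens;
  }
  return mapped;
}
```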

#### `buildEnv(params: AgentExecuteParams): Record<string, string>`

Return environment variable overrides for the Codex subprocess.

**Cross-contamination prevention**: The runtime blanks `ANTHROPIC_API_KEY` in the Codex subprocess environment to prevent accidental cross-provider auth leakage. Implement this in `buildEnv()`:

```typescript
protected buildEnv(_params: AgentExecuteParams): Record<string, string> {
  return {
    ANTHROPIC_API_KEY: "",  // Prevent cross-provider leakage
  };
}
```

Auth credentials (`OPENAI_API_KEY`) are passed through `params.env` by the caller, not hardcoded in the runtime.

#### Done event enrichment

Same pattern as Claude/Gemini: intercept the `done` event and enrich `AgentRunResult` with accumulated state.

**Result metadata mapping** (from accumulated state → `AgentRunResult`):
- `text` ← accumulated from all `agent_message` delta events
- `sessionId` ← from `thread.started` event's `thread_id`
- `usage` ← from `turn.completed` event's `usage` (last one, if multiple turns)

Note: Codex does not report cost, API duration, or stop reason in its NDJSON output.
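
A possible shape for the enrichment step is sketched below. It assumes the `done` event carries a mutable `result` object, which should be confirmed against `src/middleware/types.ts`; `RunResultSketch`, `DoneEventSketch`, and `AccumulatedState` are hypothetical stand-ins, and the function takes the state as a parameter only so the sketch is self-contained (the real method would read instance fields).

```typescript
// Stand-ins for AgentRunResult and the done event (shapes are assumptions).
interface RunResultSketch {
  text: string;
  sessionId?: string;
  usage?: { inputTokens: number; outputTokens: number; cacheReadTokens?: number };
}

interface DoneEventSketch {
  type: "done";
  result: RunResultSketch;
}

// The per-execution fields listed under "Stateful fields".
interface AccumulatedState {
  accumulatedText: string;
  currentSessionId?: string;
  lastUsage?: RunResultSketch["usage"];
}

function enrichDoneEvent(event: DoneEventSketch, state: AccumulatedState): void {
  event.result.text = state.accumulatedText;
  // Leave optional fields untouched when Codex never reported them.
  if (state.currentSessionId !== undefined) {
    event.result.sessionId = state.currentSessionId;
  }
  if (state.lastUsage !== undefined) {
    event.result.usage = state.lastUsage;
  }
}
```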

### File: `src/middleware/runtimes/codex.test.ts`

Unit tests following the same testable-subclass pattern.

1. **Argument construction** (6+ test cases):
   - New session: `["exec", "--json", "--color", "never", "<prompt>"]`
   - Session resume: `["exec", "resume", "<session-id>", "--json", "--color", "never"]` — no prompt
   - No session: no `resume` sub-subcommand
   - Verify prompt is excluded on resume
   - Verify `--color never` always present
   - Verify `exec` always first arg

2. **Event extraction** (12+ test cases):
   - `thread.started` → skip (but thread_id captured as session ID)
   - `turn.started` → skip
   - `item.started` + `command_execution` → `AgentToolUseEvent`
   - `item.started` + `mcp_tool_call` → `AgentToolUseEvent` with tool name and arguments
   - `item.started` + `agent_message` → skip
   - `item.updated` + `agent_message` → `AgentTextEvent` with delta from accumulated text
   - `item.updated` + `agent_message` (multiple) → correct incremental deltas
   - `item.completed` + `command_execution` → `AgentToolResultEvent` with exit_code check
   - `item.completed` + `mcp_tool_call` → `AgentToolResultEvent`
   - `item.completed` + `error` → `AgentErrorEvent`
   - `turn.completed` → stores usage, returns null
   - `turn.failed` → `AgentErrorEvent`
   - `error` → `AgentErrorEvent`
   - Unknown event type → skip

3. **Environment construction** (2+ test cases):
   - Strips `ANTHROPIC_API_KEY` (cross-contamination prevention)
   - Does not inject `OPENAI_API_KEY` (caller responsibility)

4. **MCP config file management** (4+ test cases):
   - TOML file created when `mcpServers` has entries
   - Correct TOML structure (`[mcp_servers.<name>]` sections)
   - `command` array correctly formed from `McpServerConfig.command` + `args`
   - Cleanup on teardown

5. **`supportsStdinPrompt`** (1 test case):
   - Returns `false`

6. **Done event enrichment** (3+ test cases):
   - Enriches with accumulated text, session ID from thread_id, usage
   - Handles missing usage gracefully
   - Handles multiple turns (uses last turn's usage)

## Acceptance Criteria

- [ ] `src/middleware/runtimes/codex.ts` exists and exports `CodexCliRuntime`
- [ ] Class extends `CLIRuntimeBase` and implements all three abstract methods
- [ ] `supportsStdinPrompt` returns `false`
- [ ] `buildArgs()` produces `exec --json --color never <prompt>` for new sessions
- [ ] `buildArgs()` produces `exec resume <id> --json --color never` for resumed sessions (no prompt)
- [ ] `extractEvent()` correctly maps all 8 Codex event types to `AgentEvent` types
- [ ] `extractEvent()` handles the two-level event model: events (thread/turn) + items (8 sub-types)
- [ ] Text streaming uses delta computation from progressive `item.updated` events
- [ ] Tool events: `item.started` → `AgentToolUseEvent`, `item.completed` → `AgentToolResultEvent`
- [ ] Session ID extracted from `thread.started` event's `thread_id`
- [ ] Usage extracted from `turn.completed` event's `usage` field
- [ ] `buildEnv()` strips `ANTHROPIC_API_KEY` (cross-contamination prevention)
- [ ] MCP config written to TOML format in Codex config location when `mcpServers` has entries
- [ ] MCP config file cleaned up after execution (merge-restore pattern)
- [ ] Unit tests cover argument construction, event extraction, environment setup, MCP config, and done enrichment
- [ ] `pnpm build` passes
- [ ] `pnpm test` passes

## Reference

- `src/middleware/runtimes/claude.ts` / `gemini.ts` — reference implementations
- `src/middleware/runtimes/claude.test.ts` / `gemini.test.ts` — reference test files
- `openai/codex` SDK source:
  - `sdk/typescript/src/events.ts` — event type definitions
  - `sdk/typescript/src/items.ts` — item type definitions
- [Official Codex non-interactive docs](https://developers.openai.com/codex/noninteractive/) — `codex exec --json` format
- Historical: `--json` was previously `--experimental-json`; `agent_message` was previously `assistant_message`
- The app-server protocol (`codex app-server`) uses a richer format with slash-delimited events — NOT relevant for CLI `--json` output
- **Known limitation**: Prompt is excluded on session resume (`codex exec resume <id>` does not accept a new prompt)
- **Empirical verification needed**: Exact item field names (e.g., `item.output` vs `item.result`, `item.command` vs `item.input`) and `item.id` availability require capture of actual `codex exec --json` output during implementation
