
ADR-0003: Async coloring costs in production — evidence from Codex, Symphony, opencode #3

@justrach


Summary

This ADR documents the real-world cost of function coloring in production systems, using OpenAI Codex (Rust/Tokio), OpenAI Symphony (Elixir/BEAM), and opencode (TypeScript) as evidence. The findings directly validate ZEP-0002's decision to reject both async fn and Io parameter infection in favor of fiber-based structured concurrency.

This is a language design ADR — it's about what happens when a language forces concurrency into function signatures, not about building any specific application.


Evidence: What coloring costs in practice

Rust async/await (OpenAI Codex)

Codex is a 63k-star Rust CLI. Its core module (codex.rs) has 150+ import lines — half are async runtime plumbing (tokio::sync::*, futures::prelude::*, async_channel::*, tokio_util::sync::CancellationToken, #[async_trait]).

Concrete costs observed:

  1. Trait object workarounds. Codex defines a SessionTask trait for polymorphic task execution. Because a trait that declares async fn methods is not object-safe in Rust, using it behind dyn requires #[async_trait] on the trait and on every implementation. The macro rewrites each method to return a boxed future, which adds a heap allocation and a layer of indirection per call. With uncolored functions, this is just a function pointer.

  2. Cancellation token threading. Every async function that needs to be cancellable must accept a CancellationToken parameter and check it at yield points. This is parameter infection by another name — not Io, but CancellationToken. The token must be threaded through run_turn → try_run_turn → stream → tool_exec → permission_check. In a fiber model, cancellation state is per-fiber and checked via checkCancel() with zero parameters.

  3. Shared state ceremony. Mutable state shared between spawned tasks requires Arc<Mutex<T>> or Arc<RwLock<T>>, because tokio::spawn only accepts 'static futures and so a task cannot borrow from its parent's stack. Codex wraps session state, agent status, and configuration in these. With fibers on a work-stealing pool, fiber-aware locks achieve the same safety without Arc wrapping.

  4. Reference cycle avoidance. Codex uses Weak<ThreadManagerState> in AgentControl to avoid reference cycles between sessions and their parent manager. This is a symptom of Arc-based shared ownership forced by async — fiber-scoped lifetimes eliminate this category of bug.

  5. Startup prewarm complexity. RegularTask has a Mutex<Option<ModelClientSession>> to hold a prewarmed WebSocket session that gets taken once. This pattern exists because async task construction and async task execution are separate steps. With fibers, construction and execution happen in the same function scope.
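
The parameter threading described in item 2 can be sketched with the standard library alone. This is an illustration under stated assumptions, not Codex's code: CancelFlag is a hypothetical stand-in for a cancellation token, the function names only mirror the chain named above, the bodies are invented, and the functions are synchronous for brevity (the signature problem is the same with async fn).

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a cancellation token: a shared boolean flag.
#[derive(Clone, Default)]
pub struct CancelFlag(Arc<AtomicBool>);

impl CancelFlag {
    pub fn cancel(&self) {
        self.0.store(true, Ordering::SeqCst);
    }

    pub fn is_cancelled(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

#[derive(Debug, PartialEq)]
pub struct Cancelled;

// Only the leaf inspects the flag, yet every caller above it has to accept
// the parameter and pass it along.
pub fn permission_check(cancel: &CancelFlag) -> Result<(), Cancelled> {
    if cancel.is_cancelled() {
        Err(Cancelled)
    } else {
        Ok(())
    }
}

pub fn tool_exec(cancel: &CancelFlag) -> Result<&'static str, Cancelled> {
    permission_check(cancel)?;
    Ok("tool output")
}

pub fn stream(cancel: &CancelFlag) -> Result<&'static str, Cancelled> {
    tool_exec(cancel)
}

pub fn try_run_turn(cancel: &CancelFlag) -> Result<&'static str, Cancelled> {
    stream(cancel)
}

pub fn run_turn(cancel: &CancelFlag) -> Result<&'static str, Cancelled> {
    try_run_turn(cancel)
}
```

Calling run_turn(&flag) succeeds until some other holder of a clone calls flag.cancel(). The three middle functions never read the flag; they carry it only so the leaf can.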

TypeScript async/await (opencode)

opencode demonstrates coloring in a dynamically typed language:

  1. Full-chain infection. SessionPrompt.loop() → resolveTools() → execute() → ask(): every function is async because the leaf (ask()) might prompt the user. Adding one I/O call to a utility function forces async up the entire call chain.

  2. Event bus as coloring workaround. opencode's typed pub/sub bus exists partly to decouple async producers from async consumers. The bus is itself an async boundary. With blocking channels, this decoupling is free.
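
The propagation in item 1 is not specific to TypeScript. The sketch below reproduces it in Rust, chosen only to keep this document's examples in one language. The function names echo the chain above, but the bodies are hypothetical, and the small block_on helper is an assumption written against the standard library so the sketch runs without an async runtime.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// The leaf might wait on user input, so it is declared async.
async fn ask(question: &str) -> String {
    format!("answer to {question}")
}

// Each caller becomes async solely because it awaits the level below it.
async fn execute(tool: &str) -> String {
    ask(tool).await
}

async fn resolve_tools() -> Vec<String> {
    vec![execute("read_file").await]
}

async fn session_loop() -> usize {
    resolve_tools().await.len()
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal busy-polling executor: adequate only for futures that complete
/// without waiting on external events, as the ones above do.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

/// A synchronous caller cannot simply call session_loop(); it needs an
/// executor to drive the returned future.
fn run_session() -> usize {
    block_on(session_loop())
}
```

If ask were an ordinary function, the three callers above it would need neither async nor .await, and run_session would need no executor.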

Elixir/BEAM (OpenAI Symphony) — the uncolored baseline

Symphony is OpenAI's agent orchestrator in Elixir. Zero function coloring. A function that reads a file, calls an API, or sleeps looks identical whether called from a GenServer, a Task, or inline. The language doesn't distinguish sync from async.

Result: Symphony's codebase has dramatically less ceremony than Codex despite solving a similar problem (agent orchestration with concurrency, cancellation, and isolation). Functions are universally composable.


What this means for ZEP-0002

Set side by side, the three systems show where coloring costs appear and where they do not:

| Cost category | Rust (Codex) | TypeScript (opencode) | Elixir (Symphony) | ZEP-0002 (zag) |
| --- | --- | --- | --- | --- |
| Function signature infection | async fn everywhere | async everywhere | None | None |
| Cancellation threading | CancellationToken param | AbortSignal param | Process signals | checkCancel() (no param) |
| Trait/interface workarounds | #[async_trait] boxing | N/A | N/A | Plain function pointers |
| Shared state wrapping | Arc<Mutex<T>> | N/A (single-threaded) | N/A (message passing) | Fiber-aware locks (no Arc) |
| Reference cycle risk | Weak<T> patterns | N/A | N/A | Scope-based lifetimes |

ZEP-0002's fiber model eliminates all five cost categories while preserving the good patterns these systems use (protocol boundaries, bounded concurrency, cooperative cancellation, OS-level sandboxing).
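
To make the "Trait/interface workarounds" row concrete, the sketch below shows roughly the shape that #[async_trait] generates: each method returns a boxed, type-erased future so the trait can be used behind dyn. It uses only the standard library. The trait signature, EchoTask, run_all, and the block_on helper are hypothetical and are not taken from Codex.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Type-erased, heap-allocated future: roughly what the macro emits as the
/// return type of each async method.
type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

/// Hypothetical object-safe task trait (the name is borrowed from the text).
trait SessionTask {
    fn run<'a>(&'a self, input: &'a str) -> BoxFuture<'a, String>;
}

struct EchoTask;

impl SessionTask for EchoTask {
    fn run<'a>(&'a self, input: &'a str) -> BoxFuture<'a, String> {
        // One heap allocation per call, plus dynamic dispatch when polled.
        Box::pin(async move { format!("echo: {input}") })
    }
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal busy-polling executor, sufficient for futures that never suspend.
fn block_on<T>(mut fut: BoxFuture<'_, T>) -> T {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

/// Drives a heterogeneous list of tasks through the dyn interface.
fn run_all(tasks: &[Box<dyn SessionTask>], input: &str) -> Vec<String> {
    tasks.iter().map(|task| block_on(task.run(input))).collect()
}
```

The boxing exists only to erase the concrete future type. A trait whose methods return plain values needs no such wrapper to be used as Box<dyn SessionTask>.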

Validation of specific ZEP-0002 decisions

  1. "Concurrency at the call site, not in function types." Codex suggests the alternative (coloring) scales poorly: roughly half of the 150+ import lines in codex.rs are async plumbing.
  2. zag.checkCancel() with no parameters — Codex's CancellationToken threading is the exact problem this solves.
  3. Fiber-aware Mutex/RwLock — Codex's Arc<Mutex<T>> wrapping disappears when locks are fiber-aware.
  4. zag.Scope for structured lifetimes — Codex's Weak<T> reference cycle avoidance is unnecessary when fiber scopes define lifetimes.
  5. zag.blockingCall for FFI — Codex uses dedicated threads for subprocess execution (sandbox-exec, codex-linux-sandbox). Same escape hatch.
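
For readers less familiar with the Rust side of items 3 and 4, here is a minimal standard-library sketch of the two patterns: shared mutable state behind Arc and Mutex, and a Weak back-reference that avoids an ownership cycle. The type and field names are hypothetical (they loosely echo ThreadManagerState and AgentControl) and are not Codex's definitions; std::thread stands in for spawned async tasks.

```rust
use std::sync::{Arc, Mutex, Weak};
use std::thread;

/// Hypothetical parent state shared by all agents.
struct ManagerState {
    completed_turns: Mutex<u32>,
}

/// Hypothetical child handle. If this held a strong Arc while the manager
/// also owned its agents, the two would keep each other alive forever.
struct AgentControl {
    manager: Weak<ManagerState>,
}

impl AgentControl {
    /// Returns false if the manager has already been dropped.
    fn record_turn(&self) -> bool {
        match self.manager.upgrade() {
            Some(state) => {
                *state.completed_turns.lock().unwrap() += 1;
                true
            }
            None => false,
        }
    }
}

/// Spawns `count` workers that each record one turn on the shared state.
fn run_agents(count: u32) -> u32 {
    let state = Arc::new(ManagerState {
        completed_turns: Mutex::new(0),
    });
    let handles: Vec<_> = (0..count)
        .map(|_| {
            let agent = AgentControl {
                manager: Arc::downgrade(&state),
            };
            thread::spawn(move || agent.record_turn())
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let total = *state.completed_turns.lock().unwrap();
    total
}

/// Shows the failure mode a Weak handle must account for: the parent is gone.
fn record_after_drop() -> bool {
    let state = Arc::new(ManagerState {
        completed_turns: Mutex::new(0),
    });
    let agent = AgentControl {
        manager: Arc::downgrade(&state),
    };
    drop(state);
    agent.record_turn()
}
```

Every access goes through upgrade() and lock(), and every caller has to handle the case where the parent no longer exists. That bookkeeping is the ceremony items 3 and 4 refer to.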

Proposed ADR-0003 scope

Record as docs/adr/ADR-0003-async-coloring-costs.md:

  • Document the five categories of coloring cost with real-world evidence
  • Reference Codex (Rust), opencode (TypeScript), Symphony (Elixir) as case studies
  • Validate ZEP-0002's specific design decisions against this evidence
  • Note that higher-level patterns (supervision, actor mailboxes, task DAGs) are separate language design concerns for future ZEPs — this ADR only addresses the foundational async model
