fix(tui_app_server): preserve transcript events under backpressure #15759
Merged: etraut-openai merged 5 commits into openai:main on Mar 25, 2026
Conversation
Widen the set of must-deliver notifications to include all transcript events (AgentMessageDelta, ItemCompleted, PlanDelta, reasoning deltas) so the TUI receives lossless assistant text even when the event queue is saturated. Extract backpressure forwarding into a standalone function for testability.
Add test helpers, expand the event_requires_delivery unit test to cover transcript variants, and add integration tests proving lossless delivery under backpressure for both in-process and remote transports.
Add rustdoc to server_notification_requires_delivery, ForwardEventResult, and forward_in_process_event explaining the lossless/best-effort event classification and blocking semantics.
Pull request overview
This PR updates the app-server-client event forwarding logic to ensure transcript-critical server notifications are preserved under consumer backpressure (instead of being dropped), preventing corrupted/incomplete incremental markdown rendering in the TUI.
Changes:
- Centralizes the "must-deliver" classification for `ServerNotification` into `server_notification_requires_delivery`, shared by both in-process and remote transports.
- Refactors the in-process forwarding decision tree into `forward_in_process_event`, ensuring lossless transcript/terminal events block on `send().await` while best-effort events still use `try_send`.
- Adds/updates tests to validate ordering and preservation of transcript events under bounded-channel backpressure for both in-process and websocket (remote) paths.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| codex-rs/app-server-client/src/lib.rs | Adds shared notification classification and extracts in-process forwarding logic to preserve transcript/terminal events under backpressure; expands/updates tests. |
| codex-rs/app-server-client/src/remote.rs | Switches remote delivery classification to the shared helper and adds a unit test for transcript + disconnect delivery requirements. |
etraut-openai (Collaborator) approved these changes on Mar 25, 2026 and left a comment:
Good analysis and fix.
Relax the remote websocket backpressure test so it asserts the lossless contract instead of a brittle fixed slot for `Lagged`. The test now accepts either a lag marker or the surviving best-effort `stdout-2` event while still requiring transcript and completion notifications to arrive in order. This matches the remote transport behavior more closely under CI scheduling variance, where websocket task timing can change whether a best-effort event is dropped before the consumer starts draining the channel.
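The relaxed assertion described above can be sketched as a small predicate. Everything here is hypothetical (the `Received` variants and function name are illustrative stand-ins, not the real test code): the head slot may hold either a lag marker or the surviving best-effort `stdout-2` event, while the transcript delta and the completion notification must both arrive, in that order.

```rust
/// Illustrative stand-ins for the events the remote backpressure test observes.
#[derive(Debug, Clone, PartialEq)]
enum Received {
    Lagged,
    Stdout2,
    AgentMessageDelta,
    TurnCompleted,
}

/// Hypothetical sketch of the relaxed lossless-contract assertion:
/// tolerate scheduling variance in the first slot, but require the
/// transcript delta to precede the completion event.
fn satisfies_lossless_contract(events: &[Received]) -> bool {
    use Received::*;
    // First slot: either a lag marker or the surviving best-effort event.
    let head_ok = matches!(events.first(), Some(Lagged) | Some(Stdout2));
    // Transcript then completion must both appear, in order.
    let delta = events.iter().position(|e| *e == AgentMessageDelta);
    let done = events.iter().position(|e| *e == TurnCompleted);
    head_ok && matches!((delta, done), (Some(d), Some(t)) if d < t)
}

fn main() {
    use Received::*;
    // Either ordering of the flaky first slot passes...
    assert!(satisfies_lossless_contract(&[Lagged, AgentMessageDelta, TurnCompleted]));
    assert!(satisfies_lossless_contract(&[Stdout2, AgentMessageDelta, TurnCompleted]));
    // ...but a completion arriving before the transcript delta fails.
    assert!(!satisfies_lossless_contract(&[Stdout2, TurnCompleted, AgentMessageDelta]));
    println!("lossless contract assertions ok");
}
```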
TL;DR
When running codex with `-c features.tui_app_server=true`, we see corruption when streaming large amounts of data. This PR marks additional event types as critical by making them must-deliver.

Problem

When the TUI consumer falls behind the app-server event stream, the bounded `mpsc` channel fills up and the forwarding layer drops events via `try_send`. Previously only `TurnCompleted` was marked as must-deliver. Streamed assistant text (`AgentMessageDelta`) and the authoritative final item (`ItemCompleted`) were treated as droppable, the same as ephemeral command output deltas. Because the TUI renders markdown incrementally from these deltas, dropping any of them produces permanently corrupted or incomplete paragraphs that persist for the rest of the session.

Mental model
The app-server event stream has two tiers of importance:

- Lossless (transcript + terminal): Events that form the authoritative record of what the assistant said or that signal turn lifecycle transitions. Losing any of these corrupts the visible output or leaves surfaces waiting forever. These are: `AgentMessageDelta`, `PlanDelta`, `ReasoningSummaryTextDelta`, `ReasoningTextDelta`, `ItemCompleted`, and `TurnCompleted`.
- Best-effort (everything else): Ephemeral status events like `CommandExecutionOutputDelta` and progress notifications. Dropping these under load causes cosmetic gaps but no permanent corruption.

The forwarding layer uses `try_send` for best-effort events (dropping on backpressure) and blocking `send().await` for lossless events (applying backpressure to the producer until the consumer catches up).

Non-goals
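The two-tier classification can be sketched as a pure predicate over the notification type. This is a minimal illustration, not the real codex-rs code: the `ServerNotification` enum here is a payload-free stand-in mirroring the variant names named in this PR, and the real type carries data and has many more variants.

```rust
// Payload-free stand-in for the real ServerNotification type (assumption:
// variant names taken from the PR description, real payloads omitted).
#[derive(Debug, Clone, PartialEq)]
enum ServerNotification {
    AgentMessageDelta,
    PlanDelta,
    ReasoningSummaryTextDelta,
    ReasoningTextDelta,
    ItemCompleted,
    TurnCompleted,
    CommandExecutionOutputDelta,
    Lagged,
}

/// Returns true for lossless (transcript + terminal) events that must
/// never be dropped under backpressure; everything else is best-effort.
fn server_notification_requires_delivery(n: &ServerNotification) -> bool {
    use ServerNotification::*;
    matches!(
        n,
        AgentMessageDelta
            | PlanDelta
            | ReasoningSummaryTextDelta
            | ReasoningTextDelta
            | ItemCompleted
            | TurnCompleted
    )
}

fn main() {
    // Transcript and terminal events are must-deliver...
    assert!(server_notification_requires_delivery(&ServerNotification::AgentMessageDelta));
    assert!(server_notification_requires_delivery(&ServerNotification::TurnCompleted));
    // ...while ephemeral output deltas and lag markers are droppable.
    assert!(!server_notification_requires_delivery(&ServerNotification::CommandExecutionOutputDelta));
    assert!(!server_notification_requires_delivery(&ServerNotification::Lagged));
    println!("classification ok");
}
```

Keeping this as a single shared `match`-style predicate is what lets both transports stay in sync: a new event variant gets classified once, not once per transport.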
Tradeoffs
Blocking on transcript events means a slow consumer can now stall the producer for the duration of those events. This is acceptable because: (a) the alternative is permanently broken output, which is worse; (b) the consumer already had to keep up with `TurnCompleted` blocking sends; and (c) transcript events arrive at model-output speed, not burst speed, so sustained saturation is unlikely in practice.

Architecture
Two parallel changes, one per transport:

- In-process path (`lib.rs`): The inline forwarding logic was extracted into `forward_in_process_event`, a standalone async function that encapsulates the lag-marker / must-deliver / try-send decision tree. The worker loop now delegates to it. A new `server_notification_requires_delivery` function (shared `pub(crate)`) centralizes the notification classification.
- Remote path (`remote.rs`): The local `event_requires_delivery` now delegates to the same shared `server_notification_requires_delivery`, keeping both transports in sync.

Observability
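The lossless/best-effort forwarding semantics can be sketched with a bounded channel. This is an assumption-laden miniature, not the real `forward_in_process_event`: it uses `std::sync::mpsc::sync_channel` in place of the tokio channel, a toy two-variant `Event` enum, and a simplified `ForwardEventResult`, but the decision tree is the one described above (blocking `send` for lossless, `try_send` for best-effort).

```rust
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};

// Toy event type: Transcript stands in for the lossless tier,
// BestEffort for ephemeral status events (assumption: simplified).
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Transcript(String),
    BestEffort(String),
}

fn requires_delivery(e: &Event) -> bool {
    matches!(e, Event::Transcript(_))
}

#[derive(Debug, PartialEq)]
enum ForwardEventResult {
    Delivered,
    Dropped,
}

/// Sketch of the forwarding decision tree: lossless events block until the
/// consumer frees capacity; best-effort events are dropped when the channel
/// is full.
fn forward_event(tx: &SyncSender<Event>, event: Event) -> ForwardEventResult {
    if requires_delivery(&event) {
        // Lossless: blocking send applies backpressure to the producer.
        tx.send(event).expect("consumer hung up");
        ForwardEventResult::Delivered
    } else {
        // Best-effort: drop on a full (or closed) channel.
        match tx.try_send(event) {
            Ok(()) => ForwardEventResult::Delivered,
            Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => {
                ForwardEventResult::Dropped
            }
        }
    }
}

fn main() {
    let (tx, rx) = sync_channel(1);
    // Fill the capacity-1 channel with a best-effort event.
    assert_eq!(
        forward_event(&tx, Event::BestEffort("stdout-1".into())),
        ForwardEventResult::Delivered
    );
    // While the consumer is stalled, a second best-effort event is dropped.
    assert_eq!(
        forward_event(&tx, Event::BestEffort("stdout-2".into())),
        ForwardEventResult::Dropped
    );
    // Once the consumer drains, a transcript event goes through losslessly.
    rx.recv().unwrap();
    assert_eq!(
        forward_event(&tx, Event::Transcript("delta".into())),
        ForwardEventResult::Delivered
    );
    println!("forwarding semantics ok");
}
```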
No new metrics or log lines. The existing `warn!` on event drops continues to fire for best-effort events. Lossless events that block will not produce a log line; they simply wait.

Tests
- `event_requires_delivery_marks_transcript_and_terminal_events`: unit test confirming the expanded classification covers `AgentMessageDelta`, `ItemCompleted`, `TurnCompleted`, and excludes `CommandExecutionOutputDelta` and `Lagged`.
- `forward_in_process_event_preserves_transcript_notifications_under_backpressure`: integration-style test that fills a capacity-1 channel, verifies a best-effort event is dropped (skipped count increments), then sends lossless transcript events and confirms they all arrive in order with the correct lag marker preceding them.
- `remote_backpressure_preserves_transcript_notifications`: end-to-end test over a real websocket that verifies the remote transport preserves transcript events under the same backpressure scenario.
- `event_requires_delivery_marks_transcript_and_disconnect_events` (remote): unit test confirming the remote-side classification covers transcript events and `Disconnected`.