Summary
OpenClaw hit a stuck Discord reply lane after Codex app-server accepted a turn/start request and then failed to deliver the terminal turn/completed or turn/aborted notification OpenClaw was waiting for. The active embedded-run handle stayed registered, so diagnostics correctly treated the session as an active run and did not release the lane.
Observed stuck session key: agent:main:discord:channel:1456744319972282449
Observed run/session id: b5e075cc-bf19-4f91-83e3-79e32f338bb5
OpenClaw workaround commit: 54e6e3d7daf5d0d857edf756b35628a29d11c7f5
What OpenClaw did
- Added a Codex app-server terminal-progress watchdog that arms once turn/start returns an in-progress turn.
- The watchdog resets on Codex app-server notifications and request/response activity.
- If an accepted Codex turn stays silent past the watchdog timeout without reaching a terminal event, OpenClaw marks the attempt as timed out, sends a best-effort turn/interrupt, resolves the attempt, clears the active embedded-run handle, and releases the session lane.
- Added regression coverage for the accepted-but-silent turn case in extensions/codex/src/app-server/run-attempt.test.ts.
- Documented the behavior in the agent-loop and queue docs.
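A minimal sketch of the watchdog behavior listed above, written in TypeScript to match the OpenClaw extension. It is not the code from run-attempt.ts: TerminalProgressWatchdog, WatchdogHooks, the injectable ScheduleFn, and the idle window are hypothetical names chosen so the release sequence can be exercised with a fake timer.

```typescript
// Hypothetical sketch of a terminal-progress watchdog. TerminalProgressWatchdog,
// WatchdogHooks, WatchdogTimer and ScheduleFn are illustrative, not OpenClaw's API.

type WatchdogTimer = { cancel(): void };
type ScheduleFn = (delayMs: number, fn: () => void) => WatchdogTimer;

// Default scheduler backed by setTimeout/clearTimeout.
const realScheduler: ScheduleFn = (delayMs, fn) => {
  const handle = setTimeout(fn, delayMs);
  return { cancel: () => clearTimeout(handle) };
};

// Side effects performed, in this order, when the watchdog fires.
interface WatchdogHooks {
  markTimedOut(): void;
  sendInterrupt(): Promise<void>; // best-effort turn/interrupt
  resolveAttempt(): void;
  clearRunHandle(): void;
  releaseLane(): void;
}

class TerminalProgressWatchdog {
  private timer: WatchdogTimer | undefined;
  private settled = false;
  private readonly idleMs: number;
  private readonly hooks: WatchdogHooks;
  private readonly schedule: ScheduleFn;

  constructor(idleMs: number, hooks: WatchdogHooks, schedule: ScheduleFn = realScheduler) {
    this.idleMs = idleMs;
    this.hooks = hooks;
    this.schedule = schedule;
  }

  // Arm once turn/start returns an in-progress turn.
  start(): void {
    this.arm();
  }

  // Any app-server notification or request/response activity resets the window.
  onActivity(): void {
    if (!this.settled) this.arm();
  }

  // A terminal event (turn/completed, turn/aborted) disarms the watchdog for good.
  onTerminal(): void {
    this.settled = true;
    this.timer?.cancel();
  }

  private arm(): void {
    this.timer?.cancel();
    this.timer = this.schedule(this.idleMs, () => this.fire());
  }

  private fire(): void {
    if (this.settled) return;
    this.settled = true;
    this.hooks.markTimedOut();
    // Fire-and-forget: a failed interrupt must not keep the lane stuck.
    void this.hooks.sendInterrupt().catch(() => undefined);
    this.hooks.resolveAttempt();
    this.hooks.clearRunHandle();
    this.hooks.releaseLane();
  }
}
```

Injecting the scheduler keeps the timeout path deterministic in tests, and the settled flag ensures a late timer callback cannot release a lane that a real terminal event has already settled.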
What Codex should fix
- turn/start should not accept work unless the app-server listener/subscription path is healthy for that conversation.
- App-server should guarantee a terminal notification (turn/completed, turn/aborted, or an explicit terminal error event) for every accepted turn/start, even when the underlying Responses/SSE stream idles or fails.
- App-server should expose enough read-back state for clients to reconcile an accepted turn after listener failure, for example via thread/read turn status or a dedicated active-turn status endpoint.
- Listener failure and SSE idle timeout paths should be surfaced as terminal app-server events, not just internal logs.
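The requested guarantees can be stated as a small invariant, sketched here in TypeScript to keep this description in one language. This is a hypothetical model, not Codex code or a proposed Codex API: TurnLedger and its methods are illustrative, and "terminal-error" is a placeholder for whatever explicit terminal error event Codex might adopt.

```typescript
// Hypothetical model of the requested invariant; not Codex's code.
// "terminal-error" is a placeholder name for an explicit terminal error event.

type TerminalKind = "turn/completed" | "turn/aborted" | "terminal-error";
type TurnStatus = "inProgress" | TerminalKind;

interface TurnNotification {
  method: TerminalKind;
  turnId: string;
  reason?: string;
}

class TurnLedger {
  private readonly turns = new Map<string, TurnStatus>();
  private readonly emit: (n: TurnNotification) => void;

  constructor(emit: (n: TurnNotification) => void) {
    this.emit = emit;
  }

  // turn/start: refuse work when the listener path for the conversation is unhealthy.
  accept(turnId: string, listenerHealthy: boolean): boolean {
    if (!listenerHealthy) return false;
    this.turns.set(turnId, "inProgress");
    return true;
  }

  // Emits at most one terminal notification per accepted turn.
  finish(turnId: string, kind: TerminalKind, reason?: string): void {
    if (this.turns.get(turnId) !== "inProgress") return;
    this.turns.set(turnId, kind);
    this.emit({ method: kind, turnId, reason });
  }

  // Listener failure or SSE idle timeout becomes a terminal event, not just a log line.
  failInProgress(reason: string): void {
    for (const [turnId, status] of Array.from(this.turns.entries())) {
      if (status === "inProgress") this.finish(turnId, "terminal-error", reason);
    }
  }

  // Read-back so clients can reconcile after missing a notification.
  status(turnId: string): TurnStatus | undefined {
    return this.turns.get(turnId);
  }
}
```

The point of the model is that every path out of "inProgress" goes through finish, so a client either receives exactly one terminal notification or can read the terminal status back.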
Evidence from code read
Codex app-server currently treats turn/start as accepted once it submits Op::UserInput; the turn lifecycle notifications depend on the separate listener path reading conversation.next_event and translating TurnStarted/TurnComplete/TurnAborted into app-server events. That creates a gap where OpenClaw can receive turn/start success but never receive the terminal notification it needs to release the channel lane.
Relevant Codex paths inspected locally:
- codex-rs/app-server/src/codex_message_processor.rs: turn/start, thread/start, thread/resume, listener loop.
- codex-rs/app-server/src/bespoke_event_handling.rs: mapping core turn lifecycle events into app-server notifications.
- codex-rs/core/src/tasks/regular.rs and codex-rs/core/src/tasks/mod.rs: core TurnStarted, TurnComplete, and TurnAborted emission.
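To make the gap concrete, here is a deliberately simplified TypeScript model of the accept/listen split. It is an illustration under stated assumptions, not a port of codex_message_processor.rs: ConversationModel, submitUserInput, nextEvent, runListener, and the turn/started method name are hypothetical, and the core is assumed to queue its lifecycle events immediately.

```typescript
// Hypothetical model of the accept/listen split; not Codex's actual code.
// ConversationModel, submitUserInput, nextEvent and runListener are illustrative names.

type CoreEvent = "TurnStarted" | "TurnComplete" | "TurnAborted";

// Illustrative mapping of core lifecycle events to app-server notification methods.
const NOTIFICATION_FOR: Record<CoreEvent, string> = {
  TurnStarted: "turn/started",
  TurnComplete: "turn/completed",
  TurnAborted: "turn/aborted",
};

class ConversationModel {
  private readonly pending: CoreEvent[] = [];

  // turn/start side: success is reported as soon as the input is submitted.
  // The core is assumed to queue its lifecycle events right away.
  submitUserInput(): { accepted: boolean } {
    this.pending.push("TurnStarted", "TurnComplete");
    return { accepted: true };
  }

  // Listener side: returns the next queued core event, or undefined if none.
  nextEvent(): CoreEvent | undefined {
    return this.pending.shift();
  }
}

// Separate listener path: only this loop turns core events into notifications.
function runListener(
  conv: ConversationModel,
  listenerAlive: boolean,
  send: (method: string) => void,
): void {
  if (!listenerAlive) return; // failed listener: events are never read or forwarded
  for (let ev = conv.nextEvent(); ev !== undefined; ev = conv.nextEvent()) {
    send(NOTIFICATION_FOR[ev]);
  }
}

// Same accepted turn, two listener outcomes.
for (const alive of [true, false]) {
  const conv = new ConversationModel();
  const sent: string[] = [];
  const { accepted } = conv.submitUserInput();
  runListener(conv, alive, (m) => sent.push(m));
  console.log(`listener alive=${alive} accepted=${accepted} sent=${JSON.stringify(sent)}`);
}
```

In the second case turn/start still reports acceptance, but no notification, terminal or otherwise, reaches the client, which is the state OpenClaw observed.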
Verification
- pnpm test extensions/codex/src/app-server/run-attempt.test.ts passed locally (40 tests).
- pnpm exec oxfmt --check --threads=1 extensions/codex/src/app-server/run-attempt.ts extensions/codex/src/app-server/run-attempt.test.ts docs/concepts/agent-loop.md docs/concepts/queue.md passed locally.
- git diff --check origin/main...HEAD passed after rebase.
- Testbox: pnpm check:changed passed for lanes extensions, extensionTests, and docs.