orchestration: /plan confirm race condition in piped stdin mode — 'Plan canceled' before sub-agent results #2246
Description
Summary
When running the agent in piped stdin mode (echo "..." | cargo run ...) with confirm_before_execute = true, sending /plan confirm causes the DagScheduler to cancel the plan with the message "Plan canceled. N/N tasks did not run.", yet sub-agent results still appear afterward. This is a race condition between stdin EOF triggering shutdown and background tokio sub-agent tasks completing.
Reproduction
echo -e "/plan research memory systems\n/plan confirm" | \
RUST_LOG=zeph_orchestration=debug cargo run --features full -- --config .local/config/testing.toml \
2>.local/testing/debug/session.log
Config: confirm_before_execute = true, max_tasks = 10, max_parallel = 2
Observed behavior
- Agent receives /plan research ..., generates a plan, and displays it
- Agent receives /plan confirm and starts executing the plan
- Stdin reaches EOF → agent shutdown begins
- DagScheduler logs "Plan canceled. 2/2 tasks did not run."
- Sub-agent results still print to stdout after the "canceled" message
Expected behavior
Either:
- The scheduler should wait for in-flight tasks before honoring the shutdown signal (graceful drain), OR
- The "Plan canceled" message should only appear if no tasks started
Root cause (hypothesis)
DagScheduler receives cancellation via the shutdown token when stdin reaches EOF. It logs "canceled" immediately, but tokio sub-agent tasks that were already spawned continue to completion on the runtime. The shutdown/cancel path does not drain in-flight futures.
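The hypothesized ordering can be modeled in a few lines. This is only a sketch: plain std threads stand in for tokio sub-agent tasks, the function name `undrained_shutdown_events` is hypothetical, and a channel forces the "race lost" ordering so the outcome is deterministic. It does not use or reproduce the actual DagScheduler code.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Models a shutdown path that reports cancellation without waiting for a
/// task that has already been spawned. Returns the events in the order
/// they were recorded.
pub fn undrained_shutdown_events() -> Vec<String> {
    let events = Arc::new(Mutex::new(Vec::new()));

    // Gate that keeps the task "in flight" until the scheduler has logged.
    // In the real bug this ordering is left to chance; here it is forced.
    let (release, gate) = mpsc::channel::<()>();

    let task_events = Arc::clone(&events);
    let task = thread::spawn(move || {
        // The sub-agent is still working when shutdown begins.
        gate.recv().ok();
        task_events
            .lock()
            .unwrap()
            .push("sub-agent result".to_string());
    });

    // Stdin EOF: the scheduler logs "canceled" immediately, without joining
    // the task it already started.
    events
        .lock()
        .unwrap()
        .push("Plan canceled. 1/1 tasks did not run.".to_string());

    // The already-spawned task keeps running and finishes afterward.
    release.send(()).unwrap();
    task.join().unwrap();

    let recorded = events.lock().unwrap().clone();
    recorded
}
```

Calling `undrained_shutdown_events()` yields the "canceled" message first and the sub-agent result second, which matches the observed output order.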
Context
- Verified in CI-214 with PR #2235, feat(orchestration): AdaptOrch topology-routing and Plan-Execute-Verify-Replan (#2219, #2202)
- Only affects piped stdin mode — interactive sessions (TTY) are unaffected since shutdown only triggers on explicit quit
- Related: feat(orchestration): Phase 6 — TUI integration #1241 (confirm_before_execute does not store plan in pending_graph)
Fix direction
In DagScheduler::run() (or shutdown handler), after receiving cancellation, await all in-flight task futures before logging "canceled" and returning. Alternatively, only log "canceled" for tasks that were never started (not in-flight).
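A minimal sketch of the drain-then-report idea follows. It combines both alternatives: in-flight tasks are joined before returning, and only tasks that were never started are reported as canceled. It makes several assumptions: std threads and an `AtomicBool` stand in for tokio tasks and the shutdown token, and `RunSummary`, `run_with_drain`, and `cancel_message` are hypothetical names, not part of the existing codebase.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Outcome of one scheduler run (hypothetical type for illustration).
#[derive(Debug, PartialEq, Eq)]
pub struct RunSummary {
    pub total: usize,
    pub completed: usize,
    pub never_started: usize,
}

/// Runs `tasks` with at most `max_parallel` in flight. Once `cancel` is set,
/// no new tasks are launched, but tasks already started are joined (drained)
/// before the function returns.
pub fn run_with_drain<F>(tasks: Vec<F>, max_parallel: usize, cancel: Arc<AtomicBool>) -> RunSummary
where
    F: FnOnce() + Send + 'static,
{
    let total = tasks.len();
    let limit = max_parallel.max(1);
    let mut pending = tasks.into_iter();
    let mut in_flight: Vec<JoinHandle<()>> = Vec::new();
    let mut started = 0;
    let mut completed = 0;

    loop {
        // Launch new work only while cancellation has not been requested.
        while in_flight.len() < limit && !cancel.load(Ordering::SeqCst) {
            match pending.next() {
                Some(task) => {
                    in_flight.push(thread::spawn(task));
                    started += 1;
                }
                None => break,
            }
        }
        if in_flight.is_empty() {
            break; // nothing running, and nothing more will be launched
        }
        // Drain: wait for the oldest in-flight task, even after cancellation.
        if in_flight.remove(0).join().is_ok() {
            completed += 1;
        }
    }

    RunSummary { total, completed, never_started: total - started }
}

/// Builds the "canceled" message only when some tasks never started.
pub fn cancel_message(summary: &RunSummary) -> Option<String> {
    if summary.never_started == 0 {
        return None;
    }
    Some(format!(
        "Plan canceled. {}/{} tasks did not run.",
        summary.never_started, summary.total
    ))
}
```

With this shape, a plan whose tasks were all in flight at EOF produces no "canceled" message, and the reported count covers only tasks that never started. In an async scheduler the same structure would await the in-flight task handles instead of joining threads.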