
fix(orchestration): mark non-terminal tasks Canceled on scheduler deadlock (#1879) #1894

Merged
bug-ops merged 4 commits into main from 1879-plan-failed-misleading on Mar 15, 2026



bug-ops (Owner) commented Mar 15, 2026

Summary

  • The scheduler's deadlock branch now marks all non-terminal tasks as TaskStatus::Canceled (mirroring the cancel_all() pattern) instead of leaving them in Pending/Skipped
  • format_plan_done_message() now distinguishes three cases: pure deadlock → "Plan canceled. N/M tasks did not run."; mixed failure and cancellation → "Plan failed. X/M tasks failed, Y canceled:"; normal failure → original message, unchanged
  • Added debug_assert!(self.running.is_empty()) to make the deadlock branch's invariant (no tasks still running) explicit
  • Added a tracing::warn! for the should-be-impossible case where there are neither failed nor canceled tasks
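A minimal sketch of the deadlock branch described above, not the crate's actual code. Only TaskStatus::Canceled, GraphStatus::Failed, self.running and the debug_assert! come from the PR; the Scheduler, Task, check_deadlock and has_ready_task names are hypothetical, and which statuses count as terminal is an assumption. It follows the deadlock rule given in the commit message (no running or ready tasks, graph not complete) and cancels every non-terminal task before failing the graph.

```rust
use std::collections::HashSet;

// Hypothetical, simplified status set; the real enum also has Skipped etc.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TaskStatus { Pending, Running, Completed, Failed, Canceled }

impl TaskStatus {
    // Assumption: these three are the terminal states in this sketch.
    fn is_terminal(self) -> bool {
        matches!(self, TaskStatus::Completed | TaskStatus::Failed | TaskStatus::Canceled)
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum GraphStatus { Running, Failed }

// Hypothetical task node: a status plus indices of the tasks it depends on.
struct Task { status: TaskStatus, deps: Vec<usize> }

struct Scheduler {
    tasks: Vec<Task>,
    running: HashSet<usize>,
    graph_status: GraphStatus,
}

impl Scheduler {
    // A Pending task is ready once all of its dependencies have Completed.
    fn has_ready_task(&self) -> bool {
        self.tasks.iter().any(|t| {
            t.status == TaskStatus::Pending
                && t.deps.iter().all(|&d| self.tasks[d].status == TaskStatus::Completed)
        })
    }

    fn is_complete(&self) -> bool {
        self.tasks.iter().all(|t| t.status.is_terminal())
    }

    /// Returns true if a deadlock was detected and handled.
    fn check_deadlock(&mut self) -> bool {
        if !self.running.is_empty() || self.has_ready_task() || self.is_complete() {
            return false;
        }
        // Redundant here because of the guard above, but it states the
        // branch invariant explicitly, as the PR does.
        debug_assert!(self.running.is_empty());
        // Mirror cancel_all(): every non-terminal task becomes Canceled,
        // so the formatter can report them instead of "0/N failed".
        for task in self.tasks.iter_mut().filter(|t| !t.status.is_terminal()) {
            task.status = TaskStatus::Canceled;
        }
        self.graph_status = GraphStatus::Failed;
        true
    }
}

fn main() {
    // Two tasks that depend on each other can never become ready: a deadlock.
    let mut s = Scheduler {
        tasks: vec![
            Task { status: TaskStatus::Pending, deps: vec![1] },
            Task { status: TaskStatus::Pending, deps: vec![0] },
        ],
        running: HashSet::new(),
        graph_status: GraphStatus::Running,
    };
    let deadlocked = s.check_deadlock();
    let statuses: Vec<TaskStatus> = s.tasks.iter().map(|t| t.status).collect();
    println!("deadlocked={} graph={:?} tasks={:?}", deadlocked, s.graph_status, statuses);
}
```

Because the guard returns early whenever something is still running, a graph with an in-flight task is never canceled, which is the behavior test_deadlock_not_triggered_when_task_running checks in the real scheduler.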

Root cause

When the deadlock branch fired, it set GraphStatus::Failed but never updated individual task statuses. The message formatter counted only TaskStatus::Failed tasks (of which there were zero), producing "Plan failed. 0/N tasks failed:", which was technically correct but misleading.
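To make the root cause concrete, here is a hypothetical reconstruction of the pre-fix counting logic; old_plan_failed_message is an illustrative name, not the crate's function, and treating the denominator as the total task count is an assumption. Because it counts only TaskStatus::Failed, a plan whose tasks were left Pending reports zero failures.

```rust
// Hypothetical, reduced status set for illustration.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TaskStatus { Pending, Completed, Failed }

/// Pre-fix behavior (sketch): only Failed tasks are counted.
fn old_plan_failed_message(statuses: &[TaskStatus]) -> String {
    let failed = statuses.iter().filter(|&&s| s == TaskStatus::Failed).count();
    format!("Plan failed. {}/{} tasks failed:", failed, statuses.len())
}

fn main() {
    // The graph was marked Failed by the deadlock branch, but the
    // remaining tasks were never touched and are still Pending.
    let statuses = [TaskStatus::Completed, TaskStatus::Pending, TaskStatus::Pending];
    // prints "Plan failed. 0/3 tasks failed:"
    println!("{}", old_plan_failed_message(&statuses));
}
```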

Test plan

  • cargo +nightly fmt --check — pass
  • cargo clippy --workspace --features full -- -D warnings — pass
  • cargo nextest run --config-file .github/nextest.toml --workspace --features full --lib --bins — 5975/5975 passed
  • Existing tests: test_deadlock_marks_non_terminal_tasks_canceled, test_deadlock_not_triggered_when_task_running (scheduler.rs)
  • New agent-level regression tests: finalize_plan_execution_deadlock_emits_cancelled_message, finalize_plan_execution_mixed_failed_and_cancelled (agent/tests.rs)

Closes #1879

bug-ops added 2 commits March 15, 2026 23:55
…dlock (#1879)

When the scheduler detected a deadlock (no running or ready tasks, graph
not complete), it set GraphStatus::Failed but left individual tasks in
their previous status (Pending/Skipped). The message formatter then
reported "Plan failed. 0/N tasks failed:", which was accurate but misleading.

Fix: mirror the cancel_all() pattern by iterating over non-terminal tasks
and setting them to TaskStatus::Canceled at deadlock time. Update
format_plan_done_message() to distinguish three cases:
- Pure deadlock (0 failed, N canceled): "Plan canceled. N/M tasks did not run."
- Mixed (failed + canceled): "Plan failed. X/M tasks failed, Y canceled:"
- Normal failure (failed only): original message unchanged

Add debug_assert! to make the self.running invariant explicit.
Add tracing::warn! for the should-be-impossible case where there are
neither failed nor canceled tasks.
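The three message cases can be sketched as a match on the failed and canceled counts. This is a simplified stand-in for format_plan_done_message(), not its real signature: it takes a plain slice of statuses, assumes M is the total task count, and produces only the header line (the real message presumably lists the affected tasks after the colon). eprintln! stands in for tracing::warn!, and falling back to the normal failure text in the empty-both case is an assumption.

```rust
// Hypothetical, reduced status set for illustration.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TaskStatus { Pending, Completed, Failed, Canceled }

fn format_plan_done_message(statuses: &[TaskStatus]) -> String {
    let total = statuses.len();
    let failed = statuses.iter().filter(|&&s| s == TaskStatus::Failed).count();
    let canceled = statuses.iter().filter(|&&s| s == TaskStatus::Canceled).count();
    match (failed, canceled) {
        // Should be impossible once deadlocks cancel their tasks; the PR
        // logs a tracing::warn! here. The fallback text is assumed.
        (0, 0) => {
            eprintln!("warn: plan failed with no failed or canceled tasks");
            format!("Plan failed. 0/{} tasks failed:", total)
        }
        // Pure deadlock: nothing failed, the rest never ran.
        (0, c) => format!("Plan canceled. {}/{} tasks did not run.", c, total),
        // Normal failure: original message shape.
        (f, 0) => format!("Plan failed. {}/{} tasks failed:", f, total),
        // Mixed: report both counts.
        (f, c) => format!("Plan failed. {}/{} tasks failed, {} canceled:", f, total, c),
    }
}

fn main() {
    use TaskStatus::*;
    // prints "Plan canceled. 2/3 tasks did not run."
    println!("{}", format_plan_done_message(&[Completed, Canceled, Canceled]));
    // prints "Plan failed. 1/3 tasks failed, 1 canceled:"
    println!("{}", format_plan_done_message(&[Failed, Canceled, Completed]));
    // prints "Plan failed. 1/2 tasks failed:"
    println!("{}", format_plan_done_message(&[Failed, Completed]));
}
```

Matching on the (failed, canceled) pair keeps every case explicit and makes the (0, 0) arm the single place where the warning would fire.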
github-actions bot added the documentation, rust, core, bug, and size/L labels on Mar 15, 2026
bug-ops enabled auto-merge (squash) on March 15, 2026 23:02
bug-ops merged commit 47e7f39 into main on Mar 15, 2026 (20 checks passed)
bug-ops deleted the 1879-plan-failed-misleading branch on March 15, 2026 23:24

Labels

  • bug: Something isn't working
  • core: zeph-core crate
  • documentation: Improvements or additions to documentation
  • rust: Rust code changes
  • size/L: Large PR (201-500 lines)


Development

Closes issue #1879: bug(orchestration): "Plan failed. 0/N tasks failed" misleading message on scheduler deadlock
