feat(orchestration): wire DagScheduler into /plan confirm flow #1458

Merged: bug-ops merged 2 commits into main from dag-scheduler-confirm-flow on Mar 9, 2026

@bug-ops (Owner) commented Mar 9, 2026

Summary

Fixes #1434. /plan confirm now executes the task graph through DagScheduler before aggregating results.

  • Wire the full DagScheduler tick loop into handle_plan_confirm(): tasks execute via SubAgentManager::spawn_for_task(), secret requests are bridged with a 120s timeout, and results are aggregated by LlmAggregator on completion
  • Fix handle_plan_list(), which always returned "No recent plans" even with a pending graph
  • Fix handle_plan_retry(): stale Running tasks are now reset to Ready, with assigned_agent cleared, before re-execution
  • Pre-validate the graph before moving it into the DagScheduler constructor so that it is preserved on failure
  • Truncate task error output to 500 chars in user-facing failure summaries
  • Increment tasks_completed/tasks_failed orchestration metrics on plan completion
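The retry reset and the error truncation are small enough to sketch in isolation. This is a minimal illustration, not the code from this PR: `TaskStatus`, `TaskNode`, `reset_stale_running`, and `truncate_chars` are hypothetical stand-ins, and it assumes "500 chars" means Unicode scalar values rather than bytes.

```rust
// Hypothetical stand-ins for the real task types; names and fields are assumptions.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum TaskStatus {
    Ready,
    Running,
    Completed,
    Failed,
}

#[derive(Debug, Clone)]
struct TaskNode {
    status: TaskStatus,
    assigned_agent: Option<String>,
}

/// Reset tasks left in `Running` by an interrupted run so they can be
/// scheduled again. Returns the number of tasks that were reset.
fn reset_stale_running(tasks: &mut [TaskNode]) -> usize {
    let mut reset = 0;
    for task in tasks.iter_mut().filter(|t| t.status == TaskStatus::Running) {
        task.status = TaskStatus::Ready;
        // Clear the stale assignment so the task is not treated as owned.
        task.assigned_agent = None;
        reset += 1;
    }
    reset
}

/// Return at most `max_chars` characters of `s`, cutting on a character
/// boundary so a multi-byte UTF-8 sequence is never split.
fn truncate_chars(s: &str, max_chars: usize) -> &str {
    match s.char_indices().nth(max_chars) {
        // `byte_idx` is the start of the first character past the limit.
        Some((byte_idx, _)) => &s[..byte_idx],
        // The string is already within the limit.
        None => s,
    }
}
```

Slicing by byte index (`&s[..500]`) would panic if byte 500 fell inside a multi-byte character, which is why the sketch walks `char_indices()` instead.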

Test plan

  • 8 new unit tests added (GAP-1..GAP-7 + 1 for pre-validation): no-manager fallback, no-pending-graph, completed/failed graph paths, plan list with/without graph, retry Running→Ready reset
  • cargo nextest run --workspace --features full --lib --bins — 4886 passed, 0 failures
  • cargo clippy --workspace --features full -- -D warnings — clean
  • cargo +nightly fmt --check — clean

Follow-up issues filed

- Replace stub aggregation in handle_plan_confirm with full DagScheduler
  tick loop: Spawn -> spawn_for_task(), Cancel -> manager.cancel(),
  Done -> LlmAggregator::aggregate()
- Extract run_scheduler_loop(), process_pending_secret_requests(),
  finalize_plan_execution() helpers for clarity
- Fix handle_plan_list to show pending graph summary instead of
  always returning "No recent plans"
- Fix handle_plan_retry to reset stale Running tasks to Ready and
  clear assigned_agent before re-execution
- Add pre-validation before DagScheduler constructor to preserve graph
  on predictable failures (empty tasks, terminal status)
- Use tokio::select! with 120s timeout on secret request bridging to
  avoid blocking the tick loop indefinitely
- Add spawn_counter for accurate sequential progress messages
- Truncate task error output to 500 chars in failure summary
- Increment tasks_completed/tasks_failed orchestration metrics
- Add 8 unit tests covering no-manager fallback, no-pending-graph,
  completed/failed graph paths, plan list, and retry reset

Fixes: #1434
Related: #1454, #1455, #1456, #1457 (follow-up issues filed)
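The pre-validation item relies on a general ownership pattern: check the graph by reference first, and only move it into the consuming constructor once the checks pass. The sketch below illustrates that pattern under assumptions. `TaskGraph`, `Scheduler`, `validate`, and `confirm` are hypothetical simplifications and do not reflect the real DagScheduler API.

```rust
// Hypothetical stand-ins; the real TaskGraph and DagScheduler types differ.
#[derive(Debug)]
struct TaskGraph {
    tasks: Vec<String>,
    finished: bool, // stands in for a terminal graph status
}

#[derive(Debug)]
struct Scheduler {
    graph: TaskGraph,
}

impl Scheduler {
    // Takes ownership: if construction failed in here, the graph would be dropped.
    fn new(graph: TaskGraph) -> Self {
        Scheduler { graph }
    }
}

/// Cheap checks that only need a shared reference to the graph.
fn validate(graph: &TaskGraph) -> Result<(), String> {
    if graph.tasks.is_empty() {
        return Err("plan has no tasks".to_string());
    }
    if graph.finished {
        return Err("plan is already in a terminal state".to_string());
    }
    Ok(())
}

/// Take the pending graph out of its slot only after validation succeeds,
/// so a predictable failure leaves `pending` untouched.
fn confirm(pending: &mut Option<TaskGraph>) -> Result<Scheduler, String> {
    let graph = pending
        .as_ref()
        .ok_or_else(|| "no pending plan".to_string())?;
    validate(graph)?;
    // Validation passed, so moving the graph is now safe.
    let graph = pending.take().expect("presence checked above");
    Ok(Scheduler::new(graph))
}
```

With this shape, a rejected plan stays in the pending slot and can still be listed or retried, while an accepted plan is moved out exactly once.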
@github-actions bot added the documentation, llm, channels, rust, core, enhancement, and size/XL labels Mar 9, 2026
@github-actions bot removed the llm and channels labels Mar 9, 2026
@bug-ops bug-ops merged commit c70bdb4 into main Mar 9, 2026
18 checks passed
@bug-ops bug-ops deleted the dag-scheduler-confirm-flow branch March 9, 2026 16:35
Labels

core, documentation, enhancement, rust, size/XL

Development

Successfully merging this pull request may close these issues.

Wire DagScheduler execution into /plan confirm flow