feat(orchestration): Phase 3 — DAG Scheduler + SubAgentManager integration #1238

@bug-ops

Parent: #1235
Blocked by: #1236

Summary

Implement the DAG scheduler that drives parallel task execution through the existing SubAgentManager infrastructure.

Branch: feat/m33/orchestration-scheduler

Deliverables

New files

  • crates/zeph-core/src/orchestration/scheduler.rs: DagScheduler, SchedulerAction, TaskEvent, TaskOutcome
  • crates/zeph-core/src/orchestration/router.rs: AgentRouter trait + RuleBasedRouter impl

Modified files

  • crates/zeph-core/src/subagent/manager.rs — add completion_tx: Option<mpsc::Sender<TaskEvent>> to SubAgentHandle, fire on loop termination. Add spawn_for_task() method.
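
A minimal sketch of the completion channel idea, under stated assumptions: the issue does not say which mpsc implementation is used, so this uses std::sync::mpsc and threads as stand-ins to stay self-contained. The shapes of TaskEvent and TaskOutcome, and the helpers run_agent and collect_events, are hypothetical and only illustrate "fire once on loop termination" through a single shared sender.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical, simplified shapes; the real TaskOutcome / TaskEvent may differ.
#[derive(Debug, Clone, PartialEq)]
enum TaskOutcome {
    Completed { output: String },
    Failed { error: String },
}

#[derive(Debug, Clone, PartialEq)]
struct TaskEvent {
    task_id: u32,
    outcome: TaskOutcome,
}

// Stand-in for an agent loop: does its work, then reports exactly once when it ends.
fn run_agent(task_id: u32, fail: bool, completion_tx: Option<mpsc::Sender<TaskEvent>>) {
    let outcome = if fail {
        TaskOutcome::Failed { error: "agent error".to_string() }
    } else {
        TaskOutcome::Completed { output: format!("result of task {task_id}") }
    };
    // Agents spawned outside the scheduler carry no sender and stay silent.
    if let Some(tx) = completion_tx {
        // A send error only means the receiver is gone, so it is ignored here.
        let _ = tx.send(TaskEvent { task_id, outcome });
    }
}

// Spawns one "agent" per task id; every agent gets a clone of the same sender.
// In this sketch, even task ids fail so both outcomes show up.
fn collect_events(task_ids: &[u32]) -> Vec<TaskEvent> {
    let (tx, rx) = mpsc::channel();
    let handles: Vec<_> = task_ids
        .iter()
        .map(|&id| {
            let tx = tx.clone();
            thread::spawn(move || run_agent(id, id % 2 == 0, Some(tx)))
        })
        .collect();
    drop(tx); // drop the original so the channel closes once all agents finish
    for handle in handles {
        handle.join().unwrap();
    }
    let mut events: Vec<TaskEvent> = rx.iter().collect();
    events.sort_by_key(|e| e.task_id); // arrival order is not deterministic
    events
}
```

Making the field an Option keeps agents that were not started by the scheduler unaffected, which seems to be the intent of Option<mpsc::Sender<TaskEvent>> in the deliverable.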

Key design decisions

  • ADR-026: Command pattern — scheduler produces SchedulerAction values, caller executes against manager. No &mut SubAgentManager held across await points.
  • ADR-027: Single mpsc::Sender<TaskEvent> channel for all agent completions (not per-agent watch receivers).
  • Cross-task context: <completed-dependencies> block injected into task prompt with dependency_context_budget (16384 chars total, divided equally across deps).
  • Task timeout: wall-clock monitoring via task_timeout_secs config.
  • Concurrency: respects both max_parallel and SubAgentManager.max_concurrent.
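
The budget split and the combined concurrency limit can be sketched as follows. This is an illustration only: the function names, the bracketed per-dependency layout inside the block, and the assumption that the budget covers dependency output text (not the wrapper tags) are not taken from the issue.

```rust
/// Builds a <completed-dependencies> block, giving each dependency an equal
/// share of the character budget. `deps` holds hypothetical (task id, output) pairs.
fn build_dependency_context(deps: &[(&str, &str)], budget_chars: usize) -> String {
    if deps.is_empty() {
        return String::new();
    }
    // Equal split, e.g. 16384 / 3 = 5461 chars per dependency.
    let per_dep = budget_chars / deps.len();
    let mut out = String::from("<completed-dependencies>\n");
    for (task_id, output) in deps {
        // Truncate by chars, not bytes, so multi-byte text is never cut mid-character.
        let clipped: String = output.chars().take(per_dep).collect();
        out.push_str(&format!("[{task_id}]\n{clipped}\n"));
    }
    out.push_str("</completed-dependencies>");
    out
}

/// Number of tasks that may still be spawned this tick: the stricter of the two
/// limits minus what is already running (assumes `running` counts all agents
/// that occupy a manager slot).
fn effective_slots(max_parallel: usize, manager_max_concurrent: usize, running: usize) -> usize {
    max_parallel
        .min(manager_max_concurrent)
        .saturating_sub(running)
}
```

With the default budget of 16384 and three completed dependencies, each one would contribute at most 5461 characters under this equal split.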

Scheduler tick loop

  1. Drain pending TaskEvents from event_rx
  2. Update task statuses, call ready_tasks() for newly schedulable tasks
  3. Route ready tasks to agents via AgentRouter
  4. Emit SchedulerAction::Spawn for each (up to concurrency limit)
  5. If no tasks are running or ready and every task is in a terminal state → Done
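
The steps above can be sketched as a single tick() method that returns actions for the caller to execute. This is a simplified sketch, not the planned implementation: tasks are plain indices, routing is reduced to a fixed agent name, events signal successful completion only (failure strategies and timeouts are omitted), a task is marked Running as soon as its Spawn action is emitted, and std::sync::mpsc stands in for whatever channel the crate uses.

```rust
use std::sync::mpsc::{self, Receiver};

#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Pending,
    Running,
    Completed,
}

#[derive(Debug, PartialEq)]
enum SchedulerAction {
    Spawn { task: usize, agent: String },
    Done,
}

// Simplified: an event only says "this task finished successfully".
struct TaskEvent {
    task: usize,
}

struct DagScheduler {
    deps: Vec<Vec<usize>>, // deps[i] = tasks that task i depends on
    status: Vec<Status>,
    max_parallel: usize,
    event_rx: Receiver<TaskEvent>,
}

impl DagScheduler {
    // A task is ready when it is still pending and all its dependencies completed.
    fn ready_tasks(&self) -> Vec<usize> {
        (0..self.deps.len())
            .filter(|&i| {
                self.status[i] == Status::Pending
                    && self.deps[i].iter().all(|&d| self.status[d] == Status::Completed)
            })
            .collect()
    }

    fn tick(&mut self) -> Vec<SchedulerAction> {
        // 1. Drain pending events without blocking.
        while let Ok(ev) = self.event_rx.try_recv() {
            self.status[ev.task] = Status::Completed;
        }
        // 2. Find newly schedulable tasks and the remaining concurrency budget.
        let ready = self.ready_tasks();
        let running = self.status.iter().filter(|s| **s == Status::Running).count();
        let slots = self.max_parallel.saturating_sub(running);
        // 3 + 4. Route (fixed agent here) and emit Spawn up to the limit.
        let mut actions = Vec::new();
        for task in ready.into_iter().take(slots) {
            self.status[task] = Status::Running;
            actions.push(SchedulerAction::Spawn { task, agent: "default".to_string() });
        }
        // 5. Nothing to spawn and every task finished: report Done.
        if actions.is_empty() && self.status.iter().all(|s| *s == Status::Completed) {
            actions.push(SchedulerAction::Done);
        }
        actions
    }
}

// Drives a diamond DAG (0 -> {1, 2} -> 3); "executing" a Spawn completes it at once.
fn run_diamond() -> Vec<Vec<SchedulerAction>> {
    let (tx, rx) = mpsc::channel();
    let mut scheduler = DagScheduler {
        deps: vec![vec![], vec![0], vec![0], vec![1, 2]],
        status: vec![Status::Pending; 4],
        max_parallel: 2,
        event_rx: rx,
    };
    let mut log = Vec::new();
    loop {
        let actions = scheduler.tick();
        let done = actions.contains(&SchedulerAction::Done);
        for action in &actions {
            if let SchedulerAction::Spawn { task, .. } = action {
                tx.send(TaskEvent { task: *task }).unwrap();
            }
        }
        log.push(actions);
        if done {
            break;
        }
    }
    log
}
```

Because tick() only returns values and never touches a manager, the caller is free to execute the actions between ticks, which matches the Command-pattern decision in ADR-026.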

Router fallback chain

  1. task.agent_hint exact match against loaded definitions
  2. Tool requirement matching (future: task keywords → agent tool policy)
  3. First available agent (fallback)
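
One possible shape for the fallback chain is sketched below. The AgentRouter and RuleBasedRouter names come from the deliverables, but the route() signature, the AgentDef and Task fields, and the tool-matching rule in step 2 (which the issue marks as future work) are assumptions made for illustration.

```rust
// Hypothetical, trimmed-down definitions; the real types live elsewhere in the crate.
struct AgentDef {
    name: String,
    tools: Vec<String>,
}

struct Task {
    agent_hint: Option<String>,
    required_tools: Vec<String>,
}

trait AgentRouter {
    /// Returns the name of the agent that should run `task`, if any agent is loaded.
    fn route<'a>(&self, task: &Task, agents: &'a [AgentDef]) -> Option<&'a str>;
}

struct RuleBasedRouter;

impl AgentRouter for RuleBasedRouter {
    fn route<'a>(&self, task: &Task, agents: &'a [AgentDef]) -> Option<&'a str> {
        // 1. Exact match of agent_hint against loaded definitions.
        if let Some(hint) = &task.agent_hint {
            if let Some(agent) = agents.iter().find(|a| &a.name == hint) {
                return Some(agent.name.as_str());
            }
        }
        // 2. Tool requirement matching (assumed rule: agent offers every required tool).
        if !task.required_tools.is_empty() {
            let matching = agents
                .iter()
                .find(|a| task.required_tools.iter().all(|t| a.tools.contains(t)));
            if let Some(agent) = matching {
                return Some(agent.name.as_str());
            }
        }
        // 3. Fallback: first available agent, or None when nothing is loaded.
        agents.first().map(|a| a.name.as_str())
    }
}

// Small helper for building definitions in examples and tests.
fn agent(name: &str, tools: &[&str]) -> AgentDef {
    AgentDef {
        name: name.to_string(),
        tools: tools.iter().map(|t| t.to_string()).collect(),
    }
}
```

A hint that matches no loaded definition is not treated as an error here; the task simply falls through to the later rules.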

Tests (~20)

  • Linear chain executes sequentially (via mock manager)
  • Independent tasks produce parallel Spawn actions
  • Completion triggers downstream scheduling
  • Failure + Abort produces Cancel actions for all running tasks
  • Failure + Skip propagates correctly through dependents
  • Failure + Retry re-schedules task (resets to Ready)
  • Concurrency limit respected in action generation
  • All tasks completed → Done with Completed status
  • Router: agent_hint exact match
  • Router: fallback to first available
  • Task timeout produces Cancel action
  • Manager spawn rejection (concurrency) keeps task Ready
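
As an illustration of what the Failure + Skip case might check, the sketch below marks every transitive dependent of a failed task as skipped. The issue does not define the exact Skip semantics, so the Status variants and the propagate_skip helper are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Pending,
    Completed,
    Failed,
    Skipped,
}

/// Marks `failed` as Failed and every pending task that depends on it,
/// directly or transitively, as Skipped. deps[i] lists the tasks that task i needs.
fn propagate_skip(deps: &[Vec<usize>], status: &mut [Status], failed: usize) {
    status[failed] = Status::Failed;
    let mut stack = vec![failed];
    while let Some(current) = stack.pop() {
        for (task, task_deps) in deps.iter().enumerate() {
            // Only pending tasks are skipped; finished work is left untouched.
            if task_deps.contains(&current) && status[task] == Status::Pending {
                status[task] = Status::Skipped;
                stack.push(task);
            }
        }
    }
}
```

For a chain 0 → 1 → 2 with an independent task 3, failing task 0 would skip tasks 1 and 2 and leave task 3 as it was.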

Dependencies

Blocked by: Phase 1 (#1236)
Can run in parallel with: Phase 2 (#1237)

Labels

enhancement (New feature or request), orchestration (Task orchestration / DAG scheduling)
