feat: Task Orchestration — TODO lists and DAG dependency graphs for sub-agents #1235

@bug-ops

Summary

Add a task orchestration layer to Zeph's sub-agent system that enables:

  • Task decomposition: LLM breaks complex goals into structured TODO lists
  • DAG dependency modeling: Tasks have explicit dependencies, forming a directed acyclic graph
  • Parallel scheduling: Independent tasks execute concurrently across sub-agents
  • Capability-based routing: Automatic agent selection based on task requirements
  • Failure handling: Retry, fallback, skip, abort strategies
  • Result aggregation: Collecting and synthesizing outputs from completed sub-tasks

Child issues

| Phase | Issue | Title | Blocked by |
|-------|-------|-------|------------|
| 1 | #1236 | TaskGraph core types, DAG operations, persistence | none |
| 2 | #1237 | LLM Planner (goal decomposition) | #1236 |
| 3 | #1238 | DAG Scheduler + SubAgentManager integration | #1236 |
| 4 | #1239 | CLI commands + agent loop integration | #1237, #1238 |
| 5 | #1240 | Aggregator + resume/retry | #1239 |
| 6 | #1241 | TUI integration | #1240 |
| 7 | #1242 | Documentation + full feature flag | #1240 |

Phase dependency graph

Phase 1 (#1236) Types + DAG + Persistence
    |         \
    v          v
Phase 2       Phase 3
(#1237)       (#1238)
Planner       Scheduler + Router
    \         /
     v       v
    Phase 4 (#1239) CLI + Agent Loop
        |
        v
    Phase 5 (#1240) Aggregator + Resume/Retry
       / \
      v   v
Phase 6   Phase 7
(#1241)   (#1242)
TUI       Docs

Phases 2 and 3 can run in parallel. Phases 6 and 7 can run in parallel.

Cross-Epic Dependencies

With #1195 (Untrusted Content Isolation)

| Orchestration | Security | Relationship | Type |
|---------------|----------|--------------|------|
| #1238 (scheduler, cross-task context injection) | #1206 (tool call argument validation) | The cross-task <completed-dependencies> block injects sub-agent output into downstream task prompts. This is a new injection vector not covered by existing tool result sanitization (#1200). The validation guard should cover orchestrated task prompts. | Should coordinate |
| #1240 (LLM aggregator) | #1204 (quarantined summarizer) | Both implement an isolated LLM call for synthesis/summarization. The first to land defines the abstraction pattern; avoid duplication. | Shared pattern |
| #1241 (TUI plan view) | #1208 (TUI security indicators) | Both add new TUI widgets to crates/zeph-tui/src/widgets/. Layout coordination is needed to avoid conflicts. | Layout coordination |
| #1236 (SqliteGraphStore persistence) | #1207 (memory write poisoning guard) | SqliteGraphStore in zeph-memory is a new write path into the same SQLite database. The poisoning guard must cover task graph writes (a malicious sub-agent could inject crafted output into TaskResult). | Guard must cover |

With #1222 (Graph Memory)

| Orchestration | Graph Memory | Relationship | Type |
|---------------|--------------|--------------|------|
| #1236 (migration 021_task_graphs.sql) | #1224 (migration 021_graph_entities.sql) | Migration number conflict: both plan migration 021_*. One must be renumbered. If #1224 lands first, the orchestration migration becomes 022_task_graphs.sql. | Ordering conflict |
| #1236 (SqliteGraphStore in zeph-memory) | #1224 (GraphStore in zeph-memory) | Both add new SQLite store modules to crates/zeph-memory/src/sqlite/. Different tables, same pattern: coordinate naming and feature gating to avoid confusion (graph_store.rs vs graph_memory_store.rs or similar). | Naming coordination |
| #1240 (LLM aggregator) | #1228 (community summaries) | Both do LLM-based summarization/synthesis. Shared pattern with #1204 (quarantined summarizer). The first to land sets the abstraction. | Shared pattern |
| #1241 (TUI plan view) | #1229 (graph memory TUI /graph commands) | Both add new TUI widgets/commands. Layout coordination is needed. | Layout coordination |

Recommended ordering

  1. Migration numbering: Whichever of #1224 (graph memory schema, core types, and CRUD operations) or #1236 (orchestration Phase 1, task graph persistence) lands first takes 021_. The other takes 022_. Track in PR review.
  2. Quarantined summarizer pattern: #1204 ([SEC-3.1] QuarantinedSummarizer for high-risk sources) should land before both #1240 (aggregator) and #1228 (community summaries) so that it defines the shared isolated-LLM-call abstraction.
  3. Memory write poisoning: #1207 ([SEC-4.3] Memory write poisoning guard) should land before or alongside both #1225 (graph extraction) and #1236 (task graph persistence) so that it covers both new write paths.
  4. Cross-task sanitization: #1238 (DAG Scheduler + SubAgentManager integration) should document that <completed-dependencies> injection is a new untrusted-content boundary, and #1206 ([SEC-4.2] Tool call argument validation guard) should be updated to cover it.
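
To make the untrusted-content boundary in item 4 concrete, here is a minimal sketch of how upstream output could be escaped and size-limited before it is placed inside a <completed-dependencies> block. Everything beyond the tag name is an assumption made for illustration: the inner `<task title="...">` layout, the entity-escaping strategy, the function names, and treating the budget as a character count split evenly across dependencies. It shows one possible mitigation, not the guard planned in #1206.

```rust
/// Escapes characters that could open or close tags (or attribute quotes)
/// in the surrounding prompt. Hypothetical helper, not part of Zeph.
fn escape_untrusted(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for ch in text.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            _ => out.push(ch),
        }
    }
    out
}

/// Builds the block from (task title, raw sub-agent output) pairs.
/// Each output is cut to an equal share of `budget_chars` before escaping.
pub fn completed_dependencies_block(results: &[(&str, &str)], budget_chars: usize) -> String {
    let per_task = if results.is_empty() {
        0
    } else {
        budget_chars / results.len()
    };
    let mut block = String::from("<completed-dependencies>\n");
    for (title, output) in results {
        // Truncate on char boundaries so multi-byte text is never split.
        let truncated: String = output.chars().take(per_task).collect();
        block.push_str(&format!(
            "<task title=\"{}\">{}</task>\n",
            escape_untrusted(title),
            escape_untrusted(&truncated)
        ));
    }
    block.push_str("</completed-dependencies>");
    block
}
```

With this escaping, a sub-agent that emits a literal closing tag cannot terminate the block early, because the only unescaped closing tag is the one appended by the builder.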

Architecture

Feature-gated under orchestration (optional, not default).

Module: crates/zeph-core/src/orchestration/ — coordination layer over existing SubAgentManager
Persistence: GraphStore trait in zeph-core, SqliteGraphStore impl in zeph-memory (ADR-021)
Scheduler: Command pattern (SchedulerAction enum) — no long-lived &mut SubAgentManager borrow (ADR-026)
Events: Aggregated mpsc::Sender<TaskEvent> channel for agent completion notifications (ADR-027)
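
The command-pattern idea can be sketched as follows: the scheduler never touches SubAgentManager; each tick drains completion events and returns a list of actions for the caller to apply. This is a simplified illustration under several assumptions: the variants of `SchedulerAction` and `TaskEvent`, the `tick` method, and the plain `u32` task ids are hypothetical, only the Abort failure strategy is modeled, and `std::sync::mpsc` stands in for whichever channel implementation the project actually uses.

```rust
use std::collections::HashSet;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Completion notification sent by a sub-agent (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TaskEvent {
    Completed { task: u32 },
    Failed { task: u32, error: String },
}

/// Commands the scheduler asks its caller to perform (hypothetical variants).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SchedulerAction {
    Spawn { task: u32 },
    Cancel { task: u32 },
    Done,
}

pub struct Scheduler {
    /// (task id, dependency ids)
    tasks: Vec<(u32, Vec<u32>)>,
    completed: HashSet<u32>,
    running: HashSet<u32>,
    failed: bool,
    max_parallel: usize,
    events: Receiver<TaskEvent>,
}

impl Scheduler {
    /// Returns the scheduler plus the sender that sub-agents report through.
    pub fn new(tasks: Vec<(u32, Vec<u32>)>, max_parallel: usize) -> (Self, Sender<TaskEvent>) {
        let (tx, rx) = channel();
        let scheduler = Scheduler {
            tasks,
            completed: HashSet::new(),
            running: HashSet::new(),
            failed: false,
            max_parallel,
            events: rx,
        };
        (scheduler, tx)
    }

    /// Drains pending events and returns actions; no manager is borrowed here.
    pub fn tick(&mut self) -> Vec<SchedulerAction> {
        while let Ok(event) = self.events.try_recv() {
            match event {
                TaskEvent::Completed { task } => {
                    self.running.remove(&task);
                    self.completed.insert(task);
                }
                TaskEvent::Failed { task, .. } => {
                    self.running.remove(&task);
                    self.failed = true;
                }
            }
        }
        let mut actions = Vec::new();
        if self.failed {
            // Abort strategy: cancel whatever is still running, then stop.
            let mut running: Vec<u32> = self.running.drain().collect();
            running.sort();
            actions.extend(running.into_iter().map(|task| SchedulerAction::Cancel { task }));
            actions.push(SchedulerAction::Done);
            return actions;
        }
        for (id, deps) in &self.tasks {
            if self.running.len() >= self.max_parallel {
                break;
            }
            let ready = !self.completed.contains(id)
                && !self.running.contains(id)
                && deps.iter().all(|d| self.completed.contains(d));
            if ready {
                self.running.insert(*id);
                actions.push(SchedulerAction::Spawn { task: *id });
            }
        }
        if self.completed.len() == self.tasks.len() {
            actions.push(SchedulerAction::Done);
        }
        actions
    }
}
```

The caller owns the sub-agent manager and applies each returned action itself, so the mutable borrow lasts only for the duration of a single spawn or cancel call.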

Core types

  • TaskGraph, TaskNode, TaskId(u32), TaskStatus, GraphStatus, TaskResult
  • FailureStrategy: Abort | Retry | Skip | Ask
  • GraphId(Uuid): graph identifier
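
As a rough illustration of how these types might fit together, the sketch below stores nodes in a vector indexed by TaskId, uses Kahn's algorithm to reject cycles and produce a topological order, and computes the set of tasks that are ready to run. The field names (`title`, `deps`), the `TaskStatus` variants, and the methods `add_task`, `topo_order`, and `ready_tasks` are assumptions for the example; GraphId(Uuid), GraphStatus, and TaskResult are left out to keep it dependency-free.

```rust
use std::collections::VecDeque;

/// Task identifier; in this sketch it doubles as the index into `TaskGraph::nodes`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct TaskId(pub u32);

/// Hypothetical status set; the real variants may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskStatus {
    Pending,
    Running,
    Completed,
    Failed,
    Skipped,
}

#[derive(Debug, Clone)]
pub struct TaskNode {
    pub id: TaskId,
    pub title: String,
    pub deps: Vec<TaskId>,
    pub status: TaskStatus,
}

#[derive(Debug, Default)]
pub struct TaskGraph {
    pub nodes: Vec<TaskNode>,
}

impl TaskGraph {
    /// Appends a pending task and returns its id.
    pub fn add_task(&mut self, title: &str, deps: &[TaskId]) -> TaskId {
        let id = TaskId(self.nodes.len() as u32);
        self.nodes.push(TaskNode {
            id,
            title: title.to_string(),
            deps: deps.to_vec(),
            status: TaskStatus::Pending,
        });
        id
    }

    /// Kahn's algorithm: returns a topological order, or `None` if the graph
    /// contains a cycle or references an unknown task.
    pub fn topo_order(&self) -> Option<Vec<TaskId>> {
        let n = self.nodes.len();
        let mut indegree = vec![0usize; n];
        let mut dependents: Vec<Vec<usize>> = vec![Vec::new(); n];
        for (i, node) in self.nodes.iter().enumerate() {
            for dep in &node.deps {
                let d = dep.0 as usize;
                if d >= n {
                    return None;
                }
                indegree[i] += 1;
                dependents[d].push(i);
            }
        }
        let mut queue: VecDeque<usize> = (0..n).filter(|&i| indegree[i] == 0).collect();
        let mut order = Vec::with_capacity(n);
        while let Some(i) = queue.pop_front() {
            order.push(TaskId(i as u32));
            for &j in &dependents[i] {
                indegree[j] -= 1;
                if indegree[j] == 0 {
                    queue.push_back(j);
                }
            }
        }
        // Nodes left unvisited are part of a cycle.
        if order.len() == n { Some(order) } else { None }
    }

    /// Pending tasks whose dependencies have all completed.
    pub fn ready_tasks(&self) -> Vec<TaskId> {
        self.nodes
            .iter()
            .filter(|t| t.status == TaskStatus::Pending)
            .filter(|t| {
                t.deps.iter().all(|d| {
                    self.nodes
                        .get(d.0 as usize)
                        .map_or(false, |n| n.status == TaskStatus::Completed)
                })
            })
            .map(|t| t.id)
            .collect()
    }
}
```

Running `topo_order` once when a plan is accepted would catch cyclic plans before any sub-agent is spawned, and `ready_tasks` gives the scheduler the set of independent tasks it can start concurrently.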

Traits

  • Planner — LLM-based goal decomposition
  • AgentRouter — capability-based agent selection (rule-based MVP, semantic future)
  • Aggregator — result synthesis via LLM
  • GraphStore — persistence (SQLite in zeph-memory)
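
For the rule-based MVP of AgentRouter, one possible shape is sketched below. The issue only names the trait and its purpose, so the `route` signature, the `AgentProfile` struct, the string-based capability model, and the "most specialized match wins" rule are all assumptions made for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical description of a sub-agent available for routing.
pub struct AgentProfile {
    pub name: String,
    pub capabilities: HashSet<String>,
}

/// Hypothetical trait shape for capability-based agent selection.
pub trait AgentRouter {
    /// Returns the name of the agent that should run a task needing `required`.
    fn route<'a>(&self, required: &[&str], agents: &'a [AgentProfile]) -> Option<&'a str>;
}

/// Rule-based router: among agents that cover every required capability,
/// pick the one with the fewest capabilities overall. Ties go to the
/// agent listed first.
pub struct RuleBasedRouter;

impl AgentRouter for RuleBasedRouter {
    fn route<'a>(&self, required: &[&str], agents: &'a [AgentProfile]) -> Option<&'a str> {
        agents
            .iter()
            .filter(|a| required.iter().all(|r| a.capabilities.contains(*r)))
            .min_by_key(|a| a.capabilities.len())
            .map(|a| a.name.as_str())
    }
}

/// Convenience constructor used in examples and tests.
pub fn profile(name: &str, caps: &[&str]) -> AgentProfile {
    AgentProfile {
        name: name.to_string(),
        capabilities: caps.iter().map(|c| c.to_string()).collect(),
    }
}
```

Keeping the selection behind a trait means a semantic router could later replace `RuleBasedRouter` without changes to the scheduler.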

Config: [orchestration] (defaults in parentheses)

  • enabled (false), max_tasks (20), max_parallel (4)
  • default_failure_strategy (abort), task_timeout_secs (600)
  • planner_model, confirm_before_execute (true), dependency_context_budget (16384)
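
The defaults above could be carried by a config struct along these lines. The struct name, the field types, and modeling `planner_model` as an `Option` (the issue gives it no default) are assumptions; the unit of `dependency_context_budget` is not stated in the issue, so the sketch leaves it as a plain number.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FailureStrategy {
    Abort,
    Retry,
    Skip,
    Ask,
}

/// Hypothetical in-memory form of the `[orchestration]` config section.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct OrchestrationConfig {
    pub enabled: bool,
    pub max_tasks: u32,
    pub max_parallel: u32,
    pub default_failure_strategy: FailureStrategy,
    pub task_timeout_secs: u64,
    /// No default is listed in the issue; `None` is an assumption.
    pub planner_model: Option<String>,
    pub confirm_before_execute: bool,
    /// Unit (tokens, bytes, or characters) is not specified in the issue.
    pub dependency_context_budget: usize,
}

impl Default for OrchestrationConfig {
    fn default() -> Self {
        OrchestrationConfig {
            enabled: false,
            max_tasks: 20,
            max_parallel: 4,
            default_failure_strategy: FailureStrategy::Abort,
            task_timeout_secs: 600,
            planner_model: None,
            confirm_before_execute: true,
            dependency_context_budget: 16384,
        }
    }
}
```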

Design documents

  • Research: .local/plan/subagent-task-orchestration-research.md
  • Architecture (v2): .local/plan/task-orchestration-architecture.md
  • Critique: .local/plan/task-orchestration-critique.md

Estimated tests: ~105 across all phases

Labels: enhancement (new feature or request), epic (milestone-level tracking issue)
