feat: Task Orchestration — TODO lists and DAG dependency graphs for sub-agents #1235
Closed
7 of 7 issues completed
Labels: enhancement (New feature or request), epic (Milestone-level tracking issue)
Description
Summary
Add a task orchestration layer to Zeph's sub-agent system that enables:
- Task decomposition: LLM breaks complex goals into structured TODO lists
- DAG dependency modeling: Tasks have explicit dependencies, forming a directed acyclic graph
- Parallel scheduling: Independent tasks execute concurrently across sub-agents
- Capability-based routing: Automatic agent selection based on task requirements
- Failure handling: Retry, fallback, skip, abort strategies
- Result aggregation: Collecting and synthesizing outputs from completed sub-tasks
Child issues
| Phase | Issue | Title | Blocked by |
|---|---|---|---|
| 1 | #1236 | TaskGraph core types, DAG operations, persistence | — |
| 2 | #1237 | LLM Planner (goal decomposition) | #1236 |
| 3 | #1238 | DAG Scheduler + SubAgentManager integration | #1236 |
| 4 | #1239 | CLI commands + agent loop integration | #1237, #1238 |
| 5 | #1240 | Aggregator + resume/retry | #1239 |
| 6 | #1241 | TUI integration | #1240 |
| 7 | #1242 | Documentation + full feature flag | #1240 |
Phase dependency graph
    Phase 1 (#1236) Types + DAG + Persistence
       |          \
       v           v
    Phase 2     Phase 3
    (#1237)     (#1238)
    Planner     Scheduler + Router
        \         /
         v       v
    Phase 4 (#1239) CLI + Agent Loop
           |
           v
    Phase 5 (#1240) Aggregator + Resume/Retry
       /          \
      v            v
    Phase 6     Phase 7
    (#1241)     (#1242)
    TUI         Docs
Phases 2 and 3 can run in parallel. Phases 6 and 7 can run in parallel.
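The parallelism claim above can be checked mechanically from the "Blocked by" column. The following is a minimal sketch, not code from the Zeph repository: it groups the nodes of a DAG into waves with a Kahn-style level-by-level topological sort, so that every node in a wave depends only on nodes in earlier waves. The function names and the `BTreeMap` encoding are assumptions made for illustration.

```rust
use std::collections::BTreeMap;

/// Groups the nodes of a DAG into "waves": every node in a wave has all of
/// its blockers in earlier waves, so nodes within one wave can run in parallel.
/// Returns None if no progress is possible (a cycle or an unknown blocker).
/// `deps` maps each node to the nodes it is blocked by (hypothetical encoding).
fn parallel_waves(deps: &BTreeMap<u32, Vec<u32>>) -> Option<Vec<Vec<u32>>> {
    let mut remaining: BTreeMap<u32, Vec<u32>> = deps.clone();
    let mut done: Vec<u32> = Vec::new();
    let mut waves = Vec::new();

    while !remaining.is_empty() {
        // A node is ready once every blocker has already completed.
        let ready: Vec<u32> = remaining
            .iter()
            .filter(|(_, blockers)| blockers.iter().all(|b| done.contains(b)))
            .map(|(id, _)| *id)
            .collect();

        if ready.is_empty() {
            return None;
        }
        for id in &ready {
            remaining.remove(id);
        }
        done.extend(&ready);
        waves.push(ready);
    }
    Some(waves)
}

/// The "Blocked by" column of the child-issue table, keyed by phase number.
fn phase_dependencies() -> BTreeMap<u32, Vec<u32>> {
    BTreeMap::from([
        (1, vec![]),
        (2, vec![1]),
        (3, vec![1]),
        (4, vec![2, 3]),
        (5, vec![4]),
        (6, vec![5]),
        (7, vec![5]),
    ])
}
```

Calling `parallel_waves(&phase_dependencies())` yields five waves, with phases 2 and 3 sharing the second wave and phases 6 and 7 sharing the last one.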
Cross-Epic Dependencies
With #1195 (Untrusted Content Isolation)
| Orchestration | Security | Relationship | Type |
|---|---|---|---|
| #1238 (scheduler, cross-task context injection) | #1206 (tool call argument validation) | Cross-task `<completed-dependencies>` block injects sub-agent output into downstream task prompts: a new injection vector not covered by existing tool result sanitization (#1200). Validation guard should cover orchestrated task prompts. | Should coordinate |
| #1240 (LLM aggregator) | #1204 (quarantined summarizer) | Both implement isolated LLM call for synthesis/summarization. First to implement defines the abstraction pattern — avoid duplication. | Shared pattern |
| #1241 (TUI plan view) | #1208 (TUI security indicators) | Both add new TUI widgets to crates/zeph-tui/src/widgets/. Layout coordination needed to avoid conflicts. | Layout coordination |
| #1236 (SqliteGraphStore persistence) | #1207 (memory write poisoning guard) | SqliteGraphStore in zeph-memory is a new write path into the same SQLite database. Poisoning guard must cover task graph writes (a malicious sub-agent could inject crafted output into TaskResult). | Guard must cover |
With #1222 (Graph Memory)
| Orchestration | Graph Memory | Relationship | Type |
|---|---|---|---|
| #1236 (migration 021_task_graphs.sql) | #1224 (migration 021_graph_entities.sql) | Migration number conflict: both plan migration 021_*. One must be renumbered. If #1224 lands first, orchestration migration becomes 022_task_graphs.sql. | Ordering conflict |
| #1236 (SqliteGraphStore in zeph-memory) | #1224 (GraphStore in zeph-memory) | Both add new SQLite store modules to crates/zeph-memory/src/sqlite/. Different tables, same pattern: coordinate naming and feature gating to avoid confusion (graph_store.rs vs graph_memory_store.rs or similar). | Naming coordination |
| #1240 (LLM aggregator) | #1228 (community summaries) | Both do LLM-based summarization/synthesis. Shared pattern with #1204 (quarantined summarizer). First to implement sets the abstraction. | Shared pattern |
| #1241 (TUI plan view) | #1229 (graph memory TUI /graph commands) | Both add new TUI widgets/commands. Layout coordination needed. | Layout coordination |
Recommended ordering
- Migration numbering: Whichever of #1224 (graph memory schema, core types, and CRUD operations) or #1236 (TaskGraph core types, DAG operations, persistence) lands first takes `021_`. The other takes `022_`. Track in PR review.
- Quarantined summarizer pattern (#1204, [SEC-3.1] QuarantinedSummarizer for high-risk sources): Should land before both #1240 (aggregator) and #1228 (community summaries) to define the shared isolated-LLM-call abstraction.
- Memory write poisoning (#1207, [SEC-4.3] Memory write poisoning guard): Should land before or alongside #1225 (graph extraction) and #1236 (task graph persistence) to cover both new write paths.
- Cross-task sanitization: #1238 should document that `<completed-dependencies>` injection is a new untrusted-content boundary; #1206 ([SEC-4.2] Tool call argument validation guard) should be updated to cover it.
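To make the untrusted-content boundary in the last item concrete, here is a hedged sketch of one possible mitigation: escaping markup in sub-agent output before it is placed inside the `<completed-dependencies>` block, and capping how much output is copied. The inner `<task id="...">` element, both function names, and the character-based budget are assumptions for illustration; the issue does not specify the block's internal format, and escaping alone does not replace the validation guard in #1206.

```rust
/// Escapes characters that would let untrusted text open or close tags.
fn escape_untrusted(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for ch in text.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            other => out.push(other),
        }
    }
    out
}

/// Builds the block injected into a downstream task prompt.
/// `budget` caps the total number of characters taken from dependency output
/// (assumption: the real budget may be counted in tokens instead).
fn render_completed_dependencies(outputs: &[(u32, &str)], budget: usize) -> String {
    let mut remaining = budget;
    let mut block = String::from("<completed-dependencies>\n");
    for (task_id, output) in outputs {
        // Truncate on a char boundary first, then escape, so the cap
        // applies to the raw output rather than to the escaped form.
        let taken: String = output.chars().take(remaining).collect();
        remaining -= taken.chars().count();
        block.push_str(&format!(
            "<task id=\"{}\">{}</task>\n",
            task_id,
            escape_untrusted(&taken)
        ));
    }
    block.push_str("</completed-dependencies>");
    block
}
```

With this shape, a sub-agent that emits a literal `</completed-dependencies>` in its output cannot terminate the block early, because the closing tag arrives in the prompt as escaped text.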
Architecture
Feature-gated under orchestration (optional, not default).
Module: crates/zeph-core/src/orchestration/ — coordination layer over existing SubAgentManager
Persistence: GraphStore trait in zeph-core, SqliteGraphStore impl in zeph-memory (ADR-021)
Scheduler: Command pattern (SchedulerAction enum) — no long-lived &mut SubAgentManager borrow (ADR-026)
Events: Aggregated mpsc::Sender<TaskEvent> channel for agent completion notifications (ADR-027)
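The scheduler and event lines above can be illustrated together. This is a minimal sketch under stated assumptions, not the design recorded in ADR-026 or ADR-027: the scheduler drains a `TaskEvent` channel and returns `SchedulerAction` values for the caller to apply, so it never needs to borrow the agent manager. The variants of both enums, the `Scheduler` struct, and the abort-on-first-failure policy are hypothetical, and `std::sync::mpsc` stands in for whichever `mpsc` implementation the project uses.

```rust
use std::sync::mpsc;

/// Events reported back by running sub-agents (hypothetical shape).
#[derive(Debug, PartialEq)]
enum TaskEvent {
    Completed { task: u32 },
    Failed { task: u32 },
}

/// Commands produced by the scheduler. The caller applies them to the
/// agent manager, so the scheduler never holds a mutable borrow of it.
#[derive(Debug, PartialEq)]
enum SchedulerAction {
    Spawn { task: u32 },
    Abort,
    Done,
}

struct Scheduler {
    /// Each entry is (task id, ids of the tasks it depends on).
    tasks: Vec<(u32, Vec<u32>)>,
    completed: Vec<u32>,
    running: Vec<u32>,
    max_parallel: usize,
    failed: bool,
}

impl Scheduler {
    fn new(tasks: Vec<(u32, Vec<u32>)>, max_parallel: usize) -> Self {
        Scheduler { tasks, completed: vec![], running: vec![], max_parallel, failed: false }
    }

    /// Drains pending events without blocking, then decides what happens next.
    fn tick(&mut self, events: &mpsc::Receiver<TaskEvent>) -> Vec<SchedulerAction> {
        for event in events.try_iter() {
            match event {
                TaskEvent::Completed { task } => {
                    self.running.retain(|t| *t != task);
                    self.completed.push(task);
                }
                TaskEvent::Failed { task } => {
                    self.running.retain(|t| *t != task);
                    self.failed = true;
                }
            }
        }
        if self.failed {
            return vec![SchedulerAction::Abort];
        }
        if self.completed.len() == self.tasks.len() {
            return vec![SchedulerAction::Done];
        }
        let mut actions = Vec::new();
        for (id, deps) in &self.tasks {
            // Respect the parallelism cap before spawning anything else.
            if self.running.len() >= self.max_parallel {
                break;
            }
            let pending = !self.completed.contains(id) && !self.running.contains(id);
            let deps_met = deps.iter().all(|d| self.completed.contains(d));
            if pending && deps_met {
                self.running.push(*id);
                actions.push(SchedulerAction::Spawn { task: *id });
            }
        }
        actions
    }
}
```

A driver loop would call `tick`, hand each `Spawn` to the agent manager, and give every spawned agent a clone of the `Sender` so that completions funnel into the single receiver.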
Core types
- TaskGraph, TaskNode, TaskId(u32), TaskStatus, GraphStatus, TaskResult
- FailureStrategy: Abort | Retry | Skip | Ask
- GraphId(Uuid): graph identifier
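As a rough sketch of how these types might fit together, the block below defines a small `TaskGraph` whose `add_task` only accepts dependencies on tasks that already exist. Because every edge then points at an older node, the graph stays acyclic by construction. Only the type names and the `FailureStrategy` variants come from this issue; the fields, the `TaskStatus` variants, the `add_task` method, and the use of `u128` in place of `Uuid` are assumptions made to keep the example dependency-free.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct TaskId(u32);

/// Stand-in for `GraphId(Uuid)`; a u128 keeps this sketch dependency-free.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct GraphId(u128);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FailureStrategy {
    Abort,
    Retry,
    Skip,
    Ask,
}

/// Variant names are assumptions; the issue only names the type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TaskStatus {
    Pending,
    Running,
    Completed,
    Failed,
}

#[derive(Debug)]
struct TaskNode {
    id: TaskId,
    description: String,
    depends_on: Vec<TaskId>,
    status: TaskStatus,
    on_failure: FailureStrategy,
}

#[derive(Debug)]
struct TaskGraph {
    id: GraphId,
    nodes: BTreeMap<TaskId, TaskNode>,
}

impl TaskGraph {
    fn new(id: GraphId) -> Self {
        TaskGraph { id, nodes: BTreeMap::new() }
    }

    /// Adds a task that may only depend on tasks already in the graph.
    /// Edges always point at older nodes, so no cycle can be formed.
    fn add_task(
        &mut self,
        description: &str,
        depends_on: Vec<TaskId>,
        on_failure: FailureStrategy,
    ) -> Result<TaskId, String> {
        if let Some(missing) = depends_on.iter().find(|d| !self.nodes.contains_key(*d)) {
            return Err(format!("unknown dependency {:?}", missing));
        }
        let id = TaskId(self.nodes.len() as u32);
        let node = TaskNode {
            id,
            description: description.to_string(),
            depends_on,
            status: TaskStatus::Pending,
            on_failure,
        };
        self.nodes.insert(id, node);
        Ok(id)
    }
}
```

A planner that emits tasks in arbitrary order would need a separate cycle check instead; the append-only rule here is simply the cheapest way to guarantee the DAG property in a sketch.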
Traits
- Planner: LLM-based goal decomposition
- AgentRouter: capability-based agent selection (rule-based MVP, semantic future)
- Aggregator: result synthesis via LLM
- GraphStore: persistence (SQLite in zeph-memory)
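Of these traits, the rule-based router is the easiest to sketch without an LLM or a database. The example below is an illustration under assumptions rather than the project's API: the `route` signature, the `AgentProfile` struct, and the first-match rule are all hypothetical. The other three traits are omitted because their signatures depend on details the issue does not give.

```rust
/// Hypothetical description of an available sub-agent.
struct AgentProfile {
    name: String,
    capabilities: Vec<String>,
}

/// Capability-based agent selection; the signature is an assumption.
trait AgentRouter {
    fn route<'a>(
        &self,
        required: &[&str],
        agents: &'a [AgentProfile],
    ) -> Option<&'a AgentProfile>;
}

/// Rule-based variant: pick the first agent that advertises every
/// required capability, or None when no agent qualifies.
struct RuleBasedRouter;

impl AgentRouter for RuleBasedRouter {
    fn route<'a>(
        &self,
        required: &[&str],
        agents: &'a [AgentProfile],
    ) -> Option<&'a AgentProfile> {
        agents.iter().find(|agent| {
            required
                .iter()
                .all(|cap| agent.capabilities.iter().any(|have| have.as_str() == *cap))
        })
    }
}
```

Keeping the selection behind a trait means a later semantic router could replace `RuleBasedRouter` without changes to the scheduler that calls `route`.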
Config: [orchestration]
- enabled (false), max_tasks (20), max_parallel (4)
- default_failure_strategy (abort), task_timeout_secs (600)
- planner_model, confirm_before_execute (true), dependency_context_budget (16384)
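The defaults listed above can be captured in a plain struct. This is a sketch, assuming field types that the issue does not state: the integer widths, `Option<String>` for `planner_model` (no default is given, so `None` is used here), and the `validate` method with its two sanity checks are all hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FailureStrategy {
    Abort,
    Retry,
    Skip,
    Ask,
}

/// Mirrors the `[orchestration]` config section; field types are assumptions.
#[derive(Debug, Clone, PartialEq)]
struct OrchestrationConfig {
    enabled: bool,
    max_tasks: u32,
    max_parallel: u32,
    default_failure_strategy: FailureStrategy,
    task_timeout_secs: u64,
    /// No default is listed in the issue; None is an assumption.
    planner_model: Option<String>,
    confirm_before_execute: bool,
    dependency_context_budget: usize,
}

impl Default for OrchestrationConfig {
    fn default() -> Self {
        OrchestrationConfig {
            enabled: false,
            max_tasks: 20,
            max_parallel: 4,
            default_failure_strategy: FailureStrategy::Abort,
            task_timeout_secs: 600,
            planner_model: None,
            confirm_before_execute: true,
            dependency_context_budget: 16384,
        }
    }
}

impl OrchestrationConfig {
    /// Hypothetical sanity checks; the real validation rules are not
    /// described in the issue.
    fn validate(&self) -> Result<(), String> {
        if self.max_parallel == 0 {
            return Err("max_parallel must be at least 1".to_string());
        }
        if self.max_parallel > self.max_tasks {
            return Err("max_parallel must not exceed max_tasks".to_string());
        }
        Ok(())
    }
}
```

Since `enabled` defaults to false and the feature itself is opt-in at compile time, a default build and a default config both leave orchestration switched off.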
Design documents
- Research: .local/plan/subagent-task-orchestration-research.md
- Architecture (v2): .local/plan/task-orchestration-architecture.md
- Critique: .local/plan/task-orchestration-critique.md
Estimated tests: ~105 across all phases