feat(orchestration): topology-aware scheduling and GAP parallel annotations #2183
Merged
…ations (#1840, #2172)

Phase 1 of two research issues:

- TopologyClassifier inspects the TaskGraph structure (edge count, longest path, fan-out ratio) and classifies it as AllParallel, LinearChain, FanOut, or Mixed. When topology_selection=true in OrchestrationConfig, DagScheduler uses the topology hint to override max_parallel (capped at user config, never below 1). Also applied in resume_from() to preserve classification across suspension.
- ExecutionMode (Parallel/Sequential) annotations per task node. Planner prompt now requests structured parallel/sequential markers; convert_response parses typed ExecutionMode with Parallel fallback on invalid/missing values. DagScheduler respects Sequential mode by serializing sequential tasks globally (at most one runs at a time), leaving parallel tasks unaffected.

Both features are disabled by default (topology_selection = false).

+27 tests (6504 -> 6531), all passing.
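For readers who want a feel for the classification step, here is a minimal sketch of how a graph could be sorted into the four classes and how the resulting hint could be clamped. It is not the code from `topology.rs`: the `classify` function, its edge-list input, the exact rules for `LinearChain` and `FanOut`, and the choice to keep the configured limit for every shape except a chain are all assumptions made for illustration. Only the variant names and the "capped at user config, never below 1" rule come from the commit message.

```rust
/// Coarse shape of a task graph; variant names follow the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Topology {
    AllParallel,
    LinearChain,
    FanOut,
    Mixed,
}

/// Classify a DAG given as `n` nodes and `(from, to)` edges.
/// Assumption: nodes are numbered in topological order, so `from < to < n`.
pub fn classify(n: usize, edges: &[(usize, usize)]) -> Topology {
    if edges.is_empty() {
        // No dependencies at all; this also covers empty and single-node graphs.
        return Topology::AllParallel;
    }
    // Process edges by source so each node's depth is final before it is read.
    let mut sorted = edges.to_vec();
    sorted.sort_by_key(|&(from, _)| from);
    let mut depth = vec![1usize; n]; // longest path ending at each node, in nodes
    let mut out_degree = vec![0usize; n];
    for &(from, to) in &sorted {
        depth[to] = depth[to].max(depth[from] + 1);
        out_degree[from] += 1;
    }
    let longest = depth.iter().copied().max().unwrap_or(0);
    let max_fan_out = out_degree.iter().copied().max().unwrap_or(0);
    if longest == n {
        // Some path visits every node, so nothing can overlap.
        Topology::LinearChain
    } else if longest == 2 && max_fan_out == edges.len() {
        // Hypothetical rule: every edge leaves the same root node.
        Topology::FanOut
    } else {
        Topology::Mixed
    }
}

/// Turn the classification into a parallelism limit.
/// The result never exceeds `configured` and never drops below 1.
pub fn suggest_max_parallel(topology: Topology, configured: usize) -> usize {
    let hint = match topology {
        Topology::LinearChain => 1,
        // Assumption: other shapes keep whatever the user configured.
        _ => configured,
    };
    hint.min(configured).max(1)
}
```

Under these assumed rules, a three-node chain classifies as `LinearChain` and yields a limit of 1 regardless of the configured value, while a diamond-shaped graph falls through to `Mixed` and keeps the configured limit.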
129b262 to 8043d35
bug-ops added a commit that referenced this pull request on Mar 27, 2026
#2172)

Follow-up to #2183 per multi-model design principle: LlmPlanner now accepts a dedicated provider reference instead of always using the primary provider.

Config field `planner_provider = "fast"` in `[orchestration]` references a `[[llm.providers]]` entry by name; empty string falls back to the primary provider. Field replaces the dead `planner_model: Option<String>` (never wired, pre-v1.0.0 removal).

`build_planner_provider()` in bootstrap resolves the named provider and passes it to `LlmPlanner::new()` at all three wiring sites (daemon, runner, acp). `OrchestrationState.planner_provider` holds the optional resolved provider, scoped to orchestration rather than the global ProviderState.

`migrate_planner_model_to_provider()` auto-migration step comments out any existing `planner_model` value with a MIGRATED marker so users know to supply a provider name instead of a model name.

Also adds latency-threshold doc comment on `TopologyClassifier::suggest_max_parallel` (parallel scheduling overhead only pays off when individual tool calls take >= 500ms; for LLM API calls this is always satisfied; for local tools such as file reads, consider keeping topology_selection disabled).

+50 tests (6504 -> 6554).
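The name-based lookup with its empty-string fallback can be sketched roughly as follows. This is an illustration only, not the body of `build_planner_provider()`: the `Provider` struct, the `resolve_planner_provider` function, the `HashMap` registry, and the decision to return an error for an unknown name are all hypothetical, since the commit message only states that an empty string falls back to the primary provider.

```rust
use std::collections::HashMap;

/// Stand-in for a configured LLM provider (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Provider {
    pub name: String,
}

/// Pick the provider the planner should use.
/// An empty `planner_provider` means "use the primary provider".
pub fn resolve_planner_provider<'a>(
    planner_provider: &str,
    providers: &'a HashMap<String, Provider>,
    primary: &'a Provider,
) -> Result<&'a Provider, String> {
    if planner_provider.is_empty() {
        return Ok(primary);
    }
    // Assumption: a name with no matching entry is reported as an error
    // rather than silently falling back to the primary provider.
    providers
        .get(planner_provider)
        .ok_or_else(|| format!("unknown planner_provider: {planner_provider}"))
}
```

Returning a borrowed provider keeps the sketch free of ownership decisions; the real wiring may well hold an owned or reference-counted value instead.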
bug-ops added a commit that referenced this pull request on Mar 27, 2026
#2184)

* feat(orchestration): add planner_provider field to OrchestrationConfig (#2172)

Follow-up to #2183 per multi-model design principle: LlmPlanner now accepts a dedicated provider reference instead of always using the primary provider.

Config field `planner_provider = "fast"` in `[orchestration]` references a `[[llm.providers]]` entry by name; empty string falls back to the primary provider. Field replaces the dead `planner_model: Option<String>` (never wired, pre-v1.0.0 removal).

`build_planner_provider()` in bootstrap resolves the named provider and passes it to `LlmPlanner::new()` at all three wiring sites (daemon, runner, acp). `OrchestrationState.planner_provider` holds the optional resolved provider, scoped to orchestration rather than the global ProviderState.

`migrate_planner_model_to_provider()` auto-migration step comments out any existing `planner_model` value with a MIGRATED marker so users know to supply a provider name instead of a model name.

Also adds latency-threshold doc comment on `TopologyClassifier::suggest_max_parallel` (parallel scheduling overhead only pays off when individual tool calls take >= 500ms; for LLM API calls this is always satisfied; for local tools such as file reads, consider keeping topology_selection disabled).

+50 tests (6504 -> 6554).

* style: fix fmt in migrate.rs test assertion
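The auto-migration described in that commit amounts to a line-level rewrite of the config text. A rough sketch of the idea, assuming a plain line-by-line pass, might look like this. The function name `migrate_planner_model`, the marker wording, and the line-based approach are placeholders; the actual `migrate_planner_model_to_provider()` may work on a parsed document instead.

```rust
/// Comment out any `planner_model = ...` line and put a marker above it.
/// The marker text is a placeholder, not the project's actual wording.
/// Note: `lines()` drops a trailing newline, so the output has none.
pub fn migrate_planner_model(toml: &str) -> String {
    toml.lines()
        .map(|line| {
            let trimmed = line.trim_start();
            // Match the exact key only, so `planner_model_x = ...` is untouched.
            let is_planner_model = trimmed
                .strip_prefix("planner_model")
                .map_or(false, |rest| rest.trim_start().starts_with('='));
            if is_planner_model {
                format!(
                    "# MIGRATED: set planner_provider to a provider name instead\n# {trimmed}"
                )
            } else {
                line.to_string()
            }
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

Keeping the old value as a comment rather than deleting it leaves the user a record of what was configured before they choose a provider name.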
Summary
Implements Phase 1 of two research issues in a single cohesive PR:
- `TopologyClassifier` inspects `TaskGraph` structure (edge count, longest path, fan-out ratio) and classifies it as `AllParallel`, `LinearChain`, `FanOut`, or `Mixed`. When `topology_selection = true`, `DagScheduler` uses the hint to cap `max_parallel` appropriately, applied in both `new()` and `resume_from()`.
- `LlmPlanner` prompt updated to request an explicit `execution_mode` (parallel/sequential) annotation per task node. `DagScheduler` respects `Sequential` mode with global serialization (at most one sequential task at a time), leaving parallel tasks unaffected.

Both features default to off (`topology_selection = false`).

Changes
- `crates/zeph-orchestration/src/topology.rs`: new file with the `Topology` enum and `TopologyClassifier`
- `crates/zeph-orchestration/src/graph.rs`: `ExecutionMode` enum, `execution_mode` field on `TaskNode` (`#[serde(default)]` for backward-compat)
- `crates/zeph-orchestration/src/planner.rs`: `execution_mode` annotation in prompt + typed `Option<ExecutionMode>` on `PlannedTask`
- `crates/zeph-orchestration/src/scheduler.rs`: topology classification + sequential dispatch
- `crates/zeph-config/src/experiment.rs`: `topology_selection: bool` field on `OrchestrationConfig`

Test plan
- `TopologyClassifier`: all 4 classes, edge cases (empty, single-node, `max_parallel=1`)
- `ExecutionMode` serde: roundtrip + missing-field backward-compat
- `resume_from` preserves classification; sequential dispatch serializes correctly while parallel tasks proceed
- `cargo +nightly fmt --check`, `cargo clippy --all-targets --all-features --workspace -- -D warnings`, `cargo nextest run` all pass

Closes #1840, closes #2172
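To make the sequential-dispatch rule concrete, the sketch below shows one way to parse the planner's marker with a `Parallel` fallback and then pick tasks so that at most one `Sequential` task is in flight. It is a simplified model, not the `DagScheduler` code: `parse_mode`, `pick_dispatch`, the case-insensitive matching, and the assumption that a sequential task also consumes a `max_parallel` slot are all hypothetical.

```rust
/// Execution annotation on a task; `Parallel` is the default and the fallback.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum ExecutionMode {
    #[default]
    Parallel,
    Sequential,
}

/// Parse the planner's marker. Missing or unrecognised values become `Parallel`.
/// Assumption: matching ignores case and surrounding whitespace.
pub fn parse_mode(raw: Option<&str>) -> ExecutionMode {
    match raw.map(|s| s.trim().to_ascii_lowercase()).as_deref() {
        Some("sequential") => ExecutionMode::Sequential,
        _ => ExecutionMode::Parallel,
    }
}

/// Choose which ready tasks to start on this scheduling tick.
/// `ready` lists `(task id, mode)` in priority order, `running_sequential`
/// says whether a sequential task is already in flight, and `free_slots`
/// is what remains of the `max_parallel` budget.
pub fn pick_dispatch(
    ready: &[(usize, ExecutionMode)],
    running_sequential: bool,
    free_slots: usize,
) -> Vec<usize> {
    let mut sequential_busy = running_sequential;
    let mut picked = Vec::new();
    for &(id, mode) in ready {
        if picked.len() == free_slots {
            break;
        }
        match mode {
            // Parallel tasks are limited only by the slot budget.
            ExecutionMode::Parallel => picked.push(id),
            // The first sequential task may start if none is running.
            ExecutionMode::Sequential if !sequential_busy => {
                sequential_busy = true;
                picked.push(id);
            }
            // Another sequential task is in flight: leave this one for later.
            ExecutionMode::Sequential => {}
        }
    }
    picked
}
```

With ready tasks `[(0, Sequential), (1, Parallel), (2, Sequential), (3, Parallel)]`, four free slots, and nothing sequential running, this picks tasks 0, 1, and 3; task 2 waits until task 0 finishes, while the parallel tasks proceed unaffected.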