
feat(orchestration): topology-aware scheduling and GAP parallel annotations #2183

Merged
bug-ops merged 1 commit into main from adaptorch-topology-selection on Mar 26, 2026

@bug-ops (Owner) commented Mar 26, 2026

Summary

Implements Phase 1 of two research issues (#1840 and #2172) in a single cohesive PR.

Both features default to off (topology_selection = false).

Changes

  • crates/zeph-orchestration/src/topology.rs (new): Topology enum, TopologyClassifier
  • crates/zeph-orchestration/src/graph.rs: ExecutionMode enum, execution_mode field on TaskNode (#[serde(default)] for backward-compat)
  • crates/zeph-orchestration/src/planner.rs: execution_mode annotation in prompt + typed Option<ExecutionMode> on PlannedTask
  • crates/zeph-orchestration/src/scheduler.rs: topology classification + sequential dispatch
  • crates/zeph-config/src/experiment.rs: topology_selection: bool field on OrchestrationConfig

Test plan

  • +27 tests (6504 → 6531), all passing
  • TopologyClassifier: all 4 classes, edge cases (empty, single-node, max_parallel=1)
  • ExecutionMode serde: roundtrip + missing-field backward-compat
  • Scheduler: topology hint applied, resume_from preserves classification, sequential dispatch serializes correctly while parallel tasks proceed
  • Pre-commit: cargo +nightly fmt --check, cargo clippy --all-targets --all-features --workspace -- -D warnings, cargo nextest run all pass

Closes #1840, closes #2172

@github-actions bot added the enhancement, documentation, rust, and size/XL labels on Mar 26, 2026
@bug-ops bug-ops enabled auto-merge (squash) March 26, 2026 23:29
…ations (#1840, #2172)

Phase 1 of two research issues:

TopologyClassifier analyzes the task DAG (edge count, longest path, fan-out ratio) and classifies it as
AllParallel, LinearChain, FanOut, or Mixed. When topology_selection=true
in OrchestrationConfig, DagScheduler uses the topology hint to override
max_parallel (capped at user config, never below 1). Also applied in
resume_from() to preserve classification across suspension.
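
The classification and the max_parallel override described above can be sketched roughly as follows. This is a minimal illustration, not the code in topology.rs: the `classify` function, its edge-list input, the exact decision rules, and the choice of hint per class are all assumptions. Only the four class names, the three metrics, and the "capped at user config, never below 1" rule come from this PR.

```rust
/// Coarse DAG shapes named in the PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Topology {
    AllParallel,
    LinearChain,
    FanOut,
    Mixed,
}

/// Hypothetical classifier over nodes `0..node_count` and `(from, to)` edges.
/// Assumes every edge satisfies `from < to < node_count`, so index order is
/// already a topological order.
fn classify(node_count: usize, edges: &[(usize, usize)]) -> Topology {
    if edges.is_empty() {
        // Covers empty graphs, single nodes, and fully independent tasks.
        return Topology::AllParallel;
    }

    let mut sorted = edges.to_vec();
    sorted.sort_unstable();

    // depth[n] = number of edges on the longest path ending at node n.
    let mut depth = vec![0usize; node_count];
    let mut out_degree = vec![0usize; node_count];
    for &(from, to) in &sorted {
        depth[to] = depth[to].max(depth[from] + 1);
        out_degree[from] += 1;
    }

    let longest_path = depth.iter().copied().max().unwrap_or(0);
    let max_out = out_degree.iter().copied().max().unwrap_or(0);

    if longest_path + 1 == node_count {
        // The longest path visits every node: a pure chain.
        Topology::LinearChain
    } else if longest_path == 1 && max_out == edges.len() {
        // Fan-out ratio of 1: every edge leaves the same node.
        Topology::FanOut
    } else {
        Topology::Mixed
    }
}

/// Turn the topology hint into an effective limit: capped at the user's
/// configured value and never below 1. The per-class hints are assumptions.
fn suggest_max_parallel(topology: Topology, user_max: usize) -> usize {
    let hint = match topology {
        Topology::LinearChain => 1,
        Topology::AllParallel | Topology::FanOut | Topology::Mixed => user_max,
    };
    hint.min(user_max).max(1)
}
```

Under these assumed rules a three-node chain is classified as LinearChain and gets a limit of 1, while a diamond-shaped graph falls through to Mixed and keeps the user's limit.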

ExecutionMode (Parallel/Sequential) annotations per task node. Planner prompt now
requests structured parallel/sequential markers; convert_response parses
typed ExecutionMode with Parallel fallback on invalid/missing values.
DagScheduler respects Sequential mode by serializing sequential tasks
globally (at most one runs at a time), leaving parallel tasks unaffected.
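
A rough sketch of the two behaviors described above: falling back to Parallel on invalid or missing markers, and letting at most one Sequential task run at a time without blocking Parallel ones. The function names `parse_mode` and `select_dispatch`, the string markers, and the idea that a sequential task occupies one ordinary slot are assumptions for illustration, not the actual planner or DagScheduler code.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum ExecutionMode {
    #[default]
    Parallel,
    Sequential,
}

/// Hypothetical parser for the planner's marker. Anything other than a
/// recognizable "sequential" (including a missing value) becomes Parallel.
fn parse_mode(raw: Option<&str>) -> ExecutionMode {
    match raw.map(|s| s.trim().to_ascii_lowercase()).as_deref() {
        Some("sequential") => ExecutionMode::Sequential,
        _ => ExecutionMode::Parallel,
    }
}

/// Choose which ready tasks to start in one scheduling pass.
/// `ready` lists the modes of ready tasks in priority order; the result
/// holds indices into `ready`. At most one Sequential task is in flight
/// globally; Parallel tasks are limited only by `free_slots`.
fn select_dispatch(
    ready: &[ExecutionMode],
    sequential_running: bool,
    free_slots: usize,
) -> Vec<usize> {
    let mut picked = Vec::new();
    let mut sequential_busy = sequential_running;
    for (idx, mode) in ready.iter().enumerate() {
        if picked.len() == free_slots {
            break;
        }
        match mode {
            ExecutionMode::Parallel => picked.push(idx),
            ExecutionMode::Sequential if !sequential_busy => {
                sequential_busy = true;
                picked.push(idx);
            }
            // Another sequential task is already running or was just picked:
            // hold this one back for a later pass.
            ExecutionMode::Sequential => {}
        }
    }
    picked
}
```

With ready tasks `[Sequential, Parallel, Sequential, Parallel]`, nothing sequential in flight, and four free slots, this sketch starts tasks 0, 1, and 3 and defers task 2.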

Both features are disabled by default (topology_selection = false).
+27 tests (6504 -> 6531), all passing.
@bug-ops bug-ops force-pushed the adaptorch-topology-selection branch from 129b262 to 8043d35 March 26, 2026 23:34
@bug-ops bug-ops merged commit 50f2460 into main Mar 26, 2026
25 checks passed
@bug-ops bug-ops deleted the adaptorch-topology-selection branch March 26, 2026 23:41
bug-ops added a commit that referenced this pull request Mar 27, 2026
#2172)

Follow-up to #2183 per multi-model design principle: LlmPlanner now
accepts a dedicated provider reference instead of always using the
primary provider.

Config field `planner_provider = "fast"` in `[orchestration]`
references a `[[llm.providers]]` entry by name; empty string falls
back to the primary provider. Field replaces the dead
`planner_model: Option<String>` (never wired, pre-v1.0.0 removal).

`build_planner_provider()` in bootstrap resolves the named provider
and passes it to `LlmPlanner::new()` at all three wiring sites
(daemon, runner, acp). `OrchestrationState.planner_provider` holds
the optional resolved provider, scoped to orchestration rather than
the global ProviderState.
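
The lookup rule for `planner_provider` can be sketched as below. This is an illustration only: the `Provider` struct, the `resolve_planner_provider` name and signature, and the decision to report an unknown name as an error are assumptions. The source states only that a name refers to a `[[llm.providers]]` entry and that an empty string falls back to the primary provider.

```rust
/// Stand-in for a configured `[[llm.providers]]` entry (hypothetical type).
#[derive(Debug, PartialEq)]
struct Provider {
    name: String,
}

/// Resolve the planner's provider by name.
/// - Empty name: `Ok(None)`, meaning "use the primary provider".
/// - Known name: `Ok(Some(provider))`.
/// - Unknown name: `Err(..)`. This error case is an assumption; the PR does
///   not say how an unmatched name is handled.
fn resolve_planner_provider<'a>(
    planner_provider: &str,
    providers: &'a [Provider],
) -> Result<Option<&'a Provider>, String> {
    if planner_provider.is_empty() {
        return Ok(None);
    }
    providers
        .iter()
        .find(|p| p.name == planner_provider)
        .map(Some)
        .ok_or_else(|| format!("unknown planner_provider: {planner_provider}"))
}
```

Returning an `Option` mirrors the description of `OrchestrationState.planner_provider` as holding an optional resolved provider; the caller would substitute the primary provider when it is `None`.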

`migrate_planner_model_to_provider()` auto-migration step comments
out any existing `planner_model` value with a MIGRATED marker so
users know to supply a provider name instead of a model name.
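
A minimal sketch of what such a migration step might do to the config text. The function name `comment_out_planner_model`, the line-based matching, and the exact wording of the marker comment are assumptions; the source says only that an existing `planner_model` value is commented out with a MIGRATED marker.

```rust
/// Comment out any `planner_model = ...` line and prepend a MIGRATED note.
/// Works on raw text line by line, so other content is left untouched.
fn comment_out_planner_model(config: &str) -> String {
    let mut out = String::with_capacity(config.len());
    for line in config.lines() {
        let trimmed = line.trim_start();
        // Match the exact key followed by `=`, not keys that merely share
        // the prefix, and not lines that are already commented out.
        let is_planner_model = trimmed
            .strip_prefix("planner_model")
            .map(|rest| rest.trim_start().starts_with('='))
            .unwrap_or(false);
        if is_planner_model {
            out.push_str("# MIGRATED: set planner_provider to a [[llm.providers]] name instead\n");
            out.push_str("# ");
            out.push_str(trimmed);
        } else {
            out.push_str(line);
        }
        out.push('\n');
    }
    out
}
```

Because already-commented lines no longer start with the key, running the sketch twice yields the same output as running it once.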

Also adds latency-threshold doc comment on
`TopologyClassifier::suggest_max_parallel` (parallel scheduling
overhead only pays off when individual tool calls take >= 500ms;
for LLM API calls this is always satisfied; for local tools such as
file reads, consider keeping topology_selection disabled).

+50 tests (6504 -> 6554).
bug-ops added a commit that referenced this pull request Mar 27, 2026
#2184)

* feat(orchestration): add planner_provider field to OrchestrationConfig (#2172)

* style: fix fmt in migrate.rs test assertion