
feat(orchestration): activate VMAO verify-completeness in PlanVerifier#2346

Merged
bug-ops merged 1 commit into main from vmao-adaptive-replanning
Mar 28, 2026

bug-ops commented Mar 28, 2026

Summary

Activates the existing PlanVerifier skeleton (introduced in PR #2235) with a full per-task and whole-plan verification pipeline based on the VMAO paper (arXiv:2603.11445). When verify_completeness = true, the agent now runs an LLM-judged completeness check after DAG execution and triggers a targeted gap-filling replan cycle for incomplete sub-tasks.
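
The verify-then-replan cycle described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `SchedulerConfig`, `Outputs`, `run_with_gap_fill`, and the two closures are hypothetical stand-ins, and the rule that gap-fill outputs overwrite original outputs on a key clash is an assumption. Only the `max_replans = 0` and `verify_completeness = false` settings for the nested scheduler come from the PR description.

```rust
use std::collections::BTreeMap;

/// Hypothetical slice of the scheduler configuration (names assumed).
#[derive(Debug, Clone, PartialEq)]
pub struct SchedulerConfig {
    pub max_replans: u32,
    pub verify_completeness: bool,
}

impl SchedulerConfig {
    /// Config for the partial replan DAG. Both replanning and verification
    /// are switched off so the nested run cannot trigger another cycle.
    pub fn for_partial_replan(&self) -> Self {
        Self { max_replans: 0, verify_completeness: false }
    }
}

/// Task id -> task output (hypothetical representation).
pub type Outputs = BTreeMap<String, String>;

/// Single-cycle gap filling: verify once, run at most one partial DAG,
/// then merge its outputs with the original ones before aggregation.
pub fn run_with_gap_fill(
    cfg: &SchedulerConfig,
    original: Outputs,
    find_gaps: impl Fn(&Outputs) -> Vec<String>,
    run_partial: impl Fn(&SchedulerConfig, &[String]) -> Outputs,
) -> Outputs {
    // Disabled by default: pass the original outputs through untouched.
    if !cfg.verify_completeness {
        return original;
    }
    let gaps = find_gaps(&original);
    if gaps.is_empty() {
        return original;
    }
    let nested = cfg.for_partial_replan();
    let partial = run_partial(&nested, &gaps);
    // Assumption: on a key clash the gap-fill output wins.
    let mut merged = original;
    merged.extend(partial);
    merged
}
```

Passing the verifier and the partial scheduler in as closures keeps the sketch free of any LLM or executor dependency, which also makes the no-recursion property easy to assert in a unit test.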

  • Add completeness_threshold: f32 (default 0.7) to OrchestrationConfig with sanitizer clamping and startup validation
  • Add verify_plan() for whole-plan verification after DagScheduler completes
  • Add replan_from_plan() generating root TaskNodes for plan-level gaps
  • Wire SchedulerAction::Verify handler in agent loop (was a no-op since #2235)
  • Insert whole-plan verify step between Done{Completed} and aggregation
  • Partial replan DAG runs in a separate DagScheduler with max_replans=0 and verify_completeness=false (INV-2: no recursive loops)
  • Partial DAG outputs merged with original task outputs before Aggregator
  • Output truncated to verify_max_tokens * 4 chars before verify_plan() call
  • GapSeverity implements Display returning lowercase names consistent with serde snake_case and LLM system prompt expectations
  • All LLM error paths are fail-open (verify → complete=true, replan → empty Vec, whole-plan → None)
  • 25 new unit tests across verifier.rs and experiment.rs
  • verify_completeness = false by default — no behavior change when disabled
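
Two of the bullets above, the output truncation and the fail-open error handling, lend themselves to a short sketch. The code below is illustrative only: `Verdict`, `TaskNode`, and every function name are hypothetical, and counting "chars" as Unicode scalar values rather than bytes is an assumption. The `verify_max_tokens * 4` bound and the three fallback values (complete=true, empty Vec, None) are taken from the PR description.

```rust
/// Hypothetical verdict type; the real struct likely carries more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct Verdict {
    pub complete: bool,
}

/// Hypothetical stand-in for a replanned root task.
#[derive(Debug, Clone, PartialEq)]
pub struct TaskNode {
    pub description: String,
}

/// Keep at most `verify_max_tokens * 4` characters of `output`.
/// Slicing happens on a char boundary, so multi-byte text cannot panic.
pub fn truncate_for_verify(output: &str, verify_max_tokens: usize) -> &str {
    let max_chars = verify_max_tokens.saturating_mul(4);
    match output.char_indices().nth(max_chars) {
        // Byte offset of the first character past the limit.
        Some((byte_idx, _)) => &output[..byte_idx],
        // Fewer than `max_chars + 1` characters: nothing to cut.
        None => output,
    }
}

/// Per-task verify: an LLM error is treated as "complete".
pub fn verify_or_complete<E>(result: Result<Verdict, E>) -> Verdict {
    result.unwrap_or(Verdict { complete: true })
}

/// Replan: an LLM error yields no new tasks.
pub fn replan_or_empty<E>(result: Result<Vec<TaskNode>, E>) -> Vec<TaskNode> {
    result.unwrap_or_default()
}

/// Whole-plan verify: an LLM error yields no verdict at all.
pub fn verify_plan_or_none<E>(result: Result<Verdict, E>) -> Option<Verdict> {
    result.ok()
}
```

In each case a verifier failure can only skip a replan, never block or fail a plan that otherwise ran to completion.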

Closes #2252

Research evaluations (in architect handoff .local/handoff/2026-03-28T10-59-52-architect.yaml)

Test plan

  • cargo +nightly fmt --check — passed
  • cargo clippy --workspace --features full -- -D warnings — 0 warnings
  • cargo nextest run --workspace --features experiments --lib --bins — 6581/6581 passed
  • Verify verify_completeness = false (default) produces no behavior change
  • Verify completeness_threshold outside [0.0, 1.0] rejected at startup
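
The last two checks could be exercised against something like the sketch below. It is a simplified stand-in rather than the real `OrchestrationConfig`: only the two fields named in this PR are modeled, the method names `validate` and `sanitized` are hypothetical, and falling back to the default for NaN is an assumption.

```rust
/// Hypothetical, reduced version of `OrchestrationConfig`.
#[derive(Debug, Clone, PartialEq)]
pub struct OrchestrationConfig {
    pub verify_completeness: bool,
    pub completeness_threshold: f32,
}

impl Default for OrchestrationConfig {
    fn default() -> Self {
        // Defaults stated in the PR: verification off, threshold 0.7.
        Self { verify_completeness: false, completeness_threshold: 0.7 }
    }
}

impl OrchestrationConfig {
    /// Startup validation: reject anything outside [0.0, 1.0], NaN included.
    pub fn validate(&self) -> Result<(), String> {
        let t = self.completeness_threshold;
        if (0.0..=1.0).contains(&t) {
            Ok(())
        } else {
            Err(format!("completeness_threshold must be in [0.0, 1.0], got {t}"))
        }
    }

    /// Sanitizer: clamp into [0.0, 1.0].
    /// Assumption: NaN falls back to the default of 0.7.
    pub fn sanitized(mut self) -> Self {
        self.completeness_threshold = if self.completeness_threshold.is_nan() {
            0.7
        } else {
            self.completeness_threshold.clamp(0.0, 1.0)
        };
        self
    }
}
```

Keeping validation and sanitizing separate lets startup fail loudly on a bad config file while runtime overrides are silently clamped.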

#2252)

Activate the existing PlanVerifier skeleton with per-task and whole-plan
verification gates, a completeness_threshold config field, and a single-cycle
gap-filling replan loop based on the VMAO paper (arXiv:2603.11445).

- Add completeness_threshold: f32 (default 0.7) to OrchestrationConfig with
  sanitizer clamping to [0.0, 1.0] and startup validation
- Add verify_plan() for whole-plan verification after DagScheduler completion
- Add replan_from_plan() generating root TaskNodes for plan-level gaps
- Wire SchedulerAction::Verify handler in agent loop (was a no-op since #2235)
- Add whole-plan verification step between Done{Completed} and aggregation
- Partial replan DAG runs in a separate DagScheduler with max_replans=0 and
  verify_completeness=false to prevent recursive loops (INV-2)
- Partial DAG outputs merged with original task outputs before aggregation
- Output truncated to verify_max_tokens*4 chars before verify_plan() call
- GapSeverity implements Display returning lowercase names consistent with
  serde snake_case serialization and LLM system prompt expectations
- All LLM error paths are fail-open: verify returns complete=true, replan
  returns empty Vec, whole-plan verify returns None
- 25 new unit tests across verifier.rs and experiment.rs
- verify_completeness = false by default; no behavior change when disabled
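
As an illustration of the GapSeverity item, a lowercase `Display` implementation might look like the sketch below. The variant names are hypothetical, since this PR text does not list them, and the serde attribute is mentioned only in a comment so the block stays free of external crates.

```rust
use std::fmt;

/// Hypothetical variants; the real enum may differ.
/// In the real crate this would presumably also carry
/// `#[serde(rename_all = "snake_case")]`, which produces the same strings.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GapSeverity {
    Critical,
    Important,
    Minor,
}

impl fmt::Display for GapSeverity {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Lowercase names, so text interpolated into a prompt matches
        // what the serialized form looks like.
        let name = match self {
            GapSeverity::Critical => "critical",
            GapSeverity::Important => "important",
            GapSeverity::Minor => "minor",
        };
        f.write_str(name)
    }
}
```

With this in place, `format!("severity: {}", GapSeverity::Critical)` yields `severity: critical`, whereas the derived `Debug` output would be `Critical`.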

Closes #2252
bug-ops force-pushed the vmao-adaptive-replanning branch from e4085e0 to 3cffd78 on March 28, 2026 11:16
bug-ops merged commit ff649f1 into main on Mar 28, 2026
25 checks passed
bug-ops deleted the vmao-adaptive-replanning branch on March 28, 2026 11:26

Labels

  • core: zeph-core crate
  • documentation: Improvements or additions to documentation
  • enhancement: New feature or request
  • rust: Rust code changes
  • size/XL: Extra large PR (500+ lines)

Development

Successfully merging this pull request may close these issues.

research(orchestration): VMAO adaptive replanning — LLM-judged completeness triggers gap-filling DAG replay (arXiv:2603.11445)
