Robust Asynchronous Planning via Auto-Formalization
We study LLM-based asynchronous planning across three settings of increasing difficulty: action-constrained (durations and precedence), state-constrained (grounded states, resources, and agents), and online (execution-time replanning). We compare two planning interfaces, Planner and Formalizer (the latter with PDDL2.1 and CP-SAT variants), across four LLMs and four benchmarks, and show that the choice of formal representation, not model capacity, primarily determines whether planning scales.
Methods

| Method | Description |
| --- | --- |
| Planner | LLM directly outputs a schedule or makespan from natural language |
| PDDL2.1 Formalizer | LLM translates the problem to PDDL2.1; solved with OPTIC |
| CP-SAT Formalizer | LLM outputs a structured scheduling spec (actions + dependencies + resources); solved with CP-SAT |

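
To make the idea of a structured scheduling spec concrete, here is a minimal sketch for the action-constrained case (durations and precedence only). The `Action` schema, the `schedule` helper, and the example task are hypothetical and are not taken from this repository. The solver here is a plain critical-path computation from the Python standard library rather than CP-SAT, so it assumes unlimited parallelism and ignores resources and agents.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Action:
    """One step of a plan (hypothetical schema, not the repository's format)."""
    name: str
    duration: int                                    # e.g. minutes
    after: list[str] = field(default_factory=list)   # names of prerequisite actions


def schedule(actions: list[Action]) -> tuple[dict[str, int], int]:
    """Return earliest start times and the makespan, assuming unlimited parallelism."""
    by_name = {a.name: a for a in actions}
    # TopologicalSorter takes {node: predecessors} and yields predecessors first.
    order = TopologicalSorter({a.name: a.after for a in actions}).static_order()
    start: dict[str, int] = {}
    for name in order:
        act = by_name[name]
        # An action starts as soon as its slowest prerequisite has finished.
        start[name] = max((start[p] + by_name[p].duration for p in act.after), default=0)
    makespan = max((start[n] + by_name[n].duration for n in start), default=0)
    return start, makespan


# Illustrative spec: two independent chains that join at the final step.
spec = [
    Action("boil_water", 10),
    Action("chop_vegetables", 5),
    Action("cook_pasta", 8, after=["boil_water"]),
    Action("make_sauce", 12, after=["chop_vegetables"]),
    Action("serve", 2, after=["cook_pasta", "make_sauce"]),
]
starts, makespan = schedule(spec)
print(starts)
print(makespan)  # 20: the critical path is boil_water -> cook_pasta -> serve
```

Running the five actions one after another would take 37 time units, so the asynchronous schedule saves 17. Once resources or agents limit how many actions can overlap, the longest path is only a lower bound on the makespan, and a constraint solver such as CP-SAT is needed to find the optimum.
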
Benchmarks

| Dataset | Setting | Size | Axis |
| --- | --- | --- | --- |
| AsyncHow | Action-constrained | 320 | Duration and dependency extraction |
| AsyncPlan-XXL | Action-constrained | 600 (50 × 12 sizes, 5–100 nodes) | Critical-path scaling |
| Robo Challenge | State-constrained | 140 (20 × 7 splits) | Grounded robotic constraints |
| Online Robo Challenge | Online | 140 (20 × 7 splits) | Dynamic replanning |

Key Results
AsyncPlan-XXL: plan accuracy by graph size
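
| Method | S5 | S20 | S50 | S80 | S100 | Avg |
| --- | --- | --- | --- | --- | --- | --- |
| CP-SAT Formalizer | 93% | 99% | 97% | 93% | 83% | 94% |
| PDDL2.1 Formalizer | 13% | 8% | 6% | 0% | 0% | 5% |
| Planner | 96% | 70% | 19% | 10% | 5% | 40% |
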
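CP-SAT Formalizer averages ~94% accuracy across graph sizes from 5 to 100 nodes, dropping to 83% at 100 nodes. Planner falls from 96% at 5 nodes to 5% at 100 nodes. PDDL2.1 Formalizer starts at 13% at 5 nodes and reaches 0% by 80 nodes.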
Robo Challenge: plan accuracy by split (averaged across 4 LLMs)

| Split | Planner | PDDL2.1 | CP-SAT |
| --- | --- | --- | --- |
| Easy | 97.5% | 98.8% | 100% |
| Medium | 71.2% | 97.5% | 100% |
| Hard Station | 28.8% | 68.8% | 97.5% |
| Hard Temporal | 53.8% | 48.8% | 100% |
| Hard Multi-Agent | 17.5% | 6.2% | 97.5% |
| Hard Optimization | 3.8% | 0.0% | 100% |
| Hard High-Speedup | 6.2% | 37.5% | 96.2% |
| Average | 39.8% | 51.1% | 98.8% |

Online Robo Challenge: one-shot vs. state-aware repair