research(orchestration): agentic plan caching for LLM planner cost reduction (APC) #1856
Description
Source
Agentic Plan Caching: Test-Time Memory for Fast and Cost-Efficient LLM Agents — arXiv 2506.14852, 2025
Core Idea
Extracts structured plan templates from completed agent executions and stores them indexed by goal embedding. On new semantically similar requests, a lightweight model adapts the cached template rather than replanning from scratch. Reduces planning cost by 50% and latency by 27% on average across multiple agent benchmarks.
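The lookup half of this idea can be sketched in a few lines of Rust. This is only an illustration under stated assumptions: `CachedPlan`, `PlanCache`, and `cosine_similarity` are hypothetical names, and the in-memory `Vec` stands in for a real vector index. Nothing here is taken from the paper's code.

```rust
// Hypothetical types for illustration only; not the paper's or Zeph's API.

/// A cached plan: the original goal text plus abstract step descriptions.
#[derive(Debug, Clone, PartialEq)]
pub struct CachedPlan {
    pub goal: String,
    pub steps: Vec<String>,
}

/// In-memory stand-in for a vector index: (goal embedding, plan) pairs.
#[derive(Default)]
pub struct PlanCache {
    entries: Vec<(Vec<f32>, CachedPlan)>,
}

/// Cosine similarity; returns 0.0 for mismatched lengths or zero-norm input.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

impl PlanCache {
    /// Stores a plan under the embedding of its goal.
    pub fn insert(&mut self, embedding: Vec<f32>, plan: CachedPlan) {
        self.entries.push((embedding, plan));
    }

    /// Returns the best-scoring plan whose similarity meets `threshold`, if any.
    pub fn lookup(&self, query: &[f32], threshold: f32) -> Option<(&CachedPlan, f32)> {
        self.entries
            .iter()
            .map(|(emb, plan)| (plan, cosine_similarity(query, emb)))
            .filter(|(_, score)| *score >= threshold)
            .max_by(|a, b| a.1.total_cmp(&b.1))
    }
}
```

A miss (`None`) would fall through to full planning; a hit would hand the cached plan to the lightweight adapter model. The linear scan is only for readability and would not scale to a large cache.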
Applicability to Zeph: MEDIUM
Zeph's LlmPlanner generates a fresh task DAG on every goal decomposition call. For recurring or similar goals (common in scheduled tasks and long projects), plan templates from previous successful executions could be reused with minor adaptation. The existing Qdrant + SQLite infrastructure already supports the embedding index needed for similarity lookup.
This is complementary to the tool-result cache (#1822) — that caches data; this caches plans.
Implementation Sketch
- On successful task DAG completion (all nodes `Succeeded`), serialize the plan template (goal, task nodes with types/dependencies, without specific file paths/values) and store it in the SQLite `plan_cache` table, with the goal embedding in Qdrant.
- In `LlmPlanner::decompose()`, before calling the LLM, check for a cached plan template: embed the goal, query Qdrant (threshold ~0.90), and retrieve the best match.
- If a match is found, pass the template to the LLM with an "adapt this plan for the current goal" prompt (much cheaper than full decomposition).
- Gate behind `[memory] plan_cache_enabled = true` and `plan_cache_similarity_threshold = 0.90`.
- Emit `planner: cache hit similarity=X.XX adapted_in_Nms` at INFO.
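The first and last steps above could look roughly like the following. This is a minimal sketch, not Zeph's actual code: `TaskNode`, `NodeStatus`, `TemplateNode`, `PlanTemplate`, `extract_template`, and `cache_hit_log` are all hypothetical names, and the real `TaskGraph` types will differ. Persistence to SQLite and Qdrant is left out.

```rust
// Hypothetical types for illustration; Zeph's real TaskGraph will differ.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NodeStatus {
    Succeeded,
    Failed,
    Pending,
}

/// A node of a finished task DAG, including run-specific values.
#[derive(Debug, Clone)]
pub struct TaskNode {
    pub id: usize,
    pub kind: String,           // task type, e.g. "read_file"
    pub depends_on: Vec<usize>, // ids of prerequisite nodes
    pub args: Vec<String>,      // concrete values such as file paths
    pub status: NodeStatus,
}

/// A template node keeps type and dependencies but drops concrete values.
#[derive(Debug, Clone, PartialEq)]
pub struct TemplateNode {
    pub id: usize,
    pub kind: String,
    pub depends_on: Vec<usize>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct PlanTemplate {
    pub goal: String,
    pub nodes: Vec<TemplateNode>,
}

/// Builds a template only when the graph is non-empty and every node succeeded.
pub fn extract_template(goal: &str, nodes: &[TaskNode]) -> Option<PlanTemplate> {
    if nodes.is_empty() || nodes.iter().any(|n| n.status != NodeStatus::Succeeded) {
        return None;
    }
    let nodes = nodes
        .iter()
        .map(|n| TemplateNode {
            id: n.id,
            kind: n.kind.clone(),
            depends_on: n.depends_on.clone(),
        })
        .collect();
    Some(PlanTemplate { goal: goal.to_string(), nodes })
}

/// Formats the proposed INFO line for a cache hit.
pub fn cache_hit_log(similarity: f32, adapted_ms: u128) -> String {
    format!("planner: cache hit similarity={:.2} adapted_in_{}ms", similarity, adapted_ms)
}
```

Dropping `args` at extraction time keeps run-specific paths and values out of the cache, so the adaptation prompt has to supply them for the new goal.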
Implementation Complexity
MEDIUM — requires plan serialization format, Qdrant collection for plan embeddings, and adaptation LLM call. Can reuse the existing LlmPlanner and SemanticMemory infrastructure.
See Also
- `LlmPlanner`, `DagScheduler`, `TaskGraph` in `zeph-scheduler`/`zeph-core`
- research(tools): tool result cache - avoid redundant executions within a session #1822 (tool result cache)
- research(routing): AdaptOrch task-adaptive orchestration topology selection #1840 (AdaptOrch topology selection)