research(orchestration): agentic plan caching for LLM planner cost reduction (APC) #1856

@bug-ops

Source

Agentic Plan Caching: Test-Time Memory for Fast and Cost-Efficient LLM Agents — arXiv 2506.14852, 2025

Core Idea

APC extracts structured plan templates from completed agent executions and stores them indexed by goal embedding. When a new request is semantically similar to a cached goal, a lightweight model adapts the cached template instead of replanning from scratch. The paper reports an average reduction of 50% in planning cost and 27% in latency across multiple agent benchmarks.
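To make the lookup concrete, here is a minimal in-memory sketch of "find the most similar cached goal above a threshold". It is not the paper's code or Zeph's: `CachedPlan`, `cosine_similarity`, and `best_match` are hypothetical names, and a linear scan stands in for the vector store.

```rust
/// A cached plan template keyed by the embedding of the goal that produced it.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct CachedPlan {
    pub goal: String,
    pub embedding: Vec<f32>,
    pub template: String,
}

/// Cosine similarity of two vectors.
/// Returns 0.0 for empty, mismatched-length, or zero-norm input.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() || a.is_empty() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Returns the most similar cached plan and its score, but only if the
/// score is at least `threshold`; otherwise the caller should replan.
pub fn best_match<'a>(
    cache: &'a [CachedPlan],
    query: &[f32],
    threshold: f32,
) -> Option<(&'a CachedPlan, f32)> {
    cache
        .iter()
        .map(|plan| (plan, cosine_similarity(&plan.embedding, query)))
        .filter(|(_, score)| *score >= threshold)
        .max_by(|a, b| a.1.total_cmp(&b.1))
}
```

A `None` result means no cached goal is close enough, so the planner falls back to a full decomposition.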

Applicability to Zeph: MEDIUM

Zeph's LlmPlanner generates a fresh task DAG on every goal decomposition call. For recurring or similar goals (common in scheduled tasks and long projects), plan templates from previous successful executions could be reused with minor adaptation. The existing Qdrant + SQLite infrastructure already supports the embedding index needed for similarity lookup.

This complements the tool-result cache (#1822): that one caches data, while this one caches plans.

Implementation Sketch

  1. On successful task DAG completion (all nodes Succeeded), serialize a plan template (the goal plus the task nodes with their types and dependencies, with specific file paths and values stripped) into a SQLite plan_cache table, and store the goal embedding in Qdrant.
  2. In LlmPlanner::decompose(), before calling the LLM, look for a cached plan template: embed the goal, query Qdrant (similarity threshold ~0.90), and retrieve the best match.
  3. If a match is found, pass the template to the LLM with an "adapt this plan for the current goal" prompt (much cheaper than full decomposition). If no match clears the threshold, fall back to full decomposition.
  4. Gate the feature behind [memory] plan_cache_enabled = true and plan_cache_similarity_threshold = 0.90.
  5. Emit "planner: cache hit similarity=X.XX adapted_in_Nms" at INFO level.
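The control flow of steps 2 to 5 could look roughly like the sketch below. Everything here is an assumption for illustration: `PlanCacheConfig`, `PlanSource`, `decompose_with_cache`, and `cache_hit_log` are hypothetical names, and the three closures stand in for the Qdrant lookup, the adaptation call, and the full decomposition call. It does not reflect the actual signature of LlmPlanner::decompose().

```rust
/// Hypothetical settings mirroring the two proposed `[memory]` keys.
#[derive(Debug, Clone, Copy)]
pub struct PlanCacheConfig {
    pub plan_cache_enabled: bool,
    pub plan_cache_similarity_threshold: f32,
}

/// Which planning path was taken, so callers and tests can observe it.
#[derive(Debug, PartialEq)]
pub enum PlanSource {
    Adapted { similarity: f32 },
    Fresh,
}

/// Decide between adapting a cached template and planning from scratch.
///
/// `lookup` stands in for "embed goal + query the vector store" and returns
/// the best template with its similarity. `adapt` stands in for the cheap
/// adaptation call, `plan_fresh` for the full decomposition call.
pub fn decompose_with_cache(
    goal: &str,
    config: PlanCacheConfig,
    lookup: impl Fn(&str) -> Option<(String, f32)>,
    adapt: impl Fn(&str, &str) -> String,
    plan_fresh: impl Fn(&str) -> String,
) -> (String, PlanSource) {
    if config.plan_cache_enabled {
        if let Some((template, similarity)) = lookup(goal) {
            if similarity >= config.plan_cache_similarity_threshold {
                let plan = adapt(goal, &template);
                return (plan, PlanSource::Adapted { similarity });
            }
        }
    }
    // Cache disabled, no match, or match below threshold: plan from scratch.
    (plan_fresh(goal), PlanSource::Fresh)
}

/// Format the proposed INFO log line for a cache hit.
pub fn cache_hit_log(similarity: f32, adapted_in_ms: u128) -> String {
    format!(
        "planner: cache hit similarity={:.2} adapted_in_{}ms",
        similarity, adapted_in_ms
    )
}
```

Returning `PlanSource` alongside the plan keeps the hit/miss decision testable without a real LLM or vector store. In this sketch, disabling the flag skips the lookup entirely, so the feature costs nothing when it is off.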

Implementation Complexity

MEDIUM: requires a plan serialization format, a Qdrant collection for plan embeddings, and an adaptation LLM call. It can reuse the existing LlmPlanner and SemanticMemory infrastructure.
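As a starting point for the serialization format, the template could keep only the DAG structure and argument names, dropping concrete values. The types below (`ExecutedNode`, `TemplateNode`, `PlanTemplate`) and `to_template` are hypothetical and only illustrate the shape; Zeph's real task DAG types may differ, and the on-disk encoding is left open.

```rust
use std::collections::BTreeMap;

/// A task node as it ran (hypothetical shape for illustration).
#[derive(Debug, Clone)]
pub struct ExecutedNode {
    pub id: usize,
    pub kind: String,
    pub depends_on: Vec<usize>,
    /// Concrete arguments such as file paths; these must not reach the cache.
    pub args: BTreeMap<String, String>,
}

/// Cached form of a node: structure only, argument names kept as empty slots.
#[derive(Debug, Clone, PartialEq)]
pub struct TemplateNode {
    pub id: usize,
    pub kind: String,
    pub depends_on: Vec<usize>,
    pub arg_slots: Vec<String>,
}

/// A reusable plan: the original goal text plus value-free nodes.
#[derive(Debug, Clone, PartialEq)]
pub struct PlanTemplate {
    pub goal: String,
    pub nodes: Vec<TemplateNode>,
}

/// Build a template from a completed run, dropping every concrete
/// argument value while preserving node types and dependencies.
pub fn to_template(goal: &str, nodes: &[ExecutedNode]) -> PlanTemplate {
    PlanTemplate {
        goal: goal.to_string(),
        nodes: nodes
            .iter()
            .map(|n| TemplateNode {
                id: n.id,
                kind: n.kind.clone(),
                depends_on: n.depends_on.clone(),
                arg_slots: n.args.keys().cloned().collect(),
            })
            .collect(),
    }
}
```

Keeping argument names as slots gives the adaptation call something to fill in for the new goal, while concrete values from the earlier run never enter the cache.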

Labels: P2 (high value, medium complexity), llm (zeph-llm crate: Ollama, Claude), research (research-driven improvement)