research(performance): PASTE application-level speculative tool execution — 48.5% latency reduction, 1.8x throughput (arXiv:2603.18897) #2409

@bug-ops

Description

Source

arXiv:2603.18897 — "Act While Thinking: Accelerating LLM Agents via Pattern-Aware Speculative Tool Execution" (PASTE, submitted March 19, 2026)

Key Contribution

PASTE observes that agent tool-call sequences follow stable application-level control flows. It learns these call patterns and speculatively pre-executes the predicted tool sequence before the LLM finishes reasoning, which hides tool execution latency. The paper reports a 48.5% reduction in task completion time and a 1.8x improvement in tool throughput.

Distinction from #2290

Existing issue #2290 (speculative tool calls) covers pre-fetching driven by LLM draft tokens at the decoding level. PASTE takes a different approach: it learns recurring application-level call sequences (e.g., "search always precedes summarize in research tasks") as patterns. The mechanism and the implementation point both differ, and the concrete throughput numbers are worth tracking separately.

Relevance to Zeph

zeph-tools / zeph-core agent loop: ToolOrchestrator could learn per-skill tool sequence patterns (e.g., web_search → read → summarize for research skills) and pre-execute the first N predicted tools while the LLM is still generating the response that selects them.
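
To make the pre-execution idea concrete, here is a minimal sketch of one speculative step. It is written in Python with asyncio purely for brevity and is not Zeph code: `run_tool`, `llm_choose_tool`, and `speculative_step` are hypothetical stand-ins, and the sleeps simulate tool and LLM latency. It also assumes the speculated tools are read-only, since a cancelled call with side effects could not be undone.

```python
import asyncio


async def run_tool(name: str, log: list) -> str:
    """Hypothetical tool call; the sleep stands in for real I/O latency."""
    await asyncio.sleep(0.2)
    log.append(name)  # only reached if the call runs to completion
    return f"{name}:result"


async def llm_choose_tool(answer: str) -> str:
    """Hypothetical LLM turn that eventually names the tool to call."""
    await asyncio.sleep(0.05)
    return answer


async def speculative_step(predicted: list, llm_answer: str, log: list):
    # Start the predicted tools before the LLM has finished reasoning.
    speculative = {
        name: asyncio.create_task(run_tool(name, log)) for name in predicted
    }
    chosen = await llm_choose_tool(llm_answer)

    # Cancel every speculative call the LLM did not select.
    losers = [task for name, task in speculative.items() if name != chosen]
    for task in losers:
        task.cancel()
    await asyncio.gather(*losers, return_exceptions=True)

    hit = chosen in speculative
    if hit:
        # Commit: reuse the call that is already in flight.
        result = await speculative[chosen]
    else:
        # Misprediction: fall back to a normal, non-speculative call.
        result = await run_tool(chosen, log)
    return result, hit
```

With `predicted=["web_search", "read"]` and the LLM selecting `web_search`, the `read` call is cancelled before it completes and the `web_search` result is reused, so only the part of the tool latency that outlasts the LLM turn remains visible.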

Implementation Sketch

  • Track per-skill tool invocation sequences in SQLite (extend scheduler/memory tables)
  • On skill activation, predict top-K likely tool calls; pre-warm them speculatively
  • Cancel if LLM selects a different tool path; commit if prediction matches
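
The tracking and prediction steps above could look roughly like the following sketch. It uses Python's built-in `sqlite3` module for illustration only. The `tool_transitions` table, its columns, and both helper functions are assumptions and not part of Zeph's existing scheduler/memory schema. The predictor is a simple first-order transition count, which is a simplification and not necessarily the pattern learner described in the PASTE paper.

```python
import sqlite3

# Hypothetical table: one row per (skill, previous tool, next tool) transition.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tool_transitions (
    skill     TEXT NOT NULL,
    prev_tool TEXT NOT NULL,   -- '' marks the start of a skill run
    next_tool TEXT NOT NULL,
    hits      INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (skill, prev_tool, next_tool)
);
"""


def record_sequence(db: sqlite3.Connection, skill: str, tools: list) -> None:
    """Count each observed prev -> next transition for one skill run."""
    prev = ""
    for tool in tools:
        key = (skill, prev, tool)
        # Create the row on first sight, then bump its counter.
        db.execute(
            "INSERT OR IGNORE INTO tool_transitions "
            "(skill, prev_tool, next_tool, hits) VALUES (?, ?, ?, 0)",
            key,
        )
        db.execute(
            "UPDATE tool_transitions SET hits = hits + 1 "
            "WHERE skill = ? AND prev_tool = ? AND next_tool = ?",
            key,
        )
        prev = tool
    db.commit()


def predict_top_k(db: sqlite3.Connection, skill: str,
                  prev_tool: str = "", k: int = 2) -> list:
    """Return up to k most frequent next tools; ties break alphabetically."""
    rows = db.execute(
        "SELECT next_tool FROM tool_transitions "
        "WHERE skill = ? AND prev_tool = ? "
        "ORDER BY hits DESC, next_tool ASC LIMIT ?",
        (skill, prev_tool, k),
    )
    return [row[0] for row in rows]
```

After recording `web_search, read, summarize` twice and `web_search, summarize` once for a `research` skill, `predict_top_k(db, "research")` returns `["web_search"]` and `predict_top_k(db, "research", "web_search")` returns `["read", "summarize"]`. A skill with no recorded history returns an empty list, in which case nothing would be pre-warmed.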

Priority Assessment

P3 (research): high impact if tool-call latency becomes a user-visible bottleneck. Distinct from #2290; implement independently.

Metadata

Labels

P3: Research, medium-high complexity
research: Research-driven improvement