research(performance): PASTE application-level speculative tool execution — 48.5% latency reduction, 1.8x throughput (arXiv:2603.18897) #2409
Description
Source
arXiv:2603.18897 — "Act While Thinking: Accelerating LLM Agents via Pattern-Aware Speculative Tool Execution" (PASTE, submitted March 19, 2026)
Key Contribution
PASTE observes that agent tool-call sequences follow stable application-level control flows. It speculatively pre-executes predicted tool sequences, based on learned call patterns, before the LLM finishes reasoning, hiding tool-execution latency behind decoding. The paper reports a 48.5% reduction in task completion time and a 1.8x improvement in tool throughput.
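The pattern-learning idea can be illustrated with a minimal sketch. This is a hypothetical bigram-frequency model, not the paper's actual representation (which the issue only describes as "learned call sequences"); `ToolPatternModel`, `observe`, and `predict` are illustrative names.

```python
from collections import Counter, defaultdict

class ToolPatternModel:
    """Learns application-level tool-call patterns as bigram frequencies.

    Hypothetical sketch: counts which tool tends to follow which,
    then predicts the most frequent successors.
    """

    def __init__(self):
        # prev tool -> Counter of observed next tools
        self._next = defaultdict(Counter)

    def observe(self, sequence):
        """Record one completed tool-call sequence."""
        for prev, nxt in zip(sequence, sequence[1:]):
            self._next[prev][nxt] += 1

    def predict(self, last_tool, k=1):
        """Top-k most likely next tools after `last_tool`."""
        return [t for t, _ in self._next[last_tool].most_common(k)]

model = ToolPatternModel()
model.observe(["web_search", "read", "summarize"])
model.observe(["web_search", "read", "summarize"])
model.observe(["web_search", "summarize"])
print(model.predict("web_search", k=2))  # ['read', 'summarize']
```

A real implementation would likely condition on more context (skill, task type, longer history), but even bigrams capture patterns like "search precedes summarize."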
Distinction from #2290
Existing issue #2290 (speculative tool calls) covers pre-fetching driven by LLM draft tokens at the decoding level. PASTE is a distinct approach: it learns recurring application-level call sequences (e.g., "search always precedes summarize in research tasks") as patterns. Different mechanism, different implementation point, and concrete throughput numbers make it worth tracking separately.
Relevance to Zeph
zeph-tools / zeph-core agent loop — ToolOrchestrator could learn per-skill tool-sequence patterns (e.g., web_search → read → summarize for research skills) and pre-execute the first N predicted tools while the LLM is still generating the response that selects them.
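The overlap of speculative execution with LLM decoding can be sketched with asyncio. Everything here is hypothetical: `run_tool` stands in for zeph's actual tool runner, and `decide_actual` stands in for decoding the LLM response that names the chosen tool.

```python
import asyncio

async def run_tool(name):
    # Hypothetical tool runner; sleep simulates I/O-bound tool latency.
    await asyncio.sleep(0.05)
    return f"{name}:result"

async def speculative_step(predicted, decide_actual):
    """Start the predicted tool before the LLM's choice is known.

    `decide_actual` is an awaitable resolving to the tool the LLM
    actually selected. On a hit the tool's latency is hidden behind
    decoding; on a miss the speculative task is cancelled.
    """
    spec = asyncio.create_task(run_tool(predicted))
    actual = await decide_actual
    if actual == predicted:
        return await spec          # commit: result is already in flight
    spec.cancel()                  # mispredict: discard speculative work
    return await run_tool(actual)  # fall back to normal execution

async def main():
    async def llm_choice():
        await asyncio.sleep(0.01)  # stand-in for response generation
        return "read"
    return await speculative_step("read", llm_choice())

print(asyncio.run(main()))  # read:result
```

On a correct prediction the tool finishes (or is well underway) by the time decoding completes, which is the latency-hiding effect PASTE reports.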
Implementation Sketch
- Track per-skill tool invocation sequences in SQLite (extend scheduler/memory tables)
- On skill activation, predict top-K likely tool calls; pre-warm them speculatively
- Cancel if LLM selects a different tool path; commit if prediction matches
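The three steps above can be sketched against SQLite. The schema and function names (`tool_transitions`, `record`, `predict_top_k`) are hypothetical; zeph's actual scheduler/memory tables are assumed, not shown in this issue.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical extension table: per-skill tool-transition counts.
conn.execute("""
    CREATE TABLE tool_transitions (
        skill     TEXT,
        prev_tool TEXT,
        next_tool TEXT,
        count     INTEGER,
        PRIMARY KEY (skill, prev_tool, next_tool)
    )
""")

def record(skill, sequence):
    """Persist observed per-skill tool transitions (step 1)."""
    for prev, nxt in zip(sequence, sequence[1:]):
        conn.execute(
            """INSERT INTO tool_transitions VALUES (?, ?, ?, 1)
               ON CONFLICT(skill, prev_tool, next_tool)
               DO UPDATE SET count = count + 1""",
            (skill, prev, nxt),
        )

def predict_top_k(skill, prev_tool, k):
    """Top-K likely next tools for a skill, to pre-warm (step 2)."""
    rows = conn.execute(
        """SELECT next_tool FROM tool_transitions
           WHERE skill = ? AND prev_tool = ?
           ORDER BY count DESC LIMIT ?""",
        (skill, prev_tool, k),
    )
    return [r[0] for r in rows]

record("research", ["web_search", "read", "summarize"])
record("research", ["web_search", "read", "summarize"])
record("research", ["web_search", "summarize"])
print(predict_top_k("research", "web_search", 2))  # ['read', 'summarize']
```

Step 3 (cancel-or-commit) would then compare the LLM's actual tool selection against the pre-warmed set before using any speculative result. Note the `ON CONFLICT ... DO UPDATE` upsert requires SQLite 3.24+.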
Priority Assessment
P3 (research) — high impact if tool-call latency becomes a user-visible bottleneck. Distinct from #2290; can be implemented independently.