feat(mcp): add per-message caching for prune_tools to avoid redundant LLM calls #2298
Closed
Labels
P2 (High value, medium complexity)
enhancement (New feature or request)
llm (zeph-llm crate: Ollama, Claude)
tools (Tool execution and MCP integration)
Description
Problem
prune_tools fires an LLM call on every agent loop iteration in which the tool count exceeds min_tools_to_prune. A multi-turn conversation with 5 tool-use steps therefore makes 5 extra LLM calls per user message, adding roughly 500 ms of latency at 100 ms per call.
Solution
Cache the pruned tool set per user message, keyed on (message_content_hash, tool_list_hash). Reset the cache when a new user message arrives or when the MCP tool list changes.
Implementation note
The cache should live in the agent loop state, not in prune_tools itself, which is a stateless free function.
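A minimal sketch of such a cache, assuming illustrative names (PruneCache, get_or_prune) rather than actual zeph-core types; the closure stands in for the LLM-backed prune_tools call, and DefaultHasher stands in for whatever hash the project prefers:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical cache held in the agent loop state.
/// Keyed on (message_content_hash, tool_list_hash) so it invalidates
/// automatically when a new user message arrives or the MCP tool list changes.
struct PruneCache {
    key: Option<(u64, u64)>,
    pruned: Vec<String>,
}

impl PruneCache {
    fn new() -> Self {
        Self { key: None, pruned: Vec::new() }
    }

    fn hash_of<T: Hash>(value: &T) -> u64 {
        let mut h = DefaultHasher::new();
        value.hash(&mut h);
        h.finish()
    }

    /// Return the cached pruned set if the key matches; otherwise recompute
    /// via `prune` (a stand-in for the LLM call) and store the result.
    fn get_or_prune(
        &mut self,
        message: &str,
        tools: &[String],
        prune: impl FnOnce(&str, &[String]) -> Vec<String>,
    ) -> Vec<String> {
        let key = (Self::hash_of(&message), Self::hash_of(&tools));
        if self.key != Some(key) {
            self.pruned = prune(message, tools);
            self.key = Some(key);
        }
        self.pruned.clone()
    }
}
```

With this shape, repeated loop iterations on the same user message and tool list hit the cache, so only the first iteration pays for the LLM call.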
Priority
This must be addressed before the wiring PR lands to keep UX acceptable.
Component
zeph-core (agent loop), zeph-mcp (pruning.rs)