research(routing): Memory Augmented Routing — use retrieved context to downgrade to smaller model, 96% cost reduction (arXiv:2603.23013) #2443

@bug-ops

Paper

Title: Knowledge Access Beats Model Size: Memory Augmented Routing for Persistent AI Agents
arXiv: https://arxiv.org/abs/2603.23013
Published: 2026-03-24

Key Technique

In production, up to 47% of the queries an AI agent sees are semantically similar repeats of earlier ones. Instead of routing all queries to a large model, this framework:

  1. Retrieves prior conversational context from memory for each query
  2. Routes queries with high-confidence memory hits to a lightweight 8B model
  3. Routes novel/low-confidence queries to the full-scale model
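The three steps above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `MemoryEntry`, `best_hit`, `route`, the use of cosine similarity as the confidence measure, and the fixed threshold are all assumptions made for the example.

```rust
/// Which model tier a query is sent to.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Tier {
    Small, // the lightweight model (8B in the paper)
    Large, // the full-scale model
}

/// Hypothetical memory record: an embedding plus the text it was built from.
struct MemoryEntry {
    embedding: Vec<f32>,
    text: String,
}

/// Cosine similarity; returns 0.0 for mismatched lengths or zero vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

/// Step 1: find the best-matching memory entry and its similarity.
fn best_hit<'a>(query: &[f32], memory: &'a [MemoryEntry]) -> Option<(&'a MemoryEntry, f32)> {
    memory
        .iter()
        .map(|e| (e, cosine(query, &e.embedding)))
        .max_by(|a, b| a.1.total_cmp(&b.1))
}

/// Steps 2 and 3: a high-confidence hit goes to the small model together
/// with the retrieved text; everything else goes to the large model.
/// Assumption: similarity >= threshold is what counts as "high confidence".
fn route(query: &[f32], memory: &[MemoryEntry], threshold: f32) -> (Tier, Option<String>) {
    match best_hit(query, memory) {
        Some((entry, sim)) if sim >= threshold => (Tier::Small, Some(entry.text.clone())),
        _ => (Tier::Large, None),
    }
}
```

With an empty memory, `route` always falls through to `Tier::Large`, so a cold-start agent behaves like the unrouted baseline.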

Results (without additional training): 8B + memory retrieval → 30.5% F1, recovering 69% of 235B model performance at 96% cost reduction.

Key insight: memory makes routing worthwhile; routing makes memory cost-effective.

Why Relevant to Zeph

Zeph already has SemanticMemory with MMR retrieval and PILOT LinUCB bandit routing. The gap: bandit routing makes decisions on task complexity (SLM audit) WITHOUT considering memory hit confidence. This paper shows memory retrieval quality is a strong routing signal: high-confidence recall means a cheap model is sufficient.

Integration sketch: add memory_confidence as a routing feature to the LinUCB bandits alongside the existing complexity score. When memory_search returns similarity >= threshold (e.g. 0.9), route to the fast provider. This extends RoutingContext with memory_hit_confidence: Option<f32>.
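One possible reading of this sketch, in code. Everything except the field name `memory_hit_confidence: Option<f32>` is hypothetical: the `RoutingContext` shown here is a stand-in rather than Zeph's actual struct, and `Provider`, `Arm`, `ucb_score`, and `choose` are illustrative names. The scoring follows the standard LinUCB form (estimated reward plus an exploration bonus), with the bandit update step omitted.

```rust
/// Hypothetical stand-in for Zeph's RoutingContext; only the fields used here.
#[derive(Debug, Clone)]
struct RoutingContext {
    /// Existing task-complexity score, assumed to lie in [0, 1].
    complexity: f32,
    /// Proposed field: top similarity from memory search, None when there is no hit.
    memory_hit_confidence: Option<f32>,
}

#[derive(Debug, PartialEq, Clone, Copy)]
enum Provider {
    Fast,
    Full,
}

impl RoutingContext {
    /// Feature vector for the bandit: [bias, complexity, memory confidence].
    /// A missing memory hit is encoded as 0.0 (an assumption of this sketch).
    fn features(&self) -> [f32; 3] {
        let conf = self.memory_hit_confidence.unwrap_or(0.0).clamp(0.0, 1.0);
        [1.0, self.complexity, conf]
    }
}

/// Per-provider LinUCB state; learning (updating theta and a_inv) is omitted.
struct Arm {
    provider: Provider,
    theta: [f32; 3],      // estimated reward weights
    a_inv: [[f32; 3]; 3], // inverse of the arm's design matrix
}

/// LinUCB score: theta . x + alpha * sqrt(x^T A^-1 x).
fn ucb_score(arm: &Arm, x: &[f32; 3], alpha: f32) -> f32 {
    let mean: f32 = arm.theta.iter().zip(x).map(|(t, xi)| t * xi).sum();
    let mut quad = 0.0_f32;
    for i in 0..3 {
        for j in 0..3 {
            quad += x[i] * arm.a_inv[i][j] * x[j];
        }
    }
    mean + alpha * quad.max(0.0).sqrt()
}

/// Short-circuit to the fast provider on a high-confidence memory hit;
/// otherwise let the bandit decide using the extended feature vector.
fn choose(ctx: &RoutingContext, arms: &[Arm], alpha: f32, threshold: f32) -> Option<Provider> {
    if ctx.memory_hit_confidence.map_or(false, |c| c >= threshold) {
        return Some(Provider::Fast);
    }
    let x = ctx.features();
    arms.iter()
        .map(|a| (a.provider, ucb_score(a, &x, alpha)))
        .max_by(|l, r| l.1.total_cmp(&r.1))
        .map(|(p, _)| p)
}
```

Keeping the confidence in the feature vector as well as in the threshold check lets the bandit learn from hits below the threshold instead of treating them the same as misses.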

Complements #2415 (BaRP cost-weight dial) and extends existing PILOT bandit with a new memory-derived signal.

Priority Rationale

P2: directly extends Zeph's existing bandit routing infrastructure with a well-validated signal. 96% cost reduction is a compelling production metric. Zeph already has all required subsystems — this is a composition improvement.

Metadata

Labels

P2: High value, medium complexity
llm: zeph-llm crate (Ollama, Claude)
memory: zeph-memory crate (SQLite)
research: Research-driven improvement
