research(routing): Memory Augmented Routing — use retrieved context to downgrade to smaller model, 96% cost reduction (arXiv:2603.23013) #2443
Description
Paper
Title: Knowledge Access Beats Model Size: Memory Augmented Routing for Persistent AI Agents
arXiv: https://arxiv.org/abs/2603.23013
Published: 2026-03-24
Key Technique
In production AI agents, up to 47% of queries are semantically similar repeats. Instead of routing every query to a large model, this framework:
- Retrieves prior conversational context from memory for each query
- Routes queries with high-confidence memory hits to a lightweight 8B model
- Routes novel/low-confidence queries to the full-scale model
Results (without additional training): 8B + memory retrieval → 30.5% F1, recovering 69% of 235B model performance at 96% cost reduction.
Key insight: memory makes routing worthwhile; routing makes memory cost-effective.
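The routing rule above can be sketched in a few lines. This is a hypothetical illustration, not the paper's code: the type names, the similarity field, and the 0.9 threshold are all assumptions for clarity.

```rust
// Hypothetical sketch of memory-augmented routing. All names and the
// threshold value are illustrative assumptions, not from the paper.

struct MemoryHit {
    answer: String,
    similarity: f32, // similarity of the retrieved context to the query
}

#[derive(Debug, PartialEq)]
enum Route {
    Small, // lightweight 8B model
    Large, // full-scale model
}

/// Route a query: a high-confidence memory hit goes to the small model;
/// novel or low-confidence queries go to the large model.
fn route(hit: Option<&MemoryHit>, threshold: f32) -> Route {
    match hit {
        Some(h) if h.similarity >= threshold => Route::Small,
        _ => Route::Large,
    }
}

fn main() {
    let hit = MemoryHit { answer: "cached context".into(), similarity: 0.93 };
    assert_eq!(route(Some(&hit), 0.9), Route::Small); // confident recall
    assert_eq!(route(None, 0.9), Route::Large);       // novel query
    println!("routing sketch ok");
}
```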
Why Relevant to Zeph
Zeph already has SemanticMemory with MMR retrieval and PILOT LinUCB bandit routing. The gap: bandit routing bases its decisions on task complexity (SLM audit) without considering memory-hit confidence. This paper shows that memory retrieval quality is a strong routing signal: high-confidence recall means a cheap model is sufficient.
Integration sketch: add memory_confidence as a routing feature to LinUCB bandits alongside the existing complexity score. When memory_search returns similarity >= threshold (e.g. 0.9), route to fast provider. Extends RoutingContext with memory_hit_confidence: Option<f32>.
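A minimal sketch of that extension, assuming field names and the feature-vector shape (Zeph's actual `RoutingContext` and LinUCB interface may differ):

```rust
// Hypothetical extension of Zeph's RoutingContext; field names, the
// feature layout, and the threshold are assumptions for illustration.
struct RoutingContext {
    complexity_score: f32,              // existing SLM-audit complexity feature
    memory_hit_confidence: Option<f32>, // new: top memory_search similarity, if any
}

// Example threshold from the integration sketch, not a tuned value.
const MEMORY_CONFIDENCE_THRESHOLD: f32 = 0.9;

/// Build the LinUCB feature vector: complexity plus the memory-derived
/// signal (0.0 when memory_search returned no hit).
fn features(ctx: &RoutingContext) -> [f32; 2] {
    [ctx.complexity_score, ctx.memory_hit_confidence.unwrap_or(0.0)]
}

/// Short-circuit: a high-confidence memory hit prefers the fast provider
/// before the bandit is even consulted.
fn prefers_fast_provider(ctx: &RoutingContext) -> bool {
    ctx.memory_hit_confidence
        .map_or(false, |c| c >= MEMORY_CONFIDENCE_THRESHOLD)
}

fn main() {
    let ctx = RoutingContext {
        complexity_score: 0.4,
        memory_hit_confidence: Some(0.93),
    };
    assert!(prefers_fast_provider(&ctx));
    assert_eq!(features(&ctx), [0.4, 0.93]);
    println!("extension sketch ok");
}
```

Feeding the signal into the bandit as a feature (rather than only short-circuiting) lets LinUCB learn how much weight the memory signal deserves per arm.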
Complements #2415 (BaRP cost-weight dial) and extends existing PILOT bandit with a new memory-derived signal.
Priority Rationale
P2: directly extends Zeph's existing bandit routing infrastructure with a well-validated signal. 96% cost reduction is a compelling production metric. Zeph already has all required subsystems — this is a composition improvement.