Introduction
Overview
Current AI memory solutions face significant scalability challenges. Most rely on "explicit modeling", requiring humans to continuously specify which information is important and which is not. This approach fundamentally limits an AI system's ability to understand what truly matters to each user, and makes it hard for the system to retain the most critical, user-specific information.
In addition, existing solutions often adopt a "one-size-fits-all" strategy, applying the same memory mechanism across all scenarios. Such general-purpose designs struggle to support diverse use cases and cannot satisfy users' highly personalized needs across different contexts.
What is MemU?
MemU is an agentic memory framework designed for LLMs and AI agents. It ingests multimodal inputs, extracts memory items from them, and autonomously organizes and clusters those items into structured memory category files.
Unlike traditional RAG systems that rely solely on embedding search, MemU also supports non-embedding retrieval: the model directly reads the memory category files themselves. Through natural-language comprehension and reasoning, the LLM interprets these files semantically and recursively traces information across layers — "Memory Category Files → Memory Items → Raw Resources" — to achieve deep retrieval and precise reasoning.
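As a rough illustration, the recursive trace described above can be sketched as follows. The data layout and the `trace` function are assumptions made for this sketch, not the actual MemU API, and the keyword match stands in for the LLM's semantic judgment.

```python
# Hypothetical sketch of layered retrieval: a query is matched against
# category files, then traced down through memory items to raw resources.
# All names and structures here are illustrative, not MemU's real schema.

def trace(query, category_files, memory_items, raw_resources):
    """Resolve a query top-down: category files -> memory items -> raw sources."""
    results = []
    for name, item_ids in category_files.items():
        # A real LLM would judge relevance semantically; we use a keyword match.
        if query.lower() in name.lower():
            for item_id in item_ids:
                item = memory_items[item_id]
                results.append({
                    "item": item["text"],
                    "source": raw_resources[item["source_id"]],
                })
    return results

category_files = {"Travel Preferences": ["i1"]}
memory_items = {"i1": {"text": "The user prefers window seats.", "source_id": "r1"}}
raw_resources = {"r1": "Chat log: 'Always book me a window seat.'"}

hits = trace("travel", category_files, memory_items, raw_resources)
```

Because each memory item keeps a link to its raw resource, the retrieval result can always cite the original evidence alongside the distilled fact.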
Unified Multimodal Memory Framework
MemU progressively transforms heterogeneous input data into queryable, semantically interpretable textual memory. Its core architecture consists of three layers (bottom to top):
| Layer | Name | Description |
|---|---|---|
| 1 | Resource Layer | Multimodal raw data repository (text, images, audio, video) |
| 2 | Memory Item Layer | Fine-grained memory items as natural language sentences |
| 3 | Memory Category Layer | Thematic memory category files forming higher-level knowledge structures |
The three layers maintain full bidirectional traceability: Raw Data → Memory Items → Memory Categories. Every piece of knowledge can be traced to its origin and reconstructed across layers, enabling high transparency, interpretability, and robust provenance tracking.
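One way to picture the three layers and their provenance links is a minimal data model like the one below. The class and field names are assumptions for illustration, not MemU's actual schema.

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative three-layer model with explicit provenance links.
# Class and field names are assumptions, not MemU's real schema.

@dataclass
class RawResource:
    id: str
    modality: str        # "text" | "image" | "audio" | "video"
    content: str

@dataclass
class MemoryItem:
    id: str
    text: str            # fine-grained memory as a natural-language sentence
    source_id: str       # link down to the originating RawResource

@dataclass
class MemoryCategory:
    name: str
    item_ids: List[str] = field(default_factory=list)  # links down to MemoryItems

# The upward path (Raw Data -> Memory Items -> Memory Categories) is built at
# memorization time; the downward links (source_id, item_ids) make every fact
# traceable back to its origin.
res = RawResource("r1", "text", "User mentioned being allergic to peanuts.")
item = MemoryItem("i1", "The user has a peanut allergy.", source_id=res.id)
cat = MemoryCategory("Health", item_ids=[item.id])
```

Keeping the links explicit in both directions is what makes provenance tracking cheap: answering "where did this fact come from?" is a pointer lookup, not a search.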
Core Processes
The operation of MemU is built upon three core processes:
- Memorization — Asynchronously transforms new multimodal input from lower layers to higher layers, storing raw input, extracting key information, and aggregating into appropriate memory categories
- Retrieval — Follows a top-down hierarchical search: Memory Categories → Memory Items → Raw Resources, balancing response speed with information completeness
- Self-Evolution (coming soon) — Continuously monitors user interactions and access frequency, automatically promoting frequently referenced topics and reorganizing the memory structure
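The Memorization process above can be sketched as a three-step pipeline. The `extract_key_sentences` and `classify` helpers below are hypothetical stand-ins for the LLM calls MemU would make; the store layout is likewise an assumption.

```python
# Sketch of the Memorization process: store raw input, extract memory items,
# aggregate them into categories. The helpers stand in for LLM calls.

def extract_key_sentences(raw_text):
    # Stand-in for LLM extraction: treat each sentence as one memory item.
    return [s.strip() for s in raw_text.split(".") if s.strip()]

def classify(sentence):
    # Stand-in for LLM categorization: a trivial keyword rule.
    return "Food" if "eat" in sentence.lower() else "General"

def memorize(raw_input, store):
    rid = f"r{len(store['resources'])}"
    store["resources"][rid] = raw_input                    # 1. store raw input
    for sent in extract_key_sentences(raw_input):          # 2. extract key info
        iid = f"i{len(store['items'])}"
        store["items"][iid] = {"text": sent, "source_id": rid}
        store["categories"].setdefault(classify(sent), []).append(iid)  # 3. aggregate

store = {"resources": {}, "items": {}, "categories": {}}
memorize("I like to eat sushi. I work in Berlin.", store)
```

Because the real pipeline runs asynchronously, memorization never blocks the agent's response path; new input is folded into the hierarchy in the background.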
MemU vs. Traditional Memory Systems
Different memory approaches serve different purposes. Here's how MemU compares:
| Feature | Traditional Systems | MemU |
|---|---|---|
| Structure | Fragmented data | File system (documents, images, videos) |
| Memory Formation | Explicit modeling (manual) | Autonomous agent-managed |
| Retrieval | Embedding search only | Embedding + LLM file reading |
| Multimodal | Limited support | Full multimodal (text, image, audio, video) |
| Traceability | No | Full bidirectional traceability |
| Self-Evolving | No | Yes |
| Dynamic Capacity | No | Yes |
Dual-Mode Retrieval
MemU supports two retrieval modes that can be flexibly combined:
- Embedding Search — Performs semantic matching in the Memory Item Layer for fast recall
- LLM-based Search — Allows the LLM to directly read and interpret entire memory category files for richer, more accurate context
The two modes can be used independently or together depending on query needs, balancing retrieval speed with depth of understanding.
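A toy combination of the two modes might look like the following. The bag-of-words "embedding" and the keyword-based "LLM read" are crude stand-ins for a real embedding model and real LLM reasoning, used here only to show how the two paths differ.

```python
import math
from collections import Counter

# Toy dual-mode retrieval. The bag-of-words "embedding" and keyword-based
# "LLM read" are crude stand-ins for real embeddings and LLM reasoning.

def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def embedding_search(query, items, k=1):
    """Fast recall: rank individual memory items by similarity to the query."""
    q = embed(query)
    return sorted(items, key=lambda it: cosine(q, embed(it)), reverse=True)[:k]

def llm_read(query, category_file):
    """Deep read: scan a whole category file for lines relevant to the query."""
    words = set(query.lower().split())
    return [ln for ln in category_file.splitlines() if words & set(ln.lower().split())]

items = ["The user prefers window seats.", "The user has a peanut allergy."]
category_file = "The user prefers window seats.\nThe user has a peanut allergy."

fast = embedding_search("window seats", items)      # cheap, top-k recall
deep = llm_read("peanut allergy", category_file)    # reads the full file
```

The design trade-off mirrors the one in the text: embedding search is cheap and approximate over individual items, while the LLM read pays more compute to consume an entire category file with full context.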