Introduction
Overview
Current AI memory solutions face significant scalability challenges. Most rely on "explicit modeling", requiring humans to continuously specify which information is important and which is not. This approach fundamentally limits an AI system's ability to understand what truly matters to each user, and makes it hard for the system to retain the most critical, user-specific information.
In addition, existing solutions often adopt a "one-size-fits-all" strategy, applying the same memory mechanism across all scenarios. Such general-purpose designs struggle to support diverse use cases and cannot satisfy users' highly personalized needs across different contexts.
What is MemU?
MemU is an agentic memory framework designed for LLMs and AI agents. It ingests multimodal inputs, extracts memory items from them, and autonomously organizes and clusters those items into structured memory category files.
Unlike traditional RAG systems that rely solely on embedding search, MemU also supports non-embedding retrieval: the model directly reads the memory category files themselves. Through natural-language comprehension and reasoning, the LLM interprets these files semantically and recursively traces information across layers — "Memory Category Files → Memory Items → Raw Resources" — to achieve deep retrieval and precise reasoning.
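As a rough illustration, the recursive trace described above can be sketched as follows. The data layout and the `trace` function are assumptions made for this sketch, not the actual MemU API, and the keyword match stands in for the LLM's semantic judgment.

```python
# Hypothetical sketch of layered retrieval: a query is matched against
# category files, then traced down through memory items to raw resources.
# All names and structures here are illustrative, not MemU's real schema.

def trace(query, category_files, memory_items, raw_resources):
    """Resolve a query top-down: category files -> memory items -> raw sources."""
    results = []
    for name, item_ids in category_files.items():
        # A real LLM would judge relevance semantically; we use a keyword match.
        if query.lower() in name.lower():
            for item_id in item_ids:
                item = memory_items[item_id]
                results.append({
                    "item": item["text"],
                    "source": raw_resources[item["source_id"]],
                })
    return results

category_files = {"Travel Preferences": ["i1"]}
memory_items = {"i1": {"text": "The user prefers window seats.", "source_id": "r1"}}
raw_resources = {"r1": "Chat log: 'Always book me a window seat.'"}

hits = trace("travel", category_files, memory_items, raw_resources)
```

Because each memory item keeps a link to its raw resource, the retrieval result can always cite the original evidence alongside the distilled fact.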
Unified Multimodal Memory Framework
MemU progressively transforms heterogeneous input data into queryable, semantically interpretable textual memory. Its core architecture consists of three layers (bottom to top):
| Layer | Name | Description |
|---|---|---|
| 1 | Resource Layer | Multimodal raw data repository (text, images, audio, video) |
| 2 | Memory Item Layer | Fine-grained memory items as natural language sentences |
| 3 | Memory Category Layer | Thematic memory category files forming higher-level knowledge structures |
The three layers maintain full bidirectional traceability: Raw Data → Memory Items → Memory Categories. Every piece of knowledge can be traced to its origin and reconstructed across layers, enabling high transparency, interpretability, and robust provenance tracking.
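One way to picture the three layers and their provenance links is a minimal data model like the one below. The class and field names are assumptions for illustration, not MemU's actual schema.

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative three-layer model with explicit provenance links.
# Class and field names are assumptions, not MemU's real schema.

@dataclass
class RawResource:
    id: str
    modality: str        # "text" | "image" | "audio" | "video"
    content: str

@dataclass
class MemoryItem:
    id: str
    text: str            # fine-grained memory as a natural-language sentence
    source_id: str       # link down to the originating RawResource

@dataclass
class MemoryCategory:
    name: str
    item_ids: List[str] = field(default_factory=list)  # links down to MemoryItems

# The upward path (Raw Data -> Memory Items -> Memory Categories) is built at
# memorization time; the downward links (source_id, item_ids) make every fact
# traceable back to its origin.
res = RawResource("r1", "text", "User mentioned being allergic to peanuts.")
item = MemoryItem("i1", "The user has a peanut allergy.", source_id=res.id)
cat = MemoryCategory("Health", item_ids=[item.id])
```

Keeping the links explicit in both directions is what makes provenance tracking cheap: answering "where did this fact come from?" is a pointer lookup, not a search.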
Core Processes
The operation of MemU is built upon three core processes:
- Memorization — Asynchronously transforms new multimodal input from lower layers to higher layers, storing raw input, extracting key information, and aggregating into appropriate memory categories
- Retrieval — Follows a top-down hierarchical search: Memory Categories → Memory Items → Raw Resources, balancing response speed with information completeness
- Self-Evolution (coming soon) — Continuously monitors user interactions and access frequency, automatically promoting frequently referenced topics and reorganizing the memory structure
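The Memorization process above can be sketched as a three-step pipeline. The `extract_key_sentences` and `classify` helpers below are hypothetical stand-ins for the LLM calls MemU would make; the store layout is likewise an assumption.

```python
# Sketch of the Memorization process: store raw input, extract memory items,
# aggregate them into categories. The helpers stand in for LLM calls.

def extract_key_sentences(raw_text):
    # Stand-in for LLM extraction: treat each sentence as one memory item.
    return [s.strip() for s in raw_text.split(".") if s.strip()]

def classify(sentence):
    # Stand-in for LLM categorization: a trivial keyword rule.
    return "Food" if "eat" in sentence.lower() else "General"

def memorize(raw_input, store):
    rid = f"r{len(store['resources'])}"
    store["resources"][rid] = raw_input                    # 1. store raw input
    for sent in extract_key_sentences(raw_input):          # 2. extract key info
        iid = f"i{len(store['items'])}"
        store["items"][iid] = {"text": sent, "source_id": rid}
        store["categories"].setdefault(classify(sent), []).append(iid)  # 3. aggregate

store = {"resources": {}, "items": {}, "categories": {}}
memorize("I like to eat sushi. I work in Berlin.", store)
```

Because the real pipeline runs asynchronously, memorization never blocks the agent's response path; new input is folded into the hierarchy in the background.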
MemU vs. Traditional Memory Systems
Different memory approaches serve different purposes. Here's how MemU compares:
| Feature | Traditional Systems | MemU |
|---|---|---|
| Structure | Fragmented data | File system (documents, images, videos) |
| Memory Formation | Explicit modeling (manual) | Autonomous agent-managed |
| Retrieval | Embedding search only | Embedding + LLM file reading |
| Multimodal | Limited support | Full multimodal (text, image, audio, video) |
| Traceability | No | Full bidirectional traceability |
| Self-Evolving | No | Yes |
| Dynamic Capacity | No | Yes |
Dual-Mode Retrieval
MemU supports two retrieval modes that can be flexibly combined:
- Embedding Search — Performs semantic matching in the Memory Item Layer for fast recall
- LLM-based Search — Allows the LLM to directly read and interpret entire memory category files for richer, more accurate context
The two modes can be used independently or together depending on query needs, balancing retrieval speed with depth of understanding.
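A toy combination of the two modes might look like the following. The bag-of-words "embedding" and the keyword-based "LLM read" are crude stand-ins for a real embedding model and real LLM reasoning, used here only to show how the two paths differ.

```python
import math
from collections import Counter

# Toy dual-mode retrieval. The bag-of-words "embedding" and keyword-based
# "LLM read" are crude stand-ins for real embeddings and LLM reasoning.

def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def embedding_search(query, items, k=1):
    """Fast recall: rank individual memory items by similarity to the query."""
    q = embed(query)
    return sorted(items, key=lambda it: cosine(q, embed(it)), reverse=True)[:k]

def llm_read(query, category_file):
    """Deep read: scan a whole category file for lines relevant to the query."""
    words = set(query.lower().split())
    return [ln for ln in category_file.splitlines() if words & set(ln.lower().split())]

items = ["The user prefers window seats.", "The user has a peanut allergy."]
category_file = "The user prefers window seats.\nThe user has a peanut allergy."

fast = embedding_search("window seats", items)      # cheap, top-k recall
deep = llm_read("peanut allergy", category_file)    # reads the full file
```

The design trade-off mirrors the one in the text: embedding search is cheap and approximate over individual items, while the LLM read pays more compute to consume an entire category file with full context.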