Introduction

Overview

Current AI memory solutions face significant scalability challenges. Most rely on "explicit modeling", requiring humans to continuously specify which information is important and which is not. This approach fundamentally limits an AI system's ability to understand what truly matters to each user, and makes it difficult for the system to retain the most critical, user-specific information.

In addition, existing solutions often adopt a "one-size-fits-all" strategy, applying the same memory mechanism across all scenarios. Such general-purpose designs struggle to support diverse use cases and cannot satisfy users' highly personalized needs across different contexts.

What is MemU?

MemU is an agentic memory framework designed for LLMs and AI agents. It ingests multimodal inputs, extracts them into memory items, and autonomously organizes and clusters them into structured memory category files.

Unlike traditional RAG systems that rely solely on embedding search, MemU also supports non-embedding retrieval, in which models directly read the memory category files themselves. Through natural-language comprehension and reasoning, the LLM can interpret these files semantically and recursively trace information across layers ("Memory Category Files → Memory Items → Raw Resources") to achieve deep retrieval and precise reasoning.
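To make the idea concrete, here is a minimal sketch of non-embedding retrieval: instead of embedding and chunking, the full category files are handed to the LLM as readable context. All names here (`read_category_file`, `build_retrieval_prompt`, the file names) are illustrative, not MemU's actual API.

```python
def read_category_file(files: dict, category: str) -> str:
    """Return the full text of a memory category file (stubbed as a dict)."""
    return files[category]

def build_retrieval_prompt(files: dict, query: str) -> str:
    """Rather than embedding chunks, give the LLM the whole category files
    so it can interpret them semantically in natural language."""
    context = "\n\n".join(
        f"## {name}\n{read_category_file(files, name)}" for name in files
    )
    return f"Memory files:\n{context}\n\nQuestion: {query}"

# Toy memory category files standing in for real ones on disk.
files = {
    "preferences.md": "The user prefers concise answers and dark mode.",
    "projects.md": "Active project: migrating the billing service to Go.",
}
prompt = build_retrieval_prompt(files, "What UI theme does the user like?")
# The prompt now contains every category file verbatim; an LLM call
# (omitted here) would read and reason over it directly.
```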

Unified Multimodal Memory Framework

MemU progressively transforms heterogeneous input data into queryable, semantically interpretable textual memory. Its core architecture consists of three layers (bottom to top):

| Layer | Name | Description |
|---|---|---|
| 1 | Resource Layer | Multimodal raw data repository (text, images, audio, video) |
| 2 | Memory Item Layer | Fine-grained memory items as natural-language sentences |
| 3 | Memory Category Layer | Thematic memory category files forming higher-level knowledge structures |

The three layers maintain full bidirectional traceability: Raw Data → Memory Items → Memory Categories. Every piece of knowledge can be traced to its origin and reconstructed across layers, enabling high transparency, interpretability, and robust provenance tracking.
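The three-layer structure with its bidirectional links can be sketched as plain data classes. This is a hypothetical illustration; the class and field names are not MemU's real schema.

```python
from dataclasses import dataclass

@dataclass
class Resource:                    # Layer 1: raw multimodal data
    id: str
    modality: str                  # "text" | "image" | "audio" | "video"
    content: str                   # raw payload, or a path to it

@dataclass
class MemoryItem:                  # Layer 2: one natural-language sentence
    id: str
    text: str
    source_resource_ids: list      # downward link into Layer 1

@dataclass
class MemoryCategory:              # Layer 3: thematic category file
    name: str
    item_ids: list                 # downward link into Layer 2

def trace_to_resources(category, items, resources):
    """Follow Category -> Items -> Resources to recover provenance."""
    item_map = {i.id: i for i in items}
    res_map = {r.id: r for r in resources}
    origins = []
    for iid in category.item_ids:
        for rid in item_map[iid].source_resource_ids:
            origins.append(res_map[rid])
    return origins

# Example: one text resource distilled into an item, filed under "profile".
res = Resource("r1", "text", "Chat log: user mentions living in Berlin.")
item = MemoryItem("m1", "The user lives in Berlin.", ["r1"])
cat = MemoryCategory("profile", ["m1"])
origins = trace_to_resources(cat, [item], [res])
# origins[0] is the original chat-log resource the memory came from
```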

Core Processes

The operation of MemU is built upon three core processes:

  • Memorization — Asynchronously transforms new multimodal input from lower layers to higher layers, storing raw input, extracting key information, and aggregating into appropriate memory categories
  • Retrieval — Follows a top-down hierarchical search: Memory Categories → Memory Items → Raw Resources, balancing response speed with information completeness
  • Self-Evolution (coming soon) — Continuously monitors user interaction and access frequency, automatically promoting frequently referenced topics and reorganizing the memory structure
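The Memorization process above can be sketched as a small pipeline: store the raw input, extract fine-grained items, and aggregate them into categories. The keyword-based splitter and router below are toy stand-ins for the LLM-driven extraction MemU actually performs.

```python
def extract_items(raw_text):
    """Split raw input into fine-grained, sentence-level memory items."""
    return [s.strip() for s in raw_text.split(".") if s.strip()]

def categorize(item):
    """Toy router: assign each item to a thematic category by keyword."""
    if "prefer" in item.lower():
        return "preferences"
    return "general"

def memorize(raw_text, store):
    """Lower layers -> higher layers: raw input -> items -> categories."""
    store.setdefault("resources", []).append(raw_text)            # Layer 1
    for item in extract_items(raw_text):                          # Layer 2
        store.setdefault("items", []).append(item)
        cat = categorize(item)                                    # Layer 3
        store.setdefault("categories", {}).setdefault(cat, []).append(item)
    return store

store = memorize("The user prefers Python. The meeting is on Friday.", {})
# store["categories"] now holds "preferences" and "general" buckets,
# each entry still traceable back to the stored raw resource.
```

In MemU itself this step runs asynchronously, so memorization never blocks the agent's response path.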

MemU vs. Traditional Memory Systems

Different memory approaches serve different purposes. Here's how MemU compares:

| Feature | Traditional Systems | MemU |
|---|---|---|
| Structure | Fragmented data | File system (documents, images, videos) |
| Memory Formation | Explicit modeling (manual) | Autonomous agent-managed |
| Retrieval | Embedding search only | Embedding + LLM file reading |
| Multimodal | Limited support | Full multimodal (text, image, audio, video) |
| Traceability | No | Full bidirectional traceability |
| Self-Evolving | No | Yes |
| Dynamic Capacity | No | Yes |

Dual-Mode Retrieval

MemU supports two retrieval modes:

  • Embedding Search — Performs semantic matching in the Memory Item Layer for fast recall
  • LLM-based Search — Allows the LLM to directly read and interpret entire memory category files for richer, more accurate context

These two modes can be flexibly combined based on query needs, balancing retrieval speed with depth of understanding.

Next Steps

Read Memory Structure to understand the three-layer architecture in detail, or jump to Getting Started to begin using MemU.