Official implementation of CREAM, a self-supervised framework for memory-based continual retrieval on dynamic streaming corpora.
CREAM learns to retrieve continually in a fully unsupervised setting with an adaptive soft memory.
CREAM proposes a soft memory for practical continual IR over unbounded, unlabeled, and topic-shifting streaming corpora, and introduces three key techniques:
- Fine-grained similarity: token-level semantic matching
- Regularized cluster prototypes: fixed-token-length prototypes via LSH-style regularization
- Stratified coreset sampling: diverse training samples drawn from memory
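One common way to realize token-level (fine-grained) similarity is a late-interaction MaxSim scorer: each query token is matched to its most similar document token, and the per-token maxima are summed. The sketch below is illustrative only; the function name and shapes are assumptions, not the repository's actual API, and CREAM's exact scoring may differ.

```python
import numpy as np

def fine_grained_similarity(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """Token-level (MaxSim-style) similarity sketch.

    query_tokens: (Lq, d) L2-normalized query token embeddings
    doc_tokens:   (Ld, d) L2-normalized document token embeddings
    Each query token takes its best-matching doc token; scores are summed.
    """
    sim = query_tokens @ doc_tokens.T      # (Lq, Ld) cosine similarity matrix
    return float(sim.max(axis=1).sum())    # max over doc tokens, sum over query tokens
```

Compared with a single-vector dot product, this preserves which individual query terms are covered by the document, which is what "token-level semantics" refers to above.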
CREAM repeats three stages per streaming session:
- Retrieval: return relevant documents using the up-to-date encoder
- Memory update: streaming clustering with regularized prototypes (soft-memory maintenance)
- Encoder update: self-supervised contrastive training with pseudo positives/negatives sampled from memory
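The three stages above can be sketched as a toy session loop. Everything here is a placeholder, not the repository's API: the "encoder" is a projection matrix, the "memory" is a set of prototype vectors maintained by online assignment, and a simple pull-toward-prototype step stands in for the contrastive objective.

```python
import numpy as np

def encode(W, texts):
    """Toy encoder: project raw vectors with W and L2-normalize."""
    v = texts @ W
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def run_session(W, prototypes, queries, docs, lr=0.01):
    """One streaming session: retrieval -> memory update -> encoder update."""
    Q, D = encode(W, queries), encode(W, docs)
    # 1) Retrieval: answer queries with the current (up-to-date) encoder
    results = (Q @ D.T).argmax(axis=1)
    # 2) Memory update: assign each doc to its nearest prototype and
    #    move that prototype slightly (streaming clustering stand-in)
    for d in D:
        j = int((prototypes @ d).argmax())
        prototypes[j] = prototypes[j] + lr * (d - prototypes[j])
    # 3) Encoder update: nudge W so doc embeddings move toward their
    #    prototypes (a crude stand-in for contrastive training on
    #    memory-sampled pseudo positives/negatives)
    for d, raw in zip(D, docs):
        j = int((prototypes @ d).argmax())
        W = W + lr * np.outer(raw, prototypes[j] - d)
    return W, prototypes, results
```

Running `run_session` repeatedly over incoming batches mimics how the encoder and soft memory co-evolve across sessions; the real system replaces each stand-in with the components listed above.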

- LoTTE
- MSMARCO
- Generate sessions (`cream/src/data`)

```bash
# Generate raw sessions
python generate_multi_test.py
# Filter the sessions with BM25
python proposal_input_helper.py
```
- Train and evaluate (`cream/src`)
- You can optionally use rolling evaluation.
```bash
python main.py \
  --exp=proposal_qq_low \
  --use_tensor_key \
  --warming_up_method=stream_seed \
  --sspq=50 \
  --start=0 \
  --end=10 \
  --rdsz=50 \
  --cmnsz=50 \
  --mi=3 \
  --init_k=5 \
  --light_weight \
  --light_weight_rate=0.25
```
```bibtex
@inproceedings{son2026cream,
  title     = {CREAM: Continual Retrieval on Dynamic Streaming Corpora with Adaptive Soft Memory},
  author    = {Son, HuiJeong and Kang, Hyeongu and Kim, Sunho and Ho, Subeen and Kang, SeongKu and Lee, Dongha and Yoon, Susik},
  booktitle = {Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '26)},
  year      = {2026}
}
```