Official implementation of CREAM, a self-supervised framework for memory-based continual retrieval on dynamic streaming corpora.
CREAM learns to retrieve continually in a fully unsupervised setting with an adaptive soft memory.
CREAM proposes a soft memory for practical continual IR over unbounded, unlabeled, and topic-shifting streaming corpora, and introduces three key techniques:
- Fine-grained similarity: token-level semantic matching
- Regularized cluster prototypes: fixed-token-length prototypes via LSH-style regularization
- Stratified coreset sampling: diverse training samples drawn from memory
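One common way to realize token-level (fine-grained) similarity is a late-interaction MaxSim scorer: each query token is matched to its most similar document token, and the per-token maxima are summed. The sketch below is illustrative only; the function name and shapes are assumptions, not the repository's actual API, and CREAM's exact scoring may differ.

```python
import numpy as np

def fine_grained_similarity(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """Token-level (MaxSim-style) similarity sketch.

    query_tokens: (Lq, d) L2-normalized query token embeddings
    doc_tokens:   (Ld, d) L2-normalized document token embeddings
    Each query token takes its best-matching doc token; scores are summed.
    """
    sim = query_tokens @ doc_tokens.T      # (Lq, Ld) cosine similarity matrix
    return float(sim.max(axis=1).sum())    # max over doc tokens, sum over query tokens
```

Compared with a single-vector dot product, this preserves which individual query terms are covered by the document, which is what "token-level semantics" refers to above.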
CREAM repeats three stages per streaming session:
- Retrieval: return relevant documents using the up-to-date encoder
- Memory update: streaming clustering with regularized prototypes (soft-memory maintenance)
- Encoder update: self-supervised contrastive training with pseudo positives/negatives sampled from memory
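The three stages above can be sketched as a toy session loop. Everything here is a placeholder, not the repository's API: the "encoder" is a projection matrix, the "memory" is a set of prototype vectors maintained by online assignment, and a simple pull-toward-prototype step stands in for the contrastive objective.

```python
import numpy as np

def encode(W, texts):
    """Toy encoder: project raw vectors with W and L2-normalize."""
    v = texts @ W
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def run_session(W, prototypes, queries, docs, lr=0.01):
    """One streaming session: retrieval -> memory update -> encoder update."""
    Q, D = encode(W, queries), encode(W, docs)
    # 1) Retrieval: answer queries with the current (up-to-date) encoder
    results = (Q @ D.T).argmax(axis=1)
    # 2) Memory update: assign each doc to its nearest prototype and
    #    move that prototype slightly (streaming clustering stand-in)
    for d in D:
        j = int((prototypes @ d).argmax())
        prototypes[j] = prototypes[j] + lr * (d - prototypes[j])
    # 3) Encoder update: nudge W so doc embeddings move toward their
    #    prototypes (a crude stand-in for contrastive training on
    #    memory-sampled pseudo positives/negatives)
    for d, raw in zip(D, docs):
        j = int((prototypes @ d).argmax())
        W = W + lr * np.outer(raw, prototypes[j] - d)
    return W, prototypes, results
```

Running `run_session` repeatedly over incoming batches mimics how the encoder and soft memory co-evolve across sessions; the real system replaces each stand-in with the components listed above.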

- LoTTE
- MSMARCO
- Generate sessions (`cream/src/data`)

```bash
# Generate raw sessions
python generate_multi_test.py
# Filter the sessions with BM25
python proposal_input_helper.py
```
- Train and evaluate (`cream/src`)
- You can optionally use rolling evaluation.
```bash
python main.py \
  --exp=proposal_qq_low \
  --use_tensor_key \
  --warming_up_method=stream_seed \
  --sspq=50 \
  --start=0 \
  --end=10 \
  --rdsz=50 \
  --cmnsz=50 \
  --mi=3 \
  --init_k=5 \
  --light_weight \
  --light_weight_rate=0.25
```
```bibtex
@inproceedings{son2026cream,
  title     = {CREAM: Continual Retrieval on Dynamic Streaming Corpora with Adaptive Soft Memory},
  author    = {Son, HuiJeong and Kang, Hyeongu and Kim, Sunho and Ho, Subeen and Kang, SeongKu and Lee, Dongha and Yoon, Susik},
  booktitle = {Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '26)},
  year      = {2026}
}
```