
RAP: Retrieval-Augmented Personalization for Multimodal Large Language Models

News

  • 2025.3.16 The RAP dataset is now available. Access it here.🔥🔥
  • 2025.2.27 RAP is accepted by CVPR 2025!🎉🎉
  • 2024.11.24 Release code and model weights.

Personalize Your Multimodal Large Language Model via Retrieval Augmented Generation.

RAP-MLLM
Introduce user-specific concepts to RAP-MLLM, and it can remember them and achieve strong performance on a variety of personalized multimodal generation tasks.

Visit our Project Page for more demonstrations.

📋 Contents

  • Install
  • Models
  • Demo
  • Data
  • Training
  • Evaluation
  • BibTeX
  • Acknowledgement

Install

  1. Clone the repo into a local folder.

git clone https://github.com/Hoar012/RAP-MLLM.git
cd RAP-MLLM

  2. Install packages.

conda create -n rap python=3.10 -y
conda activate rap
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
pip install -r requirements.txt
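
If you want to confirm the environment before moving on, a minimal check like the following should run inside the new rap environment; it only assumes that PyTorch and flash-attn were installed by the steps above.

# sanity_check.py -- optional, quick verification of the freshly installed environment.
import torch

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())

try:
    import flash_attn
    print("flash-attn:", flash_attn.__version__)
except ImportError:
    print("flash-attn is not importable; re-run: pip install flash-attn --no-build-isolation")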

Models

Pretrained model weights are available on Hugging Face.

  • RAP-LLaVA: RAP-LLaVA-13b
  • RAP-Phi3-V: RAP-Phi3-mini
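
If you prefer to download the weights ahead of time, a small huggingface_hub snippet such as the one below works; the local directory is an arbitrary choice, and the repo id matches the model path used in the commands later in this README.

# download_weights.py -- optional helper to pre-download the RAP-LLaVA weights.
# local_dir is an arbitrary choice; swap repo_id to fetch the RAP-Phi3-V weights instead.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Hoar012/RAP-LLaVA-13b",
    local_dir="checkpoints/RAP-LLaVA-13b",
)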

Demo

Build Your Personal Database:

Each concept record in the database can be structured in the following format:

{
    "concept_dict": {
        "<concept>": {
            "name": "concept_name",
            "image": "image_path",
            "info": "",
            "category": ""
        }
    },
    "path_to_concept": {
        "image_path": "<concept>",
    }
}

We provide an example of the database in example_database.
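
As a starting point, the short sketch below writes a single-concept database file in the format shown above; the concept tag, image path, info text, and output filename are placeholders, so follow the layout of example_database for the exact files the demo expects.

# build_database.py -- minimal sketch for assembling a personal concept database.
# All names and paths below are placeholders for illustration.
import json

concept_tag = "<my_dog>"
image_path = "example_database/images/my_dog.jpg"

database = {
    "concept_dict": {
        concept_tag: {
            "name": "my_dog",
            "image": image_path,
            "info": "A golden retriever that belongs to the user.",
            "category": "pet",
        }
    },
    "path_to_concept": {
        image_path: concept_tag,
    },
}

with open("my_database.json", "w") as f:
    json.dump(database, f, indent=4)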

CLI Demo:

python cli.py --model-path Hoar012/RAP-LLaVA-13b --image-file /path/to/test_image --retrieval --database example_database --topK 1

Data

Please check Data for more detail.

Training

We provide the training scripts with DeepSpeed below. Try training on your own dataset!

  • RAP-LLaVA: script
  • RAP-Phi3-V: script
  • LLaVA-LoRA: script

Evaluation

Prepare Data

Please download the test data used in the paper from the repositories of MyVLM and Yo'LLaVA.

We also provide the images for multi-concept evaluation in this Google Drive link.

In addition, we provide the full database used for question answering at this Google Drive link.

Evaluation on Image Captioning

python eval/caption.py  --eval-file /path/to/eval_file --model-path Hoar012/RAP-LLaVA-13b --retrieval --database /path/to/database --topK 2

The eval-file records the image paths to be evaluated and their corresponding target concepts, formatted as follows:

{
    "/path/to/image": [
        "target_concept"
    ]
}
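
For reference, a small script like the one below produces an eval file in this format; the image paths and concept names are placeholders.

# make_eval_file.py -- minimal sketch for building a captioning eval file.
# Image paths and concept names are placeholders.
import json

eval_entries = {
    "/path/to/images/my_dog_1.jpg": ["my_dog"],
    "/path/to/images/my_dog_2.jpg": ["my_dog"],
    "/path/to/images/my_cat_1.jpg": ["my_cat"],
}

with open("caption_eval.json", "w") as f:
    json.dump(eval_entries, f, indent=4)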

Evaluation on Question Answering

python eval/VQA.py --eval-file eval/yollava-visual-qa.json --model-path Hoar012/RAP-LLaVA-13b --retrieval --database /path/to/database --topK 1

Replace /path/to/output_file with the path to your output file, then run the following command to obtain the accuracy:

python eval/eval_qa.py --output_path /path/to/output_file

Evaluation on Visual Recognition

python eval/recognition.py --eval-file eval/recognition_test.json --model-path Hoar012/RAP-LLaVA-13b --retrieval --database /path/to/database --topK 1

BibTeX

@InProceedings{Hao_2025_CVPR,
    author    = {Hao, Haoran and Han, Jiaming and Li, Changsheng and Li, Yu-Feng and Yue, Xiangyu},
    title     = {RAP: Retrieval-Augmented Personalization for Multimodal Large Language Models},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {14538-14548}
}

Acknowledgement

LLaVA, MyVLM, YoLLaVA
