# IDMR

The official repo for IDMR: Towards Instance-Driven Precise Visual Correspondence in Multimodal Retrieval.
## Installation

```bash
git clone https://github.com/BwLiu01/IDMR.git
cd IDMR
pip install -r requirements.txt
```

## Inference

- Inference examples with Gradio: IDMR-Demo
- Inference locally:

  ```bash
  python inference.py
  ```

## Dataset

We release both the training and test splits of IDMR on Hugging Face Datasets:
- Training Set: 🤗 lbw18601752667/IDMR-train
- Test Set: 🤗 lbw18601752667/IDMR-test
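Both splits are standard Hugging Face datasets, so they can be loaded with the `datasets` library. A minimal sketch; the field names (`qry`, `pos_text`) and the `build_pair` helper are illustrative assumptions, not the dataset's documented schema:

```python
def build_pair(example):
    # Hypothetical helper: the field names "qry" / "pos_text" are
    # assumptions for illustration, not IDMR's documented schema.
    return {"query": example["qry"], "positive": example["pos_text"]}

if __name__ == "__main__":
    # Requires `pip install datasets` and network access to the Hub;
    # the split name "train" is also assumed here.
    from datasets import load_dataset

    train = load_dataset("lbw18601752667/IDMR-train", split="train")
    pairs = train.map(build_pair)
```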
## Training

Run the following script to train IDMR:

```bash
MODEL_NAME=OpenGVLab/InternVL2_5-8B
DATA_DIR=./data/IDMR/train/parquet
IMAGE_DIR=./data/IDMR/train/images
OUTPUT_DIR=./ckpt/IDMR-8B
WANDB_API_KEY=YOUR_WANDB_API_KEY
wandb login --relogin $WANDB_API_KEY

torchrun --nproc_per_node=8 --master_port=22459 --max_restarts=0 train.py \
  --model_name $MODEL_NAME --model_backbone internvl_2_5 --bf16 --pooling last \
  --dataset_name $DATA_DIR \
  --lora_target_modules qkv,wqkv,wo,w1,w2,w3 \
  --subset_name MMEB_train IDMR_train_coco IDMR_train_objects365 IDMR_train_openimages \
  --image_dir $IMAGE_DIR \
  --max_len 1024 --output_dir $OUTPUT_DIR --logging_steps 20 \
  --lr_scheduler_type linear --learning_rate 2e-5 --num_train_epochs 1 \
  --warmup_steps 120 --save_steps 100 --normalize True \
  --temperature 0.02 --per_device_train_batch_size 64 \
  --lora --lora_r 8 \
  --grad_cache True --gc_q_chunk_size 8 --gc_p_chunk_size 8 --wandb True
```

## Evaluation

Please run `eval.sh`.
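The command above trains with a temperature-scaled contrastive objective over normalized embeddings (`--normalize True --temperature 0.02`). A minimal NumPy sketch of that InfoNCE-style loss with in-batch negatives, written as an illustration rather than the repository's exact implementation:

```python
import numpy as np

def info_nce_loss(q, p, temperature=0.02):
    """InfoNCE over in-batch negatives: each query's positive is the
    same-index passage; every other passage in the batch is a negative."""
    # L2-normalize so the dot product is cosine similarity (--normalize True).
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    logits = q @ p.T / temperature          # (B, B) similarity matrix
    labels = np.arange(len(q))              # positives sit on the diagonal
    # Numerically stable log-softmax, then cross-entropy on the diagonal.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[labels, labels].mean()
```

With a small temperature like 0.02, even modest cosine gaps between the positive and the negatives translate into sharply peaked softmax targets, which is why the loss is near zero whenever each query matches its own passage far better than the others.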
- Evaluates both in-domain and out-of-domain splits.
- Evaluation data can be directly downloaded from IDMR-test.
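Retrieval evaluation of this kind boils down to ranking candidate embeddings by cosine similarity for each query and checking where the ground-truth target lands. A self-contained Recall@k sketch under that setup, as an illustration rather than the repository's `eval.sh` logic:

```python
import numpy as np

def recall_at_k(query_embs, cand_embs, gt_indices, k=1):
    """Fraction of queries whose ground-truth candidate appears among the
    top-k candidates ranked by cosine similarity."""
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    c = cand_embs / np.linalg.norm(cand_embs, axis=1, keepdims=True)
    sims = q @ c.T                           # (num_queries, num_candidates)
    topk = np.argsort(-sims, axis=1)[:, :k]  # indices of the k best candidates
    hits = [gt in row for gt, row in zip(gt_indices, topk)]
    return float(np.mean(hits))
```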
## Citation

```bibtex
@article{liu2025idmr,
  title   = {IDMR: Towards Instance-Driven Precise Visual Correspondence in Multimodal Retrieval},
  author  = {Bangwei Liu and Yicheng Bao and Shaohui Lin and Xuhong Wang and Xin Tan and Yingchun Wang and Yuan Xie and Chaochao Lu},
  journal = {arXiv preprint arXiv:2504.00954},
  year    = {2025}
}
```

## Acknowledgement

We have adapted code from VLM2Vec, a comprehensive implementation for transforming MLLMs into embedding models.