
Describe, Adapt and Combine: Empowering CLIP Encoders for Open-set 3D Object Retrieval

Zhichuan Wang¹ · Yang Zhou² · Zhe Liu³ · Rui Yu⁴ · Song Bai⁵
Yulong Wang¹* · Xinwei He¹* · Xiang Bai⁶

¹Huazhong Agricultural University  ²Shenzhen University  ³The University of Hong Kong
⁴University of Louisville  ⁵ByteDance  ⁶Huazhong University of Science and Technology

ICCV 2025

[Paper]


Abstract

Open-set 3D object retrieval (3DOR) is an emerging task aiming to retrieve 3D objects of unseen categories beyond the training set. Existing methods typically utilize all modalities (i.e., voxels, point clouds, multi-view images) and train specific backbones before fusion. However, they still struggle to produce generalized representations due to insufficient 3D training data. Being contrastively pre-trained on web-scale image-text pairs, CLIP inherently produces generalized representations for a wide range of downstream tasks. Building upon it, we present a simple yet effective framework named Describe, Adapt and Combine (DAC) by taking only multi-view images for open-set 3DOR. DAC innovatively synergizes a CLIP model with a multi-modal large language model (MLLM) to learn generalized 3D representations, where the MLLM is used for dual purposes. First, it describes the seen category information to align with CLIP's training objective for adaptation during training. Second, it provides external hints about unknown objects complementary to visual cues during inference. To improve the synergy, we introduce an Additive-Bias Low-Rank adaptation (AB-LoRA), which alleviates overfitting and further enhances the generalization to unseen categories. With only multi-view images, DAC significantly surpasses prior arts by an average of +10.01% mAP on four open-set 3DOR datasets. Moreover, its generalization is also validated on image-based and cross-dataset setups.
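
The AB-LoRA module is only named in the abstract above; its full formulation is given in the paper and in this repository's code. As a rough, illustrative PyTorch sketch (not the actual implementation), the name suggests a standard low-rank update applied to a frozen CLIP linear layer combined with a learnable additive bias:

# Illustrative sketch of an "additive-bias" LoRA layer, guessed from the name only.
# The real AB-LoRA module lives in this repository and may differ substantially.
import torch
import torch.nn as nn

class ABLoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base                                   # frozen CLIP projection
        for p in self.base.parameters():
            p.requires_grad = False
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.add_bias = nn.Parameter(torch.zeros(base.out_features))  # additive bias (assumed)
        self.scale = alpha / r

    def forward(self, x):
        # frozen path + scaled low-rank update + additive bias
        return self.base(x) + self.scale * (x @ self.lora_A.t() @ self.lora_B.t()) + self.add_bias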

Data Preparation

To download the four datasets (OS-ESB-core, OS-NTU-core, OS-MN40-core, and OS-ABO-core) for the Open-Set Retrieval task, please refer to HGM2R. To download the Objaverse dataset, please refer to OpenShape.
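
As noted in the abstract, DAC consumes only the multi-view renderings of each object. Purely as an illustration (the directory layout, image format, and view count below are assumptions, not the datasets' actual structure), stacking one object's views into a single tensor with CLIP preprocessing might look like:

# Illustrative only: load one object's multi-view renderings with CLIP preprocessing.
# The per-object directory and *.png naming are hypothetical.
from pathlib import Path
from PIL import Image
import torch
import clip

_, preprocess = clip.load("ViT-B/32", device="cpu")

def load_views(obj_dir: str) -> torch.Tensor:
    paths = sorted(Path(obj_dir).glob("*.png"))
    views = [preprocess(Image.open(p).convert("RGB")) for p in paths]
    return torch.stack(views)  # (num_views, 3, 224, 224)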

📄 Requirements

  • Set up the conda environment:
# Create a conda environment
conda create -y -n dac python=3.9

# Activate the environment
conda activate dac
  • Clone the DAC code repository and install the requirements:
git clone https://github.com/wangzhichuan123/DAC.git

cd DAC/

# Install requirements
pip install -r requirements.txt
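
Optionally, you can sanity-check the installation by loading a CLIP backbone. This assumes requirements.txt installs OpenAI's clip package; adjust accordingly if the repository uses a different CLIP distribution:

# Optional sanity check: verify that a CLIP backbone loads in the new environment.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
text = clip.tokenize(["a 3D rendering of a chair"]).to(device)
with torch.no_grad():
    feats = model.encode_text(text)
print(feats.shape)  # torch.Size([1, 512]) for ViT-B/32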

🚀 Running

Run bash scripts/run.sh [dataset] [backbone] [rank] [gpu_id] to run DAC, where [dataset] is one of the four benchmarks, [backbone] is the CLIP vision backbone, [rank] is the low-rank adaptation rank, and [gpu_id] selects the GPU, e.g.:

bash scripts/run.sh esb ViT-B/32 8 0

If you have already extracted and saved the features, you can use the following command to test DAC directly:

python test.py --dataset esb --backbone ViT-B/32 --r 8
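
For reference, testing from pre-extracted features amounts to ranking the gallery by similarity and scoring the ranking with mAP. The following is a generic, self-contained sketch of that kind of evaluation (cosine similarity plus mean average precision), not the repository's test.py:

# Generic retrieval evaluation sketch: cosine similarity + mean average precision.
# Shown only to illustrate what feature-based testing typically computes.
import numpy as np

def mean_average_precision(query_feats, gallery_feats, query_labels, gallery_labels):
    # L2-normalize so the dot product equals cosine similarity
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = q @ g.T                                  # (num_query, num_gallery)
    aps = []
    for i in range(sims.shape[0]):
        order = np.argsort(-sims[i])                # rank gallery items for query i
        rel = (gallery_labels[order] == query_labels[i]).astype(np.float32)
        if rel.sum() == 0:
            continue
        prec_at_k = np.cumsum(rel) / (np.arange(len(rel)) + 1)
        aps.append(float((prec_at_k * rel).sum() / rel.sum()))
    return float(np.mean(aps))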

⭐ Citation

If you find our work useful, please consider citing our paper:

@article{wang2025describe,
  title={Describe, Adapt and Combine: Empowering CLIP Encoders for Open-set 3D Object Retrieval},
  author={Wang, Zhichuan and Zhou, Yang and Liu, Zhe and Yu, Rui and Bai, Song and Wang, Yulong and He, Xinwei and Bai, Xiang},
  journal={arXiv preprint arXiv:2507.21489},
  year={2025}
}
