CVPR 2025
Chunlin Yu*
Hanqing Wang*
Ye Shi
Haoyang Luo
Sibei Yang
Jingyi Yu
Jingya Wang
ShanghaiTech University
*Indicates Equal Contribution
📖 Project Page | 📄 Paper Link
We introduce SeqAfford, a Multimodal Large Language Model (MLLM) capable of the serialized affordance inference implied by human instructions: 1) Single Affordance Reasoning; 2) Sequential Affordance Reasoning; 3) Sequential Affordance Reasoning with Multiple Objects.
I quit the graduate program at ShanghaiTech University in March 2025 while serving as co-first author, and therefore do not have the final version of the code. The code in this repository is my preliminary code from around September 2024, with some details potentially missing; I have uploaded it here to help the community follow up on our research. My friend Zhenhao Zhang at ShanghaiTech has used this model for HOI generation, so you can check the details in his repo OpenHOI, which was accepted by NeurIPS 2025 as an oral paper. SeqAffordSplat is also a follow-up work.
- [12/11/2025] Please check the repo OpenHOI for the full details of SeqAfford!!!
- [2/27/2025] 🎉🎉🎉SeqAfford has been accepted by CVPR 2025!!!🎉🎉🎉
- [12/2/2024] SeqAfford has been released on arXiv now!!!
Please refer to our homepage for more thrilling results!
- Create a new conda environment and activate it with the following command:

```bash
conda env create -f environment.yaml
```
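Note that `conda env create` builds the environment but does not activate it; activate it with the name defined in `environment.yaml` (the name `seqafford` below is an assumption, so check the `name:` field of that file):

```bash
# Assumed environment name; use the "name:" field from environment.yaml.
conda activate seqafford
```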
- Download the ShapeLLM model weights into your directory, and modify the model paths in `scripts/finetune_lora.sh`, including both `--vision_tower_path` and `--pretrain_mm_mlp_adapter`.
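If you want to locate the exact lines to edit, a quick grep works; the two flag names come straight from the script:

```bash
# Print the lines in the training script that reference the ShapeLLM paths
# (-e lets the search patterns start with dashes).
grep -n -e '--vision_tower_path' -e '--pretrain_mm_mlp_adapter' scripts/finetune_lora.sh
```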
- Download the Uni3D model weights into your directory, and modify the model path in `./llava/model/language_model/affordancellm.py`.
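Similarly, you can locate where the Uni3D checkpoint path is set; the search string `uni3d` is an assumption about how the path is named in that file:

```bash
# Assumed search string; case-insensitive match for the Uni3D checkpoint path.
grep -in 'uni3d' ./llava/model/language_model/affordancellm.py
```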
- Visit the link to download the Dataset.
- You can train your own model by running the following command:

```bash
sh ./scripts/finetune_lora.sh
```
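If you need to restrict training to particular GPUs, the standard CUDA environment variable works with the launch script; this is a generic CUDA mechanism, not a repo-specific option:

```bash
# Run the LoRA fine-tuning on GPUs 0 and 1 only.
CUDA_VISIBLE_DEVICES=0,1 sh ./scripts/finetune_lora.sh
```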
- [x] Paper Released.
- [x] Source Code and Pretrained Weights.
- [x] Dataset.
Thanks to the wonderful works ShapeLLM and LISA; this work is built upon them.
For academic use, this project is licensed under the 2-clause BSD License.
```bibtex
@article{yu2024seqafford,
  title={SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model},
  author={Yu, Chunlin and Wang, Hanqing and Shi, Ye and Luo, Haoyang and Yang, Sibei and Yu, Jingyi and Wang, Jingya},
  journal={arXiv preprint arXiv:2412.01550},
  year={2024}
}
```
