CVPR 2025
Chunlin Yu*
Hanqing Wang*
Ye Shi
Haoyang Luo
Sibei Yang
Jingyi Yu
Jingya Wang
ShanghaiTech University
*Indicates Equal Contribution
📖 Project Page | 📄 Paper Link
We introduce SeqAfford, a Multimodal Large Language Model (MLLM) capable of the serialized affordance inference implied by human instructions: 1) Single Affordance Reasoning; 2) Sequential Affordance Reasoning; 3) Sequential Affordance Reasoning with Multiple Objects.
I quit the graduate program at ShanghaiTech University in March 2025 while serving as co-first author, and therefore do not have the final version of the code. The code in this repository is my preliminary code from around September 2024, with some details potentially missing; I have uploaded it here to help the community follow up on our research. My friend Zhenhao Zhang at ShanghaiTech has used this model for HOI generation, so you can check the details in his repo OpenHOI, which was accepted by NeurIPS 2025 as an oral paper. SeqAffordSplat is also a follow-up work.
- [12/11/2025] Please check the repo OpenHOI for the full details of SeqAfford!!!
- [2/27/2025] 🎉🎉🎉SeqAfford has been accepted by CVPR 2025!!!🎉🎉🎉
- [12/2/2024] SeqAfford has been released on arXiv now!!!
Please refer to our homepage for more thrilling results!
- Create a new conda environment and activate it with the following command:

```bash
conda env create -f environment.yaml
```
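Note that `conda env create` builds the environment but does not activate it; activate it with the name defined in `environment.yaml` (the name `seqafford` below is an assumption, so check the `name:` field of that file):

```bash
# Assumed environment name; use the "name:" field from environment.yaml.
conda activate seqafford
```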
- Download the ShapeLLM model weights into your directory, and modify the model paths in `scripts/finetune_lora.sh`, including both `--vision_tower_path` and `--pretrain_mm_mlp_adapter`.
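If you want to locate the exact lines to edit, a quick grep works; the two flag names come straight from the script:

```bash
# Print the lines in the training script that reference the ShapeLLM paths
# (-e lets the search patterns start with dashes).
grep -n -e '--vision_tower_path' -e '--pretrain_mm_mlp_adapter' scripts/finetune_lora.sh
```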
- Download the Uni3D model weights into your directory, and modify the model path in `./llava/model/language_model/affordancellm.py`.
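Similarly, you can locate where the Uni3D checkpoint path is set; the search string `uni3d` is an assumption about how the path is named in that file:

```bash
# Assumed search string; case-insensitive match for the Uni3D checkpoint path.
grep -in 'uni3d' ./llava/model/language_model/affordancellm.py
```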
- Visit the link to download the Dataset.
- You can train your own model by running the following command:

```bash
sh ./scripts/finetune_lora.sh
```
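If you need to restrict training to particular GPUs, the standard CUDA environment variable works with the launch script; this is a generic CUDA mechanism, not a repo-specific option:

```bash
# Run the LoRA fine-tuning on GPUs 0 and 1 only.
CUDA_VISIBLE_DEVICES=0,1 sh ./scripts/finetune_lora.sh
```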
- [x] Paper Released.
- [x] Source Code and Pretrained Weights.
- [x] Dataset.
Thanks to the wonderful works ShapeLLM and LISA; this work is built upon them.
For academic use, this project is licensed under the 2-clause BSD License.
```bibtex
@article{yu2024seqafford,
  title={SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model},
  author={Yu, Chunlin and Wang, Hanqing and Shi, Ye and Luo, Haoyang and Yang, Sibei and Yu, Jingyi and Wang, Jingya},
  journal={arXiv preprint arXiv:2412.01550},
  year={2024}
}
```
