SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model

CVPR 2025
Chunlin Yu*  Hanqing Wang*  Ye Shi  Haoyang Luo  Sibei Yang  Jingyi Yu  Jingya Wang 
ShanghaiTech University
*Indicates Equal Contribution

📖 Project Page | 📄 Paper Link

We introduce SeqAfford, a Multimodal Large Language Model (MLLM) capable of the sequential affordance reasoning implied in human instructions: 1) Single Affordance Reasoning; 2) Sequential Affordance Reasoning; 3) Sequential Affordance Reasoning with Multiple Objects.

I quit the graduate program at ShanghaiTech University in March 2025 while being co-first author, and therefore do not have the final version of the code. The code in this repository is my preliminary code from around September 2024, with some details potentially missing; I have uploaded it here so the community can follow up on our research. My friend Zhenhao Zhang @ShanghaiTech has used this model for HOI generation, so you can check the details in his repo OpenHOI, which was accepted to NeurIPS 2025 as an oral paper. SeqAffordSplat is also a follow-up work.

📣 News

  • [12/11/2025] Please check the repo OpenHOI for the full details of SeqAfford!!!
  • [2/27/2025] 🎉🎉🎉SeqAfford has been accepted by CVPR 2025!!!🎉🎉🎉
  • [12/2/2024] SeqAfford has been released on arXiv now!!!

😲 Results

Please refer to our homepage for more thrilling results!

🛠️ Setup

    1. Create a new conda environment and activate it with the following command:
       conda env create -f environment.yaml
    2. Download the ShapeLLM model weights into your directory, and modify the model paths in scripts/finetune_lora.sh, including both --vision_tower_path and --pretrain_mm_mlp_adapter (see the sketch after this list).
    3. Download the Uni3D model weights into your directory, and modify the model path in ./llava/model/language_model/affordancellm.py.
    4. Train your own model by running:
       sh ./scripts/finetune_lora.sh
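
For reference, here is a consolidated sketch of the steps above. The environment name and the checkpoints/ layout are placeholders (assumptions, not the repo's actual structure), so adjust them to wherever you store the ShapeLLM and Uni3D weights.

    # Illustrative setup sequence; the environment name and checkpoint
    # directories below are placeholders, not the repo's actual layout.
    conda env create -f environment.yaml
    conda activate seqafford            # environment name assumed; check environment.yaml

    # Place the downloaded ShapeLLM and Uni3D weights somewhere local, e.g.:
    mkdir -p checkpoints/shapellm checkpoints/uni3d

    # Then edit scripts/finetune_lora.sh so that
    #   --vision_tower_path        points to your ShapeLLM vision tower checkpoint
    #   --pretrain_mm_mlp_adapter  points to your ShapeLLM mm projector checkpoint
    # and edit ./llava/model/language_model/affordancellm.py so the Uni3D
    # checkpoint path points to your file under checkpoints/uni3d/.

    # Finally, launch LoRA fine-tuning:
    sh ./scripts/finetune_lora.sh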

📚 Data

Visit the link to download the dataset.

🚩 Plan

  • [√] Paper Released.
  • [√] Source Code and Pretrained Weights.
  • [√] Dataset.

Acknowledgement

Thanks to the wonderful works ShapeLLM and LISA; this work is built upon them.

🎫 License

For academic use, this project is licensed under the 2-clause BSD License.

🖊️ Citation

@article{yu2024seqafford,
  title={SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model},
  author={Yu, Chunlin and Wang, Hanqing and Shi, Ye and Luo, Haoyang and Yang, Sibei and Yu, Jingyi and Wang, Jingya},
  journal={arXiv preprint arXiv:2412.01550},
  year={2024}
}
