
IJCV2024: Towards Training-free Open-world Segmentation via Image Prompt Foundation Models

This repository is the official PyTorch implementation of our IPSeg framework. [arXiv]

Abstract

The realm of computer vision has witnessed a paradigm shift with the advent of foundational models, mirroring the transformative influence of large language models in the domain of natural language processing. This paper delves into the exploration of open-world segmentation, presenting a novel approach called Image Prompt Segmentation (IPSeg) that harnesses the power of vision foundational models. At the core of IPSeg lies a training-free paradigm that capitalizes on image prompt techniques. Specifically, IPSeg utilizes a single image containing a subjective visual concept as a flexible prompt to query vision foundation models like DINOv2 and Stable Diffusion. Our approach extracts robust features for the prompt image and input image, then matches the input representations to the prompt representations via a novel feature interaction module to generate point prompts highlighting target objects in the input image. The generated point prompts are further utilized to guide the Segment Anything Model to segment the target object in the input image. The proposed method stands out by eliminating the need for exhaustive training sessions, thereby offering a more efficient and scalable solution. Experiments on COCO, PASCAL VOC, and other datasets demonstrate IPSeg's efficacy for flexible open-world segmentation using intuitive image prompts. This work pioneers tapping foundation models for open-world understanding through visual concepts conveyed in images.
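
For intuition, here is a minimal sketch of the matching idea described above: average the prompt image's features inside its target mask into a query vector, correlate it with every input-image location, and take the best match as a positive point prompt for SAM. The function name, shapes, and single-point output are hypothetical simplifications, not the paper's actual feature interaction module:

import torch
import torch.nn.functional as F

def match_to_point_prompt(prompt_feat, prompt_mask, input_feat):
    # prompt_feat: (C, Hp, Wp) features of the prompt image
    # prompt_mask: (Hp, Wp) binary mask of the target concept in the prompt image
    # input_feat:  (C, Hi, Wi) features of the input image
    # Average the prompt features inside the mask into one query vector ...
    query = prompt_feat.flatten(1)[:, prompt_mask.flatten().bool()].mean(dim=1)
    # ... correlate it with every input-image location ...
    sim = F.cosine_similarity(input_feat.flatten(1), query[:, None], dim=0)
    sim = sim.view(input_feat.shape[1], input_feat.shape[2])  # (Hi, Wi) heatmap
    # ... and take the most similar location as a positive point prompt.
    y, x = divmod(sim.argmax().item(), sim.shape[1])
    return x, y  # pixel coordinate to hand to SAM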

News

  • [2024.07.16] The paper is accepted by IJCV and this GitHub repo is created.
  • [2024.10.21] The code is released.

Framework

Visual Results

Qualitative Comparison.

Robust Image Prompts.

How Does IPSeg Work?

Complete Composed Feature (Stable Diffusion and DINOv2).
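
A minimal sketch of how such a composed feature can be built, assuming the sd-dino-style recipe of channel-wise L2 normalization followed by concatenation; the exact shapes and normalization in this repo may differ:

import torch
import torch.nn.functional as F

def compose_features(sd_feat, dino_feat):
    # sd_feat:   (C1, H, W) Stable Diffusion features of an image
    # dino_feat: (C2, H, W) DINOv2 features of the same image (same spatial size assumed)
    # L2-normalize each map along the channel dimension so neither backbone
    # dominates the similarity, then concatenate per spatial location.
    sd_feat = F.normalize(sd_feat, dim=0)
    dino_feat = F.normalize(dino_feat, dim=0)
    return torch.cat([sd_feat, dino_feat], dim=0)  # (C1 + C2, H, W)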

Environment Setup

To install the required dependencies, use the following commands:

conda create -n ipseg python=3.9
conda activate ipseg
conda install pytorch=1.13.1 torchvision=0.14.1 pytorch-cuda=11.6 -c pytorch -c nvidia
conda install -c "nvidia/label/cuda-11.6.1" libcusolver-dev
git clone [email protected]:luckybird1994/IPSeg.git
cd IPSeg
pip install -e .

Tip: if you get network errors when downloading from Hugging Face, try: export HF_ENDPOINT=https://hf-mirror.com.
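
A quick sanity check that the installed environment matches the versions pinned above:

# Quick verification of the pinned environment.
import torch, torchvision
print(torch.__version__)          # expected: 1.13.1
print(torchvision.__version__)    # expected: 0.14.1
print(torch.cuda.is_available())  # expected: True on a CUDA 11.6 machine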

Get Started

  1. Download the PerSeg Dataset and Checkpoint.
  2. Get the saliency maps of the PerSeg dataset:
sh scripts/perseg_sod.sh
  3. Get the Stable Diffusion & DINOv2 features of the PerSeg dataset:
sh scripts/perseg_feat.sh
  4. Get the segmentation maps of the PerSeg dataset under one referring-image group (a minimal SAM prompting sketch follows this list):
sh scripts/perseg_test.sh
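
Under the hood, the final step feeds the generated point prompts into SAM. The repo's scripts handle this internally; for orientation, here is a minimal sketch using the official segment-anything API, with a placeholder checkpoint path, a dummy input image, and a made-up point coordinate:

import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Placeholder path; use the SAM checkpoint downloaded in step 1.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = np.zeros((512, 512, 3), dtype=np.uint8)  # stand-in for a real RGB image
predictor.set_image(image)

point = np.array([[256, 256]])  # (x, y) point prompt from the matching step
masks, scores, _ = predictor.predict(
    point_coords=point,
    point_labels=np.array([1]),  # 1 marks a foreground (positive) point
    multimask_output=False,
)
# masks: (1, 512, 512) boolean segmentation of the prompted object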

Citation

If you find our work useful, please cite:

@article{DBLP:journals/corr/abs-2310-10912,
  author       = {Lv Tang and Peng-Tao Jiang and Haoke Xiao and Bo Li},
  title        = {Towards Training-free Open-world Segmentation via Image Prompting Foundation Models},
  journal      = {CoRR},
  volume       = {abs/2310.10912},
  year         = {2023}
}

Acknowledgement

This repo benefits from sd-dino, SAM and TSDN. Our heartfelt gratitude goes to the developers of these resources!

Contact

Feel free to open issues here or send us e-mails ([email protected], [email protected], [email protected], [email protected]).
