- Base: Pink_Base
- Base_Object365: Pink_Object365
- Base_RefCOCO: Pink_Refcoco
The pretraining dataset used in this release is the same as in LLaVA, which is a subset of the CC-3M dataset. Please see here for a detailed description of the dataset structure and how to download the images.
The datasets listed below need to be downloaded manually.
- COCO: train2017
- VisualGenome: part1, part2, objects, relationships, region descriptions
- Object365: Object365
- A-OKVQA: A-OKVQA
- LLaVA-158K: LLaVA-158K
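Since all of these datasets are fetched by hand, it can help to sanity-check the local layout before training. The sketch below does that; the directory names are assumptions for illustration, not the layout this repository requires, so adjust them to match your setup.

```python
import os

# Hypothetical expected sub-directories after manual download.
# These names are assumptions -- adapt them to your local layout.
EXPECTED = [
    "coco/train2017",
    "visual_genome/VG_100K",
    "visual_genome/VG_100K_2",
    "object365",
    "aokvqa",
    "llava_158k",
]

def missing_datasets(root):
    """Return the expected sub-directories that are absent under `root`."""
    return [p for p in EXPECTED if not os.path.isdir(os.path.join(root, p))]
```

Running `missing_datasets("/path/to/data")` before kicking off a training script gives an early, readable failure instead of a mid-epoch file-not-found error.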
We also provide the converted datasets used for instruction tuning:
Our model is based on Llama-2-7b-chat-hf. You need to download the weights manually.
- Llama-2-7b-chat-hf: Llama-2-7b-chat-hf
- Install Package
conda create -n pink python=3.10 -y
conda activate pink
pip install --upgrade pip # enable PEP 660 support
pip install -e .
Please refer to scripts/stage1.sh.
Please refer to scripts/stage2.sh.
Please refer to scripts/stage2_with_object365.sh.
We convert the *.json annotation files of Object365. Please refer to dataset_generation/object365_detection.py.
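The conversion step above turns detection annotations into instruction-tuning records. As a rough illustration, here is a minimal sketch of converting one COCO-style annotation; the conversation template and the coordinate convention (boxes normalized to [0, 1], two decimal places) are assumptions for illustration, not necessarily the exact format produced by object365_detection.py.

```python
def to_instruction_sample(image_info, annotation, category_name):
    """Convert one COCO-style detection annotation into an
    instruction-tuning style record.

    Assumed conventions (not confirmed by the repo): boxes are
    emitted as text, normalized to [0, 1] with two decimals.
    """
    w, h = image_info["width"], image_info["height"]
    x, y, bw, bh = annotation["bbox"]  # COCO boxes are [x, y, width, height]
    box = [x / w, y / h, (x + bw) / w, (y + bh) / h]
    box_text = "[{:.2f},{:.2f},{:.2f},{:.2f}]".format(*box)
    return {
        "image": image_info["file_name"],
        "conversations": [
            {"from": "human", "value": f"Where is the {category_name} in the image?"},
            {"from": "gpt", "value": box_text},
        ],
    }
```

Emitting coordinates as normalized text keeps the target sequence independent of image resolution, which is why many referential-comprehension pipelines adopt a similar convention.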
Please refer to scripts/object365_generate.sh.
Please refer to scripts/object365_filter.py.
Please refer to inference.ipynb and scripts/eval_refcoco.sh.
To launch a Gradio web demo, use the following command.
python demo.py --checkpoint-path /path/to/pink --llama-path /path/to/llama2
If you find Pink useful for your research and applications, please cite using this BibTeX:
@article{xuan2023pink,
title={Pink: Unveiling the power of referential comprehension for multi-modal {LLMs}},
author={Xuan, Shiyu and Guo, Qingpei and Yang, Ming and Zhang, Shiliang},
journal={arXiv preprint arXiv:2310.00582},
year={2023}
}
