
Pink: Unveiling The Power of Referential Comprehension for Multi-modal LLMs.

Contents

Pink Weights

Data Download

Pretraining Dataset

The pretraining dataset used in this release is the same as in LLaVA: a subset of the CC-3M dataset. Please see the LLaVA repository for a detailed description of the dataset structure and how to download the images.

Instruction Tuning Dataset

(Figure: overview of the instruction tuning datasets.)

The datasets mentioned in the image need to be downloaded manually.

We also provide the converted dataset used for instruction tuning:

https://huggingface.co/datasets/SY-Xuan/Pink_sft/
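One way to fetch the converted dataset is to clone the Hugging Face dataset repository (a sketch assuming `git` and `git-lfs` are installed):

```shell
# Clone the instruction tuning dataset from Hugging Face.
git lfs install
git clone https://huggingface.co/datasets/SY-Xuan/Pink_sft
```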

LLaMA2 Weight Download

Our model is based on Llama-2-7b-chat-hf. You need to download the weights manually.
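For example, the weights can be fetched with the Hugging Face CLI (a sketch assuming `huggingface_hub` is installed and you have accepted the Llama 2 license on Hugging Face; the target directory is a placeholder):

```shell
# Authenticate, then download the Llama-2-7b-chat-hf weights locally.
huggingface-cli login
huggingface-cli download meta-llama/Llama-2-7b-chat-hf --local-dir /path/to/llama2
```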

Install

  1. Install the package:

```shell
conda create -n pink python=3.10 -y
conda activate pink
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
```

Training

Stage 1

Please refer to scripts/stage1.sh.

Stage 2

Please refer to scripts/stage2.sh.

Stage 2 with Object365

Please refer to scripts/stage2_with_object365.sh.

Self-consistent Bootstrapping

We convert the *.json annotation files of Object365. Please refer to dataset_generation/object365_detection.py.
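The conversion can be sketched as follows. This is a hypothetical illustration, not the exact logic of dataset_generation/object365_detection.py: it assumes the annotations follow the COCO detection format (`images`, `categories`, `annotations` with `bbox` as `[x, y, width, height]`), and the prompt template and normalized-box answer format are placeholders.

```python
def to_instruction_samples(coco):
    """Convert COCO-style detection annotations into (instruction, answer) pairs.

    Boxes are normalized to [0, 1] as [x1, y1, x2, y2]; the conversation
    template here is illustrative only.
    """
    images = {img["id"]: img for img in coco["images"]}
    cats = {c["id"]: c["name"] for c in coco["categories"]}
    samples = []
    for ann in coco["annotations"]:
        img = images[ann["image_id"]]
        x, y, w, h = ann["bbox"]  # COCO boxes are [x, y, width, height]
        box = [
            round(x / img["width"], 3),
            round(y / img["height"], 3),
            round((x + w) / img["width"], 3),
            round((y + h) / img["height"], 3),
        ]
        samples.append({
            "image": img["file_name"],
            "conversation": [
                {"from": "human", "value": f"Where is the {cats[ann['category_id']]}?"},
                {"from": "gpt", "value": str(box)},
            ],
        })
    return samples
```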

Bootstrapping

Please refer to scripts/object365_generate.sh.

Self-consistent

Please refer to scripts/object365_filter.py.
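The idea of a self-consistency filter can be sketched as below: keep a bootstrapped sample only if the box the model predicts when re-queried overlaps the original box sufficiently. The IoU computation is standard, but the pairing scheme and the 0.5 threshold are assumptions for illustration, not the exact criterion of scripts/object365_filter.py.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def filter_consistent(pairs, threshold=0.5):
    """Keep (generated_box, re_predicted_box) pairs whose IoU passes the threshold."""
    return [p for p in pairs if iou(*p) >= threshold]
```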

Evaluation

Please refer to inference.ipynb and scripts/eval_refcoco.sh.

Demo

To launch a Gradio web demo, use the following command:

```shell
python demo.py --checkpoint-path /path/to/pink --llama-path /path/to/llama2
```

Citation

If you find Pink useful for your research and applications, please cite using this BibTeX:

@article{xuan2023pink,
  title={Pink: Unveiling the power of referential comprehension for multi-modal llms},
  author={Xuan, Shiyu and Guo, Qingpei and Yang, Ming and Zhang, Shiliang},
  journal={arXiv preprint arXiv:2310.00582},
  year={2023}
}

Acknowledgement

  • LLaVA: a codebase we refer to.
  • Shikra: a codebase we refer to.