VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding
🚀 Welcome to the official repository of VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding!
Applying in-context learning to video-language tasks is challenging because video LMMs have limited context length, and videos require far more tokens than images. To address this, we propose VideoICL, a novel video in-context learning framework for OOD video understanding tasks that extends the effective context length without incurring high costs.
Our VideoICL implementation includes the following key features:
- ✅ Similarity-based Example Selection: Selects relevant video-question pairs based on query relevance.
- 🔁 Confidence-based Iterative Inference: Iteratively refines the results until a high-confidence response is obtained.
- 🏆 State-of-the-Art Performance: Outperforms existing baselines, including GPT-4o and Gemini, on multiple benchmarks with a 7B model.
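The confidence-based iterative loop can be sketched roughly as follows. This is a minimal illustration, not the repository's actual implementation: `rank_examples`, `run_inference`, the batch size `k`, and the confidence threshold are all hypothetical stand-ins.

```python
# Sketch of confidence-based iterative inference (hypothetical API).
# Each iteration feeds the next batch of k most-similar examples to the
# model; inference stops early once the answer confidence is high enough.

def videoicl_infer(query, example_pool, rank_examples, run_inference,
                   k=8, threshold=0.9, max_iters=4):
    """Iterate over batches of top-ranked in-context examples until the
    model's answer confidence exceeds `threshold`; return the best answer."""
    ranked = rank_examples(query, example_pool)  # most similar first
    best_answer, best_conf = None, -1.0
    for i in range(max_iters):
        batch = ranked[i * k:(i + 1) * k]  # next k in-context examples
        if not batch:
            break  # example pool exhausted
        answer, conf = run_inference(query, batch)
        if conf > best_conf:
            best_answer, best_conf = answer, conf
        if conf >= threshold:  # high-confidence answer: stop iterating
            break
    return best_answer, best_conf
```

Because low-confidence queries trigger further iterations with fresh examples while easy queries stop after one pass, the framework spends extra compute only where it is needed.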
In this repository, we evaluate the Qwen2-VL-7B model with VideoICL on video classification using the UCF-Crime dataset.
```sh
conda create -n videoicl python=3.10 -y
conda activate videoicl
git clone https://github.com/KangsanKim07/VideoICL.git
cd VideoICL
```

Download the following files to the `data/UCF-Crimes/raw` folder from this link.
- Anomaly-Videos-Part-1~4.zip
- Normal_Videos_for_Event_Recognition.zip
- UCF-Crimes-Train-Test-Split.zip
Then run:

```sh
sh data/UCF-Crimes/preprocess.sh
```

After preprocessing, the `data` folder should look like this:
```
data
└── UCF-Crimes
    ├── raw
    │   ├── Anomaly-Videos-Part-*.zip
    │   ├── Normal_Videos_for_Event_Recognition.zip
    │   ├── UCF-Crimes-Train-Test-Split.zip
    │   └── ...
    ├── videos
    │   ├── Normal_Videos_event
    │   ├── Abuse
    │   ├── Arrest
    │   ├── ...
    │   └── Vandalism
    └── Action_Recognition_splits
        ├── test_001.txt
        ├── test_002.txt
        ├── ...
        ├── train_003.txt
        └── train_004.txt
```
Download the InternVideo2 checkpoint and run:

```sh
sh scripts/extract_visual_feat.sh ${PATH_TO_InternVideo2-stage2_1b-224p-f4.pt}
```

This will generate a file of video features at `data/UCF-Crimes/vid_feat.pkl`.
```sh
sh scripts/get_simrank.sh
```

This will generate similarity rankings for each test video in `data/UCF-Crimes/simrank`.
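Conceptually, the similarity-ranking step orders the training videos by their relevance to each test query. A minimal sketch of one such ranking, assuming pooled per-video feature vectors and cosine similarity (the actual script's feature format and metric may differ):

```python
# Hypothetical sketch: rank training videos by cosine similarity to a query.
import numpy as np

def cosine_rank(query_feat, train_feats):
    """Return indices of training videos sorted by cosine similarity
    to the query feature vector, most similar first."""
    q = query_feat / np.linalg.norm(query_feat)
    t = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    sims = t @ q               # cosine similarity per training video
    return np.argsort(-sims)   # descending similarity
```

The resulting index order is what the iterative inference stage would consume batch by batch.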
```sh
pip install qwen-vl-utils
sh scripts/run_videoicl.sh
```

| Model | #examples | Animal Kingdom | Sports-QA | Pit-VQA | UCF-Crime | Drive&Act | CapERA |
|---|---|---|---|---|---|---|---|
| GPT-4o | 0 | 58.2 | - | 6.9 | 58.0 | - | 0.173 |
| Gemini-1.5 Pro | 0 | 72.9 | - | 14.7 | 55.1 | - | 0.176 |
| LLaVA-Video-72B | 0 | 69.7 | 25.7 | 5.7 | 35.6 | 14.6 | 0.170 |
| LLaVA-Video-7B | 0 | 68.0 | 25.5 | 6.7 | 39.3 | 20.2 | 0.181 |
| +VideoICL | 8 | 72.3 | 47.6 | 61.3 | 53.3 | 53.4 | 0.178 |
| Qwen2-VL-7B | 0 | 58.6 | 26.8 | 5.8 | 36.1 | 10.6 | 0.138 |
| +VideoICL | 8 | 66.3 | 51.5 | 59.6 | 48.7 | 49.3 | 0.189 |
| Oryx-1.5-7B | 0 | 58.6 | 28.3 | 3.8 | 11.9 | 10.7 | 0.151 |
| +VideoICL | 8 | 58.5 | 52.0 | 58.4 | 44.0 | 57.3 | 0.179 |
If you find this work useful, please cite our paper:
```bibtex
@article{kim2024videoicl,
  title={VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding},
  author={Kim, Kangsan and Park, Geon and Lee, Youngwan and Yeo, Woongyeong and Hwang, Sung Ju},
  journal={arXiv preprint arXiv:2412.02186},
  year={2024}
}
```
