PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning

Introduction

PerceptionComp is a benchmark for complex perception-centric video reasoning. It targets questions that cannot be solved from a single frame, a single moment, or a short caption: models must revisit visually complex videos, gather evidence from temporally separated segments, and combine multiple perceptual constraints before answering.

✨ Highlights

Complex perception-centric reasoning instead of caption-level shortcut solving.
1,114 manually annotated five-choice questions.
Seven categories spanning outdoor tour, shopping, sport, variety show, home tour, game, and movie.
Unified workflow for download, local video storage, and evaluation.
Extensible evaluation entry point that supports OpenAI-compatible APIs, Gemini, and custom model runners.

📦 Data Release

PerceptionComp is released in two parts:

GitHub repository: contains benchmark annotations, evaluation code, runner templates, analysis utilities, and documentation.
Hugging Face dataset: stores the benchmark videos referenced by video_id.

📊 Main Results

🚀 Quick Start

Step 1. Clone the Repository

git clone https://github.com/hrinnnn/PerceptionComp.git
cd PerceptionComp

Step 2. Install Dependencies

pip install -r requirements.txt

Step 3. Download the Benchmark Videos

Download the benchmark videos from the Hugging Face dataset using the official helper script:

python scripts/download_data.py --repo-id hrinnnn/PerceptionComp

If the Hugging Face dataset requires authentication:

python scripts/download_data.py \
  --repo-id hrinnnn/PerceptionComp \
  --hf-token YOUR_HF_TOKEN

This script downloads the videos from the Hugging Face data/ directory, flattens the downloaded snapshot into the local layout expected by the evaluator, and validates the result against the official annotation file.
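The flattening step described above can be pictured with a short sketch. This is not the code in scripts/download_data.py; the function name `flatten_videos` and its arguments are hypothetical, and it assumes the downloaded snapshot already sits in a local directory with `.mp4` files nested somewhere beneath it.

```python
import shutil
from pathlib import Path


def flatten_videos(snapshot_dir, target_dir):
    """Move every .mp4 found under snapshot_dir into target_dir.

    Hypothetical helper for illustration; returns the moved video ids
    (file names without the .mp4 suffix).
    """
    src = Path(snapshot_dir)
    dst = Path(target_dir)
    dst.mkdir(parents=True, exist_ok=True)

    moved = []
    # sorted() materializes the listing before any file is moved.
    for video in sorted(src.rglob("*.mp4")):
        destination = dst / video.name
        if destination.exists():
            # Already in place (or a name clash): leave it untouched.
            continue
        shutil.move(str(video), str(destination))
        moved.append(video.stem)
    return moved
```

Skipping files that already exist keeps the sketch safe to re-run after an interrupted download.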

After the script finishes successfully, your local layout is ready for evaluation:

benchmark/
  videos/
    <video_id>.mp4
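If you want to double-check the layout yourself, a minimal sketch along these lines may help. The helper `find_missing_videos` is hypothetical and not part of the repository; it assumes you have already collected the `video_id` values from the annotation file, whose exact format is not described here.

```python
from pathlib import Path


def find_missing_videos(video_ids, video_dir="benchmark/videos"):
    """Return the sorted video ids that have no <video_id>.mp4 in video_dir.

    Hypothetical helper for illustration; video_ids is any iterable of strings.
    """
    root = Path(video_dir)
    return sorted(
        {vid for vid in video_ids if not (root / f"{vid}.mp4").is_file()}
    )
```

An empty list means every referenced video is present in the expected place.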

Step 4. Run Evaluation with a Built-in Backend

PerceptionComp currently supports three evaluation modes:

api: OpenAI-compatible APIs
gemini: Gemini video-upload workflow
custom: your own model runner

Option A. OpenAI-Compatible API

Use this for GPT-style APIs, Qwen API deployments, GLM-compatible endpoints, Doubao-style endpoints, and similar services.

python evaluate/evaluate.py \
  --model YOUR_MODEL_NAME \
  --provider api \
  --api-key YOUR_API_KEY \
  --base-url YOUR_BASE_URL \
  --video-dir benchmark/videos

Optional arguments:

--output-dir: change where results are written
--frames: control the number of sampled frames
--proxy: pass a proxy for API calls
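To give a feel for what a frame budget means, here is a sketch of uniform frame sampling. This is an assumption for illustration only: how the evaluator actually picks frames for `--frames` is not documented here, and the function `uniform_frame_indices` is hypothetical.

```python
def uniform_frame_indices(total_frames, num_frames):
    """Pick num_frames indices spread evenly over a video of total_frames frames.

    Hypothetical helper; takes the midpoint of each equal-width segment.
    """
    if total_frames <= 0 or num_frames <= 0:
        return []
    if num_frames >= total_frames:
        # Fewer frames than requested: use every frame once.
        return list(range(total_frames))
    step = total_frames / num_frames
    return [int(step * i + step / 2) for i in range(num_frames)]


print(uniform_frame_indices(100, 4))  # [12, 37, 62, 87]
```

A larger frame count covers temporally separated segments more densely, at the cost of a larger request.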

Option B. Gemini

python evaluate/evaluate.py \
  --model YOUR_GEMINI_MODEL_NAME \
  --provider gemini \
  --api-key YOUR_GEMINI_API_KEY \
  --video-dir benchmark/videos

Optional arguments:

--force-thinking: retry when <think> tags are missing
--output-dir: change where results are written
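The retry behavior behind `--force-thinking` can be pictured roughly as follows. This is a sketch under assumptions, not the evaluator's implementation: `generate_with_think_retry`, the `generate` callable, and the `max_retries` default are all hypothetical.

```python
import re

# Matches a complete <think>...</think> block, including newlines inside it.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def generate_with_think_retry(generate, prompt, max_retries=2):
    """Call generate(prompt) again while the response lacks a <think> block.

    Hypothetical helper; returns the last response even if no retry succeeded.
    """
    response = generate(prompt)
    for _ in range(max_retries):
        if THINK_BLOCK.search(response):
            break
        response = generate(prompt)
    return response
```

Capping the number of retries keeps a model that never emits reasoning tags from stalling the run.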

Step 5. Check the Outputs

Evaluation outputs are written to:

evaluate/results/Results-<model>.json
evaluate/results/Results-<model>.csv

The JSON file stores per-question predictions and raw responses. The CSV file stores aggregated scores.
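If you want to recompute scores from per-question records yourself, a sketch like the one below may be a starting point. The field names `category`, `prediction`, and `answer` are assumptions made for illustration; check the actual Results-<model>.json file for the real schema before relying on it.

```python
from collections import defaultdict


def accuracy_by_category(records):
    """Compute accuracy per category from a list of per-question dicts.

    Hypothetical helper; the keys "category", "prediction", and "answer"
    are assumed, not taken from the benchmark's output schema.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for record in records:
        category = record["category"]
        totals[category] += 1
        if record["prediction"] == record["answer"]:
            correct[category] += 1
    return {category: correct[category] / totals[category] for category in totals}
```

For reference, random guessing on five-choice questions gives an expected accuracy of 20%.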

🛠️ Evaluate Your Own Model

If your model runs locally, implement a custom runner as follows:

Step 1. Copy the Template

cp evaluate/tools/runners/custom_template.py evaluate/tools/runners/my_model.py

Step 2. Implement the Model Hook

Open evaluate/tools/runners/my_model.py and replace run_your_model(...) with your own inference logic.

Your function should take:

video_path
prompt
model_name
custom_config (optional)

and return:

a raw string response from the model

The simplest recommended output format is:

Answer: A

or, if your model supports reasoning traces:

<think>
your reasoning here
</think>
<answer>
A
</answer>
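As a rough sketch of the hook described above, the stub below takes the four listed arguments and returns a raw string in the tagged format. The body is a placeholder, not real inference, and the type of `custom_config` is left open because it is not specified here. The `extract_choice` helper is hypothetical and only meant for a local sanity check of your output; it is not the benchmark's answer parser, and it assumes the five choices are labeled A through E.

```python
import re


def run_your_model(video_path, prompt, model_name, custom_config=None):
    """Placeholder hook: replace the body with your own inference logic."""
    # Assumption: a real runner would load model_name, read video_path,
    # and generate a response to prompt. Here the answer is hard-coded.
    predicted_letter = "A"
    return (
        "<think>\nplaceholder reasoning\n</think>\n"
        f"<answer>\n{predicted_letter}\n</answer>"
    )


def extract_choice(response):
    """Illustrative local check only; NOT the benchmark's parsing logic."""
    tagged = re.search(r"<answer>\s*([A-E])\s*</answer>", response)
    if tagged:
        return tagged.group(1)
    plain = re.search(r"Answer:\s*([A-E])\b", response)
    return plain.group(1) if plain else None
```

Running `extract_choice(run_your_model(...))` on a few samples before a full evaluation can catch formatting mistakes early.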

Step 3. Run Evaluation with the Custom Runner

python evaluate/evaluate.py \
  --model YOUR_MODEL_NAME \
  --provider custom \
  --custom-runner evaluate/tools/runners/my_model.py \
  --video-dir benchmark/videos

If your runner needs an extra config file:

python evaluate/evaluate.py \
  --model YOUR_MODEL_NAME \
  --provider custom \
  --custom-runner evaluate/tools/runners/my_model.py \
  --custom-config path/to/your_config.json \
  --video-dir benchmark/videos

Step 4. Keep the Benchmark Protocol Fixed

When adapting your own model, do not modify:

the annotation format,
the question prompt structure,
the answer parsing logic,
the metric computation,
the output schema.

Only change the model-side inference path. That is what keeps your results comparable to other models.

The default custom runner template is a near-runnable local transformers scaffold. If your model follows a Hugging Face VLM workflow, you can often start from the template directly instead of writing a runner from scratch.

📚 Citation

If you use PerceptionComp, please cite the corresponding paper once the public version is finalized.

@misc{perceptioncomp2026,
  title={PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning},
  year={2026}
}
