A Very Big Video Reasoning Suite

We are betting that video reasoning will be the next fundamental intelligence paradigm after language reasoning, one in which spatiotemporal, embodied experience of the world can be captured more naturally.

Data Engines

locate_twelve_o_clock_arrows

Knowledge training set

Prompt

The image contains 2 clocks, each with only an hour hand. Exactly one clock has its hour hand pointing to 12 o'clock. First find the single clock pointing to 12 o'clock, then draw a red circle around it. Do not change anything else. Show the complete solution step by step.

First Frame

Last Frame

Video
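
The constraint in the prompt above (exactly one clock points to 12 o'clock) can be checked mechanically. The following is a minimal sketch, not the data engine's actual code: it assumes each hour hand is given as an angle in degrees measured clockwise from the 12 o'clock position, and the function name and tolerance parameter are hypothetical.

```python
def find_twelve_o_clock(hand_angles_deg, tolerance_deg=1.0):
    """Return the index of the single clock whose hour hand points to 12.

    Assumption: angles are in degrees, clockwise from 12 o'clock.
    Raises ValueError if zero or several clocks match, since the task
    requires exactly one.
    """
    matches = []
    for i, angle in enumerate(hand_angles_deg):
        a = angle % 360
        # Angular distance to 0 degrees, wrapping around the dial.
        if min(a, 360 - a) <= tolerance_deg:
            matches.append(i)
    if len(matches) != 1:
        raise ValueError(f"expected exactly one match, found {len(matches)}")
    return matches[0]
```

For example, with hands at 90 and 0 degrees (3 o'clock and 12 o'clock), the function returns index 1, which is the clock the red circle should be drawn around.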

return_to_correct_bin

Abstraction training set

Prompt

Move each item into the bin that matches its color. Only move items, do not change anything else.

First Frame

Last Frame

Video

grid_obtaining_award

Spatiality training set

Prompt

The scene shows a 10x10 grid with a green start point, a red end point, and 4 triangle reward items scattered across it. A circular agent starts at the green start point and can move to adjacent cells (up, down, left, right). The agent collects a reward by moving to its cell, and once collected, the reward disappears. Find the shortest path that collects all 4 triangle rewards before reaching the red end point.

First Frame

Last Frame

Video
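
One way to compute a reference solution for the grid prompt above is a breadth-first search over pairs of (agent cell, set of collected rewards). The sketch below is an illustration under stated assumptions, not the suite's own solver: the grid has no obstacles, cells are (row, column) tuples, the agent may cross the end cell before all rewards are collected, and the function name is hypothetical.

```python
from collections import deque


def shortest_collecting_path(size, start, end, rewards):
    """Shortest path from start to end that visits every reward cell.

    Runs BFS over states (cell, bitmask of collected rewards).
    Returns the list of visited cells, or None if no path exists.
    """
    index = {cell: i for i, cell in enumerate(rewards)}
    full = (1 << len(rewards)) - 1
    start_mask = (1 << index[start]) if start in index else 0
    start_state = (start, start_mask)
    parent = {start_state: None}
    queue = deque([start_state])
    while queue:
        state = queue.popleft()
        (r, c), mask = state
        if (r, c) == end and mask == full:
            # Walk parent links back to the start to rebuild the path.
            path = []
            while state is not None:
                path.append(state[0])
                state = parent[state]
            return path[::-1]
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < size and 0 <= nc < size):
                continue
            nmask = mask
            if (nr, nc) in index:
                nmask |= 1 << index[(nr, nc)]  # collect the reward here
            nxt = ((nr, nc), nmask)
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None
```

With 4 rewards on a 10x10 grid the state space has at most 100 x 16 = 1,600 states, so an exhaustive search like this is cheap. Because every move costs one step, the first time BFS reaches the end cell with all rewards collected is guaranteed to be a shortest path.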

object_packing

Transformation training set

Prompt

The scene shows objects on the left side and a container on the right side. Place the objects into the container one by one in the color order: orange - brown. Each object must be placed individually in the exact order specified, and all objects must end up inside the container.

First Frame

Last Frame

Video

draw_midpoint_perpendicular_line

Perception out-of-domain test set

Prompt

Draw a red perpendicular line through the middle point between two parallel lines. The line should extend from the upper parallel line to the lower parallel line.
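
The target geometry for this prompt can be worked out with a short projection. The sketch below is an assumption-laden illustration rather than the engine's implementation: each parallel line is given as a segment with two (x, y) endpoints, the "middle point" is taken to be the average of the two segment centers, and the function name is hypothetical.

```python
import math


def perpendicular_through_midpoint(upper, lower):
    """Endpoints of the perpendicular segment joining two parallel lines.

    upper, lower: ((x0, y0), (x1, y1)) segments assumed to be parallel.
    Returns (point_on_upper, point_on_lower).
    """
    (ux0, uy0), (ux1, uy1) = upper
    (lx0, ly0), (lx1, ly1) = lower
    # Unit direction shared by both lines.
    length = math.hypot(ux1 - ux0, uy1 - uy0)
    dx, dy = (ux1 - ux0) / length, (uy1 - uy0) / length
    # Assumed middle point: average of the two segment centers.
    mx = (ux0 + ux1 + lx0 + lx1) / 4
    my = (uy0 + uy1 + ly0 + ly1) / 4

    def foot(x0, y0):
        # Orthogonal projection of the middle point onto the line through (x0, y0).
        t = (mx - x0) * dx + (my - y0) * dy
        return (x0 + t * dx, y0 + t * dy)

    return foot(ux0, uy0), foot(lx0, ly0)
```

For an upper segment from (0, 0) to (10, 0) and a lower segment from (2, 4) to (12, 4), the middle point is (6, 2) and the red line runs from (6, 0) to (6, 4).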

First Frame