Generates synthetic datasets for training and evaluating vision models on attention shifting and visual tracking tasks. Each sample contains two identical objects, and a green attention box must move from one object to the other.
Each sample pairs a task (first frame + prompt describing what needs to happen) with its ground truth solution (final frame showing the result + video demonstrating how to achieve it). This structure enables both model evaluation and training.
| Property | Value |
|---|---|
| Task ID | G-22 |
| Task | Attention Shift Same |
| Category | Perception |
| Resolution | 1024×1024 px |
| FPS | 16 fps |
| Duration | ~2-3 seconds |
| Output | PNG images + MP4 video |
# 1. Clone the repository
git clone https://github.com/VBVR-DataFactory/G-22_attention_shift_same_data-generator.git
cd G-22_attention_shift_same_data-generator
# 2. Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# 3. Install dependencies
pip install --upgrade pip
pip install -r requirements.txt
pip install -e .

# Generate 50 samples
python examples/generate.py --num-samples 50
# Custom output directory
python examples/generate.py --num-samples 100 --output data/my_dataset
# Reproducible generation with seed
python examples/generate.py --num-samples 50 --seed 42
# Without videos (faster)
python examples/generate.py --num-samples 50 --no-videos

| Argument | Description |
|---|---|
| `--num-samples` | Number of tasks to generate (required) |
| `--output` | Output directory (default: `data/questions`) |
| `--seed` | Random seed for reproducibility |
| `--no-videos` | Skip video generation (images only) |
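
The flags in the table above could be declared with `argparse` roughly as follows. This is only a sketch of the documented interface, not the actual contents of `examples/generate.py`: the `build_parser` name, the `int` type for `--seed`, and the help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser that mirrors the documented command-line flags."""
    parser = argparse.ArgumentParser(description="Generate attention-shift samples")
    # Required: how many tasks to generate.
    parser.add_argument("--num-samples", type=int, required=True,
                        help="Number of tasks to generate")
    # Optional: where samples are written (documented default).
    parser.add_argument("--output", default="data/questions",
                        help="Output directory")
    # Optional: seed for reproducible generation (int type is an assumption).
    parser.add_argument("--seed", type=int, default=None,
                        help="Random seed for reproducibility")
    # Optional switch: images only, no MP4.
    parser.add_argument("--no-videos", action="store_true",
                        help="Skip video generation (images only)")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--num-samples", "50", "--seed", "42", "--no-videos"])
    print(args.num_samples, args.output, args.seed, args.no_videos)
    # prints: 50 data/questions 42 True
```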
The scene shows two identical diamond objects, one on the left and one on the right, with a green attention box around the right object. The diamond objects remain stationary and unchanged throughout. Move the green attention box from the right object to the left object.
| Initial Frame | Animation | Final Frame |
|---|---|---|
| Green box around right object | Attention box shifts to left object | Green box around left object |
Move a green attention box from one object to an identical object, demonstrating attention shifting between visually similar targets.
- Objects: 2 identical shapes (circles, squares, triangles, diamonds, or hexagons)
- Object size: 70-120 pixel radius/half-size
- Positioning: Left and right sides with ~200px spacing
- Attention box: Green border (8px width) with 40px padding around object
- Initial state: Attention box around right object
- Final state: Attention box around left object
- Background: White with clear visibility
- Goal: Transfer attention marker from one object to its identical counterpart
- Identical objects, so the task tests attention shifting between visually similar targets
- Smooth, continuous movement of the attention box from one object to the other
- Various shape types (circle, square, triangle, diamond, hexagon)
- Objects remain stationary throughout animation
- Green attention box as clear visual indicator
- Fixed hold frames at start and end for clarity
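
The box geometry and the hold-move-hold animation described above can be sketched as follows. The 40px padding comes from the specification; everything else is an assumption made for illustration: the function names, the linear interpolation (the generator may use a different easing), the object centers, and the frame counts of 8 hold frames and 24 moving frames.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

PADDING = 40  # px of padding around the object, from the task specification


def attention_box(cx: float, cy: float, half_size: float, padding: float = PADDING) -> Box:
    """Axis-aligned box around an object centered at (cx, cy)."""
    r = half_size + padding
    return (cx - r, cy - r, cx + r, cy + r)


def box_trajectory(start: Box, end: Box, move_frames: int, hold_frames: int) -> List[Box]:
    """Hold on `start`, move linearly to `end`, then hold on `end`.

    Linear interpolation is an assumption; the source only says the
    box moves smoothly and continuously.
    """
    if move_frames < 2:
        raise ValueError("move_frames must be at least 2")
    frames: List[Box] = [start] * hold_frames
    for i in range(move_frames):
        t = i / (move_frames - 1)  # 0.0 on the first moving frame, 1.0 on the last
        frames.append(tuple(s + (e - s) * t for s, e in zip(start, end)))
    frames += [end] * hold_frames
    return frames


if __name__ == "__main__":
    # Hypothetical layout: objects with half-size 100 on a 1024x1024 canvas.
    right = attention_box(712, 512, 100)
    left = attention_box(312, 512, 100)
    frames = box_trajectory(right, left, move_frames=24, hold_frames=8)
    print(len(frames))  # prints: 40
```

With these assumed counts the clip has 40 frames, which is 2.5 seconds at 16 fps and falls inside the documented ~2-3 second range.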
data/questions/attention_shift_same_task/attention_shift_same_00000000/
├── first_frame.png # Attention box on right object
├── final_frame.png # Attention box on left object
├── prompt.txt # Attention shift instruction
├── ground_truth.mp4 # Animation of box transition
└── question_metadata.json # Task metadata
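
A generated sample directory can be checked against the layout above with a few lines of Python. This is a minimal sketch: `missing_files` is a hypothetical helper, not part of the repository, and it assumes that `--no-videos` simply omits `ground_truth.mp4`.

```python
from pathlib import Path
from typing import List

# File names taken from the directory listing above.
EXPECTED_FILES = [
    "first_frame.png",
    "final_frame.png",
    "prompt.txt",
    "ground_truth.mp4",
    "question_metadata.json",
]


def missing_files(sample_dir: Path, expect_video: bool = True) -> List[str]:
    """Return the expected file names that are absent from sample_dir."""
    names = [n for n in EXPECTED_FILES if expect_video or n != "ground_truth.mp4"]
    return [n for n in names if not (sample_dir / n).is_file()]


if __name__ == "__main__":
    sample = Path("data/questions/attention_shift_same_task/attention_shift_same_00000000")
    missing = missing_files(sample)
    print("complete" if not missing else f"missing: {missing}")
```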
File specifications:
- Images: 1024×1024 PNG format
- Video: MP4 format, 16 fps
- Duration: ~2-3 seconds
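
As a quick sanity check on these numbers, the frame count of a clip follows directly from its duration and frame rate. The helper name below is hypothetical, and the durations are the approximate bounds quoted above.

```python
FPS = 16  # frames per second, from the file specifications


def frame_count(duration_s: float, fps: int = FPS) -> int:
    """Number of frames in a clip of the given duration."""
    return round(duration_s * fps)


if __name__ == "__main__":
    print(frame_count(2), frame_count(3))  # prints: 32 48
```

A ground-truth video should therefore contain roughly 32 to 48 frames.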
visual-reasoning attention visual-tracking object-similarity attention-shift spatial-reasoning


