Generates synthetic datasets for training and evaluating vision models on spatial perception and selective attention tasks. Each sample contains multiple shapes scattered across the image with a highlighted region, requiring the model to identify and outline specific shapes only within that region.
Each sample pairs a task (first frame + prompt describing what needs to happen) with its ground truth solution (final frame showing the result + video demonstrating how to achieve it). This structure enables both model evaluation and training.
| Property | Value |
|---|---|
| Task ID | G-9 |
| Task | Identify Objects In Region |
| Category | Perception |
| Resolution | 1024×1024 px |
| FPS | 16 fps |
| Duration | ~2 seconds |
| Output | PNG images + MP4 video |
# 1. Clone the repository
git clone https://github.com/VBVR-DataFactory/G-9_identify_objects_in_region_data-generator.git
cd G-9_identify_objects_in_region_data-generator
# 2. Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# 3. Install dependencies
pip install --upgrade pip
pip install -r requirements.txt
pip install -e .
# Generate 50 samples
python examples/generate.py --num-samples 50
# Custom output directory
python examples/generate.py --num-samples 100 --output data/my_dataset
# Reproducible generation with seed
python examples/generate.py --num-samples 50 --seed 42
# Without videos (faster)
python examples/generate.py --num-samples 50 --no-videos

| Argument | Description |
|---|---|
| --num-samples | Number of tasks to generate (required) |
| --output | Output directory (default: data/questions) |
| --seed | Random seed for reproducibility |
| --no-videos | Skip video generation (images only) |
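
For readers who want to wrap or extend the command line, the flags above could be declared roughly as follows. This is a minimal sketch using `argparse`, not the repository's actual `examples/generate.py`; the `build_parser` name and the `int` types for `--num-samples` and `--seed` are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the documented flags; the real script may differ.
    parser = argparse.ArgumentParser(
        description="Generate identify-objects-in-region samples"
    )
    parser.add_argument("--num-samples", type=int, required=True,
                        help="Number of tasks to generate")
    parser.add_argument("--output", default="data/questions",
                        help="Output directory")
    parser.add_argument("--seed", type=int, default=None,
                        help="Random seed for reproducibility")
    parser.add_argument("--no-videos", action="store_true",
                        help="Skip video generation (images only)")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--num-samples", "50", "--seed", "42"])
print(args.num_samples, args.output, args.seed, args.no_videos)
```

Omitting `--num-samples` makes `argparse` exit with a usage error, which matches the "required" note in the table.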
Outline all triangles in the circular region with a green border. Only outline objects within that region.
| Initial Frame | Animation | Final Frame |
|---|---|---|
| Shapes scattered, region highlighted | Target shapes get outlined sequentially | All target shapes within region outlined |
Identify and outline all instances of a specific shape type that lie within a designated region, demonstrating spatial awareness and selective attention.
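
The selection rule can be sketched in a few lines of Python. This is an illustration only: the `Shape` class, the helper names, and the choice to test containment by a shape's center point are assumptions, not the generator's actual code.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Shape:
    kind: str   # e.g. "circle", "square", "triangle", "trapezoid"
    x: float    # center x in pixels
    y: float    # center y in pixels


def in_circle(shape, cx, cy, radius):
    # Assumption: a shape counts as inside when its center lies inside the region.
    return hypot(shape.x - cx, shape.y - cy) <= radius


def in_rect(shape, left, top, right, bottom):
    return left <= shape.x <= right and top <= shape.y <= bottom


def select_targets(shapes, kind, inside):
    # Keep shapes that match the target type AND lie inside the target region.
    return [s for s in shapes if s.kind == kind and inside(s)]


shapes = [
    Shape("triangle", 200, 500),  # in the circular region: target
    Shape("square", 300, 520),    # in the circular region, wrong type: distractor
    Shape("triangle", 800, 500),  # right type, outside the circle: distractor
]
targets = select_targets(shapes, "triangle",
                         lambda s: in_circle(s, 256, 512, 220))
```

Here only the first triangle is selected: the square fails the type check and the second triangle fails the region check.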
- Regions: 2 regions per image (one circular, one rectangular), arranged left and right
- Shapes per region: 2-5 shapes per region (shape size adjusts dynamically to the count)
- Shape types: Circle, square, triangle, trapezoid (4 types)
- Colors: 100 distinct colors (fixed palette for reproducibility)
- Target: Specific shape type to identify in a specific region (varies per sample)
- Outline: Green border marks correctly identified shapes
- Background: Pure white
- Goal: Outline all instances of target shape within the specified region only
- Two distinct regions (circle and rectangle) with clear boundaries
- Shapes distributed within each region (not overlapping between regions)
- Only shapes within the target region matching the target type are outlined
- Multiple shape types provide distractor objects
- Dynamic shape sizing: shapes automatically resize based on count (2-5 per region)
- Grid-based distribution ensures even spacing within each region
- 100-color palette provides extensive visual diversity
- Minimum spacing between shapes prevents overlap
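
One way grid-based distribution and count-dependent sizing could fit together is sketched below. The `grid_layout` function, its `fill` parameter, and the use of a rectangular bounding box (for a circular region, for example, its inscribed square) are hypothetical; the repository's placement logic may differ.

```python
import math
import random


def grid_layout(count, left, top, width, height, fill=0.6, rng=None):
    """Return (cx, cy, size) tuples for `count` shapes on a grid inside a box."""
    if rng is None:
        rng = random.Random()
    cols = math.ceil(math.sqrt(count))
    rows = math.ceil(count / cols)
    cell_w, cell_h = width / cols, height / rows
    # More shapes means smaller cells, so each shape shrinks accordingly.
    # Keeping fill < 1 leaves a gap between neighbours, which prevents overlap.
    size = fill * min(cell_w, cell_h)
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    chosen = rng.sample(cells, count)  # one distinct cell per shape
    return [(left + (c + 0.5) * cell_w, top + (r + 0.5) * cell_h, size)
            for r, c in chosen]


# A seeded generator makes the layout reproducible.
layout = grid_layout(5, left=100, top=300, width=300, height=300,
                     rng=random.Random(42))
```

Each shape is centered in its own grid cell, so spacing stays even, and passing a seeded `random.Random` gives the same layout on every run.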
data/questions/identify_objects_in_region_task/identify_objects_in_region_00000000/
├── first_frame.png # Shapes scattered, region marked
├── final_frame.png # Target shapes within region outlined
├── prompt.txt # Identification instruction
├── ground_truth.mp4 # Animation of outlining process
└── question_metadata.json # Task metadata
File specifications:
- Images: 1024×1024 PNG format
- Video: MP4 format, 16 fps
- Duration: ~2 seconds
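
A sample directory with this layout can be read and sanity-checked using only the standard library. The sketch below is an assumption-laden example: `load_sample` and `png_size` are hypothetical helpers, the metadata is returned as a plain dict because its schema is not documented here, and the video is treated as optional on the assumption that `--no-videos` omits it.

```python
import json
import struct
from pathlib import Path

EXPECTED_FILES = ("first_frame.png", "final_frame.png",
                  "prompt.txt", "question_metadata.json")
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path):
    """Read (width, height) from a PNG's IHDR chunk without decoding pixels."""
    with open(path, "rb") as f:
        header = f.read(24)
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a valid PNG")
    return struct.unpack(">II", header[16:24])


def load_sample(sample_dir):
    sample_dir = Path(sample_dir)
    missing = [n for n in EXPECTED_FILES if not (sample_dir / n).is_file()]
    if missing:
        raise FileNotFoundError(f"{sample_dir} is missing: {', '.join(missing)}")
    video = sample_dir / "ground_truth.mp4"
    metadata_text = (sample_dir / "question_metadata.json").read_text(encoding="utf-8")
    return {
        "prompt": (sample_dir / "prompt.txt").read_text(encoding="utf-8").strip(),
        "metadata": json.loads(metadata_text),  # schema not assumed
        "first_frame_size": png_size(sample_dir / "first_frame.png"),
        "final_frame_size": png_size(sample_dir / "final_frame.png"),
        # Assumption: the video may be absent for image-only runs.
        "video": video if video.is_file() else None,
    }
```

Calling `load_sample` on a generated sample directory and comparing both frame sizes against `(1024, 1024)` gives a quick check that the output matches the specification above.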
perception spatial-awareness selective-attention region-detection shape-recognition


