G-33: Visual Jenga Data Generator

Generates synthetic datasets for training and evaluating vision models on sequential object removal and physics reasoning tasks. Each sample contains vertically stacked objects that must be extracted from top to bottom in order.

Each sample pairs a task (first frame + prompt describing what needs to happen) with its ground truth solution (final frame showing the result + video demonstrating how to achieve it). This structure enables both model evaluation and training.


📌 Basic Information

| Property | Value |
|---|---|
| Task ID | G-33 |
| Task | Visual Jenga |
| Category | Spatiality |
| Resolution | 1024×1024 px |
| FPS | 16 fps |
| Duration | ~5-6 seconds |
| Output | PNG images + MP4 video |

🚀 Usage

Installation

# 1. Clone the repository
git clone https://github.com/VBVR-DataFactory/G-33_visual_jenga_data-generator.git
cd G-33_visual_jenga_data-generator

# 2. Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# 3. Install dependencies
pip install --upgrade pip
pip install -r requirements.txt
pip install -e .

Generate Data

# Generate 50 samples
python examples/generate.py --num-samples 50

# Custom output directory
python examples/generate.py --num-samples 100 --output data/my_dataset

# Reproducible generation with seed
python examples/generate.py --num-samples 50 --seed 42

# Without videos (faster)
python examples/generate.py --num-samples 50 --no-videos

Command-Line Options

| Argument | Description |
|---|---|
| --num-samples | Number of tasks to generate (required) |
| --output | Output directory (default: data/questions) |
| --seed | Random seed for reproducibility |
| --no-videos | Skip video generation (images only) |
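
The options above map naturally onto an `argparse` parser. The following is a minimal sketch of such a parser, not the actual contents of `examples/generate.py`: the function name `build_parser` is hypothetical, and the `None` default for `--seed` is an assumption, since no default is documented.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags (illustration only)."""
    parser = argparse.ArgumentParser(description="Generate Visual Jenga samples")
    # Documented as required.
    parser.add_argument("--num-samples", type=int, required=True,
                        help="Number of tasks to generate")
    # Documented default: data/questions.
    parser.add_argument("--output", default="data/questions",
                        help="Output directory")
    # Assumption: no seed means non-reproducible generation.
    parser.add_argument("--seed", type=int, default=None,
                        help="Random seed for reproducibility")
    # Boolean switch: present means images only.
    parser.add_argument("--no-videos", action="store_true",
                        help="Skip video generation (images only)")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--num-samples", "50", "--seed", "42"])
    print(args.num_samples, args.output, args.seed, args.no_videos)
```

Note that `argparse` converts dashes to underscores, so `--num-samples` is read back as `args.num_samples` and `--no-videos` as `args.no_videos`.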

📖 Task Example

Prompt

The scene shows objects stacked vertically. Extract the objects one by one from top to bottom in order, moving each object out of the frame before extracting the next one. Continue until all objects have been removed from the frame.

Visual

| Initial Frame | Animation | Final Frame |
|---|---|---|
| Objects stacked vertically | Objects removed top to bottom | All objects removed, empty frame |

📖 Task Description

Objective

Remove vertically stacked objects sequentially from top to bottom, extracting each one completely before proceeding to the next.

Task Setup

  • Stack configuration: Multiple objects stacked vertically
  • Object types: Various shapes (rectangles, circles, triangles)
  • Extraction order: Top to bottom (strict sequential order)
  • Removal method: Move each object completely out of frame
  • Sequential constraint: Must finish removing one object before starting next
  • Background: White with clear visibility
  • Goal: Remove all objects in correct order until frame is empty
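
The strict top-to-bottom rule can be illustrated with a short sketch. Everything here is an assumption made for illustration: the `StackedObject` class, its fields, and the image-coordinate convention (smaller `y_top` means higher in the frame) are hypothetical and may not match how the generator represents a stack internally.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class StackedObject:
    """Hypothetical description of one object in the stack."""
    name: str
    shape: str    # e.g. "rectangle", "circle", "triangle"
    y_top: float  # assumed image coordinates: smaller y is higher in the frame


def extraction_order(stack: Iterable[StackedObject]) -> List[StackedObject]:
    """Return the objects in the order they should be removed (topmost first)."""
    return sorted(stack, key=lambda obj: obj.y_top)


def is_valid_removal(stack: Iterable[StackedObject], removed: Iterable[str]) -> bool:
    """Check that a sequence of removed names empties the stack top to bottom."""
    expected = [obj.name for obj in extraction_order(stack)]
    return list(removed) == expected


if __name__ == "__main__":
    stack = [
        StackedObject("base", "rectangle", y_top=700.0),
        StackedObject("cap", "triangle", y_top=300.0),
        StackedObject("middle", "circle", y_top=500.0),
    ]
    print([obj.name for obj in extraction_order(stack)])
    print(is_valid_removal(stack, ["cap", "middle", "base"]))
```

A check like `is_valid_removal` could be used to score a predicted removal sequence: any sequence that skips an object or takes a lower object first is rejected.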

Key Features

  • Sequential top-to-bottom extraction order
  • Physics-based stacking visualization
  • Complete removal of each object before next
  • Tests understanding of vertical ordering
  • Jenga-like sequential unstacking logic
  • Smooth animation showing extraction process

📦 Data Format

data/questions/visual_jenga_task/visual_jenga_00000000/
├── first_frame.png      # Initial stack of objects
├── final_frame.png      # Empty frame after all removals
├── prompt.txt           # Sequential extraction instruction
├── ground_truth.mp4     # Animation of top-to-bottom removal
└── question_metadata.json # Task metadata
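
Given the layout above, a sample directory can be read with the standard library alone. This is a sketch under stated assumptions: the helper names `missing_files` and `load_sample` are hypothetical, `question_metadata.json` is assumed to be valid JSON with an unspecified schema, and `ground_truth.mp4` is assumed to be absent when data was generated with `--no-videos`.

```python
import json
from pathlib import Path
from typing import Any, Dict, List

# File names taken from the directory layout in this README.
EXPECTED_FILES = (
    "first_frame.png",
    "final_frame.png",
    "prompt.txt",
    "ground_truth.mp4",
    "question_metadata.json",
)


def missing_files(sample_dir: Path, require_video: bool = True) -> List[str]:
    """List expected files that are not present in a sample directory."""
    sample_dir = Path(sample_dir)
    names = [n for n in EXPECTED_FILES if require_video or n != "ground_truth.mp4"]
    return [n for n in names if not (sample_dir / n).is_file()]


def load_sample(sample_dir: Path) -> Dict[str, Any]:
    """Read the prompt and metadata; return media files as paths, not decoded."""
    sample_dir = Path(sample_dir)
    metadata_text = (sample_dir / "question_metadata.json").read_text(encoding="utf-8")
    return {
        "prompt": (sample_dir / "prompt.txt").read_text(encoding="utf-8").strip(),
        "metadata": json.loads(metadata_text),  # schema is not documented here
        "first_frame": sample_dir / "first_frame.png",
        "final_frame": sample_dir / "final_frame.png",
        "video": sample_dir / "ground_truth.mp4",
    }
```

For example, `load_sample("data/questions/visual_jenga_task/visual_jenga_00000000")` would return the prompt text, the parsed metadata, and the paths to the two frames and the video.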

File specifications:

  • Images: 1024×1024 PNG format
  • Video: MP4 format, 16 fps
  • Duration: ~5-6 seconds (depends on number of objects)
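
As a quick sanity check on these numbers, the frame count of a video is its duration multiplied by the frame rate. The helper `frame_count` below is a hypothetical illustration; since the duration is approximate and depends on the number of objects, the resulting range is only a rough expectation.

```python
def frame_count(duration_s: float, fps: int = 16) -> int:
    """Number of frames in a clip of the given length at the given frame rate."""
    return round(duration_s * fps)


if __name__ == "__main__":
    # 5 s * 16 fps = 80 frames, 6 s * 16 fps = 96 frames
    print(frame_count(5), frame_count(6))
```

A typical `ground_truth.mp4` at 16 fps should therefore contain roughly 80 to 96 frames.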

🏷️ Tags

physics sequential-reasoning object-removal spatial-ordering stacking procedural-reasoning

