Long-Horizon Action-Conditioned 4D Scene Generation

Abstract

We introduce PerpetualWonder, a hybrid generative simulator that enables long-horizon, action-conditioned 4D scene generation from a single image. Existing methods fail at this task because their physical state is decoupled from their visual representation, which prevents generative refinements from updating the underlying physics for subsequent interactions. PerpetualWonder solves this by introducing the first true closed-loop system. It features a novel unified representation that creates a bidirectional link between the physical state and visual primitives, allowing generative refinements to correct both the dynamics and the appearance. It also introduces a robust update mechanism that gathers supervision from multiple viewpoints to resolve optimization ambiguity. Experiments demonstrate that, from a single image, PerpetualWonder can simulate complex, multi-step interactions driven by long-horizon actions while maintaining physical plausibility and visual consistency.

PerpetualWonder is the successor to our prior work, WonderPlay.

Getting Started

A CUDA-compatible GPU must be available before you start the installation. Running the pipeline requires 80 GB of GPU memory.

Tested Environment:

PyTorch: 2.7.1+cu126
CUDA: 12.6
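Before installing, it can help to confirm that a CUDA GPU with enough memory is visible. The sketch below is not a script shipped with the repository: `check_gpus` and `meets_memory_requirement` are hypothetical helpers built on PyTorch's `torch.cuda.get_device_properties`, and the "80 GB" requirement is interpreted as 80 × 10^9 bytes (an assumption), since cards sold as 80 GB may report somewhat less than 80 GiB of usable memory.

```python
"""Pre-flight check: is a CUDA GPU with roughly 80 GB of memory visible? (sketch)"""

# "80 GB" from the README, read as decimal gigabytes (assumption).
REQUIRED_BYTES = 80 * 10**9


def meets_memory_requirement(total_bytes: int, required_bytes: int = REQUIRED_BYTES) -> bool:
    """Return True if a device with `total_bytes` of memory satisfies the requirement."""
    return total_bytes >= required_bytes


def check_gpus() -> list[tuple[str, float, bool]]:
    """Return (device name, memory in GB, meets requirement) for every visible CUDA device."""
    import torch  # imported lazily so the pure helper above works without PyTorch

    if not torch.cuda.is_available():
        return []
    report = []
    for index in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(index)
        report.append(
            (props.name, props.total_memory / 1e9, meets_memory_requirement(props.total_memory))
        )
    return report
```

Once PyTorch is installed, `print(check_gpus())` lists each visible device; an empty list means no CUDA device was found.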

Installation

This project requires two conda environments:

pw: Main environment for running the primary functionality of the project
cosmos-predict1: Used for generating multi-view videos with GEN3C

1. Install Main Environment (pw)

Create and activate the main environment:

conda create -n pw python=3.10
conda activate pw

Install COLMAP:

conda install -c conda-forge colmap==3.11.1

Install Python dependencies:

pip install -r requirements.txt

Install Segment Anything 2:

cd submodules/sam2
pip install -e .
cd ../..

Install submodules:

cd submodules/depth_diff_gaussian_rasterization_min
pip install -e . --no-build-isolation
cd ../diff-gaussian-rasterization-main
pip install -e . --no-build-isolation
cd ../..

Install Genesis:

cd Genesis
pip install -e .
cd ..

Install gsplat (for Gaussian rendering):

cd gsplat
pip install -e . --no-build-isolation
cd ..

Install simple_knn:

git submodule update --init --recursive submodules/simple_knn
cd submodules/simple_knn
pip install -e . --no-build-isolation
cd ../..
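After the steps above, a quick lookup can confirm that the editable installs are visible from the pw environment. This is a hedged sketch: the import names below (`sam2`, `genesis`, `gsplat`, `simple_knn`, `diff_gaussian_rasterization`) are our assumptions about what each package exposes and may need adjusting. The package built from depth_diff_gaussian_rasterization_min is left out because its import name is not stated in this README.

```python
"""Check that the packages installed above can be located in the current environment (sketch)."""
import importlib.util

# Hypothetical mapping from install step to the top-level module it is assumed to provide.
EXPECTED_MODULES = {
    "Segment Anything 2": "sam2",
    "Genesis": "genesis",
    "gsplat": "gsplat",
    "simple_knn": "simple_knn",
    "diff-gaussian-rasterization": "diff_gaussian_rasterization",
}


def find_missing(modules: dict[str, str]) -> list[str]:
    """Return the install steps whose module cannot be found (find_spec locates without importing)."""
    return [step for step, module in modules.items() if importlib.util.find_spec(module) is None]


missing = find_missing(EXPECTED_MODULES)
print("Missing:", ", ".join(missing) if missing else "none")
```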

2. Install GEN3C Environment (cosmos-predict1)

Initialize the GEN3C submodule and set up the cosmos-predict1 environment:

git submodule update --remote --init submodules/GEN3C
cd submodules/GEN3C
conda env create --file cosmos-predict1.yaml
conda activate cosmos-predict1
pip install -r requirements.txt
ln -sf $CONDA_PREFIX/lib/python3.10/site-packages/nvidia/*/include/* $CONDA_PREFIX/include/
ln -sf $CONDA_PREFIX/lib/python3.10/site-packages/nvidia/*/include/* $CONDA_PREFIX/include/python3.10
pip install transformer-engine[pytorch]==1.12.0
git clone https://github.com/NVIDIA/apex
CUDA_HOME=$CONDA_PREFIX pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./apex
pip install git+https://github.com/microsoft/MoGe.git
cd ../..

Run examples

We provide three example cases: play_doh, dumpling and jam. Each example has a corresponding shell script in the scripts/ directory that contains the complete pipeline commands.

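The pipeline consists of three main stages: scene reconstruction, the forward pass, and backward optimization. The last two alternate over multiple rounds and are described together below: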

Stage 1: Scene Reconstruction

First, generate multi-view videos with GEN3C. Before running, edit examples/configs/{scene_name}.yaml and set gen3c_config.trajectory to ["spin_right", "spin_left"]. The commands below use the jam example:

conda activate cosmos-predict1
CUDA_HOME=$CONDA_PREFIX PYTHONPATH=$(pwd) python PerpetualWonder/reconstruction/gen3c_single_image.py --config_path examples/configs/jam.yaml
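The trajectory setting can also be changed programmatically. The sketch below assumes PyYAML is installed and that gen3c_config is a top-level mapping in the scene config; `set_nested` and `update_config_file` are hypothetical helpers, not part of the repository. Rewriting the file with PyYAML drops any comments it contains, so editing by hand may be preferable for annotated configs.

```python
"""Set a nested key such as gen3c_config.trajectory in a scene config (sketch)."""
from typing import Any


def set_nested(config: dict, dotted_key: str, value: Any) -> dict:
    """Assign `value` at a dotted path like "gen3c_config.trajectory", creating mappings as needed."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value
    return config


def update_config_file(path: str, dotted_key: str, value: Any) -> None:
    """Load a YAML config, set one nested key, and write it back (PyYAML assumed installed)."""
    import yaml  # lazy import: only needed when editing real files

    with open(path) as f:
        config = yaml.safe_load(f) or {}
    set_nested(config, dotted_key, value)
    with open(path, "w") as f:
        yaml.safe_dump(config, f, sort_keys=False)


# Example, run before gen3c_single_image.py:
# update_config_file("examples/configs/jam.yaml", "gen3c_config.trajectory", ["spin_right", "spin_left"])
```

The same helper applies to the Stage 2 camera_list edits, e.g. the key "simulator_config.camera_list".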

Then, switch to the main environment and perform scene reconstruction:

conda activate pw
python PerpetualWonder/reconstruction/colmap.py 3d_result/jam/images 3d_result/jam
python PerpetualWonder/reconstruction/seg_video.py --config_path examples/configs/jam.yaml
python PerpetualWonder/reconstruction/simple_trainer_2dgs_seg.py --config examples/configs/jam.yaml
python PerpetualWonder/reconstruction/segment_gaussians.py --config_path examples/configs/jam.yaml
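The Stage 1 commands above use the jam scene, and the same sequence applies to play_doh, dumpling, or a new scene. The hypothetical helper below only reproduces the commands listed in this section for a given scene name. It does not activate conda environments, so it tags each command with the environment it is meant to run in (cosmos-predict1 for GEN3C, pw for the rest).

```python
"""Build the Stage 1 (scene reconstruction) command lines for any example scene (sketch)."""


def stage1_commands(scene: str) -> list[tuple[str, str]]:
    """Return (conda environment, shell command) pairs mirroring the README for `scene`."""
    config = f"examples/configs/{scene}.yaml"
    result_dir = f"3d_result/{scene}"
    recon = "PerpetualWonder/reconstruction"
    return [
        ("cosmos-predict1",
         f"CUDA_HOME=$CONDA_PREFIX PYTHONPATH=$(pwd) python {recon}/gen3c_single_image.py --config_path {config}"),
        ("pw", f"python {recon}/colmap.py {result_dir}/images {result_dir}"),
        ("pw", f"python {recon}/seg_video.py --config_path {config}"),
        # Note: this script takes --config, not --config_path.
        ("pw", f"python {recon}/simple_trainer_2dgs_seg.py --config {config}"),
        ("pw", f"python {recon}/segment_gaussians.py --config_path {config}"),
    ]


for env, command in stage1_commands("dumpling"):
    print(f"[{env}] {command}")
```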

Stage 2: Forward Pass and Backward Optimization

After scene reconstruction, the pipeline alternates between the forward pass and backward optimization over multiple rounds. The commands below show the first round.

Forward Pass: Run the physics simulation, then use the simulated particle trajectories to move the Gaussians and render the result:

python PerpetualWonder/forwardpass/simulation.py --config examples/configs/jam.yaml --round_num 1
python PerpetualWonder/forwardpass/render_particle_dynamics.py --config examples/configs/jam.yaml --round_num 1

Backward Optimization: Refine the rendered results with the video model.

For the first refinement, set simulator_config.camera_list to [121] in the scene's config file (examples/configs/jam.yaml here) to refine the front view, then optimize the Gaussian scene:

python PerpetualWonder/optimization/run_video_model_v2.py --config examples/configs/jam.yaml --round_num 1
python PerpetualWonder/optimization/run_optim_4d_v2.py --config examples/configs/jam.yaml --round_num 1 --semi_round False

Then perform side-view refinement by setting simulator_config.camera_list to [83, 159]:

python PerpetualWonder/optimization/run_video_model_v2.py --config examples/configs/jam.yaml --round_num 1
python PerpetualWonder/optimization/run_optim_4d_v2.py --config examples/configs/jam.yaml --round_num 1 --semi_round True

For subsequent rounds (rounds 2 and 3), set --round_num accordingly and repeat the forward pass and both backward optimization steps, setting simulator_config.camera_list back to [121] before the front-view refinement.
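Put together, each Stage 2 round is one forward pass followed by a front-view and a side-view refinement, with simulator_config.camera_list changed between them. The hypothetical helper below only prints that schedule for the jam scene, assuming three rounds as described above. The camera_list edits appear as comments and must still be made in the config file.

```python
"""Print the Stage 2 schedule (forward pass + two refinements) for several rounds (sketch)."""

FRONT_VIEW = [121]      # camera_list for the first refinement (front view)
SIDE_VIEWS = [83, 159]  # camera_list for the second refinement (side views)


def round_steps(scene: str, round_num: int) -> list[str]:
    """Return the steps of one round, with the required camera_list edits as comment lines."""
    cfg = f"--config examples/configs/{scene}.yaml --round_num {round_num}"
    fwd, opt = "PerpetualWonder/forwardpass", "PerpetualWonder/optimization"
    steps = [
        f"python {fwd}/simulation.py {cfg}",
        f"python {fwd}/render_particle_dynamics.py {cfg}",
    ]
    # Front view uses --semi_round False; side views use --semi_round True.
    for cameras, semi_round in ((FRONT_VIEW, False), (SIDE_VIEWS, True)):
        steps += [
            f"# set simulator_config.camera_list to {cameras}",
            f"python {opt}/run_video_model_v2.py {cfg}",
            f"python {opt}/run_optim_4d_v2.py {cfg} --semi_round {semi_round}",
        ]
    return steps


for round_num in (1, 2, 3):
    print("\n".join(round_steps("jam", round_num)))
```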

Notes

Conda Environments: Two conda environments are required:

cosmos-predict1: Used only for GEN3C to generate multi-view videos (activate with conda activate cosmos-predict1)
pw: Used for all subsequent steps in the pipeline (activate with conda activate pw)

Reproducibility: Because video model generation is stochastic, we provide intermediate results under 3d_result/ to help reproduce the results in the paper. Download them from https://downloads.cs.stanford.edu/viscam/PerpetualWonder/3d_result_release.zip. The provided results are:

3d_result/{scene_name}/stage1_reconstruction/3d/images: Generated by GEN3C (multi-view video generation)
3d_result/{scene_name}/stage3_optimization/go_with_flow: Results from video model refinement
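A minimal sketch for fetching the archive and checking that the two intermediate-result folders listed above exist for a scene. It assumes the zip unpacks into a 3d_result/ directory relative to the repository root, which this README does not state explicitly; `fetch_intermediate_results` and `missing_intermediate_results` are hypothetical helpers.

```python
"""Download the released intermediate results and check the expected folders (sketch)."""
import urllib.request
import zipfile
from pathlib import Path

ARCHIVE_URL = "https://downloads.cs.stanford.edu/viscam/PerpetualWonder/3d_result_release.zip"
# Per-scene sub-folders listed in the README.
EXPECTED_SUBDIRS = ("stage1_reconstruction/3d/images", "stage3_optimization/go_with_flow")


def fetch_intermediate_results(dest: Path = Path(".")) -> None:
    """Download the archive and extract it under `dest` (assumed to unpack into 3d_result/)."""
    archive = dest / "3d_result_release.zip"
    urllib.request.urlretrieve(ARCHIVE_URL, archive)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)


def missing_intermediate_results(scene: str, root: Path = Path("3d_result")) -> list[str]:
    """Return the expected sub-folders that do not exist for `scene` under `root`."""
    return [sub for sub in EXPECTED_SUBDIRS if not (root / scene / sub).is_dir()]
```

For example, `missing_intermediate_results("jam")` returns an empty list once both folders are in place.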

How to add more examples?

To add a new example, follow these steps:

Create image directory: Add a new folder named {scene_name} under examples/imgs/ and place images with dimensions 1280×704 in that folder.
Create configuration file: Create a {scene_name}.yaml configuration file in the examples/configs/ directory to configure your scene.
Configure simulation: Add simulation-specific configuration in PerpetualWonder/forwardpass/simulation_config.py based on the type of physics simulation you want to perform.
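Input images must be 1280×704. If a source photo has a different size, one option (our assumption, not a documented requirement of the pipeline) is to scale it so it covers 1280×704 and then center-crop the overflow. The arithmetic is sketched below; `cover_crop_box` is a hypothetical helper, and the Pillow usage in the trailing comment is optional, with an illustrative file name.

```python
"""Compute how to scale and center-crop an image to 1280x704 (sketch)."""

TARGET_W, TARGET_H = 1280, 704


def cover_crop_box(width: int, height: int) -> tuple[tuple[int, int], tuple[int, int, int, int]]:
    """Return (resized size, crop box) so the resized image covers the target and is center-cropped.

    The crop box is (left, top, right, bottom) in the resized image's pixel coordinates.
    """
    scale = max(TARGET_W / width, TARGET_H / height)  # smallest scale covering both sides
    new_w = max(TARGET_W, round(width * scale))  # guard against rounding below the target
    new_h = max(TARGET_H, round(height * scale))
    left = (new_w - TARGET_W) // 2
    top = (new_h - TARGET_H) // 2
    return (new_w, new_h), (left, top, left + TARGET_W, top + TARGET_H)


# Optional, with Pillow (file names are hypothetical):
#   from PIL import Image
#   img = Image.open("photo.jpg")
#   size, box = cover_crop_box(*img.size)
#   img.resize(size).crop(box).save("examples/imgs/my_scene/my_scene.png")
```

Cropping keeps the aspect ratio undistorted at the cost of trimming the image edges; padding would be the alternative if the full frame must be kept.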

Acknowledgement

Our code references and builds upon the following open-source projects:

We are grateful to the authors and contributors of these projects for their valuable work.

References

@misc{zhan2026perpetualwonderlonghorizonactionconditioned4d, 
       title={PerpetualWonder: Long-Horizon Action-Conditioned 4D Scene Generation}, 
       author={Jiahao Zhan and Zizhang Li and Hong-Xing Yu and Jiajun Wu}, 
       year={2026}, 
       eprint={2602.04876}, 
       archivePrefix={arXiv}, 
       primaryClass={cs.CV}, 
       url={https://arxiv.org/abs/2602.04876},  
}
