WorldForge

arXiv  project page  Hugging Face 

WorldForge: Unlocking Emergent 3D/4D Generation in Video Diffusion Model via Training-Free Guidance

WorldForge Teaser

Chenxi Song1, Yanming Yang1, Tong Zhao1, Ruibo Li2, Chi Zhang1*

1AGI Lab, Westlake University
2The College of Computing and Data Science, Nanyang Technological University
*Corresponding Author

TODO

  • Paper released on arXiv
  • Project page available
  • Code and implementation details (Coming very soon)
  • Inference pipeline and usage manual

Update

Introduction

Welcome to WorldForge! WorldForge is a training-free framework that unlocks the world-modeling potential of video diffusion models for controllable 3D/4D generation. By leveraging the rich latent world priors of large-scale pretrained video diffusion models, it achieves precise trajectory control and photorealistic content generation without any additional training or fine-tuning.

Features

  • Training-Free Framework: No additional training or fine-tuning required, preserving pretrained knowledge
  • Intra-Step Recursive Refinement (IRR): Recursive refinement mechanism during inference for precise trajectory injection
  • Flow-Gated Latent Fusion (FLF): Leverages optical flow similarity to decouple motion from appearance in latent space (an illustrative sketch follows this list)
  • Dual-Path Self-Corrective Guidance (DSG): Adaptively corrects trajectory drift through guided and unguided path comparison
  • 3D Scene Generation: Generate controllable 3D scenes from single view images
  • 4D Video Re-cam: Dynamic trajectory-controlled re-rendering of video content
  • Video Editing: Support for object removal, addition, face swapping, and subject transformation
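
To make the flow-gating idea concrete, the sketch below shows one plausible way such a fusion step could work: correlate each latent channel's temporal variation with optical-flow magnitude, then let the trajectory guidance overwrite only the most motion-correlated channels. This is a minimal illustration under our own assumptions (the name flow_gated_fusion, the hard top-k gate, and flow resampled to latent resolution are all hypothetical), not the released implementation.

import torch

def flow_gated_fusion(model_latent, guided_latent, flow_mag, top_k=8):
    """Illustrative sketch only: gate trajectory guidance by motion content.

    model_latent, guided_latent: (C, T, H, W) video latents from the model's own
        path and from the warped-trajectory path, respectively.
    flow_mag: (T-1, H, W) optical-flow magnitude, resampled to latent resolution.
    """
    C = model_latent.shape[0]

    # Temporal differences of each latent channel act as a rough proxy for
    # how much "motion" that channel carries.
    latent_diff = (model_latent[:, 1:] - model_latent[:, :-1]).abs()   # (C, T-1, H, W)

    # Correlate each channel's temporal change with the flow magnitude.
    flow_c = flow_mag.flatten() - flow_mag.mean()                      # (N,)
    diff_flat = latent_diff.flatten(1)                                 # (C, N)
    diff_c = diff_flat - diff_flat.mean(dim=1, keepdim=True)
    corr = (diff_c * flow_c).sum(dim=1) / (diff_c.norm(dim=1) * flow_c.norm() + 1e-8)

    # Hard gate: 1 for the most motion-correlated channels, 0 elsewhere.
    gate = torch.zeros(C, device=model_latent.device)
    gate[corr.topk(top_k).indices] = 1.0
    gate = gate.view(C, 1, 1, 1)

    # Guidance overwrites only motion-related channels; appearance channels
    # stay with the pretrained model's own prediction.
    return gate * guided_latent + (1.0 - gate) * model_latent

The point of the gate is decoupling: trajectory guidance steers motion while the pretrained prior keeps authority over appearance. Whether the actual FLF uses a hard top-k mask or a soft similarity weight is not specified here.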

Method Overview

Method Overview Diagram

WorldForge adopts a warping-and-repainting pipeline with three complementary mechanisms (an illustrative sketch of how they fit together follows the list):

  1. IRR (Intra-Step Recursive Refinement): Enables precise trajectory injection through recursive optimization
  2. FLF (Flow-Gated Latent Fusion): Selectively injects trajectory guidance into motion-related channels
  3. DSG (Dual-Path Self-Corrective Guidance): Maintains trajectory consistency and visual fidelity
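
The pseudocode below is a schematic of how the three mechanisms could interact inside a single denoising step. Every name and signature here (worldforge_step, warp_to_trajectory, fuse_latents, renoise, dsg_scale) is an assumption made for illustration and will differ from the released code; it is meant only to show where warping, fusion, and the dual-path comparison sit relative to each other.

def worldforge_step(x_t, t, denoiser, warp_to_trajectory, fuse_latents, renoise,
                    n_recursions=2, dsg_scale=0.5):
    """Illustrative sketch of one warp-and-repaint denoising step.

    denoiser(x_t, t)        -> clean-latent estimate from the pretrained video model
    warp_to_trajectory(x0)  -> that estimate re-rendered along the target camera path
    fuse_latents(x0, x0_w)  -> fusion of the two (e.g. flow-gated, as sketched above)
    renoise(x0, t)          -> the estimate pushed back to the noise level of step t
    """
    # Unguided path: what the pretrained model predicts on its own (DSG reference).
    x0_unguided = denoiser(x_t, t)

    # Guided path with Intra-Step Recursive Refinement (IRR): inject the warped
    # trajectory signal, fuse it into motion-related channels (FLF), re-denoise,
    # and repeat within the same timestep.
    x0_guided = x0_unguided
    for _ in range(n_recursions):
        x0_warped = warp_to_trajectory(x0_guided)
        x0_guided = fuse_latents(x0_guided, x0_warped)
        x0_guided = denoiser(renoise(x0_guided, t), t)

    # Dual-Path Self-Corrective Guidance (DSG): compare the guided and unguided
    # estimates and apply a correction term; its exact form and sign are an
    # assumption here, shown in a classifier-free-guidance style.
    return x0_guided + dsg_scale * (x0_guided - x0_unguided)

In a full sampler this function would sit inside the scheduler's loop over timesteps; the recursion count, the fusion rule, and the guidance scale are the knobs the three mechanisms expose.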

Results

3D Scene Generation from Single View

3D Scene Generation: Case 1, Case 2
...

4D Video Re-cam

4D Video Re-cam: Case 5, Case 6
...

Video Editing Applications

Video Editing: Case 7, Case 10
...

We showcase diverse capabilities including:

  • 3D Scene Generation: Voyager experiences in artworks, AIGC content, portrait photography, and city walks
  • 4D Video Re-cam: Camera arc rotation, local close-ups, outpainting, viewpoint transfer, and video stabilization
  • Video Editing: Object removal/addition, face swapping, subject transformation, and try-on applications

Installation and Usage

Prerequisites

# Installation instructions will be provided upon code release
# Requirements: Python 3.8+, PyTorch, etc.

Quick Start

# Code and detailed usage instructions coming soon
# Example usage will be provided here

Inference

# Detailed documentation will be available upon release

Comparisons

Our method demonstrates superior performance compared with existing state-of-the-art methods:

  • 3D Scene Generation: More consistent scene content under novel viewpoints with improved detail and trajectory accuracy
  • 4D Video Re-cam: Realistic high-quality content re-rendering along target trajectories
  • Quantitative Metrics: Superior results in realism, trajectory consistency, and visual fidelity

Citation

@misc{song2025worldforgeunlockingemergent3d4d,
  title={WorldForge: Unlocking Emergent 3D/4D Generation in Video Diffusion Model via Training-Free Guidance}, 
  author={Chenxi Song and Yanming Yang and Tong Zhao and Ruibo Li and Chi Zhang},
  year={2025},
  url={https://arxiv.org/abs/2509.15130}, 
}

Acknowledgments

We thank the research community for their valuable contributions to video diffusion models and 3D/4D generation. Special thanks to the following open-source projects that inspired and supported our work:

Contact

For questions and discussions, please feel free to contact:


🌟 Star us on GitHub if you find WorldForge useful! 🌟
