AnchorWeave: World-Consistent Video Generation with Retrieved Local Spatial Memories

Zun Wang¹, Han Lin¹, Jaehong Yoon², Jaemin Cho³, Yue Zhang¹, Mohit Bansal¹

¹University of North Carolina, Chapel Hill · ²NTU Singapore · ³AI2

Abstract

Maintaining spatial world consistency over long horizons remains a central challenge for camera-controllable video generation. Existing memory-based approaches often condition generation on globally reconstructed 3D scenes by rendering anchor videos from the reconstructed geometry in the history. However, reconstructing a global 3D scene from multiple views inevitably introduces cross-view misalignment, as pose and depth estimation errors cause the same surfaces to be reconstructed at slightly different 3D locations across views. When fused, these inconsistencies accumulate into noisy geometry that contaminates the conditioning signals and degrades generation quality.

We introduce AnchorWeave, a memory-augmented video generation framework that replaces a single misaligned global memory with multiple clean local geometric memories and learns to reconcile their cross-view inconsistencies. To this end, AnchorWeave performs coverage-driven local memory retrieval aligned with the target trajectory and integrates the selected local memories through a multi-anchor weaving controller during generation. Extensive experiments demonstrate that AnchorWeave significantly improves long-term scene consistency while maintaining strong visual quality, with ablation and analysis studies further validating the effectiveness of local geometric conditioning, multi-anchor control, and coverage-driven retrieval.
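The abstract does not spell out how coverage-driven retrieval works, so the following is only an illustrative sketch, not the authors' implementation. It swaps in a standard greedy maximum-coverage heuristic: each local memory is assumed to be summarized by the set of target-trajectory cells it can see, and memories are picked one at a time by how many still-uncovered cells they add. The function name, the `coverage` dictionary, and the notion of "cells" are all hypothetical.

```python
def greedy_coverage_retrieval(coverage, k):
    """Pick up to k memories that together cover the most target cells.

    coverage: dict mapping a memory id to the set of target cells it covers
              (hypothetical representation, assumed for this sketch).
    Returns (selected_ids, covered_cells).
    """
    selected = []
    covered = set()
    candidates = dict(coverage)  # copy so the caller's dict is untouched
    while candidates and len(selected) < k:
        # Choose the memory that adds the most cells not yet covered.
        best_id = max(candidates, key=lambda m: len(candidates[m] - covered))
        gain = candidates[best_id] - covered
        if not gain:
            break  # every remaining memory is redundant
        selected.append(best_id)
        covered |= gain
        del candidates[best_id]
    return selected, covered
```

Under these assumptions a memory that only re-covers already visible regions is never selected, which is one plausible reading of retrieval that is "aligned with the target trajectory".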

TODO

Setup

1. Clone & Environment

git clone https://github.com/wz0919/AnchorWeave.git
cd AnchorWeave
conda create -n anchorweave python=3.10
conda activate anchorweave
pip install -r requirements.txt

2. Download Models

Place CogVideoX-5B-I2V under ./pretrained/CogVideoX-5b-I2V/:

# Download from https://github.com/THUDM/CogVideo
mkdir -p pretrained
# Place CogVideoX-5b-I2V folder in pretrained/
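If you prefer to script this step, the sketch below fetches the weights into the expected folder. It assumes the checkpoint is published on the Hugging Face Hub under the repo id `THUDM/CogVideoX-5b-I2V` and that the `huggingface_hub` package is installed; neither is stated in this README, so verify both before relying on it. The helper name `fetch_model` is hypothetical.

```python
from pathlib import Path

MODEL_DIR = Path("pretrained") / "CogVideoX-5b-I2V"  # path from this README
HF_REPO_ID = "THUDM/CogVideoX-5b-I2V"  # assumption: Hub repo id, not given here


def fetch_model(repo_root=".", repo_id=HF_REPO_ID):
    """Return the model folder, downloading it only if it is missing or empty."""
    target = Path(repo_root) / MODEL_DIR
    if target.is_dir() and any(target.iterdir()):
        return target  # weights already in place, nothing to download
    # Imported lazily so the check above works without the package installed.
    from huggingface_hub import snapshot_download

    target.mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target
```

Calling `fetch_model()` from the repository root should leave the files under ./pretrained/CogVideoX-5b-I2V/. The download is large, so the early return avoids repeating it.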

Training

Edit scripts/train_with_latent.sh to set video_root_dir and output_dir, then:

# Set num_processes in training/accelerate_config_machine.yaml to the number of GPUs you want to use
bash scripts/train_with_latent.sh
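The `num_processes` edit can also be done programmatically. The sketch below rewrites that single entry in the YAML text using only the standard library. It assumes the file contains a top-level line of the form `num_processes: <integer>`, as Accelerate config files normally do; the helper name `set_num_processes` is hypothetical.

```python
import re


def set_num_processes(config_text: str, num_gpus: int) -> str:
    """Return config_text with the top-level num_processes value replaced."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    updated, count = re.subn(
        r"^(num_processes:[ \t]*)\d+",  # only matches an unindented key
        rf"\g<1>{num_gpus}",
        config_text,
        flags=re.MULTILINE,
    )
    if count == 0:
        raise ValueError("no top-level num_processes entry found")
    return updated
```

To apply it, read training/accelerate_config_machine.yaml with `Path.read_text()`, pass the text through `set_num_processes`, and write the result back with `Path.write_text()`. A plain text substitution is used here so that comments and key order in the file are preserved.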

Inference

# Place example dataset under ./data/example_dataset, then:
bash scripts/inference.sh
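Before launching inference, it can help to confirm that the paths named in this README are in place. The following is a minimal pre-flight sketch; the list of required paths is taken from the steps above, while the helper name `missing_paths` is hypothetical and the script itself may expect additional files that are not checked here.

```python
from pathlib import Path

# Paths mentioned in this README, relative to the repository root.
REQUIRED_FOR_INFERENCE = (
    "pretrained/CogVideoX-5b-I2V",
    "data/example_dataset",
    "scripts/inference.sh",
)


def missing_paths(repo_root=".", required=REQUIRED_FOR_INFERENCE):
    """Return the required paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in required if not (root / rel).exists()]
```

An empty list from `missing_paths()` means the model folder, example dataset, and inference script were all found; any returned entries name what still needs to be put in place.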

Acknowledgements

Citation

@article{anchorweave2025,
  title={AnchorWeave: World-Consistent Video Generation with Retrieved Local Spatial Memories},
  author={Wang, Zun and Lin, Han and Yoon, Jaehong and Cho, Jaemin and Zhang, Yue and Bansal, Mohit},
  journal={arXiv preprint arXiv:2602.14941},
  year={2026}
}
