[ICLR 2026] The repository contains the official implementation of NOVA3R. Given unposed multi-view images, NOVA3R recovers complete, non-overlapping 3D geometry, reconstructing visible and occluded regions with physical plausibility.
NOVA3R: Non-pixel-aligned Visual Transformer for Amodal 3D Reconstruction
Weirong Chen, Chuanxia Zheng, Ganlin Zhang, Andrea Vedaldi, Daniel Cremers
ICLR 2026
[Paper] [Project Page]
- Python: 3.10
- PyTorch: 2.2+ with CUDA 12.1+
- GPU: NVIDIA GPU with ≥24GB VRAM (48GB recommended). Evaluated on NVIDIA L40s GPU.
# Clone with submodules
git clone --recursive https://github.com/wrchen530/nova3r.git
cd nova3r
# Automated setup
bash setup.sh
# Download checkpoints
bash scripts/download_checkpoints.shSee docs/INSTALL.md for manual installation.
Run 3D reconstruction on your own images:
conda activate nova3r
# Single image (scene-level)
python demo_nova3r.py \
--images demo/examples/scene_1.png \
--ckpt checkpoints/scene_n1/checkpoint-last.pth \
--resolution 518 392
# Two images (multi-view, scene-level)
python demo_nova3r.py \
--images demo/examples/scrream_scene09_200.png demo/examples/scrream_scene09_275.png \
--ckpt checkpoints/scene_n2/checkpoint-last.pth \
--resolution 518 392Output .ply point clouds and .mp4 360° videos are saved to demo/outputs/<image_name>/ (configurable with --output_dir).
Reconstruct a point cloud using the Stage 1 point-conditioned autoencoder:
# Point cloud autoencoding from a SCRREAM scene
python demo_nova3r_ae.py \
--input_ply demo/examples/scrream_scene09.ply \
--ckpt checkpoints/scene_ae/checkpoint-last.pth \
--num_queries 50000from demo_nova3r import predict
# Single image → 3D point cloud
pts3d = predict(
ckpt_path="checkpoints/scene_n1/checkpoint-last.pth",
image_paths=["path/to/image.png"],
resolution=(518, 392),
output_path="output.ply",
)
# pts3d is a numpy array of shape (N, 3)Download all checkpoints:
bash scripts/download_checkpoints.sh| Model | Training Dataset | Input | Checkpoint | Size |
|---|---|---|---|---|
| Pts2Pts (AE) | 3DFront + Scannetpp | point cloud | checkpoints/scene_ae/checkpoint-last.pth |
262 MB |
| Img2Pts (N=1) | 3DFront + Scannetpp | 1 image | checkpoints/scene_n1/checkpoint-last.pth |
5.8 GB |
| Img2Pts (N=2) | 3DFront + Scannetpp | 2 images | checkpoints/scene_n2/checkpoint-last.pth |
5.8 GB |
Checkpoints are hosted on HuggingFace.
Reproduce benchmark results:
# Download datasets
bash scripts/download_datasets.sh
# SCRREAM evaluation (1-view / 2-view)
bash scripts/eval/eval_scrream_n1_stage2.sh --data_root /path/to/datasets
bash scripts/eval/eval_scrream_n2_stage2.sh --data_root /path/to/datasetsSee docs/EVALUATION.md for detailed instructions.
If you find NOVA3R useful for your research and applications, please cite us using this BibTex:
@inproceedings{chennova3r,
title={NOVA3R: Non-pixel-aligned Visual Transformer for Amodal 3D Reconstruction},
author={Chen, Weirong and Zheng, Chuanxia and Zhang, Ganlin and Vedaldi, Andrea and Cremers, Daniel},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026}
}This project is licensed under the Apache License 2.0. See LICENSE for full terms. Third-party code (e.g., DUSt3R, CroCo, VGGT, TripoSG) retains its original license.
We build on prior advances in multi-view 3D reconstruction, global scene representations, and flow-based generative models. Our codebase utilizes code from VGGT, DUSt3R, TripoSG, and LaRI. We sincerely appreciate the authors for their wonderful work and for releasing their code and data processing scripts.

