
Overview of the Nuplan-Occ dataset and the UniScenev2 pipeline. We introduce the largest semantic occupancy dataset to date, featuring dense 3D semantic annotations with ~19× more annotated scenes and ~18× more frames than nuScenes-Occupancy. Built on Nuplan-Occ, UniScenev2 scales up both the model architecture and the training data to enable high-quality occupancy expansion and forecasting, occupancy-based sparse point map conditioning for video generation, and sensor-specific LiDAR generation.

Visualization of scene expansion and forecasting results. UniScenev2 enables spatio-temporally disentangled generation, supporting both large-scale spatial expansion and future occupancy sequence prediction, while jointly producing multi-view video and LiDAR data in a unified pipeline.
Driving scene generation is a critical capability for autonomous driving, enabling downstream applications such as perception and planning evaluation. Occupancy-centric methods have recently achieved state-of-the-art results by providing consistent conditioning across frames and modalities; however, their performance depends heavily on annotated occupancy data, which remains scarce. To overcome this limitation, we curate Nuplan-Occ, the largest semantic occupancy dataset to date, constructed from the widely used Nuplan benchmark. Its scale and diversity enable not only large-scale generative modeling but also downstream autonomous driving applications. Based on this dataset, we develop a unified framework that jointly synthesizes high-quality semantic occupancy, multi-view videos, and LiDAR point clouds. Our approach adopts a spatio-temporally disentangled architecture to support high-fidelity spatial expansion and temporal forecasting of 4D dynamic occupancy. To bridge modality gaps, we further propose two novel techniques: a Gaussian splatting-based sparse point map rendering strategy that enhances multi-view video generation, and a sensor-aware embedding strategy that explicitly models LiDAR sensor properties for realistic multi-LiDAR simulation. Extensive experiments demonstrate that our method achieves superior generation fidelity and scalability compared to existing approaches, and validate its practical value in downstream tasks.

Nuplan-Occ dataset curation pipeline with the proposed Foreground-Background Separate Aggregate (FBSA) strategy. This strategy is composed of three key components: separated multi-frame point cloud aggregation, neural kernel-based mesh reconstruction, and hybrid semantic labeling.
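To make the first two FBSA stages more concrete, here is a minimal sketch of separated multi-frame aggregation followed by semantic voxelization, assuming simple NumPy point arrays. The function names, the 0.5 m voxel size, and the grid origin are illustrative assumptions, not the released pipeline; the neural kernel-based mesh reconstruction and hybrid labeling steps are omitted.

```python
import numpy as np

def aggregate_frames(frames, poses, is_foreground):
    """Separately aggregate background and foreground points across frames,
    transforming each frame into a common world frame (hypothetical sketch).

    frames:        list of (N_i, 3) LiDAR point arrays in the sensor frame
    poses:         list of (4, 4) sensor-to-world transforms
    is_foreground: list of (N_i,) boolean masks marking dynamic-object points
    """
    bg, fg = [], []
    for pts, pose, mask in zip(frames, poses, is_foreground):
        homo = np.concatenate([pts, np.ones((len(pts), 1))], axis=1)
        world = (homo @ pose.T)[:, :3]
        bg.append(world[~mask])
        fg.append(world[mask])   # in practice, aggregated per tracked object box
    return np.concatenate(bg), np.concatenate(fg)

def voxelize_semantics(points, labels, grid=(400, 400, 32),
                       voxel=0.5, origin=(-100.0, -100.0, -5.0)):
    """Write per-point semantic labels into a dense occupancy grid.
    Grid size matches Nuplan-Occ (400x400x32); voxel size and origin are assumptions."""
    occ = np.zeros(grid, dtype=np.uint8)                       # 0 = free space
    idx = np.floor((points - np.asarray(origin)) / voxel).astype(int)
    valid = np.all((idx >= 0) & (idx < np.asarray(grid)), axis=1)
    for (i, j, k), lab in zip(idx[valid], labels[valid]):
        occ[i, j, k] = lab   # last-write for brevity; a real pipeline would vote or label via the mesh
    return occ
```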

Nuplan-Occ provides dense semantic occupancy labels for all frames of the Nuplan dataset at 10 Hz. Compared with OpenScene, our method produces high-resolution (400×400×32) dense annotations with accurate geometry (e.g., clear vehicle structures and smooth road surfaces).
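For concreteness, a single 400×400×32 frame can be handled as a compact integer label volume; the file path, dtype, and the convention that label 0 means free space are placeholders rather than the actual dataset layout.

```python
import numpy as np

# Hypothetical example: inspect one Nuplan-Occ frame stored as a dense label volume.
occ = np.load("sample_frame_occupancy.npy")   # placeholder path
assert occ.shape == (400, 400, 32)            # X x Y x Z voxels

occupied = occ > 0                            # assumes 0 = free space
print(f"occupied voxels: {occupied.sum()} / {occ.size}")
classes, counts = np.unique(occ[occupied], return_counts=True)
print(dict(zip(classes.tolist(), counts.tolist())))   # per-class voxel counts
```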

Comparison between Nuplan-Occ and other occupancy/LiDAR datasets. Surrounded indicates surround-view image inputs; View denotes the number of image views. C, D, and L denote camera, depth, and LiDAR, respectively.

Overall framework of UniScenev2. The joint generation process facilitates large-scale dynamic generation with an occupancy-centric hierarchy: I. Dynamic Large-scale Occupancy Generation. The optional BEV layout is concatenated with the noise volume and fed into the occupancy spatial diffusion transformer, whose output is decoded by the occupancy VAE decoder into large-scale occupancy grids. A selected occupancy scene is then processed by the occupancy temporal diffusion transformer to forecast temporal occupancy sequences. II. Occupancy-based Multi-view Video and LiDAR Generation. The occupancy is converted into 3D Gaussians and rendered into sparse semantic and depth point maps, which guide video generation through a video diffusion transformer; the output is obtained from the video VAE decoder. For LiDAR generation, the sparse LiDAR UNet takes occupancy grids and sensor rig data as inputs, which are then passed to the LiDAR head for multi-view LiDAR generation.
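The caption above can be read as a two-stage, occupancy-centric control flow. The sketch below mirrors that flow with placeholder modules (every submodule is an nn.Identity standing in for the real network, and the module/argument names are assumptions, not UniScenev2's actual interfaces), assuming PyTorch-style components.

```python
import torch
from torch import nn

class UniSceneV2Sketch(nn.Module):
    """Control-flow sketch of the occupancy-centric hierarchy; all submodules are placeholders."""
    def __init__(self):
        super().__init__()
        # Stage I: dynamic large-scale occupancy generation
        self.occ_spatial_dit   = nn.Identity()   # occupancy spatial diffusion transformer
        self.occ_vae_decoder   = nn.Identity()   # occupancy VAE decoder
        self.occ_temporal_dit  = nn.Identity()   # occupancy temporal diffusion transformer
        # Stage II: occupancy-conditioned video and LiDAR generation
        self.gaussian_render   = nn.Identity()   # occupancy -> 3D Gaussians -> sparse point maps
        self.video_dit         = nn.Identity()   # video diffusion transformer
        self.video_vae_decoder = nn.Identity()
        self.lidar_unet        = nn.Identity()   # sparse LiDAR UNet
        self.lidar_head        = nn.Identity()

    def forward(self, noise_volume, bev_layout=None, sensor_rig=None):
        # Stage I: optionally concatenate the BEV layout with the noise volume.
        x = noise_volume if bev_layout is None else torch.cat([noise_volume, bev_layout], dim=1)
        occupancy = self.occ_vae_decoder(self.occ_spatial_dit(x))   # large-scale occupancy grids
        occ_sequence = self.occ_temporal_dit(occupancy)             # forecast temporal occupancy

        # Stage II-a: sparse semantic/depth point maps condition multi-view video generation.
        point_maps = self.gaussian_render(occ_sequence)
        video = self.video_vae_decoder(self.video_dit(point_maps))

        # Stage II-b: LiDAR generation conditioned on occupancy grids and sensor rig data.
        lidar = self.lidar_head(self.lidar_unet((occ_sequence, sensor_rig)))
        return occupancy, occ_sequence, video, lidar
```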
@article{li2024uniscene,
  title={UniScene: Unified Occupancy-centric Driving Scene Generation},
  author={Li, Bohan and Guo, Jiazhe and Liu, Hongsi and Zou, Yingshuang and Ding, Yikang and Chen, Xiwu and Zhu, Hu and Tan, Feiyang and Zhang, Chi and Wang, Tiancai and others},
  journal={arXiv preprint arXiv:2412.05435},
  year={2024}
}
@article{li2025scaling,
  title={Scaling Up Occupancy-centric Driving Scene Generation: Dataset and Method},
  author={Li, Bohan and Jin, Xin and Zhu, Hu and Liu, Hongsi and Li, Ruikai and Guo, Jiazhe and Cai, Kaiwen and Ma, Chao and Jin, Yueming and Zhao, Hao and others},
  journal={arXiv preprint arXiv:2510.22973},
  year={2025}
}