
C3G: Learning Compact 3D Representations with 2K Gaussians

Honggyu An1* · Jaewoo Jung1* · Mungyeom Kim1 · Sunghwan Hong2 · Chaehyun Kim1 · Kazumi Fukuda3 · Minkyeong Jeon1 · Jisang Han1 · Takuya Narihira3 · Hyuna Ko1 · Junsu Kim1 · Yuki Mitsufuji3,4† · Seungryong Kim1†

1KAIST AI, 2ETH AI Center, ETH Zurich, 3SONY AI, 4Sony Group Corporation

*Co-first author, †Co-corresponding author


We propose a feed-forward framework for learning compact 3D representations from unposed images. Our approach estimates only 2K Gaussians, allocated in meaningful regions, to enable generalizable scene reconstruction and understanding.

🚀 What to Expect

  • Pretrained weights.
  • Preprocessed version of the Replica dataset.
  • Multi-view novel view synthesis evaluation code.
  • Probe3D evaluation code.

Installation

Our code is developed with PyTorch 2.5.1, CUDA 12.4, and Python 3.11.

We recommend using conda for installation:

conda create -n c3g python=3.11
conda activate c3g

pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt

Then, download the VGGT pretrained weights, create a folder named pretrained_weights, and save the file as model.pt.

Here is an example:

mkdir -p pretrained_weights
wget "https://huggingface.co/facebook/VGGT-1B/resolve/main/model.pt?download=true" -O ./pretrained_weights/model.pt

For LSeg feature lifting, you should also download the LSeg pretrained weights:

gdown 1FTuHY1xPUkM-5gaDtMfgCl3D0gR89WV7 -O ./pretrained_weights/demo_e200.ckpt

Data Preparation

For training and multi-view novel view synthesis evaluation, we use the preprocessed RealEstate10K dataset following pixelSplat and MVSplat.

For 3D scene understanding evaluation, we use ScanNet following LSM, and Replica, for which we follow the preprocessing and evaluation protocol of Feature 3DGS.
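
For reference, a plausible on-disk layout after preprocessing is sketched below. The exact folder names and index files are determined by the pixelSplat/MVSplat and Feature 3DGS preprocessing scripts, so treat this as illustrative only.

datasets/
├── re10k/        # preprocessed RealEstate10K chunks (pixelSplat / MVSplat format)
│   ├── train/
│   └── test/
├── scannet/      # ScanNet scenes prepared following LSM
└── replica/      # Replica scenes prepared following the Feature 3DGS protocol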

Pretrained Weights

Our pretrained checkpoints are available on Hugging Face.

  • gaussian_decoder.ckpt: Gaussian Decoder trained for 2-view input.

  • gaussian_decoder_multiview.ckpt: Gaussian Decoder trained for multi-view input.

  • feature_decoder_lseg.ckpt: Feature Decoder trained with the LSeg model.

  • feature_decoder_dinov3L.ckpt: Feature Decoder trained with the DINOv3-L model.

  • feature_decoder_dinov2.ckpt: Feature Decoder trained with the DINOv2-L model.
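
For convenience, the checkpoints can also be fetched with huggingface-cli. The repository ID below is a placeholder; substitute the Hugging Face repository linked above.

# <hf_repo_id> is a placeholder -- replace it with the repository linked above.
huggingface-cli download <hf_repo_id> gaussian_decoder.ckpt --local-dir ./pretrained_weights
huggingface-cli download <hf_repo_id> feature_decoder_lseg.ckpt --local-dir ./pretrained_weights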

Training

Gaussian Decoder Training

To train the Gaussian Decoder, run one of the following commands.

To train with 2-view input:

python -m src.main +training=gaussian_head wandb.mode=online wandb.name="wandb_name"

To train the Gaussian Decoder when multi-view is available:

python -m src.main +training=gaussian_head_multiview wandb.mode=online wandb.name="wandb_name"

To train the Gaussian Decoder faster when multi-view data is available, you can continue from a 2-view checkpoint:

python -m src.main +training=gaussian_head wandb.mode=online wandb.name="wandb_name" checkpointing.load="2view_checkpoint" model.decoder.low_pass_filter=0.3

If you do not want to log to wandb, just set wandb.mode=disabled.

Feature Decoder Training

To train the Feature Decoder, run the following commands.

Important

Update the CUDA rasterizer: when you change the target VFM model, you must update NUM_SEMANTIC_CHANNELS in the rasterizer config file (see the example after the list of values below).

File: ./submodules/diff_gaussian_rasterization_w_feature_detach/cuda_rasterizer/config.h

Values:

  • 512 for LSeg
  • 768 for DINOv2-base
  • 1024 for DINOv2-large / DINOv3-large
  • 128 for VGGT-tracking
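
For example, switching the rasterizer to LSeg's 512 feature channels could look like the sketch below. This assumes the value is exposed as a #define in config.h and that the rasterizer is reinstalled afterwards with pip, so adapt it to your actual build setup.

# Assumes NUM_SEMANTIC_CHANNELS is set via a #define in config.h (check before running).
sed -i 's/NUM_SEMANTIC_CHANNELS [0-9]\+/NUM_SEMANTIC_CHANNELS 512/' \
    ./submodules/diff_gaussian_rasterization_w_feature_detach/cuda_rasterizer/config.h
# Rebuild so the new channel count takes effect.
pip install ./submodules/diff_gaussian_rasterization_w_feature_detach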

To train the Feature Decoder with various VFM models (we tested LSeg, DINOv2-base, DINOv2-large, DINOv3-large, and VGGT-tracking):

## for LSeg
python -m src.main +training=feature_head_lseg wandb.mode=online wandb.name="wandb_name" model.encoder.pretrained_weights="2view_checkpoint"

## for DINOv2-base
python -m src.main +training=feature_head_dinov2_B wandb.mode=online wandb.name="wandb_name" model.encoder.pretrained_weights="2view_checkpoint"

## for DINOv2-large
python -m src.main +training=feature_head_dinov2_L wandb.mode=online wandb.name="wandb_name" model.encoder.pretrained_weights="2view_checkpoint"

## for DINOv3-large
python -m src.main +training=feature_head_dinov3_L wandb.mode=online wandb.name="wandb_name" model.encoder.pretrained_weights="2view_checkpoint"

## for VGGT-tracking
python -m src.main +training=feature_head_vggt wandb.mode=online wandb.name="wandb_name" model.encoder.pretrained_weights="2view_checkpoint"

If you do not want to log to wandb, just set wandb.mode=disabled.

This is an example of training the Feature Decoder when multi-view input is available:

## for LSeg
python -m src.main +training=feature_head_lseg_multiview wandb.mode=online wandb.name="wandb_name" model.encoder.pretrained_weights="multiview_checkpoint"

Evaluation

To evaluate novel view synthesis on the RealEstate10K dataset when only 2 views are available:

python -m src.main +evaluation=re10k mode=test dataset/view_sampler@dataset.re10k.view_sampler=evaluation dataset.re10k.view_sampler.index_path=assets/evaluation_index_re10k.json test.save_compare=true wandb.mode=online checkpointing.load="checkpoint_path" wandb.name="wandb_name"

To evaluate novel view synthesis on the RealEstate10K dataset when multi-view input is available:

python -m src.main +evaluation=re10k_multiview mode=test dataset/view_sampler@dataset.re10k.view_sampler=evaluation dataset.re10k.view_sampler.index_path=assets/evaluation_index_re10k.json test.save_compare=true wandb.mode=online checkpointing.load="checkpoint_path" wandb.name="wandb_name"

To evaluate 3D scene understanding on the ScanNet dataset:

python -m src.main +evaluation=scannet wandb.mode=online mode=test test.save_compare=true test.pose_align_steps=1000 checkpointing.load="checkpoint_path" wandb.name="wandb_name" 

If you do not want to log to wandb, just set wandb.mode=disabled.

Citation

@article{an2025c3g,
  title={C3G: Learning Compact 3D Representations with 2K Gaussians},
  author={An, Honggyu and Jung, Jaewoo and Kim, Mungyeom and Hong, Sunghwan and Kim, Chaehyun and Fukuda, Kazumi and Jeon, Minkyeong and Han, Jisang and Narihira, Takuya and Ko, Hyuna and others},
  journal={arXiv preprint arXiv:2512.04021},
  year={2025}
}

Acknowledgement

We thank the authors of VGGT and NoPoSplat for their excellent work and code, which served as the foundation for this project.
