Honggyu An1* · Jaewoo Jung1* · Mungyeom Kim1 · Sunghwan Hong2 · Chaehyun Kim1 · Kazumi Fukuda3 · Minkyeong Jeon1 · Jisang Han1 · Takuya Narihira3 · Hyuna Ko1 · Junsu Kim1 · Yuki Mitsufuji3,4† · Seungryong Kim1†
*Co-first author, †Co-corresponding author
We propose a feed-forward framework for learning compact 3D representations from unposed images. Our approach estimates only 2K Gaussians, allocated to meaningful regions, enabling generalizable scene reconstruction and understanding.
- Pretrained weights.
- Preprocessed version of Replica dataset.
- Multi-view novel view synthesis evaluation code.
- Probe3d evaluation code.
Our code is developed with PyTorch 2.5.1, CUDA 12.4, and Python 3.11.
We recommend using conda for installation:
conda create -n c3g python=3.11
conda activate c3g
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt
Then, download the VGGT pretrained weights from VGGT. Create a folder named pretrained_weights and save the file as model.pt.
Here is an example:
mkdir -p pretrained_weights
wget https://huggingface.co/facebook/VGGT-1B/resolve/main/model.pt?download=true -O ./pretrained_weights/model.pt
For LSeg feature lifting, you should download the LSeg pretrained weights:
gdown 1FTuHY1xPUkM-5gaDtMfgCl3D0gR89WV7 -O ./pretrained_weights/demo_e200.ckpt
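As an optional sanity check (not part of the steps above), you can confirm that the CUDA build of PyTorch is active and that both weight files landed where the later commands expect them:

```bash
# Optional sanity check: verify the PyTorch CUDA build and the downloaded weight files.
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
ls -lh pretrained_weights/model.pt pretrained_weights/demo_e200.ckpt
```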
For training and multi-view novel view synthesis evaluation, we use the preprocessed RealEstate10K dataset following pixelSplat and MVSplat.
For 3D scene understanding evaluation, we use ScanNet following LSM, and Replica, for which we follow the preprocessing and evaluation protocol of Feature 3DGS.
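The README defers dataset preprocessing to the pixelSplat/MVSplat, LSM, and Feature 3DGS pipelines, so the layout below is only an illustrative assumption, not a required structure; adjust the dataset paths in the Hydra configs to wherever you actually store the data.

```bash
# Hypothetical layout (assumption, not a requirement) for the preprocessed datasets.
mkdir -p datasets/re10k      # preprocessed RealEstate10K in the pixelSplat / MVSplat format
mkdir -p datasets/scannet    # ScanNet scenes prepared following LSM
mkdir -p datasets/replica    # Replica prepared with the Feature 3DGS protocol
```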
Our pretrained checkpoints are available on Hugging Face.
- gaussian_decoder.ckpt: Gaussian Decoder trained for 2-view input.
- gaussian_decoder_multiview.ckpt: Gaussian Decoder trained for multi-view input.
- feature_decoder_lseg.ckpt: Feature Decoder trained with the LSeg model.
- feature_decoder_dinov3L.ckpt: Feature Decoder trained with the DINOv3-L model.
- feature_decoder_dinov2.ckpt: Feature Decoder trained with the DINOv2-L model.
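For a command-line download, a sketch is shown below; the repository name is a hypothetical placeholder (this README only states that the checkpoints are on Hugging Face), so substitute the actual repo ID and the checkpoint files you need.

```bash
# Sketch only: the repo ID below is a hypothetical placeholder, not the real one.
pip install -U huggingface_hub
HF_REPO_ID="your-org/c3g-checkpoints"   # substitute the actual Hugging Face repo ID
huggingface-cli download "$HF_REPO_ID" gaussian_decoder.ckpt feature_decoder_lseg.ckpt \
  --local-dir ./pretrained_weights
```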
To train the Gaussian Decoder, run the following commands.
To train with 2-view input:
python -m src.main +training=gaussian_head wandb.mode=online wandb.name="wandb_name"
To train the Gaussian Decoder when multi-view is available:
python -m src.main +training=gaussian_head_multiview wandb.mode=online wandb.name="wandb_name"
To train the Gaussian Decoder faster when multi-view is available, you can continue from the 2-view training settings:
python -m src.main +training=gaussian_head wandb.mode=online wandb.name="wandb_name" checkpointing.load="2view_checkpoint" model.decoder.low_pass_filter=0.3
If you do not want to log to wandb, just set wandb.mode=disabled.
To train the Feature Decoder, run the following commands.
Important: Update the CUDA Rasterizer
When you change the VFM model, you must update NUM_SEMANTIC_CHANNELS in the rasterizer's config file (a sketch of this edit follows the list of values below).
File: ./submodules/diff_gaussian_rasterization_w_feature_detach/cuda_rasterizer/config.h
Values:
- 512 for LSeg
- 768 for DINOv2-base
- 1024 for DINOv2-large / DINOv3-large
- 128 for VGGT-tracking
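A minimal sketch for the LSeg case (512 channels) is shown below. It assumes config.h sets the value with a line of the form #define NUM_SEMANTIC_CHANNELS <N>, and that the rasterizer submodule was installed with pip, so it is reinstalled to recompile the CUDA extension; if your setup differs, edit the file by hand and rebuild the submodule however it was originally installed.

```bash
# Sketch, assuming config.h contains a line `#define NUM_SEMANTIC_CHANNELS <N>`.
RASTERIZER=./submodules/diff_gaussian_rasterization_w_feature_detach
sed -i 's/#define NUM_SEMANTIC_CHANNELS .*/#define NUM_SEMANTIC_CHANNELS 512/' \
  "$RASTERIZER/cuda_rasterizer/config.h"   # 512 for LSeg (see the values above)
pip install -e "$RASTERIZER"               # reinstall to recompile the CUDA extension
```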
To train the Feature Decoder with various VFM models (we tested LSeg, DINOv2-base, DINOv2-large, DINOv3-large, and VGGT-tracking):
## for LSeg
python -m src.main +training=feature_head_lseg wandb.mode=online wandb.name="wandb_name" model.encoder.pretrained_weights="2view_checkpoint"
## for DINOv2-base
python -m src.main +training=feature_head_dinov2_B wandb.mode=online wandb.name="wandb_name" model.encoder.pretrained_weights="2view_checkpoint"
## for DINOv2-large
python -m src.main +training=feature_head_dinov2_L wandb.mode=online wandb.name="wandb_name" model.encoder.pretrained_weights="2view_checkpoint"
## for DINOv3-large
python -m src.main +training=feature_head_dinov3_L wandb.mode=online wandb.name="wandb_name" model.encoder.pretrained_weights="2view_checkpoint"
## for VGGT-tracking
python -m src.main +training=feature_head_vggt wandb.mode=online wandb.name="wandb_name" model.encoder.pretrained_weights="2view_checkpoint"
If you do not want to log to wandb, just set wandb.mode=disabled.
This is an example of training the Feature Decoder when multi-view input is available:
## for LSeg
python -m src.main +training=feature_head_lseg_multiview wandb.mode=online wandb.name="wandb_name" model.encoder.pretrained_weights="multiview_checkpoint"
Evaluation code for novel view synthesis on the RealEstate10K dataset when only 2 views are available:
python -m src.main +evaluation=re10k mode=test dataset/view_sampler@dataset.re10k.view_sampler=evaluation dataset.re10k.view_sampler.index_path=assets/evaluation_index_re10k.json test.save_compare=true wandb.mode=online checkpointing.load="checkpoint_path" wandb.name="wandb_name"
Evaluation code for novel view synthesis on the RealEstate10K dataset when multi-view input is available:
python -m src.main +evaluation=re10k_multiview mode=test dataset/view_sampler@dataset.re10k.view_sampler=evaluation dataset.re10k.view_sampler.index_path=assets/evaluation_index_re10k.json test.save_compare=true wandb.mode=online checkpointing.load="checkpoint_path" wandb.name="wandb_name"
Evaluation code for 3D scene understanding on the ScanNet dataset:
python -m src.main +evaluation=scannet wandb.mode=online mode=test test.save_compare=true test.pose_align_steps=1000 checkpointing.load="checkpoint_path" wandb.name="wandb_name"
If you do not want to log to wandb, just set wandb.mode=disabled.
@article{an2025c3g,
title={C3G: Learning Compact 3D Representations with 2K Gaussians},
author={An, Honggyu and Jung, Jaewoo and Kim, Mungyeom and Hong, Sunghwan and Kim, Chaehyun and Fukuda, Kazumi and Jeon, Minkyeong and Han, Jisang and Narihira, Takuya and Ko, Hyuna and others},
journal={arXiv preprint arXiv:2512.04021},
year={2025}
}
We thank the authors of VGGT and NoPoSplat for their excellent work and code, which served as the foundation for this project.