ICCV 2025 (Oral)
Haiwen Huang, Anpei Chen, Volodymyr Havrylov, Andreas Geiger, Dan Zhang
TL;DR: LoftUp achieves the strongest feature upsampling performance at a speed comparable to bilinear upsampling.
- Install
- Inference with pretrained upsamplers
- Evaluation on downstream tasks
- Training LoftUp upsamplers
- Citation
## Install

In general, LoftUp runs with most recent PyTorch environments. We encourage users to try LoftUp in their existing environment first.

We also provide two YAML files for installation. To use one of them, simply run:

```bash
conda env create -f environment_cuda11.yaml
```

or

```bash
conda env create -f environment.yaml
```

## Inference with pretrained upsamplers

All pre-trained upsamplers are available on 🤗 Hugging Face: https://huggingface.co/models?search=loftup.
We provide example code for using LoftUp in `example_usage.py`. Currently, we provide the following pretrained upsamplers:
| Backbone Name | Featurizer Class | HF hub | Torch Hub Repo | Torch Hub Name |
|---|---|---|---|---|
| DINOv2 S/14 | dinov2 | haiwen/loftup-dinov2s | andrehuang/loftup | loftup_dinov2s |
| DINOv2 S/14 + Reg | dinov2s_reg | haiwen/loftup-dinov2s_reg | andrehuang/loftup | loftup_dinov2s_reg |
| DINOv2 B/14 | dinov2b | haiwen/loftup-dinov2b | andrehuang/loftup | loftup_dinov2b |
| DINOv2 B/14 + Reg | dinov2b_reg | haiwen/loftup-dinov2b_reg | andrehuang/loftup | loftup_dinov2b_reg |
| CLIP ViT B/16 | clip | haiwen/loftup-clip | andrehuang/loftup | loftup_clip |
| SigLIP ViT B/16 | siglip | haiwen/loftup-siglip | andrehuang/loftup | loftup_siglip |
| SigLIP2 ViT B/16 | siglip2 | haiwen/loftup-siglip2 | andrehuang/loftup | loftup_siglip2 |
To use Torch Hub checkpoints, simply run:

```python
upsampler = torch.hub.load('andrehuang/loftup', model_torch_hub_name, pretrained=True)
```

For example, `upsampler = torch.hub.load('andrehuang/loftup', 'loftup_dinov2s', pretrained=True)`.
The upsampler class is `UpsamplerwithChannelNorm`.
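For a quick end-to-end check, here is a minimal sketch that loads the DINOv2 S/14 backbone and its LoftUp upsampler from Torch Hub and upsamples the patch features of a single image. The preprocessing, the feature reshaping, and the upsampler call signature (low-resolution features plus the high-resolution input image) are assumptions of this sketch; see `example_usage.py` for the reference usage.

```python
import torch
import torchvision.transforms as T
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# DINOv2 S/14 backbone and the matching LoftUp upsampler from Torch Hub
backbone = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14').to(device).eval()
upsampler = torch.hub.load('andrehuang/loftup', 'loftup_dinov2s', pretrained=True).to(device).eval()

# Standard ImageNet preprocessing at 224x224 (assumed here)
preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
img = preprocess(Image.open('example.jpg').convert('RGB')).unsqueeze(0).to(device)

with torch.no_grad():
    # Low-resolution patch tokens: (1, 256, C) -> (1, C, 16, 16) for a 224x224 input
    tokens = backbone.forward_features(img)['x_norm_patchtokens']
    lr_feats = tokens.permute(0, 2, 1).reshape(1, -1, 16, 16)
    # LoftUp upsamples the features guided by the high-resolution input image
    # (assumed call signature; check example_usage.py if it differs)
    hr_feats = upsampler(lr_feats, img)

print(lr_feats.shape, hr_feats.shape)
```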
## Evaluation on downstream tasks

See Preparing Datasets for Evaluation.
For semantic segmentation, our implementation is adapted from FeatUp. You can use `eval_seg.py` by running:

```bash
python eval_seg.py ++upsampler_path=/path/to/your/upsampler
```

You can also configure other hyper-parameters such as `output_dir` and the dataset directory. The config file is `configs/eval_seg.yaml`.
For video object segmentation on DAVIS, our code is modified from the implementation in LiFT. Specifically, we first extract segmentation results by running:

```bash
python eval_davis.py --dataroot your_davis_data_dir --model_type "dinov2" --output_dir your_output_dir --imsize 224 --upsampler_path=your_upsampler_path
```

Then run the following to get the evaluation results:

```bash
python davis2017-evaluation/evaluation_method.py --davis_path /your_davis_data_dir --task semi-supervised --results_path your_output_dir/davis_vidseg_224 --imsize 224
```

For interactive segmentation, please check out iSegProbe.
For open-vocabulary segmentation, please check out ProxyCLIP.
For depth and normal estimation, please check out Probe3D.
## Training LoftUp upsamplers

This repository also contains scripts for training LoftUp upsamplers. Training is done in two stages.

Stage 1 training (`train_loftup_stage1.py`) trains the upsampler to convert low-resolution features into high-resolution features using a reconstruction loss.
Example training command:
```bash
python train_loftup_stage1.py ++dataset="sa1b" ++epochs=1 ++batch_size=2 ++num_gpus=4 ++model_type="dinov2" ++pytorch_data_dir='datasets' ++upsampler_type="loftup" ++sam_mask_alpha=0.8 ++load_size=224 ++upsample_size=224 ++tv_weight=0.001 ++clamp_featup=True
```

Stage 2 training (`train_loftup_stage2.py`) fine-tunes the Stage 1 upsampler with high-resolution supervision for improved quality.
Example training command:
```bash
python train_loftup_stage2.py ++dataset="sa1b" ++epochs=1 ++hr_res=896 ++batch_size=2 ++consistency_method="bilinear" ++model_type="dinov2" ++num_gpus=4 ++affinity_loss=True ++pytorch_data_dir='datasets' ++pretrained_upsampler="path/to/stage1_checkpoint.ckpt" ++upsampler_type="loftup" ++sam_mask_hr_alpha=0.5 ++sam_mask_reg=0.0 ++lr=1e-3 ++use_featup=False ++aug_size ++n_jitters=2
```

Both training scripts use Hydra for configuration management. Configuration files are located in `configs/`:
- `configs/train_loftup_stage1.yaml` - Stage 1 configuration
- `configs/train_loftup_stage2.yaml` - Stage 2 configuration
Key configuration parameters:
- `model_type`: Feature extractor type (e.g., "dinov2", "clip")
- `upsampler_type`: Type of upsampler to train (e.g., "loftup")
- `batch_size`: Training batch size
- `epochs`: Number of training epochs
- `lr`: Learning rate
- `load_size`: Input image size for feature extraction
- `upsample_size`: Target size for upsampled features
- `n_jitters`: Number of jittering augmentations per training step
- `tv_weight`: Weight for the total variation loss
- `sam_mask_alpha`: Weight for SAM mask adjustment (Stage 1)
- `sam_mask_hr_alpha`: Weight for SAM mask adjustment (Stage 2)
For more details, see the configuration files in configs/ and the training scripts themselves.
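If you want to inspect or tweak these settings programmatically, the same configuration files can be loaded with Hydra's compose API. The snippet below is a minimal sketch (assuming Hydra >= 1.2 and that it is run from the repository root); it is not part of the training scripts themselves.

```python
from hydra import compose, initialize
from omegaconf import OmegaConf

# Load configs/train_loftup_stage1.yaml with an override, mirroring the
# ++key=value syntax used on the command line
with initialize(config_path="configs", version_base=None):
    cfg = compose(config_name="train_loftup_stage1", overrides=["++batch_size=4"])

print(OmegaConf.to_yaml(cfg))   # full resolved configuration
print(cfg.batch_size)           # -> 4
```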
## Citation

If you find our work helpful, please cite:
```bibtex
@misc{huang2025loftuplearningcoordinatebasedfeature,
      title={LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models},
      author={Haiwen Huang and Anpei Chen and Volodymyr Havrylov and Andreas Geiger and Dan Zhang},
      year={2025},
      eprint={2504.14032},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2504.14032},
}
```



