[NAF: Zero-Shot Feature Upsampling via Neighborhood Attention Filtering.]
Loick Chambon, Paul Couairon, Eloi Zablocki, Alexandre Boulch, Nicolas Thome, Matthieu Cord.
Valeo.ai, Sorbonne University, CNRS.
Upsample features from any Vision Foundation Model (VFM), zero-shot, to high resolution with NAF (Neighborhood Attention Filtering), and obtain state-of-the-art results on multiple downstream tasks across VFM families, model sizes, and datasets:
| Method | Semantic Seg. | Depth Est. | Open Vocab. | Video Prop. | ⚡ FPS | 📏 Max Ratio |
|---|---|---|---|---|---|---|
| FeatUp | 4th | 4th | 3rd | 4th | 🥈 | 🥈 |
| JAFAR | 🥈 | 3rd | 🥈 | 🥇 | 3rd | 4th |
| AnyUp | 3rd | 🥈 | 4th | 3rd | 3rd | 3rd |
| NAF (ours) | 🥇 | 🥇 | 🥇 | 🥈 | 🥇 | 🥇 |
🏆 Performance Summary: Ranks (🥇 First · 🥈 Second)
Three simple steps:
- Select any Vision Foundation Model (DINOv3, DINOv2, RADIO, FRANCA, PE-CORE, CLIP, SAM, etc.)
- Choose your target resolution (up to 2K)
- Upsample features with NAF — zero-shot, no retraining needed
Why it works: NAF combines classical filtering theory with modern attention mechanisms, learning adaptive kernels through Fourier space transformations.
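For intuition, here is a minimal sketch of attention-based filtering in plain PyTorch. It is not the official NAF implementation: it attends densely over the whole low-resolution map instead of using NATTEN's neighborhood attention, it omits RoPE, the `guidance` tensor simply stands in for an encoding of the high-resolution image, and the function name is ours.

```python
import torch

def attention_filter_upsample(guidance: torch.Tensor, lr_feats: torch.Tensor) -> torch.Tensor:
    """Upsample lr_feats (B, C, h, w) to the spatial size of guidance (B, C, H, W).

    Each high-resolution pixel builds a query from the guidance encoding, attends to the
    low-resolution features, and the softmax weights act as an adaptive filtering kernel.
    """
    B, C, H, W = guidance.shape
    q = guidance.flatten(2).transpose(1, 2)                        # (B, H*W, C) queries
    k = lr_feats.flatten(2).transpose(1, 2)                        # (B, h*w, C) keys
    v = k                                                          # values are the low-res features
    attn = torch.softmax(q @ k.transpose(1, 2) / C**0.5, dim=-1)   # adaptive per-pixel weights
    out = attn @ v                                                 # (B, H*W, C)
    return out.transpose(1, 2).reshape(B, C, H, W)

# Toy example with random tensors.
guidance = torch.randn(1, 64, 128, 128)   # stands in for an encoding of the high-res image
lr_feats = torch.randn(1, 64, 16, 16)     # low-resolution VFM features
print(attention_filter_upsample(guidance, lr_feats).shape)  # torch.Size([1, 64, 128, 128])
```

In NAF, restricting this attention to a local neighborhood around each query (via NATTEN) is what keeps the operation efficient at high output resolutions.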
Usage: to upsample features from any VFM to any resolution, simply run the following code (note that natten must be installed; see INSTALL.md):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the pretrained NAF upsampler from torch.hub
naf = torch.hub.load("valeoai/NAF", "naf", pretrained=True, device=device)
naf.eval()

# High-resolution image (B, 3, H, W)
image = ...
# Low-resolution features (B, C, h, w)
lr_features = ...
# Desired output size (H_o, W_o)
target_size = ...

# High-resolution features (B, C, H_o, W_o)
upsampled = naf(image, lr_features, target_size)
```
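As a quick sanity check, the snippet below continues from the code above and runs the loaded `naf` module on random tensors; the concrete shapes (e.g. the 384-channel ViT-S feature dimension) are only illustrative.

```python
import torch

# Random tensors, shapes only; replace with a real image and real VFM features.
B, C = 1, 384                                            # e.g. a ViT-S feature dimension
image = torch.rand(B, 3, 448, 448, device=device)        # high-resolution guidance image
lr_features = torch.randn(B, C, 32, 32, device=device)   # low-resolution VFM features
target_size = (448, 448)                                 # desired output resolution (H_o, W_o)

with torch.no_grad():
    upsampled = naf(image, lr_features, target_size)
print(upsampled.shape)                                   # expected: torch.Size([1, 384, 448, 448])
```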
To do:
- Release trained checkpoints for NAF++.

News:
- [2025-11-31] Added a HuggingFace demo.
- [2025-11-25] NAF has been uploaded to arXiv.
- [2025-11-24] The NAF code has been publicly released.
Vision Foundation Models produce spatially downsampled features, which makes pixel-level tasks challenging.
❌ Traditional upsampling methods:
- Classical filters – fast, generic, but fixed (bilinear, bicubic, joint bilateral, guided)
- Learnable VFM-specific upsamplers – accurate, but need retraining (FeatUp, LiFT, JAFAR, LoftUp)
✅ NAF (Neighborhood Attention Filtering):
- Learns adaptive spatial-and-content weights using Cross-Scale Neighborhood Attention + RoPE
- Works zero-shot for any VFM
- Outperforms existing upsamplers on multiple downstream tasks
- Efficient: scales up to 2K features, ~18 FPS for intermediate resolutions
- Also effective for image restoration
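To make the fixed-versus-adaptive trade-off above concrete, the classical baselines in the first list amount to a single content-independent call; the sketch below uses illustrative shapes.

```python
import torch
import torch.nn.functional as F

lr_features = torch.randn(1, 384, 32, 32)   # low-resolution VFM features

# Fixed classical filter: the same bilinear kernel is applied everywhere, regardless of
# image content, so features blur across object boundaries. NAF instead predicts a
# different kernel per output pixel from the high-resolution image.
hr_bilinear = F.interpolate(lr_features, size=(448, 448), mode="bilinear", align_corners=False)
print(hr_bilinear.shape)                    # torch.Size([1, 384, 448, 448])
```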
Vision Foundation Models (VFMs) extract spatially downsampled representations, posing challenges for pixel-level tasks. Existing upsampling approaches face a fundamental trade-off: classical filters are fast and broadly applicable but rely on fixed forms, while modern upsamplers achieve superior accuracy through learnable, VFM-specific forms at the cost of retraining for each VFM. We introduce Neighborhood Attention Filtering (NAF), which bridges this gap by learning adaptive spatial-and-content weights through Cross-Scale Neighborhood Attention and Rotary Position Embeddings (RoPE), guided solely by the high-resolution input image. NAF operates zero-shot: it upsamples features from any VFM without retraining, making it the first VFM-agnostic architecture to outperform VFM-specific upsamplers and achieve state-of-the-art performance across multiple downstream tasks. It maintains high efficiency, scaling to 2K feature maps and reconstructing intermediate-resolution maps at 18 FPS. Beyond feature upsampling, NAF demonstrates strong performance on image restoration, highlighting its versatility.
We provide Jupyter notebooks to easily run NAF for inference and visualize attention maps:
- Inference: notebooks/inference.ipynb runs the NAF upsampler on any VFM, seamlessly upsampling its low-resolution features to high resolution.
- Attention Maps: notebooks/attention_maps.ipynb visualizes NAF's neighborhood attention maps: given a query point and a kernel size, it computes and displays the corresponding attention map.
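As a stand-alone illustration of what such a map looks like (a hypothetical sketch, not the notebook's code), the snippet below computes one query pixel's softmax attention over a k x k neighborhood of random features and plots it.

```python
import torch
import matplotlib.pyplot as plt

C, h, w, k = 64, 32, 32, 7                  # channels, feature height/width, kernel size
feats = torch.randn(C, h, w)                # stand-in for low-resolution features
qy, qx = 16, 16                             # query location on the feature grid
half = k // 2                               # neighborhood radius

patch = feats[:, qy - half:qy + half + 1, qx - half:qx + half + 1]   # (C, k, k) neighborhood
query = feats[:, qy, qx]                                             # (C,) query descriptor
scores = (patch * query[:, None, None]).sum(dim=0) / C**0.5          # (k, k) similarity scores
attn = torch.softmax(scores.flatten(), dim=0).reshape(k, k)          # normalized attention map

plt.imshow(attn.numpy(), cmap="viridis")
plt.title("Neighborhood attention around the query point")
plt.colorbar()
plt.show()
```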
See the docs folder for detailed setup instructions covering installation, datasets, training, and evaluation.
If you want to retrain NAF, training takes less than 2 hours and consumes less than 8 GB of GPU memory on a single NVIDIA A100. Otherwise, we provide pretrained weights for direct evaluation. We can also share evaluation logs upon request.
Many thanks to these excellent open source projects:
- https://github.com/SHI-Labs/NATTEN
- https://github.com/PaulCouairon/JAFAR
- https://github.com/mhamilton723/FeatUp
- https://github.com/saksham-s/lift/tree/main
- https://github.com/mc-lan/ProxyCLIP
To structure our code we used:
Do not hesitate to check out and support our previous feature upsampling work, JAFAR: https://github.com/PaulCouairon/JAFAR
If this work is helpful for your research, please consider citing it with the BibTeX entry below and starring this repository. Feel free to open an issue for any questions.
```bibtex
@misc{chambon2025nafzeroshotfeatureupsampling,
  title={NAF: Zero-Shot Feature Upsampling via Neighborhood Attention Filtering},
  author={Loick Chambon and Paul Couairon and Eloi Zablocki and Alexandre Boulch and Nicolas Thome and Matthieu Cord},
  year={2025},
  url={https://arxiv.org/abs/2511.18452},
}
```




