
# Prompt-based Adaptation in Large-scale Vision Models: A Survey


A curated list of papers, resources, and tools on Prompt-Based Adaptation (PA) for large-scale vision models.

Accepted to Transactions on Machine Learning Research (TMLR) 2026

Xi Xiao*, Yunbei Zhang*, Lin Zhao*, Yiyang Liu*, Xiaoying Liao, Zheda Mai, Xingjian Li, Xiao Wang, Hao Xu, Jihun Hamm, Xue Lin, Min Xu, Qifan Wang, Tianyang Wang†, Cheng Han†


*Figure: Comparison of transfer learning and prompt-based adaptation methods.*

## News

- [Feb 2026] Paper accepted to TMLR!
- [Oct 2025] Preprint available on arXiv.

## Introduction

Large vision models are typically pretrained on massive datasets and then fine-tuned for downstream tasks. Full fine-tuning is expensive and can erode pretrained knowledge. Prompt-Based Adaptation (PA) instead introduces a small number of prompt parameters while keeping the backbone frozen, efficiently steering pretrained models to new tasks.

This survey provides the first comprehensive and unified overview of PA in large vision models. We define PA as a framework covering both:

- Visual Prompting (VP): modifies the input image via pixel-space prompts.
- Visual Prompt Tuning (VPT): injects learnable tokens inside the network.

We further categorize methods by their generation mechanism into fixed, learnable, and generated prompts.

*Figure: Illustration of VPT variants: Shallow, Deep, and Generated.*

*Figure: Illustration of VP variants: Fixed, Learned, and Generated.*




## Unified Taxonomy

PA methods are categorized along two axes: where prompts are injected (the input space vs. the token space) and how they are obtained (fixed, learnable, or generated).

### Visual Prompting (VP)

VP modifies the input before tokenization or feature extraction; prompts are applied directly in pixel space:

- VP-Fixed: no learnable parameters; static boxes, points, or masks (e.g., SAM).
- VP-Learnable: optimize pixel-space overlays, frequency cues, or masks (e.g., Fourier VP, OT-VP); see the sketch after this list.
- VP-Generated: a generator network produces adaptive image-level prompts (e.g., BlackVIP).
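To make the VP-Learnable category concrete, here is a minimal PyTorch sketch of a trainable padding prompt in the spirit of Exploring Visual Prompts; the border width, initialization, and the `backbone` placeholder are illustrative assumptions, not the exact formulation of any single paper.

```python
import torch
import torch.nn as nn

class PaddingPrompt(nn.Module):
    """Learnable pixel border of width `pad` added to every input image."""

    def __init__(self, image_size: int = 224, pad: int = 30):
        super().__init__()
        # Binary mask selecting only the border region of the image.
        mask = torch.ones(1, 3, image_size, image_size)
        mask[:, :, pad:-pad, pad:-pad] = 0
        self.register_buffer("mask", mask)
        # The prompt itself: a full-size tensor, masked down to the border.
        self.prompt = nn.Parameter(torch.zeros(1, 3, image_size, image_size))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Prompt is applied in pixel space, before the frozen backbone.
        return x + self.prompt * self.mask

# Usage: train only the prompt; `backbone` is a hypothetical frozen classifier.
prompter = PaddingPrompt()
optimizer = torch.optim.Adam(prompter.parameters(), lr=0.1)
# logits = backbone(prompter(images)); loss.backward() touches only the prompt.
```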
| Title | Venue | Year | Type | Notes |
|---|---|---|---|---|
| Exploring Visual Prompts | NeurIPS | 2022 | Learnable | Foundational VP |
| Visual Prompting via Inpainting | NeurIPS | 2022 | Generated | Inpainting-based |
| BlackVIP | CVPR | 2023 | Generated | Zeroth-order black-box |
| Fourier Visual Prompting | TMLR | 2024 | Learnable | Frequency-domain cues |
| OT-VP | | 2025 | Learnable | Optimal transport alignment |
| Custom SAM | | 2023 | Learnable | Medical segmentation |

### Visual Prompt Tuning (VPT)

VPT inserts learnable tokens into frozen model layers:

- VPT-Learnable: prompt tokens are trained via gradient descent (shallow or deep injection); see the sketch after this list.
- VPT-Generated: small networks produce adaptive prompt tokens.
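A minimal sketch of the VPT-Shallow idea: learnable tokens are prepended to the patch-token sequence of a frozen ViT. The prompt count, embedding dimension, and the surrounding encoder are illustrative assumptions; VPT-Deep instead re-inserts fresh prompt tokens at every Transformer layer.

```python
import torch
import torch.nn as nn

class ShallowPromptTokens(nn.Module):
    """Prepend learnable prompt tokens to a frozen ViT's patch tokens."""

    def __init__(self, embed_dim: int = 768, num_prompts: int = 10):
        super().__init__()
        # Prompt tokens, initialized small; these are the only trained weights.
        self.prompts = nn.Parameter(torch.randn(1, num_prompts, embed_dim) * 0.02)

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        # patch_tokens: (B, N, D), the output of the frozen patch embedding.
        batch = patch_tokens.shape[0]
        prompts = self.prompts.expand(batch, -1, -1)
        # The frozen Transformer blocks then attend over (B, P + N, D).
        return torch.cat([prompts, patch_tokens], dim=1)
```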
| Title | Venue | Year | Type | Notes |
|---|---|---|---|---|
| VPT | ECCV | 2022 | Learnable | Foundational method |
| E2VPT | ICCV | 2023 | Learnable | Key-value prompts + pruning |
| LPT | ICLR | 2023 | Learnable | Long-tailed classes |
| SA2VP | AAAI | 2024 | Learnable | Spatially aligned 2D map |
| LSPT | CVPR | 2024 | Generated | Long-term spatial prompts |
| DVPT | NN | 2025 | Generated | Cross-attention generator |

## Efficiency Considerations

- VPT reduces the parameter and optimizer footprint (typically <0.5% of backbone parameters), but activation memory costs remain.
- VP-Fixed enables training-free adaptation with zero prompt-side gradients.
- VP is black-box friendly: zeroth-order optimization avoids storing parameter gradients entirely; see the sketch after this list.
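A minimal sketch of the zeroth-order idea behind black-box VP (BlackVIP, for instance, uses an SPSA-style estimator): the prompt gradient is approximated from two forward queries, so no backpropagation through, or access to, the model's weights is needed. `loss_fn` here is a hypothetical black-box query.

```python
import torch

def spsa_gradient(loss_fn, theta: torch.Tensor, c: float = 0.01) -> torch.Tensor:
    """Estimate the gradient of a black-box loss with two forward queries.

    `loss_fn` maps a prompt tensor to a scalar loss (a hypothetical
    black-box query to the deployed model); no backprop is ever run.
    """
    # Rademacher perturbation: each entry is +1 or -1.
    delta = torch.randint_like(theta, 0, 2) * 2 - 1
    loss_plus = loss_fn(theta + c * delta)
    loss_minus = loss_fn(theta - c * delta)
    # SPSA estimate: (f(t + c*d) - f(t - c*d)) / (2 * c * d_i); since
    # 1/d_i == d_i for Rademacher entries, multiply by delta elementwise.
    return (loss_plus - loss_minus) / (2 * c) * delta

# Usage sketch: theta = theta - lr * spsa_gradient(loss_fn, theta)
```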

## Foundational CV Tasks

- Segmentation: prompt-driven continual, multimodal, and few-shot segmentation (SAM-adapters, SA2VP).
- Restoration & Enhancement: degradation-aware prompts for denoising, dehazing, and deraining (PromptIR, PromptRestorer).
- Compression: prompt tokens control rate-distortion trade-offs in Transformer codecs.

## Domain-Specific Applications

| Domain | Representative Methods |
|---|---|
| Medical & Biomedical | CusSAM, Ma-SAM, DVPT for segmentation & reporting |
| Remote Sensing & Geospatial | RSPrompter, ZoRI, PHTrack |
| Robotics & Embodied AI | PointCLIP, ShapeLLM, GAPrompt |
| Industrial Inspection | ClipSAM, SAID for defect detection |
| Autonomous Driving | Severity-aware prompts for adverse conditions |
| 3D Point Clouds & LiDAR | PointLoRA, PromptDet |

## PA under Constrained Learning

| Paradigm | Description |
|---|---|
| Test-Time Adaptation | On-the-fly prompt tuning (DynaPrompt, C-TPT); see the sketch below |
| Continual Learning | Task-incremental prompt pools |
| Few-Shot / Zero-Shot | Prompt-based transfer with limited labels |
| Black-Box | Zeroth-order learning (BlackVIP) |
| Federated Learning | Decentralized, personalized prompts (FedPrompt) |
| Source-Free | Adaptation without source data (DDFP) |
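As a concrete example of the test-time adaptation row, here is a minimal sketch of entropy minimization over augmented views of a single test image, updating only the prompt parameters. `model` (frozen) and `prompter` are hypothetical placeholders, and the exact objective differs across methods such as DynaPrompt and C-TPT.

```python
import torch

def test_time_prompt_step(model, prompter, optimizer, views: torch.Tensor) -> float:
    """One adaptation step on a single test image: minimize the entropy of
    the marginal prediction over augmented views, updating only the prompt.

    `model` is a frozen classifier and `prompter` a prompt module such as
    the PaddingPrompt sketch above (both hypothetical placeholders).
    """
    logits = model(prompter(views))              # views: (V, C, H, W)
    probs = logits.softmax(dim=-1).mean(dim=0)   # marginal over the V views
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum()
    optimizer.zero_grad()
    entropy.backward()                           # gradients reach the prompt only
    optimizer.step()
    return entropy.item()
```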

## Trustworthy AI

PA contributes to robustness, fairness, and privacy:

- Robust prompts improve adversarial resistance.
- Fairness prompts mitigate demographic bias.
- Privacy prompts protect sensitive visual data.

## Foundational Analysis & Theory

The survey examines behavioral evidence and theoretical underpinnings of how prompts steer frozen representations, including attention pattern analysis, representation geometry, and expressivity bounds.


## Discussion & Challenges

Key open challenges identified in the survey:

  1. Safety Alignment: Aligning prompt interventions with human values and preventing malicious use.
  2. Training Overhead & Stability: Reducing hyperparameter search costs and seed sensitivity.
  3. Inference Latency: Mitigating added memory/compute from prompt components.
  4. Real-World Evaluation: Moving beyond academic benchmarks to complex, distribution-shifting scenarios.

## Related Surveys

| Title | Venue | Year |
|---|---|---|
| Prompt Learning in Computer Vision: A Survey | FITEE | 2024 |
| Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models | arXiv | 2024 |
| Prompt Engineering on Vision-Language Models | arXiv | 2023 |
| Visual Prompting in MLLMs | arXiv | 2024 |

## Citation

If you find this survey useful in your research, please consider citing our paper:

```bibtex
@article{xiao2025prompt,
  title={Prompt-based Adaptation in Large-scale Vision Models: A Survey},
  author={Xiao, Xi and Zhang, Yunbei and Zhao, Lin and Liu, Yiyang and Liao, Xiaoying and Mai, Zheda and Li, Xingjian and Wang, Xiao and Xu, Hao and Hamm, Jihun and Lin, Xue and Xu, Min and Wang, Qifan and Wang, Tianyang and Han, Cheng},
  journal={Transactions on Machine Learning Research (TMLR)},
  year={2026},
  url={https://openreview.net/forum?id=UwtXDttgsE}
}
```

## Contributing

We welcome new papers, implementations, and corrections! Please tag contributions with one of the taxonomy categories below and note the application domain (e.g., Medical, 3D, Remote Sensing):

- VP-Fixed / VP-Learnable / VP-Generated
- VPT-Learnable / VPT-Generated
