Prompt-based Adaptation in Large-scale Vision Models: A Survey

A curated list of papers, resources and tools on Prompt-Based Adaptation (PA) for large-scale vision models.

Accepted to Transactions on Machine Learning Research (TMLR) 2026

Xi Xiao*, Yunbei Zhang*, Lin Zhao*, Yiyang Liu*, Xiaoying Liao, Zheda Mai, Xingjian Li, Xiao Wang, Hao Xu, Jihun Hamm, Xue Lin, Min Xu, Qifan Wang, Tianyang Wang†, Cheng Han†

News

[Feb 2026] Paper accepted to TMLR!
[Oct 2025] Preprint available on arXiv.

Introduction

Large vision models are typically pretrained on massive datasets and then finetuned for downstream tasks. Full finetuning is expensive and may erode pretrained knowledge. Prompt-Based Adaptation (PA) introduces small prompt parameters while freezing the backbone, efficiently steering pretrained models to new tasks.

This survey provides the first comprehensive and unified overview of PA in large vision models. We define PA as a framework covering both:

Visual Prompting (VP): modifies the input image via pixel-space prompts.
Visual Prompt Tuning (VPT): injects learnable tokens inside the network.

We further categorize methods by their generation mechanism into fixed, learnable, and generated prompts.

Illustration of VPT variants: Shallow, Deep, and Generated.

Illustration of VP variants: Fixed, Learned, and Generated.

Unified Taxonomy

PA methods are categorized by where prompts are injected (input vs. token space) and how they are obtained (fixed, learnable, generated).
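The two axes above can be written down as a small lookup, which may help when labeling a paper. This is only a sketch: the names `Injection`, `Source`, `VALID`, and `category` are hypothetical and not part of the survey, and the set of valid combinations assumes the five categories this list uses (VP-Fixed, VP-Learnable, VP-Generated, VPT-Learnable, VPT-Generated).

```python
from enum import Enum


class Injection(Enum):
    """Where the prompt is injected (hypothetical names)."""
    INPUT = "VP"    # pixel/input space
    TOKEN = "VPT"   # token space inside the network


class Source(Enum):
    """How the prompt is obtained (hypothetical names)."""
    FIXED = "Fixed"
    LEARNABLE = "Learnable"
    GENERATED = "Generated"


# Assumption: only the five categories used in this list are valid;
# a token-space prompt with no learnable or generated part is not one of them.
VALID = {
    (Injection.INPUT, Source.FIXED),
    (Injection.INPUT, Source.LEARNABLE),
    (Injection.INPUT, Source.GENERATED),
    (Injection.TOKEN, Source.LEARNABLE),
    (Injection.TOKEN, Source.GENERATED),
}


def category(injection, source):
    """Return a label such as 'VP-Generated', or raise for unused combinations."""
    if (injection, source) not in VALID:
        raise ValueError(f"{injection.value}-{source.value} is not a category in this list")
    return f"{injection.value}-{source.value}"


print(category(Injection.INPUT, Source.GENERATED))
```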

Visual Prompting (VP)

VP modifies the input before tokenization/feature extraction. Prompts are applied directly to pixels:

VP-Fixed: no learnable parameters; prompts are static boxes, points, or masks (e.g., SAM).
VP-Learnable: optimize pixel-space overlays, frequency cues, or masks (e.g., Fourier VP, OT-VP).
VP-Generated: a generator produces adaptive image-level prompts (e.g., BlackVIP).
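As a rough illustration of a learnable pixel-space prompt, the sketch below adds a border-shaped perturbation to an image before it reaches the frozen model. It assumes a grayscale image stored as nested lists with values in [0, 1]; the function name `apply_border_prompt` and the border layout are illustrative choices, not the formulation of any specific paper listed here.

```python
def apply_border_prompt(image, prompt, pad):
    """Add `prompt` to `image` on a border of width `pad`; leave the interior untouched.

    image, prompt: H x W nested lists of floats in [0, 1] (assumed layout).
    Only the border entries of `prompt` would be trained; the model stays frozen.
    """
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]  # copy so the input image is not modified
    for i in range(h):
        for j in range(w):
            on_border = i < pad or i >= h - pad or j < pad or j >= w - pad
            if on_border:
                # clamp to keep the prompted image a valid input
                out[i][j] = min(1.0, max(0.0, image[i][j] + prompt[i][j]))
    return out


image = [[0.5] * 4 for _ in range(4)]
prompt = [[0.25] * 4 for _ in range(4)]
prompted = apply_border_prompt(image, prompt, pad=1)
```

In this toy case the 2 x 2 interior keeps its original values and only the one-pixel frame changes, which is the sense in which the prompt is applied "directly to pixels".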

Title	Venue	Year	Type	Notes
Exploring Visual Prompts	NeurIPS	2022	Learnable	Foundational VP
Visual Prompting via Inpainting	NeurIPS	2022	Generated	Inpainting-based
BlackVIP	CVPR	2023	Generated	Zeroth-order black-box
Fourier Visual Prompting	TMLR	2024	Learnable	Frequency-domain cues
OT-VP	-	2025	Learnable	Optimal transport alignment
Custom SAM	-	2023	Learnable	Medical segmentation

Visual Prompt Tuning (VPT)

VPT inserts learnable tokens into frozen model layers:

VPT-Learnable: prompt tokens are trained via gradient descent (shallow or deep injection).
VPT-Generated: small networks produce adaptive prompt tokens.
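The difference between shallow and deep injection can be sketched with a toy forward pass. This is a simplified illustration under stated assumptions: tokens are plain lists of floats, each frozen layer is an arbitrary callable, the class token is omitted, and `vpt_forward` is a hypothetical name rather than an API from any VPT codebase.

```python
def vpt_forward(layers, patch_tokens, prompts, deep=False):
    """Run frozen `layers` over patch tokens with prompt tokens prepended.

    layers: list of callables, each mapping a token list to a token list (frozen).
    prompts: list of prompt-token lists; shallow uses prompts[0] only,
             deep uses prompts[i] as fresh input prompts for layer i.
    """
    n_prompt = len(prompts[0])
    tokens = prompts[0] + patch_tokens  # shallow: prompts enter once, at the input
    for idx, layer in enumerate(layers):
        if deep and idx > 0:
            # deep: discard the previous layer's prompt outputs, insert new prompts
            tokens = prompts[idx] + tokens[n_prompt:]
        tokens = layer(tokens)
    return tokens


def add_one(tokens):
    """Stand-in for a frozen Transformer block (toy assumption)."""
    return [[x + 1.0 for x in tok] for tok in tokens]


patches = [[0.0], [0.0]]
shallow = vpt_forward([add_one, add_one], patches, [[[10.0]]])
deep = vpt_forward([add_one, add_one], patches, [[[10.0]], [[20.0]]], deep=True)
```

In the shallow case the single prompt is carried through both layers; in the deep case the second layer sees its own prompt instead of the first layer's prompt output.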

Title	Venue	Year	Type	Notes
VPT	ECCV	2022	Learnable	Foundational method
E2VPT	ICCV	2023	Learnable	Key-value prompts + pruning
LPT	ICLR	2023	Learnable	Long-tailed classes
SA2VP	AAAI	2024	Learnable	Spatially aligned 2D map
LSPT	CVPR	2024	Generated	Long-term spatial prompts
DVPT	NN	2025	Generated	Cross-attention generator

Efficiency Considerations

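VPT shrinks the trainable-parameter and optimizer-state footprint (prompts are <0.5% of backbone parameters), but activation memory remains because gradients must still be backpropagated through the frozen backbone to reach the prompts.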
VP-Fixed enables training-free adaptation with zero prompt-side gradients.
VP is black-box friendly: zeroth-order optimization avoids storing parameter gradients entirely.
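The parameter-footprint point can be checked with back-of-the-envelope arithmetic. The numbers below are illustrative assumptions, roughly the size of a ViT-B/16 backbone (about 86M parameters, hidden size 768, 12 layers) with 10 prompt tokens per layer; the helper `prompt_param_fraction` is hypothetical and ignores the task head.

```python
def prompt_param_fraction(num_prompts, hidden_dim, num_layers, backbone_params, deep=True):
    """Count prompt parameters and their share of the frozen backbone.

    Deep injection learns a separate set of prompts per layer;
    shallow injection learns one set at the input only.
    """
    layers_with_prompts = num_layers if deep else 1
    prompt_params = num_prompts * hidden_dim * layers_with_prompts
    return prompt_params, prompt_params / backbone_params


# Assumed, approximate ViT-B/16-like sizes (not taken from the survey).
params, fraction = prompt_param_fraction(
    num_prompts=10, hidden_dim=768, num_layers=12, backbone_params=86_000_000
)
print(params, f"{fraction:.4%}")
```

Under these assumptions the prompts account for 92,160 parameters, well below the 0.5% mark; the fraction grows linearly with the number of prompt tokens, so longer prompts should be budgeted accordingly.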

Foundational CV Tasks

Segmentation: Prompt-driven continual, multimodal, and few-shot segmentation (SAM-adapters, SA2VP).
Restoration & Enhancement: Degradation-aware prompts for denoising, dehazing, deraining (PromptIR, PromptRestorer).
Compression: Prompt tokens control rate-distortion trade-offs in Transformer codecs.

Domain-Specific Applications

Domain	Representative Methods
Medical & Biomedical	CusSAM, Ma-SAM, DVPT for segmentation & reporting
Remote Sensing & Geospatial	RSPrompter, ZoRI, PHTrack
Robotics & Embodied AI	PointCLIP, ShapeLLM, GAPrompt
Industrial Inspection	ClipSAM, SAID for defect detection
Autonomous Driving	Severity-aware prompts for adverse conditions
3D Point Clouds & LiDAR	PointLoRA, PromptDet

PA under Constrained Learning

Paradigm	Description
Test-Time Adaptation	On-the-fly prompt tuning (DynaPrompt, C-TPT)
Continual Learning	Task-incremental prompt pools
Few-Shot / Zero-Shot	Prompt-based transfer with limited labels
Black-Box	Zeroth-order learning (BlackVIP)
Federated Learning	Decentralized personalized prompts (FedPrompt)
Source-Free	Adaptation without source data (DDFP)
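For the black-box setting, zeroth-order learning replaces backpropagation with loss queries. The sketch below shows a plain SPSA-style two-point gradient estimate on a toy quadratic loss. It is an assumption-laden illustration of the general idea, not BlackVIP's exact procedure; `spsa_gradient`, `optimize`, and `toy_loss` are hypothetical names.

```python
import random


def spsa_gradient(loss_fn, params, c, rng):
    """Estimate the gradient from two loss queries; no model gradients are needed."""
    delta = [rng.choice((-1.0, 1.0)) for _ in params]  # random +/-1 perturbation
    plus = [p + c * d for p, d in zip(params, delta)]
    minus = [p - c * d for p, d in zip(params, delta)]
    diff = loss_fn(plus) - loss_fn(minus)
    return [diff / (2.0 * c * d) for d in delta]


def optimize(loss_fn, params, steps=200, lr=0.1, c=0.01, seed=0):
    """Plain gradient descent driven by the SPSA estimate."""
    rng = random.Random(seed)
    for _ in range(steps):
        grad = spsa_gradient(loss_fn, params, c, rng)
        params = [p - lr * g for p, g in zip(params, grad)]
    return params


def toy_loss(prompt):
    """Stand-in for querying a black-box model with a prompted input."""
    target = [1.0, -2.0]
    return sum((p - t) ** 2 for p, t in zip(prompt, target))


learned = optimize(toy_loss, [0.0, 0.0])
```

Each step costs two forward queries regardless of the prompt dimension, which is why this family of methods suits models exposed only through an inference API.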

Trustworthy AI

PA contributes to robustness, fairness, and privacy:

Robust prompts improve adversarial resistance.
Fairness prompts mitigate demographic bias.
Privacy prompts protect sensitive visual data.

Foundational Analysis & Theory

The survey examines behavioral evidence and theoretical underpinnings of how prompts steer frozen representations, including attention pattern analysis, representation geometry, and expressivity bounds.

Discussion & Challenges

Key open challenges identified in the survey:

Safety Alignment: Aligning prompt interventions with human values and preventing malicious use.
Training Overhead & Stability: Reducing hyperparameter search costs and seed sensitivity.
Inference Latency: Mitigating added memory/compute from prompt components.
Real-World Evaluation: Moving beyond academic benchmarks to complex, distribution-shifting scenarios.

Related Surveys

Title	Venue	Year
Prompt Learning in Computer Vision: A Survey	FITEE	2024
Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models	arXiv	2024
Prompt Engineering on Vision-Language Models	arXiv	2023
Visual Prompting in MLLMs	arXiv	2024

Citation

If you find this survey useful in your research, please consider citing our paper:

@article{xiao2025prompt,
  title={Prompt-based Adaptation in Large-scale Vision Models: A Survey},
  author={Xiao, Xi and Zhang, Yunbei and Zhao, Lin and Liu, Yiyang and Liao, Xiaoying and Mai, Zheda and Li, Xingjian and Wang, Xiao and Xu, Hao and Hamm, Jihun and Lin, Xue and Xu, Min and Wang, Qifan and Wang, Tianyang and Han, Cheng},
  journal={Transactions on Machine Learning Research (TMLR)},
  year={2026},
  url={https://openreview.net/forum?id=UwtXDttgsE}
}

Contributing

We welcome new papers, implementations, and corrections! Please categorize contributions under:

VP-Fixed / VP-Learnable / VP-Generated
VPT-Learnable / VPT-Generated
Please also note the application domain (e.g., Medical, 3D, Remote Sensing).
