A curated list of papers, resources and tools on Prompt-Based Adaptation (PA) for large-scale vision models.
Accepted to Transactions on Machine Learning Research (TMLR) 2026
Xi Xiao*, Yunbei Zhang*, Lin Zhao*, Yiyang Liu*, Xiaoying Liao, Zheda Mai, Xingjian Li,
Xiao Wang, Hao Xu, Jihun Hamm, Xue Lin, Min Xu, Qifan Wang, Tianyang Wang†, Cheng Han†
- [Feb 2026] Paper accepted to TMLR!
- [Oct 2025] Preprint available on arXiv.
Large vision models are typically pretrained on massive datasets and then finetuned for downstream tasks. Full finetuning is expensive and may erode pretrained knowledge. Prompt-Based Adaptation (PA) introduces small prompt parameters while freezing the backbone, efficiently steering pretrained models to new tasks.
This survey provides the first comprehensive and unified overview of PA in large vision models. We define PA as a framework covering both:
- Visual Prompting (VP): modifies the input image via pixel-space prompts.
- Visual Prompt Tuning (VPT): injects learnable tokens inside the network.
We further categorize methods by their generation mechanism into fixed, learnable, and generated prompts.
Illustration of VPT variants: Shallow, Deep, and Generated.
Illustration of VP variants: Fixed, Learned, and Generated.
- Unified Taxonomy
- Foundational CV Tasks
- Domain-Specific Applications
- PA under Constrained Learning
- Trustworthy AI
- Foundational Analysis & Theory
- Discussion & Challenges
- Related Surveys
- Citation
- Contributing
PA methods are categorized by where prompts are injected (input vs. token space) and how they are obtained (fixed, learnable, generated).
VP modifies the input before tokenization/feature extraction. Prompts are applied directly to pixels:
- VP-Fixed: no learnable parameters — static boxes, points, or masks (e.g., SAM).
- VP-Learnable: optimize pixel-space overlays, frequency cues, or masks (e.g., Fourier VP, OT-VP).
- VP-Generated: a generator produces adaptive image-level prompts (e.g., BlackVIP).
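As a rough illustration of the VP-Learnable case above, the sketch below adds a border-shaped overlay to an image before it would reach a frozen model. It uses plain nested lists instead of a tensor library, and `make_border_mask`, `apply_prompt`, the border width, and the constant prompt values are all hypothetical choices for this example, not taken from any listed paper. In a real method the prompt values would be optimized against a task loss.

```python
from typing import List

Image = List[List[float]]  # H x W grayscale image with values in [0, 1]


def make_border_mask(height: int, width: int, pad: int) -> List[List[int]]:
    """Return a mask that is 1 on a border of thickness `pad` and 0 inside."""
    return [
        [
            1 if (r < pad or r >= height - pad or c < pad or c >= width - pad) else 0
            for c in range(width)
        ]
        for r in range(height)
    ]


def apply_prompt(image: Image, prompt: Image, mask: List[List[int]]) -> Image:
    """Add the prompt to the image where mask == 1, then clip to [0, 1]."""
    return [
        [
            min(1.0, max(0.0, pixel + m * delta))
            for pixel, delta, m in zip(img_row, prompt_row, mask_row)
        ]
        for img_row, prompt_row, mask_row in zip(image, prompt, mask)
    ]


# A 4x4 mid-gray image and a prompt with a constant offset (illustrative values).
image = [[0.5] * 4 for _ in range(4)]
prompt = [[0.25] * 4 for _ in range(4)]  # stands in for trained prompt pixels
mask = make_border_mask(4, 4, pad=1)
prompted = apply_prompt(image, prompt, mask)  # only the border pixels change
```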
| Title | Venue | Year | Type | Notes |
|---|---|---|---|---|
| Exploring Visual Prompts | NeurIPS | 2022 | Learnable | Foundational VP |
| Visual Prompting via Inpainting | NeurIPS | 2022 | Generated | Inpainting-based |
| BlackVIP | CVPR | 2023 | Generated | Zeroth-order black-box |
| Fourier Visual Prompting | TMLR | 2024 | Learnable | Frequency-domain cues |
| OT-VP | | 2025 | Learnable | Optimal transport alignment |
| Custom SAM | | 2023 | Learnable | Medical segmentation |
VPT inserts learnable tokens into frozen model layers:
- VPT-Learnable: prompt tokens are trained via gradient descent (shallow or deep injection).
- VPT-Generated: small networks produce adaptive prompt tokens.
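The difference between shallow and deep injection can be sketched with a toy model. In the example below, `mixing_layer` is a hypothetical stand-in for a frozen Transformer block (it only blends each token with the sequence mean), and the class token and classification head are omitted for brevity. It is a minimal sketch of where prompt tokens enter the sequence, not the implementation from any listed paper.

```python
from typing import Callable, List

Token = List[float]
Layer = Callable[[List[Token]], List[Token]]


def mixing_layer(tokens: List[Token]) -> List[Token]:
    """Toy frozen block: blend each token with the mean of the sequence."""
    dim = len(tokens[0])
    mean = [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]
    return [[0.5 * t[d] + 0.5 * mean[d] for d in range(dim)] for t in tokens]


def forward_shallow(
    patches: List[Token], prompts: List[Token], layers: List[Layer]
) -> List[Token]:
    """Shallow injection: prepend prompts once; they flow through every layer."""
    tokens = prompts + patches
    for layer in layers:
        tokens = layer(tokens)
    return tokens[len(prompts):]  # keep only the patch positions


def forward_deep(
    patches: List[Token], prompts_per_layer: List[List[Token]], layers: List[Layer]
) -> List[Token]:
    """Deep injection: each layer gets fresh prompts; old prompt outputs are dropped."""
    tokens = patches
    for layer, prompts in zip(layers, prompts_per_layer):
        out = layer(prompts + tokens)
        tokens = out[len(prompts):]
    return tokens


patches = [[1.0, 0.0], [0.0, 1.0]]
prompts = [[0.0, 0.0]]  # stands in for one trained prompt token
layers = [mixing_layer, mixing_layer]

shallow_out = forward_shallow(patches, prompts, layers)
deep_out = forward_deep(patches, [prompts, prompts], layers)
no_prompt_out = forward_shallow(patches, [], layers)
```

Even in this toy setting the patch outputs change once a prompt token is present, which is the mechanism by which prompts steer a frozen backbone.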
| Title | Venue | Year | Type | Notes |
|---|---|---|---|---|
| VPT | ECCV | 2022 | Learnable | Foundational method |
| E2VPT | ICCV | 2023 | Learnable | Key-value prompts + pruning |
| LPT | ICLR | 2023 | Learnable | Long-tailed classes |
| SA2VP | AAAI | 2024 | Learnable | Spatially aligned 2D map |
| LSPT | CVPR | 2024 | Generated | Long-term spatial prompts |
| DVPT | NN | 2025 | Generated | Cross-attention generator |
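- VPT reduces the parameter and optimizer-state footprint (<0.5% of backbone parameters), but activation memory remains because gradients still backpropagate through the frozen backbone to reach the prompts.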
- VP-Fixed enables training-free adaptation with zero prompt-side gradients.
- VP is black-box friendly: zeroth-order optimization estimates prompt updates from forward queries alone, so no parameter gradients are computed or stored.
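To make the parameter-footprint point concrete, here is a back-of-the-envelope calculation. The backbone figures are assumptions for illustration (a ViT-B/16-sized model with roughly 86M parameters, embedding width 768, and 12 layers), and the prompt length of 10 is an arbitrary choice; prompt length is a hyperparameter, so the actual fraction varies by method and task. The task-specific classification head is not counted.

```python
def prompt_param_count(num_prompts: int, embed_dim: int, num_layers: int = 1) -> int:
    """Number of trainable prompt parameters (one embed_dim vector per prompt per layer)."""
    return num_prompts * embed_dim * num_layers


# Assumed backbone: ViT-B/16-sized model (approximate figures).
BACKBONE_PARAMS = 86_000_000
EMBED_DIM = 768
NUM_LAYERS = 12

shallow = prompt_param_count(10, EMBED_DIM)           # prompts at the input only
deep = prompt_param_count(10, EMBED_DIM, NUM_LAYERS)  # prompts at every layer

shallow_pct = 100.0 * shallow / BACKBONE_PARAMS
deep_pct = 100.0 * deep / BACKBONE_PARAMS

print(f"shallow: {shallow} params ({shallow_pct:.3f}% of backbone)")
print(f"deep:    {deep} params ({deep_pct:.3f}% of backbone)")
```

Under these assumptions both variants stay well below 0.5% of the backbone, while the forward and backward activations are the same size as in full finetuning.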
- Segmentation: Prompt-driven continual, multimodal, and few-shot segmentation (SAM-adapters, SA2VP).
- Restoration & Enhancement: Degradation-aware prompts for denoising, dehazing, deraining (PromptIR, PromptRestorer).
- Compression: Prompt tokens control rate-distortion trade-offs in Transformer codecs.
| Domain | Representative Methods |
|---|---|
| Medical & Biomedical | CusSAM, Ma-SAM, DVPT for segmentation & reporting |
| Remote Sensing & Geospatial | RSPrompter, ZoRI, PHTrack |
| Robotics & Embodied AI | PointCLIP, ShapeLLM, GAPrompt |
| Industrial Inspection | ClipSAM, SAID for defect detection |
| Autonomous Driving | Severity-aware prompts for adverse conditions |
| 3D Point Clouds & LiDAR | PointLoRA, PromptDet |
| Paradigm | Description |
|---|---|
| Test-Time Adaptation | On-the-fly prompt tuning (DynaPrompt, C-TPT) |
| Continual Learning | Task-incremental prompt pools |
| Few-Shot / Zero-Shot | Prompt-based transfer with limited labels |
| Black-Box | Zeroth-order learning (BlackVIP) |
| Federated Learning | Decentralized personalized prompts (FedPrompt) |
| Source-Free | Adaptation without source data (DDFP) |
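For the Black-Box row above, the following is a minimal sketch of zeroth-order prompt learning using plain SPSA (simultaneous perturbation stochastic approximation). It is not claimed to match the exact estimator used by BlackVIP or any other listed method. `black_box_loss` is a hypothetical stand-in for querying a model that exposes only outputs, and the step size, perturbation scale, and iteration count are illustrative.

```python
import random
from typing import Callable, List


def spsa_gradient(
    loss_fn: Callable[[List[float]], float],
    params: List[float],
    c: float,
    rng: random.Random,
) -> List[float]:
    """Estimate the gradient of loss_fn at params from two loss queries."""
    delta = [rng.choice((-1.0, 1.0)) for _ in params]  # random +/-1 directions
    plus = [p + c * d for p, d in zip(params, delta)]
    minus = [p - c * d for p, d in zip(params, delta)]
    diff = (loss_fn(plus) - loss_fn(minus)) / (2.0 * c)
    # For +/-1 perturbations, dividing by delta_i equals multiplying by it.
    return [diff * d for d in delta]


def black_box_loss(prompt: List[float]) -> float:
    """Hypothetical query-only objective: quadratic with its minimum at (1, -2)."""
    target = (1.0, -2.0)
    return sum((p - t) ** 2 for p, t in zip(prompt, target))


rng = random.Random(0)  # fixed seed for reproducibility
prompt = [0.0, 0.0]
initial_loss = black_box_loss(prompt)
for _ in range(200):
    grad = spsa_gradient(black_box_loss, prompt, c=0.01, rng=rng)
    prompt = [p - 0.1 * g for p, g in zip(prompt, grad)]
final_loss = black_box_loss(prompt)
```

Each update costs two forward queries regardless of the number of prompt parameters, which is why this family of estimators suits models served behind an API.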
PA contributes to robustness, fairness, and privacy:
- Robust prompts improve adversarial resistance.
- Fairness prompts mitigate demographic bias.
- Privacy prompts protect sensitive visual data.
The survey examines behavioral evidence and theoretical underpinnings of how prompts steer frozen representations, including attention pattern analysis, representation geometry, and expressivity bounds.
Key open challenges identified in the survey:
- Safety Alignment: Aligning prompt interventions with human values and preventing malicious use.
- Training Overhead & Stability: Reducing hyperparameter search costs and seed sensitivity.
- Inference Latency: Mitigating added memory/compute from prompt components.
- Real-World Evaluation: Moving beyond academic benchmarks to complex, distribution-shifting scenarios.
| Title | Venue | Year |
|---|---|---|
| Prompt Learning in Computer Vision: A Survey | FITEE | 2024 |
| Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models | arXiv | 2024 |
| Prompt Engineering on Vision-Language Models | arXiv | 2023 |
| Visual Prompting in MLLMs | arXiv | 2024 |
If you find this survey useful in your research, please consider citing our paper:
@article{xiao2025prompt,
title={Prompt-based Adaptation in Large-scale Vision Models: A Survey},
author={Xiao, Xi and Zhang, Yunbei and Zhao, Lin and Liu, Yiyang and Liao, Xiaoying and Mai, Zheda and Li, Xingjian and Wang, Xiao and Xu, Hao and Hamm, Jihun and Lin, Xue and Xu, Min and Wang, Qifan and Wang, Tianyang and Han, Cheng},
journal={Transactions on Machine Learning Research (TMLR)},
year={2026},
url={https://openreview.net/forum?id=UwtXDttgsE}
}

We welcome new papers, implementations, and corrections! Please categorize contributions under:
- VP-Fixed / VP-Learnable / VP-Generated
- VPT-Learnable / VPT-Generated
- And note the application domain (e.g., Medical, 3D, Remote Sensing).
