Highlights
Training-free adaptation
Improves rare-disease grounding at inference time without updating model weights.
Language + visual evolution
Searches instruction prompts while aggregating predictions over augmented image views.
Robust localization
Reduces sensitivity to prompt wording and visual perturbations across long-tailed pathologies.
Overview
Large vision-language models struggle with clinical abnormality grounding on rare diseases and long-tailed distributions. Severe data scarcity and distribution shifts make fine-tuning impractical, while single-pass inference is highly sensitive to prompt phrasing and visual perturbations. We propose Dynamic Decision Learning (DDL), a training-free test-time framework that optimizes instruction prompts and consolidates predictions across augmented views using structured matching. Across two brain imaging benchmarks with 281 rare pathologies, DDL yields up to a 105% relative improvement in localization mAP and consistently outperforms other methods.
Method
Overview of the DDL framework. At test time, DDL evolves its decisions jointly in the Language Space (instruction prompts) and the Visual Space (augmented image views). It searches over instruction prompts and consolidates the predictions from multiple augmented views using structured matching, which improves localization reliability without any additional training.
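The view-consolidation idea can be illustrated with a minimal sketch. This is not the DDL implementation: the page does not specify the matching procedure, so the greedy IoU grouping, the `iou_thr` and `min_votes` parameters, and the helper names below are all assumptions chosen for illustration. The sketch maps boxes predicted on a horizontally flipped view back to the original frame, groups boxes that overlap across views, and keeps only groups supported by enough distinct views.

```python
from typing import List, Set, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def unflip_horizontal(box: Box, image_width: float) -> Box:
    """Map a box predicted on a horizontally flipped view back to the original frame."""
    x1, y1, x2, y2 = box
    return (image_width - x2, y1, image_width - x1, y2)


def mean_box(boxes: List[Box]) -> Box:
    """Coordinate-wise average of a non-empty list of boxes."""
    n = len(boxes)
    return tuple(sum(coord) / n for coord in zip(*boxes))  # type: ignore[return-value]


def consolidate(views: List[List[Box]], iou_thr: float = 0.5, min_votes: int = 2) -> List[Box]:
    """Greedy cross-view grouping (an assumed stand-in for structured matching).

    `views[i]` holds the boxes of view i, already mapped to the original frame.
    A box joins the best-overlapping group that its own view has not yet
    contributed to; groups backed by fewer than `min_votes` views are dropped.
    """
    groups: List[Tuple[List[Box], Set[int]]] = []
    for view_id, boxes in enumerate(views):
        for box in boxes:
            best, best_score = None, iou_thr
            for group in groups:
                members, view_ids = group
                if view_id in view_ids:
                    continue  # one vote per view per group
                score = iou(box, mean_box(members))
                if score >= best_score:
                    best, best_score = group, score
            if best is None:
                groups.append(([box], {view_id}))
            else:
                best[0].append(box)
                best[1].add(view_id)
    return [mean_box(members) for members, view_ids in groups if len(view_ids) >= min_votes]


if __name__ == "__main__":
    width = 100  # hypothetical image width
    views = [
        [(10, 10, 50, 50)],                              # original view
        [unflip_horizontal((52, 12, 88, 48), width)],    # flipped view, mapped back
        [(11, 11, 49, 49), (70, 70, 90, 90)],            # third view with one spurious box
    ]
    print(consolidate(views))
```

In this toy input the three overlapping boxes merge into one averaged box, while the box seen in only one view is discarded. A one-to-one assignment (for example, Hungarian matching) could replace the greedy step; the choice here only keeps the sketch short.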
Results
Visualization: 3B to 72B Model Scale
[Figure panels: grounding performance for Qwen2.5-VL-3B, Qwen2.5-VL-7B, Qwen2.5-VL-32B, and Qwen2.5-VL-72B]
Scaling Performance on NOVA (Rare Diseases)
| Model Size | Method | mAP@25 | mAP@50 | mAP@75 |
|---|---|---|---|---|
| 3B | Vanilla | 0.221 | 0.088 | 0.040 |
| 3B | DDL (Ours) | 0.298 (+35%) | 0.150 (+70%) | 0.066 (+66%) |
| 7B | Vanilla | 0.286 | 0.135 | 0.036 |
| 7B | DDL (Ours) | 0.369 (+29%) | 0.206 (+52%) | 0.075 (+105%) |
| 32B | Vanilla | 0.406 | 0.208 | 0.058 |
| 32B | DDL (Ours) | 0.454 (+12%) | 0.266 (+28%) | 0.096 (+66%) |
| 72B | Vanilla | 0.411 | 0.245 | 0.065 |
| 72B | DDL (Ours) | 0.500 (+22%) | 0.301 (+23%) | 0.107 (+65%) |
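* DDL improves every model size, and the 32B model with DDL surpasses the 72B Vanilla (zero-shot) baseline on all three metrics (0.454 vs. 0.411 mAP@25, 0.266 vs. 0.245 mAP@50, 0.096 vs. 0.065 mAP@75).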
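The percentages in the NOVA table are relative gains over the Vanilla row of the same model size. The short sketch below recomputes them from the rounded scores printed in the table; the dictionary layout and function names are illustrative assumptions. Because the inputs are already rounded to three decimals, a recomputed gain can differ by a few points from the printed one, which was presumably derived from unrounded scores (this is an assumption, not something the page states).

```python
# Scores copied from the NOVA table, ordered as (mAP@25, mAP@50, mAP@75).
NOVA = {
    "3B": {"Vanilla": (0.221, 0.088, 0.040), "DDL": (0.298, 0.150, 0.066)},
    "7B": {"Vanilla": (0.286, 0.135, 0.036), "DDL": (0.369, 0.206, 0.075)},
    "32B": {"Vanilla": (0.406, 0.208, 0.058), "DDL": (0.454, 0.266, 0.096)},
    "72B": {"Vanilla": (0.411, 0.245, 0.065), "DDL": (0.500, 0.301, 0.107)},
}


def relative_gain(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline


def gains(size: str) -> tuple:
    """Rounded relative gains of DDL over Vanilla for one model size."""
    vanilla, ddl = NOVA[size]["Vanilla"], NOVA[size]["DDL"]
    return tuple(round(relative_gain(v, d)) for v, d in zip(vanilla, ddl))


def ddl_beats_larger_vanilla(small: str, large: str) -> bool:
    """True if DDL on the smaller model exceeds Vanilla on the larger one on every metric."""
    return all(d > v for d, v in zip(NOVA[small]["DDL"], NOVA[large]["Vanilla"]))


if __name__ == "__main__":
    for size in NOVA:
        print(size, gains(size))
    print("32B DDL > 72B Vanilla:", ddl_beats_larger_vanilla("32B", "72B"))
```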
Citation
If our work is useful for your research, please consider citing our paper:
@inproceedings{li2026dynamic,
title = {Dynamic Decision Learning: Test-Time Evolution for Abnormality Grounding in Rare Diseases},
author = {Li, Jun and Liu, Mingxuan and Pan, Jiazhen and Liu, Che and Bai, Wenjia and Bercea, Cosmin I. and Schnabel, Julia A.},
booktitle = {International Conference on Machine Learning (ICML)},
year = {2026}
}
@article{li2026dynamic_arxiv,
title = {Dynamic Decision Learning: Test-Time Evolution for Abnormality Grounding in Rare Diseases},
author = {Li, Jun and Liu, Mingxuan and Pan, Jiazhen and Liu, Che and Bai, Wenjia and Bercea, Cosmin I. and Schnabel, Julia A.},
journal = {arXiv preprint arXiv:2604.24972},
year = {2026}
}