Dynamic Decision Learning: Test-Time Evolution for Abnormality Grounding in Rare Diseases

¹Technical University of Munich ²Munich Center for Machine Learning ³University of Trento

⁴Imperial College London ⁵Helmholtz Munich ⁶King's College London

Dynamic Decision Learning evolves decisions at test time by combining prompt-space optimization and visual-view consolidation for rare-disease abnormality grounding.

Highlights

Training-free adaptation
Improves rare-disease grounding at inference time without updating model weights.

Language + visual evolution
Searches instruction prompts while aggregating predictions over augmented image views.

Robust localization
Reduces sensitivity to prompt wording and visual perturbations across long-tailed pathologies.

Overview

Large vision-language models struggle with clinical abnormality grounding on rare diseases and long-tailed distributions. Severe data scarcity and distribution shifts make fine-tuning impractical, while single-pass inference is highly sensitive to prompt phrasing and visual perturbations. We propose Dynamic Decision Learning (DDL), a training-free framework that optimizes instruction prompts at test time and consolidates predictions across augmented views using structured matching. Across two brain imaging benchmarks with 281 rare pathologies, DDL yields up to a 105% relative improvement in localization accuracy over single-pass inference and consistently outperforms the compared methods.

Method

Overview of the DDL framework. DDL evolves its decision at test time by searching jointly over the Language Space (instruction prompts) and the Visual Space (augmented image views). It optimizes instruction prompts and consolidates predictions across multiple augmented views using structured matching, which improves localization reliability without any additional training.
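The two steps named above can be illustrated with a minimal sketch. It is not the authors' implementation: the page does not specify how structured matching or prompt optimization work, so this sketch substitutes a greedy IoU-based grouping for the matching step and uses cross-view agreement as a label-free score for choosing a prompt. The names `consolidate`, `select_prompt`, `predict`, `iou_thr`, and `min_votes` are hypothetical, and boxes are assumed to be already mapped back to original-image coordinates.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), original-image coordinates


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def mean_box(members: Sequence[Box]) -> Box:
    """Coordinate-wise mean of a group of boxes."""
    n = len(members)
    return tuple(sum(c) / n for c in zip(*members))  # type: ignore[return-value]


def consolidate(views: Sequence[Sequence[Box]], iou_thr: float = 0.5,
                min_votes: int = 2) -> List[Tuple[Box, int]]:
    """Assumed stand-in for structured matching: greedily group boxes across
    views and keep groups supported by at least `min_votes` views."""
    clusters: List[List[Box]] = []
    for boxes in views:
        used = set()  # each view may contribute at most one box per group
        for box in boxes:
            best, best_iou = None, iou_thr
            for idx, members in enumerate(clusters):
                if idx in used:
                    continue
                score = iou(box, mean_box(members))
                if score >= best_iou:
                    best, best_iou = idx, score
            if best is None:
                clusters.append([box])
                best = len(clusters) - 1
            else:
                clusters[best].append(box)
            used.add(best)
    return [(mean_box(m), len(m)) for m in clusters if len(m) >= min_votes]


def select_prompt(prompts: Sequence[str],
                  predict: Callable[[str, int], List[Box]],
                  n_views: int) -> str:
    """Assumed scoring rule: prefer the prompt whose boxes agree most across views.
    `predict(prompt, view_index)` is a hypothetical wrapper around the model."""
    def agreement(prompt: str) -> float:
        kept = consolidate([predict(prompt, v) for v in range(n_views)])
        if not kept:
            return 0.0
        # Mean fraction of views that support each consolidated box.
        return sum(votes for _, votes in kept) / (len(kept) * n_views)
    return max(prompts, key=agreement)


# Two views agree on one lesion; the third view's box is unsupported and dropped.
views = [[(10, 10, 50, 50)], [(12, 11, 52, 49)], [(200, 200, 220, 220)]]
print(consolidate(views))  # [((11.0, 10.5, 51.0, 49.5), 2)]
```

Averaging the coordinates of matched boxes and discarding boxes seen in only one view is one simple way to reduce sensitivity to a single perturbed view; the thresholds here are illustrative defaults, not values from the paper.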

Results

Visualization: 3B to 72B Model Scale

Figure panels: grounding performance of Qwen2.5-VL-3B, Qwen2.5-VL-7B, Qwen2.5-VL-32B, and Qwen2.5-VL-72B.

Scaling Performance on NOVA (Rare Diseases)

Model Size	Method	mAP@25	mAP@50	mAP@75
3B	Vanilla	0.221	0.088	0.040
3B	DDL (Ours)	0.298 (+35%)	0.150 (+70%)	0.066 (+66%)
7B	Vanilla	0.286	0.135	0.036
7B	DDL (Ours)	0.369 (+29%)	0.206 (+52%)	0.075 (+105%)
32B	Vanilla	0.406	0.208	0.058
32B	DDL (Ours)	0.454 (+12%)	0.266 (+28%)	0.096 (+66%)
72B	Vanilla	0.411	0.245	0.065
72B	DDL (Ours)	0.500 (+22%)	0.301 (+23%)	0.107 (+65%)

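* DDL improves every model size on all three metrics. With DDL, the 32B model surpasses the 72B Vanilla (zero-shot) baseline at every threshold (mAP@25: 0.454 vs. 0.411; mAP@50: 0.266 vs. 0.245; mAP@75: 0.096 vs. 0.065).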
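The percentages in parentheses are relative gains over the Vanilla row of the same model size. The short script below shows that arithmetic on the table values and checks the footnote's 32B-versus-72B comparison. The names `relative_gain` and `NOVA` are illustrative only.

```python
# Scores copied from the NOVA table: (mAP@25, mAP@50, mAP@75).
NOVA = {
    "3B": {"Vanilla": (0.221, 0.088, 0.040), "DDL": (0.298, 0.150, 0.066)},
    "7B": {"Vanilla": (0.286, 0.135, 0.036), "DDL": (0.369, 0.206, 0.075)},
    "32B": {"Vanilla": (0.406, 0.208, 0.058), "DDL": (0.454, 0.266, 0.096)},
    "72B": {"Vanilla": (0.411, 0.245, 0.065), "DDL": (0.500, 0.301, 0.107)},
}


def relative_gain(baseline: float, improved: float) -> float:
    """Relative improvement over the baseline, in percent."""
    return 100.0 * (improved - baseline) / baseline


# Relative gain of DDL over Vanilla at mAP@25 for each model size.
for size, rows in NOVA.items():
    gain = relative_gain(rows["Vanilla"][0], rows["DDL"][0])
    print(f"{size}: {gain:.0f}%")

# Footnote check: 32B with DDL versus the 72B Vanilla baseline on every metric.
beats_72b = all(d > v for d, v in zip(NOVA["32B"]["DDL"], NOVA["72B"]["Vanilla"]))
print(beats_72b)  # True
```

Recomputing from the three-decimal table entries can differ from the printed percentages by a few points where the baseline is small (for example, 0.036 to 0.075 gives roughly 108% rather than the printed 105%), plausibly because the printed figures were derived from unrounded scores.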

Citation

If our work is useful for your research, please consider citing our paper:

@inproceedings{li2026dynamic,
  title     = {Dynamic Decision Learning: Test-Time Evolution for Abnormality Grounding in Rare Diseases},
  author    = {Li, Jun and Liu, Mingxuan and Pan, Jiazhen and Liu, Che and Bai, Wenjia and Bercea, Cosmin I. and Schnabel, Julia A.},
  booktitle = {International Conference on Machine Learning (ICML)},
  year      = {2026}
}

@article{li2026dynamic_arxiv,
  title   = {Dynamic Decision Learning: Test-Time Evolution for Abnormality Grounding in Rare Diseases},
  author  = {Li, Jun and Liu, Mingxuan and Pan, Jiazhen and Liu, Che and Bai, Wenjia and Bercea, Cosmin I. and Schnabel, Julia A.},
  journal = {arXiv preprint arXiv:2604.24972},
  year    = {2026}
}
