ICML 2026

Dynamic Decision Learning: Test-Time Evolution for Abnormality Grounding in Rare Diseases

1 Technical University of Munich, 2 Munich Center for Machine Learning, 3 University of Trento,
4 Imperial College London, 5 Helmholtz Munich, 6 King's College London

Dynamic Decision Learning evolves decisions at test time by combining prompt-space optimization and visual-view consolidation for rare-disease abnormality grounding.

Highlights

Training-free adaptation: Improves rare-disease grounding at inference time without updating model weights.
Language + visual evolution: Searches instruction prompts while aggregating predictions over augmented image views.
Robust localization: Reduces sensitivity to prompt wording and visual perturbations across long-tailed pathologies.

Overview

Large vision-language models struggle with clinical abnormality grounding on rare diseases and long-tailed distributions. Severe data scarcity and distribution shifts make fine-tuning impractical, while single-pass inference is highly sensitive to prompt phrasing and visual perturbations. We propose Dynamic Decision Learning (DDL), a training-free test-time framework that optimizes instruction prompts and consolidates predictions across augmented views using structured matching. Across two brain imaging benchmarks with 281 rare pathologies, DDL yields up to a 105% relative improvement in localization and consistently outperforms other methods.

Method

Overview of the DDL framework. DDL evolves decisions at test time by jointly searching the language space (instruction prompts) and the visual space (augmented image views). It optimizes instruction prompts and consolidates predictions across multiple augmented views using structured matching, improving localization reliability without any additional training.
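To make the visual-view consolidation step more concrete, here is a minimal Python sketch. It is not the DDL implementation: the page does not spell out the structured matching procedure, so greedy IoU clustering with a vote threshold is used as a stand-in. The function names (`unflip_horizontal`, `iou`, `mean_box`, `consolidate`), the thresholds `match_iou` and `min_votes`, and the example boxes are all hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def unflip_horizontal(box: Box, image_width: float) -> Box:
    """Map a box predicted on a horizontally flipped view back to the original frame."""
    x1, y1, x2, y2 = box
    return (image_width - x2, y1, image_width - x1, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def mean_box(boxes: List[Box]) -> Box:
    """Coordinate-wise average of a non-empty list of boxes."""
    n = len(boxes)
    return tuple(sum(b[i] for b in boxes) / n for i in range(4))


def consolidate(views: List[List[Box]], match_iou: float = 0.5,
                min_votes: int = 2) -> List[Box]:
    """Assumed stand-in for structured matching: greedily cluster boxes by IoU
    against each cluster's running mean, then keep the mean of every cluster
    supported by at least `min_votes` predictions."""
    clusters: List[List[Box]] = []
    for boxes in views:
        for box in boxes:
            for cluster in clusters:
                if iou(box, mean_box(cluster)) >= match_iou:
                    cluster.append(box)
                    break
            else:
                clusters.append([box])  # no match: start a new cluster
    return [mean_box(c) for c in clusters if len(c) >= min_votes]


# Hypothetical predictions for one 100-pixel-wide image under three views.
original_view = [(10.0, 10.0, 50.0, 50.0)]
flipped_view = [unflip_horizontal((50.0, 12.0, 88.0, 52.0), image_width=100.0)]
third_view = [(11.0, 9.0, 51.0, 49.0), (70.0, 70.0, 80.0, 80.0)]

fused = consolidate([original_view, flipped_view, third_view])
print(fused)
```

In this toy input the three overlapping boxes are averaged into one consolidated box, while the isolated box seen in only one view is discarded. A real pipeline would also need inverse transforms for every augmentation used, not only horizontal flips.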

Results

Visualization: 3B to 72B Model Scale

Figure panels: Qwen2.5-VL grounding performance at 3B, 7B, 32B, and 72B.

Scaling Performance on NOVA (Rare Diseases)

Model Size  Method      mAP@25        mAP@50        mAP@75
3B          Vanilla     0.221         0.088         0.040
3B          DDL (Ours)  0.298 (+35%)  0.150 (+70%)  0.066 (+66%)
7B          Vanilla     0.286         0.135         0.036
7B          DDL (Ours)  0.369 (+29%)  0.206 (+52%)  0.075 (+105%)
32B         Vanilla     0.406         0.208         0.058
32B         DDL (Ours)  0.454 (+12%)  0.266 (+28%)  0.096 (+66%)
72B         Vanilla     0.411         0.245         0.065
72B         DDL (Ours)  0.500 (+22%)  0.301 (+23%)  0.107 (+65%)

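* DDL scales well with model size: the 32B model with DDL surpasses the 72B Vanilla (zero-shot) baseline on all three metrics (0.454 vs. 0.411 mAP@25, 0.266 vs. 0.245 mAP@50, 0.096 vs. 0.065 mAP@75).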
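The percentages in parentheses read as relative gains over the Vanilla row of the same model size. As a quick check under that assumption, the short Python sketch below recomputes the mAP@25 gains from the table values; the helper name `relative_gain` is ours, not the authors'.

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline


# mAP@25 on NOVA, copied from the table: model size -> (Vanilla, DDL)
map25 = {
    "3B": (0.221, 0.298),
    "7B": (0.286, 0.369),
    "32B": (0.406, 0.454),
    "72B": (0.411, 0.500),
}

for size, (vanilla, ddl) in map25.items():
    print(f"{size}: +{relative_gain(vanilla, ddl):.0f}%")
```

For mAP@25 this reproduces the reported +35%, +29%, +12%, and +22%. For the other columns, recomputing from the three-decimal table entries can differ from the printed percentages by a few points, presumably because the reported gains were computed from unrounded scores.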

Citation

If our work is useful for your research, please consider citing our paper:

@inproceedings{li2026dynamic,
  title     = {Dynamic Decision Learning: Test-Time Evolution for Abnormality Grounding in Rare Diseases},
  author    = {Li, Jun and Liu, Mingxuan and Pan, Jiazhen and Liu, Che and Bai, Wenjia and Bercea, Cosmin I. and Schnabel, Julia A.},
  booktitle = {International Conference on Machine Learning (ICML)},
  year      = {2026}
}

@article{li2026dynamic_arxiv,
  title   = {Dynamic Decision Learning: Test-Time Evolution for Abnormality Grounding in Rare Diseases},
  author  = {Li, Jun and Liu, Mingxuan and Pan, Jiazhen and Liu, Che and Bai, Wenjia and Bercea, Cosmin I. and Schnabel, Julia A.},
  journal = {arXiv preprint arXiv:2604.24972},
  year    = {2026}
}