# Foundation Model Insights and a Multi-Model Approach for Superior Fine-Grained One-shot Subset Selection

Zhijing Wan, Zhixiang Wang, Zheng Wang, Xin Xu, Shin'ichi Satoh. ICML 2025 Oral (Top 1%)
This repository contains the official PyTorch implementation of RAM-APL, along with several competitive baselines such as MIN, kCenterGreedy, and Moderate-DS. It supports both traditional pre-trained models (e.g., the target model trained on the target dataset) and foundation models (e.g., CLIP, DINOv2, SigLIP, EVA-CLIP) as feature extractors.
Make sure to install the dependencies first:

```bash
pip install -r requirements.txt
```

By default, foundation model weights (e.g., DINOv2, CLIP, SigLIP, EVA-CLIP) are loaded offline from local directories. We recommend manually storing the pre-trained weights under:

```
./deepcore/methods/pretrain/
```

To enable online loading (via HuggingFace or TorchHub) instead, refer to `./deepcore/methods/earlytrain.py`: set your `AutoModel` / `torch.hub.load` calls accordingly, or configure an internet-enabled environment to fetch weights dynamically.
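For illustration, the two loading modes look roughly like this; a minimal sketch, assuming HuggingFace-style checkpoints — the checkpoint names, hub IDs, and directory layout below are assumptions, not taken from `earlytrain.py`:

```python
# Minimal sketch of offline vs. online weight loading; the checkpoint
# names and hub IDs are illustrative assumptions, not this repo's config.
import torch
from transformers import AutoModel

WEIGHT_DIR = "./deepcore/methods/pretrain"  # local weight cache (offline mode)

# Offline (default): load weights from a local directory, no network access.
model = AutoModel.from_pretrained(f"{WEIGHT_DIR}/dinov2-base", local_files_only=True)

# Online alternative: fetch the same weights from the HuggingFace Hub.
# model = AutoModel.from_pretrained("facebook/dinov2-base")

# TorchHub alternative (e.g., the official DINOv2 release):
# model = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14")
```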
We provide sample commands for evaluating different subset selection methods under varying settings.
**MIN on noisy Pet, with the selection model trained on the target dataset (`-se 10`):**

```bash
CUDA_VISIBLE_DEVICES=0 nohup python -u main.py \
  --fraction 0.5 \
  --dataset Pet_NOISY --noise_type symmetric --noise_rate 0.2 \
  --data_path /path/to/data \
  --num_exp 3 --workers 4 --optimizer SGD -se 10 \
  --selection MIN --model ResNet18 --lr 0.1 \
  -sp ./results/MIN_pet_sym0.2_10epoch_0.5 \
  --batch 128 >> ./results/MIN_pet_sym0.2_10epoch_0.5.txt 2>&1
```

**MIN with the selection model pre-trained on TinyImageNet (`--dataset_pretrain TinyImageNet`):**

```bash
CUDA_VISIBLE_DEVICES=0 nohup python -u main.py \
  --fraction 0.5 \
  --dataset_pretrain TinyImageNet --dataset Pet_NOISY \
  --noise_type symmetric --noise_rate 0.2 \
  --data_path /path/to/data \
  --num_exp 3 --workers 4 --optimizer SGD -se 10 \
  --selection MIN --model ResNet18 --lr 0.1 \
  -sp ./results/MIN_pet_sym0.2_TIN_10epoch_0.5 \
  --batch 128 >> ./results/MIN_pet_sym0.2_TIN_10epoch_0.5.txt 2>&1
```

**MIN with DINOv2 as a frozen feature extractor (`--specific_model DINOV2`, no selection training, `-se 0`):**

```bash
CUDA_VISIBLE_DEVICES=0 nohup python -u main.py \
  --specific_model DINOV2 \
  --fraction 0.5 \
  --dataset Pet_NOISY --noise_type symmetric --noise_rate 0.2 \
  --data_path /path/to/data \
  --num_exp 3 --workers 4 --optimizer SGD -se 0 \
  --selection MIN --model ResNet18 --lr 0.1 \
  -sp ./results/MIN_pet_sym0.2_DINOv2_0.5 \
  --batch 128 >> ./results/MIN_pet_sym0.2_DINOv2_0.5.txt 2>&1
```

**RAM-APL (ours) on the clean Pet dataset:**

```bash
CUDA_VISIBLE_DEVICES=0 nohup python -u main.py \
  --fraction 0.5 \
  --dataset Pet \
  --data_path /path/to/data \
  --num_exp 5 --workers 4 --optimizer SGD -se 0 \
  --selection RAM_APL --model ResNet18 --lr 0.1 \
  -sp ./results/DINOv2Clip_mcr_a0.2k1_0.5 \
  --batch 128 >> ./results/DINOv2Clip_mcr_a0.2k1_0.5.txt 2>&1
```
Project structure:

```
.
├── main.py                  # Main entry point
├── utils.py                 # Training and testing functions
├── deepcore/
│   ├── methods/             # All subset selection methods
│   │   ├── min.py
│   │   ├── ram_apl.py
│   │   ├── kcentergreedy.py
│   │   ├── moderate_ds.py
│   │   └── pretrain/        # Downloaded foundation model weights (offline mode)
│   ├── nets/                # Model definitions and wrappers
│   └── datasets/            # Dataset loaders
├── results/                 # Saved logs and outputs
└── requirements.txt         # Dependencies
```

Notes:

- All dataset paths are expected to be provided via `--data_path`.
- We support training with or without pre-training, as well as using foundation models as frozen feature extractors (see the sketch after this list).
- To enable offline loading of foundation model weights (the default), download and store them under `deepcore/methods/pretrain/`.
- To enable online loading, refer to the logic in `deepcore/methods/earlytrain.py`.
- If you need help preparing your datasets or pre-trained weights, feel free to open an issue.
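As a rough illustration of the frozen-feature-extractor mode mentioned above, here is a minimal sketch, assuming a HuggingFace DINOv2 checkpoint; `extract_features` is a hypothetical helper and does not mirror this repository's internal API:

```python
# Hypothetical sketch of a foundation model used as a frozen feature
# extractor; extract_features is illustrative, not this repo's API.
import torch
from transformers import AutoImageProcessor, AutoModel

device = "cuda" if torch.cuda.is_available() else "cpu"
processor = AutoImageProcessor.from_pretrained("facebook/dinov2-base")
model = AutoModel.from_pretrained("facebook/dinov2-base").to(device).eval()

@torch.no_grad()  # frozen: no gradients are tracked or stored
def extract_features(pil_images):
    """Return pooled DINOv2 embeddings for a list of PIL images."""
    inputs = processor(images=pil_images, return_tensors="pt").to(device)
    return model(**inputs).pooler_output  # shape: (batch, hidden_dim)
```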
If you find our work helpful for your research, please consider citing:
```bibtex
@InProceedings{pmlr-v267-wan25f,
  title     = {Foundation Model Insights and a Multi-Model Approach for Superior Fine-Grained One-shot Subset Selection},
  author    = {Wan, Zhijing and Wang, Zhixiang and Wang, Zheng and Xu, Xin and Satoh, Shin'Ichi},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {62084--62101},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
}
```

The implementation is based on the DeepCore codebase. Thanks for their brilliant work!