
This repository is the official implementation of Foundation Model Insights and a Multi-Model Approach for Superior Fine-Grained One-shot Subset Selection, accepted at ICML 2025.


Foundation Model Insights and a Multi-Model Approach for Superior Fine-Grained One-shot Subset Selection
Zhijing Wan, Zhixiang Wang, Zheng Wang, Xin Xu, Shin'ichi Satoh.

ICML 2025 Oral (Top 1%)

arXiv Paper

📌 Overview

This repository contains the official PyTorch implementation of RAM-APL, along with several competitive baselines such as MIN, kCenterGreedy, and Moderate-DS. It supports both traditional pre-trained models (e.g., the target model pre-trained on the target dataset) and foundation models (e.g., CLIP, DINOv2, SigLIP, EVA-CLIP) as feature extractors.
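
As a quick illustration, here is a minimal sketch of using a foundation model as a frozen feature extractor (the HuggingFace model ID and CLS-token pooling are illustrative assumptions; the repository's actual extraction logic lives under deepcore/methods/):

import torch
from transformers import AutoImageProcessor, AutoModel

# Load a foundation model once and freeze it (eval mode, no gradients).
processor = AutoImageProcessor.from_pretrained("facebook/dinov2-base")
model = AutoModel.from_pretrained("facebook/dinov2-base").eval()

@torch.no_grad()
def extract_features(images):
    # Encode a batch of PIL images; the CLS token is used as the image-level feature.
    inputs = processor(images=images, return_tensors="pt")
    return model(**inputs).last_hidden_state[:, 0]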

🔧 Installation

Install the dependencies:

pip install -r requirements.txt

📦 Foundation Model Weights (Important)

By default, foundation model weights (e.g., DINOv2, CLIP, SigLIP, EVA-CLIP) are loaded offline from local directories.

We recommend manually storing the pre-trained weights under:

./deepcore/methods/pretrain/

To enable online loading (via HuggingFace or TorchHub), please refer to the file:

./deepcore/methods/earlytrain.py

Adjust the AutoModel / torch.hub.load calls there accordingly, or run in an internet-enabled environment so that the weights can be fetched dynamically.
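
For reference, a sketch of the two loading paths (the local subdirectory name and hub IDs below are illustrative assumptions, not necessarily what earlytrain.py uses):

import torch
from transformers import AutoModel

# Offline (default): load weights from a local directory under ./deepcore/methods/pretrain/
model = AutoModel.from_pretrained("./deepcore/methods/pretrain/dinov2-base")

# Online: fetch weights dynamically (internet access required)
model = AutoModel.from_pretrained("facebook/dinov2-base")            # HuggingFace Hub
dinov2 = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14")  # TorchHub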


🧪 Example Usage

We provide sample commands for evaluating different subset selection methods under varying settings.

➔ 1. Single-model Study: MIN method on Oxford-IIIT Pet with 20% symmetric label noise

a) With pretraining on the target dataset (traditional IE):

CUDA_VISIBLE_DEVICES=0 nohup python -u main.py \
  --fraction 0.5 \
  --dataset Pet_NOISY --noise_type symmetric --noise_rate 0.2 \
  --data_path /path/to/data \
  --num_exp 3 --workers 4 --optimizer SGD -se 10 \
  --selection MIN --model ResNet18 --lr 0.1 \
  -sp ./results/MIN_pet_sym0.2_10epoch_0.5 \
  --batch 128 >> ./results/MIN_pet_sym0.2_10epoch_0.5.txt 2>&1
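
In this setting, symmetric label noise typically flips each corrupted label uniformly to one of the other classes; here --noise_rate 0.2 corrupts 20% of the labels. A minimal sketch of that corruption (illustrative only, not necessarily the repository's exact routine):

import numpy as np

def symmetric_noise(labels, num_classes, noise_rate=0.2, seed=0):
    # Flip each label with probability noise_rate to a uniformly chosen wrong class.
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels).copy()
    for i in np.flatnonzero(rng.random(len(labels)) < noise_rate):
        labels[i] = rng.choice([c for c in range(num_classes) if c != labels[i]])
    return labels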

b) With TinyImageNet pretraining:

CUDA_VISIBLE_DEVICES=0 nohup python -u main.py \
  --fraction 0.5 \
  --dataset_pretrain TinyImageNet --dataset Pet_NOISY \
  --noise_type symmetric --noise_rate 0.2 \
  --data_path /path/to/data \
  --num_exp 3 --workers 4 --optimizer SGD -se 10 \
  --selection MIN --model ResNet18 --lr 0.1 \
  -sp ./results/MIN_pet_sym0.2_TIN_10epoch_0.5 \
  --batch 128 >> ./results/MIN_pet_sym0.2_TIN_10epoch_0.5.txt 2>&1

c) With DINOv2 feature extraction (zero pretrain epochs):

CUDA_VISIBLE_DEVICES=0 nohup python -u main.py \
  --specific_model DINOV2 \
  --fraction 0.5 \
  --dataset Pet_NOISY --noise_type symmetric --noise_rate 0.2 \
  --data_path /path/to/data \
  --num_exp 3 --workers 4 --optimizer SGD -se 0 \
  --selection MIN --model ResNet18 --lr 0.1 \
  -sp ./results/MIN_pet_sym0.2_DINOv2_0.5 \
  --batch 128 >> ./results/MIN_pet_sym0.2_DINOv2_0.5.txt 2>&1

➔ 2. Multi-model study: RAM-APL on the Pet dataset

CUDA_VISIBLE_DEVICES=0 nohup python -u main.py \
  --fraction 0.5 \
  --dataset Pet \
  --data_path /path/to/data \
  --num_exp 5 --workers 4 --optimizer SGD -se 0 \
  --selection RAM_APL --model ResNet18 --lr 0.1 \
  -sp ./results/DINOv2Clip_mcr_a0.2k1_0.5 \
  --batch 128 >> ./results/DINOv2Clip_mcr_a0.2k1_0.5.txt 2>&1

🗂 Project Structure (partial)

.
├── main.py                # Main entry point
├── utils.py               # Training and testing functions
├── deepcore/
│   ├── methods/           # All subset selection methods
│   │   ├── min.py
│   │   ├── ram_apl.py
│   │   ├── kcentergreedy.py
│   │   ├── moderate_ds.py
│   │   └── pretrain/      # Downloaded foundation model weights (offline mode)
│   ├── nets/              # Model definitions and wrappers
│   └── datasets/          # Dataset loaders
├── results/               # Saved logs and outputs
└── requirements.txt       # Dependencies

📌 Notes

  • All dataset paths are expected to be provided via --data_path.
  • We support training with or without pretraining, as well as using foundation models as frozen feature extractors.
  • To enable offline loading of foundation model weights (default), please download and store them under deepcore/methods/pretrain/.
  • To enable online loading, refer to the logic in deepcore/methods/earlytrain.py.
  • If you need help preparing datasets or pretrained weights, feel free to open an issue.

🌠 Bibtex

If you find our work helpful for your research, please consider citing:

@InProceedings{pmlr-v267-wan25f,
  title     = {Foundation Model Insights and a Multi-Model Approach for Superior Fine-Grained One-shot Subset Selection},
  author    = {Wan, Zhijing and Wang, Zhixiang and Wang, Zheng and Xu, Xin and Satoh, Shin'Ichi},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {62084--62101},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
}

Credits

The implementation is based on the DeepCore codebase. Thanks for their brilliant work!
