FoundIR: Unleashing Million-scale Training Data to Advance Foundation Models for Image Restoration [ICCV 2025]
[Project Page] [Paper] [arXiv] [Supplemental Material] [Introduction (in Chinese)]
Hao Li*, Xiang Chen*, Jiangxin Dong, Jinhui Tang, Jinshan Pan
IMAG Lab, Nanjing University of Science and Technology
You are welcome to visit our low-level vision website (an information service platform dedicated to low-level vision): https://lowlevelcv.com/
The potential of large-scale training data for universal image restoration. (a) Analysis of universal image restoration performance in real-world scenarios as the amount of training data varies: as the size of the real-world training data increases, the image restoration model achieves significant performance improvements. (b) Our proposed FoundIR, trained on our million-scale dataset, achieves state-of-the-art performance across a broad range of image restoration tasks compared with existing universal image restoration methods.
- ✅ July 07, 2025. Released the training set. Please fill out the request form to get the download link!
- ✅ July 02, 2025. Released the training code and script!
- ✅ June 26, 2025. 🎉 Our FoundIR was accepted by ICCV 2025!
- ✅ May 13, 2025. Updated the script for calculating the metrics, including PSNR, SSIM, LPIPS, FID, CLIP-IQA, MANIQA, MUSIQ, NIQE, and NIMA. Thanks to the awesome pyiqa.
- ✅ February 18, 2025. Released the testing code and pre-trained models of the specialist models, the testset (GT) on Google Drive (GT), and the visual results on Google Drive (FoundIR) and Baidu Yun (Others).
- ✅ February 05, 2025. Released the testing code and pre-trained model of the generalist model, and the testset (LQ) on Google Drive (LQ).
- ✅ December 03, 2024. Released the paper and supplemental material.
- ✅ November 22, 2024. Created the repository and the project page.
- ✅ Release training code
- ✅ Release training dataset
| Benchmark | GT | LQ | FoundIR | Compared methods |
|---|---|---|---|---|
| Download Link | Google Drive | Google Drive | Google Drive | Baidu Yun (pw: b6qb) |
The public data results of Table 2 can be found in Baidu Yun (pw: 58vg).
The total size of the training set is nearly 4.85 TB; make sure you have enough free disk space before unzipping the data (see the sketch after the table below).
| Training Set | GT&LQ |
|---|---|
| Download Link | Request Form |
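Before unzipping, a quick disk-space check such as the following minimal Python sketch may help; it is not one of the released scripts, and the unpack path is a placeholder.

```python
# Minimal sketch (not one of the released scripts): check free disk space
# before unzipping the ~4.85 TB training set. Replace the placeholder path
# with the drive you plan to unpack to.
import shutil

REQUIRED_TB = 4.85  # approximate size of the unpacked training set

free_bytes = shutil.disk_usage("/path/to/unpack/dir").free  # placeholder path
free_tb = free_bytes / 1024**4
print(f"Free space: {free_tb:.2f} TB")
if free_tb < REQUIRED_TB:
    raise SystemExit("Not enough free space to unzip the training set.")
```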
More samples can be found in the supplemental material (P7-P9).
conda env create -f environment.yml
- Download the pre-trained model and put it in the `./premodel` folder.
For our testset:
- Download the testset and organize it as follows (a small verification sketch follows the test command below):
|--dataset
| |--01Blur
| | |--GT
| | | |--0001.png
| | | |--0002.png
| | | |...
| | |--LQ
| | | |--0001.png
| | | |--0002.png
| | | |...
| |--02Blur_Noise
| | |--GT
| | | |--0151.png
| | | |--0152.png
| | | |...
| | |--LQ
| | | |--0151.png
| | | |--0152.png
| | | |...
| | ...
- Run the following command to test the generalist model.
python test.py --dataroot ./dataset --meta ./Testset_meta_info.txt
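To double-check the layout described above, here is a small, hypothetical verification script; it is not part of the released code and assumes only the GT/LQ pairing shown in the tree.

```python
# Hypothetical verification sketch (assumes only the GT/LQ layout shown
# above): report any LQ image that has no GT image with the same filename.
import os

DATAROOT = "./dataset"  # same root passed to test.py via --dataroot

for task in sorted(os.listdir(DATAROOT)):  # e.g. 01Blur, 02Blur_Noise, ...
    gt_dir = os.path.join(DATAROOT, task, "GT")
    lq_dir = os.path.join(DATAROOT, task, "LQ")
    if not (os.path.isdir(gt_dir) and os.path.isdir(lq_dir)):
        continue  # skip anything that is not a task folder
    gt_files = set(os.listdir(gt_dir))
    for name in sorted(os.listdir(lq_dir)):
        if name not in gt_files:
            print(f"[{task}] missing GT for {name}")
```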
For your own data:
- Put the testset in the `./dataset/LQ` folder (if you don't have GT images, simply copy the `LQ` folder and rename the copy `GT`).
- Comment/uncomment L40-L44 in `test.py` as follows to test your own data:
## For our testset
# dataset = CombinedDataset(opt, image_size, augment_flip=False, equalizeHist=True, crop_patch=False, generation=False, task='meta_info')
## For your own data
dataset = CombinedDataset(opt, image_size, augment_flip=False, equalizeHist=True, crop_patch=False, generation=False, task=None)
- Run the following command to test the generalist model.
python test.py --dataroot ./dataset --meta None
(If your GPU has less than 24 GB of memory, you can reduce the `crop_size` at L102 of `test.py`; a quick way to check is sketched below.)
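Here is a minimal sketch for checking your GPU memory; it assumes PyTorch, which `test.py` already requires, and is not part of the released scripts.

```python
# Minimal sketch (assumes PyTorch, which test.py already requires): query
# the GPU memory to decide whether to lower crop_size at L102 of test.py.
import torch

if torch.cuda.is_available():
    total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"GPU 0: {total_gb:.1f} GB of memory")
    if total_gb < 24:
        print("Consider reducing crop_size at L102 of test.py.")
else:
    print("No CUDA device found.")
```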
We provide two specialist models, i.e., a Lowlight model and a Weather model, to refine the results of the generalist model.
In our experiments, we refine the generalist model's outputs as follows (a routing sketch is given after the commands below):
- Weather model: inputs 0501-0700 and 1051-1100
- Lowlight model: inputs 0701-0800, 1101-1250, and 1301-1500
Please note that this step is optional: you can further refine the generalist model's outputs using the following commands, which is especially helpful for challenging lowlight, hazy, and rainy inputs.
- Install the environment.
cd ./specialist_model
pip install -r requirements.txt
python setup.py develop
- Put the testset in the `./dataset` folder.
- Run the following command to test the specialist models.
python inference_lowlight.py
or
python inference_weather.py
You can find the output visual results in the `results/` folder.
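For reference, here is a hypothetical sketch of the input routing described above; the `specialist_for` helper is illustrative and not part of the released scripts.

```python
# Hypothetical routing sketch: choose the specialist model for each
# generalist output from the filename index ranges listed above. The
# helper below is illustrative and not part of the released scripts.
from typing import Optional

WEATHER_RANGES = [(501, 700), (1051, 1100)]
LOWLIGHT_RANGES = [(701, 800), (1101, 1250), (1301, 1500)]

def specialist_for(filename: str) -> Optional[str]:
    """Return 'weather', 'lowlight', or None for a name like '0523.png'."""
    idx = int(filename.split(".")[0])
    if any(lo <= idx <= hi for lo, hi in WEATHER_RANGES):
        return "weather"
    if any(lo <= idx <= hi for lo, hi in LOWLIGHT_RANGES):
        return "lowlight"
    return None  # keep the generalist output as-is

print(specialist_for("0550.png"))  # -> weather
print(specialist_for("1200.png"))  # -> lowlight
print(specialist_for("0001.png"))  # -> None
```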
- Install the pyiqa package (thanks to Chaofeng Chen).
pip install pyiqa
- Run the following command to calculate the metrics.
python cal_metrics.py --inp_imgs ./dataset/restored --gt_imgs ./dataset/GT --log path_to_save_log
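For reference, a single metric computation with pyiqa looks roughly like the following minimal sketch; the image paths are placeholders, and the provided `cal_metrics.py` covers the full set of metrics listed above.

```python
# Minimal sketch of a metric computation with pyiqa; the provided
# cal_metrics.py covers the full set of metrics. Image paths below
# are placeholders.
import torch
import pyiqa

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Full-reference metric: compares a restored image against its ground truth.
psnr = pyiqa.create_metric("psnr", device=device)
score = psnr("./dataset/restored/0001.png", "./dataset/GT/0001.png")
print(f"PSNR: {score.item():.2f} dB")

# No-reference metric: needs only the restored image.
niqe = pyiqa.create_metric("niqe", device=device)
print(f"NIQE: {niqe('./dataset/restored/0001.png').item():.4f}")
```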
Our FoundIR is trained in two stages; please follow the steps in `train.sh` to train the model.
sh train.sh
- Quantitative Results
- Qualitative Results
More qualitative results can be found in the supplemental material (P10-P37).
If this work is helpful for your research, please consider citing the following BibTeX entry.
@inproceedings{li2024foundir,
title={FoundIR: Unleashing Million-scale Training Data to Advance Foundation Models for Image Restoration},
author={Li, Hao and Chen, Xiang and Dong, Jiangxin and Tang, Jinhui and Pan, Jinshan},
booktitle={ICCV},
year={2025}
}
We would like to thank our team members (Hao Chen, Yinghui Fang, Jiashuo Liu, Ke Wu, Renyuan Situ, ...) for their contributions to the data collection and post-processing in this work.
If you have any questions, please feel free to reach out to us at [email protected] and [email protected].




