This is the official PyTorch code for the paper:
DiT4SR: Taming Diffusion Transformer for Real-World Image Super-Resolution
Zheng-Peng Duan1,2 *, Jiawei Zhang2, Xin Jin1, Ziheng Zhang1, Zheng Xiong2, Dongqing Zou2,3, Jimmy S. Ren2,4, Chunle Guo1, Chongyi Li1 †
1 VCIP, CS, Nankai University, 2 SenseTime Research, 3 PBVR, 4 Hong Kong Metropolitan University
*This project is done during the internship at SenseTime Research.
†Corresponding author.
⭐ If DiT4SR is helpful to your images or projects, please help star this repo. Thank you! 👈
- 2025.07.07 Create this repo and release related code of our paper.
- Release a Hugging Face demo
- Release checkpoints
- Release NKUSR8K dataset
- Release training and inference code
- Release Chinese version and supplementary material
- Clone repo
git clone https://github.com/adam-duan/DiT4SR.git
cd DiT4SR
- Install packages
conda env create -f environment.yaml
Step 1: Download Checkpoints
- Download the [dit4sr_f and dit4sr_q] checkpoints and place them in the preset/dit4sr_f and preset/dit4sr_q directories, respectively.
- Download the [stable-diffusion-3.5-medium] checkpoint and place it in the preset/stable-diffusion-3.5-medium directory.
- Download the [clip-vit-large-patch14-336] and [llava-v1.5-13b] checkpoints and place them in the llava_ckpt directory.
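After downloading, the checkpoint folders are expected to look roughly as follows (paths follow the instructions above; the exact subfolder contents depend on the released checkpoints):
preset/
└── dit4sr_f/
└── dit4sr_q/
└── stable-diffusion-3.5-medium/
llava_ckpt/
└── clip-vit-large-patch14-336/
└── llava-v1.5-13b/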
Step 2: Prepare testing data
Place low-quality images in preset/datasets/test_datasets/.
You can download RealSR, DrealSR and RealLR200 from [SeeSR],
and download RealLQ250 from [DreamClear].
Thanks for their awesome work.
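One possible layout, assuming each benchmark is kept in its own subfolder (the folder names here are illustrative):
preset/datasets/test_datasets/
└── RealSR/
└── DrealSR/
└── RealLR200/
└── RealLQ250/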
Step 3: Run the testing command
# test w/o llava, one GPU is enough
bash bash/test_wollava.sh
# test w/ llava, two GPUs are required
bash bash/test_wllava.sh
Replace the placeholders [pretrained_model_name_or_path], [transformer_model_name_or_path], [image_path], [output_dir], and [prompt_path] with their respective paths before running the command.
The evaluation script (test_wollava.sh) is designed to run with pre-generated prompts in order to reduce the computational cost of LLaVA during testing.
We provide our pre-generated and processed prompts in the preset/prompts directory.
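As a hypothetical illustration of the filled-in placeholders (shown as shell variables for readability; in practice you edit the paths directly in the test script, and the output path is arbitrary):
pretrained_model_name_or_path="preset/stable-diffusion-3.5-medium"
transformer_model_name_or_path="preset/dit4sr_q"
image_path="preset/datasets/test_datasets/RealLR200"
output_dir="results/RealLR200"
prompt_path="preset/prompts"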
Step 4: Check the results
The processed results will be saved in the [output_dir] directory.
We provide a Gradio demo for DiT4SR. You can use the demo to test your own images.
CUDA_VISIBLE_DEVICES=0,1 python gradio_dit4sr.py \
--transformer_model_name_or_path "preset/models/dit4sr_f"
Note that dit4sr_q achieves superior performance in terms of perceptual quality, while dit4sr_f better preserves image fidelity. All results reported in the paper are generated using dit4sr_q.
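If perceptual quality is preferred, the same demo can be pointed at the dit4sr_q weights instead (this assumes the checkpoint is placed under preset/models/; adjust the path to wherever you stored it):
CUDA_VISIBLE_DEVICES=0,1 python gradio_dit4sr.py \
--transformer_model_name_or_path "preset/models/dit4sr_q"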
Step 1: Download the training data
Download the training datasets including DIV2K, DIV8K, Flickr2K, Flickr8K, and our [NKUSR8K] dataset.
Step 2: Prepare the training data
- Following [SeeSR], generate the LR-HR pairs for training using bash_data/make_pairs.sh.
- Use bash_data/make_prompt.sh to generate the prompts for each HR image.
- Use bash_data/make_latent.sh to generate the latent codes for both HR and LR images.
- Use bash_data/make_embedding.sh to generate the embedding for each prompt.
- Don't forget to download [NULL_pooled_prompt_embeds.pt and NULL_prompt_embeds.pt] and place them in the corresponding directories.
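A minimal sketch of running these preprocessing steps in order (assuming the input/output paths configured inside each script have been adjusted to your dataset locations):
# 1. Generate LR-HR training pairs (following the SeeSR degradation pipeline)
bash bash_data/make_pairs.sh
# 2. Generate a text prompt for each HR image
bash bash_data/make_prompt.sh
# 3. Encode HR and LR images into SD3 latent space
bash bash_data/make_latent.sh
# 4. Encode each prompt into SD3 prompt embeddings
bash bash_data/make_embedding.sh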
Data Structure After Preprocessing
preset/datasets/training_datasets/
└── gt
└── 0000001.png # GT images, (3, 512, 512)
└── ...
└── sr_bicubic
└── 0000001.png # Bicubic LR images, (3, 512, 512)
└── ...
└── prompt_txt
└── 0000001.txt # prompts for teacher model and lora model
└── ...
└── prompt_embeds
└── NULL_prompt_embeds.pt # SD3 prompt embedding tensors, (154, 4096)
└── 0000001.pt
└── ...
└── pooled_prompt_embeds
└── NULL_pooled_prompt_embeds.pt # SD3 pooled embedding tensors, (2048,)
└── 0000001.pt
└── ...
└── latent_hr
└── 0000001.pt # SD3 latent space tensors, (16, 64, 64)
└── ...
└── latent_lr
└── 0000001.pt # SD3 latent space tensors, (16, 64, 64)
└── ...
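As an optional sanity check (a simple sketch, assuming the layout above), you can compare file counts across the subdirectories to verify that every GT image has matching LR, prompt, embedding, and latent files:
cd preset/datasets/training_datasets
for d in gt sr_bicubic prompt_txt prompt_embeds pooled_prompt_embeds latent_hr latent_lr; do
    # prompt_embeds and pooled_prompt_embeds each hold one extra NULL_*.pt file
    echo "$d: $(ls "$d" | wc -l) files"
done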
Step 3: Start training
Use the following command to start the training process:
bash bash/train.sh
This project is licensed under the Pi-Lab License 1.0 - see the LICENSE file for details.
If you find our repo useful for your research, please consider citing our paper:
@inproceedings{duan2025dit4sr,
title={DiT4SR: Taming Diffusion Transformer for Real-World Image Super-Resolution},
author={Duan, Zheng-Peng and Zhang, Jiawei and Jin, Xin and Zhang, Ziheng and Xiong, Zheng and Zou, Dongqing and Ren, Jimmy and Guo, Chun-Le and Li, Chongyi},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
year={2025}
}
For technical questions, please contact adamduan0211[AT]gmail.com
