
🔥 DiT4SR: Taming Diffusion Transformer for Real-World Image Super-Resolution (ICCV2025)


This is the official PyTorch code for the paper:

DiT4SR: Taming Diffusion Transformer for Real-World Image Super-Resolution
Zheng-Peng Duan1,2 *, Jiawei Zhang2, Xin Jin1, Ziheng Zhang1, Zheng Xiong2, Dongqing Zou2,3, Jimmy S. Ren2,4, Chunle Guo1, Chongyi Li1 †
1 VCIP, CS, Nankai University, 2 SenseTime Research, 3 PBVR, 4 Hong Kong Metropolitan University
*This project was done during an internship at SenseTime Research.
†Corresponding author.

teaser_img

⭐ If DiT4SR is helpful to your images or projects, please star this repo. Thank you! 👈


💥 News

  • 2025.07.07 Created this repo and released the code for our paper.

🏃 TODO

  • Release a huggingface demo
  • Release Checkpoints
  • Release NKUSR8K dataset
  • Release training and inference code
  • Release Chinese version and supplementary material

🔧 Dependencies and Installation

  1. Clone repo
git clone https://github.com/adam-duan/DiT4SR.git
cd DiT4SR
  2. Install packages
conda env create -f environment.yaml
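
After the environment is created and activated, a quick sanity check can confirm that PyTorch sees your GPU(s). This is a minimal sketch; it only assumes the environment from environment.yaml installs PyTorch with CUDA support:

# Minimal environment sanity check (assumes the conda env from
# environment.yaml provides PyTorch with CUDA support).
import torch

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available:  {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU count:       {torch.cuda.device_count()}")
    print(f"GPU 0:           {torch.cuda.get_device_name(0)}")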

🏄 Quick Inference

Step 1: Download Checkpoints

Step 2: Prepare testing data

Place low-quality images in preset/datasets/test_datasets/. You can download RealSR, DrealSR, and RealLR200 from [SeeSR], and download RealLQ250 from [DreamClear]. Thanks for their awesome work.
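
To verify the images are in place before testing, a small sketch like the following can list and open them (it assumes Pillow is available in the environment; the directory path follows this step):

# Quick check that the test images are in place and readable.
from pathlib import Path
from PIL import Image

test_dir = Path("preset/datasets/test_datasets")
images = sorted(p for p in test_dir.rglob("*") if p.suffix.lower() in {".png", ".jpg", ".jpeg"})
print(f"Found {len(images)} test images under {test_dir}")
for p in images[:3]:
    with Image.open(p) as im:
        print(f"{p.name}: {im.size[0]}x{im.size[1]}, mode={im.mode}")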

Step 3: Running testing command

# test w/o llava, one GPU is enough
bash bash/test_wollava.sh

# test w/ llava, two GPUs are required
bash bash/test_wllava.sh

Before running the command, replace the placeholders [pretrained_model_name_or_path], [transformer_model_name_or_path], [image_path], [output_dir], and [prompt_path] with the corresponding paths. The evaluation script test_wollava.sh runs with pre-generated prompts to avoid the computational cost of LLaVA during testing; we provide our pre-generated and processed prompts in preset/prompts.
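
As an illustration of how the pre-generated prompts pair with test images, here is a hedged sketch: it assumes one .txt prompt per image named after the image stem (mirroring the prompt_txt layout in the training section below), and the RealLQ250 subfolder names are hypothetical:

# Sketch: pair each test image with its pre-generated prompt by file stem.
# Assumptions: one .txt per image named after the stem; subfolder names are hypothetical.
from pathlib import Path

image_dir = Path("preset/datasets/test_datasets/RealLQ250")  # hypothetical subfolder
prompt_dir = Path("preset/prompts/RealLQ250")                # hypothetical subfolder

for img in sorted(image_dir.glob("*.png")):
    txt = prompt_dir / (img.stem + ".txt")
    prompt = txt.read_text().strip() if txt.exists() else "(no prompt found)"
    print(f"{img.name}: {prompt[:60]}")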

Step 4: Check the results

The processed results will be saved in the [output_dir] directory.

🎁 Gradio Demo

We provide a Gradio demo for DiT4SR. You can use the demo to test your own images.

CUDA_VISIBLE_DEVICES=0,1 python gradio_dit4sr.py \
    --transformer_model_name_or_path "preset/models/dit4sr_f" 

Note that dit4sr_q achieves superior performance in terms of perceptual quality, while dit4sr_f better preserves image fidelity. All results reported in the paper are generated with dit4sr_q.

💪 Train

Step 1: Download the training data

Download the training datasets including DIV2K, DIV8K, Flickr2K, Flickr8K, and our [NKUSR8K] dataset.

Step 2: Prepare the training data

  • Following [SeeSR], generate the LR-HR pairs for training with bash_data/make_pairs.sh.
  • Use bash_data/make_prompt.sh to generate a prompt for each HR image.
  • Use bash_data/make_latent.sh to generate the latent codes for both HR and LR images.
  • Use bash_data/make_embedding.sh to generate the embedding for each prompt.
  • Don't forget to download [NULL_pooled_prompt_embeds.pt and NULL_prompt_embeds.pt] and place them in the corresponding directories; the sanity-check sketch after the data tree below verifies their shapes.

Data Structure After Preprocessing

preset/datasets/training_datasets/
    ├── gt
    │   ├── 0000001.png # GT images, (3, 512, 512)
    │   └── ...
    ├── sr_bicubic
    │   ├── 0000001.png # Bicubic LR images, (3, 512, 512)
    │   └── ...
    ├── prompt_txt
    │   ├── 0000001.txt # prompts for teacher model and lora model
    │   └── ...
    ├── prompt_embeds
    │   ├── NULL_prompt_embeds.pt # SD3 prompt embedding tensors, (154, 4096)
    │   ├── 0000001.pt
    │   └── ...
    ├── pooled_prompt_embeds
    │   ├── NULL_pooled_prompt_embeds.pt # SD3 pooled embedding tensors, (2048,)
    │   ├── 0000001.pt
    │   └── ...
    ├── latent_hr
    │   ├── 0000001.pt # SD3 latent space tensors, (16, 64, 64)
    │   └── ...
    └── latent_lr
        ├── 0000001.pt # SD3 latent space tensors, (16, 64, 64)
        └── ...
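
To catch preprocessing mistakes early, the following sketch loads one sample and checks the tensor shapes against the layout above (file names and shapes are taken directly from the tree; adjust the sample id as needed):

# Sanity-check the preprocessed training data against the layout above.
# File names and shapes follow the tree; "0000001" is one sample id.
from pathlib import Path
import torch

root = Path("preset/datasets/training_datasets")
sample = "0000001"

expected = {
    root / "prompt_embeds" / f"{sample}.pt":                        (154, 4096),
    root / "pooled_prompt_embeds" / f"{sample}.pt":                 (2048,),
    root / "latent_hr" / f"{sample}.pt":                            (16, 64, 64),
    root / "latent_lr" / f"{sample}.pt":                            (16, 64, 64),
    root / "prompt_embeds" / "NULL_prompt_embeds.pt":               (154, 4096),
    root / "pooled_prompt_embeds" / "NULL_pooled_prompt_embeds.pt": (2048,),
}
for path, shape in expected.items():
    tensor = torch.load(path, map_location="cpu")
    assert tuple(tensor.shape) == shape, f"{path}: got {tuple(tensor.shape)}, expected {shape}"
    print(f"OK: {path} {tuple(tensor.shape)}")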

Step 3: Start training

Use the following command to start the training process:

bash bash/train.sh

📜 License

This project is licensed under the Pi-Lab License 1.0 - see the LICENSE file for details.

📖 Citation

If you find our repo useful for your research, please consider citing our paper:

@inproceedings{duan2025dit4sr,
  title={DiT4SR: Taming Diffusion Transformer for Real-World Image Super-Resolution},
  author={Duan, Zheng-Peng and Zhang, Jiawei and Jin, Xin and Zhang, Ziheng and Xiong, Zheng and Zou, Dongqing and Ren, Jimmy and Guo, Chun-Le and Li, Chongyi},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2025}
}

📮 Contact

For technical questions, please contact adamduan0211[AT]gmail.com.
