Chen Wu, Ling Wang, Zhuoran Zheng, Weidong Jiang, Yuning Cui* and Jingyuan Xia
Abstract: Transformer-based architectures show substantial promise for ultra-high-definition (UHD) image restoration (IR). Nevertheless, they struggle to preserve high-frequency (HF) details, which are crucial for reconstructing texture. Conventional methods tame the computational cost by aggressively downsampling the input (by a factor of 4 to 8), and most high-frequency components are further lost because self-attention inherently suppresses high-frequency content during non-local feature aggregation. This paper proposes HiFormer, a dual-branch transformer architecture that synergistically combines native-resolution HF preservation with efficient contextual modeling. The high-resolution branch uses a directionally-sensitive large-kernel decomposition to address anisotropic degradations with few parameters and applies depthwise separable convolutions to extract localized high-frequency (HF) information. Concurrently, the low-resolution branch assimilates these localized HF cues through adaptive channel modulation to offset the spectral losses induced by the smoothing effect of self-attention. Comprehensive experiments across numerous UHD image restoration tasks show that our approach surpasses current leading methods both quantitatively and qualitatively.
- Model Architecture
- Features
- Installation
- Project Structure
- Dataset Preparation
- Training
- Testing
- Results
- Tips & Tricks
- Citation
- Contact
- Acknowledgments
- License
High-Resolution Path:
- Directionally-decomposed large kernels to efficiently model anisotropic degradations
- Explicit high-frequency mining using depthwise convolutions to extract fine details
Low-Resolution Path:
- Self-attention mechanisms for global context understanding
- Adaptive high-frequency compensation that uses details from the high-res path to counteract the spectral losses caused by downsampling and attention's inherent low-pass filtering (see the sketch below)
- Parameters: ~2.16M
- Inference Memory: <12 GB for a UHD (4K) image, lower still with BF16 precision.
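For intuition, a minimal PyTorch sketch of the main ideas is given below. The class names, the 31-pixel kernel, the residual-style HF extraction, and the squeeze-and-excitation-style channel gating are all illustrative assumptions for this README, not the released HiFormer implementation.

```python
import torch
import torch.nn as nn

class DirectionalLargeKernel(nn.Module):
    """Approximates a large KxK depthwise kernel with 1xK + Kx1 depthwise
    convolutions to capture anisotropic (directional) degradations cheaply.
    Kernel size and composition are illustrative assumptions."""
    def __init__(self, channels, kernel_size=31):
        super().__init__()
        pad = kernel_size // 2
        self.horizontal = nn.Conv2d(channels, channels, (1, kernel_size),
                                    padding=(0, pad), groups=channels)
        self.vertical = nn.Conv2d(channels, channels, (kernel_size, 1),
                                  padding=(pad, 0), groups=channels)

    def forward(self, x):
        return self.horizontal(x) + self.vertical(x)

class HighFreqBranch(nn.Module):
    """Native-resolution branch: directional large-kernel context plus
    explicit high-frequency mining with a small depthwise convolution."""
    def __init__(self, channels):
        super().__init__()
        self.large_kernel = DirectionalLargeKernel(channels)
        self.local_dw = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)

    def forward(self, x):
        context = self.large_kernel(x)
        # High-frequency residual: fine detail left after local smoothing.
        high_freq = x - self.local_dw(x)
        return context + high_freq

class HFCompensation(nn.Module):
    """Low-resolution branch helper: re-injects high-frequency features via
    adaptive per-channel gating (squeeze-and-excitation style, an assumption)."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, low_res_feat, high_freq_feat):
        # Resize HF features to the low-res grid, then gate them per channel.
        hf = nn.functional.interpolate(high_freq_feat,
                                       size=low_res_feat.shape[-2:],
                                       mode="bilinear", align_corners=False)
        return low_res_feat + self.gate(low_res_feat) * hf

if __name__ == "__main__":
    feats = torch.randn(1, 16, 128, 128)
    hf = HighFreqBranch(16)(feats)
    low = torch.randn(1, 16, 32, 32)   # stand-in low-resolution features
    fused = HFCompensation(16)(low, hf)
    print(hf.shape, fused.shape)       # (1, 16, 128, 128) (1, 16, 32, 32)
```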
| Task | Dataset Code | Dataset | Description |
|---|---|---|---|
| Low-Light Enhancement | `lol4k` | UHD-LOL4K | Enhance low-light 4K images |
| Low-Light Enhancement | `uhd-ll` | UHD-LL | Enhance real-world low-light UHD images |
| Deraining | `rain4k` | 4K-Rain13k | Remove rain streaks from 4K images |
| Deraining | `uhd-rain` | UHD-Rain | Remove rain streaks from 4K images |
| Dehazing | `uhd-haze` | UHD-Haze | Remove haze from UHD images |
| Deblurring | `uhd-blur` | UHD-Blur | Remove blur from UHD images |
| Snow Removal | `uhd-snow` | UHD-Snow | Remove snow artifacts from UHD images |
- Efficient UHD Processing: Optimized for 4K and higher-resolution images
- Multi-Task Support: Handles multiple degradation types
- Mixed Precision Training: Faster training with lower memory usage
- Comprehensive Logging: WandB and TensorBoard integration
- Python >= 3.8
- PyTorch >= 1.8.1
- CUDA >= 11.6 (for GPU training)
# Clone the repository
cd HiFormer
# Create conda environment
conda env create -f env.yml
conda activate hiformer
HiFormer/
├── net/
│   └── hiformer.py          # Model definition
├── utils/                   # Utilities
│   ├── dataset_utils.py     # Dataset loaders
│   ├── schedulers.py        # LR schedulers
│   └── ...
├── train_hiformer.py        # Training script
├── test_hiformer.py         # Evaluation script (with metrics)
├── demo_hiformer.py         # Demo script (inference only)
├── options_hiformer.py      # Configuration
├── train_hiformer.sh        # Training launcher
├── test_hiformer.sh         # Testing launcher
└── README.md                # This file
Organize your datasets as follows:
data/
└── Train/
    ├── UHD-haze/
    │   ├── input/           # Hazy images
    │   └── gt/              # Clear images
    ├── UHD-rain/
    │   ├── input/           # Rainy images
    │   └── gt/              # Clean images
    └── ...
data_dir/
├── UHD-haze/
│   └── UHD-haze.txt         # UHD-Haze dataset list (uhd-haze)
├── UHD-rain/
│   └── UHD-rain.txt         # 4K-Rain dataset list (rain4k)
├── LOL-4K/
│   └── UHD_LOL4K.txt        # LOL4K dataset list (lol4k)
├── UHD-LL/
│   └── UHD-LL.txt           # UHD-LL dataset list (uhd-ll)
├── UHD-blur/
│   └── UHD-blur.txt         # UHD-Blur dataset list (uhd-blur)
├── UHD-snow/
│   └── UHD-snow.txt         # UHD-Snow dataset list (uhd-snow)
└── ...
Create text files listing your training images:
# Example: data_dir/hazy/hazy_UHD.txt
data/Train/UHD_haze/train/input/25_250000111.jpg
data/Train/UHD_haze/train/input/37_37000032.jpg
...
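Each list file contains one input-image path per line, as in the example above. A small helper like the following could generate such a file; the directory and file names simply mirror the example and are not a script shipped with the repo.

```python
import glob
import os

# Hypothetical helper: writes one input-image path per line, matching the
# example list format above. Adjust the directories to your own layout.
input_dir = "data/Train/UHD_haze/train/input"
list_file = "data_dir/hazy/hazy_UHD.txt"

os.makedirs(os.path.dirname(list_file), exist_ok=True)
paths = sorted(glob.glob(os.path.join(input_dir, "*.jpg")))
with open(list_file, "w") as f:
    f.write("\n".join(paths) + "\n")
print(f"Wrote {len(paths)} paths to {list_file}")
```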
# Train with default settings (UHD dehazing)
bash train_hiformer.sh
# Or run directly
python train_hiformer.py --de_type uhd-haze --epochs 500

A full training command with the main options:

python train_hiformer.py \
    --de_type uhd-haze \
    --epochs 500 \
    --batch_size 8 \
    --patch_size 128 \
    --num_gpus 2 \
    --lr 2e-4 \
    --use_amp \
    --gradient_clip \
    --ckpt_dir ckpt/hiformer/ \
    --wblogger hiformer

Key arguments:
- `--de_type`: Task type (`uhd-haze`, `uhd-blur`, `uhd-ll`, `uhd-snow`, `lol4k`, `rain4k`)
- `--epochs`: Training epochs
- `--batch_size`: Batch size per GPU
- `--patch_size`: Input patch size
- `--num_gpus`: Number of GPUs
- `--lr`: Learning rate
- `--use_amp`: Use mixed precision
- `--gradient_clip`: Enable gradient clipping
- `--ckpt_dir`: Checkpoint directory
- `--wblogger`: WandB project name

To resume training from a checkpoint:

python train_hiformer.py \
    --resume_from ckpt/hiformer/hiformer-epoch-100.ckpt \
    --epochs 500

For evaluating PSNR/SSIM metrics on test datasets with ground truth:
# Test on UHD-Haze dataset
python test_hiformer.py \
--valid_data_dir data/Test/UHD_haze/test/input/ \
--ckpt_path ckpt/hiformer/hiformer-epoch-499.ckpt \
--output_path output/
# Test on UHD-Rain dataset
python test_hiformer.py \
--valid_data_dir data/Test/UHD_rain/test/input/ \
--ckpt_path ckpt/hiformer/hiformer-epoch-499.ckpt \
--output_path output/
# Test on LOL4K dataset
python test_hiformer.py \
--valid_data_dir data/Test/LOL4K/test/input/ \
--ckpt_path ckpt/hiformer/hiformer-epoch-499.ckpt \
--output_path output/
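For reference, the PSNR reported by the evaluation script is the standard peak signal-to-noise ratio between each restored image and its ground truth. Below is a standalone NumPy sketch of that formula, not the repo's metric code, using random arrays as stand-ins for images.

```python
import numpy as np

def psnr(restored: np.ndarray, gt: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((restored.astype(np.float64) - gt.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10((max_val ** 2) / mse)

# Example with random data standing in for a real restored/GT pair.
gt = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
noisy = np.clip(gt + np.random.normal(0, 5, gt.shape), 0, 255).astype(np.uint8)
print(f"PSNR: {psnr(noisy, gt):.2f} dB")
```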
For generating restored images from degraded inputs (without ground truth):

# Process a directory of images
python demo_hiformer.py \
--test_path test/input/ \
--output_path test/output/ \
--ckpt_path ckpt/hiformer/hiformer-epoch-499.ckpt
# Process with tiling (for very large images)
python demo_hiformer.py \
--test_path test/input/ \
--output_path test/output/ \
--ckpt_path ckpt/hiformer/hiformer-epoch-499.ckpt \
--tile True \
--tile_size 512 \
--tile_overlap 32
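Tiling splits a UHD image into overlapping patches, restores each patch independently, and averages the overlaps when stitching the result back, which bounds peak GPU memory. The sketch below illustrates the general idea only; the actual tiling in demo_hiformer.py may blend tiles differently.

```python
import torch

@torch.no_grad()
def tiled_restore(model, img, tile=512, overlap=32):
    """Restore a large image tile-by-tile, averaging overlapping regions.

    img: (1, C, H, W) tensor; model: any image-to-image network.
    """
    _, _, h, w = img.shape
    out = torch.zeros_like(img)
    weight = torch.zeros_like(img)
    stride = tile - overlap
    for top in range(0, h, stride):
        for left in range(0, w, stride):
            bottom, right = min(top + tile, h), min(left + tile, w)
            top0, left0 = max(bottom - tile, 0), max(right - tile, 0)
            patch = img[:, :, top0:bottom, left0:right]
            out[:, :, top0:bottom, left0:right] += model(patch)
            weight[:, :, top0:bottom, left0:right] += 1
    return out / weight.clamp(min=1)

# Tiny demo with an identity "model" standing in for the network.
demo = torch.rand(1, 3, 1024, 1536)
restored = tiled_restore(torch.nn.Identity(), demo, tile=512, overlap=32)
print(restored.shape)  # torch.Size([1, 3, 1024, 1536])
```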
test_hiformer.py (for evaluation with metrics):
- `--valid_data_dir`: Path to test input images (requires corresponding GT in a `gt/` folder)
- `--ckpt_path`: Path to checkpoint file
- `--output_path`: Directory to save results
- `--cuda`: GPU device ID (default: 0)
demo_hiformer.py (for inference only):
- `--test_path`: Path to input images (directory or single image)
- `--output_path`: Directory to save restored images
- `--ckpt_path`: Path to checkpoint file
- `--tile`: Enable tiling for large images (default: False)
- `--tile_size`: Tile size for tiling mode (default: 128)
- `--tile_overlap`: Overlap between tiles (default: 32)
- `--cuda`: GPU device ID (default: 0)
If you encounter OOM errors:
# Reduce batch size and patch size
python train_hiformer.py --batch_size 4 --patch_size 128 --use_amp
# Use gradient accumulation
python train_hiformer.py --batch_size 2 --accumulate_grad_batches 4
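Gradient accumulation keeps the effective batch size (here 2 × 4 = 8) while holding only one small micro-batch in memory at a time. Conceptually it corresponds to the generic PyTorch pattern below; this is a sketch with a stand-in model, not this repo's training loop.

```python
import torch

model = torch.nn.Conv2d(3, 3, 3, padding=1)           # stand-in for HiFormer
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
accumulate = 4                                         # matches --accumulate_grad_batches 4

optimizer.zero_grad()
for step in range(8):                                  # stand-in data loop
    x = torch.rand(2, 3, 128, 128)                     # micro-batch of 2 (matches --batch_size 2)
    loss = torch.nn.functional.l1_loss(model(x), x)
    (loss / accumulate).backward()                     # scale so gradients average over micro-batches
    if (step + 1) % accumulate == 0:
        optimizer.step()                               # update once per 4 micro-batches
        optimizer.zero_grad()
```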
For faster training:

# Use more workers
python train_hiformer.py --num_workers 32
# Use multiple GPUs
python train_hiformer.py --num_gpus 4
# Enable mixed precision
python train_hiformer.py --use_amp
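With `--use_amp`, most forward and backward computation runs in reduced precision with loss scaling, which cuts activation memory roughly in half. The underlying PyTorch pattern looks like the sketch below; a stand-in model and a CUDA device are assumed, and the repo's training script may handle this via PyTorch Lightning instead.

```python
import torch

model = torch.nn.Conv2d(3, 3, 3, padding=1).cuda()    # stand-in for HiFormer, needs a GPU
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
scaler = torch.cuda.amp.GradScaler()

for _ in range(4):                                     # stand-in data loop
    x = torch.rand(2, 3, 128, 128, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():                    # ops run in reduced precision where safe
        loss = torch.nn.functional.l1_loss(model(x), x)
    scaler.scale(loss).backward()                      # scale loss to avoid fp16 underflow
    scaler.step(optimizer)
    scaler.update()
```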
If you find this work useful, please cite:

@ARTICLE{11263975,
author={Wu, Chen and Wang, Ling and Zheng, Zhuoran and Jiang, Weidong and Cui, Yuning and Xia, Jingyuan},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
title={Ultra-High-Definition Image Restoration via High-Frequency Enhanced Transformer},
year={2025},
volume={},
number={},
pages={1-1},
keywords={Image restoration;Transformers;MODFETs;HEMTs;High frequency;Degradation;Frequency-domain analysis;Faces;Computational modeling;Videos;Image restoration;UHD image;frequency learning;Transformer},
doi={10.1109/TCSVT.2025.3636011}
}

For any questions or issues, please open an issue on GitHub, or contact [email protected], [email protected].
This work builds upon the PromptIR, Restormer, and PyTorch Lightning repositories.
This project is released under the MIT License. See LICENSE for details.





