# The Robustness of Natural Image Priors in Remote Sensing: A Zero-Shot VAE Study
Accepted at the ICLR 2026 Machine Learning for Remote Sensing (ML4RS) Workshop (Tiny Paper Track).
OpenReview: https://openreview.net/forum?id=63yoOFB24h
Are pre-trained VAEs good zero-shot remote sensing image reconstructors?
This repository evaluates how well variational autoencoders (VAEs) pre-trained on natural images reconstruct remote sensing data when applied zero-shot, i.e., without any fine-tuning.
*Qualitative comparison. Columns: Ground Truth | SD21-VAE | SDXL-VAE | SD35-VAE | FLUX1-VAE | FLUX2-VAE | SANA-VAE | Qwen-VAE. Rows: 8 samples (RESISC45 | All).*
## RESISC45 & AID (full datasets, original sizes: 256×256 / 600×600)
| Model | GFLOPs | Spatial Comp. | Latent Ch. | PSNR↑ RESISC45 | PSNR↑ AID | SSIM↑ RESISC45 | SSIM↑ AID | LPIPS↓ RESISC45 | LPIPS↓ AID | FID↓ RESISC45 | FID↓ AID |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SANA-VAE | 846.76 | 32 | 32 | 23.36 | 24.72 | 0.558 | 0.606 | 0.124 | 0.123 | 8.69 | 5.01 |
| SD21-VAE | 894.91 | 8 | 4 | 25.71 | 26.66 | 0.672 | 0.709 | 0.095 | 0.094 | 4.13 | 3.08 |
| SDXL-VAE | 894.91 | 8 | 4 | 25.83 | 26.80 | 0.692 | 0.726 | 0.098 | 0.098 | 4.98 | 3.11 |
| SD35-VAE | 895.25 | 8 | 16 | 29.71 | 30.72 | 0.862 | 0.876 | 0.035 | 0.037 | 1.11 | 0.69 |
| FLUX1-VAE | 895.25 | 8 | 16 | 33.30 | 33.63 | 0.923 | 0.918 | 0.022 | 0.025 | 0.38 | 0.26 |
| Qwen-VAE | 1143.88 | 8 | 16 | 30.38 | 31.46 | 0.874 | 0.889 | 0.080 | 0.077 | 9.51 | 0.42 |
| FLUX2-VAE | 895.71 | 8 | 32 | 33.42 | 34.46 | 0.925 | 0.926 | 0.021 | 0.022 | 0.46 | 0.37 |
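The PSNR values above follow from per-image mean squared error. A minimal sketch of the metric (assuming images normalized to [0, 1]; this is the standard definition, not necessarily the repo's exact metric code):

```python
import numpy as np

def psnr(img, recon, max_val=1.0):
    """Peak signal-to-noise ratio between an image and its reconstruction."""
    mse = np.mean((img.astype(np.float64) - recon.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# A uniform error of 0.02 on a [0, 1] image gives MSE = 4e-4
img = np.full((256, 256, 3), 0.5)
recon = img + 0.02
print(round(psnr(img, recon), 2))  # ≈ 33.98 dB, in the range of the best models above
```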
## UCMerced (2.1K images, 256×256)
| Model | GFLOPs | Spatial Comp. | Latent Ch. | PSNR↑ | SSIM↑ | LPIPS↓ | FID↓ | CMMD↓ |
|---|---|---|---|---|---|---|---|---|
| SANA-VAE | 846.76 | 32 | 32 | 22.33 | 0.564 | 0.112 | 28.64 | 0.0002 |
| SD21-VAE | 894.91 | 8 | 4 | 25.81 | 0.688 | 0.082 | 16.43 | 0.0172 |
| SDXL-VAE | 894.91 | 8 | 4 | 25.92 | 0.705 | 0.084 | 15.97 | 0.0203 |
| SD35-VAE | 895.25 | 8 | 16 | 30.06 | 0.858 | 0.030 | 6.85 | 0.0001 |
| FLUX1-VAE | 895.25 | 8 | 16 | 31.73 | 0.899 | 0.020 | 5.19 | 0.0010 |
| Qwen-VAE | 1143.88 | 8 | 16 | 30.76 | 0.873 | 0.064 | 15.83 | 0.0106 |
| FLUX2-VAE | 895.71 | 8 | 32 | 32.16 | 0.901 | 0.019 | 4.23 | 0.0001 |
Quick reconstruction (10 images per modality, single-channel expansion)
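The "single-channel expansion" above refers to replicating a 1-channel SAR/IR/EO band across 3 channels so an RGB-trained VAE can consume it. A minimal sketch of our reading of that step (not necessarily the repo's exact preprocessing):

```python
import numpy as np

def expand_single_channel(img):
    """Replicate a (H, W) or (H, W, 1) single-band image to 3 channels."""
    if img.ndim == 2:
        img = img[..., np.newaxis]
    if img.shape[-1] != 1:
        raise ValueError("expected a single-channel image")
    return np.repeat(img, 3, axis=-1)

sar = np.random.rand(256, 256).astype(np.float32)  # synthetic 1-band image
rgb_like = expand_single_channel(sar)              # now VAE-compatible
print(rgb_like.shape)  # (256, 256, 3)
```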
**Insight 1:** VAEs reconstruct remote sensing images remarkably well; reconstructions are visually nearly indistinguishable from the inputs. We argue that VAEs may implicitly deblur and denoise the input, so the reconstruction could serve as a better data source for model training (e.g., representation learning), with possibly improved statistics.
**Insight 2:** Since the compression appears effectively lossless, we argue for storing latent representations instead of the original images, substantially reducing dataset storage requirements.
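A back-of-the-envelope estimate of the savings, assuming 8× spatial compression with 16 latent channels stored as float16 (as for SD35-VAE/FLUX1-VAE); actual on-disk savings depend on the serialization format and on how well PNG/JPEG compresses the originals:

```python
def latent_storage_ratio(h, w, spatial_comp=8, latent_ch=16, latent_bytes=2):
    """Raw RGB uint8 image size vs. latent tensor size, in bytes."""
    image_bytes = h * w * 3  # uncompressed uint8 RGB
    lat_h, lat_w = h // spatial_comp, w // spatial_comp
    lat_bytes = lat_h * lat_w * latent_ch * latent_bytes  # float16 latents
    return image_bytes, lat_bytes, image_bytes / lat_bytes

img_b, lat_b, ratio = latent_storage_ratio(256, 256)
print(img_b, lat_b, round(ratio, 1))  # 196608 32768 6.0
```

With FLUX2-VAE's 32 latent channels the same calculation gives a 3× reduction, so the channel count directly trades reconstruction quality against storage.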
In this work, we explored the robustness of natural image priors in VAEs for remote sensing. Our findings indicate that these models, used zero-shot, offer significant utility for data compression across diverse scene categories. We will release the reconstructed images along with their corresponding latents for community exploration and further research.
For code usage, installation, and detailed documentation, see src/README.md.
Fine-tune any VAE on remote sensing images:
```bash
# Single-channel RS (IR, SAR, EO) with SD-VAE
python scripts/train_vae.py --config configs/train_rs_vae.yaml

# Any VAE with generic config
python scripts/train_vae.py --config configs/train_vae.yaml

# Multi-GPU training
accelerate launch scripts/train_vae.py --config configs/train_vae.yaml

# Override settings via CLI
python scripts/train_vae.py --config configs/train_vae.yaml \
    --pretrained_path stabilityai/sd-vae-ft-mse \
    --train_dir datasets/rs/train \
    --num_epochs 50
```

Reproduce the experiments:

```bash
python scripts/run_experiments.py              # Run main evaluation
python scripts/run_experiments.py --ablation   # Run ablation study
python scripts/run_experiments.py --visualize  # Generate visualizations

# Quick single-image reconstruction sanity check (1-channel SAR/IR/EO)
python scripts/quick_vae_reconstruction.py --input-dir /path/to/images \
    --vae-path ./models/BiliSakura/VAEs --resolution 512 --output-dir ./outputs

# Launch the interactive demo
streamlit run scripts/streamlit_app.py
```

Resources:
- VAE Models: https://huggingface.co/BiliSakura/VAEs
- Datasets: https://huggingface.co/blanchon/AID and https://huggingface.co/blanchon/RESISC45
- Latents Dataset: https://huggingface.co/datasets/BiliSakura/RS-Dataset-Latents (AID and RESISC45 latents encoded with FLUX2-VAE)
If you find this work useful, please cite:
```bibtex
@inproceedings{chen2026robustness,
  author    = {Zhenyuan Chen and Feng Zhang},
  title     = {The Robustness of Natural Image Priors in Remote Sensing: A Zero-Shot {VAE} Study},
  booktitle = {ICLR 2026 Machine Learning for Remote Sensing (ML4RS) Workshop},
  year      = {2026},
  url       = {https://openreview.net/forum?id=63yoOFB24h}
}
```


