# VAEs4RS

**The Robustness of Natural Image Priors in Remote Sensing: A Zero-Shot VAE Study**

Accepted at the ICLR 2026 Machine Learning for Remote Sensing (ML4RS) Workshop (Tiny Paper Track).
OpenReview: https://openreview.net/forum?id=63yoOFB24h

Are pre-trained VAEs good zero-shot remote sensing image reconstructors?

This repository evaluates variational autoencoders (VAEs) pre-trained on natural image datasets when applied to remote sensing data in a zero-shot manner.
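The zero-shot setup amounts to a single frozen encode/decode pass per image. A minimal sketch of that loop, using a duck-typed `encode`/`decode` interface as a stand-in for a real pre-trained VAE (e.g. one loaded via the `diffusers` library) — `zero_shot_reconstruct` and `_IdentityVAE` are illustrative names, not the repository's API:

```python
import numpy as np

def zero_shot_reconstruct(vae, image_u8):
    """One frozen pass: uint8 HWC image -> [-1, 1] -> encode/decode -> uint8."""
    x = image_u8.astype(np.float32) / 127.5 - 1.0   # scale to [-1, 1]
    z = vae.encode(x)                               # latent (e.g. 8x spatially compressed)
    x_hat = vae.decode(z)                           # reconstruction in [-1, 1]
    return np.clip((x_hat + 1.0) * 127.5, 0.0, 255.0).round().astype(np.uint8)

class _IdentityVAE:
    """Stand-in exposing the encode/decode interface assumed above (no compression)."""
    def encode(self, x): return x
    def decode(self, z): return z

img = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
rec = zero_shot_reconstruct(_IdentityVAE(), img)
assert np.array_equal(rec, img)  # the identity stand-in reconstructs exactly
```

With a real VAE, `z` sits at a lower spatial resolution (the "Spatial Comp." factor in the tables below) with more channels, and `x_hat` differs slightly from the input — the metrics below quantify that gap.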

## Results and Findings

*(Figure: quantitative and qualitative comparison. Columns: Ground Truth | SD21-VAE | SDXL-VAE | SD35-VAE | FLUX1-VAE | FLUX2-VAE | SANA-VAE | Qwen-VAE. Rows: 8 samples, RESISC45 | All.)*

### Quantitative Results

**RESISC45 & AID** (full datasets, original sizes: 256×256 / 600×600)

| Model | GFLOPs | Spatial Comp. | Latent Ch. | PSNR↑ RESISC45 | PSNR↑ AID | SSIM↑ RESISC45 | SSIM↑ AID | LPIPS↓ RESISC45 | LPIPS↓ AID | FID↓ RESISC45 | FID↓ AID |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SANA-VAE | 846.76 | 32 | 32 | 23.36 | 24.72 | 0.558 | 0.606 | 0.124 | 0.123 | 8.69 | 5.01 |
| SD21-VAE | 894.91 | 8 | 4 | 25.71 | 26.66 | 0.672 | 0.709 | 0.095 | 0.094 | 4.13 | 3.08 |
| SDXL-VAE | 894.91 | 8 | 4 | 25.83 | 26.80 | 0.692 | 0.726 | 0.098 | 0.098 | 4.98 | 3.11 |
| SD35-VAE | 895.25 | 8 | 16 | 29.71 | 30.72 | 0.862 | 0.876 | 0.035 | 0.037 | 1.11 | 0.69 |
| FLUX1-VAE | 895.25 | 8 | 16 | 33.30 | 33.63 | 0.923 | 0.918 | 0.022 | 0.025 | 0.38 | 0.26 |
| Qwen-VAE | 1143.88 | 8 | 16 | 30.38 | 31.46 | 0.874 | 0.889 | 0.080 | 0.077 | 9.51 | 0.42 |
| FLUX2-VAE | 895.71 | 8 | 32 | 33.42 | 34.46 | 0.925 | 0.926 | 0.021 | 0.022 | 0.46 | 0.37 |

**UCMerced** (2.1K images, 256×256)

| Model | GFLOPs | Spatial Comp. | Latent Ch. | PSNR↑ | SSIM↑ | LPIPS↓ | FID↓ | CMMD↓ |
|---|---|---|---|---|---|---|---|---|
| SANA-VAE | 846.76 | 32 | 32 | 22.33 | 0.564 | 0.112 | 28.64 | 0.0002 |
| SD21-VAE | 894.91 | 8 | 4 | 25.81 | 0.688 | 0.082 | 16.43 | 0.0172 |
| SDXL-VAE | 894.91 | 8 | 4 | 25.92 | 0.705 | 0.084 | 15.97 | 0.0203 |
| SD35-VAE | 895.25 | 8 | 16 | 30.06 | 0.858 | 0.030 | 6.85 | 0.0001 |
| FLUX1-VAE | 895.25 | 8 | 16 | 31.73 | 0.899 | 0.020 | 5.19 | 0.0010 |
| Qwen-VAE | 1143.88 | 8 | 16 | 30.76 | 0.873 | 0.064 | 15.83 | 0.0106 |
| FLUX2-VAE | 895.71 | 8 | 32 | 32.16 | 0.901 | 0.019 | 4.23 | 0.0001 |
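For reference, the PSNR values above follow the standard definition, 10·log₁₀(MAX²/MSE), computed on 8-bit pixel values. A minimal sketch (the sample MSE below is illustrative, not taken from the experiments):

```python
import math

def psnr(mse, max_val=255.0):
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    return 10.0 * math.log10(max_val ** 2 / mse)

# An MSE of ~30 on 8-bit pixels corresponds to ~33.4 dB,
# roughly the range the top-performing VAEs reach above.
print(round(psnr(30.0), 1))  # -> 33.4
```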

**Quick reconstruction** (10 images per modality, single-channel expansion)

*(Figure panels: PSNR by modality | MAE by modality | SSIM by modality.)*

## Insights and Conclusion

**Insight 1:** VAEs reconstruct remote sensing images remarkably well; reconstructions appear visually near-identical to the input. We argue that VAEs may implicitly deblur and denoise inputs, so the reconstructed images could serve as a better data source for model training (e.g., representation learning), with possibly improved statistics.

**Insight 2:** Since the compression appears effectively lossless, we argue for storing latent representations directly, rather than the original images, to reduce dataset storage requirements.
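The storage argument can be made concrete with a back-of-the-envelope calculation from the table columns (spatial compression f, latent channels c), assuming latents are stored as fp16. Note the 24 bpp raw-RGB baseline ignores PNG/JPEG entropy coding, so these ratios are an upper bound on the savings:

```python
def latent_bpp(spatial_comp, latent_ch, bits_per_value=16):
    """Bits per original pixel when storing fp16 latents instead of pixels."""
    return latent_ch * bits_per_value / spatial_comp ** 2

# Uncompressed 8-bit RGB is 24 bits per pixel.
for name, f, c in [("SD21-VAE", 8, 4), ("FLUX1-VAE", 8, 16), ("SANA-VAE", 32, 32)]:
    bpp = latent_bpp(f, c)
    print(f"{name}: {bpp:.2f} bpp ({24 / bpp:.0f}x smaller than raw RGB)")
```

For example, FLUX1-VAE latents (f=8, c=16) cost 4 bpp, a 6× reduction, while SANA-VAE (f=32, c=32) reaches 0.5 bpp at some cost in fidelity.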

In this work, we explored the robustness of natural image priors in VAEs for remote sensing. Our findings indicate that, used zero-shot, these models offer significant utility for data compression across diverse scene categories. We will release the reconstructed images along with their corresponding latents for community exploration and further research.

## Quick Start

For code usage, installation, and detailed documentation, see src/README.md.

### Training

Fine-tune any VAE on remote sensing images:

```bash
# Single-channel RS (IR, SAR, EO) with SD-VAE
python scripts/train_vae.py --config configs/train_rs_vae.yaml

# Any VAE with generic config
python scripts/train_vae.py --config configs/train_vae.yaml

# Multi-GPU training
accelerate launch scripts/train_vae.py --config configs/train_vae.yaml

# Override settings via CLI
python scripts/train_vae.py --config configs/train_vae.yaml \
    --pretrained_path stabilityai/sd-vae-ft-mse \
    --train_dir datasets/rs/train \
    --num_epochs 50
```

### Evaluation

```bash
python scripts/run_experiments.py              # Run main evaluation
python scripts/run_experiments.py --ablation   # Run ablation study
python scripts/run_experiments.py --visualize  # Generate visualizations

# Quick single-image reconstruction sanity check (1-channel SAR/IR/EO)
python scripts/quick_vae_reconstruction.py --input-dir /path/to/images \
    --vae-path ./models/BiliSakura/VAEs --resolution 512 --output-dir ./outputs
```

### Interactive Viewer

```bash
streamlit run scripts/streamlit_app.py
```


## Citation

If you find this work useful, please cite:

```bibtex
@inproceedings{chen2026robustness,
  author    = {Zhenyuan Chen and Feng Zhang},
  title     = {The Robustness of Natural Image Priors in Remote Sensing: A Zero-Shot VAE Study},
  booktitle = {ICLR 2026 Machine Learning for Remote Sensing (ML4RS) Workshop},
  year      = {2026},
  url       = {https://openreview.net/forum?id=63yoOFB24h}
}
```
