# The Robustness of Natural Image Priors in Remote Sensing: A Zero-Shot VAE Study
Accepted at the ICLR 2026 Machine Learning for Remote Sensing (ML4RS) Workshop (Tiny Paper Track).
OpenReview: https://openreview.net/forum?id=63yoOFB24h
Are pre-trained VAEs good zero-shot remote sensing image reconstructors?
This repository evaluates how well variational autoencoders (VAEs) pre-trained on natural images reconstruct remote sensing data when applied zero-shot, i.e., without any fine-tuning.
*Qualitative comparison. Columns: Ground Truth | SD21-VAE | SDXL-VAE | SD35-VAE | FLUX1-VAE | FLUX2-VAE | SANA-VAE | Qwen-VAE. Rows: 8 samples (RESISC45 | All).*
## RESISC45 & AID (full datasets, original sizes: 256×256 / 600×600)
| Model | GFLOPs | Spatial Comp. | Latent Ch. | PSNR↑ RESISC45 | PSNR↑ AID | SSIM↑ RESISC45 | SSIM↑ AID | LPIPS↓ RESISC45 | LPIPS↓ AID | FID↓ RESISC45 | FID↓ AID |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SANA-VAE | 846.76 | 32 | 32 | 23.36 | 24.72 | 0.558 | 0.606 | 0.124 | 0.123 | 8.69 | 5.01 |
| SD21-VAE | 894.91 | 8 | 4 | 25.71 | 26.66 | 0.672 | 0.709 | 0.095 | 0.094 | 4.13 | 3.08 |
| SDXL-VAE | 894.91 | 8 | 4 | 25.83 | 26.80 | 0.692 | 0.726 | 0.098 | 0.098 | 4.98 | 3.11 |
| SD35-VAE | 895.25 | 8 | 16 | 29.71 | 30.72 | 0.862 | 0.876 | 0.035 | 0.037 | 1.11 | 0.69 |
| FLUX1-VAE | 895.25 | 8 | 16 | 33.30 | 33.63 | 0.923 | 0.918 | 0.022 | 0.025 | 0.38 | 0.26 |
| Qwen-VAE | 1143.88 | 8 | 16 | 30.38 | 31.46 | 0.874 | 0.889 | 0.080 | 0.077 | 9.51 | 0.42 |
| FLUX2-VAE | 895.71 | 8 | 32 | 33.42 | 34.46 | 0.925 | 0.926 | 0.021 | 0.022 | 0.46 | 0.37 |
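The PSNR values above follow from per-image mean squared error. A minimal sketch of the metric (assuming images normalized to [0, 1]; this is the standard definition, not necessarily the repo's exact metric code):

```python
import numpy as np

def psnr(img, recon, max_val=1.0):
    """Peak signal-to-noise ratio between an image and its reconstruction."""
    mse = np.mean((img.astype(np.float64) - recon.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# A uniform error of 0.02 on a [0, 1] image gives MSE = 4e-4
img = np.full((256, 256, 3), 0.5)
recon = img + 0.02
print(round(psnr(img, recon), 2))  # ≈ 33.98 dB, in the range of the best models above
```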
## UCMerced (2.1K images, 256×256)
| Model | GFLOPs | Spatial Comp. | Latent Ch. | PSNR↑ | SSIM↑ | LPIPS↓ | FID↓ | CMMD↓ |
|---|---|---|---|---|---|---|---|---|
| SANA-VAE | 846.76 | 32 | 32 | 22.33 | 0.564 | 0.112 | 28.64 | 0.0002 |
| SD21-VAE | 894.91 | 8 | 4 | 25.81 | 0.688 | 0.082 | 16.43 | 0.0172 |
| SDXL-VAE | 894.91 | 8 | 4 | 25.92 | 0.705 | 0.084 | 15.97 | 0.0203 |
| SD35-VAE | 895.25 | 8 | 16 | 30.06 | 0.858 | 0.030 | 6.85 | 0.0001 |
| FLUX1-VAE | 895.25 | 8 | 16 | 31.73 | 0.899 | 0.020 | 5.19 | 0.0010 |
| Qwen-VAE | 1143.88 | 8 | 16 | 30.76 | 0.873 | 0.064 | 15.83 | 0.0106 |
| FLUX2-VAE | 895.71 | 8 | 32 | 32.16 | 0.901 | 0.019 | 4.23 | 0.0001 |
Quick reconstruction (10 images per modality, single-channel expansion)
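The "single-channel expansion" above refers to replicating a 1-channel SAR/IR/EO band across 3 channels so an RGB-trained VAE can consume it. A minimal sketch of our reading of that step (not necessarily the repo's exact preprocessing):

```python
import numpy as np

def expand_single_channel(img):
    """Replicate a (H, W) or (H, W, 1) single-band image to 3 channels."""
    if img.ndim == 2:
        img = img[..., np.newaxis]
    if img.shape[-1] != 1:
        raise ValueError("expected a single-channel image")
    return np.repeat(img, 3, axis=-1)

sar = np.random.rand(256, 256).astype(np.float32)  # synthetic 1-band image
rgb_like = expand_single_channel(sar)              # now VAE-compatible
print(rgb_like.shape)  # (256, 256, 3)
```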
**Insight 1:** VAEs reconstruct remote sensing images remarkably well; reconstructions are visually nearly indistinguishable from the inputs. We argue that VAEs may implicitly deblur and denoise the input, so the reconstruction could serve as a better data source for model training (e.g., representation learning), with possibly improved statistics.
**Insight 2:** Since the compression appears effectively lossless, we argue for storing latent representations instead of the original images, substantially reducing dataset storage requirements.
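A back-of-the-envelope estimate of the savings, assuming 8× spatial compression with 16 latent channels stored as float16 (as for SD35-VAE/FLUX1-VAE); actual on-disk savings depend on the serialization format and on how well PNG/JPEG compresses the originals:

```python
def latent_storage_ratio(h, w, spatial_comp=8, latent_ch=16, latent_bytes=2):
    """Raw RGB uint8 image size vs. latent tensor size, in bytes."""
    image_bytes = h * w * 3  # uncompressed uint8 RGB
    lat_h, lat_w = h // spatial_comp, w // spatial_comp
    lat_bytes = lat_h * lat_w * latent_ch * latent_bytes  # float16 latents
    return image_bytes, lat_bytes, image_bytes / lat_bytes

img_b, lat_b, ratio = latent_storage_ratio(256, 256)
print(img_b, lat_b, round(ratio, 1))  # 196608 32768 6.0
```

With FLUX2-VAE's 32 latent channels the same calculation gives a 3× reduction, so the channel count directly trades reconstruction quality against storage.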
In this work, we explored the robustness of natural image priors in VAEs for remote sensing. Our findings indicate that these models, used zero-shot, offer significant utility for data compression across diverse scene categories. We will release the reconstructed images along with their corresponding latents for community exploration and further research.
For code usage, installation, and detailed documentation, see src/README.md.
Fine-tune any VAE on remote sensing images:
```bash
# Single-channel RS (IR, SAR, EO) with SD-VAE
python scripts/train_vae.py --config configs/train_rs_vae.yaml

# Any VAE with generic config
python scripts/train_vae.py --config configs/train_vae.yaml

# Multi-GPU training
accelerate launch scripts/train_vae.py --config configs/train_vae.yaml

# Override settings via CLI
python scripts/train_vae.py --config configs/train_vae.yaml \
    --pretrained_path stabilityai/sd-vae-ft-mse \
    --train_dir datasets/rs/train \
    --num_epochs 50
```

Reproduce the experiments:

```bash
python scripts/run_experiments.py              # Run main evaluation
python scripts/run_experiments.py --ablation   # Run ablation study
python scripts/run_experiments.py --visualize  # Generate visualizations

# Quick single-image reconstruction sanity check (1-channel SAR/IR/EO)
python scripts/quick_vae_reconstruction.py --input-dir /path/to/images \
    --vae-path ./models/BiliSakura/VAEs --resolution 512 --output-dir ./outputs

# Launch the interactive demo
streamlit run scripts/streamlit_app.py
```

Resources:
- VAE Models: https://huggingface.co/BiliSakura/VAEs
- Datasets: https://huggingface.co/blanchon/AID and https://huggingface.co/blanchon/RESISC45
- Latents Dataset: https://huggingface.co/datasets/BiliSakura/RS-Dataset-Latents (AID and RESISC45 latents encoded with FLUX2-VAE)
If you find this work useful, please cite:
```bibtex
@inproceedings{chen2026robustness,
  author    = {Zhenyuan Chen and Feng Zhang},
  title     = {The Robustness of Natural Image Priors in Remote Sensing: A Zero-Shot {VAE} Study},
  booktitle = {ICLR 2026 Machine Learning for Remote Sensing (ML4RS) Workshop},
  year      = {2026},
  url       = {https://openreview.net/forum?id=63yoOFB24h}
}
```


