MESA: Text-Driven Terrain Generation Using Latent Diffusion and Global Copernicus Data

Paul Borne--Pons, Mikolaj Czerkawski, Rosalie Martin, Romain Rouffet

CVPR 2025 Workshop MORSE

MESA is a novel generative model based on latent denoising diffusion that generates 2.5D terrain representations conditioned on natural-language text prompts. The model produces two co-registered modalities: an optical image and a depth (elevation) map.

Abstract

Terrain modeling has traditionally relied on procedural techniques, which often require extensive domain expertise and handcrafted rules. In this paper, we present MESA, a novel data-centric alternative based on training a diffusion model on global remote sensing data. This approach leverages large-scale geospatial information to generate high-quality terrain samples from text descriptions, showcasing a flexible and scalable solution for terrain generation. The model’s capabilities are demonstrated through extensive experiments, highlighting its ability to generate realistic and diverse terrain landscapes. The dataset produced to support this work, the Major TOM Core-DEM extension dataset, is released openly as a comprehensive resource for global terrain data. The results suggest that data-driven models, trained on remote sensing data, can provide a powerful tool for realistic terrain modeling and generation.

Model Weights

You currently need to download the weights manually from Hugging Face:

mkdir weights
huggingface-cli download NewtNewt/MESA --local-dir weights
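If loading fails later, a quick sanity check of the download can help. The sketch below is a minimal, hypothetical helper (`check_weights_dir` is not part of this repository) and assumes the snapshot follows the standard Diffusers layout: a top-level `model_index.json` whose non-underscore keys name component subfolders such as the UNet or VAE.

```python
import json
from pathlib import Path


def check_weights_dir(weights_dir: str = "weights") -> list[str]:
    """Return a list of problems found in a local Diffusers-style weights folder.

    Hypothetical helper. Assumption: the folder contains a model_index.json
    whose component entries look like ["library", "ClassName"], each saved
    in a subfolder of the same name (the usual Diffusers convention).
    """
    root = Path(weights_dir)
    if not root.is_dir():
        return [f"{root} does not exist or is not a directory"]

    index_path = root / "model_index.json"
    if not index_path.is_file():
        return [f"{index_path} is missing"]

    problems = []
    index = json.loads(index_path.read_text())
    for name, value in index.items():
        if name.startswith("_"):
            continue  # metadata such as _class_name or _diffusers_version
        # Components stored as [null, null] are intentionally absent.
        if isinstance(value, list) and value and value[0] is not None:
            if not (root / name).is_dir():
                problems.append(
                    f"component folder '{name}' listed in model_index.json is missing"
                )
    return problems
```

For example, `print(check_weights_dir("weights") or "weights look complete")` lists anything that appears to be missing before you try to load the pipeline.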

Installation

# using python 3.11.12
pip install -r requirements.txt

Note that this environment is only compatible with NVIDIA GPUs. We also recommend a GPU with at least 8 GB of memory.
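To check a machine against these requirements before running inference, something like the following sketch could be used. The helper names (`meets_memory_requirement`, `check_cuda_gpu`) are hypothetical and not part of MESA; the GPU query uses PyTorch's `torch.cuda.is_available` and `torch.cuda.get_device_properties`, and PyTorch is imported only inside the function that needs it.

```python
MIN_GPU_MEMORY_GB = 8  # recommendation stated in this README


def meets_memory_requirement(total_bytes: int, min_gb: float = MIN_GPU_MEMORY_GB) -> bool:
    """True if a device with `total_bytes` of memory meets the recommendation.

    Uses decimal gigabytes (10**9 bytes): cards sold as "8 GB" usually report
    slightly less than 8 GiB to CUDA, so a binary threshold would reject them.
    """
    return total_bytes >= min_gb * 10**9


def check_cuda_gpu(min_gb: float = MIN_GPU_MEMORY_GB) -> str:
    """Describe whether the first CUDA device looks suitable for inference."""
    import torch  # lazy import so meets_memory_requirement works without PyTorch

    if not torch.cuda.is_available():
        return "No CUDA device found: this environment requires an NVIDIA GPU."
    props = torch.cuda.get_device_properties(0)
    total_gb = props.total_memory / 10**9
    if meets_memory_requirement(props.total_memory, min_gb):
        return f"{props.name}: {total_gb:.1f} GB, meets the {min_gb} GB recommendation."
    return f"{props.name}: {total_gb:.1f} GB, below the recommended {min_gb} GB."
```

Calling `print(check_cuda_gpu())` after installing the requirements gives a one-line summary of the first visible GPU.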

Inference

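import torch
from MESA.pipeline_terrain import TerrainDiffusionPipeline

# Load the pipeline from the local weights folder in half precision
pipe = TerrainDiffusionPipeline.from_pretrained("./weights", torch_dtype=torch.float16)
pipe.to("cuda")  # requires an NVIDIA GPU

prompt = "A sentinel-2 image of montane forests and mountains in Mexico in August"
# The pipeline returns two co-registered outputs: the optical image and the DEM
image, dem = pipe(prompt, num_inference_steps=50, guidance_scale=7.5)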
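Raw elevation values are not directly viewable as an image. The sketch below shows one common way to preview the DEM: min-max rescaling to an 8-bit grayscale array. It is an illustration, not the authors' post-processing; `dem_to_uint8` is a hypothetical helper, and it assumes `dem` can be converted with `np.asarray` (a NumPy array, or a tensor already moved to the CPU). The exact type the pipeline returns is not documented here.

```python
import numpy as np


def dem_to_uint8(dem) -> np.ndarray:
    """Min-max rescale an elevation array to 0-255 for a grayscale preview.

    Assumption: `dem` is array-like (NumPy array or CPU tensor). Singleton
    batch/channel axes are squeezed away; NaNs are mapped to 0.
    """
    arr = np.squeeze(np.asarray(dem, dtype=np.float64))
    lo, hi = np.nanmin(arr), np.nanmax(arr)
    if not np.isfinite(lo) or hi == lo:
        # Flat or all-NaN tile: nothing to stretch, return a black image.
        return np.zeros(arr.shape, dtype=np.uint8)
    scaled = (arr - lo) / (hi - lo)
    return np.round(np.nan_to_num(scaled, nan=0.0) * 255).astype(np.uint8)
```

Min-max stretching discards absolute elevation, so it suits quick visual inspection only; keep the original `dem` values for any quantitative use. With Pillow installed, the preview could be saved with `Image.fromarray(dem_to_uint8(dem)).save("dem_preview.png")`, again assuming `dem` is on the CPU.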

Straightforward inference code is provided in

Alternatively, you can download and use the Gradio demo from the Hugging Face page.

Citation

@inproceedings{mesa2025,
  title={MESA: Text-Driven Terrain Generation Using Latent Diffusion and Global Copernicus Data},
  author={Paul Borne--Pons and Mikolaj Czerkawski and Rosalie Martin and Romain Rouffet},
  year={2025},
  booktitle={MORSE Workshop at CVPR 2025},
  eprint={2504.07210},
  url={https://arxiv.org/abs/2504.07210},
}

Acknowledgements

This implementation builds upon Hugging Face’s Diffusers library. We also acknowledge Gradio for providing an easy-to-use interface that allowed us to create the inference demos for our models.

This model is the product of a collaboration between Φ-lab at the European Space Agency (ESA) and Adobe Research (Paris, France).
