Generative EnKF is designed as a surrogate for data assimilation (DA) in numerical simulations that does not require ensemble runs. Forecasting a real-world system usually combines a scientific model for the time evolution of the system with an estimate of its current state. DA methods such as the Ensemble Kalman Filter (EnKF) iteratively blend simulation states with observation data to give a reasonable estimate of the current state. Though widely used, EnKF has two critical issues: fragility to model biases (errors) and the cost of ensemble simulations. Here, we propose a DA method that uses pseudo ensembles generated by an observation-guided denoising diffusion probabilistic model (DDPM). Thanks to the variance in the generated ensembles, the proposed method performs better than a well-established ensemble DA method (EnKF) when the simulation model is biased. For questions or comments, please find us in the AUTHORS file.
This code relies on the following packages. As a deep learning framework, we use PyTorch.
- Install Python libraries numpy, PyTorch, xarray, and netcdf4
- Clone this repo
git clone https://github.com/yasahi-hpc/Generative-EnKF.git
Before running a simulation with Generative EnKF, we need to train a diffusion model guided by observations. To do so, first construct a dataset and then train the model on it. See deep learning model for details.
For simulation, we rely on the simulation codes and the pretrained diffusion model. We perform an observing system simulation experiment (OSSE) for the Lorenz96 system. To try data assimilation (DA), you can compare twin simulations with and without DA. For DA, you can use LETKF or Generative EnKF (EFDA). We use 32 ensemble members for LETKF. A nature run must be performed first, followed by the DA simulations. The OSSE of Lorenz96 can be performed in the following manner:
- Create observation data with a nature run
python run.py --filename dns.json
The results are stored in netCDF files under <base_dir/case_name>.
- DA simulation: you can run a simulation with LETKF or Generative EnKF (EFDA) by
python run.py --model_name [LETKF, EFDA] --filename [letkf.json, efda.json]
See simulation codes for details.
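To make the twin-experiment idea concrete, here is a minimal, self-contained NumPy sketch of an OSSE on Lorenz96. It is not the code behind `run.py`. The function names (`lorenz96_tendency`, `rk4_step`, `enkf_analysis`, `run_osse`), the forcing `F = 8`, the time step, and the observation layout (every other grid point, unit noise) are all assumptions chosen for illustration. The analysis step is a plain stochastic (perturbed-observation) EnKF without localization or inflation, which is simpler than LETKF and uses a model-run ensemble rather than DDPM-generated pseudo ensembles.

```python
import numpy as np


def lorenz96_tendency(x, forcing=8.0):
    """dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F, periodic in i."""
    return (
        (np.roll(x, -1, axis=-1) - np.roll(x, 2, axis=-1)) * np.roll(x, 1, axis=-1)
        - x
        + forcing
    )


def rk4_step(x, dt=0.05, forcing=8.0):
    """Advance the state (or a stack of states) by one RK4 step."""
    k1 = lorenz96_tendency(x, forcing)
    k2 = lorenz96_tendency(x + 0.5 * dt * k1, forcing)
    k3 = lorenz96_tendency(x + 0.5 * dt * k2, forcing)
    k4 = lorenz96_tendency(x + dt * k3, forcing)
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0


def enkf_analysis(ensemble, y_obs, obs_idx, obs_std, rng):
    """Perturbed-observation EnKF update.

    ensemble: (n_members, n_state), y_obs: (n_obs,), obs_idx: observed indices.
    """
    n_members = ensemble.shape[0]
    anomalies = ensemble - ensemble.mean(axis=0)
    hx = ensemble[:, obs_idx]                      # ensemble in observation space
    hx_anom = hx - hx.mean(axis=0)
    pf_ht = anomalies.T @ hx_anom / (n_members - 1)    # (n_state, n_obs)
    h_pf_ht = hx_anom.T @ hx_anom / (n_members - 1)    # (n_obs, n_obs)
    r = obs_std**2 * np.eye(len(obs_idx))
    # Kalman gain K = Pf H^T (H Pf H^T + R)^{-1}; the bracket is symmetric.
    gain = np.linalg.solve(h_pf_ht + r, pf_ht.T).T
    # Each member assimilates its own noisy copy of the observation.
    y_pert = y_obs + obs_std * rng.standard_normal(hx.shape)
    return ensemble + (y_pert - hx) @ gain.T


def run_osse(n_state=40, n_members=32, n_cycles=100, obs_std=1.0, seed=0):
    """Twin experiment: nature run, free run without DA, and EnKF run."""
    rng = np.random.default_rng(seed)
    obs_idx = np.arange(0, n_state, 2)             # observe every other point
    truth = 8.0 + 0.01 * rng.standard_normal(n_state)
    for _ in range(500):                           # spin-up of the nature run
        truth = rk4_step(truth)
    free = truth + rng.standard_normal(n_state)    # perturbed run, no DA
    ensemble = truth + rng.standard_normal((n_members, n_state))
    rmse_free, rmse_da = [], []
    for _ in range(n_cycles):
        truth = rk4_step(truth)
        free = rk4_step(free)
        ensemble = rk4_step(ensemble)
        y_obs = truth[obs_idx] + obs_std * rng.standard_normal(obs_idx.size)
        ensemble = enkf_analysis(ensemble, y_obs, obs_idx, obs_std, rng)
        rmse_free.append(np.sqrt(np.mean((free - truth) ** 2)))
        rmse_da.append(np.sqrt(np.mean((ensemble.mean(axis=0) - truth) ** 2)))
    return np.array(rmse_free), np.array(rmse_da)


if __name__ == "__main__":
    rmse_free, rmse_da = run_osse()
    print("mean RMSE without DA:", rmse_free.mean())
    print("mean RMSE with EnKF :", rmse_da.mean())
```

Comparing the two RMSE series mirrors the comparison of twin simulations with and without DA described above. The actual experiments are configured through the JSON input files instead.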
The following table summarizes the major scripts and their inputs.
| Scripts | Arguments | Explanation |
|---|---|---|
| run.py | -dirname --filename --model_name | Running a simulation |
| train.py | -dirname --filename --model_name --inference_mode | Training or inference |
| post.py | -dirname --filename --model_name | Postscript for simulations or trained models |
| convert.py | -dirname --filename --mode --start_idx --end_idx | Convert the simulation data into a dataset |
| setup.py | | Install the required packages |
| cleanup.py | symdir --verbose | Erase the result directory and symbolic link (use with care!) |
@INPROCEEDINGS{Asahi2023,
author={Asahi, Yuuichi and Hasegawa, Yuta and Onodera, Naoyuki and Shimokawabe, Takashi and Shiba, Hayato and Idomura, Yasuhiro},
booktitle={ICML 2023 Workshop SynS and ML},
title={Generating observation guided ensembles for data assimilation with denoising diffusion probabilistic model},
year={2023},
volume={},
number={},
pages={},
keywords={Deep learning; Graphics-processing-unit-based computing; Data Assimilation; Lorenz96},
}