
Causal discovery with ESS data

This page describes a dataset from the European Spallation Source (ESS). The data was generated by a refrigeration system, and system experts have constructed a ground-truth causal graph from their knowledge of the system. If you use the data, please cite:

S. W. Mogensen, K. Rathsman, P. Nilsson. Causal discovery in a complex industrial system: A time series benchmark. In Proceedings of the 3rd Conference on Causal Learning and Reasoning (CLeaR), 2024. https://doi.org/10.48550/arXiv.2310.18654

The data can be downloaded from https://zenodo.org/records/10641290 . The full dataset is large; a smaller version for testing is available at https://zenodo.org/records/10679737 . The following describes the resources in this repository. A separate page describes the data and the related terminology in more detail; see also the paper.

R code

The R folder contains R code to load, visualize, and analyze the data.

Regularly sampled data set

The full data set is irregularly sampled in the sense that not all PVs (process variables) are sampled at the same time points. The R code in data.R aggregates the data into a regularly sampled data set, meaning that there exist equidistant time points such that every PV has an observation at every time point. For each PV, this is done by averaging the observations that fall between two consecutive time points. If no observations fall in an interval, the value is initially missing, and in that case we carry the last observation forward. Preprocessed data of this kind from the first hour of Period 1 is available in preprocessedData.csv (output from createSmallDataset.R).
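To illustrate the aggregation step described above, the following Python sketch regularizes a single PV. It is not a translation of data.R: the function name `regularize_pv`, the use of half-open intervals [t_k, t_{k+1}), and the choice to return one value per interval are assumptions made for this example. Observations at or after the last grid point are ignored.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence


def regularize_pv(
    times: Sequence[float],
    values: Sequence[float],
    grid: Sequence[float],
) -> List[Optional[float]]:
    """Aggregate one irregularly sampled PV onto an equidistant grid.

    Hypothetical helper (not from data.R). Returns one value per interval
    [grid[k], grid[k+1]): the mean of the observations in that interval or,
    if the interval is empty, the last non-missing value (last observation
    carried forward). Leading empty intervals stay None because there is
    nothing to carry forward yet.
    """
    # Pair and sort observations by timestamp so intervals can be scanned in order.
    obs = sorted(zip(times, values))
    obs_times = [t for t, _ in obs]
    out: List[Optional[float]] = []
    last: Optional[float] = None
    for lo, hi in zip(grid[:-1], grid[1:]):
        # Indices of observations with lo <= t < hi (assumed half-open interval).
        start = bisect_left(obs_times, lo)
        stop = bisect_left(obs_times, hi)
        in_bin = [v for _, v in obs[start:stop]]
        if in_bin:
            last = sum(in_bin) / len(in_bin)  # average within the interval
        # If in_bin is empty, `last` is reused: last observation carried forward.
        out.append(last)
    return out
```

For example, `regularize_pv([1, 4, 12, 35], [2.0, 4.0, 5.0, 7.0], [0, 10, 20, 30, 40])` returns `[3.0, 5.0, 5.0, 7.0]`: the third interval contains no observations, so the value from the previous interval is carried forward.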

Note that subsampling and aggregation over time may cause problems for causal discovery; see also the paper.

Adjacency matrix

The adjacency matrix (adjMatrix.csv) defines the ground-truth causal graph and is therefore the learning target when doing causal discovery on the data set. The causal graph is defined at the level of subsystems, so each node in the graph represents a subsystem. There are 35 subsystems in total; however, some of them are not represented in the causal graph for various reasons (see the paper for details), and the causal graph has 23 nodes. In the adjacency matrix, rows are 'to' and columns are 'from'. This means that the entry in the i'th row and the j'th column describes the edge j -> i. The entries are coded as 0 = no edge, 1 = weak edge, and 2 = strong edge (by convention, every diagonal element is 2).
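Because the 'to'/'from' orientation is easy to get backwards, here is a small Python sketch that turns the matrix into a list of directed edges. It assumes the matrix has already been loaded as a list of integer rows, together with node names in matching order; the exact layout of adjMatrix.csv (for example, whether it carries header or row names) is not described here, and the function name `edges_from_adjacency` and the node names in the example are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Edge strengths as documented: 0 = no edge, 1 = weak edge, 2 = strong edge.
STRENGTH: Dict[int, str] = {1: "weak", 2: "strong"}


def edges_from_adjacency(
    adj: Sequence[Sequence[int]],
    names: Sequence[str],
) -> List[Tuple[str, str, str]]:
    """List directed edges as (source, target, strength) tuples.

    Hypothetical helper. Rows are 'to' and columns are 'from', so
    adj[i][j] describes the edge j -> i. Diagonal entries are 2 by
    convention and are skipped here.
    """
    n = len(names)
    if len(adj) != n or any(len(row) != n for row in adj):
        raise ValueError("adjacency matrix must be square and match names")
    edges: List[Tuple[str, str, str]] = []
    for i in range(n):          # i indexes the target ('to')
        for j in range(n):      # j indexes the source ('from')
            if i == j or adj[i][j] == 0:
                continue
            edges.append((names[j], names[i], STRENGTH[adj[i][j]]))
    return edges
```

With names `["A", "B", "C"]` and matrix `[[2, 0, 0], [1, 2, 0], [0, 2, 2]]`, the function returns `[("A", "B", "weak"), ("B", "C", "strong")]`: the entry in row 2, column 1 is 1, so A -> B is a weak edge.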
