This page describes a dataset from the European Spallation Source (ESS). The data was generated by a refrigeration system, and system experts have constructed a ground-truth causal graph from their knowledge of the system. If you use the data, please cite:
S. W. Mogensen, K. Rathsman, P. Nilsson, Causal discovery in a complex industrial system: A time series benchmark, in Proceedings of the 3rd Conference on Causal Learning and Reasoning (CLeaR), 2024, https://doi.org/10.48550/arXiv.2310.18654
The data can be downloaded from https://zenodo.org/records/10641290 . The full dataset is quite large, and a smaller version is available for testing at https://zenodo.org/records/10679737 . The following describes the resources in this repo. This page describes the data and the related terminology in more detail; see also the paper.
The R folder contains R-code to load, visualize, and analyze the data.
The full dataset is irregularly sampled in the sense that not all process variables (PVs) are sampled at the same time points. The R-code in data.R aggregates the data to produce a dataset which is regularly sampled in the sense that there exist equidistant time points such that every PV has an observation at every time point. For each PV, this is done by averaging the observations that fall between the same two consecutive time points. If no observations fall in an interval, the value is initially missing, and in that case we carry the last observation forward. Data from the first hour of Period 1, preprocessed in this way, is available in preprocessedData.csv (output from createSmallDataset.R).
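The aggregation described above can be illustrated with a minimal Python sketch. It is not a port of data.R. The function name `aggregate_regular`, the half-open bin convention, and the choice to carry forward the previous bin's value are assumptions made for illustration, so the repo's R-code may handle boundaries and leading gaps differently.

```python
import math


def aggregate_regular(observations, t_start, t_end, step):
    """Average irregular (time, value) pairs of one PV into equal-width bins.

    Assumption: bin k covers [t_start + k*step, t_start + (k+1)*step).
    Empty bins take the value of the most recent non-empty bin; bins before
    the first observation stay None.
    """
    n_bins = math.ceil((t_end - t_start) / step)
    sums = [0.0] * n_bins
    counts = [0] * n_bins

    for t, value in observations:
        if t_start <= t < t_end:
            # Guard against floating-point rounding at the upper boundary.
            k = min(int((t - t_start) // step), n_bins - 1)
            sums[k] += value
            counts[k] += 1

    result = []
    last = None  # most recent bin average, carried forward into empty bins
    for k in range(n_bins):
        if counts[k] > 0:
            last = sums[k] / counts[k]
        result.append(last)
    return result


# Toy example: three observations, four unit-width bins.
toy_obs = [(0.2, 1.0), (0.7, 3.0), (2.5, 5.0)]
print(aggregate_regular(toy_obs, 0.0, 4.0, 1.0))  # [2.0, 2.0, 5.0, 5.0]
```

In the toy example, the second and fourth bins contain no observations and therefore repeat the preceding bin average.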
Note that subsampling and aggregation over time may create issues in the context of causal discovery; see also the paper.
The adjacency matrix (adjMatrix.csv) defines the ground-truth causal graph and is therefore the learning target when doing causal discovery using the dataset. The causal graph is defined at the level of subsystems, so each node in the graph represents a subsystem. There are 35 subsystems in total; however, some of them are not represented in the causal graph for various reasons (see the paper for details). The causal graph has a total of 23 nodes. In the adjacency matrix, rows are 'to' and columns are 'from'. This means that the entry in the i-th row and the j-th column describes the edge j -> i. In the adjacency matrix, 0 = no edge, 1 = weak edge, 2 = strong edge (by convention, every diagonal element is 2).
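As a quick check of the row/column convention, the following Python sketch converts a small adjacency matrix into an edge list. The three-node matrix and the node names A, B, C are made up for illustration and are not taken from adjMatrix.csv. The helper `edges_from_adjacency` is likewise hypothetical, and the sketch does not assume anything about the file's header or row-name layout.

```python
def edges_from_adjacency(adj, names):
    """List the directed edges encoded in an adjacency matrix.

    Convention from the text: rows are 'to', columns are 'from', so
    adj[i][j] describes the edge names[j] -> names[i].
    Entries: 0 = no edge, 1 = weak edge, 2 = strong edge.
    Diagonal entries (2 by convention) are skipped.
    """
    labels = {1: "weak", 2: "strong"}
    edges = []
    for i, row in enumerate(adj):          # i indexes the 'to' node
        for j, entry in enumerate(row):    # j indexes the 'from' node
            if i != j and entry != 0:
                edges.append((names[j], names[i], labels[entry]))
    return edges


# Hypothetical toy graph: A -> B (strong), B -> C (weak).
toy_names = ["A", "B", "C"]
toy_adj = [
    [2, 0, 0],  # edges into A
    [2, 2, 0],  # edges into B
    [0, 1, 2],  # edges into C
]
toy_edges = edges_from_adjacency(toy_adj, toy_names)
print(toy_edges)  # [('A', 'B', 'strong'), ('B', 'C', 'weak')]
```

Reading the matrix with rows and columns swapped would reverse every edge, so it is worth confirming the orientation on a known edge before evaluating a causal discovery method against the graph.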