Skip to content

Python-based package for detecting oscillatory signals in observational or experimental time series with the EMD technique and assessing their statistical significance vs. power-law distributed background noise.

License

Notifications You must be signed in to change notification settings

Warwick-Solar/scope

Repository files navigation

scope

Scope Logo

Run Tests Codecov Docs PyPI version

scope - Statistical Confidence of Oscillatory Processes with EMD (Empirical Mode Decomposition).

Project description

scope is a Python-based package for detecting oscillatory signals in observational or experimental time series with the EMD technique and assessing their statistical significance vs. power-law distributed background noise. Oscillatory processes in real data sets of various origins are often contaminated by a combination of white and coloured noise with a power-law spectral dependence, so that the EMD-revealed intrinsic mode functions need to be rigorously tested against the periodic components generated by noise. To do so, we compute the EMD energy spectrum containing the total energy and dominant period of each EMD-revealed intrinsic mode and the noise confidence limits for modal energy. This allows us to identify the significant mode(s) with the energy beyond the confidence limits, which is expected to be of a non-noise origin and associate it with a quasi-periodic oscillatory process of interest. The developed package does not assume the physical origin of the input data set, making it readily applicable for analysing oscillatory processes across various fields of science and industry.

The project consists of the following main parts:

  • Perform EMD analysis of the original time series and reveal the set of intrinsic modes using Empirical Mode Decomposition in Python (Quinn et al. 2021) package. See function emd_modes from emd module in scope.
  • Estimation of the power-law index and energy of superimposed background noise from the Fourier power spectrum as described in Vaughan (2005). See function fit_fourier from fourier module in scope.
  • Estimation of the dominant period of each EMD-revealed intrinsic mode from the global wavelet spectrum produced with Torrence & Compo Wavelet Analysis Software (Torrence & Compo 1998) package. See function emd_period_energy from emd module in scope.
  • Calculate EMD confidence limits using the method proposed by Kolotkov et al. (2016). See function emd_noise_conf from emd module in scope.
  • Visualise the results with plot_signal, plot_modes, plot_fft_spectrum, and plot_emd_spectrum functions from utils module.

Table of Contents

Prerequisites

Python ≥ 3.8

Installation

To install the package, ensure you have Python installed on your system. You can then install the package using pip:

pip install scope-emd

Example

Click to expand

The example described below is provided in emd_example.py.

The sample signal in this example consists of an oscillatory component, an exponentially decaying trend and a combination of white and coloured noise obeying the power law:

After setting the mean of the input signal to zero, we apply EMD to obtain the set of intrinsic mode functions (IMFs):

modes = emd_modes(x, sd_thresh=1e-4)
plot_modes(t, modes)

where the 'sd_thresh' parameter is the threshold at which the sift of each IMF stops. In our example, we obtained seven EMD modes, six of which are oscillatory IMFs and one is a non-oscillatory residual (usually, the number of EMD modes is about $$\log_2(N)$$ where $$N$$ is the number of data points in the input signal).

The empirical trend of the signal is estimated using the emd_trend function. This function identifies modes with periods exceeding a fraction of the total signal duration (denoted by the 'cutoff' parameter) and the residual, combines them into an empirical trend of the input signal, and returns a new set of modes in which all modes have periods shorter than the cutoff and the last mode represents the signal's trend. This cutoff is set to 0.4 of the total signal length by default, which means that a mode with less than 2.5 oscillation cycles is considered as part of the empirical trend.

modes = emd_trend(modes, t)
trend_emd = modes[:, -1]
plot_signal(t, trend_emd, 'Trend of the signal')

For our example, the empirical trend of the signal is found to form by the last EMD mode (the residual) only:

Hence, the detrended signal is:

Now we can estimate the parameters of superimposed noise by applying the fit_fourier function to the detrended signal. The function returns the FFT spectrum of the detrended signal best-fitted by a power-law model, with powers of white (if present) and coloured noise and the power-law index of coloured noise as model parameters. For our example, the FFT spectrum shows a combination of white and coloured noise components in the detrended signal, with the power-law index of coloured noise being 1.1±0.3 and the ratio of the white to coloured noise energies about 0.3. The fit_fourier function also estimates the confidence interval of a given value (e.g. 95%, false alarm probability = 0.05). The Fourier peaks outside this confidence interval are attributed to statistically significant oscillatory processes of non-noise origin.

fit_fft = fit_fourier(x, dt, fap=0.05)
plot_fft_spectrum(fit_fft)

The EMD energy spectrum, i.e. the relationship between the EMD modal energy vs. dominant oscillation period for the set of EMD modes identified in the original signal, is computed by the emd_energy_spectrum function:

emd_sp = emd_energy_spectrum(modes, t)
cutoff_period = 0.4 * len(x) * dt #show cutoff period
plot_emd_spectrum(emd_sp, cutoff_period)

The vertical dashed line corresponds to the cutoff period adopted in the emd_trend function; all modes beyond this line are considered as components of trend.

With the power-law index and noise energy returned by the fit_fourier function, we can compute the confidence limits of the EMD energy spectrum using the emd_noise_conf function (separately for coloured noise and, if present, white noise):

# false alarm probability
fap = 0.05
#Confidence limits for coloured noise
conf_c = emd_noise_conf(t, alpha=alpha, period_min=2*dt, 
                        period_max=N*dt, num_samples=500, 
                        signal_energy=fit_fft['color_energy'], fap=fap)
#Confidence limits for white noise
if fit_fft['white_energy'] > 0: # check if there is only colored noise model
    conf_w = emd_noise_conf(t, alpha=0, period_min=2*dt,
                            period_max=N*dt, num_samples = 500, 
                            signal_energy=fit_fft['white_energy'], fap=fap)

Here, the false alarm probability (fap) is set to 0.05 (95% confidence). The emd_noise_conf function generates 500 independent noise samples with the same power law index ('alpha') and energy ('signal_energy') as the input. The other two parameters, 'period_min' and 'period_max', set the range of periods over which the confidence limits are computed. Combining the upper and lower confidence limits for white and coloured noise compenents,

#Upper confidence limit for the combined noises
conf_up = conf_c['up'] + conf_w['up']

#Lower confidence limit for the combined noises
conf_down = conf_c['down'] + conf_w['down']

and visualising the EMD energy spectrum with confidence limits,

# plot emd spectrum
plot_emd_spectrum(emd_sp, cutoff_period, conf_period, conf_up, conf_down, conf_mean, fap)

we obtain

Here, 'conf_mean' stands for the expected mean value of noise energy (conf_mean = conf_c['mean_energy'] + conf_w['mean_energy']) and 'conf_period' (conf_period = conf_c['period']) is the array of oscillation periods over which the confidence limits are computed. The EMD modes beyond the confidence limits are considered significant, which are not likely to be caused by random noise. In our example, only one mode is found to be significant which seems consistent with the input oscillatory component of the original signal.

Functions

Click to expand

'emd_period_energy'

As mentioned in the example section, the total energy and (dominant) period of each EMD mode are required for constructing an EMD energy spectrum. The total modal energy is estimated by summing up squares of instantaneous amplitudes of each EMD mode. The dominant period of each EMD mode is estimated by best-fitting the global wavelet spectrum (Torrence & Compo 1998) of the mode with a Gaussian + Parabolic function, performed in the emd_period_energy function. The position and standard deviation of the Gaussian peak are used for the dominant EMD modal period and the uncertainty of this estimation.

'fit_fourier'

In the fit_fourier function, we fit the FFT spectrum by a power-law model in log-log scale to extract the power-law index and energy of the noise component of the signal. Firstly, we must note that, at each Fourier frequency, the Fourier power $$I(f_{j})$$ follows a chi-squared distribution with 2 degrees of freedom, denoted as:

$$I(f_{j}) = P(f_{j}) \chi_{2}^{2}/2$$

where $P(f_{j})$ is the true power spectrum, and $\chi_{2}^{2}$ is a random variable distributed as $\chi^{2}$ with 2 degrees of freedom. Since the least squares method assumes that the input data set is Gaussian-distributed, we cannot directly apply this method to best-fit the FFT power spectrum. Instead, we should consider the mean of the $$\chi_{2}^{2}/2$$ term. In log scale, $$\left\langle \mathrm{log}(\chi^{2}_{2}/2) \right\rangle$$ = -0.25068 (Vaughan (2005)). This term corresponds to the bias that will be introduced to the fitting if one directly implements the least squares method. Hence, we shall include this term in the model function such that the least squares fitting will not be 'biased'. We also note that the value of this bias term is independent of the choice of normalisation of the FFT power spectrum.

The power law model we used in the fit_fourier function is a superposition of white and coloured noise components, given by:

$$P(f) = P_{c}(f) + P_{w}(f) = Z_{c} f^{-\alpha} + Z_{w},$$

where $Z_{c}$ and $Z_{w}$ are the proportionality constants of coloured and white noises, respectively, and $\alpha$ is the power law index of coloured noise.

After obtaining the proportionality constants from the debiased least squares fit, we can estimate the energy of each noise type, $E_{c/w}$ using:

$$E_{c/w} \propto nf \times Z_{c/w},$$

where $nf$ is the number of Fourier frequencies, which does not include 0 Hz and the Nyquist frequency. And estimate the confidence limit for a given false alarm probability (fap) as $-\ln\left(1-(1-\mathrm{fap})^{1/nf}\right)\times P(f_{j})$.

See fft_fit_example.py for an example use of the fit_fourier function.

'emd_noise_conf'

Kolotkov et al. (2016) showed that the dyadic property of EMD (the center frequencies of consecutive IMFs tend to have a ratio close to 2) results in the following relationship between modal energy and modal period of a noise signal:

$$E_{m}P_{m}^{1-\alpha} = \text{const,}$$

where the parameter $\alpha$ is the power-law index used for characterising the colour of noise in the Fourier analysis.

See energy_period_relation.py demonstrating the relationship between $E_{m}$ and $P_{m}$ for various types of noise (values of $\alpha$). Here, we set the false alarm probability to 0.05.

It was shown that the energy of each EMD mode, $E_m$ has a chi-squared distribution with $k$ degrees of freedom (DoF). In contrast to the Fourier power, for which the number of DoF is 2 for all Fourier harmonics, the number of DoF $k$ of EMD modal energy $E_m$ is usually $>2$ and varies with the mode number (hence, with the period). Thus, for given noise parameters (i.e. the energies of the white and coloured components + the power-law index of the coloured component) determined with the fit_fourier function, the emd_noise_conf function first estimates the number of DoF $k$ over the entire range of EMD modal periods and then estimates the confidence limits using the percent-point function of the chi-squared distribution with $k$ DoF. The emd_noise_conf function generates 500 (by default) independent noise samples with the same power-law index and energy as the input and performs the EMD analysis on them. It extracts the dominant period and modal energy for each IMF by calling the emd_period_energy function. The emd_noise_fit function fits the chi-squared distribution to the histogram of modal energy $E_m$ for each mode number to extract the mean energy and $k$. We obtain the mean period, mean energy and number of DoF $k$ for each mode number. The empirically established relationships between both mean energy vs mean period and $k$ vs mean period are best-fitted with power-law functions (linear functions in log-log scale). These best-fit functions are then used to construct the confidence limits over the whole range of EMD modal periods. As, for $E_m$, the number of DoF $k>2$, resulting in a non-monotonic distribution of $E_m$, we get two confidence limits (upper and lower) in the EMD energy spectrum. In practice, the modes above the upper confidence limit are of greater interest as more energetic. This analysis does not apply to the very first IMF (the EMD mode with the shortest timescale) as its energy does not obey the chi-squared distribution.

Contributing

We welcome contributions from the community!

If you have suggestions, find a bug, or want to improve the package, feel free to open an issue or submit a pull request. Fork the repository, create a new branch, and propose your changes. We’ll review them as soon as possible.

Thank you for helping us improve!

Acknowledgements

The creation of this package was supported by the HEIF Space Science Impact Fund grant, EPSRC EP/Y037456/1 grant, UKRI Stephen Hawking Fellowship EP/Z535473/1, and LZP lzp2022/1-0017 grant. The authors are also grateful to Dr Sergey Anfinogentov for co-authoring Kolotkov et al. (2016) and designing the prototype of this package in Interactive Data Language (IDL); Dr Anne-Marie Broomhall and Dr Tishtrya Mehta for testing and critical suggestions.

License

This project is licensed under the Apache 2.0. License - see the LICENSE.md file for details

About

Python-based package for detecting oscillatory signals in observational or experimental time series with the EMD technique and assessing their statistical significance vs. power-law distributed background noise.

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages