What is the Cauchy combination test?

Set-up

Assume that we have p-values p_1, \dots, p_d computed from z-scores \mathbf{X} = (X_1, \dots, X_d) (test statistics following normal distributions). Let \mathbb{E}[\mathbf{X}] = \boldsymbol{\mu} and \text{Cov}(\mathbf{X}) = \mathbf{\Sigma}. Without loss of generality, assume that each test statistic X_i has variance 1. With this, we can express the p-values as

\begin{aligned} p_i = 2 \left[ 1 - \Phi (|X_i|) \right], \end{aligned}

where \Phi is the cumulative distribution function (CDF) of the standard normal distribution.
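As a quick sketch, the two-sided p-values above can be computed with only the Python standard library, using the identity 2[1 - \Phi(|x|)] = \text{erfc}(|x|/\sqrt{2}) (the function name and example z-scores below are ours, for illustration only):

```python
import math

def two_sided_p(z):
    """p_i = 2 * [1 - Phi(|z|)] for a standard normal test statistic,
    computed as erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2))

# hypothetical z-scores
zs = [0.0, 1.96, 3.0]
ps = [two_sided_p(z) for z in zs]
```

For example, a z-score of 0 gives a p-value of 1, and z = 1.96 gives a p-value of approximately 0.05, as expected.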

We are interested in testing the global null hypothesis H_0 :\boldsymbol{\mu} = \mathbf{0}.

The Cauchy combination test

Assume that we have weights w = (w_1, \dots, w_d) (possibly random) such that w_i \geq 0 for all i, w_1 + \dots + w_d = 1, and w is independent of \mathbf{X}. The test statistic for the Cauchy combination test, proposed by Liu & Xie 2020 (Reference 1), is

\begin{aligned} T(\mathbf{X}) = \sum_{i=1}^d w_i \tan \left[ \pi \left( 2 \Phi (|X_i|) - \frac{3}{2} \right) \right] = \sum_{i=1}^d w_i \tan \left[ \pi \left( \frac{1}{2} - p_i \right) \right]. \end{aligned}

Under the global null, p_i \sim \text{Unif}(0,1) for each i, implying that \tan \left[ \pi \left( \frac{1}{2} - p_i \right) \right] has the standard Cauchy distribution (see this previous post). If the p-values are independent, then Proposition 1 in this other previous post implies that T(\mathbf{X}) has the standard Cauchy distribution.
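A minimal sketch of computing the test statistic, assuming equal weights w_i = 1/d (the function name is ours, not from Reference 1):

```python
import math

def cauchy_combination_stat(pvals, weights=None):
    """T = sum_i w_i * tan(pi * (1/2 - p_i))."""
    d = len(pvals)
    if weights is None:
        weights = [1.0 / d] * d  # equal weights summing to 1
    return sum(w * math.tan(math.pi * (0.5 - p)) for w, p in zip(weights, pvals))

t = cauchy_combination_stat([0.01, 0.20, 0.77])
```

Note that a p-value of exactly 1/2 contributes zero to the sum, and p-values of p and 1 - p cancel, matching the symmetry of the standard Cauchy distribution around 0.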

It turns out that even if the p-values are dependent, T(\mathbf{X}) is still approximately Cauchy! This approximation is formalized in Theorem 1 of Reference 1:

Theorem. Suppose that for any 1 \leq i < j \leq d, (X_i, X_j) follows a bivariate normal distribution. Suppose also that \mathbb{E}[\mathbf{X}] = \mathbf{0}. Let W_0 be a standard Cauchy random variable. Then for any fixed d and any correlation matrix \mathbf{\Sigma} \geq \mathbf{0}, we have

\begin{aligned} \lim_{t \rightarrow +\infty} \dfrac{\mathbb{P}\{ T(\mathbf{X}) > t \}}{\mathbb{P} \{ W_0 > t \} } = 1. \end{aligned}

The theorem says that the test statistic T(\mathbf{X}) has approximately a Cauchy tail even under dependency structures in \mathbf{X}. Knowing the (approximate) distribution of T(\mathbf{X}) under the global null allows us to use it as a test statistic.
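Concretely, the theorem justifies converting an observed T into an approximate global p-value via the standard Cauchy survival function, \mathbb{P}\{W_0 > t\} = 1/2 - \arctan(t)/\pi. A sketch (function name ours):

```python
import math

def cauchy_combination_pvalue(t):
    """Approximate global p-value: P{W_0 > t} for standard Cauchy W_0."""
    return 0.5 - math.atan(t) / math.pi

# Sanity check with d = 1: a single p-value of 0.01 maps to
# t = tan(0.49 * pi) and the Cauchy tail maps it straight back.
t = math.tan(math.pi * (0.5 - 0.01))
p_global = cauchy_combination_pvalue(t)
```

This analytic tail formula is what makes the test fast: no permutation or simulation is needed to calibrate it.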

Other notes

  • The “bivariate normal distribution” condition is a technical assumption; the authors argue that it is mild.
  • This result bears a lot of similarity with a result by Pillai & Meng 2016 (see this previous post for a description). Section 2.2 of Reference 1 discusses the similarities and the differences.
  • Theorem 1 above holds for fixed d (number of p-values). Section 2.3 of Reference 1 has a high-dimensional asymptotic result where d = o(t^c) with 0 < c < 1/2.
  • The Cauchy combination test is especially powerful when only a small number of \mu_i‘s are large, or equivalently when a small number of p_i‘s are very small. We can see this intuitively: small p_i‘s become very large \tan [\pi(1/2 - p_i)]‘s, so the test statistic will be dominated by the terms corresponding to a few very small p-values. See Section 4.2 of Reference 1 for a power comparison study.
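The sparse-signal intuition in the last bullet can be illustrated numerically (all numbers below are hypothetical, chosen only to show one tiny p-value swamping the rest):

```python
import math

def cauchy_stat(pvals):
    """Equal-weight Cauchy combination statistic."""
    d = len(pvals)
    return sum(math.tan(math.pi * (0.5 - p)) for p in pvals) / d

t_null_like = cauchy_stat([0.4] * 100)       # 100 unremarkable p-values
t_sparse = cauchy_stat([0.4] * 99 + [1e-6])  # 99 unremarkable + 1 tiny
```

The single p-value of 10^{-6} contributes roughly \tan(\pi/2 - 10^{-6}\pi) \approx 1/(10^{-6}\pi) \approx 3.2 \times 10^5 to the sum, so t_sparse is orders of magnitude larger than t_null_like even though only 1 of the 100 inputs changed.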

References:

  1. Liu, Y., and Xie, J. (2020). Cauchy Combination Test: A Powerful Test With Analytic p-Value Calculation Under Arbitrary Dependency Structures. Journal of the American Statistical Association, 115(529), 393–402.