What is the Cauchy combination test?

Set-up

Assume that we have p-values p_1, \dots, p_d computed from z-scores \mathbf{X} = (X_1, \dots, X_d) (test statistics following normal distributions). Let \mathbb{E}[\mathbf{X}] = \boldsymbol{\mu} and \text{Cov}(\mathbf{X}) = \mathbf{\Sigma}. Without loss of generality, assume that each test statistic X_i has variance 1. With this, we can express the p-values as

\begin{aligned} p_i = 2 \left[ 1 - \Phi (|X_i|) \right], \end{aligned}

where \Phi is the cumulative distribution function (CDF) of the standard normal distribution.
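As a quick sketch, the two-sided p-values above can be computed with only the Python standard library, using the identity 2[1 - \Phi(|x|)] = \text{erfc}(|x|/\sqrt{2}) (the function name and example z-scores below are ours, for illustration only):

```python
import math

def two_sided_p(z):
    """p_i = 2 * [1 - Phi(|z|)] for a standard normal test statistic,
    computed as erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2))

# hypothetical z-scores
zs = [0.0, 1.96, 3.0]
ps = [two_sided_p(z) for z in zs]
```

For example, a z-score of 0 gives a p-value of 1, and z = 1.96 gives a p-value of approximately 0.05, as expected.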

We are interested in testing the global null hypothesis H_0 :\boldsymbol{\mu} = \mathbf{0}.

The Cauchy combination test

Assume that we have weights w = (w_1, \dots, w_d) (possibly random) such that w_i \geq 0 for all i, w_1 + \dots + w_d = 1, and w is independent of \mathbf{X}. The test statistic for the Cauchy combination test, proposed by Liu & Xie 2020 (Reference 1), is

\begin{aligned} T(\mathbf{X}) = \sum_{i=1}^d w_i \tan \left[ \pi \left( 2 \Phi (|X_i|) - \frac{3}{2} \right) \right] = \sum_{i=1}^d w_i \tan \left[ \pi \left( \frac{1}{2} - p_i \right) \right]. \end{aligned}

Under the global null, p_i \sim \text{Unif}(0,1) for each i, implying that \tan \left[ \pi \left( \frac{1}{2} - p_i \right) \right] has the standard Cauchy distribution (see this previous post). If the p-values are independent, then Proposition 1 in this other previous post implies that T(\mathbf{X}) has the standard Cauchy distribution.
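A minimal sketch of computing the test statistic, assuming equal weights w_i = 1/d (the function name is ours, not from Reference 1):

```python
import math

def cauchy_combination_stat(pvals, weights=None):
    """T = sum_i w_i * tan(pi * (1/2 - p_i))."""
    d = len(pvals)
    if weights is None:
        weights = [1.0 / d] * d  # equal weights summing to 1
    return sum(w * math.tan(math.pi * (0.5 - p)) for w, p in zip(weights, pvals))

t = cauchy_combination_stat([0.01, 0.20, 0.77])
```

Note that a p-value of exactly 1/2 contributes zero to the sum, and p-values of p and 1 - p cancel, matching the symmetry of the standard Cauchy distribution around 0.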

It turns out that even if the p-values are dependent, T(\mathbf{X}) is still approximately Cauchy! This approximation is formalized in Theorem 1 of Reference 1:

Theorem. Suppose that for any 1 \leq i < j \leq d, (X_i, X_j) follows a bivariate normal distribution. Suppose also that \mathbb{E}[\mathbf{X}] = \mathbf{0}. Let W_0 be a standard Cauchy random variable. Then for any fixed d and any correlation matrix \mathbf{\Sigma} \geq \mathbf{0}, we have

\begin{aligned} \lim_{t \rightarrow +\infty} \dfrac{\mathbb{P}\{ T(\mathbf{X}) > t \}}{\mathbb{P} \{ W_0 > t \} } = 1. \end{aligned}

The theorem says that the test statistic T(\mathbf{X}) has approximately a Cauchy tail even under dependency structures in \mathbf{X}. Knowing the (approximate) distribution of T(\mathbf{X}) under the global null allows us to use it as a test statistic.
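Concretely, the theorem justifies converting an observed T into an approximate global p-value via the standard Cauchy survival function, \mathbb{P}\{W_0 > t\} = 1/2 - \arctan(t)/\pi. A sketch (function name ours):

```python
import math

def cauchy_combination_pvalue(t):
    """Approximate global p-value: P{W_0 > t} for standard Cauchy W_0."""
    return 0.5 - math.atan(t) / math.pi

# Sanity check with d = 1: a single p-value of 0.01 maps to
# t = tan(0.49 * pi) and the Cauchy tail maps it straight back.
t = math.tan(math.pi * (0.5 - 0.01))
p_global = cauchy_combination_pvalue(t)
```

This analytic tail formula is what makes the test fast: no permutation or simulation is needed to calibrate it.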

Other notes

  • The “bivariate normal distribution” condition is a technical assumption; the authors argue that it is mild.
  • This result bears a lot of similarity with a result by Pillai & Meng 2016 (see this previous post for a description). Section 2.2 of Reference 1 discusses the similarities and the differences.
  • Theorem 1 above holds for fixed d (number of p-values). Section 2.3 of Reference 1 has a high-dimensional asymptotic result where d = o(t^c) with 0 < c < 1/2.
  • The Cauchy combination test is especially powerful when only a small number of \mu_i‘s are large, or equivalently when a small number of p_i‘s are very small. We can see this intuitively: small p_i‘s become very large \tan [\pi(1/2 - p_i)]‘s, so the test statistic will be dominated by the terms corresponding to a few very small p-values. See Section 4.2 of Reference 1 for a power comparison study.
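The sparse-signal intuition in the last bullet can be illustrated numerically (all numbers below are hypothetical, chosen only to show one tiny p-value swamping the rest):

```python
import math

def cauchy_stat(pvals):
    """Equal-weight Cauchy combination statistic."""
    d = len(pvals)
    return sum(math.tan(math.pi * (0.5 - p)) for p in pvals) / d

t_null_like = cauchy_stat([0.4] * 100)       # 100 unremarkable p-values
t_sparse = cauchy_stat([0.4] * 99 + [1e-6])  # 99 unremarkable + 1 tiny
```

The single p-value of 10^{-6} contributes roughly \tan(\pi/2 - 10^{-6}\pi) \approx 1/(10^{-6}\pi) \approx 3.2 \times 10^5 to the sum, so t_sparse is orders of magnitude larger than t_null_like even though only 1 of the 100 inputs changed.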

References:

  1. Liu, Y., and Xie, J. (2020). Cauchy Combination Test: A Powerful Test With Analytic p-Value Calculation Under Arbitrary Dependency Structures. Journal of the American Statistical Association, 115(529), 393–402.