0% found this document useful (0 votes)
71 views8 pages

Rank-Sum Tests for Clustered Data Analysis

This document presents a rank-sum test for clustered data. It develops a rank-sum statistic that averages over all possible choices of a single observation sampled from each cluster. This avoids issues with previous approaches that simply applied rank-sum tests to cluster averages. The key quantities for the test statistic - the expected value of the rank-sum statistic given the data, its unconditional expected value, and its variance - are derived. The asymptotic distribution of the test statistic is established. The test is shown to be valid for a variety of clustered data settings, including when cluster members belong to different groups or the correlation structure depends on group membership.

Uploaded by

Adnan Shoaib
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
71 views8 pages

Rank-Sum Tests for Clustered Data Analysis

This document presents a rank-sum test for clustered data. It develops a rank-sum statistic that averages over all possible choices of a single observation sampled from each cluster. This avoids issues with previous approaches that simply applied rank-sum tests to cluster averages. The key quantities for the test statistic - the expected value of the rank-sum statistic given the data, its unconditional expected value, and its variance - are derived. The asymptotic distribution of the test statistic is established. The test is shown to be valid for a variety of clustered data settings, including when cluster members belong to different groups or the correlation structure depends on group membership.

Uploaded by

Adnan Shoaib
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Rank-Sum Tests for Clustered Data

Somnath DATTA and Glen A. S ATTEN


The Wilcoxon rank-sum test is widely used to test the equality of two populations, because it makes fewer distributional assumptions than
parametric procedures such as the t-test. However, the Wilcoxon rank-sum test can be used only if data are independent. When data are
clustered, tests based on generalized estimating equations (GEEs) that generalize the t-test have been proposed. Here we develop a rank-sum
test that can be used when data are clustered. As an application, we use our rank-sum test to develop a nonparametric test of association
between a genetic marker and a quantitative trait locus. We also give a rank-sum test for equivalence of three or more populations that
generalizes the KruskalWallis test to situations with clustered data. Unlike previous rank tests for clustered data, our proposal is valid
when members of the same cluster belong to different groups, or when the correlation between cluster members differs across groups.
KEY WORDS: Association; Clustered data; KruskalWallis test; Quantitative trait; Rank test; Transmission disequilibrium test; Wilcoxon
test.

1. INTRODUCTION
The Wilcoxon rank-sum test is an attractive way to compare
two groups, and has become a standard procedure among working statisticians. The Wilcoxon test is calculated by pooling
observations from the two groups, ranking the pooled observations and then computing the sum of rankings corresponding
to observations from one of the groups. The usual assumption
for the applicability of Wilcoxon test is that all the observations
in the study are independent. However, in many practical situation, there are clusters of correlated observations. Examples of
clustered data include repeated measurement of blood pressure
from a single individual, responses of litter mates in an experiment using rodents, or body mass index of siblings. Clustered
data are typically analyzed using generalized estimating equations (GEEs) to account for correlation between observations
to obtain consistent variance estimates. Although model-free,
testing hypotheses about two groups using GEEs corresponds
to a variance-adjusted t-test and is not invariant to monotonic
transformations of the data as rank-based procedures are.
In this article we propose rank-sum tests for comparing two
groups when data are clustered. We first note that simply averaging the response within clusters and then applying a rank-sum
test for independent data may give a test with improper size, because the null hypothesis of equal distribution between groups
may be violated if the two groups have different distributions
of cluster sizes. Moreover, this simple approach is not available when members of the same cluster may belong to different
groups. Additionally, the correlation between cluster members
may depend on group membership.
Two broad approaches are possible when constructing a rank
test for clustered data. First, one can make assumptions about
the nature of the clustering, for example, assuming that cluster members are exchangeable and that the correlation structure
within clusters is independent of group. Under these assumptions, Rosner, Glynn, and Lee (2003) recently proposed a rank
test for clustered data that in essence stratifies on cluster size to
create a rank-sum statistic for clustered data for the case when
cluster members necessarily belong to the same group. Here
we take a different approach that makes no assumptions on the

nature of the clustering, extending an idea for parameter estimation of Hoffman, Sen, and Weinberg (2001) and Williamson,
Datta, and Satten (2003) to hypothesis testing. The resulting test
is valid in a wide variety of settings, including when members
of the same cluster belong to different groups or when the correlation structure depends on group membership.
The rest of the article is organized as follows. In Section 2
we present the notation and the theoretical development of our
tests. Section 2.3 contains expressions that generalize our ranksum test to more than two groups. Section 2.4 contrasts our
test to that of Rosner et al. (2003); and Section 2.5 reports results from a simulation study comparing our procedure with the
standard Wilcoxon statistic calculated using cluster-averaged
response and the Rosner et al. statistic. In Section 3 we show
how our approach can be applied in statistical genetics to develop rank-based quantitative trait transmission-disequilibrium
tests (qTDTs). Using simulated data, we compare our rankbased qTDT with a t-test proposed by Xiong, Krushkal, and
Boerwinkle (1998) and apply our test to data on circulating
angiotensin-1 converting enzyme (ACE) levels. The main text
ends with a discussion in Section 4. The proof of the asymptotic
null distribution of our test statistic is deferred to the Appendix.
2. NOTATION AND GENERAL THEORY
Let M denote the number of clusters and let Xik denote the
kth observation in the ith cluster, 1 k ni , 1 i M, where
ni denotes the number of observations for the ith cluster. Let
gik denote group membership
 (0 or 1) of the kth observation
in the ith cluster and let k gik = ni1 be the number of members of group 1 in ith cluster. Our data consist of (X, g) :=
{Xik , gik : 1 k ni ; 1 i M}. To avoid specifying a probability model for cluster sizes and the number of observations
from each group in every cluster, we condition on ni and ni1 ;
that is, we assume that these quantities are fixed. Assume that
the gik , for a given cluster i, are identically distributed. Further
assume that although observations from the same cluster may
have arbitrary dependence, observations from different clusters
are independent. The null hypothesis that we consider is that
observations from the two groups follow the same distribution,
that is,
P(Xik x|gik = 0, ni , ni1 ) = P(Xik x|gik = 1, ni , ni1 )

Somnath Datta is Professor, Department of Statistics, University of Georgia,


Athens, GA 30602 (E-mail: [email protected]). Glen A. Satten is Mathematical Statistician, Centers for Disease Control and Prevention, Atlanta, GA 30333
(E-mail: [email protected]). The first authors research was supported in part
by the Centers for Disease Control and Prevention. The authors thank Martin
Farrall for generously providing the genetic data analyzed in Section 3.2, and
also gratefully acknowledge assistance from Gonalo Abecasis.
908

= F(x),

(1)

In the Public Domain


Journal of the American Statistical Association
September 2005, Vol. 100, No. 471, Theory and Methods
DOI 10.1198/016214504000001583

Datta and Satten: Rank Test for Clustered Data

909

for some (unknown) distribution function, F, for any pair (i, k).
Our proposed statistic is best introduced in terms of the
following Monte Carlo test. Suppose that from each cluster
we sampled a single individual ki at random, and then denoted the response Xiki by Xi and the group membership
by gi . It is not hard to verify that (Xi , gi ) are independent
F bin(1, ni1 /ni ), for 1 i M. Using (Xi , gi ), we could
construct a Wilcoxon rank-rum statistic,
W =

1 
gi Ri ,
M+1
M

j=i

i
I[Xik x] is the empirical distribuwhere Fi (x) = (ni )1 nk=1
tion function of observations from the ith cluster. Therefore,
S = E(W |X, g)
=

where Ri is the rank of Xi among the set {Xj , 1 j M}.


Our proposed test statistic corresponds to averaging W over
all possible choices of the (Xi , gi ) values given the observed
data. Hence we propose inference on
S E(S)
,
Z=
v
ar(S)

(2)

where S = E(W |X, g). This averaging is motivated by recent proposals by Hoffman et al. (2001), Rieger, Kaplan, and
Weinberg (2001), and Williamson et al. (2003).
Note that even
when data are clustered, Z := {W E(W )}/ var(W ) can
be used as a valid test of the null hypothesis (1), because
(Xi , gi ) are independent. However, this test is unappealing for
several reasons, including that it is inefficient, using only one
observation per cluster, and it depends on the particular observations chosen from each cluster. The averaging approach leading to the test statistic of (2) avoids these difficulties, because it
is an explicitly calculable function of all of the data.
The following steps are necessary before (2) can be recommended. First, we must be able to calculate the quantities
S = E(W |X, g), E(S), and v
ar(S). Second, we must establish
the (asymptotic) distribution of Z. Finally, we must evaluate the
performance of tests based on (2).
2.1 Calculation of Required Quantities
We first calculate the quantities needed to compute (2). To
allow for ties in the data, we use the mid-rank (the unweighted
average of all possible rankings of an observation). Hence,



1 

Ri = 1 +
I(Xj Xi ) +
I(Xj < Xi ) .
(3)
2
The value of S =
follows. We write
E(Ri gi |X, g)
 
gi
=E
=

Xi ) +

j=i

can be obtained using (3) as

I(Xj

ni1
1
E{gi I(Xj < Xi )|X, g} +
2
ni

1
=
E{gi Fj (Xi )|X, g}
2
j=i


ni
M 

gik
i=1 k=1

ni


+ gi X, g

1+


1
{Fj (Xik ) + Fj (Xik )} . (4)
2
j=i

Next we note that E(S) = E(W ). The unconditional expected value of W can be obtained by first conditioning on
the vector of indicators g = (g1 , . . . , gM ), so that
E(S) = E(W ) = E{E(W |g )}

M
M
1
1  ni1
=E
gi =
.
2
2
ni
i=1

(5)

i=1

To calculate var(S), we use the Hajek projection of S. Let


Xi , gi denote the data from cluster i and let Si := E{S|Xi , gi }.
To facilitate calculation of Si , note that


E gik {Fj (Xik ) + Fj (Xik )}|Xi , gi
= gik [F(Xik ) + F(Xik )] for j = i,


E gjk {Fi (Xjk ) + Fi (Xjk )}|Xi , gi


ni
nj1
1
{F(Xik ) + F(Xik )}
for j = i,
=
2
nj
ni
k=1

and

nj1
E gjk {Fh (Xjk ) + Fh (Xjk )}|Xi , gi =
2nj
for i = j, i = h, j = h.
After some algebra, we find that
S i = c i + Wi ,
where



ni
M


nj1
1
Wi =
(M 1)gik
2ni (M + 1)
nj
j=i

k=1

{F(Xik ) + F(Xik )}

j=i

j=i

j=i

< Xi )

1
E{gi I(Xj Xi )|X, g}
2
+

1
(M + 1)

j=i

E(W |X, g)

I(Xj

ni
Fj (Xik ) + Fj (Xik ) ni1
1 
+
gik
,
ni
2
ni
j=i k=1

i=1

j=i

ni1
1
E{gi Fj (Xi )|X, g} +
2
ni

(6)

and ci does not depend on Xi , gi and hence will not contribute


to var(S) calculated using Si . Taking expectations, we get


M
1
ni1  nj1

E(Wi ) =
(M 1)
2(M + 1)
ni
nj


M
ni1
1
=

2(M + 1) ni
M

j=i

M

j=1


nj1
.
nj

910

Journal of the American Statistical Association, September 2005


5,603
5
1 2
24
) = 995,328
.00563. Using these results, we obtain
( 108

59
5
( 64 6 )/ .00563 1.18 as the final test statistic.

Finally, var(S) is estimated using


v
ar(S) =

M


i E(Wi )}2 ,
{W

(7)
2.2 Asymptotic Distribution of the Rank-Sum Test
for Clustered Data

i=1

i is as in (6) with F replaced by its pooled estimate,


where W

M

M



F=
ni Fi
ni .
i=1

Asymptotic normality under the null hypothesis of the standardized test (2) can be established under the following mild
regularity conditions.
M
2
Condition 1.
i=1 (ni /N) 0, as M , where N =
M
i=1 ni is the total sample size.

i=1

It is not hard to see that the numerator of (2) reduces to


the numerator of the mid-rankbased Wilcoxon rank-sum statistic (Hudgens and Satten 2002) when there is no clustering
(i.e., ni 1), which further reduces to the numerator of the
usual Wilcoxon test statistic when no ties are present. However,
the variance (7) does not reduce to the usual Wilcoxon variance M0 M1 /12(M + 1), where Mi is the observations in group i,
but it is easily shown that (7) is asymptotically equivalent to the
usual Wilcoxon variance in the absence of clustering (i.e., the
ratio of the two variance estimates converges to 1 in probability). This issue is also addressed in Section 4.
We next illustrate the calculation of our new statistic using
the dataset in Table 1, comprising nine observations in three
clusters. Note that group membership cuts across clusters. The
quantity {Fj (Xik ) + Fj (Xik )}/2 is the proportion of observations in cluster j that are less than Xik (where we count observations in cluster j that are equal to Xik as having half of their
mass less than Xik ).
Hence for observation 2 (belonging to cluster 1), we find that j=1 {Fj (X12 ) + Fj (X12 )}/2 = 38 + 16 = 13
24 ,
because 3/2 of 4 observations in cluster 2 are counted as
less than Xik = 4 and 1/2 of 3 observations in cluster 3 are
counted as less than 4. Hence, using (4), S = E(W |X, g) =
1 (1+13/24)
+ (1+4/3)+(1+3/2)
+ (1+9/8)+(1+2)
} = 59
4{
2
4
3
64 .92. It
is easily verified that this value can also be obtained by averaging the 2 3 4 = 24 Wilcoxon statistics W obtained by
selecting one observation at a time from each group, when midranks are used for the tied observations. Further, using (5),
E(S) = 12 ( 12 + 12 + 23 ) = 56 .83. Note that S > E(S), indicating a tendency for members of group 1 to have higher
i for each cluster.
Xik values. To calculate v
ar(S), we calculate W
1
2
2 1
7
14

} = 432
For cluster 1, W1 = 24 {( 4 + 3 ) 18 + [2 ( 24 + 23 )] 18
3 1
1 1
1
2
1
and E(W1 ) = 24 [ 2 3 ( 2 + 2 + 3 )] = 48 . Similarly,
3 = 5 , and
2 = 55 , E(W2 ) = 1 , W
we find that W
1,728
48
108
1
14
1 2
55
1 2
E(W3 ) = 24 . Finally, v
ar(S) = ( 432 + 48 ) + ( 1,728
+ 48
) +

Condition 2. With Wi given by (6),


lim inf M 1
M

M


var(Wi ) > 0.

i=1

Recall the definitions of test statistic S, E(S), and v


ar(S)
given by (4), (5), and (7).
Theorem 1. Under Conditions 1 and 2,
S E(S) d
N(0, 1),

v
ar(S)

as M .

Condition 1 is a sample size condition needed for the consistency of the pooled estimate 
F of the common marginal distribution function. It is satisfied if, for example, the ni are bounded.
Condition 2 is a technical condition needed to ensure that the
estimated asymptotic variance v
ar(S) given by (7) can be used
in the standardization of the test statistic. Because Wi = WiM is
an average based on potentially dependent variables, it is not
possible to give simpler sufficient conditions for Condition 2
in general. However, for the special case when the variables
within each cluster are independent or positive dependent, it
can be seen by direct calculation (see the App.) that for all large
enough M,
var(Wi ) .05

(i )2
,
ni

where
i =

ni1
,
ni

= M 1

Cluster
(i)

Member
(k)

Xik

gik

1
2

j =i {Fj (Xik ) + Fj (Xik )}

13
24

5
12
4
3
3
2
9
8
23
8

M

ni1
i=1

Table 1. Synthetic Data to Illustrate Calculation of Our Proposed Test Statistic


ID
no.

(8)

1 

2 {F (Xik ) + F (Xik )}
1
18
7
18
3
18
7
18
11
18
7
9
7
18
7
9
17
18

ni

Datta and Satten: Rank Test for Clustered Data

911

Therefore, in this case a sufficient condition for Condition 2


involving just the sample sizes is
lim inf M

M

(i )2

ni

i=1

> 0.

2.3 Extension to m Groups


Next we consider an extension of our procedure to the
case when individuals within each clusters may belong to one
of m possible groups. The null hypothesis to be tested in this
context is that the marginal distributions are the same in all of
the groups. As before, let gik denote the group status (= j if it
belongs to the jth group; 1 j m) of the kth individual in the
ith group.
Corresponding to each group j {1, . . . , m}, define the corresponding rank-sum statistic S( j) obtained by (4), but with gik
( j)
replaced by gik = I(gik = j). Similarly, E(S( j) ) under the
null hypothesis can be obtained from (5), but with ni1 replaced by nij , the number of group j individuals in cluster i.
We propose a statistic that compares S( j) with its expected
( j)
value E(S( j) ) under the null hypothesis. Furthermore, let Wi
be as in (6), but with the gs replaced by g( j) and ni1 replaced
i( j) likewise. Calculate the spectral decomby nij , and define W

T


position of 
V := M 1 M
i=1 {Wi E(Wi )}{Wi E(Wi )} =
m (M) (M) (M)T




, where Wi E(Wi ) is the m-vector with
j=1 ( j) Pj Pj
(
j)
(M)
(M)
 E(W ( j) ), 
components W

are the (ori

(1)

(m)

(M)
dered) eigenvalues of 
V, and 
Pj are the corresponding or
(M) 1(M)(M)T .
thonormal eigenvectors. Let 
V = m1
j=1 {( j) } Pj Pj
Then one would reject the null hypothesis of equality of marginal distributions across groups for large values of the test statistic

T = M 1 {S E(S)}T 
V {S E(S)},

(9)

where S E(S) is the m-vector with components S( j) E(S( j) ).


This test can be considered a form of the KruskalWallis test for
clustered data. Under appropriate regularity conditions, T will
have an asymptotic chi-squared distribution with m 1 degrees
of freedom under the null hypothesis.
(M)

Condition 3. Let (m1) denote the second smallest eigen


T
value of the matrix M 1 M
i=1 E{Wi E(Wi )}{Wi E(Wi )} .
Assume that
lim inf (M)
(m1) > 0.
M

This condition ensures that the variancecovariance matrix


of S E(S) has rank m 1. We now state the following asymptotic distribution theorem, proof of which is deferred to the
Appendix.
Theorem 2. Under Conditions 1 and 3,
d

2
T m1
,

where T is given by (9).

as M ,

2.4 Comparison to the Rosner, Glynn, and Lee Test


and Other Rank-Sum Tests
Recently, Rosner et al. (hereafter RGL) (2003) proposed a
rank test for clustered data for the case where all observations
in the same cluster belong to the same group. In addition, the
RGL test assumes that cluster members are exchangeable and
that the correlation structure within clusters is independent of
group. The RGL test stratifies on cluster size; after ranking all
observations (ignoring clustering), it compares the rank sum of
observations from group 1 having cluster size k to the fraction
of group 1 clusters of size k times the total rank sum of observations from clusters of size k. The final statistic sums the
comparisons over all cluster sizes k and divides by an appropriate standard deviation.
The nature of the RGL statistic gives an indication of situations where good or poor performance can be expected.
Because RGL compares groups at each cluster size, imbalance
in the distribution of groups across cluster size strata will result
in inefficiency. At the extreme, all data from cluster sizes for
which only one group is represented are ignored by the RGL
test. Additionally, by scoring clusters by the sum of ranks of
cluster members, we can anticipate that RGL will perform best
when correlation is weak; as correlation increases, the effective
number of independent observations per cluster drops, so that
RGL overweights larger clusters. By the same reasoning, our
new approach may be less efficient than RGL when correlation is weak, because large clusters are underweighted. We explore these predictions in the context of a simulation study in
the next section.
2.4.1 Simulation Study. Here we report a small simulation
study to assess the properties of our proposed test statistic, to
compare our test with both the RGL tests and the standard
Wilcoxon tests that ignores any clustering, and to illustrate a
difficulty in using the standard Wilcoxon rank-sum test that
treats the cluster as the experimental unit and uses the withincluster average as the cluster response. We consider the situation where all members of a cluster belong to the same group,
which is a requirement when computing the RGL test and
when using the within-cluster average response. In Section 3
we present results for our proposed test in a more complex simulation that has both within-cluster correlation and members of
the same cluster belonging to different groups.
Let M0 and M1 denote the number of clusters whose members are in group 0 and group 1. As gik = gik we drop the
second index on g for this section only. We generated data
Xik = exp(Yik ) + gi , where Yi = (Yik , . . . , Yini ) were independent multivariate normal variates with mean 0 and exchangeable variancecovariance matrix (1 gi )I + gi 1, where I is
the ni ni identity matrix and 1 is the ni ni matrix with all
entries equal to 1. Note that when 0 = 1 = 0, there is no clustering; that is, data in each group are iid. We chose cluster sizes
to allow for the possibility of informative cluster sizes. Clusters with gi = 0 had ni = 2 with probability pc and cluster size
ni = 5 with probability 1 pc , whereas clusters with gi = 1 had
ni = 5 with probability pc and cluster size ni = 2 with probability 1 pc . Note that for pc = .5, the distribution of cluster sizes
is different between the two groups (i.e., cluster size is informative). Simulation results for various values of M0 , M1 , pc , 0 ,
1 , and = 0 (null) or = .5 (alternative) are given in Table 2.

912

Journal of the American Statistical Association, September 2005

Table 2. Simulation Results for Size ( = 0) and Power ( =.5), for Cluster Average (CA), RGL, New Test (DS),
and Standard Wilcoxon (W) Ignoring Cluster Effect

Distribution type

M0

M1

pc

CA

Continuous

10
10
25

10
10
25

0
0
0

0
0
0

.087
.064
.079

10
10
25
25
25
25

10
10
25
25
25
25

0
.2
.2
.5
.5
.2

0
0
0
0
.9
.9

0
0
0
0
.9
.1

.087
.061
.081
.049
.050
.320

Discrete

Continuous

.2
.2

Size
RGL
DS

CA

Power
RGL
DS

.044
.049

.052
.053
.051

.050
.049
.050

.40
.35
.70

.39
.78

.45
.49
.87

.55
.60
.93

.044
.049
.049
.049
.170

.053
.052
.051
.051
.052
.051

.049
.049
.050
.050
.320
.170

.35
.29
.63
.57
.49
.83

.26
.60
.94
.46
.56

.34
.36
.71
.91
.53
.65

.38
.42
.79
.95
.82
.83

NOTE: The nominal size of all tests is .05.


The size and power of RGL statistic based on only those simulated datasets for which the RGL test is defined.

To explore the effect of discrete data, for some simulations


we replaced Xik by the largest integer less than or equal to Xik .
The resulting distribution assigns mass to nonnegative integers
and has a 90th percentile between 2 and 3, resulting in heavily
tied data. For the RGL test when data are tied, we used the midrank in all expressions.
The simulation results in Table 2 illustrate several points.
The size of our proposed test was very close to nominal for all
simulation conditions. Our new test also performed well even
when group membership completely determined cluster size
( pc = 0), whereas the RGL test cannot be used for this case.
The size of the RGL test was also close to nominal for pc > 0,
except when clusters from different groups had different correlation structure (0 = 1 ). We also found that our new test had
appropriate size even for heavily tied data, as did the RGL test
using mid-ranks. Not surprisingly, power for all tests was lower
for discrete data than for continuous data.
For simulations with pc = .2, the power of our new test was
higher than that of the RGL test. However, when pc was increased to .5, the RGL test had higher power. This is because
the RGL test can perform poorly when the distribution of cluster sizes differs across groups. Note, however, that even when
pc = .5, if 0 and 1 are increased to .9, then the power of the
RGL test decreases to below that of our new statistic. When
cluster members are very correlated, the RGL statistic overweights large clusters, because it uses a sum of ranks of all cluster members even when the cluster effectively contributes only
one observation. Finally, when the correlation between cluster
members differs across groups (i.e., when 0 = 1 ), only our
new test maintained size.
For all simulations having 0 = 1 = 0, the standard Wilcoxon test ignoring the clustering is valid. Not surprisingly, the
standard Wilcoxon test outperformed both our new test and the
RGL test for these simulations. However, for simulations with
0 = 0 and 1 = 0, the size of the standard Wilcoxon was far
from the nominal .05 level. When cluster size was informative
( pc = .5), the size of the Wilcoxon test that used the average
cluster response was larger than nominal, even as the power
was lower than that of our new test. When pc = .5, so that the
two groups have the same distribution of cluster sizes, then the
Wilcoxon test using the average cluster response was properly
sized, but still had lower power than either our new test or the
RGL test.

3. TESTING WHEN CLUSTER MEMBERS CAN


BELONG TO DIFFERENT GROUPS
In this section we consider the situation where group and
cluster memberships do not coincide. As a concrete example,
we can consider testing the difference in blood pressure between boys and girls when some study participants are siblings.
In this case clusters correspond to sets of siblings, whereas the
groups correspond to boys or girls. The restriction that all members of the same cluster belong to the same group would correspond to the requirement that only families composed of only
boys or only girls be included in the study. There is no currently available rank-based approach to test group differences
when group and cluster memberships do not coincide.
3.1 Testing for Linkage and Association Between a
Marker Locus and a Quantitative Trait Locus
Genetic epidemiologists use family-based association tests to
determine whether alleles at a marker locus (a locus where genetic variation can be measured but where genetic variability
may not be directly related to disease mechanism) are associated (or correlated) with alleles at a locus that does directly
affect a trait of interest. Association between alleles at marker
and trait loci can arise either because of confounding (called
population stratification in statistical genetics) or because the
marker and trait loci are located in close proximity to each other
on the same chromosome. This latter case is called linkage,
and a finding of both association and linkage between alleles
at marker and trait loci is an important step in mapping trait
loci. Transmission-disequilibrium tests (TDTs) use data from
nuclear families and take nonnull values only when both association and linkage exist between marker and trait loci. TDTs
for quantitative traits are referred to as qTDTs.
Xiong et al. (1998) (XKB hereafter) proposed a qTDT
based on a t-test. Consider a marker locus with two alleles,
a and A. We consider data from nuclear families in which at
least one parent is heterozygous (i.e., has genotype aA), and
compare the trait values X between those children who received the a allele from their heterozygous parent and those
children who received the A allele from their heterozygous
parent. If the ith family has only one heterozygous parent,
then ni is the number of offspring, Xik is the trait value of
the kth offspring, and gik = 1 (0) if the heterozygous parent transmitted the a (A) allele to the kth offspring. If the

Datta and Satten: Rank Test for Clustered Data

913

ith family has two heterozygous parents, then ni is twice the


number of offspring, Xi = (Xi1 , Xi1 , Xi2 , Xi2 , . . . , Xioi , Xioi ) and
gi = (Fi1 , Mi1 , Fi2 , Mi2 , . . . , Fioi , Mioi ), where Fik = 1 (0) if the
father transmitted the a (A) allele to the kth offspring and Mik is
defined similarly for mothers. Note that for the heterozygous
offspring of two heterozygous parents, we do not actually know
whether it was the mother or father who transmitted the a allele,
but it is only important that one value of g is 1 and the other is 0.
Given data Xi and gi for M families, XKB (1998) proposed
the t-test,
T=
where mr =

M ni

(1/m1 + 1/m0 )S2

k=1 I[gik

i=1

Xr =

(X 1 X 0 )

= r],

M ni
1 
I[gik = r]Xik
mr

for r = 0, 1,

i=1 k=1

and
S2 =

1
(m1 + m0 2)

ni
M 



gik (Xik X 1 )2 + (1 gik )(Xik X 0 )2 ,
i=1 k=1

to compare whether the trait values of offspring that received


the a or A alleles differ. XKB showed that such a difference is
evidence of association and linkage disequilibrium. We propose
replacing the t-test by the expected value of the Wilcoxon ranksum test averaged over all possible samples of one transmission
event per family.
3.2 A Simulation Study for the Quantitative Trait
Transmission-Disequilibrium Tests
To compare the performance of our test with the XKB
test using simulated data, we simulated data that mimic the
inheritance of genetic material and heritable traits. We generated parental genotypes using a binomial distribution with
parameter p, the proportion of a alleles. We assumed that each
parental allele was equally likely to be transmitted to each offspring (unselected sampling). We generated parental trait values
XiF and XiM independently from a distribution F and generated
offspring trait values using
Xik = cik + (XiF + XiM ) + ik ,

k = 1, . . . , oi ,

where cik and Xik are the number of a alleles and the trait value
for the kth offspring, ik are iid random variables with distribution F, and and are constants. A non-0 value of corresponds to an (additive) effect of genotype at the marker locus
on trait values, whereas a non-0 value of corresponds to familial correlation (possibly due to genetic effects at loci that are
not linked to the marker locus).
Each simulated dataset contained data on 50 families. The
number of offspring per family followed a uniform distribution
on the integers 1, 2, 3, and 4. For each simulation, we generated
100,000 datasets. To determine the effect of the underlying distribution F, we simulated phenotypes (of both parent and offspring) using either a N(0, 1) or a lognormal distribution with
log-mean 0 and log-variance 1. Results are shown in Table 3,

Table 3. Probability of Rejecting the Null Hypothesis by the qTDTs


in a Simulation Experiment
Probability of rejection

F
(additive effect) (family effect) (error distribution) XKB
Rank

0
0
0
.5
.5
.9

0
0
.5
0
0
.5

N(0, 1)
LN(0, 1)
LN(0, 1)
N(0, 1)
LN(0, 1)
LN(0, 1)

.052
.047
.050
.850
.370
.610

.048
.045
.045
.670
.740
.750

where we tabulate the size and power for tests with a nominal
size of .05.
When = 0 (no effect of the locus on phenotype), both
the XKB and our new test had appropriate size, irrespective
of the distribution of phenotypes or the presence of a parental
(or random) effect. When phenotypes were normally distributed, the XKB test had higher power to detect departures from
= 0 than the rank test. However, when phenotypes were lognormally distributed, the rank test had higher power than the
XKB test. Interestingly, a parental (or random) effect decreased
the difference in power between the rank and XKB tests; comparing simulations having = .5 and = 0 with simulations
having = .9 and = .5, we see that the power of the rank
test is approximately the same, but the power of the XKB test
is increased for the simulations having > 0.
3.3 Application to Data on Circulating
Angiotensin-1Converting Enzyme Levels
Data on circulating ACE levels in 69 British families were
reported by Keavney et al. (1998). ACE levels were standardized separately for men and women; in addition, probands were
genotyped at 10 marker loci in the ACE gene. We abstracted
data on nuclear families by selecting from each pedigree only
those members having no offspring and both parents in the pedigree. Of these nuclear families, we used only those in which
both parents were genotyped. Offspring whose circulating ACE
levels were not measured were excluded from the analysis.
We tested marker locus I/D for linkage to and association with
a quantitative trait locus that affects circulating ACE levels.
We selected this marker locus because it had the greatest evidence of association in an analysis of these data by Abecasis,
Cookson, and Cardon (2000). Of the original 69 pedigrees, we
used only 37 nuclear families in the analysis. The joint distribution of the number of offspring and the number of heterozygous
parents is given in Table 4. From this table, we see that values
of ni for these data ranged from 1 to 10 (corresponding to twice
the number of offspring in a family with two heterozygous parents). Using (2), we obtained Z = 3.81 ( p = 1.4 104 ), giving strong evidence of linkage and association. The sign of the
Table 4. Family Size and Parental Heterozygosity in the ACE Data
of Keavney et al. (1998)
Number of
heterozygous
parents

1
2
Total

6
1
7

9
5
14

Number of offspring
3
4
5
8
5
13

0
1
1

0
1
1

Total

1
0
1

24
13
37

914

Journal of the American Statistical Association, September 2005

statistic indicates that the allele coded 1 corresponds to lower


levels of circulating ACE.
4. DISCUSSION
As noted previously, our calculation of the variance of our
test statistic does not reduce to the usual variance of the
Wilcoxon test statistic in the absence of clustering. An alternative approach to developing a rank-sum test that would reduce
to the usual Wilcoxon test statistic in the absence of clustering
would be to calculate var(S) using the subtraction estimator
var(S) = var(W ) E{var(W |X, g)}.

(10)

Because, given the g , W is the standard Wilcoxon based on


iid data X , the classical variance formula of rank-sum test can
be used to compute the first term on the right side of (10),
var(W )

M

2
1
M

E
=
gi g
gi
+ var
12(M + 1)
2
i=1

i=1

M

M
ni1
12(M + 1)
ni
i=1

2 
M



M
ni1
ni1
1  ni1
1
+

M
ni
ni
ni
i=1

1
4

M

i=1

i=1

ni1
ni1
1
,
ni
ni

assuming no ties. It is easy to see that if there were no clustering


(i.e., ni 1), then this would reduce to the usual variance of
Wilcoxon for iid data, and would also equal var(S), because in
this case the second term on the right side of (10) would be 0.
In general, the term E{var(W |X, g)} needs to be estimated via
projection techniques as before. For example, an estimator of
this (without correcting for ties) is given by


2 
ni
ni
M
1
1
1 
2
Vik
Vik
,
ni
ni
M2
i=1

k=1

k=1

where
Vik = gik +

nj

gik gjh
j=i h=1

nj

I(Xjh Xik ).

There is, however, a practical difficulty in using this approach. There is no guarantee that the resulting estimator
of var(S) obtained by the subtraction formula (10) will be positive. In fact, using simulated data, we have seen that in some
settings (10) is negative in a nonnegligible proportion of samples. This typically occurs when each term on the right side
of (10) is large compared to their difference. Further, even for
situations when (10) results in a positive variance estimator,
we found that the resulting standardized test had worse performance than the test using (7). For these reasons, we did not
pursue this approach.
The simulation results the we report in Section 2.4 show
that our test can be conservative when the number of clusters
is small. In this case it would be desirable to calculate exact

p values using a Monte Carlo scheme. When all members of


a cluster belong to the same group, this is easily accomplished.
A permutation test can be constructed by repeatedly randomly
permuting the group memberships of each cluster and then calculating our test statistic. The empirical quantiles of the resulting distribution can be used for hypothesis testing. However,
the situation is not so straightforward when group membership
can vary within clusters. In this case it appears necessary to
make assumptions about the nature of the within-group correlation before a permutation or bootstrap scheme can be suggested.
One possibility is to permute the group membership indicators
without regard to cluster, which is appropriate if within-group
correlations are independent of group membership. An alternative is to stochastically reassign group membership simultaneously to all members of the same group within each cluster.
For example, with two groups with independent probability 1/2
for each cluster, we would reassign all group 0 members to
group 1 and all group 1 members to group 0. This scheme is
appropriate when the correlation structure within a group is
determined by group membership. However, the sensitivity of
these permutation schemes to misspecification of the correlation structure is unknown.
The basic idea of calculating the expected value of the ranksum statistic conditional on sampling one observation from
each cluster may be applicable to other tests as well, including linear-rank and signed-rank statistics. We plan to pursue
a general theory in subsequent publications. However, preliminary calculation suggests that for linear-rank and signed-rank
tests this approach leads to somewhat restrictive conditions on
the score-generating function, and that many commonly used
scores (e.g., Savage, normal) must be modified to satisfy the
conditions required to establish asymptotic normality of the test
statistic.
APPENDIX: PROOFS OF ASYMPTOTIC
DISTRIBUTION THEOREMS
Consider the probability model conditional on nij , i 1, j = 0, 1.
 
Let U = (M + 1)S/ M
2 . Then U is very similar to a U-statistic
based on independent (but not identically distributed) random vectors Y1 , . . . , YM , with Yi = (Xi1 , gi1 , . . . , Xini , gini ), 1 i M, and
a kernel h = hij of order 2 that depends on M and the nij . Because
the random vectors involved are of different lengths and not identically distributed and the kernel function depends on the sample size,
standard theory for U-statistics does not immediately apply. However,
we show here that the asymptotic normality of S can be established
along similar lines as the standard U-statistic theory.
Proof of Theorem 1
Through careful examination of the reminder term in the first-order
Hoeffding decomposition, we will show that


M
1
(M + 1) 
,
(A.1)
(Wi E(Wi )) + op
U E(U) = M 
M
2
i=1
as M . Let
(M + 1) 
(Wi E(Wi )).
M 
M

rM = U E(U)

i=1

It follows from the a slight modification of the representation in


equation 5.3.2 of Serfling (1980) (which can be seen to hold for independent but not identically distributed random vectors) that rM is

Datta and Satten: Rank Test for Clustered Data

915

in turn a U-statistic based on a centered second-order kernel H = Hij


defined by

Proof of Theorem 2
Consider the spectral decomposition

H(yi , yj ) = h(yi , yj ) Eh(yi , Yj ) Eh( Yi , yj ) + Eh( Yi , Yj ).


Moreover, because |h( Yi , Yj )| is bounded (uniformly in i, j, and M),
so is |H( Yi , Yj )| by its relationship with h. Therefore, by direct calculation of the second moment of rM along the line of lemma 5.2.2B
of Serfling (1980), it follows that E(|rM |2 ) = O(M 2 ), implying that
rM = oP (M 1/2 ).
Note that the Wi s are independent and bounded random variables.
Therefore, by the Lindeberg central limit theorem for triangular arrays
(Billingsley 1986, thm. 27.2),
M
d
i=1 (Wi E(Wi ))

N(0, 1), as M ,
(A.2)
M
var(W
)
i
i=1

M


E{Wi E(Wi )}{Wi E(Wi )}T

i=1

m


(M) (M) (M)T


Pj
,

( j) Pj

j=1
(M)
(M)
(M)
where (1) (m1) (m) = 0 are the eigenvalues and
(M)
the Pj s are the corresponding orthonormal eigenvectors of V. Using

the CramerWold device and the Lindeberg central limit theorem,


we obtain
d

(Z1 , . . . , Zm1 )T Nm1 (0, Im1 ),

var(Wi ) ,

as M .

 (M) 1/2
(M)
{S E(S)}T Pj ,
Zj = ( j) M

(A.3)

i=1

Clearly, (A.3) follows from condition 2. Moreover, by the law of large


numbers for independent and uniformly integrable summands (see,
e.g., Fabian and Hannan 1985),
M 1

M


(A.6)

where

provided that

M


V := M 1

(Wi E(Wi ))2 M 1

i=1

M


var(Wi ) 0,
(A.4)

Next, note that for each x, 


F (x) is a weighted average of independent
summands Fi (x), each of which has expectation F(x), it converges
in probability to F(x) under condition 1. Moreover, the convergence
holds in sup norm as well, by the usual arguments as in the proof of
the GlivenkoCantelli theorem. Therefore, from (A.4) we get
ar(S) M 1
M 1 v

M


var(Wi ) 0,

as M ,

M





 P
E{Wi E(Wi )}{Wi E(Wi )}T  0,


where denotes any norm on m . Therefore, using Condition 3


and (A.6), we obtain
d
Zm1 )T Nm1 (0, Im1 ),
(
Z1 , . . . , 

(A.8)

(M)
(M)
where the 
Zj s are given by (A.7) with ( j) and Pj
replaced by

(M)
(M)


2
( j) and Pj . We complete the proof by noting that T = m1
j=1 Zj .

[Received September 2003. Revised August 2004.]

i=1

which further implies that


M
p
i=1 var(Wi )
1,
v
ar(S)

i=1

i=1

as M .

(A.7)

Next, using the law of large numbers and the uniform consistency of 
F,
we obtain, as before,

M

 1  
 i E(Wi )}T
{Wi E(Wi )}{W
M

M 1

i=1

1 j m 1.

REFERENCES
as M ,

(A.5)

if Condition 2 holds.
Finally, using (A.2) and (A.5) with the representation (A.1), we conclude the proof of the theorem.
Proof of (8)
Assume continuous distributions for simplicity. Then
 2
ni
F (x) dF(x) = 1/3. Note that Wi = n1
i
k=1 Wik , say, where
1
(i ),
2
2 1 ( 2 + 2 ) 1 ( )2 .
EWik
i
i
i
3
3

EWik

Therefore, when the Wik s are independent or positive dependent,




1
 (i )2 n1
var(Wi )
i
12
for any  > 0 and all large enough M. Let  = 1/12 1/20 to complete
the proof.

Abecasis, G. R., Cookson, W. O. C., and Cardon, L. R. (2000), Pedigree Tests


of Transmission Disequilibrium, European Journal of Human Genetics, 8,
545551.
Billingsley, P. (1986), Probability and Measure (2nd ed.), New York: Wiley.
Fabian, V., and Hannan, J. (1985), Introduction to Probability and Mathematical Statistics, New York: Wiley.
Hoffman, E. B., Sen, P. K., and Weinberg, C. R. (2001), Within-Cluster Resampling, Biometrika, 88, 11211134.
Hudgens, M. G., and Satten, G. A. (2002), Midrank Unification of Rank Tests
for Exact, Tied and Censored Data, Journal of Nonparametric Statistics, 14,
569581.
Keavney, B., McKenzie, C. A., Connell, J. M., Julier, C., Ratcliffe, P. J.,
Sobel, E., Lathrop, M., and Farrell, M. (1998), Measured Haplotype Analysis of the Angiotensin-I Converting Enzyme Gene, Human Molecular Genetics, 7, 17451763.
Rieger, R. H., Kaplan, N. L., and Weinberg, C. R. (2001), Efficient Use of
Siblings in Testing for Linkage and Association, Genetic Epidemiology, 20,
175191.
Rosner, B., Glynn, R. J., and Lee, M.-L. T. (2003), Incorporation of Clustering Effects for the Wilcoxon Rank-Sum Test: A Large-Sample Approach,
Biometrics, 59, 10891098.
Serfling, R. J. (1980), Approximation Theorems of Mathematical Statistics,
New York: Wiley.
Williamson, J. M., Datta, S., and Satten, G. A. (2003), Marginal Analyses of
Clustered Data When Cluster Size Is Informative, Biometrics, 59, 3642.
Xiong, M. M., Krushkal, J., and Boerwinkle, E. (1998), TDT Statistics for
Mapping Quantitative Trait Loci, Annals of Human Genetics, 62, 431452.

You might also like