
Robust Nonparametric Statistical Methods

Thomas P. Hettmansperger
Penn State University
and
Joseph W. McKean
Western Michigan University
Copyright © 1997, 2008, 2010 by Thomas P. Hettmansperger and Joseph W. McKean
All rights reserved.
Dedication: To Ann and to Marge
Contents

Preface

1 One Sample Problems
  1.1 Introduction
  1.2 Location Model
  1.3 Geometry and Inference in the Location Model
    1.3.1 Computation
  1.4 Examples
  1.5 Properties of Norm-Based Inference
    1.5.1 Basic Properties of the Power Function γ_S(θ)
    1.5.2 Asymptotic Linearity and Pitman Regularity
    1.5.3 Asymptotic Theory and Efficiency Results for θ̂
    1.5.4 Asymptotic Power and Efficiency Results for the Test Based on S(θ)
    1.5.5 Efficiency Results for Confidence Intervals Based on S(θ)
  1.6 Robustness Properties of Norm-Based Inference
    1.6.1 Robustness Properties of θ̂
    1.6.2 Breakdown Properties of Tests
  1.7 Inference and the Wilcoxon Signed-Rank Norm
    1.7.1 Null Distribution Theory of T(0)
    1.7.2 Statistical Properties
    1.7.3 Robustness Properties
  1.8 Inference Based on General Signed-Rank Norms
    1.8.1 Null Properties of the Test
    1.8.2 Efficiency and Robustness Properties
  1.9 Ranked Set Sampling
  1.10 Interpolated Confidence Intervals for the L1 Inference
  1.11 Two Sample Analysis
  1.12 Exercises

2 Two Sample Problems
  2.1 Introduction
  2.2 Geometric Motivation
    2.2.1 Least Squares (LS) Analysis
    2.2.2 Mann-Whitney-Wilcoxon (MWW) Analysis
    2.2.3 Computation
  2.3 Examples
  2.4 Inference Based on the Mann-Whitney-Wilcoxon
    2.4.1 Testing
    2.4.2 Confidence Intervals
    2.4.3 Statistical Properties of the Inference Based on the MWW
    2.4.4 Estimation of Δ
    2.4.5 Efficiency Results Based on Confidence Intervals
  2.5 General Rank Scores
    2.5.1 Statistical Methods
    2.5.2 Efficiency Results
    2.5.3 Connection between One and Two Sample Scores
  2.6 L1 Analyses
    2.6.1 Analysis Based on the L1 Pseudo Norm
    2.6.2 Analysis Based on the L1 Norm
  2.7 Robustness Properties
    2.7.1 Breakdown Properties
    2.7.2 Influence Functions
  2.8 Lehmann Alternatives and Proportional Hazards
    2.8.1 The Log Exponential and the Savage Statistic
    2.8.2 Efficiency Properties
  2.9 Two Sample Rank Set Sampling (RSS)
  2.10 Two Sample Scale Problem
    2.10.1 Optimal Rank-Based Tests
    2.10.2 Efficacy of the Traditional F-Test
  2.11 Behrens-Fisher Problem
    2.11.1 Behavior of the Usual MWW Test
    2.11.2 General Rank Tests
    2.11.3 Modified Mathisen's Test
    2.11.4 Modified MWW Test
    2.11.5 Efficiencies and Discussion
  2.12 Paired Designs
    2.12.1 Behavior under Alternatives
  2.13 Exercises

3 Linear Models
  3.1 Introduction
  3.2 Geometry of Estimation and Tests
    3.2.1 Estimation
    3.2.2 The Geometry of Testing
  3.3 Examples
  3.4 Assumptions for Asymptotic Theory
  3.5 Theory of Rank-Based Estimates
    3.5.1 R-Estimators of the Regression Coefficients
    3.5.2 R-Estimates of the Intercept
  3.6 Theory of Rank-Based Tests
    3.6.1 Null Theory of Rank-Based Tests
    3.6.2 Theory of Rank-Based Tests under Alternatives
    3.6.3 Further Remarks on the Dispersion Function
  3.7 Implementation of the R-Analysis
    3.7.1 Estimates of the Scale Parameter τ_φ
    3.7.2 Algorithms for Computing the R-Analysis
    3.7.3 An Algorithm for a Linear Search
  3.8 L1-Analysis
  3.9 Diagnostics
    3.9.1 Properties of R-Residuals and Model Misspecification
    3.9.2 Standardization of R-Residuals
    3.9.3 Measures of Influential Cases
  3.10 Survival Analysis
  3.11 Correlation Model
    3.11.1 Huber's Condition for the Correlation Model
    3.11.2 Traditional Measure of Association and its Estimate
    3.11.3 Robust Measure of Association and its Estimate
    3.11.4 Properties of R-Coefficients of Multiple Determination
    3.11.5 Coefficients of Determination for Regression
  3.12 High Breakdown (HBR) Estimates
    3.12.1 Geometry of the HBR-Estimates
    3.12.2 Weights
    3.12.3 Asymptotic Normality of β̂_HBR
    3.12.4 Robustness Properties of the HBR Estimates
    3.12.5 Discussion
    3.12.6 Implementation and Examples
    3.12.7 Studentized Residuals
    3.12.8 Examples
  3.13 Diagnostics for Differentiating between Fits
  3.14 Rank-Based Procedures for Nonlinear Models
    3.14.1 Implementation
  3.15 Exercises
  3.16 Exercises

4 Experimental Designs: Fixed Effects
  4.1 Introduction
  4.2 Oneway Design
    4.2.1 R-Fit of the Oneway Design
    4.2.2 Rank-Based Tests of H_0: μ_1 = ··· = μ_k
    4.2.3 Tests of General Contrasts
    4.2.4 More on Estimation of Contrasts and Location
    4.2.5 Pseudo-observations
  4.3 Multiple Comparison Procedures
    4.3.1 Discussion
  4.4 Twoway Crossed Factorial
  4.5 Analysis of Covariance
  4.6 Further Examples
  4.7 Rank Transform
    4.7.1 Monte Carlo Study
  4.8 Exercises

5 Models with Dependent Error Structure
  5.1 General Mixed Models
    5.1.1 Applications
  5.2 Simple Mixed Models
    5.2.1 Variance Component Estimators
    5.2.2 Studentized Residuals
    5.2.3 Example and Simulation Studies
    5.2.4 Simulation Studies of Validity
    5.2.5 Simulation Study of Other Score Functions
  5.3 Rank-Based Procedures Based on Arnold Transformations
    5.3.1 R Fit Based on Arnold Transformed Data
  5.4 General Estimating Equations (GEE)
    5.4.1 Asymptotic Theory
    5.4.2 Implementation and a Monte Carlo Study
    5.4.3 Example
  5.5 Time Series
  5.6 Exercises

6 Multivariate
  6.1 Multivariate Location Model
  6.2 Componentwise Methods
    6.2.1 Estimation
    6.2.2 Testing
    6.2.3 Componentwise Rank Methods
  6.3 Spatial Methods
    6.3.1 Spatial Sign Methods
    6.3.2 Spatial Rank Methods
  6.4 Affine Equivariant and Invariant Methods
    6.4.1 Blumen's Bivariate Sign Test
    6.4.2 Affine Invariant Sign Tests in the Multivariate Case
    6.4.3 The Oja Criterion Function
    6.4.4 Additional Remarks
  6.5 Robustness of Multivariate Estimates of Location
    6.5.1 Location and Scale Invariance: Componentwise Methods
    6.5.2 Rotation Invariance: Spatial Methods
    6.5.3 The Spatial Hodges-Lehmann Estimate
    6.5.4 Affine Equivariant Spatial Median
    6.5.5 Affine Equivariant Oja Median
  6.6 Linear Model
    6.6.1 Test for Regression Effect
    6.6.2 The Estimate of the Regression Effect
    6.6.3 Tests of General Hypotheses
  6.7 Experimental Designs
  6.8 Exercises

A Asymptotic Results
  A.1 Central Limit Theorems
  A.2 Simple Linear Rank Statistics
    A.2.1 Null Asymptotic Distribution Theory
    A.2.2 Local Asymptotic Distribution Theory
    A.2.3 Signed-Rank Statistics
  A.3 Results for Rank-Based Analysis of Linear Models
    A.3.1 Convex Functions
    A.3.2 Asymptotic Linearity and Quadraticity
    A.3.3 Asymptotic Distance Between β̂ and β̃
    A.3.4 Consistency of the Test Statistic F_φ
    A.3.5 Proof of Lemma 3.5.1
  A.4 Asymptotic Linearity for the L1 Analysis
  A.5 Influence Functions
    A.5.1 Influence Function for Estimates Based on Signed-Rank Statistics
    A.5.2 Influence Functions for Chapter 3
    A.5.3 Influence Function of β̂_HBR of Chapter 5
  A.6 Asymptotic Theory for Chapter 5

B Larger Data Sets
Preface

I don't believe I can really do without teaching. The reason is, I have to have something so that when I don't have any ideas and I'm not getting anywhere I can say to myself, "At least I'm living; at least I'm doing something; I'm making some contribution; it's just psychological."

Richard Feynman

We are currently revising these notes. Any corrections and/or comments are welcome.

This book is based on the premise that nonparametric or rank-based statistical methods are a superior choice in many data-analytic situations. We cover location models, regression models including designed experiments, and multivariate models. Geometry provides a unifying theme throughout much of the development. We emphasize the similarity in interpretation with least squares methods. Basically, we replace the Euclidean norm with a weighted L1 norm. This results in rank-based methods or L1 methods depending on the choice of weights. The rank-based methods proceed much like the traditional analysis. Using the norm, models are easily fitted. Diagnostic procedures can then be used to check the quality of fit (model criticism) and to locate outlying points and points of high influence. Upon satisfaction with the fit, rank-based inferential procedures can be used to conduct the statistical analysis. The benefits include significant gains in power and efficiency when the error distribution has tails heavier than those of a normal distribution and superior robustness properties in general.

The main text concentrates on Wilcoxon and L1 methods. The theoretical development for general scores (weights) is contained in the Appendix. By restricting attention to Wilcoxon rank methods, we can recommend a unified approach to data analysis beginning with the simple location models and extending through complex regression models and designed experiments. All major methodology is illustrated on real data. The examples are intended as guides for the application of the rank and L1 methods. Furthermore, all the data sets in this book can be obtained from the web site: http://www.stat.wmich.edu/home.html.

Selected topics from the first four chapters provide a basic graduate course in rank-based methods. The prerequisites are an introductory course in mathematical statistics and some background in applied statistics. The first seven sections of Chapter 1 and the first four sections of Chapter 2 are fundamental for the development of Wilcoxon signed-rank and Mann-Whitney-Wilcoxon rank sum methods in the one- and two-sample location models. In Chapter 3, on the linear model, sections one through seven and section nine present the basic material for estimation, testing, and diagnostic procedures for model criticism. Sections two through four of Chapter 4 give extensive development of methods for the one- and two-way layouts. Then, depending on individual tastes, there are several more exotic topics in each chapter to choose from.

Chapters 5 and 6 contain more advanced material. In Chapter 5 we extend rank-based methods for a linear model to bounded influence, high breakdown estimates and tests. In Chapter 6 we take up the concept of multidimensional rank. We then discuss various approaches to the development of rank-like procedures that satisfy various invariant/equivariant restrictions.

Computation of the procedures discussed in this book is very important. Minitab contains an undocumented RREG (rank regression) command. It contains various subcommands that allow for testing and estimation in the linear model. The reader can contact Minitab at (put email address or web page address here) and request a technical report that describes the RREG command. In many of the examples of this book the package rglm is used to obtain the rank-based analyses. The basic algorithms behind this package are described in Chapter 3. Information (including online rglm analyses of examples) can be obtained from the web site: http://www.stat.wmich.edu/home.html. Students can also be encouraged to write their own S-plus functions for specific methods.

We are indebted to many of our students and colleagues for valuable discussions, stimulation, and motivation. In particular, the first author would like to express his sincere thanks for many stimulating hours of discussion with Steve Arnold, Bruce Brown, and Hannu Oja, while the second author wants to express his sincere thanks for discussions with John Kapenga, Joshua Naranjo, Jerry Sievers, and Tom Vidmar. We both would like to express our debt to Simon Sheather, our friend, colleague, and co-author on many papers. Finally, we would like to thank Jun Recta for assistance in creating several of the plots.
Tom Hettmansperger
Joe McKean
July 2008
State College, PA
Kalamazoo, MI
Chapter 1
One Sample Problems
1.1 Introduction
Traditional statistical procedures are widely used because they offer the user a unified methodology with which to attack a multitude of problems, from simple location problems to highly complex experimental designs. These procedures are based on least squares fitting. Once the problem has been cast into a model, least squares offers the user:

1. a way of fitting the model by minimizing the Euclidean normed distance between the responses and the conjectured model;
2. diagnostic techniques that check the adequacy of the fit of the model, explore the quality of fit, and detect outlying and/or influential cases;
3. inferential procedures, including confidence procedures, tests of hypotheses, and multiple comparison procedures;
4. computational feasibility.

Procedures based on least squares, though, are easily impaired by outlying observations. Indeed one outlying observation is enough to spoil the least squares fit, its associated diagnostics and inference procedures. Even though traditional inference procedures are exact when the errors in the model follow a normal distribution, they can be quite inefficient when the distribution of the errors has longer tails than the normal distribution.

For simple location problems, nonparametric methods were proposed by Wilcoxon (1945). These methods consist of test statistics based on the ranks of the data and associated estimates and confidence intervals for location parameters. The test statistics are distribution free in the sense that their null distributions do not depend on the distribution of the errors. It was soon realized that these procedures are almost as efficient as the traditional methods when the errors follow a normal distribution and, furthermore, are often much more efficient relative to the traditional methods when the error distributions deviate from normality; see Hodges and Lehmann (1956). These procedures possess both robustness of validity and power. In recent years these nonparametric methods have been extended to linear and nonlinear models. In addition, from the perspective of modern robustness theory, contrary to least squares estimates, these rank-based procedures have bounded influence functions and positive breakdown points.

Often these nonparametric procedures are thought of as disjoint methods that differ from one problem to another. In this text, we intend to show that this is not the case. Instead, these procedures present a unified methodology analogous to the traditional methods. The four items cited above for the traditional analysis hold for these procedures too. Indeed the only operational difference is that the Euclidean norm is replaced by another norm.

There are computational procedures available for the rank-based procedures discussed in this book. We offer the reader a collection of computational functions written in the software language R at the site http://www.stat.wmich.edu/mckean/Rfuncs/. We refer to these computational algorithms as rank-based R algorithms or RBR. We discuss these functions throughout the text and use them in many of the examples, simulation studies, and exercises. The programming language R (see Ihaka and Gentleman, 1996) is freeware and can run on all (PC, Mac, Linux) platforms. To download the R software and accompanying information, visit the site http://www.r-project.org/. The language R has intrinsic functions for computation of some of the procedures discussed in this and the next chapter.
1.2 Location Model
In this chapter we will consider the one sample location problem. This will allow us to explore some useful concepts such as distribution freeness and robustness in a simple setting. We will extend many of these concepts to more complicated situations in later chapters. We need first to define a location parameter. For a random variable X we often subscript its distribution function by X to avoid confusion.

Definition 1.2.1. Let T(H) be a function defined on the set of distribution functions. We say T(H) is a location functional if

1. if G is stochastically larger than F (i.e., G(x) ≤ F(x) for all x), then T(G) ≥ T(F);
2. T(H_{aX+b}) = aT(H_X) + b, for a > 0;
3. T(H_{−X}) = −T(H_X).

Then we call θ = T(H) a location parameter of H.

Note that if X has location parameter θ it follows from the second item in the above definition that the random variable e = X − θ has location parameter 0. Suppose X_1, ..., X_n is a random sample having the common distribution function H(x) and θ = T(H) is a location parameter of interest. We express this by saying that X_i follows the statistical location model,

    X_i = θ + e_i,  i = 1, ..., n,                                        (1.2.1)

where e_1, ..., e_n are independent and identically distributed random variables with distribution function F(x), density function f(x), and location T(F) = 0. It follows that H(x) = F(x − θ) and that T(H) = θ. We next discuss three examples of location parameters that we will use throughout this chapter. Other location parameters are discussed in Section 1.8. See Bickel and Lehmann (1975) for additional discussion of location functionals.
Example 1.2.1. The Median Location Functional

First define the inverse of the cdf H(x) by H^{-1}(u) = inf{x : H(x) ≥ u}. Generally we will suppose that H(x) is strictly increasing on its support, and this will eliminate ambiguities on the selection of the parameter. Now define θ_1 = T_1(H) = H^{-1}(1/2). This is the median functional. Note that if G(x) ≤ F(x) for all x, then G^{-1}(u) ≥ F^{-1}(u) for all u; and, in particular, G^{-1}(1/2) ≥ F^{-1}(1/2). Hence, T_1(H) satisfies the first condition for a location functional. Next let H*(x) = P(aX + b ≤ x) = H[a^{-1}(x − b)]. Then it follows at once that H*^{-1}(u) = aH^{-1}(u) + b, and the second condition is satisfied. The third condition follows with an argument similar to the one for the second condition.
Example 1.2.2. The Mean Location Functional

For the mean functional let θ_2 = T_2(H) = ∫ x dH(x), when the mean exists. Note that ∫ x dH(x) = ∫ H^{-1}(u) du. Now if G(x) ≤ F(x) for all x, then x ≤ G^{-1}(F(x)). Let x = F^{-1}(u); then we have F^{-1}(u) ≤ G^{-1}(F(F^{-1}(u))) = G^{-1}(u). Hence, T_2(G) = ∫ G^{-1}(u) du ≥ ∫ F^{-1}(u) du = T_2(F), and the first condition is satisfied. The other two conditions follow easily from the definition of the integral.
Example 1.2.3. The Pseudo-Median Location Functional

Assume that X_1 and X_2 are independent and identically distributed (iid) with distribution function H(x). Let Y = (X_1 + X_2)/2. Then Y has distribution function H*(y) = P(Y ≤ y) = ∫ H(2y − x) h(x) dx. Let θ_3 = T_3(H) = H*^{-1}(1/2). To show that T_3 is a location functional, suppose G(x) ≤ F(x) for all x. Then

    G*(y) = ∫ G(2y − x) g(x) dx = ∫ [ ∫_{−∞}^{2y−x} g(t) dt ] g(x) dx ≤ ∫ [ ∫_{−∞}^{2y−x} f(t) dt ] g(x) dx
          = ∫ [ ∫_{−∞}^{2y−t} g(x) dx ] f(t) dt ≤ ∫ [ ∫_{−∞}^{2y−t} f(x) dx ] f(t) dt = F*(y);

hence, as in Example 1.2.1, it follows that G*^{-1}(u) ≥ F*^{-1}(u) and, hence, that T_3(G) ≥ T_3(F). For the second property, let W = aX + b where X has distribution function H and a > 0. Then W has distribution function F_W(t) = H((t − b)/a). Then by the change of variable z = (x − b)/a, we have

    F*_W(y) = ∫ H( (2y − x − b)/a ) (1/a) h( (x − b)/a ) dx = ∫ H( 2(y − b)/a − z ) h(z) dz.

Thus the defining equation for T_3(F_W) is

    1/2 = ∫ H( 2(T_3(F_W) − b)/a − z ) h(z) dz,

which is satisfied for T_3(F_W) = aT_3(H) + b. For the third property, let V = −X where X has distribution function H. Then V has distribution function F_V(t) = 1 − H(−t). Hence, by the change of variable z = −x,

    F*_V(y) = ∫ (1 − H(−2y + x)) h(−x) dx = 1 − ∫ H(−2y − z) h(z) dz.

Because the defining equation of T_3(F_V) can be written as

    1/2 = ∫ H( −2T_3(F_V) − z ) h(z) dz,

it follows that T_3(F_V) = −T_3(H). Therefore, T_3 is a location functional. It has been called the pseudo-median by Høyland (1965) and is more appropriate for symmetric distributions.
The next theorem characterizes all the location functionals for a symmetric distribution.

Theorem 1.2.1. Suppose that the pdf h(x) is symmetric about some point a. If T(H) is a location functional, then T(H) = a.

Proof. Let the random variable X have pdf h(x) symmetric about a. Let Y = X − a; then Y has pdf g(y) = h(y + a), symmetric about 0. Hence Y and −Y have the same distribution. By the third property of location functionals, this means that T(G_Y) = T(G_{−Y}) = −T(G_Y); i.e., T(G_Y) = 0. But by the second property, 0 = T(G_Y) = T(H) − a; that is, a = T(H).

This theorem means that when we sample from a symmetric distribution we can unambiguously define location as the center of symmetry. Then all location functionals that we may wish to study will specify the same location parameter.
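To make the three functionals concrete, here is a short R sketch of our own (not part of the text's RBR collection; all names are illustrative) computing their sample analogues: the sample median, the sample mean, and the median of the pairwise (Walsh) averages, which, as discussed in Section 1.3, is the sample analogue of the pseudo-median. On a sample from a symmetric distribution all three estimates should nearly agree, in line with Theorem 1.2.1; on skewed data they generally differ.

# Sample analogues of the location functionals of Examples 1.2.1-1.2.3
set.seed(123)
x <- rnorm(25, mean = 3)                       # sample symmetric about 3
t1 <- median(x)                                # median functional T_1
t2 <- mean(x)                                  # mean functional T_2
walsh <- outer(x, x, "+")[upper.tri(diag(25), diag = TRUE)] / 2  # (X_i + X_j)/2, i <= j
t3 <- median(walsh)                            # sample pseudo-median
c(median = t1, mean = t2, pseudo.median = t3)  # all three near 3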
1.3 Geometry and Inference in the Location Model
Letting X = (X_1, ..., X_n)' and e = (e_1, ..., e_n)', we then write the statistical location model, (1.2.1), as

    X = θ1 + e,                                                          (1.3.1)

where 1 denotes the vector all of whose components are 1 and T(F_e) = 0. If Ω_F denotes the one-dimensional subspace spanned by 1, then we can express the model more compactly as X = η + e, where η ∈ Ω_F. The subscript F on Ω stands for full model in the context of hypothesis testing as discussed below.

Let x be a realization of X. Note that except for random error, x would lie in Ω_F. Hence an intuitive fitting criterion is to estimate θ by a value θ̂ such that the vector θ̂1 ∈ Ω_F lies closest to x, where closest is defined in terms of a norm. Furthermore, a norm, as the following general discussion shows, provides a complete inference for the parameter θ.

Recall that a norm is a nonnegative function, ‖·‖, defined on R^n such that ‖y‖ ≥ 0 for all y; ‖y‖ = 0 if and only if y = 0; ‖ay‖ = |a|‖y‖ for all real a; and ‖y + z‖ ≤ ‖y‖ + ‖z‖. The distance between two vectors is d(z, y) = ‖z − y‖.

Given a location model, (1.3.1), and a specified norm, ‖·‖, the estimate of θ induced by the norm is

    θ̂ = argmin_θ ‖x − θ1‖,                                              (1.3.2)

i.e., the value which minimizes the distance between x and the space Ω_F. As discussed in Exercise 1.12.1, a minimizing value always exists. The dispersion function induced by the norm is given by

    D(θ) = ‖x − θ1‖.                                                     (1.3.3)

The minimum distance between the vector of observations x and the space Ω_F is D(θ̂). As Exercise 1.12.3 shows, D(θ) is a convex, continuous function of θ which is differentiable almost everywhere. Actually the norms discussed in this book are differentiable at all but at most a finite number of points. We define the gradient process by the function

    S(θ) = −(d/dθ) D(θ).                                                 (1.3.4)

As Exercise 1.12.3 shows, S(θ) is a nonincreasing function. Its discontinuities are the points where D(θ) is nondifferentiable. Furthermore the minimizing value is a value where S(θ) is 0 or, due to a discontinuity, steps through 0. We express this by saying that θ̂ solves the equation

    S(θ̂) ≐ 0.                                                           (1.3.5)
Suppose we can represent the above estimate by θ̂ = θ̂(x) = θ̂(H_n), where H_n denotes the empirical distribution function of the sample. The notation θ̂(H_n) is suggestive of the functional notation used in the last section. This is as it should be, since it is easy to show that θ̂ satisfies the sample analogues of properties (2) and (3) of Definition 1.2.1. For property (2), consider the estimating equation of the translated sample y = ax + 1b, for a > 0, given by

    θ̂(y) = argmin_θ ‖y − θ1‖ = argmin_θ a ‖x − ((θ − b)/a) 1‖.

From this we immediately have that θ̂(y) = aθ̂(x) + b. For property (3), the defining equation for the sample y = −x is

    θ̂(y) = argmin_θ ‖y − θ1‖ = argmin_θ ‖x − (−θ)1‖,

from which we have θ̂(y) = −θ̂(x). Furthermore, for the norms considered in this book it is easy to check that θ̂(H_n) ≥ θ̂(G_n) when H_n and G_n are empirical cdfs for which H_n is stochastically larger than G_n. Hence, the norms generate location functionals on the set of empirical cdfs. The L1 norm provides an easy example. We can think of θ̂(H_n) = H_n^{-1}(1/2) as the restriction of θ(H) = H^{-1}(1/2) to the class of discrete distributions which assign mass 1/n to n points. Generally we can think of θ̂(H_n) as the restriction of θ(H) or, conversely, we can think of θ(H) as the extension of θ̂(H_n). We let the norm determine the location. This is especially simple in the symmetric location model where all location functionals are equal to the point of symmetry.
Next consider the hypotheses

    H_0: θ = θ_0 versus H_A: θ ≠ θ_0,                                    (1.3.6)

for a specified θ_0. Because of the second property of location functionals in Definition 1.2.1, we can assume without loss of generality that θ_0 = 0; otherwise we need only subtract θ_0 from each X_i. Based on the data, the most acceptable value of θ is the value at which the gradient S(θ) is zero. Hence large values of |S(0)| favor H_A. Formally, the level α gradient test or score test for the hypotheses (1.3.6) is given by

    Reject H_0 in favor of H_A if |S(0)| ≥ c,                             (1.3.7)

where c is such that P_0[|S(0)| ≥ c] = α. Typically, the null distribution of S(0) is symmetric, so there is no loss in generality in considering symmetrical critical regions.

A second formulation of a test statistic is based on the difference in minimizing dispersions, or the reduction in dispersion. Call Model (1.2.1) the full model. As noted above, the distance between x and the subspace Ω_F is D(θ̂). The reduced model is the full model subject to H_0. In this case the reduced model space is {0}. Hence the distance between x and the reduced model space is D(0). Under H_0, x should be close to this space; therefore, the reduction in dispersion test is given by

    Reject H_0 in favor of H_A if RD = D(0) − D(θ̂) ≥ m,                  (1.3.8)

where m is determined by the null distribution of RD. This test will be used in Chapter 3 and subsequent chapters.

A third formulation is based on the standardized estimate:

    Reject H_0 in favor of H_A if |θ̂| / √(Var(θ̂)) ≥ γ,                  (1.3.9)

where γ is determined by the null distribution of θ̂. Tests based directly on the estimate are often referred to as Wald-type tests.

The following useful theorem allows us to shift between computing probabilities when θ = 0 and for general θ. Its proof is a straightforward application of a change of variables. See Theorem A.2.4 of the Appendix for a more general result.

Theorem 1.3.1. Suppose that we can write S(θ) = S(x_1 − θ, ..., x_n − θ). Then P_θ(S(0) ≤ t) = P_0(S(−θ) ≤ t).
We now turn to the problem of the construction of a (1 − α)100% confidence interval for θ based on S(θ). Such an interval is easily obtained by inverting the acceptance region of the level α test given by (1.3.7). The acceptance region is |S(0)| < c. Define

    θ̂_L = inf{t : S(t) < c} and θ̂_U = sup{t : S(t) > −c}.               (1.3.10)

Then because S(θ) is nonincreasing,

    {θ : |S(θ)| < c} = {θ : θ̂_L ≤ θ ≤ θ̂_U}.                             (1.3.11)

Thus from Theorem 1.3.1,

    P_θ(θ̂_L ≤ θ ≤ θ̂_U) = P_θ(|S(θ)| < c) = P_0(|S(0)| < c) = 1 − α.     (1.3.12)

Hence, inverting a size α test results in the (1 − α)100% confidence interval (θ̂_L, θ̂_U).

Thus a norm not only provides a fitting criterion but also a complete inference. As with all statistical analyses, checks on the appropriateness of the model and the quality of fit are needed. Useful plots here include: stem-and-leaf plots and q-q plots to check shape and distributional assumptions, boxplots and dotplots to check for outlying observations, and a plot of X_i versus i (or other appropriate variables) to check for dependence between observations. Some of these diagnostic checks are performed in the next section of numerical examples.

In the next three examples, we discuss the inference for the norms associated with the location functionals presented in the last section. We state the results of their associated inference, which we will derive in later sections.
Example 1.3.1. L1-Norm

Recall that the L1 norm is defined as ‖x‖_1 = Σ |x_i|; hence the associated dispersion and negative gradient functions are given respectively by D_1(θ) = Σ |X_i − θ| and S_1(θ) = Σ sgn(X_i − θ). Letting H_n denote the empirical cdf, we can write the estimating equation as

    0 = n^{-1} Σ sgn(x_i − θ) = ∫ sgn(x − θ) dH_n(x).

The solution, of course, is θ̂, the median of the observations. If we replace the empirical cdf H_n by the true underlying cdf H then the estimating equation becomes the defining equation for the parameter θ = T(H). In this case, we have

    0 = ∫ sgn(x − T(H)) dH(x) = −∫_{−∞}^{T(H)} dH(x) + ∫_{T(H)}^{∞} dH(x);

hence, H(T(H)) = 1/2 and solving for T(H) we find T(H) = H^{-1}(1/2), as expected.

As we show in Section 1.5,

    θ̂ has an asymptotic N(θ, τ_S²/n) distribution,                       (1.3.13)

where τ_S = 1/(2h(θ)). Estimation of the standard deviation of θ̂ is discussed in Section 1.5.

Turning next to testing the hypotheses (1.3.6), the gradient test statistic is S_1(0) = Σ sgn(X_i). But we can write S_1(0) = S_1^+ − S_1^− + S_1^0, where S_1^+ = Σ I(X_i > 0), S_1^− = Σ I(X_i < 0), and S_1^0 = Σ I(X_i = 0) = 0 with probability one, since we are sampling from a continuous distribution; I(·) is the indicator function. In practice, we must deal with ties, and this is usually done by setting aside those observations that are equal to the hypothesized value and carrying out the test with a reduced sample size. Now note that n = S_1^+ + S_1^−, so that we can write S_1(0) = 2S_1^+ − n, and the test can be based on S_1^+. The null distribution of S_1^+ is binomial with parameters n and 1/2. Hence the level α sign test of the hypotheses (1.3.6) is

    Reject H_0 in favor of H_A if S_1^+ ≤ c_1 or S_1^+ ≥ n − c_1,         (1.3.14)

where c_1 satisfies

    P[bin(n, 1/2) ≤ c_1] = α/2,                                          (1.3.15)

and bin(n, 1/2) denotes a binomial random variable based on n trials with probability of success 1/2. Note that the critical value of the test can be determined without specifying the shape of F. In this sense, the test based on S_1 is distribution free or nonparametric. Using the asymptotic null distribution of S_1^+, c_1 can be approximated as c_1 ≐ n/2 − n^{1/2} z_{α/2}/2 − .5, where Φ(−z_{α/2}) = α/2, Φ(·) is the standard normal cdf, and .5 is the continuity correction.

For the associated (1 − α)100% confidence interval, we follow the general development above, (1.3.12). Hence, we must find θ̂_L = inf{t : S_1^+(t) < n − c_1}, where c_1 is given by (1.3.15). Note that S_1^+(t) < n − c_1 if and only if the number of X_i greater than t is less than n − c_1. But #{i : X_i > X_{(c_1+1)}} = n − c_1 − 1 and #{i : X_i > X_{(c_1+1)} − ε} ≥ n − c_1 for any ε > 0. Hence, θ̂_L = X_{(c_1+1)}. A similar argument shows that θ̂_U = X_{(n−c_1)}. We can summarize this by saying that the (1 − α)100% L1 confidence interval is the half open, half closed interval

    [X_{(c_1+1)}, X_{(n−c_1)}) where α/2 = P(S_1^+(0) ≤ c_1) determines c_1.   (1.3.16)

The critical value c_1 can be determined from the binomial(n, 1/2) distribution or from the normal approximation cited above. The interval developed here is a distribution-free confidence interval since the confidence coefficient is determined from the binomial distribution without making any shape assumption on the underlying model distribution.
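As a concrete illustration, the following is a minimal base-R sketch of this inference on our part (the function name sign.inference is hypothetical, and it assumes no observations equal θ_0): the statistic S_1^+, its exact binomial p-value via binom.test, and the distribution-free interval (1.3.16) read off the order statistics.

sign.inference <- function(x, theta0 = 0, alpha = 0.05) {
  n <- length(x)
  s1plus <- sum(x > theta0)                       # S_1^+
  pval <- binom.test(s1plus, n, p = 0.5)$p.value  # exact sign test of theta = theta0
  c1 <- qbinom(alpha / 2, n, 0.5) - 1             # largest c1 with P(bin(n,1/2) <= c1) <= alpha/2
  xs <- sort(x)
  list(estimate = median(x), p.value = pval,
       conf.int = c(xs[c1 + 1], xs[n - c1]))      # interval (1.3.16)
}

Because of the discreteness of the binomial distribution, the achieved confidence coefficient 1 − 2P(bin(n, 1/2) ≤ c_1) is at least (1 − α)100%.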
Example 1.3.2. L2-Norm

Recall that the square of the L2-norm is given by ‖x‖_2² = Σ_{i=1}^n x_i². As shown in Exercise 1.12.4, the estimate determined by this norm is the sample mean X̄ and the functional parameter is θ = ∫ x h(x) dx, provided it exists. Hence the L2 norm is consistent for the mean location problem. The associated test statistic is equivalent to Student's t-test. The approximate distribution of X̄ is N(0, σ²/n), provided the variance σ² = Var(X_1) exists. Hence, the test statistic is not distribution free. In practice, σ is replaced by its estimate s = (Σ (X_i − X̄)²/(n − 1))^{1/2} and the test is based on the t-ratio, t = √n X̄/s, which, under the null hypothesis, is asymptotically N(0, 1). The usual confidence interval is X̄ ± t_{α/2,n−1} s/√n, where t_{α/2,n−1} is the (1 − α/2)-quantile of a t-distribution with n − 1 degrees of freedom. This interval has the approximate confidence coefficient (1 − α)100%, unless the errors, e_i, follow a normal distribution, in which case it has exact confidence.
Example 1.3.3. Weighted L1 Norm

Consider the function

    ‖x‖_3 = Σ_{i=1}^n R(|x_i|)|x_i|,                                     (1.3.17)

where R(|x_i|) denotes the rank of |x_i| among |x_1|, ..., |x_n|. As the next theorem shows, this function is a norm on R^n. See Section 1.8 for a general weighted L1 norm.

Theorem 1.3.2. The function ‖x‖_3 = Σ_j j|x|_{(j)} = Σ_j R(|x_j|)|x_j| is a norm, where R(|x_j|) is the rank of |x_j| among |x_1|, ..., |x_n| and |x|_{(1)} ≤ ··· ≤ |x|_{(n)} are the ordered absolute values.

Proof. The equality relating ‖x‖_3 to the ranks is clear. To show that we have a norm, we first note that ‖x‖_3 ≥ 0 and that ‖x‖_3 = 0 if and only if x = 0. Also clearly ‖ax‖_3 = |a|‖x‖_3 for any real a. Hence, to finish the proof, we must verify the triangle inequality. Now

    ‖x + y‖_3 = Σ_j j|x + y|_{(j)} = Σ_i R(|x_i + y_i|)|x_i + y_i| ≤ Σ_i R(|x_i + y_i|)|x_i| + Σ_i R(|x_i + y_i|)|y_i|.   (1.3.18)

Consider the first term on the right side. By summing through another index we can write it as

    Σ_i R(|x_i + y_i|)|x_i| = Σ_j b_j |x|_{(j)},

where b_1, ..., b_n is a permutation of the integers 1, ..., n. Suppose the b_j are not in order; then there exist a t and an s such that |x|_{(t)} ≤ |x|_{(s)} but b_t > b_s. Whence

    [b_s|x|_{(t)} + b_t|x|_{(s)}] − [b_t|x|_{(t)} + b_s|x|_{(s)}] = (b_t − b_s)(|x|_{(s)} − |x|_{(t)}) ≥ 0.

Hence such an interchange never decreases the sum. This leads to the result

    Σ_i R(|x_i + y_i|)|x_i| ≤ Σ_j j|x|_{(j)}.

A similar result holds for the second term on the right side of (1.3.18). Therefore ‖x + y‖_3 ≤ Σ_j j|x|_{(j)} + Σ_j j|y|_{(j)} = ‖x‖_3 + ‖y‖_3, and this completes the proof. The above argument is taken from Hardy, Littlewood, and Pólya (1952).
We shall call this norm the weighted L1 norm. In the next theorem, we offer an interesting identity satisfied by this norm. First, though, we need another representation of it. For a random sample X_1, ..., X_n, define the anti-ranks to be the random variables D_1, ..., D_n such that

    Z_1 = |X_{D_1}| ≤ ··· ≤ Z_n = |X_{D_n}|.                             (1.3.19)

For example, if D_1 = 2 then |X_2| is the smallest absolute value and Z_1 has rank 1. Note that the anti-rank function is just the inverse of the rank function. We can then write

    ‖x‖_3 = Σ_{j=1}^n j|x|_{(j)} = Σ_{j=1}^n j|x_{D_j}|.                 (1.3.20)

Theorem 1.3.3. For any vector x,

    ‖x‖_3 = Σ_{i≤j} |(x_i + x_j)/2| + Σ_{i<j} |(x_i − x_j)/2|.           (1.3.21)

Proof. Letting the index run through the anti-ranks, we have

    Σ_{i≤j} |(x_i + x_j)/2| + Σ_{i<j} |(x_i − x_j)/2|
        = Σ_{i=1}^n |x_i| + Σ_{i<j} { |(x_{D_i} + x_{D_j})/2| + |(x_{D_j} − x_{D_i})/2| }.   (1.3.22)

For i < j, hence |x_{D_i}| ≤ |x_{D_j}|, consider the expression

    |(x_{D_i} + x_{D_j})/2| + |(x_{D_j} − x_{D_i})/2|.

There are four cases to consider: where x_{D_i} and x_{D_j} are both positive; where they are both negative; and the two cases where they have mixed signs. In all these cases, though, it is easy to show that

    |(x_{D_i} + x_{D_j})/2| + |(x_{D_j} − x_{D_i})/2| = |x_{D_j}|.

Using this, we have that the right side of expression (1.3.22) is equal to

    Σ_{i=1}^n |x_i| + Σ_{i<j} |x_{D_j}| = Σ_{j=1}^n |x_{D_j}| + Σ_{j=1}^n (j − 1)|x_{D_j}| = Σ_{j=1}^n j|x_{D_j}| = ‖x‖_3,   (1.3.23)

and we are finished.
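The identity is also easy to check numerically; the following R lines (our own verification, not from the text) confirm (1.3.21) on a random vector.

# Numerical check of identity (1.3.21)
set.seed(1)
x <- rnorm(8)
lhs <- sum(rank(abs(x)) * abs(x))               # ||x||_3 computed from the ranks
pairs <- which(upper.tri(diag(8), diag = TRUE), arr.ind = TRUE)  # all pairs with i <= j
i <- pairs[, 1]; j <- pairs[, 2]
rhs <- sum(abs((x[i] + x[j]) / 2)) +            # sum over i <= j of |(x_i + x_j)/2|
       sum(abs((x[i] - x[j]) / 2)[i < j])       # plus sum over i < j of |(x_i - x_j)/2|
all.equal(lhs, rhs)                             # TRUE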
The associated gradient function is

    T(θ) = Σ_{i=1}^n R(|X_i − θ|) sgn(X_i − θ) = Σ_{i≤j} sgn( (X_i + X_j)/2 − θ ).   (1.3.24)

The middle term is due to the fact that the ranks only change values at the finite number of points determined by |X_i − θ| = |X_j − θ|; otherwise R(|X_i − θ|) is constant. The third term is obtained immediately from the identity (1.3.21). The n(n + 1)/2 pairwise averages {(X_i + X_j)/2 : 1 ≤ i ≤ j ≤ n} are called the Walsh averages. Hence, the estimate of θ is the median of the Walsh averages, which we shall denote as

    θ̂_3 = med_{i≤j} { (X_i + X_j)/2 },                                   (1.3.25)

first discussed by Hodges and Lehmann (1963). Often θ̂_3 is called the Hodges-Lehmann estimate of location. In order to obtain the corresponding location functional, note that

    R(|X_i − θ|) = #{ |X_j − θ| ≤ |X_i − θ| } = #{ θ − |X_i − θ| ≤ X_j ≤ θ + |X_i − θ| }
                 = nH_n(θ + |X_i − θ|) − nH_n^−(θ − |X_i − θ|),

where H_n^− is the left limit of H_n. Hence (1.3.24) becomes

    ∫ { H_n(θ + |x − θ|) − H_n^−(θ − |x − θ|) } sgn(x − θ) dH_n(x) = 0,

and in the limit we have

    ∫ { H(θ + |x − θ|) − H(θ − |x − θ|) } sgn(x − θ) dH(x) = 0;

that is,

    −∫_{−∞}^{θ} { H(2θ − x) − H(x) } dH(x) + ∫_{θ}^{∞} { H(x) − H(2θ − x) } dH(x) = 0.

This simplifies to

    ∫_{−∞}^{∞} H(2θ − x) dH(x) = 1/2.                                    (1.3.26)

Hence, the functional is the pseudo-median defined in Example 1.2.3. If the density h(x) is symmetric then from (1.7.11)

    θ̂_3 has an approximate N(θ_3, τ²/n) distribution,                    (1.3.27)

where τ = 1/(√12 ∫ h²(x) dx). Estimation of τ is discussed in Section 3.7.

The most convenient form of the gradient process is

    T^+(θ) = Σ_{i≤j} I( (X_i + X_j)/2 > θ ) = Σ_{i=1}^n R(|X_i − θ|) I(X_i > θ).   (1.3.28)

The corresponding gradient test statistic for the hypotheses (1.3.6) is T^+(0). In Section 1.7, provided that h(x) is symmetric, it is shown that T^+(0) is distribution free under H_0 with null mean and variance n(n + 1)/4 and n(n + 1)(2n + 1)/24, respectively. This test is often referred to as the Wilcoxon signed-rank test. Thus the test for the hypotheses (1.3.6) is

    Reject H_0 in favor of H_A if T^+(0) ≤ k or T^+(0) ≥ n(n + 1)/2 − k,  (1.3.29)

where P(T^+(0) ≤ k) = α/2. An approximation for k is given in the next paragraph.

Because of the similarity between the sign and signed-rank processes, the confidence interval based on T^+(θ) follows immediately from the argument given in Example 1.3.1 for the sign process. Instead of the order statistics, which were used in the confidence interval based on the sign process, in this case we use the ordered Walsh averages, which we denote as W_{(1)}, ..., W_{(n(n+1)/2)}. Hence a (1 − α)100% confidence interval for θ is given by

    [W_{(k+1)}, W_{((n(n+1)/2)−k)}) where k is such that α/2 = P(T^+(0) ≤ k).   (1.3.30)

As with the sign process, k can be approximated using the asymptotic normal distribution of T^+(0) by

    k ≐ n(n + 1)/4 − z_{α/2} √( n(n + 1)(2n + 1)/24 ) − .5,

where z_{α/2} is the (1 − α/2)-quantile of the standard normal distribution. Provided that h(x) is symmetric, this confidence interval is distribution free.
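A minimal base-R sketch of this signed-rank inference, on our part (hl.inference is a hypothetical name): the Walsh averages, the Hodges-Lehmann estimate (1.3.25), and the interval (1.3.30), with k taken from the exact null distribution of T^+(0) via the intrinsic function qsignrank.

hl.inference <- function(x, alpha = 0.05) {
  n <- length(x)
  w <- sort(outer(x, x, "+")[upper.tri(diag(n), diag = TRUE)] / 2)  # ordered Walsh averages
  k <- qsignrank(alpha / 2, n) - 1      # largest k with P(T+(0) <= k) <= alpha/2
  list(estimate = median(w),            # Hodges-Lehmann estimate (1.3.25)
       conf.int = c(w[k + 1], w[n * (n + 1) / 2 - k]))  # interval (1.3.30)
}

The intrinsic function wilcox.test(x, conf.int = TRUE) uses the same construction and should return a matching estimate and interval when there are no ties.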
1.3.1 Computation
The three procedures discussed in this section are easily computed in R. The R intrinsic functions t.test and wilcox.test compute the t- and Wilcoxon signed-rank tests, respectively. Our collection of R functions, RBR, contains the functions onesampwil and onesampsgn, which compute the asymptotic versions of the Wilcoxon signed-rank and sign tests, respectively. These functions also compute the associated estimates, confidence intervals, and standard errors. Their use is discussed in the examples. Minitab can also be used to compute these tests. At the command line, the Minitab commands stest, wtest, and ttest compute the sign, Wilcoxon signed-rank, and t-tests, respectively.
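For instance, assuming the sample is in a vector x, the intrinsic calls would look like the following sketch (base R has no intrinsic sign test, but binom.test serves once observations equal to the hypothesized value are set aside):

wilcox.test(x, mu = 0, conf.int = TRUE)   # signed-rank test, HL estimate, and CI
t.test(x, mu = 0)                         # t-test and t-interval
binom.test(sum(x > 0), sum(x != 0))       # exact sign test of theta = 0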
1.4 Examples
In applications, by convention, when testing the null hypothesis H_0: θ = θ_0 using the sign test, any data point equal to θ_0 is set aside and the sample size is reduced. On the other hand, these values are not set aside for point estimation or confidence intervals. The output of the RBR functions onesampwil and onesampsgn includes the test statistics T and S, respectively, and a continuity corrected standardized value z. The p-values are approximated by computing normal probabilities on z. Especially for small sample sizes, for the test based on the signs, S, the approximate and exact p-values can be somewhat different. In calculating the signed ranks for the test statistic T, we use average ranks. For t-tests, we report the p-values and confidence intervals using the t-distribution with n − 1 degrees of freedom.

Table 1.4.1: Excess hours of sleep under the influence of two drugs and the difference in excesses.

Row  Dextro  Laevo  Diff (L-D)
 1     -0.1   -0.1      0.0
 2      0.8    1.6      0.8
 3      3.4    4.4      1.0
 4      0.7    1.9      1.2
 5     -0.2    1.1      1.3
 6     -1.2    0.1      1.3
 7      2.0    3.4      1.4
 8      3.7    5.5      1.8
 9     -1.6    0.8      2.4
10      0.0    4.6      4.6
Example 1.4.1. Cushney-Peebles Data.
The data given in Table 1.4.1 are the average excess number of hours of sleep that each of 10 patients achieved from the use of two drugs. The third column gives the difference (Laevo − Dextro) in excesses across the two drugs. This is a famous data set. Gosset, writing under the pseudonym Student, published his landmark paper on the t-test in 1908 and used this data set for illustration. The differences, however, suggest that the L2 methods may not be the methods of choice in this case. The normal quantile plot, Panel A of Figure 1.4.1, shows that the tails may be heavy and that there may be an outlier. A normal quantile plot has the data (differences) on the vertical axis and the expected values of the standard normal order statistics on the horizontal axis. When the data are consistent with a normal assumption, the plot should be roughly linear. The boxplot, with 95% L1 confidence interval, Panel B of Figure 1.4.1, further illustrates the presence of an outlier. The box is defined by the quartiles and the shaded notch represents the confidence interval.

For the sake of discussion and comparison of methods, we provide the p-values for the sign test, the Wilcoxon signed-rank test, and the t-test. We used the RBR functions onesampwil, onesampsgn, and onesampt to compute the results for the Wilcoxon signed-rank test, the sign test, and the t-test, respectively. For each function, the following display shows the necessary R code (these are preceded by the prompt >), which is then followed by the results. The standard errors (SE) for the sign and signed-rank estimates are given by (1.5.28) and (1.7.12), respectively, and are discussed in general in Section 1.5.5. These functions also produce a boxplot of the data. The boxplot produced by the function onesampsgn is shown in Figure 1.4.1.
[Figure 1.4.1: Panel A: Normal q-q plot of the Cushney-Peebles differences; Panel B: Boxplot with 95% notched confidence interval; Panel C: Sensitivity curve for the t-test; Panel D: Sensitivity curve for the sign test.]

# Assumes that the differences are in the vector diffs
> onesampwil(diffs)
Results for the Wilcoxon-Signed-Rank procedure
Test of theta = 0 versus theta not equal to 0
Test-Stat. is T 54 Standardized (z) Test-Stat. is 2.70113 p-value 0.00691043
Estimate 1.3 SE is 0.484031
95 % Confidence Interval is ( 0.9 , 2.7 )
Estimate of the scale parameter tau 1.530640
1.4. EXAMPLES 15
> onesampsgn(diffs)
Results for the Sign procedure
Test of theta = 0 versus theta not equal to 0
Test stat. S is 9 Standardized (z) Test-Stat. 2.666667 p-value 0.007660761
Estimate 1.3 SE is 0.4081708
95 % Confidence Interval is ( 0.8 , 2.4 )
Estimate of the scale parameter tau 1.290749
> temp=onesampt(diffs)
Results for the t-test procedure
Test of theta = 0 versus theta not equal to 0
Test stat. Ave(x) - 0 is 1.58 Standardized (t) Test-Stat. 4.062128 p-value 0.00283289
Estimate 1.58 SE is 0.3889587
95 % Confidence Interval is ( 0.7001142 , 2.459886 )
Estimate of the scale parameter sigma 1.229995
The confidence interval corresponding to the sign test is (0.8, 2.4), which is shifted above 0. Hence, there is strong support for the alternative hypothesis that the location of the difference distribution is not equal to zero. That is, we reject H_0: θ = 0 in favor of H_A: θ ≠ 0 at α = .05. All three tests support this conclusion. The estimates of location corresponding to the three tests are the median (1.3), the median of the Walsh averages (1.3), and the mean of the sample differences (1.58). Note that the outlier had an effect on the sample mean.

In order to see how sensitive the test statistics are to outliers, we change the value of the outlier (the difference in the 10th row of Table 1.4.1) and plot the value of the test statistic against the value of the difference in the 10th row; see Panel C of Figure 1.4.1. Note that as the value of the 10th difference changes, the t-test changes quite rapidly. In fact, the t-test can be pulled out of the rejection region by making the difference sufficiently small or large. However, the sign test, Panel D of Figure 1.4.1, stays constant until the difference crosses zero and then only changes by 2. This illustrates the high sensitivity of the t-test to outliers and the relative resistance of the sign test. A similar plot can be prepared for the Wilcoxon signed-rank test; see Exercise 1.12.8. In addition, the corresponding p-values can be plotted to see how sensitive the decision to reject the null hypothesis is to outliers. Sensitivity plots are similar to influence functions. We discuss influence functions for estimates in Section 1.6.
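The sensitivity curves in Panels C and D can be reproduced along the following lines (a sketch of ours; it assumes, as in the display above, that the ten differences are in the vector diffs, and it omits the continuity correction, so the heights differ slightly from Panel D):

grid <- seq(-10, 10, by = 0.1)            # candidate values for the 10th difference
tcurve <- sapply(grid, function(v) {
  d <- c(diffs[1:9], v)                   # replace the 10th difference by v
  sqrt(10) * mean(d) / sd(d)              # standardized t-statistic
})
scurve <- sapply(grid, function(v) {
  d <- c(diffs[1:9], v)
  (sum(d > 0) - 10 / 2) / sqrt(10 / 4)    # standardized sign statistic
})
plot(grid, tcurve, type = "l", xlab = "Value of 10th difference", ylab = "t-test")
plot(grid, scurve, type = "l", xlab = "Value of 10th difference", ylab = "Standardized sign test")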
Example 1.4.2. Shoshoni Rectangles.
The golden rectangle is a rectangle in which the ratio of the width to length is approximately 0.618. It can be characterized in various ways. For example, w/l = l/(w + l) characterizes the golden rectangle. It is considered to be an aesthetic standard in Western civilization and appears in art and architecture going back to the ancient Greeks. It now appears in such items as credit and business cards. In a cultural anthropology study, DuBois (1960) reports on a study of the Shoshoni beaded baskets. These baskets contain beaded rectangles and the question was whether the Shoshonis use the same aesthetic standard as the West. A sample of twenty width to length ratios from Shoshoni baskets is given in Table 1.4.2.

Table 1.4.2: Width to Length Ratios of Rectangles
0.553 0.570 0.576 0.601 0.606 0.606 0.609 0.611 0.615 0.628
0.654 0.662 0.668 0.670 0.672 0.690 0.693 0.749 0.844 0.933

Panel A of Figure 1.4.2 shows the notched boxplot containing the 95% L1 confidence interval for the median of the population of w/l ratios. It shows two outliers, which are also apparent in the normal quantile plot, Panel B of Figure 1.4.2. We used the sign procedure to analyze the data, performing the computations with the RBR function onesampsgn.

[Figure 1.4.2: Panel A: Boxplot of width to length ratios of Shoshoni rectangles; Panel B: Normal q-q plot.]

For this problem, it is of interest to test H_0: θ = 0.618 (the golden rectangle). The display below shows this evaluation for the sign test along with a 90% confidence interval for θ.
> onesampsgn(x,theta0=.618,alpha=.10)
Results for the Sign procedure
Test of theta = 0.618 versus theta not equal to 0.618
Test stat. S is 2 Standardized (z) Test-Stat. 0.2236068 p-value 0.8230633
Estimate 0.641 SE is 0.01854268
90 % Confidence Interval is ( 0.609 , 0.67 )
Estimate of the scale parameter tau 0.0829254
With a p-value of 0.823, there is no evidence to refute the null hypothesis. Further, we see that the golden rectangle ratio 0.618 is contained in the confidence interval. This suggests that there is no evidence in these data that the Shoshonis are using a different standard.
For comparison, the analysis based on the t-procedure is
> onesampt(x,theta0=.618,alpha=.10)
Results for the t-test procedure
Test of theta = 0.618 versus theta not equal to 0.618
Test stat. Ave(x) - 0.618 is 0.0425 Standardized (t) Test-Stat. 2.054523
p-value 0.05394133
Estimate 0.6605 SE is 0.02068606
90 % Confidence Interval is ( 0.624731 , 0.696269 )
Estimate of the scale parameter sigma 0.09251088
Based on the t-test with its p-value of 0.053, a researcher might conclude that there is
evidence that the Shoshonis are using a different standard; further, the 90% t-interval does
not contain the golden rectangle ratio. Hence, the robust and traditional approaches lead
to different practical conclusions for this problem. The outliers, of course, impaired the
t-analysis. For these data, we have more faith in the simple sign test.
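For readers without the RBR package, the same comparison can be sketched in base R alone; the lines below are our own illustration (binom.test gives the exact version of the sign test rather than the standardized z-form shown above, so its p-value differs slightly).

x <- c(0.553, 0.570, 0.576, 0.601, 0.606, 0.606, 0.609, 0.611, 0.615, 0.628,
       0.654, 0.662, 0.668, 0.670, 0.672, 0.690, 0.693, 0.749, 0.844, 0.933)
binom.test(sum(x > 0.618), length(x))       # exact sign test of theta = 0.618
t.test(x, mu = 0.618, conf.level = 0.90)    # the t-analysis for comparison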
1.5 Properties of Norm-Based Inference
In this section, we establish statistical properties of the inference described in Section 1.3
for the norm fit of a location model. These properties describe the null and alternative
distributions of the test, (1.3.7), and the asymptotic distribution of the estimate, (1.3.2).
Furthermore, these properties allow us to derive relative efficiencies between competing
procedures. While our discussion is general, we will illustrate the inference based on the
$L_1$ and $L_2$ norms as we proceed. The inference based on the signed-rank norm will be
considered in Section 1.7 and that based on norms of general signed-rank scores in Section 1.8.
We assume then that Model (1.2.1) holds for a random sample $X_1, \ldots, X_n$ with common
distribution and density functions $H(x) = F(x - \theta)$ and $h(x) = f(x - \theta)$, respectively. Next
a norm is specified to fit the model. We will assume that the induced functional is 0 at $F$,
i.e., $T(F) = 0$. Let $S(\theta)$ be the gradient function induced by the norm. We establish the
properties of the inference by considering the null and alternative behavior of the gradient
test. For convenience, we consider the one-sided hypothesis
$$ H_0 : \theta = 0 \mbox{ versus } H_A : \theta > 0 . \qquad (1.5.1) $$
Since $S(\theta)$ is nonincreasing, a level $\alpha$ test of these hypotheses based on $S(0)$ is
$$ \mbox{Reject } H_0 \mbox{ in favor of } H_A \mbox{ if } S(0) \geq c , \qquad (1.5.2) $$
where $c$ is such that $P_0[S(0) \geq c] = \alpha$.
The power function of this test is given by
$$ \gamma_S(\theta) = P_{\theta}[S(0) \geq c] = P_0[S(-\theta) \geq c] , \qquad (1.5.3) $$
where the last equality follows from Theorem 1.3.1.

The power function forms a convenient summary of the test based on $S(0)$. The probability
of a Type I error (level of the test) is given by $\gamma_S(0)$. The probability of a Type II
error at the alternative $\theta$ is $\beta_S(\theta) = 1 - \gamma_S(\theta)$. For a given test of the hypotheses (1.5.1) we
want the power function to be increasing in $\theta$ with an upper limit of one. In the first subsection
below, we establish these properties for the test (1.5.2). We can also compare level
$\alpha$ tests of (1.5.1) by comparing their powers at alternative hypotheses. These are efficiency
considerations and they are covered in later subsections.
1.5.1 Basic Properties of the Power Function $\gamma_S(\theta)$

As a first step we show that $\gamma_S(\theta)$ is nondecreasing:
Theorem 1.5.1. Suppose the test of $H_0 : \theta = 0$ versus $H_A : \theta > 0$ rejects when $S(0) \geq c$.
Then the power function is nondecreasing in $\theta$.

Proof. Recall that $S(\theta)$ is nonincreasing in $\theta$ since $D(\theta)$ is convex. By Theorem 1.3.1,
$\gamma_S(\theta) = P_0[S(-\theta) \geq c]$. Now, if $\theta_1 \leq \theta_2$ then $S(-\theta_1) \leq S(-\theta_2)$ and, hence, $S(-\theta_1) \geq c$
implies that $S(-\theta_2) \geq c$. It then follows that $P_0(S(-\theta_1) \geq c) \leq P_0(S(-\theta_2) \geq c)$ and the
power function is monotone in $\theta$, as required.
This theorem shows that the test of $H_0 : \theta = 0$ versus $H_A : \theta > 0$ based on $S(0)$ is
unbiased, that is, $P_{\theta}(S(0) \geq c) \geq \alpha$ for positive $\theta$, where $\alpha$ is the size of the test. At times
it is convenient to consider the more general null hypothesis:
$$ H_0^* : \theta \leq 0 \mbox{ versus } H_A : \theta > 0 . \qquad (1.5.4) $$
A test of $H_0^*$ versus $H_A$ with power function $\gamma_S$ is said to have level $\alpha$, if
$$ \sup_{\theta \leq 0} \gamma_S(\theta) = \alpha . $$
The proof of Theorem 1.5.1 shows that $\gamma_S(\theta)$ is nondecreasing in all $\theta$. Since the
gradient test has level $\alpha$ for $H_0$, it follows immediately that it has level $\alpha$ for $H_0^*$ also.
We next show that the power function of the gradient test converges to 1 as $\theta \rightarrow \infty$. We
formally define this as:

Definition 1.5.1. Consider a level $\alpha$ test for the hypotheses (1.5.1) which has power function
$\gamma_S(\theta)$. We say the test is resolving, if $\gamma_S(\theta) \rightarrow 1$ as $\theta \rightarrow \infty$.
Theorem 1.5.2. Suppose the test of $H_0 : \theta = 0$ versus $H_A : \theta > 0$ rejects when $S(0) \geq c$.
Further, let $S^* = \sup_{\theta} S(\theta)$ and suppose that $S^*$ is attained for some finite value of $\theta$. Then the
test is resolving, that is, $P_{\theta}(S(0) \geq c) \rightarrow 1$ as $\theta \rightarrow \infty$.

Proof. Since $S(\theta)$ is nonincreasing, for any unbounded increasing sequence $\theta_m$, $S(-\theta_m) \leq S(-\theta_{m+1})$.
For fixed $n$ and $F$, there is a real number $a$ such that $P_0(|X_i| \leq a, \; i = 1, \ldots, n) >
1 - \epsilon$ for any specified $\epsilon > 0$. Let $A_{\epsilon}$ denote the event $\{|X_i| \leq a, \; i = 1, \ldots, n\}$. Now,
$$ P_{\theta_m}(S(0) \geq c) = P_0(S(-\theta_m) \geq c)
 = 1 - P_0(S(-\theta_m) < c)
 = 1 - P_0(\{S(-\theta_m) < c\} \cap A_{\epsilon}) - P_0(\{S(-\theta_m) < c\} \cap A_{\epsilon}^c) . $$
The hypothesis of the theorem implies that, for sufficiently large $m$, $\{S(-\theta_m) < c\} \cap A_{\epsilon}$
is empty. Further, $P_0(\{S(-\theta_m) < c\} \cap A_{\epsilon}^c) \leq P_0(A_{\epsilon}^c) < \epsilon$. Hence, for $m$ sufficiently large,
$P_{\theta_m}(S(0) \geq c) \geq 1 - \epsilon$ and the proof is complete.
The condition of boundedness imposed on $S(\theta)$ in the above theorem holds for almost
all the nonparametric tests discussed in this book; hence, these nonparametric tests will be
resolving. Thus they will be able to discern large alternative hypotheses with high power.
What can be said at a fixed alternative? Recall the definition of a consistent test:

Definition 1.5.2. We say that a test is consistent if the power tends to one for each fixed
alternative as the sample size $n$ increases. The alternatives consist of specific values of $\theta$
and a cdf $F$.
Consistency implies that the test is behaving as expected when the sample size increases
and the alternative hypothesis is true. To obtain consistency of the gradient test, we need
to impose the following two assumptions on $\overline{S}(\theta)$: first,
$$ \overline{S}(\theta) = S(\theta)/n^{\gamma} \stackrel{P_{\theta}}{\rightarrow} \mu(\theta) , \mbox{ where } \mu(0) = 0 \mbox{ and } \mu(0) < \mu(\theta) \mbox{ for all } \theta > 0 , \qquad (1.5.5) $$
for some $\gamma > 0$ and, secondly,
$$ E_0 \overline{S}(0) = 0 \mbox{ and } \sqrt{n}\,\overline{S}(0) \stackrel{D}{\rightarrow} N(0, \sigma^2(0)) \mbox{ under } H_0 \mbox{ for all } F , \qquad (1.5.6) $$
for some positive constant $\sigma(0)$. The first assumption means that $\overline{S}(0)$ separates the null
from the alternative hypothesis. Note, it is not crucial that $\mu(0) = 0$, since this can always be
achieved by recentering. It will be useful to have the following result concerning the asymptotic
null distribution of $\overline{S}(0)$. Its proof follows readily from the definition of convergence in
distribution.
Theorem 1.5.3. Assume (1.5.6). The test defined by $\sqrt{n}\,\overline{S}(0) \geq z_{\alpha} \sigma(0)$, where $z_{\alpha}$ is the
upper $\alpha$ percentile from the standard normal cdf, i.e., $1 - \Phi(z_{\alpha}) = \alpha$, is asymptotically size $\alpha$.
Hence, $P_0(\sqrt{n}\,\overline{S}(0) \geq z_{\alpha} \sigma(0)) \rightarrow \alpha$.
It follows that a gradient test is consistent; i.e.,

Theorem 1.5.4. Assume conditions (1.5.5) and (1.5.6). Then the gradient test $\sqrt{n}\,\overline{S}(0) \geq
z_{\alpha} \sigma(0)$ is consistent, i.e., the power at fixed alternatives tends to one as $n$ increases.

Proof. Fix $\theta^* > 0$ and $F$. For $\epsilon > 0$ and for large $n$, we have $n^{-1/2} z_{\alpha} \sigma(0) < \mu(\theta^*) - \epsilon$. This
leads to the following string of inequalities:
$$ P_{\theta^*,F}(\overline{S}(0) \geq n^{-1/2} z_{\alpha} \sigma(0)) \geq P_{\theta^*,F}(\overline{S}(0) \geq \mu(\theta^*) - \epsilon)
 \geq P_{\theta^*,F}(|\overline{S}(0) - \mu(\theta^*)| \leq \epsilon) \rightarrow 1 , $$
which is the desired result.
Example 1.5.1. The $L_1$ Case.

Assume that the model cdf $F$ has the unique median 0. Consider the $L_1$ norm. The
associated level $\alpha$ gradient test of (1.5.1) is equivalent to the sign test given by:
$$ \mbox{Reject } H_0 \mbox{ in favor of } H_A \mbox{ if } S_1^+ = \sum I(X_i > 0) \geq c , $$
where $c$ is such that $P[\mathrm{bin}(n, 1/2) \geq c] = \alpha$. The test is nonparametric, i.e., it does not
depend on $F$. From the above discussion its power function is nondecreasing in $\theta$. Since $S_1^+(\theta)$
is bounded and attains its bound on a finite interval, the test is resolving. For consistency,
take $\gamma = 1$ in expression (1.5.5). Then $E_{\theta}[n^{-1} S_1^+(0)] = P_{\theta}(X > 0) = 1 - F(-\theta) = \mu(\theta)$.
An application of the Weak Law of Large Numbers shows that the limit in condition (1.5.5)
holds. Further, $\mu(0) = 1/2 < \mu(\theta)$ for all $\theta > 0$ and all $F$. Finally, apply the Central
Limit Theorem to show that (1.5.6) holds. Hence, the sign test is consistent for location
alternatives. Further, it is consistent for each pair $\theta$, $F$ such that $P(X > 0) > 1/2$.
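The consistency of the sign test is easy to see in a small simulation. The sketch below, with our own function name and settings, estimates the power of the level .05 sign test at the fixed alternative $\theta = 0.5$ with standard normal errors; the power increases toward one as $n$ grows.

power_sign <- function(n, theta, alpha = 0.05, nsim = 2000) {
  crit <- qbinom(1 - alpha, n, 0.5)   # P[bin(n,1/2) > crit] <= alpha
  mean(replicate(nsim, sum(rnorm(n, mean = theta) > 0) > crit))
}
sapply(c(10, 50, 100, 200), power_sign, theta = 0.5)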
A discussion of these properties for the gradient test based on the $L_2$ norm can be found
in Exercise 1.12.5.
1.5.2 Asymptotic Linearity and Pitman Regularity

In the last section we discussed some of the basic properties of the power function for a
gradient test. Next we establish some general results that will allow us to compare power
functions for different level $\alpha$ tests. These results will also lead to the asymptotic distributions
of the location estimators $\hat{\theta}$ based on norm fits. We will also make use of them in later
sections and chapters.

Assume the setup found at the beginning of this section; i.e., we are considering the
location model (1.3.1) and we have specified a norm with gradient function $S(\theta)$. We first
define a Pitman regular process:
Definition 1.5.3. We will say an estimating function $S(\theta)$ is Pitman Regular if the
following four conditions hold: first,
$$ S(\theta) \mbox{ is nonincreasing in } \theta ; \qquad (1.5.7) $$
second, letting $\overline{S}(\theta) = S(\theta)/n^{\gamma}$, for some $\gamma > 0$,
$$ \mbox{there exists a function } \mu(\theta) \mbox{, such that } \mu(0) = 0, \; \mu'(\theta) \mbox{ is continuous at } 0, \;
\mu'(0) > 0 \mbox{ and either } \overline{S}(0) \stackrel{P_{\theta}}{\rightarrow} \mu(\theta) \mbox{ or } E_{\theta} \overline{S}(0) = \mu(\theta) ; \qquad (1.5.8) $$
third,
$$ \sup_{|b| \leq B} \left| \sqrt{n}\,\overline{S}\!\left(\frac{b}{\sqrt{n}}\right) - \sqrt{n}\,\overline{S}(0) + \mu'(0)\,b \right| \stackrel{P}{\rightarrow} 0 , \qquad (1.5.9) $$
for any $B > 0$; and fourth, there is a constant $\sigma(0)$ such that
$$ \sqrt{n}\,\frac{\overline{S}(0)}{\sigma(0)} \stackrel{D_0}{\rightarrow} N(0, 1) . \qquad (1.5.10) $$
Further, the quantity
$$ c = \mu'(0)/\sigma(0) \qquad (1.5.11) $$
is called the efficacy of $S(\theta)$.
Condition (1.5.9) is called the asymptotic linearity of the process $S(\theta)$. Often we can
compute $c$ when we have the mean under general $\theta$ and the variance under $\theta = 0$. Thus
$$ \mu'(0) = \left.\frac{d}{d\theta} E_{\theta}[\overline{S}(0)]\right|_{\theta=0} \quad \mbox{and} \quad \sigma^2(0) = \lim n \mathrm{Var}_0(\overline{S}(0)) . \qquad (1.5.12) $$
Hence, another way of expressing the asymptotic linearity of $S(\theta)$ is
$$ \sqrt{n}\,\frac{\overline{S}(b/\sqrt{n})}{\sigma(0)} = \sqrt{n}\,\frac{\overline{S}(0)}{\sigma(0)} - cb + o_p(1) . \qquad (1.5.13) $$
If we replace $b$ by $\sqrt{n}\,\theta_n$ where, of course, $|\sqrt{n}\,\theta_n| \leq B$ for $B > 0$, then we can write
$$ \sqrt{n}\,\frac{\overline{S}(\theta_n)}{\sigma(0)} = \sqrt{n}\,\frac{\overline{S}(0)}{\sigma(0)} - c\,\sqrt{n}\,\theta_n + o_p(1) . \qquad (1.5.14) $$
We record one more result on limiting distributions whose proof follows from Theorems 1.3.1
and 1.5.6.

Theorem 1.5.5. Suppose $S(\theta)$ is Pitman Regular. Then
$$ \sqrt{n}\,\frac{\overline{S}(b/\sqrt{n})}{\sigma(0)} \stackrel{D_0}{\rightarrow} Z - cb \qquad (1.5.15) $$
and
$$ \sqrt{n}\,\frac{\overline{S}(0)}{\sigma(0)} \stackrel{D_{b/\sqrt{n}}}{\rightarrow} Z + cb , \qquad (1.5.16) $$
where $Z \sim N(0, 1)$ and, so, $Z + cb \sim N(cb, 1)$.

The second part of this theorem says that the limiting distribution of $\overline{S}(0)$, when standardized
by $\sigma(0)$, and computed along a sequence of alternatives $b/n^{1/2}$, is still normal with
the same variance of one but with a new mean, namely $cb$. This result will be useful in
approximating the power near the null hypothesis.
We will find asymptotic linearity to be useful in establishing statistical properties. Our
next result provides sufficient conditions for linearity.

Theorem 1.5.6. Let $\overline{S}(\theta) = (1/n^{\gamma}) S(\theta)$ for some $\gamma > 0$ such that the conditions (1.5.7),
(1.5.8) and (1.5.10) of Definition 1.5.3 hold. Suppose for any $b \in R$,
$$ n \mathrm{Var}_0(\overline{S}(n^{-1/2} b) - \overline{S}(0)) \rightarrow 0 , \mbox{ as } n \rightarrow \infty . \qquad (1.5.17) $$
Then
$$ \sup_{|b| \leq B} \left| \sqrt{n}\,\overline{S}\!\left(\frac{b}{\sqrt{n}}\right) - \sqrt{n}\,\overline{S}(0) + \mu'(0)\,b \right| \stackrel{P}{\rightarrow} 0 , \qquad (1.5.18) $$
for any $B > 0$.
Proof. First consider $U_n(b) = [\overline{S}(n^{-1/2} b) - \overline{S}(0)]/(b/\sqrt{n})$. By (1.5.8) we have
$$ E_0(U_n(b)) = \frac{\sqrt{n}}{b}\,\mu\!\left(\frac{b}{\sqrt{n}}\right) = \frac{\sqrt{n}}{b}\left[\mu\!\left(\frac{b}{\sqrt{n}}\right) - \mu(0)\right] \rightarrow \mu'(0) , \qquad (1.5.19) $$
since $\mu(0) = 0$ and $\mu'$ is continuous at 0. Furthermore,
$$ \mathrm{Var}_0(U_n(b)) = \frac{n}{b^2}\,\mathrm{Var}_0\!\left(\overline{S}\!\left(\frac{b}{\sqrt{n}}\right) - \overline{S}(0)\right) \rightarrow 0 . \qquad (1.5.20) $$
As Exercise 1.12.9 shows, (1.5.19) and (1.5.20) imply that $U_n(b)$ converges to $\mu'(0)$ in
probability, pointwise in $b$, i.e., $U_n(b) = \mu'(0) + o_p(1)$.
For the second part of the proof, let $W_n(b) = \sqrt{n}\,[\overline{S}(b/\sqrt{n}) - \overline{S}(0) + \mu'(0)\,b/\sqrt{n}]$. Further,
let $\epsilon > 0$ and $\delta > 0$ and partition $[-B, B]$ into $-B = b_0 < b_1 < \cdots < b_m = B$ so that
$b_i - b_{i-1} \leq \epsilon/(2|\mu'(0)|)$ for all $i$. There exists $N$ such that $n \geq N$ implies $P[\max_i |W_n(b_i)| >
\epsilon/2] < \delta$.

Now suppose that $W_n(b) \geq 0$ (a similar argument can be given for $W_n(b) < 0$). Then,
for $b_{i-1} \leq b \leq b_i$,
$$ |W_n(b)| = \sqrt{n}\left[\overline{S}\!\left(\frac{b}{\sqrt{n}}\right) - \overline{S}(0)\right] + b\,\mu'(0)
 \leq \sqrt{n}\left[\overline{S}\!\left(\frac{b_{i-1}}{\sqrt{n}}\right) - \overline{S}(0)\right] + b_{i-1}\,\mu'(0) + (b - b_{i-1})\,\mu'(0)
 \leq |W_n(b_{i-1})| + (b - b_{i-1})\,|\mu'(0)| \leq \max_i |W_n(b_i)| + \epsilon/2 . $$
Hence,
$$ P_0\!\left(\sup_{|b| \leq B} |W_n(b)| > \epsilon\right) \leq P_0\!\left(\max_i |W_n(b_i)| + \epsilon/2 > \epsilon\right) < \delta , $$
and
$$ \sup_{|b| \leq B} |W_n(b)| \stackrel{P}{\rightarrow} 0 . $$
In the next three subsections we use these tools to handle the issues of power and efficiency
for a general norm-based inference, but first we show that the $L_1$ gradient function is Pitman
regular.
Example 1.5.2. Pitman Regularity of the $L_1$ Process.

Assume that the model pdf satisfies $f(0) > 0$. Recall that the $L_1$ gradient function is
$$ S_1(\theta) = \sum_{i=1}^n \mathrm{sgn}(X_i - \theta) . $$
Take $\gamma = 1$ in Theorem 1.5.6; hence, the average of interest is $\overline{S}_1(\theta) = n^{-1} S_1(\theta)$. This is
nonincreasing, so condition (1.5.7) is satisfied. Next it is easy to check that $\mu(\theta) = E_{\theta} \overline{S}_1(0) =
E_{\theta}\,\mathrm{sgn}(X_i) = E_0\,\mathrm{sgn}(X_i + \theta) = 1 - 2F(-\theta)$. Hence, $\mu'(0) = 2f(0)$, and condition (1.5.8) is
satisfied. We now consider condition (1.5.17). Consider the case $b > 0$ (similarly for $b < 0$):
$$ \overline{S}_1(b/\sqrt{n}) - \overline{S}_1(0) = n^{-1} \sum_{i=1}^n [\mathrm{sgn}(X_i - b/\sqrt{n}) - \mathrm{sgn}(X_i)] = -(2/n) \sum_{i=1}^n I(0 < X_i \leq b/n^{1/2}) . $$
Because this is a sum of independent Bernoulli variables, we have
$$ n \mathrm{Var}_0[\overline{S}_1(b/n^{1/2}) - \overline{S}_1(0)] \leq 4 P(0 < X_1 \leq b/\sqrt{n}) = 4[F(b/\sqrt{n}) - F(0)] \rightarrow 0 . $$
The convergence to 0 occurs since $F$ is continuous. Thus condition (1.5.17) is satisfied.
Finally, note that $\sigma(0) = 1$, so $\sqrt{n}\,\overline{S}_1(0)$ converges in distribution to $Z \sim N(0, 1)$ by the Central
Limit Theorem. Therefore the $L_1$ gradient process $S_1(\theta)$ is Pitman regular. It follows that
the efficacy of the $L_1$ process is
$$ c_{L_1} = 2f(0) . \qquad (1.5.21) $$
For future reference, we state the asymptotic linearity result for the $L_1$ process: if $|\sqrt{n}\,\theta_n| \leq B$
then
$$ \sqrt{n}\,\overline{S}_1(\theta_n) = \sqrt{n}\,\overline{S}_1(0) - 2f(0)\,\sqrt{n}\,\theta_n + o_p(1) . \qquad (1.5.22) $$
Example 1.5.3. Pitman Regularity of the $L_2$ Process.

In Exercise 1.12.6 it is shown that, provided $X_i$ has finite variance, the $L_2$ gradient function
is Pitman Regular and that the efficacy is simply $c_{L_2} = 1/\sigma_f$.

We are now in a position to investigate the efficiency and power properties of the statistical
methods based on the $L_1$ norm relative to the statistical methods based on the $L_2$ norm.
As we will see in the next three subsections, these properties depend only on the efficacies.
1.5.3 Asymptotic Theory and Efficiency Results for $\hat{\theta}$

As at the beginning of this section, suppose we have the location model, (1.2.1), and that we
have chosen a norm to fit the model with gradient function $S(\theta)$. In this part we will develop
the asymptotic distribution of the estimate. The asymptotic variance will provide the basis
for efficiency comparisons. We will use the asymptotic linearity that accompanies Pitman
Regularity. To do this, however, we first need to show that $\sqrt{n}\,\hat{\theta}$ is bounded in probability.
Lemma 1.5.1. If the gradient function $S(\theta)$ is Pitman Regular, then $\sqrt{n}(\hat{\theta} - \theta) = O_p(1)$.

Proof. Assume without loss of generality that $\theta = 0$ and take $t > 0$. By the monotonicity of
$S(\theta)$, if $\overline{S}(t/\sqrt{n}) < 0$ then $\hat{\theta} \leq t/\sqrt{n}$. Hence, $P_0(\overline{S}(t/\sqrt{n}) < 0) \leq P_0(\hat{\theta} \leq t/\sqrt{n})$. Theorem
1.5.5 implies that the first probability can be made as close to $\Phi(tc)$ as desired. This, in turn,
can be made as close to 1 as desired. In a similar vein we note that if $\overline{S}(-t/\sqrt{n}) > 0$, then
$\hat{\theta} \geq -t/\sqrt{n}$. Again, the probability of this event can be made arbitrarily close
to 1. Hence, $P_0(\sqrt{n}\,|\hat{\theta}| \leq t)$ is arbitrarily close to 1 and we have boundedness in probability.
We are now in a position to exploit this boundedness in probability to determine the
asymptotic distribution of the estimate.

Theorem 1.5.7. Suppose $S(\theta)$ is Pitman regular with efficacy $c$. Then $\sqrt{n}(\hat{\theta} - \theta)$ converges
in distribution to $Z \sim N(0, c^{-2})$.

Proof. As usual we assume, without loss of generality, that $\theta = 0$. First recall that $\hat{\theta}$ is
defined by $n^{-1/2} S(\hat{\theta}) \doteq 0$. From Lemma 1.5.1, we know that $\hat{\theta}$ is bounded in probability
so that we can apply (1.5.13) to deduce
$$ \sqrt{n}\,\frac{\overline{S}(\hat{\theta})}{\sigma(0)} = \sqrt{n}\,\frac{\overline{S}(0)}{\sigma(0)} - c\,\sqrt{n}\,\hat{\theta} + o_p(1) . $$
Solving, we have
$$ \sqrt{n}\,\hat{\theta} = c^{-1} \sqrt{n}\,\overline{S}(0)/\sigma(0) + o_p(1) ; $$
hence, the result follows because $\sqrt{n}\,\overline{S}(0)/\sigma(0)$ is asymptotically $N(0, 1)$.
Definition 1.5.4. If we have two Pitman Regular estimates with efficacies $c_1$ and $c_2$, respectively,
then the efficiency of $\hat{\theta}_1$ with respect to $\hat{\theta}_2$ is defined to be the reciprocal ratio of
their asymptotic variances, namely, $e(\hat{\theta}_1, \hat{\theta}_2) = c_1^2/c_2^2$.
The next example compares the $L_1$ estimate to the $L_2$ estimate.

Example 1.5.4. Relative efficiency between the $L_1$ and $L_2$ estimates.

In this example we compare the $L_1$ and $L_2$ estimates, namely, the sample median and
mean. We have seen that their respective efficacies are $2f(0)$ and $1/\sigma_f$, and their asymptotic
variances are $1/(4f^2(0)n)$ and $\sigma_f^2/n$, respectively. Hence, the relative efficiency of the median
with respect to the mean is
$$ e(\tilde{X}, \bar{X}) = \mathrm{asyvar}(\sqrt{n}\,\bar{X})/\mathrm{asyvar}(\sqrt{n}\,\tilde{X}) = c_{\tilde{X}}^2/c_{\bar{X}}^2 = 4 f^2(0)\,\sigma_f^2 , \qquad (1.5.23) $$
where $\tilde{X}$ is the sample median and $\bar{X}$ is the sample mean. The efficiency computation
depends only on the Pitman efficacies. We illustrate the computation of the efficiency using
the contaminated normal distribution. The pdf of the contaminated normal distribution
consists of mixing the standard normal pdf with a normal pdf having mean zero and variance
$\sigma^2 > 1$. For $\epsilon$ between 0 and 1, the pdf can be written:
$$ f_{\epsilon}(x) = (1 - \epsilon)\,\phi(x) + \epsilon\,\sigma^{-1}\phi(\sigma^{-1} x) \qquad (1.5.24) $$
with $\sigma_f^2 = 1 + \epsilon(\sigma^2 - 1)$. This distribution has tails heavier than the standard normal distribution
and can be used to model data contamination; see Tukey (1960) for more discussion.
We can think of $\epsilon$ as the fraction of the data that is contaminated. In Table 1.5.1 we
provide values of the efficiencies for various values of contamination $\epsilon$ and with $\sigma = 3$. Note
that when we have 10 percent contamination the efficiency is 1. This indicates that, for
this distribution, the median and mean are equally effective. Finally, this example exhibits
a distribution for which the median is superior to the mean as an estimate of the center. See
Exercise 1.12.12 for other examples.
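The entries of Table 1.5.1 follow directly from expression (1.5.23) applied to (1.5.24); the short sketch below (our own code, not part of the text) reproduces them.

eff_cn <- function(eps, sigma = 3) {
  f0   <- (1 - eps) * dnorm(0) + (eps / sigma) * dnorm(0)  # f_eps(0)
  varf <- 1 + eps * (sigma^2 - 1)                          # sigma_f^2
  4 * f0^2 * varf                                          # (1.5.23)
}
round(eff_cn(c(0, .03, .05, .10, .15)), 3)   # 0.637 0.758 0.833 1.000 1.134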
1.5.4 Asymptotic Power and Efficiency Results for the Test Based on $S(\theta)$

Consider the location model, (1.2.1), and assume that we have chosen a norm to fit the model
with gradient function $S(\theta)$. Consider the gradient test (1.5.2) of the hypotheses (1.5.1).
In Section 1.5.1, we showed that the power function of this test is nondecreasing with upper
limit one and that it is typically resolving. Further, we showed that for a fixed alternative,
the test is consistent. Thus the power will tend to one as the sample size increases.
Table 1.5.1: Efficiencies of the median relative to the mean for contaminated normal models.

  $\epsilon$    $e(\tilde{X}, \bar{X})$
  .00     .637
  .03     .758
  .05     .833
  .10    1.000
  .15    1.134
To offset this effect, we will let the alternative converge to the null value at a rate that
stabilizes the power away from one. This will enable us to compare two tests along the same
alternative sequence. Consider the null hypothesis $H_0 : \theta = 0$ versus $H_{A_n} : \theta = \theta_n$ where
$\theta_n = \theta^*/\sqrt{n}$ and $\theta^* > 0$. Recall that the asymptotic size $\alpha$ test based on $\overline{S}(0)$ rejects $H_0$ if
$\sqrt{n}\,\overline{S}(0)/\sigma(0) \geq z_{\alpha}$, where $1 - \Phi(z_{\alpha}) = \alpha$.
The following theorem is called the asymptotic power lemma. Its proof follows immediately
from expression (1.5.13).

Theorem 1.5.8. Assume that $S(0)$ is Pitman Regular with efficacy $c$. Then the asymptotic
local power along the sequence $\theta_n = \theta^*/\sqrt{n}$ is
$$ \gamma_S(\theta_n) = P_{\theta_n}\!\left[\sqrt{n}\,\overline{S}(0)/\sigma(0) \geq z_{\alpha}\right] = P_0\!\left[\sqrt{n}\,\overline{S}(-\theta_n)/\sigma(0) \geq z_{\alpha}\right] \rightarrow 1 - \Phi(z_{\alpha} - \theta^* c) , $$
as $n \rightarrow \infty$.
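As an illustration of the asymptotic power lemma, the sketch below (our own code) evaluates the local power $1 - \Phi(z_{\alpha} - \theta^* c)$ at the standard normal model for the sign test, where $c = 2f(0)$, and for the t-test, where $c = 1/\sigma_f = 1$.

alpha <- 0.05; za <- qnorm(1 - alpha); ts <- seq(0, 4, by = 0.5)
cbind(theta.star = ts,
      sign = 1 - pnorm(za - ts * 2 * dnorm(0)),  # efficacy c = 2 f(0)
      t    = 1 - pnorm(za - ts))                 # efficacy c = 1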
Note that larger values of the efficacy imply larger values of the asymptotic local power.

Definition 1.5.5. The Pitman asymptotic relative efficiency of one test relative to
another is defined to be $e(S_1, S_2) = c_1^2/c_2^2$.

Note that this is the same formula as the efficiency of one estimate relative to another
given in Definition 1.5.4. Therefore, the efficiency results discussed in Example 1.5.4
between the $L_1$ and $L_2$ estimates apply for the sign and t tests also. Hence, we have an
example in which the simple sign test is asymptotically more powerful than the t-test.
We can also develop a sample size interpretation for the asymptotic power. Suppose we
specify a power $\beta < 1$. Further, let $z_{\beta}$ be defined by $1 - \Phi(z_{\beta}) = \beta$. Then $1 - \Phi(z_{\alpha} - c n^{1/2} \theta_n) =
1 - \Phi(z_{\beta})$ and $z_{\alpha} - c n^{1/2} \theta_n = z_{\beta}$. Solving for $n$ yields
$$ n \doteq (z_{\alpha} - z_{\beta})^2 / (c^2 \theta_n^2) . \qquad (1.5.25) $$
Typically we take $\theta_n = k_n$ with $k_n$ small. Now if $S_1(0)$ and $S_2(0)$ are two Pitman Regular
asymptotically size $\alpha$ tests, then the ratio of sample sizes required to achieve the same
asymptotic power along the same sequence of alternatives is given by the approximation
$n_2/n_1 \doteq c_1^2/c_2^2$. This provides additional motivation for the above definition of the Pitman
efficiency of two tests. The initial development of asymptotic efficiency was done by Pitman
(1948) in an unpublished manuscript and later published by Noether (1955).
1.5.5 Efficiency Results for Confidence Intervals Based on $S(\theta)$

In this part we consider the length of the confidence interval as a measure of its efficiency.
Suppose that we specify $\gamma = 1 - \alpha$ for the confidence coefficient. Then let $z_{\alpha/2}$ be defined
by $1 - \Phi(z_{\alpha/2}) = \alpha/2$. Again we suppose throughout the discussion that the estimating
functions are Pitman Regular. Then the endpoints of the $\gamma \cdot 100$ percent confidence interval
are given asymptotically by $\hat{\theta}_L$ and $\hat{\theta}_U$ such that
$$ \sqrt{n}\,\frac{\overline{S}(\hat{\theta}_L)}{\sigma(0)} = z_{\alpha/2} \quad \mbox{and} \quad \sqrt{n}\,\frac{\overline{S}(\hat{\theta}_U)}{\sigma(0)} = -z_{\alpha/2} ; \qquad (1.5.26) $$
see (1.3.10) for the exact versions of the endpoints.
The next theorem provides the asymptotic behavior of the length of this interval and,
further, it shows that the standardized length of the confidence interval is a consistent
estimate of the asymptotic standard deviation of $\sqrt{n}\,\hat{\theta}$.

Theorem 1.5.9. Suppose $S(\theta)$ is a Pitman Regular estimating function with efficacy $c$. Let
$L$ be the length of the corresponding confidence interval. Then
$$ \frac{\sqrt{n}\,L}{2 z_{\alpha/2}} \stackrel{P}{\rightarrow} \frac{1}{c} . $$

Proof. Using the same argument as in Lemma 1.5.1, we can show that $\hat{\theta}_L$ and $\hat{\theta}_U$ are
bounded in probability when multiplied by $\sqrt{n}$. Hence, the above estimating equations can
be linearized to obtain, for example:
$$ z_{\alpha/2} = \sqrt{n}\,\overline{S}(\hat{\theta}_L)/\sigma(0) = \sqrt{n}\,\overline{S}(0)/\sigma(0) - c\,\sqrt{n}\,\hat{\theta}_L + o_P(1) . $$
This can then be solved to find:
$$ \sqrt{n}\,\hat{\theta}_L = \sqrt{n}\,\overline{S}(0)/(c\,\sigma(0)) - z_{\alpha/2}/c + o_P(1) . $$
When this is also done for $\hat{\theta}_U$ and the difference is taken, we have:
$$ n^{1/2}(\hat{\theta}_U - \hat{\theta}_L) = 2 z_{\alpha/2}/c + o_P(1) , $$
which concludes the argument.
From Theorem 1.5.7, $\hat{\theta}$ has an approximate normal distribution with variance $c^{-2}/n$. So
by Theorem 1.5.9, a consistent estimate of the standard error of $\hat{\theta}$ is
$$ SE(\hat{\theta}) = \frac{\sqrt{n}\,L}{2 z_{\alpha/2}} \cdot \frac{1}{\sqrt{n}} = \frac{L}{2 z_{\alpha/2}} . \qquad (1.5.27) $$
If the ratio of squared asymptotic lengths is used as a measure of efficiency, then the
efficiency of one confidence interval relative to another is again the ratio of the
squares of the efficacies.
The discussion of the properties of estimation, testing, and confidence interval construction
shows that, asymptotically at least, the relative merit of a procedure is measured by its
efficacy. This measure is the slope of the linear approximation of the standardized estimating
function that determines these procedures. In the comparison of $L_1$ and $L_2$ methods,
we have seen that the efficiency $e(L_1, L_2) = 4\sigma_f^2 f^2(0)$. There are other types of asymptotic
efficiency that have been studied in the literature along with finite sample versions of these
asymptotic efficiencies. The conclusions drawn from these other efficiencies are consistent
with the picture presented here. Finally, conclusions of simulation studies have also been
consistent with the material presented here. Hence, we will not discuss these other measures;
see Section 2.6 of Hettmansperger (1984a) for further references.
Example 1.5.5. Estimation of the Standard Error of the Sample Median.

Recall that the sample median, when properly standardized, has a limiting normal distribution.
Suppose we have a sample of size $n$ from $H(x) = F(x - \theta)$ where $\theta$ is the unknown
median. From Theorem 1.5.7, we know that the approximating distribution for $\hat{\theta}$, the sample
median, is normal with mean $\theta$ and variance $1/[4 n h^2(\theta)]$. We refer to this variance as
the asymptotic variance. This normal distribution can be used to approximate probabilities
concerning the sample median. When the underlying form of the distribution $H$ is unknown,
we must estimate this asymptotic variance. Theorem 1.5.9 provides one key to the estimation
of the asymptotic variance. The square root of the asymptotic variance is sometimes
called the asymptotic standard error of the sample median. We will discuss the estimation
of this standard error rather than the asymptotic variance.

As a simple example, in expression (1.5.27) take $\alpha = .05$, $z_{\alpha/2} \doteq 2$, and $k \doteq n/2 - n^{1/2}$;
then we have the following consistent estimate of the asymptotic standard error of the
median:
$$ SE(\mathrm{median}) \doteq [X_{(n/2 + n^{1/2})} - X_{(n/2 - n^{1/2})}]/4 . \qquad (1.5.28) $$
This simple estimate of the asymptotic standard error is based on the length of the 95%
confidence interval for the median. Sheather (1987) shows that the estimate can be improved
by using the interpolated confidence intervals discussed in Section 1.10. Of course, other
confidence intervals with different confidence coefficients can be used also. We recommend
using 90% or 95%; again, see McKean and Schrader (1984) and Sheather (1987). This SE
is computed by our R function onesampsgn for general $\alpha$. The default value of $\alpha$ is set at
0.05.
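A bare-bones version of the estimate (1.5.28) can be coded directly; the function below is our own sketch (onesampsgn uses the exact binomial critical value, so the two need not agree to all digits).

se_median_ci <- function(x) {
  n  <- length(x)
  xs <- sort(x)
  lo <- max(1, floor(n / 2 - sqrt(n)))
  hi <- min(n, ceiling(n / 2 + sqrt(n)))
  (xs[hi] - xs[lo]) / 4      # length of approximate 95% CI over 4
}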
There are other approaches to the estimation of this standard error. For example, we
could estimate the density $h(x)$ directly and then use $h_n(\hat{\theta})$, where $h_n$ is the density estimate.
Another possibility is to estimate the finite sample standard error of the sample median
directly. Sheather (1987) surveys these approaches. We will discuss one further possibility
here, namely the bootstrap. The bootstrap has gained wide attention recently because of
its versatility in estimation and testing in nonstandard situations. See Efron and Tibshirani
(1993) for a very readable account of the bootstrap.
If we know the underlying distribution $H(x)$, then we could estimate the standard error
of the median by repeatedly drawing samples with a computer from the distribution $H$.
Table 1.5.2: Generated N(0, 1) variates (placed in order)
-1.79756 -1.66132 -1.46531 -1.45333 -1.21163 -0.92866 -0.86812
-0.84697 -0.81584 -0.78912 -0.68127 -0.37479 -0.33046 -0.22897
-0.02502 -0.00186 0.09666 0.13316 0.17747 0.31737 0.33125
0.80905 0.88860 0.90606 0.99640 1.26032 1.46174 1.52549
1.60306 1.90116
If we have $B$ samples from $H$ and have computed and stored the $B$ values of the sample median,
then our estimate of the standard error of the median is simply the sample standard deviation
of these $B$ values. When $H$ is unknown we replace it by $H_n$, the empirical distribution
function, and proceed with the simulation. Later in the chapter we will encounter an example
where we want to compute a bootstrap p-value for a test; see Section ??. The bootstrap
approach based on $H_n$ is called the nonparametric bootstrap since nothing is assumed
about the form of the underlying distribution $H$. In another version, called the parametric
bootstrap, we suppose that we know the form of the underlying distribution $H$ but there are
some unknown parameters, such as the mean and variance. We use the sample to estimate
these unknown parameters, insert the values into $H$, and use this distribution to draw the
$B$ samples. In this book we will be concerned mainly with the nonparametric bootstrap and
we will use the generic term bootstrap to refer to this approach. In either case, ready access
to high speed computing makes this method appealing. The following example illustrates
the computations.
Example 1.5.6. Generated Data.

Using Minitab, the 30 data points in Table 1.5.2 were generated from a normal distribution
with mean 0 and variance 1. Thus, we know that the asymptotic standard error should
be about $1/[30^{1/2} \cdot 2 f(0)] = 0.23$. We will use this to check what happens if we try to estimate
the standard error from the data.

Using expression (1.3.16), the 95% confidence interval for the median is $(-0.789, 0.331)$.
Hence, the length of confidence interval estimate, given in expression (1.5.28), is $(0.331 +
0.789)/4 = 0.28$. A simple R function was written to bootstrap the sample; see Exercise
1.12.7. Using this function, we obtained 1000 bootstrap samples and the resulting standard
deviation of the 1000 bootstrap medians was 0.27. For this instance, the bootstrap procedure
essentially agrees with the length of confidence interval estimate.

Note that, from the data, the sample mean is 0.03575 and the sample standard deviation
is 1.04769. If we assume the underlying distribution $H$ is normal with unknown mean and
variance, we would use the parametric bootstrap. Hence, instead of sampling from the
empirical distribution function, we want to sample from a normal distribution with mean
0.03575 and standard deviation 1.04769. Using R (see Exercise 1.12.7), we obtained 1000
parametric bootstrapped samples. The sample standard deviation of the resulting medians
was 0.23, just the value we would expect. You should not expect to get the precise value
every time you bootstrap, either parametrically or nonparametrically. It is, however, a very
versatile method to use to estimate such quantities as standard errors of estimates and
p-values of tests.
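A minimal version of the bootstrap just described, along the lines of Exercise 1.12.7 (this is our own sketch, not the exercise's solution), is:

boot_se_median <- function(x, B = 1000, parametric = FALSE) {
  meds <- replicate(B, {
    xstar <- if (parametric) rnorm(length(x), mean(x), sd(x))  # parametric: fitted normal
             else sample(x, replace = TRUE)                    # nonparametric: draw from H_n
    median(xstar)
  })
  sd(meds)   # bootstrap estimate of the SE of the median
}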
An unusual aspect of this example is that the bootstrap distribution of the sample me-
dian can be found in closed form and does not have to be simulated as described above. The
variance of the sample median computed from the bootstrap distribution can then be found.
The result is another estimate of the variance of the sample median. This was discovered
independently by Maritz and Jarrett (1978) and Efron (1979). We do not pursue this devel-
opment here because in most cases we must simulate the bootstrap distribution and that is
where the real strength of the bootstrap approach lies. For an interesting comparison of the
various estimates of the variance of the sample median see McKean and Schrader (1984).
1.6 Robustness Properties of Norm-Based Inference

We have just considered the statistical properties of the inference procedures. We have looked
at ideas such as efficiency and power. We now turn to stability or robustness properties. By
this we mean how the inference procedures are affected by outliers or corruption of portions
of the data. Ideally, we would like procedures (tests and estimates) which do not respond
too quickly to a single outlying value when it is introduced into the sample. Further, we
would not like procedures that can be changed by arbitrary amounts by corrupting a small
amount of the data. Response to outliers is measured by the influence curve and response
to data corruption is measured by the breakdown value. We will introduce finite sample
versions of these concepts. They are easy to work with and, in the limit, they generally
equal the more abstract versions based on the study of statistical functionals. We consider
first the robustness properties of the estimates and secondly tests. As in the last section,
the discussion will be general but the $L_1$ and $L_2$ procedures will be discussed as we proceed.
The robustness properties of the procedures based on the weighted $L_1$ norm will be covered
in Sections 1.7 and 1.8. See Section A.5 of the Appendix for a development based on
functionals.
1.6.1 Robustness Properties of $\hat{\theta}$

We begin with the definition of breakdown for the estimator $\hat{\theta}$.

Definition 1.6.1. Estimation Breakdown. Let $\mathbf{x} = (x_1, \ldots, x_n)$ represent a realization of a
sample and let
$$ \mathbf{x}^{(m)} = (x_1^*, \ldots, x_m^*, x_{m+1}, \ldots, x_n) $$
represent the corruption of any $m$ of the $n$ observations. We define the bias of an estimator
$\hat{\theta}$ to be $\mathrm{bias}(m; \hat{\theta}, \mathbf{x}) = \sup |\hat{\theta}(\mathbf{x}^{(m)}) - \hat{\theta}(\mathbf{x})|$, where the sup is taken over all possible corrupted
samples $\mathbf{x}^{(m)}$. Note that we change only $x_1^*, \ldots, x_m^*$ while $x_{m+1}, \ldots, x_n$ are fixed at their
original values. If the bias is infinite, we say the estimate has broken down and the finite
sample breakdown value is given by
$$ \epsilon_n^* = \min\{ m/n : \mathrm{bias}(m; \hat{\theta}, \mathbf{x}) = \infty \} . \qquad (1.6.1) $$
This approach to breakdown is called replacement breakdown because observations are
replaced by corrupted values; see Donoho and Huber (1983) for more discussion of this
approach. Often there exists an integer $m$ such that $x_{(m)} \leq \hat{\theta} \leq x_{(n-m+1)}$ and either $\hat{\theta}$ tends
to $-\infty$ as $x_{(m)}$ tends to $-\infty$ or $\hat{\theta}$ tends to $+\infty$ as $x_{(n-m+1)}$ tends to $+\infty$. If $m^*$ is the smallest
such integer then $\epsilon_n^* = m^*/n$. Hodges (1967) was the first to introduce these ideas.

To remove the effects of sample size, the limit, when it exists, can be computed. In this
case we call $\lim \epsilon_n^* = \epsilon^*$ the asymptotic breakdown value.
Example 1.6.1. Breakdown Values for the $L_1$ and $L_2$ Estimates.

The $L_1$ estimate is the sample median. If the sample size is $n = 2k$ then it is easy to see
that when $x_{(k)}$ tends to $-\infty$, the median also tends to $-\infty$. Hence, the breakdown value
of the sample median is $k/n$, which tends to .5. By a similar argument, when the sample
size is $n = 2k + 1$, the breakdown value is $(k + 1)/n$ and it also tends to .5 as the sample
size increases. Hence, we say that the sample median is a 50% breakdown estimate. The $L_2$
estimate is the sample mean. A similar analysis shows that its breakdown value is $1/n$, which
tends to zero. Hence, we say the sample mean is a zero breakdown estimate. This sharply
contrasts the two estimates, since we see that the median is the most resistant estimate and
the sample mean is the least resistant estimate. In Exercise 1.12.13, the reader is asked to
show that the pseudo-median induced by the signed-rank norm, (1.3.25), has breakdown
.29.
We have just considered the effect of corrupting some of the observations. The estimate
breaks down if we can force the estimate to change by an arbitrary amount by changing
the observations over which we have control. Another important concept of stability entails
measuring the effect of the introduction of a single outlier. An estimate is stable or resistant
if it does not change by a large amount when the outlier is introduced. In particular, we
want the change to be bounded no matter what the value of the outlier.
Suppose we have a sample of observations $x_1, \ldots, x_n$ from a distribution centered at 0
and an estimate $\hat{\theta}_n$ based on these observations. By Pitman Regularity, Definition 1.5.3,
and Theorem 1.5.7, we have
$$ n^{1/2}\,\hat{\theta}_n = c^{-1} n^{1/2}\,\overline{S}(0)/\sigma(0) + o_P(1) , \qquad (1.6.2) $$
provided the true parameter is 0. Further, we often have a representation of $\overline{S}(0)$ as a sum
of independent random variables. We may have to make a projection of $\overline{S}(0)$ to achieve this;
see the next chapter for examples of projections. In any case, we then have the following
representation
$$ c^{-1} n^{1/2}\,\overline{S}(0)/\sigma(0) = n^{-1/2} \sum_{i=1}^n \Omega(x_i) + o_P(1) , \qquad (1.6.3) $$
where $\Omega(\cdot)$ is the function needed in the representation. When we combine the above two
statements we have
$$ n^{1/2}\,\hat{\theta}_n = n^{-1/2} \sum_{i=1}^n \Omega(x_i) + o_P(1) . \qquad (1.6.4) $$
Recall that the distribution that we are sampling is assumed to be centered at 0. The
difference $\hat{\theta}_n - 0$ is approximated by the average of $n$ independent and identically distributed
random variables. Since $\Omega(x_i)$ represents the effect of the $i$th observation on $\hat{\theta}_n$, it is called
the influence function.
The influence function approximates the rate of change of the estimate when an outlier
is introduced. Let $x_{n+1} = x^*$ represent a new, outlying, observation. Since $\hat{\theta}_n$ should be
roughly 0, we have
$$ (n + 1)\,\hat{\theta}_{n+1} - (n + 1)\,\hat{\theta}_n \doteq \Omega(x^*) $$
and
$$ \hat{\theta}_{n+1} - \hat{\theta}_n \doteq \frac{1}{n + 1}\,\Omega(x^*) , \qquad (1.6.5) $$
and this reveals the differential character of the influence function. Hampel (1974) developed
the influence function from the theory of von Mises differentiable functions. In Sections A.5
and A.5.2 of the Appendix, we use his formulation to derive several influence functions for
later situations. Here, though, we will identify influence functions for the estimates through
the approximations described above. We now illustrate this approach.
Example 1.6.2. Influence Function for the $L_1$ and $L_2$ Estimates.

We will briefly describe the influence functions for the sample median and the sample
mean, the $L_1$ and $L_2$ estimates. From Example 1.5.2 we have immediately that, for the
sample median,
$$ n^{1/2}\,\hat{\theta}_n \doteq n^{-1/2} \sum_{i=1}^n \frac{\mathrm{sgn}(X_i)}{2 f(0)} $$
and
$$ \Omega(x) = \frac{\mathrm{sgn}(x)}{2 f(0)} . $$
Note that the influence function is bounded but not continuous. Hence, outlying observations
cannot have an arbitrarily large effect on the estimate. It is this feature along
with the 50% breakdown property that makes the sample median the prototype of resistant
estimates. The sample mean, on the other hand, has an unbounded influence function. It is
easy to see that $\Omega(x) = x$, linear and unbounded. Hence, a single large outlier is sufficient to
carry the sample mean beyond any bound. The unbounded influence is connected to the 0
breakdown property. Hence, the $L_2$ estimate is the prototype of an estimate highly efficient
at a specified model, the normal model in this case, but not resistant. This means that quite
close to the model for which the estimate is optimal, the estimate may perform very poorly;
recall Table 1.5.1.
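The contrast between the two influence functions is easy to visualize through the finite sample difference quotient in (1.6.5), often called the sensitivity curve. The sketch below (our own code) plots $(n + 1)(\hat{\theta}_{n+1} - \hat{\theta}_n)$ as the outlier $x^*$ varies: the curve for the mean is linear and unbounded while that for the median is a bounded step.

set.seed(1)
x <- rnorm(20)
xstar <- seq(-10, 10, by = 0.5)
sc <- function(est)
  sapply(xstar, function(o) (length(x) + 1) * (est(c(x, o)) - est(x)))
matplot(xstar, cbind(sc(mean), sc(median)), type = "l", lty = 1:2,
        xlab = "outlier x*", ylab = "sensitivity curve")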
1.6.2 Breakdown Properties of Tests

We now turn to the issue of breakdown in testing hypotheses. The problems are a bit
different in this case since we typically want to move, by data corruption, a test statistic
into or out of a critical region. It is not a matter of sending the statistic beyond any finite
bound as it is in estimation breakdown.

Definition 1.6.2. Suppose that $V$ is a statistic for testing $H_0 : \theta = 0$ versus $H_A : \theta > 0$
and we reject the null hypothesis when $V \geq k$, where $P_0(V \geq k) = \alpha$ determines $k$. The
rejection breakdown of the test is defined by
$$ \epsilon_n^*(\mathrm{reject}) = \min\{ m/n : \inf_{\mathbf{x}} \sup_{\mathbf{x}^{(m)}} V \geq k \} , \qquad (1.6.6) $$
where the sup is taken over all possible corruptions of $m$ data points. Likewise the acceptance
breakdown is defined to be
$$ \epsilon_n^*(\mathrm{accept}) = \min\{ m/n : \sup_{\mathbf{x}} \inf_{\mathbf{x}^{(m)}} V < k \} . \qquad (1.6.7) $$
Rejection breakdown is the smallest portion of the data that can be corrupted to guarantee
that the test will reject. Acceptance breakdown is interpreted as the smallest portion of
the data that must be corrupted to guarantee that the test statistic will not be in the critical
region; i.e., the test is guaranteed to fail to reject the null hypothesis. We turn immediately
to a comparison of the $L_1$ and $L_2$ tests.
Example 1.6.3. Rejection Breakdown of the $L_1$ Test.

We first consider the one-sided sign test for testing $H_0 : \theta = 0$ versus $H_A : \theta > 0$. The
asymptotically size $\alpha$ test rejects the null hypothesis when $n^{-1/2} S_1(0) \geq z_{\alpha}$, the upper $\alpha$
quantile from a standard normal distribution. It is easier to see exactly what happens if we
convert the test to $S_1^+(0) = \sum I(X_i > 0) \geq n/2 + (n^{1/2} z_{\alpha})/2$. Now each time we make an
observation positive it makes $S_1^+(0)$ increase by one. Hence, if we wish to guarantee that the
test will reject, we make $m$ observations positive where $m^* = [n/2 + (n^{1/2} z_{\alpha})/2] + 1$, $[\cdot]$ the
greatest integer function. Then the rejection breakdown is
$$ \epsilon_n^*(\mathrm{reject}) = m^*/n \doteq \frac{1}{2} + \frac{z_{\alpha}}{2 n^{1/2}} . $$
Likewise,
$$ \epsilon_n^*(\mathrm{accept}) \doteq \frac{1}{2} - \frac{z_{\alpha}}{2 n^{1/2}} . $$
Note that the rejection breakdown converges down to the estimation breakdown and the
acceptance breakdown converges up to it.
We next turn to the one-sided Student's t-test. Acceptance breakdown for the t-test is
simple. By making a single observation approach $-\infty$, the t-statistic can be made negative;
hence, we can always guarantee acceptance with control of one observation. The rejection
breakdown is more interesting. If we increase an observation both the sample mean and the
sample standard deviation increase. Hence, it is not at all clear what will happen to the
t-statistic. In fact it is not sufficient to increase a single observation in order to force the
t-statistic to move into the critical region. We now show that the rejection breakdown for
the t-statistic is:
$$ \epsilon_n^*(\mathrm{reject}) = \frac{t_{\alpha}^2}{n - 1 + t_{\alpha}^2} \rightarrow 0 , \mbox{ as } n \rightarrow \infty , $$
where $t_{\alpha}$ is the upper $\alpha$ quantile from a t-distribution with $n - 1$ degrees of freedom. The
infimum part of the definition suggests that we set all observations at $-B < 0$ and then
change $m$ observations to $M > 0$. The result is
$$ \bar{x} = \frac{m M - (n - m) B}{n} \quad \mbox{and} \quad s^2 = \frac{m (n - m)(M + B)^2}{(n - 1) n} . $$
Putting these two quantities together, we have
$$ n^{1/2}\,\frac{\bar{x}}{s} = [m - (n - m) B/M]\left[\frac{n - 1}{m (n - m)(1 + B/M)^2}\right]^{1/2} \rightarrow \left[\frac{m (n - 1)}{n - m}\right]^{1/2} , $$
as $M \rightarrow \infty$. We now equate the limit to $t_{\alpha}$ and solve for $m$ to get $m = n t_{\alpha}^2/(n - 1 + t_{\alpha}^2)$
(actually we would take the greatest integer and add one). Then the rejection breakdown
is $m$ divided by $n$, as stated. Table 1.6.1 compares rejection breakdown values for the sign
and t-tests. We assume $\alpha = .05$ and the sample sizes are chosen so that the size of the sign
test is quite close to .05. For further discussion, see Ylvisaker (1977).
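The t column of Table 1.6.1 comes straight from the formula just derived; the sign column is based on exact binomial critical values, which the asymptotic expression $1/2 + z_{\alpha}/(2\sqrt{n})$ only approximates. A short sketch (our own code):

n  <- c(10, 13, 18, 30, 100)
t2 <- qt(0.95, n - 1)^2
round(t2 / (n - 1 + t2), 2)                   # .27 .21 .15 .09 .03
round(0.5 + qnorm(0.95) / (2 * sqrt(n)), 2)   # asymptotic approximation, sign test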
These definitions of breakdown assume a worst-case scenario. They assume that the
test statistic is as far away from the critical region (for rejection breakdown) as possible.
In practice, however, it may be the case that a test statistic is quite near the edge of the
critical region and only one observation is needed to change the decision from fail-to-reject
to reject. An alternative form of breakdown considers the average number of observations
that must be corrupted, conditional on the test statistic being in the acceptance region, to
force a rejection.
Let $M_R$ be the number of observations that must be corrupted to force a rejection; then
$M_R$ is a random variable. The expected rejection breakdown is defined to be
$$ \mathrm{Exp}\,\epsilon_n^*(\mathrm{reject}) = E_{H_0}[M_R \mid M_R > 0]/n . \qquad (1.6.8) $$
Note that we condition on $M_R > 0$ since $M_R = 0$ is equivalent to a rejection. It is left as
Exercise 1.12.14 to show that the expected breakdown can be computed with unconditional
expectation as
$$ \mathrm{Exp}\,\epsilon_n^*(\mathrm{reject}) = E_{H_0}[M_R]/[n(1 - \alpha)] . \qquad (1.6.9) $$
In the following example we illustrate this computation on the sign test and show how it
compares to the worst-case breakdown introduced earlier.
Table 1.6.1: Rejection breakdown values for size $\alpha = .05$ tests.

  n     Sign    t
  10    .71    .27
  13    .70    .21
  18    .67    .15
  30    .63    .09
  100   .58    .03
  ∞     .50    0

Table 1.6.2: Comparison of expected breakdown and worst-case breakdown for the size
$\alpha = .05$ sign test.

  n     Exp $\epsilon_n^*$(reject)    $\epsilon_n^*$(reject)
  10          .27                 .71
  13          .24                 .70
  18          .20                 .67
  30          .16                 .63
  100         .08                 .58
  ∞           0                   .50
Example 1.6.4. Expected Rejection Breakdown of the Sign Test.

Refer to Example 1.6.3. The one-sided sign test rejects when $\sum I(X_i > 0) \geq n/2 +
n^{1/2} z_{\alpha}/2$. Hence, given that we fail to reject the null hypothesis, we will need to change
(corrupt) $n/2 + n^{1/2} z_{\alpha}/2 - \sum I(X_i > 0)$ negative observations into positive ones. This is
precisely $M_R$ and $E[M_R] = n^{1/2} z_{\alpha}/2$. It follows that $\mathrm{Exp}\,\epsilon_n^*(\mathrm{reject}) = z_{\alpha}/[2 n^{1/2}(1 - \alpha)] \rightarrow 0$ as
$n \rightarrow \infty$, rather than the .5 which happens in the worst-case breakdown. Table 1.6.2 compares
the two types of rejection breakdown. This simple calculation clearly shows that even highly
resistant tests such as the sign test may break down quite easily. This is contrary to what the
worst-case breakdown analysis would suggest. For additional reading on test breakdown see
Coakley and Hettmansperger (1992). He, Simpson and Portnoy (1990) discuss asymptotic
test breakdown.
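The expected breakdown column of Table 1.6.2 can be reproduced from the closed form just obtained; the line below is our own check.

n <- c(10, 13, 18, 30, 100); alpha <- 0.05
round(qnorm(1 - alpha) / (2 * sqrt(n) * (1 - alpha)), 2)   # .27 .24 .20 .16 .08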
1.7 Inference and the Wilcoxon Signed-Rank Norm
In this section we develop the statistical properties for the procedures based on the Wilcoxon
signed-rank norm, ( 1.3.17), that was dened in Example 1.3.3 of Section 1.3. Recall that
the norm and its associated gradient function are given in expressions ( 1.3.17) and ( 1.3.24),
respectively. Recall for a sample X
1
, . . . , X
n
that the estimate of is the median of the
36 CHAPTER 1. ONE SAMPLE PROBLEMS
Walsh averages given by ( 1.3.25). As in Section 1.3, our hypotheses of interest are
H
0
: = 0 versus H
0
: ,= 0 . (1.7.1)
The level test associated with the signed-rank norm is
Reject H
0
in favor of H
A
, if [T(0)[ c , (1.7.2)
where c is such that P
0
[[T(0)[ c]. To complete the test we need to determine the null
distribution of T(0), which is given by Theorems 1.7.1 and 1.7.2.
In order to develop the statistical properties, in addition to ( 1.2.1), we assume that
h(x) is symmetrically distributed about . (1.7.3)
We refer to this as the symmetric location model. Under symmetry, by Theorem 1.2.1,
T(H) = , for all location functionals T.
1.7.1 Null Distribution Theory of T(0)

In addition to expression (1.3.24), a third representation of $T(0)$ will be helpful in establishing
its null distribution. Recall the definition of the anti-ranks, $D_1, \ldots, D_n$, given in
expression (1.3.19). Using these anti-ranks, we can write
$$ T(0) = \sum R(|X_i|)\,\mathrm{sgn}(X_i) = \sum j\,\mathrm{sgn}(X_{D_j}) = \sum j\,W_j , $$
where $W_j = \mathrm{sgn}(X_{D_j})$.
Lemma 1.7.1. Under $H_0$, $|X_1|, \ldots, |X_n|$ are independent of $\mathrm{sgn}(X_1), \ldots, \mathrm{sgn}(X_n)$.

Proof. Since $X_1, \ldots, X_n$ is a random sample from $H(x)$, it suffices to show that $P[|X_i| \leq
x, \mathrm{sgn}(X_i) = 1] = P[|X_i| \leq x]\,P[\mathrm{sgn}(X_i) = 1]$. But due to $H_0$ and the symmetry of $h(x)$ this
follows from the following string of equalities:
$$ P[|X_i| \leq x, \mathrm{sgn}(X_i) = 1] = P[0 < X_i \leq x] = H(x) - \frac{1}{2} = [2H(x) - 1]\,\frac{1}{2}
 = P[|X_i| \leq x]\,P[\mathrm{sgn}(X_i) = 1] . $$
Based on this lemma, the vector of ranks and, hence, the vector of anti-ranks $(D_1, \ldots, D_n)$,
are independent of the vector $(\mathrm{sgn}(X_1), \ldots, \mathrm{sgn}(X_n))$. Based on these facts, we can obtain
the distribution of $(W_1, \ldots, W_n)$, which we summarize in the following lemma; see Exercise
1.12.15 for its proof.

Lemma 1.7.2. Under $H_0$ and the symmetry of $h(x)$, $W_1, \ldots, W_n$ are iid random variables
with $P[W_i = 1] = P[W_i = -1] = 1/2$.
We can now easily derive the null distribution theory of $T(0)$, which we summarize in the
following theorems. Details are given in Exercise 1.12.16.

Theorem 1.7.1. Under $H_0$ and the symmetry of $h(x)$,
$$ T(0) \mbox{ is distribution free and its distribution is symmetric} \qquad (1.7.4) $$
$$ E_0[T(0)] = 0 \qquad (1.7.5) $$
$$ \mathrm{Var}_0(T(0)) = \frac{n(n + 1)(2n + 1)}{6} \qquad (1.7.6) $$
$$ \frac{T(0)}{\sqrt{\mathrm{Var}_0(T(0))}} \mbox{ has an asymptotically } N(0, 1) \mbox{ distribution} . \qquad (1.7.7) $$
The exact distribution of $T(0)$ cannot be found in closed form. We do, however, have
the following recursion formula; see Exercise 1.12.17.

Theorem 1.7.2. Consider the version of the signed-rank test statistic given by $T^+$, (1.3.28).
Let $p_n(k) = P[T^+ = k]$ for $k = 0, \ldots, \frac{n(n+1)}{2}$. Then
$$ p_n(k) = \frac{1}{2}\,[p_{n-1}(k) + p_{n-1}(k - n)] , \qquad (1.7.8) $$
where
$$ p_0(0) = 1 ; \; p_0(k) = 0 \mbox{ for } k \neq 0 ; \mbox{ and } p_n(k) = 0 \mbox{ for } k < 0 . $$
Using this formula, algorithms can be developed which obtain the null distribution of
the signed-rank test statistic. The moment generating function can also be inverted to find
the null distribution; see Hettmansperger (1984a, Section 2.2). As discussed in Section 1.3.1,
software is now available which computes critical values and p-values of the null distribution.
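For example, recursion (1.7.8) translates directly into a few lines of R; the sketch below (our own implementation) builds the entire null distribution of $T^+$ and can be checked against R's dsignrank.

psr <- function(n) {
  p <- 1                                 # p_0: point mass at k = 0
  for (m in 1:n) {
    q <- c(p, rep(0, m)) / 2             # term p_{m-1}(k)/2 on the new support
    idx <- (m + 1):length(q)
    q[idx] <- q[idx] + p / 2             # term p_{m-1}(k - m)/2
    p <- q
  }
  p                                      # P[T+ = k], k = 0, ..., n(n+1)/2
}
all.equal(psr(10), dsignrank(0:55, 10))  # TRUE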
Theorem 1.7.1 justifies the confidence interval for $\theta$ given in display (1.3.30); i.e., the
$(1 - \alpha)100\%$ confidence interval given by $[W_{(k+1)}, W_{((n(n+1)/2) - k)})$, where $W_{(i)}$ denotes the $i$th
ordered Walsh average and $P(T^+(0) \leq k) = \alpha/2$. Based on (1.7.7), $k$ can be approximated
as $k \doteq n(n+1)/4 - .5 - z_{\alpha/2}[n(n+1)(2n+1)/24]^{1/2}$. As noted in Section 1.3.1, the computation
of the estimate and confidence interval can be obtained by our R function onesampwil or the
R intrinsic function wilcox.test.
1.7.2 Statistical Properties
From our earlier analysis of the statistical properties of the L
1
and L
2
methods we see that
Pitman Regularity is crucial. In particular, we need to compute the Pitman ecacy which
determines the asymptotic variance of the estimate, the asymptotic local power of the test,
and the asymptotic length of the condence interval. In the following theorem we show that
the weighted L
1
gradient function is Pitman Regular and determine the ecacy. Then we
make some preliminary eciency comparisons with the L
1
and L
2
methods.
Theorem 1.7.3. Suppose that $h$ is symmetric and that $\int h^2(x)\,dx < \infty$. Let
$$ \overline{T}(\theta) = \frac{2}{n(n + 1)} \sum_{i \leq j} \mathrm{sgn}\!\left(\frac{X_i + X_j}{2} - \theta\right) . $$
Then the conditions of Definition 1.5.3 are satisfied and, thus, $\overline{T}(\theta)$ is Pitman Regular.
Moreover, the Pitman efficacy is given by
$$ c = \sqrt{12} \int_{-\infty}^{\infty} h^2(x)\,dx . \qquad (1.7.9) $$
Proof. Since we have the $L_1$ norm applied to the Walsh averages, the estimating function
is a nonincreasing step function with steps at the Walsh averages. Hence, (1.5.7) holds.
Next note that $h(-x) = h(x)$ and, hence,
$$ \mu(\theta) = E_{\theta}\,\overline{T}(0) = \frac{2}{n + 1}\,E_{\theta}\,\mathrm{sgn}(X_1) + \frac{n - 1}{n + 1}\,E_{\theta}\,\mathrm{sgn}\!\left(\frac{X_1 + X_2}{2}\right) . $$
Now
$$ E_{\theta}\,\mathrm{sgn}(X_1) = \int \mathrm{sgn}(x + \theta)\,h(x)\,dx = 1 - 2H(-\theta) , $$
and
$$ E_{\theta}\,\mathrm{sgn}\!\left(\frac{X_1 + X_2}{2}\right) = \iint \mathrm{sgn}[(x + y)/2 + \theta]\,h(x)\,h(y)\,dx\,dy = \int [1 - 2H(-2\theta - y)]\,h(y)\,dy . $$
Differentiate with respect to $\theta$ and set $\theta = 0$ to get
$$ \mu'(0) = \frac{4 h(0)}{n + 1} + \frac{4(n - 1)}{n + 1} \int_{-\infty}^{\infty} h^2(y)\,dy \rightarrow 4 \int h^2(y)\,dy . $$
The finiteness of the integral is sufficient to ensure that the derivative can be passed through
the integral; see Hodges and Lehmann (1961) or Olshen (1967). Hence, (1.5.8) also holds.
We next establish Condition (1.5.9). Since
$$ \overline{T}(\theta) = \frac{2}{n(n + 1)} \sum_{i=1}^n \mathrm{sgn}(X_i - \theta) + \frac{2}{n(n + 1)} \sum_{i < j} \mathrm{sgn}\!\left(\frac{X_i + X_j}{2} - \theta\right) , $$
the first term is of smaller order and we need only consider the second term. Now, for $b > 0$,
let
$$ V_n = \frac{2}{n(n + 1)} \sum_{i < j} \left[ \mathrm{sgn}\!\left(\frac{X_i + X_j}{2} - n^{-1/2} b\right) - \mathrm{sgn}\!\left(\frac{X_i + X_j}{2}\right) \right]
 = -\frac{4}{n(n + 1)} \sum_{i < j} I\!\left(0 < \frac{X_i + X_j}{2} \leq n^{-1/2} b\right) . $$
Hence,
$$ n \mathrm{Var}(V_n) = \frac{16 n}{n^2 (n + 1)^2}\,E \sum_{i < j} \sum_{s < t} (I_{ij} I_{st} - E I_{ij}\,E I_{st}) , $$
where $I_{ij} = I(0 < (X_i + X_j)/2 \leq n^{-1/2} b)$. This becomes
$$ n \mathrm{Var}(V_n) = \frac{16 n^2 (n - 1)}{2 n^2 (n + 1)^2}\,\mathrm{Var}(I_{12}) + \frac{16 n^2 (n - 1)(n - 2)}{2 n^2 (n + 1)^2}\,[E I_{12} I_{13} - E I_{12}\,E I_{13}] . $$
The first term tends to zero since it behaves like $1/n$. In the second term, consider $|E I_{12} I_{13} -
E I_{12}\,E I_{13}| \leq E I_{12} + E^2 I_{12} = E I_{12}(1 + E I_{12})$. Now, as $n \rightarrow \infty$,
$$ E I_{12} = P\!\left(0 < \frac{X_1 + X_2}{2} \leq n^{-1/2} b\right) = \int [H(2 n^{-1/2} b - x) - H(-x)]\,h(x)\,dx \rightarrow 0 . $$
Hence, by Theorem 1.5.6, Condition (1.5.9) is true. Finally, asymptotic normality of the
null distribution is established in Theorem 1.7.1, which also yields $n \mathrm{Var}_0\,\overline{T}(0) \rightarrow 4/3 = \sigma^2(0)$.
It follows that the Pitman efficacy is
$$ c = \frac{4 \int h^2(y)\,dy}{\sqrt{4/3}} = \sqrt{12} \int h^2(y)\,dy . $$
For future reference we display the asymptotic linearity result:
$$ \frac{T(\theta)}{\sqrt{n(n + 1)(2n + 1)/6}} = \frac{T(0)}{\sqrt{n(n + 1)(2n + 1)/6}} - \sqrt{12} \int_{-\infty}^{\infty} h^2(x)\,dx\;\sqrt{n}\,\theta + o_p(1) , \qquad (1.7.10) $$
for $\sqrt{n}\,|\theta| \leq B$, where $B > 0$.
An immediate consequence of this theorem and Theorem 1.5.7 is that
$$ \sqrt{n}\,(\hat{\theta} - \theta) \stackrel{D}{\rightarrow} Z \sim N\!\left(0, \; 1\Big/\left[12\left(\int h^2(t)\,dt\right)^2\right]\right) , \qquad (1.7.11) $$
and we thus have the limiting distribution of the median of the Walsh averages. Exercise
1.12.20 shows that $\int h^2(t)\,dt < \infty$ when $h$ has finite Fisher information.
From our general discussion, a simple estimate of the standard error of the median of
the Walsh averages is proportional to the length of a distribution-free confidence interval.
Consider the $(1 - \alpha)100\%$ confidence interval given by $[W_{(k+1)}, W_{((n(n+1)/2) - k)})$, where $W_{(i)}$
denotes the $i$th ordered Walsh average and $P(T^+(0) \leq k) = \alpha/2$. Then by expression
(1.5.27), a consistent estimate of the SE of the median of the Walsh averages (medWA) is
$$ SE(\mathrm{medWA}) = \frac{W_{((n(n+1)/2) - k)} - W_{(k+1)}}{2 z_{\alpha/2}} . \qquad (1.7.12) $$
Our R function onesampwil computes this standard error for general $\alpha$ (default is set at
0.05). We will have more to say about this particular $c$ in the next chapter, where we will
encounter it in the two-sample location model and later in the linear model, where a better
estimator of this SE is presented.
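A direct sketch of the estimate and the SE in (1.7.12), written from the formulas rather than taken from onesampwil (and using a nearby attainable critical value, so it is approximate for small n):

hl_se <- function(x, alpha = 0.05) {
  n  <- length(x)
  wa <- sort(outer(x, x, "+")[lower.tri(diag(n), diag = TRUE)] / 2)  # Walsh averages
  k  <- qsignrank(alpha / 2, n) - 1      # P(T+ <= k) roughly alpha/2
  list(estimate = median(wa),            # median of the Walsh averages
       SE = (wa[n * (n + 1) / 2 - k] - wa[k + 1]) / (2 * qnorm(1 - alpha / 2)))
}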
From Example 1.5.3 and Definition 1.5.4, we have that the asymptotic relative efficiency
between the signed-rank Wilcoxon process and the $L_2$ process is given by
$$ e(\mathrm{Wilcoxon}, L_2) = 12\,\sigma_h^2 \left(\int h^2(x)\,dx\right)^2 , \qquad (1.7.13) $$
where $h$ is the underlying density with variance $\sigma_h^2$.

In the following example, we consider the contaminated normal distribution and then
find the efficiency of the rank methods relative to the $L_1$ and $L_2$ methods.
Example 1.7.1. Asymptotic Relative Efficiency for Contaminated Normal Distributions.

Let $f_{\epsilon}(x)$ denote the pdf of the contaminated normal distribution used in Example 1.5.4;
the proportion of contamination is $\epsilon$ and the variance of the contaminated part is 9. A
straightforward computation shows that
$$ \int f_{\epsilon}^2(y)\,dy = \frac{(1 - \epsilon)^2}{2\sqrt{\pi}} + \frac{\epsilon^2}{6\sqrt{\pi}} + \frac{\epsilon(1 - \epsilon)}{\sqrt{5\pi}} , $$
and we use this in the formula for $c$ given above. The efficacies for the $L_1$ and $L_2$ are given
in Example 1.5.4. We first consider the special case of $\epsilon = 0$, corresponding to an underlying
normal distribution. In this case we have for the rank methods $c_R^2 = 12/(4\pi) = 3/\pi \doteq .955$,
for the $L_1$ methods $c_1^2 = 2/\pi \doteq .637$, and for the $L_2$ methods $c_2^2 = 1$. We have already seen
that the efficiency $e_{\mathrm{normal}}(L_1, L_2) = c_1^2/c_2^2 = .637$ from the first line of Table 1.5.1. We
now have
$$ e_{\mathrm{normal}}(\mathrm{Wilcoxon}, L_2) = 3/\pi \doteq .955 \quad \mbox{and} \quad e_{\mathrm{normal}}(\mathrm{Wilcoxon}, L_1) = 1.5 . \qquad (1.7.14) $$
The efficiency of the rank methods relative to the $L_2$ methods is extraordinary. It says that
even at the distribution for which the t-test is uniformly most powerful, the Wilcoxon signed
rank test is almost as efficient. This means that replacing the values of the observations by
their ranks (retaining only the order information) does not affect the statistical properties of
the test. This was considered highly nonintuitive in the 1950s, since nonparametric methods
were thought of as quick and dirty. Now they must be considered highly efficient competitors
of the optimal methods and, in addition, they are more robust than the optimal methods.
This provides powerful motivation for the continued study of rank methods in other statistical
models such as the two-sample location model and the linear model. The early work in the
area of efficiency of rank methods is due largely to Lehmann and his students. See Lehmann
and Hodges (1956, 1961) for two important early papers and Lehmann (1975, Appendix) for
more discussion.

We complete this example with a table of efficiencies of the rank methods relative to the
$L_1$ and $L_2$ methods for the contaminated normal model with $\sigma = 3$. Table 1.7.1 shows these
Table 1.7.1: Efficiencies of the Rank, $L_1$, and $L_2$ methods for the Contaminated Normal
Distribution.

  $\epsilon$    $e(L_1, L_2)$   $e(R, L_1)$   $e(R, L_2)$
  .00      .637          1.500          .955
  .01      .678          1.488         1.009
  .03      .758          1.462         1.108
  .05      .833          1.436         1.196
  .10     1.000          1.373         1.373
  .15     1.134          1.320         1.497
efficiencies and extends Table 1.5.1. As $\epsilon$ increases, the weight in the tails of the distribution
also increases. Note that the efficiencies of both the $L_1$ and rank methods relative to the $L_2$
methods increase with $\epsilon$. On the other hand, the efficiency of the rank methods relative to
the $L_1$ methods decreases slightly. The rank methods are still more efficient; however, this
illustrates the fact that the $L_1$ methods are good for heavy-tailed distributions. The overall
implication of this example is that the $L_2$ methods, such as the sample mean, the t-test and
confidence interval, are not particularly efficient once the underlying distribution departs
from the normal distribution. Further, the rank methods such as the Wilcoxon signed rank
test, confidence interval, and the median of the Walsh averages are surprisingly efficient,
even at the normal distribution. Note that the rank methods are more efficient than the $L_2$
methods even for 1% contamination.
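Table 1.7.1 can be reproduced, to rounding, from $\int f_{\epsilon}^2$ above together with the efficacies of Example 1.5.4; the function below is our own sketch.

are_cn <- function(eps, sig = 3) {
  intf2 <- (1 - eps)^2 / (2 * sqrt(pi)) + eps^2 / (2 * sig * sqrt(pi)) +
           2 * eps * (1 - eps) / sqrt(2 * pi * (1 + sig^2))
  varf  <- 1 + eps * (sig^2 - 1)
  f0    <- ((1 - eps) + eps / sig) * dnorm(0)
  c(e.L1.L2 = 4 * f0^2 * varf,            # Example 1.5.4
    e.R.L1  = 12 * intf2^2 / (4 * f0^2),  # c_R^2 / c_1^2
    e.R.L2  = 12 * varf * intf2^2)        # (1.7.13)
}
round(t(sapply(c(0, .01, .03, .05, .10, .15), are_cn)), 3)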
Finally, the following theorem shows that the Wilcoxon signed rank statistic never loses
much efficiency relative to the t-statistic. Let $\mathcal{F}_s$ denote the family of distributions which
have symmetric densities and finite Fisher information; see Exercise 1.12.20.

Theorem 1.7.4. Let $X_1, \ldots, X_n$ be a random sample from $H \in \mathcal{F}_s$. Then
$$ \inf_{\mathcal{F}_s} e(\mathrm{Wilcoxon}, L_2) = 0.864 . \qquad (1.7.15) $$
Proof. By (1.7.13), $e(\mathrm{Wilcoxon}, L_2) = 12\,\sigma_h^2 \left(\int h^2(x)\,dx\right)^2$. If $\sigma_h^2 = \infty$ then $e(\mathrm{Wilcoxon}, L_2) >
.864$; hence, we can restrict attention to $H \in \mathcal{F}_s$ such that $\sigma_h^2 < \infty$. As Exercise 1.12.21
indicates, $e(\mathrm{Wilcoxon}, L_2)$ is location and scale invariant, so we can further assume that
$h$ is symmetric about 0 and $\sigma_h^2 = 1$. The problem, then, is to minimize $\int h^2$ subject to
$\int h = \int x^2 h = 1$ and $\int x h = 0$. This is equivalent to minimizing
$$ \int h^2 + 2b \int x^2 h - 2 b a^2 \int h , \qquad (1.7.16) $$
where $a$ and $b$ are positive constants to be determined later. We now write (1.7.16) as
$$ \int \left[ h^2 + 2b(x^2 - a^2) h \right] = \int_{|x| \leq a} \left[ h^2 + 2b(x^2 - a^2) h \right] + \int_{|x| > a} \left[ h^2 + 2b(x^2 - a^2) h \right] . \qquad (1.7.17) $$
First complete the square on the first term on the right side of (1.7.17) to get
$$ \int_{|x| \leq a} \left[ h + b(x^2 - a^2) \right]^2 - \int_{|x| \leq a} b^2 (x^2 - a^2)^2 . \qquad (1.7.18) $$
Now (1.7.17) is equal to the two terms of (1.7.18) plus the second term on the right side of
(1.7.17). We can now write the density that minimizes (1.7.16).

If $|x| > a$ take $h(x) = 0$, since $x^2 > a^2$, and if $|x| \leq a$ take $h(x) = b(a^2 - x^2)$, since the
integral in the first term of (1.7.18) is nonnegative. We can now determine the values of $a$
and $b$ from the side conditions. From $\int h = 1$, we have
$$ \int_{-a}^{a} b(a^2 - x^2)\,dx = 1 , $$
which implies that $a^3 b = \frac{3}{4}$. Further, from $\int x^2 h = 1$ we have
$$ \int_{-a}^{a} x^2\,b(a^2 - x^2)\,dx = 1 , $$
from which $a^5 b = \frac{15}{4}$. Hence, solving for $a$ and $b$ yields $a = \sqrt{5}$ and $b = 3\sqrt{5}/100$. Now
$$ \int h^2 = \int_{-\sqrt{5}}^{\sqrt{5}} \left[ \frac{3\sqrt{5}}{100}\,(5 - x^2) \right]^2 dx = \frac{3\sqrt{5}}{25} , $$
which leads to the result,
$$ \inf_{\mathcal{F}_s} e(\mathrm{Wilcoxon}, L_2) = 12 \left( \frac{3\sqrt{5}}{25} \right)^2 = \frac{108}{125} \doteq 0.864 . $$
1.7.3 Robustness Properties

We complete this section with a discussion of the breakdown point of the estimate and test
and a heuristic derivation of the influence function of the estimate. In Example 1.6.1 we
discussed the breakdown of the sample median and mean. In those cases we saw that the
median is the most resistant while the mean is the least resistant. In Exercise 1.12.13
you are asked to show that the breakdown point of the median of the Walsh averages, the
R-estimate, is roughly .29. Our next result gives the influence function of $\hat{\theta}$.

Theorem 1.7.5. The influence function of $\hat{\theta} = \mathrm{med}_{i \leq j}\,(x_i + x_j)/2$ is given by:
$$ \Omega(x) = \frac{H(x) - 1/2}{\int_{-\infty}^{\infty} h^2(t)\,dt} . $$
1.7. INFERENCE AND THE WILCOXON SIGNED-RANK NORM 43
We sketch a derivation of this result here. A rigorous development is offered in Section A.5 of the Appendix. From Theorems 1.7.3 and 1.5.6 we have

$$n^{1/2}\overline T(\theta)/\sigma(0) \doteq n^{1/2}\overline T(0)/\sigma(0) - c\,n^{1/2}\theta\,,$$

and

$$n^{1/2}\hat\theta_n \doteq n^{1/2}\overline T(0)/(c\,\sigma(0))\,,$$

where $\sigma(0) = (4/3)^{1/2}$ and $c = (12)^{1/2}\int h^2(t)\,dt$. Make these substitutions to get

$$\hat\theta_n \doteq \frac{1}{n(n+1)\,2\int h^2(t)\,dt}\sum_{i\le j}\mathrm{sgn}\left(\frac{X_i + X_j}{2}\right).$$

Now introduce an outlier $x_{n+1} = x^*$ and take the difference between $\hat\theta_{n+1}$ and $\hat\theta_n$. The result is

$$2\int h^2(t)\,dt\left[(n+2)\hat\theta_{n+1} - n\hat\theta_n\right] \doteq \frac{1}{n+1}\sum_{i=1}^{n+1}\mathrm{sgn}\left(\frac{x_i + x^*}{2}\right).$$

We can replace $n+2$ and $n+1$ by $n$ where convenient without affecting the asymptotics. Using the symmetry of the density of $H$, we have

$$\frac1n\sum_{i=1}^{n}\mathrm{sgn}\left(\frac{x_i + x^*}{2}\right) \doteq 1 - 2H_n(-x^*) \doteq 1 - 2H(-x^*) = 2H(x^*) - 1\,.$$

It now follows that $(n+1)(\hat\theta_{n+1} - \hat\theta_n) \doteq \Omega(x^*)$, given in the statement of the theorem; see the discussion of the influence function in Section 1.6.

Note that we have a bounded influence function, since the cdf $H$ is a bounded function. Further, it is continuous, unlike the influence function of the median. Finally, as an additional check, note that $E\,\Omega^2(X) = \frac{1/12}{\left[\int h^2(t)\,dt\right]^2} = 1/c^2$, the asymptotic variance of $n^{1/2}\hat\theta$.
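As a quick numerical illustration of this check (a sketch of ours using base R's integrate), at the standard normal $\int h^2 = 1/(2\sqrt\pi)$, so $1/c^2 = \pi/3 \approx 1.047$:

inth2 <- integrate(function(t) dnorm(t)^2, -Inf, Inf)$value   # 1/(2*sqrt(pi)) = 0.2821
Eomega2 <- integrate(function(x) ((pnorm(x) - 0.5) / inth2)^2 * dnorm(x),
                     -Inf, Inf)$value                          # pi/3 = 1.0472 = 1/c^2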
Let $\hat\theta_c = \mathrm{med}_{i,j}\left\{(X_i - cX_j)/(1-c)\right\}$ for $-1 \le c < 1$. This extension of the Hodges-Lehmann estimate, (1.3.25), has some very interesting robustness properties for $c > 0$. The influence function of $\hat\theta_c$ is not only bounded but also redescending, similar to the most robust M-estimates. In addition, $\hat\theta_c$ has 50% breakdown. For a complete discussion of this estimate see Maritz, Wu and Staudte (1977) and Brown and Hettmansperger (1994).
In the next theorem we develop the test breakdown for the Wilcoxon signed rank test.

Theorem 1.7.6. The rejection breakdown, Definition 1.6.2, for the Wilcoxon signed rank test is

$$\epsilon^*_n \doteq 1 - \left[\frac12 - \frac{z_\alpha}{(3n)^{1/2}}\right]^{1/2} \longrightarrow 1 - \frac{1}{2^{1/2}} \doteq .29\,.$$
Table 1.7.2: Rejection breakdown values for size $\alpha = .05$ tests.

   n     Sign     t     Signed-rank Wilcoxon
  10     .71     .27     .57
  13     .70     .21     .53
  18     .67     .15     .48
  30     .63     .09     .43
 100     .58     .03     .37
  ∞      .50     0       .29
Proof. Consider the form $T^+(0) = \sum\sum I[(x_i + x_j)/2 > 0]$, where the double sum is over all $i \le j$. The asymptotically size $\alpha$ test rejects $H_0: \theta = 0$ in favor of $H_A: \theta > 0$ when $T^+(0) \ge c \doteq n(n+1)/4 + z_\alpha\left[n(n+1)(2n+1)/24\right]^{1/2}$. Now we must guarantee that $T^+(0)$ is in the critical region. This requires at least $c$ positive Walsh averages. Let $x_{(1)} \le \cdots \le x_{(n)}$ be the ordered observations. Then contamination of $x_{(n)}$ results in $n$ contaminated Walsh averages, namely those Walsh averages that include $x_{(n)}$. Contamination of $x_{(n-1)}$ yields $n-1$ additional contaminated Walsh averages. When we proceed in this way, contamination of the $b$ ordered values $x_{(n)}, \ldots, x_{(n-b+1)}$ yields $n + (n-1) + \cdots + (n-b+1) = [n(n+1)/2] - [(n-b)(n-b+1)/2]$ contaminated Walsh averages. We now set $[n(n+1)/2] - [(n-b)(n-b+1)/2] \doteq c$ and solve the resulting quadratic for $b$. We must solve $b^2 - (2n+1)b + 2c \doteq 0$. The appropriate root in this case is

$$b \doteq \frac{2n + 1 - \left[(2n+1)^2 - 8c\right]^{1/2}}{2}\,.$$

Substituting the approximate critical value for $c$, dividing by $n$, and ignoring higher order terms leads to the stated result.
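The finite-sample values are easy to evaluate from the proof; the following R sketch (the function name rejbdwil is ours, for illustration) computes $b/n$ from the quadratic root above and reproduces the Wilcoxon column of Table 1.7.2:

# Finite-sample rejection breakdown of the Wilcoxon signed rank test.
rejbdwil <- function(n, alpha = 0.05) {
  cv <- n * (n + 1) / 4 + qnorm(1 - alpha) * sqrt(n * (n + 1) * (2 * n + 1) / 24)
  b  <- (2 * n + 1 - sqrt((2 * n + 1)^2 - 8 * cv)) / 2   # contaminated order statistics
  b / n
}
round(sapply(c(10, 13, 18, 30, 100), rejbdwil), 2)   # .57 .53 .48 .43 .37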
Table 1.7.2 displays the finite rejection breakdowns of the Wilcoxon signed rank test over the same sample sizes as the rejection breakdowns of the sign test given in Table 1.6.1. For convenience we have reproduced the results for the sign and $t$ tests also. The rejection breakdown for the Wilcoxon test converges from above to the estimation breakdown of .29. The Wilcoxon test is more resistant than the $t$-test but not as resistant as the simple sign test. It is interesting to note, from the discussion of efficiency, that we can now achieve high efficiency and not pay the price in lack of robustness. The rank based methods seem to be a very attractive alternative to the highly resistant but relatively inefficient (at the normal model) $L_1$ methods and the highly efficient (at the normal model) but nonrobust $L_2$ methods.
1.8 Inference Based on General Signed-Rank Norms

In this section, we develop properties for a generalized signed-rank process. It includes the $L_1$ and the weighted $L_1$ as special cases. The development is similar to that of the weighted $L_1$, so a brief sketch suffices. For $x \in R^n$, consider the function

$$\|x\|_{\varphi^+} = \sum_{i=1}^{n} a^+(R|x_i|)\,|x_i|\,, \qquad (1.8.1)$$

where the scores $a^+(i)$ are generated as $a^+(i) = \varphi^+(i/(n+1))$ for a positive valued, nondecreasing, square-integrable function $\varphi^+(u)$ defined on the interval $(0,1)$. The proof that $\|\cdot\|_{\varphi^+}$ is a norm on $R^n$ follows in the same way as in the weighted $L_1$ case; see the proof of Theorem 1.3.2 and Exercise 1.12.22. The gradient function associated with this norm is

$$T_{\varphi^+}(\theta) = \sum_{i=1}^{n} a^+(R|X_i - \theta|)\,\mathrm{sgn}(X_i - \theta)\,. \qquad (1.8.2)$$
Note that it reduces to the $L_1$ norm if $\varphi^+(u) \equiv 1$ and the weighted $L_1$, Wilcoxon signed-rank, norm if $\varphi^+(u) = u$. A family of simple score functions between the weighted $L_1$ and the $L_1$ are of the form

$$\varphi^+_c(u) = \begin{cases} u & 0 < u < c\\ c & c \le u < 1 \end{cases}\,, \qquad (1.8.3)$$

where the parameter $c$ is between 0 and 1. These scores were proposed by Policello and Hettmansperger (1976); see, also, Hogg (1974). The frequently used normal scores are generated by the score function,

$$\varphi^+_\Phi(u) = \Phi^{-1}\left(\frac{u+1}{2}\right), \qquad (1.8.4)$$

where $\Phi$ is the standard normal distribution function. Note that $\varphi^+_\Phi(u)$ is the inverse cdf (or quantile function) of the absolute value of a standard normal random variable. The normal scores were originally proposed by Fraser (1957).
For the location model (1.2.1), the estimate of $\theta$ based on the norm (1.8.1) is the value of $\theta$ which minimizes the distance $\|X - \theta 1\|_{\varphi^+}$ or, equivalently, solves the equation

$$T_{\varphi^+}(\hat\theta) \doteq 0\,. \qquad (1.8.5)$$

A simple tracing algorithm suffices to compute $\hat\theta$. As Exercise 1.12.18 shows, $T_{\varphi^+}(\theta)$ is a decreasing step function of $\theta$ which steps down only at the Walsh averages. So first sort the Walsh averages. Next select a starting value $\hat\theta^{(0)}$, such as the median of the Walsh averages, which corresponds to the signed-rank Wilcoxon scores. Then proceed through the sorted Walsh averages left or right, depending on whether $T_{\varphi^+}(\hat\theta^{(0)})$ is negative or positive. The algorithm continues until the sign of $T_{\varphi^+}(\theta)$ changes. This is the algorithm behind our RBR function onesampr which solves equation (1.8.5) for general score functions; see Exercise 1.12.33. Also, the linear searches discussed in Chapter 3, Section 3.7.3, can be used to compute $\hat\theta$.
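The following R code sketches this tracing algorithm under the stated assumptions. The names tphiplus and traceest are ours, for illustration only; the RBR function onesampr is the production implementation.

# Sketch of the tracing algorithm for solving T_phi+(theta) = 0, equation (1.8.5).
# phi is the score-generating function phi^+(u); the default phi(u) = u gives
# the signed-rank Wilcoxon scores.
tphiplus <- function(theta, x, phi) {     # the gradient function (1.8.2)
  z <- x - theta
  sum(phi(rank(abs(z)) / (length(x) + 1)) * sign(z))
}
traceest <- function(x, phi = function(u) u) {
  wa <- outer(x, x, "+") / 2
  wa <- sort(wa[upper.tri(wa, diag = TRUE)])   # sorted Walsh averages, i <= j
  k <- ceiling(length(wa) / 2)                 # start near the median Walsh average
  # T_phi+ is a decreasing step function stepping only at the Walsh averages,
  # so walk right while it is positive and left while it is negative.
  while (k < length(wa) && tphiplus(wa[k], x, phi) > 0) k <- k + 1
  while (k > 1 && tphiplus(wa[k], x, phi) < 0) k <- k - 1
  wa[k]
}

With the default Wilcoxon scores, traceest returns (approximately) the median of the Walsh averages, the Hodges-Lehmann estimate.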
To determine the corresponding functional, note that we can write $R|X_i - \theta| = \#_j\{\theta - |X_i - \theta| \le X_j \le |X_i - \theta| + \theta\}$. Let $H_n$ denote the empirical distribution function of the sample $X_1, \ldots, X_n$ and let $H_n^-$ denote the left limit of $H_n$. We can then write the defining equation of $\hat\theta$ as,

$$\int \varphi^+\left(H_n(|x - \theta| + \theta) - H_n^-(\theta - |x - \theta|)\right)\mathrm{sgn}(x - \theta)\,dH_n(x) = 0\,,$$

which converges to

$$\gamma(\theta) = \int \varphi^+\left(H(|x - \theta| + \theta) - H(\theta - |x - \theta|)\right)\mathrm{sgn}(x - \theta)\,dH(x) = 0\,. \qquad (1.8.6)$$

For convenience, a second representation of $\gamma(\theta)$ can be obtained if we extend $\varphi^+(u)$ to the interval $(-1, 0)$ as follows:

$$\varphi^+(t) = -\varphi^+(-t)\,, \quad \text{for } -1 < t < 0\,. \qquad (1.8.7)$$

Using this extension, the functional $\theta = T(H)$ is the solution of

$$\gamma(\theta) = \int \varphi^+\left(H(x) - H(2\theta - x)\right)dH(x) = 0\,. \qquad (1.8.8)$$

Compare expressions (1.8.8) and (1.3.26).
The level $\alpha$ test of the hypotheses (1.3.6) based on $T_{\varphi^+}(0)$ is

$$\text{Reject } H_0 \text{ in favor of } H_A, \text{ if } |T_{\varphi^+}(0)| \ge c\,, \qquad (1.8.9)$$

where $c$ solves $P_0\left[|T_{\varphi^+}(0)| \ge c\right] = \alpha$. We briefly develop the statistical and robustness properties of this test and the estimator $\hat\theta_{\varphi^+}$ in the next two subsections.
1.8.1 Null Properties of the Test

For this subsection on null properties and the following subsection on efficiency properties of the test (1.8.9), we will assume that the sample $X_1, \ldots, X_n$ follows the symmetric location model, (1.7.3), with common symmetric density function $h(x) = f(x - \theta)$, where $f(x)$ is symmetric about 0. Let $H(x)$ denote the distribution function associated with $h(x)$.

As in Section 1.7.1, we can express $T_{\varphi^+}(0)$ in terms of the anti-ranks as

$$T_{\varphi^+}(0) = \sum a^+(R(|X_i|))\,\mathrm{sgn}(X_i) = \sum a^+(j)\,\mathrm{sgn}(X_{D_j}) = \sum a^+(j)W_j\,; \qquad (1.8.10)$$

see the corresponding expression (1.3.20) for the weighted $L_1$ norm. Recall that under $H_0$ and the symmetry of $h(x)$, the variables $W_1, \ldots, W_n$ are iid with $P[W_i = 1] = P[W_i = -1] = 1/2$ (Lemma 1.7.2). Thus we immediately have that $T_{\varphi^+}(0)$ is distribution free under $H_0$ with mean and variance

$$E_0[T_{\varphi^+}(0)] = 0 \qquad (1.8.11)$$
$$\mathrm{Var}_0[T_{\varphi^+}(0)] = \sum_{i=1}^{n} a^{+2}(i)\,. \qquad (1.8.12)$$
Tables can be constructed for the null distribution of $T_{\varphi^+}(0)$, from which critical values, $c$, can be obtained to complete the test described in (1.8.9).

For the asymptotic null distribution of $T_{\varphi^+}(0)$, the following additional assumption on the scores will be sufficient:

$$\frac{\max_j a^{+2}(j)}{\sum_{i=1}^{n} a^{+2}(i)} \to 0\,. \qquad (1.8.13)$$

Because $\varphi^+$ is square integrable, we have

$$\frac1n\sum_{i=1}^{n} a^{+2}(i) \to \sigma^2_{\varphi^+} = \int_0^1\left(\varphi^+(u)\right)^2 du\,, \quad 0 < \sigma^2_{\varphi^+} < \infty\,, \qquad (1.8.14)$$

i.e., the left side is a Riemann sum of the integral. Under these assumptions and the symmetric location model, Corollary A.1.1 of the Appendix can be used to show that the null distribution of $T_{\varphi^+}(0)$ is asymptotically normal; see, also, Exercise 1.12.16. Hence, an asymptotic level $\alpha$ test is

$$\text{Reject } H_0 \text{ in favor of } H_A, \text{ if } \left|\frac{T_{\varphi^+}(0)}{\sqrt n\,\sigma_{\varphi^+}}\right| \ge z_{\alpha/2}\,. \qquad (1.8.15)$$

An approximate $(1-\alpha)100\%$ confidence interval for $\theta$ based on the process $T_{\varphi^+}(\theta)$ is the interval $(\hat\theta_{\varphi^+,L}, \hat\theta_{\varphi^+,U})$ such that

$$T_{\varphi^+}(\hat\theta_{\varphi^+,L}) = z_{\alpha/2}\sqrt n\,\sigma_{\varphi^+} \quad\text{and}\quad T_{\varphi^+}(\hat\theta_{\varphi^+,U}) = -z_{\alpha/2}\sqrt n\,\sigma_{\varphi^+}\,; \qquad (1.8.16)$$

see (1.5.26). These equations can be solved by the simple tracing algorithm discussed immediately following expression (1.8.5).
1.8.2 Efficiency and Robustness Properties

We derive the efficiency properties of the analysis described above by establishing the four conditions of Definition 1.5.3 to show that the process $T_{\varphi^+}(\theta)$ is Pitman regular. Assume that $\varphi^+(u)$ is differentiable. First define the quantity $\gamma_h$ as

$$\gamma_h = \int_0^1 \varphi^+(u)\varphi^+_h(u)\,du\,, \qquad (1.8.17)$$

where

$$\varphi^+_h(u) = -\frac{h'\left(H^{-1}\left(\frac{u+1}{2}\right)\right)}{h\left(H^{-1}\left(\frac{u+1}{2}\right)\right)}\,. \qquad (1.8.18)$$

As discussed below, $\varphi^+_h(u)$ is called the optimal score function. We assume that our scores are such that $\gamma_h > 0$.

Since it is the negative of the gradient of a norm, $-T_{\varphi^+}(\theta)$ is nondecreasing in $\theta$; hence, the first condition, (1.5.7), holds. Let $\overline T_{\varphi^+}(\theta) = T_{\varphi^+}(\theta)/n$ and consider

$$\mu_{\varphi^+}(\theta) = E_\theta\left[\overline T_{\varphi^+}(0)\right] = E_0\left[\overline T_{\varphi^+}(-\theta)\right].$$
Note that $\overline T_{\varphi^+}(\theta)$ converges in probability to $\gamma(\theta)$ in (1.8.8). Hence, $\mu_{\varphi^+}(\theta) = \gamma(-\theta)$, where in (1.8.8) $H$ is a distribution function with point of symmetry at 0, without loss of generality. If we differentiate $\gamma(\theta)$ and set $\theta = 0$, we get

$$\mu'_{\varphi^+}(0) = 2\int\varphi^{+\prime}(2H(x) - 1)h(x)\,dH(x) = 4\int_0^\infty\varphi^{+\prime}(2H(x) - 1)h^2(x)\,dx = \int_0^1\varphi^+(u)\varphi^+_h(u)\,du > 0\,, \qquad (1.8.19)$$

where the third equality in (1.8.19) follows from an integration by parts. Hence the second Pitman regularity condition holds.
For the third condition, (1.5.9), the asymptotic linearity for the process $T_{\varphi^+}(\theta)$ is given in Theorem A.2.11 of the Appendix. We restate the result here for reference:

$$P_0\left[\sup_{\sqrt n|\theta|\le B}\left|\frac{1}{\sqrt n}T_{\varphi^+}(\theta) - \frac{1}{\sqrt n}T_{\varphi^+}(0) + \theta\gamma_h\sqrt n\right| \ge \epsilon\right] \to 0\,, \qquad (1.8.20)$$

for all $\epsilon > 0$ and all $B > 0$. Finally, the fourth condition, (1.5.10), concerns the asymptotic null distribution, which was discussed above. The null variance of $T_{\varphi^+}(0)/\sqrt n$ is given by expression (1.8.12) divided by $n$. Therefore the process $T_{\varphi^+}(\theta)$ is Pitman regular with efficacy given by

$$c_{\varphi^+} = \frac{\int_0^1\varphi^+(u)\varphi^+_h(u)\,du}{\sqrt{\int_0^1\left(\varphi^+(u)\right)^2 du}} = \frac{2\int_{-\infty}^{\infty}\varphi^{+\prime}(2H(x) - 1)h^2(x)\,dx}{\sqrt{\int_0^1\left(\varphi^+(u)\right)^2 du}}\,. \qquad (1.8.21)$$
As our first result, we obtain the asymptotic power lemma for the process $T_{\varphi^+}(\theta)$. This, of course, follows immediately from Theorem 1.5.8, so we state it as a corollary.

Corollary 1.8.1. Under the symmetric location model,

$$P_{\theta_n}\left[\frac{T_{\varphi^+}(0)}{\sqrt n\,\sigma_{\varphi^+}} \ge z_\alpha\right] \to 1 - \Phi\left(z_\alpha - \theta^* c_{\varphi^+}\right), \qquad (1.8.22)$$

for the sequence of hypotheses

$$H_0: \theta = 0 \text{ versus } H_{An}: \theta = \theta_n = \frac{\theta^*}{\sqrt n}\,, \text{ for } \theta^* > 0\,.$$
Based on Pitman regularity, the asymptotic distribution of the estimate $\hat\theta_{\varphi^+}$ is

$$\sqrt n\left(\hat\theta_{\varphi^+} - \theta\right) \xrightarrow{D} N\left(0, \tau^2_{\varphi^+}\right), \qquad (1.8.23)$$

where the scale parameter $\tau_{\varphi^+}$ is defined by the reciprocal of (1.8.21),

$$\tau_{\varphi^+} = c^{-1}_{\varphi^+} = \frac{\sigma_{\varphi^+}}{\int_0^1\varphi^+(u)\varphi^+_h(u)\,du}\,. \qquad (1.8.24)$$
Using the general result of Theorem 1.5.9, the length of the confidence interval for $\theta$, (1.8.16), can be used to obtain a consistent estimate of $\tau_{\varphi^+}$. This in turn can be used to obtain a consistent estimate of the standard error of $\hat\theta_{\varphi^+}$; see Exercise ??.

The asymptotic relative efficiency between two estimates or two tests based on score functions $\varphi^+_1(u)$ and $\varphi^+_2(u)$ is the ratio

$$e(\varphi^+_1, \varphi^+_2) = \frac{c^2_{\varphi^+_1}}{c^2_{\varphi^+_2}} = \frac{\tau^2_{\varphi^+_2}}{\tau^2_{\varphi^+_1}}\,. \qquad (1.8.25)$$

This can be used to compare different tests. For a specific distribution we can determine the optimum scores. Such a score should make the scale parameter $\tau_{\varphi^+}$ as small as possible. This scale parameter can be written as,

$$c_{\varphi^+} = \frac{1}{\tau_{\varphi^+}} = \left[\frac{\int_0^1\varphi^+(u)\varphi^+_h(u)\,du}{\sigma_{\varphi^+}\left(\int_0^1\varphi^{+2}_h(u)\,du\right)^{1/2}}\right]\left(\int_0^1\varphi^{+2}_h(u)\,du\right)^{1/2}. \qquad (1.8.26)$$

The quantity in brackets is a correlation coefficient; hence, to minimize the scale parameter $\tau_{\varphi^+}$, we need to maximize the correlation coefficient, which can be accomplished by selecting the optimal score function given by

$$\varphi^+(u) = \varphi^+_h(u)\,,$$

where $\varphi^+_h(u)$ is given by expression (1.8.18). The quantity $\left(\int_0^1\left(\varphi^+_h(u)\right)^2 du\right)^{1/2}$ is the square root of Fisher information; see Exercise 1.12.23. Therefore, for this choice of scores the estimate $\hat\theta_{\varphi^+_h}$ is asymptotically efficient. This is the reason for calling the score function $\varphi^+_h$ the optimal score function.

It is shown in Exercise 1.12.24 that the optimal scores are the normal scores if $h(x)$ is a normal density, the Wilcoxon weighted $L_1$ scores if $h(x)$ is a logistic density, and the $L_1$ scores if $h(x)$ is a double exponential density. It is further shown that the scores generated by (1.8.3) are optimal for symmetric densities with a logistic center and exponential tails.
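For instance, at the logistic density the computation can be sketched directly: with $H(x) = (1 + e^{-x})^{-1}$ and $h = H(1 - H)$, one has

$$-\frac{h'(x)}{h(x)} = \frac{1 - e^{-x}}{1 + e^{-x}} = 2H(x) - 1\,, \qquad\text{so}\qquad \varphi^+_h(u) = 2H\left(H^{-1}\left(\frac{u+1}{2}\right)\right) - 1 = u\,,$$

which is exactly the Wilcoxon score function $\varphi^+(u) = u$.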
From Exercise 1.12.24, the efficiency of the normal scores methods relative to the least squares methods is

$$e(NS, LS) = \sigma_f^2\left[\int_{-\infty}^{\infty}\frac{f^2(x)}{\phi\left(\Phi^{-1}(F(x))\right)}\,dx\right]^2, \qquad (1.8.27)$$

where $F \in \mathcal{F}_S$, the family of symmetric distributions with positive, finite Fisher information, and $\phi = \Phi'$ is the $N(0,1)$ pdf.
We now prove a result similar to Theorem 1.7.4. We prove that the normal scores methods always have efficiency at least equal to one relative to the LS methods. Further, the efficiency is equal to 1 only at the normal distribution. The result was first proved by Chernoff and Savage (1958); however, the proof presented below is due to Gastwirth and Wolff (1968).

Theorem 1.8.1. Let $X_1, \ldots, X_n$ be a random sample from $F \in \mathcal{F}_s$. Then

$$\inf_{\mathcal{F}_s} e(NS, LS) = 1\,, \qquad (1.8.28)$$

and the efficiency is equal to 1 only at the normal distribution.
Proof: If $\sigma^2_f = \infty$ then $e(NS, LS) > 1$; hence, we suppose that $\sigma^2_f = 1$. Let $e = e(NS, LS)$. Then from (1.8.27) we can write

$$\sqrt e = E\left[\frac{f(X)}{\phi\left(\Phi^{-1}(F(X))\right)}\right] = E\left[\frac{1}{\phi\left(\Phi^{-1}(F(X))\right)/f(X)}\right].$$

Applying Jensen's inequality to the convex function $h(x) = 1/x$, we have

$$\sqrt e \ge \frac{1}{E\left[\phi\left(\Phi^{-1}(F(X))\right)/f(X)\right]}\,.$$

Hence,

$$\frac{1}{\sqrt e} \le E\left[\frac{\phi\left(\Phi^{-1}(F(X))\right)}{f(X)}\right] = \int_{-\infty}^{\infty}\phi\left(\Phi^{-1}(F(x))\right)dx\,.$$

We now integrate by parts, using $u = \phi(\Phi^{-1}(F(x)))$, $du = \phi'\left(\Phi^{-1}(F(x))\right)f(x)\,dx/\phi\left(\Phi^{-1}(F(x))\right) = -\Phi^{-1}(F(x))f(x)\,dx$, since $\phi'(x)/\phi(x) = -x$. Hence, with $dv = dx$, we have

$$\int_{-\infty}^{\infty}\phi\left(\Phi^{-1}(F(x))\right)dx = x\,\phi\left(\Phi^{-1}(F(x))\right)\Big|_{-\infty}^{\infty} + \int_{-\infty}^{\infty}x\,\Phi^{-1}(F(x))f(x)\,dx\,. \qquad (1.8.29)$$

Now transform $x\,\phi\left(\Phi^{-1}(F(x))\right)$ into $F^{-1}(\Phi(w))\phi(w)$ by first letting $t = F(x)$ and then $w = \Phi^{-1}(t)$. The integral $\int F^{-1}(\Phi(w))\phi(w)\,dw = \int xf(x)\,dx < \infty$; hence the limit of the integrand must be 0 as $x \to \pm\infty$. This implies that the first term on the right side of (1.8.29) is 0. Hence, applying the Cauchy-Schwarz inequality,

$$\frac{1}{\sqrt e} \le \int_{-\infty}^{\infty}x\,\Phi^{-1}(F(x))f(x)\,dx = \int_{-\infty}^{\infty}\left(x\sqrt{f(x)}\right)\left(\Phi^{-1}(F(x))\sqrt{f(x)}\right)dx$$
$$\le \left[\int_{-\infty}^{\infty}x^2f(x)\,dx\int_{-\infty}^{\infty}\left(\Phi^{-1}(F(x))\right)^2 f(x)\,dx\right]^{1/2} = 1\,,$$

since $\int x^2f(x)\,dx = 1$ and $\int x^2\phi(x)\,dx = 1$. Hence $e^{1/2} \ge 1$ and $e \ge 1$, which completes the proof. It should be noted that the inequality is strict except at the normal distribution. Hence the normal scores are strictly more efficient than the LS procedures except at the normal model, where the asymptotic relative efficiency is 1.
The influence function for $\hat\theta_{\varphi^+}$ is derived in Section A.5 of the Appendix. It is given by

$$\Omega(t, \hat\theta_{\varphi^+}) = \frac{\varphi^+(2H(t) - 1)}{4\int_0^\infty\varphi^{+\prime}(2H(x) - 1)h^2(x)\,dx}\,. \qquad (1.8.30)$$

Note, also, that $E[\Omega^2(X, \hat\theta_{\varphi^+})] = \tau^2_{\varphi^+}$, as a check on the asymptotic distribution of $\hat\theta_{\varphi^+}$. Note that the influence function is bounded provided the score function is bounded. Thus the estimates based on the scores discussed in the last paragraph are all robust, except for the normal scores. In the case of the normal scores, when $H(t) = \Phi(t)$, the influence function is $\Omega(t) = \Phi^{-1}(t)$; see Exercise 1.12.25.
The asymptotic breakdown of the estimate $\hat\theta_{\varphi^+}$ is $\epsilon^*$ given by

$$\int_{1-\epsilon^*}^{1}\varphi^+(u)\,du = \frac12\int_0^1\varphi^+(u)\,du\,. \qquad (1.8.31)$$

We provide a heuristic argument for (1.8.31); for a rigorous development see Huber (1981). Recall Definition 1.6.1. The idea is to corrupt enough data so that the estimating equation, (1.8.5), no longer has a solution. Suppose that $[\epsilon n]$ observations are corrupted, where $[\cdot]$ denotes the greatest integer function. Push the corrupted observations out towards $+\infty$ so that

$$\sum_{i=[(1-\epsilon)n]+1}^{n} a^+(R(|X_i - \theta|))\,\mathrm{sgn}(X_i - \theta) = \sum_{i=[(1-\epsilon)n]+1}^{n} a^+(i)\,.$$

This restrains the estimating function from crossing the horizontal axis provided

$$-\sum_{i=1}^{[(1-\epsilon)n]} a^+(i) + \sum_{i=[(1-\epsilon)n]+1}^{n} a^+(i) > 0\,.$$

Replacing the sums by integrals in the limit yields

$$\int_{1-\epsilon}^{1}\varphi^+(u)\,du > \int_0^{1-\epsilon}\varphi^+(u)\,du\,.$$

Now use the fact that

$$\int_0^{1-\epsilon}\varphi^+(u)\,du + \int_{1-\epsilon}^{1}\varphi^+(u)\,du = \int_0^1\varphi^+(u)\,du$$

and that we want the smallest possible $\epsilon$ to get (1.8.31).
Example 1.8.1. Breakdowns of Estimates Based on Wilcoxon and Normal Scores

Table 1.8.1: Empirical AREs based on n = 30 and 10,000 simulations.

 Estimators    Normal    Contaminated Normal
 NS, LS         0.983         1.035
 WIL, LS        0.948         1.007
 NS, WIL        1.037         1.028
For $\hat\theta = \mathrm{med}(X_i + X_j)/2$, $\varphi^+(u) = u$ and it follows at once that $\epsilon^* = 1 - (1/\sqrt2) \doteq .293$. For the estimate based on the normal scores, where $\varphi^+(u)$ is given by (1.8.4), expression (1.8.31) becomes

$$\exp\left\{-\frac12\left[\Phi^{-1}\left(1 - \frac{\epsilon^*}{2}\right)\right]^2\right\} = \frac12$$

and $\epsilon^* = 2\left(1 - \Phi\left(\sqrt{\log 4}\right)\right) \doteq .239$. Hence we have the unusual situation that the estimate based on the normal scores has positive breakdown but an unbounded influence curve.
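Expression (1.8.31) can also be solved numerically for any square-integrable score function; the following minimal R sketch (the function name breakdown is ours, for illustration) recovers both values:

# Solve (1.8.31) for the asymptotic breakdown eps* of the estimator
# generated by a score function phi^+.
breakdown <- function(phi) {
  total <- integrate(phi, 0, 1)$value
  g <- function(eps) integrate(phi, 1 - eps, 1)$value - total / 2
  uniroot(g, c(1e-6, 1 - 1e-6))$root
}
breakdown(function(u) u)                    # Wilcoxon scores: 1 - 1/sqrt(2) = 0.293
breakdown(function(u) qnorm((u + 1) / 2))   # normal scores: about 0.239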
Example 1.8.2. Small Sample Empirical AREs of Estimator Based on Normal Scores

As discussed above, the ARE between the normal scores estimator and the sample mean is 1 at the normal distribution. This is an asymptotic result. To answer the question about this efficiency at small samples, we conducted a small simulation study. We set the sample size at $n = 30$ and ran 10,000 simulations from a normal distribution. We also selected the contaminated normal distribution with $\epsilon = 0.01$ and $\sigma_c = 3$, which is a very mildly contaminated distribution. We consider three estimators: the rank-based estimator based on normal scores (NS), the rank-based estimator based on Wilcoxon scores (WIL), and the sample mean (LS). We used the RBR command onesampr(x,score=phinscp,grad=spnsc,maktable=F) to compute the normal scores estimator; see Exercise 1.12.29. As our empirical ARE we used the ratios of empirical mean square errors of the three estimators. Table 1.8.1 summarizes the results. The empirical AREs for the NS and WIL estimators, at the normal, are close to their asymptotic counterparts. Note that the NS estimator results in a loss of less than 2% efficiency over LS. For this small amount of contamination the NS estimator dominates the LS estimator. It also dominates the Wilcoxon estimator. In Exercise 1.12.29, the reader is asked to extend this study to other situations.
Example 1.8.3. Shoshoni Rectangles, Continued.

The next display shows the normal scores analysis of the Shoshoni Rectangles Data; see Example 1.4.2. We conducted the same analysis as we did for the sign test and traditional t-test discussed in Example 1.4.2. Note that the call to the RBR function onesampr with the values score=phinscp, grad=spnsc computes the normal scores analysis.

> onesampr(x,theta0=.618,alpha=.10,score=phinscp,grad=spnsc)

Test of Theta = 0.618  Alternative selected is 0
Test Stat. Tphi+ is 7.809417  Standardized (z) Test-Stat. 1.870514 and p-value 0.06141252
Estimate 0.6485  SE is 0.02502799
90 % Confidence Interval is ( 0.61975 , 0.7 )
Estimate of the scale parameter tau 0.1119286

While not as sensitive to the outliers as the traditional analysis, the outliers still had some influence on the normal scores analysis. The normal scores test rejects the null hypothesis at level 0.06, while the 90% confidence interval just misses the value 0.618.
1.9 Ranked Set Sampling

In this section we discuss an alternative to simple random sampling (SRS) called ranked set sampling (RSS). This method of data collection is useful when measurements are destructive or expensive while ranking of the data is relatively easy. Johnson, Nussbaum, Patil and Ross (1996) give an interesting application to environmental sampling. As a simple example, consider the problem of estimating the mean volume of trees in a forest. To measure the volume, we must destroy the tree. On the other hand, an expert may well be able to rank the trees by volume in a small sample. The idea is to take a sample of size $k$ of trees and ask the expert to pick the one with smallest volume. This tree is cut down and the volume measured, and the other $k-1$ trees are returned to the population for possible future selection. Then a new sample of size $k$ is taken and the expert identifies the second smallest, which is then cut down and measured. This is repeated until we have $k$ measurements, having looked at $k^2$ trees. This ends cycle 1. The measurements are represented as $x_{(1)1} \le \cdots \le x_{(k)1}$, where the number in parentheses indicates an order statistic and the second number indicates the cycle. We repeat the process for $n$ cycles to get $nk$ measurements:

$$x_{(1)1}, \ldots, x_{(1)n} \quad \text{iid} \quad h_{(1)}(t)$$
$$x_{(2)1}, \ldots, x_{(2)n} \quad \text{iid} \quad h_{(2)}(t)$$
$$\vdots$$
$$x_{(k)1}, \ldots, x_{(k)n} \quad \text{iid} \quad h_{(k)}(t)$$

It is important to note that all $nk$ measurements are independent but are identically distributed only within each row. The density function $h_{(j)}(t)$ represents the pdf of the $j$th order statistic from a sample of size $k$ and is given by:

$$h_{(j)}(t) = \frac{k!}{(j-1)!(k-j)!}H^{j-1}(t)\left[1 - H(t)\right]^{k-j}h(t)$$
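As a sketch of the data collection scheme under perfect ranking (the function names rss and srssplus are ours, for illustration only), the following R code generates a ranked set sample and evaluates the sign statistic (1.9.1):

# Generate a k x n ranked set sample: each cycle draws k fresh sets of k
# units from rdist; the expert takes the jth smallest of the jth set.
rss <- function(k, n, rdist = rnorm) {
  x <- matrix(0, k, n)
  for (cyc in 1:n)
    for (j in 1:k)
      x[j, cyc] <- sort(rdist(k))[j]   # row j is iid from h_(j)
  x
}
srssplus <- function(x, theta0 = 0) sum(x > theta0)   # the statistic (1.9.1)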
We suppose the measurements are distributed as $H(x) = F(x - \theta)$ and we wish to make a statistical inference concerning $\theta$, such as an estimate, test, or confidence interval. We will illustrate the ideas on the $L_1$ methods since they are simple to work with. We also wish to compute the efficiency of the RSS $L_1$ methods relative to the SRS $L_1$ methods. We will see that there is a substantial increase in efficiency when using the RSS design. In particular, we will compare the RSS methods to SRS methods based on a sample of size $nk$. The RSS method was first applied by McIntyre (1952) in measuring mean pasture yields. See Hettmansperger (1995) for a development of the RSS $L_1$ methods. The most convenient form of the RSS sign statistic is the number of positive measurements, given by

$$S^+_{RSS} = \sum_{j=1}^{k}\sum_{i=1}^{n} I(X_{(j)i} > 0)\,. \qquad (1.9.1)$$
Now note that $S^+_{RSS}$ can be written as $S^+_{RSS} = \sum S^+_{(j)}$, where $S^+_{(j)} = \sum_i I(X_{(j)i} > 0)$ has a binomial distribution with parameters $n$ and $1 - H_{(j)}(0)$. Further, $S^+_{(j)}$, $j = 1, \ldots, k$, are stochastically independent. It follows at once that

$$E\,S^+_{RSS} = n\sum_{j=1}^{k}\left(1 - H_{(j)}(0)\right) \qquad (1.9.2)$$
$$\mathrm{Var}\,S^+_{RSS} = n\sum_{j=1}^{k}\left(1 - H_{(j)}(0)\right)H_{(j)}(0)\,.$$

With $k$ fixed and $n \to \infty$, it follows from the independence of $S^+_{(j)}$, $j = 1, \ldots, k$, that

$$(nk)^{-1/2}\left[S^+_{RSS} - n\sum_{j=1}^{k}\left(1 - H_{(j)}(0)\right)\right] \xrightarrow{D} Z \sim n(0, \xi^2)\,, \qquad (1.9.3)$$

and the asymptotic variance is given by

$$\xi^2 = k^{-1}\sum_{j=1}^{k}\left[1 - H_{(j)}(0)\right]H_{(j)}(0) = \frac14 - k^{-1}\sum_{j=1}^{k}\left(H_{(j)}(0) - \frac12\right)^2. \qquad (1.9.4)$$

It is convenient to introduce a parameter $\delta^2 = 1 - (4/k)\sum\left(H_{(j)}(0) - 1/2\right)^2$; then $\xi^2 = \delta^2/4$. The reader is asked to prove the second equality above in Exercise 1.12.26. Using the formulas for the pdfs of the order statistics, it is straightforward to verify that

$$h(t) = k^{-1}\sum_{j=1}^{k}h_{(j)}(t) \quad\text{and}\quad H(t) = k^{-1}\sum_{j=1}^{k}H_{(j)}(t)\,.$$
We now consider testing $H_0: \theta = 0$ versus $H_A: \theta \ne 0$. The following theorem provides the mean and variance of the RSS sign statistic under the null hypothesis.

Theorem 1.9.1. Under the assumption that $H_0: \theta = 0$ is true, $F(0) = 1/2$,

$$F_{(j)}(0) = \frac{k!}{(j-1)!(k-j)!}\int_0^{1/2}u^{j-1}(1-u)^{k-j}\,du\,,$$
Table 1.9.1: Values of $F_{(j)}(0)$, $j = 1, \ldots, k$, and $\delta^2 = 1 - (4/k)\sum\left(F_{(j)}(0) - 1/2\right)^2$.

 j\k     2     3     4     5     6     7     8     9    10
  1   .750  .875  .938  .969  .984  .992  .996  .998  .999
  2   .250  .500  .688  .813  .891  .938  .965  .981  .989
  3         .125  .313  .500  .656  .773  .856  .910  .945
  4               .063  .188  .344  .500  .637  .746  .828
  5                     .031  .109  .227  .363  .500  .623
  6                           .016  .063  .145  .254  .377
  7                                 .008  .035  .090  .172
  8                                       .004  .020  .055
  9                                             .002  .011
 10                                                   .001
 δ²   .750  .625  .547  .490  .451  .416  .393  .371  .352
and $E\,S^+_{RSS} = nk/2$ and $\mathrm{Var}\,S^+_{RSS}/(nk) = 1/4 - k^{-1}\sum\left(F_{(j)}(0) - 1/2\right)^2$.

Proof. Use the fact that $k^{-1}\sum F_{(j)}(0) = F(0) = 1/2$, and the expectation formula follows at once. Note that

$$F_{(j)}(0) = \frac{k!}{(j-1)!(k-j)!}\int_{-\infty}^{0}F(t)^{j-1}\left(1 - F(t)\right)^{k-j}f(t)\,dt\,,$$

and then make the change of variable $u = F(t)$.
The variance of $S^+_{RSS}$ does not depend on $H$, as expected; however, its computation requires the evaluation of the incomplete beta integral. Table 1.9.1 provides the values of $F_{(j)}(0)$ under $H_0: \theta = 0$. The bottom line of the table provides the values of $\delta^2 = 1 - (4/k)\sum\left(F_{(j)}(0) - 1/2\right)^2$, an important parameter in assessing the gain of RSS over SRS.

We will compare the SRS sign statistic $S^+_{SRS}$, based on a sample of size $nk$, to the RSS sign statistic $S^+_{RSS}$. Note that the variance of $S^+_{SRS}$ is $nk/4$. Then the ratio of variances is $\mathrm{Var}\,S^+_{RSS}/\mathrm{Var}\,S^+_{SRS} = \delta^2 = 1 - (4/k)\sum\left(F_{(j)}(0) - 1/2\right)^2$. The reduction in variance is given in the last row of Table 1.9.1 and can be quite large.
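The entries of Table 1.9.1 are easy to reproduce numerically, since $F_{(j)}(0)$ is an incomplete beta integral; a short R sketch of ours, using pbeta:

# F_(j)(0) = P(Beta(j, k-j+1) <= 1/2), so pbeta computes it directly.
for (k in 2:10) {
  Fj0    <- pbeta(0.5, 1:k, k:1)             # F_(j)(0), j = 1, ..., k
  delta2 <- 1 - (4 / k) * sum((Fj0 - 0.5)^2)
  cat("k =", k, " delta^2 =", round(delta2, 3), "\n")
}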
We next show that the parameter $\delta$ is an integral part of the efficacy of the RSS $L_1$ methods. It is straightforward, using the methods of Section 1.5 and Example 1.5.2, to show that the RSS $L_1$ estimating function is Pitman regular. To compute the efficacy, we first note that

$$\overline S_{RSS} = (nk)^{-1}\sum_{j=1}^{k}\sum_{i=1}^{n}\mathrm{sgn}(X_{(j)i}) = (nk)^{-1}\left[2S^+_{RSS} - nk\right].$$

We then have at once that

$$(nk)^{1/2}\,\overline S_{RSS} \xrightarrow{D_0} Z \sim n(0, \delta^2)\,, \qquad (1.9.5)$$

and $\overline\mu'(0) = 2f(0)$; see Exercise 1.12.27. See Babu and Koti (1996) for a development of the exact distribution. Hence, the efficacy of the RSS $L_1$ methods is given by

$$c_{RSS} = \frac{2f(0)}{\delta} = \frac{2f(0)}{\left[1 - (4/k)\sum_{j=1}^{k}\left(F_{(j)}(0) - 1/2\right)^2\right]^{1/2}}\,.$$
We now summarize the inference methods and their efficiency in the following:

1. The test. Reject $H_0: \theta = 0$ in favor of $H_A: \theta > 0$ at significance level $\alpha$ if $S^+_{RSS} \ge (nk/2) + z_\alpha\,\delta\,(nk/4)^{1/2}$, where, as usual, $1 - \Phi(z_\alpha) = \alpha$.

2. The estimate. $(nk)^{1/2}\left(\mathrm{med}\,X_{(j)i} - \theta\right) \xrightarrow{D} Z \sim n\left(0, \delta^2/4f^2(0)\right)$.

3. The confidence interval. Let $X^*_{(1)}, \ldots, X^*_{(nk)}$ be the ordered values of $X_{(j)i}$, $j = 1, \ldots, k$ and $i = 1, \ldots, n$. Then $[X^*_{(m+1)}, X^*_{(nk-m)}]$ is a $(1-\alpha)100\%$ confidence interval for $\theta$, where $P(S^+_{RSS} \le m) = \alpha/2$. Using the normal approximation we have $m \doteq (nk/2) - z_{\alpha/2}\,\delta\,(nk/4)^{1/2}$.

4. Efficiency. The efficiency of the RSS methods with respect to the SRS methods is given by $e(RSS, SRS) = c^2_{RSS}/c^2_{SRS} = \delta^{-2}$. Hence, the reciprocal of the last line of Table 1.9.1 provides the efficiency values, and they can be quite substantial. Recall from the discussion following Definition 1.5.5 that efficiency can be interpreted as the ratio of sample sizes needed to achieve the same approximate variances, the same approximate local power, and the same confidence interval length. Hence, we write $(nk)_{RSS} \doteq \delta^2(nk)_{SRS}$. This is really the point of the RSS design. Returning to the example of estimating the volume of wood in a forest, if we let $k = 5$, then from Table 1.9.1 we would need to destroy and measure only about one half as many trees using the RSS method rather than the SRS method.
As a final note, we mention the problem of assessing the effect of imperfect ranking. Suppose that the expert makes a mistake when asked to identify the $j$th ordered value in a set of $k$ observations. As expected, there is less gain from using the RSS method. The interesting point is that if the expert simply identifies the supposed $j$th ordered value by random guess, then $\delta^2 = 1$ and the two sign tests have the same information; see Hettmansperger (1995) for more detail.
1.10 Interpolated Confidence Intervals for the L_1 Inference

When we construct $L_1$ confidence intervals, we are limited in our choice of confidence coefficients because of the discreteness of the binomial distribution. The effect does not wear off very quickly as the sample size increases. For example, with a sample of size 50, we can have either a 93.5% or a 96.7% confidence interval, and that is as close as we can come to 95%. In the following discussion we provide a method to interpolate between confidence intervals. The method is nonlinear and seems to be essentially distribution-free. We will begin by presenting and illustrating the method and then derive its properties.
Suppose $\gamma$ is the desired confidence coefficient. Further, suppose the following intervals are available from the binomial table: interval $(x_{(k)}, x_{(n-k+1)})$ with confidence coefficient $\gamma_k$ and interval $(x_{(k+1)}, x_{(n-k)})$ with confidence coefficient $\gamma_{k+1}$, where $\gamma_{k+1} \le \gamma \le \gamma_k$. Then the interpolated interval is $[\hat\theta_L, \hat\theta_U]$,

$$\hat\theta_L = (1-\lambda)x_{(k)} + \lambda x_{(k+1)} \quad\text{and}\quad \hat\theta_U = (1-\lambda)x_{(n-k+1)} + \lambda x_{(n-k)}\,, \qquad (1.10.1)$$

where

$$\lambda = \frac{(n-k)I}{k + (n-2k)I} \quad\text{and}\quad I = \frac{\gamma_k - \gamma}{\gamma_k - \gamma_{k+1}}\,. \qquad (1.10.2)$$

We call $I$ the interpolation factor and note that if we were using linear interpolation then $\lambda = I$. Hence, we see that the interpolation is distinctly nonlinear.

As a simple example we take $n = 10$ and ask for a 95% confidence interval. For $k = 2$ we find $\gamma_k = .9786$ and $\gamma_{k+1} = .8907$. Then $I = .325$ and $\lambda = .658$. Hence, $\hat\theta_L = .342x_{(2)} + .658x_{(3)}$ and $\hat\theta_U = .342x_{(9)} + .658x_{(8)}$. Note that linear interpolation is almost the reverse of the recommended mixture, namely $\lambda = I = .325$, and this can make a substantial difference in small samples.
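A minimal R sketch of the method (the name interpcisketch is ours; the RBR function interpci, used in Example 1.10.1 below, is the production implementation):

# Interpolated L1 confidence interval via (1.10.1)-(1.10.2).
interpcisketch <- function(x, gamma = 0.95) {
  n   <- length(x); xs <- sort(x)
  gam <- function(k) 1 - 2 * pbinom(k - 1, n, 0.5)  # coverage of (x_(k), x_(n-k+1))
  k   <- max(which(sapply(1:floor(n / 2), gam) >= gamma))
  I   <- (gam(k) - gamma) / (gam(k) - gam(k + 1))
  lam <- (n - k) * I / (k + (n - 2 * k) * I)
  c((1 - lam) * xs[k] + lam * xs[k + 1],
    (1 - lam) * xs[n - k + 1] + lam * xs[n - k])
}

For n = 10 and gamma = .95 this reproduces the coefficients of the example above: gam(2) = .9785, gam(3) = .8906, and lam = .658.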
The method is based on the following theorem. This theorem highlights the nonlinear relationship between the interpolation factor and $\lambda$. After proving the theorem, we will need to develop an approximate solution and then show that it works in practice.

Theorem 1.10.1. The interpolation factor $I$ is given by

$$I = \frac{\gamma_k - \gamma}{\gamma_k - \gamma_{k+1}} = 1 - (n-k)2^n\int_0^\infty F^k\left(-\frac{\lambda}{1-\lambda}\,y\right)\left(1 - F(y)\right)^{n-k-1}f(y)\,dy$$
Proof. Without loss of generality we will assume that $\theta$ is 0. Then we can write:

$$\gamma_k = P_0\left(x_{(k)} \le 0 \le x_{(n-k+1)}\right) = P_0\left(k - 1 < S^+_1(0) < n - k + 1\right)$$

and

$$\gamma_{k+1} = P_0\left(x_{(k+1)} \le 0 \le x_{(n-k)}\right) = P_0\left(k < S^+_1(0) < n - k\right).$$

Taking the difference, we have, using $\binom{n}{k}$ to denote the binomial coefficient,

$$\gamma_k - \gamma_{k+1} = P_0\left(S^+_1(0) = k\right) + P_0\left(S^+_1(0) = n - k\right) = \binom{n}{k}\left(\frac12\right)^{n-1}. \qquad (1.10.3)$$

We now consider the lower tail probability associated with the confidence interval. First consider

$$P_0\left(X_{(k+1)} > 0\right) = \frac{1 - \gamma_{k+1}}{2} = \int_0^\infty\frac{n!}{k!(n-k-1)!}F^k(t)\left(1 - F(t)\right)^{n-k-1}dF(t) \qquad (1.10.4)$$
$$= P_0\left(S^+_1(0) \ge n - k\right) = P_0\left(S^+_1(0) \le k\right).$$

We next consider the lower end of the interpolated interval:

$$\frac{1 - \gamma}{2} = P_0\left((1-\lambda)X_{(k)} + \lambda X_{(k+1)} > 0\right)$$
$$= \int_0^\infty\int_{-\frac{\lambda}{1-\lambda}y}^{y}\frac{n!}{(k-1)!(n-k-1)!}F^{k-1}(x)\left(1 - F(y)\right)^{n-k-1}f(x)f(y)\,dx\,dy$$
$$= \int_0^\infty\frac{n!}{(k-1)!(n-k-1)!}\,\frac1k\left[F^k(y) - F^k\left(-\frac{\lambda}{1-\lambda}\,y\right)\right]\left(1 - F(y)\right)^{n-k-1}f(y)\,dy$$
$$= \frac{1 - \gamma_{k+1}}{2} - \int_0^\infty\frac{n!}{k!(n-k-1)!}F^k\left(-\frac{\lambda}{1-\lambda}\,y\right)\left(1 - F(y)\right)^{n-k-1}f(y)\,dy\,. \qquad (1.10.5)$$

Use (1.10.4) in the last line above. Now, with (1.10.3), substitute into the formula for the interpolation factor and the result follows.
Clearly, not only is the relationship between $I$ and $\lambda$ nonlinear, but it also depends on the underlying distribution $F$. Hence, the interpolated interval is not distribution free. There is one interesting case in which we have a distribution free interval, given in the following corollary.

Corollary 1.10.1. Suppose $F$ is the cdf of a symmetric distribution. Then $I(1/2) = k/n$, where we write $I(\lambda)$ to denote the dependence of the interpolation factor on $\lambda$.

This shows that when we sample from a symmetric distribution, the interval that lies halfway between the available intervals does not depend on the underlying distribution. Other interpolated intervals are not distribution free. Our next theorem shows how to approximate the solution, and the solution is essentially distribution free. We show by example that the approximate solution works in many cases.

Theorem 1.10.2.

$$I(\lambda) \doteq \frac{\lambda k}{\lambda(2k - n) + n - k}$$
Proof. We consider the integral

$$\int_0^\infty F^k\left(-\frac{\lambda}{1-\lambda}\,y\right)\left(1 - F(y)\right)^{n-k-1}f(y)\,dy\,.$$

The integrand decreases rapidly for moderate powers; hence, we expand the integrand around $y = 0$. First take logarithms; then

$$k\log F\left(-\frac{\lambda}{1-\lambda}\,y\right) = k\log F(0) - \frac{\lambda}{1-\lambda}\,k\,\frac{f(0)}{F(0)}\,y + o(y)$$
Table 1.10.1: Confidence coefficients for interpolated confidence intervals in Example 1.10.1. DE(Approx) = double exponential and the approximation in Theorem 1.10.2, U = uniform, N = normal, C = Cauchy, Linear = linear interpolation.

  λ    DE(Approx)     U      N      C    Linear
 0.1     0.976      0.977  0.976  0.976  0.970
 0.2     0.973      0.974  0.974  0.974  0.961
 0.3     0.970      0.971  0.971  0.970  0.952
 0.4     0.966      0.967  0.966  0.966  0.943
 0.5     0.961      0.961  0.961  0.961  0.935
 0.6     0.955      0.954  0.954  0.954  0.926
 0.7     0.946      0.944  0.944  0.946  0.917
 0.8     0.935      0.930  0.931  0.934  0.908
 0.9     0.918      0.912  0.914  0.918  0.899
and

$$(n-k-1)\log\left(1 - F(y)\right) = (n-k-1)\log\left(1 - F(0)\right) - (n-k-1)\frac{f(0)}{1 - F(0)}\,y + o(y)\,.$$

Substitute $r = \lambda k/(1-\lambda)$ and $F(0) = 1 - F(0) = 1/2$ into the above equations, and add the two equations together. Add and subtract $r\log(1/2)$, and group terms so the right side of the second equation appears on the right side along with $k\log(1/2) - r\log(1/2)$. Hence, we have

$$k\log F\left(-\frac{\lambda}{1-\lambda}\,y\right) + (n-k-1)\log\left(1 - F(y)\right) = k\log(1/2) - r\log(1/2) + (n + r - k - 1)\log\left(1 - F(y)\right) + o(y)\,,$$

and, hence,

$$\int_0^\infty F^k\left(-\frac{\lambda}{1-\lambda}\,y\right)\left(1 - F(y)\right)^{n-k-1}f(y)\,dy \doteq \int_0^\infty 2^{-(k-r)}\left(1 - F(y)\right)^{n+r-k-1}f(y)\,dy = \frac{1}{2^n(n + r - k)}\,. \qquad (1.10.6)$$

Substitute this approximation into the formula for $I(\lambda)$, use $r = \lambda k/(1-\lambda)$, and the result follows.
Note that the approximation agrees with Corollary 1.10.1. In addition, Exercise 1.12.28 shows that the approximation formula is exact for the double exponential (Laplace) distribution. In Table 1.10.1 we show how well the approximation works for several other distributions. The exact results were obtained by numerical integration of the integral in Theorem 1.10.1. Similar close results were found for asymmetric examples. For further reading see Hettmansperger and Sheather (1986) and Nyblom (1992).
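The comparison in Table 1.10.1 is straightforward to reproduce; the following R sketch (the names Iexact and Iapprox are ours) evaluates the exact factor of Theorem 1.10.1 by numerical integration, here at the standard normal, and the approximation of Theorem 1.10.2:

Iexact <- function(lam, n, k, F = pnorm, f = dnorm) {
  g <- function(y) F(-lam / (1 - lam) * y)^k * (1 - F(y))^(n - k - 1) * f(y)
  1 - (n - k) * 2^n * integrate(g, 0, Inf)$value
}
Iapprox <- function(lam, n, k) lam * k / (lam * (2 * k - n) + n - k)
Iexact(0.5, 10, 2)    # 0.2 = k/n, as Corollary 1.10.1 requires
Iapprox(0.5, 10, 2)   # 0.2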
Example 1.10.1. Cushney-Peebles Example 1.4.1, continued.

We now return to this example, using it to illustrate the sign test and the $L_1$ interpolated confidence interval. We use the RBR function interpci for the computations. We take as our location model: $X_1, \ldots, X_{10}$ iid from $H(x) = F(x - \theta)$, $F$ and $\theta$ both unknown, along with the $L_1$ norm. We have already seen that the estimate of $\theta$ is the sample median, equal to 1.3. Besides obtaining an interpolated 95% confidence interval, we test $H_0: \theta = 0$ versus $H_A: \theta \ne 0$. Assuming that the sample is in the vector x, the output for a test and a 95% interpolated confidence interval is:

> tm=interpci(.05,x)

Estimation of Median
Sample Median is 1.3
Confidence Interval    ( 1 , 1.8 )            89.0625 %
Confidence Interval    ( 0.9315 , 2.0054 )    95 %  Interpolated
Confidence Interval    ( 0.8 , 2.4 )          97.8516 %

Results for the Sign Test
Test of theta = 0 versus theta not equal to 0
Test stat. S is 9    p-value 0.00390625

Note the p-value of the test is .0039 and we would easily reject the null hypothesis at any reasonable level of significance. The interpolated 95% confidence interval for $\theta$ shows the reasonable set of values of $\theta$ to be between .9315 and 2.0054, given the level of confidence.
1.11 Two Sample Analysis

We now propose a simple way to extend our one sample methods to the comparison of two samples. Suppose $X_1, \ldots, X_m$ are iid $F(x - \theta_x)$ and $Y_1, \ldots, Y_n$ are iid $F(y - \theta_y)$, and the two samples are independent. Let $\Delta = \theta_y - \theta_x$, and we wish to test the null hypothesis $H_0: \Delta = 0$ versus the alternative hypothesis $H_A: \Delta \ne 0$. Without loss of generality, we can consider $\theta_x = 0$, so that the $X$ sample is from a distribution with cdf $F(x)$ and the $Y$ sample is from a distribution with cdf $F(y - \Delta)$.

The hypothesis testing rule that we propose is:

1. Construct $L_1$ confidence intervals $[X_L, X_U]$ and $[Y_L, Y_U]$.

2. Reject $H_0$ if the intervals are disjoint.

If we consider the confidence interval as a set of reasonable values for the parameter, given the confidence coefficient, then we reject the null hypothesis when the respective reasonable values are disjoint. We must determine the significance level for the test. In particular, for given $\gamma_x$ and $\gamma_y$, what is the value of $\alpha_c$, the significance level for the comparison? Perhaps more pertinent: given $\alpha_c$, what values should we choose for $\gamma_x$ and $\gamma_y$? Below we show that, for a broad range of sample sizes,

$$\text{Comparing two 84\% CIs yields a 5\% test of } H_0: \Delta = 0 \text{ versus } H_A: \Delta \ne 0\,, \qquad (1.11.1)$$
where CI denotes confidence interval. In the following theorem we provide the relationship between $\alpha_c$ and the pair $\gamma_x$, $\gamma_y$. Define $z_x$ by $\gamma_x = 2\Phi(z_x) - 1$ and likewise $z_y$ by $\gamma_y = 2\Phi(z_y) - 1$.

Theorem 1.11.1. Suppose $m, n \to \infty$ so that $m/N \to \lambda$, $0 < \lambda < 1$, $N = m + n$. Then, under the null hypothesis $H_0: \Delta = 0$,

$$\alpha_c = P(X_L > Y_U) + P(Y_L > X_U) \to 2\Phi\left[-(1-\lambda)^{1/2}z_x - \lambda^{1/2}z_y\right]$$
Proof. We will consider $\alpha_c/2 = P(X_L > Y_U)$. From (1.5.22) we have

$$X_L \doteq \frac{S_x(0)}{2mf(0)} - \frac{z_x}{2m^{1/2}f(0)} \quad\text{and}\quad Y_U \doteq \frac{S_y(0)}{2nf(0)} + \frac{z_y}{2n^{1/2}f(0)}\,.$$

Since $m/N \to \lambda$,

$$N^{1/2}X_L \xrightarrow{D} \lambda^{-1/2}Z_1\,, \quad Z_1 \sim n\left(-z_x/2f(0),\; 1/4f^2(0)\right),$$

and

$$N^{1/2}Y_U \xrightarrow{D} (1-\lambda)^{-1/2}Z_2\,, \quad Z_2 \sim n\left(z_y/2f(0),\; 1/4f^2(0)\right).$$

Now $\alpha_c/2 = P(X_L > Y_U) = P\left(N^{1/2}(Y_U - X_L) < 0\right)$ and $X_L$, $Y_U$ are independent; hence

$$N^{1/2}(Y_U - X_L) \xrightarrow{D} (1-\lambda)^{-1/2}Z_2 - \lambda^{-1/2}Z_1\,,$$

and

$$(1-\lambda)^{-1/2}Z_2 - \lambda^{-1/2}Z_1 \sim n\left(\frac{1}{2f(0)}\left[\frac{z_x}{\lambda^{1/2}} + \frac{z_y}{(1-\lambda)^{1/2}}\right],\; \frac{1}{4f^2(0)}\left[\frac{1}{\lambda} + \frac{1}{1-\lambda}\right]\right).$$

It then follows that

$$P\left(N^{1/2}(Y_U - X_L) < 0\right) \to \Phi\left(-\left[\frac{z_x}{\lambda^{1/2}} + \frac{z_y}{(1-\lambda)^{1/2}}\right]\bigg/\left[\frac{1}{\lambda(1-\lambda)}\right]^{1/2}\right),$$

which, when simplified, yields the result in the statement of the theorem.
Table 1.11.1: Confidence coefficients for 5% comparison.

 λ = m/N      .500   .550   .600   .650   .750
 m/n          1.00   1.22   1.50   1.86   3.00
 z_x = z_y    1.39   1.39   1.39   1.40   1.43
 γ_x = γ_y     .84    .84    .84    .85    .86
To illustrate, we take equal sample sizes, so that $\lambda = 1/2$, and we take $z_x = z_y = 2$. Then we have two 95% confidence intervals and we will reject the null hypothesis $H_0: \Delta = 0$ if the two intervals are disjoint. The above theorem says that the significance level is approximately equal to $\alpha_c = 2\Phi(-2.83) = .0046$. This is a very small level and it will be difficult to reject the null hypothesis. We might prefer a significance level of, say, $\alpha_c = .05$. We then must find $z_x$ and $z_y$ so that $.05 = 2\Phi\left(-(.5)^{1/2}(z_x + z_y)\right)$. Note that now we have an infinite number of solutions. If we impose the reasonable condition that the two confidence coefficients are the same, then we require that $z_x = z_y = z$. Then we have the equation $.025 = \Phi\left(-(2)^{1/2}z\right)$, and hence $1.96 = (2)^{1/2}z$. So $z = 1.96/2^{1/2} \doteq 1.39$, and the confidence coefficient for the two intervals is $\gamma = \gamma_x = \gamma_y = 2\Phi(1.39) - 1 \doteq .84$. Hence, if we have equal sample sizes and we use two 84% confidence intervals, then we have a 5% two sided comparison of the two samples.

If we set $\alpha_c = .10$, this would correspond to a 5% one sided test. This means that we compare the two confidence intervals in the direction specified by the alternative hypothesis. For example, if we specify $\Delta = \theta_y - \theta_x > 0$, then we would reject the null hypothesis if the $X$-interval is completely below the $Y$-interval. To determine which confidence intervals to use, we again assume that the two intervals will have the same confidence coefficient. Then we must find $z$ such that $.05 = \Phi\left(-(2)^{1/2}z\right)$; this leads to $1.645 = (2)^{1/2}z$ and $z = 1.16$. Hence, the confidence coefficient for the two intervals is $\gamma = \gamma_x = \gamma_y = 2\Phi(1.16) - 1 = .75$. Hence, for a one-sided 5% test or a 10% two-sided test, when you have equal sample sizes, use two 75% confidence intervals.
We must now consider what to do if the sample sizes are not equal. Let $z_c$ be determined by $\alpha_c/2 = \Phi(-z_c)$; then, again if we use the same confidence coefficient for the two intervals, $z = z_x = z_y = z_c/\left(\lambda^{1/2} + (1-\lambda)^{1/2}\right)$. When $m = n$, so that $\lambda = 1 - \lambda = .5$, we had $z = z_c/2^{1/2} = .707z_c$, and so $z = 1.39$ when $\alpha_c = .05$. We now show by example that, when $\alpha_c = .05$, $z$ is not sensitive to the value of $\lambda$. Table 1.11.1 gives the relevant information. Hence, if we use 84% confidence intervals, then the significance level will be roughly 5% for the comparison for a broad range of ratios of sample sizes. Likewise, we would use 75% intervals for a 10% comparison. See Hettmansperger (1984b) for additional discussion.
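For a given $\alpha_c$ and $\lambda$, the common confidence coefficient is immediate to compute; a one-function R sketch of ours:

# Common confidence coefficient gamma for each one-sample interval so that
# comparing the two intervals gives an approximate alpha_c-level two-sided test.
gamma.pair <- function(alpha.c, lambda = 0.5) {
  zc <- qnorm(1 - alpha.c / 2)
  z  <- zc / (sqrt(lambda) + sqrt(1 - lambda))
  2 * pnorm(z) - 1
}
gamma.pair(0.05)          # about 0.84 for equal sample sizes
gamma.pair(0.05, 0.75)    # about 0.85 when m/n = 3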
Next suppose that we want a confidence interval for $\Delta = \theta_y - \theta_x$. In the following simple theorem we show that the proposed test based on comparing two confidence intervals is equivalent to checking whether zero is contained in a different confidence interval. This new interval will be a confidence interval for $\Delta$.

Theorem 1.11.2. $[X_L, X_U]$ and $[Y_L, Y_U]$ are disjoint if and only if 0 is not contained in $[Y_L - X_U, Y_U - X_L]$.

If we specify our significance level to be $\alpha_c$, then we have immediately that

$$1 - \alpha_c = P_\Delta\left(Y_L - X_U \le \Delta \le Y_U - X_L\right)$$

and $[Y_L - X_U, Y_U - X_L]$ is a $\gamma_c = 1 - \alpha_c$ confidence interval for $\Delta$.

This theorem simply points out that the hypothesis test can be equivalently based on a single confidence interval. Hence, two 84% intervals produce a roughly 95% confidence interval for $\Delta$. The confidence interval is easy to construct, since we need only find the least and greatest differences of the end points between the respective $Y$ and $X$ intervals.
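In R, the interval for $\Delta$ follows directly from the two one-sample intervals; a minimal sketch (the name diffci and the toy endpoints are ours):

# Confidence interval for Delta from the X and Y intervals c(L, U).
diffci <- function(xci, yci) c(yci[1] - xci[2], yci[2] - xci[1])
diffci(c(1.0, 1.8), c(2.2, 3.1))   # toy intervals: (0.4, 2.1)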
Recall that one way to measure the efficiency of a confidence interval is to find its asymptotic length. This is directly related to the Pitman efficacy of the procedure; see Section 1.5.5. This would seem to be the most natural way to study the efficiency of the test based on confidence intervals. In the following theorem we determine the asymptotic length of the interval for $\Delta$.

Theorem 1.11.3. Suppose $m, n \to \infty$ in such a way that $m/N \to \lambda$, $0 < \lambda < 1$, $N = m + n$. Further suppose that $\gamma_c = 1 - \alpha_c = 2\Phi(z_c) - 1$. Let $\Lambda$ be the length of $[Y_L - X_U, Y_U - X_L]$. Then

$$N^{1/2}\Lambda \xrightarrow{P} \frac{2z_c}{\left[\lambda(1-\lambda)\right]^{1/2}2f(0)}$$
Proof. First note that $\Lambda = \Lambda_x + \Lambda_y$, the sum of the two lengths of the $X$ and $Y$ intervals, respectively. Further,

$$N^{1/2}\Lambda = \frac{N^{1/2}}{n^{1/2}}\,n^{1/2}\Lambda_y + \frac{N^{1/2}}{m^{1/2}}\,m^{1/2}\Lambda_x\,.$$

But by Theorem 1.5.9 this converges in probability to $z_x/\left(\lambda^{1/2}f(0)\right) + z_y/\left((1-\lambda)^{1/2}f(0)\right)$. Now note that $(1-\lambda)^{1/2}z_x + \lambda^{1/2}z_y = z_c$, and the result follows.
The interesting point about this theorem is that the efficiency of the interval does not depend on how $z_x$ and $z_y$ are chosen, so long as they satisfy $(1-\lambda)^{1/2}z_x + \lambda^{1/2}z_y = z_c$. In addition, this interval has inherited the efficacy of the $L_1$ interval in the one sample location model. We will discuss the two-sample location model in detail in the next chapter. In Hettmansperger (1984b) other choices for $z_x$ and $z_y$ are discussed; for example, we could choose $z_x$ and $z_y$ so that the asymptotic standardized lengths are equal. The corresponding confidence coefficients for this choice are more sensitive to unequal sample sizes than the method proposed here.
Example 1.11.1. Hendy and Charles Coin Data

Hendy and Charles (1970) study the change in silver content in Byzantine coins. During the reign of Manuel I (1143-1180) there were several mintings. We consider the research hypothesis that the silver content changed from the first to the fourth coinage. The data consist of 9 coins identified from the first coinage and 7 coins from the fourth. We suppose that they are realizations of random samples of coins from the two populations. The percentage of silver in each coin is given in Table 1.11.2.

Table 1.11.2: Silver percentage in two mintings.

 First   5.9  6.8  6.4  7.0  6.6  7.7  7.2  6.9  6.2
 Fourth  5.3  5.6  5.5  5.1  6.2  5.8  5.8

Let $\Delta = \theta_1 - \theta_4$, where the 1 and 4 indicate the coinage. To test the null hypothesis $H_0: \Delta = 0$ versus $H_A: \Delta \ne 0$ at $\alpha = .05$, we construct two 84% $L_1$ confidence intervals and reject the null hypothesis if they are disjoint. The confidence intervals can be computed by using the RBR function onesampsgn with the value alph=.16. Results pertinent to the confidence intervals are:
> onesampsgn(First,alpha=.16)
Estimate 6.8 SE is 0.2135123
84 % Confidence Interval is ( 6.4 , 7 )
Estimate of the scale parameter tau 0.6405368
> onesampsgn(Fourth,alpha=.16)
Estimate 5.6 SE is 0.1779269
84 % Confidence Interval is ( 5.3 , 5.8 )
Estimate of the scale parameter tau 0.4707503
Clearly, the 84% confidence intervals are disjoint; hence, we reject the null hypothesis at the 5% significance level and claim that the emperor apparently held back a little on the fourth coinage. A 95% confidence interval for $\Delta = \theta_1 - \theta_4$ is found by taking the differences in the ends of the confidence intervals: $(6.4 - 5.8,\; 7.0 - 5.3) = (0.6, 1.7)$. Hence, this analysis suggests that the difference in median percentages is someplace between .6% and 1.7%, with a point estimate of $6.8 - 5.6 = 1.2\%$.
Figure 1.11.1 provides comparison boxplots of the data for the first and fourth coinages. Marking the 84% confidence intervals on the plot, we can see the relatively large gap between the confidence intervals, i.e., the sharp reduction in silver content from the first to the fourth coinage. In addition, the box for the fourth coinage is a bit more narrow than the box for the first coinage, indicating that there may be less variation (as measured by the interquartile range) in the fourth coinage. There are no apparent outliers, as indicated by the whiskers on the boxplot. Larson and Stroup (1976) analyze this example with a two sample t-test.
Figure 1.11.1: Comparison boxplots of the Hendy and Charles coin data. [Side-by-side boxplots of the First and Fourth coinages; vertical axis: Percentage of silver, from 5.0 to 7.5.]
1.12 Exercises

1.12.1. Show that if $\|\cdot\|$ is a norm, then there always exists a value of $\theta$ which minimizes $\|x - \theta 1\|$ for any $x_1, \ldots, x_n$.

1.12.2. Figure 1.12.1 displays the graph of $Z(\theta)$ versus $\theta$ for $n = 20$ data points (count the steps), where

$$Z(\theta) = \frac{1}{\sqrt n}\sum_{i=1}^{20}\mathrm{sign}(X_i - \theta)\,,$$

i.e., the standardized sign (median) process.

(a) From the plot, what are the minimum and maximum values of the sample?

(b) From the plot, what is the associated point estimate of $\theta$?

(c) From the plot, determine a 95% confidence interval for $\theta$ (approximate, but show on the graph).

(d) From the plot, determine the value of the test statistic and the associated p-value for testing $H_0: \theta = 0$ versus $H_A: \theta > 0$.
Figure 1.12.1: The graph of $Z(\theta)$ versus $\theta$. [Step plot of $Z(\theta)$ versus $\theta$; horizontal axis: $\theta$ from $-1$ to 3; vertical axis: $Z(\theta)$ from $-5$ to 5.]
1.12.3. Show $D(\theta)$, (1.3.3), is convex and continuous as a function of $\theta$. Further, argue that $D(\theta)$ is differentiable almost everywhere. Let $S(\theta)$ be a function such that $S(\theta) = -D'(\theta)$ where the derivative exists. Then show that $S(\theta)$ is a nonincreasing function.

1.12.4. Consider the $L_2$ norm. Show that $\hat\theta = \bar x$ and that $S_2(0) = \sqrt n\,t/\sqrt{n - 1 + t^2}$, where $t = \sqrt n\,\bar x/s$ and $s$ is the sample standard deviation. Further, show $S_2(0)$ is an increasing function of $t$, so the test based on $t$ is equivalent to $S_2(0)$.

1.12.5. Discuss the consistency of the t-test. Is the t-test resolving?

1.12.6. Discuss the Pitman regularity in the $L_2$ case.
1.12.7. The following R function computes a bootstrap distribution of the sample median.

bootmed = function(x,nb){
   # Sample is in x and nb is the number of bootstraps
   n = length(x)
   bootmed = rep(0,nb)
   for(i in 1:nb){
      y = sample(x,size=n,replace=T)
      bootmed[i] = median(y)
   }
   bootmed
}

(a). Use this code to obtain 1000 bootstrapped medians for the Shoshoni data of Example 1.4.2. Determine the standard error of this bootstrap sample of medians and compare it with the estimate based on the length of the confidence interval for the Shoshoni data.

(b). Now find the mean and variance of the Shoshoni data. Use these estimates to perform a parametric bootstrap of the sample median, as discussed in Example ??. Determine the standard error of this parametric bootstrap sample of medians and compare it with the estimates in Part (a).
1.12.8. Using languages such as Minitab or R, obtain a plot of the test sensitivity curves based on the signed-rank Wilcoxon statistic for the Cushney-Peebles Data, Example 1.4.1, similar to the sensitivity curves based on the t test and the sign test as shown in Figure 1.4.1.

1.12.9. In the proof of Theorem 1.5.6, show that (1.5.19) and (1.5.20) imply that $U_n(b)$ converges to $\mu'(0)$ in probability, pointwise in $b$, i.e., $U_n(b) = \mu'(0) + o_p(1)$.
1.12.10. Suppose we are sampling from the distribution with pdf

$$f(x) = \frac34\,\frac{1}{\Gamma(2/3)}\exp\left\{-|x|^{3/2}\right\}, \quad -\infty < x < \infty\,,$$

and we are considering whether to use the Wilcoxon or sign test. Using the efficacies of these tests, determine which test to use.

1.12.11. For which of the following distributions is the signed-rank Wilcoxon more powerful? Why?

$$f_1(x) = \begin{cases}\frac32x^2 & -1 < x < 1\\ 0 & \text{elsewhere}\end{cases} \qquad\quad f_2(x) = \begin{cases}\frac34(1 - x^2) & -1 < x < 1\\ 0 & \text{elsewhere}\end{cases}$$

1.12.12. Show that (1.5.23) is scale invariant. Hence, the efficiency does not change if $X$ is multiplied by a positive constant. Let

$$f(x, \delta) = \exp\left\{-|x|^\delta\right\}/\left(2\delta^{-1}\Gamma(\delta^{-1})\right), \quad -\infty < x < \infty\,, \; 1 \le \delta \le 2\,.$$

When $\delta = 2$, $f$ is a normal distribution, and when $\delta = 1$, $f$ is a Laplace distribution. Compute and plot, as a function of $\delta$, the efficiency (1.5.23).
1.12.13. Show that the finite sample breakdown of the Hodges-Lehmann estimate (1.3.25) is $\epsilon^*_n = m/n$, where $m$ is the solution to the quadratic inequality $2m^2 - (4n+2)m + n^2 + n \le 0$. Tabulate $\epsilon^*_n$ as a function of $n$ and show that $\epsilon^*_n$ converges to $1 - \frac{1}{\sqrt2} \doteq .29$.
1.12.14. Derive (1.6.9).

1.12.15. Prove Lemma 1.7.2.

1.12.16. Prove Theorem 1.7.1. In particular, check the conditions of the Lindeberg Central Limit Theorem to verify (1.7.7).

1.12.17. Prove Theorem 1.7.2.

1.12.18. For the general signed-rank norm given by (1.8.1), show that the function $T_{\varphi^+}(\theta)$, (1.8.2), is a decreasing step function which steps down only at the Walsh averages. Hint: First show that the ranks of $|X_i - \theta|$ and $|X_j - \theta|$ switch for $\theta_1 < \theta_2$ if and only if

$$\theta_1 < \frac{X_i + X_j}{2} < \theta_2\,,$$

(replace ranks by signs if $i = j$).

1.12.19. Use the results of the last exercise to write in some detail the tracing algorithm, described after expression (1.8.5), for obtaining the location estimator $\hat\theta_{\varphi^+}$ and its associated standard error.
1.12.20. Suppose $h(x)$ has finite Fisher information:

$$I(h) = \int\frac{(h'(x))^2}{h(x)}\,dx < \infty\,.$$

Prove that $h(x)$ is bounded and that $\int h^2(x)\,dx < \infty$.
Hint: Write

$$h(x) = \int_{-\infty}^{x}h'(t)\,dt \le \int_{-\infty}^{x}|h'(t)|\,dt\,.$$

1.12.21. Repeat Exercise 1.12.12 for (1.7.13).
1.12.22. Show that (1.8.1) is a norm.

1.12.23. Show that $\int\varphi^{+2}_h(u)\,du$, with $\varphi^{+}_h(u)$ given by (1.8.18), is equal to Fisher information,

$$\int\frac{(h'(x))^2}{h(x)}\,dx\,.$$

1.12.24. Find (1.8.18) when $h$ is a normal, logistic, or Laplace (double exponential) density, respectively.
1.12.25. Verify that the influence function of the normal scores estimate is unbounded when the underlying distribution is normal.

1.12.26. Verify (1.9.4).

1.12.27. Derive the limit distribution in expression (1.9.5).

1.12.28. Show that approximation (1.10.6) is exact for the double exponential (Laplace) distribution.

1.12.29. Extend the simulation study of Example 1.8.2 to the other contaminated normal situations found in Table 1.7.1. Comment on the results. Compare the empirical results for the Wilcoxon with the asymptotic results found in the table.

The following R code performs the contaminated normal simulation discussed in Example 1.8.2. (Semicolons are end of line indicators. As indicated in the call to onesampr, the normal scores estimator is computed by using the gradient R function spnsc and score function phinscp.)

nsims = 10000; n = 30; itype = 1; eps = .01; sigc = 3
collls = rep(0,nsims); collwil = rep(0,nsims); collnsc = rep(0,nsims)
for(i in 1:nsims){
   if(itype == 0){x = rnorm(n)}          # itype 0: standard normal errors
   if(itype == 1){x = rcn(n,eps,sigc)}   # itype 1: contaminated normal errors
   collls[i] = mean(x)
   collnsc[i] = onesampr(x,score=phinscp,grad=spnsc,maktable=F)$est
   collwil[i] = onesampwil(x,maktable=F)$est
}
msels = mean(collls^2); msensc = mean(collnsc^2); msewil = mean(collwil^2)
arensc = msels/msensc; arewil = msels/msewil; arenscwil = msewil/msensc
1.12.30. Consider the one sample location problem. Let $T(\theta)$ be a nonincreasing process. Consider the hypotheses:

$$H_0: \theta = 0 \text{ versus } H_A: \theta > 0\,.$$

Assume that $T(\theta)$ is standardized so that the decision rule of the (asymptotic) level $\alpha$ test is given by

$$\text{Reject } H_0: \theta = 0 \text{ in favor of } H_A: \theta > 0, \text{ if } T(0) > z_\alpha\,.$$

Further assume that for all $|\theta| < B$, $B > 0$,

$$T(\theta/\sqrt n) = T(0) - 1.2\theta + o_p(1)\,.$$

(a) For $\theta_0 > 0$, determine the asymptotic power $\gamma(\theta_0)$, i.e., determine

$$\gamma(\theta_0) = P_{\theta_0}[T(0) > z_\alpha]\,.$$

(b) Evaluate $\gamma(\theta_0)$ for $n = 36$ and $\theta_0 = 0.5$.
1.12.31. Suppose $X_1, \ldots, X_{2n}$ are independent observations such that $X_i$ has cdf $F(x - \theta_i)$. For testing $H_0: \theta_1 = \cdots = \theta_{2n}$ versus $H_A: \theta_1 \le \cdots \le \theta_{2n}$, with at least one strict inequality, consider the test statistic,

$$S = \sum_{i=1}^{n}I(X_{n+i} > X_i)\,.$$

(a.) Discuss the small sample and asymptotic distribution of $S$ under $H_0$.

(b.) Determine the alternative distribution of $S$ under the alternative $\theta_{n+i} - \theta_i = \Delta$, $\Delta > 0$, for all $i = 1, \ldots, n$. Show that the test is consistent for this alternative. This test is called Mann's (1945) test for trend.
1.12.32. The data in Table 1.12.1 constitute a sample of size 59 of information on professional baseball players. The data were recorded from the back of a deck of baseball cards (compliments of Carrie McKean).

(a). Obtain dotplots of the weights and heights of the baseball players.

(b). Assume the weight of a typical adult male is 175 pounds. Use the Wilcoxon test statistic to test the hypotheses

$$H_0: \theta_W = 175 \text{ versus } H_A: \theta_W \ne 175\,,$$

where $\theta_W$ is the median weight of a professional baseball player. Compute the p-value. Next obtain a 95% confidence interval for $\theta_W$ using the confidence interval procedure based on the Wilcoxon. Use the dotplot in Part (a) to comment on the assumption of symmetry.

(c). Let $\theta_H$ be the median height of a baseball player. Repeat the analysis of Part (b) for the hypotheses

$$H_0: \theta_H = 70 \text{ versus } H_A: \theta_H \ne 70\,.$$
1.12.33. The signed-rank Wilcoxon scores are optimal for the logistic distribution, while the sign scores are optimal for the Laplace distribution. A family of score functions which are optimal for distributions with logistic middles and Laplace tails are the bent scores. These are continuous score functions $\varphi^+(u)$ with a linear (positive slope and intercept 0) piece for $0 < u < b$ and a constant piece for $b < u < 1$, for a specified value of $b$; see Policello and Hettmansperger (1976). These are called signed-rank Winsorized Wilcoxon scores.

(a) Obtain the standardized scores such that $\int[\varphi^+(u)]^2\,du = 1$.

(b) For these scores with $b = 0.75$, obtain the corresponding estimate of location and an estimate of its standard error for the following data set:

7.94  8.13  8.11  7.96  7.83  7.04  7.91  7.82
7.42  8.06  8.51  7.88  8.96  7.58  8.14  8.06

The software RBR computes this estimate with the call onesampr(x,score=phipb,grad=sphipb,param=c(.75)).
Table 1.12.1: Data for professional baseball players, Exercise 1.12.32. The variables are: (H) Height in inches; (W) Weight in pounds; (B) Side of plate from which the player bats (1-Right handed, 2-Left handed, 3-Switch-hitter); (A) Throwing arm (0-Right, 1-Left); (P) Pitch-hit indicator (0-Pitcher, 1-Hitter); and (Ave) Average (ERA if pitcher, batting average if hitter).

H W B A P Ave    H W B A P Ave
74 218 1 1 0 3.330 79 232 2 1 0 3.100
75 185 1 0 1 0.286 72 190 1 0 1 0.238
77 219 2 1 0 3.040 75 200 2 0 0 3.180
73 185 1 0 1 0.271 70 175 2 0 1 0.279
69 160 3 0 1 0.242 75 200 1 0 1 0.274
73 222 1 0 0 3.920 78 220 1 0 0 3.880
78 225 1 0 0 3.460 73 195 1 0 0 4.570
76 205 1 0 0 3.420 75 205 2 1 1 0.284
77 230 2 0 1 0.303 74 185 1 0 1 0.286
78 225 1 0 0 3.460 71 185 3 0 1 0.218
76 190 1 0 0 3.750 73 210 1 0 1 0.282
72 180 3 0 1 0.236 76 210 2 1 0 3.280
73 185 1 0 1 0.245 73 195 1 0 1 0.243
73 200 2 1 0 4.800 75 205 1 0 0 3.700
74 195 1 0 1 0.276 73 175 1 1 0 4.650
75 195 1 0 0 3.660 73 190 2 1 1 0.238
72 185 2 1 1 0.300 74 185 3 1 0 4.070
75 190 1 0 1 0.239 72 190 3 0 1 0.254
76 200 1 0 0 3.380 73 210 1 0 0 3.290
76 180 2 1 0 3.290 71 195 1 0 1 0.244
72 175 2 1 1 0.290 71 166 1 0 1 0.274
76 195 2 1 0 4.990 71 185 1 1 0 3.730
68 175 2 0 1 0.283 73 160 1 0 0 4.760
73 185 1 0 1 0.271 74 170 2 1 1 0.271
69 160 1 0 1 0.225 76 185 1 0 0 2.840
76 211 3 0 1 0.282 71 155 3 0 1 0.251
77 190 3 0 1 0.212 76 190 1 0 0 3.280
74 195 1 0 1 0.262 71 160 3 0 1 0.270
75 200 1 0 0 3.940 70 155 3 0 1 0.261
73 207 3 0 1 0.251
Chapter 2

Two Sample Problems

2.1 Introduction

Let $X_1, \ldots, X_{n_1}$ be a random sample with common distribution function $F(x)$ and density function $f(x)$. Let $Y_1, \ldots, Y_{n_2}$ be another random sample, independent of the first, with common distribution function $G(x)$ and density $g(x)$. We will call this the general model throughout this chapter. A natural null hypothesis is $H_0: F(x) = G(x)$. In this chapter we will consider rank and sign tests of this hypothesis. A general alternative to $H_0$ is $H_A: F(x) \ne G(x)$ for some $x$. Except for Section 2.10 on the scale model, we will generally be concerned with alternative models where one distribution is stochastically larger than the other; for example, the alternative that $G$ is stochastically larger than $F$, which can be expressed as $H_A: G(x) \le F(x)$ with a strict inequality for some $x$. This family of alternatives includes the location model, described next, and the Lehmann alternative models discussed in Section 2.7, which are used in survival analysis.

As in Chapter 1, the location models will be of primary interest. For these models $G(x) = F(x - \Delta)$ for some parameter $\Delta$. Thus the parameter $\Delta$ represents a shift in location between the two distributions. It can be expressed as $\Delta = \theta_Y - \theta_X$, where $\theta_Y$ and $\theta_X$ are the medians of the distributions $G$ and $F$, or equivalently as $\Delta = \mu_Y - \mu_X$ where, provided they exist, $\mu_Y$ and $\mu_X$ are the means of $G$ and $F$. In the location problem the null hypothesis becomes $H_0: \Delta = 0$. In addition to tests of this hypothesis, we will develop estimates and confidence intervals for $\Delta$. We will call this the location model throughout this chapter and we will show that this is a generalization of the location problem defined in Chapter 1.

As in Chapter 1 with the one-sample problems, for the two-sample problems we offer the reader computational R functions which do the computation for the rank-based analyses discussed in this chapter.
2.2 Geometric Motivation

In this section, we work with the location model described above. As in Chapter 1, we derive sign and rank-based tests and estimates from a geometric point of view. As we shall show, their development is analogous to that of least squares procedures in that other norms are used in place of the least squares Euclidean norm. In order to do this we place the problem into the context of a linear model. This will facilitate our geometric development and will also serve as an introduction to Chapter 3, linear models.

Let $\mathbf{Z}' = (X_1, \ldots, X_{n_1}, Y_1, \ldots, Y_{n_2})$ denote the vector of all observations; let $n = n_1 + n_2$ denote the total sample size; and let
\[ c_i = \begin{cases} 0 & \text{if } 1 \leq i \leq n_1 \\ 1 & \text{if } n_1 + 1 \leq i \leq n \end{cases}. \quad (2.2.1) \]
Then we can write the location model as
\[ Z_i = c_i \Delta + e_i, \quad 1 \leq i \leq n, \quad (2.2.2) \]
where $e_1, \ldots, e_n$ are iid with distribution function $F(x)$. Let $\mathbf{C} = [c_i]$ denote the $n \times 1$ design matrix and let $\Omega_{FULL}$ denote the column space of $\mathbf{C}$. We can express the location model as
\[ \mathbf{Z} = \mathbf{C}\Delta + \mathbf{e}, \quad (2.2.3) \]
where $\mathbf{e}' = (e_1, \ldots, e_n)$ is the $n \times 1$ vector of errors. Note that except for random error, the observations $\mathbf{Z}$ would lie in $\Omega_{FULL}$. Thus given a norm, we estimate $\Delta$ so that $\mathbf{C}\widehat{\Delta}$ minimizes the distance between $\mathbf{Z}$ and the subspace $\Omega_{FULL}$; i.e., $\mathbf{C}\widehat{\Delta}$ is the vector in $\Omega_{FULL}$ closest to $\mathbf{Z}$.

Before turning our attention to $\Delta$, however, we write the problem in terms of the geometry discussed in Chapter 1. Consider any location functional $T$ of the distribution of $e$. Let $\theta = T(F)$. Define the random variable $e^* = e - \theta$. Then the distribution function of $e^*$ is $F^*(x) = F(x + \theta)$ and its functional is $T(F^*) = 0$. Thus the model (2.2.3) can be expressed as
\[ \mathbf{Z} = \mathbf{1}\theta + \mathbf{C}\Delta + \mathbf{e}^*. \quad (2.2.4) \]
Note that this is a generalization of the location problem discussed in Chapter 1. From the last paragraph, the distribution function of $X_i$ can be expressed as $F(x) = F^*(x - \theta)$; hence, $T(F) = \theta$ is a location functional of $X_i$. Further, the distribution function of $Y_j$ can be written as $G(x) = F^*(x - (\theta + \Delta))$. Thus $T(G) = \theta + \Delta$ is a location functional of $Y_j$. Therefore, $\Delta$ is precisely the difference in location functionals between $X_i$ and $Y_j$. Furthermore, $\Delta$ does not depend on which location functional is used, and it will be called the shift parameter.

Let $\mathbf{b} = (\theta, \Delta)'$. Given a norm, we want to choose as our estimate of $\mathbf{b}$ a value $\widehat{\mathbf{b}}$ such that $[\mathbf{1}\ \mathbf{C}]\widehat{\mathbf{b}}$ minimizes the distance between the vector of observations $\mathbf{Z}$ and the column space $V$ of the matrix $[\mathbf{1}\ \mathbf{C}]$. Thus we can use the norms defined in Chapter 1 to estimate $\mathbf{b}$.
If, as an example, we select the $L_1$ norm, then our estimate of $\mathbf{b}$ minimizes
\[ D(\mathbf{b}) = \sum_{i=1}^{n} |Z_i - \theta - c_i \Delta|. \quad (2.2.5) \]
Differentiating $D$ with respect to $\theta$ and $\Delta$, respectively, and setting the resulting equations to 0, we obtain the equations
\[ \sum_{i=1}^{n_1} \operatorname{sgn}(X_i - \theta) + \sum_{j=1}^{n_2} \operatorname{sgn}(Y_j - \theta - \Delta) \doteq 0 \quad (2.2.6) \]
\[ \sum_{j=1}^{n_2} \operatorname{sgn}(Y_j - \theta - \Delta) \doteq 0. \quad (2.2.7) \]
Subtracting the second equation from the first, we get $\sum_{i=1}^{n_1} \operatorname{sgn}(X_i - \theta) \doteq 0$; hence, $\widehat{\theta} = \operatorname{med}\{X_i\}$. Substituting this into the second equation, we get $\widehat{\Delta} = \operatorname{med}\{Y_j\} - \widehat{\theta} = \operatorname{med}\{Y_j\} - \operatorname{med}\{X_i\}$; hence, $\widehat{\mathbf{b}} = (\operatorname{med}\{X_i\},\ \operatorname{med}\{Y_j\} - \operatorname{med}\{X_i\})'$. We will obtain inference based on the $L_1$ norm in Sections 2.6.1 and 2.6.2.
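These closed forms are immediate to verify numerically. A minimal R sketch with simulated samples (the data here are hypothetical):

# L1 estimates in the two-sample location model:
# theta-hat = med X_i and Delta-hat = med Y_j - med X_i.
set.seed(29)
x <- rnorm(15)              # sample from F
y <- rnorm(20, mean = 0.5)  # sample from G = F(. - Delta), Delta = 0.5
c(theta = median(x), delta = median(y) - median(x))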
If we select the $L_2$ norm then, as shown in Exercise 2.13.1, the LS estimate is $\widehat{\mathbf{b}} = (\overline{X}, \overline{Y} - \overline{X})'$. Another norm discussed in Chapter 1 was the weighted $L_1$ norm. In this case $\mathbf{b}$ is estimated by minimizing
\[ D(\mathbf{b}) = \sum_{i=1}^{n} R(|Z_i - \theta - c_i \Delta|)\,|Z_i - \theta - c_i \Delta|. \quad (2.2.8) \]
This estimate cannot be obtained in closed form; however, fast minimization algorithms for such problems are discussed later in Chapter 3.

In the initial statement of the problem, though, $\theta$ is a nuisance parameter and we are really interested in $\Delta$, the shift in location between the populations. Hence, we want to define distance in terms of norms which are invariant to $\theta$. The type of norm that is invariant to $\theta$ is a pseudo-norm, which we define next.
Definition 2.2.1. An operator $\|\cdot\|_*$ is called a pseudo-norm if it satisfies the following four conditions:
\[ \|\mathbf{u} + \mathbf{v}\|_* \leq \|\mathbf{u}\|_* + \|\mathbf{v}\|_* \quad \text{for all } \mathbf{u}, \mathbf{v} \in R^n \]
\[ \|\alpha\mathbf{u}\|_* = |\alpha|\,\|\mathbf{u}\|_* \quad \text{for all } \alpha \in R,\ \mathbf{u} \in R^n \]
\[ \|\mathbf{u}\|_* \geq 0 \quad \text{for all } \mathbf{u} \in R^n \]
\[ \|\mathbf{u}\|_* = 0 \text{ if and only if } u_1 = \cdots = u_n \]
Note that a regular norm satisfies the first three properties, but in lieu of the fourth property, the norm of a vector is 0 if and only if the vector is $\mathbf{0}$. The following inequalities establish the invariance of pseudo-norms to the parameter $\theta$:
\[ \|\mathbf{Z} - \mathbf{1}\theta - \mathbf{C}\Delta\|_* \leq \|\mathbf{Z} - \mathbf{C}\Delta\|_* + \|\mathbf{1}\theta\|_* = \|\mathbf{Z} - \mathbf{C}\Delta\|_* = \|\mathbf{Z} - \mathbf{1}\theta - \mathbf{C}\Delta + \mathbf{1}\theta\|_* \leq \|\mathbf{Z} - \mathbf{1}\theta - \mathbf{C}\Delta\|_*. \]
Hence, $\|\mathbf{Z} - \mathbf{1}\theta - \mathbf{C}\Delta\|_* = \|\mathbf{Z} - \mathbf{C}\Delta\|_*$.

Given a pseudo-norm, denote the associated dispersion function by $D_*(\Delta) = \|\mathbf{Z} - \mathbf{C}\Delta\|_*$. It follows from the above properties of a pseudo-norm that $D_*(\Delta)$ is a non-negative, continuous, and convex function of $\Delta$.
We next develop an inference which includes estimation of $\Delta$ and tests of hypotheses concerning $\Delta$ for a general pseudo-norm. As an estimate of the shift parameter $\Delta$, we choose a value $\widehat{\Delta}$ which solves
\[ \widehat{\Delta} = \operatorname{Argmin} D_*(\Delta) = \operatorname{Argmin} \|\mathbf{Z} - \mathbf{C}\Delta\|_*; \quad (2.2.9) \]
i.e., $\mathbf{C}\widehat{\Delta}$ minimizes the distance between $\mathbf{Z}$ and $\Omega_{FULL}$. Another way of defining $\widehat{\Delta}$ is as the stationary point of the gradient of the pseudo-norm. Define the function $S_*$ by
\[ S_*(\Delta) = -\nabla \|\mathbf{Z} - \mathbf{C}\Delta\|_*, \quad (2.2.10) \]
where $\nabla$ denotes the gradient of $\|\mathbf{Z} - \mathbf{C}\Delta\|_*$ with respect to $\Delta$. Because $D_*(\Delta)$ is convex, it follows immediately that
\[ S_*(\Delta) \text{ is nonincreasing in } \Delta. \quad (2.2.11) \]
Hence $\widehat{\Delta}$ is such that
\[ S_*(\widehat{\Delta}) \doteq 0. \quad (2.2.12) \]
Given a location functional $\theta = T(F)$, i.e. Model (2.2.4), once $\Delta$ has been estimated we can base an estimate of $\theta$ on the residuals $Z_i - \widehat{\Delta} c_i$. For example, if we chose the median as our location functional, then we could use the median of the residuals to estimate it. We will discuss this in more detail for general linear models in Chapter 3.

Next consider the hypotheses
\[ H_0\colon \Delta = 0 \quad \text{versus} \quad H_A\colon \Delta \neq 0. \quad (2.2.13) \]
The closer $S_*(0)$ is to 0, the more plausible is the hypothesis $H_0$. More formally, we define the gradient test of $H_0$ versus $H_A$ by the rejection rule,
\[ \text{Reject } H_0 \text{ in favor of } H_A \text{ if } S_*(0) \leq k \text{ or } S_*(0) \geq l, \]
where the critical values $k$ and $l$ depend on the null distribution of $S_*(0)$. Typically, the null distribution of $S_*(0)$ is symmetric about 0 and $k = -l$. The reduction in dispersion test is given by
\[ \text{Reject } H_0 \text{ in favor of } H_A \text{ if } D_*(0) - D_*(\widehat{\Delta}) \geq m, \]
where the critical value $m$ is determined by the null distribution of the test statistic. In this chapter, as in Chapter 1, we will be concerned with the gradient test, while in Chapter 3 we will use the reduction in dispersion test. A confidence interval for $\Delta$ of confidence $(1-\alpha)100\%$ is the interval $\{\Delta\colon k < S_*(\Delta) < l\}$ and
\[ 1 - \alpha = P_\Delta[k < S_*(\Delta) < l]. \quad (2.2.14) \]
Since $D_*(\Delta)$ is convex, $S_*(\Delta)$ is nonincreasing and we have
\[ \widehat{\Delta}_L = \inf\{\Delta\colon S_*(\Delta) < l\} \quad \text{and} \quad \widehat{\Delta}_U = \sup\{\Delta\colon S_*(\Delta) > k\}; \quad (2.2.15) \]
compare (1.3.10). Often we will be able to invert $k < S_*(\Delta) < l$ to find an explicit formula for the upper and lower end points.

We will discuss a large class of general pseudo-norms in Section 2.5, but now we present the pseudo-norms that yield the pooled t-test and the Mann-Whitney-Wilcoxon test.
2.2.1 Least Squares (LS) Analysis

The traditional analysis is based on the squared pseudo-norm given by
\[ \|\mathbf{u}\|^2_{LS} = \sum_{i=1}^{n}\sum_{j=1}^{n} (u_i - u_j)^2, \quad \mathbf{u} \in R^n. \quad (2.2.16) \]
It follows (see Exercise 2.13.1) that
\[ -\nabla \|\mathbf{Z} - \mathbf{C}\Delta\|^2_{LS} = 4 n_1 n_2 (\overline{Y} - \overline{X} - \Delta); \]
hence the classical estimate is $\widehat{\Delta}_{LS} = \overline{Y} - \overline{X}$. Eliminating the constant factor $4 n_1 n_2$, the classical test is based on the statistic
\[ S_{LS}(0) = \overline{Y} - \overline{X}. \]
As shown in Exercise 2.13.1, standardizing $S_{LS}$ results in the two-sample pooled t-statistic. An approximate confidence interval for $\Delta$ is given by
\[ \overline{Y} - \overline{X} \pm t_{(\alpha/2,\, n_1+n_2-2)}\, \widehat{\sigma} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, \]
where $\widehat{\sigma}$ is the usual pooled estimate of the common standard deviation. This confidence interval is exact if $e_i$ has a normal distribution. Asymptotically, we replace $t_{(\alpha/2,\, n_1+n_2-2)}$ by $z_{\alpha/2}$. The test is asymptotically distribution free.
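In R this LS analysis is available through the base function t.test; a minimal sketch, reusing the hypothetical samples x and y from the sketch above:

# Pooled two-sample t analysis: estimate ybar - xbar and the
# confidence interval based on the pooled variance estimate.
t.test(y, x, var.equal = TRUE, conf.level = 0.95)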
2.2.2 Mann-Whitney-Wilcoxon (MWW) Analysis

The rank based analysis is based on the pseudo-norm defined by
\[ \|\mathbf{u}\|_R = \sum_{i=1}^{n}\sum_{j=1}^{n} |u_i - u_j|, \quad \mathbf{u} \in R^n. \quad (2.2.17) \]
Note that this pseudo-norm is the $L_1$-norm based on the differences between the components and that it is the second term of expression (1.3.20), which defines the norm of the signed rank analysis of Chapter 1. Note further that this pseudo-norm differs from the least squares pseudo-norm in that the square root is taken inside the double summation. In Exercise 2.13.2 the reader is asked to show that this indeed is a pseudo-norm and that, further, it can be written in terms of ranks as
\[ \|\mathbf{u}\|_R = 4 \sum_{i=1}^{n} \left( R(u_i) - \frac{n+1}{2} \right) u_i. \]
From (2.2.17), it follows that the negative of the MWW gradient is
\[ -\nabla \|\mathbf{Z} - \mathbf{C}\Delta\|_R = 2 \sum_{i=1}^{n_1}\sum_{j=1}^{n_2} \operatorname{sgn}(Y_j - X_i - \Delta). \]
Our estimate of $\Delta$ is a value which makes the gradient zero; that is, which makes half of the differences positive and the other half negative. Thus the rank based estimate of $\Delta$ is
\[ \widehat{\Delta}_R = \operatorname{med}_{i,j}\{Y_j - X_i\}. \quad (2.2.18) \]
This pseudo-norm estimate is often called the Hodges-Lehmann estimate of shift for the two sample problem (Hodges and Lehmann, 1963). As we show in Section 2.4.4, $\widehat{\Delta}_R$ has an approximate normal distribution with mean $\Delta$ and standard deviation $\tau\sqrt{(1/n_1) + (1/n_2)}$, where the scale parameter $\tau$ is given in display (2.4.22).

From the gradient we define
\[ S_R(\Delta) = \sum_{i=1}^{n_1}\sum_{j=1}^{n_2} \operatorname{sgn}(Y_j - X_i - \Delta). \quad (2.2.19) \]
Next define
\[ S^+_R(\Delta) = \#(Y_j - X_i > \Delta). \quad (2.2.20) \]
Note that we have (with probability one) that $S_R(\Delta) = 2 S^+_R(\Delta) - n_1 n_2$. The statistic $S^+_R = S^+_R(0)$, originally proposed by Mann and Whitney (1947), will be more convenient to use. The gradient test for the hypotheses (2.2.13) is
\[ \text{Reject } H_0 \text{ in favor of } H_A \text{ if } S^+_R \leq k \text{ or } S^+_R \geq n_1 n_2 - k, \]
where $k$ is chosen by $P_0(S^+_R \leq k) = \alpha/2$. We show in Section 2.4 that the test statistic is distribution free under $H_0$ and that, further, it has an asymptotic normal distribution with mean $n_1 n_2/2$ and standard deviation $\sqrt{n_1 n_2 (n_1 + n_2 + 1)/12}$ under $H_0$. Hence, an asymptotic level $\alpha$ test rejects $H_0$ in favor of $H_A$ if
\[ |z| > z_{\alpha/2}, \quad \text{where } z = \frac{S^+_R - (n_1 n_2/2)}{\sqrt{n_1 n_2 (n_1 + n_2 + 1)/12}}. \quad (2.2.21) \]
As shown in Section 2.4.2, the $(1-\alpha)100\%$ MWW confidence interval for $\Delta$ is given by
\[ [D_{(k+1)},\, D_{(n_1 n_2 - k)}), \quad (2.2.22) \]
where $k$ is such that $P_0[S^+_R \leq k] = \alpha/2$ and $D_{(1)} \leq \cdots \leq D_{(n_1 n_2)}$ denote the ordered $n_1 n_2$ differences $Y_j - X_i$. It follows from the asymptotic null distribution of $S^+_R$ that $k$ can be approximated as
\[ k \doteq \frac{n_1 n_2}{2} - \frac{1}{2} - z_{\alpha/2}\sqrt{\frac{n_1 n_2 (n+1)}{12}}. \]
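These formulas translate directly into a few lines of R. The following minimal sketch (the helper name mwwci is ours) computes $S^+_R$, the Hodges-Lehmann estimate, and the interval (2.2.22) with $k$ given by the approximation above:

# MWW analysis from the n1*n2 pairwise differences Y_j - X_i.
mwwci <- function(x, y, alpha = 0.05) {
  D  <- sort(outer(y, x, "-"))   # ordered differences D_(1), ..., D_(n1 n2)
  n1 <- length(x); n2 <- length(y); n <- n1 + n2
  SR <- sum(D > 0)               # S_R^+(0) = #(Y_j - X_i > 0)
  k  <- floor(n1 * n2 / 2 - 0.5 - qnorm(1 - alpha / 2) * sqrt(n1 * n2 * (n + 1) / 12))
  list(SR = SR, estimate = median(D), ci = c(D[k + 1], D[n1 * n2 - k]))
}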
A rank formulation of the MWW test statistic $S^+_R(\Delta)$ will also prove useful. Letting $R(u_i)$ denote the rank of $u_i$ among $u_1, \ldots, u_n$, we can write
\[ \sum_{j=1}^{n_2} R(Y_j - \Delta) = \sum_{j=1}^{n_2} \left[ \#_i(X_i < Y_j - \Delta) + \#_i(Y_i - \Delta \leq Y_j - \Delta) \right] = \#(Y_j - X_i > \Delta) + \frac{n_2(n_2+1)}{2}. \]
Defining
\[ W(\Delta) = \sum_{i=1}^{n_2} R(Y_i - \Delta), \quad (2.2.23) \]
we thus have the relationship that
\[ S^+_R(\Delta) = W(\Delta) - \frac{n_2(n_2+1)}{2}. \quad (2.2.24) \]
The test statistic $W(0)$ was proposed by Wilcoxon (1945). Since it is a linear function of the Mann-Whitney test statistic, it has identical statistical properties. We will refer to the statistic $S^+_R$ as the Mann-Whitney-Wilcoxon statistic and will label it as MWW.

As a final note on the geometry of the rank based analysis, reconsider the model with the location functional $\theta$ in it, i.e. (2.2.4). Suppose we obtain the R-estimate of $\Delta$, (2.2.18). Let $\widehat{\mathbf{e}}_R = \mathbf{Z} - \mathbf{C}\widehat{\Delta}_R$ denote the residuals. Next suppose we want to estimate the location parameter $\theta$ by using the weighted $L_1$ norm which was discussed for estimation of location in Section 1.7 of Chapter 1. Let $\|\mathbf{u}\|_{SR} = \sum_{j=1}^{n} j\,|u|_{(j)}$ denote this norm. For the residual vector $\widehat{\mathbf{e}}_R$, expression (1.3.10) of Chapter 1 is given by
\[ \|\widehat{\mathbf{e}}_R - \mathbf{1}\theta\|_{SR} = \sum_{i \leq j} \left| \frac{\widehat{e}_i + \widehat{e}_j}{2} - \theta \right| + (1/4)\|\widehat{\mathbf{e}}_R\|_R. \quad (2.2.25) \]
Hence the estimate of $\theta$ determined by this geometry is the Hodges-Lehmann estimate based on the residuals; i.e.,
\[ \widehat{\theta}_R = \operatorname{med}_{i \leq j}\left\{ \frac{\widehat{e}_i + \widehat{e}_j}{2} \right\}. \quad (2.2.26) \]
Asymptotic theory for the joint distribution of the random vector $(\widehat{\theta}_R, \widehat{\Delta}_R)'$ will be discussed in Chapter 3.
2.2.3 Computation

The Mann-Whitney-Wilcoxon analysis which we described above is easily computed using the RBR function twosampwil. This function returns the value of the Mann-Whitney-Wilcoxon test statistic $S^+_R = S^+_R(0)$, (2.2.20), the estimate $\widehat{\Delta}_R$, (2.2.18), the associated confidence interval (2.2.22), and comparison boxplots of the samples. Also, the R intrinsic function wilcox.test and the minitab command MANN compute this Mann-Whitney-Wilcoxon analysis.
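For example, with samples x and y, the base R call below reproduces the key pieces of the twosampwil output:

# Base R version of the MWW analysis:
out <- wilcox.test(y, x, conf.int = TRUE, conf.level = 0.95)
out$statistic  # S_R^+(0), the Mann-Whitney count #(Y_j - X_i > 0)
out$estimate   # Hodges-Lehmann estimate med{Y_j - X_i}, (2.2.18)
out$conf.int   # distribution-free confidence interval (2.2.22)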
2.3 Examples

In this section we present two examples which illustrate the methods discussed in the last section. The calculations were performed by the RBR functions twosampwil and twosampt, which compute the Mann-Whitney-Wilcoxon and LS analyses, respectively. By convention, for each difference $Y_j - X_i = 0$, we add the value 1/2 to the test statistic $S^+_R$. Further, the returned p-value is calculated with the usual continuity correction. The estimate of $\tau$ and its standard error (SE) displayed in the results are given by expression (2.4.27), where a full discussion is given. The LS analysis, computed by twosampt, is based on the traditional pooled two-sample t-test.

Example 2.3.1. Quail Data

The data for this problem are drawn from a high volume drug screen designed to find compounds which reduce low density lipoprotein (LDL) cholesterol in quail; see McKean, Vidmar and Sievers (1989) for a discussion of this screen. For the purposes of the present example, we have taken the plasma LDL levels of one group of quail who were fed over a specified period of time a special diet mixed with a drug compound, and the LDL levels of a second group of quail who were fed the same special diet but without the drug compound over the same length of time. A completely randomized design was employed. We will refer to the first group as the treatment group and the second group as the control group. The data are displayed in Table 2.3.1. Let $\theta_C$ and $\theta_T$ denote the true median levels of LDL for the control and treatment populations, respectively. The parameter of interest is $\Delta = \theta_C - \theta_T$. We are interested in the alternative hypothesis that the treatment has been effective; hence the hypotheses are:
\[ H_0\colon \Delta = 0 \quad \text{versus} \quad H_A\colon \Delta > 0. \]
Table 2.3.1: Data for Quail Example
Control 64 49 54 64 97 66 76 44 71 89
70 72 71 55 60 62 46 77 86 71
Treated 40 31 50 48 152 44 74 38 81 64
The comparison boxplots returned by the RBR function twosampwil are found in Figure 2.3.1. Note that there is one outlier, the fifth observation of the treated group, which has the value 152. Outliers such as this were typical with most of the data in this study; see McKean et al. (1989). For the data at hand, the treated group appears to have lower LDL levels.

[Figure 2.3.1: Comparison Boxplots of Treatment and Control Quail LDL Levels. Boxplots of the Control and Treated samples on a common vertical LDL cholesterol scale (40 to 140).]
The analyses returned by the functions twosampwil and twosampt are given below. The Mann-Whitney-Wilcoxon test statistic has the value 134.5 with p-value 0.067, while the t-test statistic has value 0.557 with p-value 0.291. The MWW indicates, with marginal significance, that the treatment performed better than the placebo. The two sample t analysis was impaired by the outlier.

The Hodges-Lehmann estimate of $\Delta$, (2.2.18), is 14 and the 90% confidence interval is $(-2.0, 24.0)$. In contrast, the least squares estimate of shift is 5 and the corresponding 90% confidence interval is $(-10.25, 20.25)$.
> twosampwil(y,x,alt=1,alpha=.10,namex="Treated",namey="Control",
nameresp="LDL cholesterol")

Test of Delta = 0 Alternative selected is 1
Test Stat. S+ is 134.5 Standardized (z) Test-Stat. 1.495801 and p-vlaue 0.06735282

MWW estimate of the shift in location is 14 SE is 8.180836
90 % Confidence Interval is ( -2 , 24 )
Estimate of the scale parameter tau 21.12283

> twosampt(y,x,alt=1,alpha=.10,namex="Treated",namey="Control",
nameresp="LDL cholesterol")

Test of Delta = 0 Alternative selected is 1
Test Stat. ybar-xbar- 0 is 5 Standardized (t) Test-Stat. 0.5577585 and p-vlaue 0.2907209

Mean of y minus the mean of x is 5 SE is 8.964454
90 % Confidence Interval is ( -10.24971 , 20.24971 )
Estimate of the scale parameter sigma 23.14612
As noted above, these data were drawn from a high-speed drug screen to discover drug compounds which have the potential to reduce LDL cholesterol. In this screen, if a compound was at least marginally significant, the investigation of it would continue; otherwise it would be eliminated from further scrutiny. Hence, for this drug compound, the robust and LS analyses would result in different practical outcomes.
Example 2.3.2. Hendy-Charles Coin Data, continuation of Example 1.11.1

Recall that the 84% $L_1$ confidence intervals for the data are disjoint. Thus we reject the null hypothesis that the silver content is the same for the two mintings at the 5% level. We now apply the MWW test and confidence interval to this data and find the Hodges-Lehmann estimate of shift. If the tailweights of the underlying distributions are moderate, the MWW methods are more efficient.

The output from the RBR function twosampwil is:

> twosampwil(Fourth,First)

Test of Delta = 0 Alternative selected is 0
Test Stat. S+ is 61.5 Standardized (z) Test-Stat. 3.122611 and p-vlaue 0.001792544

MWW estimate of the shift in location is 1.1 SE is 0.2999926
95 % Confidence Interval is ( 0.6 , 1.7 )
Estimate of the scale parameter tau 0.5952794

Note that there is strong statistical evidence that the mintings are different. The Hodges-Lehmann estimate (2.2.18) is 1.1, which suggests that there is roughly a 1.1% decrease in the silver content from the first to the fourth mintings. The 95% confidence interval, (2.2.22), is (0.6, 1.7). Half the length of the confidence interval is 0.55, and this could be reported as the margin of error in estimating $\Delta$, the change in median silver content from the first to the fourth mintings. Hence we could report $1.1\% \pm 0.55\%$.
2.4 Inference Based on the Mann-Whitney-Wilcoxon

We next develop the theory for inference based on the Mann-Whitney-Wilcoxon statistic, including the test, the estimate, and the confidence interval. Although much of the development is for the location model, the general model will also be considered. We begin with testing.

2.4.1 Testing

Although the geometric motivation of the test statistic $S^+_R$ was derived under the location model, the test can be used for more general models. Recall that the general model is comprised of a random sample $X_1, \ldots, X_{n_1}$ with cdf $F(x)$ and a random sample $Y_1, \ldots, Y_{n_2}$ with cdf $G(x)$. For the discussion we select the hypotheses,
\[ H_0\colon F(x) = G(x), \text{ for all } x \quad \text{versus} \quad H_A\colon F(x) \geq G(x), \text{ with strict inequality for some } x. \quad (2.4.1) \]
Under this stochastically ordered alternative, $Y$ tends to dominate $X$; i.e., $P(Y > X) > 1/2$. Our rank-based decision rule is to reject $H_0$ in favor of $H_A$ if $S^+_R$ is too large, where $S^+_R = \#(Y_j - X_i > 0)$. Our immediate goal is to make this precise. What we discuss will of course hold for the other one-sided alternative $F(x) \leq G(x)$ and the two-sided alternative $F(x) \leq G(x)$ or $F(x) \geq G(x)$ as well. Furthermore, since the location model is a submodel of the general model, what holds for the general model will hold for it also. It will always be clear which set of hypotheses is being considered.

Under $H_0$, we first show that $S^+_R$ is distribution free and then show it is symmetrically distributed about $n_1 n_2/2$.
Theorem 2.4.1. Under the general null hypothesis in (2.4.1), $S^+_R$ is distribution free.

Proof: Under the null hypothesis, the combined samples $X_1, \ldots, X_{n_1}, Y_1, \ldots, Y_{n_2}$ constitute a random sample of size $n$ from the distribution function $F(x)$. Hence any assignment of $n_2$ ranks from the set of integers $\{1, \ldots, n\}$ to $Y_1, \ldots, Y_{n_2}$ is equilikely; i.e., has probability $\binom{n}{n_2}^{-1}$, independent of $F$.

Theorem 2.4.2. Under $H_0$ in (2.4.1), the distribution of $S^+_R$ is symmetric about $n_1 n_2/2$.
Proof: Under $H_0$ in (2.4.1), $L(Y_j - X_i) = L(X_i - Y_j)$ for all $i, j$; see Exercise 2.13.3. Thus if $S^-_R = \#(X_i - Y_j > 0)$ then, under $H_0$, $L(S^+_R) = L(S^-_R)$. Since $S^-_R = n_1 n_2 - S^+_R$, we have the following string of equalities, which proves the result:
\[ P\left[S^+_R \geq \frac{n_1 n_2}{2} + x\right] = P\left[n_1 n_2 - S^-_R \geq \frac{n_1 n_2}{2} + x\right] = P\left[S^-_R \leq \frac{n_1 n_2}{2} - x\right] = P\left[S^+_R \leq \frac{n_1 n_2}{2} - x\right]. \]
Hence for the hypotheses (2.4.1), a level $\alpha$ test based on $S^+_R$ would reject $H_0$ if $S^+_R \geq c_{\alpha, n_1, n_2}$, where $P_{H_0}[S^+_R \geq c_{\alpha, n_1, n_2}] = \alpha$. From the symmetry, note that the lower critical point is given by $n_1 n_2 - c_{\alpha, n_1, n_2}$.
Although $S^+_R$ is distribution free under the null hypothesis, its distribution cannot be obtained in closed form. The next theorem gives a recursive formula for its distribution. The proof can be found in Exercise 2.13.4; see also Hettmansperger (1984, p. 136-137).

Theorem 2.4.3. Under the general null hypothesis in (2.4.1), let $P_{n_1,n_2}(k) = P_{H_0}[S^+_R = k]$. Then
\[ P_{n_1,n_2}(k) = \frac{n_2}{n_1 + n_2} P_{n_1,\, n_2 - 1}(k - n_1) + \frac{n_1}{n_1 + n_2} P_{n_1 - 1,\, n_2}(k), \]
where $P_{n_1,n_2}(k)$ satisfies the boundary conditions $P_{i,j}(k) = 0$ if $k < 0$, and $P_{i,0}(k)$ and $P_{0,j}(k)$ are 1 or 0 as $k = 0$ or $k \neq 0$.

Based on these recursion formulas, tables of the null distribution can be obtained readily, which then can be used to obtain the critical values for the rank based test. Alternatively, the asymptotic null distribution of $S^+_R$ can be used to determine approximate critical values. This asymptotic test will be discussed later; see Theorem 2.4.9.
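The recursion is easy to program. A minimal (unmemoized) R sketch follows; the function name pmww is ours, and its values agree with the null distribution tabled by dwilcox in base R:

# P_{n1,n2}(k) = P_H0[S_R^+ = k] via the recursion of Theorem 2.4.3.
pmww <- function(n1, n2, k) {
  if (k < 0) return(0)
  if (n1 == 0 || n2 == 0) return(as.numeric(k == 0))  # boundary conditions
  (n2 / (n1 + n2)) * pmww(n1, n2 - 1, k - n1) +
    (n1 / (n1 + n2)) * pmww(n1 - 1, n2, k)
}
sapply(0:4, function(k) pmww(2, 2, k))  # 1/6 1/6 2/6 1/6 1/6
dwilcox(0:4, 2, 2)                      # same values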
We next derive the mean and variance of $S^+_R$ under the three models:

(a) the general model where $X$ has distribution function $F(x)$ and $Y$ has distribution function $G(x)$;

(b) the location model where $G(x) = F(x - \Delta)$;

(c) and the null model in which $F(x) = G(x)$.

Of course, from Theorem 2.4.2, the null mean of $S^+_R$ is $n_1 n_2/2$. In our derivation we repeatedly make use of the fact that if $H$ is the distribution function of a random variable $Z$, then the random variable $H(Z)$ has a uniform distribution over the interval $(0,1)$; see Exercise 2.13.5.

Theorem 2.4.4. Assuming that $X_1, \ldots, X_{n_1}$ are iid $F(x)$ and $Y_1, \ldots, Y_{n_2}$ are iid $G(x)$, and that these two samples are independent of one another, the means of $S^+_R$ under the three models (a)-(c) are:

(a) $E[S^+_R] = n_1 n_2 [1 - E[G(X)]] = n_1 n_2 E[F(Y)]$

(b) $E[S^+_R] = n_1 n_2 [1 - E[F(X - \Delta)]] = n_1 n_2 E[F(X + \Delta)]$

(c) $E[S^+_R] = \dfrac{n_1 n_2}{2}$.
Proof: We shall prove only (a), since results (b) and (c) follow directly from it. We can write $S^+_R$ in terms of indicator functions as
\[ S^+_R = \sum_{i=1}^{n_1}\sum_{j=1}^{n_2} I(Y_j - X_i > 0), \quad (2.4.2) \]
where $I(t > 0)$ is 1 or 0 for $t > 0$ or $t \leq 0$, respectively. Let $Y$ have distribution function $G$, let $X$ have distribution function $F$, and let $X$ and $Y$ be independent. Then
\[ E[I(Y - X > 0)] = E[P[Y > X \mid X]] = E[1 - G(X)] = E[F(Y)], \]
where the second equality follows from the independence of $X$ and $Y$. The results then follow.
Theorem 2.4.5. The variances of $S^+_R$ under the models (a)-(c) are:

(a) $\operatorname{Var}[S^+_R] = n_1 n_2 \{E[G(X)] - E^2[G(X)]\} + n_1 n_2 (n_1 - 1)\operatorname{Var}[F(Y)] + n_1 n_2 (n_2 - 1)\operatorname{Var}[G(X)]$

(b) $\operatorname{Var}[S^+_R] = n_1 n_2 \{E[F(X - \Delta)] - E^2[F(X - \Delta)]\} + n_1 n_2 (n_1 - 1)\operatorname{Var}[F(Y)] + n_1 n_2 (n_2 - 1)\operatorname{Var}[F(X - \Delta)]$

(c) $\operatorname{Var}[S^+_R] = \dfrac{n_1 n_2 (n+1)}{12}$.
Proof: Again only the result (a) will be obtained. Using the indicator formulation (2.4.2) of $S^+_R$, we have
\[ \operatorname{Var}[S^+_R] = \sum_{i=1}^{n_1}\sum_{j=1}^{n_2} \operatorname{Var}[I(Y_j - X_i > 0)] + \sum_{i=1}^{n_1}\sum_{j=1}^{n_2}\sum_{l=1}^{n_1}\sum_{k=1}^{n_2} \operatorname{Cov}[I(Y_j - X_i > 0),\, I(Y_k - X_l > 0)], \]
where the sums for the covariance terms are over all possible combinations except $(i,j) = (l,k)$. For the first term, note that the variance of $I(Y - X > 0)$ is
\[ \operatorname{Var}[I(Y > X)] = E[I(Y > X)] - E^2[I(Y > X)] = E[1 - G(X)] - E^2[1 - G(X)] = E[G(X)] - E^2[G(X)]. \]
This yields the first term in (a). For the covariance terms, note that a covariance is 0 unless either $j = k$ or $i = l$. This leads to the following two cases:

Case (i). For the covariance terms with $j = k$ and $i \neq l$, we need $E[I(Y > X_1)I(Y > X_2)]$, which is
\[ E[I(Y > X_1)I(Y > X_2)] = P[Y > X_1,\, Y > X_2] = E[P[Y > X_1,\, Y > X_2 \mid Y]] = E[P[Y > X_1 \mid Y]\, P[Y > X_2 \mid Y]] = E[F(Y)^2]. \]
There are $n_2$ ways to get a $j$ and $n_1(n_1 - 1)$ ways to get $i \neq l$; hence there are $n_1 n_2 (n_1 - 1)$ covariances of this form. This leads to the second term of (a).

Case (ii). The terms for the covariances where $i = l$ and $j \neq k$ follow similarly to Case (i). This leads to the third and final term of (a).
The last two theorems suggest that the random variable
\[ Z = \frac{S^+_R - \frac{n_1 n_2}{2}}{\sqrt{\frac{n_1 n_2 (n+1)}{12}}} \]
has an approximate $N(0,1)$ distribution under $H_0$. This follows from the next results, which yield the asymptotic distribution of $S^+_R$ under general alternatives as well as under the null hypothesis. We will obtain these results by projecting our statistic $S^+_R$ down onto a set of linear combinations of independent random variables. Then we can use central limit theory on the projection. See Hájek and Šidák (1967) for a discussion of this technique.

Let $T = T(Z_1, \ldots, Z_n)$ be a random variable based on a sample $Z_1, \ldots, Z_n$ such that $E[T] = 0$. Let
\[ p^*_k(x) = E[T \mid Z_k = x], \quad k = 1, \ldots, n. \]
Next define the random variable $T_p$ to be
\[ T_p = \sum_{k=1}^{n} p^*_k(Z_k). \quad (2.4.3) \]
In the next theorem we show that $T_p$ is the projection of $T$ onto the space of linear functions of $Z_1, \ldots, Z_n$. Note that unlike $T$, $T_p$ is a linear combination of independent random variables; hence, its asymptotic distribution is often easier to obtain than that of $T$. As the following projection theorem shows, it is in a sense the closest linear function of the form $\sum p_i(Z_i)$ to $T$.
Theorem 2.4.6. If $W = \sum_{i=1}^{n} p_i(Z_i)$ then $E[(T - W)^2]$ is minimized by taking $p_i(x) = p^*_i(x)$. Furthermore, $E[(T - T_p)^2] = \operatorname{Var}[T] - \operatorname{Var}[T_p]$.

Proof: First note that $E[p^*_k(Z_k)] = 0$. We have,
\[ E[(T - W)^2] = E\{[(T - T_p) - (W - T_p)]^2\} = E[(T - T_p)^2] + E[(W - T_p)^2] - 2E[(T - T_p)(W - T_p)]. \quad (2.4.4) \]
We can write one-half the cross product term as
\[ \sum_{i=1}^{n} E[(T - T_p)(p_i(Z_i) - p^*_i(Z_i))] = \sum_{i=1}^{n} E\big[E[(T - T_p)(p_i(Z_i) - p^*_i(Z_i)) \mid Z_i]\big] = \sum_{i=1}^{n} E\left[(p_i(Z_i) - p^*_i(Z_i))\, E\left[T - \sum_{j=1}^{n} p^*_j(Z_j) \,\Big|\, Z_i\right]\right]. \]
The conditional expectation can be written as
\[ (E[T \mid Z_i] - p^*_i(Z_i)) - \sum_{j \neq i} E[p^*_j(Z_j)] = 0 - 0 = 0. \]
Hence the cross-product term is zero, and therefore the left-hand side of expression (2.4.4) is minimized with respect to $W$ by taking $W = T_p$. Also, since this holds in particular for $W = 0$, we get
\[ E[T^2] = E[(T - T_p)^2] + E[T^2_p]. \]
Since both $T$ and $T_p$ have zero means, the second result of the theorem also follows.
From these results a strategy for obtaining the asymptotic distribution of $T$ is apparent. Namely, find the asymptotic distribution of its projection $T_p$ and then show that $\operatorname{Var}[T] - \operatorname{Var}[T_p] \to 0$ as $n \to \infty$. This implies that $T$ and $T_p$ have the same asymptotic distribution; see Exercise 2.13.7. We shall apply this strategy to get the asymptotic distribution of the rank based methods. As a first step we obtain the projection of $S^+_R - E[S^+_R]$ under the general model.
Theorem 2.4.7. Under the general model, the projection of the random variable $S^+_R - E[S^+_R]$ is
\[ T_p = n_1 \sum_{j=1}^{n_2} (F(Y_j) - E[F(Y_j)]) - n_2 \sum_{i=1}^{n_1} (G(X_i) - E[G(X_i)]). \quad (2.4.5) \]
Proof: Define the $n$ random variables $Z_1, \ldots, Z_n$ by
\[ Z_i = \begin{cases} X_i & \text{if } 1 \leq i \leq n_1 \\ Y_{i - n_1} & \text{if } n_1 + 1 \leq i \leq n \end{cases}. \]
We have,
\[ p^*_k(x) = E[S^+_R \mid Z_k = x] - E[S^+_R] = \sum_{i=1}^{n_1}\sum_{j=1}^{n_2} E[I(Y_j > X_i) \mid Z_k = x] - E[S^+_R]. \quad (2.4.6) \]
There are two cases, depending on whether $1 \leq k \leq n_1$ or $n_1 + 1 \leq k \leq n_1 + n_2 = n$.

Case (1). Suppose $1 \leq k \leq n_1$. Then the conditional expectation in expression (2.4.6), depending on the value of $i$, becomes

(a) $i \neq k$: $E[I(Y_j > X_i) \mid X_k = x] = E[I(Y_j > X_i)] = P[Y > X]$

(b) $i = k$: $E[I(Y_j > X_i) \mid X_i = x] = P[Y > X \mid X = x] = 1 - G(x)$.

Hence, in this case,
\[ p^*_k(x) = n_2(n_1 - 1)P[Y > X] + n_2(1 - G(x)) - E[S^+_R]. \]
Case (2). Next suppose that $n_1 + 1 \leq k \leq n$. Then the conditional expectation in expression (2.4.6), depending on the value of $j$, becomes

(a) $j \neq k$: $E[I(Y_j > X_i) \mid Y_k = x] = P[Y > X]$

(b) $j = k$: $E[I(Y_j > X_i) \mid Y_j = x] = F(x)$.

Hence, in this case,
\[ p^*_k(x) = n_1(n_2 - 1)P[Y > X] + n_1 F(x) - E[S^+_R]. \]
Combining these results we get
\[ T_p = \sum_{i=1}^{n_1} p^*_i(X_i) + \sum_{j=1}^{n_2} p^*_j(Y_j) = n_1 n_2 (n_1 - 1)P[Y > X] + n_2 \sum_{i=1}^{n_1} (1 - G(X_i)) + n_1 n_2 (n_2 - 1)P[Y > X] + n_1 \sum_{j=1}^{n_2} F(Y_j) - n E[S^+_R]. \]
This can be simplified by noting that
\[ P(Y > X) = E[P(Y > X \mid X)] = E[1 - G(X)] \]
or, similarly,
\[ P(Y > X) = E[F(Y)]. \]
From (a) of Theorem 2.4.4,
\[ E[S^+_R] = n_1 n_2 (1 - E[G(X)]) = n_1 n_2 P(Y > X). \]
Substituting these three results into (2.4.6) we get the desired result.

An immediate outcome is
Corollary 2.4.1. Under the general model, if $T_p$ is given by (2.4.5) then
\[ \operatorname{Var}(T_p) = n^2_1 n_2 \operatorname{Var}(F(Y)) + n_1 n^2_2 \operatorname{Var}(G(X)). \]
From this it follows that $T_p$ should be standardized as
\[ T^*_p = \frac{1}{\sqrt{n\, n_1 n_2}}\, T_p. \]
In order to obtain the asymptotic distribution of $T_p$, and subsequently of $S^+_R$, we need the following assumption on the design (sample sizes),
\[ (D.1)\colon \quad \frac{n_i}{n} \to \lambda_i, \quad 0 < \lambda_i < 1. \quad (2.4.7) \]
This says that the sample sizes go to $\infty$ at the same rate. Note that $\lambda_1 + \lambda_2 = 1$. The asymptotic variance of $T^*_p$ is thus
\[ \operatorname{Var}(T^*_p) \to \lambda_1 \operatorname{Var}(F(Y)) + \lambda_2 \operatorname{Var}(G(X)). \]
We first want to obtain the asymptotic distribution under general alternatives. In order to do this we need an assumption concerning the ranges of $X$ and $Y$. The support of a continuous random variable with distribution function $H$ and density $h$ is defined to be the set $\{x\colon h(x) > 0\}$, which is denoted by $\mathcal{S}(H)$.

Our second assumption states that the intersection of the supports of $F$ and $G$ has a nonempty interior; that is,
\[ (E.3)\colon \quad \text{There is an open interval } I \text{ such that } I \subset \mathcal{S}(F) \cap \mathcal{S}(G). \quad (2.4.8) \]
Note that the asymptotic variance of $T^*_p$ is not zero under (E.3).

We are now in the position to find the asymptotic distribution of $T^*_p$.

Theorem 2.4.8. Under the general model and the assumptions (D.1) and (E.3), $T^*_p$ has an asymptotic $N(0,\, \lambda_1 \operatorname{Var}(F(Y)) + \lambda_2 \operatorname{Var}(G(X)))$ distribution.

Proof: By (2.4.5) we can write
\[ T^*_p = \sqrt{\frac{n_1}{n n_2}} \sum_{j=1}^{n_2} (F(Y_j) - E[F(Y_j)]) - \sqrt{\frac{n_2}{n n_1}} \sum_{i=1}^{n_1} (G(X_i) - E[G(X_i)]). \quad (2.4.9) \]
Note that both sums on the right side of expression (2.4.9) are composed of independent and identically distributed random variables and that the sums are independent of one another. The result then follows immediately by applying the simple central limit theorem to each sum.

This is the key result we need in order to obtain the asymptotic distribution of our test statistic $S^+_R$. We first obtain the result under the general model and then under the null hypothesis. As we will see, both results are immediate.
Theorem 2.4.9. Under the general model and the conditions (E.3) and (D.1), the random variable
\[ \frac{S^+_R - E[S^+_R]}{\sqrt{\operatorname{Var}(S^+_R)}} \]
has a limiting $N(0,1)$ distribution.

Proof: By the last theorem and Theorem 2.4.6, we need only show that the difference in the variances of $S^+_R/\sqrt{n n_1 n_2}$ and $T^*_p$ goes to 0 as $n \to \infty$. Note that
\[ \operatorname{Var}\left(\frac{1}{\sqrt{n n_1 n_2}} S^+_R\right) = \frac{n_1 n_2}{n n_1 n_2}\{E[G(X)] - E[G(X)]^2\} + \frac{n_1 n_2 (n_1 - 1)}{n n_1 n_2}\operatorname{Var}(F(Y)) + \frac{n_1 n_2 (n_2 - 1)}{n n_1 n_2}\operatorname{Var}(G(X)); \]
hence, $\operatorname{Var}(T^*_p) - \operatorname{Var}(S^+_R/\sqrt{n n_1 n_2}) \to 0$ and the result follows from Exercise 2.13.7.
The asymptotic distribution of the test statistic under the null hypothesis follows immediately from this theorem. We record it in the next corollary.

Corollary 2.4.2. Under $H_0\colon F(x) = G(x)$ and (D.1) only, the test statistic $S^+_R$ is approximately $N\left(\frac{n_1 n_2}{2}, \frac{n_1 n_2 (n+1)}{12}\right)$.

Therefore an asymptotic size $\alpha$ test for $H_0\colon F(x) = G(x)$ versus $H_A\colon F(x) \neq G(x)$ is to reject $H_0$ if $|z| \geq z_{\alpha/2}$, where
\[ z = \frac{S^+_R - \frac{n_1 n_2}{2}}{\sqrt{\frac{n_1 n_2 (n+1)}{12}}} \quad \text{and} \quad 1 - \Phi(z_{\alpha/2}) = \alpha/2. \]
Since we approximate a discrete random variable with a continuous one, we think it is advisable in cases of small samples to use a continuity correction. Fix and Hodges (1955) give an Edgeworth approximation to the distribution of $S^+_R$ and Bickel (1974) discusses the error of this approximation.
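A minimal R sketch of this asymptotic test, with a continuity correction of 1/2 applied toward the null mean (the function name mwwz is ours):

# Asymptotic size-alpha test of F = G versus F != G, Corollary 2.4.2.
mwwz <- function(x, y) {
  n1 <- length(x); n2 <- length(y); n <- n1 + n2
  SR <- sum(outer(y, x, "-") > 0)
  mu <- n1 * n2 / 2
  z  <- (SR - mu - 0.5 * sign(SR - mu)) / sqrt(n1 * n2 * (n + 1) / 12)
  c(z = z, p.value = 2 * (1 - pnorm(abs(z))))
}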
Since the standard normal distribution function, $\Phi$, is continuous on the entire real line, we can strengthen the convergence in Theorem 2.4.9 to uniform convergence; that is, the distribution function of the standardized MWW converges uniformly to $\Phi$. Using this, it is not hard to show that the standardized critical values of the MWW converge to their counterparts at the standard normal. Thus if $c_{\alpha,n}$ is the MWW critical value defined by $\alpha = P_{H_0}[S^+_R \geq c_{\alpha,n}]$, then
\[ \frac{c_{\alpha,n} - \frac{n_1 n_2}{2}}{\sqrt{\frac{n_1 n_2 (n+1)}{12}}} \to z_\alpha, \quad (2.4.10) \]
where $1 - \alpha = \Phi(z_\alpha)$; see Exercise 2.13.8 for details. This result will prove useful in the next section.
We now consider when the test based on $S^+_R$ is consistent. Consider the general set up; i.e., $X_1, \ldots, X_{n_1}$ is a random sample with distribution function $F(x)$ and $Y_1, \ldots, Y_{n_2}$ is a random sample with distribution function $G(x)$. Consider the hypotheses
\[ H_0\colon F = G \quad \text{versus} \quad H_{A1}\colon F(x) \geq G(x) \text{ with } F(x_0) > G(x_0) \text{ for some } x_0 \in \operatorname{Int}(\mathcal{S}(F) \cap \mathcal{S}(G)). \quad (2.4.11) \]
Such an alternative is called a stochastically ordered alternative. The next theorem shows that the MWW test statistic is consistent for this alternative. Likewise it is consistent for the other one sided stochastically ordered alternative with $F$ and $G$ interchanged, $H_{A2}$, and also for the two sided alternative which consists of the union of $H_{A1}$ and $H_{A2}$. These results imply that the MWW test is consistent for location alternatives, provided $F$ and $G$ have overlapping support. As Exercise 2.13.9 shows, it will also be consistent when one support is shifted to the right of the other support.

Theorem 2.4.10. Suppose that the assumptions (D.1), (2.4.7), and (E.3), (2.4.8), hold. Under the stochastic ordering alternatives given above, the test based on $S^+_R$ is consistent.
Proof: Assume the stochastic ordering alternative $H_{A1}$, (2.4.11). For an arbitrary level $\alpha$, select the critical level $c_\alpha$ such that the test that rejects $H_0$ if $S^+_R \geq c_\alpha$ has asymptotic level $\alpha$. We want to show that the power of the test goes to 1 as $n \to \infty$. Since $F(x_0) > G(x_0)$ for some point $x_0$ in the interior of $\mathcal{S}(F) \cap \mathcal{S}(G)$, there exists an interval $N$ such that $F(x) > G(x)$ on $N$. Hence
\[ E_{H_A}[G(X)] = \int_N G(y) f(y)\, dy + \int_{N^c} G(y) f(y)\, dy < \int_N F(y) f(y)\, dy + \int_{N^c} F(y) f(y)\, dy = \frac{1}{2}. \quad (2.4.12) \]
The power of the test is given by
\[ P_{H_A}\left[S^+_R \geq c_\alpha\right] = P_{H_A}\left[\frac{S^+_R - E_{H_A}(S^+_R)}{\sqrt{V_{H_A}(S^+_R)}} \geq \frac{c_\alpha - (n_1 n_2/2)}{\sqrt{V_{H_A}(S^+_R)}} + \frac{(n_1 n_2/2) - E_{H_A}(S^+_R)}{\sqrt{V_{H_A}(S^+_R)}}\right]. \]
Note by (2.4.10) that
\[ \frac{c_\alpha - (n_1 n_2/2)}{\sqrt{V_{H_A}(S^+_R)}} = \frac{c_\alpha - (n_1 n_2/2)}{\sqrt{V_{H_0}(S^+_R)}}\, \frac{\sqrt{V_{H_0}(S^+_R)}}{\sqrt{V_{H_A}(S^+_R)}} \to \kappa z_\alpha, \]
where $\kappa$ is a real number (since the variances are of the same order). But by (2.4.12)
\[ \frac{(n_1 n_2/2) - E_{H_A}(S^+_R)}{\sqrt{V_{H_A}(S^+_R)}} = \frac{(n_1 n_2/2) - n_1 n_2 [1 - E_{H_A}(G(X))]}{\sqrt{V_{H_A}(S^+_R)}} = \frac{n_1 n_2 \left[-\frac{1}{2} + E_{H_A}(G(X))\right]}{\sqrt{V_{H_A}(S^+_R)}} \to -\infty. \]
By Theorem 2.4.9, under $H_A$ the random variable $\frac{S^+_R - E_{H_A}(S^+_R)}{\sqrt{V_{H_A}(S^+_R)}}$ converges in distribution to a standard normal variate. Since the convergence is uniform, it follows from the above limits that the power converges to 1. Hence the MWW test is consistent.
2.4.2 Confidence Intervals

Consider the location model (2.2.4). We next obtain a distribution free confidence interval for $\Delta$ by inverting the MWW test. As a first step we have the following result on the function $S^+_R(\Delta)$, (2.2.20):

Lemma 2.4.1. $S^+_R(\Delta)$ is a decreasing step function of $\Delta$ which steps down by 1 at each difference $Y_j - X_i$. Its maximum is $n_1 n_2$ and its minimum is 0.

Proof: Let $D_{(1)} \leq \cdots \leq D_{(n_1 n_2)}$ denote the ordered $n_1 n_2$ differences $Y_j - X_i$. The results follow immediately by writing $S^+_R(\Delta) = \#(D_{(i)} > \Delta)$.

Let $\alpha$ be given and choose $c_{\alpha/2}$ to be the lower $\alpha/2$ critical point of the MWW distribution; i.e., $P_\Delta[S^+_R(\Delta) \leq c_{\alpha/2}] = \alpha/2$. By the above lemma we have
\[ 1 - \alpha = P_\Delta\left[c_{\alpha/2} < S^+_R(\Delta) < n_1 n_2 - c_{\alpha/2}\right] = P_\Delta\left[D_{(c_{\alpha/2}+1)} \leq \Delta < D_{(n_1 n_2 - c_{\alpha/2})}\right]. \]
Thus $[D_{(c_{\alpha/2}+1)},\, D_{(n_1 n_2 - c_{\alpha/2})})$ is a $(1-\alpha)100\%$ confidence interval for $\Delta$; compare (1.3.30). From the asymptotic null distribution theory for $S^+_R$, Corollary 2.4.2, we can approximate $c_{\alpha/2}$ as
\[ c_{\alpha/2} \doteq \frac{n_1 n_2}{2} - z_{\alpha/2}\sqrt{\frac{n_1 n_2 (n+1)}{12}} - .5. \quad (2.4.13) \]
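In R, the exact critical point can be checked against the approximation (2.4.13); a minimal sketch, where qwilcox gives quantiles of the null distribution of $S^+_R$:

# Lower alpha/2 critical point of the null MWW distribution.
n1 <- 10; n2 <- 10; alpha <- 0.05
c.exact  <- qwilcox(alpha / 2, n1, n2) - 1   # largest c with P(S_R^+ <= c) <= alpha/2
c.approx <- n1 * n2 / 2 - qnorm(1 - alpha / 2) * sqrt(n1 * n2 * (n1 + n2 + 1) / 12) - 0.5
c(c.exact, c.approx)   # roughly 23 and 23.6 here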
2.4.3 Statistical Properties of the Inference Based on the MWW

In this section we derive the efficiency properties of the MWW test statistic and properties of its power function under the location model (2.2.4).

We begin with an investigation of the power function of the MWW test. For definiteness we will consider the one sided alternative,
\[ H_0\colon \Delta = 0 \quad \text{versus} \quad H_A\colon \Delta > 0. \quad (2.4.14) \]
Results similar to those given below can be obtained for the power functions of the other one sided and the two sided alternatives. Given a level $\alpha$, let $c_{\alpha, n_1, n_2}$ denote the upper critical value for the MWW test of this hypothesis; hence, the test rejects $H_0$ if $S^+_R \geq c_{\alpha, n_1, n_2}$. The power function of this test is given by
\[ \gamma(\Delta) = P_\Delta[S^+_R \geq c_{\alpha, n_1, n_2}], \quad (2.4.15) \]
where the subscript $\Delta$ on $P$ denotes that the probability is determined when the true parameter is $\Delta$. Recall that $S^+_R(\Delta) = \#\{Y_j - X_i > \Delta\}$.

The following theorem will prove useful; its proof is similar to that of Theorem 1.3.1 of Chapter 1 and the more general result Theorem A.2.4 of the Appendix.

Theorem 2.4.11. For all $t$, $P_\Delta[S^+_R(0) \geq t] = P_0[S^+_R(-\Delta) \geq t]$.
From Lemma 2.4.1 and Theorem 2.4.11 we have our first important result on the power function of the MWW test; namely, that it is monotone.

Theorem 2.4.12. For the above hypotheses (2.4.14), the function $\gamma(\Delta)$ is monotonically increasing in $\Delta$.

Proof: Let $\Delta_1 < \Delta_2$. Then $-\Delta_2 < -\Delta_1$ and, hence, from Lemma 2.4.1, we have $S^+_R(-\Delta_2) \geq S^+_R(-\Delta_1)$. By applying Theorem 2.4.11, the desired result, $\gamma(\Delta_2) \geq \gamma(\Delta_1)$, follows from the following:
\[ 1 - \gamma(\Delta_2) = P_{\Delta_2}[S^+_R(0) < c_{\alpha,n_1,n_2}] = P_0[S^+_R(-\Delta_2) < c_{\alpha,n_1,n_2}] \leq P_0[S^+_R(-\Delta_1) < c_{\alpha,n_1,n_2}] = P_{\Delta_1}[S^+_R(0) < c_{\alpha,n_1,n_2}] = 1 - \gamma(\Delta_1). \]
From this we immediately have that the MWW test is unbiased; that is, its power function evaluated at an alternative is always at least as large as its level of significance. We state it as a corollary.

Corollary 2.4.3. For the above hypotheses (2.4.14), $\gamma(\Delta) \geq \alpha$ for all $\Delta > 0$.

A more general null hypothesis is given by
\[ H^*_0\colon \Delta \leq 0 \quad \text{versus} \quad H_A\colon \Delta > 0. \]
If $T$ is any test for these hypotheses with critical region $C$ then we say $T$ is a size $\alpha$ test provided
\[ \sup_{\Delta \leq 0} P_\Delta[T \in C] = \alpha. \]
For selected $\alpha$, it follows from the monotonicity of the MWW power function that the MWW test has size $\alpha$ for this more general null hypothesis.

From the above theorems, we have that the MWW power function is monotonically increasing in $\Delta$. Since $S^+_R(\Delta)$ achieves its maximum for $\Delta$ finite, we have by Theorem 1.5.2 of Chapter 1 that the MWW test is resolving; hence, its power function approaches one as $\Delta \to \infty$. Even for the location model, though, we cannot get the power function of the MWW test in closed form. For local alternatives, however, we can obtain an asymptotic expression for the power function. Applications of this result include sample size determination for the MWW test and efficiency comparisons of the MWW with other tests, both of which we consider.

We will need the assumption that the density $f(x)$ has finite Fisher information, i.e.,
\[ (E.1)\colon \quad f \text{ is absolutely continuous, } 0 < I(f) = \int_0^1 \varphi^2_f(u)\, du < \infty, \quad (2.4.16) \]
where
\[ \varphi_f(u) = -\frac{f'(F^{-1}(u))}{f(F^{-1}(u))}. \quad (2.4.17) \]
As discussed in Section 3.4, assumption (E.1) implies that $f$ is uniformly bounded.

Once again we will consider the one sided alternative, (2.4.14); similar results hold for the other one sided and two sided alternatives. Consider a sequence of local alternatives of the form
\[ H_{An}\colon \Delta_n = \frac{\delta}{\sqrt{n}}, \quad (2.4.18) \]
where $\delta > 0$ is arbitrary but fixed.
As a first step, we need to show that $S^+_R(\Delta)$ is Pitman regular as discussed in Chapter 1. Let $\overline{S}^+_R(\Delta) = S^+_R(\Delta)/(n_1 n_2)$. We need to verify the four conditions of Definition 1.5.3 of Chapter 1. The first condition is true by Lemma 2.4.1 and the fourth condition follows from Corollary 2.4.2. By (b) of Theorem 2.4.4, we have
\[ \mu(\Delta) = E_\Delta[\overline{S}^+_R(0)] = 1 - E[F(X - \Delta)]. \quad (2.4.19) \]
By assumption (E.1), (2.4.16), $\int f^2(x)\, dx \leq \sup f \int f(x)\, dx < \infty$. Hence, differentiating (2.4.19), we obtain $\mu'(0) = \int f^2(x)\, dx > 0$ and, thus, the second condition is true. Hence we need only show that the third condition, asymptotic linearity of $\overline{S}^+_R(\Delta)$, is true. This will follow provided we can show the variance condition (1.5.17) of Theorem 1.5.6 is true. Note that
\[ \overline{S}^+_R(\delta/\sqrt{n}) - \overline{S}^+_R(0) = -(n_1 n_2)^{-1}\, \#(0 < Y_j - X_i \leq \delta/\sqrt{n}). \]
This is similar to the MWW statistic itself. Using essentially the same argument as that for the variance of the MWW statistic, Theorem 2.4.5, we get
\[ n \operatorname{Var}_0[\overline{S}^+_R(\delta/\sqrt{n}) - \overline{S}^+_R(0)] = \frac{n}{n_1 n_2}(a_n - a^2_n) + \frac{n(n_1 - 1)}{n_1 n_2}(b_n - c^2_n) + \frac{n(n_2 - 1)}{n_1 n_2}(d_n - a^2_n), \]
where $a_n = E_0[F(X + \delta/\sqrt{n}) - F(X)]$, $b_n = E_0[(F(Y) - F(Y - \delta/\sqrt{n}))^2]$, $c_n = E_0[F(Y) - F(Y - \delta/\sqrt{n})]$, and $d_n = E_0[(F(X + \delta/\sqrt{n}) - F(X))^2]$. Using the Lebesgue Dominated Convergence Theorem, it is easy to see that $a_n$, $b_n$, $c_n$, and $d_n$ all converge to 0. Therefore Condition (1.5.17) of Theorem 1.5.6 holds and we have thus established the asymptotic linearity result given by:
\[ \sup_{|\delta| \leq B} \left| n^{1/2}\,\overline{S}^+_R(\delta/\sqrt{n}) - n^{1/2}\,\overline{S}^+_R(0) + \delta \int f^2(x)\, dx \right| \stackrel{P}{\to} 0, \quad (2.4.20) \]
for any $B > 0$. Therefore, it follows that $\overline{S}^+_R(\Delta)$ is Pitman regular.
In order to get the efficacy of the MWW test, we need the quantity $\sigma^2(0)$ defined by
\[ \sigma^2(0) = \lim_{n \to \infty} n \operatorname{Var}_0(\overline{S}^+_R(0)) = \lim_{n \to \infty} \frac{n\, n_1 n_2 (n+1)}{n^2_1 n^2_2\, 12} = (12 \lambda_1 \lambda_2)^{-1}; \]
see expression (1.5.12). Therefore by (1.5.11) the efficacy of the MWW test is
\[ c_{MWW} = \mu'(0)/\sigma(0) = \sqrt{\lambda_1 \lambda_2}\, \sqrt{12} \int f^2(x)\, dx = \sqrt{\lambda_1 \lambda_2}\, \tau^{-1}, \quad (2.4.21) \]
where $\tau$ is the scale parameter given by
\[ \tau = \left(\sqrt{12} \int f^2(x)\, dx\right)^{-1}. \quad (2.4.22) \]
In Exercise 2.13.10 it is shown that the efficacy of the two sample pooled t-test is $\sqrt{\lambda_1 \lambda_2}\, \sigma^{-1}$, where $\sigma^2$ is the common variance of $X$ and $Y$. Hence the efficiency of the MWW test relative to the two sample t test is the ratio $\sigma^2/\tau^2$. This of course is the same efficiency as that of the signed rank Wilcoxon test relative to the one sample t test; see (1.7.13). In particular, if the distribution of $X$ is normal then the efficiency of the MWW test relative to the two sample t test is .955. For heavier tailed distributions, this efficiency is usually larger than 1; see Example 1.7.1.
As in Chapter 1, it is convenient to summarize the asymptotic linearity result as follows:
\[ \sqrt{n}\left(\frac{\overline{S}^+_R(\delta/\sqrt{n}) - \mu(0)}{\sigma(0)}\right) = \sqrt{n}\left(\frac{\overline{S}^+_R(0) - \mu(0)}{\sigma(0)}\right) - \delta c_{MWW} + o_p(1), \quad (2.4.23) \]
uniformly for $|\delta| \leq B$ and any $B > 0$.

The next theorem is the asymptotic power lemma for the MWW test. As in Chapter 1 (see Theorem 1.5.8), its proof follows from the Pitman regularity of the MWW test.

Theorem 2.4.13. Under the sequence of local alternatives, (2.4.18),
\[ \lim_{n \to \infty} \gamma(\Delta_n) = P_0[Z \geq z_\alpha - \delta c_{MWW}] = 1 - \Phi\left(z_\alpha - \delta\sqrt{12 \lambda_1 \lambda_2} \int f^2\right), \]
where $Z$ is $N(0,1)$.

In Exercise 2.13.10, it is shown that if $\gamma_{LS}(\Delta)$ denotes the power function of the usual two sample t-test then
\[ \lim_{n \to \infty} \gamma_{LS}(\Delta_n) = 1 - \Phi\left(z_\alpha - \delta\frac{\sqrt{\lambda_1 \lambda_2}}{\sigma}\right), \quad (2.4.24) \]
where $\sigma^2$ is the common variance of $X$ and $Y$. By comparing these two power functions, it is seen that the Wilcoxon is asymptotically more powerful if $\tau < \sigma$, i.e., if $e = c^2_{MWW}/c^2_t > 1$.
As an application of the asymptotic power lemma, we consider sample size determination. Consider the MWW test for the one sided hypothesis (2.4.14). Suppose the level, $\alpha$, and the power, $\gamma$, for a particular alternative $\Delta_A$ are specified. For convenience, assume equal sample sizes, i.e., $n_1 = n_2 = n^*$ where $n^*$ denotes the common sample size; hence, $\lambda_1 = \lambda_2 = 2^{-1}$. Express $\Delta_A$ as $\sqrt{2n^*}\Delta_A/\sqrt{2n^*}$. Then by Theorem 2.4.13 we have
\[ \gamma \doteq 1 - \Phi\left(z_\alpha - \sqrt{\frac{1}{4}}\, \frac{\Delta_A\sqrt{2n^*}}{\tau}\right). \]
But this implies
\[ z_\gamma = z_\alpha - \frac{\Delta_A\sqrt{n^*}}{\sqrt{2}\,\tau} \quad \text{and} \quad n^* = \left(\frac{(z_\alpha - z_\gamma)\tau}{\Delta_A}\right)^2 2. \quad (2.4.25) \]
The above value of $n^*$ is the approximate sample size. Note that it does depend on $\tau$ which, in applications, would have to be guessed or estimated in a pilot study; see the discussion in Section 2.4.5 (estimates of $\tau$ are discussed in Sections 2.4.5 and 3.7.1). For a specified distribution it can be evaluated; for instance, if the underlying density is assumed to be normal with standard deviation $\sigma$ then $\tau = \sqrt{\pi/3}\,\sigma$.

Using (2.4.24) a similar derivation can be obtained for the usual two sample t-test, resulting in an approximate sample size of
\[ n^*_{LS} = \left(\frac{(z_\alpha - z_\gamma)\sigma}{\Delta_A}\right)^2 2. \]
The ratio of the sample size needed by the MWW test to that of the two sample t test is $\tau^2/\sigma^2$. This provides additional motivation for the definition of efficiency.
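A minimal R sketch of this sample size calculation, under the added assumption of normal errors so that $\tau = \sqrt{\pi/3}\,\sigma$ (the function name nstar.mww is ours):

# Approximate common sample size n* for the MWW test, (2.4.25).
nstar.mww <- function(alpha, gamma, DeltaA, sigma = 1) {
  tau <- sqrt(pi / 3) * sigma
  za  <- qnorm(1 - alpha)   # z_alpha
  zg  <- qnorm(1 - gamma)   # z_gamma
  ceiling(2 * ((za - zg) * tau / DeltaA)^2)
}
nstar.mww(alpha = 0.05, gamma = 0.80, DeltaA = 0.5)  # per-group size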
2.4.4 Estimation of $\Delta$

Recall from the geometry earlier in this chapter that the estimate of $\Delta$ based on the rank pseudo-norm is $\widehat{\Delta}_R = \operatorname{med}_{i,j}\{Y_j - X_i\}$, (2.2.18). We now obtain several properties of this estimate, including its asymptotic distribution. This will lead again to the efficiency properties of the rank based methods discussed in the last section.

For convenience, we note some equivariances of $\widehat{\Delta}_R = \widehat{\Delta}(\mathbf{Y}, \mathbf{X})$, which are established in Exercise 2.13.11. First, $\widehat{\Delta}_R$ is translation equivariant; i.e.,
\[ \widehat{\Delta}_R(\mathbf{Y} + \Delta + \theta,\, \mathbf{X} + \theta) = \widehat{\Delta}_R(\mathbf{Y}, \mathbf{X}) + \Delta, \]
for any $\Delta$ and $\theta$. Second, $\widehat{\Delta}_R$ is scale equivariant; i.e.,
\[ \widehat{\Delta}_R(a\mathbf{Y}, a\mathbf{X}) = a\widehat{\Delta}_R(\mathbf{Y}, \mathbf{X}), \]
for any $a$. Based on these we next show that $\widehat{\Delta}_R$ is an unbiased estimate of $\Delta$ under certain conditions.
Theorem 2.4.14. If the errors, $e^*_i$, in the location model (2.2.4) are symmetrically distributed about 0, then $\widehat{\Delta}_R$ is symmetrically distributed about $\Delta$.

Proof: Due to translation equivariance there is no loss of generality in assuming that $\Delta$ and $\theta$ are 0. Then $Y$ and $X$ are symmetrically distributed about 0; hence, $L(Y) = L(-Y)$ and $L(X) = L(-X)$. Thus from the above equivariance properties we have,
\[ L(-\widehat{\Delta}(\mathbf{Y}, \mathbf{X})) = L(\widehat{\Delta}(-\mathbf{Y}, -\mathbf{X})) = L(\widehat{\Delta}(\mathbf{Y}, \mathbf{X})). \]
Therefore $\widehat{\Delta}_R$ is symmetrically distributed about 0, and, in general, it is symmetrically distributed about $\Delta$.

Theorem 2.4.15. Under Model (2.2.4), if $n_1 = n_2$ then $\widehat{\Delta}_R$ is symmetrically distributed about $\Delta$.

The reader is asked to prove this in Exercise 2.13.12. In general, $\widehat{\Delta}_R$ may be biased if the error distribution is not symmetrically distributed, but as the following result shows, $\widehat{\Delta}_R$ is always asymptotically unbiased. Since the MWW process $S^+_R(\Delta)$ was shown to be Pitman regular, the asymptotic distribution of $\sqrt{n}(\widehat{\Delta} - \Delta)$ is $N(0, c^{-2}_{MWW})$. In practice, we say that
\[ \widehat{\Delta}_R \text{ has an approximate } N(\Delta,\, \tau^2(n^{-1}_1 + n^{-1}_2)) \text{ distribution}, \]
where $\tau$ was defined in (2.4.22).

Recall from Definition 1.5.4 of Chapter 1 that the asymptotic relative efficiency of two Pitman regular estimators is the reciprocal of the ratio of their asymptotic variances. As Exercise 2.13.10 shows, the least squares estimate $\widehat{\Delta}_{LS} = \overline{Y} - \overline{X}$ of $\Delta$ is approximately $N\left(\Delta,\, \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right)$; hence,
\[ e(\widehat{\Delta}_R, \widehat{\Delta}_{LS}) = \frac{\sigma^2}{\tau^2} = 12\sigma^2_f \left(\int f^2(x)\, dx\right)^2. \]
This agrees with the asymptotic relative efficiency results for the MWW test relative to the t test and with (1.7.13).
2.4.5 Efficiency Results Based on Confidence Intervals

Let $L_{1-\alpha}$ be the length of the $(1-\alpha)100\%$ distribution free confidence interval based on the MWW statistic discussed in Section 2.4.2. Since this interval is based on the Pitman regular process $S^+_R(\Delta)$, it follows from Theorem 1.5.9 of Chapter 1 that
\[ \sqrt{\frac{n_1 n_2}{n}}\, \frac{L_{1-\alpha}}{2 z_{\alpha/2}} \stackrel{P}{\to} \tau; \quad (2.4.26) \]
that is, the standardized length of a distribution-free confidence interval is a consistent estimate of the scale parameter $\tau$. It further follows from (2.4.26) that, as in Chapter 1, if efficiency is based on the relative squared asymptotic lengths of confidence intervals, then we obtain the same efficiency results as quoted above for tests and estimates.

In the RBR computational function twosampwil, a simple degree of freedom adjustment is made in the estimation of $\tau$. In the traditional LS analysis based on the pooled t, this adjustment is equivalent to dividing the pooled estimate of variance by $n_1 + n_2 - 2$ instead of $n_1 + n_2$. Hence, as its estimate of $\tau$, the function twosampwil uses
\[ \widehat{\tau} = \sqrt{\frac{n_1 + n_2}{n_1 + n_2 - 2}}\, \sqrt{\frac{n_1 n_2}{n}}\, \frac{L_{1-\alpha}}{2 z_{\alpha/2}}. \quad (2.4.27) \]
Thus the standard error (SE) of the estimator $\widehat{\Delta}_R$ is given by $\widehat{\tau}\sqrt{(1/n_1) + (1/n_2)}$.
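The estimate (2.4.27) is a one-line computation from the realized interval; a minimal R sketch (the helper name tauhat is ours):

# Estimate of tau from the length L of the realized (1-alpha)100%
# MWW confidence interval, with the degrees of freedom adjustment.
tauhat <- function(L, n1, n2, alpha = 0.05) {
  sqrt((n1 + n2) / (n1 + n2 - 2)) *
    sqrt(n1 * n2 / (n1 + n2)) * L / (2 * qnorm(1 - alpha / 2))
}
# SE of the shift estimate: tauhat(L, n1, n2) * sqrt(1/n1 + 1/n2).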
The distribution free confidence interval is not symmetric about $\widehat{\Delta}_R$. Often in practice symmetric intervals are desired. Based on the asymptotic distribution of $\widehat{\Delta}_R$ we can formulate the approximate interval
\[ \widehat{\Delta}_R \pm z_{\alpha/2}\, \widehat{\tau}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, \quad (2.4.28) \]
where $\widehat{\tau}$ is a consistent estimate of $\tau$. If we use (2.4.26) as our estimate of $\tau$ with the level $\alpha$, then the confidence interval simplifies to
\[ \widehat{\Delta}_R \pm \frac{L_{1-\alpha}}{2}. \quad (2.4.29) \]
Besides the estimate given in (2.4.26), a consistent estimate of $\tau$ was proposed by Koul, Sievers and McKean (1987) and will be discussed in Section 3.7. Using this estimate, small sample studies indicate that $z_{\alpha/2}$ should be replaced by the t critical value $t_{(\alpha/2, n-1)}$; see McKean and Sheather (1991) for a review of small sample studies on R-estimates. In this case, the symmetric confidence interval based on $\widehat{\Delta}_R$ is directly analogous to the usual t interval based on least squares, in that the only difference is that $\widehat{\sigma}$ is replaced by $\widehat{\tau}$.

Example 2.4.1. Hendy and Charles Coin Data, continued from Examples 1.11.1 and 2.3.2

Recall from Chapter 1 that this example concerned the silver content in two coinages (the first and the fourth) minted during the reign of Manuel I. The data are given in Chapter 1. The Hodges-Lehmann estimate of the difference between the first and the fourth coinage is 1.10 percent of silver and a 95% confidence interval for the difference is (.60, 1.70). The length of this confidence interval is 1.10; hence, the estimate of $\tau$ given in expression (2.4.27) is 0.595. The symmetrized confidence interval (2.4.28) based on the t upper .025 critical value is (0.46, 1.74). Both of these intervals are in agreement with the confidence interval obtained in Example 1.11.1 based on the two $L_1$ confidence intervals.

Another estimate of $\tau$ can be obtained from a similar consideration of the distribution free confidence intervals based on the signed-rank statistic discussed in Chapter 1; see Exercise 2.13.13. Note in this case, though, that for consistency we would have to assume that $f$ is symmetric.
2.5 General Rank Scores

In this section we will be concerned with the location model; i.e., $X_1, \ldots, X_{n_1}$ are iid $F(x)$, $Y_1, \ldots, Y_{n_2}$ are iid $G(x) = F(x - \Delta)$, and the samples are independent of one another. We will present an analysis for this problem based on general rank scores. In this terminology, the Mann-Whitney-Wilcoxon procedures are based on a linear score function. We will present the results for the hypotheses
\[ H_0\colon \Delta = 0 \quad \text{versus} \quad H_A\colon \Delta > 0. \quad (2.5.1) \]
The results for the other one sided and two sided alternatives are similar. We will also be concerned with estimation and confidence intervals for $\Delta$. As in the preceding sections, we will first present the geometry.

Recall that the pseudo-norm which generated the MWW analysis could be written as a linear combination of ranks times residuals. This is easily generalized. Consider the function
\[ \|\mathbf{u}\|_* = \sum_{i=1}^{n} a(R(u_i)) u_i, \quad (2.5.2) \]
where the scores $a(i)$ are such that $a(1) \leq \cdots \leq a(n)$ and $\sum a(i) = 0$. For the next theorem, we will also assume that $a(i) = -a(n + 1 - i)$; although, this is only used to show the scalar multiplicative property.

Theorem 2.5.1. Suppose that $a(1) \leq \cdots \leq a(n)$, $\sum a(i) = 0$, and $a(i) = -a(n + 1 - i)$. Then the function $\|\cdot\|_*$ is a pseudo-norm.

Proof: By the connection between ranks and order statistics we can write
\[ \|\mathbf{u}\|_* = \sum_{i=1}^{n} a(i) u_{(i)}. \]
Next suppose that $u_{(j)}$ is the last order statistic with a negative score. Since the scores sum to 0, we can write
\[ \|\mathbf{u}\|_* = \sum_{i=1}^{n} a(i)(u_{(i)} - u_{(j)}) = \sum_{i \leq j} a(i)(u_{(i)} - u_{(j)}) + \sum_{i \geq j} a(i)(u_{(i)} - u_{(j)}). \quad (2.5.3) \]
Both terms on the right side are nonnegative; hence, $\|\mathbf{u}\|_* \geq 0$. Since all the terms in (2.5.3) are nonnegative, $\|\mathbf{u}\|_* = 0$ implies that all the terms are zero. But since the scores are not all 0, yet sum to zero, we must have $a(1) < 0$ and $a(n) > 0$. Hence we must have $u_{(1)} = u_{(j)} = u_{(n)}$; i.e., $u_{(1)} = \cdots = u_{(n)}$. Conversely, if $u_{(1)} = \cdots = u_{(n)}$ then $\|\mathbf{u}\|_* = 0$. By the condition $a(i) = -a(n + 1 - i)$ it follows that $\|\alpha\mathbf{u}\|_* = |\alpha|\,\|\mathbf{u}\|_*$; see Exercise 2.13.16.
In order to complete the proof we need to show that the triangle inequality holds. This is established by the following string of inequalities:
\[ \|\mathbf{u} + \mathbf{v}\|_* = \sum_{i=1}^{n} a(R(u_i + v_i))(u_i + v_i) = \sum_{i=1}^{n} a(R(u_i + v_i)) u_i + \sum_{i=1}^{n} a(R(u_i + v_i)) v_i \leq \sum_{i=1}^{n} a(i) u_{(i)} + \sum_{i=1}^{n} a(i) v_{(i)} = \|\mathbf{u}\|_* + \|\mathbf{v}\|_*. \]
The proof of the above inequality is similar to that of Theorem 1.3.2 of Chapter 1.

Based on a set of scores satisfying the above assumptions, we can establish a rank inference for the two sample problem similar to the MWW analysis. We shall do so for general rank scores of the form
\[ a_\varphi(i) = \varphi(i/(n+1)), \quad (2.5.4) \]
where $\varphi(u)$ satisfies the following assumptions:
\[ \varphi(u) \text{ is a nondecreasing function defined on the interval } (0,1), \quad \int_0^1 \varphi(u)\, du = 0 \text{ and } \int_0^1 \varphi^2(u)\, du = 1; \quad (2.5.5) \]
see also (S.1), (3.4.10), in Chapter 3. The last assumptions, concerning standardization of the scores, are for convenience. The Wilcoxon scores are generated in this way by the linear function $\varphi_R(u) = \sqrt{12}(u - (1/2))$ and the sign scores are generated by $\varphi_S(u) = \operatorname{sgn}(2u - 1)$. We will denote the corresponding pseudo-norm for scores generated by $\varphi(u)$ as
\[ \|\mathbf{u}\|_\varphi = \sum_{i=1}^{n} a_\varphi(R(u_i)) u_i. \quad (2.5.6) \]
These two sample sign and Wilcoxon scores are generalizations of the sign and Wilcoxon scores discussed in Chapter 1 for the one sample problem. In Section 1.8 of Chapter 1 we presented one sample analyses based on general score functions. Similar to the sign and Wilcoxon cases, we can generate a two sample score function from any one sample score function. For reference we establish this in the following theorem:

Theorem 2.5.2. As discussed at the beginning of Section 1.8, let $\varphi^+(u)$ be a score function for the one sample problem. For $u \in (-1, 0)$, let $\varphi^+(u) = -\varphi^+(-u)$. Define
\[ \varphi(u) = \varphi^+(2u - 1), \quad \text{for } u \in (0,1), \quad (2.5.7) \]
and
\[ \|\mathbf{x}\|_\varphi = \sum_{i=1}^{n} \varphi(R(x_i)/(n+1)) x_i. \quad (2.5.8) \]
Then $\|\cdot\|_\varphi$ is a pseudo-norm on $R^n$. Furthermore
\[ \varphi(u) = -\varphi(1 - u), \quad (2.5.9) \]
and
\[ \int_0^1 \varphi^2(u)\, du = \int_0^1 (\varphi^+(u))^2\, du. \quad (2.5.10) \]
Proof: As discussed at the beginning of Section 1.8 (see expression (1.8.1)), $\varphi^+(u)$ is a positive valued and nondecreasing function defined on the interval $(0,1)$. Based on these properties, it follows that $\varphi(u)$ is nondecreasing and that $\int_0^1 \varphi(u)\, du = 0$. Hence $\|\cdot\|_\varphi$ is a pseudo-norm on $R^n$. Properties (2.5.9) and (2.5.10) follow readily; see Exercise 2.13.17 for details.

The two sample sign and Wilcoxon scores, cited above, are easily seen to be generated this way from their one sample counterparts $\varphi^+(u) = 1$ and $\varphi^+(u) = \sqrt{3}\, u$, respectively. As discussed further in Section 2.5.3, properties such as efficiencies of the analysis based on the one sample scores are the same for a two sample analysis based on their corresponding two sample scores.
In the notation of (2.2.3), the estimate of $\Delta$ is
\[ \widehat{\Delta}_\varphi = \operatorname{Argmin} \|\mathbf{Z} - \mathbf{C}\Delta\|_\varphi. \]
Denote the negative of the gradient of $\|\mathbf{Z} - \mathbf{C}\Delta\|_\varphi$ by $S_\varphi(\Delta)$. Then based on (2.5.6),
\[ S_\varphi(\Delta) = \sum_{j=1}^{n_2} a_\varphi(R(Y_j - \Delta)). \quad (2.5.11) \]
Hence $\widehat{\Delta}_\varphi$ equivalently solves the equation,
\[ S_\varphi(\widehat{\Delta}_\varphi) \doteq 0. \quad (2.5.12) \]
As with pseudo-norms in general, the function $\|\mathbf{Z} - \mathbf{C}\Delta\|_\varphi$ is a convex function of $\Delta$. The negative of its derivative, $S_\varphi(\Delta)$, is a decreasing step function of $\Delta$ which steps down at the differences $Y_j - X_i$; see Exercise 2.13.18. Unlike the MWW function $S_R(\Delta)$, the step sizes of $S_\varphi(\Delta)$ are not necessarily the same size. Based on MWW starting values, a simple trace algorithm through the differences can be used to obtain the estimator $\widehat{\Delta}_\varphi$. The R function twosampr2 computes the rank-based analysis for general scores.

The gradient rank test statistic for the hypotheses (2.5.1) is
\[ S_\varphi = \sum_{j=1}^{n_2} a_\varphi(R(Y_j)). \quad (2.5.13) \]
102 CHAPTER 2. TWO SAMPLE PROBLEMS
Since the test statistic only depends on the ranks of the combined sample, it is distribution free under the null hypothesis. As shown in Exercise 2.13.18,
$$ E_0[S_\varphi] = 0 \qquad (2.5.14) $$
$$ \sigma_\varphi^2 = V_0[S_\varphi] = \frac{n_1 n_2}{n(n-1)} \sum_{i=1}^{n} a_\varphi^2(i)\,. \qquad (2.5.15) $$
Note that we can write the variance as
$$ \sigma_\varphi^2 = \frac{n_1 n_2}{n-1} \left\{ \frac{1}{n} \sum_{i=1}^{n} a_\varphi^2(i) \right\} \doteq \frac{n_1 n_2}{n-1}\,, \qquad (2.5.16) $$
where the approximation is due to the fact that the term in braces is a Riemann sum of $\int \varphi^2(u)\,du = 1$ and, hence, converges to 1.
It will be convenient from time to time to use rank statistics based on unstandardized scores; i.e., a rank statistic of the form
$$ S_a = \sum_{j=1}^{n_2} a(R(Y_j))\,, \qquad (2.5.17) $$
where $a(i) = \varphi(i/(n+1))$, $i = 1,\ldots,n$, is a set of scores. As Exercise 2.13.18 shows, the null mean $\mu_S$ and null variance $\sigma_S^2$ of $S_a$ are given by
$$ \mu_S = n_2 \bar{a} \quad \mbox{ and } \quad \sigma_S^2 = \frac{n_1 n_2}{n(n-1)} \sum (a(i) - \bar{a})^2\,. \qquad (2.5.18) $$
2.5.1 Statistical Methods
The asymptotic null distribution of the statistic $S_\varphi$, (2.5.13), follows easily from Theorem A.2.1 of the Appendix. To see this, note that we can use the notation (2.2.1) and (2.2.2) to write $S_\varphi$ as a linear rank statistic; i.e.,
$$ S_\varphi = \sum_{i=1}^{n} c_i a(R(Z_i)) = \sum_{i=1}^{n} (c_i - \bar{c})\,\varphi\left(\frac{n}{n+1} F_n(Z_i)\right)\,, \qquad (2.5.19) $$
where $F_n$ is the empirical distribution function of $Z_1,\ldots,Z_n$. Our score function is monotone and square integrable; hence, the conditions on scores in Section A.2 are satisfied. Also $F$ is continuous, so the distributional assumption is satisfied. Finally, we need only show that the constants $c_i$ satisfy conditions D.2, (3.4.7), and D.3, (3.4.8). It is a simple exercise to show that
$$ \sum_{i=1}^{n} (c_i - \bar{c})^2 = \frac{n_1 n_2}{n} \qquad \mbox{and} \qquad \max_{1 \leq i \leq n} (c_i - \bar{c})^2 = \max\left\{\frac{n_2^2}{n^2}\,,\; \frac{n_1^2}{n^2}\right\}\,. $$
Under condition (D.1), (2.4.7), $0 < \lambda_i < 1$, where $\lim (n_i/n) = \lambda_i$ for $i = 1, 2$. Using this along with the last two expressions, it is immediate that Noether's condition, (3.4.9), holds for the $c_i$'s. Thus the assumptions of Section A.2 hold for the statistic $S_\varphi$. As in expression (A.2.7) of Section A.2, define the random variable $T_\varphi$ as
$$ T_\varphi = \sum_{i=1}^{n} (c_i - \bar{c})\,\varphi(F(Z_i))\,. \qquad (2.5.20) $$
By comparing expressions (2.5.19) and (2.5.20), it seems that the variable $T_\varphi$ is an approximation of $S_\varphi$. This follows from Section A.2. Briefly, under $H_0$ the distribution of $T_\varphi$ is approximately normal and $\mbox{Var}((T_\varphi - S_\varphi)/\sigma_\varphi) \rightarrow 0$; hence, $S_\varphi$ is asymptotically normal with mean and variance given by expressions (2.5.14) and (2.5.15), respectively. Hence, an asymptotic level $\alpha$ test of the hypotheses (2.5.1) is:
$$ \mbox{Reject } H_0 \mbox{ in favor of } H_A \mbox{ if } S_\varphi \geq z_\alpha \sigma_\varphi\,, $$
where $\sigma_\varphi$ is defined by (2.5.15).
As discussed above, the estimate $\widehat{\Delta}_\varphi$ of $\Delta$ solves equation (2.5.12). The interval $(\widehat{\Delta}_{\varphi,L}, \widehat{\Delta}_{\varphi,U})$ is a $(1-\alpha)100\%$ confidence interval for $\Delta$ (based on the asymptotic distribution) provided $\widehat{\Delta}_{\varphi,L}$ and $\widehat{\Delta}_{\varphi,U}$ solve the equations
$$ S_\varphi(\widehat{\Delta}_{\varphi,U}) \doteq -z_{\alpha/2} \sqrt{\frac{n_1 n_2}{n}} \quad \mbox{ and } \quad S_\varphi(\widehat{\Delta}_{\varphi,L}) \doteq z_{\alpha/2} \sqrt{\frac{n_1 n_2}{n}}\,, \qquad (2.5.21) $$
where $1 - \Phi(z_{\alpha/2}) = \alpha/2$. As with the estimate of $\Delta$, these equations can be solved easily with an iterative algorithm; see Exercise 2.13.18.
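As an illustration of such an algorithm, the sketch below solves the equations (2.5.21) with a root finder, exploiting the fact that $S_\varphi(\Delta)$ is a decreasing step function; the score function and data are illustrative assumptions, as in the earlier sketch.

## A sketch of solving the confidence-interval equations (2.5.21);
## S_phi(delta) is a decreasing step function of delta, so uniroot on a
## bracketing interval suffices (here alpha = 0.05).
sphi <- function(delta, x, y, phi) {
  z <- c(x, y - delta)
  a <- phi(rank(z) / (length(z) + 1))
  sum(a[-(1:length(x))])               # sum of the scores of the Y sample
}
phiwil <- function(u) sqrt(12) * (u - 0.5)
set.seed(1); x <- rnorm(15); y <- rnorm(20) + 0.5

n1 <- length(x); n2 <- length(y)
cut <- qnorm(0.975) * sqrt(n1 * n2 / (n1 + n2))
lims <- range(outer(y, x, "-"))        # brackets all the step points
dL <- uniroot(function(d) sphi(d, x, y, phiwil) - cut, lims)$root
dU <- uniroot(function(d) sphi(d, x, y, phiwil) + cut, lims)$root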
2.5.2 Efficiency Results

In order to obtain the efficiency results for these statistics, we first show that the process $S_\varphi(\Delta)$ is Pitman regular. For general scores we need to further assume that the density has finite Fisher information; i.e., satisfies condition (E.1), (2.4.16). Recall that Fisher information is given by $I(f) = \int_0^1 \varphi_f^2(u)\,du$, where
$$ \varphi_f(u) = -\frac{f'(F^{-1}(u))}{f(F^{-1}(u))}\,. \qquad (2.5.22) $$
Below we will show that the score function $\varphi_f$ is optimal. Define the parameter $\tau_\varphi$ as
$$ \tau_\varphi^{-1} = \int \varphi(u)\varphi_f(u)\,du\,. \qquad (2.5.23) $$
Estimation of $\tau_\varphi$ is discussed in Section 3.7.
To show that the process $S_\varphi(\Delta)$ is Pitman regular, we show that the four conditions of Definition 1.5.3 are true. As noted after expression (2.5.12), $S_\varphi(\Delta)$ is nonincreasing; hence, the first condition holds. For the second condition, note that we can write
$$ S_\varphi(\Delta) = \sum_{j=1}^{n_2} a(R(Y_j - \Delta)) = \sum_{j=1}^{n_2} \varphi\left( \frac{n_1}{n+1} F_{n_1}(Y_j - \Delta) + \frac{n_2}{n+1} F_{n_2}(Y_j - \Delta) \right)\,, \qquad (2.5.24) $$
where $F_{n_1}$ and $F_{n_2}$ are the empirical cdfs of the samples $X_1,\ldots,X_{n_1}$ and $Y_1,\ldots,Y_{n_2}$, respectively. Hence, passing to the limit we have
$$ E_0\left[\frac{1}{n} S_\varphi(\Delta)\right] \rightarrow \lambda_2 \int \varphi[\lambda_1 F(x) + \lambda_2 F(x - \Delta)]\,f(x - \Delta)\,dx = \lambda_2 \int \varphi[\lambda_1 F(x + \Delta) + \lambda_2 F(x)]\,f(x)\,dx = \gamma_\varphi(\Delta)\,; \qquad (2.5.25) $$
see Chernoff and Savage (1958) for a rigorous proof of the limit. Differentiating $\gamma_\varphi(\Delta)$ and evaluating the derivative at 0, we obtain
$$ \gamma'_\varphi(0) = \lambda_1 \lambda_2 \int \varphi'[F(t)]\,f^2(t)\,dt = \lambda_1 \lambda_2 \int \varphi[F(t)] \left( -\frac{f'(t)}{f(t)} \right) f(t)\,dt = \lambda_1 \lambda_2 \int_0^1 \varphi(u)\varphi_f(u)\,du = \lambda_1 \lambda_2\, \tau_\varphi^{-1} > 0\,, \qquad (2.5.26) $$
where the second equality follows from an integration by parts. Hence, the second condition is satisfied.
The null asymptotic distribution of $S_\varphi(0)$ was established in Section 2.5.1; hence, the fourth condition is true. Thus we need only establish asymptotic linearity. This result follows from the results for general rank regression statistics which are developed in Section A.2.2 of the Appendix. By Theorem A.2.8 of the Appendix, the asymptotic linearity result for $S_\varphi(\Delta)$ is given by
$$ \frac{1}{\sqrt{n}} S_\varphi\left(\frac{\delta}{\sqrt{n}}\right) = \frac{1}{\sqrt{n}} S_\varphi(0) - \lambda_1 \lambda_2\, \tau_\varphi^{-1}\, \delta + o_p(1)\,, \qquad (2.5.27) $$
uniformly for $|\delta| \leq B$, where $B > 0$ and $\tau_\varphi$ is defined in (2.5.23). Therefore, following Definition 1.5.3 of Chapter 1, the estimating function is Pitman regular.
By the discussion following (2.5.20), we have that $n^{-1/2} S_\varphi(0)/\sqrt{\lambda_1 \lambda_2}$ is asymptotically $N(0,1)$. The efficacy of the test based on $S_\varphi$ is thus given by
$$ c_\varphi = \tau_\varphi^{-1} \sqrt{\lambda_1 \lambda_2}\,. \qquad (2.5.28) $$
As with the MWW analysis, several important items follow immediately from Pitman regularity. Consider first the behavior of $S_\varphi$ under local alternatives. Specifically, consider a level $\alpha$ test based on $S_\varphi$ for the hypothesis (2.5.1) and the sequence of local alternatives $H_n: \Delta_n = \delta/\sqrt{n}$. As in Chapter 1, it is easy to show that the asymptotic power of the test based on $S_\varphi$ is given by
$$ \lim_{n \rightarrow \infty} P_{\delta/\sqrt{n}}[S_\varphi \geq z_\alpha \sigma_\varphi] = 1 - \Phi(z_\alpha - \delta c_\varphi)\,. \qquad (2.5.29) $$
Based on this result, sample size determination for the test based on $S_\varphi$ can be conducted similar to that based on the MWW test statistic; see (2.4.25).
Next consider the asymptotic distribution of the estimator $\widehat{\Delta}_\varphi$. Recall that the estimate $\widehat{\Delta}_\varphi$ solves the equation $S_\varphi(\widehat{\Delta}_\varphi) \doteq 0$. Based on Pitman regularity and Theorem 1.5.7 of Chapter 1, the asymptotic distribution of $\widehat{\Delta}_\varphi$ is given by
$$ \sqrt{n}\,(\widehat{\Delta}_\varphi - \Delta) \stackrel{D}{\rightarrow} N\left(0,\; \tau_\varphi^2 (\lambda_1 \lambda_2)^{-1}\right)\,. \qquad (2.5.30) $$
By using (2.5.27) and $T_\varphi(0)$ to approximate $S_\varphi(0)$, we have the following useful result:
$$ \widehat{\Delta}_\varphi = \Delta + \frac{\tau_\varphi}{\lambda_1 \lambda_2}\, \frac{1}{n}\, T_\varphi(0) + o_p\!\left(\frac{1}{\sqrt{n}}\right)\,. \qquad (2.5.31) $$
We want to select scores such that the efficacy $c_\varphi$, (2.5.28), is as large as possible, or equivalently such that the asymptotic variance of $\widehat{\Delta}_\varphi$ is as small as possible. How large can the efficacy be? Similar to (1.8.26), note that we can write
$$ \tau_\varphi^{-1} = \int \varphi(u)\varphi_f(u)\,du = \sqrt{\int \varphi_f^2(u)\,du}\;\; \frac{\int \varphi(u)\varphi_f(u)\,du}{\sqrt{\int \varphi_f^2(u)\,du}\,\sqrt{\int \varphi^2(u)\,du}} = \rho\, \sqrt{\int \varphi_f^2(u)\,du}\,. \qquad (2.5.32) $$
The second equality is true since the scores were standardized as above. In the third expression, $\rho$ is a correlation coefficient and $\int \varphi_f^2(u)\,du$ is the Fisher location information, (2.4.16), which we denoted by $I(f)$. By the Rao-Cramér lower bound, the smallest asymptotic variance obtainable by an asymptotically unbiased estimate is $(\lambda_1 \lambda_2 I(f))^{-1}$. Such an estimate is called asymptotically efficient. Choosing a score function to maximize (2.5.32) is equivalent to choosing a score function to make $\rho = 1$. This can be achieved by taking the score function to be $\varphi(u) = \varphi_f(u)$, (2.5.22). The resulting estimate, $\widehat{\Delta}_\varphi$, is asymptotically efficient. Of course this can be accomplished only provided that the form of $f$ is known; see Exercise 2.13.19. Evidently, the closer the chosen score is to $\varphi_f$, the more powerful the rank analysis will be.
In Exercise 2.13.19, the reader is asked to show that the MWW analysis is asymptotically efficient if the errors have a logistic distribution. For normal errors, it follows in a few steps from expression (2.4.17) that the optimal scores are generated by the normal scores function,
$$ \varphi_N(u) = \Phi^{-1}(u)\,, \qquad (2.5.33) $$
where $\Phi(u)$ is the distribution function of a standard normal random variable. Exercise 2.13.19 shows that this score function is standardized. These scores yield an asymptotically efficient analysis if the errors truly have a normal distribution and, further, $e(\varphi_N, L_2) \geq 1$; see Theorem 1.8.1. Also, unlike the Mann-Whitney-Wilcoxon analysis, the estimate of the shift $\Delta$ based on the normal scores cannot be obtained in closed form. But, as mentioned above for general scores, provided the score function is nondecreasing, simple iterative algorithms can be used to obtain the estimate and the corresponding confidence interval for $\Delta$. In the next sections we will discuss analyses that are asymptotically efficient for other distributions.
Example 2.5.1. Quail Data, continued from Example 2.3.1
In the larger study, McKean et al. (1989), from which these data were drawn, the responses were positively skewed with long right tails, although outliers frequently occurred in the left tail also. McKean et al. conducted an investigation of estimates of the score functions for over 20 of these experiments. A class of simple scores which seemed appropriate for such data consists of piecewise linear functions: linear on the first part, the interval $(0,b)$, and constant on the second part, $(b,1)$; i.e., scores of the form
$$ \varphi_b(u) = \left\{ \begin{array}{ll} \frac{2}{b(2-b)}\,u - 1 & \mbox{if } 0 < u < b \\[1ex] \frac{b}{2-b} & \mbox{if } b \leq u < 1 \end{array} \right.\,. \qquad (2.5.34) $$
These scores are optimal for densities with left logistic and right exponential tails; see Exercise 2.13.19. A value of $b$ which seemed appropriate for this type of data was $3/4$. Let $S_{3/4} = \sum a_{3/4}(R(Y_j))$ denote the test statistic based on these scores. The RBR function phibentr, with the argument param = 0.75, computes these scores. Using the RBR function twosampr2, with the argument score = phibentr, computes the rank-based analysis for the score function (2.5.34). Assuming that the treated and control observations are in x and y, respectively, the call and the resulting analysis for a one sided test as computed by R is:
> tempb = twosampr2(x,y,test=T,alt=1,delta0=0,score=phibentr,grad=sphir,
                    param=.75,alpha=.05,maktable=T)

Test of Delta = 0   Alternative selected is 1
Standardized (z) Test-Statistic 1.787738 and p-value 0.03690915

Estimate 15.5   SE is 7.921817
95 % Confidence Interval is ( -2 , 28 )
Estimate of the scale parameter tau 20.45404
Comparing p-values, the analysis based on the score function (2.5.34) is a little more precise
than the MWW analysis given in Example 2.3.1. Recall that the data are right skewed, so
this result is not surprising.
For another class of scores similar to (2.5.34), see the discussion around expression (3.10.6)
in Chapter 3.
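For illustration, here is a minimal sketch of the bent score function (2.5.34) in R; the name phibent is ours, playing the role of the RBR function phibentr.

## A sketch of the bent score function (2.5.34). For b = 3/4 the
## function is linear on (0, b) and constant on [b, 1).
phibent <- function(u, b = 0.75) {
  ifelse(u < b, 2 * u / (b * (2 - b)) - 1, b / (2 - b))
}
## numerical check of the first standardization condition in (2.5.5):
integrate(function(u) phibent(u), 0, 1)$value     # approximately 0
## the second moment is not 1; rescale if standardized scores are needed:
integrate(function(u) phibent(u)^2, 0, 1)$value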
2.5.3 Connection between One and Two Sample Scores
In Theorem 2.5.2 we discussed how to obtain a corresponding two sample score function given a one sample score function. Here we reverse the problem, showing how to obtain a one sample score function from a two sample score function. This provides a natural estimate of the intercept $\alpha$ in (2.2.4). We also show that the efficiencies and asymptotic properties are the same for such corresponding score functions.
Consider the location model but further assume that X has a symmetric distribution.
Then Y also has a symmetric distribution. For associated one sample problems, we could
then use the signed rank methods developed in Chapter 1. What one sample scores should
we select?
First consider what two sample scores would be suitable under symmetry. Assume without loss of generality that $X$ is symmetrically distributed about 0. Recall that the optimal scores are given by expression (2.5.22). Using the fact that $F(-x) = 1 - F(x)$, it is easy to see (Exercise 2.13.20) that the optimal scores satisfy
$$ \varphi_f(u) = -\varphi_f(1-u)\,, \ \mbox{ for } 0 < u < 1\,; $$
that is, the optimal score function is odd about $\frac{1}{2}$. Hence for symmetric distributions it makes sense to consider two sample scores which are odd about $\frac{1}{2}$. For this sub-section, then, assume that the two sample score generating function satisfies the property
$$ \mbox{(S.3)} \qquad \varphi(1-u) = -\varphi(u)\,. \qquad (2.5.35) $$
Note that such scores satisfy $\varphi(1/2) = 0$ and $\varphi(u) \geq 0$ for $u \geq 1/2$. Define a one sample score generating function as
$$ \varphi^+(u) = \varphi\left(\frac{u+1}{2}\right) \qquad (2.5.36) $$
and the one sample scores as
$$ a^+(i) = \varphi^+\left(\frac{i}{n+1}\right)\,. \qquad (2.5.37) $$
It follows that these one sample scores are nonnegative and nondecreasing.
For example, if we use the Wilcoxon two sample scores, that is, scores generated by the function $\varphi(u) = \sqrt{12}\left(u - \frac{1}{2}\right)$, then the associated one sample score generating function is $\varphi^+(u) = \sqrt{3}\,u$ and, hence, the one sample scores are the Wilcoxon signed-rank scores. If instead we use the two sample sign scores, $\varphi(u) = \mbox{sgn}(2u-1)$, then the one sample score function is $\varphi^+(u) = 1$. This results in the one sample sign scores.
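The correspondence (2.5.36) is easy to automate; the sketch below, with names of our own choosing, maps any two sample score function odd about 1/2 to its one sample counterpart.

## A sketch of the one/two sample score correspondence (2.5.36): given a
## two sample score generating function phi that is odd about 1/2, return
## the one sample score generating function phi+.
onesample_score <- function(phi) function(u) phi((u + 1) / 2)

phiwil  <- function(u) sqrt(12) * (u - 0.5)  # two sample Wilcoxon scores
phiplus <- onesample_score(phiwil)
## phiplus(u) reduces to sqrt(3) * u, the Wilcoxon signed-rank scores:
all.equal(phiplus(0.3), sqrt(3) * 0.3)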
Suppose we use two sample scores which satisfy (2.5.35) and use the associated one sample scores. Then the corresponding one and two sample efficacies satisfy
$$ c_\varphi = \sqrt{\lambda_1 \lambda_2}\; c_{\varphi^+}\,, \qquad (2.5.38) $$
where the efficacies are given by expressions (2.5.28) and (1.8.21). Hence the efficiency and asymptotic properties of the one and two sample analyses are the same. As a final remark, if we write the model as in expression (2.2.4), then we can use the rank statistic based on the two sample scores to estimate $\Delta$. We next form the residuals $Z_i - \widehat{\Delta} c_i$. Then, using the one sample scores statistic of Chapter 1, we can estimate $\alpha$ based on these residuals, as discussed in Chapter 1. In terms of a regression problem, we are estimating the intercept parameter $\alpha$ based on the residuals after fitting the regression coefficient $\Delta$. This is discussed in some detail in Section 3.5.
2.6 $L_1$ Analyses

In this section, we present analyses based on the $L_1$ norm and pseudo-norm. We discuss the pseudo-norm first, showing that the corresponding test is the familiar Mood's (1950) test. The test which corresponds to the norm is Mathisen's (1943) test.
2.6.1 Analysis Based on the $L_1$ Pseudo-Norm

Consider the sign scores; these are the scores generated by the function $\varphi(u) = \mbox{sgn}(u - 1/2)$. The corresponding pseudo-norm is given by
$$ \|\mathbf{u}\|_\varphi = \sum_{i=1}^{n} \mbox{sgn}\left(R(u_i) - \frac{n+1}{2}\right) u_i\,. \qquad (2.6.1) $$
This pseudo-norm is optimal for double exponential errors; see Exercise 2.13.19.
We have the following relationship between the $L_1$ pseudo-norm and the $L_1$ norm. Note that we can write
$$ \|\mathbf{u}\|_\varphi = \sum_{i=1}^{n} \mbox{sgn}\left(i - \frac{n+1}{2}\right) u_{(i)}\,. $$
Next consider
$$ \sum_{i=1}^{n} |u_{(i)} - u_{(n-i+1)}| = \sum_{i=1}^{n} \mbox{sgn}(u_{(i)} - u_{(n-i+1)})(u_{(i)} - u_{(n-i+1)}) = 2 \sum_{i=1}^{n} \mbox{sgn}(u_{(i)} - u_{(n-i+1)})\,u_{(i)}\,. $$
Finally note that
$$ \mbox{sgn}(u_{(i)} - u_{(n-i+1)}) = \mbox{sgn}(i - (n-i+1)) = \mbox{sgn}\left(i - \frac{n+1}{2}\right)\,. $$
Putting these results together, we have the relationship
$$ \sum_{i=1}^{n} |u_{(i)} - u_{(n-i+1)}| = 2 \sum_{i=1}^{n} \mbox{sgn}\left(i - \frac{n+1}{2}\right) u_{(i)} = 2 \|\mathbf{u}\|_\varphi\,. \qquad (2.6.2) $$
Recall that the pseudo-norm based on Wilcoxon scores can be expressed as the sum of all absolute differences between the components; see (2.2.17). In contrast, the pseudo-norm based on the sign scores only involves the $n$ symmetric absolute differences $|u_{(i)} - u_{(n-i+1)}|$. In the two sample location model, the corresponding R-estimate based on the pseudo-norm (2.6.1) is a value of $\Delta$ which solves the equation
$$ S_\varphi(\Delta) = \sum_{j=1}^{n_2} \mbox{sgn}\left(R(Y_j - \Delta) - \frac{n+1}{2}\right) \doteq 0\,. \qquad (2.6.3) $$
Note that we are ranking the set $\{X_1,\ldots,X_{n_1}, Y_1 - \Delta,\ldots,Y_{n_2} - \Delta\}$, which is equivalent to ranking the set $\{X_1 - \mbox{med}\,X_i,\ldots,X_{n_1} - \mbox{med}\,X_i,\; Y_1 - \Delta - \mbox{med}\,X_i,\ldots,Y_{n_2} - \Delta - \mbox{med}\,X_i\}$. We must choose $\Delta$ so that half of the ranks of the $Y$ part of this set are above $(n+1)/2$ and half are below. Note that in the $X$ part of the second set, half of the $X$ part is below 0 and half is above 0. Thus we need to choose $\Delta$ so that half of the $Y$ part of this set is below 0 and half is above 0. This is achieved by taking
$$ \widehat{\Delta} = \mbox{med}\,Y_j - \mbox{med}\,X_i\,. \qquad (2.6.4) $$
This is the same estimate as produced by the $L_1$ norm; see the discussion following (2.2.5).
We shall refer to the above pseudo-norm (2.6.1) as the $L_1$ pseudo-norm. Actually, as pointed out in Section 2.2, this equivalence between estimates based on the $L_1$ norm and the $L_1$ pseudo-norm is true for general regression problems in which the model includes an intercept, as it does here.

The corresponding test statistic for $H_0: \Delta = 0$ is $\sum_{j=1}^{n_2} \mbox{sgn}(R(Y_j) - \frac{n+1}{2})$. Note that the sgn function here is only counting the number of $Y_j$'s which are above the combined sample median $\widehat{M} = \mbox{med}\,\{X_1,\ldots,X_{n_1}, Y_1,\ldots,Y_{n_2}\}$, minus the number below $\widehat{M}$. Hence a more convenient but equivalent test statistic is
$$ M_0^+ = \#(Y_j > \widehat{M})\,, \qquad (2.6.5) $$
which is called Mood's median test statistic; see Mood (1950).
Testing
Since this $L_1$-analysis is based on a rank-based pseudo-norm, we could use the general theory discussed in Section 2.5 to handle the theory for estimation and testing. As we will point out, though, there are some interesting results pertaining to this particular analysis.

For the null distribution of $M_0^+$, first assume that $n$ is even. Without loss of generality, assume that $n = 2r$ and $n_1 \leq n_2$. Consider the combined sample as a population of $n$ items, where $n_2$ of the items are $Y$'s and $n_1$ items are $X$'s. Think of the $n/2$ items which exceed $\widehat{M}$. Under $H_0$ these items are as likely to be an $X$ as a $Y$. Hence $M_0^+$, the number of $Y$'s in the top half of the sample, follows the hypergeometric distribution; i.e.,
$$ P(M_0^+ = k) = \frac{\binom{n_2}{k}\binom{n_1}{r-k}}{\binom{n}{r}}\,, \quad k = 0,\ldots,n_2\,, $$
where $r = n/2$. If $n$ is odd, the same result holds except that in this case $r = (n-1)/2$. Thus as a level $\alpha$ decision rule, we would reject $H_0: \Delta = 0$ in favor of $H_A: \Delta > 0$ if $M_0^+ \geq c_\alpha$, where $c_\alpha$ could be determined from the hypergeometric distribution or approximated by the binomial distribution. From the properties of the hypergeometric distribution, $E_0[M_0^+] = r(n_2/n)$ and $V_0[M_0^+] = (r n_1 n_2 (n-r))/(n^2(n-1))$. Under assumption D.1, (2.4.7), it follows that the limiting distribution of $M_0^+$ is normal.
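Mood's test is simple to carry out directly in R; the following is a minimal sketch (the function name mood_test is ours), using the hypergeometric null distribution just derived.

## A sketch of Mood's median test (2.6.5) with an exact upper-tail
## p-value from the hypergeometric null distribution derived above.
mood_test <- function(x, y) {
  n1 <- length(x); n2 <- length(y); n <- n1 + n2
  Mhat <- median(c(x, y))
  M0 <- sum(y > Mhat)               # Mood's statistic (2.6.5)
  r <- floor(n / 2)                 # number of items above the median
  ## P(M0+ >= observed): r draws from a population of n2 Y's and n1 X's
  pval <- phyper(M0 - 1, m = n2, n = n1, k = r, lower.tail = FALSE)
  list(statistic = M0, p.value = pval)
}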
Confidence Intervals

Exercise 2.13.21 shows that, for $n = 2r$,
$$ M_0^+(\Delta) = \#(Y_j - \Delta > \widehat{M}) = \sum_{i=1}^{n_2} I(Y_{(i)} - X_{(r-i+1)} > \Delta)\,, \qquad (2.6.6) $$
and furthermore that the $n_2$ differences
$$ Y_{(1)} - X_{(r)} < Y_{(2)} - X_{(r-1)} < \cdots < Y_{(n_2)} - X_{(r-n_2+1)} $$
can be ordered knowing only the order statistics from the individual samples. It is further shown that if $k$ is such that $P(M_0^+ \leq k) = \alpha/2$, then a $(1-\alpha)100\%$ confidence interval for $\Delta$ is given by
$$ \left(Y_{(k+1)} - X_{(r-k)}\,,\; Y_{(n_2-k)} - X_{(r-n_2+k+1)}\right)\,. $$
The above confidence interval simplifies when $n_1 = n_2 = m$, say. In this case the interval becomes
$$ \left(Y_{(k+1)} - X_{(m-k)}\,,\; Y_{(m-k)} - X_{(k+1)}\right)\,, $$
which is the difference in endpoints of the two simple $L_1$ confidence intervals $(X_{(k+1)}, X_{(m-k)})$ and $(Y_{(k+1)}, Y_{(m-k)})$ which were discussed in Section 1.11. Using the normal approximation to the hypergeometric, we have $k \doteq m/2 - z_{\alpha/2}\sqrt{m^2/(4(2m-1))} - .5$. Hence, the above two intervals have confidence coefficient
$$ \gamma \doteq 1 - 2\Phi\left(\frac{k - m/2}{\sqrt{m/4}}\right) = 1 - 2\Phi\left(-z_{\alpha/2}\sqrt{m/(2m-1)}\right) \doteq 1 - 2\Phi\left(-z_{\alpha/2}\, 2^{-1/2}\right)\,. $$
For example, for the equal sample size case, a 5% two sided Mood's test is equivalent to rejecting the null hypothesis if the 84% one sample $L_1$ confidence intervals are disjoint. While this also could be done for the unequal sample sizes case, we recommend the direct approach of Section 1.11.
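A sketch of this interval in R, under the assumptions that $n$ is even and that the index $r - n_2 + k + 1$ is positive (as in the equal sample size case); the helper name mood_ci is ours.

## A sketch of the Mood confidence interval for Delta built from the
## ordered differences in (2.6.6); assumes n even and the lower index
## r - n2 + k + 1 >= 1 (e.g., equal sample sizes).
mood_ci <- function(x, y, alpha = 0.05) {
  n1 <- length(x); n2 <- length(y); n <- n1 + n2; r <- n / 2
  k <- qhyper(alpha / 2, m = n2, n = n1, k = r)  # P(M0+ <= k) ~ alpha/2
  xs <- sort(x); ys <- sort(y)
  c(lower = ys[k + 1] - xs[r - k],
    upper = ys[n2 - k] - xs[r - n2 + k + 1])
}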
Efficiency Results

We will obtain the efficiency results from the asymptotic distribution of the estimate, $\widehat{\Delta} = \mbox{med}\,Y_j - \mbox{med}\,X_i$, of $\Delta$. Equivalently, we could obtain the results from the asymptotic linearity that was derived for arbitrary scores in (2.5.27); see Exercise 2.13.22.

Theorem 2.6.1. Under the conditions cited in Example 1.5.2 (the $L_1$ Pitman regularity conditions) and (2.4.7), we have
$$ \sqrt{n}\,(\widehat{\Delta} - \Delta) \stackrel{D}{\rightarrow} N\left(0,\; (\lambda_1 \lambda_2\, 4 f^2(0))^{-1}\right)\,. \qquad (2.6.7) $$
Proof: Without loss of generality, assume that $\Delta$ and $\theta$ are 0. We can write
$$ \sqrt{n}\,\widehat{\Delta} = \sqrt{\frac{n}{n_2}}\, \sqrt{n_2}\,\mbox{med}\,Y_j - \sqrt{\frac{n}{n_1}}\, \sqrt{n_1}\,\mbox{med}\,X_i\,. $$
From Example 1.5.2, we have
$$ \sqrt{n_2}\,\mbox{med}\,Y_j = \frac{1}{2f(0)}\, \frac{1}{\sqrt{n_2}} \sum_{j=1}^{n_2} \mbox{sgn}\,Y_j + o_p(1)\,; $$
hence, $\sqrt{n_2}\,\mbox{med}\,Y_j \stackrel{D}{\rightarrow} Z_2$, where $Z_2$ is $N(0, (4f^2(0))^{-1})$. Likewise $\sqrt{n_1}\,\mbox{med}\,X_i \stackrel{D}{\rightarrow} Z_1$, where $Z_1$ is $N(0, (4f^2(0))^{-1})$. Since $Z_1$ and $Z_2$ are independent, we have that
$$ \sqrt{n}\,\widehat{\Delta} \stackrel{D}{\rightarrow} (\lambda_2)^{-1/2} Z_2 - (\lambda_1)^{-1/2} Z_1\,, $$
which yields the result.
The efficacy of Mood's test is thus $\sqrt{\lambda_1 \lambda_2}\, 2f(0)$. The asymptotic relative efficiency of Mood's test to the two-sample $t$ test is $4\sigma^2 f^2(0)$, while its asymptotic relative efficiency with respect to the MWW test is $f^2(0)/(3(\int f^2)^2)$. These are the same as the efficiency results of the sign test relative to the $t$ test and to the Wilcoxon signed-rank test, respectively, that were obtained in Chapter 1; see Section 1.7.
Example 2.6.1. Quail Data, continued, Example 2.3.1
For the quail data the median of the combined samples is $\widehat{M} = 64$. For the subsequent test based on Mood's test we eliminated the three data points which had this value. Thus $n = 27$, $n_1 = 9$ and $n_2 = 18$. The value of Mood's test statistic is $M_0^+ = \#(P_j > 64) = 11$. Since $E_{H_0}(M_0^+) = 8.67$ and $V_{H_0}(M_0^+) = 1.55$, the standardized value (using the continuity correction) is 1.47, with a p-value of .071. Using all the data, the point estimate corresponding to Mood's test is 19, while a 90% confidence interval, using the normal approximation, is $(10, 31)$.
2.6.2 Analysis Based on the $L_1$ Norm

Another sign type procedure is based on the $L_1$ norm. Reconsider expression (2.2.7), which is the partial derivative of the $L_1$ dispersion function with respect to $\Delta$. We take the parameter $\theta$ as a nuisance parameter and estimate it by $\mbox{med}\,X_i$. An aligned sign test procedure for $\Delta$ is then obtained by aligning the $Y_j$'s with respect to this estimate of $\theta$. The process of interest, then, is
$$ S(\Delta) = \sum_{j=1}^{n_2} \mbox{sgn}(Y_j - \mbox{med}\,X_i - \Delta)\,. $$
A test of $H_0: \Delta = 0$ is based on the statistic
$$ M_a^+ = \#(Y_j > \mbox{med}\,X_i)\,. \qquad (2.6.8) $$
This statistic was proposed by Mathisen (1943) and is also referred to as the control median test; see Gastwirth (1968). The estimate of $\Delta$ obtained by solving $S(\widehat{\Delta}) \doteq 0$ is, of course, the $L_1$ estimate $\widehat{\Delta} = \mbox{med}\,Y_j - \mbox{med}\,X_i$.
Testing
Mathisen's test statistic, similar to Mood's, has a hypergeometric distribution under $H_0$.

Theorem 2.6.2. Suppose $n_1$ is odd and is written as $n_1 = 2n_1^* + 1$. Then under $H_0: \Delta = 0$,
$$ P(M_a^+ = t) = \frac{\binom{n_1^* + t}{n_1^*}\binom{n_2 - t + n_1^*}{n_1^*}}{\binom{n}{n_1}}\,, \quad t = 0, 1, \ldots, n_2\,. $$
Proof: The proof is based on a conditional argument. Given $X_{(n_1^*+1)} = x$, $M_a^+$ is binomial with $n_2$ trials and $1 - F(x)$ as the probability of success. The density of $X_{(n_1^*+1)}$ is
$$ f^*(x) = \frac{n_1!}{(n_1^*!)^2}\, (1 - F(x))^{n_1^*} F(x)^{n_1^*} f(x)\,. $$
Using this and the fact that the samples are independent, we get
$$ P(M_a^+ = t) = \int \binom{n_2}{t} (1 - F(x))^t F(x)^{n_2 - t} f^*(x)\,dx = \binom{n_2}{t} \frac{n_1!}{(n_1^*!)^2} \int (1 - F(x))^{t + n_1^*} F(x)^{n_1^* + n_2 - t} f(x)\,dx = \binom{n_2}{t} \frac{n_1!}{(n_1^*!)^2} \int_0^1 (1-u)^{t + n_1^*} u^{n_1^* + n_2 - t}\,du\,. $$
By properties of the beta function this reduces to the result.
Once again using the conditional argument, we obtain the moments of $M_a^+$ as
$$ E_0[M_a^+] = \frac{n_2}{2} \qquad (2.6.9) $$
$$ V_0[M_a^+] = \frac{n_2(n+1)}{4(n_1+2)}\,; \qquad (2.6.10) $$
see Exercise 2.13.23.
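In R, the test is a few lines once the moments (2.6.9) and (2.6.10) are in hand; the sketch below (with the hypothetical name mathisen_test, and with our choice of a continuity correction) uses the normal approximation of Theorem 2.6.3 below.

## A sketch of Mathisen's control median test (2.6.8), standardized by
## the null moments (2.6.9) and (2.6.10); upper-tailed, with a
## continuity correction (our choice).
mathisen_test <- function(x, y) {
  n1 <- length(x); n2 <- length(y)
  Ma <- sum(y > median(x))                    # the statistic (2.6.8)
  mu <- n2 / 2                                # null mean (2.6.9)
  v  <- n2 * (n1 + n2 + 1) / (4 * (n1 + 2))   # null variance (2.6.10)
  z  <- (Ma - mu - 0.5) / sqrt(v)
  list(statistic = Ma, z = z, p.value = pnorm(z, lower.tail = FALSE))
}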
The result when $n_1$ is even is found in Exercise 2.13.23. For the asymptotic null distribution of $M_a^+$ we shall make use of the linearity result for the sign process derived in Chapter 1; see Example 1.5.2.

Theorem 2.6.3. Under $H_0$ and D.1, (2.4.7), $M_a^+$ has an approximate $N\left(\frac{n_2}{2}, \frac{n_2(n+1)}{4(n_1+2)}\right)$ distribution.
Proof: Assume without loss of generality that the true median of $X$ and $Y$ is 0. Let $\widehat{\theta} = \mbox{med}\,X_i$. Note that
$$ M_a^+ = \left( \sum_{j=1}^{n_2} \mbox{sgn}(Y_j - \widehat{\theta}) + n_2 \right) \Big/ 2\,. \qquad (2.6.11) $$
Clearly under (D.1), $\sqrt{n_2}\,\widehat{\theta}$ is bounded in probability. Hence, by the asymptotic linearity result for the $L_1$ analysis, obtained in Example 1.5.2, we have
$$ n_2^{-1/2} \sum_{j=1}^{n_2} \mbox{sgn}(Y_j - \widehat{\theta}) = n_2^{-1/2} \sum_{j=1}^{n_2} \mbox{sgn}(Y_j) - 2f(0) \sqrt{n_2}\,\widehat{\theta} + o_p(1)\,. $$
But we also have
$$ \sqrt{n_1}\,\widehat{\theta} = (2f(0)\sqrt{n_1})^{-1} \sum_{i=1}^{n_1} \mbox{sgn}(X_i) + o_p(1)\,. $$
Therefore
$$ n_2^{-1/2} \sum_{j=1}^{n_2} \mbox{sgn}(Y_j - \widehat{\theta}) = n_2^{-1/2} \sum_{j=1}^{n_2} \mbox{sgn}(Y_j) - \sqrt{n_2/n_1}\; n_1^{-1/2} \sum_{i=1}^{n_1} \mbox{sgn}(X_i) + o_p(1)\,. $$
Note that
$$ n_2^{-1/2} \sum_{j=1}^{n_2} \mbox{sgn}(Y_j) \stackrel{D}{\rightarrow} N(0, 1) $$
and
$$ \sqrt{n_2/n_1}\; n_1^{-1/2} \sum_{i=1}^{n_1} \mbox{sgn}(X_i) \stackrel{D}{\rightarrow} N(0, \lambda_2/\lambda_1)\,. $$
The result follows from these asymptotic distributions, the independence of the samples, expression (2.6.11), and the fact that asymptotically the variance of $M_a^+$ satisfies
$$ \frac{n_2(n+1)}{4(n_1+2)} \doteq n_2 (4\lambda_1)^{-1}\,. $$
Confidence Intervals

Note that $M_a^+(\Delta) = \#(Y_j - \Delta > \widehat{\theta}) = \#(Y_j - \widehat{\theta} > \Delta)$; hence, if $k$ is such that $P_{\Delta = 0}(M_a^+ \leq k) = \alpha/2$, then $(Y_{(k+1)} - \widehat{\theta},\, Y_{(n_2-k)} - \widehat{\theta})$ is a $(1-\alpha)100\%$ confidence interval for $\Delta$. For testing the two sided hypothesis $H_0: \Delta = 0$ versus $H_A: \Delta \neq 0$, we would reject $H_0$ if 0 is not in the confidence interval. This is equivalent, however, to rejecting if $\widehat{\theta}$ is not in the interval $(Y_{(k+1)}, Y_{(n_2-k)})$.
Suppose we determine $k$ by the normal approximation. Then
$$ k \doteq \frac{n_2}{2} - z_{\alpha/2} \sqrt{\frac{n_2(n+1)}{4(n_1+2)}} - .5 \doteq \frac{n_2}{2} - z_{\alpha/2} \sqrt{\frac{n_2}{4\lambda_1}} - .5\,. $$
The confidence interval $(Y_{(k+1)}, Y_{(n_2-k)})$ is a $\gamma 100\%$, $\gamma = 1 - 2\Phi(-z_{\alpha/2}(\lambda_1)^{-1/2})$, confidence interval based on the sign procedure for the sample $Y_1,\ldots,Y_{n_2}$. Suppose we take $\alpha = .05$ and have the equal sample sizes case, so that $\lambda_1 = .5$. Then $\gamma \doteq 1 - 2\Phi(-2\sqrt{2})$. Hence, the two sided 5% test rejects $H_0: \Delta = 0$ if $\widehat{\theta}$ is not in the confidence interval.
Remarks on Efficiency

Since the estimator of $\Delta$ based on the Mathisen procedure is the same as that of Mood's procedure, the asymptotic relative efficiency results for Mathisen's procedure are the same as those of Mood's. Using another type of efficiency due to Bahadur (1967), Killeen, Hettmansperger and Sievers (1972) show that it is generally better to compute the median of the smaller sample.

Curtailed sampling on the $Y$'s is one situation where Mathisen's test would be used instead of Mood's test, since with Mathisen's test an early decision could be made; see Gastwirth (1968).
Example 2.6.2. Quail Data, continued, Examples 2.3.1 and 2.6.1
For these data, $\mbox{med}\,T_i = 49$. Since one of the placebo values was also 49, we eliminated it in the subsequent computation of Mathisen's test. The test statistic has the value $M_a^+ = \#(C_j > 49) = 17$. Using $n_2 = 19$ and $n_1 = 10$, the null mean and variance are 9.5 and 11.875, respectively. This leads to a standardized test statistic of 2.03 (using the continuity correction) with a p-value of .021. Utilizing all the data, the corresponding point estimate and confidence interval are 19 and $(6, 27)$. This differs from the MWW and Mood analyses; see Examples 2.3.1 and 2.6.1, respectively.
2.7 Robustness Properties
In this section we obtain the breakdown points and the influence functions of the $L_1$ and MWW estimates. We first consider the breakdown properties.
2.7.1 Breakdown Properties
We begin with the definition of an equivariant estimator of $\Delta$. For convenience, let the vectors $\mathbf{X}$ and $\mathbf{Y}$ denote the samples $X_1,\ldots,X_{n_1}$ and $Y_1,\ldots,Y_{n_2}$, respectively. Also let $\mathbf{X} + a\mathbf{1} = (X_1 + a, \ldots, X_{n_1} + a)'$.
Definition 2.7.1. An estimator $\widehat{\Delta}(\mathbf{X}, \mathbf{Y})$ of $\Delta$ is said to be an equivariant estimator of $\Delta$ if $\widehat{\Delta}(\mathbf{X} + a\mathbf{1}, \mathbf{Y}) = \widehat{\Delta}(\mathbf{X}, \mathbf{Y}) - a$ and $\widehat{\Delta}(\mathbf{X}, \mathbf{Y} + a\mathbf{1}) = \widehat{\Delta}(\mathbf{X}, \mathbf{Y}) + a$.

Note that the $L_1$ estimator and the Hodges-Lehmann estimator are both equivariant estimators of $\Delta$. Indeed, as Exercise 2.13.24 shows, any estimator based on the rank pseudo-norms discussed in Section 2.5 is an equivariant estimator of $\Delta$. As the following theorem shows, the breakdown point of an equivariant estimator is bounded above by .25.
Theorem 2.7.1. Suppose $n_1 \leq n_2$. Then the breakdown point of an equivariant estimator satisfies $\epsilon^* \leq \{[(n_1+1)/2] + 1\}/n$, where $[\cdot]$ denotes the greatest integer function.

Proof: Let $m = [(n_1+1)/2] + 1$. Suppose $\widehat{\Delta}$ is an equivariant estimator such that $\epsilon^* > m/n$. Then the estimator remains bounded if $m$ points are corrupted. Let $\mathbf{X}^* = (X_1 + a, \ldots, X_m + a, X_{m+1}, \ldots, X_{n_1})'$. Since we have corrupted $m$ points, there exists a $B > 0$ such that
$$ |\widehat{\Delta}(\mathbf{X}^*, \mathbf{Y}) - \widehat{\Delta}(\mathbf{X}, \mathbf{Y})| \leq B\,. \qquad (2.7.1) $$
Next let $\mathbf{X}^{**} = (X_1, \ldots, X_m, X_{m+1} - a, \ldots, X_{n_1} - a)'$. Then $\mathbf{X}^{**}$ contains $n_1 - m \leq [n_1/2] \leq m$ altered points. Therefore,
$$ |\widehat{\Delta}(\mathbf{X}^{**}, \mathbf{Y}) - \widehat{\Delta}(\mathbf{X}, \mathbf{Y})| \leq B\,. \qquad (2.7.2) $$
Equivariance implies that $\widehat{\Delta}(\mathbf{X}^*, \mathbf{Y}) = \widehat{\Delta}(\mathbf{X}^{**}, \mathbf{Y}) + a$. By (2.7.1) we have
$$ \widehat{\Delta}(\mathbf{X}, \mathbf{Y}) - B \leq \widehat{\Delta}(\mathbf{X}^*, \mathbf{Y}) \leq \widehat{\Delta}(\mathbf{X}, \mathbf{Y}) + B\,, \qquad (2.7.3) $$
while from (2.7.2) we have
$$ \widehat{\Delta}(\mathbf{X}, \mathbf{Y}) - B + a \leq \widehat{\Delta}(\mathbf{X}^{**}, \mathbf{Y}) + a \leq \widehat{\Delta}(\mathbf{X}, \mathbf{Y}) + B + a\,. \qquad (2.7.4) $$
Taking $a = 3B$ leads to a contradiction between (2.7.3) and (2.7.4).
By this theorem the maximum breakdown point of any equivariant estimator is roughly
half of the smaller sample proportion. If the sample sizes are equal then the best possible
breakdown is 1/4.
Example 2.7.1. Breakdown of the $L_1$ and MWW Estimates

The $L_1$ estimator of $\Delta$, $\widehat{\Delta} = \mbox{med}\,Y_j - \mbox{med}\,X_i$, achieves the maximal breakdown since $\mbox{med}\,Y_j$ achieves the maximal breakdown in the one sample problem.

The Hodges-Lehmann estimate $\widehat{\Delta}_R = \mbox{med}\,\{Y_j - X_i\}$ also achieves maximal breakdown. To see this, suppose we corrupt an $X_i$. Then $n_2$ differences $Y_j - X_i$ are corrupted. Hence between samples we maximize the corruption by corrupting the items in the smaller sample, so without loss of generality we can assume that $n_1 \leq n_2$. Suppose we corrupt $m$ $X_i$'s. In order to corrupt $\mbox{med}\,\{Y_j - X_i\}$ we must corrupt $(n_1 n_2)/2$ differences. Therefore $m n_2 \geq (n_1 n_2)/2$; i.e., $m \geq n_1/2$. Hence $\mbox{med}\,\{Y_j - X_i\}$ has maximal breakdown. Based on Exercise 1.12.13 of Chapter 1, the one sample estimate based on the Wilcoxon signed rank statistic does not achieve the maximal breakdown value of 1/2 in the one sample problem.
2.7.2 Influence Functions

Recall from Section 1.6.1 that the influence function of a Pitman regular estimator based on a single sample $X_1,\ldots,X_n$ is the function $\Omega(z)$ for which $\sqrt{n}$ times the centered estimator has the representation $n^{-1/2} \sum \Omega(X_i) + o_p(1)$. The estimators we are concerned with in this section are Pitman regular; hence, to determine their influence functions we need only obtain similar representations for them.
For the $L_1$ estimate we have, from the proof of Theorem 2.6.1, that
$$ \sqrt{n}\,\widehat{\Delta} = \sqrt{n}\,(\mbox{med}\,Y_j - \mbox{med}\,X_i) = \frac{1}{2f(0)}\, \frac{1}{\sqrt{n}} \left\{ \sum_{j=1}^{n_2} \frac{\mbox{sgn}(Y_j)}{\lambda_2} - \sum_{i=1}^{n_1} \frac{\mbox{sgn}(X_i)}{\lambda_1} \right\} + o_p(1)\,. $$
Hence the influence function of the $L_1$ estimate is
$$ \Omega(z) = \left\{ \begin{array}{ll} -(\lambda_1 2f(0))^{-1}\,\mbox{sgn}\,z & \mbox{if $z$ is an $x$} \\[1ex] (\lambda_2 2f(0))^{-1}\,\mbox{sgn}\,z & \mbox{if $z$ is a $y$} \end{array} \right.\,, $$
which is a bounded, discontinuous function.
which is a bounded discontinuous function.
For the Hodges-Lehmann estimate, (2.2.18), note that we can write the linearity result
(2.4.23) as

n(S
+
(/

n) 1/2) =

n(S
+
(0) 1/2)
_
f
2
+ o
p
(1) ,
2.7. ROBUSTNESS PROPERTIES 117
which upon substituting

n

R
for leads to

R
=
__
f
2
_
1

n(S
+
(0) 1/2) +o
p
(1) .
Recall the projection of the statistic $\overline{S}^+_R(0) - 1/2$ given in Theorem 2.4.7. Since the difference between it and this statistic goes to zero in probability, we can, after some algebra, obtain the following representation for the Hodges-Lehmann estimator:
$$ \sqrt{n}\,\widehat{\Delta}_R = \left[\int f^2\right]^{-1} \frac{1}{\sqrt{n}} \left\{ \sum_{j=1}^{n_2} \frac{F(Y_j) - 1/2}{\lambda_2} - \sum_{i=1}^{n_1} \frac{F(X_i) - 1/2}{\lambda_1} \right\} + o_p(1)\,. $$
Therefore the influence function for the Hodges-Lehmann estimate is
$$ \Omega(z) = \left\{ \begin{array}{ll} -\left(\lambda_1 \int f^2\right)^{-1} (F(z) - 1/2) & \mbox{if $z$ is an $x$} \\[1ex] \left(\lambda_2 \int f^2\right)^{-1} (F(z) - 1/2) & \mbox{if $z$ is a $y$} \end{array} \right.\,, $$
which is easily seen to be bounded and continuous.
For least squares, since the estimate is $\overline{Y} - \overline{X}$, the influence function is
$$ \Omega(z) = \left\{ \begin{array}{ll} -(\lambda_1)^{-1} z & \mbox{if $z$ is an $x$} \\[1ex] (\lambda_2)^{-1} z & \mbox{if $z$ is a $y$} \end{array} \right.\,, $$
which is unbounded and continuous. The Hodges-Lehmann and $L_1$ estimates attain the maximal breakdown point and have bounded influence functions; hence they are robust. On the other hand, the least squares estimate has 0 percent breakdown and an unbounded influence function. One bad point can destroy a least squares analysis.
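For a concrete comparison, the sketch below plots the $y$-part of these influence functions at the standard normal model with $\lambda_1 = \lambda_2 = 1/2$; the helper names are ours.

## A sketch comparing the influence functions above at the standard
## normal model with lambda1 = lambda2 = 1/2 (y-part shown).
lam <- 0.5
intf2 <- 1 / (2 * sqrt(pi))   # integral of f^2 for the standard normal
omega_L1 <- function(z) sign(z) / (lam * 2 * dnorm(0))    # bounded, jump at 0
omega_HL <- function(z) (pnorm(z) - 0.5) / (lam * intf2)  # bounded, continuous
omega_LS <- function(z) z / lam                           # unbounded
curve(omega_LS, -4, 4, lty = 3, ylab = "influence")
curve(omega_HL, add = TRUE)
curve(omega_L1, add = TRUE, lty = 2)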
For a general score function $\varphi(u)$, by (2.5.31) we have the asymptotic representation
$$ \sqrt{n}\,\widehat{\Delta}_\varphi = \tau_\varphi\, \frac{1}{\sqrt{n}} \left\{ \sum_{i=1}^{n_1} \left(-\frac{1}{\lambda_1}\right) \varphi(F(X_i)) + \sum_{i=1}^{n_2} \left(\frac{1}{\lambda_2}\right) \varphi(F(Y_i)) \right\} + o_p(1)\,. $$
Hence, the influence function of the R-estimate based on the score function $\varphi$ is given by
$$ \Omega(z) = \left\{ \begin{array}{ll} -\frac{1}{\lambda_1}\, \tau_\varphi\, \varphi(F(z)) & \mbox{if $z$ is an $x$} \\[1ex] \frac{1}{\lambda_2}\, \tau_\varphi\, \varphi(F(z)) & \mbox{if $z$ is a $y$} \end{array} \right.\,, $$
where $\tau_\varphi$ is defined by expression (2.5.23). In particular, the influence function is bounded provided the score generating function is bounded. Note that the influence function for the R-estimate based on normal scores is unbounded; hence, this estimate is not robust. Recall Example 1.8.1, in which the one sample normal scores estimate has an unbounded influence function (not robust) but has positive breakdown point (resistant). A rigorous derivation of these influence functions can be based on the influence function derived in Section A.5.2 of the Appendix.
2.8 Lehmann Alternatives and Proportional Hazards
Consider a two sample problem where the responses are lifetimes of subjects. We shall continue to denote the independent samples by $X_1,\ldots,X_{n_1}$ and $Y_1,\ldots,Y_{n_2}$. Let $X_i$ and $Y_j$ have distribution functions $F(x)$ and $G(x)$, respectively. Since we are dealing with lifetimes, both $X_i$ and $Y_j$ are positive valued random variables. The hazard function for $X_i$ is defined by
$$ h_X(t) = \frac{f(t)}{1 - F(t)} $$
and represents the likelihood that a subject dies at time $t$ given that the subject has survived until that time; see Exercise 2.13.25.
In this section, we consider the class of lifetime models that are called Lehmann alternative models, for which the distribution function $G$ satisfies
$$ 1 - G(x) = (1 - F(x))^{\alpha}\,, \qquad (2.8.1) $$
where the parameter $\alpha > 0$. See Section 4.4 of Maritz (1981) for an overview of nonparametric methods for these models. The Lehmann model generalizes the exponential scale model $F(x) = 1 - \exp(-x)$ and $G(x) = 1 - (1 - F(x))^\alpha = 1 - \exp(-\alpha x)$. As shown in Exercise 2.13.25, the hazard function of $Y_j$ is given by $h_Y(t) = \alpha h_X(t)$; i.e., the hazard function of $Y_j$ is proportional to the hazard function of $X_i$; hence, these models are also referred to as proportional hazards models; see, also, Section 3.10. The null hypothesis can be expressed as $H_{L0}: \alpha = 1$. The alternative we consider is $H_{LA}: \alpha < 1$; that is, $Y$ is less hazardous than $X$; i.e., $Y$ has more chance of long survival than $X$ and is stochastically larger than $X$.
Note that
$$ P_\alpha(Y > X) = E_\alpha[P(Y > X \mid X)] = E_\alpha[1 - G(X)] = E_\alpha[(1 - F(X))^\alpha] = (\alpha + 1)^{-1}\,. \qquad (2.8.2) $$
The last equality holds since $1 - F(X)$ has a uniform $(0,1)$ distribution. Under $H_{LA}$, then, $P_\alpha(Y > X) > 1/2$; i.e., $Y$ tends to dominate $X$.


The MWW test statistic S
+
R
= #(Y
j
> X
i
) is a consistent test statistic for H
L0
versus
H
LA
, by Theorem 2.4.10. We reject H
L0
in favor of H
LA
for large values of S
+
R
. Furthermore
by Theorem 2.4.4 and (2.8.2), we have that
E

[S
+
R
] = n
1
n
2
E

[1 G(X)] =
n
1
n
2
1 +
.
This suggests as an estimate of , the statistic,
= ((n
1
n
2
)/S
+
R
) 1 . (2.8.3)
By Theorem 2.4.5 it can be shown that
$$ V_\alpha(S^+_R) = \frac{n_1 n_2\, \alpha}{(\alpha+1)^2} + \frac{n_1 n_2 (n_1 - 1)\,\alpha}{(\alpha+2)(\alpha+1)^2} + \frac{n_1 n_2 (n_2 - 1)\,\alpha^2}{(2\alpha+1)(\alpha+1)^2}\,; \qquad (2.8.4) $$
see Exercise 2.13.27. Using this result and the asymptotic distribution of $S^+_R$ under general alternatives, Theorem 2.4.9, we can obtain, by the delta method, the asymptotic variance of $\widehat{\alpha}$, given by
$$ \mbox{Var}\,\widehat{\alpha} \doteq \frac{(1+\alpha)^2 \alpha}{n_1 n_2} \left[ 1 + \frac{n_1 - 1}{\alpha + 2} + \frac{(n_2 - 1)\,\alpha}{2\alpha + 1} \right]\,. \qquad (2.8.5) $$
This can be used to obtain an asymptotic confidence interval for $\alpha$; see Exercise 2.13.27 for details. As in the example below, the bootstrap could also be used to estimate $\mbox{Var}(\widehat{\alpha})$.
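A minimal sketch of the estimate (2.8.3) and a percentile bootstrap confidence interval for $\alpha$; the function names are ours.

## A sketch of the estimate (2.8.3) of alpha and a percentile bootstrap
## confidence interval for it, as suggested in the text.
alphahat <- function(x, y) {
  sr <- sum(outer(y, x, ">"))        # MWW statistic S_R^+ = #(Y_j > X_i)
  length(x) * length(y) / sr - 1
}
alpha_boot_ci <- function(x, y, B = 1000, level = 0.90) {
  boots <- replicate(B, alphahat(sample(x, replace = TRUE),
                                 sample(y, replace = TRUE)))
  quantile(boots, c((1 - level) / 2, (1 + level) / 2))
}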
2.8.1 The Log Exponential and the Savage Statistic
Another rank test which is frequently used in this situation is the log rank test proposed by Savage (1956). In order to obtain this test, first consider the special case where $X$ has the exponential distribution function $F(x) = 1 - e^{-x/\theta}$, for $\theta > 0$. In this case the hazard function of $X$ is a constant function. Consider the random variable $\varepsilon = \log X - \log \theta$. In a few steps we can obtain its distribution function as
$$ P[\varepsilon \leq t] = P[\log X \leq \log \theta + t] = 1 - \exp(-e^t)\,; $$
i.e., $\varepsilon$ has an extreme value distribution. The density of $\varepsilon$ is $f_\varepsilon(t) = \exp(t - e^t)$. Hence, we can model $\log X$ as the location model:
$$ \log X = \log \theta + \varepsilon\,. \qquad (2.8.6) $$
Next consider the distribution of $\log Y$. Using expression (2.8.1) and a few steps of algebra, we get
$$ P[\log Y \leq t] = 1 - \exp(-\alpha e^t/\theta)\,. $$
But from this it is easy to see that we can model $Y$ as
$$ \log Y = \log \theta + \log \frac{1}{\alpha} + \varepsilon\,, \qquad (2.8.7) $$
where the error random variable has the above extreme value distribution. From (2.8.6) and (2.8.7) we see that the log-transformation problem is simply a two sample location problem with shift parameter $\Delta = -\log \alpha$. Here, $H_{L0}$ is equivalent to $H_0: \Delta = 0$ and $H_{LA}$ is equivalent to $H_A: \Delta > 0$. We shall refer to this model as the log exponential model for the remainder of this section. Thus any of the rank-based analyses that we have discussed in this chapter can be used to analyze this model.

Let's consider the analysis based on the optimal score function for the model. Based on Section 2.5 and Exercise 2.13.19, the optimal scores for the extreme value distribution are generated by the function
$$ \varphi_{f_\varepsilon}(u) = -(1 + \log(1 - u))\,. \qquad (2.8.8) $$
Hence the optimal rank test in the log exponential model is given by
$$ S_L = \sum_{j=1}^{n_2} \varphi_{f_\varepsilon}\left(\frac{R(Y_j)}{n+1}\right) = -\sum_{j=1}^{n_2} \left[1 + \log\left(1 - \frac{R(\log Y_j)}{n+1}\right)\right] = -\sum_{j=1}^{n_2} \left[1 + \log\left(1 - \frac{R(Y_j)}{n+1}\right)\right]\,. \qquad (2.8.9) $$
We reject $H_{L0}$ in favor of $H_{LA}$ for large values of $S_L$. By (2.5.14), the null mean of $S_L$ is 0, while from (2.5.18) its null variance is given by
$$ \sigma^2_{\varphi_{f_\varepsilon}} = \frac{n_1 n_2}{n(n-1)} \sum_{i=1}^{n} \left[1 + \log\left(1 - \frac{i}{n+1}\right)\right]^2\,. \qquad (2.8.10) $$
Then an asymptotic level $\alpha$ test rejects $H_{L0}$ in favor of $H_{LA}$ if $S_L \geq z_\alpha \sigma_{\varphi_{f_\varepsilon}}$.
Certainly the statistic $S_L$ can be used in the general Lehmann alternative model described above, although it is not optimal if $X$ does not have an exponential distribution. We shall discuss the efficiency of this test below.

For estimation, let $\widehat{\Delta}$ be the estimate of $\Delta$ based on the optimal score function $\varphi_{f_\varepsilon}$; that is, $\widehat{\Delta}$ solves the equation
$$ \sum_{j=1}^{n_2} \left[1 + \log\left(1 - \frac{R[\log(Y_j) - \widehat{\Delta}]}{n+1}\right)\right] \doteq 0\,. \qquad (2.8.11) $$
Besides estimation, the confidence intervals discussed in Section 2.5 for general scores can be obtained for the score function $\varphi_{f_\varepsilon}$; see Example 2.8.1 for an illustration.
Thus another estimate of $\alpha$ would be $\widehat{\alpha} = \exp\{-\widehat{\Delta}\}$. As discussed in Exercise 2.13.27, an asymptotic confidence interval for $\alpha$ can be formulated from this relationship. Keep in mind, though, that we are assuming that $X$ is exponentially distributed.

As a further note, since $\varphi_{f_\varepsilon}(u)$ is an unbounded function, it follows from Section 2.7.2 that the influence function of $\widehat{\Delta}$ is unbounded. Thus the estimate is not robust.
A frequently used test statistic, equivalent to $S_L$, was proposed by Savage. To derive it, denote $R(Y_j)$ by $R_j$. Then we can write
$$ \log\left(1 - \frac{R_j}{n+1}\right) = \int_1^{1 - R_j/(n+1)} \frac{1}{t}\,dt = -\int_0^{R_j/(n+1)} \frac{1}{1-t}\,dt\,. $$
We can approximate this last integral by the following Riemann sum:
$$ \frac{1}{1 - R_j/(n+1)}\, \frac{1}{n+1} + \frac{1}{1 - (R_j - 1)/(n+1)}\, \frac{1}{n+1} + \cdots + \frac{1}{1 - (R_j - (R_j - 1))/(n+1)}\, \frac{1}{n+1}\,. $$
This simplifies to
$$ \frac{1}{n+1-1} + \frac{1}{n+1-2} + \cdots + \frac{1}{n+1-R_j} = \sum_{i = n+1-R_j}^{n} \frac{1}{i}\,. $$
This suggests the rank statistic
$$ \widetilde{S}_L = -n_2 + \sum_{j=1}^{n_2} \sum_{i = n - R_j + 1}^{n} \frac{1}{i}\,. \qquad (2.8.12) $$
This statistic was proposed by Savage (1956). Note that it is a rank statistic with scores defined by
$$ a_j = -1 + \sum_{i = n - j + 1}^{n} \frac{1}{i}\,. \qquad (2.8.13) $$
Exercise 2.13.28 shows that its null mean and variance are given by
$$ E_{H_0}[\widetilde{S}_L] = 0\,, \qquad \widetilde{\sigma}^2 = \frac{n_1 n_2}{n-1} \left\{ 1 - \frac{1}{n} \sum_{j=1}^{n} \frac{1}{j} \right\}\,. \qquad (2.8.14) $$
Hence an asymptotic level $\alpha$ test is to reject $H_{L0}$ in favor of $H_{LA}$ if $\widetilde{S}_L \geq z_\alpha \widetilde{\sigma}$.
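In R, the Savage scores and statistic can be assembled in a few lines; the sketch below (with the hypothetical name savage_test) implements (2.8.12) through (2.8.14).

## A sketch of the Savage statistic (2.8.12) using the scores (2.8.13)
## and the null standardization (2.8.14); upper-tailed test.
savage_test <- function(x, y) {
  n1 <- length(x); n2 <- length(y); n <- n1 + n2
  a <- -1 + cumsum(1 / (n:1))           # a_j = -1 + sum_{i=n-j+1}^{n} 1/i
  SL <- sum(a[rank(c(x, y))[-(1:n1)]])  # score sum over the Y sample
  v  <- (n1 * n2 / (n - 1)) * (1 - sum(1 / (1:n)) / n)
  z  <- SL / sqrt(v)
  list(statistic = SL, z = z, p.value = pnorm(z, lower.tail = FALSE))
}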
Based on the above Riemann sum, it would seem that $\widetilde{S}_L$ and $S_L$ are close statistics. Indeed they are asymptotically equivalent and, hence, both are optimal when $X$ is exponentially distributed; see Hájek and Šidák (1967) or Kalbfleisch and Prentice (1980) for details.
2.8.2 Efficiency Properties

We next derive the asymptotic relative efficiencies for the log exponential model with $f_\varepsilon(t) = \exp(t - e^t)$. The MWW statistic, $S^+_R$, is a consistent test for the log exponential model. By (2.4.21), the efficacy of the Wilcoxon test is
$$ c_{MWW} = \sqrt{12} \int f_\varepsilon^2\; \sqrt{\lambda_1 \lambda_2} = \sqrt{\frac{3}{4}}\, \sqrt{\lambda_1 \lambda_2}\,, $$
since $\int f_\varepsilon^2(t)\,dt = 1/4$. Since the Savage test is asymptotically optimal, its efficacy is the square root of the Fisher information, $I^{1/2}(f_\varepsilon)$, discussed in Section 2.5, times $\sqrt{\lambda_1 \lambda_2}$. This efficacy is $\sqrt{\lambda_1 \lambda_2}$, because $I(f_\varepsilon) = 1$. Hence the asymptotic relative efficiency of the Mann-Whitney-Wilcoxon test to the Savage test at the log exponential model is 3/4; see Exercise 2.13.29.
Recall that the efficacy of the $L_1$ procedures, both Mood's and Mathisen's, is $2 f_\varepsilon(\theta_\varepsilon) \sqrt{\lambda_1 \lambda_2}$, where $\theta_\varepsilon$ denotes the median of the extreme value distribution. This turns out to be $\theta_\varepsilon = \log(\log 2)$, and hence $f_\varepsilon(\theta_\varepsilon) = (\log 2)/2$, which leads to the efficacy $\sqrt{\lambda_1 \lambda_2}\, \log 2$ for the $L_1$ methods. Thus the asymptotic relative efficiency of the $L_1$ procedures with respect to the procedure based on Savage scores is $(\log 2)^2 = .480$. The asymptotic relative efficiency of the $L_1$ methods to the MWW at this model is .6406. Therefore there is a substantial loss of efficiency if $L_1$ methods are used for the log exponential model. This makes sense, since the extreme value distribution has very light tails.
The variance of a random variable with density $f_\varepsilon$ is $\pi^2/6$; hence the asymptotic relative efficiency of the $t$ test to the Savage test at the log exponential model is $6/\pi^2 = .608$. Hence, for the procedures analyzed in this chapter on the log exponential model, the Savage test is optimal, followed, in order, by the MWW, $t$, and $L_1$ tests.
Example 2.8.1. Lifetimes of an Insulation Fluid.
The data below are drawn from an example on page 3 of Lawless (1982); see, also, Nelson (1982, p. 227). They consist of the breakdown times (in minutes) of an electrical insulating fluid when subject to two different levels of voltage stress, 30 and 32 kV. Suppose we are interested in testing to see if the lower level is less hazardous than the higher level.

Voltage Level | Times to Breakdown (Minutes)
30 kV ($Y$)   | 17.05  22.66  21.02  175.88  139.07  144.12  20.46  43.40  194.90  47.30  7.74
32 kV ($X$)   | 0.40  82.85  9.88  89.29  215.10  2.75  0.79  15.93  3.91  0.27  0.69  100.58  27.80  13.95  53.24
Let $Y$ and $X$ denote the logs of the breakdown times of the insulating fluid at the voltage stresses of 30 kV and 32 kV, respectively. Let $\Delta = \theta_Y - \theta_X$ denote the shift in locations. We are interested in testing $H_0: \Delta = 0$ versus $H_A: \Delta > 0$. The comparison boxplots for the log-transformed data are displayed in the left panel of Figure 2.8.1. It appears that the lower level (30 kV) is less hazardous. The RBR function twosampr2, with the score argument set at philogr, obtains the analysis based on the log-rank scores. Briefly, the results are:
Test of Delta = 0   Alternative selected is 1
Standardized (z) Test-Statistic 1.302 and p-value 0.096

Estimate 0.680   SE is 0.776
95 % Confidence Interval is (-0.261, 2.662)
Estimate of the scale parameter tau 1.95

The corresponding Mann-Whitney-Wilcoxon analysis is

Test of Delta = 0   Alternative selected is 1
Test Stat. S+ is 118   Standardized (z) Test-Stat. 1.816 and p-value 0.034

MWW estimate of the shift in location is 1.297   SE is 0.944
95 % Confidence Interval is (-0.201, 3.355)
Estimate of the scale parameter tau 2.37
Figure 2.8.1: Comparison boxplots of the log breakdown times at the 30 kV and 32 kV levels (left panel) and an exponential q-q plot of the 32 kV breakdown times (right panel).
While the log-rank test is insignificant, the MWW analysis is significant with a p-value of 0.034. This difference is not surprising upon considering the q-q plot of the original data at the 32 kV level found in the right panel of Figure 2.8.1. The population quantiles are drawn from an exponential distribution. The plot indicates heavier tails than those of an exponential distribution. In turn, the error distribution for the location model would have heavier tails than the light-tailed extreme valued distribution. Thus the MWW analysis is more appropriate. The two sample $t$-test has value 1.34, also with a p-value of .096; it too was impaired by the heavy tails.

Although the exponential model on the original data seems unlikely, for illustration we consider it. The sum of the ranks of the 30 kV ($Y$) sample is 184. The estimate of $\alpha$ based on the MWW statistic is .40. A 90% confidence interval for $\alpha$ based on the approximate (via the delta-method) variance, (2.8.5), is $(.06, .74)$; while a 90% bootstrap confidence interval based on 1000 bootstrap samples is $(.15, .88)$. Hence the MWW test, the corresponding estimate of $\alpha$, and the two confidence intervals indicate that the lower voltage level is less hazardous than the higher level.
2.9 Two Sample Rank Set Sampling (RSS)
The basic background for rank set sampling was discussed in Section 1.9. In this section we extend these ideas to the two sample location problem. Suppose we have two samples in which $X_1,\ldots,X_{n_1}$ are iid $F(x)$ and $Y_1,\ldots,Y_{n_2}$ are iid $F(x - \Delta)$, and the two samples are independent of one another. In the corresponding RSS design, we take $n_1$ cycles of $k$ samples for $X$ and $n_2$ cycles of $q$ samples for $Y$. Proceeding as in Section 1.9, we display the measured data as:
$$ \begin{array}{ll} X_{(1)1}, \ldots, X_{(1)n_1} \mbox{ iid } f_{(1)}(t) & \quad Y_{(1)1}, \ldots, Y_{(1)n_2} \mbox{ iid } f_{(1)}(t - \Delta) \\ \qquad \vdots & \qquad \vdots \\ X_{(k)1}, \ldots, X_{(k)n_1} \mbox{ iid } f_{(k)}(t) & \quad Y_{(q)1}, \ldots, Y_{(q)n_2} \mbox{ iid } f_{(q)}(t - \Delta) \end{array} $$
To test $H_0: \Delta = 0$ versus $H_A: \Delta > 0$, we compute the Mann-Whitney-Wilcoxon statistic with these rank set samples. Letting $U_{si} = \sum_{t=1}^{n_2} \sum_{j=1}^{n_1} I(Y_{(s)t} > X_{(i)j})$, the test statistic is
$$ U_{RSS} = \sum_{s=1}^{q} \sum_{i=1}^{k} U_{si}\,. $$
Note that $U_{si}$ is the Mann-Whitney-Wilcoxon statistic computed on the sample of the $s$th $Y$ order statistics and the $i$th $X$ order statistics. Even under the null hypothesis $H_0: \Delta = 0$, $U_{si}$ is not based on identically distributed samples unless $s = i$. This complicates the null distribution of $U_{RSS}$.
Bohn and Wolfe (1992) present a thorough treatment of the distribution theory for $U_{RSS}$. We note that under $H_0: \Delta = 0$, $U_{RSS}$ is distribution free and, further, using the same ideas as in Theorem 1.9.1, $E_{H_0}(U_{RSS}) = q k n_1 n_2 / 2$. For fixed $k$ and $q$, provided assumption D.1, (2.4.7), holds, Theorem 2.4.2 can be applied to show that $(U_{RSS} - q k n_1 n_2/2)/\sqrt{V_{H_0}(U_{RSS})}$ has a limiting $N(0,1)$ distribution. The difficulty is in the calculation of $V_{H_0}(U_{RSS})$; recall Theorem 1.9.1 for a similar calculation for the sign statistic. Bohn and Wolfe (1992) present a complex formula for the variance. Bohn and Wolfe provide a table of the approximate null distribution of $U_{RSS}$ for $q = k = 2$, $n_1 = 1,\ldots,5$, $n_2 = 1,\ldots,5$, and likewise for $q = k = 3$.
Another way to approximate the null distribution of $U_{RSS}$ is to bootstrap it. Consider, for simplicity, the case $k = q = 3$ and $n_1 = n_2 = m$. Hence the expert must rank three observations, and each of the $m$ cycles consists of three samples of size three for each of the $X$ and $Y$ measurements. In order to bootstrap the null distribution of $U_{RSS}$, first align the $Y$-RSSs with $\widehat{\Delta}$, the Hodges-Lehmann estimate of shift computed across the two RSSs. Our bootstrap sampling is on the data with the indicated sampling distributions:
Our bootstrap sampling is on the data with the indicated sampling distributions:
X
(1)1
, . . . , X
(1)m
sample

F
(1)
(x) Y
(1)1
, . . . , Y
(1)m
sample

F
(1)
(y

)
X
(2)1
, . . . , X
(2)m
sample

F
(2)
(x) Y
(2)1
, . . . , Y
(2)m
sample

F
(2)
(y

)
X
(3)1
, . . . , X
(3)m
sample

F
(3)
(x) Y
(3)1
, . . . , Y
(3)m
sample

F
(3)
(y

)
.
In the bootstrap process, for each row $i = 1, 2, 3$, we take random samples $X^*_{(i)1}, \ldots, X^*_{(i)m}$ from $\widehat{F}_{(i)}(x)$ and $Y^*_{(i)1}, \ldots, Y^*_{(i)m}$ from $\widehat{F}_{(i)}(y - \widehat{\Delta})$. We then compute $U^*_{RSS}$ on these samples. Repeating this $B$ times, we obtain the sample of test statistics $U^*_{RSS,1}, \ldots, U^*_{RSS,B}$. Then the bootstrap p-value for our test is $\#(U^*_{RSS,j} \geq U_{RSS})/B$, where $U_{RSS}$ is the value of the statistic based on the original data. Generally we take $B = 1000$ for a p-value. It is clear how to modify the above argument to allow for $k \neq q$ and $n_1 \neq n_2$.
2.10 Two Sample Scale Problem
Frequently it is of interest to investigate whether or not one random variable is more dispersed than another. The general case is when the random variables differ in both location and scale. Suppose the distribution functions of $X$ and $Y$ are given by $F(x)$ and $G(y) = F((y - \theta)/\eta)$, respectively; hence $L(Y) = L(\eta X + \theta)$. For discussion, we consider one-sided hypotheses of the form
$$ H_0: \eta = 1 \ \mbox{ versus } \ H_A: \eta > 1\,. \qquad (2.10.1) $$
The other one-sided and two-sided hypotheses can be handled similarly. Let $X_1,\ldots,X_{n_1}$ and $Y_1,\ldots,Y_{n_2}$ be samples drawn on the random variables $X$ and $Y$, respectively.
The traditional test of $H_0$ is the $F$-test, which is based on the ratio of sample variances. As we discuss in Section 2.10.2, though, this test is generally not asymptotically correct (one of the exceptions is when $F(t)$ is a normal cdf). Indeed, as many simulation studies have shown, this test is extremely liberal in many non-normal situations; see Conover, Johnson and Johnson (1981).
Tests of $H_0$ should be invariant to the locations. One way of ensuring this is to first center the observations. For the $F$-test, the centering is by sample means; instead, we prefer to use the sample medians. Let $\widehat{\theta}_X$ and $\widehat{\theta}_Y$ denote the sample medians of the $X$ and $Y$ samples, respectively. Then the samples of interest are the folded aligned samples given by $|X^*_1|,\ldots,|X^*_{n_1}|$ and $|Y^*_1|,\ldots,|Y^*_{n_2}|$, where $X^*_i = X_i - \widehat{\theta}_X$ and $Y^*_i = Y_i - \widehat{\theta}_Y$.
2.10.1 Optimal Rank-Based Tests
To obtain appropriate score functions for the scale problem, first consider the case when the location parameters of $X$ and $Y$ are known. Without loss of generality, we can then assume that they are 0 and, hence, that $L(Y) = L(\eta X)$. Further, because $\eta > 0$, we have $L(|Y|) = L(\eta |X|)$. Let $\mathbf{Z}' = (\log |X_1|, \ldots, \log |X_{n_1}|, \log |Y_1|, \ldots, \log |Y_{n_2}|)$ and let $c_i$, (2.2.1), be the dummy indicator variable, i.e., $c_i = 0$ or 1 depending on whether $Z_i$ is an $X$ or a $Y$, respectively. Then an equivalent formulation of this problem is
$$ Z_i = c_i \Delta + e_i\,, \quad 1 \leq i \leq n\,, \qquad (2.10.2) $$
where $\Delta = \log \eta$ and $e_1,\ldots,e_n$ are iid with distribution function $F^*(x)$, the cdf of $\log |X|$. The hypotheses, (2.10.1), are equivalent to
$$ H_0: \Delta = 0 \ \mbox{ versus } \ H_A: \Delta > 0\,. \qquad (2.10.3) $$
Of course, this is the two sample location problem based on the logs of the absolute values of the observations. Hence, the optimal score function for Model 2.10.2 is given by
$$ \varphi_{f^*}(u) = -\frac{(f^*)'((F^*)^{-1}(u))}{f^*((F^*)^{-1}(u))}\,. \qquad (2.10.4) $$
After some simplification, see Exercise 2.13.30, we have
$$ \varphi_{f^*}(F^*(x)) = -\left\{ \frac{e^x \left[f'(e^x) - f'(-e^x)\right]}{f(e^x) + f(-e^x)} + 1 \right\}\,. \qquad (2.10.5) $$
If we further assume that $f$ is symmetric, then expression (2.10.5) for the optimal score function simplifies to
$$ \varphi_{f^*}(u) = -F^{-1}\left(\frac{u+1}{2}\right) \frac{f'\left(F^{-1}\left(\frac{u+1}{2}\right)\right)}{f\left(F^{-1}\left(\frac{u+1}{2}\right)\right)} - 1\,. \qquad (2.10.6) $$
This expression is convenient to work with because it depends on $F(t)$ and $f(t)$, the cdf and pdf of $X$, in the original formulation of this scale problem. The following two examples obtain the optimal score function for the normal and double exponential situations, respectively.

Example 2.10.1. $L(X)$ Is Normal

Without loss of generality, assume that $f(x)$ is the standard normal density. In this case expression (2.10.6) simplifies to
$$ \varphi_{FK}(u) = \left[\Phi^{-1}\left(\frac{u+1}{2}\right)\right]^2 - 1\,, \qquad (2.10.7) $$
where $\Phi$ is the standard normal distribution function; see Exercise 2.13.33. Hence, if we are sampling from a normal distribution, this suggests the rank test statistic
$$ S_{FK} = \sum_{j=1}^{n_2} \left[ \Phi^{-1}\left( \frac{R(|Y_j|)}{2(n+1)} + \frac{1}{2} \right) \right]^2\,, \qquad (2.10.8) $$
where the FK subscript is due to Fligner and Killeen (1976), who discussed this score function in their work on the two-sample scale problem.
Example 2.10.2. $L(X)$ Is Double Exponential

Suppose that the density of $X$ is the double exponential, $f(x) = 2^{-1} \exp(-|x|)$, $-\infty < x < \infty$. Then, as Exercise 2.13.33 shows, the optimal rank score function is given by
$$ \varphi(u) = -(\log(1-u) + 1)\,. \qquad (2.10.9) $$
These scores are not surprising, because the distribution of $|X|$ is exponential. Hence, this is precisely the log linear problem with exponentially distributed lifetime that was discussed in Section 2.8; see the discussion around expression (2.8.8).
Example 2.10.3. $L(|X|)$ Is a Member of the Generalized $F$-family: MWW Statistic

In Section 3.10 a discussion is devoted to a large family of commonly used distributions, called the generalized $F$ family, for survival type data. In particular, as shown there, if $|X|$ follows an $F(2,2)$-distribution, then it follows (Exercise 2.13.31) that $\log |X|$ has a logistic distribution. Thus the MWW statistic is the optimal rank score statistic in this case.
Notice the relationship between the tail-weight of the distribution and the optimal score function for the scale problem over these last three examples. If the underlying distribution is normal, then the optimal score function (2.10.8) is for very light-tailed distributions. Even at the double-exponential, the score function (2.10.9) is still for light-tailed errors. Finally, for the heavy-tailed (variance is $\infty$) $F(2,2)$ distribution the score function is the bounded MWW score function. The reason for the difference between location and scale scores is that the optimal score function for the scale case is based on the distribution of the logs of the original variables.
Once a scale score function is selected, following Section 2.5 the general scores process for this problem is given by
$$ S_\varphi(\Delta) = \sum_{j=1}^{n_2} a_\varphi(R(\log |Y_j| - \Delta))\,, \qquad (2.10.10) $$
where the scores $a_\varphi(i)$ are generated by $a_\varphi(i) = \varphi(i/(n+1))$.
A rank test statistic for the hypotheses (2.10.3) is given by
$$ S_\varphi = S_\varphi(0) = \sum_{j=1}^{n_2} a_\varphi(R(\log |Y_j|)) = \sum_{j=1}^{n_2} a_\varphi(R(|Y_j|))\,, \qquad (2.10.11) $$
where the last equality holds because the log function is strictly increasing. This is not necessarily a standardized score function, but it follows from the discussion on general scores found in Section 2.5 and (2.5.18) that the null mean $\mu_\varphi$ and null variance $\sigma^2_\varphi$ of the statistic are given by
$$ \mu_\varphi = n_2 \bar{a} \quad \mbox{ and } \quad \sigma^2_\varphi = \frac{n_1 n_2}{n(n-1)} \sum (a(i) - \bar{a})^2\,. \qquad (2.10.12) $$
The asymptotic version of this test statistic rejects $H_0$ at approximate level $\alpha$ if $z \geq z_\alpha$, where
$$ z = \frac{S_\varphi - \mu_\varphi}{\sigma_\varphi}\,. \qquad (2.10.13) $$
The efficacy of the test based on $S_\varphi$ is given by expression (2.5.28); i.e.,
$$ c_\varphi = \tau_\varphi^{-1} \sqrt{\lambda_1 \lambda_2}\,, \qquad (2.10.14) $$
where $\tau_\varphi$ is given by
$$ \tau_\varphi^{-1} = \int_0^1 \varphi(u)\varphi_{f^*}(u)\,du \qquad (2.10.15) $$
and the optimal score function $\varphi_{f^*}(u)$ is given in expression (2.10.4). Note that this formula for the efficacy is under the assumption that the score function $\varphi(u)$ is standardized.
Recall the original (realistic) problem, where the distribution functions of $X$ and $Y$ are given by $F(x)$ and $G(y) = F((y - \theta)/\eta)$, respectively, and the difference in locations, $\theta$, is unknown. In this case, $L(Y) = L(\eta X + \theta)$. As noted above, the samples of interest are the folded aligned samples given by $|X^*_1|,\ldots,|X^*_{n_1}|$ and $|Y^*_1|,\ldots,|Y^*_{n_2}|$, where $X^*_i = X_i - \widehat{\theta}_X$ and $Y^*_i = Y_i - \widehat{\theta}_Y$, and where $\widehat{\theta}_X$ and $\widehat{\theta}_Y$ denote the sample medians of the $X$ and $Y$ samples, respectively.
Given a score function $\varphi(u)$, we consider the linear rank statistic, (2.10.11), where the ranking is performed on the folded-aligned observations; i.e.,
$$ S^*_\varphi = \sum_{j=1}^{n_2} a(R(|Y^*_j|))\,. \qquad (2.10.16) $$
The statistic $S^*_\varphi$ is no longer distribution free for finite samples. However, if we further assume that the distributions of $X$ and $Y$ are symmetric, then the test statistic $S^*_\varphi$ is asymptotically distribution free and has the same efficiency properties as $S_\varphi$; see Puri (1968) and Fligner and Hettmansperger (1979). The requirement that $f$ is symmetric is discussed in detail by Fligner and Hettmansperger (1979). In particular, we standardize the statistic using the mean and variance given in expression (2.10.12).
Estimation and confidence intervals for the parameter $\Delta$ are based on the process
$$ S^*_\varphi(\Delta) = \sum_{j=1}^{n_2} a_\varphi(R(\log |Y^*_j| - \Delta))\,. \qquad (2.10.17) $$
An estimate of $\Delta$ is a value $\widehat{\Delta}$ which solves the equation (2.5.12); i.e.,
$$ S^*_\varphi(\widehat{\Delta}) \doteq 0\,. \qquad (2.10.18) $$
An estimate of $\eta$, the ratio of scale parameters, is then
$$ \widehat{\eta} = e^{\widehat{\Delta}}\,. \qquad (2.10.19) $$
The interval $(\widehat{\Delta}_L, \widehat{\Delta}_U)$, where $\widehat{\Delta}_L$ and $\widehat{\Delta}_U$ solve the equations (2.5.21), forms (asymptotically) a $(1-\alpha)100\%$ confidence interval for $\Delta$. The corresponding confidence interval for $\eta$ is $(\exp\{\widehat{\Delta}_L\},\, \exp\{\widehat{\Delta}_U\})$.
As a simple rank-based analysis, consider the test and estimation given above based on the optimal scores (2.10.7) for the normal situation. The folded aligned samples version of the test statistic (2.10.8) is the statistic
$$ S^*_{FK} = \sum_{j=1}^{n_2} \left[ \Phi^{-1}\left( \frac{R(|Y^*_j|)}{2(n+1)} + \frac{1}{2} \right) \right]^2\,. \qquad (2.10.20) $$
The standardized test statistic is $z^*_{FK} = (S^*_{FK} - \mu_{FK})/\sigma_{FK}$, where $\mu_{FK}$ and $\sigma_{FK}$ are the values of (2.10.12) for the scores (2.10.7). This statistic for non-aligned samples is given on page 74 of Hájek and Šidák (1967). A version of it was also discussed by Fligner and Killeen (1976). We refer to this test and the associated estimator and confidence interval as the Fligner-Killeen analysis. The RBR function twoscale, with the score function phiscalefk, computes the Fligner-Killeen analysis; a minimal sketch of the test computation is given below. We next obtain the efficacy of this analysis.
Example 2.10.4. Efficacy of the Score Function $\varphi_{FK}(u)$

To use expression (2.5.28) for the efficacy, we must first standardize the score function $\varphi_{FK}(u) = [\Phi^{-1}((u+1)/2)]^2 - 1$, (2.10.7). Using the substitution $(u+1)/2 = \Phi(t)$, we have
$$ \int_0^1 \varphi_{FK}(u)\,du = \int_{-\infty}^{\infty} t^2 \phi(t)\,dt - 1 = 1 - 1 = 0\,, $$
where $\phi(t)$ denotes the standard normal density. Hence, the mean is 0. In the same way,
$$ \int_0^1 [\varphi_{FK}(u)]^2\,du = \int_{-\infty}^{\infty} t^4 \phi(t)\,dt - 2\int_{-\infty}^{\infty} t^2 \phi(t)\,dt + 1 = 3 - 2 + 1 = 2\,. $$
Thus the standardized score function is
$$ \varphi^*_{FK}(u) = \left\{ \left[\Phi^{-1}\left(\frac{u+1}{2}\right)\right]^2 - 1 \right\} \Big/ \sqrt{2}\,. \qquad (2.10.21) $$
Hence, the efficacy of the Fligner-Killeen analysis is
$$ c_{\varphi^*_{FK}} = \sqrt{\lambda_1 \lambda_2}\; \frac{1}{\sqrt{2}} \int_0^1 \left\{ \left[\Phi^{-1}\left(\frac{u+1}{2}\right)\right]^2 - 1 \right\} \varphi_{f^*}(u)\,du\,, \qquad (2.10.22) $$
where the optimal score function $\varphi_{f^*}(u)$ is given in expression (2.10.4). In particular, the efficacy at the normal distribution is given by
$$ c_{\varphi^*_{FK}}(\mbox{normal}) = \sqrt{\lambda_1 \lambda_2}\; \frac{1}{\sqrt{2}} \int_0^1 \left\{ \left[\Phi^{-1}\left(\frac{u+1}{2}\right)\right]^2 - 1 \right\}^2 du = \sqrt{2}\, \sqrt{\lambda_1 \lambda_2}\,. \qquad (2.10.23) $$
We illustrate the Fligner-Killeen analysis with the following example.

Example 2.10.5. Doksum and Sievers Data

Doksum and Sievers (1976) describe an experiment involving the effect of ozone on the weight gain of rats. The experimental group consisted of $n_2 = 22$ rats which were placed in an ozone environment for seven days, while the control group contained $n_1 = 21$ rats which were placed in an ozone free environment for the same amount of time. The response was the weight gain of a rat over the time period. Figure 2.10.1 displays the comparison boxplots for the data. There appears to be a difference in scale.
Figure 2.10.1: Comparison boxplots of treated and control weight gains in rats. [Boxplots of weight gain, from −10 to 50, for the Control and Ozone groups.]
the Fligner-Killeen test statistic is S*_FK = 28.711 and its standardized value is z*_FK = 2.095. The corresponding p-value for a two-sided test is 0.036, confirming the impression from the plot. The associated estimate of the ratio (ozone to control) of scales is η̂ = 2.36 with a 95% confidence interval of (1.09, 5.10).
Conover, Johnson and Johnson (1981) performed a large Monte Carlo study of tests of dispersion, including these folded-aligned rank tests, over a wide variety of situations for the c-sample scale problem. The traditional F-test (Bartlett's test) did poorly (as would be expected from our comments below about the lack of robustness of the classical F-test). In certain null situations its empirical levels exceeded .80 when the nominal level was .05. One rank test that performed very well was the aligned rank version of a test statistic similar to S*_FK, (2.10.20), but with an exponent of one instead of two in the definition of the score function. This performed well overall in terms of validity and power except for highly asymmetric distributions, where it has a tendency to be liberal. However, in the following simulation study the Fligner-Killeen test (2.10.20) (exponent of two) is empirically valid over the asymmetric situations covered.
Example 2.10.6. Simulation Study of the Validity of the Tests S*_φ.
Table 2.10.1 displays the results of a small simulation study of the validity of the rank-based tests of scale for various score functions over mostly skewed error distributions. The scores in the study are: (fk²), the optimal score function for the normal distribution; (fk), similar to the last except the exponent is one; (Wilcoxon), the linear Wilcoxon score function; (Quad), the score function φ(u) = u²; and (Logistic), the optimal score function if the distribution of X is logistic (see Exercise 2.13.32). The error distributions include the normal and the χ²(1) distributions and several members of the skewed contaminated normal distribution. In the latter case, the random variable X is written as X = X_1(1 − I_ε) + I_ε X_2, where X_1 and X_2 have N(0,1) and N(μ_c, σ_c²) distributions, respectively, I_ε has a Bernoulli distribution with probability of success ε, and X_1, X_2 and I_ε are mutually independent. For the study, ε was set at 0.3 while μ_c and σ_c varied. The pdfs of the three SCN distributions in Table 2.10.1 are shown in Figure 2.10.2. The pdf in the bottom right corner panel of the figure is that of the χ²(1) distribution. For all but the last situation in Table 2.10.1, the sample sizes are n_1 = 20 and n_2 = 25. The last situation is for n_1 = n_2 = 10. The number of simulations for each situation was set at 1000. For each run, the two-sided alternative, H_A : η ≠ 1, was tested, and the estimate of η and an associated confidence interval for η were obtained. Computations were performed by RBR functions.
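For instance, one of the skewed situations in Table 2.10.1 can be generated in base R as follows (the helper name rscn is ours):

rscn <- function(n, muc, sigc, eps = 0.3) {
  ind <- rbinom(n, 1, eps)                      # I_eps, Bernoulli(eps)
  (1 - ind) * rnorm(n) + ind * rnorm(n, muc, sigc)
}
# the SCN situation with mu_c = 6 and sigma_c = sqrt(2):
x <- rscn(20, muc = 6, sigc = sqrt(2))
y <- rscn(25, muc = 6, sigc = sqrt(2))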
The table shows the empirical levels at the nominal 0.10, 0.05, and 0.01 levels; the empirical confidence coefficient for a nominal 95% confidence interval; the mean of the estimates of η; and the MSE for η̂. Of the five analyses, overall the Fligner-Killeen analysis (fk²) performed the best. This analysis was valid (nominal levels and empirical coverage) in all the situations, except for the χ²(1) distribution at the 10% level and the larger sample sizes. Even here, its empirical level is 0.128. The other tests were liberal in the skewed situations, some, such as the Wilcoxon, quite liberal. Also, the fk analysis (exponent of one in its score function) was liberal for the χ²(1) situations. Notice that the Fligner-Killeen analysis achieved the lowest MSE in all the situations.

Hall and Padmanabhan (1997) developed a percentile bootstrap for these rank-based tests which, in their accompanying study, performed quite well for skewed error distributions as well as the symmetric error distributions.
As a final remark, another class of linear rank statistics for the two-sample scale problem consists of simple linear rank statistics of the form

    S = Σ_{j=1}^{n_2} a(R(Y_j)) ,                                        (2.10.24)

where the scores are generated as a(i) = φ(i/(n+1)). The folded rank statistics discussed above suggest that φ be a convex (or concave) function. One popular score function is the quadratic function φ(u) = (u − 1/2)². The resulting statistic,

    S_M = Σ_{j=1}^{n_2} [ R(Y_j)/(n+1) − 1/2 ]² ,                        (2.10.25)

was proposed by Mood (1954) as a test statistic for the two-sample scale hypotheses. For the realistic problem with unknown location, though, the observations have to first be aligned. Asymptotic theory holds, provided the underlying distribution is symmetric. This class of aligned rank tests, though, did not perform nearly as well as the folded rank statistics, (2.10.16), in the large Monte Carlo study of Conover et al. (1981). Hence, we recommend the folded rank-based analyses discussed above.
Table 2.10.1: Empirical Levels, Confidences, and MSEs for the Monte Carlo Study of Example 2.10.6.

Normal Errors, n_1 = 20, n_2 = 25
Score      α̂_.10   α̂_.05   α̂_.01   Cnf_.95   η̂ (mean)   MSE(η̂)
Logistic   0.083   0.041   0.006   0.961     1.037      0.060
Quad.      0.080   0.030   0.008   0.970     1.043      0.076
Wilcoxon   0.073   0.033   0.004   0.967     1.042      0.097
fk²        0.087   0.039   0.004   0.960     1.036      0.057
fk         0.077   0.033   0.005   0.969     1.037      0.067

SKCN(μ_c = 2, σ_c = √2, ε = 0.3), n_1 = 20, n_2 = 25
Logistic   0.106   0.036   0.006   0.965     1.035      0.076
Quad.      0.106   0.046   0.008   0.953     1.040      0.095
Wilcoxon   0.103   0.049   0.007   0.952     1.043      0.117
fk²        0.100   0.034   0.006   0.966     1.033      0.073
fk         0.099   0.047   0.006   0.953     1.034      0.085

SKCN(μ_c = 6, σ_c = √2, ε = 0.3), n_1 = 20, n_2 = 25
Logistic   0.081   0.033   0.006   0.966     1.067      0.166
Quad.      0.122   0.068   0.020   0.933     1.105      0.305
Wilcoxon   0.163   0.103   0.036   0.897     1.125      0.420
fk²        0.072   0.026   0.005   0.974     1.057      0.126
fk         0.111   0.057   0.015   0.942     1.075      0.229

SKCN(μ_c = 12, σ_c = √2, ε = 0.3), n_1 = 20, n_2 = 25
Logistic   0.084   0.046   0.007   0.954     1.091      0.298
Quad.      0.138   0.085   0.018   0.916     1.183      0.706
Wilcoxon   0.171   0.116   0.038   0.886     1.188      0.782
fk²        0.074   0.042   0.007   0.958     1.070      0.201
fk         0.115   0.069   0.015   0.932     1.109      0.400

χ²(1), n_1 = 20, n_2 = 25
Logistic   0.154   0.086   0.023   0.913     1.128056   0.353
Quad.      0.249   0.149   0.047   0.851     1.170      0.482
Wilcoxon   0.304   0.197   0.067   0.804     1.196      0.611
fk²        0.128   0.066   0.018   0.936     1.120      0.336
fk         0.220   0.131   0.039   0.870     1.154      0.432

χ²(1), n_1 = 10, n_2 = 10
Logistic   0.132   0.062   0.018   0.934     1.360      1.495
Quad.      0.192   0.099   0.035   0.900     1.457      2.108
Wilcoxon   0.276   0.166   0.042   0.833     1.560      3.311
fk²        0.111   0.057   0.013   0.941     1.335      1.349
fk         0.199   0.103   0.033   0.893     1.450      2.086
Figure 2.10.2: Pdfs of the skewed distributions in the simulation study of Example 2.10.6. [Four panels: SCN with μ_c = 2, σ_c = 1.41, ε = .3; SCN with μ_c = 6, σ_c = 1.41, ε = .3; SCN with μ_c = 12, σ_c = 1.41, ε = .3; and the χ² distribution with one degree of freedom.]
2.10.2 Efficacy of the Traditional F-Test

We next obtain the efficacy of the traditional F-test for the ratio of scale parameters. Actually, for our development we need not assume that X and Y have the same locations. Let σ_2² and σ_1² denote the variances of Y and X, respectively. Then, in the notation of the first paragraph of this section, η² = σ_2²/σ_1². The classical F-test of the scale hypotheses (H_0 : η = 1) is to reject H_0 if F* ≥ F(α, n_2 − 1, n_1 − 1), where

    F* = σ̂_2²/σ̂_1² ,

and σ̂_2² and σ̂_1² are the sample variances of the samples Y_1, ..., Y_{n_2} and X_1, ..., X_{n_1}, respectively. The F-test is exact size α if f is a normal pdf. Also, the test is invariant to differences in location.
We first need the asymptotic distribution of F* under the null hypothesis. Instead of working with F*, it is more convenient mathematically to work with the equivalent test statistic √n log F*. We will assume that X has a finite fourth central moment; i.e., μ_{X,4} = E[(X − E(X))⁴] < ∞. Let ξ = (μ_{X,4}/σ_1⁴) − 3 denote the kurtosis of X. It easily follows that Y has the same kurtosis under the null and alternative hypotheses. A key result, established in Exercise 2.13.38, is that under these conditions

    √n_i (σ̂_i² − σ_i²) →_D N(0, σ_i⁴(ξ + 2)) ,  for i = 1, 2 .          (2.10.26)

It follows immediately by the delta method that

    √n_i (log σ̂_i² − log σ_i²) →_D N(0, ξ + 2) ,  for i = 1, 2 .        (2.10.27)
Under H_0, σ_i = σ, say, and by the last result,

    √n log F* = √(n/n_2) √n_2 (log σ̂_2² − log σ²) − √(n/n_1) √n_1 (log σ̂_1² − log σ²)
              →_D N(0, (ξ + 2)/(λ_1 λ_2)) .                              (2.10.28)

The approximate test rejects H_0 if

    √n log F* / √( (ξ + 2)/(λ_1 λ_2) ) ≥ z_α .                           (2.10.29)

Note that ξ = 0 if X is normal. In practice, the test which is used assumes ξ = 0; that is, F* is not corrected by an estimate of ξ. This is one reason that the usual F-test for the ratio of variances does not possess robustness of validity; that is, the significance level is not asymptotically distribution free. Unlike the t-test, the F-test for variances is not even asymptotically distribution free under H_0.
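The limit (2.10.26) is easy to check by simulation. A minimal sketch in base R, at the normal distribution, where ξ = 0 and so the variance of √n(σ̂² − σ²) should be near σ⁴(ξ + 2) = 2:

set.seed(1)
n <- 200
z <- replicate(5000, sqrt(n) * (var(rnorm(n)) - 1))
var(z)   # approximately 2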
In order to obtain the efficacy of the F-test, consider the sequence of contiguous alternatives Δ_n = δ/√n, δ > 0. Assume without loss of generality that the locations of X and Y are the same. Under this sequence of alternatives we have Y_j = e^{Δ_n} U_j, where U_j is a random variable with cdf F(x), while Y_j has cdf F(e^{−Δ_n} x). We also get σ̂_2² = exp{2Δ_n} σ̂_U², where σ̂_U² denotes the sample variance of U_1, ..., U_{n_2}. Let γ_F(Δ) denote the power function of the F-test. The asymptotic power lemma for the F-test is

Theorem 2.10.1. Assuming that X has a finite fourth moment, with ξ = (μ_{X,4}/σ_1⁴) − 3,

    lim_{n→∞} γ_F(Δ_n) = P(Z ≥ z_α − δ c_F) ,

where Z has a standard normal distribution and the efficacy is

    c_F = 2 √(λ_1 λ_2) / √(ξ + 2) .                                      (2.10.30)
Proof: The conclusion follows directly upon observing

    √n log F* = √n (log σ̂_2² − log σ̂_1²)
              = √n (log σ̂_U² + 2(δ/√n) − log σ̂_1²)
              = 2δ + √(n/n_2) √n_2 (log σ̂_U² − log σ²) − √(n/n_1) √n_1 (log σ̂_1² − log σ²) ,

and that the last quantity converges in distribution to a N(2δ, (ξ + 2)/(λ_1 λ_2)) variate.

Let φ(u) denote a general score function for a folded-aligned rank-based analysis as discussed above. It then follows that the asymptotic relative efficiency of this analysis to the F-test is the ratio of the squares of their efficacies, i.e., e(S*_φ, F) = c_φ²/c_F², where c_φ is given in expression (2.5.28).

Suppose we use the Fligner-Killeen analysis. Then its efficacy is c_FK, which is given in expression (2.10.22). The ARE between the Fligner-Killeen analysis and the traditional F-test analysis is the ratio c_FK²/c_F². In particular, if we assume that the underlying distribution is normal, then by (2.10.23) this ratio is one.
2.11 Behrens-Fisher Problem

Consider the general model in Section 2.1 of this chapter, where X_1, ..., X_{n_1} is a random sample on the random variable X which has distribution function F(x) and density function f(x), and Y_1, ..., Y_{n_2} is a second random sample, independent of the first, on the random variable Y which has distribution function G(x) and density g(x). Let θ_X and θ_Y denote the medians of X and Y, respectively, and let Δ = θ_Y − θ_X. In Section 2.4 we showed that the MWW test was consistent for the stochastically ordered alternative. In the location model, where the distributions of X and Y differ by at most a shift in location, the hypothesis F = G is equivalent to the null hypothesis that Δ = 0. In this section we drop the location model assumption; that is, we assume that X and Y have distribution functions F and G, respectively, but we still consider the null hypothesis that Δ = 0. In order to avoid confusion with Section 2.4, we explicitly state the hypotheses of this section as

    H_0 : Δ = 0 versus H_A : Δ > 0 , where Δ = θ_Y − θ_X , L(X) = F, and L(Y) = G .    (2.11.1)

As in the previous sections, we have selected a specific alternative for the discussion.

The above hypothesis is the most general hypothesis of this section, and the modified Mathisen's test defined below is consistent for it. We will also consider the case where the forms of F and G are the same; that is, G(x) = F(x/η), for some parameter η. Note in this case that L(Y) = L(ηX); hence, η = T(Y)/T(X) where T(X) is any scale functional (T(X) > 0 and T(aX) = aT(X) for a ≥ 0). If T(X) = σ_X, the standard deviation of X, then this is a Behrens-Fisher problem with F unknown. If we further assume that the distributions of X and Y are symmetric, then the modified MWW, defined below, can be used to test that Δ = 0. The most restrictive case is when both F and G are assumed to be normal distribution functions. This is, of course, the classical Behrens-Fisher problem, and the classical solution to it is the Welch type t-test, discussed below. For motivation we first show the behavior of the usual MWW statistic. We then consider general rank procedures and finally specialize to analogues of the L_1 and MWW analyses.
2.11.1 Behavior of the Usual MWW Test

In order to motivate the problem, consider the null behavior of the usual MWW test under (2.11.1) with the further restriction that the distributions of X and Y are symmetric. Under H_0, since we are examining null behavior, there is no loss of generality if we assume that θ_X = θ_Y = 0. The asymptotic form of the MWW test rejects H_0 in favor of H_A if

    S_R⁺ = Σ_{i=1}^{n_1} Σ_{j=1}^{n_2} I(Y_j − X_i > 0) ≥ (n_1 n_2)/2 + z_α √( n_1 n_2 (n+1)/12 ) .

This test would have asymptotic level α if F = G. As Exercise 2.13.41 shows, we still have E_{H_0}(S_R⁺) = n_1 n_2 / 2 when the densities of X and Y are symmetric. From Theorem 2.4.5, Part (a), the variance of the MWW statistic under H_0 satisfies the limit

    Var_{H_0}(S_R⁺) / ( n_1 n_2 (n+1) ) → λ_1 Var(F(Y)) + λ_2 Var(G(X)) .

Recall that we obtained the asymptotic distribution of S_R⁺, Theorem 2.4.9, under general conditions which cover the current assumptions; hence, the true significance level of the MWW test has the following limiting behavior:

    α_{S_R⁺} = P_{H_0}[ S_R⁺ ≥ (n_1 n_2)/2 + z_α √( n_1 n_2 (n+1)/12 ) ]
             = P_{H_0}[ ( S_R⁺ − (n_1 n_2)/2 ) / √( Var_{H_0}(S_R⁺) ) ≥ z_α √( n_1 n_2 (n+1) / (12 Var_{H_0}(S_R⁺)) ) ]
             → 1 − Φ( z_α (12)^{−1/2} ( λ_1 Var(F(Y)) + λ_2 Var(G(X)) )^{−1/2} ) .    (2.11.2)
Under the assumptions that the sample sizes are the same and that L(X) and L(Y) have the same form, we can simplify expression (2.11.2) further. We express the result in the following theorem.

Theorem 2.11.1. Suppose that the null hypothesis in (2.11.1) is true. Assume that the distributions of Y and X are symmetric, n_1 = n_2, and G(x) = F(x/η), where η is an unknown parameter. Then the maximum observed significance level is 1 − Φ(.816 z_α), which is approached as η → 0 or η → ∞.
Proof: Under the assumptions of the theorem, note that Var(F(Y)) = ∫ F²(ηt) dF(t) − 1/4 and Var(G(X)) = ∫ F²(x/η) dF(x) − 1/4. Writing V(η) = (1/2)Var(F(Y)) + (1/2)Var(G(X)) and differentiating (2.11.2) with respect to η, we get (φ denoting the standard normal density)

    −φ( z_α (12)^{−1/2} V(η)^{−1/2} ) z_α (12)^{−1/2} (−1/2) V(η)^{−3/2}
        × { ∫ F(ηt) t f(ηt) f(t) dt + ∫ F(t/η) f(t/η)(−t/η²) f(t) dt } .    (2.11.3)

Making the substitution u = ηt in the first integral, the quantity in braces reduces to η^{−2} ∫ (F(u) − F(u/η)) u f(u) f(u/η) du. Note that the other factors in (2.11.3) are strictly positive. Thus, to determine the graphical behavior of (2.11.2) with respect to η, we need only consider the factor in braces. First note that it has a critical point at η = 1. Next consider the case η > 1. In this case F(u) − F(u/η) < 0 on the interval (−∞, 0) and is positive on the interval (0, ∞); hence the factor in braces is positive for η > 1. Using a similar argument, this factor is negative for 0 < η < 1. Therefore the limit of the function α_{S_R⁺}(η) is decreasing on the interval (0, 1), has a minimum at η = 1, and is increasing on the interval (1, ∞).

Thus the minimum level of significance occurs at η = 1 (the location model), where it is α. By the graphical behavior of the function, maximum levels would occur at the extremes of 0 and ∞. But it follows that

    Var(F(Y)) = ∫ F²(ηt) dF(t) − 1/4 → 0 as η → 0 , and → 1/4 as η → ∞ ,

and

    Var(G(X)) = ∫ F²(x/η) dF(x) − 1/4 → 1/4 as η → 0 , and → 0 as η → ∞ .

From these two results and (2.11.2), the true significance level of the MWW test satisfies

    α_{S_R⁺} → 1 − Φ( z_α (3/2)^{−1/2} ) , whether η → 0 or η → ∞ .

Thus the maximum observed significance level is 1 − Φ(.816 z_α), which is approached as η → 0 or η → ∞.
For example, if α = .05 then .816 z_α = 1.34 and α_{S_R⁺} ≤ 1 − Φ(1.34) = .09. Thus, in the equal sample size case, when F and G differ only in a scale parameter and are symmetric, the nominal 5% level of the MWW test will not be worse than .09. In order to guarantee that α ≤ .05, choose z_α so that 1 − Φ(.816 z_α) = .05. This leads to z_α = 2.02, which is the critical value for α = .02. Hence another way of saying this is: by performing a 2% MWW test we are guaranteed that the true (asymptotic) level is at most 5%.
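In base R, this corrected cutoff is one line (values approximate):

alpha <- 0.05
z <- qnorm(1 - alpha) / 0.816   # 2.02, the guaranteed-level critical value
pnorm(-z)                       # about .02, the nominal level actually used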
2.11.2 General Rank Tests

Assuming the most general hypothesis, (2.11.1), we follow the development of Fligner and Policello (1981) to construct general tests. Suppose T represents a rank test statistic, used in the case F = G, and that the test rejects H_0 : Δ = 0 in favor of H_A : Δ > 0 for large values of T. Suppose further that n^{1/2}(T − μ_{F,G})/σ_{F,G} converges in distribution to a standard normal. Let μ_0 denote the null mean of T and assume that it is independent of F. Next suppose that σ̂ is a consistent estimate of σ_{F,G} which is a function only of the ranks of the combined sample. This will ensure distribution freeness under H_0; otherwise, the test statistic will only be asymptotically distribution free. The modified test statistic is

    T̃ = n^{1/2}(T − μ_0)/σ̂ .                                           (2.11.4)

Such a test can be used for the general hypothesis (2.11.1). Fligner and Policello (1981) applied this approach to Mood's statistic; see also Hettmansperger and Malin (1975). In the next section, we consider Mathisen's test.
2.11.3 Modified Mathisen's Test

We next present a modified version of Mathisen's test for the most general hypothesis (2.11.1). Let θ̂_X = med_i X_i and define the sign-process

    S_2(θ) = Σ_{j=1}^{n_2} sgn(Y_j − θ) .                                (2.11.5)

Recall from expression (2.6.8), Section 2.6.2, that Mathisen's test statistic (centered version) is given by S_2(θ̂_X). This will be our test statistic. The modification lies in its asymptotic distribution, which is given in the next theorem.

Theorem 2.11.2. Assume the null hypothesis in expression (2.11.1) is true. Then, under the assumption (D.1), (2.4.7), (1/√n_2) S_2(θ̂_X) is asymptotically normal with mean 0 and asymptotic variance 1 + K_{12}², where K_{12}² is defined by

    K_{12}² = (λ_2/λ_1) g²(θ_Y)/f²(θ_X) .                                (2.11.6)
Proof: Assume without loss of generality that θ_X = θ_Y = 0. From the asymptotic linearity results discussed in Example 1.5.2 of Chapter 1, we have that

    (1/√n_2) S_2(θ_n) ≐ (1/√n_2) S_2(0) − 2g(0) √n_2 θ_n ,

for √n|θ_n| ≤ c, c > 0. Since √n_2 θ̂_X is bounded in probability, upon substitution in the last expression we get

    (1/√n_2) S_2(θ̂_X) ≐ (1/√n_2) S_2(0) − 2g(0) √n_2 θ̂_X .             (2.11.7)

In Example 1.5.2, we also have the approximation

    θ̂_X ≐ (1/n_1) (1/(2f(0))) S_1(0) ,                                  (2.11.8)

where S_1(0) = Σ_{i=1}^{n_1} sgn(X_i). Combining (2.11.7) and (2.11.8), we get

    (1/√n_2) S_2(θ̂_X) ≐ (1/√n_2) S_2(0) − (g(0)/f(0)) √(n_2/n_1) (1/√n_1) S_1(0) .    (2.11.9)

The result follows because of the independence of the samples and because S_i(0)/√n_i →_D N(0, 1), for i = 1, 2.
In order to use this test we need an estimate of K_{12}. As in Chapter 1, selected order statistics from the sample X_1, ..., X_{n_1} provide a confidence interval for the median of X. Hence, given a level α, the interval (L_1, U_1), where L_1 = X_{(k+1)}, U_1 = X_{(n_1−k)}, and k = n_1/2 − z_{α/2}(√n_1/2), is an approximate (1 − α)100% confidence interval for the median of X. Let D_X denote the length of this confidence interval. By Theorem 1.5.9 of Chapter 1,

    √n_1 D_X / (2 z_{α/2}) →_P [2f(0)]^{−1} .                            (2.11.10)

In the same way, let D_Y denote the length of the corresponding (1 − α)100% confidence interval for the median of Y. Define

    K̂_{12} = D_X / D_Y .                                                (2.11.11)

From (2.11.10) and the corresponding result for D_Y (which is based on the n_2 observations of the Y sample), the estimate K̂_{12} is a consistent estimate of K_{12}, under both H_0 and H_A.
Thus the modified Mathisen's test for the general hypotheses (2.11.1) is to reject H_0 at approximately level α if

    Z_M = S_2(θ̂_X) / √( n_2 (1 + K̂_{12}²) ) ≥ z_α .                    (2.11.12)
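A minimal base-R sketch of this test follows. The helper names are ours; the sign interval uses the order statistics L_1 and U_1 above, and the direction K̂_{12} = D_X/D_Y is our reading of (2.11.11), chosen so that the estimate is consistent for K_{12}.

sign_ci_length <- function(x, alpha = 0.05) {
  n <- length(x)
  k <- max(0, floor(n / 2 - qnorm(1 - alpha / 2) * sqrt(n) / 2))
  xs <- sort(x)
  xs[n - k] - xs[k + 1]                 # D = U_1 - L_1 = X_(n-k) - X_(k+1)
}
mod_mathisen <- function(x, y, alpha = 0.05) {
  s2  <- sum(sign(y - median(x)))       # S_2(thetahat_X), (2.11.5)
  k12 <- sign_ci_length(x, alpha) / sign_ci_length(y, alpha)
  z   <- s2 / sqrt(length(y) * (1 + k12^2))
  c(Z_M = z, p.onesided = 1 - pnorm(z)) # reject for large Z_M, (2.11.12)
}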
To derive the efficacy of this statistic we use the development of Section 1.5.2. The average to consider is n^{−1} S_2(θ̂_X). Let Δ denote the shift in medians and, without loss of generality, let θ_X = 0. Then the mean function we need is

    lim_{n→∞} E_Δ( n^{−1} S_2(θ̂_X) ) = μ̄(Δ) .

Note that we can re-express the expansion (2.11.9) as

    (1/n) S_2(θ̂_X) = (√n_2/n) (1/√n_2) S_2(θ̂_X)
                    ≐ (√n_2/n) [ (1/√n_2) S_2(0) − (g(0)/f(0)) √(n_2/n_1) (1/√n_1) S_1(0) ]
                    →_P λ_2 { E_Δ[sgn(Y)] − (g(0)/f(0)) E_Δ[sgn(X)] }
                    = λ_2 E_Δ[sgn(Y)] = μ̄(Δ) ,                          (2.11.13)

where the next-to-last equality holds since θ_X = 0. Using E_Δ(sgn(Y)) = 1 − 2G(−Δ), we obtain the derivative

    μ̄'(0) = 2 λ_2 g(0) .                                                (2.11.14)

By Theorem 2.11.2 we have the asymptotic null variance of the test statistic S_2(θ̂_X)/√n. From the above discussion, then, the statistic S_2(θ̂_X) is Pitman regular with efficacy

    c_MM = 2 λ_2 g(0) / √( λ_2 (1 + K_{12}²) ) = 2 g(0) √(λ_1 λ_2) / √( λ_1 + λ_2 (g²(0)/f²(0)) ) .    (2.11.15)

Using Theorem 1.5.4 of Chapter 1, consistency of the modified Mathisen's test for the hypotheses (2.11.1) is obtained provided μ̄(Δ) > μ̄(0). But this follows immediately from the inequality G(0) > G(−Δ) for Δ > 0.
2.11.4 Modified MWW Test

Recall by Theorem 2.4.9 that the mean of the MWW test statistic S_R⁺ is n_1 n_2 P(Y > X) = n_1 n_2 [ 1 − ∫ G(x) f(x) dx ]. For general F and G, though, this mean may not be n_1 n_2 / 2 under H_0. Since this section is concerned with methods for testing the specific hypothesis that Δ = 0, we add the further restriction that the distributions of X and Y are symmetric. Recall from Section 2.11.1 that, under this assumption and Δ = 0, E(S_R⁺) = n_1 n_2 / 2; see Exercise 2.13.41.

Using the general development of rank tests, Section 2.11.2, our modified rank test is given by: reject H_0 : Δ = 0 in favor of H_A : Δ > 0 if Z ≥ z_α, where

    Z = [ S_R⁺ − (n_1 n_2)/2 ] / √( V̂ar(S_R⁺) ) ,                       (2.11.16)

where V̂ar(S_R⁺) is a consistent estimate of Var(S_R⁺) under H_0. From the asymptotic distribution theory obtained for S_R⁺ under general conditions, Theorem 2.4.9, it follows that this test has approximate level α. By Theorem 2.4.5, we can express the variance as
    Var(S_R⁺) = n_1 n_2 [ ∫ G dF − (∫ G dF)² ]
              + n_1 n_2 (n_1 − 1) [ ∫ F² dG − (∫ F dG)² ]
              + n_1 n_2 (n_2 − 1) [ ∫ (1 − G)² dF − (∫ (1 − G) dF)² ] .    (2.11.17)

Following the suggestion of Fligner and Policello (1981), we estimate Var(S_R⁺) by replacing F and G by the empirical cdfs F_{n_1} and G_{n_2}, respectively. As Exercise 2.13.42 demonstrates, this estimate is consistent and, further, it is a function of the ranks of the combined sample. Thus the test is distribution free when F(x) = G(x) and is asymptotically distribution free when F and G have symmetric densities.
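A minimal sketch of this modified MWW test in base R, plugging the empirical cdfs into the three bracketed terms of (2.11.17) (the function name mod_mww is ours; ties are ignored):

mod_mww <- function(x, y) {
  n1 <- length(x); n2 <- length(y)
  Fn <- ecdf(x); Gn <- ecdf(y)
  SR <- sum(outer(y, x, ">"))                     # S_R^+ = #{(i,j): Y_j > X_i}
  GdF <- mean(Gn(x))
  t1 <- GdF - GdF^2                               # int G dF - (int G dF)^2
  t2 <- mean(Fn(y)^2) - mean(Fn(y))^2             # int F^2 dG - (int F dG)^2
  t3 <- mean((1 - Gn(x))^2) - mean(1 - Gn(x))^2   # int (1-G)^2 dF - (int (1-G) dF)^2
  varS <- n1 * n2 * t1 + n1 * n2 * (n1 - 1) * t2 + n1 * n2 * (n2 - 1) * t3
  z <- (SR - n1 * n2 / 2) / sqrt(varS)
  c(Z = z, p.onesided = 1 - pnorm(z))             # reject for large Z, (2.11.16)
}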
The efficacy for the modified MWW follows using an argument similar to that for the MWW in Section 2.4. As there, the function S_R⁺(Δ) is a decreasing function of Δ. Its mean function is given by

    E_Δ(S_R⁺) = E_0(S_R⁺(−Δ)) = n_1 n_2 ∫ (1 − G(x − Δ)) f(x) dx .

The average to consider here is S̄_R = (n_1 n_2)^{−1} S_R⁺. Letting μ̄(Δ) denote the mean of S̄_R under Δ, we have μ̄'(0) = ∫ g(x) f(x) dx > 0. The variance we need is σ²(0) = lim_{n→∞} n Var_0(S̄_R), which, using the above result on the variance, simplifies to

    σ²(0) = (1/λ_2) [ ∫ F² dG − (∫ F dG)² ] + (1/λ_1) [ ∫ (1 − G)² dF − (∫ (1 − G) dF)² ] .
The process S_R⁺(Δ) is Pitman regular and, in particular, its efficacy is given by

    c_MMWW = √(λ_1 λ_2) ∫ g(x) f(x) dx / √( λ_1 [ ∫ F² dG − (∫ F dG)² ] + λ_2 [ ∫ (1 − G)² dF − (∫ (1 − G) dF)² ] ) .    (2.11.18)

As with the modified Mathisen's test, we show consistency of the modified MWW test by using Theorem 1.5.4. Again we need only show that μ̄(0) < μ̄(Δ). But this follows immediately provided the supports of F and G overlap in a neighborhood of 0. Note that this shows that the modified MWW is consistent for the hypotheses (2.11.1) under the further restriction that the densities of X and Y are symmetric.
2.11.5 Efficiencies and Discussion

Before obtaining the asymptotic relative efficiencies of the above procedures, we briefly discuss traditional methods. Suppose we restrict F and G to have symmetric densities of the same form with finite variance; that is, F(x) = F_0((x − θ_X)/σ_X) and G(x) = F_0((x − θ_Y)/σ_Y), where F_0 is some distribution function with symmetric density f_0, and σ_X and σ_Y are the standard deviations of X and Y, respectively.

Under these assumptions, it follows that √n(Ȳ − X̄ − Δ) converges in distribution to N(0, (σ_X²/λ_1) + (σ_Y²/λ_2)); see Exercise 2.13.43. The test is to reject H_0 : Δ = 0 in favor of H_A : Δ > 0 if t_W ≥ z_α, where

    t_W = (Ȳ − X̄) / √( s_X²/n_1 + s_Y²/n_2 ) ,
where s_X² and s_Y² are the sample variances of the X_i and Y_j, respectively. Under these assumptions, it follows that these sample variances are consistent estimates of σ_X² and σ_Y², respectively; hence, the test has approximate level α. If F_0 is also normal then, under H_0, t_W has an approximate t distribution with a degrees-of-freedom correction proposed by Welch (1949). This test is frequently used in practice and we shall subsequently call it the Welch t-test.

In contrast, the pooled t-test can behave poorly in this situation, since we have

    t_p = (Ȳ − X̄) / √( [ ((n_1 − 1)s_X² + (n_2 − 1)s_Y²) / (n_1 + n_2 − 2) ] (1/n_1 + 1/n_2) )
        ≐ (Ȳ − X̄) / √( s_X²/n_2 + s_Y²/n_1 ) ;

that is, the sample variances are divided by the wrong sample sizes. Hence, unless the sample sizes are fairly close, the pooled t-test is not asymptotically distribution free. Exercise 2.13.44 obtains the true asymptotic level of t_p.
In order to get the efficacy of the Welch t, consider the statistic Ȳ − X̄. The mean function at Δ is μ̄(Δ) = Δ; hence, μ̄'(0) = 1. It follows from the asymptotic distribution discussed above that

    √n (Ȳ − X̄ − Δ) / √( (σ_X²/λ_1) + (σ_Y²/λ_2) ) →_D N(0, 1) ;

hence, σ(0) = √( (σ_X²/λ_1) + (σ_Y²/λ_2) ). Thus the efficacy of t_W is given by

    c_{t_W} = μ̄'(0)/σ(0) = [ (σ_X²/λ_1) + (σ_Y²/λ_2) ]^{−1/2} .         (2.11.19)
We obtain the AREs of the above procedures for the case where G(x) = F(x/η) and F(x) has density f(x) symmetric about 0 with variance 1. Thus η is the ratio of standard deviations σ_Y/σ_X. For this case the efficacies (2.11.15), (2.11.18), and (2.11.19) reduce to

    c_MM    = 2 √(λ_1 λ_2) f(0) / √( λ_2 + λ_1 η² )
    c_MMWW  = √(λ_1 λ_2) ∫ g f / √( λ_1 [ ∫ F² dG − (∫ F dG)² ] + λ_2 [ ∫ (1 − G)² dF − (∫ (1 − G) dF)² ] )
    c_{t_W} = √(λ_1 λ_2) / √( λ_2 + λ_1 η² ) .
Thus the ARE between the modified Mathisen's procedure and the Welch procedure is the ratio c_MM²/c_{t_W}² = 4 σ_X² f²(0) = 4 f_0²(0). This is the same ARE as in the location problem. In particular, the ARE does not depend on η = σ_Y/σ_X. Thus the modified Mathisen's test in comparison to t_W would have poor efficiency at the normal distribution, .63, but in general it would be much more efficient than t_W for heavy-tailed distributions. Similar to the modified Mathisen's test, the Mood test can also be modified for these problems; see Exercise 2.13.45. Its efficacy is the same as that of the Mathisen's test.

Asymptotic relative efficiencies involving the modified Wilcoxon do depend on the ratio of scale parameters η. Fligner and Rust (1982) show that if the variances of X and Y are quite different then the modified Mathisen's test may be as efficient as the modified MWW, irrespective of the shape of the underlying distribution.

Fligner and Policello (1981) conducted a simulation study of the pooled t, Welch's t, MWW, and the modified MWW over situations where F and G differ in scale only. The unmodified tests did not maintain their level. Welch's t performed well when F and G were normal, whereas the modified MWW performed well over all situations, including unequal sample sizes and normal and contaminated normal distributions. In the simulation study performed by Fligner and Rust (1982), they found that the modified Mood test maintains its level over the situations that were considered by Fligner and Policello (1981).

As a final note, Welch's t requires distributions with the same shape and the modified MWW requires symmetric densities. The modified Mathisen's test and the modified Mood test, though, are consistent tests for the general problem stated in expression (2.11.1).
2.12 Paired Designs

Consider the situation where we have two treatments of interest, say, A and B, which can be applied to subjects from a population of interest. Suppose we are interested in a particular response after these treatments have been applied. Let X denote the response of a subject after treatment A has been applied and let Y be the corresponding measurement for a subject after treatment B has been applied. The natural null hypothesis, H_0, is that there is no difference in treatment effects. A one-sided alternative would be that the response of a subject under treatment B is in general larger than that of a subject under treatment A. Reversing the roles of A and B would yield the other one-sided alternative, while the union of these two alternatives would result in the two-sided alternative. Again, for definiteness, we choose as our alternative, H_A, the first one-sided alternative.

The completely randomized design and the paired design are two experimental designs which are often employed in this situation. In the completely randomized design, n subjects are selected at random from the population of interest and n_1 of them are randomly assigned to treatment A while the remaining n_2 = n − n_1 are assigned to treatment B. At the end of the treatment period, we then have two samples, one on X while the other is on Y. The two-sample procedures discussed in the previous sections can be used to analyze the data. Proper randomization along with carefully controlled experimental conditions give credence to the assumptions that the samples are random and are independent of one another. The design that produced the data of Example 2.3.1 was a completely randomized design.

While the completely randomized design is often used in practice, the underlying variability may impair the power of any procedure, robust or classical, to detect alternative hypotheses. The design discussed next usually results in a more powerful analysis, but it does require a pairing device; i.e., a block of length two.

Suppose we have a pairing device. Some examples include identical twins for a study on human subjects, litter mates for a study on animal subjects, or the same exterior wall of a house for a study on the durability of exterior house paints. In the paired design, n pairs of subjects are randomly selected from the population of interest. Within each pair, one member is randomly assigned to treatment A while the other receives treatment B. Again, let X and Y denote the responses of subjects after treatments A and B, respectively, have been applied. This experimental design results in a sample of pairs (X_1, Y_1), ..., (X_n, Y_n). The sample differences D_1 = X_1 − Y_1, ..., D_n = X_n − Y_n, however, become the single sample of interest. Note that the random pairing in this design induces, under the null hypothesis, a symmetric distribution for the differences.

Theorem 2.12.1. In a randomized paired design, under the null hypothesis of no treatment effect, the differences D_i are symmetrically distributed about 0.

Proof: Let F(x, y) denote the joint distribution of (X, Y). Under the null hypothesis of no treatment effect and randomized pairing, it follows that X and Y are exchangeable random variables; that is, P(X ≤ x, Y ≤ y) = P(X ≤ y, Y ≤ x). Hence for a difference D = Y − X we have

    P[D ≤ t] = P[Y − X ≤ t] = P[X − Y ≤ t] = P[−D ≤ t] .

Thus D and −D have the same distribution; hence D is symmetrically distributed about 0.
Let θ be a location functional for the distribution of D_i. We further assume that D_i is symmetrically distributed under alternative models also. Then we can express the above hypotheses by H_0 : θ = 0 versus H_A : θ > 0.

Note that the one-sample analyses based on signs and signed ranks discussed in Chapter 1 are appropriate for the randomly paired design. The appropriate sign test statistic is S = Σ sgn(D_i), while the signed-rank statistic is T = Σ sgn(D_i) R(|D_i|).
From Chapter 1 we summarize the analysis based on the signed-rank statistic. A level α test would reject H_0 in favor of H_A if T ≥ c_α, where c_α is determined from the null distribution of the Wilcoxon signed-rank test or from the asymptotic approximation to the distribution. The test is consistent for θ > 0 and it has the efficiency results discussed in Chapter 1. In particular, for normal errors the efficiency of T with respect to the usual paired t-test is .955. The associated point estimate of θ is the Hodges-Lehmann estimate given by θ̂ = med_{i≤j}(D_i + D_j)/2. A distribution-free confidence interval for θ is constructed based on the Walsh averages (D_i + D_j)/2, i ≤ j, as discussed in Chapter 1. Instead of using Wilcoxon scores, general signed-rank scores, as discussed in Chapter 1, can also be used.

A similar summary holds for the analysis based on the sign statistic. In fact, for the sign scores we need not assume that D_1, ..., D_n are identically distributed; that is, there can be a block effect. This is discussed further in Chapter 4.

We should mention that if the pairing is not done randomly then D_i may or may not be symmetrically distributed. If the symmetry assumption is realistic, then both sign and signed-rank analyses can be used. If, however, it is not realistic, then the sign analysis would still be valid, but caution would be necessary in interpreting the results of the signed-rank analysis.
Example 2.12.1. Darwin Data.

The data, Table 2.12.1, are some measurements recorded by Charles Darwin in 1878. They consist of 15 pairs of heights in inches of cross-fertilized plants and self-fertilized plants (Zea mays), each pair grown in the same pot.

Table 2.12.1: Plant Growth
Pot     1       2       3       4       5       6       7       8
Cross-  23.500  12.000  21.000  22.000  19.125  21.500  22.125  20.375
Self-   17.375  20.375  20.000  20.000  18.375  18.625  18.625  15.250
Pot     9       10      11      12      13      14      15
Cross-  18.250  21.625  23.250  21.000  22.125  23.000  12.000
Self-   16.500  18.000  16.250  18.000  12.750  15.500  18.000
Let D_i denote the difference between the heights of the cross-fertilized and self-fertilized plants of the ith pot, and let θ denote the median of the distribution of D_i. Suppose we are interested in testing for an effect; that is, the hypotheses are H_0 : θ = 0 versus H_A : θ ≠ 0. The boxplot of the differences is displayed in Panel A of Figure 2.12.1, while Panel B gives the normal q-q plot of the differences. As the plots indicate, the differences for Pot 2 and, perhaps, Pot 15 are possible outliers. The results from the RBR functions onesampwil and onesampsgn are shown below.

RBR Results for Darwin Data

Results for the Wilcoxon-Signed-Rank procedure
Test of theta = 0 versus theta not equal to 0
Test-Stat. is T 72   Standardized (z) Test-Stat. is 2.016   p-value 0.043
Estimate 3.1375   SE is 1.244385
95 % Confidence Interval is ( 0.5 , 5.2125 )
Estimate of the scale parameter tau 4.819484

Results for the Sign procedure
Test of theta = 0 versus theta not equal to 0
Test stat. S is 11   Standardized (z) Test-Stat. 2.581   p-value 0.009
Estimate 3   SE is 1.307422
95 % Confidence Interval is ( 1 , 6.125 )
Estimate of the scale parameter tau 5.063624

The value of the signed-rank Wilcoxon statistic for these data is T = 72 with an approximate p-value of .044. The corresponding estimate of θ is 3.14 inches and the 95% confidence interval is (.50, 5.21).

There are 13 positive differences, so the standardized value of the sign test statistic is 2.58, with a p-value of 0.01. The corresponding estimate of θ is 3 inches and the 95% interpolated confidence interval is (1.00, 6.13). The paired t-test statistic has the value of 2.15 with p-value 0.050. The difference in sample means is 2.62 inches and the corresponding 95% confidence interval is (0, 5.23). Note that the outliers impaired the t-test and, to a lesser degree, the Wilcoxon signed-rank test; see Exercise 2.13.46 for further analyses.
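The Wilcoxon part of this analysis can be reproduced with base R alone; the data are keyed from Table 2.12.1, and the identity T = 2V − n(n+1)/2 converts R's statistic V (the sum of the positive-difference ranks) to the signed-rank statistic T used here.

cross <- c(23.5, 12, 21, 22, 19.125, 21.5, 22.125, 20.375,
           18.25, 21.625, 23.25, 21, 22.125, 23, 12)
self  <- c(17.375, 20.375, 20, 20, 18.375, 18.625, 18.625, 15.25,
           16.5, 18, 16.25, 18, 12.75, 15.5, 18)
d <- cross - self
n <- length(d)
wt <- wilcox.test(d, conf.int = TRUE)       # exact signed-rank test
unname(2 * wt$statistic - n * (n + 1) / 2)  # T = 72
sum(d > 0)                                  # 13 positive differences
wt$estimate                                 # Hodges-Lehmann estimate, 3.1375
wt$conf.int                                 # (0.5, 5.2125)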
2.12.1 Behavior under Alternatives

In this section we compare sample size determination for the paired design with sample size determination for the completely randomized design. For the paired design, let γ⁺(θ) denote the power function of the Wilcoxon signed-rank test statistic for the alternative θ. Then the asymptotic power lemma, Theorem 1.5.8, with c = τ^{−1} = √12 ∫ f²(t) dt for the signed-rank Wilcoxon from Chapter 1, states that at significance level α and under the sequence of contiguous alternatives θ_n = θ*/√n,

    lim_{n→∞} γ⁺(θ_n) = P( Z ≥ z_α − θ*/τ ) .

Figure 2.12.1: Boxplot of the Darwin data. [Boxplot of the paired differences; vertical axis: paired differences, from −5 to 10.]
We only consider the case where the random vector (Y, X) is jointly normal with variance-covariance matrix

    V = σ² [ 1  ρ ]
           [ ρ  1 ] .

Then τ = √(π/3) σ √(2(1 − ρ)).

Now suppose we select the sample size n* so that the Wilcoxon signed-rank test has power γ⁺(θ_0) to detect the one-sided alternative θ_0 > 0 for a level α test. Then, writing θ_0 = √(n*) θ_0 / √(n*), we have by the asymptotic power lemma and (1.5.25) that

    γ⁺(θ_0) ≐ 1 − Φ( z_α − √(n*) θ_0 / τ ) ,

and

    n* ≐ ( z_α − z_{γ⁺(θ_0)} )² τ² / θ_0² .
Substituting the value of τ into this final equation, we have that the necessary sample size for the paired design to have the desired local power is

    n* ≐ [ ( z_α − z_{γ⁺(θ_0)} )² / θ_0² ] (π/3) σ² 2(1 − ρ) .           (2.12.1)

Next consider a two-sample design with equal sample sizes n_i = n*. Assume that X and Y are iid normal with variance σ². Then τ² = (π/3)σ². Hence, by (2.4.25), the necessary sample size for the completely randomized design to achieve power γ⁺(θ_0) at the one-sided alternative θ_0 > 0 for a level α test is given by

    n = [ ( z_α − z_{γ⁺(θ_0)} ) / θ_0 ]² 2(π/3)σ² .                      (2.12.2)

Based on expressions (2.12.1) and (2.12.2), the sample size needed for the paired design is (1 − ρ) times the sample size needed for the completely randomized design. If the pairing device is such that X and Y are strongly, positively correlated then it pays to use the paired design. The paired design is a disaster, of course, if the variables are negatively correlated.
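The two formulas are easy to code; a minimal sketch (the function names are ours, and z_{γ⁺(θ_0)} is the upper quantile of the target power, qnorm(1 − power)):

n_paired <- function(theta0, sigma, rho, alpha = 0.05, power = 0.80) {
  ((qnorm(1 - alpha) - qnorm(1 - power))^2 / theta0^2) *
    (pi / 3) * sigma^2 * 2 * (1 - rho)              # (2.12.1)
}
n_crd <- function(theta0, sigma, alpha = 0.05, power = 0.80) {
  n_paired(theta0, sigma, rho = 0, alpha, power)    # (2.12.2) is the rho = 0 case
}
n_paired(1, 1, rho = .5) / n_crd(1, 1)              # 0.5, i.e., 1 - rho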
2.13 Exercises

2.13.1. (a) Derive the L_2 estimates of intercept and shift based on the L_2 norm on Model (2.2.4).

(b) Next apply the pseudo-norm, (2.2.16), to (2.2.4) and derive the estimating function. Show that the natural test statistic is the pooled t-statistic.

2.13.2. Show that (2.2.17) is a pseudo-norm. Show, also, that it can be written in terms of ranks; see the formula following (2.2.17).
2.13.3. In the proof of Theorem 2.4.2, verify that L(Y_j − X_i) = L(X_i − Y_j).

2.13.4. Prove Theorem 2.4.3.

2.13.5. Prove that if a continuous random variable Z has cdf H(z), then the random variable H(Z) has a uniform distribution on (0, 1).

2.13.6. In Theorem 2.4.4, show that E(F(Y)) = ∫ F(y) dG(y) = ∫ (1 − G(x)) dF(x) = E(1 − G(X)).

2.13.7. Prove that if Z_n converges in distribution to Z and if Var(Z_n − W_n) and E(Z_n) − E(W_n) converge to 0, then W_n also converges in distribution to Z.

2.13.8. Verify (2.4.10).

2.13.9. Explain what happens to the MWW statistic when one support is shifted completely to the right of the other support. What does this imply about the consistency of the MWW in this case?

2.13.10. Show that the L_2 estimating function is Pitman regular and derive the efficacy of the pooled t-test. Also, establish the asymptotic power lemma, Theorem 2.4.13, for the L_2 case. Finally, establish the asymptotic distribution of √n(Ȳ − X̄).
2.13.11. Prove that the Hodges-Lehmann estimate of shift, (2.2.18), is translation and scale equivariant. (See the discussion in Section 2.4.4.)

2.13.12. Prove Theorem 2.4.15.

2.13.13. In Example 2.4.1, form the residuals Z_i − Δ̂ c_i, i = 1, ..., n. Then, similar to Section 1.5.5, use these residuals to estimate τ based on (1.3.30).

2.13.14. Simulate independent random samples from N(20, 5²) and N(22, 5²) distributions of sizes 10 and 15, respectively. Let Δ denote the shift in the locations of the distributions.

(a) Obtain comparison boxplots for your samples.

(b) Use the Wilcoxon procedure to test H_0 : Δ = 0 versus H_A : Δ ≠ 0 at level .05.

(c) Use the Wilcoxon procedure to estimate Δ and obtain a 95% confidence interval for it.

(d) Obtain the true value of τ. Use your confidence interval in the last item to obtain an estimate of τ. Obtain a symmetric 95% confidence interval for Δ based on your estimate.

(e) Form a pooled estimate of τ based on the Wilcoxon signed-rank process for each sample. Obtain a symmetric 95% confidence interval for Δ based on your estimate. Compare it with the estimate from the last item and the true value.

2.13.15. Write minitab macros to bootstrap the distribution of Δ̂. Obtain the bootstrap distribution for 500 bootstraps of the data of Problem 2.13.14. What is your bootstrap estimate of τ? Compare it with the true value and the other estimates.
2.13.16. Verify the scalar multiple condition for the pseudo-norm in the proof of Theorem 2.5.1.

2.13.17. Verify (2.5.9) and (2.5.10).

2.13.18. Consider the process S_φ(Δ), (2.5.11):

(a) Show that S_φ(Δ) is a decreasing step function, with steps occurring at the differences Y_j − X_i.

(b) Using Part (a) and the MWW estimator as a starting value, write with some details an algorithm which obtains the estimator Δ̂_φ.

(c) Verify expressions (2.5.14), (2.5.15), and (2.5.16).

2.13.19. Consider the optimal score function (2.5.22):

(a) Show it is location invariant and scale equivariant. Hence, show that if g(x) = (1/σ) f(x/σ), then φ_g = (1/σ) φ_f.

(b) Use (2.5.22) to show that the MWW is asymptotically efficient when the underlying distribution is logistic. (F(x) = (1 + exp(−x))^{−1}, −∞ < x < ∞.)

(c) Show that (2.6.1) is optimal for a Laplace or double exponential distribution. (f(x) = (1/2) exp(−|x|), −∞ < x < ∞.)

(d) Show that the optimal score function for the extreme value distribution, (f(x) = exp{x − e^x}, −∞ < x < ∞), is given by (2.8.8).

(e) Show that the optimal score function for the normal distribution is given by (2.5.33). Show that it is standardized.

(f) Show that (2.5.34) is the optimal score function for an underlying distribution that has a left logistic tail and a right exponential tail.
2.13.20. Show that when the underlying density f is symmetric, then φ_f(1 − u) = −φ_f(u).

2.13.21. Show that expression (2.6.6) is true and that the n = 2r differences,

    Y_(1) − X_(r) < Y_(2) − X_(r−1) < ··· < Y_(n_2) − X_(r−n_2+1) ,

can be ordered only knowing the order statistics from the individual samples.

2.13.22. Develop the asymptotic linearity formula for Mood's estimating function given in (2.6.3). Then give an alternative proof of Theorem 2.6.1 based on this result.

2.13.23. Verify the moment formulas (2.6.9) and (2.6.10).

2.13.24. Show that any estimator based on the pseudo-norm (2.5.2) is scale equivariant. Hence, if we multiply the combined sample observations by a constant, then the estimator is multiplied by that same constant.

2.13.25. Suppose X is a continuous random variable representing the time until failure of some process. The hazard function for a continuous random variable X with cdf F is defined to be the instantaneous rate of failure at X = t, conditional on survival to time t. It is formally given by

    h_X(t) = lim_{Δt→0⁺} P(t ≤ X < t + Δt | X ≥ t) / Δt .

(a) Show that

    h_X(t) = f(t) / (1 − F(t)) .

(b) Suppose that Y has cdf given by (2.8.1). Show that the hazard function is given by h_Y(t) = α h_X(t).

2.13.26. Verify (2.8.4).

2.13.27. Apply the delta method of finding the asymptotic distribution of a function to (2.8.3) to find the asymptotic distribution of α̂. Then verify (2.8.5). Explain how this can be used to find an approximate (1 − α)100% confidence interval for α.

2.13.28. Verify (2.8.14).

2.13.29. Show that the asymptotic relative efficiency of the Mann-Whitney-Wilcoxon test to the Savage test at the log exponential model is 3/4.

2.13.30. Verify (2.10.5).

2.13.31. Show that if |X| has an F(2, 2) distribution, then log|X| has a logistic distribution.

2.13.32. Suppose f(t) is the logistic pdf. Show that the optimal scores function, (2.10.6), is given by φ(u) = u log[(u + 1)/(1 − u)] − 1.
2.13.33. (a) Verify (2.10.6).

(b) Apply (2.10.6) to the normal distribution.

(c) Apply (2.10.6) to the Laplace or double exponential distribution.

2.13.34. We consider the Siegel-Tukey (1960) test for the equality of variances when the underlying centers are equal but possibly unknown. The test statistic is the sum of the ranks of the Y sample in the combined sample (MWW statistic). However, the ranks are assigned in a different way: in the ordered combined sample, assign rank 1 to the smallest value, rank 2 to the largest value, rank 3 to the second largest value, rank 4 to the second smallest value, and so on, alternately assigning ranks to end values. To test H_0 : var X = var Y vs H_A : var X > var Y, reject H_0 when the sum of the ranks of the Y sample is large. Find the mean, variance, and the limiting distribution of the test statistic. Show how to find an approximate size α test.

2.13.35. Develop a sample size formula for the scale problem similar to the sample size formula in the location problem, (2.4.25).

2.13.36. Verify (??).

2.13.37. Compute the efficacy of Mood's scale test, the Ansari-Bradley scale test, and Klotz's scale test discussed in Section ??.

2.13.38. Verify the asymptotic properties given in (2.10.26), (2.10.27), and (2.10.28).

2.13.39. Compute the efficiency of Mood's scale test and the Ansari-Bradley scale test relative to the classical F-test for equality of variances.

2.13.40. Show that the Ansari-Bradley scale test is optimal for f(x) = (1/2)(1 + |x|)^{−2}, −∞ < x < ∞.

2.13.41. Show that when F and G have densities symmetric at 0 (or any common point), the expected value of S_R⁺ is n_1 n_2 / 2.

2.13.42. Show that the estimate of (2.11.17) based on the empirical cdfs is consistent and that it is a function only of the combined sample ranks.

2.13.43. Under the general model in Section 2.11.5, derive the limiting distribution of √n(Ȳ − X̄ − Δ).

2.13.44. Find the true asymptotic level of the pooled t-test under the null hypothesis in (2.11.1).

2.13.45. Develop a modified Mood's test similar to the modified Mathisen's test discussed in Section 2.11.5.
2.13.46. Construct and discuss a normal quantile plot of the differences from Table 2.12.1. Carry out the Boos test for asymmetry. Why do these results suggest that the L_1 analysis may be the best analysis in this example?

2.13.47. Consider the data set of information on professional baseball players given in Exercise 1.12.32. Let Δ denote the shift parameter of the difference between the height of a pitcher and the height of a hitter.

(a) Obtain comparison dotplots between the heights of the pitchers and hitters. Does a shift model seem appropriate?

(b) Use the MWW test statistic to test the hypotheses H_0 : Δ = 0 versus H_A : Δ > 0. Compute the p-value.

(c) Determine a point estimate for Δ and a 95% confidence interval for Δ based on the MWW procedure.

(d) Obtain an estimate of the standard deviation of Δ̂. Use it to obtain an approximate 95% confidence interval for Δ.

2.13.48. Repeat Exercise 2.13.47 when Δ is the shift parameter for the difference in pitchers' and hitters' weights.

2.13.49. Repeat Exercise 2.13.47 when Δ is the shift parameter for the difference in left-handed (A-1) and right-handed (A-0) pitchers' ERAs and the hypotheses are H_0 : Δ = 0 versus H_A : Δ ≠ 0.
Chapter 3

Linear Models

3.1 Introduction

In this chapter we discuss the theory for a rank-based analysis of a general linear model. Applications of this analysis to experimental design models will be discussed in Chapter 4. The rank-based analysis is complete, consisting of estimation, testing, and diagnostic tools for checking the adequacy of fit of the model, outlier detection, and detection of influential cases. As in the earlier chapters, we present the analysis in terms of its geometry.

The analysis could be based on either rank scores or signed-rank scores. We have chosen to use the general rank scores of Chapter 2. This allows the error distribution to be either asymmetric or symmetric. An analysis based on signed-rank scores would parallel the one based on rank scores, except that the theory would require a symmetric error distribution; see Hettmansperger and McKean (1983) for discussion. Although the results are established for general score functions, we illustrate the methods with Wilcoxon and sign scores throughout. We commonly use the subscripts R and S for results based on Wilcoxon and sign scores, respectively.
3.2 Geometry of Estimation and Tests

For i = 1, ..., n, let Y_i denote the ith observation and let x_i denote a p × 1 vector of explanatory variables. Consider the linear model

    Y_i = x_i' β + e_i* ,                                                (3.2.1)

where β is a p × 1 vector of unknown parameters. In this chapter, the components of β are the parameters of interest. We are interested in estimating β and testing linear hypotheses concerning it. However, it will be convenient to also have a location parameter. So, accordingly, let α = T(e_i*) be a location functional. One that we will frequently use is the median. Let e_i = e_i* − α; then T(e_i) = 0 and the model can be written as

    Y_i = α + x_i' β + e_i .                                             (3.2.2)
The parameter α is called an intercept parameter. An argument similar to the one concerning the shift parameter of Chapter 2 shows that β does not depend on the location functional used.

Let Y = (Y_1, ..., Y_n)' denote the n × 1 vector of observations and let X denote the n × p matrix whose ith row is x_i'. We can then express the model as

    Y = 1α + Xβ + e ,                                                    (3.2.3)

where 1 is an n × 1 vector of ones and e' = (e_1, ..., e_n). Since the model includes an intercept parameter, α, there is no loss in generality in assuming that X is centered; i.e., the columns of X sum to 0. Further, in this chapter, we assume that X has full column rank p. Let Ω_F denote the column space spanned by the columns of X. Note that we can then write the model as

    Y = 1α + η + e ,  where η ∈ Ω_F .                                    (3.2.4)

This model is often called the coordinate-free model.
Besides estimation of the regression coefficients, we are interested in tests of general linear hypotheses of the form

    H_0 : Mβ = 0 versus H_A : Mβ ≠ 0 ,                                   (3.2.5)

where M is a q × p matrix of full row rank. In this section, we discuss the geometry of estimation and testing with rank-based procedures for the linear model.
3.2.1 Estimation

With respect to model (3.2.4), we estimate η by minimizing the distance between Y and the subspace Ω_F. In this chapter we define distance in terms of the norms or pseudo-norms presented in Chapter 2. Consider, first, the general R pseudo-norm discussed in Chapter 2, which is given by expression (2.5.2) and which we write for convenience as

    ‖v‖_φ = Σ_{i=1}^{n} a(R(v_i)) v_i ,                                  (3.2.6)

where a(1) ≤ a(2) ≤ ··· ≤ a(n) is a set of scores generated as a(i) = φ(i/(n+1)) for some nondecreasing score function φ(u) defined on the interval (0, 1) and standardized such that ∫ φ(u) du = 0 and ∫ φ²(u) du = 1. This was shown to be a pseudo-norm in Chapter 2. Recall that the Wilcoxon pseudo-norm is generated by the linear score function φ(u) = √12 (u − 1/2). We will also discuss the sign pseudo-norm, which is generated by φ(u) = sgn(u − 1/2), and show that it is equivalent to using the L_1 norm. In Section 3.10 we also discuss a class of score functions appropriate for survival-type analyses.

For the general R pseudo-norm given above by (3.2.6), an R-estimate of η is a vector Ŷ_φ such that

    D_φ(Y, Ω_F) = ‖Y − Ŷ_φ‖_φ = min_{η∈Ω_F} ‖Y − η‖_φ .                  (3.2.7)
These quantities are represented geometrically in Figure 3.2.1.

Figure 3.2.1: The R-estimate of η is a vector Ŷ_φ which minimizes the normed differences, (3.2.6), between Y and Ω_F. The distance between Y and the space Ω_F is D_φ(Y, Ω_F).
Once η has been estimated, β can be estimated by solving the equation Xβ = Ŷ_φ; that is, the R-estimate of β is β̂_φ = (X'X)^{−1} X' Ŷ_φ. As discussed later in Section 3.7, the intercept α can be estimated by a location estimate based on the residuals ê = Y − Ŷ_φ. One that we frequently use is the median of the residuals, which we denote as α̂_S = med{ Y_i − x_i' β̂_φ }. Theorem 3.5.7 shows, under regularity conditions, that

    (α̂_S, β̂_φ')' has an approximate N_{p+1}( (α, β')' , diag{ n^{−1} τ_S² , τ_φ² (X'X)^{−1} } ) distribution ,    (3.2.8)

where τ_φ and τ_S are the scale parameters defined in displays (3.4.4) and (3.4.6), respectively. From this result, an asymptotic confidence interval for the linear function h'β is given by

    h' β̂_φ ± t_{(α/2, n−p−1)} τ̂_φ √( h'(X'X)^{−1} h ) ,                 (3.2.9)

where the estimate τ̂_φ is discussed in Section 3.7.1. The use of t-critical values instead of z-critical values is documented in the small-sample studies cited in Section 3.7. Note the close analogy between this confidence interval and those based on LS estimates. The only difference is that σ̂ has been replaced by τ̂_φ.
We will make use of the coordinate-free model, especially in Chapter 4; however, in this chapter we are primarily concerned with the properties of the estimator β̂_φ, and it will be more convenient to use the coordinate model (3.2.3). Define the dispersion function by

    D_φ(β) = ‖Y − Xβ‖_φ .                                                (3.2.10)

Then D_φ(β̂_φ) = D_φ(Y, Ω_F) = ‖Y − Ŷ_φ‖_φ is the R-distance between Y and the subspace Ω_F. It is also the residual dispersion.

Because D_φ is expressed in terms of a norm, it is a continuous and convex function of β; see Exercise 1.12.3. Exercise 3.16.2 shows that the ranks of the residuals can only change at the boundaries of the regions defined by the (n choose 2) equations y_i − x_i' β = y_j − x_j' β. Note that in the simple linear regression case, these equations define the sample slopes (y_j − y_i)/(x_j − x_i). Hence, in the interior of these regions the ranks are constant. Therefore, D_φ(β) is a piecewise linear, continuous, convex function of β with gradient (defined almost everywhere) given by

    ∇D_φ(β) = −S_φ(Y − Xβ) ,                                             (3.2.11)

where

    S_φ(Y − Xβ) = X' a(R(Y − Xβ))                                        (3.2.12)
and a(R(Y − Xβ))' = ( a(R(Y_1 − x_1'β)), ..., a(R(Y_n − x_n'β)) ). Thus β̂_φ solves the equations

    S_φ(Y − Xβ) = X' a(R(Y − Xβ)) ≐ 0 ,                                  (3.2.13)

which are called the R normal equations. A quadratic form in S_φ(Y − Xβ_0) serves as the gradient R-test statistic for testing H_0 : β = β_0 versus H_A : β ≠ β_0.

In terms of the simple regression problem, S_φ(β) is a decreasing step function of β, which steps down at each sample slope. There may be an interval of solutions of S_φ(β) = 0, or S_φ(β) may step across the horizontal axis. Let β̂_φ denote any point in the interval in the former case and the crossing point in the latter case. The gradient test statistic is S_φ(β_0) = Σ x_i a(R(y_i − x_i β_0)). If the x's are distinct and equally spaced, then for Wilcoxon scores this test statistic is equivalent to the test for correlation based on Spearman's r_S; see Exercise 3.16.4.

For the asymptotic distribution theory of estimation and testing, we note that the estimate is location and scale equivariant. Let β̂_φ(Y) denote the R-estimate for the linear model (3.2.3). Then, as shown in Exercise 3.16.6, β̂_φ(Y + Xδ) = β̂_φ(Y) + δ and β̂_φ(kY) = k β̂_φ(Y). In particular, these results imply, without loss of generality, that the theory developed in the following sections can be accomplished under the assumption that the true β is 0.
As a final note, we outline the least squares estimates. The LS estimate of η in model (3.2.4) is given by

    Ŷ_LS = Argmin_{η∈Ω_F} ‖Y − η‖²_LS ,

where ‖·‖_LS denotes the least squares pseudo-norm given by (2.2.16) of Chapter 2. The value of η which minimizes this pseudo-norm is

    Ŷ_LS = HY ,                                                          (3.2.14)

where H is the projection matrix onto the space Ω_F; i.e., H = X(X'X)^{−1}X'. Denote the sum of squared residuals by SSE = min_{η∈Ω_F} ‖Y − η‖²_LS = ‖(I − H)Y‖²_LS. In order to have similar notation, we denote this minimum by D²_LS(Y, Ω_F). Also, it is easy to show that the least squares estimate of β is β̂_LS = (X'X)^{−1}X'Y.
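To make the estimation geometry concrete, here is a minimal sketch in base R that obtains a Wilcoxon R-fit by directly minimizing the dispersion function (3.2.10). The helper names are ours; the RBR functions used in this book solve the R normal equations (3.2.13) with far more care than this general-purpose minimizer, which is adequate only as an illustration.

wil_disp <- function(beta, y, X) {
  r <- y - X %*% beta
  n <- length(r)
  a <- sqrt(12) * (rank(r) / (n + 1) - 0.5)  # Wilcoxon scores at residual ranks
  sum(a * r)                                 # D_phi(beta), (3.2.6) and (3.2.10)
}
rfit_sketch <- function(y, X) {
  b0 <- coef(lm(y ~ X - 1))                  # LS fit as a starting value
  optim(b0, wil_disp, y = y, X = X)$par      # minimize the convex dispersion
}
# intercept from the residuals, as in the text:
# bhat <- rfit_sketch(y, X); alphahat <- median(y - X %*% bhat)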
3.2.2 The Geometry of Testing
We next discuss the geometry behind rank-based tests of the general linear hypotheses given
by ( 3.2.5). As above, consider the model ( 3.2.4),
Y = 1 + +e , where
F
, (3.2.15)
and
F
is the column space of the full model design matrix X. Let

Y
,
F
denote the R-
tted value in the full model. Note that D

(Y,
F
) is the amount of residual dispersion not
accounted for in tting the Model ( 3.2.4). These are shown geometrically in Figure 3.2.2.
3.2. GEOMETRY OF ESTIMATION AND TESTS 157
Next let $\omega$ denote the subspace of $\Omega_F$ subject to $H_0$. In symbols, $\omega = \{\eta \in \Omega_F : \eta = X\beta, \mbox{ for some } \beta \mbox{ such that } M\beta = 0\}$. In Exercise 3.16.7 the reader is asked to show that $\omega$ is a subspace of $\Omega_F$ of dimension $p - q$. Let $\hat{Y}_{\varphi,\omega}$ denote the R-estimate of $\eta$ when the reduced model is fit, and let $D_\varphi(Y, \omega) = \|Y - \hat{Y}_{\varphi,\omega}\|_\varphi$ denote the distance between $Y$ and the subspace $\omega$. These are illustrated by Figure 3.2.2. The nonnegative quantity
$$ RD_\varphi = D_\varphi(Y, \omega) - D_\varphi(Y, \Omega_F) \eqno(3.2.16) $$
denotes the reduction in residual dispersion when we pass from the reduced model to the full model. Large values of $RD_\varphi$ indicate $H_A$ while small values support $H_0$.
Figure 3.2.2 (about here): The reduction in dispersion $RD_\varphi$ is the difference in normed distances between $Y$ and the subspaces $\Omega_F$ and $\omega$.
This drop in residual dispersion, $RD_\varphi$, is analogous to the drop in residual sums of squares for the LS analysis. In fact, to obtain this reduction in sums of squares, we need only replace the R-norm with the square of the Euclidean norm in the above development. Thus the drop in sums of squared errors is
$$ SS = D^2_{LS}(Y, \omega) - D^2_{LS}(Y, \Omega_F) , $$
where $D^2_{LS}(Y, \Omega_F)$ is defined above. Hence the reduction in sums of squared residuals can be written as
$$ SS = \|(I - H_\omega)Y\|^2_{LS} - \|(I - H_{\Omega_F})Y\|^2_{LS} . $$
The traditional least squares F-test is given by
$$ F_{LS} = \frac{SS/q}{\hat{\sigma}^2} , \eqno(3.2.17) $$
where $\hat{\sigma}^2 = D^2_{LS}(Y, \Omega_F)/(n - p)$. Other than replacing one norm with another, Figures 3.2.1 and 3.2.2 remain the same for the two analyses, LS and R.
In order to be useful as a test statistic, similar to least squares, the reduction in dispersion $RD$ must be standardized. The asymptotic distribution theory that follows suggests the standardization
$$ F_\varphi = \frac{RD/q}{\hat{\tau}_\varphi/2} , \eqno(3.2.18) $$
where $\hat{\tau}_\varphi$ is the estimate of $\tau_\varphi$ discussed in Section 3.7. Small sample studies cited in Section 3.7 indicate that $F_\varphi$ should be compared with F-critical values with $q$ and $n - (p+1)$ degrees of freedom, analogous to the LS classical F-test statistic. Similar to the LS F-test, the test based on $F_\varphi$ can be summarized in the ANOVA table, Table 3.2.1. Note that the reduction in dispersion replaces the reduction in sums of squares in the classical table. These robust ANOVA tables were first discussed by Schrader and McKean (1976).
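As a sketch of how (3.2.18) is used in practice, assuming the full and reduced model dispersions and an estimate $\hat{\tau}_\varphi$ have already been computed (the argument names below are illustrative only):

```python
# A sketch, assuming D_reduced, D_full and tau_hat are already available.
from scipy.stats import f as f_dist

def robust_F(D_reduced, D_full, q, n, p, tau_hat):
    RD = D_reduced - D_full               # reduction in dispersion (3.2.16)
    F_phi = (RD / q) / (tau_hat / 2.0)    # standardization (3.2.18)
    p_value = f_dist.sf(F_phi, q, n - (p + 1))  # F(q, n-(p+1)) reference
    return F_phi, p_value
```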
Table 3.2.1: Robust ANOVA Table for $H_0\colon M\beta = 0$

  Source       Reduction in Dispersion                                     df           Mean Reduction in Dispersion   F
  Regression   $RD_\varphi = D_\varphi(Y,\omega) - D_\varphi(Y,\Omega_F)$  $q$          $RD/q$                         $F_\varphi$
  Error                                                                    $n-(p+1)$    $\hat{\tau}_\varphi/2$
Table 3.2.2: Robust ANOVA Table for $H_0\colon \beta = 0$

  Source       Reduction in Dispersion                         df           Mean Reduction in Dispersion   F
  Regression   $RD = D_\varphi(0) - D_\varphi(Y,\Omega_F)$     $p$          $RD/p$                         $F_\varphi$
  Error                                                        $n-p-1$      $\hat{\tau}_\varphi/2$
Tests that all Regression Coefficients are 0
As discussed more fully in Section 3.6, there are three R-test statistics for the hypotheses (3.2.5). These are the R-analogues of the classical tests: the likelihood ratio test, the scores test, and the Wald test. We shall introduce them here for the special null hypothesis that all the regression parameters are 0; i.e.,
$$ H_0\colon \beta = 0 \mbox{ versus } H_A\colon \beta \neq 0 . \eqno(3.2.19) $$
Their asymptotic theory and small sample properties are discussed in more detail in later sections.
In this case, the reduced model dispersion is just the dispersion of the response vector $Y$, i.e., $D_\varphi(0)$. Hence, the R-test based on the reduction in dispersion is
$$ F_\varphi = \frac{\left[D_\varphi(0) - D_\varphi(Y, \Omega_F)\right]/p}{\hat{\tau}_\varphi/2} . \eqno(3.2.20) $$
As discussed above, $F_\varphi$ should be compared with $F(\alpha, p, n-p-1)$-critical values. Similar to the general hypothesis, the test based on $F_\varphi$ can be expressed in the robust ANOVA table given in Table 3.2.2. This is the robust analogue of the traditional ANOVA table that is printed out for a regression analysis by most least squares regression packages.
The R-scores test is the test based on the gradient. Theorem 3.5.2, below, gives the asymptotic distribution of the gradient $S_\varphi(0)$ under the null hypothesis. This leads to the asymptotic level $\alpha$ test: reject $H_0$ if
$$ S_\varphi(0)'(X'X)^{-1}S_\varphi(0) \geq \chi^2_\alpha(p) . \eqno(3.2.21) $$
Note that this test avoids the estimation of $\tau_\varphi$.
The R-Wald test is a quadratic form in the full model estimates. Based on the asymptotic distribution of the full model estimate $\hat{\beta}_\varphi$ given in Corollary 3.5.1, an asymptotic level $\alpha$ test rejects $H_0$ if
$$ \frac{\hat{\beta}_\varphi'(X'X)\hat{\beta}_\varphi/p}{\hat{\tau}_\varphi^2} \geq F(\alpha, p, n-p-1) . \eqno(3.2.22) $$

Table 3.3.1: Data for Example 3.3.1. The number of calls is in tens of millions and the years are from 1950-1973.

  Year        50    51    52    53    54    55     56     57     58     59    60    61
  No. Calls   0.44  0.47  0.47  0.59  0.66  0.73   0.81   0.88   1.06   1.20  1.35  1.49
  Year        62    63    64     65     66     67     68     69     70    71    72    73
  No. Calls   1.61  2.12  11.90  12.40  14.20  15.90  18.20  21.20  4.30  2.40  2.70  2.90
3.3 Examples
We offer several examples to illustrate the rank-based estimates and test procedures discussed in the last section. For all the examples, we use Wilcoxon scores, $\varphi(u) = \sqrt{12}(u - 1/2)$, for the rank-based estimates of the regression coefficients. We estimate the intercept by the median of the residuals and we estimate the scale parameter $\tau_\varphi$ as discussed in Section 3.7. We begin with a simple regression data set and proceed to multiple regression problems.
Example 3.3.1. Telephone Data
The response for this data set is the number of telephone calls (tens of millions) made in Belgium for the years 1950 through 1973. Time, the years, serves as our only predictor variable. The data is discussed in Rousseeuw and Leroy (1987) and, for convenience, is displayed in Table 3.3.1.
The Wilcoxon estimates of the intercept and slope are $-7.13$ and $.145$, respectively, while the LS estimates are $-26$ and $.504$. The reason for this disparity in fits is easily seen in Panel A of Figure 3.3.1, which is a scatterplot of the data overlaid with the LS and Wilcoxon fits. Note that the years 1964 through 1969 had a profound effect on the LS fit while the Wilcoxon fit was much less sensitive to these years. As discussed in Rousseeuw and Leroy, the recording system for the years 1964 through 1969 differed from the other years. Panels B and C of Figure 3.3.1 are the studentized residual plots of the fits; see (3.9.31) of Section 3.9. As with internal LS-studentized residuals, values of the internal R-studentized residuals which exceed 2 in absolute value are potential outliers. Note that the internal Wilcoxon studentized residuals clearly show that the years 1964-1969 are outliers, while the internal LS studentized residuals only detect 1969. The Wilcoxon studentized residuals also mildly detect the year 1970. Based on the scatterplot, this point does not follow the trend of the early (before 1964) years either. The scatterplot and Wilcoxon residual plot indicate that there may be a quadratic trend over the years before the outliers occur. The last few years, though, do not seem to follow this trend. Hence, a linear model for this data is questionable. On the basis of these plots, we will not discuss any formal inference for this data set.
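For readers who wish to reproduce the Wilcoxon fit, here is a rough Python sketch (ours, not the authors' software); it exploits the facts noted in Section 3.2 that the dispersion is convex in the slope and free of the intercept, and then estimates the intercept by the median of the residuals. Because the minimization is numerical, the values may differ slightly from those reported above.

```python
# A rough sketch of the Wilcoxon fit for the telephone data.
import numpy as np
from scipy.optimize import minimize_scalar

year = np.arange(50, 74, dtype=float)
calls = np.array([0.44, 0.47, 0.47, 0.59, 0.66, 0.73, 0.81, 0.88,
                  1.06, 1.20, 1.35, 1.49, 1.61, 2.12, 11.90, 12.40,
                  14.20, 15.90, 18.20, 21.20, 4.30, 2.40, 2.70, 2.90])

def dispersion(slope):
    # Jaeckel's dispersion for the slope; the intercept drops out.
    e = calls - slope * year
    a = np.sqrt(12.0) * (np.arange(1, e.size + 1) / (e.size + 1.0) - 0.5)
    return np.sum(a[np.argsort(np.argsort(e))] * e)

slope = minimize_scalar(dispersion, bounds=(-1.0, 1.0), method="bounded").x
intercept = np.median(calls - slope * year)
print(slope, intercept)   # roughly .145 and -7.13, the Wilcoxon fit
```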
Figure 3.3.1 (about here): Panel A: Scatterplot of the Telephone Data, overlaid with the LS and Wilcoxon fits; Panel B: Internal LS studentized residual plot; Panel C: Internal Wilcoxon studentized residual plot; Panel D: Wilcoxon dispersion function.
Panel D of Figure 3.3.1 depicts the Wilcoxon dispersion function over the interval $(-.2, .6)$. Note that the Wilcoxon estimate $\hat{\beta}_R = .145$ is the minimizing value. Next consider the hypotheses $H_0\colon \beta = 0$ versus $H_A\colon \beta \neq 0$. The basis for the test statistic $F_\varphi$ can be read from this plot. The reduction in dispersion is given by $RD = D(0) - D(.145)$. Also, the gradient test of these hypotheses would be the negative of the slope of the dispersion function at $0$; i.e., $-D'(0)$.
Example 3.3.2. Baseball Salaries
As a large data set, we consider data on the salaries of professional baseball pitchers for the 1987 baseball season. This data set was taken from the data set on baseball salaries which was used in the 1988 ASA Graphics Section Poster Session. It can be obtained at the web site: http://lib.stat.cmu.edu/datasets. Our analysis concerns a subdata set of 176 pitchers, which can be obtained from the authors upon request. Our response variable is the 1987 beginning salary (in log dollars) of these pitchers. As predictors, we took the career summary statistics through the end of the 1986 season. The names of these variables are listed in Table 3.3.2. Panels A - G of Figure 3.3.2 show the scatter plots of the log of salary versus each of the predictors. Certainly the strongest predictor on the basis of these plots is log years; although, linearity in this plot is questionable.
Figure 3.3.2 (about here): Panels A - G: Plots of log-salary versus each of the predictors for the baseball data of Example 3.3.2; Panel H: Internal Wilcoxon studentized residual plot.
The internal Wilcoxon studentized residuals, (3.9.31), versus fitted values are displayed in Panel H of Figure 3.3.2. Based on Panels A and H, the pattern in the residual plot follows from the fact that log years is not a linear predictor. Better fitting models are pursued in Exercise 3.16.1. Note that there are several large outliers. The three identified outliers, circled points in Panel H, are interesting. These correspond to the pitchers Steve Carlton, Phil Niekro and Rick Sutcliffe. These were very good pitchers, but in 1987 they were at the end of their careers (21, 23, and 21 years of pitching, respectively); hence, they missed the rapid rise in baseball salaries. A diagnostic analysis (see Section 3.9 and Exercise 3.16.1) indicates a few mildly influential points, also. For illustration, though, we will consider the model that we fit. Table 3.3.2 also displays the estimated coefficients and their standard errors. The outliers impaired the LS fit somewhat. The LS estimate of $\sigma$ is $.515$, in comparison to the estimate of $\tau_\varphi$, which is $.388$.
Table 3.3.2: Predictors for Baseball Salaries of Pitchers and Their Estimated (Wilcoxon Fit) Coefficients

  Predictor                            Estimate   Stand. Error   t-ratio
  log Years in professional baseball     .839        .044         19.15
  Average wins per year                  .045        .028          1.63
  Average losses per year               -.024        .026          -.921
  Earned run average                    -.146        .070         -2.11
  Average games per year                -.006        .004         -1.60
  Average innings per year               .004        .003          1.62
  Average saves per year                 .012        .011          1.07
  Intercept                             4.22         .324
  Scale ($\hat{\tau}_\varphi$)           .388

Table 3.3.3: Wilcoxon ANOVA Table for $H_0\colon \beta = 0$

  Source       Reduction in Dispersion   df    Mean Reduction in Dispersion   F
  Regression          78.287              7              11.18                57.65
  Error                                  168               .194

Table 3.3.3 displays the robust ANOVA table for testing that all the coefficients, except the intercept, are 0. Based on the large value of $F_\varphi$, (3.2.20), the predictors are helpful in explaining the response. In particular, based on Table 3.3.2, the predictors years in professional baseball, earned run average, average innings per year, and average number of saves per year seem more important than the variables wins, losses, and games. These last three variables form a similar group of variables; hence, as an illustration of the rank-based statistic $F_\varphi$, the hypothesis that the coefficients for these three predictors are 0 was tested. The reduction in dispersion for this hypothesis is $RD = 1.24$, which leads to $F_\varphi = (RD/3)/(\hat{\tau}_\varphi/2) = 2.12$, which is significant at the 10% level. This confirms the above observations on the regression coefficients.
Example 3.3.3. Potency Data
This example is part of an $n = 34$ multivariate data set discussed in Chapter 6; see Table 6.6.2 for the data. The experiment concerned the potency of drug compounds which were manufactured under different levels of 4 factors. Here we shall consider only one of the response variables, POT2, which is the potency of a drug compound at the end of two weeks. The factors are: SAI, the amount of intragranular steric acid, which was set at the three levels $-1$, $0$ and $1$; SAE, the amount of extragranular steric acid, which was set at the three levels $-1$, $0$ and $1$; ADS, the amount of cross carmellose sodium, which was set at the three levels $-1$, $0$ and $1$; and TYPE of steric acid, which was set at two levels $-1$ and $1$. The initial potency of the compound, POT0, served as a covariate.
In Example 3.9.2 of Section 3.9 a residual analysis of this data set is performed. This analysis indicates that the model which includes the covariate, the linear terms of the factors, the simple two-way interaction terms of the factors, and the quadratic terms of the three factors SAE, SAI and ADS is adequate. The estimates for this model are displayed in Table 3.3.4.

Table 3.3.4: Wilcoxon and LS Estimates for the Potency Data

                                        Wilcoxon Estimates        LS Estimates
  Terms        Parameter                  Est.       SE           Est.       SE
  Intercept    $\alpha$                   7.184      2.96         5.998      4.50
  Linear       $\beta_1$                  0.072      0.05         0.000      0.08
               $\beta_2$                  0.023      0.05        -0.018      0.07
               $\beta_3$                  0.166      0.05         0.135      0.07
               $\beta_4$                  0.020      0.04        -0.011      0.05
  Two-way      $\beta_5$                  0.042      0.05         0.086      0.08
  Inter.       $\beta_6$                 -0.040      0.05         0.035      0.08
               $\beta_7$                  0.040      0.05         0.102      0.07
               $\beta_8$                 -0.085      0.06        -0.030      0.09
               $\beta_9$                  0.024      0.05         0.070      0.07
               $\beta_{10}$              -0.049      0.05        -0.011      0.07
  Quad.        $\beta_{11}$              -0.002      0.10         0.117      0.15
               $\beta_{12}$              -0.222      0.09        -0.240      0.13
               $\beta_{13}$               0.022      0.09        -0.007      0.14
  Covariate    $\beta_{14}$               0.092      0.31         0.217      0.47
  Scale ($\hat{\tau}$ or $\hat{\sigma}$)   .204                    .310
Let $x_j$ for $j = 1, \ldots, 4$ denote the level of the factors SAI, SAE, ADS, and TYPE, respectively, and let $c_i$ denote the value of the covariate. Then the model is expressed as
$$ y_i = \alpha + \beta_1 x_{1,i} + \beta_2 x_{2,i} + \beta_3 x_{3,i} + \beta_4 x_{4,i} + \beta_5 x_{1,i}x_{2,i} + \beta_6 x_{1,i}x_{3,i} + \beta_7 x_{1,i}x_{4,i} + \beta_8 x_{2,i}x_{3,i} + \beta_9 x_{2,i}x_{4,i} + \beta_{10} x_{3,i}x_{4,i} + \beta_{11} x^2_{1,i} + \beta_{12} x^2_{2,i} + \beta_{13} x^2_{3,i} + \beta_{14} c_i + e_i . \eqno(3.3.1) $$
The Wilcoxon and LS estimates of the regression coefficients and their standard errors are given in Table 3.3.4. The Wilcoxon estimates are more precise. As the diagnostic analysis of Example 3.9.2 shows, this is due to the outliers in this data set.
Note that the Wilcoxon estimate of the parameter $\beta_{13}$, the quadratic term of the factor ADS, is significant. Again referring to the residual analysis given in Example 3.9.2, there is some graphical evidence to retain the three quadratic coefficients in the model. In order to statistically confirm this evidence, we will test the hypotheses
$$ H_0\colon \beta_{12} = \beta_{13} = \beta_{14} = 0 \mbox{ versus } H_A\colon \beta_i \neq 0 \mbox{ for some } i = 12, 13, 14 . $$
The Wilcoxon test is summarized in Table 3.3.5 and it is based on the test statistic (3.2.18). The value of the test statistic is significant at the .05 level. The LS F-test statistic, though, has the value 1.19. As with its estimates of the regression coefficients, the LS F-test statistic has been impaired by the outliers.
Table 3.3.5: Wilcoxon ANOVA Table for $H_0\colon \beta_{12} = \beta_{13} = \beta_{14} = 0$

  Source            Reduction in Dispersion   df   Mean Reduction in Dispersion   F
  Quadratic Terms           .977               3              .326                3.20
  Error                                       19              .102
3.4 Assumptions for Asymptotic Theory
For the asymptotic theory developed in this chapter, certain assumptions on the distribution of the errors, the design matrix, and the scores are needed. The required assumptions for each section may differ, but for easy reference, we have placed them in this section.
The major assumption on the error density function $f$, for much of the rank-based analyses, is:
$$ \mbox{(E.1)} \quad f \mbox{ is absolutely continuous, } 0 < I(f) < \infty , \eqno(3.4.1) $$
where $I(f)$ denotes Fisher information, (2.4.16). Since $f$ is absolutely continuous, we can write
$$ f(s) - f(t) = \int_t^s f'(x)\,dx $$
for some function $f'$. An application of the Cauchy-Schwarz inequality yields
$$ |f(s) - f(t)| \leq I(f)^{1/2}\,|F(s) - F(t)|^{1/2} ; \eqno(3.4.2) $$
see Exercise 1.12.20. It follows from (3.4.2) that assumption (E.1) implies that $f$ is uniformly bounded and is uniformly continuous.
An assumption that will be used for analyses based on the $L_1$ norm is:
$$ \mbox{(E.2)} \quad f(\theta_e) > 0 , \eqno(3.4.3) $$
where $\theta_e$ denotes the median of the error distribution, i.e., $\theta_e = F^{-1}(1/2)$.
For easy reference, we list again the scale parameter $\tau_\varphi$, (2.5.23),
$$ \tau_\varphi = \left( \int \varphi(u)\varphi_f(u)\,du \right)^{-1} , \eqno(3.4.4) $$
where
$$ \varphi_f(u) = -\frac{f'(F^{-1}(u))}{f(F^{-1}(u))} . \eqno(3.4.5) $$
Under (E.1) the scale parameter $\tau_\varphi$ is well defined. Another scale parameter that will be needed is $\tau_S$, defined as
$$ \tau_S = (2f(\theta_e))^{-1} ; \eqno(3.4.6) $$
see (1.5.21). Note that it is well defined under Assumption (E.2).
As above, let $H = X(X'X)^{-1}X'$ denote the projection matrix onto $\Omega_F$, the column space of $X$. Our asymptotic theory assumes that the design matrix $X$ is imbedded in a sequence of design matrices which satisfy the next two properties. We should subscript quantities such as $X$ and the projection matrix with $n$ to show this, but as a matter of convenience we have not done so. We will subscript the leverage values $h_{nii}$, which are the diagonal entries of the projection matrix $H$. We will often impose the next two conditions on the design matrix:
$$ \mbox{(D.2)} \quad \lim_{n\to\infty} \max_{1\leq i\leq n} h_{nii} = 0 \eqno(3.4.7) $$
$$ \mbox{(D.3)} \quad \lim_{n\to\infty} n^{-1}X'X = \Sigma , \eqno(3.4.8) $$
where $\Sigma$ is a $p \times p$ positive definite matrix. The first condition has become known as Huber's condition. Huber (1981) showed that (D.2) is a necessary and sufficient design condition for the least squares estimates to have an asymptotic normal distribution provided the errors, $e_i$, are iid with finite variance. Condition (D.3) reduces to assumption (D.1), (2.4.7), of Chapter 2 for the two sample problem.
Another design condition is Noether's condition, which is given by
$$ \mbox{(N.1)} \quad \max_{1\leq i\leq n} \frac{x^2_{ik}}{\sum_{j=1}^n x^2_{jk}} \to 0 \quad \mbox{for all } k = 1, \ldots, p . \eqno(3.4.9) $$
Although this condition will be convenient, as the next lemma shows, it is implied by Huber's condition.
Lemma 3.4.1. (D.2) implies (N.1).
Proof: By the generalized Cauchy-Schwarz inequality (see Graybill, 1976, page 224), for all $i = 1, \ldots, n$ we have the following equalities:
$$ \sup_{\|\delta\|=1} \frac{\delta' x_i x_i' \delta}{\delta' X'X \delta} = x_i'(X'X)^{-1}x_i = h_{nii} . $$
Next, for $k = 1, \ldots, p$ take $\delta$ to be $\delta_k$, the $p \times 1$ vector of zeroes except for 1 in the $k$th component. Then the above equalities imply that
$$ \frac{x^2_{ik}}{\sum_{j=1}^n x^2_{jk}} \leq h_{nii} , \quad i = 1, \ldots, n, \ k = 1, \ldots, p . $$
Hence
$$ \max_{1\leq k\leq p} \max_{1\leq i\leq n} \frac{x^2_{ik}}{\sum_{j=1}^n x^2_{jk}} \leq \max_{1\leq i\leq n} h_{nii} . $$
Therefore Huber's condition implies Noether's condition.
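For a concrete check of these design conditions on a given sequence of designs, one can compute the leverages directly; the following Python sketch (ours, for illustration) computes $\max_i h_{nii}$ and shows its decay for one simulated design.

```python
# A sketch of checking Huber's condition (D.2) empirically: compute the
# leverage values h_nii (diagonal of H = X (X'X)^{-1} X') and examine
# their maximum, which (D.2) requires to tend to 0 with n.
import numpy as np

def max_leverage(X):
    H = X @ np.linalg.solve(X.T @ X, X.T)
    return float(np.max(np.diag(H)))

rng = np.random.default_rng(0)
for n in (50, 500, 5000):
    X = rng.standard_normal((n, 3))
    print(n, max_leverage(X))   # decreases as n grows for this design
```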
As in Chapter 2, we will often assume that the score generating function $\varphi(u)$ satisfies assumption (2.5.5). We will in addition assume that it is bounded. For reference, we will assume that $\varphi(u)$ is a function defined on $(0, 1)$ such that
$$ \mbox{(S.1)} \quad \varphi(u) \mbox{ is a nondecreasing, square-integrable, and bounded function with } \int_0^1 \varphi(u)\,du = 0 \mbox{ and } \int_0^1 \varphi^2(u)\,du = 1 . \eqno(3.4.10) $$
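As a quick numerical illustration (our own check, not from the text), the Wilcoxon score function $\varphi(u) = \sqrt{12}(u - 1/2)$ satisfies both integral conditions in (S.1):

```python
# A numerical check of (S.1) for the Wilcoxon score function.
import numpy as np
from scipy.integrate import quad

phi = lambda u: np.sqrt(12.0) * (u - 0.5)
print(quad(phi, 0.0, 1.0)[0])                     # approximately 0
print(quad(lambda u: phi(u) ** 2, 0.0, 1.0)[0])   # approximately 1
```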
Occasionally we will need further assumptions on the score function. In Section 3.7, we will need to assume that
$$ \mbox{(S.2)} \quad \varphi \mbox{ is differentiable} . \eqno(3.4.11) $$
When estimating the intercept parameter based on signed-rank scores, we need to assume that the score function is odd about $\frac{1}{2}$, i.e.,
$$ \mbox{(S.3)} \quad \varphi(1 - u) = -\varphi(u) ; \eqno(3.4.12) $$
see, also, (2.5.5).
3.5 Theory of Rank-Based Estimates
Consider the linear model given by (3.2.3). To avoid confusion, we will denote the true vector of parameters by $(\alpha_0, \beta_0')'$; that is, the true model is $Y = 1\alpha_0 + X\beta_0 + e$. In this section we will derive the asymptotic theory for the R-analysis, estimation and testing, under the assumptions (E.1), (D.2), (D.3), and (S.1). We will occasionally suppress the subscripts $\varphi$ and $R$ from the notation. For example, we will denote the R-estimate by simply $\hat{\beta}$.
3.5.1 R-Estimators of the Regression Coefficients
A key result for both estimation and testing concerns the gradient $S(Y - X\beta)$, (3.2.12). We first derive its mean and covariance matrix and then obtain its asymptotic distribution.
Theorem 3.5.1. Under Model (3.2.3),
$$ E[S(Y - X\beta_0)] = 0 \quad \mbox{and} \quad V[S(Y - X\beta_0)] = \sigma_a^2 X'X , \quad \mbox{where } \sigma_a^2 = (n-1)^{-1}\sum_{i=1}^n a^2(i) \doteq 1 . $$
Proof: Note that $S(Y - X\beta_0) = X'a(R(e))$. Under Model (3.2.3), $e_1, \ldots, e_n$ are iid; hence, the $i$th component $a(R(e_i))$ has mean
$$ E[a(R(e_i))] = \sum_{j=1}^n a(j)n^{-1} = 0 , $$
from which the result for the expectation follows.
For the result on the variance-covariance matrix, note that $V[S(Y - X\beta_0)] = X'V[a(R(e))]X$. The diagonal entries of the covariance matrix on the RHS are
$$ V[a(R(e_i))] = E\left[a^2(R(e_i))\right] = \sum_{j=1}^n a(j)^2 n^{-1} = \frac{n-1}{n}\sigma_a^2 . $$
The off-diagonal entries are the covariances given by
$$ \mathrm{cov}(a(R(e_i)), a(R(e_l))) = E[a(R(e_i))a(R(e_l))] = \sum_{j=1}^n \sum_{k \neq j} a(j)a(k)(n(n-1))^{-1} = -(n(n-1))^{-1}\sum_{j=1}^n a^2(j) = -\sigma_a^2/n , \eqno(3.5.1) $$
where the third step in the derivation follows from $0 = \left(\sum_{j=1}^n a(j)\right)^2$. The result of the theorem is obtained directly from these variances and covariances; since the design is centered, $X'\left(I - n^{-1}11'\right)X = X'X$.
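The moments in Theorem 3.5.1 are easy to verify by simulation; the following small Monte Carlo sketch (ours, using Wilcoxon scores and a simulated centered design) compares the empirical covariance of $S$ with $\sigma_a^2 X'X$.

```python
# A Monte Carlo sketch checking Theorem 3.5.1 under the null model.
import numpy as np

rng = np.random.default_rng(1)
n, p, reps = 60, 2, 4000
X = rng.standard_normal((n, p))
X -= X.mean(axis=0)                   # centered design, as in (3.2.3)
a = np.sqrt(12.0) * (np.arange(1, n + 1) / (n + 1.0) - 0.5)
S = np.empty((reps, p))
for r in range(reps):
    e = rng.standard_normal(n)        # iid errors
    S[r] = X.T @ a[np.argsort(np.argsort(e))]
print(S.mean(axis=0))                 # approximately 0
print(np.cov(S.T))                    # compare with sigma_a^2 X'X below
print(np.sum(a ** 2) / (n - 1) * X.T @ X)
```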
Under (D.3), we have that
$$ V\left[n^{-1/2}S(Y - X\beta_0)\right] \to \Sigma . \eqno(3.5.2) $$
This anticipates our next result.
Theorem 3.5.2. Under the Model (3.2.3), (E.1), (D.2), (D.3), and (S.1) in Section 3.4,
$$ n^{-1/2}S(Y - X\beta_0) \stackrel{D}{\to} N_p(0, \Sigma) . \eqno(3.5.3) $$
Proof: Let $S(0) = S(Y - X\beta_0)$ and let $T(0) = X'\varphi(F(Y - X\beta_0))$. Under the above assumptions, the discussion around Theorem A.3.1 of the appendix shows that $(T(0) - S(0))/\sqrt{n}$ converges to 0 in probability. Hence we need only show that $T(0)/\sqrt{n}$ converges to the intended distribution. Letting $W_n = n^{-1/2}t'T(e)$, where $t \neq 0$ is an arbitrary $p \times 1$ vector, it suffices to show that $W_n$ converges in distribution to a $N(0, t'\Sigma t)$ distribution. Note that we can write $W_n$ as
$$ W_n = n^{-1/2}\sum_{k=1}^n t'x_k\,\varphi(F(e_k)) . \eqno(3.5.4) $$
Since $F$ is the distribution function of $e_k$, it follows from $\int \varphi\,du = 0$ that $E[W_n] = 0$, and from $\int \varphi^2\,du = 1$ and (D.3) that
$$ V[W_n] = n^{-1}\sum_{k=1}^n (t'x_k)^2 = t'\left(n^{-1}X'X\right)t \to t'\Sigma t > 0 . \eqno(3.5.5) $$
Since $W_n$ is a sum of independent random variables which are not identically distributed, we establish the limit distribution by the Lindeberg-Feller Central Limit Theorem; see Theorem A.1.1 of the Appendix. In the notation of this theorem, let $B_n^2 = V[W_n]$. By (3.5.5), $B_n^2$ converges to a positive real number. We need to show
$$ \lim B_n^{-2}\sum_{k=1}^n E\left[\frac{1}{n}(x_k't)^2\varphi^2(F(e_k))\, I\left(\frac{1}{\sqrt{n}}\left|(x_k't)\varphi(F(e_k))\right| > \epsilon B_n\right)\right] = 0 . \eqno(3.5.6) $$
The key is the factor $n^{-1/2}(x_k't)$ in the indicator function. By the Cauchy-Schwarz inequality and (D.2), we have the string of inequalities:
$$ n^{-1/2}|x_k't| \leq n^{-1/2}\|x_k\|\|t\| = \left(n^{-1}\sum_{j=1}^p x^2_{kj}\right)^{1/2}\|t\| \leq \left(p \max_j n^{-1}x^2_{kj}\right)^{1/2}\|t\| . \eqno(3.5.7) $$
By assumptions (D.2) and (D.3), it follows that the quantity in brackets in equation (3.5.7), and hence $n^{-1/2}|x_k't|$, converges to zero as $n \to \infty$. Call the term on the right side of equation (3.5.7) $M_n$. Note that it does not depend on $k$ and $M_n \to 0$. From this string of inequalities, the limit on the left side of (3.5.6) is less than or equal to
$$ \lim B_n^{-2}\ \lim E\left[\varphi^2(F(e_1))\, I\left(|\varphi(F(e_1))| > \frac{\epsilon B_n}{M_n}\right)\right]\ \lim n^{-1}\sum_{k=1}^n (x_k't)^2 . $$
The first and third limits are positive reals. For the second limit, note that the random variable inside the expectation is bounded; hence, by the Lebesgue Dominated Convergence Theorem we can interchange the limit and expectation. Since $\epsilon B_n/M_n \to \infty$, the expectation goes to 0 and our desired result is obtained.
Similar to Chapter 2, Exercise 3.16.9 obtains the proof of the above theorem for the special case of the Wilcoxon scores by first getting the projection of the statistic $W_n$.
Note from this theorem we have the gradient test that all the regression coefficients are 0; that is, $H_0\colon \beta = 0$ versus $H_A\colon \beta \neq 0$. Consider the test statistic
$$ T = \sigma_a^{-2}\, S(Y)'(X'X)^{-1}S(Y) . \eqno(3.5.8) $$
From the last theorem, an approximate level $\alpha$ test for $H_0$ versus $H_A$ is:
$$ \mbox{Reject } H_0 \mbox{ in favor of } H_A \mbox{ if } T \geq \chi^2(\alpha, p) , \eqno(3.5.9) $$
where $\chi^2(\alpha, p)$ denotes the upper level $\alpha$ critical value of the $\chi^2$-distribution with $p$ degrees of freedom.
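A sketch of this gradient test in Python (ours, assuming Wilcoxon scores and a centered design; illustrative only):

```python
# A sketch of the gradient test (3.5.8)-(3.5.9) with Wilcoxon scores.
import numpy as np
from scipy.stats import chi2

def gradient_test(X, y, alpha=0.05):
    n, p = X.shape
    a = np.sqrt(12.0) * (np.arange(1, n + 1) / (n + 1.0) - 0.5)
    S = X.T @ a[np.argsort(np.argsort(y))]    # S(Y) = X' a(R(Y))
    sigma2_a = np.sum(a ** 2) / (n - 1.0)     # sigma_a^2, approximately 1
    T = S @ np.linalg.solve(X.T @ X, S) / sigma2_a
    return T, T >= chi2.ppf(1.0 - alpha, p)   # statistic, reject H_0?
```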
Theorem A.3.8 of the Appendix gives the following linearity result for the process $S(\beta_n)$:
$$ \frac{1}{\sqrt{n}}S(\beta_n) = \frac{1}{\sqrt{n}}S(\beta_0) - \tau_\varphi^{-1}\Sigma\sqrt{n}(\beta_n - \beta_0) + o_p(1) , \eqno(3.5.10) $$
for $\sqrt{n}(\beta_n - \beta_0) = O(1)$, where the scale parameter $\tau_\varphi$ is given by (3.4.4). Recall that we have made use of this result in Section 2.5 when we showed that the two sample location process under general scores functions is Pitman regular. If we integrate the RHS of this
result, we obtain a locally smooth approximation of the dispersion function $D(\beta_n)$, which is given by the following quadratic function:
$$ Q(Y - X\beta) = (2\tau_\varphi)^{-1}(\beta - \beta_0)'X'X(\beta - \beta_0) - (\beta - \beta_0)'S(Y - X\beta_0) + D(Y - X\beta_0) . \eqno(3.5.11) $$
Note that $Q$ depends on $\tau_\varphi$ and $\beta_0$, so it cannot be used to estimate $\beta$. As we will show, the function $Q$ is quite useful for establishing asymptotic properties of the R-estimates and test statistics. As discussed in Section 3.7.3, it also leads to a Gauss-Newton type algorithm for obtaining R-estimates.
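For instance, starting from a current fit $\beta^{(k)}$, the quadratic approximation suggests the step $\beta^{(k+1)} = \beta^{(k)} + \hat{\tau}_\varphi (X'X)^{-1} S(Y - X\beta^{(k)})$. A Python sketch of one such step follows (ours; $\hat{\tau}_\varphi$ is assumed given, since its estimation is the subject of Section 3.7):

```python
# One Gauss-Newton style step based on the quadratic approximation;
# tau_hat is assumed to be supplied; Wilcoxon scores are used.
import numpy as np

def newton_step(beta, X, y, tau_hat):
    e = y - X @ beta
    n = e.size
    a = np.sqrt(12.0) * (np.arange(1, n + 1) / (n + 1.0) - 0.5)
    S = X.T @ a[np.argsort(np.argsort(e))]   # gradient S(Y - X beta)
    return beta + tau_hat * np.linalg.solve(X.T @ X, S)
```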
The following theorem shows that $Q$ provides a local approximation to $D$. This is an asymptotic quadraticity result which was proved by Jaeckel (1972). It in turn is based on an asymptotic linearity result derived by Jurečková (1971) and displayed above, (3.5.10). It is proved in the Appendix; see Theorem A.3.8.
Theorem 3.5.3. Under the Model (3.2.3) and the assumptions (E.1), (D.1), (D.2), and (S.1) of Section 3.4, for any $\epsilon > 0$ and $c > 0$,
$$ P\left[\max_{\|\beta - \beta_0\| < c/\sqrt{n}} |D(Y - X\beta) - Q(Y - X\beta)| \geq \epsilon\right] \to 0 , \eqno(3.5.12) $$
as $n \to \infty$.
We will use this result to obtain the asymptotic distribution of the R-estimate. Without loss of generality, assume that the true $\beta_0 = 0$. Then we can write $Q(Y - X\beta) = (2\tau_\varphi)^{-1}\beta'X'X\beta - \beta'S(Y) + D(Y)$. Because $Q$ is a quadratic function, it follows from differentiation that it is minimized by
$$ \tilde{\beta} = \tau_\varphi(X'X)^{-1}S(Y) . \eqno(3.5.13) $$
Hence, $\tilde{\beta}$ is a linear function of $S(Y)$. Thus we immediately have from Theorem 3.5.2:
Theorem 3.5.4. Under the Model (3.2.3), (E.1), (D.1), (D.2) and (S.1) in Section 3.4,
$$ \sqrt{n}(\tilde{\beta} - \beta_0) \stackrel{D}{\to} N_p(0, \tau^2_\varphi\Sigma^{-1}) . \eqno(3.5.14) $$
Since $Q$ is a local approximation to $D$, it would seem that their minimizing values are close also. As the next result shows, this indeed is the case. The proof first appeared in Jaeckel (1972) and is sketched in the Appendix; see Theorem A.3.9.
Theorem 3.5.5. Under the Model (3.2.3), (E.1), (D.1), (D.2) and (S.1) in Section 3.4,
$$ \sqrt{n}(\hat{\beta} - \tilde{\beta}) \stackrel{P}{\to} 0 . $$
Combining this last result with Theorem 3.5.4, we get the next corollary, which gives the asymptotic distribution of the R-estimate.
Corollary 3.5.1. Under the Model (3.2.3), (E.1), (D.1), (D.2) and (S.1),
$$ \sqrt{n}(\hat{\beta} - \beta_0) \stackrel{D}{\to} N_p(0, \tau^2_\varphi\Sigma^{-1}) . \eqno(3.5.15) $$
Under the further restriction that the errors have finite variance $\sigma^2$, Exercise 3.16.10 shows that the least squares estimate $\hat{\beta}_{LS}$ of $\beta$ satisfies $\sqrt{n}(\hat{\beta}_{LS} - \beta_0) \stackrel{D}{\to} N_p(0, \sigma^2\Sigma^{-1})$. Hence, as in the location problems of Chapters 1 and 2, the asymptotic relative efficiency between the R-estimates and least squares is the ratio $\sigma^2/\tau^2_\varphi$, where $\tau_\varphi$ is the scale parameter (3.4.4). Thus the R-estimates of regression coefficients have the same high efficiency relative to LS estimates as do the rank-based estimates in the location problem. In particular, the efficiency of the Wilcoxon estimates relative to the LS estimates at the normal distribution is .955. For longer tailed error distributions this relative efficiency is much higher; see the efficiency discussion for contaminated normal distributions in Example 1.7.1.
From the above corollary, R-estimates are asymptotically unbiased. It follows from the invariance properties, if we additionally assume that the errors have a symmetric distribution, that R-estimates are unbiased for all sample sizes; see Exercise 3.16.11 for details.
The random vector $\tilde{\beta}$, (3.5.13), is an asymptotic representation of the R-estimate $\hat{\beta}$. The following representation will be useful later:
Corollary 3.5.2. Under the Model (3.2.3), (E.1), (D.1), (D.2) and (S.1) in Section 3.4,
$$ n^{1/2}(\hat{\beta} - \beta_0) = \tau_\varphi\left(n^{-1}X'X\right)^{-1}n^{-1/2}X'\varphi(F(Y - X\beta_0)) + o_p(1) , \eqno(3.5.16) $$
where the notation $\varphi(F(Y))$ means the $n \times 1$ vector whose $i$th component is $\varphi(F(Y_i))$.
Proof: This follows immediately from (A.3.9), (A.3.10), the proof of Theorem 3.5.2, and equation (3.5.13).
Based on this last corollary, we have that the influence function of the R-estimate is given by
$$ \Omega(x_0, y_0; \hat{\beta}_\varphi) = \tau_\varphi\Sigma^{-1}\varphi(F(y_0))x_0 . \eqno(3.5.17) $$
A more rigorous derivation of this result, based on Fréchet derivatives, is given in the Appendix; see Section A.5.2. Note that the influence function is bounded in the $Y$-space but it is unbounded in the $x$-space. Hence an outlier in the $x$-space can seriously impair an R-estimate. Although, as noted above, the R-estimates are highly efficient relative to the LS estimates, it follows from its influence function that the breakdown of the R-estimate is 0. In Section 3.12, we present the HBR estimates, whose influence function is bounded in both spaces and which can attain 50% breakdown; although, it is less efficient than the R-estimate.
3.5.2 R-Estimates of the Intercept
As discussed in Section 3.2, the intercept parameter requires the specification of a location functional, $T(e_i)$. In this section we shall take $T(e_i) = \mathrm{med}(e_i)$. Since we assume, without loss of generality, that $T(e_i) = 0$, $\alpha = T(Y_i - x_i'\beta)$. This leads immediately to estimating $\alpha$ by the median of the R-residuals. Note that this is analogous to LS, since the LS estimate of the intercept is the arithmetic average of the LS residuals. Further, this estimate is associated with the sign test statistic and the $L_1$ norm. More generally, we could also consider estimates associated with signed-rank test statistics. For example, if we consider the signed-rank Wilcoxon scores of Chapter 1, then the corresponding estimate is the median of the Walsh averages of the residuals. The theory of such estimates based on signed-rank tests, though, requires symmetrically distributed errors. Thus, while we briefly discuss these later, we now concentrate on the median of the residuals, which does not require this symmetry assumption. We will make use of Assumption (E.2), (3.4.3), i.e., $f(0) > 0$.
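In code, the estimate is immediate once $\hat{\beta}_\varphi$ is in hand (a sketch, with `betahat` assumed already computed):

```python
# A sketch: the intercept estimate as the median of the R-residuals.
import numpy as np

def intercept_estimate(X, y, betahat):
    return float(np.median(y - X @ betahat))   # med{ y_i - x_i' betahat }
```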
The process we consider is the sign process based on residuals given by
$$ S_1(Y - \alpha 1 - X\hat{\beta}_\varphi) = \sum_{i=1}^n \mathrm{sgn}(Y_i - \alpha - x_i'\hat{\beta}_\varphi) . \eqno(3.5.18) $$
As with the sign process in Chapter 1, this process is a nonincreasing step function of $\alpha$ which steps down at the residuals. The solution to the equation
$$ S_1(Y - \alpha 1 - X\hat{\beta}_\varphi) \doteq 0 \eqno(3.5.19) $$
is the median of the residuals, which we shall denote by $\hat{\alpha}_S = \mathrm{med}\{Y_i - x_i'\hat{\beta}_\varphi\}$. Our goal is to obtain the asymptotic joint distribution of the estimate $\hat{b}_\varphi = (\hat{\alpha}_S, \hat{\beta}_\varphi')'$.
Similar to the R-estimate of $\beta$, the estimate of the intercept is location and scale equivariant; hence, without loss of generality, we will assume that the true intercept and regression parameters are 0. We begin with a lemma.
Lemma 3.5.1. Assume conditions (E.1), (E.2), (S.1), (D.1) and (D.2) of Section 3.4. For any $\epsilon > 0$ and for any $a \in R$,
$$ \lim_{n\to\infty} P\left[\left|S_1(Y - an^{-1/2}1 - X\hat{\beta}_\varphi) - S_1(Y - an^{-1/2}1)\right| \geq \epsilon\sqrt{n}\right] = 0 . $$
The proof of this lemma was first given by Jurečková (1971) for general signed-rank scores, and it is briefly sketched in the Appendix for the sign scores; see Lemma A.3.2. This lemma leads to the asymptotic linearity result for the process (3.5.18).
We need the following linearity result:
Theorem 3.5.6. Assume conditions (E.1), (E.2), (S.1), (D.1) and (D.2) of Section 3.4. For any $\epsilon > 0$ and $c > 0$,
$$ \lim_{n\to\infty} P\left[\sup_{|a|\leq c}\left|n^{-1/2}S_1(Y - an^{-1/2}1 - X\hat{\beta}_\varphi) - n^{-1/2}S_1(Y - X\hat{\beta}_\varphi) + a\tau_S^{-1}\right| \geq \epsilon\right] = 0 , $$
where $\tau_S$ is the scale parameter defined in expression (3.4.6).
Proof: For any fixed $a$, write
$$ \left|n^{-1/2}S_1(Y - an^{-1/2}1 - X\hat{\beta}_\varphi) - n^{-1/2}S_1(Y - X\hat{\beta}_\varphi) + a\tau_S^{-1}\right| $$
$$ \leq \left|n^{-1/2}S_1(Y - an^{-1/2}1 - X\hat{\beta}_\varphi) - n^{-1/2}S_1(Y - an^{-1/2}1)\right| $$
$$ + \left|n^{-1/2}S_1(Y - an^{-1/2}1) - n^{-1/2}S_1(Y) + a\tau_S^{-1}\right| + \left|n^{-1/2}S_1(Y) - n^{-1/2}S_1(Y - X\hat{\beta}_\varphi)\right| . $$
We can apply Lemma 3.5.1 to the first and third terms on the right side of the above inequality. For the middle term we can use the asymptotic linearity result in Chapter 1 for the sign process, (1.5.22). This yields the result for any $a$, and the sup will follow from the monotonicity of the process, similar to the proof of Theorem 1.5.6 of Chapter 1.
Letting $a = 0$ in Lemma 3.5.1, we have that the difference $n^{-1/2}S_1(Y - X\hat{\beta}_\varphi) - n^{-1/2}S_1(Y)$ goes to zero in probability. Thus the asymptotic distribution of $n^{-1/2}S_1(Y - X\hat{\beta}_\varphi)$ is the same as that of $n^{-1/2}S_1(Y)$, namely, $N(0, 1)$. We have two applications of these results. The first is found in the next lemma.
Lemma 3.5.2. Assume conditions (E.1), (E.2), (D.1), (D.2), and (S.1) of Section 3.4. The random variable $n^{1/2}\hat{\alpha}_S$ is bounded in probability.
Proof: Let $\epsilon > 0$ be given. Since $n^{-1/2}S_1(Y - X\hat{\beta}_\varphi)$ is asymptotically $N(0, 1)$, there exists a $c < 0$ such that
$$ P\left[n^{-1/2}S_1(Y - X\hat{\beta}_\varphi) < c\right] < \frac{\epsilon}{2} . \eqno(3.5.20) $$
Take $c^* = \tau_S(c - \epsilon)$. By the process's monotonicity and the definition of $\hat{\alpha}_S$, we have the implication $n^{1/2}\hat{\alpha}_S < c^* \Rightarrow n^{-1/2}S_1(Y - c^*n^{-1/2}1 - X\hat{\beta}_\varphi) \leq 0$. Adding in and subtracting out the above linearity result leads to
$$ P\left[n^{1/2}\hat{\alpha}_S < c^*\right] \leq P\left[n^{-1/2}S_1(Y - n^{-1/2}c^*1 - X\hat{\beta}_\varphi) \leq 0\right] $$
$$ \leq P\left[\left|n^{-1/2}S_1(Y - c^*n^{-1/2}1 - X\hat{\beta}_\varphi) - \left(n^{-1/2}S_1(Y - X\hat{\beta}_\varphi) - c^*\tau_S^{-1}\right)\right| \geq \epsilon\right] $$
$$ + P\left[n^{-1/2}S_1(Y - X\hat{\beta}_\varphi) - c^*\tau_S^{-1} < \epsilon\right] . \eqno(3.5.21) $$
The first term on the right side can be made less than $\epsilon/2$ for sufficiently large $n$, whereas the second term is (3.5.20). From this it follows that $n^{1/2}\hat{\alpha}_S$ is bounded below in probability. To finish the proof, a similar argument shows that $n^{1/2}\hat{\alpha}_S$ is bounded above in probability.
As a second application, we can write the linearity result of the last theorem as
$$ n^{-1/2}S_1(Y - an^{-1/2}1 - X\hat{\beta}_\varphi) = n^{-1/2}S_1(Y) - a\tau_S^{-1} + o_p(1) , \eqno(3.5.22) $$
uniformly for all $|a| \leq c$ and for $c > 0$.
Because $\hat{\alpha}_S$ is a solution to equation (3.5.19) and $n^{1/2}\hat{\alpha}_S$ is bounded in probability, the second linearity result, (3.5.22), yields, after some simplification, the following asymptotic representation of our result for the estimate of the intercept for the true intercept $\alpha_0$:
$$ n^{1/2}(\hat{\alpha}_S - \alpha_0) = \tau_S\, n^{-1/2}\sum_{i=1}^n \mathrm{sgn}(Y_i - \alpha_0) + o_p(1) , \eqno(3.5.23) $$
where $\tau_S$ is given in (3.4.6). From this we have that $n^{1/2}(\hat{\alpha}_S - \alpha_0) \stackrel{D}{\to} N(0, \tau^2_S)$. Our interest, though, is in the joint distribution of $\hat{\alpha}_S$ and $\hat{\beta}_\varphi$.
By Corollary 3.5.2, the corresponding asymptotic representation of $\hat{\beta}_\varphi$ for the true vector of regression coefficients $\beta_0$ is
$$ n^{1/2}(\hat{\beta}_\varphi - \beta_0) = \tau_\varphi\left(n^{-1}X'X\right)^{-1}n^{-1/2}X'\varphi(F(Y)) + o_p(1) , \eqno(3.5.24) $$
where $\tau_\varphi$ is given by (3.4.4). The joint asymptotic distribution is given in the following theorem.
Theorem 3.5.7. Under (D.1), (D.2), (S.1), (E.1) and (E.2) in Section 3.4,
$$ \hat{b}_\varphi = \left(\begin{array}{c}\hat{\alpha}_S\\ \hat{\beta}_\varphi\end{array}\right) \mbox{ has an approximate } N_{p+1}\left(\left(\begin{array}{c}\alpha_0\\ \beta_0\end{array}\right), \left[\begin{array}{cc}n^{-1}\tau^2_S & 0'\\ 0 & \tau^2_\varphi(X'X)^{-1}\end{array}\right]\right) \mbox{ distribution.} $$
Proof: As above, assume without loss of generality that the true parameters are 0. It is easier to work with the random vector $T_n = \left(\tau_S^{-1}\sqrt{n}\,\hat{\alpha}_S,\ \left(\tau_\varphi^{-1}\sqrt{n}\,(n^{-1}X'X)\hat{\beta}_\varphi\right)'\right)'$. Let $t = (t_1, t_2')'$ be an arbitrary, nonzero vector in $R^{p+1}$. We need only show that $Z_n = t'T_n$ has an asymptotically univariate normal distribution. Based on the above asymptotic representations of $\hat{\alpha}_S$, (3.5.23), and $\hat{\beta}_\varphi$, (3.5.24), we have
$$ Z_n = n^{-1/2}\sum_{k=1}^n \left(t_1\,\mathrm{sgn}(Y_k) + (t_2'x_k)\varphi(F(Y_k))\right) + o_p(1) . \eqno(3.5.25) $$
Denote the sum on the right side of (3.5.25) as $Z_n^*$. We need only show that $Z_n^*$ converges in distribution to a univariate normal distribution. Denote the $k$th summand as $Z_{nk}^*$. We shall use the Lindeberg-Feller Central Limit Theorem. Our application of this theorem is similar to its use in the proof of Theorem 3.5.2. First note that, since the score function $\varphi$ is standardized ($\int\varphi = 0$), $E(Z_n^*) = 0$. Let $B_n^2 = \mathrm{Var}(Z_n^*)$. Because the individual summands are independent, the $Y_k$ are identically distributed, $\varphi$ is standardized ($\int\varphi^2 = 1$), and the design is centered, $B_n^2$ simplifies to
$$ B_n^2 = n^{-1}\left(\sum_{k=1}^n t_1^2 + \sum_{k=1}^n (t_2'x_k)^2 + 2t_1\,\mathrm{cov}\left(\mathrm{sgn}(Y_1), \varphi(F(Y_1))\right)\,t_2'\sum_{k=1}^n x_k\right) = t_1^2 + t_2'\left(n^{-1}X'X\right)t_2 + 0 . $$
Hence by (D.3),
$$ \lim_{n\to\infty} B_n^2 = t_1^2 + t_2'\Sigma t_2 , \eqno(3.5.26) $$
which is a positive number. To satisfy the Lindeberg-Feller condition, we need to show that for any $\epsilon > 0$,
$$ \lim_{n\to\infty} B_n^{-2}\sum_{k=1}^n E\left[Z_{nk}^{*2}\, I\left(|Z_{nk}^*| > \epsilon B_n\right)\right] = 0 . \eqno(3.5.27) $$
Since $B_n^2$ converges to a positive constant, we need only show that the sum converges to 0. By the triangle inequality, we can show that the indicator function satisfies
$$ I\left(n^{-1/2}|t_1| + n^{-1/2}|t_2'x_k|\,|\varphi(F(Y_k))| > \epsilon B_n\right) \geq I\left(|Z_{nk}^*| > \epsilon B_n\right) . \eqno(3.5.28) $$
Following the discussion after expression (3.5.7), we have that $n^{-1/2}|t_2'x_k| \leq M_n$, where $M_n$ is independent of $k$ and, furthermore, $M_n \to 0$. Hence, we have
$$ I\left(|\varphi(F(Y_k))| > \frac{\epsilon B_n - n^{-1/2}|t_1|}{M_n}\right) \geq I\left(n^{-1/2}|t_1| + n^{-1/2}|t_2'x_k|\,|\varphi(F(Y_k))| > \epsilon B_n\right) . \eqno(3.5.29) $$
Thus the sum in expression (3.5.27) is less than or equal to
$$ \sum_{k=1}^n E\left[Z_{nk}^{*2}\, I\left(|\varphi(F(Y_k))| > \frac{\epsilon B_n - n^{-1/2}|t_1|}{M_n}\right)\right] = t_1^2\, E\left[I\left(|\varphi(F(Y_1))| > \frac{\epsilon B_n - n^{-1/2}|t_1|}{M_n}\right)\right] $$
$$ + (2t_1/n)\, E\left[\mathrm{sgn}(Y_1)\varphi(F(Y_1))\, I\left(|\varphi(F(Y_1))| > \frac{\epsilon B_n - n^{-1/2}|t_1|}{M_n}\right)\right] t_2'\sum_{k=1}^n x_k $$
$$ + E\left[\varphi^2(F(Y_1))\, I\left(|\varphi(F(Y_1))| > \frac{\epsilon B_n - n^{-1/2}|t_1|}{M_n}\right)\right](1/n)\sum_{k=1}^n (t_2'x_k)^2 . $$
Because the design is centered, the middle term on the right side is 0. As remarked above, the term $(1/n)\sum_{k=1}^n (t_2'x_k)^2 = (1/n)t_2'X'Xt_2$ converges to a positive constant. In the expression $(\epsilon B_n - n^{-1/2}|t_1|)/M_n$, the numerator converges to a positive constant while the denominator converges to 0; hence, the expression goes to $\infty$. Therefore, since $\varphi$ is bounded, the indicator function converges to 0. Again using the boundedness of $\varphi$, we can interchange limit and expectation by the Lebesgue Dominated Convergence Theorem. Thus condition (3.5.27) is true and, hence, $Z_n^*$ converges in distribution to a univariate normal distribution. Therefore, $T_n$ converges to a multivariate normal distribution. Note by (3.5.26) it follows that the asymptotic covariance of $\hat{b}_\varphi$ is the result displayed in the theorem.
In the above development, we considered the centered design. In practice, though, we are often concerned with an uncentered design. Let $\alpha^*$ denote the intercept for the uncentered model. Then $\alpha^* = \alpha - \bar{x}'\beta$, where $\bar{x}$ denotes the vector of column averages of the uncentered design matrix. An estimate of $\alpha^*$ based on R-estimates is given by $\hat{\alpha}^*_S = \hat{\alpha}_S - \bar{x}'\hat{\beta}_\varphi$. Based on the last theorem, it follows (Exercise 3.16.14) that
$$ \left(\begin{array}{c}\hat{\alpha}^*_S\\ \hat{\beta}_\varphi\end{array}\right) \mbox{ is approximately } N_{p+1}\left(\left(\begin{array}{c}\alpha^*_0\\ \beta_0\end{array}\right), \left[\begin{array}{cc}\kappa_n & -\tau^2_\varphi\bar{x}'(X'X)^{-1}\\ -\tau^2_\varphi(X'X)^{-1}\bar{x} & \tau^2_\varphi(X'X)^{-1}\end{array}\right]\right) , \eqno(3.5.30) $$
where $\kappa_n = n^{-1}\tau^2_S + \tau^2_\varphi\bar{x}'(X'X)^{-1}\bar{x}$, and $\tau_S$ and $\tau_\varphi$ are given respectively by (3.4.6) and (3.4.4).
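For example, standard errors for the uncentered-model estimates can be read off the diagonal of this covariance matrix; a Python sketch under the stated approximations (with estimates of $\tau_S$ and $\tau_\varphi$ assumed supplied from elsewhere):

```python
# A sketch of standard errors based on (3.5.30); Xc is the centered
# design, xbar the column means of the raw design, and taus, tauphi
# are assumed to be estimated elsewhere (see Section 3.7).
import numpy as np

def standard_errors(Xc, xbar, taus, tauphi):
    n = Xc.shape[0]
    XtX_inv = np.linalg.inv(Xc.T @ Xc)
    var_alpha = taus**2 / n + tauphi**2 * xbar @ XtX_inv @ xbar
    var_beta = tauphi**2 * np.diag(XtX_inv)
    return np.sqrt(var_alpha), np.sqrt(var_beta)
```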
Intercept Estimate Based on Signed-Rank Scores
Suppose we additionally assume that the errors have a symmetric distribution; i.e., $f(-x) = f(x)$. In this case, all location functionals are the same. Let $\varphi_f(u) = -f'(F^{-1}(u))/f(F^{-1}(u))$ denote the optimal scores for the density $f(x)$. Then, as Exercise 3.16.12 shows, $\varphi_f(1-u) = -\varphi_f(u)$; that is, the scores are odd about $1/2$. Hence, in this subsection we will additionally assume that the scores satisfy property (S.3), (3.4.12).
For scores satisfying (S.3), the corresponding signed-rank scores are generated as $a^+(i) = \varphi^+(i/(n+1))$, where $\varphi^+(u) = \varphi((u+1)/2)$; see the discussion in Section 2.5.3. For example, if Wilcoxon scores are used, $\varphi(u) = \sqrt{12}(u - 1/2)$, then the signed-rank score function is $\varphi^+(u) = \sqrt{3}u$. Recall from Chapter 1 that these signed-rank scores can be used to define a norm and a subsequent R-analysis. Here we only want to apply the associated one sample signed-rank procedure to the residuals in order to obtain an estimate of the intercept. So consider the process
$$ T^+(e_R - \alpha 1) = \sum_{i=1}^n \mathrm{sgn}(e_{Ri} - \alpha)\,a^+(R|e_{Ri} - \alpha|) , \eqno(3.5.31) $$
where $e_{Ri} = y_i - x_i'\hat{\beta}_\varphi$; see (1.8.2). Note that this is the process discussed in Section 1.8, except now the iid observations are replaced by residuals. The process is still a nonincreasing function of $\alpha$ which steps down at the Walsh averages of the residuals; see Exercise 1.12.28. The estimate of the intercept is a value $\hat{\alpha}^+_\varphi$ which solves the equation
$$ T^+(e_R - \alpha 1) \doteq 0 . \eqno(3.5.32) $$
If Wilcoxon scores are used, then the estimate is the median of the Walsh averages, (1.3.25), while if sign scores are used, the estimate is the median of the residuals.
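A sketch of the Wilcoxon signed-rank intercept estimate, i.e. the median of the Walsh averages of the residuals (our illustration):

```python
# A sketch: the median of the Walsh averages (e_i + e_j)/2, i <= j.
import numpy as np

def walsh_median(residuals):
    e = np.asarray(residuals, dtype=float)
    i, j = np.triu_indices(e.size)       # all pairs with i <= j
    return float(np.median((e[i] + e[j]) / 2.0))
```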
Let $\hat{b}^+_\varphi = (\hat{\alpha}^+_\varphi, \hat{\beta}_\varphi')'$. We next briefly sketch the development of the asymptotic distribution of $\hat{b}^+_\varphi$. Assume without loss of generality that the true parameter vector $(\alpha_0, \beta_0')'$ is 0. Suppose instead of the residuals we had the true errors in (3.5.31). Theorem A.2.11 of the Appendix then yields an asymptotic linearity result for the process. McKean and Hettmansperger (1976) show that this result holds for the residuals also; that is,
$$ \frac{1}{\sqrt{n}}T^+(e_R - \alpha 1) = \frac{1}{\sqrt{n}}T^+(e) - \alpha\tau_\varphi^{-1} + o_p(1) , \eqno(3.5.33) $$
for all $|\alpha| \leq c$, where $c > 0$. Using arguments similar to those in McKean and Hettmansperger (1976), we can show that $\sqrt{n}\,\hat{\alpha}^+_\varphi$ is bounded in probability; hence, by (3.5.33) we have that
$$ \sqrt{n}\,\hat{\alpha}^+_\varphi = \frac{\tau_\varphi}{\sqrt{n}}T^+(e) + o_p(1) . \eqno(3.5.34) $$
But by (A.2.43) and (A.2.45) of the Appendix, we have the second representation given by
$$ \sqrt{n}\,\hat{\alpha}^+_\varphi = \frac{\tau_\varphi}{\sqrt{n}}\sum_{i=1}^n \varphi^+(F^+(|e_i|))\,\mathrm{sgn}(e_i) + o_p(1) = \frac{\tau_\varphi}{\sqrt{n}}\sum_{i=1}^n \varphi^+(2F(e_i) - 1) + o_p(1) , \eqno(3.5.35) $$
where $F^+$ is the distribution function of the absolute errors $|e_i|$. Due to symmetry, $F^+(t) = 2F(t) - 1$. Then, using the relationship between the rank and the signed-rank scores, $\varphi^+(u) = \varphi((u+1)/2)$, we obtain finally
$$ \sqrt{n}\,\hat{\alpha}^+_\varphi = \frac{\tau_\varphi}{\sqrt{n}}\sum_{i=1}^n \varphi(F(Y_i)) + o_p(1) . \eqno(3.5.36) $$
Therefore, using expression (3.5.24), we have the asymptotic representation of the estimates:
$$ \sqrt{n}\left(\begin{array}{c}\hat{\alpha}^+_\varphi\\ \hat{\beta}_\varphi\end{array}\right) = \tau_\varphi\sqrt{n}\left(\begin{array}{c}n^{-1}1'\varphi(F(Y))\\ (X'X)^{-1}X'\varphi(F(Y))\end{array}\right) + o_p(1) . \eqno(3.5.37) $$
This and an application of the Lindeberg Central Limit Theorem, similar to the proof of Theorem 3.5.7, leads to the theorem:
Theorem 3.5.8. Under assumptions (D.1), (D.2), (E.1), (E.2), (S.1) and (S.3) of Section 3.4,
$$ \left(\begin{array}{c}\hat{\alpha}^+_\varphi\\ \hat{\beta}_\varphi\end{array}\right) \mbox{ has an approximate } N_{p+1}\left(\left(\begin{array}{c}\alpha_0\\ \beta_0\end{array}\right), \tau^2_\varphi(X_1'X_1)^{-1}\right) \mbox{ distribution} , \eqno(3.5.38) $$
where $X_1 = [1\ X]$.
3.6 Theory of Rank-Based Tests
Consider the general linear hypotheses discussed in Section 3.2,
$$ H_0\colon M\beta = 0 \mbox{ versus } H_A\colon M\beta \neq 0 , \eqno(3.6.1) $$
where $M$ is a $q \times p$ matrix of full row rank. The geometry of R testing, Section 3.2.2, indicated the statistic based on the reduction of dispersion between the reduced and full models, $F_\varphi = (RD/q)/(\hat{\tau}_\varphi/2)$, see (3.2.18), as a test statistic. In this section we develop the asymptotic theory for this test statistic under null and alternative hypotheses. This theory will be sufficient for two other rank-based tests which we will discuss later. See Table 3.2.2 and the discussion relating to that table for the special case when $M = I$.
3.6.1 Null Theory of Rank-Based Tests
We proceed with two lemmas about the dispersion function $D(\beta)$ and its quadratic approximation $Q(\beta)$ given by expression (3.5.11).
Lemma 3.6.1. Let $\hat{\beta}$ denote the R-estimate of $\beta$ in the full model (3.2.3); then, under (E.1), (S.1), (D.1) and (D.2) of Section 3.4,
$$ D(\hat{\beta}) - Q(\hat{\beta}) \stackrel{P}{\to} 0 . \eqno(3.6.2) $$
Proof: Assume without loss of generality that the true $\beta$ is 0. Let $\epsilon > 0$ be given. Choose $c_0$ such that $P\left[\sqrt{n}\|\hat{\beta}\| > c_0\right] < \epsilon/2$, for $n$ sufficiently large. Using asymptotic quadraticity, Theorem A.3.8, we have for $n$ sufficiently large
$$ P\left[|D(\hat{\beta}) - Q(\hat{\beta})| < \epsilon\right] \geq P\left[\left\{\max_{\|\beta\| < c_0/\sqrt{n}} |D(\beta) - Q(\beta)| < \epsilon\right\} \cap \left\{\sqrt{n}\|\hat{\beta}\| < c_0\right\}\right] > 1 - \epsilon . \eqno(3.6.3) $$
From this we obtain the result.
The last result shows that $D$ and $Q$ are close at the R-estimate of $\beta$. Our next result shows that $Q(\hat{\beta})$ is close to the minimum of $Q$.
Lemma 3.6.2. Let $\tilde{\beta}$ denote the minimizing value of the quadratic function $Q$; then, under (E.1), (S.1), (D.1) and (D.2) of Section 3.4,
$$ Q(\hat{\beta}) - Q(\tilde{\beta}) \stackrel{P}{\to} 0 . \eqno(3.6.4) $$
Proof: By simple algebra we have
$$ Q(\hat{\beta}) - Q(\tilde{\beta}) = (2\tau_\varphi)^{-1}(\hat{\beta} - \tilde{\beta})'X'X(\hat{\beta} + \tilde{\beta}) - (\hat{\beta} - \tilde{\beta})'S(Y) = \sqrt{n}(\hat{\beta} - \tilde{\beta})'\left[(2\tau_\varphi)^{-1}n^{-1}X'X\sqrt{n}(\hat{\beta} + \tilde{\beta}) - n^{-1/2}S(Y)\right] . $$
It is shown in Exercise 3.16.15 that the term in brackets in the last equation is bounded in probability. Since the left factor converges to zero in probability by Theorem 3.5.5, the desired result follows.
It is easier to work with the equivalent formulation of the linear hypotheses given by the following lemma.
Lemma 3.6.3. An equivalent formulation of the model and the hypotheses is:
$$ Y = 1\alpha + X_1^*\beta_1^* + X_2^*\beta_2^* + e , \eqno(3.6.5) $$
with the hypotheses $H_0\colon \beta_2^* = 0$ versus $H_A\colon \beta_2^* \neq 0$, where $X_i^*$ and $\beta_i^*$, $i = 1, 2$, are defined in display (3.6.7).
Proof: Consider the QR-decomposition of $M'$ given by
$$ M' = [Q_2\ Q_1]\left[\begin{array}{c}R\\ O\end{array}\right] = Q_2R , \eqno(3.6.6) $$
where the columns of $Q_1$ form an orthonormal basis for the kernel of the matrix $M$, the columns of $Q_2$ form an orthonormal basis for the column space of $M'$, $O$ is a $(p-q) \times q$ matrix of 0s, and $R$ is a $q \times q$ upper triangular, nonsingular matrix. Define
$$ X_i^* = XQ_i \mbox{ and } \beta_i^* = Q_i'\beta \mbox{ for } i = 1, 2 . \eqno(3.6.7) $$
It follows that
$$ Y = 1\alpha + X\beta + e = 1\alpha + X_1^*\beta_1^* + X_2^*\beta_2^* + e . $$
Further, $M\beta = 0$ if and only if $\beta_2^* = 0$, which yields the desired result.
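This reparameterization is easy to carry out numerically; the following Python sketch uses numpy's complete QR factorization of $M'$ (our illustration of the lemma, not code from the text):

```python
# A sketch of the reparameterization (3.6.6)-(3.6.7) via a complete QR
# factorization of M'; Q2 spans the column space of M' and Q1 its
# orthogonal complement (the kernel of M).
import numpy as np

def reparameterize(X, M):
    q = M.shape[0]
    Q, _ = np.linalg.qr(M.T, mode="complete")
    Q2, Q1 = Q[:, :q], Q[:, q:]
    return X @ Q1, X @ Q2     # X*_1 and X*_2 in (3.6.7)
```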
Without loss of generality, by the last lemma, for the remainder of the section we will consider a model of the form
$$ Y = 1\alpha + X_1\beta_1 + X_2\beta_2 + e , \eqno(3.6.8) $$
with the hypotheses
$$ H_0\colon \beta_2 = 0 \mbox{ versus } H_A\colon \beta_2 \neq 0 . \eqno(3.6.9) $$
With these lemmas we are now ready to obtain the asymptotic distribution of $F_\varphi$. Let $\beta_r = (\beta_1', 0')'$ denote the reduced model vector of parameters, let $\hat{\beta}_{r,1}$ denote the reduced model R-estimate of $\beta_1$, and let $\hat{\beta}_r = (\hat{\beta}_{r,1}', 0')'$. We shall use similar notation with the minimizing value of the approximating quadratic $Q$. With this notation the drop in dispersion becomes $RD_\varphi = D(\hat{\beta}_r) - D(\hat{\beta})$. McKean and Hettmansperger (1976) proved the following:
Theorem 3.6.1. Suppose the assumptions (E.1), (D.1), (D.2), and (S.1) of Section 3.4 hold. Then under $H_0$,
$$ \frac{RD_\varphi}{\tau_\varphi/2} \stackrel{D}{\to} \chi^2(q) , $$
where $RD_\varphi$ is formally defined in expression (3.2.16).
Proof: Assume that the true vector of parameters is 0 and suppress the subscript $\varphi$ on $RD$. Write $RD$ as the sum of five differences:
$$ RD = D(\hat{\beta}_r) - D(\hat{\beta}) = \left[D(\hat{\beta}_r) - Q(\hat{\beta}_r)\right] + \left[Q(\hat{\beta}_r) - Q(\tilde{\beta}_r)\right] + \left[Q(\tilde{\beta}_r) - Q(\tilde{\beta})\right] + \left[Q(\tilde{\beta}) - Q(\hat{\beta})\right] + \left[Q(\hat{\beta}) - D(\hat{\beta})\right] . $$
By Lemma 3.6.1, the first and fifth differences go to zero in probability, and by Lemma 3.6.2, the second and fourth differences go to zero in probability. Hence we need only show that the middle difference converges in